address-deduplicator-stream

1.0.4 • Public • Published

address deduplicator stream

A stream that deduplicates addresses using the OpenVenues address deduplicator, which must be installed and running separately.

API

address-deduplicator-stream exports a single function: createDeduplicateStream( requestBatchSize, maxLiveRequests, serverUrl ), which accepts three optional arguments:

  • requestBatchSize (default: 100): The number of addresses to buffer into a batch before sending it to the deduplicator. Larger batches mean fewer requests and less total time and overhead spent making them, at the cost of higher memory usage while each batch builds up.
  • maxLiveRequests (default: 10): Because the deduplicator runs as a standalone server and processes data more slowly than an importer can feed it, the stream must rate-limit itself. maxLiveRequests is the maximum number of concurrent unresolved requests allowed at any time; when that limit is reached, the stream pauses reading until the number of concurrent requests falls below it.
  • serverUrl (default: 'http://localhost:5000'): The HTTP base URL of the address deduplicator server.

and returns a Transform stream that accepts un-deduplicated addresses and filters out the duplicates. Because of the heavy lifting involved, it will likely be the slowest part of your data pipeline. The addresses themselves are expected to be pelias/model Document objects.
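The batching and rate-limiting described above can be sketched as an object-mode Transform stream. This is a minimal illustration under stated assumptions, not the package's actual implementation: `createBatchingDedupeStream` and `dedupeBatch` are hypothetical names, and `dedupeBatch` stands in for the HTTP request to the deduplicator server at serverUrl (whose endpoint and payload format are not described here). Withholding the transform callback while maxLiveRequests requests are outstanding is one way to make the stream "pause reading": upstream writes stall until a request settles.

```javascript
'use strict';
const { Transform } = require('stream');

// Hypothetical sketch, NOT the package's implementation.
// `dedupeBatch(records)` is an assumed stand-in for the HTTP request to the
// deduplicator server; it must return a Promise resolving to the records to keep.
function createBatchingDedupeStream(dedupeBatch, requestBatchSize = 100, maxLiveRequests = 10) {
  let batch = [];            // addresses buffered for the next request
  let liveRequests = 0;      // requests sent but not yet resolved
  let pausedCallback = null; // transform callback withheld while at the limit
  let flushCallback = null;  // flush callback withheld until all requests settle

  function sendBatch(stream, records) {
    liveRequests++;
    dedupeBatch(records).then(
      (kept) => {
        liveRequests--;
        for (const record of kept) stream.push(record);
        // A request slot opened up: resume reading if we had paused.
        if (pausedCallback && liveRequests < maxLiveRequests) {
          const cb = pausedCallback;
          pausedCallback = null;
          cb();
        }
        // Input has ended and every request has settled: end the stream.
        if (flushCallback && liveRequests === 0) {
          const cb = flushCallback;
          flushCallback = null;
          cb();
        }
      },
      (err) => stream.destroy(err)
    );
  }

  return new Transform({
    objectMode: true,
    transform(record, _encoding, callback) {
      batch.push(record);
      if (batch.length >= requestBatchSize) {
        sendBatch(this, batch);
        batch = [];
      }
      // Withholding the callback stops upstream writes (backpressure).
      if (liveRequests >= maxLiveRequests) pausedCallback = callback;
      else callback();
    },
    flush(callback) {
      // Send whatever is left in a final, possibly partial, batch.
      if (batch.length > 0) {
        sendBatch(this, batch);
        batch = [];
      }
      if (liveRequests === 0) callback();
      else flushCallback = callback;
    },
  });
}

module.exports = { createBatchingDedupeStream };
```

Note that this sketch emits kept records in the order their requests resolve, which can differ from input order when several requests are in flight.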

    Install

    npm i address-deduplicator-stream

    Version

    1.0.4

    License

    MIT

    Collaborators

    • missinglink
    • dianashk
    • julian_mapzen
    • pelias
    • trescube