Bloom filter implemented in C++
It's a Bloom filter, implemented in C++ for your JavaScript pleasure.
Quick refresher: a bloom filter has two methods: add()
and test()
. You
add()
an element (a utf-8 string, in this case) and voodoo happens; you
test()
for an element and it returns false
if the element is definitely
not in the set, or true
if the element is probably in the set.
With the right parameters, you can tweak "probably" to be pretty darned probable.
Usage
First, npm install overview-js-bloom-filter
.
Now:
var filter = m: 1234567 k: 3 ; // Add some Stringsfilter; // UCS-2 / UTF16-LE String: 6 bytes longfilter; // UTF-8 String: 3 bytes long // Test for things: Strings...console; // trueconsole; // falseconsole; // false: 'bar' is UTF-16 // A Buffer...console; // true // And even a _piece_ of a Buffer, to avoid a `slice()` object allocationconsole
Confused about what m
and k
should be? Try
http://hur.st/bloomfilter, which plugs numbers
into the formulae.
Implementation details
It's a basic bloom filter. It's built to be fast, so:
- The hashing function is Farmhash, because it's fast and well-tested.
- It's in C++ because Farmhash is in C++.
- We hash the string as 64-bit, then treat the result as two independent 32-bit hashes. All but the first two hashes are derived from the first two. Two is enough.
- We handle
Buffer
s because they let you write fast JavaScript.
LICENSE
AGPL-3.0. This project is (c) Overview Services Inc. Please contact us should you desire a more permissive license.