I've never seen a Javascript
Bloom filter
implementation so
I had to write one, and then had to port it to
Perl and
Ruby.
You'll understand why soon if you don't already.
Quick aside: more than half of
"my"
bloomfilter.js is somebody
else's BSD-licensed Javascript MD5 library but you could pretty
easily replace it with some other hash library if you felt the
compulsion.
I don't think this one would be practical for really large input sets
(like a spell check dictionary built from /usr/dict/words), but I'm sure
you'll think of something if you work at it. Get moving on that.
To get your brain in gear, here's a small demonstration.
First, I put the text of a really excellent Wire song (I won't
say which), one word per line (minus punctuation) in a file.
Then using the Ruby version I ran the file through this:
require 'bloomfilter'
bf = BloomFilter.new({ 'bits' => 0xff, 'salts' => %w(z A) })
STDIN.each_line { |l| l.chomp!; bf.add(l.downcase) }
print(b.to_json)
Or maybe I used the Perl one and ran it through this:
require 'bloomfilter.pl'; # what is this, 1994?
my $bf = BloomFilter->new({ bits => 0xff, salts => [qw(z A)] });
while(<>) { chomp; $bf->add(lc $_); }
print $bf->as_JSON();
Doesn't matter. Both give the same output (though I changed the formatting
to make it a little easier to read):
{
"buckets":[1509645767,990260715,1062483727,150062916,551224088,354022058,1818090233,1463314593,110],
"bucketsize":31,
"bits":255,
"salts":["z","A"]
};
Type some Wire lyrics into the box below and smell the magic:
The filter check code runs 'onchange' for that text box.
Just type (or paste in) some text, then click outside the box and
you should get some feedback.
I encourage you to view the source of this page to see what's really going on
but here's a quick example of the Javascript interface for the impatient:
// choose an appropriate number of bits and salts
// (don't ask me; you can figure it out)
var bf = new BloomFilter({ bits: 0xffff, salts: ['x','y','z','!'] });
bf.add('one');
bf.add('two');
bf.add('three');
bf.test('one'); // will definitely return true
bf.test('two'); // will definitely return true
bf.test('three'); // will definitely return true
bf.test('eins'); // will probably return false
bf.test('zwei'); // will probably return false
bf.test('drei'); // will probably return false
If you find bugs in this, please let me know. Also I'd really like to
hear from anyone who finds other cool uses for this (aside from
Wire lyric guessing games). And just to get this
out of the way, I have no idea if this code can interoperate with
the Bloom::Filter module on the CPAN but I should probably find out.