Daniel Rench

Web application development : Servers : Networks : E-Mail : DNS : Databases : Programming for hire

previous : contact : linkedin : code : links : pictures : facebook : twitter : next

EHANN: Javascript (and Perl and Ruby) Bloom Filters

I've never seen a Javascript Bloom filter implementation so I had to write one, and then had to port it to Perl and Ruby. You'll understand why soon if you don't already.

Quick aside: more than half of "my" bloomfilter.js is somebody else's BSD-licensed Javascript MD5 library but you could pretty easily replace it with some other hash library if you felt the compulsion.

I don't think this one would be practical for really large input sets (like a spell check dictionary built from /usr/dict/words), but I'm sure you'll think of something if you work at it. Get moving on that.

To get your brain in gear, here's a small demonstration.

First, I put the text of a really excellent Wire song (I won't say which), one word per line (minus punctuation) in a file. Then using the Ruby version I ran the file through this:

require 'bloomfilter'

bf = BloomFilter.new({ 'bits' => 0xff, 'salts' => %w(z A) })

STDIN.each_line { |l| l.chomp!; bf.add(l.downcase) }

print(b.to_json)

Or maybe I used the Perl one and ran it through this:

require 'bloomfilter.pl'; # what is this, 1994?

my $bf = BloomFilter->new({ bits => 0xff, salts => [qw(z A)] });

while(<>) { chomp; $bf->add(lc $_); }
print $bf->as_JSON();

Doesn't matter. Both give the same output (though I changed the formatting to make it a little easier to read):

{
 "buckets":[1509645767,990260715,1062483727,150062916,551224088,354022058,1818090233,1463314593,110],
 "bucketsize":31,
 "bits":255,
 "salts":["z","A"]
};

Type some Wire lyrics into the box below and smell the magic:

 

The filter check code runs 'onchange' for that text box. Just type (or paste in) some text, then click outside the box and you should get some feedback.

I encourage you to view the source of this page to see what's really going on but here's a quick example of the Javascript interface for the impatient:

// choose an appropriate number of bits and salts
// (don't ask me; you can figure it out)

var bf = new BloomFilter({ bits: 0xffff, salts: ['x','y','z','!'] });

bf.add('one');
bf.add('two');
bf.add('three');

bf.test('one'); // will definitely return true
bf.test('two'); // will definitely return true
bf.test('three'); // will definitely return true

bf.test('eins'); // will probably return false
bf.test('zwei'); // will probably return false
bf.test('drei'); // will probably return false

If you find bugs in this, please let me know. Also I'd really like to hear from anyone who finds other cool uses for this (aside from Wire lyric guessing games). And just to get this out of the way, I have no idea if this code can interoperate with the Bloom::Filter module on the CPAN but I should probably find out.

<< Hash::Server: a (mostly) memcached-compatible network interface to Perl hashes | Home | Portfolio | Contact | acuff: Apache Configuration File Finder >>