Sunday, January 4, 2015

Bloom filter using Google Guava

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set.it does not actually store the element in the set to confirm availablity of the element.it is a probabilistic data structure which tells us that the element either definitely is not in the set or may be in the set.

The base data structure of a Bloom filter is a Bit Vector.
How BloomFilter works
Adding an element in the BloomFilter
Add the element to the filter.
Hash it a several times on some hashing techniques and set the bits to 1 where the index matches the results of the hash in the Bit Vector.

Test whether and element in the set or not
if an element is in the set, you follow the same hashing techniques and check if the bits are set to 1 or 0.
BloomFilter can guarantee an element does not exist. If the bits are not set, it’s simply impossible for the element to be in the set. However, a positive answer means the element is in the set or a hashing collision occurred.
/**
* 
*/
package com.rajkrrsingh.test.guava;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnel;
import com.google.common.hash.Sink;

/**
* @author rks
* @03-Jan-2015
*/
public class GuavaBloomFilterTest {

private static BloomFilter<Employee> bloomFilter = BloomFilter.create(new Funnel<Employee>() {

@Override
public void funnel(Employee emp, Sink into) {
into.putString(emp.getEmpid())
.putString(emp.getEmpName())
.putInt(emp.getAge());
}
}, 100);

public static void main(String[] args) {

for(Employee e : Employee.getEmployeeList()){
GuavaBloomFilterTest.bloomFilter.put(e);
}

boolean mightContain = GuavaBloomFilterTest.bloomFilter.mightContain(new Employee("101", "RKS", 10000, 31));
System.out.println(mightContain);

// negative test
boolean mightContain1 = GuavaBloomFilterTest.bloomFilter.mightContain(new Employee("102", "RKS", 10000, 31));
System.out.println(mightContain1);

}

}

Post a Comment