Make BloomFilter.bitSize() public #6866

Open
4 tasks done
MartinHaeusler opened this issue Dec 9, 2023 · 0 comments

MartinHaeusler commented Dec 9, 2023

1. What are you trying to do?

I am using Guava's bloom filters as part of a persistent file format, i.e. the raw byte array of the bloom filter lives somewhere in the file. It would be more efficient to know the length of the byte array that the bloom filter produces ahead of time (i.e. without actually serializing it).

BloomFilter already has a method called bitSize, but unfortunately it is not public. Its result also doesn't account for the two bytes that store the strategy and the number of hash functions, nor for the integer that stores the length of the bits.data array.
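To make the arithmetic concrete, here is a minimal sketch of how the serialized length could be derived from a bit count, assuming the layout described above: one byte for the strategy, one byte for the number of hash functions, a 4-byte integer for the array length, and then the bits stored as 64-bit longs. The class and method names are hypothetical and not part of Guava, and the layout is an assumption based on this description rather than a documented guarantee.

```java
/** Hypothetical helper, not part of Guava. */
final class BloomFilterSizeSketch {

    // Assumed header: 1 byte strategy + 1 byte number of hash functions
    // + 4-byte int holding the length of the long array.
    static final int HEADER_BYTES = 1 + 1 + Integer.BYTES;

    private BloomFilterSizeSketch() {}

    /** Returns the assumed serialized length in bytes for a filter with the given number of bits. */
    static long serializedSizeInBytes(long bitSize) {
        if (bitSize < 0) {
            throw new IllegalArgumentException("bitSize must be non-negative: " + bitSize);
        }
        // Bits are assumed to be stored in 64-bit words, so round up to whole longs.
        long numLongs = bitSize / Long.SIZE + (bitSize % Long.SIZE == 0 ? 0 : 1);
        return HEADER_BYTES + numLongs * Long.BYTES;
    }
}
```

With a public bitSize(), a caller could pass its result straight into such a helper instead of serializing the filter.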

2. What's the best code you can write to accomplish that without the new feature?

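import com.google.common.hash.BloomFilter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public int getByteSizeOf(BloomFilter<?> bloomFilter) throws IOException {
    // Serializes the whole filter just to measure its length.
    return serialize(bloomFilter).length;
}

public byte[] serialize(BloomFilter<?> bloomFilter) throws IOException {
    try (var baos = new ByteArrayOutputStream()) {
        bloomFilter.writeTo(baos);
        return baos.toByteArray();
    }
}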

The method getByteSizeOf is very inefficient because it actually serializes the bloom filter just to get its size. It would be nice if we could get the size without the serialization.
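A possible stopgap, sketched here under the assumption that only the length is needed, is to write the filter into a stream that counts bytes and discards them. This still runs the serialization logic, but it avoids allocating and copying the full byte array. The class name below is hypothetical. Guava's com.google.common.io.CountingOutputStream might serve the same purpose.

```java
import java.io.OutputStream;
import java.util.Objects;

/** Hypothetical helper: counts the bytes written to it and throws them away. */
final class ByteCountingOutputStream extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++;
    }

    @Override
    public void write(byte[] buffer, int offset, int length) {
        // Validate the range as the OutputStream contract requires, then only count.
        Objects.checkFromIndexSize(offset, length, buffer.length);
        count += length;
    }

    long getCount() {
        return count;
    }
}

// Usage (assumes Guava on the classpath and an existing bloomFilter instance):
//   var counter = new ByteCountingOutputStream();
//   bloomFilter.writeTo(counter);
//   long sizeInBytes = counter.getCount();
```

This does not replace the requested method, since the cost of walking the whole bit array remains.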

3. What would that same code look like if we added your feature?

BloomFilter<?> bloom = ...;
var size = bloom.getSizeInBytes();

(Optional) What would the method signatures for your feature look like?

public class BloomFilter<T> {

    public int getSizeInBytes();

}

Concrete Use Cases

Serialization of the bloom filter as a building block for more complex formats.

Packages

com.google.common.hash

Checklist

MartinHaeusler added the type=addition (A new feature) label Dec 9, 2023
kluever changed the title from "BloomFilter: add method to get the binary size in public API" to "Make BloomFilter.bitSize() public" Dec 11, 2023