Show HN: Extending RocksDB to Deduplicate Values

2 points

6 months ago

I've run into the need to remove duplicate values from my data a few times. Usually the data are higher-level objects like images or text blobs, and I end up writing a custom deduplication pipeline every time.

I got sick of doing this over and over, so I wrote a wrapper around RocksDB that deduplicates values after each Put() operation. Currently it only performs exact deduplication, but I want to extend it in several ways, including semantic (fuzzy) deduplication for data like images and text.
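For readers wondering what exact deduplication on Put() can look like, here is a minimal sketch of one common approach: content addressing with reference counts. It is not taken from prestige and may differ from its actual design. The class name `DedupStore`, the `k:`/`v:`/`r:` key prefixes, and the refcounting scheme are hypothetical, and a plain dict stands in for RocksDB so the example runs on its own.

```python
import hashlib
from typing import Dict, Optional


class DedupStore:
    """Hypothetical sketch: exact value deduplication over a key-value store.

    A dict stands in for RocksDB. Prefixes emulate separate keyspaces
    (in RocksDB these might be column families instead):
      b"k:" + user_key -> SHA-256 digest of the value
      b"v:" + digest   -> the value bytes, stored once
      b"r:" + digest   -> how many user keys currently reference the value
    """

    def __init__(self) -> None:
        self._db: Dict[bytes, bytes] = {}

    def _refcount(self, digest: bytes) -> int:
        raw = self._db.get(b"r:" + digest)
        return int(raw) if raw is not None else 0

    def _release(self, digest: bytes) -> None:
        # Drop one reference; delete the stored value once none remain.
        count = self._refcount(digest) - 1
        if count <= 0:
            self._db.pop(b"r:" + digest, None)
            self._db.pop(b"v:" + digest, None)
        else:
            self._db[b"r:" + digest] = str(count).encode()

    def put(self, key: bytes, value: bytes) -> None:
        # Identical values hash to the same digest, so they share storage.
        digest = hashlib.sha256(value).digest()
        old = self._db.get(b"k:" + key)
        if old == digest:
            return  # same key, same value: nothing to change
        if old is not None:
            self._release(old)  # key is being overwritten with a new value
        if self._refcount(digest) == 0:
            self._db[b"v:" + digest] = value  # first copy of this value
        self._db[b"r:" + digest] = str(self._refcount(digest) + 1).encode()
        self._db[b"k:" + key] = digest

    def get(self, key: bytes) -> Optional[bytes]:
        digest = self._db.get(b"k:" + key)
        return None if digest is None else self._db.get(b"v:" + digest)

    def delete(self, key: bytes) -> None:
        digest = self._db.pop(b"k:" + key, None)
        if digest is not None:
            self._release(digest)

    def unique_values(self) -> int:
        # Number of distinct values physically stored.
        return sum(1 for k in self._db if k.startswith(b"v:"))


store = DedupStore()
store.put(b"a", b"same bytes")
store.put(b"b", b"same bytes")
store.put(b"c", b"other")
print(store.unique_values())  # 2: "a" and "b" share one stored copy
```

In a real RocksDB wrapper, the three writes in `put()` would probably need to be applied atomically (for example in a single WriteBatch or transaction) so a crash cannot leave dangling references; the sketch ignores concurrency and durability, and it treats SHA-256 collisions as negligible rather than comparing bytes.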

Any feedback on the project would be appreciated:

https://github.com/demajh/prestige

No comments
