Dramatically cut down on token usage and speed up response times by up to 130x. A blend of Python and Go leverages the FAISS library for semantic similarity detection, ensuring cache hits are as accurate as they are fast. Easy to run with docker!
Still very much a work in progress... would appreciate feedback and/or contributors!