An AI assistant with the Midas touch, Mr. Bond! Goldthinker uses a sentence-transformer model to convert the text you want to store into a vector that can be used for semantic search. The vector is then added to an index for approximate nearest-neighbor search using the Annoy library. When you submit a question, the best matches are fed as context into a prompt for the Llama-2-7b-chat model. Finally, the LLM's output is piped through TTS to be converted to audio.
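The retrieval flow above can be sketched in miniature. The toy bag-of-words embedding and brute-force cosine search below are stand-ins for the sentence-transformer model and the Annoy index, so the sketch runs without any model downloads; the passages and function names are illustrative only:

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy stand-in for a sentence-transformer: bag-of-words counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Texts "stored" in the assistant; in Goldthinker these would be embedded
# and added to an Annoy index for approximate nearest-neighbor search.
passages = [
    "the gold reserve is stored in the vault",
    "annoy builds trees for approximate nearest neighbor search",
    "flask serves the api for the web frontend",
]
vocab = sorted({w for p in passages for w in p.split()})
index = [(p, embed(p, vocab)) for p in passages]  # brute-force stand-in for Annoy

def best_match(question):
    """Return the stored passage closest to the question."""
    q = embed(question, vocab)
    return max(index, key=lambda pv: cosine(q, pv[1]))[0]

print(best_match("where is the gold stored"))
# → the gold reserve is stored in the vault
```

In the real pipeline, the retrieved passages would be interpolated into the Llama-2-7b-chat prompt as context, and the model's reply piped through TTS.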
- Create a free Hugging Face account here
- Accept the terms and request access to the Llama-2-7b-chat-hf model here
- Install huggingface-cli
- Run:

  ```sh
  huggingface-cli login
  ```

  Info here
- Create a Python environment using conda or venv (Python >= 3.10 recommended), then install the dependencies:

  ```sh
  pip install -r requirements.txt
  ```
- Install the web frontend's dependencies:

  ```sh
  cd web
  npm install
  ```
- Optional: Download the weights for Llama-2-7b-chat-hf and set the value of `LLAMA_PATH` in your `.env` file
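If you take the optional step, the `.env` entry might look like the following (the path is a placeholder for wherever you saved the weights):

```sh
# .env — point the server at a local copy of the weights so it skips the download
LLAMA_PATH=/path/to/Llama-2-7b-chat-hf
```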
Start the web frontend:

```sh
cd web
npm run dev
```

Then, from the repo root (e.g. in a second terminal), start the Flask API:

```sh
cd ..
flask --app api run
```
Note: The first time you run the server, it will download around 10 GB of model files from Hugging Face unless you downloaded the weights in the optional step above. This can take a while.