A web application that uses the RoBERTa NLP model to answer questions about a given text (the context). RoBERTa is an open-source transformer model used for NLP tasks such as text classification and question answering.
- Two Docker containers on the same network
- Python API container (FastAPI, RoBERTa, Query DB)
- Elasticsearch container (for paragraph ranking)
The approach is simple: transformer models are not optimized for a large corpus, such as hundreds of paragraphs, so the text file is first passed through Elasticsearch, which ranks its paragraphs. The answer is then extracted from the ranked paragraphs.
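The extraction step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the checkpoint `deepset/roberta-base-squad2` is an assumption, since this README does not say which RoBERTa checkpoint the API loads.

```python
from typing import Callable, Dict, List


def load_reader(model_name: str = "deepset/roberta-base-squad2"):
    """Load an extractive QA pipeline (assumed checkpoint, see note above)."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import pipeline

    return pipeline("question-answering", model=model_name)


def answer_from_paragraphs(
    reader: Callable[..., Dict], question: str, paragraphs: List[str]
) -> Dict:
    """Run the reader on each ranked paragraph and keep the best-scoring answer."""
    best = {"answer": "", "score": 0.0, "paragraph": None}
    for index, paragraph in enumerate(paragraphs):
        # The reader returns a dict with at least "answer" and "score".
        result = reader(question=question, context=paragraph)
        if result["score"] > best["score"]:
            best = {
                "answer": result["answer"],
                "score": result["score"],
                "paragraph": index,
            }
    return best
```

With the `transformers` package installed, `answer_from_paragraphs(load_reader(), question, paragraphs)` would return the answer text, its confidence score, and the index of the paragraph it came from.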
- Create the Docker network (connects the API with Elasticsearch)
docker network create qa-net
- Build and start the API image
docker build -t qa-img -f Dockerfile .
docker run -d --network qa-net -e PORT=8000 -p 80:8000 qa-img
- Pull and start the Elasticsearch container
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.17.6
docker run -d --name es01 --network qa-net -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.17.6
Note: The name of the Elasticsearch container (es01 in the command above) is the address the API will make requests to.
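As a rough sketch of what this means on the API side, the container name can be used directly as the hostname inside the `qa-net` network. The index name `paragraphs`, the field name `text`, and both function names are hypothetical and only serve the illustration; the `_search` endpoint and `match` query are standard Elasticsearch.

```python
import json
from urllib import request

ES_HOST = "es01"  # container name given with `docker run --name es01`
ES_PORT = 9200
INDEX = "paragraphs"  # hypothetical index name


def build_search_request(question: str, size: int = 3, index: str = INDEX):
    """Build the URL and JSON body for a full-text match query."""
    url = f"http://{ES_HOST}:{ES_PORT}/{index}/_search"
    # "text" is an assumed field name holding the paragraph content.
    body = {"size": size, "query": {"match": {"text": question}}}
    return url, body


def search(question: str, size: int = 3):
    """Send the query and return the matching paragraph texts, best first."""
    url, body = build_search_request(question, size)
    req = request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    return [hit["_source"]["text"] for hit in payload["hits"]["hits"]]
```

Note that `es01` only resolves from containers attached to `qa-net`; from the host machine, Elasticsearch is reachable at `127.0.0.1:9200` instead.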
- Modify the baseURL in apis/axiosManager.js with your local port (where the API is running).
baseURL = "http://127.0.0.1:<your port>"
- Type a context and a question. The answer to the question should be found in the context.
- If desired, the app can store your query for metrics purposes.
- The AI understood that asking about Bill is also asking about Smith (the same person). Moreover, the text does not explicitly state Bill's age (no "Bill is 36 years old...", no "Bill's age is ..."); the model inferred from the context that 36 is Bill's age.
- You can also provide a text file (PDF) as context.
- When a file is uploaded, the app reads and processes it, then stores the text in Elasticsearch.
- The app can then receive questions whose answers may be in the text file.
- Elasticsearch ranks the paragraphs using BM25, then RoBERTa extracts the answer from them.
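To give an intuition for the ranking step, here is a small pure-Python sketch of BM25 over paragraphs. It uses the textbook BM25 formula with the common defaults `k1=1.2` and `b=0.75`; it is not Elasticsearch's exact implementation, and the tokenizer and paragraph splitter are simplifying assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines (an assumed paragraph convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs; far simpler than a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    question: str, paragraphs: List[str], k1: float = 1.2, b: float = 0.75
) -> List[Tuple[int, float]]:
    """Return (paragraph_index, score) pairs sorted by descending BM25 score."""
    docs = [tokenize(p) for p in paragraphs]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of paragraphs containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(question))

    ranked = []
    for index, doc in enumerate(docs):
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            f = term_freq.get(term, 0)
            if f == 0:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Length normalization: longer paragraphs need more matches.
            norm = f + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * f * (k1 + 1) / norm
        ranked.append((index, score))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

The highest-ranked paragraphs are the ones that would be handed to RoBERTa, so the model only reads a few short passages instead of the whole file.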