Commit e8cffa8

Small updates on topical rails evaluation.
1 parent 1a4710f commit e8cffa8

1 file changed (+2, -1 lines)

nemoguardrails/eval/README.md

Lines changed: 2 additions & 1 deletion
@@ -45,11 +45,12 @@ pick the most similar intent above this threshold.
**Evaluation Results**

For the initial topical rails evaluation experiments, we used two datasets for conversational NLU:
-- [_chit-chat_](https://github.com/RasaHQ/rasa-demo/blob/main/data/nlu/chitchat.yml) dataset
+- [_chit-chat_](https://github.com/rahul051296/small-talk-rasa-stack) dataset
- [_banking_](https://github.com/PolyAI-LDN/task-specific-datasets/tree/master/banking_data) dataset

The datasets were transformed into a NeMo Guardrails app by defining canonical forms for each intent, specific dialogue flows, and even bot messages (for the _chit-chat_ dataset only).
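As a rough illustration of the first step, the sketch below renders an intent-to-utterances mapping as Colang `define user` blocks, i.e. one canonical form per intent with its example utterances. The helper names `to_colang_user_forms` and `intent_to_canonical_form`, and the convention of turning `card_arrival` into `card arrival`, are hypothetical assumptions; this is not the conversion script used for the experiments.

```python
from typing import Dict, List


def intent_to_canonical_form(intent: str) -> str:
    """Map a dataset intent label such as 'card_arrival' to a readable
    canonical form name such as 'card arrival' (a naming convention assumed here)."""
    return intent.replace("_", " ").strip().lower()


def to_colang_user_forms(examples: Dict[str, List[str]]) -> str:
    """Render one Colang `define user` block per intent, listing its example utterances."""
    blocks = []
    for intent, utterances in examples.items():
        lines = [f"define user {intent_to_canonical_form(intent)}"]
        for text in utterances:
            # Swap double quotes for single quotes so each example stays a simple
            # double-quoted string without relying on escape sequences.
            lines.append('  "' + text.replace('"', "'") + '"')
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Illustrative input in the style of the banking dataset (intent -> utterances).
banking = {
    "card_arrival": ["When will my card arrive?", "Where is my new card?"],
    "exchange_rate": ["What is the exchange rate today?"],
}
print(to_colang_user_forms(banking))
```

Dialogue flows (`define flow`) and, for _chit-chat_, bot messages (`define bot`) would be generated in the same style; they are left out to keep the sketch short.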
Both datasets have a large number of user intents, and thus of topical rails. The _chit-chat_ dataset is very generic, with coarser-grained intents, while the _banking_ dataset is domain-specific and more fine-grained.
+More details about running the topical rails evaluation experiments and about the evaluation datasets are available [here](./data/topical/README.md).

Preliminary evaluation results follow. In all experiments, we used a balanced test set with at most 3 samples per intent.
For both datasets, we assessed performance across various LLMs and across the number of samples per intent indexed in the vector database (`k = all, 3, 1`).
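One way to build such a setup is sketched below: for each intent, hold out at most 3 utterances for a balanced test set, then index `k` of the remaining utterances (or all of them). The README does not say how the split was made, so holding the test samples out of the same per-intent pool is an assumption, and `split_per_intent` is a hypothetical helper, not part of NeMo Guardrails.

```python
import random
from typing import Dict, List, Optional, Tuple

Sample = Tuple[str, str]  # (utterance, intent)


def split_per_intent(
    examples: Dict[str, List[str]],
    test_per_intent: int = 3,
    k: Optional[int] = None,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Split each intent's utterances into a balanced test set and an index set.

    test_per_intent: at most this many utterances per intent go to the test set.
    k: number of remaining utterances per intent to index; None means "all".
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_set: List[Sample] = []
    index_set: List[Sample] = []
    for intent, utterances in sorted(examples.items()):
        shuffled = list(utterances)
        rng.shuffle(shuffled)
        held_out = shuffled[:test_per_intent]
        remaining = shuffled[test_per_intent:]
        to_index = remaining if k is None else remaining[:k]
        test_set.extend((text, intent) for text in held_out)
        index_set.extend((text, intent) for text in to_index)
    return test_set, index_set


# Toy data; real experiments would load the chit-chat or banking intents.
data = {
    "greet": ["hi", "hello", "hey", "good morning", "howdy"],
    "goodbye": ["bye", "see you", "later"],
}
for k in (None, 3, 1):  # k = all, 3, 1
    test, index = split_per_intent(data, k=k)
    print(k, len(test), len(index))
```

Holding out the test utterances before indexing keeps the two sets disjoint, so a test utterance is never matched against an identical indexed copy of itself.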
