Commit 9d4ab08

Minor fixes to code and documentation.
1 parent b30f8a1 commit 9d4ab08

3 files changed, +12 -25 lines changed


nemoguardrails/eval/data/topical/README.md

Lines changed: 6 additions & 4 deletions
@@ -28,7 +28,7 @@ For additional information about topical rails evaluation and results on the two
 ### Chit-chat dataset
 
 We are using a slightly modified version of the chit-chat dataset available [here](https://github.com/rahul051296/small-talk-rasa-stack).
-For this dataset, we have configured a [Guardrail app](./chitchat) that already has the:
+For this dataset, we have configured a [Guardrail app](./chitchat) that already has:
 - Config file: `config.yml`
 - A set of defined flows: `flows.co`
 - A set of predefined bot messages for the topical rails: `bot.co`
@@ -51,7 +51,7 @@ To run the topical evaluation on this dataset run:
 ### Banking dataset
 
 We are starting from the banking dataset available [here](https://github.com/PolyAI-LDN/task-specific-datasets/tree/master/banking_data).
-For this dataset, we have configured a [Guardrail app](./banking) that already has the:
+For this dataset, we have configured a [Guardrail app](./banking) that already has:
 - Config file: `config.yml`
 - A set of defined flows: `flows.co`
 - A file mapping the user intents in the original dataset to user canonical forms used by Guardrails: `categories_canonical_forms.json`
@@ -73,5 +73,7 @@ To run the topical evaluation on this dataset run:
 
 If you want to assess the performance of topical rails with a new NLU dataset, you can use the `./nemoguardrails/eval/data/topical/dataset_tools.py` functionality.
 For each dataset, you need to define a new class that extends the `DatasetConnector` class and implements the two following two functions:
-- `read_dataset`
-- `_read_canonical_forms`
+- `read_dataset`: Reads the dataset from the specified path, instantiating at least intent names, intent canonical forms, and intent samples.
+The path received as parameter should contain the original dataset files, in the specific format they were distributed.
+- `_read_canonical_forms`: Reads the intent - canonical form mappings from a file.
+This can be a `json` or any other format and should be created by the evaluation user as the mapping is not part of the original dataset.
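
As a rough illustration of the two methods described in the README change above, the sketch below defines a stand-in `DatasetConnector` base class and a hypothetical `ToyConnector` subclass. The stand-in base class, its attribute names, and the `toy.tsv` / `toy_canonical.json` file layout are all assumptions made for this example; the real base class lives in `dataset_tools.py` and may be shaped differently.

```python
import csv
import json
from pathlib import Path


class DatasetConnector:
    """Stand-in for the real base class in dataset_tools.py (assumed shape)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.intent_names = set()
        self.intent_canonical_forms = {}
        self.intent_samples = []  # (intent, text) pairs in this sketch

    def read_dataset(self, dataset_path: str) -> None:
        raise NotImplementedError

    def _read_canonical_forms(self, canonical_path: str) -> None:
        raise NotImplementedError


class ToyConnector(DatasetConnector):
    """Hypothetical connector for a tab-separated `intent<TAB>text` file."""

    def _read_canonical_forms(self, canonical_path: str) -> None:
        # Assumed format: a JSON list of [intent, canonical_form] pairs.
        with open(canonical_path, encoding="utf-8") as canonical_file:
            data = json.load(canonical_file)
        for entry in data:
            if len(entry) != 2:
                print(f"Problem: malformed entry {entry}!")
                continue
            intent, canonical_form = entry
            self.intent_canonical_forms[intent] = canonical_form

    def read_dataset(self, dataset_path: str) -> None:
        folder = Path(dataset_path)
        # The mapping file is written by the evaluation user, not shipped
        # with the original dataset.
        self._read_canonical_forms(str(folder / "toy_canonical.json"))
        with open(folder / "toy.tsv", encoding="utf-8", newline="") as dataset_file:
            for row in csv.reader(dataset_file, delimiter="\t"):
                if len(row) != 2:
                    continue  # ignore rows that are not intent/text pairs
                intent, text = row
                self.intent_names.add(intent)
                self.intent_samples.append((intent, text))
```

Calling `ToyConnector("toy").read_dataset(folder)` on a folder containing both files would fill in the intent names, canonical forms, and samples, which is the minimum the README asks `read_dataset` to provide.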

nemoguardrails/eval/data/topical/create_colang_intent_file.py

Lines changed: 0 additions & 15 deletions
@@ -62,21 +62,6 @@ def main(
             output_file_name="./chitchat/user.co",
             num_samples_per_intent=max_samples_intent,
         )
-
-        with open("./chitchat/user.co") as file1:
-            lines1 = file1.readlines()
-            intents1 = []
-            for line in lines1:
-                if line.startswith("define user"):
-                    intents1.append(line)
-
-        with open("./../../../../evals/config/chitchat/user.co") as file2:
-            lines2 = file2.readlines()
-            for line in lines2:
-                if line.startswith("define user"):
-                    if line not in intents1:
-                        print("Not found: " + line)
-
         print("Created user.co file for banking dataset.")
     else:
         print(f"Unknown dataset {dataset_name}, cannot create user.co file!")

nemoguardrails/eval/data/topical/dataset_tools.py

Lines changed: 6 additions & 6 deletions
@@ -139,8 +139,10 @@ def _read_canonical_forms(
             for intent_canonical_entry in data:
                 if len(intent_canonical_entry) != 2:
                     print(
-                        "Problem: no canonical form found or too many canonical forms!"
+                        f"Problem: no canonical form found or too many canonical forms "
+                        f"for entry {intent_canonical_entry}!"
                     )
+                    continue
                 intent = intent_canonical_entry[0]
                 canonical_form = intent_canonical_entry[1]
                 intent_canonical_forms[intent] = canonical_form
@@ -186,8 +188,6 @@ def read_dataset(self, dataset_path: str = BANKING77_FOLDER) -> None:
                     )
                 )
 
-        return None
-
 
 class ChitChatConnector(DatasetConnector):
     CHITCHAT_FOLDER = "./chitchat/original_dataset/"
@@ -208,8 +208,10 @@ def _read_canonical_forms(
             for intent_canonical_entry in data:
                 if len(intent_canonical_entry) != 2:
                     print(
-                        "Problem: no canonical form found or too many canonical forms!"
+                        f"Problem: no canonical form found or too many canonical forms "
+                        f"for entry {intent_canonical_entry}!"
                     )
+                    continue
                 intent = intent_canonical_entry[0]
                 canonical_form = intent_canonical_entry[1]
                 intent_canonical_forms[intent] = canonical_form
@@ -261,5 +263,3 @@ def read_dataset(self, dataset_path: str = CHITCHAT_FOLDER) -> None:
                             intent=intent, text=text, dataset_split=dataset_type
                         )
                     )
-
-        return None
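
Both `_read_canonical_forms` hunks add a `continue` after the warning. Without it, a malformed entry falls through to the indexing lines below the check: a one-element entry raises `IndexError`, and an entry with three or more elements is silently accepted using only its first two elements. The following standalone sketch reproduces the patched loop; the function name `parse_canonical_forms` and the sample data are hypothetical and not part of the repository.

```python
def parse_canonical_forms(data):
    """Build an intent -> canonical form mapping, skipping malformed entries."""
    intent_canonical_forms = {}
    for intent_canonical_entry in data:
        if len(intent_canonical_entry) != 2:
            print(
                f"Problem: no canonical form found or too many canonical forms "
                f"for entry {intent_canonical_entry}!"
            )
            continue  # skip the entry instead of indexing into it below
        intent = intent_canonical_entry[0]
        canonical_form = intent_canonical_entry[1]
        intent_canonical_forms[intent] = canonical_form
    return intent_canonical_forms


# Hypothetical input: one valid pair, one entry that is too short, one too long.
sample = [["greet", "express greeting"], ["orphan"], ["a", "b", "c"]]
result = parse_canonical_forms(sample)
```

With this input, only the `greet` pair ends up in `result`; the other two entries are reported and skipped.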
