Comparing changes

base repository: explodinggradients/ragas
base: v0.2.13
head repository: explodinggradients/ragas
compare: v0.2.14
  • 19 commits
  • 41 files changed
  • 11 contributors

Commits on Feb 5, 2025

  1. feat: add http request-response logging with env flag control (#1903)

    Add raw request/response logging with RAGAS_ENABLE_HTTP_LOG env flag
    ganeshrvel authored Feb 5, 2025
    Commit: a336eb0
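    To try it out, a minimal sketch (the flag name comes from the commit; the
    accepted value and exactly what gets logged are assumptions here, not
    documented behaviour):

    ```python
    import os

    # Assumption: ragas reads RAGAS_ENABLE_HTTP_LOG at runtime; "true" is a guess,
    # since the commit message only names the flag, not its accepted values.
    os.environ["RAGAS_ENABLE_HTTP_LOG"] = "true"

    import logging

    # If the raw request/response lines go through the standard logging module
    # (an assumption), lower the threshold to DEBUG so they become visible.
    logging.basicConfig(level=logging.DEBUG)
    ```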

Commits on Feb 8, 2025

  1. Commit: 7c4997f

Commits on Feb 10, 2025

  1. Commit: cb63a82

Commits on Feb 14, 2025

  1. Fixed simple criteria metric (#1909)

    The issue arises because `self.single_turn_prompt.instruction` and
    `self.multi_turn_prompt.instruction` were not being properly assigned
    during the initialization of `SimpleCriteriaScore()`.
    sahusiddharth authored Feb 14, 2025
    Commit: 146d2e2
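    For context, a minimal sketch of how `SimpleCriteriaScore` is typically
    constructed; the constructor arguments follow the ragas 0.2.x docs and the
    attribute names come from the commit message, but treat both as assumptions
    rather than the exact fixed code:

    ```python
    from ragas.metrics import SimpleCriteriaScore

    # Assumed constructor signature (per ragas 0.2.x docs); the definition text is made up.
    scorer = SimpleCriteriaScore(
        name="coarse_grained_score",
        definition="Score 0 to 5 for semantic similarity to the reference",
    )

    # After this fix, the prompt instructions should reflect the definition above
    # instead of being left unassigned.
    print(scorer.single_turn_prompt.instruction)
    print(scorer.multi_turn_prompt.instruction)
    ```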
  2. Commit: d96e167
  3. fix SingleHopQuerySynthesizer::prepare_combinations() (#1921)

    This bug would probably raise a KeyError since `personas` is not set.

    This is likely due to an oversight.
    
    Related to #1917
    lryan599 authored Feb 14, 2025
    Commit: 6ef4f9a
  4. docs: fix semantic similarity description (cross-encoder -> bi-encoder) (#1910)

    This PR updates the documentation to correctly describe the semantic
    similarity metric.
    
    ### Issue
    The documentation previously stated that a **cross-encoder** was used
    for computing the semantic similarity score. However, after reviewing
    the implementation, it is clear that the current approach follows a
    **bi-encoder** strategy:
    
    - The ground truth and response are encoded independently
    - Their embeddings are then compared using cosine similarity
    
    A cross-encoder would typically process both texts together in a single
    forward pass (e.g., concatenating them before encoding), which is not
    the case in the current implementation.
    
    ### Current Implementation
    For example, in the current implementation:
    
    ```python
    embedding_1 = np.array(await self.embeddings.embed_text(ground_truth))
    embedding_2 = np.array(await self.embeddings.embed_text(answer))
    # Normalization factors of the above embeddings
    norms_1 = np.linalg.norm(embedding_1, keepdims=True)
    norms_2 = np.linalg.norm(embedding_2, keepdims=True)
    embedding_1_normalized = embedding_1 / norms_1
    embedding_2_normalized = embedding_2 / norms_2
    similarity = embedding_1_normalized @ embedding_2_normalized.T
    score = similarity.flatten()
    ```
    
    This code shows that the ground truth and response are encoded
    separately, and their similarity is computed using cosine similarity,
    which is characteristic of a **bi-encoder** approach.
    
    ### Fix
    The term "cross-encoder" has been corrected to "bi-encoder" in the
    documentation to ensure consistency with the actual implementation.
    Ayaka-mogumogu authored Feb 14, 2025
    Commit: dcfd58b
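    To make the bi-encoder / cross-encoder distinction above concrete, here is a
    small illustrative sketch using the sentence-transformers library (not ragas
    code; the model names are just examples):

    ```python
    from sentence_transformers import CrossEncoder, SentenceTransformer, util

    ground_truth = "The Eiffel Tower is located in Paris."
    answer = "Paris is home to the Eiffel Tower."

    # Bi-encoder: encode each text independently, then compare the embeddings.
    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    emb_gt = bi_encoder.encode(ground_truth)
    emb_ans = bi_encoder.encode(answer)
    bi_score = util.cos_sim(emb_gt, emb_ans).item()

    # Cross-encoder: both texts go through the model together in one forward pass.
    cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")
    cross_score = cross_encoder.predict([(ground_truth, answer)])[0]

    print(bi_score, cross_score)
    ```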
  5. Commit: 620c6b0

Commits on Feb 18, 2025

  1. nvidia end-2-end accuracy, relevance and groundedness metrics (#1913)

    First iteration of the Nvidia accuracy metric.
    Adds a simple implementation of the single-turn accuracy,
    context_relevance and answer_groundedness metrics.
    
    ---------
    
    Co-authored-by: jjmachan <[email protected]>
    titericz and jjmachan authored Feb 18, 2025
    Commit: c9305dd
  2. Enable Runtime Checks for ModeMetric Protocol (#1929)

    Add @t.runtime_checkable to the ModeMetric protocol to allow runtime
    type checking using isinstance() and issubclass().
    sahusiddharth authored Feb 18, 2025
    Commit: a94feb2
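    As a generic illustration of what `@t.runtime_checkable` enables (the real
    `ModeMetric` protocol in ragas has its own members; the simplified protocol
    below is only a sketch):

    ```python
    import typing as t

    @t.runtime_checkable
    class ModeMetric(t.Protocol):
        # Simplified stand-in for the ragas protocol; the real members may differ.
        name: str
        mode: str

    class FaithfulnessLike:
        name = "faithfulness"
        mode = "single_turn"

    # Without @t.runtime_checkable, isinstance() against a Protocol raises TypeError;
    # with it, the check passes structurally based on the declared members.
    print(isinstance(FaithfulnessLike(), ModeMetric))  # True
    ```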

Commits on Feb 20, 2025

  1. r2r integration (#1918)

    - r2r integration
    - fixed spelling mistake in swarm tutorial
    - added @t.runtime_checkable to metric mode
    
    ---------
    
    Co-authored-by: Jithin James <[email protected]>
    sahusiddharth and jjmachan authored Feb 20, 2025
    Commit: 76e14b0
  2. Haystack llm and embedding wrapper (#1901)

    Co-authored-by: Jithin James <[email protected]>
    sahusiddharth and jjmachan authored Feb 20, 2025
    Commit: 2bc29a2

Commits on Feb 24, 2025

  1. Easy fix, Groundedness metric, 5 retries early break. (#1935)

    Easy fix for the Groundedness metric: break early after 5 retries.
    Added a logger to the 3 nv_metrics retries.
    Fixed the input context max length to 7k to avoid the 8k break.
    titericz authored Feb 24, 2025
    Commit: 48b82ab

Commits on Mar 3, 2025

  1. nvidia docs (#1940)

    sahusiddharth authored Mar 3, 2025
    Commit: 0a32987
  2. unnecessary Noise Sensitivity name update (#1943)

    - the current code was logging both the name and the mode of the noise
    sensitivity metric instead of just the name
    sahusiddharth authored Mar 3, 2025
    Commit: 88e5fd3
  3. When saving kg only save the node id in the relationships (#1926)

    When saving the knowledge graph, the relationships no longer store all
    the fields of each node, only the node id.

    This can save a very large amount of hard disk space.
    lryan599 authored Mar 3, 2025
    Commit: f5ea3d5
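    A minimal sketch of the idea (field names are illustrative, not the exact
    ragas on-disk schema): relationships now reference nodes by id instead of
    embedding the full node payload.

    ```python
    # Illustrative only; the actual ragas serialization format may differ.
    node = {"id": "n1", "properties": {"page_content": "...", "embedding": [0.1, 0.2, 0.3]}}

    # Before: each relationship duplicated the whole node payload on both ends.
    rel_before = {"type": "cosine_similarity", "source": node, "target": node}

    # After: the relationship stores only the node ids, so heavy fields such as
    # embeddings are written once per node rather than once per relationship.
    rel_after = {"type": "cosine_similarity", "source": node["id"], "target": node["id"]}

    print(rel_before)
    print(rel_after)
    ```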
  4. fix: return valid JSON output schema (#1933)

    Use json.dumps() on the dict output of Pydantic's
    BaseModel.model_json_schema() to ensure the string representation is
    valid JSON using double quotes instead of single quotes.
    alasdairsmith authored Mar 3, 2025
    Commit: 776afaa
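    A small self-contained illustration of the problem and the fix (the model
    below is made up; `model_json_schema()` is the standard Pydantic v2 API):

    ```python
    import json

    from pydantic import BaseModel

    class Verdict(BaseModel):
        reason: str
        verdict: int

    schema = Verdict.model_json_schema()  # returns a plain Python dict

    # str(schema) yields Python-repr output with single quotes, which is not valid JSON...
    print(str(schema)[:60])
    # ...while json.dumps(schema) yields a double-quoted, parseable JSON string.
    print(json.dumps(schema)[:60])
    ```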
  5. Update default.py - Missing cosine_similarity transform for docs with Token Count 101–500 (#1934)
    
    The `default_transforms` function defined at
    `src/ragas/testset/transforms/default.py` has a problem with handling
    transforms for documents with 101-500 tokens.
    
    The code divides the `transforms` configurations based on the document's
    token count. Several transforms are instantiated when the ["101-500"
    token count bin is the first quartile
    (Q1)](https://github.com/explodinggradients/ragas/blob/2bc29a2b8358ddb6b167fdf7ab0518ad9371463c/src/ragas/testset/transforms/default.py#L128),
    among them the `cosine_sim_builder`. While `cosine_sim_builder` is
    correctly **instantiated** (line 139), it's then **not included** in the
    list of transforms that are actually returned (line 153).
    
    It appears that `cosine_sim_builder` was likely unintentionally omitted
    from the returned transforms list. The intended behavior should probably
    mirror how `ner_overlap_sim` is handled (line 120), where
    `cosine_sim_builder` is instantiated and added to the returned list. The
    current code effectively instantiates `cosine_sim_builder` but then
    discards it. This omission might impact the number of relationships
    created in the knowledge graph.
    rgrizzo-linksmt authored Mar 3, 2025
    Commit: 1d4b2ff

Commits on Mar 4, 2025

  1. Ndarray dtype fix, improve error wording (#1924)

    1. I have noticed that if `response.statements` is an empty list, then
    the output of the `verify_claims` function is `array([], dtype=float64)`,
    which raises a type error when the `~` operation is applied to the
    `reference_response` variable. This occurs in the rare case when
    `hypothesis_list` is empty (when there are no claims in the response).
    2. A small change to the wording of `LLMDidNotFinishException`.
    
    ---------
    
    Co-authored-by: Mikhail Zybin <[email protected]>
    zybinmikhail and Mikhail Zybin authored Mar 4, 2025
    Commit: 414a518
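    The dtype issue is reproducible with plain NumPy (generic NumPy behaviour,
    not ragas code); forcing a boolean dtype is one way to make `~` well defined
    on an empty array:

    ```python
    import numpy as np

    empty_float = np.array([])      # dtype defaults to float64
    try:
        ~empty_float                # bitwise invert is undefined for float arrays
    except TypeError as exc:
        print("TypeError:", exc)

    empty_bool = np.array([], dtype=bool)
    print(~empty_bool)              # -> [] ; `~` is well defined for boolean arrays
    ```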