Deterministic metrics
These metrics are created by logical tests that are run on LLM output.
Assertion Type | Returns true if... |
---|---|
assert-set | A configurable threshold of grouped assertions pass |
classifier | HuggingFace classifier returns expected class above threshold |
contains | output contains substring |
contains-all | output contains all of the listed substrings |
contains-any | output contains any of the listed substrings |
contains-json | output contains valid JSON (optional JSON schema validation) |
contains-html | output contains HTML content |
contains-sql | output contains valid SQL |
contains-xml | output contains valid XML |
cost | Inference cost is below a threshold |
equals | output matches exactly |
f-score | F-score is above a threshold |
finish-reason | model stopped for the expected reason |
icontains | output contains substring, case insensitive |
icontains-all | output contains all of the listed substrings, case insensitive |
icontains-any | output contains any of the listed substrings, case insensitive |
is-html | output is valid HTML |
is-json | output is valid JSON (optional JSON schema validation) |
is-sql | output is a valid SQL statement (optional authority list validation) |
is-valid-function-call | function call matches the function's JSON schema |
is-valid-openai-function-call | function call matches the function's JSON schema (legacy; use is-valid-function-call) |
is-valid-openai-tools-call | all tool calls match the tools JSON schema |
is-xml | output is valid XML |
javascript | provided JavaScript function validates the output |
latency | Latency is below a threshold (milliseconds) |
levenshtein | Levenshtein distance is below a threshold |
perplexity-score | Normalized perplexity |
perplexity | Perplexity is below a threshold |
pi | Pi Labs scorer returns score above threshold |
python | provided Python function validates the output |
regex | output matches regex |
rouge-n | Rouge-N score is above a given threshold |
select-best | Output is selected as best among multiple outputs |
similar | Embedding similarity is above threshold |
starts-with | output starts with string |
trace-span-count | Count spans matching patterns with min/max thresholds |
trace-span-duration | Check span durations with percentile support |
trace-error-spans | Detect errors in traces by status codes, attributes, and messages |
webhook | provided webhook returns {pass: true} |
Every test type can be negated by prepending `not-`. For example, `not-equals` or `not-regex`.
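The negation rule can be pictured as "strip the prefix, run the base check, invert the result". The following is a minimal Python sketch under that assumption, not the tool's actual implementation; `BASE_CHECKS` and `run_assertion` are hypothetical names, and only two base checks are modeled.

```python
import re
from typing import Callable, Dict

# Hypothetical registry of base checks; only two are modeled for illustration.
BASE_CHECKS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda output, value: output == value,
    "regex": lambda output, value: re.search(value, output) is not None,
}


def run_assertion(assertion_type: str, output: str, value: str) -> bool:
    """Run a base check, inverting the result when the type starts with 'not-'."""
    inverse = assertion_type.startswith("not-")
    base_type = assertion_type[len("not-"):] if inverse else assertion_type
    passed = BASE_CHECKS[base_type](output, value)
    # XOR with the inverse flag: a negated assertion passes when the base check fails.
    return passed != inverse
```

With this model, `run_assertion("not-equals", "hi", "hi")` fails because the underlying `equals` check succeeds.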
Assertion types
Contains
The `contains` assertion checks if the LLM output contains the expected value.
Example:
assert:
  - type: contains
    value: 'The expected substring'
The `icontains` assertion is the same, except it ignores case:
assert:
  - type: icontains
    value: 'The expected substring'
Contains-All
The `contains-all` assertion checks if the LLM output contains all of the specified values.
Example:
assert:
  - type: contains-all
    value:
      - 'Value 1'
      - 'Value 2'
      - 'Value 3'
Contains-Any
The `contains-any` assertion checks if the LLM output contains at least one of the specified values.
Example:
assert:
  - type: contains-any
    value:
      - 'Value 1'
      - 'Value 2'
      - 'Value 3'
For case-insensitive matching, use `icontains-any`.
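To make the differences between the substring assertions concrete, here is a small Python sketch of the semantics described above. The function names are hypothetical and mirror the assertion types; lowercasing both sides is an assumed model of "case insensitive", not a statement about the actual implementation.

```python
from typing import Iterable


def contains(output: str, value: str) -> bool:
    """Pass when the expected substring appears anywhere in the output."""
    return value in output


def icontains(output: str, value: str) -> bool:
    """Same as contains, but compare lowercased text (assumed case folding)."""
    return value.lower() in output.lower()


def contains_all(output: str, values: Iterable[str]) -> bool:
    """Pass only when every listed substring is present."""
    return all(value in output for value in values)


def contains_any(output: str, values: Iterable[str]) -> bool:
    """Pass when at least one listed substring is present."""
    return any(value in output for value in values)


def icontains_any(output: str, values: Iterable[str]) -> bool:
    """Case-insensitive variant of contains_any."""
    lowered = output.lower()
    return any(value.lower() in lowered for value in values)
```

For an output of `"Value 1 and value 2"`, `contains_all` with `['Value 1', 'Value 2']` fails because of the lowercase `v`, while `icontains_any` with `['VALUE 2']` passes.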
Regex
The `regex` assertion checks if the LLM output matches the provided regular expression.
Example:
assert:
  - type: regex
    value: "\\d{4}" # Matches a 4-digit number
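Note that in a double-quoted YAML string, `\\` is an escaped backslash, so the pattern the assertion receives is `\d{4}`. The Python sketch below shows what that pattern accepts. It assumes the pattern may match anywhere in the output (search semantics rather than a full-string match); `regex_passes` is a hypothetical helper, not part of the tool.

```python
import re

# The YAML value "\\d{4}" unescapes to this pattern.
PATTERN = r"\d{4}"


def regex_passes(output: str, pattern: str = PATTERN) -> bool:
    """Pass when the pattern matches anywhere in the output (assumed semantics)."""
    return re.search(pattern, output) is not None
```

Under this assumption, a longer digit run such as `12345` also passes, because it contains four consecutive digits. Anchor the pattern (for example `^\d{4}$`) if the whole output must be exactly four digits.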
Contains-JSON
The `contains-json` assertion checks if the LLM output contains a valid JSON structure.
Example:
assert:
  - type: contains-json
You may optionally set a `value` as a JSON schema in order to validate the JSON contents:
assert:
  - type: contains-json
    value:
      required:
        - latitude
        - longitude
      type: object
      properties:
        latitude:
          minimum: -90
          type: number
          maximum: 90
        longitude:
          minimum: -180
          type: number
          maximum: 180
JSON is valid YAML, so you can also just copy in any JSON schema directly:
assert:
  - type: contains-json
    value:
      {
        'required': ['latitude', 'longitude'],
        'type': 'object',
        'properties':
          {
            'latitude': { 'type': 'number', 'minimum': -90, 'maximum': 90 },
            'longitude': { 'type': 'number', 'minimum': -180, 'maximum': 180 },
          },
      }
If your JSON schema is large, import it from a file:
assert:
  - type: contains-json
    value: file://./path/to/schema.json
See also: is-json
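To illustrate which outputs the latitude/longitude schema would accept, the following Python sketch scans free text for embedded JSON objects and applies a hand-rolled equivalent of that schema. It is only an illustration under stated assumptions: the helper names are hypothetical, the extraction strategy (try to decode at every `{`) is assumed, and a real setup would use a JSON Schema validator rather than hard-coded checks.

```python
import json
from typing import Any, Iterator


def iter_json_objects(text: str) -> Iterator[Any]:
    """Yield every JSON value that can be decoded starting at a '{' in the text."""
    decoder = json.JSONDecoder()
    index = text.find("{")
    while index != -1:
        try:
            value, end = decoder.raw_decode(text, index)
            yield value
            index = text.find("{", end)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            index = text.find("{", index + 1)


def in_range(value: Any, low: float, high: float) -> bool:
    """Check 'type: number' plus minimum/maximum (bool is excluded explicitly)."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low <= value <= high


def matches_coordinates_schema(candidate: Any) -> bool:
    """Hand-rolled equivalent of the latitude/longitude schema shown above."""
    return (
        isinstance(candidate, dict)
        and in_range(candidate.get("latitude"), -90, 90)
        and in_range(candidate.get("longitude"), -180, 180)
    )


def contains_valid_coordinates(output: str) -> bool:
    """Pass when any embedded JSON object satisfies the schema."""
    return any(matches_coordinates_schema(obj) for obj in iter_json_objects(output))
```

An output such as `Here you go: {"latitude": 37.77, "longitude": -122.42}` passes, while `{"latitude": 123, "longitude": 0}` fails the `maximum: 90` constraint.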
Contains-HTML
The `contains-html` assertion checks if the LLM output contains HTML content. This is useful when you want to verify that the model has generated HTML markup, even if it's embedded within other text.
Example:
assert:
  - type: contains-html
The assertion uses multiple indicators to detect HTML:
- Opening and closing tags (e.g., `<div>`, `</div>`)
- Self-closing tags (e.g., `<br />`, `<img />`)
- HTML entities (e.g., `&amp;`, `&nbsp;`, `&#123;`)
- HTML attributes (e.g., `class="example"`, `id="test"`)
- HTML comments (e.g., `<!-- comment -->`)
- DOCTYPE declarations
This assertion requires at least two HTML indicators to avoid false positives from text like "a < b" or email addresses.
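The "at least two indicators" rule can be sketched as counting how many distinct indicator patterns match. The Python below is a rough approximation for illustration only: the regular expressions and the names `INDICATORS`, `count_html_indicators`, and `looks_like_html` are assumptions, and the actual detector may use different patterns.

```python
import re

# Assumed patterns, one per indicator in the list above.
INDICATORS = {
    # Opening tag that is not self-closing, e.g. <div> or <a href="x">
    "opening_tag": re.compile(r"<[a-zA-Z][a-zA-Z0-9-]*(\s[^<>]*)?(?<!/)>"),
    "closing_tag": re.compile(r"</[a-zA-Z][a-zA-Z0-9-]*\s*>"),
    "self_closing_tag": re.compile(r"<[a-zA-Z][^<>]*/>"),
    "entity": re.compile(r"&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);"),
    "attribute": re.compile(r"\b[a-zA-Z-]+=\"[^\"]*\""),
    "comment": re.compile(r"<!--.*?-->", re.DOTALL),
    "doctype": re.compile(r"<!DOCTYPE\s+html", re.IGNORECASE),
}


def count_html_indicators(text: str) -> int:
    """Count how many distinct indicator kinds appear in the text."""
    return sum(1 for pattern in INDICATORS.values() if pattern.search(text))


def looks_like_html(text: str, minimum: int = 2) -> bool:
    """Require several independent indicators to reduce false positives."""
    return count_html_indicators(text) >= minimum
```

In this sketch, `a < b` and `user@example.com` match no indicator, `Tom &amp; Jerry` matches only one, and `<div>Hello</div>` matches two (an opening and a closing tag), so only the last is treated as HTML.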
Is-HTML
The `is-html` assertion checks if the entire LLM output is valid HTML (rather than merely containing HTML fragments). The output must start and end with HTML tags, with no non-HTML content outside the tags.
Example:
assert:
  - type: is-html
This assertion will pass for:
- Complete HTML documents: `<!DOCTYPE html><html>...</html>`
- HTML fragments: `<div>Content</div>`
- Multiple elements: `<h1>Title</h1><p>Paragraph</p>`
- Self-closing tags: `<img src="test.jpg" />`
It will fail for:
- Plain text: `Just text`
- Mixed content: `Text before <div>HTML</div> text after`
- XML documents: `<?xml version="1.0"?><root>...</root>`
- Incomplete HTML: `<div>Unclosed div`
- Non-HTML content with HTML inside: `Here is some HTML: <div>test</div>`
Contains-SQL
This assertion ensures that the output is either valid SQL or contains a code block with valid SQL.
assert:
  - type: contains-sql
See `is-sql` for advanced usage, including specific database types and allowlists for tables and columns.
Cost
The `cost` assertion checks if the cost of the LLM call is below a specified threshold.
This requires LLM providers to return cost information. Currently this is only supported by OpenAI GPT models and custom providers.
Example:
providers:
  - openai:gpt-4.1-mini
  - openai:gpt-4
assert:
  # Pass if the LLM call costs less than $0.001
  - type: cost
    threshold: 0.001
Equality
The `equals` assertion checks if the LLM output is equal to the expected value.
Example:
assert:
  - type: equals
    value: 'The expected output'
You can also check whether the output matches an expected JSON value:
assert:
  - type: equals
    value: { 'key': 'value' }
If your expected JSON is large, import it from a file:
assert:
  - type: equals
    value: 'file://path/to/expected.json'
Is-JSON
The `is-json` assertion checks if the LLM output is a valid JSON string.
Example:
assert:
  - type: is-json
You may optionally set a `value` as a JSON schema. If set, the output will be validated against this schema:
assert:
  - type: is-json
    value:
      required:
        - latitude
        - longitude
      type: object
      properties:
        latitude:
          minimum: -90
          type: number
          maximum: 90
        longitude:
          minimum: -180
          type: number
          maximum: 180
JSON is valid YAML, so you can also just copy in any JSON schema directly:
assert:
  - type: is-json
    value:
      {
        'required': ['latitude', 'longitude'],
        'type': 'object',
        'properties':
          {
            'latitude': { 'type': 'number', 'minimum': -90, 'maximum': 90 },
            'longitude': { 'type': 'number', 'minimum': -180, 'maximum': 180 },
          },
      }
If your JSON schema is large, import it from a file:
assert:
  - type: is-json
    value: file://./path/to/schema.json
Is-XML
The `is-xml` assertion checks if the entire LLM output is a valid XML string. It can also verify the presence of specific elements within the XML structure.
Example:
assert:
  - type: is-xml
This basic usage checks if the output is valid XML.
You can also specify required elements:
assert:
  - type: is-xml
    value:
      requiredElements:
        - root.child
        - root.sibling
This checks if the XML is valid and contains the specified elements. The elements are specified as dot-separated paths, allowing for nested element checking.
How it works
- The assertion first attempts to parse the entire output as XML using a parser (fast-xml-parser).
- If parsing succeeds, it's considered valid XML.
- If `value` is specified:
  - It checks for a `requiredElements` key with an array of required elements.
  - Each element path (e.g., "root.child") is split by dots.
  - It traverses the parsed XML object following these paths.
  - If any required element is not found, the assertion fails.
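The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual implementation: it swaps in Python's standard-library `xml.etree.ElementTree` parser instead of fast-xml-parser, and `has_element` and `is_xml` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def has_element(root: ET.Element, dotted_path: str) -> bool:
    """Follow a dot-separated path such as 'root.child' starting at the root element."""
    first, *rest = dotted_path.split(".")
    if root.tag != first:
        return False
    if not rest:
        return True
    # ElementTree paths use '/' between nested child tags.
    return root.find("/".join(rest)) is not None


def is_xml(output: str, required_elements: Iterable[str] = ()) -> bool:
    """Pass when the whole output parses as XML and every required path exists."""
    try:
        root = ET.fromstring(output)
    except ET.ParseError:
        return False
    return all(has_element(root, path) for path in required_elements)
```

For example, `is_xml("<root><child>Content</child></root>", ["root.child"])` passes, while the same call with `["root.sibling"]` fails because no `sibling` element exists under `root`.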
Examples
Basic XML validation:
assert:
  - type: is-xml
Passes for: `<root><child>Content</child></root>`
Fails for: `<root><child>Content</child></root` (the closing tag is not terminated)
Checking for specific elements:
assert:
  - type: is-xml
    value:
      requiredElements:
        - analysis.classification
        - analysis.color
Passes for: `<analysis><classification>T-shirt</classification><color>Red</color></analysis>`
Fails for: `<analysis><classification>T-shirt</classification></analysis>` (missing color element)
Checking nested elements:
assert:
  - type: is-xml
    value:
      requiredElements:
        - root.parent.child.grandchild
Passes for: `<root><parent><child><grandchild>Content</grandchild></child></parent></root>`
Fails for: `<root><parent><child></child></parent></root>` (missing grandchild element)
Inverse assertion
You can use the `not-is-xml` assertion to check if the output is not valid XML:
assert:
  - type: not-is-xml
This will pass for non-XML content and fail for valid XML content.
Note: The `is-xml` assertion requires the entire output to be valid XML. For checking XML content within a larger text, use the `contains-xml` assertion.
Contains-XML
The `contains-xml` assertion is identical to `is-xml`, except it checks if the LLM output contains valid XML content, even if it's not the entire output. For example, the following output passes:
Sure, here is your xml:
<root><child>Content</child></root>
let me know if you have any other questions!
Is-SQL
The `is-sql` assertion checks if the LLM output is a valid SQL statement.
Example:
assert:
  - type: is-sql
To use this assertion, you need to install the `node-sql-parser` package. You can install it using npm:
npm install node-sql-parser
You can optionally set a `databaseType` in the `value` to determine the specific database syntax that your LLM output will be validated against. The default database syntax is MySQL. For a complete and up-to-date list of supported database syntaxes, please refer to the node-sql-parser documentation.
The supported database syntax list:
- Athena
- BigQuery
- DB2
- FlinkSQL
- Hive
- MariaDB
- MySQL
- Noql
- PostgresQL
- Redshift
- Snowflake(alpha)
- Sqlite
- TransactSQL
Example:
assert:
  - type: is-sql
    value:
      databaseType: 'MySQL'
You can also optionally set `allowedTables` and/or `allowedColumns` in the `value` to define the SQL authority list that your LLM output will be validated against.
The format of allowedTables:
{type}::{dbName}::{tableName} // type could be select, update, delete or insert
The format of allowedColumns:
{type}::{tableName}::{columnName} // type could be select, update, delete or insert
For `SELECT *`, `DELETE`, and `INSERT INTO tableName VALUES()` without specified columns, the `.*` column authority regex is required.
Example:
assert:
  - type: is-sql
    value:
      databaseType: 'MySQL'
      allowedTables:
        - '(select|update|insert|delete)::null::departments'
      allowedColumns:
        - 'select::null::name'
        - 'update::null::id'
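To show how authority strings relate to the allowlist entries above, here is a minimal Python sketch. It takes the authority strings as given (extracting them from a SQL statement is the parser's job and is not modeled here) and assumes each allowlist entry is treated as a case-insensitive regular expression tested against the authority string. `denied_authorities` is a hypothetical helper, not part of promptfoo or node-sql-parser.

```python
import re
from typing import Iterable, List

# Allowlist entries copied from the example configuration above.
ALLOWED_TABLES = ["(select|update|insert|delete)::null::departments"]
ALLOWED_COLUMNS = ["select::null::name", "update::null::id"]


def denied_authorities(authorities: Iterable[str], allowlist: Iterable[str]) -> List[str]:
    """Return the authority strings that no allowlist pattern matches."""
    # Assumption: entries are regexes, matched case-insensitively.
    patterns = [re.compile(entry, re.IGNORECASE) for entry in allowlist]
    return [
        authority
        for authority in authorities
        if not any(pattern.search(authority) for pattern in patterns)
    ]
```

Under these assumptions, a table authority of `select::null::departments` is allowed, while `delete::null::employees` is returned as denied, and a column authority of `select::null::salary` is denied because only `name` is allowlisted for `select`.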
is-valid-function-call
This ensures that any JSON LLM output adheres to the schema specified in the `functions` configuration of the provider. It is implemented for a subset of providers: the Google Vertex, Google AI Studio, Google Live, and OpenAI providers. See each provider's documentation to learn more.
is-valid-openai-function-call
Legacy: please use `is-valid-function-call` instead. This ensures that any JSON LLM output adheres to the schema specified in the `functions` configuration of the provider. Learn more about the OpenAI provider.