Add evals for the bindings server and a hyperdrive binding #117

Merged
merged 12 commits into from
Apr 30, 2025

deloreyj
Collaborator

This PR adds tools for Hyperdrive and introduces evals for the workers-bindings MCP server.

input: 'List all my Cloudflare accounts.',
expected: 'The accounts_list tool should be called to retrieve the list of accounts.',
},
{
Collaborator

cmsparks, Apr 30, 2025

I would split these into separate describeEval scripts, like:

describeEval("List accounts", ...)
describeEval("Set account", ...)

Then you don't need to have the if statements below, because to me that feels like a bit of a code smell in tests.
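A rough sketch of the structure this comment suggests, not the code that was merged. `describeEvalStub`, `runTaskStub`, the `check` field, the "Set my active account." prompt, and the `set_active_account` tool name are all hypothetical stand-ins so the example runs on its own; the real suite would use the repository's `describeEval`, `initializeClient`, and `runTask`, whose signatures are not shown in this thread. The point is only that each scenario owns its data and its assertion, so nothing has to branch on the input text.

```typescript
// Hypothetical types and helpers for illustration only.
type ToolCall = { toolName: string }
type EvalCase = { input: string; expected: string }
type EvalSuite = {
  name: string
  data: EvalCase[]
  // Each suite carries its own assertion, so no `if (input.includes(...))` is needed.
  check: (toolCalls: ToolCall[]) => boolean
}

const suites: EvalSuite[] = []

// Simplified stand-in for an eval registration function (not the real describeEval signature).
function describeEvalStub(name: string, suite: Omit<EvalSuite, 'name'>): void {
  suites.push({ name, ...suite })
}

// Canned tool calls that replace a real model run; the second entry is an assumption.
const cannedToolCalls: Record<string, ToolCall[]> = {
  'List all my Cloudflare accounts.': [{ toolName: 'accounts_list' }],
  'Set my active account.': [{ toolName: 'set_active_account' }],
}

function runTaskStub(input: string): ToolCall[] {
  return cannedToolCalls[input] ?? []
}

// One suite per scenario, each with a single, unconditional assertion.
describeEvalStub('List accounts', {
  data: [
    {
      input: 'List all my Cloudflare accounts.',
      expected: 'The accounts_list tool should be called to retrieve the list of accounts.',
    },
  ],
  check: (calls) => calls.some((c) => c.toolName === 'accounts_list'),
})

describeEvalStub('Set account', {
  data: [
    {
      input: 'Set my active account.',
      expected: 'The set_active_account tool should be called.',
    },
  ],
  check: (calls) => calls.some((c) => c.toolName === 'set_active_account'),
})

// Runs every registered suite and reports pass/fail per suite name.
function runAll(): Record<string, boolean> {
  const results: Record<string, boolean> = {}
  for (const suite of suites) {
    results[suite.name] = suite.data.every((c) => suite.check(runTaskStub(c.input)))
  }
  return results
}
```

With this layout, a failing scenario shows up under its own suite name, and adding a new scenario means adding a new suite rather than another branch inside a shared task function.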

Collaborator

cmsparks left a comment

LGTM

const client = await initializeClient(/* Pass necessary mocks/config */)
const { promptOutput, toolCalls, fullResult } = await runTask(client, model, input)

if (input.includes('List all my Cloudflare KV Namespaces')) {
Collaborator

Same here, I'd split this into a separate describe eval.

cmsparks self-requested a review April 30, 2025 20:59
deloreyj merged commit 38aa001 into main Apr 30, 2025
6 checks passed
3 participants