Add a cloudflare docs MCP server using autorag #67

Merged
merged 3 commits into from
Apr 23, 2025

Conversation

cmsparks
Collaborator

@cmsparks cmsparks commented Apr 22, 2025

Waiting on my AutoRAG to finish indexing, but this should be good for a review.

This AutoRAG is intentionally unauthenticated because it isn't linked to any Cloudflare-specific resources.

@cmsparks cmsparks force-pushed the csparks/autorag-mcp branch from 6315cbe to 4c32657 Compare April 22, 2025 23:14
Collaborator

@irvinebroque irvinebroque left a comment

LGTM. I wonder whether, in practice, MCP clients will end up using the optional arguments (especially scoreThreshold) erroneously when they would have been better off omitting them. But we'll learn from this.

It does make me wonder about instrumentation.

@aninibread

> LGTM. I wonder whether, in practice, MCP clients will end up using the optional arguments (especially scoreThreshold) erroneously when they would have been better off omitting them. But we'll learn from this.
>
> It does make me wonder about instrumentation.

Thank you for putting this together!!

I also wonder how this would work in practice. With AI search, the expectation is that the developer presets that value based on experimentation. Here are the values I'd recommend starting with, which people can then tune:

  • max_results: 10
  • match_threshold: 0
    This ensures that something will definitely be retrieved, although it might not be the most accurate.
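
To make the effect of these starting values concrete, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not the code in this PR and not the AutoRAG API: the `SearchOptions` and `ScoredResult` types and the `resolveOptions` and `selectResults` helpers are hypothetical, scores are assumed to lie between 0 and 1, and the filtering is done locally only to show how the two parameters interact.

```typescript
// Hypothetical shapes for illustration only; not the types used in this PR.
interface SearchOptions {
  max_results?: number;
  match_threshold?: number;
}

interface ScoredResult {
  id: string;
  score: number; // similarity score, assumed to lie in [0, 1]
}

// Starting values suggested in the comment above.
const DEFAULT_MAX_RESULTS = 10;
const DEFAULT_MATCH_THRESHOLD = 0;

// Fill in any option the caller (for example, an MCP client) omitted.
function resolveOptions(opts: SearchOptions = {}): Required<SearchOptions> {
  return {
    max_results: opts.max_results ?? DEFAULT_MAX_RESULTS,
    match_threshold: opts.match_threshold ?? DEFAULT_MATCH_THRESHOLD,
  };
}

// Keep results at or above the threshold, best first, capped at max_results.
function selectResults(
  results: ScoredResult[],
  opts: SearchOptions = {},
): ScoredResult[] {
  const { max_results, match_threshold } = resolveOptions(opts);
  return results
    .filter((r) => r.score >= match_threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, max_results);
}
```

With a threshold of 0, every non-negative score passes the filter, so a non-empty candidate set always yields at least one result. A client that supplies a high threshold can filter out everything, which is the failure mode the review comment worries about.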

- You are unsure of how to use some Cloudflare functionality
- You are writing Cloudflare Workers code and need to look up Workers-specific documentation

This tool returns a number of results from a vector database. These are embedded as resources in the response and are plaintext doucments in a variety of formats.

Suggested change
This tool returns a number of results from a vector database. These are embedded as resources in the response and are plaintext doucments in a variety of formats.
This tool returns a number of results from a vector database. These are embedded as resources in the response and are plaintext documents in a variety of formats.

@cmsparks cmsparks force-pushed the csparks/autorag-mcp branch from 4c32657 to 2333b63 Compare April 23, 2025 15:42
@cmsparks
Collaborator Author

> LGTM. I wonder whether, in practice, MCP clients will end up using the optional arguments (especially scoreThreshold) erroneously when they would have been better off omitting them. But we'll learn from this.
>
> It does make me wonder about instrumentation.
>
> Thank you for putting this together!!
>
> I also wonder how this would work in practice. With AI search, the expectation is that the developer presets that value based on experimentation. Here are the values I'd recommend starting with, which people can then tune:
>
> * max_results: 10
>
> * match_threshold: 0
>   This ensures that something will definitely be retrieved, although it might not be the most accurate.

Yeah, I'm also a bit fuzzy on what match_threshold looks like in practice, so I'm not sure whether the LLM would use that parameter correctly. In my testing, though, it doesn't seem to do anything particularly odd with match_threshold, and I get reasonably relevant results.

@cmsparks cmsparks merged commit 2d76979 into main Apr 23, 2025
6 checks passed
4 participants