search_repositories returns unmanageable amount of data without pagination #105

Closed
aaronsb opened this issue Apr 4, 2025 · 2 comments · Fixed by #129


aaronsb commented Apr 4, 2025

Describe the bug

When using the search_repositories tool, the server returns a very large response with all matching repositories at once, without pagination or limiting the results to a manageable amount. This makes it difficult to process and use the data effectively, especially in contexts where the response needs to be displayed or processed in a user-friendly way.

Affected version

server version v0.1.0 (b89336793c5bc9b9abdd5100d876babbc1031f5d) 2025-04-04T15:38:19Z

Steps to reproduce the behavior

  1. Use the search_repositories tool with a somewhat generic query like "MCP Model Context Protocol"
  2. Receive a response with a large number of repositories (30+ in my case)
  3. The entire response is returned at once without pagination, making it difficult to manage

Expected vs actual behavior

Expected behavior:

  • The server should return a limited number of results (e.g., 10 repositories)
  • The response should include pagination information (e.g., total count, current page, next page token)
  • There should be a way to request subsequent pages of results
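
As a rough illustration of the expected behavior listed above, here is a minimal Python sketch of a paginated response envelope. The function name `paginate` and the fields `page`, `per_page`, and `next_page` are hypothetical and are not taken from the server; only `total_count` and `items` appear in the actual response shown under Logs.

```python
from typing import Any, Dict, List


def paginate(matches: List[Any], page: int = 1, per_page: int = 10) -> Dict[str, Any]:
    """Return one page of results plus pagination metadata (hypothetical shape)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    items = matches[start:start + per_page]
    # There is a next page only if matches remain after this slice.
    has_more = start + per_page < len(matches)
    return {
        "total_count": len(matches),
        "page": page,
        "per_page": per_page,
        "next_page": page + 1 if has_more else None,
        "items": items,
    }


# Stand-in data: 35 fake "repositories" represented by integers.
first = paginate(list(range(35)), page=1, per_page=10)
last = paginate(list(range(35)), page=4, per_page=10)
```

With this shape, a caller would keep requesting `next_page` until it comes back as `None`.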

Actual behavior:

  • The server returns all matching repositories in a single response
  • No pagination mechanism is provided
  • The response can be extremely large and unwieldy for common search terms

Logs

The response included 30+ repositories with full details for each, resulting in a very large JSON payload. The beginning of the response looked like this:

{"total_count":1746,"incomplete_results":false,"items":[{"id":905016458,"node_id":"R_kgDONfF0ig","owner":{"login":"lastmile-ai","id":123273171,"node_id":"O_kgDOB1j_0w","avatar_url":"https://avatars.githubusercontent.com/u/123273171?v=4","html_url":"https://github.com/lastmile-ai","gravatar_id":"","type":"Organization","site_admin":false,...

Note that while the response includes "total_count":1746, it doesn't provide any way to paginate through these results.

Author

aaronsb commented Apr 4, 2025

For clarity, this bug report was written by the coding agent itself; however, I believe the report to be accurate.

Collaborator

juruen commented Apr 5, 2025

@aaronsb

The total_count value is not the actual number of repositories returned to the agent — it’s the total number of repositories matched by the search. The number of repositories actually sent back to the agent is determined by the perPage parameter provided by the tool.

That said, we had a bug that prevented us from setting the default page size, so it was always returning 30. I addressed this in #129.
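
To make the distinction between total_count and the page size concrete, here is a small Python sketch. It is not the server's code: the helper names `resolve_per_page` and `page_count` are hypothetical, the default of 30 comes from this thread, and the upper bound of 100 is an assumption about the underlying API.

```python
import math
from typing import Optional

DEFAULT_PER_PAGE = 30  # default page size mentioned in this thread
MAX_PER_PAGE = 100     # assumption: upper bound accepted by the underlying API


def resolve_per_page(requested: Optional[int]) -> int:
    """Fall back to the default when no page size is given; clamp otherwise."""
    if requested is None:
        return DEFAULT_PER_PAGE
    return max(1, min(requested, MAX_PER_PAGE))


def page_count(total_count: int, per_page: int) -> int:
    """Number of pages needed to cover every match at the given page size."""
    return math.ceil(total_count / per_page)


# total_count from the log in this issue: 1746 matches overall,
# but a single response carries at most per_page of them.
pages_at_default = page_count(1746, resolve_per_page(None))
pages_at_ten = page_count(1746, resolve_per_page(10))
```

So a response with "total_count":1746 and 30 items is one page out of many, not the full result set.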

One issue with this tool is that, even though the number of repositories returned isn't that large (30 max, by default), the API response includes so much data per repository that it’s easy to saturate the context.

There are a few ways to mitigate this:

  • We should probably trim down the response and return only the useful fields.
  • Alternatively, you can ask the agent to return fewer repositories; see:

Image
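
For the first mitigation, trimming the response could look roughly like the Python sketch below. This is only an illustration, not what the server does: the function names and the choice of fields in `KEEP_FIELDS` are hypothetical, and the sample repository is made up.

```python
from typing import Any, Dict

# Hypothetical selection of "useful" fields; the right set is a design decision.
KEEP_FIELDS = ("full_name", "html_url", "description", "stargazers_count")


def trim_repository(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the selected top-level fields of one repository object."""
    return {key: repo[key] for key in KEEP_FIELDS if key in repo}


def trim_search_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Trim every item while preserving the top-level search metadata."""
    return {
        "total_count": response.get("total_count"),
        "incomplete_results": response.get("incomplete_results"),
        "items": [trim_repository(repo) for repo in response.get("items", [])],
    }


# Made-up sample data shaped like a search response.
sample = {
    "total_count": 1,
    "incomplete_results": False,
    "items": [
        {
            "id": 1,
            "full_name": "example-org/example-repo",
            "html_url": "https://github.com/example-org/example-repo",
            "description": "An example repository",
            "stargazers_count": 42,
            "owner": {"login": "example-org", "site_admin": False},
        }
    ],
}
trimmed = trim_search_response(sample)
```

Dropping nested objects such as `owner` is where most of the savings would come from, since each one repeats many URLs and IDs per repository.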
