Skip to content

Rename "search_files" to "search_files_by_name". Add "search_file_contents" #897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sebastien-rosset
Copy link

@sebastien-rosset sebastien-rosset commented Mar 17, 2025

Description

  1. Fix Confusing search_files function description leads to misunderstanding of its purpose #896: "Search file by name" enhancements
    1. Rename the existing search_files function to search_files_by_name
    2. Improve performance for deeply nested directories by processing directories in parallel batches.
    3. Improve description to make it clear it's a file/directory name matcher, not a file content search.
    4. Improve error handling
  2. Add new search_file_contents to search file content.
    1. Plain text or regex pattern matching
    2. Case-sensitive/insensitive search
    3. Configurable result limits
    4. Context lines around matches
    5. File glob filtering (include/exclude)
  3. Better parameter documentation throughout
  4. Fix "search_files" tool of Filesystem example MCP server does not work as expected #735, File server tool not finding files. #1067: sometimes the LLM was confused what is the exact syntax for file name matching. Is it substring match, regex match, glob pattern or something else? It wasn't very clear from the name and description.

Server Details

  • Server: filesystem
  • Changes to: tools

Motivation and Context

Both Claude and human users may misinterpret the search_files function as a content search tool (like grep) rather than a file/directory name matcher. The suggested description makes the distinction clear from the beginning and explicitly notes that it doesn't search file contents.

How Has This Been Tested?

Tested with Claude Desktop with the following prompts, before and after the change.

  1. The "Before" column shows which function was selected by the LLM before this PR.
  2. The "After" column shows which function was selected by the LLM with this PR, for the exact same prompt.
Prompt Before After
Find all references to "XYZ" under
/Users/my_userid/my_project/
search_files search_file_content
Find files whose names contain "color" under
/Users/my_userid/my_project/
search_files search_files_by_name
Search files that contain "XYZ" under
/Users/my_userid/my_project using filesystem MCP.
search_files search_file_content
Search "XYZ" under
/Users/my_userid/my_project using filesystem MCP.
search_files search_file_content
Search "xyz" in file a/b/c/foo.cpp search_files search_file_content

For prompt 1, the LLM can easily infer it should be searching for file contents, not matching file names. Previously, the LLM mistakenly invoked the search_files function, which typically would return no matching results.

For prompts 3 and 4, there is intentionally some prompt ambiguity to test how the LLM decide whether to use search_files_by_name or search_file_content based on the function names and descriptions.

Breaking Changes

Will users need to update their MCP client configurations?

No, but I'm not sure I understand breaking changes. I'm unclear whether it's ok to rename a function, or if it's considered to be a breaking change.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Protocol Documentation
  • My changes follows MCP security best practices
  • I have updated the server's README accordingly
  • I have tested this with an LLM client
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have documented all environment variables and configuration options

Additional context

Tests for the search_file_contents function

  • Basic text search returns correct results for common terms
  • Case-sensitive search correctly differentiates between uppercase and lowercase patterns
  • Regex pattern search correctly identifies patterns like function declarations async function \w+\(
  • Context lines display works correctly with different values (0, 2, 4 lines)
  • Max results parameter properly limits the number of results returned
  • File extension filtering correctly includes only the specified file types (.ts, .js, etc.)
  • File extension filtering handles files without extensions correctly
  • Exclude patterns with wildcards (**/*.test.ts) correctly filter out matching files
  • Exclude patterns with simple strings (*config*) correctly filter content containing those strings
  • Combined search with regex + case sensitivity + file types works correctly
  • Search handles large files efficiently without excessive memory usage
  • Directory recursion works correctly for nested file structures
  • Error handling for inaccessible files or directories works as expected
  • Race conditions are avoided when searching directories in parallel
  • Files with non-UTF8 encoding are handled gracefully without crashing

@sebastien-rosset sebastien-rosset marked this pull request as draft March 17, 2025 21:08
@sebastien-rosset sebastien-rosset marked this pull request as ready for review March 17, 2025 21:37
@sebastien-rosset
Copy link
Author

sebastien-rosset commented Mar 17, 2025

Another typically test scenario is when refactoring code. I've noticed Claude typically tries to read a file if the user has provided the file name, as in the example below:

I'll help you rewrite the message so that the configuration parameter can be properly translated. Let me first examine the file to understand the context better.
View result from read_file from filesystem (local)
Result too long, truncated to 100000 characters

If the file is too large, then the LLM tries to search content within the file(s). Previously, it was invoking the search_file function, but that always returned empty results, because Claude was confused about the purpose of the search_file function. With this PR, the LLM correctly invokes search_file_contents:

Now let me look for any references to translating the parameter name:
View result from search_file_contents from filesystem (local)

This makes it possible to have better discovery of a code base, e.g.:

View result from list_allowed_directories from filesystem (local)
Let me look at the xxxx.cpp file to understand the current constructor implementation:
View result from read_file from filesystem (local)
Now I'll look at the xxxx.h file to understand the class definitions better:
View result from read_file from filesystem (local)
Let's look for all header files related to the xxxx plugin:
View result from search_files_by_name from filesystem (local)
Let's try another approach and see what files are in the src directory
View result from list_directory from filesystem (local)
Now let's look for the corresponding header files:
View result from list_directory from filesystem (local)
Great! Now let's look at the zzzz.h file to understand how the xxxx class is defined:
View result from read_file from filesystem (local)
Now let's examine the xxxx.cpp file to see how the constructor is implemented:
View result from read_file from filesystem (local)
Now that I have a good understanding of the codebase, I can modify xxxx ...

@sebastien-rosset
Copy link
Author

sebastien-rosset commented Mar 19, 2025

I've noticed sometimes Claude invokes search_files_by_name with the following input:

{
  `path`: `/Users/xxxx/myproject/src`,
  `pattern`: `.h`
}
[{"type": "text", "text": "No matches found"}]

I improved the description of the pattern argument to make it clear this is a substring match, not a glob and not a regex.

@olaservo olaservo added server-filesystem Reference implementation for the Filesystem MCP server - src/filesystem enhancement New feature or request labels Mar 27, 2025
@sebastien-rosset
Copy link
Author

sebastien-rosset commented Apr 7, 2025

I've noticed that Claude AI often tries to invoke the search functions with a regex, most likely because the name of the input argument is "pattern". I've added a useRegex argument to make it more explicit.

@FalacerSelene
Copy link

Thanks for this, I came here with the same issue that you've solved here. I was going to suggest find_files and grep_files as the names of the tools here, to match the names of the tools that they behave like, but if by_name and contents is enough to make it clear to Claude what to do then I'm happy with it.

@sebastien-rosset sebastien-rosset force-pushed the search_files branch 3 times, most recently from af8476e to 3259d90 Compare April 29, 2025 03:27
@sebastien-rosset
Copy link
Author

Thanks for this, I came here with the same issue that you've solved here. I was going to suggest find_files and grep_files as the names of the tools here, to match the names of the tools that they behave like, but if by_name and contents is enough to make it clear to Claude what to do then I'm happy with it.

I've used this for over a month and it generally works well. I don't mind changing to "find_files" and "grep_files". I'm not sure if it would change the behavior.
Based on my use of Claude for the past several weeks, I have renamed the pattern argument to searchText. I had noticed sometimes Claude was invoking the tool with a regex, probably because the word "pattern" is strongly associated with "regular expression".

… of its purpose.

Rename "search_files" to search_files_by_name.
Add new "search_file_contents" function.
Add descriptions to help the LLM understand the fields.
Support single file search.
Support glob pattern for search by file name.
Rename 'pattern' argument to 'searchText'. Sometimes the name 'pattern' caused
Claude to pass a regex even though it was processed as a simple substring match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request server-filesystem Reference implementation for the Filesystem MCP server - src/filesystem
Projects
None yet
3 participants