Skip to content

Conversation

ZhongRuoyu
Copy link
Contributor

@ZhongRuoyu ZhongRuoyu commented Sep 26, 2025

Currently, *-tblgen do not escape special characters in dependency filenames. This can lead to unnecessary rebuilds when the filenames contain characters that require escaping, as the build system may not treat them correctly.

For instance, when building LLVM in a directory that contains spaces (/home/user/repos/llvm project in the example below), the build system always rebuilds a large portion of the project despite nothing having changed, due to an unescaped space in the dependency filename:

$ ninja -C build -d explain
ninja: Entering directory `build'
ninja explain: /home/user/repos/llvm is dirty
...
$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...

Fix this by escaping special characters in dependency filenames using backslashes. This is consistent with how GCC, Clang 1 and lld 2 handle this.

After this change (notice the escaped space):

$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm\ project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...

N.B. git clang-format wanted me to format some other parts of the source so there is a second commit a8d9a90 that does that.

See also:

If these PRs are accepted then I'll try to extract the logic into a new llvm/support helper class. These changes are presented as separate PRs because each of them change the behavior slightly differently.

Footnotes

  1. https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/clang/lib/Frontend/DependencyFile.cpp#L267-L344

  2. https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/lld/ELF/Driver.cpp#L2482-L2503

Currently, *-tblgen do not escape special characters in dependency
filenames. This can lead to unnecessary rebuilds when the filenames
contain characters that require escaping, as the build system may not
treat them correctly.

For instance, when building LLVM in a directory that contains spaces
("/home/user/repos/llvm project" in the example below), the build system
always rebuilds a large portion of the project despite nothing having
changed, due to an unescaped space in the dependency filename:

    $ ninja -C build -d explain
    ninja: Entering directory `build'
    ninja explain: /home/user/repos/llvm is dirty
    ...
    $ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
    IntrinsicsRISCV.h: /home/user/repos/llvm project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...

Fix this by escaping special characters in dependency filenames using
backslashes. This is consistent with how GCC, Clang [1] and lld [2]
handle this.

After this change (notice the escaped space):

    $ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
    IntrinsicsRISCV.h: /home/user/repos/llvm\ project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...

[1]: https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/clang/lib/Frontend/DependencyFile.cpp#L267-L344
[2]: https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/lld/ELF/Driver.cpp#L2482-L2503

Signed-off-by: Ruoyu Zhong <[email protected]>
Signed-off-by: Ruoyu Zhong <[email protected]>
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Sep 26, 2025

@llvm/pr-subscribers-tablegen

Author: Ruoyu Zhong (ZhongRuoyu)

Changes

Currently, *-tblgen do not escape special characters in dependency filenames. This can lead to unnecessary rebuilds when the filenames contain characters that require escaping, as the build system may not treat them correctly.

For instance, when building LLVM in a directory that contains spaces (/home/user/repos/llvm project in the example below), the build system always rebuilds a large portion of the project despite nothing having changed, due to an unescaped space in the dependency filename:

$ ninja -C build -d explain
ninja: Entering directory `build'
ninja explain: /home/user/repos/llvm is dirty
...
$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...

Fix this by escaping special characters in dependency filenames using backslashes. This is consistent with how GCC, Clang 1 and lld 2 handle this.

After this change (notice the escaped space):

$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm\ project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...

N.B. git clang-format wanted me to format some other parts of the source so there is a second commit a8d9a90 that does that.


Full diff: https://github.com/llvm/llvm-project/pull/160834.diff

1 Files Affected:

  • (modified) llvm/lib/TableGen/Main.cpp (+42-13)
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index f545706d6fe30..76b161d087f60 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -17,12 +17,14 @@
 #include "llvm/TableGen/Main.h"
 #include "TGLexer.h"
 #include "TGParser.h"
+#include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorOr.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
 #include "llvm/Support/SMLoc.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/ToolOutputFile.h"
@@ -37,26 +39,24 @@
 #include <utility>
 using namespace llvm;
 
-static cl::opt<std::string>
-OutputFilename("o", cl::desc("Output filename"), cl::value_desc("filename"),
-               cl::init("-"));
+static cl::opt<std::string> OutputFilename("o", cl::desc("Output filename"),
+                                           cl::value_desc("filename"),
+                                           cl::init("-"));
 
-static cl::opt<std::string>
-DependFilename("d",
-               cl::desc("Dependency filename"),
-               cl::value_desc("filename"),
-               cl::init(""));
+static cl::opt<std::string> DependFilename("d", cl::desc("Dependency filename"),
+                                           cl::value_desc("filename"),
+                                           cl::init(""));
 
 static cl::opt<std::string>
-InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
+    InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
 
 static cl::list<std::string>
 IncludeDirs("I", cl::desc("Directory of include files"),
             cl::value_desc("directory"), cl::Prefix);
 
 static cl::list<std::string>
-MacroNames("D", cl::desc("Name of the macro to be defined"),
-            cl::value_desc("macro name"), cl::Prefix);
+    MacroNames("D", cl::desc("Name of the macro to be defined"),
+               cl::value_desc("macro name"), cl::Prefix);
 
 static cl::opt<bool>
 WriteIfChanged("write-if-changed", cl::desc("Only write output if it changed"));
@@ -83,6 +83,35 @@ static int reportError(const char *ProgName, Twine Msg) {
   return 1;
 }
 
+/// Escape a filename in the dependency file so that it is correctly
+/// interpreted by `make`. This is consistent with Clang, GCC, and lld.
+static TGLexer::DependenciesSetTy::value_type escapeDependencyFilename(
+    const TGLexer::DependenciesSetTy::value_type &Filename) {
+  std::string Res;
+  raw_string_ostream OS(Res);
+
+  SmallString<256> NativePath;
+  sys::path::native(Filename, NativePath);
+
+  for (unsigned I = 0, E = NativePath.size(); I != E; ++I) {
+    if (NativePath[I] == '#') {
+      OS << '\\';
+    } else if (NativePath[I] == ' ') {
+      OS << '\\';
+      unsigned J = I;
+      while (J > 0 && NativePath[--J] == '\\') {
+        OS << '\\';
+      }
+    } else if (NativePath[I] == '$') {
+      OS << '$';
+    }
+    OS << NativePath[I];
+  }
+
+  OS.flush();
+  return Res;
+}
+
 /// Create a dependency file for `-d` option.
 ///
 /// This functionality is really only for the benefit of the build system.
@@ -96,9 +125,9 @@ static int createDependencyFile(const TGParser &Parser, const char *argv0) {
   if (EC)
     return reportError(argv0, "error opening " + DependFilename + ":" +
                                   EC.message() + "\n");
-  DepOut.os() << OutputFilename << ":";
+  DepOut.os() << escapeDependencyFilename(OutputFilename) << ":";
   for (const auto &Dep : Parser.getDependencies()) {
-    DepOut.os() << ' ' << Dep;
+    DepOut.os() << ' ' << escapeDependencyFilename(Dep);
   }
   DepOut.os() << "\n";
   DepOut.keep();

Footnotes

  1. https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/clang/lib/Frontend/DependencyFile.cpp#L267-L344

  2. https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/lld/ELF/Driver.cpp#L2482-L2503

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to write a lit test for this, especially for the funky logic handling backslashes before a space?

@ZhongRuoyu ZhongRuoyu force-pushed the tablegen-escape-dep-filenames branch from b11bf0f to 67488de Compare September 26, 2025 10:44
// CHECK-DAG: file\ with\ spaces.td
// CHECK-DAG: file\#with\#hash.td
// CHECK-DAG: file$$with$$dollar.td
// CHECK-DAG: file\ with\ escape\\\ before\ spaces.td
Copy link
Contributor Author

@ZhongRuoyu ZhongRuoyu Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test (file with escape\ before spaces) actually failed on UNIX before 7718c55, as the filename was escaped as file\ with\ space/\ before\ spaces.td suggesting that the backslash in the filename was mistakenly (?) transformed to the path separator / due to this:

llvm::replace(Path, '\\', '/');

So if the test case actually makes sense, I believe it's more correct to only transform the path to its native form using sys::path::native on Windows. Its "On Unix, it converts all '' to '/'" behavior doesn't seem to be needed here (*-tblgen don't have the -fms-compatibility flag where backslashes are treated as path separators):

/// Convert path to the native form. This is used to give paths to users and
/// operating system calls in the platform's normal way. For example, on Windows
/// all '/' are converted to '\'. On Unix, it converts all '\' to '/'.
///
/// @param path A path that is transformed to native format.
/// @param result Holds the result of the transformation.
LLVM_ABI void native(const Twine &path, SmallVectorImpl<char> &result,
Style style = Style::native);

As for Clang and lld (as mentioned in the PR description), this also seems to be an issue that's worth checking.

$ echo '#include <a\ b>' | gcc-15 -M -MG -xc -
-: a\\\ b
$ echo '#include <a\ b>' | clang-21 -M -MG -xc - 
-.o: a/\ b

Copy link
Contributor Author

@ZhongRuoyu ZhongRuoyu Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed the same change in Clang and lld, respectively:

If these changes are accepted we can probably DRY this up into a new llvm/support helper class.

@ZhongRuoyu ZhongRuoyu requested a review from jayfoad September 26, 2025 11:07
@ZhongRuoyu ZhongRuoyu force-pushed the tablegen-escape-dep-filenames branch from f400432 to 7718c55 Compare September 26, 2025 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants