-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[TableGen] correctly escape dependency filenames #160834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Currently, *-tblgen do not escape special characters in dependency filenames. This can lead to unnecessary rebuilds when the filenames contain characters that require escaping, as the build system may not treat them correctly. For instance, when building LLVM in a directory that contains spaces ("/home/user/repos/llvm project" in the example below), the build system always rebuilds a large portion of the project despite nothing having changed, due to an unescaped space in the dependency filename: $ ninja -C build -d explain ninja: Entering directory `build' ninja explain: /home/user/repos/llvm is dirty ... $ cat build/include/llvm/IR/IntrinsicsRISCV.h.d IntrinsicsRISCV.h: /home/user/repos/llvm project/llvm/include/llvm/CodeGen/SDNodeProperties.td ... Fix this by escaping special characters in dependency filenames using backslashes. This is consistent with how GCC, Clang [1] and lld [2] handle this. After this change (notice the escaped space): $ cat build/include/llvm/IR/IntrinsicsRISCV.h.d IntrinsicsRISCV.h: /home/user/repos/llvm\ project/llvm/include/llvm/CodeGen/SDNodeProperties.td ... [1]: https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/clang/lib/Frontend/DependencyFile.cpp#L267-L344 [2]: https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/lld/ELF/Driver.cpp#L2482-L2503 Signed-off-by: Ruoyu Zhong <[email protected]>
Signed-off-by: Ruoyu Zhong <[email protected]>
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-tablegen Author: Ruoyu Zhong (ZhongRuoyu) ChangesCurrently, For instance, when building LLVM in a directory that contains spaces ( $ ninja -C build -d explain
ninja: Entering directory `build'
ninja explain: /home/user/repos/llvm is dirty
...
$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm project/llvm/include/llvm/CodeGen/SDNodeProperties.td ... Fix this by escaping special characters in dependency filenames using backslashes. This is consistent with how GCC, Clang 1 and lld 2 handle this. After this change (notice the escaped space): $ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm\ project/llvm/include/llvm/CodeGen/SDNodeProperties.td ... N.B. Full diff: https://github.com/llvm/llvm-project/pull/160834.diff 1 Files Affected:
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index f545706d6fe30..76b161d087f60 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -17,12 +17,14 @@
#include "llvm/TableGen/Main.h"
#include "TGLexer.h"
#include "TGParser.h"
+#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
#include "llvm/Support/SMLoc.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/ToolOutputFile.h"
@@ -37,26 +39,24 @@
#include <utility>
using namespace llvm;
-static cl::opt<std::string>
-OutputFilename("o", cl::desc("Output filename"), cl::value_desc("filename"),
- cl::init("-"));
+static cl::opt<std::string> OutputFilename("o", cl::desc("Output filename"),
+ cl::value_desc("filename"),
+ cl::init("-"));
-static cl::opt<std::string>
-DependFilename("d",
- cl::desc("Dependency filename"),
- cl::value_desc("filename"),
- cl::init(""));
+static cl::opt<std::string> DependFilename("d", cl::desc("Dependency filename"),
+ cl::value_desc("filename"),
+ cl::init(""));
static cl::opt<std::string>
-InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
+ InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
static cl::list<std::string>
IncludeDirs("I", cl::desc("Directory of include files"),
cl::value_desc("directory"), cl::Prefix);
static cl::list<std::string>
-MacroNames("D", cl::desc("Name of the macro to be defined"),
- cl::value_desc("macro name"), cl::Prefix);
+ MacroNames("D", cl::desc("Name of the macro to be defined"),
+ cl::value_desc("macro name"), cl::Prefix);
static cl::opt<bool>
WriteIfChanged("write-if-changed", cl::desc("Only write output if it changed"));
@@ -83,6 +83,35 @@ static int reportError(const char *ProgName, Twine Msg) {
return 1;
}
+/// Escape a filename in the dependency file so that it is correctly
+/// interpreted by `make`. This is consistent with Clang, GCC, and lld.
+static TGLexer::DependenciesSetTy::value_type escapeDependencyFilename(
+ const TGLexer::DependenciesSetTy::value_type &Filename) {
+ std::string Res;
+ raw_string_ostream OS(Res);
+
+ SmallString<256> NativePath;
+ sys::path::native(Filename, NativePath);
+
+ for (unsigned I = 0, E = NativePath.size(); I != E; ++I) {
+ if (NativePath[I] == '#') {
+ OS << '\\';
+ } else if (NativePath[I] == ' ') {
+ OS << '\\';
+ unsigned J = I;
+ while (J > 0 && NativePath[--J] == '\\') {
+ OS << '\\';
+ }
+ } else if (NativePath[I] == '$') {
+ OS << '$';
+ }
+ OS << NativePath[I];
+ }
+
+ OS.flush();
+ return Res;
+}
+
/// Create a dependency file for `-d` option.
///
/// This functionality is really only for the benefit of the build system.
@@ -96,9 +125,9 @@ static int createDependencyFile(const TGParser &Parser, const char *argv0) {
if (EC)
return reportError(argv0, "error opening " + DependFilename + ":" +
EC.message() + "\n");
- DepOut.os() << OutputFilename << ":";
+ DepOut.os() << escapeDependencyFilename(OutputFilename) << ":";
for (const auto &Dep : Parser.getDependencies()) {
- DepOut.os() << ' ' << Dep;
+ DepOut.os() << ' ' << escapeDependencyFilename(Dep);
}
DepOut.os() << "\n";
DepOut.keep();
Footnotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to write a lit test for this, especially for the funky logic handling backslashes before a space?
b11bf0f
to
67488de
Compare
// CHECK-DAG: file\ with\ spaces.td | ||
// CHECK-DAG: file\#with\#hash.td | ||
// CHECK-DAG: file$$with$$dollar.td | ||
// CHECK-DAG: file\ with\ escape\\\ before\ spaces.td |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test (file with escape\ before spaces
) actually failed on UNIX before 7718c55, as the filename was escaped as file\ with\ space/\ before\ spaces.td
suggesting that the backslash in the filename was mistakenly (?) transformed to the path separator /
due to this:
llvm-project/llvm/lib/Support/Path.cpp
Line 564 in 2cacf71
llvm::replace(Path, '\\', '/'); |
So if the test case actually makes sense, I believe it's more correct to only transform the path to its native form using sys::path::native
on Windows. Its "On Unix, it converts all '' to '/'" behavior doesn't seem to be needed here (*-tblgen
don't have the -fms-compatibility
flag where backslashes are treated as path separators):
llvm-project/llvm/include/llvm/Support/Path.h
Lines 258 to 265 in 2cacf71
/// Convert path to the native form. This is used to give paths to users and | |
/// operating system calls in the platform's normal way. For example, on Windows | |
/// all '/' are converted to '\'. On Unix, it converts all '\' to '/'. | |
/// | |
/// @param path A path that is transformed to native format. | |
/// @param result Holds the result of the transformation. | |
LLVM_ABI void native(const Twine &path, SmallVectorImpl<char> &result, | |
Style style = Style::native); |
As for Clang and lld (as mentioned in the PR description), this also seems to be an issue that's worth checking.
$ echo '#include <a\ b>' | gcc-15 -M -MG -xc -
-: a\\\ b
$ echo '#include <a\ b>' | clang-21 -M -MG -xc -
-.o: a/\ b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed the same change in Clang and lld, respectively:
- [Clang] only convert dependency filename to native form when MS-compatible #160903
- [lld][ELF] Only convert dependency filename to native form on Windows #160927
If these changes are accepted we can probably DRY this up into a new llvm/support
helper class.
f400432
to
7718c55
Compare
Currently,
*-tblgen
do not escape special characters in dependency filenames. This can lead to unnecessary rebuilds when the filenames contain characters that require escaping, as the build system may not treat them correctly.For instance, when building LLVM in a directory that contains spaces (
/home/user/repos/llvm project
in the example below), the build system always rebuilds a large portion of the project despite nothing having changed, due to an unescaped space in the dependency filename:Fix this by escaping special characters in dependency filenames using backslashes. This is consistent with how GCC, Clang 1 and lld 2 handle this.
After this change (notice the escaped space):
N.B.
git clang-format
wanted me to format some other parts of the source so there is a second commit a8d9a90 that does that.See also:
If these PRs are accepted then I'll try to extract the logic into a new
llvm/support
helper class. These changes are presented as separate PRs because each of them change the behavior slightly differently.Footnotes
https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/clang/lib/Frontend/DependencyFile.cpp#L267-L344 ↩
https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/lld/ELF/Driver.cpp#L2482-L2503 ↩