[MemProf] Restructure the pruning of unneeded NotCold contexts #138792
Conversation
This change is mostly NFC, other than the addition of a new message printed for pruned contexts when -memprof-report-hinted-sizes is enabled. To prepare for a follow-on change, adjust the way we determine which NotCold contexts can be pruned (because they overlap with longer NotCold contexts), and change the way we perform this pruning. Instead of determining the points at which we need to keep NotCold contexts during the building of the trie, we now determine this on the fly as the MIB metadata nodes are recursively built. This simplifies a follow-on change that performs additional pruning of some NotCold contexts, which can affect which others need to be kept as the longest overlapping NotCold contexts.
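To make the pruning rule concrete, here is a minimal standalone C++ sketch. It is not the LLVM implementation: it models contexts as plain vectors of stack ids, applies the keep-the-longest-NotCold rule globally rather than per trie node as the real code does, and all names in it (`Context`, `pruneNotCold`) are invented for illustration.

```cpp
// Standalone sketch of the pruning rule (illustrative only; the real code in
// MemoryProfileInfo.cpp operates on MIB metadata and filters per trie node).
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

enum class AllocType { NotCold, Cold };

struct Context {
  std::vector<uint64_t> Stack; // call stack ids, from the allocation outward
  AllocType Type;
};

// Keep every Cold context; keep only the first NotCold context of maximal
// length. NotCold is the default allocation behavior, so one longest
// overlapping NotCold context is enough to tell the cloning step how deep
// cloning has to go.
std::vector<Context> pruneNotCold(const std::vector<Context> &Contexts) {
  size_t LongestNotCold = 0;
  for (const Context &C : Contexts)
    if (C.Type == AllocType::NotCold)
      LongestNotCold = std::max(LongestNotCold, C.Stack.size());

  std::vector<Context> Kept;
  bool KeptLongestNotCold = false;
  for (const Context &C : Contexts) {
    if (C.Type == AllocType::Cold) {
      Kept.push_back(C);
    } else if (!KeptLongestNotCold && C.Stack.size() == LongestNotCold) {
      Kept.push_back(C);
      KeptLongestNotCold = true;
    }
  }
  return Kept;
}

int main() {
  // The example from the patch comments: 1,2,4 (cold) and 1,2,5 (the first
  // longest notcold) are kept; 1,3 and 1,2,6 are pruned.
  std::vector<Context> Contexts = {{{1, 3}, AllocType::NotCold},
                                   {{1, 2, 4}, AllocType::Cold},
                                   {{1, 2, 5}, AllocType::NotCold},
                                   {{1, 2, 6}, AllocType::NotCold}};
  for (const Context &C : pruneNotCold(Contexts)) {
    for (uint64_t Id : C.Stack)
      std::cout << Id << ' ';
    std::cout << (C.Type == AllocType::Cold ? "(cold)" : "(notcold)") << '\n';
  }
  return 0;
}
```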
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-pgo

Author: Teresa Johnson (teresajohnson)

Full diff: https://github.com/llvm/llvm-project/pull/138792.diff

3 Files Affected:
- llvm/include/llvm/Analysis/MemoryProfileInfo.h
- llvm/lib/Analysis/MemoryProfileInfo.cpp
- llvm/test/Transforms/PGOProfile/memprof.ll
diff --git a/llvm/include/llvm/Analysis/MemoryProfileInfo.h b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
index f75783a4fef50..deb7ab134c161 100644
--- a/llvm/include/llvm/Analysis/MemoryProfileInfo.h
+++ b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
@@ -56,37 +56,6 @@ class CallStackTrie {
// Allocation types for call context sharing the context prefix at this
// node.
uint8_t AllocTypes;
- // Updated as we add allocations to note if this is the deepest point in the
- // trie that has an ambiguous allocation type (both Cold and NotCold). It is
- // used to prune unneeded NotCold contexts, taking advantage of the fact
- // that we later will only clone Cold contexts, as NotCold is the allocation
- // default. We only need to keep as metadata the NotCold contexts that
- // overlap the longest with Cold allocations, so that we know how deeply we
- // need to clone. For example, assume we add the following contexts to the
- // trie:
- // 1 3 (notcold)
- // 1 2 4 (cold)
- // 1 2 5 (notcold)
- // 1 2 6 (notcold)
- // the trie looks like:
- // 1
- // / \
- // 2 3
- // /|\
- // 4 5 6
- //
- // It is sufficient to prune all but one not cold contexts (either 1,2,5 or
- // 1,2,6, we arbitrarily keep the first one we encounter which will be
- // 1,2,5). We'll initially have DeepestAmbiguousAllocType set false for trie
- // node 1 after the trie is built, and true for node 2. This indicates that
- // the not cold context ending in 3 is not needed (its immediate callee has
- // this value set false). The first notcold context we encounter when
- // iterating the callers of node 2 will be the context ending in 5 (since
- // std::map iteration is in sorted order of key), which will see that this
- // field is true for its callee, so we will keep it. But at that point we
- // set the callee's flag to false which prevents adding the not cold context
- // ending in 6 unnecessarily.
- bool DeepestAmbiguousAllocType = true;
// If the user has requested reporting of hinted sizes, keep track of the
// associated full stack id and profiled sizes. Can have more than one
// after trimming (e.g. when building from metadata). This is only placed on
@@ -134,8 +103,7 @@ class CallStackTrie {
bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
std::vector<uint64_t> &MIBCallStack,
std::vector<Metadata *> &MIBNodes,
- bool CalleeHasAmbiguousCallerContext,
- bool &CalleeDeepestAmbiguousAllocType);
+ bool CalleeHasAmbiguousCallerContext);
public:
CallStackTrie() = default;
diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 6ca5b5e492723..d7f5396a6ccf3 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -163,16 +163,10 @@ void CallStackTrie::addCallStack(
continue;
}
// Update existing caller node if it exists.
- CallStackTrieNode *Prev = nullptr;
auto [Next, Inserted] = Curr->Callers.try_emplace(StackId);
if (!Inserted) {
- Prev = Curr;
Curr = Next->second;
Curr->addAllocType(AllocType);
- // If this node has an ambiguous alloc type, its callee is not the deepest
- // point where we have an ambigous allocation type.
- if (!hasSingleAllocType(Curr->AllocTypes))
- Prev->DeepestAmbiguousAllocType = false;
continue;
}
// Otherwise add a new caller node.
@@ -248,41 +242,114 @@ void CallStackTrie::convertHotToNotCold(CallStackTrieNode *Node) {
convertHotToNotCold(Caller.second);
}
+// Copy over some or all of NewMIBNodes to the SavedMIBNodes vector, depending
+// on options that enable filtering out some NotCold contexts.
+static void SaveFilteredNewMIBNodes(std::vector<Metadata *> &NewMIBNodes,
+ std::vector<Metadata *> &SavedMIBNodes,
+ unsigned CallerContextLength) {
+ // In the simplest case, with pruning disabled, keep all the new MIB nodes.
+  if (MemProfKeepAllNotColdContexts) {
+    append_range(SavedMIBNodes, NewMIBNodes);
+    return;
+  }
+
+ auto EmitMessageForRemovedContexts = [](const MDNode *MIBMD, StringRef Tag,
+ StringRef Extra) {
+ assert(MIBMD->getNumOperands() > 2);
+ for (unsigned I = 2; I < MIBMD->getNumOperands(); I++) {
+ MDNode *ContextSizePair = dyn_cast<MDNode>(MIBMD->getOperand(I));
+ assert(ContextSizePair->getNumOperands() == 2);
+ uint64_t FullStackId =
+ mdconst::dyn_extract<ConstantInt>(ContextSizePair->getOperand(0))
+ ->getZExtValue();
+ uint64_t TS =
+ mdconst::dyn_extract<ConstantInt>(ContextSizePair->getOperand(1))
+ ->getZExtValue();
+ errs() << "MemProf hinting: Total size for " << Tag
+ << " non-cold full allocation context hash " << FullStackId
+ << Extra << ": " << TS << "\n";
+ }
+ };
+
+ // Prune unneeded NotCold contexts, taking advantage of the fact
+ // that we later will only clone Cold contexts, as NotCold is the allocation
+ // default. We only need to keep as metadata the NotCold contexts that
+ // overlap the longest with Cold allocations, so that we know how deeply we
+ // need to clone. For example, assume we add the following contexts to the
+ // trie:
+ // 1 3 (notcold)
+ // 1 2 4 (cold)
+ // 1 2 5 (notcold)
+ // 1 2 6 (notcold)
+ // the trie looks like:
+ // 1
+ // / \
+ // 2 3
+ // /|\
+ // 4 5 6
+ //
+ // It is sufficient to prune all but one not-cold context (either 1,2,5 or
+ // 1,2,6; we arbitrarily keep the first one we encounter, which will be
+ // 1,2,5).
+ //
+ // To do this pruning, we first check if there were any not-cold
+ // contexts kept for a deeper caller, which will have a context length larger
+ // than the CallerContextLength being handled here (i.e. kept by a deeper
+ // recursion step). If so, none of the not-cold MIB nodes added for the
+ // immediate callers need to be kept. If not, we keep the first (created
+ // for the immediate caller) not-cold MIB node.
+ unsigned LongestNonCold = 0;
+ for (auto *MIB : NewMIBNodes) {
+ auto MIBMD = cast<MDNode>(MIB);
+ if (getMIBAllocType(MIBMD) == AllocationType::Cold)
+ continue;
+ MDNode *StackMD = getMIBStackNode(MIBMD);
+ assert(StackMD);
+ auto ContextLength = StackMD->getNumOperands();
+ if (ContextLength > LongestNonCold)
+ LongestNonCold = ContextLength;
+ }
+ // Don't need to emit any for the immediate caller if we already have
+ // longer overlapping contexts.
+ bool KeepFirstNewNotCold = LongestNonCold == CallerContextLength;
+ auto NewColdMIBNodes = make_filter_range(NewMIBNodes, [&](const Metadata *M) {
+ auto MIBMD = cast<MDNode>(M);
+    // Only keep cold contexts and the first (longest) non-cold context.
+ if (getMIBAllocType(MIBMD) != AllocationType::Cold) {
+ MDNode *StackMD = getMIBStackNode(MIBMD);
+ assert(StackMD);
+ auto ContextLength = StackMD->getNumOperands();
+ // Keep any already kept for longer contexts.
+ if (ContextLength > CallerContextLength)
+ return true;
+ // Otherwise keep the first one added by the immediate caller if there
+ // were no longer contexts.
+ if (KeepFirstNewNotCold) {
+ KeepFirstNewNotCold = false;
+ return true;
+ }
+ if (MemProfReportHintedSizes)
+ EmitMessageForRemovedContexts(MIBMD, "pruned", "");
+ return false;
+ }
+ return true;
+ });
+ for (auto *M : NewColdMIBNodes)
+ SavedMIBNodes.push_back(M);
+}
+
// Recursive helper to trim contexts and create metadata nodes.
// Caller should have pushed Node's loc to MIBCallStack. Doing this in the
// caller makes it simpler to handle the many early returns in this method.
bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
std::vector<uint64_t> &MIBCallStack,
std::vector<Metadata *> &MIBNodes,
- bool CalleeHasAmbiguousCallerContext,
- bool &CalleeDeepestAmbiguousAllocType) {
+ bool CalleeHasAmbiguousCallerContext) {
// Trim context below the first node in a prefix with a single alloc type.
// Add an MIB record for the current call stack prefix.
if (hasSingleAllocType(Node->AllocTypes)) {
- // Because we only clone cold contexts (we don't clone for exposing NotCold
- // contexts as that is the default allocation behavior), we create MIB
- // metadata for this context if any of the following are true:
- // 1) It is cold.
- // 2) The immediate callee is the deepest point where we have an ambiguous
- // allocation type (i.e. the other callers that are cold need to know
- // that we have a not cold context overlapping to this point so that we
- // know how deep to clone).
- // 3) MemProfKeepAllNotColdContexts is enabled, which is useful if we are
- // reporting hinted sizes, and want to get information from the indexing
- // step for all contexts, or have specified a value less than 100% for
- // -memprof-cloning-cold-threshold.
- if (Node->hasAllocType(AllocationType::Cold) ||
- CalleeDeepestAmbiguousAllocType || MemProfKeepAllNotColdContexts) {
- std::vector<ContextTotalSize> ContextSizeInfo;
- collectContextSizeInfo(Node, ContextSizeInfo);
- MIBNodes.push_back(createMIBNode(Ctx, MIBCallStack,
- (AllocationType)Node->AllocTypes,
- ContextSizeInfo));
- // If we just emitted an MIB for a not cold caller, don't need to emit
- // another one for the callee to correctly disambiguate its cold callers.
- if (!Node->hasAllocType(AllocationType::Cold))
- CalleeDeepestAmbiguousAllocType = false;
- }
+ std::vector<ContextTotalSize> ContextSizeInfo;
+ collectContextSizeInfo(Node, ContextSizeInfo);
+ MIBNodes.push_back(createMIBNode(
+ Ctx, MIBCallStack, (AllocationType)Node->AllocTypes, ContextSizeInfo));
return true;
}
@@ -291,14 +358,21 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
if (!Node->Callers.empty()) {
bool NodeHasAmbiguousCallerContext = Node->Callers.size() > 1;
bool AddedMIBNodesForAllCallerContexts = true;
+ // Accumulate all new MIB nodes by the recursive calls below into a vector
+ // that will later be filtered before adding to the caller's MIBNodes
+ // vector.
+ std::vector<Metadata *> NewMIBNodes;
for (auto &Caller : Node->Callers) {
MIBCallStack.push_back(Caller.first);
- AddedMIBNodesForAllCallerContexts &= buildMIBNodes(
- Caller.second, Ctx, MIBCallStack, MIBNodes,
- NodeHasAmbiguousCallerContext, Node->DeepestAmbiguousAllocType);
+ AddedMIBNodesForAllCallerContexts &=
+ buildMIBNodes(Caller.second, Ctx, MIBCallStack, NewMIBNodes,
+ NodeHasAmbiguousCallerContext);
// Remove Caller.
MIBCallStack.pop_back();
}
+ // Pass in the stack length of the MIB nodes added for the immediate caller,
+ // which is the current stack length plus 1.
+ SaveFilteredNewMIBNodes(NewMIBNodes, MIBNodes, MIBCallStack.size() + 1);
if (AddedMIBNodesForAllCallerContexts)
return true;
// We expect that the callers should be forced to add MIBs to disambiguate
@@ -372,13 +446,8 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
// The CalleeHasAmbiguousCallerContext flag is meant to say whether the
// callee of the given node has more than one caller. Here the node being
// passed in is the alloc and it has no callees. So it's false.
- // Similarly, the last parameter is meant to say whether the callee of the
- // given node is the deepest point where we have ambiguous alloc types, which
- // is also false as the alloc has no callees.
- bool DeepestAmbiguousAllocType = true;
if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes,
- /*CalleeHasAmbiguousCallerContext=*/false,
- DeepestAmbiguousAllocType)) {
+ /*CalleeHasAmbiguousCallerContext=*/false)) {
assert(MIBCallStack.size() == 1 &&
"Should only be left with Alloc's location in stack");
CI->setMetadata(LLVMContext::MD_memprof, MDNode::get(Ctx, MIBNodes));
diff --git a/llvm/test/Transforms/PGOProfile/memprof.ll b/llvm/test/Transforms/PGOProfile/memprof.ll
index 73226df861ea5..4a3ddcc38b263 100644
--- a/llvm/test/Transforms/PGOProfile/memprof.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof.ll
@@ -63,15 +63,20 @@
;; give both memprof and pgo metadata.
; RUN: opt < %s -passes='pgo-instr-use,memprof-use<profile-filename=%t.pgomemprofdata>' -pgo-test-profile-file=%t.pgomemprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=MEMPROF,ALL,PGO
-;; Check that the total sizes are reported if requested.
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES
+;; Check that the total sizes are reported if requested. A message should be
+;; emitted for the pruned context.
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES,TOTALSIZENOKEEPALL
+
+;; Check that the total sizes are reported if requested, and prevent pruning
+;; via -memprof-keep-all-not-cold-contexts.
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES,TOTALSIZESKEEPALL
;; Check that we hint additional allocations with a threshold < 100%
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-matching-cold-threshold=60 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZESTHRESH60
;; Make sure that the -memprof-cloning-cold-threshold flag is enough to cause
;; the size metadata to be generated for the LTO link.
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES,TOTALSIZESKEEPALL
;; Make sure we emit a random hotness seed if requested.
; RUN: llvm-profdata merge -memprof-random-hotness %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.memprofdatarand 2>&1 | FileCheck %s --check-prefix=RAND
@@ -370,6 +375,8 @@ for.end: ; preds = %for.cond
; MEMPROF: ![[C10]] = !{i64 2061451396820446691}
; MEMPROF: ![[C11]] = !{i64 1544787832369987002}
+; TOTALSIZENOKEEPALL: Total size for pruned non-cold full allocation context hash 1093248920606587996: 10
+
;; For non-context sensitive allocations that get attributes we emit a message
;; with the full allocation context hash, type, and size in bytes.
; TOTALSIZESTHRESH60: Total size for full allocation context hash 8525406123785421946 and dominant alloc type cold: 10
@@ -393,8 +400,8 @@ for.end: ; preds = %for.cond
; TOTALSIZES: !"cold", ![[CONTEXT4:[0-9]+]], ![[CONTEXT5:[0-9]+]]}
; TOTALSIZES: ![[CONTEXT4]] = !{i64 -2103941543456458045, i64 10}
; TOTALSIZES: ![[CONTEXT5]] = !{i64 -191931298737547222, i64 10}
-; TOTALSIZES: !"notcold", ![[CONTEXT6:[0-9]+]]}
-; TOTALSIZES: ![[CONTEXT6]] = !{i64 1093248920606587996, i64 10}
+; TOTALSIZESKEEPALL: !"notcold", ![[CONTEXT6:[0-9]+]]}
+; TOTALSIZESKEEPALL: ![[CONTEXT6]] = !{i64 1093248920606587996, i64 10}
; MEMPROFNOCOLINFO: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
; MEMPROFNOCOLINFO: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
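For reference, below is a simplified, self-contained C++ model of the accumulate-then-filter recursion shown in the diff above. It mirrors the shape of buildMIBNodes and SaveFilteredNewMIBNodes, but the types and names (`TrieNode`, `MIB`, `saveFiltered`, `buildMIBs`) are invented stand-ins for the real LLVM Metadata machinery, and it omits the ambiguity flags, context size info, and trimming/early-return logic of the actual code.

```cpp
// Simplified model of CallStackTrie::buildMIBNodes plus the new
// SaveFilteredNewMIBNodes step (a sketch, not the LLVM implementation).
#include <algorithm>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct TrieNode {
  uint8_t AllocTypes = 0; // bit 0 = NotCold, bit 1 = Cold (simplified)
  std::map<uint64_t, std::unique_ptr<TrieNode>> Callers;
};

struct MIB {
  std::vector<uint64_t> Stack;
  bool IsCold;
};

// Keep all Cold MIBs. Among NotCold MIBs, keep those with contexts longer
// than the immediate caller's (those were already filtered by a deeper
// recursion step), plus the first one created for the immediate caller if
// none are longer.
static void saveFiltered(std::vector<MIB> &NewMIBs, std::vector<MIB> &Saved,
                         unsigned CallerContextLength) {
  unsigned LongestNotCold = 0;
  for (const MIB &M : NewMIBs)
    if (!M.IsCold)
      LongestNotCold = std::max<unsigned>(LongestNotCold, M.Stack.size());
  bool KeepFirstNewNotCold = LongestNotCold == CallerContextLength;
  for (MIB &M : NewMIBs) {
    if (M.IsCold || M.Stack.size() > CallerContextLength) {
      Saved.push_back(std::move(M));
    } else if (KeepFirstNewNotCold) {
      Saved.push_back(std::move(M));
      KeepFirstNewNotCold = false;
    } // Otherwise the NotCold context is pruned.
  }
}

// Caller pushes the node's stack id before recursing, as in the real code.
static void buildMIBs(const TrieNode &Node, std::vector<uint64_t> &Stack,
                      std::vector<MIB> &MIBs) {
  if (Node.AllocTypes == 1 || Node.AllocTypes == 2) { // single alloc type
    MIBs.push_back({Stack, /*IsCold=*/Node.AllocTypes == 2});
    return;
  }
  // Accumulate the MIBs produced for all callers into a local vector, then
  // filter before appending to the parent's vector: the restructuring this
  // PR makes.
  std::vector<MIB> NewMIBs;
  for (const auto &[StackId, Caller] : Node.Callers) {
    Stack.push_back(StackId);
    buildMIBs(*Caller, Stack, NewMIBs);
    Stack.pop_back();
  }
  saveFiltered(NewMIBs, MIBs, Stack.size() + 1);
}
```

The design point mirrored here is that each recursion level collects its callers' MIB nodes locally and filters them knowing only the immediate caller's context length, so the keep/prune decision is made on the fly rather than via a per-node flag precomputed while building the trie.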
lgtm
LLVM Buildbot has detected a new failure on a builder running this change. Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/19634