Skip to content

Commit 355e694

Browse files
[MemProf] Fix clone edge comparison (#113753)
The issue fixed in PR113337 exposed a bug in the comparisons done in allocTypesMatch, which compares a vector of alloc types to those in the given vector of Edges. The form of std::equal used, which didn't provide the end iterator for the Edges vector, will iterate through as many entries in the Edges vector as in the InAllocTypes vector, which can fail if there are fewer entries in the Edges vector, because we may dereference a bogus Edge pointer. This function is called twice, once for the Node, with its callee edges, in which case the number of edges should always match the number of entries in allocTypesMatch, which is computed from the Node's callee edges. It was also called for Node's clones, and it turns out that after cloning and edge modifications done for other allocations, the number of callee edges in Node and its clones may no longer match. In some cases, more common with memprof ICP before the PR113337, the number of clone edges can be smaller leading to a bad dereference. I found for a large application even before adding memprof ICP support we sometimes call this with fewer entries in the clone's callee edges, but were getting lucky as they had allocation type None, and we didn't end up attempting to dereference the bad edge pointer. Fix this by passing Edges.end() to std::equal, which means std::equal will fail if the number of entries in the 2 vectors are not equal. However, this is too conservative, as clone edges may have been added or removed since it was initially cloned, and in fact can be wrong as we may not be comparing allocation types corresponding to the same callee. Therefore, a couple of enhancements are made to avoid regressing and improve the checking and cloning: - Don't bother calling the alloc type comparison when the clone and the Node's alloc type for the current allocation are precise (have a single allocation type) and are the same (which is guaranteed by an earlier check, and an assert is added to confirm that). In that case we can trivially determine that the clone can be used. - Split the alloc type matching handling into a separate function for the clone case. In that case, for each of the InAllocType entries, attempt to find and compare to the clone callee edge with the same callee as the corresponding original node callee. To create a test case I needed to take a spec application (xalancbmk), and repeatedly apply random hot/cold-ness to the memprof contexts when building, until I hit the problematic case. I then reduced that full LTO IR using llvm-reduce and then manually.
1 parent 93da642 commit 355e694

File tree

2 files changed

+159
-6
lines changed

2 files changed

+159
-6
lines changed

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp

Lines changed: 60 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -928,8 +928,11 @@ bool allocTypesMatch(
928928
const std::vector<uint8_t> &InAllocTypes,
929929
const std::vector<std::shared_ptr<ContextEdge<DerivedCCG, FuncTy, CallTy>>>
930930
&Edges) {
931+
// This should be called only when the InAllocTypes vector was computed for
932+
// this set of Edges. Make sure the sizes are the same.
933+
assert(InAllocTypes.size() == Edges.size());
931934
return std::equal(
932-
InAllocTypes.begin(), InAllocTypes.end(), Edges.begin(),
935+
InAllocTypes.begin(), InAllocTypes.end(), Edges.begin(), Edges.end(),
933936
[](const uint8_t &l,
934937
const std::shared_ptr<ContextEdge<DerivedCCG, FuncTy, CallTy>> &r) {
935938
// Can share if one of the edges is None type - don't
@@ -942,6 +945,46 @@ bool allocTypesMatch(
942945
});
943946
}
944947

948+
// Helper to check if the alloc types for all edges recorded in the
949+
// InAllocTypes vector match the alloc types for callee edges in the given
950+
// clone. Because the InAllocTypes were computed from the original node's callee
951+
// edges, and other cloning could have happened after this clone was created, we
952+
// need to find the matching clone callee edge, which may or may not exist.
953+
template <typename DerivedCCG, typename FuncTy, typename CallTy>
954+
bool allocTypesMatchClone(
955+
const std::vector<uint8_t> &InAllocTypes,
956+
const ContextNode<DerivedCCG, FuncTy, CallTy> *Clone) {
957+
const ContextNode<DerivedCCG, FuncTy, CallTy> *Node = Clone->CloneOf;
958+
assert(Node);
959+
// InAllocTypes should have been computed for the original node's callee
960+
// edges.
961+
assert(InAllocTypes.size() == Node->CalleeEdges.size());
962+
// First create a map of the clone callee edge callees to the edge alloc type.
963+
DenseMap<const ContextNode<DerivedCCG, FuncTy, CallTy> *, uint8_t>
964+
EdgeCalleeMap;
965+
for (const auto &E : Clone->CalleeEdges) {
966+
assert(!EdgeCalleeMap.contains(E->Callee));
967+
EdgeCalleeMap[E->Callee] = E->AllocTypes;
968+
}
969+
// Next, walk the original node's callees, and look for the corresponding
970+
// clone edge to that callee.
971+
for (unsigned I = 0; I < Node->CalleeEdges.size(); I++) {
972+
auto Iter = EdgeCalleeMap.find(Node->CalleeEdges[I]->Callee);
973+
// Not found is ok, we will simply add an edge if we use this clone.
974+
if (Iter == EdgeCalleeMap.end())
975+
continue;
976+
// Can share if one of the edges is None type - don't
977+
// care about the type along that edge as it doesn't
978+
// exist for those context ids.
979+
if (InAllocTypes[I] == (uint8_t)AllocationType::None ||
980+
Iter->second == (uint8_t)AllocationType::None)
981+
continue;
982+
if (allocTypeToUse(Iter->second) != allocTypeToUse(InAllocTypes[I]))
983+
return false;
984+
}
985+
return true;
986+
}
987+
945988
} // end anonymous namespace
946989

947990
template <typename DerivedCCG, typename FuncTy, typename CallTy>
@@ -3364,11 +3407,22 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::identifyClones(
33643407
allocTypeToUse(CallerAllocTypeForAlloc))
33653408
continue;
33663409

3367-
if (!allocTypesMatch<DerivedCCG, FuncTy, CallTy>(
3368-
CalleeEdgeAllocTypesForCallerEdge, CurClone->CalleeEdges))
3369-
continue;
3370-
Clone = CurClone;
3371-
break;
3410+
bool BothSingleAlloc = hasSingleAllocType(CurClone->AllocTypes) &&
3411+
hasSingleAllocType(CallerAllocTypeForAlloc);
3412+
// The above check should mean that if both have single alloc types that
3413+
// they should be equal.
3414+
assert(!BothSingleAlloc ||
3415+
CurClone->AllocTypes == CallerAllocTypeForAlloc);
3416+
3417+
// If either both have a single alloc type (which are the same), or if the
3418+
// clone's callee edges have the same alloc types as those for the current
3419+
// allocation on Node's callee edges (CalleeEdgeAllocTypesForCallerEdge),
3420+
// then we can reuse this clone.
3421+
if (BothSingleAlloc || allocTypesMatchClone<DerivedCCG, FuncTy, CallTy>(
3422+
CalleeEdgeAllocTypesForCallerEdge, CurClone)) {
3423+
Clone = CurClone;
3424+
break;
3425+
}
33723426
}
33733427

33743428
// The edge iterator is adjusted when we move the CallerEdge to the clone.
Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
;; Test to make sure we don't fail when cloning in a case where we end up with
2+
;; a clone that has fewer edges than the node it was initially cloned from.
3+
;; This test was reduced and simplified from xalancbmk with some random hotness
4+
;; applied to the profile that reproduced the issue.
5+
6+
; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new \
7+
; RUN: -memprof-verify-ccg -memprof-verify-nodes \
8+
; RUN: -pass-remarks=memprof-context-disambiguation %s -S 2>&1 | FileCheck %s
9+
10+
;; Make sure we created some clones
11+
; CHECK: created clone A.memprof.1
12+
; CHECK: created clone C.memprof.1
13+
; CHECK: created clone D.memprof.1
14+
; CHECK: created clone E.memprof.1
15+
; CHECK: created clone B.memprof.1
16+
; CHECK: created clone F.memprof.1
17+
; CHECK: created clone G.memprof.1
18+
19+
; ModuleID = '<stdin>'
20+
source_filename = "reduced.ll"
21+
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
22+
target triple = "x86_64-grtev4-linux-gnu"
23+
24+
define void @A() {
25+
call void @B(), !callsite !0
26+
ret void
27+
}
28+
29+
define void @C() {
30+
call void @D(), !callsite !1
31+
ret void
32+
}
33+
34+
define void @D() {
35+
call void @A(), !callsite !2
36+
ret void
37+
}
38+
39+
define void @E() {
40+
%1 = call ptr @_Znwm(i64 0), !memprof !3, !callsite !20
41+
ret void
42+
}
43+
44+
define void @B() {
45+
call void @F(), !callsite !21
46+
ret void
47+
}
48+
49+
define void @F() {
50+
call void @E(), !callsite !22
51+
call void @G(), !callsite !23
52+
ret void
53+
}
54+
55+
define void @G() {
56+
%1 = call ptr @_Znwm(i64 0), !memprof !24, !callsite !37
57+
ret void
58+
}
59+
60+
declare ptr @_Znwm(i64)
61+
62+
!0 = !{i64 1995602625719775354}
63+
!1 = !{i64 4312698517630782220}
64+
!2 = !{i64 5516454029445989383}
65+
!3 = !{!4, !6, !8, !10, !12, !14, !16, !18}
66+
!4 = !{!5, !"notcold"}
67+
!5 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 1995602625719775354}
68+
!6 = !{!7, !"cold"}
69+
!7 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 2077908580042347045, i64 4312698517630782220, i64 5379466077518675850}
70+
!8 = !{!9, !"cold"}
71+
!9 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 2077908580042347045, i64 4312698517630782220, i64 -7632894069000375689}
72+
!10 = !{!11, !"cold"}
73+
!11 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 2939944783060497247}
74+
!12 = !{!13, !"notcold"}
75+
!13 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 5642549674080861567, i64 5516454029445989383, i64 4312698517630782220, i64 -7632894069000375689}
76+
!14 = !{!15, !"cold"}
77+
!15 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 5642549674080861567, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
78+
!16 = !{!17, !"notcold"}
79+
!17 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 -4746997736434041076, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
80+
!18 = !{!19, !"notcold"}
81+
!19 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 -4637272929643682959}
82+
!20 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862}
83+
!21 = !{i64 -6456074186195384663}
84+
!22 = !{i64 7147584705143805656}
85+
!23 = !{i64 3938822378769440754}
86+
!24 = !{!25, !27, !29, !31, !33, !35}
87+
!25 = !{!26, !"cold"}
88+
!26 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 1995602625719775354, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
89+
!27 = !{!28, !"notcold"}
90+
!28 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 2077908580042347045, i64 4312698517630782220, i64 -7632894069000375689}
91+
!29 = !{!30, !"cold"}
92+
!30 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4746997736434041076, i64 5516454029445989383, i64 4312698517630782220, i64 -7632894069000375689}
93+
!31 = !{!32, !"notcold"}
94+
!32 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4746997736434041076, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
95+
!33 = !{!34, !"cold"}
96+
!34 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4637272929643682959}
97+
!35 = !{!36, !"notcold"}
98+
!36 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4409412896859835674}
99+
!37 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196}

0 commit comments

Comments
 (0)