[llvm][CodeGen] Add a new software pipeliner 'Window Scheduler' #84443

Merged
merged 5 commits into from
Jun 13, 2024

Contributor

@huaatian huaatian commented Mar 8, 2024

This commit implements the Window Scheduler as described in the RFC:
https://discourse.llvm.org/t/rfc-window-scheduling-algorithm-for-machinepipeliner-in-llvm/74718

This Window Scheduler implements the window algorithm designed by
Steven Muchnick in the book "Advanced Compiler Design And Implementation",
with some improvements:

  1. Copy the loop kernel three times and construct the corresponding DAG
    to identify dependencies between MIs;
  2. Use a heuristic algorithm to obtain a set of window offsets.

The window algorithm is equivalent to a modulo scheduling algorithm with
two stages. It is mainly intended for targets where hardware resource
conflicts are severe and the SMS algorithm often fails.
On our own DSA, this window algorithm typically achieves a performance
improvement of over 10%.

Co-authored-by: Kai Yan [email protected]
Co-authored-by: Ran Xiao [email protected]
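
As a reading aid for the description above, here is a minimal toy sketch of the "copy the kernel three times, then pick a window at some offset" idea. It is not the code from this PR: `ToyInstr`, `tripleKernel`, and `windowAt` are hypothetical names, instructions are reduced to labels, and no dependency DAG or resource model is built. It only shows why a window that starts mid-kernel mixes instructions from two consecutive iterations, which is the sense in which the description compares the algorithm to two-stage modulo scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical): an instruction is a label plus the kernel copy
// (0, 1 or 2) it was cloned from.
struct ToyInstr {
  std::string Name;
  unsigned Iter;
};

// Lay out three consecutive copies of the loop kernel.
std::vector<ToyInstr> tripleKernel(const std::vector<std::string> &Kernel) {
  std::vector<ToyInstr> Out;
  for (unsigned Iter = 0; Iter < 3; ++Iter)
    for (const std::string &Name : Kernel)
      Out.push_back({Name, Iter});
  return Out;
}

// Select a window of kernel length that starts at Offset inside the first
// copy. Which copy the real implementation anchors the window in is an
// assumption of this sketch.
std::vector<ToyInstr> windowAt(const std::vector<ToyInstr> &Tripled,
                               std::size_t KernelSize, std::size_t Offset) {
  assert(Offset < KernelSize && "offset must fall inside one kernel copy");
  assert(Offset + KernelSize <= Tripled.size() && "window out of range");
  auto First = Tripled.begin() + static_cast<std::ptrdiff_t>(Offset);
  return std::vector<ToyInstr>(First,
                               First + static_cast<std::ptrdiff_t>(KernelSize));
}
```

With a four-instruction kernel `{A, B, C, D}` and offset 2, `windowAt` yields `C, D` from copy 0 followed by `A, B` from copy 1, so the new loop body overlaps the tail of one iteration with the head of the next.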

body: |
bb.0.entry:
successors: %bb.2(0x30000000), %bb.1(0x50000000)
liveins: $r0, $r1, $r2
Contributor

can you run this through -run-pass=none to compact the register numbers?

Contributor Author

Updated

declare <32 x i32> @llvm.hexagon.V6.vsubw.128B(<32 x i32>, <32 x i32>)

attributes #0 = { "target-features"="+hvx-length128b,+hvxv69,+v66,-long-calls" }
...
Contributor

Are the IR references in the MMOs relevant to the scheduling test? If not, can you drop the IR section?

Contributor Author

In the exp and sqrt cases, the MMOs are indeed needed; otherwise, a 'barrier' dependency would be generated. The other two cases have been updated.

Context.MF = MF;
Context.MLI = MLI;
Context.MDT = MDT;
Context.PassConfig = &getAnalysis<TargetPassConfig>();
Contributor

Where is this used?

Contributor Author

In the WindowScheduler, we use createMachineScheduler() through 'PassConfig' to call the target's custom MachineScheduler (the relevant code was attached as a screenshot).

Contributor Author

Additionally, two ScheduleDAGs are used here: one for analyzing the dependencies between all instructions after copying, and the other for scheduling instructions in the window. The difference is that the former does not need to consider register pressure.

@huaatian
Contributor Author

Hi everyone, are there any new review comments? Thank you for your time. @arsenm @dtcxzyw @davemgreen @bcahoon @ytmukai @jayfoad

@@ -199,6 +199,9 @@ class TargetSubtargetInfo : public MCSubtargetInfo {
/// True if the subtarget should run MachinePipeliner
virtual bool enableMachinePipeliner() const { return true; };

/// True if the subtarget should run WindowScheduler.
virtual bool enableWindowScheduler() const { return true; }
Member

> On our own DSA, this window algorithm typically can achieve a performance
> improvement of over 10%.

Could you please share some performance data (e.g., SPEC benchmarks) on other non-VLIW architectures? IIRC AArch64 and PowerPC also support MachinePipeliner.

Member

Reverse ping :)

Contributor Author

Over the past few days, we specifically tested the performance of the software pipeliner on AArch64. The test environment was: Apple M1 Pro with 32GB, Docker 25.0.3, Ubuntu 22.04.4 LTS, GCC 11.4.0, and SPEC2006 1.2. We ran 11 integer benchmarks in ref mode, 5 times each. The base score is 51.2 and the score with software pipelining enabled is 51.0, which is almost the same.
Although the absolute numbers depend heavily on the local test conditions, we believe the relative results are credible. Very few loops in SPEC meet the criteria for applying software pipelining, and even fewer of them have long computation times (which matches the original design intent of SPEC; see "Computer Architecture: A Quantitative Approach", Section 1.11). Therefore, we still believe that software pipelining plays its major role on DSPs and DSAs.

@dtcxzyw dtcxzyw requested a review from chenzheng1030 March 19, 2024 07:58
Contributor Author

huaatian commented Mar 26, 2024

Hello everyone, do you have any new review comments? Thank you for your time.
We have done a lot of research in the field of DSA, and we would like to share the window scheduling algorithm with all developers. It has indeed helped us improve the performance of AI operators on our DSA, and we hope that it can also be beneficial to others :)
ping @arsenm @dtcxzyw @davemgreen @bcahoon @ytmukai @jayfoad @chenzheng1030

Contributor

@arsenm arsenm left a comment

Need to do a deeper look


WindowScheduler::WindowScheduler(MachineSchedContext *C, MachineLoop &ML)
: Context(C), MF(C->MF), MBB(ML.getHeader()), Loop(ML) {
Subtarget = &(MF->getSubtarget());
Contributor

Suggested change
- Subtarget = &(MF->getSubtarget());
+ Subtarget = &MF->getSubtarget();

Can do this in initializer list?

Contributor Author

Updated

Comment on lines +203 to +208
for (auto Def : PhiDefs)
  if (MI.readsRegister(Def, TRI)) {
    LLVM_DEBUG(
        dbgs()
        << "Consecutive phis are not allowed in window scheduling!\n");
    return false;
Contributor

This wouldn't be valid IR, so why is this validating this?

Contributor Author

@huaatian huaatian Apr 7, 2024

This code handles a loop-carried case (an example was attached as a screenshot).
This is indeed a rare situation, and we have chosen not to schedule in this case.

Contributor

But the current scheduler operates on post-SSA MIR, so the scheduler never encounters phis in the first place. Is this running somewhere different / earlier?

Contributor Author

Yes, this algorithm is used in the MachinePipeliner pass, where PHIs are present.

// Step 1: Performing the first copy of MBB instructions, excluding
// terminators. At the same time, we back up the anti-register of phis.
// DefPairs hold the old and new define register pairs.
std::map<Register, Register> DefPairs;
Contributor

Avoid std::map

Contributor Author

Updated

// %1 = phi i32 [%2, %BB.1], [%7, %BB.3]
// The new phi is:
// %1 = phi i32 [%2, %BB.1], [%11, %BB.3]
for (auto &Phi : MBB->phis())
Contributor

Braces

Contributor Author

@huaatian huaatian Apr 7, 2024

Updated, thank you!

Comment on lines +369 to +370
if (Phi.readsRegister(DefRegPair.first, TRI))
Phi.substituteRegister(DefRegPair.first, DefRegPair.second, 0, *TRI);
Contributor

Can you just go direct to substituteRegister?

Contributor Author

@huaatian huaatian Apr 7, 2024

I'm not sure I have understood your comment correctly, so let me elaborate on this part of the code.
In this section of the algorithm, DefPairs also includes the substitute registers for the phi-defined registers. Therefore, the replacement must be restricted to registers that the phi reads.
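
To make the reply above concrete, here is a small toy model, not the LLVM API. `ToyPhi` and `rewritePhi` are hypothetical; the toy `substituteRegister` is assumed to rewrite every matching operand, definition included, while `readsRegister` only looks at operands the phi reads. Under that assumption, the `readsRegister` guard keeps a pair whose first register is the phi's own definition from renaming the phi's result.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Reg = unsigned;

// Toy stand-in for a PHI: one defined register plus the registers it reads.
struct ToyPhi {
  Reg Def;
  std::vector<Reg> Uses;

  // True only if R appears among the operands that are read.
  bool readsRegister(Reg R) const {
    return std::find(Uses.begin(), Uses.end(), R) != Uses.end();
  }

  // Rewrites every operand equal to From, including the definition
  // (an assumption of this sketch about how substitution behaves).
  void substituteRegister(Reg From, Reg To) {
    if (Def == From)
      Def = To;
    std::replace(Uses.begin(), Uses.end(), From, To);
  }
};

// Apply old->new register pairs, but only for registers the phi reads.
void rewritePhi(ToyPhi &Phi, const std::vector<std::pair<Reg, Reg>> &DefPairs) {
  for (const auto &Pair : DefPairs)
    if (Phi.readsRegister(Pair.first))
      Phi.substituteRegister(Pair.first, Pair.second);
}
```

For a phi defining register 1 and reading registers 2 and 7, the pairs `{1, 9}` and `{7, 11}` leave the definition as 1 and turn the read of 7 into 11, mirroring the `%7` to `%11` example in the code comment under review. Without the guard, the `{1, 9}` pair would also rename the definition.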

assert(SearchRatio <= 100 && "SearchRatio should be equal or less than 100!");
unsigned MaxIdx = SchedInstrNum * SearchRatio / 100;
unsigned Step = SearchNum > 0 && SearchNum <= MaxIdx ? MaxIdx / SearchNum : 1;
SmallVector<unsigned> SearchIndexes;
Contributor

Can you just produce this computed list when the vector would be consumed?

Contributor Author

@huaatian huaatian Apr 7, 2024

This is indeed a very good question!
Our main consideration is to make modifications in downstream targets easier. These indexes are crucial for the performance of the window algorithm, and we have implemented a target-specific search algorithm on our own DSA. We also recommend that other target developers, if possible, consider adopting more complex search algorithms.
So, we would still prefer to keep this logic encapsulated in a separate function :)
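
For readers without the full patch open, the quoted lines can be turned into a standalone sketch of a candidate-offset search. The `MaxIdx` and `Step` computations are copied from the snippet under review; the loop that fills the list, the function name `candidateOffsets`, and the use of `std::vector` in place of `SmallVector` are assumptions made here for illustration, since the rest of the function is not shown in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: pick which instruction indexes to try as window
// offsets. SchedInstrNum is the number of schedulable instructions,
// SearchNum the desired number of candidates, SearchRatio a percentage.
std::vector<unsigned> candidateOffsets(unsigned SchedInstrNum,
                                       unsigned SearchNum,
                                       unsigned SearchRatio) {
  assert(SearchRatio <= 100 && "SearchRatio should be equal or less than 100!");
  // Only the first SearchRatio percent of the instructions are considered.
  unsigned MaxIdx = SchedInstrNum * SearchRatio / 100;
  // Spread the candidates evenly; fall back to a step of 1 otherwise.
  unsigned Step = SearchNum > 0 && SearchNum <= MaxIdx ? MaxIdx / SearchNum : 1;
  // Assumed fill loop: walk [0, MaxIdx) in strides of Step.
  std::vector<unsigned> SearchIndexes;
  for (unsigned Idx = 0; Idx < MaxIdx; Idx += Step)
    SearchIndexes.push_back(Idx);
  return SearchIndexes;
}
```

With 20 instructions, `SearchNum = 4`, and `SearchRatio = 50`, this gives `MaxIdx = 10` and `Step = 2`, so the candidates are 0, 2, 4, 6, 8. Keeping this in one function, as the author argues, gives a downstream target a single place to swap in a different search strategy.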

Comment on lines 439 to 441
LLVM_DEBUG(dbgs() << "\tCycle " << CurCycle << " [S."
<< getOriStage(getOriMI(&MI), Offset) << "]: ");
LLVM_DEBUG(MI.dump());
Contributor

Can just directly << MI in the first debug statement

Contributor Author

Updated

Comment on lines 543 to 544
LLVM_DEBUG(dbgs() << "\tCycle range [0, " << LateCycle << "] ");
LLVM_DEBUG(Phi.dump());
Contributor

Same as above, just << Phi above

Contributor Author

Updated

Stages[MI] = std::get<2>(Info);
LLVM_DEBUG(dbgs() << "\tCycle " << Cycles[MI] << " [S." << Stages[MI]
<< "]: ");
LLVM_DEBUG(MI->dump());
Contributor

Ditto

Contributor Author

Updated

@@ -0,0 +1,124 @@
# REQUIRES: asserts
# RUN: llc --march=hexagon %s -run-pass=pipeliner -O2 -debug-only=pipeliner \
Contributor

-O2 won't do anything here, I think

Contributor Author

You're right, thank you!
Updated

@huaatian
Contributor Author

Thank you for your time. Are there any new review comments or feedback? @arsenm @dtcxzyw @davemgreen @bcahoon @ytmukai @jayfoad @chenzheng1030

@arsenm arsenm requested review from michaelmaitland and atrick May 7, 2024 20:45
SmallVector<Register, 8> PhiDefs;
auto PLI = TII->analyzeLoopForPipelining(MBB);
for (auto &MI : *MBB) {
if (MI.isDebugInstr() || MI.isTerminator())
Contributor

I'm guessing this should be upgraded to isMetaInst

Contributor Author

Updated

"window scheduling!\n");
return false;
}
for (auto &Def : MI.defs())
Contributor

Should this be worried about all_defs?

Contributor Author

Updated

return false;
}
for (auto &Def : MI.defs())
if (Def.isReg() && Def.getReg().isPhysical())
Contributor

Can this ignore dead defs?

Contributor Author

Yes, we believe that dead defs do not need any special handling.

@bzEq bzEq requested a review from jsji May 10, 2024 04:42
(MI->isTerminator() && Cnt < DuplicateNum - 1))
continue;
auto *NewMI = MF->CloneMachineInstr(MI);
DenseMap<Register, Register> NewDefs;
// New defines are updated.
- for (auto MO : NewMI->defs())
+ for (auto MO : NewMI->all_defs())
Contributor

All these type of changes should have tests to go with them

Contributor Author

Okay, we have added test cases for dead def, implicit-def, and meta instruction.

Contributor Author

huaatian commented Jun 6, 2024

Does anyone have any more comments? We really hope that this algorithm can be merged into the mainline so that other DSA developers can use it.
ping

Contributor

@arsenm arsenm left a comment

I don't know much about scheduling, but I think this has waited long enough for more comments

Contributor Author

huaatian commented Jun 13, 2024

Could someone please help merge our PR? Thank you very much!
ping

@dtcxzyw dtcxzyw merged commit b6bf402 into llvm:main Jun 13, 2024
4 checks passed
@huaatian
Contributor Author

@dtcxzyw hi, yingwei, looks like our multiple commits were not squashed together~

Member

dtcxzyw commented Jun 13, 2024

> @dtcxzyw hi, yingwei, looks like our multiple commits were not squashed together~

Do you mean the commit message?

Contributor Author

huaatian commented Jun 13, 2024

> > @dtcxzyw hi, yingwei, looks like our multiple commits were not squashed together~
>
> Do you mean the commit message?

Yes, there are five commits here. Sorry for the trouble.

Member

dtcxzyw commented Jun 13, 2024

> > > @dtcxzyw hi, yingwei, looks like our multiple commits were not squashed together~
> >
> > Do you mean the commit message?
>
> Yes, there are five commits here. Sorry for the trouble.

Emm, you need to update the PR description before requesting a merge :)
Sorry about that.

@huaatian
Contributor Author

> > > > @dtcxzyw hi, yingwei, looks like our multiple commits were not squashed together~
> > >
> > > Do you mean the commit message?
> >
> > Yes, there are five commits here. Sorry for the trouble.
>
> Emm, you need to update the PR description before requesting a merge :) Sorry about that.

Sorry for the trouble. How do you suggest we handle it now?

Member

dtcxzyw commented Jun 13, 2024

> How do you suggest we handle it now?

Just keep it as is. Please remember to do it next time :)

Contributor

arsenm commented Jun 13, 2024

> > How do you suggest we handle it now?
>
> Just keep it as is. Please remember to do it next time :)

But squash-and-merge is the only enabled mode for the repository. How did you submit it?

Contributor

arsenm commented Jun 13, 2024

> > How do you suggest we handle it now?
>
> Just keep it as is. Please remember to do it next time :)

I see it squashed in the repo, so what is the issue?

@huaatian
Contributor Author

Sorry, my mistake. I was looking at the wrong branch. There's no issue. Thank you, everyone!

@nathanchance
Member

I am seeing a crash when building the Linux kernel for Hexagon after this change. A C and LLVM IR reproducer from cvise and llvm-reduce respectively:

struct khazad_ctx {
  long long E[1];
  long D[];
};
long T7[] = {};
void *khazad_setkey_in_key___trans_tmp_1;
int khazad_setkey_in_key_r;
void khazad_setkey_in_key() {
  struct khazad_ctx *ctx = khazad_setkey_in_key___trans_tmp_1;
  unsigned *key = (unsigned *)khazad_setkey_in_key;
  long long K1 = key[3];
  for (; khazad_setkey_in_key_r; khazad_setkey_in_key_r++) {
    K1 = ctx->E[khazad_setkey_in_key_r];
    ctx->D[khazad_setkey_in_key_r] = T7[K1 >> 32] ^ T7[K1 >> 4 & 5];
  }
}
target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
target triple = "hexagon-unknown-linux"

define void @khazad_setkey_in_key(ptr %0, ptr %1) {
  br label %3

3:                                                ; preds = %3, %2
  %4 = phi i32 [ %16, %3 ], [ 0, %2 ]
  %5 = load i64, ptr %0, align 8
  %6 = trunc i64 %5 to i32
  store i32 %6, ptr %1, align 4
  %7 = lshr i64 %5, 32
  %8 = trunc i64 %7 to i32
  %9 = getelementptr [0 x i32], ptr null, i32 0, i32 %8
  %10 = load i32, ptr %9, align 4
  %11 = lshr i32 %6, 1
  %12 = and i32 %11, 1
  %13 = getelementptr [0 x i32], ptr null, i32 0, i32 %12
  %14 = load i32, ptr %13, align 4
  %15 = xor i32 %14, %10
  store i32 %15, ptr %0, align 4
  %16 = add i32 %4, 1
  %17 = icmp eq i32 %4, 0
  br i1 %17, label %18, label %3

18:                                               ; preds = %3
  ret void
}

@ a144bf2

$ clang --target=hexagon-linux -O2 -c -o /dev/null khazad.i

$ llc -o /dev/null reduced.ll

@ b6bf402

$ clang --target=hexagon-linux -O2 -c -o /dev/null khazad.i
clang: /home/nathan/tmp/cvise.XAc760fE9l/src/llvm/lib/CodeGen/WindowScheduler.cpp:650: int llvm::WindowScheduler::getOriCycle(MachineInstr *): Assertion `OriToCycle.count(OriMI) && "Cannot find schedule cycle!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang --target=hexagon-linux -O2 -c -o /dev/null khazad.i
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'khazad.i'.
4.	Running pass 'Modulo Software Pipelining' on function '@khazad_setkey_in_key'
 #0 0x00005615664e1c36 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x3957c36)
 #1 0x00005615664df6ae llvm::sys::RunSignalHandlers() (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x39556ae)
 #2 0x0000561566463c1d CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f67001dbae0 (/usr/lib/libc.so.6+0x3cae0)
 #4 0x00007f6700233e44 (/usr/lib/libc.so.6+0x94e44)
 #5 0x00007f67001dba30 raise (/usr/lib/libc.so.6+0x3ca30)
 #6 0x00007f67001c34c3 abort (/usr/lib/libc.so.6+0x244c3)
 #7 0x00007f67001c33df (/usr/lib/libc.so.6+0x243df)
 #8 0x00007f67001d3c67 (/usr/lib/libc.so.6+0x34c67)
 #9 0x0000561565e1d3ce llvm::WindowScheduler::getOriCycle(llvm::MachineInstr*) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x32933ce)
#10 0x0000561565e1cc5a llvm::WindowScheduler::calculateMaxCycle(llvm::ScheduleDAGInstrs&, unsigned int) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x3292c5a)
#11 0x0000561565e1daeb llvm::WindowScheduler::analyseII(llvm::ScheduleDAGInstrs&, unsigned int) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x3293aeb)
#12 0x0000561565e1a1ba llvm::WindowScheduler::run() (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x32901ba)
#13 0x0000561565a688e3 llvm::MachinePipeliner::runWindowScheduler(llvm::MachineLoop&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x2ede8e3)
#14 0x0000561565a66afb llvm::MachinePipeliner::scheduleLoop(llvm::MachineLoop&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x2edcafb)
#15 0x0000561565a6695b llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x2edc95b)
#16 0x0000561565a2464e llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x2e9a64e)
#17 0x0000561565fe9217 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x345f217)
#18 0x0000561565ff1c22 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x3467c22)
#19 0x0000561565fe9d17 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x345fd17)
#20 0x0000561566cfc4a0 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x41724a0)
#21 0x0000561566d22598 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x4198598)
#22 0x0000561568136819 clang::ParseAST(clang::Sema&, bool, bool) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x55ac819)
#23 0x000056156719f2dd clang::FrontendAction::Execute() (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x46152dd)
#24 0x000056156710590d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x457b90d)
#25 0x0000561567279c14 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x46efc14)
#26 0x00005615650a6dd0 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x251cdd0)
#27 0x00005615650a36ae ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#28 0x0000561566f320c9 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#29 0x0000561566463956 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x38d9956)
#30 0x0000561566f31753 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x43a7753)
#31 0x0000561566eea07c clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x436007c)
#32 0x0000561566eea5d7 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x43605d7)
#33 0x0000561566f0c729 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x4382729)
#34 0x00005615650a2b9d clang_main(int, char**, llvm::ToolContext const&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x2518b9d)
#35 0x00005615650b3d56 main (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x2529d56)
#36 0x00007f67001c4c88 (/usr/lib/libc.so.6+0x25c88)
#37 0x00007f67001c4d4c __libc_start_main (/usr/lib/libc.so.6+0x25d4c)
#38 0x00005615650a0fa5 _start (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/clang-19+0x2516fa5)
clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)
ClangBuiltLinux clang version 19.0.0git (https://github.com/llvm/llvm-project.git b6bf4024a031a5e7b58aff1425d94841a88002d6)
Target: hexagon-unknown-linux
Thread model: posix
InstalledDir: /home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin
Build config: +assertions
clang: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

$ llc -o /dev/null reduced.ll
llc: /home/nathan/tmp/cvise.XAc760fE9l/src/llvm/lib/CodeGen/WindowScheduler.cpp:650: int llvm::WindowScheduler::getOriCycle(MachineInstr *): Assertion `OriToCycle.count(OriMI) && "Cannot find schedule cycle!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -o /dev/null reduced.ll
1.      Running pass 'Function Pass Manager' on module 'reduced.ll'.
2.      Running pass 'Modulo Software Pipelining' on function '@khazad_setkey_in_key'
 #0 0x00005651bf6712a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x353c2a6)
 #1 0x00005651bf66ecae llvm::sys::RunSignalHandlers() (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x3539cae)
 #2 0x00005651bf6719b4 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fbbbccc0ae0 (/usr/lib/libc.so.6+0x3cae0)
 #4 0x00007fbbbcd18e44 (/usr/lib/libc.so.6+0x94e44)
 #5 0x00007fbbbccc0a30 raise (/usr/lib/libc.so.6+0x3ca30)
 #6 0x00007fbbbcca84c3 abort (/usr/lib/libc.so.6+0x244c3)
 #7 0x00007fbbbcca83df (/usr/lib/libc.so.6+0x243df)
 #8 0x00007fbbbccb8c67 (/usr/lib/libc.so.6+0x34c67)
 #9 0x00005651bea1e8be llvm::WindowScheduler::getOriCycle(llvm::MachineInstr*) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x28e98be)
#10 0x00005651bea1e14a llvm::WindowScheduler::calculateMaxCycle(llvm::ScheduleDAGInstrs&, unsigned int) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x28e914a)
#11 0x00005651bea1efdb llvm::WindowScheduler::analyseII(llvm::ScheduleDAGInstrs&, unsigned int) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x28e9fdb)
#12 0x00005651bea1b56a llvm::WindowScheduler::run() (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x28e656a)
#13 0x00005651be709b53 llvm::MachinePipeliner::runWindowScheduler(llvm::MachineLoop&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x25d4b53)
#14 0x00005651be707d1b llvm::MachinePipeliner::scheduleLoop(llvm::MachineLoop&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x25d2d1b)
#15 0x00005651be707b7b llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x25d2b7b)
#16 0x00005651be69875e llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x256375e)
#17 0x00005651bec14b97 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x2adfb97)
#18 0x00005651bec1d682 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x2ae8682)
#19 0x00005651bec15727 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x2ae0727)
#20 0x00005651bdb84a42 main (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x1a4fa42)
#21 0x00007fbbbcca9c88 (/usr/lib/libc.so.6+0x25c88)
#22 0x00007fbbbcca9d4c __libc_start_main (/usr/lib/libc.so.6+0x25d4c)
#23 0x00005651bdb7e8e5 _start (/home/nathan/tmp/cvise.XAc760fE9l/install/llvm-bad/bin/llc+0x1a498e5)
# bad: [e84ecf26fa5d9a4be4da078a1f85e988731308af] [NFC][PowerPC] Add test to check lanemasks for subregisters. (#94363)
# good: [a91c8398f22c28618d681497e9856c3a4b8753c3] [MC] Move -save-temp-labels from llvm-mc to MCTargetOptionsCommandFlags
git bisect start 'e84ecf26fa5d9a4be4da078a1f85e988731308af' 'a91c8398f22c28618d681497e9856c3a4b8753c3'
# bad: [ba7d5ebe4bb2dc9b6885adf8346529e763cd6fce] [libc] Fix build breaks caused by f16sqrtf changes (#95459)
git bisect bad ba7d5ebe4bb2dc9b6885adf8346529e763cd6fce
# bad: [71e4d70f0bb04bae6c8521c711c770293fd228a5] [clang][NFC] Update CWG issues list
git bisect bad 71e4d70f0bb04bae6c8521c711c770293fd228a5
# good: [65f746e76c97b6f8aece139199aed44ce632255c] [flang] Update UBOUND runtime API and lowering (#95085)
git bisect good 65f746e76c97b6f8aece139199aed44ce632255c
# bad: [846e47e7b880bcf6b8f5773fe0fe236d486f3239] [MC] Reduce size of MCDataFragment by 8 bytes (#95293)
git bisect bad 846e47e7b880bcf6b8f5773fe0fe236d486f3239
# good: [445973caceea9154b7f05a0b574ced346955be87] [LegalizeTypes] Handle non byte-sized elt types when splitting INSERT/EXTRACT_VECTOR_ELT (#93357)
git bisect good 445973caceea9154b7f05a0b574ced346955be87
# good: [3475116e2c37a2c8a69658b36c02871c322da008] [clang][NFC] Add a test for CWG2685 (#95206)
git bisect good 3475116e2c37a2c8a69658b36c02871c322da008
# bad: [b6bf4024a031a5e7b58aff1425d94841a88002d6] [llvm][CodeGen] Add a new software pipeliner 'Window Scheduler' (#84443)
git bisect bad b6bf4024a031a5e7b58aff1425d94841a88002d6
# good: [a144bf2b2511b47fc165755817eda17f79ef5476] [Clang] Fix handling of brace ellison when building deduction guides (#94889)
git bisect good a144bf2b2511b47fc165755817eda17f79ef5476
# first bad commit: [b6bf4024a031a5e7b58aff1425d94841a88002d6] [llvm][CodeGen] Add a new software pipeliner 'Window Scheduler' (#84443)

@huaatian
Contributor Author

huaatian commented Jun 15, 2024

Okay, we will handle it right away.

@huaatian
Contributor Author

huaatian commented Jun 15, 2024

> Okay, we will handle it right away.

Here is the patch with the corresponding solution. Please try it out to see if it resolves the issue. Sorry for the inconvenience. @nathanchance
#95636

@nathanchance
Member

> Here is the patch with the corresponding solution. Please try it out to see if it resolves the issue. Sorry for the inconvenience. @nathanchance #95636

No worries, thanks for the forward fix! I found another crash that is not resolved with it.

void *poll_for_response_data;
long poll_for_response_response_0;
int poll_for_response_crc_err_retries;
void poll_for_response(char size) {
  int i;
  char *data_byte = poll_for_response_data;
retry:
  if (poll_for_response_crc_err_retries)
    goto fail;
  {
    unsigned val = poll_for_response_response_0;
    i = 0;
    for (; i < size; i++)
      data_byte[-i - 1] = val >>= 8;
  }
  goto retry;
fail:;
}
$ clang --target=hexagon-linux -O2 -c -o /dev/null fsi-master-gpio.i
clang: /home/nathan/tmp/cvise.YaBcQe05cQ/src/llvm/lib/CodeGen/WindowScheduler.cpp:650: int llvm::WindowScheduler::getOriCycle(MachineInstr *): Assertion `TriToOri.count(NewMI) && "Cannot find original MI!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang --target=hexagon-linux -O2 -c -o /dev/null fsi-master-gpio.i
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'fsi-master-gpio.i'.
4.	Running pass 'Modulo Software Pipelining' on function '@poll_for_response'
 #0 0x000000000330fc60 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x330fc60)
 #1 0x000000000330dba8 llvm::sys::RunSignalHandlers() (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x330dba8)
 #2 0x00000000032992e4 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x0000ffffb26797f0 (linux-vdso.so.1+0x7f0)
 #4 0x0000ffffb20385e0 __pthread_kill_implementation (/lib64/libc.so.6+0x985e0)
 #5 0x0000ffffb1fe5a00 gsignal (/lib64/libc.so.6+0x45a00)
 #6 0x0000ffffb1fd0288 abort (/lib64/libc.so.6+0x30288)
 #7 0x0000ffffb1fde3e0 __assert_fail_base (/lib64/libc.so.6+0x3e3e0)
 #8 0x0000ffffb1fde454 (/lib64/libc.so.6+0x3e454)
 #9 0x0000000002cc5728 llvm::WindowScheduler::getOriCycle(llvm::MachineInstr*) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x2cc5728)
#10 0x0000000002cc641c llvm::WindowScheduler::schedulePhi(int, unsigned int&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x2cc641c)
#11 0x0000000002cc2770 llvm::WindowScheduler::run() (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x2cc2770)
#12 0x000000000292f16c llvm::MachinePipeliner::runWindowScheduler(llvm::MachineLoop&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x292f16c)
#13 0x000000000292d7a4 llvm::MachinePipeliner::scheduleLoop(llvm::MachineLoop&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x292d7a4)
#14 0x000000000292d6f4 llvm::MachinePipeliner::scheduleLoop(llvm::MachineLoop&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x292d6f4)
#15 0x000000000292d5e8 llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x292d5e8)
#16 0x00000000028f0b24 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x28f0b24)
#17 0x0000000002e7a160 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x2e7a160)
#18 0x0000000002e81be8 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x2e81be8)
#19 0x0000000002e7aa80 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x2e7aa80)
#20 0x0000000003a622c4 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3a622c4)
#21 0x0000000003a84404 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3a84404)
#22 0x0000000004c75190 clang::ParseAST(clang::Sema&, bool, bool) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x4c75190)
#23 0x0000000003e3b810 clang::FrontendAction::Execute() (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3e3b810)
#24 0x0000000003dc1974 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3dc1974)
#25 0x0000000003f07098 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3f07098)
#26 0x0000000002081dec cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x2081dec)
#27 0x000000000207ec98 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#28 0x0000000003c673e0 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#29 0x000000000329904c llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x329904c)
#30 0x0000000003c669c0 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3c669c0)
#31 0x0000000003c2e478 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3c2e478)
#32 0x0000000003c2e6c4 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3c2e6c4)
#33 0x0000000003c47710 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x3c47710)
#34 0x000000000207e06c clang_main(int, char**, llvm::ToolContext const&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x207e06c)
#35 0x000000000208c138 main (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x208c138)
#36 0x0000ffffb1fd0a1c __libc_start_call_main (/lib64/libc.so.6+0x30a1c)
#37 0x0000ffffb1fd0afc __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x30afc)
#38 0x000000000207c8b0 _start (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/clang-19+0x207c8b0)
clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)
ClangBuiltLinux clang version 19.0.0git (https://github.com/llvm/llvm-project.git 355e4a9e56c644f24fc10f780cb2fc68b660d0a0)
Target: hexagon-unknown-linux
Thread model: posix
InstalledDir: /home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin
Build config: +assertions
clang: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
target triple = "hexagon-unknown-linux"

define void @poll_for_response(i32 %0, ptr %1) {
  br label %4

3:                                                ; preds = %4
  ret void

4:                                                ; preds = %4, %2
  %5 = phi i32 [ 0, %4 ], [ %0, %2 ]
  %6 = phi i32 [ 1, %4 ], [ 0, %2 ]
  %7 = phi i32 [ %11, %4 ], [ 0, %2 ]
  %8 = lshr i32 %5, 1
  %9 = trunc i32 %8 to i8
  store i8 %9, ptr %1, align 1
  %10 = getelementptr i8, ptr %1, i32 %6
  store i8 0, ptr %10, align 1
  %11 = add i32 %7, 1
  %12 = icmp eq i32 %7, 1
  br i1 %12, label %3, label %4
}
$ llc -o /dev/null reduced.ll
llc: /home/nathan/tmp/cvise.YaBcQe05cQ/src/llvm/lib/CodeGen/WindowScheduler.cpp:650: int llvm::WindowScheduler::getOriCycle(MachineInstr *): Assertion `TriToOri.count(NewMI) && "Cannot find original MI!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -o /dev/null reduced.ll
1.      Running pass 'Function Pass Manager' on module 'reduced.ll'.
2.      Running pass 'Modulo Software Pipelining' on function '@poll_for_response'
 #0 0x00000000031fa6f8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x31fa6f8)
 #1 0x00000000031f85a0 llvm::sys::RunSignalHandlers() (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x31f85a0)
 #2 0x00000000031fae14 SignalHandler(int) Signals.cpp:0:0
 #3 0x0000ffff9199b7f0 (linux-vdso.so.1+0x7f0)
 #4 0x0000ffff913585e0 __pthread_kill_implementation (/lib64/libc.so.6+0x985e0)
 #5 0x0000ffff91305a00 gsignal (/lib64/libc.so.6+0x45a00)
 #6 0x0000ffff912f0288 abort (/lib64/libc.so.6+0x30288)
 #7 0x0000ffff912fe3e0 __assert_fail_base (/lib64/libc.so.6+0x3e3e0)
 #8 0x0000ffff912fe454 (/lib64/libc.so.6+0x3e454)
 #9 0x00000000026d81b4 llvm::WindowScheduler::getOriCycle(llvm::MachineInstr*) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x26d81b4)
#10 0x00000000026d8ea8 llvm::WindowScheduler::schedulePhi(int, unsigned int&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x26d8ea8)
#11 0x00000000026d50a4 llvm::WindowScheduler::run() (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x26d50a4)
#12 0x00000000023edbf8 llvm::MachinePipeliner::runWindowScheduler(llvm::MachineLoop&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x23edbf8)
#13 0x00000000023ec1e8 llvm::MachinePipeliner::scheduleLoop(llvm::MachineLoop&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x23ec1e8)
#14 0x00000000023ec02c llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x23ec02c)
#15 0x0000000002380258 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x2380258)
#16 0x00000000028ba804 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x28ba804)
#17 0x00000000028c2348 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x28c2348)
#18 0x00000000028bb1b4 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x28bb1b4)
#19 0x00000000019a1da8 main (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x19a1da8)
#20 0x0000ffff912f0a1c __libc_start_call_main (/lib64/libc.so.6+0x30a1c)
#21 0x0000ffff912f0afc __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x30afc)
#22 0x000000000199c4f0 _start (/home/nathan/tmp/cvise.YaBcQe05cQ/install/llvm-bad/bin/llc+0x199c4f0)

@huaatian
Contributor Author

huaatian commented Jun 18, 2024

We have addressed this issue in patch #95900. Thank you!


Changed = swingModuloScheduler(L);
if (useWindowScheduler(Changed))
  Changed = runWindowScheduler(L);
Collaborator

Should we run one scheduler after another? This came up in a discussion today at the vectorizer meeting. cc: @ayalz

Contributor Author

@huaatian huaatian Jul 17, 2024

Thank you very much for your review comment. Let me explain the current design considerations:

  1. We understand that both SMS (Swing Modulo Scheduling) and WS (Window Scheduling) belong to the category of software pipelining algorithms, and the conditions for determining their feasibility are the same. To avoid redundant checks, we have placed both in the MachinePipeliner.
  2. The basic principle of both SMS and WS scheduling is to fold the loop multiple times to obtain the kernel. The advantage of SMS is that it can fold more times, i.e., the stage can be greater than 2. On the other hand, the advantage of WS is that it is less affected by resource conflicts and can always get a scheduling result. Therefore, performing WS after SMS fails can be seen as an enhanced algorithm for targets with many resource conflicts.

I hope my explanation addresses your concerns. Thank you!
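
To make the "folding" in point 2 concrete, here is a minimal C++ sketch of a stage-2 fold at a single window offset. It is only an illustration under simplifying assumptions: the loop body is a straight-line list of instruction names, dependencies and resource conflicts are ignored, and `FoldedLoop` and `foldAtOffset` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the three regions of a loop folded once
// (i.e. a schedule with two stages).
struct FoldedLoop {
  std::vector<std::string> Prologue; // start of iteration 0, before the kernel
  std::vector<std::string> Kernel;   // steady-state body
  std::vector<std::string> Epilogue; // rest of the final iteration
};

// Fold a straight-line loop body at `Offset`: instructions [Offset, end) of
// iteration i are followed by instructions [0, Offset) of iteration i+1.
// No dependence or resource checking is done here (assumption of the sketch).
FoldedLoop foldAtOffset(const std::vector<std::string> &Body,
                        std::size_t Offset) {
  assert(Offset > 0 && Offset < Body.size() && "offset must split the body");
  const auto Split = Body.begin() + static_cast<std::ptrdiff_t>(Offset);

  FoldedLoop F;
  F.Prologue.assign(Body.begin(), Split);
  F.Epilogue.assign(Split, Body.end());

  // The kernel is the tail of this iteration plus the head of the next one.
  F.Kernel = F.Epilogue;
  for (std::size_t I = 0; I < Offset; ++I)
    F.Kernel.push_back(Body[I] + "'"); // trailing ' marks the next iteration
  return F;
}
```

With a body of `load, mul, add, store` and an offset of 1, the kernel becomes `mul, add, store, load'`, where `load'` belongs to the next iteration. The actual pass also has to choose the offsets heuristically and validate the result against the DAG, which this sketch does not attempt.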

Contributor Author

By the way, as an example, our VLIW target experiences more hardware conflicts due to accurate modeling. This results in a higher failure rate for SMS. Therefore, we have opted to directly use the WS algorithm.
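
The two usage modes described in this thread (WS as a fallback after SMS fails, and WS used directly) can be sketched as a small policy function. This is a hedged, self-contained illustration rather than the code in MachinePipeliner: `WindowMode`, `Chosen`, `LoopOutcome`, and `pickScheduler` are hypothetical names, and the boolean fields simply stand in for the results of the shared feasibility check and of each scheduler.

```cpp
#include <cassert>

// Hypothetical mode switch (illustrative names, not taken from the patch).
enum class WindowMode { Off, AfterSmsFails, Force };

// Which scheduler, if any, ended up transforming the loop.
enum class Chosen { None, SMS, Window };

// Hypothetical stand-in for one loop: each flag models the outcome of a step.
struct LoopOutcome {
  bool Pipelineable;   // result of the feasibility checks shared by SMS and WS
  bool SmsSucceeds;    // would SMS find a schedule?
  bool WindowSucceeds; // would WS find a schedule?
};

Chosen pickScheduler(const LoopOutcome &L, WindowMode Mode) {
  // The feasibility checks are common to both algorithms, so run them once.
  if (!L.Pipelineable)
    return Chosen::None;

  // In Force mode SMS is skipped entirely; otherwise it gets the first try.
  if (Mode != WindowMode::Force && L.SmsSucceeds)
    return Chosen::SMS;

  // WS runs either as a fallback after SMS failed or directly in Force mode.
  Chosen Result = Chosen::None;
  if (Mode != WindowMode::Off && L.WindowSucceeds)
    Result = Chosen::Window;

  // Invariant: WS is never chosen while it is switched off.
  assert(!(Mode == WindowMode::Off && Result == Chosen::Window));
  return Result;
}
```

A target with heavy resource conflicts, as in the VLIW example above, would correspond to the `Force` value here, while the review snippet (`swingModuloScheduler` followed by `useWindowScheduler(Changed)`) corresponds to the fallback value.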

EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this pull request Aug 13, 2024
…#84443)