Skip to content

[CIR][ABI] Add X86_64 float and double CC lowering #714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 26, 2024

Conversation

sitio-couto
Copy link
Collaborator

Implements calling convention lowering of float and double arguments and return values conventions for X86_64.

@sitio-couto sitio-couto self-assigned this Jul 2, 2024
Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@bcardosolopes
Copy link
Member

Good to merge when the tests get fixed. From the bots, looks like it's triggering an assert:

Assertion failed: (llvm::isPowerOf2_32(Align) && "Alignment must be power of 2"), function getTypeInfoImpl, file CIRLowerContext.cpp, line 94.

@sitio-couto sitio-couto force-pushed the vinicius/add-x86-fp-cc-lowering branch from 95c6122 to 8abc5a5 Compare July 4, 2024 14:56
@sitio-couto
Copy link
Collaborator Author

sitio-couto commented Jul 4, 2024

@bcardosolopes I'll be running the x86_64/AArch64 CC lowering tests only when the ClangIR build is for Ubuntu. Windows and Darwin each have different behaviors and are throwing different issues. Setting up a dev environment for all three platforms would take too much time.

@bcardosolopes
Copy link
Member

@bcardosolopes I'll be running the x86_64/AArch64 CC lowering tests only when the ClangIR build is for Ubuntu. Windows and Darwin each have different behaviors and are throwing different issues. Setting up a dev environment for all three platforms would take too much time.

Unfortunately that's not acceptable, you should look at OG codegen tests and figure out what silly thing is missing, it's subtle but shouldn't be complicated - the host shouldn't affect codegen for the target.

@bcardosolopes
Copy link
Member

Let me try this locally on macOS and see if I can repro for you

@bcardosolopes
Copy link
Member

I can reproduce it, let me debug it and get back to you

@bcardosolopes
Copy link
Member

Ok, couple of different things going on. On a release build:

 #5 0x0000000107cb8da4 mlir::cir::CIRLowerContext::getTypeInfoImpl(mlir::Type) const (/bin/clang-19+0x107128da4)
 #6 0x0000000107cb8bec mlir::cir::CIRLowerContext::getTypeInfo(mlir::Type) const (/bin/clang-19+0x107128bec)
 #7 0x0000000107cc0c28 mlir::cir::CIRLowerContext::getTypeSize(mlir::Type) const (/bin/clang-19+0x107130c28)
 #8 0x0000000107cc1954 mlir::cir::X86_64ABIInfo::GetSSETypeAtOffset(mlir::Type, unsigned int, mlir::Type, unsigned int) const (/bin/clang-19+0x107131954)
 #9 0x0000000107cc2128 mlir::cir::X86_64ABIInfo::classifyReturnType(mlir::Type) const (/bin/clang-19+0x107132128)

However, on a debug build, it crashes a bit before as part of MergeCleanups:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1f)
  * frame #0: 0x0000000107fad70c clang`llvm::detail::PunnedPointer<mlir::Region*>::asInt(this=0x000000000000001f) const at PointerIntPair.h:41:5
    frame #1: 0x0000000107fad6f0 clang`llvm::detail::PunnedPointer<mlir::Region*>::operator long(this=0x000000000000001f) const at PointerIntPair.h:45:48
    frame #2: 0x0000000107fa90b0 clang`llvm::PointerIntPair<mlir::Region*, 1u, bool, llvm::PointerLikeTypeTraits<mlir::Region*>, llvm::PointerIntPairInfo<mlir::Region*, 1u, llvm::PointerLikeTypeTraits<mlir::Region*>>>::getPointer(this=0x000000000000001f) const at PointerIntPair.h:94:58
    frame #3: 0x0000000107fa908c clang`mlir::Block::getParent(this=0x0000000000000007) const at Block.cpp:26:66
    frame #4: 0x0000000107fa90dc clang`mlir::Block::getParentOp(this=0x0000000000000007) at Block.cpp:31:10
    frame #5: 0x0000000105d5da2c clang`mlir::Operation::getParentOp(this=0x000000014b64e3d0) at Operation.h:234:52
    frame #6: 0x0000000107b5aee4 clang`(anonymous namespace)::getDumpRootOp(op=0x000000014b64e3d0) at GreedyPatternRewriteDriver.cpp:177:33
    frame #7: 0x0000000107b5a4d0 clang`(anonymous namespace)::GreedyPatternRewriteDriver::processWorklist(this=0x000000016fdfacf8) at GreedyPatternRewriteDriver.cpp:496:33
    frame #8: 0x0000000107b554c4 clang`(anonymous namespace)::MultiOpPatternRewriteDriver::simplify(this=0x000000016fdfacf8, ops=ArrayRef<mlir::Operation *> @ 0x000000016fdfac18, changed=0x0000000000000000) && at GreedyPatternRewriteDriver.cpp:979:17
    frame #9: 0x0000000107b55038 clang`mlir::applyOpPatternsAndFold(ops=ArrayRef<mlir::Operation *> @ 0x000000016fdface0, patterns=0x000000016fdfaf38, config=GreedyRewriteConfig @ 0x000000016fdfaed8, changed=0x0000000000000000, allErased=0x0000000000000000) at GreedyPatternRewriteDriver.cpp:1052:47
    frame #10: 0x0000000107109a3c clang`(anonymous namespace)::MergeCleanupsPass::runOnOperation(this=0x000000014b64e5a0) at MergeCleanups.cpp:94:7
    frame #11: 0x0000000107dcef0c clang`mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()(this=0x000000016fdfb278) const at Pass.cpp:527:17
    frame #12: 0x0000000107dcee90 clang`void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(callable=6171898488) at STLFunctionalExtras.h:45:12
    frame #13: 0x00000001017d82a0 clang`llvm::function_ref<void ()>::operator()(this=0x000000016fdfb1a0) const at STLFunctionalExtras.h:68:12
    frame #14: 0x0000000107dc1464 clang`void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(this=0x000000014b606570, actionFn=mlir::function_ref<void ()> @ 0x000000016fdfb1a0, irUnits=ArrayRef<mlir::IRUnit> @ 0x000000016fdfb190, args=0x000000014b64e5a0) at MLIRContext.h:275:7
    frame #15: 0x0000000107dc114c clang`mlir::detail::OpToOpPassAdaptor::run(pass=0x000000014b64e5a0, op=0x000000014b63ba60, am=AnalysisManager @ 0x000000016fdfb3b0, verifyPasses=true, parentInitGeneration=1) at Pass.cpp:521:21
    frame #16: 0x0000000107dc1880 clang`mlir::detail::OpToOpPassAdaptor::runPipeline(pm=0x000000016fdfbbf8, op=0x000000014b63ba60, am=AnalysisManager @ 0x000000016fdfb700, verifyPasses=true, parentInitGeneration=1, instrumentor=0x0000000000000000, parentInfo=0x0000000000000000) at Pass.cpp:593:16
    frame #17: 0x0000000107dc3da8 clang`mlir::PassManager::runPasses(this=0x000000016fdfbbf8, op=0x000000014b63ba60, am=AnalysisManager @ 0x000000016fdfb750) at Pass.cpp:904:10
    frame #18: 0x0000000107dc3a78 clang`mlir::PassManager::run(this=0x000000016fdfbbf8, op=0x000000014b63ba60) at Pass.cpp:884:60
    frame #19: 0x00000001060eac5c clang`cir::runCIRToCIRPasses(theModule=ModuleOp @ 0x000000016fdfbc90, mlirCtx=0x000000014b606570, astCtx=0x000000014b80b200, enableVerifier=true, enableLifetime=false, lifetimeOpts=(Data = "", Length = 0), enableIdiomRecognizer=false, idiomRecognizerOpts=(Data = "", Length = 0), enableLibOpt=false, libOptOpts=(Data = "", Length = 0), passOptParsingFailure="", flattenCIR=false, emitMLIR=false, enableCallConvLowering=true) at CIRPasses.cpp:84:13
    frame #20: 0x0000000105d0aba0 clang`cir::CIRGenConsumer::HandleTranslationUnit(this=0x000000016fdfc2c0)::'lambda'()::operator()() const at CIRGenAction.cpp:184:11
    frame #21: 0x0000000105d08498 clang`cir::CIRGenConsumer::HandleTranslationUnit(this=0x000000014b6055f0, C=0x000000014b80b200) at CIRGenAction.cpp:233:9
    frame #22: 0x0000000109fb0230 clang`clang::ParseAST(S=0x000000014b891800, PrintStats=false, SkipFunctionBodies=false) at ParseAST.cpp:176:13
    frame #23: 0x0000000104ee95e8 clang`clang::ASTFrontendAction::ExecuteAction(this=0x000000014b724c50) at FrontendAction.cpp:1212:3
    frame #24: 0x0000000105d076c0 clang`cir::CIRGenAction::ExecuteAction(this=0x000000014b724c50) at CIRGenAction.cpp:407:30
    frame #25: 0x0000000104ee8da4 clang`clang::FrontendAction::Execute(this=0x000000014b724c50) at FrontendAction.cpp:1098:8
    frame #26: 0x0000000104e022d0 clang`clang::CompilerInstance::ExecuteAction(this=0x000000014b720230, Act=0x000000014b724c50) at CompilerInstance.cpp:1061:33
    frame #27: 0x00000001050109c8 clang`clang::ExecuteCompilerInvocation(Clang=0x000000014b720230) at ExecuteCompilerInvocation.cpp:342:25
    frame #28: 0x0000000100011b08 clang`cc1_main(Argv=ArrayRef<const char *> @ 0x000000016fdfcc90, Argv0="/bin/clang", MainAddr=0x0000000100001cfc) at cc1_main.cpp:232:15
    frame #29: 0x000000010000365c clang`ExecuteCC1Tool(ArgV=0x000000016fdfe3d8, ToolContext=0x000000016fdfec28) at driver.cpp:215:12
    frame #30: 0x0000000100002390 clang`clang_main(Argc=17, Argv=0x000000016fdfef48, ToolContext=0x000000016fdfec28) at driver.cpp:256:12
    frame #31: 0x000000010003c540 clang`main(argc=17, argv=0x000000016fdfef48) at clang-driver.cpp:17:10
    frame #32: 0x000000019f2be0e0 dyld`start + 2360

It happens while trying to dump the root op in GreedyPatternRewriteDriver.cpp:496, which indicates there's something malformed that's only get triggered in some platforms.

Bisecting the tests a bit, I can see the crash it only triggers for 3 of them: Bool, Float and Double.

@bcardosolopes
Copy link
Member

bcardosolopes commented Jul 8, 2024

Bool before conversion:

  cir.func @_Z4Boolb(%arg0: !cir.bool) -> !cir.bool attributes {ast = #cir.function.decl.ast} extra(#cir<extra({inline = #cir.inline<no>, nothrow = #cir.nothrow, optnone = #cir.optnone})>) {
    %0 = cir.alloca !cir.bool, !cir.ptr<!cir.bool>, ["a", init] {alignment = 1 : i64}
    %1 = cir.alloca !cir.bool, !cir.ptr<!cir.bool>, ["__retval"] {alignment = 1 : i64}
    cir.store %arg0, %0 : !cir.bool, !cir.ptr<!cir.bool>
    %2 = cir.load %0 : !cir.ptr<!cir.bool>, !cir.bool
    %3 = cir.call @_Z4Boolb(%2) : (!cir.bool) -> !cir.bool
    cir.store %3, %1 : !cir.bool, !cir.ptr<!cir.bool>
    cir.br ^bb1
  ^bb1:  // pred: ^bb0
    %4 = cir.load %1 : !cir.ptr<!cir.bool>, !cir.bool
    cir.return %4 : !cir.bool
  }
}

Bool after (as of crash state):

cir.func @_Z4Boolb(%arg0: !cir.bool) -> !cir.bool attributes {ast = #cir.function.decl.ast} extra(#cir<extra({inline = #cir.inline<no>, nothrow = #cir.nothrow, optnone = #cir.optnone})>) {
    %0 = cir.alloca !cir.bool, !cir.ptr<!cir.bool>, ["a", init] {alignment = 1 : i64}
    %1 = cir.alloca !cir.bool, !cir.ptr<!cir.bool>, ["__retval"] {alignment = 1 : i64}
    cir.store %arg0, %0 : !cir.bool, !cir.ptr<!cir.bool>
    %2 = cir.load %0 : !cir.ptr<!cir.bool>, !cir.bool
    %3 = cir.call @_Z4Boolb(%2) : (!cir.bool) -> !cir.bool
    cir.store %3, %1 : !cir.bool, !cir.ptr<!cir.bool>
    %4 = cir.load %1 : !cir.ptr<!cir.bool>, !cir.bool
    cir.return %4 : !cir.bool
  }

However, looking at BrOp::fold, it seems like it's breaking the invariants of folding as described in https://mlir.llvm.org/docs/Canonicalization/#canonicalizing-with-the-fold-method. Let me see if I can still repro after reinstanting previous behavior

@bcardosolopes
Copy link
Member

This is related to 3b9f698, @Kritoooo ^

@bcardosolopes
Copy link
Member

Reverting that commit now gets me to the actual crash, so it wasn't related at all. Anyways, let me look into the real one now :)

@bcardosolopes
Copy link
Member

@Kritoooo reverted in 5792eb0, do any of your other improvements there could also be leading to similar problems? Sorry I didn't catch this during review time

@bcardosolopes
Copy link
Member

bcardosolopes commented Jul 8, 2024

Ok, back to the actual issue. For the double example:

   87       if (auto doubleTy = dyn_cast<DoubleType>(T)) {
   88         Width = Target->getDoubleWidth();
   89         Align = Target->getDoubleAlign();

Target->getDoubleAlign() is returning zero, which causes the assertion below. In theory this shouldn't be possible because getDoubleAlign returns DoubleAlign, which is initialized by default to 64 in TargetInfo::TargetInfo and isn't changed for x86_64 target for the platform in question. This is hinting at some dangling reference of sorts.

My guess is that you are missing some initialization bits for clang::TargetInfo, or you are creating some of the dependencies on the stack and they are gone at the time we need to query the information. Maybe go back to how things are getting created in OG versus how you do it in createLowerModule. If there's any patch on top you'd like me to test let me know.

@Kritoooo
Copy link
Contributor

Kritoooo commented Jul 9, 2024

@Kritoooo reverted in 5792eb0, do any of your other improvements there could also be leading to similar problems?

I think the logic of RemoveRedundantBranches doesn't seem suitable to be implemented in fold (or there might be an issue with my implementation), because when deleting operations in fold, it falls into the loop I described earlier. #690 (comment).
And it seems that fold is also not suitable for deleting operations without results, and BrOp, switchOp, etc. happen to have no results.
If by similar problems you mean the problems described in this issue, then I haven't encountered them.

Sorry I didn't catch this during review time

I am also responsible for this error, as I failed to identify the potential issues. However, I have been a bit busy lately. When I have some free time, I will try to address the previous unresolved issues.

@bcardosolopes
Copy link
Member

I think the logic of RemoveRedundantBranches doesn't seem suitable to be implemented in fold (or there might be an issue with my implementation), because when deleting operations in fold, it falls into the loop I described earlier. #690 (comment).

Yea, I was thinking it could be related to that, but at that point I missed the invariants about fold.

And it seems that fold is also not suitable for deleting operations without results, and BrOp, switchOp, etc. happen to have no results. If by similar problems you mean the problems described in this issue, then I haven't encountered them.

I think for these we could leverage the existing passes that take advantage of Op::getSuccessorOperands and friends without using the default canonicalizer, and do something that perhaps call into those infra similar to what we do in MergeCleanups.

I am also responsible for this error, as I failed to identify the potential issues. However, I have been a bit busy lately. When I have some free time, I will try to address the previous unresolved issues.

It's all good, thanks for the fast response, and your contribs are always welcome!

@sitio-couto sitio-couto force-pushed the vinicius/add-x86-fp-cc-lowering branch from 00ce2d7 to 7b9c005 Compare July 10, 2024 11:38
@bcardosolopes
Copy link
Member

Seems like it passes macOS now, yay! There's a remaining failure on windows still: NYI UNREACHABLE executed at D:\a\clangir\clangir\clang\lib\CIR\Dialect\Transforms\TargetLowering\Targets\X86.cpp:173!, which points to:

Type T1 = {};
unsigned T0Size = TD.getTypeAllocSize(T0);
if (SourceSize > T0Size)
  llvm_unreachable("NYI");

@sitio-couto
Copy link
Collaborator Author

sitio-couto commented Jul 10, 2024

Seems like it passes macOS now, yay!

The dangling ref guess was spot on 🔥

There's a remaining failure on windows still

I'm trying to set up a WS2019 VM to debug this. Fixing it without being able to reproduce it is not very practical.

@bcardosolopes
Copy link
Member

I'm trying to set up a WS2019 VM to debug this. Fixing it without being able to reproduce it is not very practical.

Yep. Alternative approaches I suggest: build a ASANified clang and run the test with it, if it's still related to dangling stuff, it might catch it on linux as well - I have used that sometimes for these types of issues. Something else to keep an eye on is that for x86_64, windows inherits from the more generic target info and adds some stuff, it's possible some slicing is happening.

@sitio-couto sitio-couto force-pushed the vinicius/add-x86-fp-cc-lowering branch from 7b9c005 to 63730b5 Compare July 25, 2024 00:26
Implements calling convention lowering of float and double arguments and
return values conventions for X86_64.
@sitio-couto sitio-couto force-pushed the vinicius/add-x86-fp-cc-lowering branch from 63730b5 to bbb45d2 Compare July 26, 2024 13:47
@sitio-couto sitio-couto merged commit c7e7b19 into llvm:main Jul 26, 2024
6 checks passed
@sitio-couto sitio-couto deleted the vinicius/add-x86-fp-cc-lowering branch July 26, 2024 14:42
Hugobros3 pushed a commit to shady-gang/clangir that referenced this pull request Oct 2, 2024
Implements calling convention lowering of float and double arguments and
return values conventions for X86_64.
smeenai pushed a commit to smeenai/clangir that referenced this pull request Oct 9, 2024
Implements calling convention lowering of float and double arguments and
return values conventions for X86_64.
smeenai pushed a commit to smeenai/clangir that referenced this pull request Oct 9, 2024
Implements calling convention lowering of float and double arguments and
return values conventions for X86_64.
keryell pushed a commit to keryell/clangir that referenced this pull request Oct 19, 2024
Implements calling convention lowering of float and double arguments and
return values conventions for X86_64.
lanza pushed a commit that referenced this pull request Nov 5, 2024
Implements calling convention lowering of float and double arguments and
return values conventions for X86_64.
lanza pushed a commit that referenced this pull request Mar 18, 2025
Implements calling convention lowering of float and double arguments and
return values conventions for X86_64.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants