crossgen2 fails with 139 exit code on Arm64 #4007
I couldn't figure out the best area label to add to this issue. If you have write permissions please help me learn by adding exactly one area label.
@jkoritzinsky - Can you help investigate? This was working recently and regressed.
I can try, but I don't have a Linux ARM64 machine. Do you know the first VMR build that this broke in? I'll see if I can narrow down where the failure came from.
It looks like the most recent passing build was https://dev.azure.com/dnceng/internal/_build/results?buildId=2359219&view=results and the first failing build was https://dev.azure.com/dnceng/internal/_build/results?buildId=2359821&view=results. The range of runtime commits between these two is dotnet/runtime@8accd80...e99836a The commits that pop out to me as possibly causing crossgen2 crashes are: I think the second is more likely than the first. cc: @EgorBo
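For anyone trying to follow that bisection locally, a quick way to enumerate the candidate commits in that range (a sketch, assuming a local clone of dotnet/runtime; the SHAs are the ones quoted above) is:

```bash
# List the runtime commits between the last-known-good and first-known-bad SHAs,
# oldest first, so suspect changes are easy to scan.
git clone https://github.com/dotnet/runtime
cd runtime
git log --oneline --reverse 8accd80..e99836a
```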
Checking now
It's getting a different error now and failing with error code 134:
Failing build (internal link)
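As background (an editorial note, not from the thread): on Linux, an exit code above 128 usually means the process died from signal (exit code − 128), so 139 corresponds to SIGSEGV (11, a segmentation fault) and 134 to SIGABRT (6, an abort). You can confirm the mapping from a shell:

```bash
# Map the exit codes from this issue back to signal names.
kill -l $((139 - 128))   # prints SEGV -> segmentation fault
kill -l $((134 - 128))   # prints ABRT -> abort(), e.g. a failed assert or glibc check
```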
It's consistently failing, but not always with the same error. The two errors mentioned above in this issue are all I've seen so far.
Interestingly, the build succeeded in this recent run (internal link). But I'm not sure I trust that the issue is resolved. Is it perhaps an intermittent issue? The source changes between the two builds (failing and passing) don't include any runtime changes at all.
Yes, it is possible that the failure is intermittent given your sample with the 134 exit code. It's likely a native corruption somewhere in crossgen2. If this is using a live crossgen2 (which I think it is), then it could be anywhere in the JIT, GC, or CoreCLR runtime. This also means it could have appeared before the first instance during the time frame where the build was failing before getting to installer, which makes this harder to diagnose...
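Not something the thread goes into, but one way to get more out of an intermittent native crash like this is to ask the runtime to write a dump when it crashes and inspect it afterwards. A sketch, assuming the standard CoreCLR crash-dump environment variables are honored by the crossgen2 invocation and that the build is driven by the commands shown later in this thread:

```bash
# Ask CoreCLR (which crossgen2 runs on) to produce a full minidump on crash.
export DOTNET_DbgEnableMiniDump=1
export DOTNET_DbgMiniDumpType=4                            # 4 = full dump
export DOTNET_DbgMiniDumpName=/tmp/crossgen2-crash.%p.dmp  # %p expands to the PID
./build.sh --source-only --clean-while-building
# A resulting dump can then be opened with lldb (plus the SOS extension) on Linux.
```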
There have been three other builds since that passing one, and they've all failed.
@MichaelSimons is there an easy way to test a build with a runtime change reverted? We might see some benefit in just looking at a build with each of those two commits that Jeremy identified reverted.
@mthalman I see you're testing a revert of @EgorBo's change here: https://dev.azure.com/dnceng/internal/_build?definitionId=1219. Could you also test a revert of dotnet/runtime@205ef03 in parallel, just to be comprehensive in our checking? (It seems unlikely to be the problem; it should have only deleted unused code.)
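A minimal sketch of how one might test such a revert in a local VMR checkout rather than through the internal pipeline (the patch URL format and the --directory path are assumptions, and the upstream patch may not apply cleanly to the VMR's copy of the sources):

```bash
# Reverse-apply the suspect dotnet/runtime commit onto the VMR's runtime sources,
# then rebuild from source to see whether the crossgen2 crash goes away.
curl -L https://github.com/dotnet/runtime/commit/205ef03.patch -o suspect.patch
git apply --reverse --directory=src/runtime suspect.patch
./prep.sh
./build.sh --ci --clean-while-building --prepareMachine --source-only
```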
The latest
Long story short: this was handled offline, and it turned out the issue was introduced in dotnet/runtime#96969; the fix (revert) is dotnet/runtime#97679 (although it's probably reverted only from the release/9.0p1 branch?)
Yes, and the latest VMR builds for
It's not a separate issue. The fix wasn't applied to
Should be fixed by dotnet/runtime#97817.
@mthalman can you please share repro instructions with me? I am not familiar with the source build, and while I have tried to mimic what I've seen in the CI log, I am not sure if I did it right. Here is what I ran:
./prep.sh
BUILD_SOURCEVERSION=220c455bc727ab0a8bdd9feb61f474a605d30339 ./build.sh --ci --clean-while-building --prepareMachine --source-only
(I used the current main for the build, so I changed 220c455bc727ab0a8bdd9feb61f474a605d30339 to the commit hash of my checkout.)
It should be enough to just run these commands:
Depending on how much disk space you have available, you may need to run
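Putting the commands quoted earlier in this thread together, a repro attempt on an Arm64 machine might look roughly like the following (the flags come from the earlier comment; the clone URL assumes the VMR at github.com/dotnet/dotnet, and this is a sketch rather than an official recipe):

```bash
# Fresh clone of the VMR, then a source-only build. --clean-while-building trades
# rebuild speed for disk space, which matters on smaller Arm64 machines.
git clone https://github.com/dotnet/dotnet
cd dotnet
./prep.sh
./build.sh --ci --clean-while-building --prepareMachine --source-only
```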
@mthalman my build keeps failing due to this: I've tried several times, even with git clean -xdf in between. Any idea what is wrong?
That looks related to not having the latest source. Have you pulled the latest from the main branch of the VMR? That error was relevant in an older commit but was fixed by dotnet/installer@90ccc0c in dotnet/installer#18641. Oh, and you'll also need to run it with the
I did a fresh clone today; I didn't have that repo locally. I am on this commit:
I've now synced to the latest state again and I am retrying.
Interesting. Well, that might be another issue with Arm then. Since this is failing in the build of the aspnetcore repo, it means you got past the point of the crossgen failure in the runtime repo (runtime builds before aspnetcore). So you're not reproducing the issue. Be aware that this is an intermittent issue, so you may need to build several times to hit it. To get a fresh VMR to start over from, I execute
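The exact cleanup command is cut off above; a common way to get back to a pristine tree between attempts (an assumption on my part, not necessarily the command mthalman uses) is:

```bash
# Remove all untracked and ignored files (build output, artifacts, the .dotnet folder)
# and reset any modified tracked files, leaving a clean checkout for the next attempt.
git clean -xdff
git reset --hard HEAD
```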
Ok, I'll ignore that error then and just retry a clean build.
I don't think this has anything to do with R2R itself. The failure in the linked issue is due to an unhandled exception. The error message is:
@agocke the Ubuntu run seems to be failing with 139 during the crossgen2 run, though.
Ah, you're right, I was looking at a different log.
That was a separate issue (identified in #4111 (comment)), which has since been fixed.
@mthalman mystery solved. It is not a new issue. The thing is that crossgen doesn't use the current build of the runtime, but rather an older version (it uses the dotnet SDK in the .dotnet folder). In this case, it used version 9.0.0-preview.2.24080.1, built from commit d40c654c274fe4f4afe66328f0599130f3eb2ea6, which predates the fix in main by two days.
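A quick way to confirm which bootstrap SDK (and therefore which runtime commit) the build is actually using, assuming a checkout where ./prep.sh has already populated the .dotnet folder:

```bash
# The bootstrap SDK installed by prep.sh lives in .dotnet; --info prints its version
# and the commit it was built from, which is what crossgen2 ends up running on.
./.dotnet/dotnet --info
```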
Ok, I'll update the VMR to use a more recent SDK and check the results.
This is continuing to be an issue. The most recent related error that I've seen is in runtime:
Link to build (internal link)
The bootstrap work to resolve this is still pending.
This should be fixed by dotnet/installer#18763.
Hi,
Reopening this because it's not yet resolved for source build in the VMR. We're still waiting for a bootstrap of the VMR onto an SDK build that contains the fix for this issue, but that's currently blocked on #4206.
@sec - This is the fix: dotnet/runtime#97817. But it's not enough to have the fix in the source that you're currently building; you need to build with an SDK that has the fix. That's what we're currently working towards for the VMR: getting it bootstrapped with an SDK that contains the fix, which will allow builds to work using that SDK.
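For anyone wanting to check whether a given checkout is already bootstrapped onto an SDK new enough to contain the fix, the pinned bootstrap version is typically recorded in the repo's global.json (a sketch; the exact property layout may differ between branches):

```bash
# Compare the pinned bootstrap SDK against the first build that contains the fix.
cat global.json
./.dotnet/dotnet --version   # what prep.sh actually installed, if already prepped
```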
@mthalman Thanks for the info. If I understand correctly, I should wait for your work to be done; then I can cross-compile the SDK for FreeBSD and use it as a bootstrap for preview 2, which should be fine. BUT, looking at this, it's a bug in libunwind, which under FreeBSD is taken from ports. Looking there, I've already found libunwind/libunwind#715; this should also be fixed, otherwise I will still hit this in the future :)
The fix was flowed into the VMR with dotnet/installer#19145.
This is causing a failure in the VMR for .NET 9 Preview 1 when building the installer repo. It fails in the Ubuntu2204Arm64_Offline_MsftSdk_arm64 job (example build [internal link]).
Error:
Here is a binlog:
sourcebuild.installer.zip
We need this resolved for .NET 9 Preview 1.