-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New version of Intel MKL. #5867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5867 +/- ##
==========================================
- Coverage 68.27% 68.21% -0.06%
==========================================
Files 1134 1134
Lines 242028 242028
Branches 25306 25306
==========================================
- Hits 165235 165106 -129
- Misses 70152 70272 +120
- Partials 6641 6650 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Can you explain this more? Is there something we should be doing to ensure this doesn't happen? Or something an app using ML.NET can do? Even with the explanation, it still sounds like a bug they should fix... I assume multiple apps can hit this. |
@eerhardt Let me reach out to them and ask them your follow up questions. My guess is that they will just say its our responsibility to make sure the first call happen outside of DLLMain. That same call that you added in the tests as a workaround, could we add that somewhere in the initialization of something so it always happens safely? |
This is the part I am not understanding. We don't control their DllMain. |
My understanding from the description is they are advising not doing the first call to the MKL library inside DllMain (any user DllMain). this can create a deadlock as they will call LoadLibrary in the MKL library. I don't think this affect us as I expect we don't call the library in any DllMain we have. |
It has caused problems for us in the past. Eric had to add a manual call to MKL in our test initialization so that we didn't have deadlock issue. With that manual call the tests don't run into problems, but an end user still could. I don't think we expose any way for them to manually trigger that initialization. We may want to either expose a way for the user to do it or to have it be automatically done for them so its a non-issue. I would vote on doing it automatically if possible. But only if the MKL package has been included. |
We don't have a The issue, as far as I could tell, is when 2 different threads are running and both calling into MKL at roughly the same time. MKL tries initializing itself from both threads, resulting in a deadlock. See #1073 for my full analysis.
I would vote on MKL fixing their issue. And if it is not possible, they should have a documented workaround/API for people to do in order to avoid the issue. |
I agree that MKL should fix their issue and have it all documented. Seeing as its been about 3 years since this issue was raised though it doesn't seem like its high on their priority list. I have already reached back out to them for more clarification and will update you on what they say. Internally, we can use your workaround in the tests and it works just fine. We don't expose anyway for the user to do that though, so if they run into the issue they will have to figure out the same fix we have already done. And since our custom version of MKL uses a different DLL name, they would also have to figure out what that name is in order to manually initialize it the same way we are. Since it doesn't appear to be high on Intel's priority list, we already have a fix that works, and we have a custom dll name I suggested we see if there is a way we can do it automatically for the user. I agree that Intel should fix it, but until they do I think we should do something if we can. |
How many user reports have we had on the issue? If it is a small amount, we can give them the same workaround we are using. And also have them contact Intel. If we put a Band-Aid over it, that makes it even less of a priority for Intel. |
Other than the one internal report we have gotten I haven't heard anything or seen anything in issues. So probably very few, if any, other users have hit it. Good point about the band-aid. I hadn't thought about it like that. Then yeah, lets just leave things as is for now and if we do get any issues about it we can just point them to the fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I thought I already approved this. LGTM.
Update version of Intel MKL.
The issue of thread contention is still not fixed. According to them:
This is the latest version of Intel MKL and includes the windows .PDB files.