Skip to content

Onedal algorithms backed by nuget packages #6521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 66 commits into from
Dec 21, 2022

Conversation

rgesteve
Copy link
Contributor

@rgesteve rgesteve commented Dec 7, 2022

Supercedes #6373 and #6364 allowing for full build of functionality without extra packages (they're downloaded by build.sh). Also includes sample to illustrate use.

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

@ericstj
Copy link
Member

ericstj commented Dec 19, 2022

@rgesteve @michaelgsharp -- I gave a local test of this and was able to reproduce the problem. It looks like the error is due to missing onedal_core.1.dll which is a dependency of OneDalNative.dll. You can see this if you enable loader snaps in the native debugger. The .NET runtime can't distinguish between a failure from the library itself vs some dependency as the failure happens inside the call to loadlibrary and the OS doesn't distinguish.

var currentDir = AppContext.BaseDirectory;
Output.WriteLine($"**** Running from directory {currentDir}.");

var dllDir = AppContext.GetData("NATIVE_DLL_SEARCH_DIRECTORIES").ToString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work on .NETFramework, but I image you've just added this for debugging purposes. Make sure to remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, this whole test is there on a very temporary basis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit #e2a60945 removes the test.

@ericstj
Copy link
Member

ericstj commented Dec 19, 2022

I tried to manually workaround the missing onedal_core.1.dll by finding it in the inteldal.redist.win-x64 nuget package and manually copying it over, then the test hit errors for other missing dlls:

Intel oneDAL FATAL ERROR: Cannot find/load library onedal_thread.1.dll.
Intel oneDAL FATAL ERROR: Cannot find/load library onedal_sequential.1.dll.
Intel oneDAL FATAL ERROR: Cannot load neither onedal_thread.1.dll nor onedal_sequential.1.dll

After this, the test failed fast and terminated the process. I couldn't find those dlls to copy over so it seems blocked.
Update: those dlls were present, I actually copied them over along with onedal_core.1.dll but for some reason the OS loader isn't probing next to onedal_core.1.dll it's only probing next to the application dotnet.exe (in this case) and other machine wide paths.

What's interesting is that this is different than the load for OneDalNative.dll > onedal_core.1.dll. In that case the OS considers the directory for OneDalNative.dll when looking for onedal_core.1.dll. This load is happening as a result of the import table in OneDalNative.dll, so it makes sense that the OS would look next to it. I'm not sure what is causing the load of onedal_thread.1.dll -- I don't see that in the import table of onedal_core.1.dll -- is this being loaded manually through a call in onedal_core.1.dll? If so, it could be that the flags passed are not permitting search next to the executing binary. I guess that would make sense for a normal call to load since it doesn't "know" that the load would be coming from a library that's not located in the application's directory. I think the fix here would be to make those loads in the onedal libraries explicitly consider the directory of the onedal_core.1.dll library when doing further loads.

@rgesteve
Copy link
Contributor Author

Thank you Eric! Loading OneDalNative successfully and failing to load the dependencies was my observation as well in Linux:

 193059:[pid 21092] openat(AT_FDCWD, "/data/projects/machinelearning/.dotnet/shared/Microsoft.NETCore.App/6.0.9/libOneDalNative.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
 193060:[pid 21092] openat(AT_FDCWD, "/data/projects/machinelearning/artifacts/bin/Microsoft.ML.Tests/Debug/net6.0/libOneDalNative.so", O_RDONLY|O_CLOEXEC) = 415
 193069:[pid 21092] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 415
 193073:[pid 21092] openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libonedal_core.so.1.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
 193074:[pid 21092] openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libonedal_core.so.1.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
 193075:[pid 21092] openat(AT_FDCWD, "/lib/libonedal_core.so.1.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Like you said, the loader doesn't even bother trying to find the dependencies in directories other than the system-wide ldconfig standard ones. ldd does show libonedal_{core,thread} as dependencies of OneDalCore, but your suggestion of the library dynamically loading its dependencies makes a lot of sense, especially in light that, pointing LD_LIBRARY_PATH to the exact same directory, with the exact same contents that the one the workload should be search works. The last commit includes the dependencies in the nupkg for distribution. I don't expect that this will change the loader issue much, but at least it'll allow people to try explictitly pointing LD_LIBRARY_PATH/PATH to where the dependencies are in their filesystem.

In the meanwhile will work with the oneDAL team to see if we can figure out what's going on.

@ericstj
Copy link
Member

ericstj commented Dec 20, 2022

It might be worthwhile following up with Interop folks like @AaronRobinsonMSFT and @elinor-fung to understand if it is expected that the runtime doesn’t set this native probing path itself. It could be something odd about how ML.Net runs tests.

@rgesteve
Copy link
Contributor Author

Looks like commit 35a18a1 addresses this for Linux, by specifying in the RPATH that dependencies should be searched for in the same directory OneDalNative is found. Trying to figure out what the equivalent command line options are for the VC compiler

Copy link
Member

@michaelgsharp michaelgsharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@michaelgsharp michaelgsharp merged commit 0880a90 into dotnet:main Dec 21, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jan 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants