BUG: Duplicate packages with same PURL break SBOM import and DejaCode component catalog #295
Comments
Going through them sorted by date in the admin dashboard, I think I managed to delete the affected ones manually.
I also noticed that reimporting an SBOM on a product that already had an SBOM import can increase the dependency count, seemingly because the import does not check for duplicates. Perhaps this issue is related to that?
The behavior is quite strange. For some packages the SBOM import creates new package entries even though one with the same PURL already exists, albeit with a different / non-empty download URL. However, this does not happen for every package, so I'm still missing some factor that plays into this decision.
Running "Improve Package from PurlDB" fails with |
It seems there are two issues here:
I'll investigate the issue further today and see if I can narrow down the conditions under which this happens.
Some clarification about a Package uniqueness from the source code:
Note that the For example:
In DejaCode, those 2 packages may share the same |
I'm able to reproduce the error locally. This is a bug in the importer logic that occurs when multiple packages with the same PURL, but different I'll provide a fix to ensure the proper existing package (from the multiple records) is fetched and assigned.
As explained above, the issue occurs when there's more than 1 package with the same PURL but different
That would be a search bug. Have you tried the advanced syntax such as
That's a separate issue that should also be fixed.
@tdruez Thank you for looking into this! I can confirm that your description is consistent with what I am seeing, having cross-referenced data from the SBOM against the product packages and the packages in the catalog. Besides the case where there are already two or more packages with the same PURL but different download URLs, I have also seen new packages created when there was only one existing package with the same PURL. This seems to happen when the checksums of the packages differ, not in value but in kind: one has SHA1 and the other SHA512, or one has both and the other just one of them. This can still cause trouble down the line: applying data from PurlDB would essentially create the exact same package by filling in the missing data, at which point the uniqueness constraint is triggered. If my assessment is right, this fully describes the behavior I'm seeing.
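For readers following along, the failure mode described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DejaCode's code: the records are plain dicts, `uniqueness_key`, `enrich`, and `would_collide` are hypothetical helpers, and the uniqueness key is assumed to be the PURL plus the download URL.

```python
from __future__ import annotations


def uniqueness_key(package: dict) -> tuple:
    # Assumption for illustration: a record is unique by PURL + download URL.
    return (package["purl"], package.get("download_url", ""))


def enrich(package: dict, reference: dict) -> dict:
    """Return a copy of `package` with empty fields filled from `reference`."""
    enriched = dict(package)
    for field, value in reference.items():
        if not enriched.get(field):  # never overwrite a value that is already set
            enriched[field] = value
    return enriched


def would_collide(package: dict, others: list[dict]) -> bool:
    """True if `package` has the same uniqueness key as any record in `others`."""
    key = uniqueness_key(package)
    return any(uniqueness_key(other) == key for other in others)


# Two records for the same PURL that carry different kinds of checksum.
existing = {
    "purl": "pkg:npm/parse-json@5.2.0",
    "download_url": "https://example.org/parse-json-5.2.0.tgz",
    "sha1": "aaa",
}
imported = {"purl": "pkg:npm/parse-json@5.2.0", "download_url": "", "sha512": "bbb"}

# Hypothetical enrichment data, standing in for a PurlDB entry.
reference = {
    "download_url": "https://example.org/parse-json-5.2.0.tgz",
    "sha1": "aaa",
    "sha512": "bbb",
}

print(would_collide(imported, [existing]))                     # False
print(would_collide(enrich(imported, reference), [existing]))  # True
```

Before enrichment the two records have different keys and can coexist; once the missing download URL is filled in, they become indistinguishable, which is the point at which a uniqueness constraint would reject the update.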
As far as I can tell, the package search does not allow using the advanced syntax to search in the identifier column. This might also be a formatting issue, since the PURL contains colons and slashes, but quoting didn't help either. Just searching for the package name seems to exclude the ones with a duplicate PURL. So for instance, searching the UI for "parse-json" results in both
I'll check on the dependency count issue when importing SBOMs multiple times and, if it is still reproducible, file an additional issue. Edit: The issue has been filed as #297. Thank you very much for your efforts.
Right, that's because the identifier value is not a column in the database but a dynamic property.
This exists for legacy support of packages created without a PURL.
This would be a bug with the admin search then, but I cannot manage to reproduce it locally so far.
@tdruez I can file a separate issue, as the underlying cause is unrelated to this ticket. It's not an important issue though, as it is now clear what the issue is and how to work around it, and once the patch is done it should probably not be all that relevant. The reason why duplicate packages occur a bit more often on my system is that some projects have pulled the package from an internal package registry and some from the official package registry, so both of them coexist. For others stumbling upon the same issue:
Signed-off-by: tdruez <[email protected]>
@ghsa-retrieval The initial import issue should be fixed by #298. We'll handle the PurlDB issue in a separate PR ;)
@tdruez Thank you so much. I'll create a build tomorrow and give it a try.
@tdruez Unfortunately, the patch is not working as I would expect. I'm now seeing duplicate packages being created for all of them, even the ones that could previously be cleanly mapped to exactly one existing package (for which there was no other package with the same PURL in the catalog prior to the SBOM import).
@ghsa-retrieval I see, the new package matching is a bit too restrictive. We probably want to be more flexible when only 1 package exists for a given PURL and use this package instead of creating a new record (even if the |
@tdruez At least if the download_url for the potential new package is empty. If there truly is a different one, I'd say creating a new package would be the correct way to go.
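The matching rule proposed in the last two comments can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic merged into DejaCode: `Package` and `find_existing` are hypothetical names, and the catalog is modeled as a plain list rather than a database query.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Package:
    """Hypothetical stand-in for a catalog package record."""
    purl: str
    download_url: str = ""


def find_existing(
    catalog: Sequence[Package], purl: str, download_url: str = ""
) -> Optional[Package]:
    """Return the catalog package to reuse, or None when there is no clear choice."""
    same_purl = [package for package in catalog if package.purl == purl]

    if download_url:
        # An explicit download URL must match exactly; a truly different URL
        # means the incoming package is treated as a distinct record.
        for package in same_purl:
            if package.download_url == download_url:
                return package
        return None

    # No download URL in the SBOM: reuse only if the PURL alone is unambiguous.
    if len(same_purl) == 1:
        return same_purl[0]
    return None


catalog = [
    Package("pkg:npm/parse-json@5.2.0", "https://example.org/parse-json-5.2.0.tgz"),
    Package("pkg:npm/left-pad@1.3.0", "https://internal.example.org/left-pad-1.3.0.tgz"),
    Package("pkg:npm/left-pad@1.3.0", "https://example.org/left-pad-1.3.0.tgz"),
]

# A single candidate and an empty download URL: the existing record is reused.
print(find_existing(catalog, "pkg:npm/parse-json@5.2.0"))
# Two candidates and an empty download URL: ambiguous, so nothing is reused.
print(find_existing(catalog, "pkg:npm/left-pad@1.3.0"))
```

Returning `None` leaves the caller to decide what to do with an unmatched package, for example creating a new record or reporting the ambiguity.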
Signed-off-by: tdruez <[email protected]>
@ghsa-retrieval I've implemented and merged the refinements for the package matching logic in #300
@tdruez Thank you very much! It seems to be working well: no errors, and everything was imported. All packages that already existed were properly mapped. The only exception was a package that already had two entries in the catalog with the same PURL but different download URLs. I think that is expected behavior, because it would not be clear which one DejaCode is supposed to pick if the imported SBOM does not provide a download URL (empty) that matches either of the existing ones.
Thanks for the confirmation!
Yes, this is the expected behavior. DejaCode tries to re-use existing packages as much as possible, unless there is no clear way to make a choice.
I'll look into this one next before closing this issue.
Entered as #303
Describe the bug
Importing an SBOM results in errors for several packages:
Comparing the content of the SBOM with the inventory and the existing packages revealed that the issue is caused by duplicate packages in the component catalog. Apparently there are packages with the same PURL, hash, type, name, and version.
One is properly populated with scan data, while the other is not. (Edit: It seems that the above error is caused when there is an existing package without a download URL, see #295 (comment) for the two distinct cases.) I suspect the duplicate/broken one was formerly associated with a project that has since been deleted.

The even bigger issue is that while we can see those packages in the regular UI, they are not shown in the admin dashboard when searching by name. Thus, deleting over 400 affected ones is a bit of a challenge, given that the error message does not indicate which specific packages cause the issue.
It seems that some uniqueness constraint is not properly checked when importing SBOMs, as all packages have been imported through SBOMs.
To Reproduce
Unclear
Expected behavior
DejaCode should not allow duplicate packages to be created through imported SBOMs.
Screenshots
Context (OS, Browser, Device, etc.):
n.a.