Search for a package with its verbatim name still does not display the package in first position #8518
Comments
@crcdng Thanks for reporting it! This seems to be a tough case, because for the given search query:
I'm 95% certain that if those static analysis points were fixed, it would be in the first few spots, likely the first. I'm inclined to say that the points in this case are causing a bit more divide in the ranking than warranted. Maybe we should dampen the difference somehow?
It's funny - I did not see the "Matching package names..." until mentioned. I would still argue that a package repository should always display the (one) direct match in first place regardless of metrics.
I think this was once the actual behavior, and it was decided against. There can be abandoned packages that happen to have squatted on some keyword. They should not be shown at the top of the list, IMO. We show direct matches, and we show a ranked list of matches. I believe having the match in the package name ranks the package slightly higher (@isoos please confirm), but in this case it seems not to be enough. I agree with @isoos that if the package score were fixed, this package would most likely rank much higher.
Right now it ranks the same as if the |
@orestesgaolin: I'd like to emphasize again: we actually had it in the first position as part of the regular results, but then we got complaints: the package was no longer relevant for the query, despite the name match. Hence the current compromise of exposing name matches separately. It is not clear which solution is really better.
Maybe introduce some threshold that decides whether the exact match is still relevant? E.g. if it got an update in the last x months, got liked n times in the last n months, or something like this. That way we could skip the name squats but otherwise put the exact match at the top, in the form that we all scan for: a normal row with package info, not a small paragraph with a small link :)
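To make the threshold idea above concrete, here is a minimal sketch. Everything in it is hypothetical: the `PackageInfo` fields, the `exact_match_is_relevant` name, and the default values (365 days, 10 likes) are placeholders for the "x months" and "n times" in the comment, not anything pub.dev implements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PackageInfo:
    # Hypothetical fields, chosen only for this illustration.
    name: str
    last_published: date
    recent_likes: int


def exact_match_is_relevant(
    pkg: PackageInfo,
    query: str,
    today: date,
    max_age_days: int = 365,    # assumed stand-in for "the last x months"
    min_recent_likes: int = 10,  # assumed stand-in for "liked n times"
) -> bool:
    """Return True if pkg is an exact name match that still looks active."""
    if pkg.name != query.strip().lower():
        return False
    recently_updated = (today - pkg.last_published) <= timedelta(days=max_age_days)
    recently_liked = pkg.recent_likes >= min_recent_likes
    # Either signal is enough to treat the exact match as still relevant.
    return recently_updated or recently_liked
```

A package that passes this check could be shown as a full result row at the top; one that fails it would fall back to its normal ranked position.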
@Albert221: We are not removing the package from the result list. If it is relevant, it will be listed in (one of) the top position(s), and at the moment we don't have yet-another-relevancy score to up-or-downrank special cases.
FWIW, this has tripped me up several times as a user. I overlook the "Matching package names" line and then only see the one or two leading packages being different from what I'm searching for. I almost had a heart attack when this happened the last time: I was searching for https://pub.dev/packages?q=xml. If you're not trained to look for the exact match line, you can easily think that the package has been discontinued or removed or something. Only after remembering that this has happened before and scrolling down did I find the actual package.
I think we could rephrase this problem and ask for a more explicit presentation of the directly matched package. Perhaps a card-like style similar to all the remaining entries could help with discoverability. This way we could avoid tweaking the search relevance algorithm and just apply a relatively straightforward UI change. Despite using pub.dev daily, I get confused all the time. It could make an interesting case in a UX study ;)
exactly, the current design creates a perceptual trap similar to this one: |
One obviously needs to take into account packages that don't "deserve" to be listed first for a given search term. For example, if there's a Foo Database and, before it gets its own quality, official API ( So this has no simple solution. But I'd propose tweaking the weight of exact match, or the formula that combines the different signals together. You generally want high quality, popular packages with exact match (like |
@jonasfj you looked at the top queries, and concluded that most search on pub.dev indeed is for a known package. Did I get this right? If that is indeed true, then I agree, we should probably do either or both of
I want to emphasize that no other package repository I have interacted with so far, be that pypi, npm, crates.io, lib.rs or any Linux distro GUI package manager frontend, has had this problem: neither the bad sorting, nor the weird UX where the exact match is hidden in a small text.
It is a different tradeoff, I guess. If I search for 'xml' on crates, I indeed get the 'xml' crate as the first result. But judging from the download counts I would probably rather be using 'fast-xml', and thus I think that package should be listed higher. I still think showing the exact match as a separate thing is useful, but I agree we should style it to be more noticeable. Another example, "yaml":
Totally agree there's a trade off. (See my parable above about But I also think there's something fishy about the current formula. Because for the search term xml, you get this: The exact match If that's so, I suggest decreasing the effect of pub points. Sure, a package with 60/150 pub points should not be #1 in search results as long as there's almost any other package. But when packages reach some reasonable level, the pub points signal is much less important. A single info-level lint is clearly not something that should dethrone a million-downloads-per-week package that's used in 459 other packages. |
We (really @isoos) are trying to make the difference caused by the last few pub points smaller in #8572; we'll see the effect when that is deployed. (I think this can be seen on staging already: https://staging.pub.dev/packages?q=xml)
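For readers wondering what "making the last few pub points matter less" could look like numerically, here is one possible shape. This is purely an illustration and not the change made in #8572: the function name `dampened_points` and the quadratic curve are assumptions chosen because the curve flattens out near the maximum.

```python
def dampened_points(ratio: float) -> float:
    """Map a pub points ratio (granted / max, in 0.0-1.0) to a dampened value.

    Assumed curve: 1 - (1 - ratio)^2. It still maps 0.0 to 0.0 and 1.0 to 1.0,
    but it is nearly flat close to 1.0, so losing the last few points
    changes the result very little.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    return 1.0 - (1.0 - ratio) ** 2


if __name__ == "__main__":
    for ratio in (0.4, 0.9, 1.0):
        print(ratio, round(dampened_points(ratio), 4))
```

With a curve like this, a package at 90% of the points ends up within about one percent of a package with full points, while a package far below the maximum is still clearly separated from both.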
Comment: it's better, especially as the current result has something rather unrelated on top position. I still wonder, how can a package that
In other words, a vastly more significant search result appears in second place merely because it scores 3% less on some rather complex and arbitrary "points" metric?
@crcdng: to answer the specifics:
We then combine the like, the download count (50-50% right now) into a merged score, and then combine it with the pub points (which is 0.0-1.0 depending on the It is important to note that while you may find a few compelling queries to blindly promote the exactly matching package to the top, we have quite a lot of examples where this is not ideal at all. It will be always a balance. |
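The combination described above can be sketched roughly as follows. The 50-50 blend of likes and downloads comes from the comment; how the merged score is then combined with the pub points value is not spelled out in the surviving text, so the multiplication below is an assumption, as are the function names and the premise that all inputs are already normalized to 0.0-1.0.

```python
def merged_popularity(like_score: float, download_score: float) -> float:
    """Blend likes and downloads 50-50, as described in the comment above."""
    return 0.5 * like_score + 0.5 * download_score


def overall_score(like_score: float, download_score: float,
                  pub_points_ratio: float) -> float:
    """Combine the merged score with the pub points value (0.0-1.0).

    ASSUMPTION: the combination is a plain multiplication. The thread does
    not state the actual operation, so treat this as illustrative only.
    """
    return merged_popularity(like_score, download_score) * pub_points_ratio


if __name__ == "__main__":
    # Hypothetical inputs: a very popular package missing a few points
    # versus a less popular package with full points.
    print(overall_score(0.9, 0.9, 0.9))
    print(overall_score(0.8, 0.9, 1.0))
```

Under this assumed formula, the pub points value scales the whole score, which is why a small points deficit can outweigh a noticeable lead in likes and downloads.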
Thanks for the explanation. I understand this is a complex task. I am mostly judging from the current results (I encountered really many examples before filing the bug report) and I think there will be improvement. I'm still not convinced the "weights" will be calibrated correctly. To reiterate, when I put "xml" in the pub.dev search field, my intention is to find out if / what kind of xml library or libraries are there for Flutter / Dart. Or I might have read about or heard of a Flutter "xml" package. Now I want to check it out. In particular, I am NOT searching for a xml to json converter package (I would have entered something like "xml json") and I am clearly not searching for a package to "load, manipulate, and save GPS data in GPX format" a package that happens to be based on XML. Here my search would have mentioned something GPS or geodata related. And the fact that on some metric the package I am actually looking for ranks lower than the two that I am not looking for, but are returned above it doesn't matter, because I am not looking for these other two packages, as explained above. Therefore the current metric comparison between these packages really makes no sense. Taken to the extreme, your strategy could be to always return the highest metric package regardless the search term in order to ensure good quality results. I think putting such high weights on the "points" metric makes sense when we don't have an exact match. Now your point is, if I'm correct, that someone would publish a package called "a" that does "b" and that therefore should not be the first result looking for "a". It makes sense when the ranking punishes that behaviour. But if that treacherous package would have proper points, it would still be top-ranked, so clearly this is not a solution to the problem. In the current example "xml" is missing 10 points on static analysis, which is not an indicator of a match between package name and package purpose at all. |
I wonder: is the idea of tf-idf (Term Frequency-Inverse Document Frequency, or "term specificity") applied to the pubdev ranking algorithm? If not, that might be the actual problem. I'm no expert but from what I understand, tf-idf exists to de-emphasize documents such as "xml 2 json" or "GPS XML" when what you're looking for is "xml". I think we all agree that just blindly showing exact matches for every query would just lead to more name-squatting and worse results overall. But yeah, |
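For reference, here is a minimal tf-idf sketch over a toy index. The token lists are invented for illustration and this is not pub.dev's ranking code. Note that for a single-term query the idf factor is the same for every document, so the ordering comes from the length-normalized term frequency: a package that is only about "xml" scores higher than one where "xml" is one token among several.

```python
import math
from collections import Counter


def tf_idf_scores(term: str, docs: dict[str, list[str]]) -> dict[str, float]:
    """Score each document for a single query term with plain tf-idf."""
    containing = sum(1 for tokens in docs.values() if term in tokens)
    if containing == 0:
        return {name: 0.0 for name in docs}
    # Smoothed idf: rarer terms get a larger weight.
    idf = math.log(len(docs) / containing) + 1.0
    scores = {}
    for name, tokens in docs.items():
        # Term frequency normalized by document length.
        tf = Counter(tokens)[term] / len(tokens) if tokens else 0.0
        scores[name] = tf * idf
    return scores


if __name__ == "__main__":
    # Hypothetical token lists standing in for package names/descriptions.
    toy_index = {
        "xml": ["xml"],
        "xml2json": ["xml", "json", "converter"],
        "gpx": ["gps", "gpx", "xml"],
        "http": ["http", "client"],
    }
    for name, score in sorted(tf_idf_scores("xml", toy_index).items(),
                              key=lambda item: -item[1]):
        print(name, round(score, 3))
```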
Such a nefarious package wouldn't be top-ranked without other indicators, e.g. likes or downloads. Of course one could game that with enough dedication, but what's the point if the package is otherwise crap? It is not as if one can get massive benefits by gaming search like that.
To be fair, there is always some ranking that will result in similar differences: e.g. a new fork of an abandoned package may be fixing all the obsolete stuff and errors, and if you want to search for its name or features, you may want to get the new one with more points but less downloads to be ranked higher. Unfortunately there is no single metric that gets every ranking just right. Aside: the |
Yes and no: we had something like it, but over the years it got diluted/compressed/changed, and I wouldn't call it tf-idf anymore. The reasons are the usual: over the years we had discussions similar to this thread, prompting us to change/tune the ranking algorithm in use, and it drifted to its current state. The reluctance to swiftly change the ranking algorithm comes from this experience: we have seen very different ranking preferences and requests already, and did our best to fix the algorithm to accommodate the goals of the users. We intend to do the same here, but the solution may be different from what comes to mind in the first place.
@isoos @sigurdm I looked at #8573 and how it appears on the staging website. I think the problem people are having in this thread is that the exact match result looks a lot more like a "Did you mean ___?" prompt than a package result at first glance. It's just a linked word and doesn't have any of the package details like the other results do. I strongly think following npm is the right move here, with how they still include the entire package information, but explicitly note that it's at the top because it's an exact match (copying the screenshot from earlier in the thread).
This has been a UX issue for me as well. I think improving the UI for the exact matches is all that's needed for a fix.
@sigurdm It doesn't always seem to work for exact matches: I searched for the drops package and couldn't find it in the search results or in the exact match.
@hamza-imran75: Thanks for reporting this, as this surfaces a bug where we don't lowercase the search expression for exact package name matching. However, we still won't move the exact name matches to the first hit spot, e.g. in this case |
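The lowercasing bug mentioned above is the kind of thing a small normalization step addresses. The sketch below is hypothetical (the `exact_name_match` helper and the in-memory set of names are made up for illustration, and the actual pub.dev fix may look different); it relies only on the fact that pub package names are lowercase.

```python
from typing import Optional


def exact_name_match(query: str, package_names: set[str]) -> Optional[str]:
    """Return the package whose name equals the query, ignoring case.

    Package names are lowercase, so the query is trimmed and lowercased
    before the lookup. Without that step, a query typed with any capital
    letter would never match.
    """
    normalized = query.strip().lower()
    return normalized if normalized in package_names else None


if __name__ == "__main__":
    names = {"drops", "xml", "flutter_animate"}
    print(exact_name_match("Drops", names))
    print(exact_name_match("missing_pkg", names))
```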
While I agree that a search engine should surface good packages that are worth using, I'd still argue that a very basic requirement for a search engine is that it finds what you're looking for. Obviously, searching "database" and expecting to find In the much simpler case where a user already knows exactly what package they're looking for, it doesn't make much sense (at least, to me) for the engine to basically ignore that result completely. Packages don't start at high popularity, so this can negatively affect packages that are trying to gain users -- it must be hard to advertise if Pub barely even shows your package. In any case, I find it strange that using the search bar to find a package you know the exact name to can take more time than just hand-typing in a URL like |
@Levi-Lesches: I'm curious what you think about the following example: There is a package without any content called |
Any empty package is against the pub.dev policy rules on namesquatting.
You are right for this specific case, but one can easily imagine a variant of this example where the package has some code, technically above the threshold of namesquatting, but below any usable client.
I would argue that since it's just one spot in a list of potentially useful results, it's okay to put the exact match in the first spot with no extra heuristics. I would think that users are not typically searching for completely unusable packages (and if they did, there are runner-up results to help), and that we should not degrade the common situation for edge cases.
I don't think it has to be the first, but it should definitely be visible without scrolling, even on a phone; so top 5.
I appreciate the explanation, but I respectfully disagree with the current approach. The feedback on this issue has been consistently negative from the community, which I hope will be taken into consideration. There seem to be two competing concerns here: preventing package name squatting versus helping users efficiently find what they're looking for. I believe the first concern may be somewhat overstated. While low-quality packages are certainly an issue, this represents just one entry in search results. In practice, developers rarely select packages based solely on exact name matches - they evaluate multiple factors like stars, tests, likes, issue history, and overall quality scores before making decisions. The Flutter/Dart community is sophisticated enough to make these informed choices. When comparing pub.dev to other package repositories like npm, cargo, or PyPI, I recognize that pub.dev offers significantly better quality indicators. However, the current search algorithm may actually hinder rather than help users find what they're explicitly searching for. Developers don't blindly add the first package that appears in results - they conduct due diligence. I'm concerned that the current approach might be overprotecting users at the expense of usability. From a UX perspective, requiring users to scroll past multiple unrelated packages to find an exact name match creates unnecessary friction. I would suggest either:
This small change could substantially improve the developer experience while still maintaining the valuable quality indicators that set pub.dev apart. I hope this perspective can contribute to reconsidering the current approach. Thank you for your time and consideration.
I think the following design can please everyone. This design accomplishes 3 things:
@isoos I would love to get your thoughts on this design.
When I search for flutter_animate, a well-maintained package with high stats, the default ("Sort by search relevance") search result does not display the package in first position. Instead, it is shown somewhere down below, currently at 8th position, below a totally unrelated package named map_picker.
The expected result is: flutter_animate in first position.
This is a long-standing issue with pub.dev that affects numerous packages.