
Search for a package with its verbatim name still does not display the package in first position #8518

Open
crcdng opened this issue Feb 2, 2025 · 36 comments

Comments

@crcdng

crcdng commented Feb 2, 2025

When I search for flutter_animate, a well-maintained package with high stats, the default ("Sort by search relevance") search result does not display the package in first position.

Instead, it is shown somewhere down below, currently at 8th position, below a totally unrelated package named map_picker.

The expected result is:

  • flutter_animate in first position
  • followed by other packages that are related, e.g. with similar names.

This is a long-standing issue with pub.dev that affects numerous packages.

@isoos
Collaborator

isoos commented Feb 3, 2025

@crcdng Thanks for reporting it!

@jonasfj @sigurdm:

This seems to be a tough case, because for the given search query:

I'm 95% certain that if those static analysis points were fixed, it would be on the first few spots, likely on the first.

I'm inclined to say that the points in this case are causing a bit more divide in the ranking than warranted. Maybe we should dampen the difference somehow?

@sigurdm
Contributor

sigurdm commented Feb 3, 2025

To me it seems fair that the package is not at the top, given that it is not the most popular by downloads nor the most liked nor has the most analysis points. This looks like it is working as intended.

We do show the exact package name match at the top:

Image

@crcdng
Author

crcdng commented Feb 3, 2025

> To me it seems fair that the package is not at the top, given that it is not the most popular by downloads nor the most liked nor has the most analysis points. This looks like it is working as intended.
>
> We do show the exact package name match at the top:
>
> Image

It's funny - I did not see the "Matching package names..." until mentioned.

I would still argue that a package repository should always display the (one) direct match in the first place regardless of metrics.

@sigurdm
Contributor

sigurdm commented Feb 3, 2025

> I would still argue that a package repository should always display the (one) direct match in the first place regardless of metrics.

I think this was once the actual behavior, and was decided against. There can be abandoned packages that happen to have squatted some keyword. They should not be shown at the top of the list IMO.

We show direct matches, and we show a ranked list of matches.

I believe having the match in the package name ranks the package slightly higher (@isoos please confirm) but in this case it seems to be not enough.

I agree with @isoos that if the package score was fixed this package would most likely rank much higher.

@isoos
Collaborator

isoos commented Feb 3, 2025

> I believe having the match in the package name ranks the package slightly higher (@isoos please confirm) but in this case it seems to be not enough.

Right now it ranks the same as if the description or the topics had the same keywords, so not entirely higher. (Similar to the exact matching, this was higher before, but other search examples prompted us to lower the match significance).

@orestesgaolin

Imho it's worth comparing to npm, which puts the exactly matched library at the very top with an "exact match" tag, even when it has a lower number of downloads

Image

@isoos
Collaborator

isoos commented Feb 17, 2025

@orestesgaolin: I'd like to emphasize again: we actually had it in the first position as part of the regular results, but then we got complaints: the package was no longer relevant for the query, despite the name match. Hence the current compromise with exposing name matches separately. It is not clear which solution is really better.

@Albert221

Albert221 commented Feb 17, 2025

Maybe introduce some threshold that decides whether the exact match is still relevant? E.g. if it got an update in the last x months, got liked n times in the last n months or something like this. So we could skip the name squats but otherwise put the exact match at the top, in a manner that we all scan for, so a normal row with package info, not a small paragraph with a small link :)
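
As a rough illustration of such a threshold (not something pub.dev implements), the check could look like the Python sketch below. The `PackageInfo` fields, the 365-day window, and the minimum like count are all hypothetical stand-ins for the "x months" and "n likes" mentioned above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PackageInfo:
    """Hypothetical subset of package metadata used for this sketch."""
    name: str
    last_published: date
    recent_likes: int  # likes received within the recent window


def exact_match_is_relevant(
    pkg: PackageInfo,
    today: date,
    max_age_days: int = 365,    # assumed value for "x months"
    min_recent_likes: int = 5,  # assumed value for "n likes"
) -> bool:
    """Return True if an exact-name match looks maintained or recently liked."""
    recently_updated = (today - pkg.last_published) <= timedelta(days=max_age_days)
    recently_liked = pkg.recent_likes >= min_recent_likes
    return recently_updated or recently_liked
```

A package failing this check would simply stay in the regular ranked list instead of being promoted.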

@isoos
Collaborator

isoos commented Feb 17, 2025

@Albert221: We are not removing the package from the result list. If it is relevant, it will be listed in (one of) the top position(s), and at the moment we don't have yet-another-relevancy score to up-or-downrank special cases.

@filiph

filiph commented Feb 17, 2025

FWIW, this has tripped me up several times as a user. I overlook the "Matching package names" line and then only see the one or two leading packages being different from what I'm searching for. I almost had a heart attack when this happened the last time — I was searching for package:xml and got this:

Image

https://pub.dev/packages?q=xml

If you're not trained to look for the exact match line, you can easily think that the package has been discontinued or removed or something.

Only after remembering that this has happened before and scrolling down did I find the actual package.

@orestesgaolin

I think we could rephrase this problem and ask for a more explicit presentation of the directly matched package. Perhaps a card-like style similar to all the remaining entries could help with discoverability. This way we could avoid tweaking the search relevance algorithm and just apply a relatively straightforward UI change.

Despite using pub.dev daily, I get confused all the time. It could make an interesting case in a UX study ;)

@crcdng
Author

crcdng commented Feb 17, 2025

FWIW, this has tripped me up several times as a user. I overlook the "Matching package names" line and then only see the one or two leading packages being different from what I'm searching for. I almost had a heart attack when this happened the last time — I was searching for package:xml and got this:

exactly, the current design creates a perceptual trap similar to this one:

Image

@filiph

filiph commented Feb 18, 2025

One obviously needs to take into account packages that don't "deserve" to be listed first for a given search term. For example, if there's a Foo Database and, before it gets its own quality, official API (pkg:foo_database), someone else releases an inferior pkg:foo, you don't want the exact match to be forever number one in the search results.

So this has no simple solution. But I'd propose tweaking the weight of exact match, or the formula that combines the different signals together. You generally want high quality, popular packages with an exact match (like pkg:xml) to appear first even when there are packages with higher scores that also match the query.

@sigurdm
Contributor

sigurdm commented Feb 18, 2025

@jonasfj you looked at the top queries, and concluded that most searches on pub.dev are indeed for a known package. Did I get this right?

If that is indeed true, then I agree, we should probably do either or both of

  • styling the "exact match" better to make it more obvious
  • giving higher ranking for a close-name-match

@benthillerkus

I want to emphasize that no other package repository I have interacted with so far, be that pypi, npm, crates.io, lib.rs or any Linux Distro GUI package manager frontend has had this problem.

Neither the bad sorting, nor the weird ux where the exact match is hidden in a small text.

Here's an example for weird ordering
Image

And here's an example for way too aggressive fuzzy search
Image

@sigurdm
Contributor

sigurdm commented Feb 24, 2025

It is a different tradeoff I guess. If I search for 'xml' on crates, I indeed get the 'xml' crate as the first result. But judging from the download counts I would probably rather be using 'fast-xml', and thus I think that package should be listed higher.

Image

I still think showing the exact match as a separate thing is useful - but agree we should style it to be more noticeable.
#8573 is a start here, but I think we should do even more.

Another example "yaml":

Image

@filiph

filiph commented Feb 24, 2025

Totally agree there's a trade off. (See my parable above about pkg:foo_database versus pkg:foo.) I don't think it's a good idea to always show the exact match at first place.

But I also think there's something fishy about the current formula. Because for the search term xml, you get this:

Image

The exact match pkg:xml has 411 likes and 4 million downloads. It's number 3 in the search results. The two packages that are above it have a fraction of the likes and downloads, and they're not as relevant (gpx is for GPS data in XML form, xml2json does what it says on the tin; xml is for parsing and building XML). It almost looks like the few additional pub points have an outsized influence on the ranking?

If that's so, I suggest decreasing the effect of pub points. Sure, a package with 60/150 pub points should not be #1 in search results as long as there's almost any other package. But when packages reach some reasonable level, the pub points signal is much less important. A single info-level lint is clearly not something that should dethrone a million-downloads-per-week package that's used in 459 other packages.
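
One way to picture "decreasing the effect of pub points" once a package reaches a reasonable level is a concave transform of the points ratio. The Python sketch below is only an illustration: the function name and the exponent are assumptions, and it is not the formula pub.dev uses.

```python
def dampened_points_score(points: int, max_points: int, exponent: float = 2.0) -> float:
    """Map pub points to 0.0-1.0 so that gaps near the maximum shrink.

    ASSUMPTION: the curve 1 - (1 - r)^exponent is chosen for illustration only.
    With exponent > 1 it is steep for low ratios and flat close to 1.0.
    """
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    ratio = min(max(points / max_points, 0.0), 1.0)
    return 1.0 - (1.0 - ratio) ** exponent
```

With the default exponent, the gap between 140/150 and 150/150 points becomes much smaller than the linear gap of 10/150, while a package at 60/150 still scores clearly lower than both.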

@sigurdm
Contributor

sigurdm commented Feb 24, 2025

We (really @isoos) are trying to reduce the difference caused by the last few pub points in #8572; we'll see the effect when that is deployed. (I think this can be seen on staging already: https://staging.pub.dev/packages?q=xml)

@crcdng
Author

crcdng commented Feb 24, 2025

> We (really @isoos) are trying to make the difference caused by the last few pub points less in #8572 we'll see the effect when that is deployed. (I think this can be seen on staging already https://staging.pub.dev/packages?q=xml)

Comment: it's better, especially as the current result has something rather unrelated on top position.

I still wonder, how can a package that

  • matches the search intention / search term exactly
  • has more than 3x the likes
  • approx. 20x the downloads

In other words a vastly more significant search result appear on place 2 merely because it scores 3% less on some rather complex and arbitrary "points" metric?

@isoos
Collaborator

isoos commented Feb 24, 2025

> I still wonder, how can a package that
>
>   • matches the name exactly
>   • has more than 3x the likes
>   • approx. 20x the downloads
>
> In other words a vastly more significant search result appear on place 2 merely because it scores 3% less on some rather complex and arbitrary "points" metric?

@crcdng: to answer the specifics:

  • Exact name match has the same score as if the string was found in the package description or its topics (this is to de-emphasize the importance of the names so people won't fight over the "best" names).
  • The likes and download counts are not scored linearly: 100M downloads should not be worth 100x as much as 1M downloads, or 1000x as much as 100k downloads. Instead, we sort the packages in increasing order and score them linearly by position on a 0.0-1.0 scale: the package with the fewest downloads gets 0.0, the median gets 0.5, and the top one gets 1.0. Same with likes.

We then combine the like and download scores (50-50% right now) into a merged score, and then combine that with the pub points (which is 0.0-1.0 depending on the given points / max points). I've started tuning the latter, effectively compressing the high end of that range.
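
The rank-based scoring described above can be sketched roughly as follows. This is an illustration in Python, not pub.dev's code: the function names are made up, ties are broken arbitrarily, and the thread does not say how the merged popularity score is combined with the pub points, so the multiplication in `overall_score` is purely an assumption.

```python
def rank_scores(values: dict[str, int]) -> dict[str, float]:
    """Score packages 0.0-1.0 by their position in ascending order of `values`."""
    ordered = sorted(values, key=lambda name: values[name])
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    # Lowest value gets 0.0, the median 0.5, the highest 1.0.
    return {name: i / (len(ordered) - 1) for i, name in enumerate(ordered)}


def overall_score(like_rank: float, download_rank: float,
                  points: int, max_points: int) -> float:
    """Merge likes and downloads 50-50, then combine with the pub points ratio."""
    popularity = 0.5 * like_rank + 0.5 * download_rank
    points_ratio = points / max_points
    # ASSUMPTION: the combination operator is not stated in the thread;
    # a plain product is used here only to make the sketch concrete.
    return popularity * points_ratio


# Hypothetical download counts: 100M is not scored 100x higher than 1M.
downloads = {"a": 100_000, "b": 1_000_000, "c": 100_000_000}
download_ranks = rank_scores(downloads)  # {"a": 0.0, "b": 0.5, "c": 1.0}
```

Because only the ordering matters, a package cannot pull far ahead on raw download volume alone, which is why a small pub points difference can still decide between two popular packages.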

It is important to note that while you may find a few compelling queries to blindly promote the exactly matching package to the top, we have quite a lot of examples where this is not ideal at all. It will be always a balance.

@crcdng
Author

crcdng commented Feb 24, 2025

> > I still wonder, how can a package that
> >
> >   • matches the name exactly
> >   • has more than 3x the likes
> >   • approx. 20x the downloads
> >
> > In other words a vastly more significant search result appear on place 2 merely because it scores 3% less on some rather complex and arbitrary "points" metric?
>
> @crcdng: to answer the specifics:

Thanks for the explanation. I understand this is a complex task. I am mostly judging from the current results (I encountered really many examples before filing the bug report) and I think there will be improvement. I'm still not convinced the "weights" will be calibrated correctly.

To reiterate, when I put "xml" in the pub.dev search field, my intention is to find out if / what kind of xml library or libraries are there for Flutter / Dart. Or I might have read about or heard of a Flutter "xml" package. Now I want to check it out.

In particular, I am NOT searching for an XML-to-JSON converter package (I would have entered something like "xml json"), and I am clearly not searching for a package to "load, manipulate, and save GPS data in GPX format", a package that happens to be based on XML. Here my search would have mentioned something GPS or geodata related.

And the fact that on some metric the package I am actually looking for ranks lower than the two that I am not looking for, but that are returned above it, doesn't matter, because I am not looking for these other two packages, as explained above. Therefore the current metric comparison between these packages really makes no sense. Taken to the extreme, your strategy could be to always return the highest-metric package regardless of the search term in order to ensure good quality results.

I think putting such high weights on the "points" metric makes sense when we don't have an exact match.

Now your point is, if I'm correct, that someone would publish a package called "a" that does "b" and that therefore should not be the first result looking for "a". It makes sense when the ranking punishes that behaviour. But if that treacherous package would have proper points, it would still be top-ranked, so clearly this is not a solution to the problem. In the current example "xml" is missing 10 points on static analysis, which is not an indicator of a match between package name and package purpose at all.

@filiph

filiph commented Feb 24, 2025

> Exact name match has the same score as if the string was found in the package description or its topics (this is to de-emphasize the importance of the names so people won't fight over the "best" names).

I wonder: is the idea of tf-idf (Term Frequency-Inverse Document Frequency, or "term specificity") applied to the pubdev ranking algorithm? If not, that might be the actual problem. I'm no expert but from what I understand, tf-idf exists to de-emphasize documents such as "xml 2 json" or "GPS XML" when what you're looking for is "xml".
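
For readers unfamiliar with tf-idf, here is a tiny self-contained Python sketch of the idea. It is not how pub.dev ranks results; the package descriptions in `DOCS` are invented for illustration, and the smoothed idf variant used here is just one common choice.

```python
import math

# Hypothetical package descriptions, made up for this example.
DOCS = {
    "xml": "xml parsing and building xml documents",
    "xml2json": "convert xml to json",
    "gpx": "load manipulate and save gps data in gpx format based on xml",
    "http": "http client",
}


def tf_idf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Sum tf * idf over the query terms for every document."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {name: 0.0 for name in tokenized}
    for term in query.lower().split():
        containing = sum(1 for tokens in tokenized.values() if term in tokens)
        # Smoothed inverse document frequency: rarer terms weigh more.
        idf = math.log((1 + n_docs) / (1 + containing)) + 1.0
        for name, tokens in tokenized.items():
            if tokens:
                # Term frequency is normalized by document length, so a text
                # that is mostly about the term beats one that mentions it once.
                scores[name] += (tokens.count(term) / len(tokens)) * idf
    return scores


ranking = sorted(DOCS, key=tf_idf_scores("xml", DOCS).get, reverse=True)
```

For a single-term query the idf factor is the same for every document, so the ordering comes from the length-normalized term frequency; idf starts to matter once the query has several terms of different rarity.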

I think we all agree that just blindly showing exact matches for every query would just lead to more name-squatting and worse results overall. But yeah, flutter_animate and xml probably deserve to be number ones for their exact-match queries.

@isoos
Collaborator

isoos commented Feb 24, 2025

> Now your point is, if I'm correct, that someone would publish a package called "a" that does "b" and that therefore should not be the first result looking for "a". It makes sense when the ranking punishes that behaviour. But if that treacherous package would have proper points, it would still be top-ranked, so clearly this is not a solution to the problem.

Such a nefarious package wouldn't be top ranked without other indicators, e.g. likes or downloads. Of course one could game that with enough dedication, but what's the point if the package otherwise is crap? It is not that one can get massive benefits by gaming search like that.

> In the current example "xml" is missing 10 points on static analysis, which is not an indicator of a match between package name and package purpose at all.

To be fair, there is always some ranking that will result in similar differences: e.g. a new fork of an abandoned package may be fixing all the obsolete stuff and errors, and if you search for its name or features, you may want the new one, with more points but fewer downloads, to be ranked higher. Unfortunately there is no single metric that gets every ranking just right.

Aside: the package:xml case is also interesting, because the author already fixed the linter issue 2 months ago, they just haven't released it as part of a new version (or made any release in 15 months):
renggli/dart-xml@ac5cd8c

@isoos
Collaborator

isoos commented Feb 24, 2025

> is the idea of tf-idf (Term Frequency-Inverse Document Frequency, or "term specificity") applied to the pubdev ranking algorithm?

Yes and no: we had something like it, but over the years it got diluted/compressed/changed, and I wouldn't call it tf-idf anymore.

The reasons are the usual ones: over the years we had discussions similar to this thread, prompting us to change/tune the ranking algorithm in use, and it drifted to its current state. The reluctance to swiftly change the ranking algorithm comes from this experience: we have seen very different ranking preferences and requests already, and did our best to fix the algorithm to accommodate the goals of the users. We intend to do the same here, but the solution may be different from what comes to mind in the first place.

@Levi-Lesches

@isoos @sigurdm I looked at #8573 and how it appears in the staging website

Image

I think the problem people are having in this thread is that the exact match result looks a lot more like a "Did you mean ___?" prompt than a package result at first glance. It's just a linked word and doesn't have any of the package details like the other results do. I strongly think following npm is the right move here, with how they still include the entire package information, but explicitly note that it's at the top because it's an exact match (copying the screenshot from earlier in the thread)

Image

@ccadieux

ccadieux commented Mar 3, 2025

This has been a UX issue for me as well.
I never noticed the exact matches so I'd resort to google to find the package.

I think improving the UI for the exact matches is all that's needed for a fix.
If that's more clearly called out in the UI then I don't think this is really an Issue.

@hamza-imran75

hamza-imran75 commented Apr 1, 2025

@sigurdm This doesn't always seem to work for exact matches: I searched for the drops package and couldn't find it in the search results or in the exact matches.
Drops:
https://pub.dev/packages/drops

Screenshot_20250401-102216.jpg

@isoos
Collaborator

isoos commented Apr 1, 2025

@hamza-imran75: Thanks for reporting this, as this surfaces a bug where we don't lowercase the search expression for exact package name matching.
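
A minimal sketch of the kind of fix implied here, assuming stored package names are already lowercase. `find_exact_name_match` is a hypothetical helper written in Python for illustration, not a function from the pub.dev code base.

```python
from typing import Optional


def find_exact_name_match(query: str, package_names: set[str]) -> Optional[str]:
    """Return the package whose name equals the query, ignoring case and padding."""
    # Lowercasing the query is the step the reported bug was missing.
    normalized = query.strip().lower()
    return normalized if normalized in package_names else None
```

With this normalization a query typed as `Drops` (for example by a phone keyboard that capitalizes the first letter) would still resolve to `drops`.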

However, we still won't move the exact name matches to the first hit spot. In this case, package:drops is a relatively young package (less than a month old), with not too many likes and not too many downloads. Once those catch up, it will move up in ranking position. (It also needs to fix its scores.) (Aside: it starts with a typo in both the description and the readme: pacakge, but this shouldn't influence its general ranking.)

@Levi-Lesches

Levi-Lesches commented Apr 2, 2025

> However, we still won't move the exact name matches to the first hit spot, e.g. in this case package:drops is a relatively young package (less than a month old), it has not too many likes, not too many downloads. Once those catch up, it will move up in ranking position.

While I agree that a search engine should surface good packages that are worth using, I'd still argue that a very basic requirement for a search engine is that it finds what you're looking for. Obviously, searching "database" and expecting to find cloud_firestore can be considered a goal, and searching through the description and surfacing high-ranking packages can be a means to that end, but that's a case where the user is relying on the search engine to do the hard part of choosing a good database for them.

In the much simpler case where a user already knows exactly what package they're looking for, it doesn't make much sense (at least, to me) for the engine to basically ignore that result completely. Packages don't start at high popularity, so this can negatively affect packages that are trying to gain users -- it must be hard to advertise if Pub barely even shows your package.

In any case, I find it strange that using the search bar to find a package you know the exact name to can take more time than just hand-typing in a URL like https://pub.dev/packages/package_name yourself. When I'm looking for changelogs or API docs, I always use URLs as I know it'll get me where I want, whereas I can't say the same for the search bar

@isoos
Collaborator

isoos commented Apr 2, 2025

@Levi-Lesches: I'm curious what you think about the following example:

There is a package without any content called mysql, and searching for mysql won't promote it to the first package spot, only to the exact package name matches. If we were to always promote exact name matches, it would certainly downgrade the user experience for this query. If we were to come up with arbitrary rules about when not to promote to the first spot, it would cause inconsistency and possible frustration in certain cases.

@benthillerkus

any empty package is against the pub.dev policy rules for namesquatting

@isoos
Collaborator

isoos commented Apr 2, 2025

> any empty package is against the pub.dev policy rules for namesquatting

You are right for this specific case, but one can easily imagine a variant of this example where the package has some code, technically above the threshold for namesquatting, but not enough to be a usable client.

@Levi-Lesches

I would argue that since it's just one spot in a list of potentially useful results, it's okay to put the exact match in the first spot with no extra heuristics.

I would think that users are not typically searching for completely unusable packages (and if they did, there are runner-up results to help), and that we should not degrade the common situation for edge cases.

@benthillerkus

I don't think it has to be the first, but it should definitely be visible without scrolling, even on a phone; so top 5.
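
A rough Python sketch of that "top 5" rule, under the assumption that the ranked result list is available as a list of package names. The helper name and the `max_position` default are hypothetical, and nothing like this is claimed to exist in pub.dev.

```python
def ensure_exact_match_visible(ranked: list[str], query: str,
                               max_position: int = 5) -> list[str]:
    """Move an exact name match up so it sits within the first `max_position` results.

    The relative order of all other results is left unchanged.
    """
    name = query.strip().lower()
    if name not in ranked or ranked.index(name) < max_position:
        return list(ranked)  # nothing to do: no match, or already visible
    reordered = [n for n in ranked if n != name]
    reordered.insert(max_position - 1, name)
    return reordered
```

Applied to the original report, a package sitting in 8th position would move to 5th, while the four best-ranked results would keep their places.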

@dickermoshe

I appreciate the explanation, but I respectfully disagree with the current approach. The feedback on this issue has been consistently negative from the community, which I hope will be taken into consideration.

There seem to be two competing concerns here: preventing package name squatting versus helping users efficiently find what they're looking for. I believe the first concern may be somewhat overstated.

While low-quality packages are certainly an issue, this represents just one entry in search results. In practice, developers rarely select packages based solely on exact name matches - they evaluate multiple factors like stars, tests, likes, issue history, and overall quality scores before making decisions. The Flutter/Dart community is sophisticated enough to make these informed choices.

When comparing pub.dev to other package repositories like npm, cargo, or PyPI, I recognize that pub.dev offers significantly better quality indicators. However, the current search algorithm may actually hinder rather than help users find what they're explicitly searching for.

Developers don't blindly add the first package that appears in results - they conduct due diligence. I'm concerned that the current approach might be overprotecting users at the expense of usability. From a UX perspective, requiring users to scroll past multiple unrelated packages to find an exact name match creates unnecessary friction.

I would suggest either:

  • Reverting this search behavior to prioritize exact name matches in the main results, or
  • Making the "Matching package names" section significantly more prominent and visually distinct

This small change could substantially improve the developer experience while still maintaining the valuable quality indicators that set pub.dev apart.

I hope this perspective can contribute to reconsidering the current approach. Thank you for your time and consideration.

@dickermoshe

dickermoshe commented Apr 27, 2025

I think the following design can please everyone.

Image

This design accomplishes 3 things:

  1. A user who searches for a package verbatim is immediately drawn to the top result, clearly labeled as an "Exact Match"
  2. Users who are looking for a specific package using descriptions or a partial name will still see relevant results below, with the exact match not dominating the interface. This top "Exact Match" result will never be taller than shown above. Longer descriptions will be ellipsized...
  3. Users who are casually browsing pub.dev looking for a package to accomplish a specific task can see clearly that this package is only shown first because it is an exact match. They are also shown less information about the package, which will make this result less enticing to select.

@isoos I would love to get your thoughts on this design.
I definitely think we can find a happy medium which will address both of these concerns

12 participants