-
Notifications
You must be signed in to change notification settings - Fork 654
feat: add std/text
with word-similarity helpers
#3488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm in favor of adding this, but should we made split this out into a different file? Also, I feel like there is value is exporting |
Putting it in a different file and renaming the function sounds good to me. I also found a probably-more-efficient implementation of levenshteinDistance on deno.land. I could just copy it, but I wasn't sure about the license so I left my function in there. |
flags/mod.ts
Outdated
* @param {number} [options.suggestionLimit=Infinity] - Number of results to return | ||
* @returns {string[]} An array of possible correct spellings for the given word. | ||
*/ | ||
export function didYouMean( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name should be something like guessWord
, guessCandidate
, etc (No prior example of this kind of naming in std)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a fan of guessWord
.
flags/mod.ts
Outdated
givenWord, | ||
possibleWords, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually make mandatory arguments positional args, instead of putting them in option bag. ref. https://deno.land/[email protected]/references/contributing/style_guide#exported-functions-max-2-args-put-the-rest-into-an-options-object
I agree with @lino-levan . I think this API should be in somewhere else. I also agree that @lino-levan Do you have any suggestion about the place for these APIs? What if we create |
flags/mod.ts
Outdated
/** | ||
* Calculates the Levenshtein distance between two strings. | ||
* | ||
* @param {string} str1 - The first string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Types in @param
directive seem redundant as they are given in TypeScript. We usually omit them in our docs (For example, see https://github.com/denoland/deno_std/blob/6b1da3240ca5f9460df2571096d64566a7cb3c38/async/pool.ts#L31
I'm of the opinion that this and #3440 ought to be in std/text. I think std/string would limit our future options. |
I'm actually glad you guys bring up And just as a catch-all I don't really care about any small renaming, argument-change, file location kind of stuff. I'll probably implement all the suggestions tomorrow (including having it in a different file) |
alright I believe I implemented all the feedback |
flags/mod.ts
Outdated
@@ -784,4 +784,4 @@ export function parse< | |||
} | |||
|
|||
return argv as Args<TArgs, TDoubleDash>; | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like unrelated change, could you revert?
flags/test.ts
Outdated
@@ -1906,3 +1906,41 @@ Deno.test("typesOfParseOptionsGenericDefaults", function () { | |||
> | |||
>(true); | |||
}); | |||
|
|||
Deno.test("basicDidYouMean", function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we move this to a seperate testing file in std/text
?
text/mod.ts
Outdated
* @param {number} [options.suggestionLimit=Infinity] - Number of results to return | ||
* @returns {string[]} An array of possible correct spellings for the given word. | ||
*/ | ||
export function didYouMean( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussion: Should we follow the single-function file pattern that we do elsewhere in std here? Does it make any sense to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow single-file export pattern. Users who only want levenshteinDistance
would want to import from std/text/levenshtein_distance.ts
Could you also address |
I think this is a really cool addition to for std, but I don't like the |
I have to agree with @timreichen. |
Sorry I misinterpreted your original request (I renamed I do want to discuss this name change though ( I'd say there's already a bit of visibilty problem with deno-std, like how 3rd party libraries show up first when searching for So for the sake of convention/searchability, I think |
I totally agree with this, it feels werid, but I would like brainstorm some alternatives. Maybe there should be another function with assert in the name? I only added it because, in practice, it is a real pain to manually write the if-checks and error responses every time: function somethingWithArgs(args) {
const firstWordOptions = [ "length", "size", "help", ]
const colorOptions = [ "red", "green", "blue", ]
const videoFormatOptions = [ "mp4", "mov", ]
// arg validation step
didYouMean(args[0], firstWordOptions, { autoThrow: true })
didYouMean(args.color, colorOptions, { autoThrow: true })
didYouMean(args.format, videoFormatOptions, { autoThrow: true })
/* actual logic */
}
// vs
function somethingWithArgs(args) {
const firstWordOptions = [ "length", "size", "help", ]
const colorOptions = [ "red", "green", "blue", ]
const videoFormatOptions = [ "mp4", "mov", ]
// arg validation step
if (!firstWordOptions.includes(args[0])) {
const suggestions = didYouMean({ guessWord: args[0], firstWordOptions }).join(", ")
throw new Error(`${args[0]} isn't a valid argument, did you mean one of ${suggestions}?`)
}
if (!colorOptions.includes(args.color)) {
const suggestions = didYouMean({ guessWord: args.color, videoFormatOptions }).join(", ")
throw new Error(`${args.color} isn't a valid argument, did you mean one of ${suggestions}?`)
}
if (!colorOptions.includes(args.format)) {
const suggestions = didYouMean({ guessWord: args.format, videoFormatOptions }).join(", ")
throw new Error(`${args.format} isn't a valid argument, did you mean one of ${suggestions}?`)
}
/* actual logic */
} |
Thanks for the changes @kt3k. I got halfway through them yesterday after updating from denoland:main but ran out of time. Could you explain the |
text/_util.ts
Outdated
@@ -0,0 +1,129 @@ | |||
// Copyright 2018-2023 the Deno authors. All rights reserved. MIT license. | |||
import { levenshteinDistance } from "./levenshtein_distance.ts"; | |||
import { assert } from "../assert/mod.ts"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kt3k confused by this one, why is this mod.ts
and not assert.ts
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a strong preference, but it was more generic; I could do a find-and-replace-all from the old location ("../testing/asserts.ts";) to the new location, but only if I used "mod.ts" (so that imports like assertEquals, assertThrows
kept working)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope would be that in the std we try to minimize the module graph, so not sure I support this move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay I'll change it
text/_util.ts
Outdated
* use a named-distance (e.g. levenshteinDistance) to | ||
* guarantee a particular ordering | ||
*/ | ||
export function closest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is missing a word here. closestWord
? Not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be closestString since spaces are allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closestString
makes sense to me. I don't feel strongly either way, just brought this up for bikeshedding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
text/_util.ts
Outdated
* use a named-distance (e.g. levenshteinDistance) to | ||
* guarantee a particular ordering | ||
*/ | ||
export function similarityCompare( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compareSimilarity
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
text/_util.ts
Outdated
* @param options.caseSensitive - Flag indicating whether the distance should include case. Default is false. | ||
* @returns {string[]} A sorted copy of possibleWords | ||
*/ | ||
export function wordSimilaritySort( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're already exposing similarityCompare
, do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave that up to you guys, its the difference between:
let suggestions = [...possibleWords].sort(similarityCompare("hep", { caseSensitive })).join(", ")
and
let suggestions = wordSimilaritySort("hep", possibleWords).join(", ")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding why one would need to do [...possibleWords]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cause otherwise it mutates possibleWords
, which might be sorted in a particular order already (for example an order that is used when the help command is invoked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I should have clarified. Why not use .toSorted()
? It would just be
let suggestions = possibleWords.toSorted(similarityCompare("hep", { caseSensitive })).join(", ")
which is a little more readable imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer older compatibility (e.g. avoiding bleeding-edge ES2023 features), and plus toSorted is literally only one character shorter than the more explicit version.
APIs exported from |
I revisited these links, but what the above pip module does is different from what this PR does (They seem correcting typo based on the dictionary). Stackoverflow question refers to I now feel unsure this design of (BTW I think |
This is starting to feel a bit like review purgatory 😅 I'm just simply trying to contribute:
Yes that's the point; I can't use his implementation and the cliffy author, myself, and others have all had to reimplement the same tool because there isn't one available in the std. Even if it was public in cliffy, it would be weird to depend on cliffy just to validate an enum argument in a function that's going to be used in a web interface. "I need to guarantee that this string belongs to a predefined list of strings, and throw a helpful error if that's not the case" is a common need, and for every duplicate implementation, there's probably 10 more devs that decided "I'm just going to throw an unhelpful error message, after all, I'm writing a tiny JS function and it doesn't make sense to create 2x the work implementing name similarity just to improve the error"
This is why it feels like purgatory; the original PR did correct typos based on a list of strings but based on feedback we renamed it to "closestString" and then the _utils change made it private. It feels like we're going around in circles while also blocking generic and commonly needed functionality. I agree naming is important, and I feel like things that throw should be an
|
Sorry for pressuring!
I feel I looked through the APIs again. Now I feel 3 APIs in I'd like to suggest the below:
If we have |
Hi @jeff-hykin, I spoke with @kt3k. Are you still happy to pursue this PR with the above comment's suggestions? |
Thanks for the ping, and sorry for leaving this PR hanging. I am still interested in it. Yeah, I can make the changes and get this branch back up to date with main.
This is good, thanks for addressing one of my core concerns.
I'm onboard with this sentiment; purity/cleanliness is a top priority for std. Even small things, like _ That said, I still feel there is a gap here. Maybe I we can merge those four functions to get this PR closed and open up a new PR to continue this discussion. For now I'll continue it here. The "let the user implement it" is exactly what I'm trying to avoid. I mean, if the expression is so easy that the end user would just inline it; ok thats fine. But when they need to define a helper function, a function that isn't customized to their specific project in any way, then I think theres a problem. In my opinion me, my coworker, and my student shouldn't each need our own stdlib. But if deno std doesn't implement stuff like assertOneOf, then we will (either that or we need to copy and paste our implementation from project to project which also seems terrible). |
std/text
with word-similarity helpers
I've created #3752 to continue the discussion of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now looks reasonable to me. Thanks for updating! @jeff-hykin
LGTM
Thanks @kt3k ! |
Simple argument helper, but could also be useful in frontend search components as well