Trying to understand predicted similarity scores during findAndLabel #1168
Unanswered
rkennedy-argus asked this question in Q&A
Replies: 2 comments 1 reply
-
Sorry for the late reply! For single-value columns, you can try switching the match type to exact instead of text. For multiple IDs concatenated with commas, text may be OK. If any column is less than 70-80% populated, you probably don't want it to be part of the model: the signal may be too weak and confusing. You typically want to label at least 40-50 matching pairs so that Zingg can build a good first model.
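One way to check the 70-80% guideline before editing the field definitions is to measure how populated each column is. The snippet below is only a rough sketch in plain Python, not part of Zingg. The helper names `population_rate` and `sparse_columns`, the sample rows, and the 0.7 cutoff are all made up for illustration, and it assumes the records are already loaded as a list of dicts.

```python
def population_rate(rows, column):
    """Return the fraction of rows whose `column` holds a non-blank value."""
    if not rows:
        return 0.0
    filled = 0
    for row in rows:
        value = row.get(column)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(rows)


def sparse_columns(rows, columns, threshold=0.7):
    """List the columns populated in less than `threshold` of the rows."""
    return [c for c in columns if population_rate(rows, c) < threshold]


# Hypothetical sample data, not taken from the original question.
rows = [
    {"name": "Acme Corp", "tax_id": "123", "phone": ""},
    {"name": "Apex Ltd", "tax_id": "", "phone": None},
    {"name": "Birch LLC", "tax_id": "456", "phone": "555-0100"},
    {"name": "Cedar Inc", "tax_id": "789", "phone": "  "},
]

candidates = sparse_columns(rows, ["name", "tax_id", "phone"])
print(candidates)
```

Columns that show up in the result are the ones worth leaving out of the model, or at least looking at more closely, before the next labelling round.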
-
I'm seeing some pairs during `findAndLabel` with similarity scores that are FAR higher than I would expect. For example:

These rows are not even remotely similar, aside from having the same first letter in the `name` field. At first, I thought maybe it was because of the empty fields. But I judiciously applied the `null_or_blank` match type to rule that out and I'm still seeing this similarity score.

Here are the field definitions I'm working with: