Name Matching on Large Datasets in Databricks #1177

Open
@ojasch3

Description

Hi,
I recently came across Zingg, and it looks like a fantastic product. I see that it has a Databricks integration. I am trying to do large-scale name matching: names from external data sources that I have ingested into Databricks against the internal names in our own tables. How should I go about this? For instance, in one use case I have brought in about 4,000 external names and want to find the best match for each among our 6,500 internal names. I would also like to know how well this scales, because another use case compares 85,000 external names with 300,000 internal names to find the best matches.
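
To make the use case concrete, here is a minimal brute-force sketch of "best internal match per external name". It uses Python's standard-library `difflib`, not Zingg's API, and the function names `best_matches` and `pair_count` as well as the sample names are hypothetical. It is only meant to illustrate the shape of the problem and why the larger case probably needs something smarter than comparing every pair.

```python
import difflib


def best_matches(external, internal, cutoff=0.6):
    """Map each external name to (best internal name, similarity score),
    or to None when no internal name reaches the cutoff.

    Brute force: every external name is compared with every internal name.
    """
    # Normalize once for case-insensitive comparison, but remember the
    # original spelling so the result is readable.
    normalized = {}
    for name in internal:
        normalized.setdefault(name.strip().lower(), name)
    keys = list(normalized)

    results = {}
    for name in external:
        query = name.strip().lower()
        candidates = difflib.get_close_matches(query, keys, n=1, cutoff=cutoff)
        if candidates:
            score = difflib.SequenceMatcher(None, query, candidates[0]).ratio()
            results[name] = (normalized[candidates[0]], score)
        else:
            results[name] = None
    return results


def pair_count(n_external, n_internal):
    """Number of comparisons an all-pairs approach would need."""
    return n_external * n_internal


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real table.
    print(best_matches(["Acme Corp"], ["ACME Corporation", "Globex Inc"]))
    print(pair_count(4_000, 6_500))      # 26,000,000 pairs
    print(pair_count(85_000, 300_000))   # 25,500,000,000 pairs
```

The pair counts show the scaling concern: the small case is about 26 million comparisons, while the large one is about 25.5 billion, so an all-pairs loop like this is unlikely to be practical there without some way of narrowing the candidates first.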


Labels: question (Further information is requested)
