Data Enrichment using Research API and formatting with GPT function calling #8
Looks great. Minor comments but LGTM overall. Thanks!!
"# obtain the API key from the environment\n",
"headers = {'x-api-key': os.environ['YDC_API_KEY']}\n",
"\n",
"def get_research_data(company_name, missing_cols, mode=\"research\"):\n",
@christeefy, what's the latest recommended way to do this? Can they import a retriever instead here?
Retriever implementations are framework-specific. They can import a retriever — the question becomes more about whether we want this example to be framework-agnostic or not. If yes to being framework-agnostic, then pinging the API directly would be the approach.
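For illustration, the framework-agnostic route (calling the API directly) could look roughly like the sketch below. It uses only the standard library and only builds the request. The endpoint URL, the `query` parameter name, and the helper name `build_research_request` are assumptions, not taken from the notebook under review.

```python
import os
import urllib.parse
import urllib.request

# Assumed endpoint: not stated in this thread, check the You.com docs.
RESEARCH_URL = "https://chat-api.you.com/research"


def build_research_request(company_name, missing_cols, api_key, base_url=RESEARCH_URL):
    """Build (but do not send) a GET request asking for the missing columns.

    Hypothetical helper; the `query` parameter name is an assumption.
    """
    query = f"For the company {company_name}, find: {', '.join(missing_cols)}"
    url = f"{base_url}?{urllib.parse.urlencode({'query': query})}"
    # Same header name as in the notebook snippet above.
    return urllib.request.Request(url, headers={"x-api-key": api_key})


# Fall back to a placeholder key so the sketch runs without credentials.
req = build_research_request(
    "Acme Corp",
    ["headquarters", "founded_year"],
    os.environ.get("YDC_API_KEY", "dummy-key"),
)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`; a framework retriever would replace this helper entirely.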
Oh my bad, this is the chat API, which is implemented as an LLM (instead of a retriever). It is still framework-specific, so my comments above still apply.
Do you know which retrievers have been merged already?
We have examples for langchain and llama index. Any others that they could use here to show off what we have done? If not, I'd suggest going with the llama-index one as we only have one example for that one right now.
@rodrigo-georgian is llama-index using the research API? To answer my own question: it's using the search API.
@rodrigo-georgian The You.com retrievers in all frameworks support the search API. LangChain's supports News as well.
PRs have been put up across all frameworks to update their implementations to support News, but the authors have not merged them yet.
PRs for the chat API have not been created yet; they are waiting until YDC is ready.
Discussed with @rodrigo-georgian offline. This example showcases the research API. We will leave the implementation as is for now, calling the REST API directly, and switch to a langchain/dspy framework integration once one is available.
This notebook is a guide to filling in missing tabular data, using a sample of synthetic data. The You.com research API gathers the relevant data from the web, and GPT function calling extracts the relevant information in a specific format, which is then used to populate the tables.
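As a rough sketch of the second half of that flow, the snippet below turns a row's missing columns into an OpenAI-style function-calling tool definition and then merges the returned arguments back into the row. The helper names (`build_extraction_tool`, `fill_row`), the tool name `fill_missing_fields`, and the sample values are all hypothetical; the notebook itself may structure this differently. No API call is made here, and the model's reply is simulated with a JSON string.

```python
import json


def build_extraction_tool(missing_cols):
    """Describe the missing columns as a function-calling tool (hypothetical schema)."""
    return {
        "type": "function",
        "function": {
            "name": "fill_missing_fields",  # hypothetical tool name
            "description": "Return values for the missing columns of a company row.",
            "parameters": {
                "type": "object",
                "properties": {col: {"type": "string"} for col in missing_cols},
                "required": list(missing_cols),
            },
        },
    }


def fill_row(row, arguments_json):
    """Copy extracted values into the row, touching only cells that are None."""
    extracted = json.loads(arguments_json)
    filled = dict(row)
    for col, value in extracted.items():
        if col in filled and filled[col] is None:
            filled[col] = value
    return filled


# Made-up row standing in for one record of the synthetic table.
row = {"company": "Acme Corp", "headquarters": None, "founded_year": None}
missing = [col for col, value in row.items() if value is None]
tool = build_extraction_tool(missing)

# Simulated function-call arguments; a real run would read these from the model's reply.
simulated_arguments = '{"headquarters": "Springfield", "founded_year": "1990"}'
filled = fill_row(row, simulated_arguments)
```

Only cells that are `None` get overwritten, so existing values in the table are never replaced by model output.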