Data Enrichment using Research API and formatting with GPT function calling #8

Merged
merged 6 commits into from
Jun 7, 2024

asleshapokhrel-georgian
This notebook is a guide to enriching missing tabular data, using a sample of synthetic data. The You.com Research API is used to retrieve relevant data from the web, and GPT function calling is used to extract the relevant information in a specific format, which is then used to populate the tables.
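A minimal sketch of the extraction step described above, assuming OpenAI's Chat Completions `tools` format for function calling, where the arguments the model chooses arrive as a JSON string on the tool call. The function name `record_company_data`, the helpers `build_extraction_tool` and `fill_missing`, and typing every column as a string are hypothetical choices for illustration, not the notebook's code. The model call itself is left out so the sketch runs offline.

```python
import json


def build_extraction_tool(missing_cols):
    """Build a function-calling tool definition whose parameters are the missing columns.

    The layout follows the Chat Completions `tools` format; the function name and
    the all-string column types are hypothetical choices for this sketch.
    """
    return {
        "type": "function",
        "function": {
            "name": "record_company_data",  # hypothetical name
            "description": "Record the requested fields extracted from the research text.",
            "parameters": {
                "type": "object",
                "properties": {
                    col: {"type": "string", "description": f"Value for column '{col}'"}
                    for col in missing_cols
                },
                "required": list(missing_cols),
            },
        },
    }


def fill_missing(row, arguments_json, missing_cols):
    """Return a copy of `row` with empty cells filled from the tool-call arguments."""
    extracted = json.loads(arguments_json)
    filled = dict(row)
    for col in missing_cols:
        # Only fill cells that are actually empty, and only with values the model returned.
        if filled.get(col) in (None, "") and col in extracted:
            filled[col] = extracted[col]
    return filled


if __name__ == "__main__":
    row = {"company": "Acme Corp", "headquarters": None, "founded": ""}  # placeholder row
    missing = [col for col, value in row.items() if value in (None, "")]
    tool = build_extraction_tool(missing)
    print(sorted(tool["function"]["parameters"]["properties"]))
    # In the notebook this string would come from the model's tool call;
    # here it is a hand-written stand-in.
    fake_arguments = '{"headquarters": "Springfield", "founded": "1999"}'
    print(fill_missing(row, fake_arguments, missing))
```

Keeping the schema derived from `missing_cols` means each row only asks the model for the fields it actually lacks, and `fill_missing` never overwrites values that were already present.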

Contributor

@rodrigo-georgian left a comment

Looks great. Minor comments but LGTM overall. Thanks!!

import os

# Obtain the API key from the environment
headers = {'x-api-key': os.environ['YDC_API_KEY']}

def get_research_data(company_name, missing_cols, mode="research"):
    ...
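The excerpt shows only the API-key header and the signature of `get_research_data`. A minimal sketch of the request-building half of such a function, using only the standard library, might look like this. The helper name `build_research_request`, the POST method, and the `{"query": ...}` body are assumptions; the real endpoint URL is not shown in the excerpt, so it is a required parameter here, and the `mode` argument is not modelled.

```python
import json
import os
import urllib.request


def build_research_request(company_name, missing_cols, endpoint, api_key=None):
    """Build (but do not send) a request asking a research endpoint about missing columns.

    `endpoint` must be supplied by the caller. The POST method and the JSON body
    shape are assumptions for illustration, not a documented contract.
    """
    if api_key is None:
        # Same convention as the notebook excerpt: read the key from YDC_API_KEY.
        api_key = os.environ["YDC_API_KEY"]
    query = (
        f"Find the following information about the company {company_name}: "
        + ", ".join(missing_cols)
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_research_request(
        "Acme Corp",
        ["headquarters", "founded"],
        endpoint="https://example.invalid/research",  # placeholder, not the real endpoint
        api_key="dummy-key",
    )
    print(req.get_method(), req.full_url)
    print(json.loads(req.data)["query"])
    # Sending it would be urllib.request.urlopen(req), then json.load() on the response.
```

Building the request separately from sending it keeps the query wording testable without network access or a real key.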
Contributor

@christeefy, what's the latest recommended way to do this? Can they import a retriever instead here?

Retriever implementations are framework-specific. They can import a retriever; the question is really whether we want this example to be framework-agnostic. If so, calling the API directly would be the approach.

Oh my bad, this is the chat API, which is implemented as an LLM (instead of a retriever). It is still framework-specific, so my comments above still apply.

Contributor

Do you know which retrievers have been merged already?
We have examples for LangChain and LlamaIndex. Are there any others they could use here to show off what we have done? If not, I'd suggest going with the LlamaIndex one, since we only have one example for it right now.

Author

@rodrigo-georgian, is llama-index using the Research API? To answer my own question: it's using the Search API.

@rodrigo-georgian All frameworks' You.com retrievers support the Search API. LangChain's also supports News.

PRs have been opened across all frameworks to update their implementations to support News, but they have not yet been merged by the framework authors.

PRs for the chat API will not be created until YDC is ready.

Discussed with @rodrigo-georgian offline. This example showcases the Research API, so we will leave the implementation as is for now, calling the REST API directly, and switch to the LangChain/DSPy integration once it is available.
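One way to make that later switch cheap is to hide the transport behind a small interface, so only the backend changes when a framework integration becomes available. The sketch below is an illustration under that assumption: `ResearchBackend`, `DirectRestBackend`, and `enrich` are hypothetical names, the REST call is injected as a plain function so the code runs offline, and no LangChain or DSPy API is used.

```python
from typing import Callable, Dict, List, Optional, Protocol


class ResearchBackend(Protocol):
    """Anything that returns research text for a company and its missing columns."""

    def fetch(self, company_name: str, missing_cols: List[str]) -> str: ...


class DirectRestBackend:
    """Framework-agnostic backend wrapping a plain function that calls the REST API.

    `call_api` is a hypothetical stand-in for the notebook's HTTP code; it is
    injected so this sketch can run without network access.
    """

    def __init__(self, call_api: Callable[[str, List[str]], str]) -> None:
        self._call_api = call_api

    def fetch(self, company_name: str, missing_cols: List[str]) -> str:
        return self._call_api(company_name, missing_cols)


def enrich(rows: List[Dict[str, Optional[str]]], backend: ResearchBackend) -> Dict[str, str]:
    """Fetch research text for every row that has empty cells, keyed by company name."""
    results: Dict[str, str] = {}
    for row in rows:
        # Treat None and "" as missing; the "company" column is assumed to be filled.
        missing = [col for col, value in row.items() if not value]
        if missing:
            company = str(row["company"])
            results[company] = backend.fetch(company, missing)
    return results


if __name__ == "__main__":
    def fake_api(company: str, cols: List[str]) -> str:
        return f"research text about {company}: {', '.join(cols)}"

    rows = [
        {"company": "Acme Corp", "founded": None},  # placeholder rows
        {"company": "Globex", "founded": "1989"},
    ]
    print(enrich(rows, DirectRestBackend(fake_api)))
```

A framework-backed version would be another class with the same `fetch` method; `enrich` itself would not need to change.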

@rodrigo-georgian merged commit 61d6585 into main on Jun 7, 2024
3 participants