This application allows you to generate a JSON dataset from ChatGPT conversations shared via a URL.
Link to Streamlit App: https://chatgpt-dataset-generator.streamlit.app/
- Extracts ChatGPT conversation data from a shareable URL.
- Converts the extracted data into a JSON format.
- Provides a downloadable JSON file of the conversation data.
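
As a rough illustration of the conversion step, the sketch below serializes a list of conversation turns to a JSON string. The record layout (`role` and `content` keys) and the function name `to_json_dataset` are assumptions made for this example; the structure the app actually emits may differ.

```python
import json


def to_json_dataset(messages):
    """Serialize conversation turns to a pretty-printed JSON string.

    `messages` is assumed to be a list of (role, content) pairs; the
    real scraper may return a different structure.
    """
    records = [{"role": role, "content": content} for role, content in messages]
    # ensure_ascii=False keeps non-English text readable in the output.
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical sample conversation, for illustration only.
    sample = [("user", "Hello"), ("assistant", "Hi, how can I help?")]
    print(to_json_dataset(sample))
```

A string produced this way could then be offered for download, for example through Streamlit's `st.download_button`.
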
- Enter the ChatGPT Share URL: Enter the URL of the ChatGPT conversation you want to scrape in the provided input field. The URL must be a shareable link created from the ChatGPT conversation options.
- Generate Dataset: Click the "Generate Dataset" button to start scraping. If the URL is valid, the conversation data is displayed on the screen and a download button appears for the JSON file.
- Clone the repository:

      git clone https://github.com/yourusername/streamlit_scraper.git
      cd streamlit_scraper

- Create and activate a virtual environment:

      python3 -m venv venv
      source venv/bin/activate

- Install the required packages:

      pip install -r requirements.txt

- Run the Streamlit app:

      streamlit run app.py

- app.py: The main Streamlit app script.
- scraper.py: The script that contains the web scraping logic using Selenium.
- requirements.txt: The list of required Python packages.
- .gitignore: Git ignore file to exclude unnecessary files and directories.
- README.md: This file.
- If the URL field is empty, an error message is displayed: "Please paste URL to start."
- If the URL does not have the required structure (/share/), an error message is displayed: "Please create share link in conversation options."
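
The two checks above can be sketched as a small helper. This is a minimal illustration, not the app's actual code: the function name `validate_share_url` and the constant names are hypothetical, and only the two error strings are taken from this README.

```python
from typing import Optional

EMPTY_URL_ERROR = "Please paste URL to start."
NOT_SHARE_LINK_ERROR = "Please create share link in conversation options."


def validate_share_url(url: str) -> Optional[str]:
    """Return an error message for an unusable URL, or None if it passes."""
    # An empty or whitespace-only field counts as "no URL entered".
    if not url or not url.strip():
        return EMPTY_URL_ERROR
    # A shareable link is expected to contain the /share/ path segment.
    if "/share/" not in url:
        return NOT_SHARE_LINK_ERROR
    return None
```

In a Streamlit script, a non-`None` result could be passed to `st.error` before any scraping starts.
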
- Author: Anthony Tatekawa
- Description: Generate a JSON dataset from ChatGPT conversations.
- Keywords: ChatGPT, JSON dataset generator, Streamlit app
This project is licensed under the MIT License - see the LICENSE file for details.