Welcome to YData Academy, a collection of hands-on tutorials and use cases built entirely with ydata-sdk.
Whether you're just getting started with synthetic data or diving into advanced data anonymization and generative AI, this repository will guide you through each core capability of ydata-sdk.
ydata-sdk is a Python package designed to simplify data-centric AI development. It includes tools for:
- Data exploration and profiling
- Synthetic data generation and evaluation
- Data anonymization and privacy preservation
- Integrations with generative AI for document analysis and Q&A (question-and-answer pairs)
Folder | Description |
---|---|
1. Data & Connectors | Working with connectors, datasets, schema definitions, and metadata exploration |
2. Data Profiling | Using ydata-sdk to profile datasets for structure, quality, and distributions |
3. Synthetic Data Generation | Creating synthetic data using ydata's generative models |
4. Generative AI - Documents & Q&A | Generating synthetic documents and Q&A pairs from existing documents |
5. Synthetic Data Evaluation | Measuring utility, fidelity, and privacy of synthetic data |
6. Anonymizer | Applying anonymization techniques to protect sensitive data |
7. Data Preparation & Cleaning | Auxiliary methods for data preparation |
8. Use Cases | A set of ready-to-use use-case templates built with ydata-sdk |
Install the package with pip:
pip install ydata-sdk
Requirements:
- Python 3.9+
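Before opening the notebooks, it can help to confirm that the environment meets the Python 3.9+ requirement and that the package is actually installed. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and not part of ydata-sdk.

```python
import sys
from importlib import metadata

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def installed_version(dist_name="ydata-sdk"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("ydata-sdk version:", installed_version() or "not installed")
```

If the second line reports "not installed", run the pip command in the same environment that Jupyter uses.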
Clone the repo and start Jupyter:
git clone https://github.com/ydataai/academy
cd academy
jupyter notebook
Open any notebook under the folders to explore and run the code examples interactively.
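To get an overview of what is available before opening Jupyter, the notebooks can be listed per top-level folder. This is a small sketch that assumes it is run from the root of the cloned repository; `list_notebooks` is a hypothetical helper written for this example, not something shipped with the repository or ydata-sdk.

```python
from pathlib import Path


def list_notebooks(root="."):
    """Return a dict mapping each top-level folder to its notebook paths."""
    root = Path(root)
    grouped = {}
    for nb in sorted(root.rglob("*.ipynb")):
        # Skip the autosave copies Jupyter keeps in .ipynb_checkpoints.
        if ".ipynb_checkpoints" in nb.parts:
            continue
        rel = nb.relative_to(root)
        # Notebooks directly under root are grouped under ".".
        folder = rel.parts[0] if len(rel.parts) > 1 else "."
        grouped.setdefault(folder, []).append(rel)
    return grouped


if __name__ == "__main__":
    for folder, notebooks in list_notebooks().items():
        print(folder)
        for nb in notebooks:
            print("  ", nb)
```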
Contributions are welcome! Please see our contribution guide for guidelines.