DatasetsSummarizer

Datasets Summarizer is compatible with Jupyter Notebooks. To generate the similarity plot between datasets, it needs x and y values for each dataset, computed from any similarity metric. To generate the Detail View for exploring each dataset, it supports the metadata format produced by the datamart-profiler library.

(Click a dataset in the list of results to open the Detail View.)

Demo

Live demo (Google Colab):

Dataset results for Taxi query

In Jupyter Notebook:

import DatasetsSummarizer
data = DatasetsSummarizer.get_taxi_data()
DatasetsSummarizer.plot_datasets_summary(data)

Install

Option 1: install via pip:

pip install datasets-summarizer

Custom similarity metric

Use a subset of the available similarity metrics, or add a new entry (x and y values) based on a different similarity metric. For example, here we add x and y values based on a similarity metric that uses a modified version of the titles. Note that the modif_title_x and modif_title_y columns must be included in the dataframe.

new_similarity_metrics = [{'name': 'Title', 'x': 'title_x', 'y': 'title_y'},
                          {'name': 'ModifiedTitle', 'x': 'modif_title_x', 'y': 'modif_title_y'}
                         ]

Then we can pass these new similarity metrics as a parameter to the visualization:

DatasetsSummarizer.plot_datasets_summary(dataframe, new_similarity_metrics)
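
Since every x and y column named in the metrics list has to exist in the dataframe, it may help to check for missing columns before plotting. The following is a minimal sketch, not part of DatasetsSummarizer: `missing_metric_columns` is a hypothetical helper, and `example_columns` is a made-up stand-in for `list(dataframe.columns)`.

```python
def missing_metric_columns(columns, similarity_metrics):
    """Map each metric name to the x/y columns it references that are absent.

    Hypothetical helper for illustration; it only assumes each metric is a
    dict with 'name', 'x' and 'y' keys, as in the list shown above.
    """
    available = set(columns)
    missing = {}
    for metric in similarity_metrics:
        absent = [metric[axis] for axis in ("x", "y")
                  if metric[axis] not in available]
        if absent:
            missing[metric["name"]] = absent
    return missing


new_similarity_metrics = [
    {"name": "Title", "x": "title_x", "y": "title_y"},
    {"name": "ModifiedTitle", "x": "modif_title_x", "y": "modif_title_y"},
]

# Stand-in for list(dataframe.columns); modif_title_y is left out on purpose.
example_columns = ["title", "title_x", "title_y", "modif_title_x"]

problems = missing_metric_columns(example_columns, new_similarity_metrics)
print(problems)  # {'ModifiedTitle': ['modif_title_y']}
```

An empty result means every metric in the list has both of its coordinate columns available, so the list can be passed to `plot_datasets_summary` as shown above.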

