Datasets Summarizer is compatible with Jupyter Notebooks. To generate the similarity plot between datasets, it needs x and y values computed from any similarity metric. It supports the metadata format generated by the datamart-profiler library and uses it to build the Detail View for exploring each dataset.
(Click a dataset in the list of results to open the Detail View.)
Live demo (Google Colab):
In Jupyter Notebook:
import DatasetsSummarizer
data = DatasetsSummarizer.get_taxi_data()
DatasetsSummarizer.plot_datasets_summary(data)
pip install datasets-summarizer
Use a subset of the similarity metrics or add a new entry (x and y values) based on a different similarity metric. For example, here we added x and y values based on a similarity metric that uses a modified version of the titles. Note that modif_title_x and modif_title_y must be included in the dataframe.
new_similarity_metrics = [
    {'name': 'Title', 'x': 'title_x', 'y': 'title_y'},
    {'name': 'ModifiedTitle', 'x': 'modif_title_x', 'y': 'modif_title_y'},
]
Then, we can pass these new similarity metrics as a parameter to the visualization:
DatasetsSummarizer.plot_datasets_summary(dataframe, new_similarity_metrics)
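Since every x and y column named in a metric must exist in the dataframe, it can help to check this before plotting. The snippet below is a minimal sketch of such a check. The helper missing_metric_columns and the example column list are hypothetical and are not part of DatasetsSummarizer; with a pandas dataframe you would pass list(dataframe.columns) instead.

```python
def missing_metric_columns(columns, similarity_metrics):
    """Return x/y column names used by the metrics but absent from `columns`.

    Hypothetical helper for illustration; not part of DatasetsSummarizer.
    """
    available = set(columns)
    missing = []
    for metric in similarity_metrics:
        for axis in ('x', 'y'):
            column = metric[axis]
            # Record each missing column once, in the order it is first seen.
            if column not in available and column not in missing:
                missing.append(column)
    return missing


# Assumed example columns; with pandas, use list(dataframe.columns).
columns = ['title', 'title_x', 'title_y', 'modif_title_x']
new_similarity_metrics = [
    {'name': 'Title', 'x': 'title_x', 'y': 'title_y'},
    {'name': 'ModifiedTitle', 'x': 'modif_title_x', 'y': 'modif_title_y'},
]

print(missing_metric_columns(columns, new_similarity_metrics))
# → ['modif_title_y']
```

An empty list means all referenced columns are present and the metrics can be passed to plot_datasets_summary.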