A set of Docker-based scripts to visualize differences in large datasets. Visualizations are PNG files created with seaborn. Eventually, the goal of this tool is to provide at-a-glance visual QA of the GRLS UBC dataset.
We begin with two complete databases for comparison. Tables included in data-definition.json are dumped as tsv files. These are compared via datacompy, which creates summary csvs. The summary csvs are then massaged and visualized.
- Make sure images are built for the given host (`docker images`). If not, build them via `docker build --target=mysql-env -t=<some name> .` and `docker build --target=python-env -t=<some name> .` Containers based on these images execute all steps of the script.
- Create a workdir directory with subdirectories: one, two, comparison and app. Define the workdir in init.sh.
- Make the bash scripts executable and run `./init.sh`
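
The workdir step above can be sketched as a small shell helper. This is only an illustration: `setup_workdir` is a hypothetical function name that does not exist in the repository, and the path you pass to it is assumed to be the same one you then define in init.sh.

```shell
# Hypothetical helper (not part of the repository): create the workdir
# layout described in the setup steps. $1 is the workdir path.
setup_workdir() {
    if [ -z "${1:-}" ]; then
        echo "usage: setup_workdir <path>" >&2
        return 1
    fi
    # One subdirectory per name listed in the steps: one, two, comparison, app.
    # mkdir -p also creates the workdir itself and is safe to re-run.
    mkdir -p "$1/one" "$1/two" "$1/comparison" "$1/app"
}
```

For example, `setup_workdir ./workdir` creates `./workdir` with the four subdirectories. Because `mkdir -p` does not fail on existing directories, running it again leaves an existing workdir untouched.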