Starred repositories
Resources for tackling record linkage / deduplication / data matching problems
An awesome list of awesome YouTubers that teach about technology. Tutorials about web development, computer science, machine learning, game development, cybersecurity, and more.
API, CLI, and Web App for analyzing and finding a person's profile across 1000 social media websites
Visualization of all roads within any city
Natural language parsing of dates and recurring events
s3path is a pathlib extension for AWS S3 Service
Utility scripts that make it easier to roll your own cloud gaming server with Parsec on AWS, particularly through automation
Roam Research - A note-taking tool for networked thought.
Documentation and issues for Pylance
Data and code behind the articles and graphics at FiveThirtyEight
Streamlit — A faster way to build and share data apps.
DuckDB is an analytical in-process SQL database management system
Launch Parsec-enabled cloud computers via your own cloud provider account.
A curated list of awesome warez and piracy links
A boilerplate for writing PySpark Jobs
Sourcetrail - free and open-source interactive source explorer
Create agents that monitor and act on your behalf. Your agents are standing by!
Questions to ask the company during your interview
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
pg_activity is a top-like application for PostgreSQL server activity monitoring.
BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python. Built on RAPIDS cuDF.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs


