Stars
PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box (OOTB) Hugging Face support
Styx: Transactional Stateful Functions on Streaming Dataflows
Build resilient language agents as graphs.
Implementation of a cache-efficient, multithreaded, lock-free, hash-based join pipeline utilizing a memory-efficient hash table optimized for joins. This project was created for the SIGMOD 2025 Pro…
Official implementation of Poly2vec, presented at ICML '25
TPC-DS benchmark kit with some modifications/fixes
Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitti…
QuestDB is a high performance, open-source, time-series database
Learn Low Level Design (LLD) and prepare for interviews using free resources.
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
A list of learning materials for understanding database internals
pombredanne / xxHash-3
Forked from Cyan4973/xxHash
Extremely fast non-cryptographic hash algorithm
Learn System Design concepts and prepare for interviews using free resources.
Learn how to design systems at scale and prepare for system design interviews
Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.
geokats / dithesis
Forked from errikos/dithesis
Thesis class for undergraduate theses at the University of Athens
Papers for database systems powered by artificial intelligence (machine learning for database)
PVLDB Artifact Availability for the paper "Asymptotically Better Query Optimization Using Indexed Algebra"
Repository with an overview of the tutorial on Models and Practice of Neural Table Representations and up-to-date material for the hands-on part. This tutorial will be given at SIGMOD 2023.
StreamDQ is a library built on top of Apache Flink for defining "unit tests for data", which measure data quality in large data streams.
Data-Centric What-If Analysis for Native Machine Learning Pipelines


