A comprehensive, well-structured collection of resources, tools, and frameworks for Large Language Models and Generative AI applications. This repository serves as a central hub for developers, researchers, and practitioners working with modern AI systems.
| Provider | Models | Key Features |
|---|---|---|
| OpenAI | GPT-4o, GPT-4.1, o3, GPT-4o mini | Function calling, reasoning, multimodal |
| Anthropic | Claude 3.5 Sonnet, Claude 3.7 Sonnet | Long context, safety, computer use |
| Google | Gemini 2.5 Pro, Gemini 2.0 Flash | Multimodal, 2M token context |
| xAI | Grok-3 | Real-time data, X integration |
| Cohere | Command R+ | RAG optimization, multilingual |
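Most proprietary providers expose a chat-completions API. Below is a minimal sketch of a function-calling request using the OpenAI Python SDK (v1.x); the `get_weather` tool and its schema are illustrative assumptions, not part of any provider's API.

```python
# Minimal function-calling sketch with the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```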
| Provider | Models | Parameters | Key Features |
|---|---|---|---|
| Meta | Llama 3.1 405B, Llama 3.3 70B | 405B, 70B | Open weights, commercial use |
| Alibaba | Qwen 3, Qwen 2.5-Max | 235B+ | Multilingual, reasoning |
| DeepSeek | DeepSeek R1, DeepSeek-V3 | 671B | Reasoning, MoE architecture |
| Mistral | Mistral Large 2, Mixtral 8x22B | 123B, 141B | European, efficiency |
| Microsoft | Phi-3 | 3.8B-14B | Small models, high performance |
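Open-weight models can be run locally with Hugging Face Transformers. A minimal sketch, assuming a recent Transformers release that accepts chat-style message lists; the model ID is an example, and gated models require accepting the license on the Hub first.

```python
# Minimal local text generation with Hugging Face Transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model ID
    device_map="auto",                         # spread layers across available devices
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
print(generator(messages, max_new_tokens=100)[0]["generated_text"])
```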
- Reasoning: o1, o3-mini, DeepSeek R1
- Code: CodeLlama, DeepSeek Coder, StarCoder2
- Math: Llemma, MathLlama
- Long Context: Gemini 1.5 Pro (2M tokens), Claude 3 (200K tokens)
- Proprietary: GPT-4V, Claude 3.5 Sonnet, Gemini Pro Vision
- Open Source: LLaVA, Qwen2-VL, InternVL, CogVLM, LLaVA-NeXT
- Specialized: PaliGemma, Flamingo, BLIP-2, InstructBLIP
- Diffusion Models: Stable Diffusion XL, Midjourney, DALL-E 3
- Other: Imagen, Parti
- Sora, Runway Gen-2, Pika Labs, VideoCrafter
| Database | Type | Key Features | Link |
|---|---|---|---|
| FAISS | Library | Facebook's similarity search | GitHub |
| Chroma | Embedded | Developer-friendly, SQL-like | GitHub |
| Pinecone | Managed | Serverless, real-time updates | Website |
| Weaviate | Open Source | GraphQL, vector + object search | Website |
| Qdrant | Rust-based | High performance, filtering | Website |
| Milvus | Distributed | Scalable, cloud-native | Website |
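A minimal sketch of vector similarity search with FAISS, the library listed above; the embeddings here are random stand-ins for real model outputs.

```python
# Minimal exact nearest-neighbour search with FAISS.
import faiss
import numpy as np

dim = 384                                  # e.g. the size of MiniLM embeddings
index = faiss.IndexFlatL2(dim)             # exact L2 search, no training needed

vectors = np.random.rand(1000, dim).astype("float32")
index.add(vectors)                         # index 1,000 document vectors

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, k=5)  # top-5 nearest neighbours
print(ids[0], distances[0])
```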
- ColBERT: Late-interaction retrieval from Stanford - GitHub
- Dense Passage Retrieval (DPR): Facebook's dense retrieval
- SPLADE: Sparse + dense hybrid retrieval
- Elasticsearch: Full-text + vector search
- OpenSearch: AWS's Elasticsearch fork
- Vespa: Yahoo's big data serving engine
| Database | Type | Performance | Best For |
|---|---|---|---|
| Neo4j | Native Graph | High | OLTP, real-time queries |
| ArangoDB | Multi-model | Very High | Graph + document + key-value |
| TigerGraph | Native Graph | Extreme | Analytics, large-scale |
| Amazon Neptune | Managed | High | AWS ecosystem |
- RDF Stores: Apache Jena, Stardog, GraphDB
- Triple Stores: Blazegraph, Virtuoso
- NetworkX: Python graph analysis (see the sketch after this list)
- PyG: PyTorch Geometric for GNNs
- DGL: Deep Graph Library
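As referenced above, a minimal NetworkX sketch, e.g. for exploring an entity graph extracted from documents before handing it to a GNN library such as PyG or DGL. The node names are made up for illustration.

```python
# Minimal graph analysis with NetworkX.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("LLM", "RAG"), ("RAG", "VectorDB"),
    ("RAG", "Embeddings"), ("LLM", "Agents"),
])

# Degree centrality highlights which components sit on the most connections.
print(nx.degree_centrality(G))
print(nx.shortest_path(G, "VectorDB", "Agents"))
```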
| Benchmark | Focus | Link |
|---|---|---|
| Berkeley Function Calling Leaderboard (BFCL) | Function accuracy | Website |
| ToolBench | Tool use evaluation | GitHub |
| ToolScan | Error pattern analysis | Research paper |
| DPAB-α | Pythonic vs JSON calling | GitHub |
- MMMU: Multi-discipline multimodal understanding
- MMBench: Comprehensive vision-language evaluation
- SEED-Bench: Multimodal LLM evaluation
- LLaVA-Bench: Visual instruction following
- Open LLM Leaderboard: Hugging Face's comprehensive ranking
- Chatbot Arena: Human preference evaluation (LMSYS)
- HELM: Stanford's holistic evaluation
- BigBench: Google's diverse task collection
- GPQA: Graduate-level science questions
- MATH: Competition mathematics
- GSM8K: Grade school math word problems
- MuSR: Multi-step reasoning tasks
| Server | Optimization | Best For | Key Features |
|---|---|---|---|
| vLLM | GPU | High throughput | Tensor parallelism, continuous batching, PagedAttention |
| Llama.cpp | CPU/GPU | Edge deployment | GGUF quantization, Metal/CUDA support |
| TensorRT-LLM | NVIDIA GPU | Maximum performance | TensorRT optimization |
| Text Generation Inference (TGI) | GPU | Production | HuggingFace integration |
| SGLang | GPU | Structured generation | Fast serving with constraints |
| ExLlamaV2 | GPU | Memory efficiency | EXL2 quantization |
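A minimal sketch of offline batched inference with vLLM; the model ID is an example, and vLLM requires a supported GPU. Continuous batching lets one call serve many prompts efficiently.

```python
# Minimal offline batched inference with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["What is PagedAttention?", "Explain tensor parallelism."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```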
- Ollama: Simple local model serving (see the example after this list)
- LocalAI: OpenAI API compatibility
- Jan: Desktop AI assistant
- LM Studio: GUI for local models
- GPT4All: Cross-platform local AI
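As referenced above, a minimal sketch of querying a locally running Ollama server over its REST API (default port 11434); it assumes the model has already been pulled, e.g. with `ollama pull llama3.1`.

```python
# Minimal request to a local Ollama server.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```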
| Method | Target | Bits | Best For |
|---|---|---|---|
| GGUF | CPU/GPU | 2-8 bit | General purpose, llama.cpp |
| GPTQ | GPU | 4 bit | GPU inference, older method |
| AWQ | GPU | 4 bit | Activation-aware, 2x faster than GPTQ |
| EXL2 | GPU | Mixed | Best performance, ExLlama |
| BitsAndBytes | GPU | 4/8 bit | Hugging Face integration |
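A minimal sketch of 4-bit quantized loading via the bitsandbytes integration in Transformers, as listed in the last table row; the model ID is an example.

```python
# Minimal 4-bit quantized model loading with bitsandbytes + Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, common in QLoRA-style setups
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # example model ID
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```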
- LangChain Tools: Comprehensive tool ecosystem
- Haystack Tools: Enterprise-focused tools
- ReAct: Reasoning and Acting pattern (see the sketch after this list)
- Toolformer: Self-supervised tool learning
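As referenced above, a framework-free sketch of the ReAct loop: the model alternates Thought/Action/Observation steps until it emits a final answer. `call_llm` is a scripted stand-in here; in practice you would wire it to any chat-completion client, and the tool registry is a stub.

```python
# Framework-free sketch of the ReAct (Reason + Act) loop.
def call_llm(prompt: str) -> str:
    # Scripted stand-in; a real model would interleave
    # "Action: <tool> <arg>" steps based on the transcript so far.
    return "Thought: I can answer directly. Final Answer: 42"

TOOLS = {"search": lambda q: f"(stub search results for {q!r})"}

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)          # model emits Thought + Action or Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:                # e.g. "Action: search quantum computing"
            tool, _, arg = step.split("Action:", 1)[1].strip().partition(" ")
            obs = TOOLS.get(tool, lambda a: "unknown tool")(arg)
            transcript += f"Observation: {obs}\n"
    return "No answer within the step budget"

print(react("What is the answer to everything?"))
```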
| Component | Description | Link |
|---|---|---|
| Core Protocol | Anthropic's open standard for connecting models to tools and data | Docs |
| MCP Servers | Playwright, Database, Filesystem | Various implementations |
| MCP Clients | Claude Desktop, Semantic Kernel | Integration examples |
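A minimal MCP server sketch using the FastMCP helper from the official `mcp` Python SDK; the protocol and SDK are still evolving, so treat the exact API surface as subject to change.

```python
# Minimal MCP server exposing one tool over stdio (official `mcp` Python SDK).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio so clients like Claude Desktop can connect
```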
| Framework | Focus | Language | Key Features |
|---|---|---|---|
| AutoGen | Multi-agent conversation | Python | Microsoft's framework, role-based agents |
| CrewAI | Team-based AI | Python | Task delegation, collaborative workflows |
| LangGraph | Graph-based orchestration | Python | State management, complex workflows |
| MetaGPT | Software development | Python | Multi-role software team simulation |
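A minimal two-agent sketch using AutoGen's classic (pyautogen v0.2) API; newer AutoGen releases restructured the package, so treat this as illustrative. It assumes an OpenAI-compatible backend with the API key in the environment.

```python
# Minimal two-agent conversation with AutoGen's classic (v0.2) API.
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4o-mini"}]  # API key read from environment

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",      # fully automated exchange
    code_execution_config=False,   # no local code execution in this sketch
)

user.initiate_chat(assistant, message="Summarize the trade-offs of RAG vs fine-tuning.")
```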
- LangChain: Comprehensive LLM framework
- Haystack: NLP pipeline framework
- Semantic Kernel: Microsoft's AI orchestration
- Phidata: Production-ready agent framework
- JADE: Java Agent Development Framework
- Mesa: Agent-based modeling in Python
- Ray RLlib: Multi-agent reinforcement learning
- RASA: Conversational AI framework
| Tool | Type | Key Features |
|---|---|---|
| OpenTelemetry | Standard | Distributed tracing, vendor-neutral |
| LangFuse | Platform | LLM-specific observability |
| Weights & Biases | Platform | Experiment tracking, model monitoring |
| Arize | Platform | ML monitoring, drift detection |
| LangSmith | Platform | LangChain's debugging tool |
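A minimal sketch of tracing an LLM call with the OpenTelemetry SDK; the span and attribute names are illustrative, not a fixed semantic convention.

```python
# Minimal OpenTelemetry tracing around an LLM call, exported to the console.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-app")

with tracer.start_as_current_span("llm.generate") as span:
    span.set_attribute("llm.model", "gpt-4o-mini")  # illustrative attribute names
    span.set_attribute("llm.prompt_tokens", 42)
    # ... call the model here ...
```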
- MLflow: Open source ML lifecycle management
- Neptune: Experiment tracking and model registry
- ClearML: ML/DL experiment manager
- TensorBoard: TensorFlow's visualization toolkit
- Prometheus: Time series monitoring
- Grafana: Observability dashboards
- Datadog: Cloud monitoring platform
- New Relic: Application performance monitoring
| Framework | Focus | Key Features |
|---|---|---|
| Arize Phoenix | LLM evaluation | Trace analysis, hallucination detection |
| RAGAS | RAG evaluation | Retrieval and generation metrics |
| DeepEval | LLM testing | Unit testing for LLMs |
| TruLens | Truthfulness | Trust and transparency evaluation |
- LangChain Evaluation: Built-in evaluation tools
- Promptfoo: CLI evaluation tool
- OpenAI Evals: OpenAI's evaluation suite
- LlamaIndex Evaluation: RAG-specific evaluations
- LM Evaluation Harness: Standardized LLM evaluation
- BigBench: Google's comprehensive benchmark
- HELM: Stanford's holistic evaluation
- Sentence Transformers: Easy sentence embeddings (see the example after this list)
- BGE Models: BAAI's general embeddings
- E5 Models: Microsoft's embedding models
- CLIP: Vision-language embeddings
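As referenced above, a minimal sketch of computing embeddings with Sentence Transformers; the model ID is a common lightweight default.

```python
# Minimal sentence embeddings with Sentence Transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "RAG retrieves documents.",
    "Retrieval-augmented generation fetches context.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors.
print(embeddings[0] @ embeddings[1])
```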
- Templates & Libraries: LangChain prompts, Promptify
- Techniques: Few-shot, Chain-of-Thought, ReAct, Tree of Thoughts
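A minimal sketch of a few-shot chain-of-thought prompt template; the worked example is made up for illustration and the string can be sent to any completion or chat endpoint.

```python
# Minimal few-shot chain-of-thought prompt template.
FEW_SHOT_COT = """\
Q: A shop sells pens at 3 for $2. How much do 12 pens cost?
A: 12 pens is 4 groups of 3, and 4 x $2 = $8. The answer is $8.

Q: {question}
A: Let's think step by step."""

prompt = FEW_SHOT_COT.format(
    question="A train travels 60 km in 45 minutes. What is its speed in km/h?"
)
print(prompt)  # send this string to any chat or completion endpoint
```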
| Framework | Focus | Key Features |
|---|---|---|
| Guardrails AI | Output validation | Pydantic integration, validators |
| NeMo Guardrails | Conversation control | NVIDIA's dialogue management |
| LlamaGuard | Content safety | Meta's Llama-based safety classifier |
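A minimal sketch of validating structured LLM output with Pydantic, the pattern Guardrails AI builds on; `raw_output` is a stand-in for a real model response.

```python
# Minimal structured-output validation with Pydantic (v2 API).
from pydantic import BaseModel, Field, ValidationError

class Answer(BaseModel):
    summary: str
    confidence: float = Field(ge=0.0, le=1.0)  # reject out-of-range scores

raw_output = '{"summary": "RAG grounds answers in retrieved text.", "confidence": 0.9}'

try:
    answer = Answer.model_validate_json(raw_output)
    print(answer)
except ValidationError as err:
    print("Model output failed validation:", err)
```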
- Hugging Face Transformers: Comprehensive ML framework
- Axolotl: Fine-tuning toolkit
- Unsloth: Fast fine-tuning
- LLaMA Factory: Easy LLM fine-tuning
- Text Processing: spaCy, NLTK, Tokenizers
- Data Validation: Pydantic, Pandera, Great Expectations
- Synthetic Data: Faker, SDV, Gretel
This repository includes a comprehensive network graph showing relationships between different components:
- Core Dependencies: How models depend on inference servers
- Integration Patterns: How vector databases connect to agentic systems
- Monitoring Flows: How observability tools track different components
- Evaluation Chains: How benchmarks assess various capabilities
1. Choose Your Use Case:
   - Text generation → Text Models + Inference Servers
   - RAG applications → Text Models + Vector Databases + Embeddings
   - Multi-agent systems → Agentic Frameworks + Tools
   - Production deployment → Observability + Guardrails + Evaluation
2. Select Appropriate Tools:
   - Local deployment: Ollama + llama.cpp + GGUF quantization
   - Cloud deployment: vLLM + OpenTelemetry + MLflow
   - Research: Hugging Face + Evaluation frameworks
3. Implement Monitoring:
   - Add observability from day one
   - Set up evaluation benchmarks
   - Implement safety guardrails
This repository is designed to be a living document. Contributions are welcome for:
- Adding new tools and frameworks
- Updating benchmark results
- Improving categorization
- Adding relationship mappings
- Sharing implementation examples
- Multimodal Integration: Vision + Text + Audio models
- Edge Deployment: Smaller, more efficient models
- Agentic Workflows: Complex multi-step reasoning
- Safety & Alignment: Robust guardrails and monitoring
- Standardization: Protocols like MCP gaining adoption
Note: This repository focuses on the rapidly evolving LLM ecosystem. Tools, benchmarks, and best practices are constantly changing. Please verify current versions and compatibility before implementation.
- Total Categories: 10 major categories
- Tools & Frameworks: 100+ listed resources
- Network Nodes: 39 key components
- Relationships: 31+ documented connections
- Update Frequency: Monthly updates planned
Built with ❤️ for the LLM & GenAI community
Note: This is still a work in progress. Last updated: 25 Aug 2025