Awesome AI Everything

AI LLM Survey

AI State of Art Model

Papers & Technical Reports

AI benchmark

AI Algorithm

Network Architecture

  • Transformer: Attention Is All You Need | 2 Aug 2023 | Google
  • RWKV: Reinventing RNNs for the Transformer Era | 11 Dec 2023 | Generative AI Commons
  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces | 31 May 2024 | CMU

MoE

AI Chips

Chips Survey

GPU

  • NVIDIA
| GPU | FLOPS (dense FP16) | HBM capacity | Memory bandwidth | L2 cache | NVLink | PCIe | Architecture |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GB200 | 5 PFLOPS | 384 GB | 8.0 TB/s | | 1.8 TB/s | 128 GB/s | Blackwell |
| GH200 | 985 TFLOPS | 141 GB | 4.8 TB/s | 60 MB | 900 GB/s | 128 GB/s | Hopper |
| H100 | 985 TFLOPS | 80 GB | 3.35 TB/s | 50 MB | 900 GB/s | 128 GB/s | Hopper |
| H800 | 985 TFLOPS | 80 GB | 3.35 TB/s | 50 MB | 400 GB/s | 64 GB/s | Hopper |
| A100 | 312 TFLOPS | 80 GB | 2.0 TB/s | 40 MB | 600 GB/s | 64 GB/s | Ampere |
| A800 | 312 TFLOPS | 80 GB | 2.0 TB/s | 80 MB | 400 GB/s | 128 GB/s | Ampere |
| H20 | 148 TFLOPS | 141 GB | 4.0 TB/s | 60 MB | 900 GB/s | 128 GB/s | Hopper |
| L40S | 362 TFLOPS | 48 GB | 846 GB/s | 96 MB | / | 64 GB/s | Ada Lovelace |
| RTX 4090 | 330 TFLOPS | 24 GB | 1.0 TB/s | 72 MB | / | 64 GB/s | Ada Lovelace |
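
A quick way to read the FLOPS and bandwidth columns together is the roofline break-even arithmetic intensity (FLOPs per byte of HBM traffic). The sketch below is illustrative Python, not code from this repo; the numbers come from the table above.

```python
# Minimal sketch: roofline break-even arithmetic intensity from the table above.
# Dense FP16 FLOPS and HBM bandwidth are taken from the table; the helper name
# and dictionary layout are illustrative, not part of any library.

SPECS = {
    # name: (dense FP16 FLOPS, HBM bandwidth in bytes/s)
    "H100": (985e12, 3.35e12),
    "A100": (312e12, 2.0e12),
    "H20":  (148e12, 4.0e12),
}

def breakeven_intensity(flops: float, bandwidth: float) -> float:
    """FLOPs per byte needed to be compute-bound (roofline ridge point)."""
    return flops / bandwidth

for name, (flops, bw) in SPECS.items():
    print(f"{name}: {breakeven_intensity(flops, bw):.0f} FLOPs/byte")

# H100 ~294, A100 ~156, H20 ~37 FLOPs/byte. Kernels with lower arithmetic
# intensity (e.g., decode-phase attention) are memory-bandwidth-bound.
```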

ASIC

FPGA

PIM/NDP

AI Training Optimization

MoE training

Finetune

Parallelism training

TP

PP

DP

SP/CP

EP

AI Inference optimization

KV cache optimization

Quantization

Pruning

Decomposition

Distilling

Sparse

Fusion

Heterogeneous Speculative Decoding

Overlapping

Communication & compute: overlapping tensor-parallel communication with computation

MoE: overlapping all-to-all communication with compute, and MoE inference systems

MoE route

Offloading

Hybrid Batches

Parameter Sharing

LoRA

Attention optimization

Self-attention in a vanilla Transformer has quadratic complexity in sequence length, and a large body of work aims to bring attention down to sub-quadratic or linear cost. Common directions include:

1. Efficient attention algorithms: faster, memory-efficient attention implementations (see the NumPy sketch after this list)

2. Attention head pruning: removing redundant attention heads

3. Approximate attention

4. Next-generation architectures

5. Code optimizations: operator-level rewriting and kernel-level optimization
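
As a concrete illustration of direction 1, below is a minimal NumPy sketch (hypothetical function name, not a library API) of block-wise attention with an online softmax, the core idea behind memory-efficient and Flash-style attention. It avoids materializing the full score matrix, cutting activation memory from O(n·m) to O(n·block); the quadratic compute itself is unchanged.

```python
import numpy as np

def tiled_attention(q, k, v, block=128):
    """Online-softmax attention over key/value blocks.

    q: (n, d), k: (m, d), v: (m, dv). Computes softmax(q @ k.T / sqrt(d)) @ v
    without ever materializing the full (n, m) score matrix.
    """
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[1]))
    row_max = np.full(n, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(n)           # running softmax denominator

    for start in range(0, k.shape[0], block):
        kb = k[start:start + block]            # (b, d)
        vb = v[start:start + block]            # (b, dv)
        scores = (q @ kb.T) * scale            # (n, b) block of scores

        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)        # rescale old statistics
        p = np.exp(scores - new_max[:, None])         # (n, b) unnormalized probs

        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max

    return out / row_sum[:, None]
```

For the same inputs this matches the dense reference `softmax(q @ k.T / sqrt(d)) @ v`; production kernels fuse these loops on-chip (SRAM tiles) rather than looping in NumPy.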

MHA2MLA

Prefill/Decode Disaggregation

Prefill optimization

DeepSeek Open Day

https://github.com/deepseek-ai/open-infra-index?tab=readme-ov-file#day-6---one-more-thing-deepseek-v3r1-inference-system-overview

AI Inference & Serving Frameworks

AI Compiler

AI Infrastructure
