Insights: triton-inference-server/server
Overview
1 Release published by 1 person
- v2.57.0: Release 2.57.0 corresponding to NGC container 25.04 (published May 12, 2025)
25 Pull requests merged by 8 people
- R25.04 compatibility (#8201) (#8206, merged May 14, 2025)
- R25.04 compatibility (#8201, merged May 12, 2025)
- Update README.md for r25.05 release (#8200, merged May 9, 2025)
- proper 25.05 version (#8198, merged May 9, 2025)
- TPRD-1509: Update OpenVINO version to 2025.1.0 (#8196, merged May 8, 2025)
- boost vLLM to 0.8.5 (#8193, merged May 8, 2025)
- Fix: Update element count handling (#8182, merged May 7, 2025)
- Update default branch post-25.04 (#8188, merged May 7, 2025)
- test: Add backend_api_test to test backend APIs (#8185, merged May 7, 2025)
- fix: Add HTTP JSON parsing recursion depth limit (#8172, merged May 2, 2025; see the sketch after this list)
- Fix: Update handling of shared memory integer values (#8170, merged May 2, 2025)
- build: Update ARG in Dockerfile.sdk (#8179, merged Apr 30, 2025)
- build: Integrate use of PA and GAP assets if available (#8155, merged Apr 29, 2025)
- Add additional output to the build process (#8175, merged Apr 29, 2025)
- feat: Add graceful shutdown timer to GRPC frontend (#7969, merged Apr 26, 2025; see the sketch after this list)
- Upgrade vLLM version to 0.8.1 (#8168, merged Apr 24, 2025)
- test: Input batch size overflow vulnerability (#8165, merged Apr 24, 2025)
- Fixes for 25.04 release L0_grpc_* and L0_http_* tests (#8152) (#8164, merged Apr 22, 2025)
- Fixes for 25.04 release L0_grpc_* and L0_http_* tests (#8152, merged Apr 22, 2025)
- fix: Fix the bug where the OpenAI API frontend was not able to start (#8163, merged Apr 21, 2025)
- TPRD-1425: Exclude Triton Model Analyzer from the build (#8159, merged Apr 21, 2025)
- test: Add config parameter "execution_context_allocation_strategy" to TensorRT backend (#8150, merged Apr 18, 2025)
- ci: Remove unsupported PA tests from server repo and CI (#8158, merged Apr 18, 2025)
- feat: Add tool calling to the OpenAI frontend (#8134, merged Apr 17, 2025; see the sketch after this list)
- 25.04 release: bump versions (#8149, merged Apr 16, 2025)
5 Pull requests opened by 5 people
- draft: Update handling of large array sizes (#8174, opened Apr 28, 2025)
- fix: Fix L0_backend_python env test (#8178, opened Apr 29, 2025)
- build: Upgrade vLLM version to 0.8.5.post1 (#8190, opened May 7, 2025)
- test: L0_orca_trtllm fixed (#8191, opened May 7, 2025)
- build: Convert vLLM index URL ARGs into Docker secrets (#8197, opened May 9, 2025)
6 Issues closed by 5 people
- Python backend with multiple instances causes unexpected and non-deterministic results (#7907, closed May 14, 2025)
- triton_python_backend_stub: Permission denied (#8162, closed Apr 22, 2025)
- Error using the nvcr.io/nvidia/tritonserver:25.03-vllm-python-py3 image (#8145, closed Apr 21, 2025)
- How to make an ensemble model's pipeline move data on only one GPU when using multiple GPUs? (#7001, closed Apr 18, 2025)
- About ensemble models with multiple GPUs (#6981, closed Apr 18, 2025)
- Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) (#7795, closed Apr 15, 2025)
25 Issues opened by 23 people
- Multimodal support for OpenAI-compatible frontend (#8207, opened May 15, 2025)
- Where can I find tutorials on implementing streaming output in the Python backend? (#8205, opened May 14, 2025; see the sketch after this list)
- TensorFlow Model Not Loading After Successful Backend Build in Triton (#8204, opened May 13, 2025)
- Dynamic Batching Configuration Issue with Triton vLLM Backend (#8203, opened May 12, 2025)
- Cannot build Docker image (#8202, opened May 12, 2025)
- How to get token usage from the OpenAI frontend? (#8194, opened May 8, 2025; see the sketch after this list)
- 25.01 vLLM tritonserver panic in TRITONBACKEND_ResponseFactoryIsCancelled (#8192, opened May 8, 2025)
- Docker Image Security Report (#8187, opened May 6, 2025)
- Support deterministic algorithm configuration in the PyTorch backend (#8186, opened May 5, 2025)
- Model instance placement on GPUs seems incorrect (#8184, opened May 5, 2025)
- GPU instances not supported on Jetson Orin AGX 64GB with JetPack 6.2 (#8183, opened May 3, 2025)
- Add LoRA metrics compatible with gateway-api-inference-extension (#8181, opened May 1, 2025)
- Bug in the inception_onnx example model (#8180, opened May 1, 2025)
- How to set cuda-memory-pool-byte-size and handle running out of this memory (#8177, opened Apr 29, 2025)
- CLIP model will not load on a CPU-only PyTorch build (#8176, opened Apr 28, 2025)
- Why is throughput so high when there is only one instance? (#8173, opened Apr 28, 2025)
- TensorRT backend support for multiple optimization profiles (#8171, opened Apr 27, 2025)
- Expose IsLastResponse of InferResponse to the Python API (#8169, opened Apr 24, 2025)
- Reschedule requests with START and END flags in iterative sequence batching mode (#8167, opened Apr 24, 2025)
- Question about scheduling and load distribution (#8166, opened Apr 24, 2025)
- How can I use Triton core/src/filesystem? (#8160, opened Apr 20, 2025)
- Which document should I refer to for implementing streaming output when calling the OpenAI API? (#8157, opened Apr 18, 2025)
- Feature Request: Support for Dynamic Batching with Variable-Length Inputs in Audio Processing (#8156, opened Apr 18, 2025)
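For #8205: streaming multiple responses per request from the Python backend is done with the decoupled transaction policy. A minimal sketch of that documented pattern; the model structure is real Python-backend API, but the tensor name and streamed content are placeholders:

```python
# Sketch of streaming (decoupled) responses from a Triton Python backend model.
# Requires `model_transaction_policy { decoupled: true }` in config.pbtxt.
# The "TEXT" tensor name and chunk values are placeholders.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # Stream several partial responses for a single request.
            for chunk in ["Hello", ", ", "world"]:
                out = pb_utils.Tensor(
                    "TEXT", np.array([chunk.encode()], dtype=np.object_)
                )
                sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            # Signal that no more responses will follow for this request.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        # Decoupled models return None; responses go through the sender.
        return None
```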
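For #8194: OpenAI-style chat completions normally report token counts in a `usage` object. A hedged example of reading it, assuming the frontend address and model name below; whether Triton's OpenAI frontend populates the field is exactly what the issue asks:

```python
# Reading token usage from an OpenAI-style chat-completions response.
# The base URL and model name are assumptions for illustration.
import requests

resp = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={
        "model": "llama-3.1-8b-instruct",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hi"}],
    },
).json()

usage = resp.get("usage") or {}  # may be absent if the frontend omits it
print(
    usage.get("prompt_tokens"),
    usage.get("completion_tokens"),
    usage.get("total_tokens"),
)
```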
26 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- feat: Adding multiple tokenizers specification for OpenAI frontend (#8027, commented on Apr 17, 2025 • 1 new comment)
- Build: Build using the PA binaries and whl if available (#8043, commented on Apr 17, 2025 • 0 new comments)
- No valid engine configs for ConvFwd_ (#8137, commented on May 9, 2025 • 0 new comments)
- vllm_backend: What is the right way to use a downloaded model + `model.json` together? (#7912, commented on May 7, 2025 • 0 new comments)
- Got runtime error `0 active drivers ([]). There should only be one.` when using PipelineModule through Ray and DeepSpeed (#8007, commented on May 7, 2025 • 0 new comments)
- Is it possible to make gRPC use a Unix socket instead of TCP in Triton Server? (#4095, commented on May 5, 2025 • 0 new comments; see the sketch after this list)
- Direct Streaming of Model Weights from Cloud Storage to GPU Memory (#7660, commented on May 4, 2025 • 0 new comments)
- Error: is an ensemble of tensorrt + python_be + tensorrt supported on Jetson? (#7667, commented on May 2, 2025 • 0 new comments)
- Add model warmup functionality for ensemble models (#6877, commented on May 1, 2025 • 0 new comments)
- Can't build r25.01 (r24.12 builds okay) on `ubuntu-22.04` (unclear build errors); also can't build `r24.12` on `ubuntu-24.04` (C++ errors) (#7997, commented on Apr 30, 2025 • 0 new comments)
- build.py broken in r24.11 (#7939, commented on Apr 30, 2025 • 0 new comments)
- build.py fails during onnxruntime backend installation (#8126, commented on Apr 30, 2025 • 0 new comments)
- Unstable memory after upgrading from Triton 2.27 / NVIDIA 22.10 (#6051, commented on Apr 29, 2025 • 0 new comments)
- Allow an explicit folder name when specifying where a remote model repository will be downloaded (#6644, commented on Apr 28, 2025 • 0 new comments)
- Python backend SHM memory leak (#7481, commented on Apr 27, 2025 • 0 new comments)
- Unexpected throughput results: increasing instance group count vs. deploying the count distributed on the same card using shared computing windows (#7956, commented on Apr 24, 2025 • 0 new comments)
- How to maximize single-model inference performance (#7706, commented on Apr 23, 2025 • 0 new comments)
- GPU scaling issue: multi-GPU inference (#7385, commented on Apr 23, 2025 • 0 new comments)
- Triton Server crash with Signal (11) with async BLS (#6720, commented on Apr 22, 2025 • 0 new comments)
- Support for vLLM and TRT-LLM running in OpenAI-compatible mode (#6583, commented on Apr 22, 2025 • 0 new comments)
- Unable to build Triton Core from source on Windows 10 (#7416, commented on Apr 22, 2025 • 0 new comments)
- /v2/health/ready endpoint does not work as expected (#7588, commented on Apr 22, 2025 • 0 new comments; see the sketch after this list)
- Segmentation fault (core dumped) - server version 2.46.0 (#7330, commented on Apr 21, 2025 • 0 new comments)
- libboost_filesystem.so.1.80.0 on JetPack 5.1.2 (#6844, commented on Apr 19, 2025 • 0 new comments)
- How can I release the GPU memory used by triton_python_backend_stub when using the Python backend? (#8102, commented on Apr 18, 2025 • 0 new comments)
- Ensemble multi-GPU (#7794, commented on Apr 16, 2025 • 0 new comments)
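Regarding #4095: gRPC itself accepts `unix:` target strings, so the client side of the pattern looks like the sketch below. Whether Triton's gRPC frontend can listen on a Unix socket is precisely what the issue asks, so the socket path here is purely an assumption:

```python
# Sketch of a gRPC client over a Unix domain socket using grpcio.
import grpc

# gRPC supports `unix:` addresses; the path is a hypothetical example.
channel = grpc.insecure_channel("unix:///tmp/triton_grpc.sock")
try:
    # Block until the channel connects (raises if nothing is listening).
    grpc.channel_ready_future(channel).result(timeout=5)
    print("connected over the Unix socket")
finally:
    channel.close()
```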
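Regarding #7588: the KServe-v2 readiness endpoint returns HTTP 200 when the server is ready to serve. A quick probe, assuming Triton's default HTTP port 8000 on localhost:

```python
# Probe Triton's KServe-v2 readiness endpoint (default HTTP port 8000).
import requests

try:
    resp = requests.get("http://localhost:8000/v2/health/ready", timeout=2)
    # 200 means ready; any other status means the server is not ready.
    print("ready" if resp.status_code == 200 else f"not ready: HTTP {resp.status_code}")
except requests.ConnectionError:
    print("server unreachable")
```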