Insights: triton-inference-server/server
Overview
1 Release published by 1 person
- v2.57.0: Release 2.57.0 corresponding to NGC container 25.04 (published May 12, 2025)
25 Pull requests merged by 8 people
- R25.04 compatibility (#8201) (#8206, merged May 14, 2025)
- R25.04 compatibility (#8201, merged May 12, 2025)
- Update README.md for r25.05 release (#8200, merged May 9, 2025)
- proper 25.05 version (#8198, merged May 9, 2025)
- TPRD-1509: Update OpenVINO version to 2025.1.0 (#8196, merged May 8, 2025)
- boost vLLM to 0.8.5 (#8193, merged May 8, 2025)
- Fix: Update element count handling (#8182, merged May 7, 2025)
- Update default branch post-25.04 (#8188, merged May 7, 2025)
- test: Add backend_api_test to test backend APIs (#8185, merged May 7, 2025)
- fix: Add HTTP JSON parsing recursion depth limit (#8172, merged May 2, 2025; see the sketch after this list)
- Fix: Update handling of shared memory integer values (#8170, merged May 2, 2025)
- build: Update ARG in Dockerfile.sdk (#8179, merged Apr 30, 2025)
- build: Integrate use of PA and GAP assets if available (#8155, merged Apr 29, 2025)
- Add additional output to the build process (#8175, merged Apr 29, 2025)
- feat: Add graceful shutdown timer to GRPC frontend (#7969, merged Apr 26, 2025; see the sketch after this list)
- Upgrade vLLM version to 0.8.1 (#8168, merged Apr 24, 2025)
- test: Input batch size overflow vulnerability (#8165, merged Apr 24, 2025)
- Fixes for 25.04 release L0_grpc_* and L0_http_* tests (#8152) (#8164, merged Apr 22, 2025)
- Fixes for 25.04 release L0_grpc_* and L0_http_* tests (#8152, merged Apr 22, 2025)
- fix: Fix the bug where the OpenAI API frontend was not able to start (#8163, merged Apr 21, 2025)
- TPRD-1425: Exclude Triton Model Analyzer from the build (#8159, merged Apr 21, 2025)
- test: Add config parameter "execution_context_allocation_strategy" to TensorRT backend (#8150, merged Apr 18, 2025)
- ci: Remove unsupported PA tests from server repo and CI (#8158, merged Apr 18, 2025)
- feat: Add tool calling to the OpenAI frontend (#8134, merged Apr 17, 2025; see the sketch after this list)
- 25.04 release: bump versions (#8149, merged Apr 16, 2025)
5 Pull requests opened by 5 people
- draft: Update handling of large array sizes (#8174, opened Apr 28, 2025)
- fix: Fix L0_backend_python env test (#8178, opened Apr 29, 2025)
- build: Upgrade vLLM version to 0.8.5.post1 (#8190, opened May 7, 2025)
- test: L0_orca_trtllm fixed (#8191, opened May 7, 2025)
- build: Convert vLLM index URL ARGs into Docker secrets (#8197, opened May 9, 2025)
6 Issues closed by 5 people
- Python backend with multiple instances causes unexpected and non-deterministic results (#7907, closed May 14, 2025)
- triton_python_backend_stub: Permission denied (#8162, closed Apr 22, 2025)
- Error using the nvcr.io/nvidia/tritonserver:25.03-vllm-python-py3 image (#8145, closed Apr 21, 2025)
- How to make an ensemble model's pipeline move data on only one GPU when using multiple GPUs? (#7001, closed Apr 18, 2025)
- About ensemble models with multiple GPUs (#6981, closed Apr 18, 2025)
- Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) (#7795, closed Apr 15, 2025)
25 Issues opened by 23 people
- Multimodal support for OpenAI-compatible frontend (#8207, opened May 15, 2025)
- Where can I find tutorials on implementing streaming output in the Python backend? (#8205, opened May 14, 2025; see the sketch after this list)
- TensorFlow Model Not Loading After Successful Backend Build in Triton (#8204, opened May 13, 2025)
- Dynamic Batching Configuration Issue with Triton vLLM Backend (#8203, opened May 12, 2025)
- Cannot build Docker image (#8202, opened May 12, 2025)
- How to get token usage from the OpenAI frontend? (#8194, opened May 8, 2025; see the sketch after this list)
- 25.01 vLLM tritonserver panic in TRITONBACKEND_ResponseFactoryIsCancelled (#8192, opened May 8, 2025)
- Docker Image Security Report (#8187, opened May 6, 2025)
- Support deterministic algorithm configuration in the PyTorch backend (#8186, opened May 5, 2025)
- Model instance placement on GPUs seems incorrect (#8184, opened May 5, 2025)
- GPU instances not supported on Jetson Orin AGX 64GB with JetPack 6.2 (#8183, opened May 3, 2025)
- Add LoRA metrics compatible with gateway-api-inference-extension (#8181, opened May 1, 2025)
- Bug in the inception_onnx example model (#8180, opened May 1, 2025)
- How to set cuda-memory-pool-byte-size and handle running out of this memory (#8177, opened Apr 29, 2025)
- CLIP model will not load on a CPU-only PyTorch build (#8176, opened Apr 28, 2025)
- Why is throughput so high when there is only one instance? (#8173, opened Apr 28, 2025)
- TensorRT backend support for multiple optimization profiles (#8171, opened Apr 27, 2025)
- Expose IsLastResponse of InferResponse to the Python API (#8169, opened Apr 24, 2025)
- Reschedule requests with START and END flags in iterative sequence batching mode (#8167, opened Apr 24, 2025)
- Question about scheduling and load distribution (#8166, opened Apr 24, 2025)
- How can I use Triton core/src/filesystem? (#8160, opened Apr 20, 2025)
- Which document should I refer to for implementing streaming output when calling the OpenAI API? (#8157, opened Apr 18, 2025)
- Feature Request: Support for Dynamic Batching with Variable-Length Inputs in Audio Processing (#8156, opened Apr 18, 2025)
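For #8205: streaming multiple responses per request from the Python backend is done with the decoupled transaction policy. A minimal sketch of that documented pattern; the model structure is real Python-backend API, but the tensor name and streamed content are placeholders:

```python
# Sketch of streaming (decoupled) responses from a Triton Python backend model.
# Requires `model_transaction_policy { decoupled: true }` in config.pbtxt.
# The "TEXT" tensor name and chunk values are placeholders.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # Stream several partial responses for a single request.
            for chunk in ["Hello", ", ", "world"]:
                out = pb_utils.Tensor(
                    "TEXT", np.array([chunk.encode()], dtype=np.object_)
                )
                sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            # Signal that no more responses will follow for this request.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        # Decoupled models return None; responses go through the sender.
        return None
```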
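For #8194: OpenAI-style chat completions normally report token counts in a `usage` object. A hedged example of reading it, assuming the frontend address and model name below; whether Triton's OpenAI frontend populates the field is exactly what the issue asks:

```python
# Reading token usage from an OpenAI-style chat-completions response.
# The base URL and model name are assumptions for illustration.
import requests

resp = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={
        "model": "llama-3.1-8b-instruct",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hi"}],
    },
).json()

usage = resp.get("usage") or {}  # may be absent if the frontend omits it
print(
    usage.get("prompt_tokens"),
    usage.get("completion_tokens"),
    usage.get("total_tokens"),
)
```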
26 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- feat: Adding multiple tokenizers specification for OpenAI frontend (#8027, commented on Apr 17, 2025 • 1 new comment)
- Build: Build using the PA binaries and whl if available (#8043, commented on Apr 17, 2025 • 0 new comments)
- No valid engine configs for ConvFwd_ (#8137, commented on May 9, 2025 • 0 new comments)
- vllm_backend: What is the right way to use a downloaded model + `model.json` together? (#7912, commented on May 7, 2025 • 0 new comments)
- Got runtime error `0 active drivers ([]). There should only be one.` when using PipelineModule through Ray and DeepSpeed (#8007, commented on May 7, 2025 • 0 new comments)
- Is it possible to make gRPC use a Unix socket instead of TCP in Triton Server? (#4095, commented on May 5, 2025 • 0 new comments; see the sketch after this list)
- Direct Streaming of Model Weights from Cloud Storage to GPU Memory (#7660, commented on May 4, 2025 • 0 new comments)
- Error: is an ensemble of tensorrt + python_be + tensorrt supported on Jetson? (#7667, commented on May 2, 2025 • 0 new comments)
- Add model warmup functionality for ensemble models (#6877, commented on May 1, 2025 • 0 new comments)
- Can't build r25.01 (r24.12 builds okay) on `ubuntu-22.04` (unclear build errors); also can't build `r24.12` on `ubuntu-24.04` (C++ errors) (#7997, commented on Apr 30, 2025 • 0 new comments)
- build.py broken in r24.11 (#7939, commented on Apr 30, 2025 • 0 new comments)
- build.py fails during onnxruntime backend installation (#8126, commented on Apr 30, 2025 • 0 new comments)
- Unstable memory after upgrading from Triton 2.27 / NVIDIA 22.10 (#6051, commented on Apr 29, 2025 • 0 new comments)
- Allow an explicit folder name when specifying where a remote model repository will be downloaded (#6644, commented on Apr 28, 2025 • 0 new comments)
- Python backend SHM memory leak (#7481, commented on Apr 27, 2025 • 0 new comments)
- Unexpected throughput results: increasing instance group count vs. deploying the count distributed on the same card using shared computing windows (#7956, commented on Apr 24, 2025 • 0 new comments)
- How to maximize single-model inference performance (#7706, commented on Apr 23, 2025 • 0 new comments)
- GPU scaling issue: multi-GPU inference (#7385, commented on Apr 23, 2025 • 0 new comments)
- Triton Server crash with Signal (11) with async BLS (#6720, commented on Apr 22, 2025 • 0 new comments)
- Support for vLLM and TRT-LLM running in OpenAI-compatible mode (#6583, commented on Apr 22, 2025 • 0 new comments)
- Unable to build Triton Core from source on Windows 10 (#7416, commented on Apr 22, 2025 • 0 new comments)
- /v2/health/ready endpoint does not work as expected (#7588, commented on Apr 22, 2025 • 0 new comments; see the sketch after this list)
- Segmentation fault (core dumped) - server version 2.46.0 (#7330, commented on Apr 21, 2025 • 0 new comments)
- libboost_filesystem.so.1.80.0 on JetPack 5.1.2 (#6844, commented on Apr 19, 2025 • 0 new comments)
- How can I release the GPU memory used by triton_python_backend_stub when using the Python backend? (#8102, commented on Apr 18, 2025 • 0 new comments)
- Ensemble multi-GPU (#7794, commented on Apr 16, 2025 • 0 new comments)
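Regarding #4095: gRPC itself accepts `unix:` target strings, so the client side of the pattern looks like the sketch below. Whether Triton's gRPC frontend can listen on a Unix socket is precisely what the issue asks, so the socket path here is purely an assumption:

```python
# Sketch of a gRPC client over a Unix domain socket using grpcio.
import grpc

# gRPC supports `unix:` addresses; the path is a hypothetical example.
channel = grpc.insecure_channel("unix:///tmp/triton_grpc.sock")
try:
    # Block until the channel connects (raises if nothing is listening).
    grpc.channel_ready_future(channel).result(timeout=5)
    print("connected over the Unix socket")
finally:
    channel.close()
```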
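Regarding #7588: the KServe-v2 readiness endpoint returns HTTP 200 when the server is ready to serve. A quick probe, assuming Triton's default HTTP port 8000 on localhost:

```python
# Probe Triton's KServe-v2 readiness endpoint (default HTTP port 8000).
import requests

try:
    resp = requests.get("http://localhost:8000/v2/health/ready", timeout=2)
    # 200 means ready; any other status means the server is not ready.
    print("ready" if resp.status_code == 200 else f"not ready: HTTP {resp.status_code}")
except requests.ConnectionError:
    print("server unreachable")
```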