Comparing changes

base repository: kvcache-ai/ktransformers
base: v0.3.1
head repository: kvcache-ai/ktransformers
compare: main
  • 12 commits
  • 14 files changed
  • 9 contributors

Commits on May 18, 2025

  1. VLinearMarlin: padding to input.shape[0] to avoid CUDA error

    Fix the following runtime error with --no-use_cuda_graph option
    
    Traceback (most recent call last):
      File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
        self.run()
      File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 282, in run_engine
        engine.loop()
      File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 234, in loop
        self.model_runner.run(self.batch, self.query_manager)
      File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/balance_serve/inference/model_runner.py", line 220, in run
        self.output.logits[0] = self.output.logits[0][self.input[cuda_graph_idx].minibatch.logits_start]
                                ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    RuntimeError: CUDA error: an illegal memory access was encountered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    aubreyli committed May 18, 2025
    d347aeb
  2. Merge pull request #1320 from aubreyli/no_cuda_graph_err

    VLinearMarlin: padding to input.shape[0] to avoid CUDA error
    Atream authored May 18, 2025
    01311d2
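
The traceback in commit d347aeb ends with PyTorch's note that CUDA kernel errors can be reported asynchronously, so the Python line shown may not be the one that actually faulted. When reproducing an error like this, a common approach is to set the `CUDA_LAUNCH_BLOCKING=1` environment variable so that kernel launches run synchronously and the stack trace points closer to the failing call. A minimal sketch follows; the helper name `sync_cuda_env` is hypothetical and not part of ktransformers, and the command used to start the server is left to the reader.

```python
import os


def sync_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    `sync_cuda_env` is a hypothetical helper used for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous.
    # Use it for debugging only, since it slows execution.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    env = sync_cuda_env()
    # Pass `env` to subprocess.run(...) together with your usual launch command.
    print(env["CUDA_LAUNCH_BLOCKING"])
```

The variable has to be set before the process initializes CUDA, which is why the sketch builds an environment for a child process rather than changing it after startup.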

Commits on May 19, 2025

  1. Update version

    Atream authored May 19, 2025
    4f78e37
  2. Merge pull request #1323 from kvcache-ai/Atream-patch-2

    Update version
    Atream authored May 19, 2025
    7d79735

Commits on May 21, 2025

  1. Fix NaN bug

    chenht2022 committed May 21, 2025
    6645398
  2. Merge pull request #1328 from chenht2022/main

    Fix NaN bug
    chenht2022 authored May 21, 2025
    2589336

Commits on May 22, 2025

  1. adc0906

Commits on May 23, 2025

  1. 71a5fc5
  2. Merge pull request #1331 from rnwang04/qwen3_xpu_support

    add XPU support for qwen3moe local chat
    qiyuxinlin authored May 23, 2025
    0c44f2e

Commits on May 28, 2025

  1. docs: add Dockerfile.xpu and GPU driver setup instructions

    - Add Dockerfile.xpu for oneAPI-based container
    - Create Docker_xpu.md with usage instructions
    - Update xpu.md to include Docker guide
    liu-shaojun committed May 28, 2025
    404ad39
  2. Merge pull request #1337 from liu-shaojun/docker_xpu

    Add Dockerfile and usage guide for XPU support
    aubreyli authored May 28, 2025
    ce75fcd

Commits on May 29, 2025

  1. raise exception on device error (#1342)

    * display the unavailable torch device on error
    
    * Raise exception on device error
    
    ---------
    
    Signed-off-by: Emmanuel Ferdman <[email protected]>
    emmanuel-ferdman authored May 29, 2025
    d8bc640
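
Commit d8bc640 describes two behaviours: showing the unavailable torch device in the error, and raising an exception instead of continuing. The diff itself is not shown here, so the following is only a sketch of that pattern under assumptions: `require_device` and its `available` argument are hypothetical names, and the actual check in ktransformers may look different.

```python
def require_device(requested, available):
    """Return `requested` if it is usable, otherwise raise an error naming it.

    Hypothetical helper: in practice `available` might be built from checks
    such as torch.cuda.is_available().
    """
    if requested not in available:
        # Name the offending device so the failure is actionable.
        raise RuntimeError(
            f"torch device '{requested}' is not available; "
            f"available devices: {sorted(available)}"
        )
    return requested
```

Raising at the point of the check, rather than logging and continuing, surfaces a misconfigured device immediately instead of as a later, less obvious failure.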