
[DeepSeek][Kernels] MoE sorting - Scatter Gather kernels #1065


Open
wants to merge 10 commits into
base: main
from
start sorting kernels
lessw2020 committed Apr 5, 2025
commit 30d215b60d2b4271318b1ecf78ea6bd611280aef
@@ -0,0 +1,34 @@
// Copyright (c) Meta Platforms, Inc. and affiliates.
// All rights reserved.
//
// This source code is licensed under the BSD 3-Clause license found in the
// LICENSE file in the root directory of this source tree.

/*
* Token sorting kernels
* sequential and parallel scans
*/

#include <cuda.h>
#include <cuda_runtime.h>
#include <torch/extension.h>
#include <vector>

#include "moe_kernel_utils.h"

// our utility namespace
using namespace moe_kernel_utils;

//
// kernels for sorting tokens by expert assignment
//

__global__ void sort_tokens_by_expert_kernel(

)

// gather kernel - move tokens to sorted indices
template <typename scalar_t>
__global__ void gather_sorted_tokens_kernel(
scalar_t *sorted_tokens, // output: sorted token features
)
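
The kernel signatures in this commit are still stubs, so the following is only a host-side reference sketch, not the PR's implementation. It assumes the sort is a stable counting sort over expert ids (histogram, sequential scan, scatter) followed by a row gather. Every name in it (`sort_tokens_by_expert_ref`, `gather_sorted_tokens_ref`, `expert_ids`, `hidden_dim`) is hypothetical and chosen for illustration; the actual kernel parameters are not shown in the diff.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (assumption, not from the PR):
// stable counting sort of token indices by expert id.
// expert_ids[t] is the expert assigned to token t, in [0, num_experts).
// Returns sorted_indices, where sorted_indices[p] is the original index
// of the token that lands at sorted position p.
std::vector<int> sort_tokens_by_expert_ref(const std::vector<int>& expert_ids,
                                           int num_experts) {
  // Step 1: histogram of tokens per expert, shifted by one slot.
  std::vector<int> offsets(num_experts + 1, 0);
  for (int e : expert_ids) {
    offsets[e + 1]++;
  }
  // Step 2: sequential scan, so offsets[e] is the first slot of expert e.
  for (int e = 0; e < num_experts; ++e) {
    offsets[e + 1] += offsets[e];
  }
  // Step 3: scatter token indices into their expert's segment, in order.
  std::vector<int> cursor(offsets.begin(), offsets.end() - 1);
  std::vector<int> sorted_indices(expert_ids.size());
  for (std::size_t t = 0; t < expert_ids.size(); ++t) {
    sorted_indices[cursor[expert_ids[t]]++] = static_cast<int>(t);
  }
  return sorted_indices;
}

// Hypothetical CPU reference for the gather step: copy each token's
// feature row (hidden_dim values, row-major) to its sorted position.
template <typename scalar_t>
std::vector<scalar_t> gather_sorted_tokens_ref(
    const std::vector<scalar_t>& tokens,
    const std::vector<int>& sorted_indices,
    int hidden_dim) {
  std::vector<scalar_t> sorted_tokens(sorted_indices.size() * hidden_dim);
  for (std::size_t p = 0; p < sorted_indices.size(); ++p) {
    const std::size_t src = static_cast<std::size_t>(sorted_indices[p]) * hidden_dim;
    const std::size_t dst = p * hidden_dim;
    for (int d = 0; d < hidden_dim; ++d) {
      sorted_tokens[dst + d] = tokens[src + d];
    }
  }
  return sorted_tokens;
}
```

A reference like this could serve as a correctness check for the GPU kernels once their signatures are filled in: run both on the same small input and compare the sorted indices and gathered rows. The sequential scan in step 2 is the part a parallel scan would replace on the device.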