Draft
69 commits
245f391
graph : reuse hybrid graphs
ggerganov Oct 9, 2025
638e2c2
graph : reuse recurrent graphs
ggerganov Oct 9, 2025
0b9c1ae
metal : fix mul-mm condition + fix mul-mv permuted kernels
ggerganov Oct 9, 2025
1f02d93
graph : fix reuse check for recurrent inputs
ggerganov Oct 10, 2025
00f115f
memory : move the recurrent state into the memory context
ggerganov Oct 10, 2025
2744d61
Revert "memory : move the recurrent state into the memory context"
ggerganov Oct 10, 2025
ab3f3fe
Merge branch 'gg/metal-mul-mat-fixes' into gg/graph-mamba-reuse
gabe-l-hart Oct 10, 2025
8c23c43
Added: tri, cumsum. Still a mess.
gabe-l-hart Oct 10, 2025
2a2e79c
feat(tests): Add --verbose | -v flag to test-backend-ops to print ten…
gabe-l-hart Oct 10, 2025
092f740
test: Add cumsum tests to test-backend-ops
gabe-l-hart Oct 10, 2025
6949ce7
feat(ggml-cpu): Add cumsum support for f16 and bf16
gabe-l-hart Oct 10, 2025
f8fba60
feat(ggml-cpu): Add F16 and BF16 support for tri
gabe-l-hart Oct 13, 2025
058160a
test: Add test cases for tri
gabe-l-hart Oct 13, 2025
86ce3da
chore: TODOs to loosen assertions in tri for ggml_is_contiguous
gabe-l-hart Oct 13, 2025
3a8958f
feat(ggml-metal): Initial (slow) implementation of cumsum for metal
gabe-l-hart Oct 13, 2025
cbaed86
feat(ggml-metal): Add stubs for metal tri
gabe-l-hart Oct 13, 2025
e596469
test: Use looser nmse for lower-precision types for cumsum
gabe-l-hart Oct 13, 2025
3011a6e
Merge remote-tracking branch 'origin/master' into Mamba2SSD
gabe-l-hart Oct 13, 2025
112d339
test: Allow multiple verbose flags to fully print tensors
gabe-l-hart Oct 15, 2025
78e137f
feat(llama-gguf): Print out the tensor type in llama-gguf
gabe-l-hart Sep 26, 2025
e5587cb
feat(ggml-metal): Efficient implementation of cumsum for metal
gabe-l-hart Oct 15, 2025
0468b99
test: More verbose printing and better cumsum tests
gabe-l-hart Oct 15, 2025
c71e35e
fix(ggml-metal): better granularity for support bool for CUMSUM and TRI
gabe-l-hart Oct 15, 2025
5f0d2a1
feat(ggml-metal): Metal impl of tri
gabe-l-hart Oct 15, 2025
426580d
Merge remote-tracking branch 'origin/master' into Mamba2SSD
gabe-l-hart Oct 15, 2025
ba3b8db
fix(ggml-cpu): Fix warnings from build with gcc
gabe-l-hart Oct 15, 2025
dfae909
feat(ggml-cuda): common implementation of prefix sum
gabe-l-hart Oct 16, 2025
d1f8658
feat(ggml-cuda): CUDA implementation of CUMSUM
gabe-l-hart Oct 16, 2025
5071fbd
feat(ggml-cuda): CUDA implementation of TRI
gabe-l-hart Oct 16, 2025
be23a29
test: Add test-backend-ops perf tests for ssm conv and scan
gabe-l-hart Sep 25, 2025
71e2289
feat(ggml-cpu): Rename ggml_softplus to ggml_op_softplus to make room…
gabe-l-hart Oct 17, 2025
f6d60e3
feat(ggml-cpu): Add ggml_softplus tensor op for CPU
gabe-l-hart Oct 17, 2025
778e835
test: Better verbosity output for inputs in test-backend-ops
gabe-l-hart Oct 17, 2025
4228002
feat(ggml-metal): Add ggml_softplus support for metal
gabe-l-hart Oct 17, 2025
97bd17d
feat(ggml-cuda): Add support for ggml_softplus
gabe-l-hart Oct 17, 2025
ffd88ff
style: comments on ggml tri types
gabe-l-hart Oct 20, 2025
7409d9e
WIP(llama-model): Partial work on graph-based SSD implementation
gabe-l-hart Oct 20, 2025
ba74006
TEMP: Increase the max graph nodes to handle all the nodes for SSD
gabe-l-hart Oct 21, 2025
29b30c6
WIP: Shape-correct impl of SSD w/out multi-chunk support
gabe-l-hart Oct 21, 2025
fb68967
fix: Add names to tensors for better debugging and fix several wiring…
gabe-l-hart Oct 23, 2025
cd73f4d
fix(wip): Fix matmul order for CB and y
gabe-l-hart Oct 23, 2025
52be1ab
fix: Working output!!
gabe-l-hart Oct 23, 2025
f57dafe
feat(eval-callback): Use -vb to set tensor print width and number of …
gabe-l-hart Oct 24, 2025
8a87063
feat(ggml-cpu): Add ggml_tri_dims to support non-standard dims (with …
gabe-l-hart Oct 24, 2025
79bce3e
feat(ggml-metal): Extend metal tri imple for arbitrary dims and non-c…
gabe-l-hart Oct 24, 2025
1ceb15e
feat(ggml-cuda): Extend CUDA impl of tri to support arbitrary dims an…
gabe-l-hart Oct 24, 2025
ef12069
fix: Fix INT_MAX to use numeric_limits for better compiler compat
gabe-l-hart Oct 24, 2025
3da5c97
fix(temp): Fix CBdecay to make decay contiguous for metal
gabe-l-hart Oct 24, 2025
3336f3c
fix: Use ggml_tri_dims to avoid perm/cont for initial decay step
gabe-l-hart Oct 24, 2025
d1e15c0
feat(ggml-cpu): Add dim arg to ggml_cumsum
gabe-l-hart Oct 24, 2025
ee13af1
feat(ggml-metal): Support arbitrary dim and non-cont in cumsum
gabe-l-hart Oct 24, 2025
3b4055e
feat(ggml-cuda): Support arbitrary dims and non-cont in cumsum
gabe-l-hart Oct 24, 2025
3963a72
feat(wip): Partially working implementation with update from previous…
gabe-l-hart Oct 28, 2025
188ae84
refact: Avoid permute and cont for first cumsum
gabe-l-hart Oct 28, 2025
0441ccb
fix: Subset input states to match ids
gabe-l-hart Oct 29, 2025
aba30d6
fix: Fix the chunk size computation
gabe-l-hart Oct 29, 2025
62ac897
fix: Fix handling of batch size > 1 in chunk updates
gabe-l-hart Oct 29, 2025
36244fe
fix: Fix permutation for nemotron-h shape
gabe-l-hart Oct 29, 2025
5ff37fa
Merge remote-tracking branch 'origin/master' into Mamba2SSD
gabe-l-hart Nov 3, 2025
8b6f38a
feat(off-topic): print the number of elements in tensors with llama-gguf
gabe-l-hart Nov 4, 2025
82bba1d
feat(ggml-cpu): Add f16 and bf16 support for ssm_conv
gabe-l-hart Nov 4, 2025
7ad0f37
feat(llama-quant): Allow F16 and BF16 quants of ssm_conv1d.weight
gabe-l-hart Nov 4, 2025
6256f9a
feat(ggml-cpu): Add partial implementation of scale for f16
gabe-l-hart Nov 4, 2025
204cd80
feat(wip): Use type_k/type_v for hybrid cache types
gabe-l-hart Nov 4, 2025
86788a2
temp: Cast ssm to F32
gabe-l-hart Nov 4, 2025
de43d0b
feat(ggml-metal): Add support for F16 and BF16 ssm_conv weights
gabe-l-hart Nov 4, 2025
426a97c
feat: Keep ssm in f16 until output on SSD code path
gabe-l-hart Nov 5, 2025
6733bda
feat: Remove sub-ubatch batching
gabe-l-hart Nov 5, 2025
4435600
Merge remote-tracking branch 'origin/master' into Mamba2SSD
gabe-l-hart Nov 5, 2025
test: Add test cases for tri
Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <[email protected]>
gabe-l-hart committed Oct 13, 2025
commit 058160a42d2e6cecac5b847c95fe2d57aa486a1d
tests/test-backend-ops.cpp: 51 additions & 0 deletions
@@ -4796,6 +4796,35 @@ struct test_cumsum : public test_case {
    }
};

// GGML_OP_TRI
struct test_tri : public test_case {
    const ggml_type type;
    const std::array<int64_t, 4> ne;
    const ggml_tri_type tri_type;
    const float c;

    std::string vars() override {
        return VARS_TO_STR4(type, ne, tri_type, c);
    }

    test_tri(ggml_tri_type tri_type,
            ggml_type type = GGML_TYPE_F32,
            std::array<int64_t, 4> ne = {10, 10, 1, 1},
            float c = nan(""))
        : type(type), ne(ne), tri_type(tri_type), c(c) {}

    ggml_tensor * build_graph(ggml_context * ctx) override {
        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
        ggml_set_param(a);
        ggml_set_name(a, "a");

        ggml_tensor * out = ggml_tri(ctx, a, c, tri_type);
        ggml_set_name(out, "out");

        return out;
    }
};

// GGML_OP_MEAN
struct test_mean : public test_case {
    const ggml_type type;
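For orientation, the semantics the test_tri case above appears to exercise mirror numpy's tril/triu family: one triangle of each matrix (taking the first two ne dims as rows x cols) is kept, the rest is zeroed, and the constant c, which the test defaults to NaN, seems to select between preserving the input values and overwriting the kept triangle. The authoritative definition lives in the ggml sources; the following is only a minimal plain-C++ sketch under those assumptions, with hypothetical names (tri_kind, tri_ref):

// Assumed reference model of GGML_OP_TRI on one row-major rows x cols matrix.
// tri_kind mirrors the four GGML_TRI_TYPE_* values; all names here are
// hypothetical and not part of the ggml API.
#include <cmath>
#include <cstdint>
#include <vector>

enum tri_kind { TRI_LOWER, TRI_LOWER_DIAG, TRI_UPPER, TRI_UPPER_DIAG };

static std::vector<float> tri_ref(const std::vector<float> & src,
                                  int64_t rows, int64_t cols,
                                  tri_kind kind, float c) {
    std::vector<float> dst(src.size(), 0.0f); // outside the triangle -> 0
    for (int64_t i = 0; i < rows; ++i) {
        for (int64_t j = 0; j < cols; ++j) {
            bool keep = false;
            switch (kind) {
                case TRI_LOWER:      keep = j <  i; break; // strictly below diagonal
                case TRI_LOWER_DIAG: keep = j <= i; break; // below, incl. diagonal
                case TRI_UPPER:      keep = j >  i; break; // strictly above diagonal
                case TRI_UPPER_DIAG: keep = j >= i; break; // above, incl. diagonal
            }
            if (keep) {
                // NaN c (the test default) is assumed to mean "keep the input
                // value"; any other c fills the kept triangle with that constant.
                dst[i*cols + j] = std::isnan(c) ? src[i*cols + j] : c;
            }
        }
    }
    return dst;
}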
@@ -6902,6 +6931,17 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_eval(bool verbose
    test_cases.emplace_back(new test_cumsum(GGML_TYPE_BF16, { 4, 2, 2, 1 }));
    test_cases.emplace_back(new test_cumsum(GGML_TYPE_F32, { 1024, 15, 26, 12 }));

    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER_DIAG));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER_DIAG));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER, GGML_TYPE_F16));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER, GGML_TYPE_BF16));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_F32, {8, 8, 4, 16}));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_F32, {8, 8, 4, 16}, 42.f));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_F16, {8, 8, 4, 16}, 42.f));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_BF16, {8, 8, 4, 16}, 42.f));

    for (bool v : {false, true}) {
        test_cases.emplace_back(new test_pad_ext(GGML_TYPE_F32, {512, 512, 1, 1}, 0, 1, 0, 1, 0, 0, 0, 0, v));
        test_cases.emplace_back(new test_pad_ext(GGML_TYPE_F32, {11, 22, 33, 44}, 1, 2, 3, 4, 5, 6, 7, 8, v));
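Assuming the usual test-backend-ops conventions hold (a mode argument plus optional -o op and -b backend filters; treat the exact flags as an assumption and check the binary's --help), the new cases can be run in isolation with something like:

./build/bin/test-backend-ops test -o TRI -v

where -v is the verbose flag added earlier on this branch to print tensor contents.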
@@ -7063,6 +7103,17 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_perf() {
    test_cases.emplace_back(new test_cumsum(GGML_TYPE_BF16, { 4, 2, 2, 1 }));
    test_cases.emplace_back(new test_cumsum(GGML_TYPE_F32, { 1024, 15, 26, 12 }));

    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER_DIAG));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER_DIAG));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER, GGML_TYPE_F16));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_LOWER, GGML_TYPE_BF16));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_F32, {8, 8, 4, 16}));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_F32, {8, 8, 4, 16}, 42.f));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_F16, {8, 8, 4, 16}, 42.f));
    test_cases.emplace_back(new test_tri(GGML_TRI_TYPE_UPPER, GGML_TYPE_BF16, {8, 8, 4, 16}, 42.f));

    for (int bs : {1, 2, 3, 4, 5, 8, 512}) {
        for (ggml_type type_a : all_types) {
            for (ggml_type type_b : {GGML_TYPE_F32}) {