Add PLaMo-2 model #14560

Open · wants to merge 55 commits into master

Conversation


@mitmul mitmul commented Jul 7, 2025

Make sure to read the contributing guidelines before submitting a PR

Based on #7531

Here is how to check that plamo-2-translate works with this PR. First, retrieve the model itself:

git clone https://huggingface.co/pfnet/plamo-2-translate

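If the clone comes back with small pointer files instead of the actual weights, Git LFS is probably not set up; enabling it first and re-running the clone should fix that (a general Hugging Face note, not something specific to this PR):

git lfs install
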
Then I needed to modify tokenizer.jsonl, padding it with placeholder vocabulary entries so that the vocabulary size matches the 100032 specified in config.json. I used this script:

#!/usr/bin/env python3
"""Fix PLaMo-2 tokenizer by adding missing padding tokens."""

import json
import shutil

def fix_tokenizer():
    # Backup original file
    shutil.copy("plamo-2-translate/tokenizer.jsonl", "plamo-2-translate/tokenizer.jsonl.backup")
    
    # Read existing tokens
    with open("plamo-2-translate/tokenizer.jsonl", "r", encoding="utf-8") as f:
        lines = f.readlines()
    
    print(f"Current number of tokens: {len(lines)}")
    
    # Add 32 padding tokens (ids 100000-100031),
    # using the same format as the other special tokens in the file
    for i in range(32):
        padding_token = [f"<pad_{i}>", 0.0, "CONTROL", "basic", 8, None, [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]]
        lines.append(json.dumps(padding_token, ensure_ascii=False) + "\n")
    
    # Write back
    with open("plamo-2-translate/tokenizer.jsonl", "w", encoding="utf-8") as f:
        f.writelines(lines)
    
    print(f"New number of tokens: {len(lines)}")
    print("Tokenizer fixed!")

if __name__ == "__main__":
    fix_tokenizer()

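To double-check the padding step, here is a quick sanity-check sketch; it assumes config.json exposes a top-level vocab_size field, as standard Hugging Face configs do:

#!/usr/bin/env python3
"""Sanity check: tokenizer.jsonl should now match config.json."""

import json

# "vocab_size" is assumed to be a top-level key in config.json (standard HF layout)
with open("plamo-2-translate/config.json", "r", encoding="utf-8") as f:
    vocab_size = json.load(f)["vocab_size"]

# Each line of tokenizer.jsonl is one vocabulary entry
with open("plamo-2-translate/tokenizer.jsonl", "r", encoding="utf-8") as f:
    n_tokens = sum(1 for _ in f)

print(f"config.json vocab_size: {vocab_size}, tokenizer.jsonl entries: {n_tokens}")
assert n_tokens == vocab_size, "tokenizer.jsonl still does not match the configured vocab size"
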
Next, convert the model to GGUF with the following command:

python convert_hf_to_gguf.py plamo-2-translate --outfile plamo-2-translate.gguf --outtype f32

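The F32 GGUF is large (about 35.5 GiB, as the log below shows). convert_hf_to_gguf.py accepts other --outtype values as well, so something like the following should give a roughly half-size file; I have not tested this particular output type with PLaMo-2, so treat it as a sketch:

python convert_hf_to_gguf.py plamo-2-translate --outfile plamo-2-translate-f16.gguf --outtype f16
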
Then build binaries as follows:

cmake -B release
cmake --build release --config Release

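On this machine the Metal backend is enabled automatically. On a Linux box with an NVIDIA GPU the equivalent would presumably be the usual CUDA build flags (not tested as part of this PR):

cmake -B release -DGGML_CUDA=ON
cmake --build release --config Release
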
Finally, I successfully ran the plamo-2-translate model as follows:

./release/bin/llama-cli -m plamo-2-translate.gguf -p "<|plamo:op|>dataset\ntranslation\n<|plamo:op|>input lang=English\nHello, how are you?\n<|plamo:op|>output\n" -no-cnv --verbose-prompt --no-warmup -sp
Intermediate outputs:
build: 5876 (272ffdb6) with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.5.0
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device Metal (Apple M1 Max) - 64424 MiB free
llama_model_loader: loaded meta data with 37 key-value pairs and 467 tensors from plamo-2-translate.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = plamo2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Plamo 2 Translate
llama_model_loader: - kv   3:                         general.size_label str              = 10B
llama_model_loader: - kv   4:                            general.license str              = other
llama_model_loader: - kv   5:                       general.license.name str              = plamo-community-license
llama_model_loader: - kv   6:                       general.license.link str              = https://huggingface.co/pfnet/plamo-2-...
llama_model_loader: - kv   7:                   general.base_model.count u32              = 1
llama_model_loader: - kv   8:                  general.base_model.0.name str              = Plamo 2 8b
llama_model_loader: - kv   9:          general.base_model.0.organization str              = Pfnet
llama_model_loader: - kv  10:              general.base_model.0.repo_url str              = https://huggingface.co/pfnet/plamo-2-8b
llama_model_loader: - kv  11:                               general.tags arr[str,3]       = ["plamo", "translation", "translation"]
llama_model_loader: - kv  12:                          general.languages arr[str,2]       = ["en", "ja"]
llama_model_loader: - kv  13:             plamo2.attention.head_count_kv arr[i32,32]      = [0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, ...
llama_model_loader: - kv  14:                      plamo2.context_length u32              = 10485760
llama_model_loader: - kv  15:                    plamo2.embedding_length u32              = 4096
llama_model_loader: - kv  16:                         plamo2.block_count u32              = 32
llama_model_loader: - kv  17:                plamo2.attention.head_count u32              = 32
llama_model_loader: - kv  18:    plamo2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  19:        plamo2.attention.group_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  20:        plamo2.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  21:                      plamo2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  22:                      plamo2.ssm.state_size u32              = 64
llama_model_loader: - kv  23:                     plamo2.ssm.conv_kernel u32              = 4
llama_model_loader: - kv  24:                  plamo2.ssm.time_step_rank u32              = 64
llama_model_loader: - kv  25:                      plamo2.ssm.inner_size u32              = 8192
llama_model_loader: - kv  26:                     plamo2.ssm.group_count u32              = 0
llama_model_loader: - kv  27:                 plamo2.feed_forward_length u32              = 16384
llama_model_loader: - kv  28:                          general.file_type u32              = 0
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - kv  30:                       tokenizer.ggml.model str              = plamo2
llama_model_loader: - kv  31:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  32:                      tokenizer.ggml.tokens arr[str,100032]  = ["<|plamo:unk|>", "<|plamo:bos|>", "<...
llama_model_loader: - kv  33:                      tokenizer.ggml.scores arr[f32,100032]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  34:                  tokenizer.ggml.token_type arr[i32,100032]  = [2, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  35:                tokenizer.ggml.eos_token_id u32              = 4
llama_model_loader: - kv  36:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - type  f32:  467 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = all F32
print_info: file size   = 35.50 GiB (32.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 61
load: token to piece cache size = 0.7989 MB
print_info: arch             = plamo2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 10485760
print_info: n_embd           = 4096
print_info: n_layer          = 32
print_info: n_head           = 32
print_info: n_head_kv        = [0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4]
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = [0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8, 0, 8]
print_info: n_embd_k_gqa     = [0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512]
print_info: n_embd_v_gqa     = [0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512, 0, 512]
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 16384
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 10485760
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 4
print_info: ssm_d_inner      = 8192
print_info: ssm_d_state      = 64
print_info: ssm_dt_rank      = 64
print_info: ssm_n_group      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 8B
print_info: model params     = 9.53 B
print_info: general.name     = Plamo 2 Translate
print_info: vocab type       = PLaMo2
print_info: n_vocab          = 100032
print_info: n_merges         = 0
print_info: BOS token        = 1 '<|plamo:bos|>'
print_info: EOS token        = 4 '<|plamo:op|>'
print_info: UNK token        = 0 '<|plamo:unk|>'
print_info: PAD token        = 3 '<|plamo:pad|>'
print_info: LF token         = 10 '
'
print_info: EOG token        = 4 '<|plamo:op|>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
load_tensors: Metal_Mapped model buffer size = 34784.34 MiB
load_tensors:   CPU_Mapped model buffer size =  1563.00 MiB
.............................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = 0
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (10485760) -- the full capacity of the model will not be utilized
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_load_library: using embedded metal library
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction   = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has residency sets    = true
ggml_metal_init: has bfloat            = true
ggml_metal_init: use bfloat            = false
ggml_metal_init: hasUnifiedMemory      = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 67554.51 MB
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_set_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_c4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f16                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h192          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk192_hv128   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk576_hv512   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h64       (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h96       (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h192      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk192_hv128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk576_hv512 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16                     (not supported)
llama_context:        CPU  output buffer size =     0.38 MiB
llama_kv_cache_unified:      Metal KV buffer size =   128.00 MiB
llama_kv_cache_unified: size =  128.00 MiB (  4096 cells,  16 layers,  1 seqs), K (f16):   64.00 MiB, V (f16):   64.00 MiB
llama_kv_cache_unified: LLAMA_SET_ROWS=0, using old ggml_cpy() method for backwards compatibility
llama_memory_recurrent: mem_size = 1, n_seq_max = 1, type_r = 'f32', type_s = 'f32', n_layer = 32
llama_memory_recurrent:      Metal KV buffer size =    33.50 MiB
llama_memory_recurrent: KV self size  =   33.50 MiB, R (f32):    1.50 MiB, S (f32):   32.00 MiB
llama_context:      Metal compute buffer size =   306.10 MiB
llama_context:        CPU compute buffer size =    16.01 MiB
llama_context: graph nodes  = 2038
llama_context: graph splits = 9
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
main: llama threadpool init, n_threads = 8

system_info: n_threads = 8 (n_threads_batch = 8) / 10 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | REPACK = 1 | 

main: prompt: '<|plamo:op|>dataset
translation
<|plamo:op|>input lang=English
Hello, how are you?
<|plamo:op|>output
'
main: number of tokens in prompt = 20
     4 -> '<|plamo:op|>'
 45474 -> 'dataset'
    10 -> '
'
 18053 -> 'translation'
    10 -> '
'
     4 -> '<|plamo:op|>'
  1760 -> 'input'
 98700 -> ' lang'
    61 -> '='
 14134 -> 'English'
    10 -> '
'
  6721 -> 'Hello'
    44 -> ','
  1205 -> ' how'
  1089 -> ' are'
  1099 -> ' you'
  1076 -> '?
'
     4 -> '<|plamo:op|>'
  3045 -> 'output'
    10 -> '
'

sampler seed: 64554044
sampler params: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
	top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 0

Output:

<|plamo:op|>dataset
translation
<|plamo:op|>input lang=English
Hello, how are you?
<|plamo:op|>output
こんにちは、ご機嫌いかがですか?
<|plamo:op|> [end of text]


llama_perf_sampler_print:    sampling time =       0.29 ms /    26 runs   (    0.01 ms per token, 89347.08 tokens per second)
llama_perf_context_print:        load time =    6939.57 ms
llama_perf_context_print: prompt eval time =     378.42 ms /    20 tokens (   18.92 ms per token,    52.85 tokens per second)
llama_perf_context_print:        eval time =     625.90 ms /     5 runs   (  125.18 ms per token,     7.99 tokens per second)
llama_perf_context_print:       total time =    7566.19 ms /    25 tokens
ggml_metal_free: deallocating

Seems to be working correctly!
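
For reference, the same prompt should also be usable through llama-server's /completion endpoint; the port, n_predict, and stop string below are my own choices rather than something exercised in this PR, so treat it as a sketch:

./release/bin/llama-server -m plamo-2-translate.gguf --port 8080

curl http://localhost:8080/completion -d '{
  "prompt": "<|plamo:op|>dataset\ntranslation\n<|plamo:op|>input lang=English\nHello, how are you?\n<|plamo:op|>output\n",
  "n_predict": 128,
  "stop": ["<|plamo:op|>"]
}'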

compilade added 30 commits April 3, 2024 20:47
This will be necessary to support Jamba
(and other recurrent models mixed with Attention).

Doesn't compile yet, and finding a slot isn't yet done correctly for recurrent states.
* llama : begin work on support for variable GQA

This will also be useful for Jamba if we consider the Mamba layers
to have 0 KV heads.

* llama : gracefully fail when not finding hybrid slot
* ggml : simplify SSM-related operators

* llama : make recurrent state slot allocation contiguous

* llama : adapt internal uses of batches to llama_ubatch
This reduces overhead when running hellaswag
on thousands of sequences with very small 100k params Mamba models.
This otherwise was a problem when running the HellaSwag benchmark
with small batch sizes, making it crash.
This removes the need for ggml_ssm_conv!!!
But performance seems slightly worse on my system,
especially for prompt processing.
Maybe ggml_mul_mat isn't optimized for small row sizes?
More performance testing is necessary until GGML_OP_SSM_CONV is removed.

* ggml : make ggml_ssm_scan not modify its source tensors

* llama : fix shared recurrent tail cell count for small ubatch sizes

Otherwise it was impossible to run the 'parallel' example with '-ub 1'
with a Mamba or Jamba model.
* ggml : allow GGML_OP_CONCAT to work on non-contiguous tensors

The implementation already supported it,
and this makes Mamba's conv step slightly faster.
This can be changed back later if the name change is wrong.
I was renaming the functions anyway to generalize kv-cache-related
functions to hybrid and recurrent model architectures.
I think llama_past is a better name than llama_cache for a combined
kv cache and recurrent state cache, because the states it contains
pretty much always come before the newly-added ones for any particular
sequence. Also 'llama_past_clear' sounds more obvious in what it does
than 'llama_kv_cache_clear'. The future is what the models generate.
(For embeddings, the kv cache isn't really used anyway)

Still, I'm open to better suggestions.
compilade and others added 23 commits June 11, 2024 23:27
This also slightly reduces the diff from the master branch
Also begin reverting some implicit state rollback code.
But this time it contains the sub-cache graph inputs.
This *should* make it easier to handle updating the inputs
when caching the graph (eventually).
@mitmul mitmul mentioned this pull request Jul 7, 2025
@github-actions github-actions bot added examples python python script changes labels Jul 7, 2025
@mitmul mitmul changed the title from Mitmul/add plamo2 to Add PLaMo-2 model Jul 7, 2025
@mitmul mitmul marked this pull request as ready for review July 7, 2025 08:36