Name and Version
I followed the SwiftUI example to build an iOS app,
downloaded and loaded tinyllama-1.1b-f16,
then typed "HI" in the text area and pressed the send button.
The response just looks like the image.
Does anyone know why?
Thanks
Operating systems
Mac
GGML backends
CPU
Hardware
iPhone 16 Pro Max
Models
tinyllama-1.1b-f16
Problem description & steps to reproduce
I followed the SwiftUI example to build an iOS app,
downloaded and loaded tinyllama-1.1b-f16,
then typed "HI" in the text area and pressed the send button. The send path in the example is sketched below.
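For reference, the send path in the example looks roughly like this. This is a sketch from memory of examples/llama.swiftui; names such as `LlamaState` and `complete(text:)` are recalled from the example's source and may differ in your checkout. The point is that the text is handed to the completion call as-is, with no chat template applied.

```swift
import SwiftUI

// Sketch of the send path in the llama.swiftui example
// (names recalled from the example's source; treat them as assumptions).
final class LlamaState: ObservableObject {
    @Published var messageLog = ""
    func complete(text: String) async { /* tokenize + completion loop via llama.cpp */ }
}

struct PromptRow: View {
    @StateObject private var llamaState = LlamaState()
    @State private var multiLineText = ""

    var body: some View {
        HStack {
            TextField("Prompt", text: $multiLineText)
            Button("Send") {
                Task {
                    // "HI" goes in unchanged: raw completion, no chat template.
                    await llamaState.complete(text: multiLineText)
                }
            }
        }
    }
}
```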
First Bad Commit
No response
Relevant log output
CFPrefsPlistSource<0x303fc8a80> (Domain: kCFPreferencesAnyApplication, User: kCFPreferencesCurrentUser, ByHost: No, Container: (null), Contents Need Refresh: No): Value for key AppleLanguages was (
"en-US",
"zh-Hans-US",
"zh-Hant-US"
). Expected (
"en-US"
) (Setup(275): 2024-10-02 09:49:02 (PDT))
Downloading model TinyLlama-1.1B (F16, 2.2 GiB) from https://huggingface.co/ggml-org/models/resolve/main/tinyllama-1.1b/ggml-model-f16.gguf?download=true
nw_connection_add_timestamp_locked_on_nw_queue [C2] Hit maximum timestamp count, will start dropping events
Writing to tinyllama-1.1b-f16.gguf completed
Publishing changes from background threads is not allowed; make sure to publish values from the main thread (via operators like receive(on:)) on model updates.
(the line above repeats 20 times in the log)
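This warning usually means an `@Published` property (e.g. the example's message log) is being mutated from a background task. A minimal sketch of one fix, assuming the generated pieces are appended to a published string (the property name is an assumption):

```swift
import SwiftUI

// Hop to the main actor before mutating @Published state so SwiftUI
// publishes from the main thread, as the warning asks.
final class Chat: ObservableObject {
    @Published var messageLog = ""

    func append(_ piece: String) async {
        await MainActor.run {
            self.messageLog += piece   // now published from the main thread
        }
    }
}
```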
llama_model_load_from_file_impl: using device Metal (Apple A18 Pro GPU) - 5461 MiB free
llama_model_loader: loaded meta data with 21 key-value pairs and 201 tensors from /var/mobile/Containers/Data/Application/605BF671-944E-4399-A04E-625E72CF19F9/Documents/tinyllama-1.1b-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = huggingface
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type f16: 156 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 2.05 GiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 1
load: control token: 1 '<s>' is not marked as EOG
load: control token: 2 '</s>' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 3
load: token to piece cache size = 0.1684 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 2048
print_info: n_layer = 22
print_info: n_head = 32
print_info: n_head_kv = 4
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 5632
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1B
print_info: model params = 1.10 B
print_info: general.name = huggingface
print_info: vocab type = SPM
print_info: n_vocab = 32000
print_info: n_merges = 0
print_info: BOS token = 1 '<s>'
print_info: EOS token = 2 '</s>'
print_info: UNK token = 0 '<unk>'
print_info: LF token = 13 '<0x0A>'
print_info: EOG token = 2 '</s>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
make_cpu_buft_list: disabling extra buffer types (i.e. repacking) since a GPU device is available
load_tensors: layer 0 assigned to device Metal, is_swa = 0
load_tensors: layer 1 assigned to device Metal, is_swa = 0
load_tensors: layer 2 assigned to device Metal, is_swa = 0
load_tensors: layer 3 assigned to device Metal, is_swa = 0
load_tensors: layer 4 assigned to device Metal, is_swa = 0
load_tensors: layer 5 assigned to device Metal, is_swa = 0
load_tensors: layer 6 assigned to device Metal, is_swa = 0
load_tensors: layer 7 assigned to device Metal, is_swa = 0
load_tensors: layer 8 assigned to device Metal, is_swa = 0
load_tensors: layer 9 assigned to device Metal, is_swa = 0
load_tensors: layer 10 assigned to device Metal, is_swa = 0
load_tensors: layer 11 assigned to device Metal, is_swa = 0
load_tensors: layer 12 assigned to device Metal, is_swa = 0
load_tensors: layer 13 assigned to device Metal, is_swa = 0
load_tensors: layer 14 assigned to device Metal, is_swa = 0
load_tensors: layer 15 assigned to device Metal, is_swa = 0
load_tensors: layer 16 assigned to device Metal, is_swa = 0
load_tensors: layer 17 assigned to device Metal, is_swa = 0
load_tensors: layer 18 assigned to device Metal, is_swa = 0
load_tensors: layer 19 assigned to device Metal, is_swa = 0
load_tensors: layer 20 assigned to device Metal, is_swa = 0
load_tensors: layer 21 assigned to device Metal, is_swa = 0
load_tensors: layer 22 assigned to device Metal, is_swa = 0
ggml_backend_metal_log_allocated_size: allocated buffer, size = 2098.36 MiB, ( 2098.44 / 5461.34)
load_tensors: offloading 22 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 23/23 layers to GPU
load_tensors: CPU_Mapped model buffer size = 125.00 MiB
load_tensors: Metal_Mapped model buffer size = 2098.36 MiB
..........................................................................................
Using 4 threads
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A18 Pro GPU
ggml_metal_load_library: using embedded metal library
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
Warning: Compilation succeeded with:
program_source:3821:27: warning: comparison of integers of different signs: 'int' and 'const uint' (aka 'const unsigned int') [-Wsign-compare]
for (int j = 0; j < head_size; j += 4) {
~ ^ ~~~~~~~~~
program_source:479:28: warning: unused variable 'ksigns64' [-Wunused-const-variable]
GGML_TABLE_BEGIN(uint64_t, ksigns64, 128)
^
ggml_metal_init: GPU name: Apple A18 Pro GPU
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has residency sets = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 5726.63 MB
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
ggml_metal_init: loaded kernel_add 0x3035eaee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_add_row 0x303591860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub 0x303592280 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub_row 0x3035ecde0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul 0x3035e14a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_row 0x3035e15c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div 0x3035c2880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div_row 0x303589380 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f32 0x3035c2820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f16 0x3035d6820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i32 0x3035ac060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i16 0x3035ac6c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale 0x3035b0060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale_4 0x3035b06c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_clamp 0x3035b0780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_tanh 0x3035b07e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_relu 0x3035b0840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sigmoid 0x3035b08a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu 0x3035b0900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_4 0x3035b0960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick 0x3035b09c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick_4 0x3035b0a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu 0x3035b0a80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu_4 0x3035b0ae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_elu 0x3035b0b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16 0x3035ac720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16_4 0x3035b0ba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32 0x3035acde0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32_4 0x3035b0c00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf 0x3035ad140 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf_8 0x3035ad4a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f32 0x3035b0f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f16 0x3035b1260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_bf16 0x3035b1320 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_0 0x3035b1980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_1 0x3035b1a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_0 0x3035b1da0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_1 0x3035ad620 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q8_0 0x3035ad7a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q2_K 0x3035adb00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q3_K 0x3035ade60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_K 0x3035ae1c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_K 0x3035ae520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q6_K 0x3035ae880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xxs 0x3035aebe0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xs 0x3035aef40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_xxs 0x3035af2a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_s 0x3035b1680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_s 0x3035b2400 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_s 0x3035b2460 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_m 0x3035b8120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_nl 0x3035b2760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_xs 0x3035b2ac0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_i32 0x3035b2b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rms_norm 0x3035b2ee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_l2_norm 0x3035b3240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_group_norm 0x30358e220 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_norm 0x3035a8060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_conv_f32 0x3035a8240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_scan_f32 0x3035b3420 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rwkv_wkv6_f32 0x3035a8960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rwkv_wkv7_f32 0x3035a91a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f32_f32 0x3035a9200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_f32 0x3035b3960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_f32_1row 0x3035af960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_f32_l4 0x3035a4060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_bf16 0x3035a4780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32 0x3035a47e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row 0x3035a4f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4 0x3035a4fc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f16 0x3035a9380 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32 0x3035a9a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32 0x3035aa100 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_0_f32 0x3035aa7c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_1_f32 0x3035a64c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32 0x3035a6b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_2 0x3035aa8e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_3 0x3035aa940 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_4 0x3035ab000 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_5 0x3035ab6c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_2 0x3035ab720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_3 0x3035a18c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_4 0x3035a20a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_5 0x3035a2880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_2 0x3035a3060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_3 0x3035a30c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_4 0x3035a3120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_5 0x3035a3180 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_2 0x3035a31e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_3 0x3035a3240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_4 0x30355e880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_5 0x30355ff60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_2 0x30355ff00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_3 0x30355fea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_4 0x30355fe40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_5 0x30355fde0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_2 0x30355fd80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_3 0x30355fd20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_4 0x30355fcc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_5 0x30355fc60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_2 0x30355fc00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_3 0x30355fba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_4 0x30355fb40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_5 0x30355fae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_2 0x30355fa80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_3 0x30355fa20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_4 0x30355f9c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_5 0x30355f960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_2 0x30355f900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_3 0x30355f8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_4 0x30355f060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_5 0x30355bd20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_2 0x303557e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_3 0x303557de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_4 0x30354c060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_5 0x30354c840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32 0x30354c8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32 0x303552880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32 0x303553060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32 0x303553840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32 0x303553f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xxs_f32 0x30354d0e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xs_f32 0x30354d7a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_xxs_f32 0x30354de60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_s_f32 0x30354e520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_s_f32 0x3035487e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_s_f32 0x303548ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_m_f32 0x303549560 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_nl_f32 0x30354ed60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_xs_f32 0x303549c20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f32_f32 0x303549ce0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f16_f32 0x303549d40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_bf16_f32 0x303549da0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_0_f32 0x303544780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_1_f32 0x3035447e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_0_f32 0x303544f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_1_f32 0x303545620 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q8_0_f32 0x303545680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q2_K_f32 0x3035456e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q3_K_f32 0x303545740 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_K_f32 0x3035457a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_K_f32 0x303545800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q6_K_f32 0x303541680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xxs_f32 0x303545860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xs_f32 0x3035458c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_xxs_f32 0x303545920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_s_f32 0x303545980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_s_f32 0x3035459e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_s_f32 0x303545a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_m_f32 0x30357c7e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_nl_f32 0x30357c840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_xs_f32 0x30357cf60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f32_f32 0x303545bc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x3035462e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_bf16_f32 0x303547120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32 0x303547180 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32 0x3035471e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_0_f32 0x303547720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_1_f32 0x303547780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32 0x30357e760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32 0x30357eca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32 0x30357f1e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32 0x30357f720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32 0x30357f780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32 0x30357f7e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xxs_f32 0x30357f840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xs_f32 0x30357f8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_xxs_f32 0x30357f900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_s_f32 0x30357f960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_s_f32 0x30357f9c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_s_f32 0x30357fa20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_m_f32 0x30357fa80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_nl_f32 0x30357fae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_xs_f32 0x30357fb40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f32_f32 0x30357fba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f16_f32 0x30357fc00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_bf16_f32 0x30357fc60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_0_f32 0x303574660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_1_f32 0x303574c00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_0_f32 0x3035751a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_1_f32 0x303575740 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q8_0_f32 0x3035757a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q2_K_f32 0x303575800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q3_K_f32 0x303575860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_K_f32 0x3035758c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_K_f32 0x303575920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q6_K_f32 0x303575980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xxs_f32 0x303572880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xs_f32 0x303572e20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_xxs_f32 0x3035733c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_s_f32 0x303573960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_s_f32 0x3035760a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_s_f32 0x303576640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_m_f32 0x303576be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_nl_f32 0x303577180 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_xs_f32 0x3035771e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f32 0x30356cc60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f16 0x30356d7a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f32 0x30356e160 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f16 0x303577240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f16 0x303577300 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f32 0x3035773c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f16 0x303568060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f32 0x303568660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f32_f32 0x303568720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f16_f32 0x303568d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_upscale_f32 0x303568fc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_f32 0x303569260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_reflect_1d_f32 0x30356eb20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_timestep_embedding_f32 0x30356f4e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_arange_f32 0x303573d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_asc 0x303573c00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_desc 0x303569a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_leaky_relu_f32 0x30356a0a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h64 0x303573b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h80 0x303573ae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h96 0x30356fea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h112 0x30356ba20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h128 0x30356b9c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h256 0x303560060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h64 0x303560840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h80 0x3035608a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h96 0x303564900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h112 0x3035609c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h128 0x303560a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h256 0x303561200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h64 0x3035660a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h80 0x303561aa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h96 0x303561b00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h112 0x3035622e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h128 0x303562ac0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h256 0x303562b20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h64 0x303562b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h80 0x30351c8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h96 0x30351c900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h112 0x303562ca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h128 0x303562d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h256 0x3035634e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h64 0x303563cc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h80 0x303563d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h96 0x303563d80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h112 0x303563de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h128 0x303563e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h256 0x3035190e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h64 0x3035148a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h80 0x303514900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h96 0x303514960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h112 0x3035149c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h128 0x303514a80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h256 0x303514de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h64 0x303514e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h80 0x303514ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h96 0x303514f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h112 0x303514f60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h128 0x303511440 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h256 0x303512be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h128 0x303512d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_bf16_h128 0x303512d60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h128 0x303512dc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h128 0x303512e20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h128 0x303512e80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h128 0x303512ee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h128 0x30350c8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h256 0x30350c900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_bf16_h256 0x30350d0e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h256 0x30350d8c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h256 0x30350d920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h256 0x30350d980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h256 0x30350d9e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h256 0x303508900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_f32 0x3035090e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_i32 0x30350a0a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f32 0x30350a160 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f16 0x30350a640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_bf16 0x30350aca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f32 0x30350b300 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f16 0x30350db00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_bf16_f32 0x30350e340 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_bf16_bf16 0x30350e820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q8_0 0x303504060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_0 0x303504660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_1 0x3035046c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_0 0x303504720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_1 0x303504780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_iq4_nl 0x3035047e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_0_f32 0x303504840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_0_f16 0x3035048a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_1_f32 0x303501920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_1_f16 0x303501f20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_0_f32 0x303502520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_0_f16 0x303502b20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_1_f32 0x303502b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_1_f16 0x303502be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q8_0_f32 0x303502c40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q8_0_f16 0x303502d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_concat 0x303502dc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqr 0x303505c20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqrt 0x303506820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sin 0x303506e80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cos 0x303506ee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sum_rows 0x303506f40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argmax 0x303506fa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_avg_f32 0x303507900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_max_f32 0x303507de0 | th_max = 1024 | th_width = 32
set_abort_callback: call
llama_context: CPU output buffer size = 0.12 MiB
llama_context: n_ctx = 2048
llama_context: n_ctx = 2048 (padded)
init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 22, can_shift = 1
init: layer 0: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 1: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 2: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 3: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 4: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 5: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 6: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 7: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 8: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 9: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 10: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 11: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 12: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 13: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 14: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 15: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 16: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 17: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 18: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 19: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 20: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 21: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: Metal KV buffer size = 44.00 MiB
llama_context: KV self size = 44.00 MiB, K (f16): 22.00 MiB, V (f16): 22.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 65536
llama_context: n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Metal compute buffer size = 148.00 MiB
llama_context: CPU compute buffer size = 8.01 MiB
llama_context: graph nodes = 754
llama_context: graph splits = 2
Downloading model TinyLlama-1.1B Chat (Q8_0, 1.1 GiB) from https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q8_0.gguf?download=true
Requesting visual style in an implementation that has disabled it, returning nil. Behavior of caller is undefined.
(the line above repeats 5 times in the log)
Error: cancelled
Unable to simultaneously satisfy constraints.
Probably at least one of the constraints in the following list is one you don't want. Try this: (1) look at each constraint and try to figure out which you don't expect;
(2) find the code that added the unwanted constraint or constraints and fix it.
(
"<NSLayoutConstraint:0x3031c9cc0 'accessoryView.bottom' _UIRemoteKeyboardPlaceholderView:0x1151e5400.bottom == _UIKBCompatInputView:0x102b49f80.top (active)>",
"<NSLayoutConstraint:0x3031d95e0 'assistantHeight' SystemInputAssistantView.height == 45 (active, names: SystemInputAssistantView:0x115100600 )>",
"<NSLayoutConstraint:0x3032b4410 'assistantView.bottom' SystemInputAssistantView.bottom == _UIKBCompatInputView:0x102b49f80.top (active, names: SystemInputAssistantView:0x115100600 )>",
"<NSLayoutConstraint:0x3032b5540 'assistantView.top' V:[_UIRemoteKeyboardPlaceholderView:0x1151e5400]-(0)-[SystemInputAssistantView] (active, names: SystemInputAssistantView:0x115100600 )>"
)
Will attempt to recover by breaking constraint
<NSLayoutConstraint:0x3031d95e0 'assistantHeight' SystemInputAssistantView.height == 45 (active, names: SystemInputAssistantView:0x115100600 )>
Make a symbolic breakpoint at UIViewAlertForUnsatisfiableConstraints to catch this in the debugger.
The methods in the UIConstraintBasedLayoutDebugging category on UIView listed in <UIKitCore/UIView.h> may also be helpful.
attempting to complete "Hi"
n_len = 1024, n_ctx = 2048, n_kv_req = 1024
Hi, I'm new to this forum. I'm trying to get my 2008 CTS-V to run a 2010 Ford F-150 6.8L engine. I have a 2008 CTS-V with the 6.0L engine. I have a 2010 Ford F-150 6.8L engine. I have a 2008 CTS-V transmission. I have a 2010 Ford F-150 transmission. I have a 2008 CTS-V rear differential. I have a 2010 Ford F-150 rear differential.
[the two "rear differential" sentences then alternate verbatim, streamed one token per log line, for the remainder of the 1024-token generation, ending mid-sentence with "I have"]
Message from debugger: killed
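Two things in the log are consistent with this output. First, the loaded tinyllama-1.1b-f16 is the base completion model, and the line `attempting to complete "Hi"` shows the example doing plain text continuation, so a bare "HI" gets continued like scraped forum text, and a small base model readily falls into a repetition loop. Second, the chat variant downloaded later in the log (TinyLlama-1.1B Chat, Q8_0) expects a Zephyr-style chat template. A minimal sketch of wrapping the input before the example's completion entry point; `completion_init(text:)` is recalled from LibLlama.swift and may differ in your checkout:

```swift
// Wrap user input in the Zephyr-style template used by TinyLlama-1.1B-Chat.
// The template layout is the one documented for that model; the call-site
// name below is an assumption.
func chatPrompt(_ user: String) -> String {
    """
    <|system|>
    You are a helpful assistant.</s>
    <|user|>
    \(user)</s>
    <|assistant|>

    """
}

// usage: await llamaContext.completion_init(text: chatPrompt("HI"))
```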