Name and Version
I followed the SwiftUI example to build an iOS app,
downloaded and loaded tinyllama-1.1b-f16,
then typed "HI" in the text area and pressed the send button.
The response just looks like the image.
Does anyone know why?
Thanks
Operating systems
Mac
GGML backends
CPU
Hardware
iPhone 16 Pro Max
Models
tinyllama-1.1b-f16
Problem description & steps to reproduce
I followed the SwiftUI example to build an iOS app,
downloaded and loaded tinyllama-1.1b-f16,
then typed "HI" in the text area and pressed the send button. The send path in the example is sketched below.
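For reference, the send path in the example looks roughly like this. This is a sketch from memory of examples/llama.swiftui; names such as `LlamaState` and `complete(text:)` are recalled from the example's source and may differ in your checkout. The point is that the text is handed to the completion call as-is, with no chat template applied.

```swift
import SwiftUI

// Sketch of the send path in the llama.swiftui example
// (names recalled from the example's source; treat them as assumptions).
final class LlamaState: ObservableObject {
    @Published var messageLog = ""
    func complete(text: String) async { /* tokenize + completion loop via llama.cpp */ }
}

struct PromptRow: View {
    @StateObject private var llamaState = LlamaState()
    @State private var multiLineText = ""

    var body: some View {
        HStack {
            TextField("Prompt", text: $multiLineText)
            Button("Send") {
                Task {
                    // "HI" goes in unchanged: raw completion, no chat template.
                    await llamaState.complete(text: multiLineText)
                }
            }
        }
    }
}
```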
First Bad Commit
No response
Relevant log output
CFPrefsPlistSource<0x303fc8a80> (Domain: kCFPreferencesAnyApplication, User: kCFPreferencesCurrentUser, ByHost: No, Container: (null), Contents Need Refresh: No): Value for key AppleLanguages was (
"en-US",
"zh-Hans-US",
"zh-Hant-US"
). Expected (
"en-US"
) (Setup(275): 2024-10-02 09:49:02 (PDT))
Downloading model TinyLlama-1.1B (F16, 2.2 GiB) from https://huggingface.co/ggml-org/models/resolve/main/tinyllama-1.1b/ggml-model-f16.gguf?download=true
nw_connection_add_timestamp_locked_on_nw_queue [C2] Hit maximum timestamp count, will start dropping events
Writing to tinyllama-1.1b-f16.gguf completed
Publishing changes from background threads is not allowed; make sure to publish values from the main thread (via operators like receive(on:)) on model updates.
(the line above repeats 20 times in the log)
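This warning usually means an `@Published` property (e.g. the example's message log) is being mutated from a background task. A minimal sketch of one fix, assuming the generated pieces are appended to a published string (the property name is an assumption):

```swift
import SwiftUI

// Hop to the main actor before mutating @Published state so SwiftUI
// publishes from the main thread, as the warning asks.
final class Chat: ObservableObject {
    @Published var messageLog = ""

    func append(_ piece: String) async {
        await MainActor.run {
            self.messageLog += piece   // now published from the main thread
        }
    }
}
```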
llama_model_load_from_file_impl: using device Metal (Apple A18 Pro GPU) - 5461 MiB free
llama_model_loader: loaded meta data with 21 key-value pairs and 201 tensors from /var/mobile/Containers/Data/Application/605BF671-944E-4399-A04E-625E72CF19F9/Documents/tinyllama-1.1b-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = huggingface
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type f16: 156 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 2.05 GiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 1
load: control token: 1 '<s>' is not marked as EOG
load: control token: 2 '</s>' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 3
load: token to piece cache size = 0.1684 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 2048
print_info: n_layer = 22
print_info: n_head = 32
print_info: n_head_kv = 4
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 5632
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1B
print_info: model params = 1.10 B
print_info: general.name = huggingface
print_info: vocab type = SPM
print_info: n_vocab = 32000
print_info: n_merges = 0
print_info: BOS token = 1 '<s>'
print_info: EOS token = 2 '</s>'
print_info: UNK token = 0 '<unk>'
print_info: LF token = 13 '<0x0A>'
print_info: EOG token = 2 '</s>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
make_cpu_buft_list: disabling extra buffer types (i.e. repacking) since a GPU device is available
load_tensors: layer 0 assigned to device Metal, is_swa = 0
load_tensors: layer 1 assigned to device Metal, is_swa = 0
load_tensors: layer 2 assigned to device Metal, is_swa = 0
load_tensors: layer 3 assigned to device Metal, is_swa = 0
load_tensors: layer 4 assigned to device Metal, is_swa = 0
load_tensors: layer 5 assigned to device Metal, is_swa = 0
load_tensors: layer 6 assigned to device Metal, is_swa = 0
load_tensors: layer 7 assigned to device Metal, is_swa = 0
load_tensors: layer 8 assigned to device Metal, is_swa = 0
load_tensors: layer 9 assigned to device Metal, is_swa = 0
load_tensors: layer 10 assigned to device Metal, is_swa = 0
load_tensors: layer 11 assigned to device Metal, is_swa = 0
load_tensors: layer 12 assigned to device Metal, is_swa = 0
load_tensors: layer 13 assigned to device Metal, is_swa = 0
load_tensors: layer 14 assigned to device Metal, is_swa = 0
load_tensors: layer 15 assigned to device Metal, is_swa = 0
load_tensors: layer 16 assigned to device Metal, is_swa = 0
load_tensors: layer 17 assigned to device Metal, is_swa = 0
load_tensors: layer 18 assigned to device Metal, is_swa = 0
load_tensors: layer 19 assigned to device Metal, is_swa = 0
load_tensors: layer 20 assigned to device Metal, is_swa = 0
load_tensors: layer 21 assigned to device Metal, is_swa = 0
load_tensors: layer 22 assigned to device Metal, is_swa = 0
ggml_backend_metal_log_allocated_size: allocated buffer, size = 2098.36 MiB, ( 2098.44 / 5461.34)
load_tensors: offloading 22 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 23/23 layers to GPU
load_tensors: CPU_Mapped model buffer size = 125.00 MiB
load_tensors: Metal_Mapped model buffer size = 2098.36 MiB
..........................................................................................
Using 4 threads
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A18 Pro GPU
ggml_metal_load_library: using embedded metal library
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
Warning: Compilation succeeded with:
program_source:3821:27: warning: comparison of integers of different signs: 'int' and 'const uint' (aka 'const unsigned int') [-Wsign-compare]
for (int j = 0; j < head_size; j += 4) {
~ ^ ~~~~~~~~~
program_source:479:28: warning: unused variable 'ksigns64' [-Wunused-const-variable]
GGML_TABLE_BEGIN(uint64_t, ksigns64, 128)
^
ggml_metal_init: GPU name: Apple A18 Pro GPU
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has residency sets = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 5726.63 MB
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
ggml_metal_init: loaded kernel_add 0x3035eaee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_add_row 0x303591860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub 0x303592280 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub_row 0x3035ecde0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul 0x3035e14a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_row 0x3035e15c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div 0x3035c2880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div_row 0x303589380 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f32 0x3035c2820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f16 0x3035d6820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i32 0x3035ac060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i16 0x3035ac6c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale 0x3035b0060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale_4 0x3035b06c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_clamp 0x3035b0780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_tanh 0x3035b07e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_relu 0x3035b0840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sigmoid 0x3035b08a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu 0x3035b0900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_4 0x3035b0960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick 0x3035b09c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick_4 0x3035b0a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu 0x3035b0a80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu_4 0x3035b0ae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_elu 0x3035b0b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16 0x3035ac720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16_4 0x3035b0ba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32 0x3035acde0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32_4 0x3035b0c00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf 0x3035ad140 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf_8 0x3035ad4a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f32 0x3035b0f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f16 0x3035b1260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_bf16 0x3035b1320 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_0 0x3035b1980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_1 0x3035b1a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_0 0x3035b1da0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_1 0x3035ad620 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q8_0 0x3035ad7a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q2_K 0x3035adb00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q3_K 0x3035ade60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_K 0x3035ae1c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_K 0x3035ae520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q6_K 0x3035ae880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xxs 0x3035aebe0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xs 0x3035aef40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_xxs 0x3035af2a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_s 0x3035b1680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_s 0x3035b2400 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_s 0x3035b2460 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_m 0x3035b8120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_nl 0x3035b2760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_xs 0x3035b2ac0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_i32 0x3035b2b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rms_norm 0x3035b2ee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_l2_norm 0x3035b3240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_group_norm 0x30358e220 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_norm 0x3035a8060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_conv_f32 0x3035a8240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_scan_f32 0x3035b3420 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rwkv_wkv6_f32 0x3035a8960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rwkv_wkv7_f32 0x3035a91a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f32_f32 0x3035a9200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_f32 0x3035b3960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_f32_1row 0x3035af960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_f32_l4 0x3035a4060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_bf16_bf16 0x3035a4780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32 0x3035a47e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row 0x3035a4f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4 0x3035a4fc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f16 0x3035a9380 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32 0x3035a9a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32 0x3035aa100 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_0_f32 0x3035aa7c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_1_f32 0x3035a64c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32 0x3035a6b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_2 0x3035aa8e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_3 0x3035aa940 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_4 0x3035ab000 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_5 0x3035ab6c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_2 0x3035ab720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_3 0x3035a18c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_4 0x3035a20a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_5 0x3035a2880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_2 0x3035a3060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_3 0x3035a30c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_4 0x3035a3120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_5 0x3035a3180 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_2 0x3035a31e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_3 0x3035a3240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_4 0x30355e880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_5 0x30355ff60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_2 0x30355ff00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_3 0x30355fea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_4 0x30355fe40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_5 0x30355fde0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_2 0x30355fd80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_3 0x30355fd20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_4 0x30355fcc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_5 0x30355fc60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_2 0x30355fc00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_3 0x30355fba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_4 0x30355fb40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_5 0x30355fae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_2 0x30355fa80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_3 0x30355fa20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_4 0x30355f9c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_5 0x30355f960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_2 0x30355f900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_3 0x30355f8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_4 0x30355f060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_5 0x30355bd20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_2 0x303557e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_3 0x303557de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_4 0x30354c060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_5 0x30354c840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32 0x30354c8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32 0x303552880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32 0x303553060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32 0x303553840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32 0x303553f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xxs_f32 0x30354d0e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xs_f32 0x30354d7a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_xxs_f32 0x30354de60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_s_f32 0x30354e520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_s_f32 0x3035487e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_s_f32 0x303548ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_m_f32 0x303549560 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_nl_f32 0x30354ed60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_xs_f32 0x303549c20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f32_f32 0x303549ce0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f16_f32 0x303549d40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_bf16_f32 0x303549da0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_0_f32 0x303544780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_1_f32 0x3035447e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_0_f32 0x303544f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_1_f32 0x303545620 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q8_0_f32 0x303545680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q2_K_f32 0x3035456e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q3_K_f32 0x303545740 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_K_f32 0x3035457a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_K_f32 0x303545800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q6_K_f32 0x303541680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xxs_f32 0x303545860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xs_f32 0x3035458c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_xxs_f32 0x303545920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_s_f32 0x303545980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_s_f32 0x3035459e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_s_f32 0x303545a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_m_f32 0x30357c7e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_nl_f32 0x30357c840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_xs_f32 0x30357cf60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f32_f32 0x303545bc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x3035462e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_bf16_f32 0x303547120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32 0x303547180 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32 0x3035471e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_0_f32 0x303547720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_1_f32 0x303547780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32 0x30357e760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32 0x30357eca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32 0x30357f1e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32 0x30357f720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32 0x30357f780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32 0x30357f7e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xxs_f32 0x30357f840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xs_f32 0x30357f8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_xxs_f32 0x30357f900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_s_f32 0x30357f960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_s_f32 0x30357f9c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_s_f32 0x30357fa20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_m_f32 0x30357fa80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_nl_f32 0x30357fae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_xs_f32 0x30357fb40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f32_f32 0x30357fba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f16_f32 0x30357fc00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_bf16_f32 0x30357fc60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_0_f32 0x303574660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_1_f32 0x303574c00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_0_f32 0x3035751a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_1_f32 0x303575740 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q8_0_f32 0x3035757a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q2_K_f32 0x303575800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q3_K_f32 0x303575860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_K_f32 0x3035758c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_K_f32 0x303575920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q6_K_f32 0x303575980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xxs_f32 0x303572880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xs_f32 0x303572e20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_xxs_f32 0x3035733c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_s_f32 0x303573960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_s_f32 0x3035760a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_s_f32 0x303576640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_m_f32 0x303576be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_nl_f32 0x303577180 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_xs_f32 0x3035771e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f32 0x30356cc60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f16 0x30356d7a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f32 0x30356e160 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f16 0x303577240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f16 0x303577300 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f32 0x3035773c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f16 0x303568060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f32 0x303568660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f32_f32 0x303568720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f16_f32 0x303568d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_upscale_f32 0x303568fc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_f32 0x303569260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_reflect_1d_f32 0x30356eb20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_timestep_embedding_f32 0x30356f4e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_arange_f32 0x303573d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_asc 0x303573c00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_desc 0x303569a40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_leaky_relu_f32 0x30356a0a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h64 0x303573b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h80 0x303573ae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h96 0x30356fea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h112 0x30356ba20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h128 0x30356b9c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h256 0x303560060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h64 0x303560840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h80 0x3035608a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h96 0x303564900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h112 0x3035609c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h128 0x303560a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_bf16_h256 0x303561200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h64 0x3035660a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h80 0x303561aa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h96 0x303561b00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h112 0x3035622e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h128 0x303562ac0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h256 0x303562b20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h64 0x303562b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h80 0x30351c8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h96 0x30351c900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h112 0x303562ca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h128 0x303562d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h256 0x3035634e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h64 0x303563cc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h80 0x303563d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h96 0x303563d80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h112 0x303563de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h128 0x303563e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h256 0x3035190e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h64 0x3035148a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h80 0x303514900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h96 0x303514960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h112 0x3035149c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h128 0x303514a80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h256 0x303514de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h64 0x303514e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h80 0x303514ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h96 0x303514f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h112 0x303514f60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h128 0x303511440 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h256 0x303512be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h128 0x303512d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_bf16_h128 0x303512d60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h128 0x303512dc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h128 0x303512e20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h128 0x303512e80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h128 0x303512ee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h128 0x30350c8a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h256 0x30350c900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_bf16_h256 0x30350d0e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h256 0x30350d8c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h256 0x30350d920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h256 0x30350d980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h256 0x30350d9e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h256 0x303508900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_f32 0x3035090e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_i32 0x30350a0a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f32 0x30350a160 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f16 0x30350a640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_bf16 0x30350aca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f32 0x30350b300 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f16 0x30350db00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_bf16_f32 0x30350e340 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_bf16_bf16 0x30350e820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q8_0 0x303504060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_0 0x303504660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_1 0x3035046c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_0 0x303504720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_1 0x303504780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_iq4_nl 0x3035047e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_0_f32 0x303504840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_0_f16 0x3035048a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_1_f32 0x303501920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q4_1_f16 0x303501f20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_0_f32 0x303502520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_0_f16 0x303502b20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_1_f32 0x303502b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q5_1_f16 0x303502be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q8_0_f32 0x303502c40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_q8_0_f16 0x303502d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_concat 0x303502dc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqr 0x303505c20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqrt 0x303506820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sin 0x303506e80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cos 0x303506ee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sum_rows 0x303506f40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argmax 0x303506fa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_avg_f32 0x303507900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_max_f32 0x303507de0 | th_max = 1024 | th_width = 32
set_abort_callback: call
llama_context: CPU output buffer size = 0.12 MiB
llama_context: n_ctx = 2048
llama_context: n_ctx = 2048 (padded)
init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 22, can_shift = 1
init: layer 0: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 1: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 2: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 3: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 4: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 5: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 6: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 7: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 8: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 9: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 10: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 11: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 12: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 13: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 14: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 15: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 16: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 17: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 18: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 19: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 20: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: layer 21: n_embd_k_gqa = 256, n_embd_v_gqa = 256, dev = Metal
init: Metal KV buffer size = 44.00 MiB
llama_context: KV self size = 44.00 MiB, K (f16): 22.00 MiB, V (f16): 22.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 3
llama_context: max_nodes = 65536
llama_context: n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Metal compute buffer size = 148.00 MiB
llama_context: CPU compute buffer size = 8.01 MiB
llama_context: graph nodes = 754
llama_context: graph splits = 2
Downloading model TinyLlama-1.1B Chat (Q8_0, 1.1 GiB) from https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q8_0.gguf?download=true
Requesting visual style in an implementation that has disabled it, returning nil. Behavior of caller is undefined.
(the line above repeats 5 times in the log)
Error: cancelled
Unable to simultaneously satisfy constraints.
Probably at least one of the constraints in the following list is one you don't want. Try this: (1) look at each constraint and try to figure out which you don't expect;
(2) find the code that added the unwanted constraint or constraints and fix it.
(
"<NSLayoutConstraint:0x3031c9cc0 'accessoryView.bottom' _UIRemoteKeyboardPlaceholderView:0x1151e5400.bottom == _UIKBCompatInputView:0x102b49f80.top (active)>",
"<NSLayoutConstraint:0x3031d95e0 'assistantHeight' SystemInputAssistantView.height == 45 (active, names: SystemInputAssistantView:0x115100600 )>",
"<NSLayoutConstraint:0x3032b4410 'assistantView.bottom' SystemInputAssistantView.bottom == _UIKBCompatInputView:0x102b49f80.top (active, names: SystemInputAssistantView:0x115100600 )>",
"<NSLayoutConstraint:0x3032b5540 'assistantView.top' V:[_UIRemoteKeyboardPlaceholderView:0x1151e5400]-(0)-[SystemInputAssistantView] (active, names: SystemInputAssistantView:0x115100600 )>"
)
Will attempt to recover by breaking constraint
<NSLayoutConstraint:0x3031d95e0 'assistantHeight' SystemInputAssistantView.height == 45 (active, names: SystemInputAssistantView:0x115100600 )>
Make a symbolic breakpoint at UIViewAlertForUnsatisfiableConstraints to catch this in the debugger.
The methods in the UIConstraintBasedLayoutDebugging category on UIView listed in <UIKitCore/UIView.h> may also be helpful.
attempting to complete "Hi"
n_len = 1024, n_ctx = 2048, n_kv_req = 1024
Hi, I'm new to this forum. I'm trying to get my 2008 CTS-V to run a 2010 Ford F-150 6.8L engine. I have a 2008 CTS-V with the 6.0L engine. I have a 2010 Ford F-150 6.8L engine. I have a 2008 CTS-V transmission. I have a 2010 Ford F-150 transmission. I have a 2008 CTS-V rear differential. I have a 2010 Ford F-150 rear differential.
[the two "rear differential" sentences then alternate verbatim, streamed one token per log line, for the remainder of the 1024-token generation, ending mid-sentence with "I have"]
Message from debugger: killed
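Two things in the log are consistent with this output. First, the loaded tinyllama-1.1b-f16 is the base completion model, and the line `attempting to complete "Hi"` shows the example doing plain text continuation, so a bare "HI" gets continued like scraped forum text, and a small base model readily falls into a repetition loop. Second, the chat variant downloaded later in the log (TinyLlama-1.1B Chat, Q8_0) expects a Zephyr-style chat template. A minimal sketch of wrapping the input before the example's completion entry point; `completion_init(text:)` is recalled from LibLlama.swift and may differ in your checkout:

```swift
// Wrap user input in the Zephyr-style template used by TinyLlama-1.1B-Chat.
// The template layout is the one documented for that model; the call-site
// name below is an assumption.
func chatPrompt(_ user: String) -> String {
    """
    <|system|>
    You are a helpful assistant.</s>
    <|user|>
    \(user)</s>
    <|assistant|>

    """
}

// usage: await llamaContext.completion_init(text: chatPrompt("HI"))
```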