Description
LocalAI version:
v3.5.4
All image variants show the same problem (AIO, CPU-only, CUDA 11/12).
Environment, CPU architecture, OS, and Version:
VM: No
OS: Arch Linux x86_64
Kernel: 6.16.8-arch3-1
CPU: Intel Xeon E5-2696 v2 (24) @ 3.500GHz
GPU: NVIDIA GeForce RTX 2080 Ti Rev. A
Memory: 14289MiB / 32066MiB
Describe the bug
I've got the following docker-compose file; it works on a system that has AVX2, but does not work on a system that only has AVX.
services:
  api:
    container_name: localai
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    # For a specific version:
    #image: localai/localai:v3.5.4-aio-cpu
    # For Nvidia GPUs decomment one of the following (cuda11 or cuda12):
    # image: localai/localai:v3.5.4-aio-gpu-nvidia-cuda-11
    # image: localai/localai:v3.5.4-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    runtime: nvidia
    ports:
      - 33866:8080
    environment:
      - DEBUG=true
      - NVIDIA_VISIBLE_DEVICES=all
      # ...
    volumes:
      - models:/models:cached
      - backends:/backends
      - user-backends:/usr/share/localai/backends
volumes:
  models:
  backends:
  user-backends:
To Reproduce
See the docker-compose file above and run it on a system with AVX2 and on a system without AVX2.
With AVX2: Works
Without AVX2: Does not work
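To quickly check whether a host advertises AVX2 (generic Linux, nothing LocalAI-specific; just how I verify it):

# prints "avx2" if the CPU supports it, otherwise "no AVX2"
grep -o -m1 avx2 /proc/cpuinfo || echo "no AVX2"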
Expected behavior
It would be nice if LocalAI also worked on systems without AVX2 CPU support.
Logs
LocalAI:
localai | CPU info:
localai | model name : Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz
localai | flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi
localai | CPU: AVX found OK
localai | CPU: no AVX2 found
localai | CPU: no AVX512 found
...
localai | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_context: CPU output buffer size = 0.49 MiB
localai | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_kv_cache: CPU KV buffer size = 896.00 MiB
localai | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_kv_cache: size = 896.00 MiB ( 8192 cells, 28 layers, 1/1 seqs), K (f16): 448.00 MiB, V (f16): 448.00 MiB
localai | 9:01PM ERR Failed to load model gpt-4 with backend llama-cpp error="failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF" modelID=gpt-4
localai | 9:01PM DBG No choices in the response, skipping
dmesg:
[100539.252358] traps: grpcpp_sync_ser[1047414] trap invalid opcode ip:7f52c87a194b sp:7f5284ff1110 error:0 in llama-cpp-avx[7a194b,7f52c80d0000+1358000]
[233335.656413] traps: grpcpp_sync_ser[326615] trap invalid opcode ip:7f21765a191b sp:7f2125ff3110 error:0 in llama-cpp-fallback[7a191b,7f2175ed0000+1334000]
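The invalid-opcode traps suggest the backend is executing an instruction this CPU doesn't implement. A rough way to confirm which instruction faults is to disassemble the backend binary around the offset dmesg reports in brackets (e.g. 7a191b for llama-cpp-fallback); the binary path and the address window below are placeholders, and how the dmesg offset maps into the file is my assumption:

# disassemble a small window around the reported offset and look for AVX2-only instructions
objdump -d --start-address=0x7a1800 --stop-address=0x7a1a00 /path/to/llama-cpp-fallback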
I've also tried to rebuild LocalAI & the backends with multiple compiler flags (e.g. the ones from the FAQ), but that did not help either.
https://localai.io/faq/#im-getting-a-sigill-error-whats-wrong
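For reference, the rebuilds were roughly along these lines (flag names taken from the FAQ / llama.cpp build options; treat them as illustrative, since they can differ between versions):

# disable the instruction sets this CPU lacks before building LocalAI and the llama-cpp backend
CMAKE_ARGS="-DGGML_AVX2=OFF -DGGML_FMA=OFF -DGGML_F16C=OFF -DGGML_AVX512=OFF" make build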
I also tried a CPU-only build to rule out any GPU-related issues, but that did not help either.