LocalAI requires AVX2 on CPU — rpc error on older CPUs #6348

Description

@dennisvanderpool

LocalAI version:
v3.5.4
All image variants exhibit the same problem (AIO, CPU-only, CUDA 11/12).

Environment, CPU architecture, OS, and Version:
VM: No
OS: Arch Linux x86_64
Kernel: 6.16.8-arch3-1
CPU: Intel Xeon E5-2696 v2 (24) @ 3.500GHz
GPU: NVIDIA GeForce RTX 2080 Ti Rev. A
Memory: 14289MiB / 32066MiB

Describe the bug
I've got the following docker-compose file. It works on a system that has AVX2 but fails on a system that only has AVX.

services:
  api:
    container_name: localai
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    # For a specific version:
    #image: localai/localai:v3.5.4-aio-cpu
    # For Nvidia GPUs, uncomment one of the following (cuda11 or cuda12):
    # image: localai/localai:v3.5.4-aio-gpu-nvidia-cuda-11
    # image: localai/localai:v3.5.4-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    runtime: nvidia
    ports:
      - 33866:8080
    environment:
      - DEBUG=true
      - NVIDIA_VISIBLE_DEVICES=all
      # ...
    volumes:
      - models:/models:cached
      - backends:/backends
      - user-backends:/usr/share/localai/backends

volumes:
  models:
  backends:
  user-backends:

To Reproduce
Use the docker-compose file above and run it on a system with AVX2 and on a system without AVX2.
With AVX2: works.
Without AVX2: fails with the rpc error below.
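
To tell the two systems apart, the host's AVX support can be checked with standard Linux tooling (nothing LocalAI-specific), e.g.:

# Lists the AVX variants the CPU advertises; no "avx2" in the output means no AVX2
grep -o -w -E 'avx|avx2|avx512f' /proc/cpuinfo | sort -u

On the Xeon E5-2696 v2 (Ivy Bridge-EP) above, this prints only "avx".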

Expected behavior
It would be nice if LocalAI worked on systems without AVX2 CPU support.

Logs
LocalAI:

localai  | CPU info:
localai  | model name   : Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz
localai  | flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi
localai  | CPU:    AVX    found OK
localai  | CPU: no AVX2   found
localai  | CPU: no AVX512 found
...
localai  | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_context: CPU  output buffer size =     0.49 MiB
localai  | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_kv_cache: CPU KV buffer size =   896.00 MiB
localai  | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_kv_cache: size =  896.00 MiB (  8192 cells,  28 layers,  1/1 seqs), K (f16):  448.00 MiB, V (f16):  448.00 MiB
localai  | 9:01PM ERR Failed to load model gpt-4 with backend llama-cpp error="failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF" modelID=gpt-4
localai  | 9:01PM DBG No choices in the response, skipping

dmesg:

[100539.252358] traps: grpcpp_sync_ser[1047414] trap invalid opcode ip:7f52c87a194b sp:7f5284ff1110 error:0 in llama-cpp-avx[7a194b,7f52c80d0000+1358000]
[233335.656413] traps: grpcpp_sync_ser[326615] trap invalid opcode ip:7f21765a191b sp:7f2125ff3110 error:0 in llama-cpp-fallback[7a191b,7f2175ed0000+1334000]
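
Note that both the avx variant and the fallback variant trap on an invalid opcode. For triage, the offset of the faulting instruction within each mapping is the ip minus the mapping base from the bracketed base+size range, e.g. with plain shell arithmetic:

# Offset of the faulting instruction inside the llama-cpp-avx mapping,
# using ip and base from the first dmesg line above
printf '0x%x\n' $(( 0x7f52c87a194b - 0x7f52c80d0000 ))   # -> 0x6d194b

Disassembling the binary around that offset (e.g. with objdump -d) should show which unsupported instruction these builds still contain.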

I've also tried rebuilding LocalAI and the backends with multiple compiler flags (e.g. the ones from the FAQ), but that did not help either:
https://localai.io/faq/#im-getting-a-sigill-error-whats-wrong
I also tried a CPU-only build to rule out any GPU-related issues; that did not help either.
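
For reference, this is the kind of rebuild I mean, with the AVX2-and-newer code paths disabled. A sketch only: the exact CMake variable names vary between LocalAI/llama.cpp versions (older trees use LLAMA_* instead of GGML_*):

# Build with only plain AVX enabled; AVX2/FMA/F16C/AVX512 code paths off
CMAKE_ARGS="-DGGML_AVX2=OFF -DGGML_FMA=OFF -DGGML_F16C=OFF -DGGML_AVX512=OFF" make build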
