# sycl : reviewing the backend documentation #13544
## Conversation
**docs/backend/SYCL.md** (outdated)

```diff
@@ -425,13 +427,13 @@ Examples:
 - Use device 0:

-ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -no-cnv -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
+ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -no-cnv -m models/Meta-Llama-3-8B-Instruct-Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
```
NIT: according to https://github.com/intel/compute-runtime/blob/master/programmers-guide/SYSMAN.md#support-and-limitations, `ZES_ENABLE_SYSMAN` is not supported on newer architectures, so I am not sure whether we should already remove it in this PR or in a follow-up.
I see. We do have a lot of warnings in the code, so changes to this should also be reflected in the helper header. I'll add a TODO at the end of the documentation.
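For context, a minimal sketch of how the variable is used per invocation (the model path is just the one from the docs; behavior on newer architectures may differ, per the linked SYSMAN guide):

```sh
# Assumption: ZES_ENABLE_SYSMAN=1 lets Level-Zero expose device memory info,
# which the SYCL backend uses when reporting devices. It is set per run:
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf \
    -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
```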
**docs/backend/SYCL.md** (outdated)

```diff
 | GGML_SYCL_GRAPH | ON *(default)* \|OFF *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
 | CMAKE_C_COMPILER | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path. |
 | CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |

+* The FP32 codepath used to perform better on quantized models, but recent results show similar performance in text generation. Try both `GGML_SYCL_F16` ON and OFF on your system, but take into account that FP32 reduces prompt processing performance.
```
The `*` here doesn't match the `*` in the `GGML_SYCL_F16` line. If it doesn't render properly, I think you could use numbers to link the two.
Please don't remove llama2 from the example guide and the example/sycl scripts, and please don't remove `-s 0` from the scripts.
They are important to me for tracking the quality of the SYCL backend; recently, I used them to check accuracy in the reorder feature PRs.
**docs/backend/SYCL.md** (outdated)

```diff
 - **Nvidia & AMD Plugins**: These are plugins extending oneAPI's DPCPP support to SYCL on Nvidia and AMD GPU targets.

 ### Llama.cpp + SYCL

-The llama.cpp SYCL backend is designed to support **Intel GPU** firstly. Based on the cross-platform feature of SYCL, it also supports other vendor GPUs: Nvidia and AMD.
+The llama.cpp SYCL backend is mainly designed to support **Intel GPUs**.
```
I suggest keeping the original wording. It describes the history of the SYCL backend.
I revised the wording to improve clarity; the meaning of the original text remains the same.
**docs/backend/SYCL.md** (outdated)

```diff
 | Intel built-in Arc GPU | Support | built-in Arc GPU in Meteor Lake, Arrow Lake |
-| Intel iGPU | Support | iGPU in 13700k,iGPU in 13400, i5-1250P, i7-1260P, i7-1165G7 |
+| Intel iGPU | Support | iGPU in 13700k, 13400, i5-1250P, i7-1260P, i7-1165G7, Ultra 7 268V |
```
The Ultra 7 268V should be moved to the "built-in Arc GPU" line.
Aha, yes, you are right. I'll simply add Lunar Lake, as I've tested it there.
**docs/backend/SYCL.md** (outdated)

```diff
@@ -351,7 +353,7 @@ cmake --build build --config Release -j -v

 #### Retrieve and prepare model

-You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
+You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or download an already quantized model like [Meta-Llama-3-8B-Instruct-Q4_0.gguf](https://huggingface.co/aptha/Meta-Llama-3-8B-Instruct-Q4_0-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_0.gguf).
```
I suggest not using llama3 here.
- llama2 is used as the base in performance tests. I built a local performance test script to trace performance changes in the SYCL backend, and it uses llama2. If it's changed, the performance trend trace is broken.
- The SYCL backend is well optimized for llama2.
- If you want to promote llama3, you could add more LLMs as examples and add a script for llama3. Don't remove llama2.
I'm not trying to promote llama3, but to reflect the changes in the ecosystem in our documentation and avoid the appearance that the backend is tied only to a relatively old model. Nothing stops us from continuing to track the performance of the backend using llama2; I agree that it is good to show the development of the backend, but that is not shown here. We have the Updates section for that.
What if we update the documentation references, revert the example script changes, and add another set of examples for llama3?
Let me know if that compromise works for you.
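For illustration, both models referenced in this thread can be fetched directly (a sketch; note that the llama2 link in the docs points at the `blob/` page, so `resolve/` is used here for a direct download):

```sh
mkdir -p models
# llama2 model used by the existing performance scripts:
wget -P models https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
# llama3 model proposed for the updated docs:
wget -P models https://huggingface.co/aptha/Meta-Llama-3-8B-Instruct-Q4_0-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_0.gguf
```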
**docs/backend/SYCL.md** (outdated)

```diff
 | GGML_SYCL_GRAPH | ON *(default)* \|OFF *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
 | CMAKE_C_COMPILER | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path. |
 | CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |

+* The FP32 codepath used to perform better on quantized models, but recent results show similar performance in text generation. Try both `GGML_SYCL_F16` ON and OFF on your system, but take into account that FP32 reduces prompt processing performance.
```
It doesn't clearly tell the user which data type is recommended: FP32 or FP16.
I've updated the description. So far, all the models I've checked have the same performance with FP16 vs FP32. I'm hesitant to change the default build from FP32 to FP16, though.
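For reference, a minimal sketch of trying both paths at configure time, using the flags from the table above (build directory names are arbitrary):

```sh
# FP16 path enabled:
cmake -B build-f16 -DGGML_SYCL=ON -DGGML_SYCL_F16=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build-f16 --config Release -j

# Default FP32 path:
cmake -B build-f32 -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build-f32 --config Release -j
```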
```diff
@@ -779,17 +786,17 @@ use 1 SYCL GPUs: [0] with Max compute units:512

 It's same for other projects including llama.cpp SYCL backend.

-- Meet issue: `Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -999 (UNKNOWN PI error)` or `failed to allocate SYCL0 buffer`
+- `Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)`, `ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 3503030272 Bytes of memory on device`, or `failed to allocate SYCL0 buffer`
```
I suggest keeping the old log info when adding the new log.
These old errors are no longer present in supported DPC++ versions. Anything older than 2025.0 will run into compilation issues, and if people want to build old releases, they should go to the documentation of the commit they are building.
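As an aside for readers hitting these out-of-device-memory errors, the usual mitigation is to offload fewer layers to the GPU (a sketch; `-ngl 20` is a hypothetical, model-dependent value):

```sh
# Fewer offloaded layers -> smaller device allocation:
./build/bin/llama-cli -m models/Meta-Llama-3-8B-Instruct-Q4_0.gguf \
    -p "Building a website can be done in 10 simple steps:" -n 64 -ngl 20
```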
**examples/sycl/run-llama.sh** (outdated)

```diff
@@ -19,9 +19,9 @@ if [ $# -gt 0 ]; then
     GGML_SYCL_DEVICE=$1
     echo "use $GGML_SYCL_DEVICE as main GPU"
     # use single GPU only
-    ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -s 0 -c ${CONEXT} -mg $GGML_SYCL_DEVICE -sm none
+    ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -c ${CONEXT} -mg $GGML_SYCL_DEVICE -sm none
```
`-s 0` should be kept. It makes the result the same every time, which is important for verifying the accuracy of the SYCL backend.
Same comment for the other script updates.
As I answered above, I'm fine with reverting those changes and keeping these scripts if they are useful to you. However, a new user running them doesn't really need a fixed seed, so I am not including one in the new script.
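To make the reproducibility point concrete, a sketch of the two invocations (model path and prompt are placeholders): with a fixed seed, sampling should be repeatable given the same build and settings; without `-s`, a random seed is chosen each run.

```sh
# Deterministic: same seed, same sampling sequence on repeated runs.
./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Hello" -n 64 -ngl 33 -s 0
# Non-deterministic: seed chosen at random on each run.
./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Hello" -n 64 -ngl 33
```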
```diff
@@ -250,7 +252,7 @@ sycl-ls

 - **Intel GPU**

-  When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`level_zero:gpu`] in the sample output below:
+  When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance `[level_zero:gpu]` in the sample output below:
```
This seems to convey that Level Zero is absolutely required. Could we maybe rephrase it to something along the lines of: "One can check whether SYCL detects a GPU via `sycl-ls` and should see their device listed in its output"?
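For instance, a quick check (the device line below is illustrative; names, driver versions, and backends vary by system):

```sh
sycl-ls
# Output should include a GPU entry, e.g.:
# [level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
```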
```sh
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: MIT

export ONEAPI_DEVICE_SELECTOR="level_zero:0"
```
Any reason we are selecting this device in particular? Is this required?
I would say let the runtime choose whatever it prefers.
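For context, `ONEAPI_DEVICE_SELECTOR` is a oneAPI runtime device filter; a few common values (a sketch):

```sh
export ONEAPI_DEVICE_SELECTOR="level_zero:0"    # only the first Level-Zero device
export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"  # all Level-Zero GPUs
unset ONEAPI_DEVICE_SELECTOR                    # no filter: runtime sees all devices
```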
```diff
 | Intel built-in Arc GPU | Support | built-in Arc GPU in Meteor Lake, Arrow Lake, Lunar Lake |
 | Intel iGPU | Support | iGPU in 13700k, 13400, i5-1250P, i7-1260P, i7-1165G7 |
```
These two lines convey more or less the same thing? Maybe let's remove the lower one, so that we needn't keep appending to it time and again.
Also, it's better to mention just the architecture rather than a very specific CPU model, as that covers all the cases for the series.
I would also remove the word "Arc" from above, as technically the B580 and Lunar Lake are not part of the Arc series.
Let's rename that column from "Intel Arc Series" to, say, "Intel Discrete GPUs"?
Similarly for the one below: "Intel Built-in Arc GPU" to just "Intel iGPUs".
```diff
@@ -750,7 +767,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512

 ## Q&A

-- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
+- Error: `error while loading shared libraries: libsycl.so.8: cannot open shared object file: No such file or directory`.
```
Suggested change:

```diff
-- Error: `error while loading shared libraries: libsycl.so.8: cannot open shared object file: No such file or directory`.
+- Error: `error while loading shared libraries: libsycl.so: cannot open shared object file: No such file or directory`.
```

so that we needn't update it with every release; just mentioning `libsycl.so` is sufficient.
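As a usage note, the usual fix for that error is to source the oneAPI environment before running (default install path assumed):

```sh
source /opt/intel/oneapi/setvars.sh   # makes libsycl.so and the other runtime libs visible
./build/bin/llama-cli --help          # should now load without the error
```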
Small updates to the docs and examples of the SYCL backend.
I removed the seed from the examples, as its purpose wasn't clear; raise any objections if you have them.
Also added a mention of the SYCL docker images that are being built in the CI (in docker.md).