Skip to content

[CI] Some improvements to Nightly reports summaries #11166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Jun 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
213 changes: 114 additions & 99 deletions .github/workflows/nightly_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,9 @@
PYTEST_TIMEOUT: 600
RUN_SLOW: yes
RUN_NIGHTLY: yes
PIPELINE_USAGE_CUTOFF: 5000
PIPELINE_USAGE_CUTOFF: 0
SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
CONSOLIDATED_REPORT_PATH: consolidated_test_report.md

jobs:
setup_torch_cuda_pipeline_matrix:
Expand Down Expand Up @@ -99,11 +100,6 @@
with:
name: pipeline_${{ matrix.module }}_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_nightly_tests_for_other_torch_modules:
name: Nightly Torch CUDA Tests
Expand Down Expand Up @@ -142,7 +138,6 @@
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
RUN_COMPILE: yes
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
Expand Down Expand Up @@ -175,12 +170,6 @@
name: torch_${{ matrix.module }}_cuda_test_reports
path: reports

- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_torch_compile_tests:
name: PyTorch Compile CUDA tests

Expand Down Expand Up @@ -224,12 +213,6 @@
name: torch_compile_test_reports
path: reports

- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_big_gpu_torch_tests:
name: Torch tests on big GPU
strategy:
Expand Down Expand Up @@ -280,12 +263,7 @@
with:
name: torch_cuda_big_gpu_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY


torch_minimum_version_cuda_tests:
name: Torch Minimum Version CUDA Tests
runs-on:
Expand Down Expand Up @@ -342,63 +320,6 @@
with:
name: torch_minimum_version_cuda_test_reports
path: reports

run_flax_tpu_tests:
name: Nightly Flax TPU Tests
runs-on:
group: gcp-ct5lp-hightpu-8t
if: github.event_name == 'schedule'

container:
image: diffusers/diffusers-flax-tpu
options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2

- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog

- name: Environment
run: python utils/print_env.py

- name: Run nightly Flax TPU tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 0 \
-s -v -k "Flax" \
--make-reports=tests_flax_tpu \
--report-log=tests_flax_tpu.log \
tests/

- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_flax_tpu_stats.txt
cat reports/tests_flax_tpu_failures_short.txt

- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: flax_tpu_test_reports
path: reports

- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_nightly_onnx_tests:
name: Nightly ONNXRuntime CUDA tests on Ubuntu
Expand All @@ -423,69 +344,63 @@
python -m uv pip install -e [quality,test]
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog
- name: Environment
run: python utils/print_env.py

- name: Run Nightly ONNXRuntime CUDA tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
--make-reports=tests_onnx_cuda \
--report-log=tests_onnx_cuda.log \
tests/

- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_onnx_cuda_stats.txt
cat reports/tests_onnx_cuda_failures_short.txt

- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: tests_onnx_cuda_reports
path: reports

- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_nightly_quantization_tests:
name: Torch quantization nightly tests
strategy:
fail-fast: false
max-parallel: 2
matrix:
matrix:
config:
- backend: "bitsandbytes"
test_location: "bnb"
additional_deps: ["peft"]
- backend: "gguf"
test_location: "gguf"
additional_deps: ["peft"]
- backend: "torchao"
test_location: "torchao"
additional_deps: []
- backend: "optimum_quanto"
test_location: "quanto"
additional_deps: []
runs-on:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
Expand Down Expand Up @@ -520,12 +435,7 @@
with:
name: torch_cuda_${{ matrix.config.backend }}_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY


run_nightly_pipeline_level_quantization_tests:
name: Torch quantization nightly tests
strategy:
Expand Down Expand Up @@ -574,124 +484,229 @@
with:
name: torch_cuda_pipeline_level_quant_reports
path: reports
- name: Generate Report and Notify Channel
if: always()

run_flax_tpu_tests:
name: Nightly Flax TPU Tests
runs-on:
group: gcp-ct5lp-hightpu-8t
if: github.event_name == 'schedule'

container:
image: diffusers/diffusers-flax-tpu
options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2

- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog

- name: Environment
run: python utils/print_env.py

- name: Run nightly Flax TPU tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 0 \
-s -v -k "Flax" \
--make-reports=tests_flax_tpu \
--report-log=tests_flax_tpu.log \
tests/

- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_flax_tpu_stats.txt
cat reports/tests_flax_tpu_failures_short.txt

- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: flax_tpu_test_reports
path: reports

generate_consolidated_report:
name: Generate Consolidated Test Report
needs: [
run_nightly_tests_for_torch_pipelines,
run_nightly_tests_for_other_torch_modules,
run_torch_compile_tests,
run_big_gpu_torch_tests,
run_nightly_quantization_tests,
run_nightly_pipeline_level_quantization_tests,
run_nightly_onnx_tests,
torch_minimum_version_cuda_tests,
run_flax_tpu_tests
]
if: always()
runs-on:
group: aws-general-8-plus
container:
image: diffusers/diffusers-pytorch-cpu
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2

- name: Create reports directory
run: mkdir -p combined_reports

- name: Download all test reports
uses: actions/download-artifact@v4
with:
path: artifacts

- name: Prepare reports
run: |
# Move all report files to a single directory for processing
find artifacts -name "*.txt" -exec cp {} combined_reports/ \;

- name: Install dependencies
run: |
pip install -e .[test]
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY


- name: Generate consolidated report
run: |
python utils/consolidated_test_report.py \
--reports_dir combined_reports \
--output_file $CONSOLIDATED_REPORT_PATH \
--slack_channel_name diffusers-ci-nightly

- name: Show consolidated report
run: |
cat $CONSOLIDATED_REPORT_PATH >> $GITHUB_STEP_SUMMARY

- name: Upload consolidated report
uses: actions/upload-artifact@v4
with:
name: consolidated_test_report
path: ${{ env.CONSOLIDATED_REPORT_PATH }}

# M1 runner currently not well supported
# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
# run_nightly_tests_apple_m1:
# name: Nightly PyTorch MPS tests on MacOS
# runs-on: [ self-hosted, apple-m1 ]
# if: github.event_name == 'schedule'
#
# steps:
# - name: Checkout diffusers
# uses: actions/checkout@v3
# with:
# fetch-depth: 2
#
# - name: Clean checkout
# shell: arch -arch arm64 bash {0}
# run: |
# git clean -fxd
# - name: Setup miniconda
# uses: ./.github/actions/setup-miniconda
# with:
# python-version: 3.9
#
# - name: Install dependencies
# shell: arch -arch arm64 bash {0}
# run: |
# ${CONDA_RUN} python -m pip install --upgrade pip uv
# ${CONDA_RUN} python -m uv pip install -e [quality,test]
# ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
# ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
# ${CONDA_RUN} python -m uv pip install pytest-reportlog
# - name: Environment
# shell: arch -arch arm64 bash {0}
# run: |
# ${CONDA_RUN} python utils/print_env.py
# - name: Run nightly PyTorch tests on M1 (MPS)
# shell: arch -arch arm64 bash {0}
# env:
# HF_HOME: /System/Volumes/Data/mnt/cache
# HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# run: |
# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
# --report-log=tests_torch_mps.log \
# tests/
# - name: Failure short reports
# if: ${{ failure() }}
# run: cat reports/tests_torch_mps_failures_short.txt
#
# - name: Test suite reports artifacts
# if: ${{ always() }}
# uses: actions/upload-artifact@v4
# with:
# name: torch_mps_test_reports
# path: reports
#
# - name: Generate Report and Notify Channel
# if: always()
# run: |
# pip install slack_sdk tabulate
# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY run_nightly_tests_apple_m1:
# name: Nightly PyTorch MPS tests on MacOS
# runs-on: [ self-hosted, apple-m1 ]
# if: github.event_name == 'schedule'
#
# steps:
# - name: Checkout diffusers
# uses: actions/checkout@v3
# with:
# fetch-depth: 2
#
# - name: Clean checkout
# shell: arch -arch arm64 bash {0}
# run: |
# git clean -fxd
# - name: Setup miniconda
# uses: ./.github/actions/setup-miniconda
# with:
# python-version: 3.9
#
# - name: Install dependencies
# shell: arch -arch arm64 bash {0}
# run: |
# ${CONDA_RUN} python -m pip install --upgrade pip uv
# ${CONDA_RUN} python -m uv pip install -e [quality,test]
# ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
# ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
# ${CONDA_RUN} python -m uv pip install pytest-reportlog
# - name: Environment
# shell: arch -arch arm64 bash {0}
# run: |
# ${CONDA_RUN} python utils/print_env.py
# - name: Run nightly PyTorch tests on M1 (MPS)
# shell: arch -arch arm64 bash {0}
# env:
# HF_HOME: /System/Volumes/Data/mnt/cache
# HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# run: |
# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
# --report-log=tests_torch_mps.log \
# tests/
# - name: Failure short reports
# if: ${{ failure() }}
# run: cat reports/tests_torch_mps_failures_short.txt
#
# - name: Test suite reports artifacts
# if: ${{ always() }}
# uses: actions/upload-artifact@v4
# with:
# name: torch_mps_test_reports
# path: reports
#
# - name: Generate Report and Notify Channel
# if: always()
# run: |
# pip install slack_sdk tabulate
# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}
Loading
Loading