
Conversation

@Lurkrazy Lurkrazy commented Jun 28, 2024

This PR introduces the Dynamic Gradient Descent (DGD) Search algorithm for accelerating the auto-tuning process of GPU kernels within the Ansor/AutoScheduler framework. The DGD algorithm is designed to explore the search space more efficiently than the existing Genetic Algorithm-based approach. The following changes are included:

  1. Dynamic Gradient Descent Search:

    • Implements a new search strategy that performs gradient descent in a multi-dimensional tile space (see the sketch after this list).
    • Uses online measurements and a proxy model to guide the search process.
  2. Record Processor:

    • A new class that handles the processing and modification of measure records.
    • Includes methods to extract and modify SP node coordinates.
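For intuition, the following is a minimal sketch of the core idea: a greedy descent over tile-size coordinates driven by online measurements. All names here (`dgd_search`, `neighbors`, `measure_throughput`) are hypothetical illustrations, not this PR's API; the actual implementation additionally uses a proxy model to decide which neighbors get measured.

```python
# Hypothetical sketch, not the PR's code: coordinate descent over a
# multi-dimensional tile-size space guided by measured throughput.

def neighbors(point):
    """Yield tile configurations one step away along each dimension."""
    for dim in range(len(point)):
        for delta in (-1, 1):
            cand = list(point)
            cand[dim] += delta
            if cand[dim] > 0:  # tile sizes must stay positive
                yield tuple(cand)

def dgd_search(start, measure_throughput, max_trials=64):
    """Move to the best-measured neighbor until no neighbor improves
    on the current point or the measurement budget is exhausted."""
    best, best_perf = start, measure_throughput(start)
    trials = 1
    improved = True
    while improved and trials < max_trials:
        improved = False
        for cand in neighbors(best):
            perf = measure_throughput(cand)
            trials += 1
            if perf > best_perf:
                best, best_perf = cand, perf
                improved = True
            if trials >= max_trials:
                break
    return best, best_perf
```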

This implementation is based on the algorithm described in the paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations" presented at ICS'24.

Experimental evaluation on a number of matrix-matrix multiplication and convolution kernels shows that the DGD algorithm achieves an order-of-magnitude improvement in auto-tuning time while maintaining comparable code performance.

Usage:

To use the DGD Search algorithm, instantiate the DynamicGradientSearchTuner class with the desired parameters and call the dynamic_gradient_search method.

Example:

from tvm import auto_scheduler

tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(task, log_file, tune_option)
tuner.dynamic_gradient_search()
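For a more complete picture, the sketch below shows how this fits into a typical auto_scheduler workflow. SearchTask, TuningOptions, register_workload, and RecordToFile are standard TVM auto_scheduler APIs; the matmul workload, trial count, and file name are illustrative assumptions, and the DynamicGradientSearchTuner call follows the example above (its exact interface is defined by this PR and may differ).

```python
import tvm
from tvm import te, auto_scheduler

# Illustrative workload; any auto_scheduler-registered workload works here.
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute(
        (N, M),
        lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
        name="C",
    )
    return [A, B, C]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024), target=target)

log_file = "matmul_dgd.json"  # assumed file name
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=1000,  # illustrative budget
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=2,
)

# Constructor arguments follow the example in this PR description.
tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(
    task, log_file, tune_option
)
tuner.dynamic_gradient_search()
```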

Experiment setup:

The experiments ran the DGD Search algorithm under two budgets, a 1-hour time budget and the full duration used by Ansor, and compared the resulting code performance against what Ansor achieves after its suggested number of trials. The models evaluated were Bert, ResNet-50, and MobileNetV2, with the following configurations taken from the Apache blog post Introducing TVM Auto-scheduler (a.k.a. Ansor):

  • Bert: 12000 trials, running on an Nvidia RTX 4090 for 6 hours.
  • ResNet-50: 20000 trials, running on an Nvidia RTX 4090 for 10 hours.
  • MobileNetV2: 16000 trials, running on an Nvidia RTX 4090 for 7 hours.

Relative performance of DGD Search within 1 hour and within the full duration used by Ansor:

| Network     | Ratio (1 hour) | Ratio (full) |
|-------------|----------------|--------------|
| Bert        | 93.71%         | 100.15%      |
| ResNet-50   | 90.46%         | 96.73%       |
| MobileNetV2 | 95.08%         | 101.75%      |

This table reports the performance of DGD Search relative to Ansor, under a 1-hour budget and under the full duration Ansor used. The ratios show that Dynamic Gradient Descent Search reaches comparable (and, at full duration, occasionally slightly better) performance within a significantly reduced time frame.

@Lurkrazy Lurkrazy marked this pull request as ready for review June 29, 2024 00:07
@cbalint13 cbalint13 (Contributor) commented

Thank you @Lurkrazy for this contribution!

Cc'ing relevant folks here: @comaniac @jcf94 @merrymercy @FrozenGene @minminsun @jinhongyii

@cbalint13 cbalint13 added the tune:auto_scheduler src/auto_scheduler, python/tvm/auto_scheduler label Aug 12, 2024
@cbalint13 cbalint13 self-assigned this Aug 12, 2024
@cbalint13 cbalint13 (Contributor) left a comment

• Could some references be added to the benchmarks, how-tos, and docs parts?
• Also, please make sure the CI issues (lint & build) are all in a green state.

@tqchen tqchen (Member) commented Aug 12, 2024

Given that we are migrating toward meta-schedule and may phase out auto-scheduler, I would suggest we bring new changes to that path.

@tqchen tqchen closed this Feb 19, 2025