
Conversation

@Lurkrazy Lurkrazy commented Jun 28, 2024

This PR introduces the Dynamic Gradient Descent (DGD) Search algorithm for accelerating the auto-tuning process of GPU kernels within the Ansor/AutoScheduler framework. The DGD algorithm is designed to explore the search space more efficiently than the existing Genetic Algorithm-based approach. The following changes are included:

  1. Dynamic Gradient Descent Search:

    • Implements a new search strategy that performs gradient descent in a multi-dimensional tile space (see the sketch after this list).
    • Uses online measurements and a proxy model to guide the search process.
  2. Record Processor:

    • A new class that handles the processing and modification of measure records.
    • Includes methods to extract and modify SP node coordinates.
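For intuition, the following is a minimal sketch of the core idea: a greedy descent over tile-size coordinates driven by online measurements. All names here (`dgd_search`, `neighbors`, `measure_throughput`) are hypothetical illustrations, not this PR's API; the actual implementation additionally uses a proxy model to decide which neighbors get measured.

```python
# Hypothetical sketch, not the PR's code: coordinate descent over a
# multi-dimensional tile-size space guided by measured throughput.

def neighbors(point):
    """Yield tile configurations one step away along each dimension."""
    for dim in range(len(point)):
        for delta in (-1, 1):
            cand = list(point)
            cand[dim] += delta
            if cand[dim] > 0:  # tile sizes must stay positive
                yield tuple(cand)

def dgd_search(start, measure_throughput, max_trials=64):
    """Move to the best-measured neighbor until no neighbor improves
    on the current point or the measurement budget is exhausted."""
    best, best_perf = start, measure_throughput(start)
    trials = 1
    improved = True
    while improved and trials < max_trials:
        improved = False
        for cand in neighbors(best):
            perf = measure_throughput(cand)
            trials += 1
            if perf > best_perf:
                best, best_perf = cand, perf
                improved = True
            if trials >= max_trials:
                break
    return best, best_perf
```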

This implementation is based on the algorithm described in the paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations" presented at ICS'24.

Experimental evaluation on a number of matrix-matrix multiplication and convolution kernels shows that the DGD algorithm achieves an order-of-magnitude improvement in auto-tuning time while maintaining comparable code performance.

Usage:

To use the DGD Search algorithm, instantiate the DynamicGradientSearchTuner class with the desired parameters and call the dynamic_gradient_search method.

Example:

from tvm import auto_scheduler

tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(task, log_file, tune_option)
tuner.dynamic_gradient_search()
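For a more complete picture, the sketch below shows how this fits into a typical auto_scheduler workflow. SearchTask, TuningOptions, register_workload, and RecordToFile are standard TVM auto_scheduler APIs; the matmul workload, trial count, and file name are illustrative assumptions, and the DynamicGradientSearchTuner call follows the example above (its exact interface is defined by this PR and may differ).

```python
import tvm
from tvm import te, auto_scheduler

# Illustrative workload; any auto_scheduler-registered workload works here.
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute(
        (N, M),
        lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
        name="C",
    )
    return [A, B, C]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024), target=target)

log_file = "matmul_dgd.json"  # assumed file name
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=1000,  # illustrative budget
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=2,
)

# Constructor arguments follow the example in this PR description.
tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(
    task, log_file, tune_option
)
tuner.dynamic_gradient_search()
```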

Experiment setup:

The experiments ran the DGD Search algorithm under two budgets, a 1-hour time budget and the full duration used by Ansor, and compared the resulting code performance against what Ansor achieves after its suggested number of trials. The models evaluated were Bert, ResNet-50, and MobileNetV2, with the following configurations taken from the Apache blog post Introducing TVM Auto-scheduler (a.k.a. Ansor):

  • Bert: 12000 trials, running on an Nvidia RTX 4090 for 6 hours.
  • ResNet-50: 20000 trials, running on an Nvidia RTX 4090 for 10 hours.
  • MobileNetV2: 16000 trials, running on an Nvidia RTX 4090 for 7 hours.

Relative performance of DGD Search within 1 hour and within the full duration used by Ansor:

| Network     | Ratio (1 hour) | Ratio (full) |
|-------------|----------------|--------------|
| Bert        | 93.71%         | 100.15%      |
| ResNet-50   | 90.46%         | 96.73%       |
| MobileNetV2 | 95.08%         | 101.75%      |

This table reports the performance of DGD Search relative to Ansor, under a 1-hour budget and under the full duration Ansor used. The ratios show that Dynamic Gradient Descent Search reaches comparable (and, at full duration, occasionally slightly better) performance within a significantly reduced time frame.

@Lurkrazy Lurkrazy marked this pull request as ready for review June 29, 2024 00:07
@cbalint13 cbalint13 (Contributor) commented

Thank you @Lurkrazy for this contribution!

Cc'ing relevant folks here: @comaniac @jcf94 @merrymercy @FrozenGene @minminsun @jinhongyii

@cbalint13 cbalint13 added the tune:auto_scheduler src/auto_scheduler, python/tvm/auto_scheduler label Aug 12, 2024
@cbalint13 cbalint13 self-assigned this Aug 12, 2024
@cbalint13 cbalint13 (Contributor) left a comment

• Could some references be added to the benchmarks, how-tos, and docs parts?
• Also, please make sure the CI issues (lint & build) are all in a green state.

@tqchen tqchen (Member) commented Aug 12, 2024

Given that we are migrating toward meta-schedule and may phase out auto-scheduler, I would suggest we bring new changes to that path.

@tqchen tqchen closed this Feb 19, 2025