[AutoScheduler] Add Dynamic Gradient Descent Search Algorithm for Auto-Tuning #17126
This PR introduces the Dynamic Gradient Descent (DGD) search algorithm for accelerating the auto-tuning of GPU kernels within the Ansor/AutoScheduler framework. DGD is designed to explore the search space more efficiently than the existing genetic-algorithm-based approach. The following changes are included (a simplified sketch of the search loop follows the list):

- **Dynamic Gradient Descent Search**: the new search algorithm, implemented in the `DynamicGradientSearchTuner` class.
- **Record Processor**: a helper for processing the measurement records produced during the search.
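To make the idea concrete, here is a highly simplified, illustrative sketch of a gradient-descent-style search over a discrete tuning space. It is not the code added by this PR; `measure`, `neighbors`, and `dgd_sketch` are hypothetical names standing in for hardware measurement and one-knob-at-a-time candidate generation.

```python
# Illustrative pseudocode only -- not this PR's implementation.
# Assumes: measure(candidate) returns measured latency (lower is better);
# neighbors(candidate) yields candidates differing in a single tuning knob;
# candidates are hashable.
def dgd_sketch(init_candidates, measure, neighbors, budget):
    """Greedy, coordinate-descent-style search over a discrete tuning space."""
    costs = {c: measure(c) for c in init_candidates}
    best = min(costs, key=costs.get)
    trials = len(costs)
    improved = True
    while improved and trials < budget:
        improved = False
        # Probe neighbors that differ in one knob; move to the first one
        # that is faster (a discrete analogue of a descent step).
        for cand in neighbors(best):
            cost = measure(cand)
            trials += 1
            if cost < costs[best]:
                costs[cand] = cost
                best, improved = cand, True
                break
            if trials >= budget:
                break
    return best
```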
This implementation is based on the algorithm described in the paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations" presented at ICS'24.
Experimental evaluation on a number of matrix-matrix multiplication and convolution kernels shows that the DGD algorithm achieves an order-of-magnitude improvement in auto-tuning time while maintaining comparable code performance.
Usage:

To use the DGD search algorithm, instantiate the `DynamicGradientSearchTuner` class with the desired parameters and call its `dynamic_gradient_search` method.

Example:
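A minimal sketch of the workflow, assuming the tuner is importable from `tvm.auto_scheduler` and takes the search task and a log-file path; for the authoritative constructor signature, consult the class docstring added by this PR:

```python
import tvm
from tvm import te, auto_scheduler

# Define a workload with the standard AutoScheduler API.
@auto_scheduler.register_workload
def matmul(N, L, M, dtype):
    A = te.placeholder((N, L), name="A", dtype=dtype)
    B = te.placeholder((L, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, L), name="k")
    C = te.compute(
        (N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C"
    )
    return [A, B, C]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(
    func=matmul, args=(1024, 1024, 1024, "float32"), target=target
)
log_file = "matmul_dgd.json"

# Instantiate the tuner and launch the DGD search.
# NOTE: the import path and constructor arguments are assumptions made for
# this illustration; see the DynamicGradientSearchTuner docstring.
tuner = auto_scheduler.DynamicGradientSearchTuner(task, log_file)
tuner.dynamic_gradient_search()
```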
Experiment setup:

The experiments ran the DGD search algorithm with a 1-hour time budget and compared the resulting performance against Ansor running for its full suggested number of trials. The models used for the evaluation were BERT, ResNet-50, and MobileNetV2, with configurations based on the Apache blog post Introducing TVM Auto-scheduler (a.k.a. Ansor).
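As a rough illustration of how such end-to-end network experiments are set up with the AutoScheduler API, here is a sketch using ResNet-50 (the tuner's constructor arguments remain the same assumption as in the usage example above):

```python
import tvm
from tvm import auto_scheduler
from tvm.relay import testing

# Build a ResNet-50 workload (one of the evaluated models) from the Relay
# testing model zoo.
mod, params = testing.resnet.get_workload(num_layers=50, batch_size=1)
target = tvm.target.Target("cuda")

# Extract the tunable tasks from the network.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

# Tune each extracted task with the DGD search.
for idx, task in enumerate(tasks):
    tuner = auto_scheduler.DynamicGradientSearchTuner(
        task, f"resnet50_task{idx}.json"
    )
    tuner.dynamic_gradient_search()
```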
Relative performance of the DGD search algorithm in 1 hour versus the full duration used by Ansor

This table presents the relative performance achieved by the DGD search algorithm under a 1-hour time budget compared to Ansor running for its full duration. The performance ratios indicate that Dynamic Gradient Descent search reaches comparable code performance within a significantly reduced time frame.