Repository for converting trained music source separation models to CoreML format for efficient on-device inference on Apple hardware.
Thanks to ZFTurbo for the original model and inference implementations.
Models currently supported for CoreML conversion:

- MDX23C (based on the KUIELab TFC TDF v3 architecture). Key: `mdx23c`
- Demucs4HT [Paper]. Key: `htdemucs`
- Band Split RoFormer [Paper, Repository]. Key: `bs_roformer`
- Mel-Band RoFormer [Paper, Repository]. Key: `mel_band_roformer`
- SCNet [Paper, Official Repository, Unofficial Repository]. Key: `scnet`
Note: Thanks to @lucidrains for recreating the RoFormer models based on papers.
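The keys above are passed as `--model_type` to the conversion scripts to select an architecture. Internally this amounts to a key-to-constructor lookup; the sketch below illustrates that dispatch with hypothetical class and function names (the real scripts also build each model from its YAML config):

```python
# Hypothetical registry mapping model_type keys to architectures.
# Placeholder classes stand in for the real model implementations.
class MDX23C: pass
class HTDemucs: pass
class BSRoformer: pass
class MelBandRoformer: pass
class SCNet: pass

MODEL_REGISTRY = {
    "mdx23c": MDX23C,
    "htdemucs": HTDemucs,
    "bs_roformer": BSRoformer,
    "mel_band_roformer": MelBandRoformer,
    "scnet": SCNet,
}

def build_model(model_type: str):
    """Instantiate the architecture selected by --model_type."""
    try:
        return MODEL_REGISTRY[model_type]()
    except KeyError:
        raise ValueError(f"Unknown model_type: {model_type!r}; "
                         f"expected one of {sorted(MODEL_REGISTRY)}")
```

An unknown key fails fast with the list of valid choices, which is also how the CLI scripts surface typos in `--model_type`.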
To convert a trained model to CoreML format:

- Run the model conversion script:

```bash
python model_coreml_conversion.py \
    --model_type <model_type> \
    --config_path <config_path> \
    --checkpoint <checkpoint_path>
```

- Since iSTFT is not supported by CoreML, you must export it separately:

```bash
python istft_coreml_conversion.py \
    --model_type <model_type> \
    --config_path <config_path> \
    --checkpoint <checkpoint_path>
```

Example for a Mel-Band RoFormer vocal model:

```bash
# Convert the main model
python model_coreml_conversion.py \
    --model_type mel_band_roformer \
    --config_path configs/config_mel_band_roformer_vocals.yaml \
    --checkpoint results/model.ckpt

# Convert the iSTFT component separately
python istft_coreml_conversion.py \
    --model_type mel_band_roformer \
    --config_path configs/config_mel_band_roformer_vocals.yaml \
    --checkpoint results/model.ckpt
```

To test your converted CoreML models:
```bash
python test_coreml_conversion.py <path_to_model.mlpackage> <path_to_istft.mlpackage>
```

For example:

```bash
python test_coreml_conversion.py model.mlpackage istft.mlpackage
```

For regular inference without CoreML (to test the modules), you can run:

```bash
python inference_coreml.py \
    --model_type mdx23c \
    --config_path configs/config_mdx23c_musdb18.yaml \
    --start_check_point results/last_mdx23c.ckpt \
    --input_folder input/wavs/ \
    --store_dir separation_results/
```

This uses the same arguments as the original `inference.py` script.
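Separation models operate on fixed-size windows, so inference over a full-length track is usually done by splitting it into overlapping chunks and blending the per-chunk outputs back together. The sketch below shows this generic overlap-add pattern with triangular crossfade weights; it is an illustration of the technique, not this repository's exact chunking scheme:

```python
def separate_long_audio(x, separate_chunk, chunk=8, hop=4):
    """Apply a fixed-size model `separate_chunk` over a long signal
    `x` (a list of samples) with weighted overlap-add blending."""
    out = [0.0] * len(x)
    wsum = [0.0] * len(x)
    # Triangular weights so overlapping chunks crossfade smoothly.
    win = [min(n + 1, chunk - n) for n in range(chunk)]
    start = 0
    while start < len(x):
        seg = x[start:start + chunk]
        seg = seg + [0.0] * (chunk - len(seg))  # zero-pad the final chunk
        y = separate_chunk(seg)                 # fixed-size model call
        for n in range(min(chunk, len(x) - start)):
            out[start + n] += win[n] * y[n]
            wsum[start + n] += win[n]
        start += hop
    # Normalize by the accumulated weights so the blend is exact.
    return [o / w for o, w in zip(out, wsum)]
```

Because the output is normalized by the accumulated window weights, an identity `separate_chunk` reproduces the input exactly, which is a handy sanity check for any chunking scheme.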
- CoreML models are optimized for on-device inference on Apple hardware (iOS, macOS).
- The iSTFT component must be exported separately due to CoreML limitations.
- For fastest runtime performance, consider implementing iSTFT directly in Swift or C++. The iSTFT conversion provided here is for convenience.
- Make sure you have the necessary dependencies installed for CoreML conversion.
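On the note about implementing iSTFT natively: the operation is just an inverse FFT per frame followed by overlap-add, which ports directly to Swift (Accelerate/vDSP) or C++. Below is a compact pure-Python reference of the analysis/synthesis pair, assuming a periodic Hann window at 50% overlap (whose shifted copies sum to a constant, so plain overlap-add reconstructs the interior of the signal). It is a didactic sketch using a naive DFT, not the repository's exported iSTFT:

```python
import cmath
import math

def stft(x, n_fft=8, hop=4):
    """Naive STFT with a periodic Hann analysis window."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * win[n] for n in range(n_fft)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                           for n in range(n_fft)) for k in range(n_fft)])
    return frames

def istft(frames, n_fft=8, hop=4):
    """Inverse STFT: per-frame inverse DFT, then overlap-add.
    Hann at 50% overlap sums to 1, so interior samples are exact."""
    out = [0.0] * (hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        for n in range(n_fft):
            s = sum(frame[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                    for k in range(n_fft)) / n_fft
            out[i * hop + n] += s.real
    return out
```

A native port would replace the inner DFT loops with a vendor FFT (vDSP on Apple platforms) while keeping the same windowing and overlap-add structure.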
- `configs/config_*.yaml` - configuration files for models
- `models/*` - set of available models for training and inference
- `dataset.py` - dataset which creates new samples for training
- `gui-wx.py` - GUI interface for code
- `inference.py` - process folder with music files and separate them
- `train.py` - main training code
- `train_accelerate.py` - experimental training code to use with the `accelerate` module. Speed up for MultiGPU.
- `utils.py` - common functions used by train/valid
- `valid.py` - validation of model with metrics
- `ensemble.py` - useful script to ensemble results of different models to make results better (see docs)
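Ensembling as done by `ensemble.py` combines the outputs of several models; the common baseline is a weighted sample-wise average of the separated waveforms. A minimal sketch of that idea with plain lists (a hypothetical helper; see the docs for the script's actual algorithms and options):

```python
def ensemble_waveforms(waveforms, weights=None):
    """Weighted sample-wise average of equal-length waveforms,
    e.g. the same stem separated by several different models."""
    if weights is None:
        weights = [1.0] * len(waveforms)
    total = sum(weights)
    n = len(waveforms[0])
    assert all(len(w) == n for w in waveforms), "waveform lengths must match"
    return [sum(wt * wav[i] for wt, wav in zip(weights, waveforms)) / total
            for i in range(n)]
```

Weighting lets a stronger model dominate the blend while weaker models still smooth out its artifacts.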
Look here: List of Pre-trained models
If you have trained good models, please share them. You can post the config and model weights in this issue.
Look here: Dataset types
Look here: Augmentations
Look here: GUI documentation or see tutorial on Youtube
```bibtex
@misc{solovyev2023benchmarks,
  title={Benchmarks and leaderboards for sound demixing tasks},
  author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
  year={2023},
  eprint={2305.07489},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}
```