In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The instrument module is optional and can be directly controlled by human users, making Jointist a flexible, user-controllable framework. The joint training of the transcription and source separation modules improves the performance of both tasks. Our challenging problem formulation makes the model highly useful in the real world, given that modern popular music typically consists of multiple instruments. Its novelty, however, necessitates a new perspective on how to evaluate such a model. In our experiments, we assess the proposed model from various aspects, providing a new evaluation perspective for multi-instrument transcription. Our subjective listening study shows that Jointist achieves state-of-the-art performance on popular music, outperforming existing multi-instrument transcription models such as MT3.
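The conditioning flow described above can be sketched as a minimal pipeline. This is an illustrative outline only, not the authors' released code: all function names are hypothetical, and the module bodies are stubs standing in for the actual neural networks.

```python
# Hypothetical sketch of Jointist's three-module pipeline.
# Names and stub outputs are illustrative assumptions, not the real model.

def recognize_instruments(audio):
    # Instrument recognition module: predicts which instruments
    # are present in the clip (stubbed here with a fixed answer).
    return ["piano", "bass"]

def transcribe(audio, instruments):
    # Transcription module: conditioned on the instrument list,
    # it emits one instrument-specific piano roll per instrument.
    return {inst: f"piano roll for {inst}" for inst in instruments}

def separate(audio, instruments, piano_rolls):
    # Source separation module: uses both the instrument condition
    # and the transcription result to isolate each stem.
    return {inst: f"stem for {inst} guided by {piano_rolls[inst]}"
            for inst in instruments}

def jointist(audio, instruments=None):
    # The instrument module is optional: a user may pass the
    # instrument list directly instead of relying on recognition.
    if instruments is None:
        instruments = recognize_instruments(audio)
    rolls = transcribe(audio, instruments)
    stems = separate(audio, instruments, rolls)
    return instruments, rolls, stems
```

The key design point mirrored here is that the recognition output (or a user-supplied instrument list) conditions both downstream modules, which is what makes the framework user-controllable.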