### Stage 1: "Compose" model

Generates **melody and chord progression** from scratch.

- Model backbone: 12-layer Transformer w/ relative positional encoding
- Num trainable params: 41.3M
- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications
- Pretraining dataset: subset of [Lakh MIDI full](https://colinraffel.com/projects/lmd/) (**LMD-full**), 14934 songs
  - melody extraction (and data filtering) done by **matching lyrics to tracks**: https://github.com/gulnazaki/lyrics-melody/blob/main/pre-processing/create_dataset.py
  - structural segmentation done with **A\* search**: https://github.com/Dsqvival/hierarchical-structure-analysis
- Finetuning dataset: subset of [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1591 songs
  - melody extraction done with **skyline algorithm**: https://github.com/wazenmai/MIDI-BERT/blob/CP/melody_extraction/skyline/analyzer.py
  - structural segmentation done in the same way as the pretraining dataset
- Training sequence length: 2400
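The skyline heuristic mentioned above can be sketched in a few lines: at each onset time, keep only the highest-pitched note and treat the resulting top voice as the melody. This is a minimal illustration of the idea, not the linked MIDI-BERT analyzer; the `Note` type is a hypothetical stand-in for whatever note representation the pipeline uses.

```python
from collections import namedtuple

# Hypothetical note representation for illustration only.
Note = namedtuple("Note", ["onset", "pitch", "duration"])

def skyline(notes):
    """Keep the highest-pitched note at each onset (the 'skyline' top voice)."""
    top = {}
    for n in notes:
        cur = top.get(n.onset)
        if cur is None or n.pitch > cur.pitch:
            top[n.onset] = n
    # Return the surviving notes in chronological order.
    return [top[t] for t in sorted(top)]
```

For example, given two simultaneous notes at onset 0 (pitches 60 and 72) and one note at onset 1 (pitch 64), `skyline` keeps the pitch-72 note and the pitch-64 note.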
### Stage 2: "Embellish" model

Generates **accompaniment, timing and dynamics** conditioned on Stage 1 outputs.

- Model backbone: 12-layer **Performer** ([paper](https://arxiv.org/abs/2009.14794), [implementation](https://github.com/idiap/fast-transformers))
- Num trainable params: 38.2M
- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications
- Training dataset: [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1747 songs
- Training sequence length: 3072
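Both stages consume REMI-style token sequences, in which each note is serialized as a small group of discrete events inside a bar. The sketch below illustrates the general shape of such an encoding; the event names and grid granularity are assumptions for illustration, not the exact modified vocabulary used by these models.

```python
def note_to_remi(position, pitch, duration, velocity, new_bar=False):
    """Serialize one note as a REMI-style event group (illustrative names only)."""
    events = []
    if new_bar:
        events.append("Bar_None")              # bar boundary marker
    events.append(f"Position_{position}/16")   # metrical slot within the bar (assumed 16th-note grid)
    events.append(f"Velocity_{velocity}")      # quantized dynamics
    events.append(f"Pitch_{pitch}")            # MIDI pitch number
    events.append(f"Duration_{duration}")      # note length in grid units
    return events
```

A note at the first position of a new bar (MIDI pitch 60, duration 4 grid units, velocity 80) would become the five tokens `Bar_None, Position_1/16, Velocity_80, Pitch_60, Duration_4`.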
## BibTex

If you find the materials useful, please consider citing our work: