Update README.md

README.md CHANGED
@@ -14,7 +14,7 @@ fullWidth: true

---

<p align="center">
  <img src="figs/logo.png" width="50%" />
</p>
@@ -33,135 +33,14 @@ fullWidth: true

Chunbo Hao<sup>*</sup>, Ruibin Yuan<sup>*</sup>, Jixun Yao, Qixin Deng, Xinyi Bai, Wei Xue, Lei Xie<sup>†</sup>

---

SongFormer is a music structure analysis framework that leverages multi-resolution self-supervised representations and heterogeneous supervision, accompanied by the large-scale multilingual dataset SongFormDB and the high-quality benchmark SongFormBench to foster fair and reproducible research.

![](figs/archframework.png)

## News and Updates
## 📋 To-Do List

- [x] Complete and push inference code to GitHub
- [x] Upload model checkpoint(s) to the Hugging Face Hub
- [ ] Upload the paper to arXiv
- [x] Fix README
- [ ] Deploy an out-of-the-box inference version on Hugging Face (via Inference API or Spaces)
- [ ] Publish the package to PyPI for easy installation via `pip`
- [ ] Open-source evaluation code
- [ ] Open-source training code
## Installation

### Setting up the Python Environment

```bash
git clone https://github.com/ASLP-lab/SongFormer.git
cd SongFormer

# Get the MuQ and MusicFM source code
git submodule update --init --recursive

conda create -n songformer python=3.10 -y
conda activate songformer
```
For users in mainland China, you may need to configure a pip mirror:

```bash
pip config set global.index-url https://pypi.mirrors.ustc.edu.cn/simple
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

We tested this setup on Ubuntu 22.04.1 LTS, where it works normally. If installation fails, you may need to remove the version constraints in `requirements.txt`.
### Download Pre-trained Models

```bash
cd src/SongFormer
# For users in mainland China: the comments in this script explain how to download via hf-mirror.com
python utils/fetch_pretrained.py
```

After downloading, verify that the downloaded files match the MD5 checksums listed in `src/SongFormer/ckpts/MusicFM/md5sum.txt`:

```bash
md5sum ckpts/MusicFM/msd_stats.json
md5sum ckpts/MusicFM/pretrained_msd.pt
md5sum ckpts/SongFormer.safetensors
# md5sum ckpts/SongFormer.pt
```
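The same check can be scripted. Below is a minimal sketch that parses a standard md5sum-style file (`<hash>  <filename>` per line) and recomputes each digest; the function names are illustrative, not part of the repository:

```python
import hashlib
from pathlib import Path

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(md5sum_txt: str, base_dir: str = ".") -> bool:
    """Check every '<hash>  <filename>' entry in an md5sum-style file."""
    ok = True
    for line in Path(md5sum_txt).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        actual = md5_of(str(Path(base_dir) / name.strip()))
        ok &= actual == expected
        print(f"{name.strip()}: {'OK' if actual == expected else 'MISMATCH'}")
    return ok
```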
## Inference

### 1. One-Click Inference with Hugging Face Space (coming soon)

Available at: [https://huggingface.co/spaces/ASLP-lab/SongFormer](https://huggingface.co/spaces/ASLP-lab/SongFormer)

### 2. Gradio App
First, `cd` to the project root directory and activate the environment:

```bash
conda activate songformer
```

You can change the server port and listening address in the last line of `app.py`.

> If you're using an HTTP proxy, make sure to set:
>
> ```bash
> export no_proxy="localhost, 127.0.0.1, ::1"
> export NO_PROXY="localhost, 127.0.0.1, ::1"
> ```
>
> Otherwise, Gradio may incorrectly conclude that the service has not started and exit immediately on startup.
On first run, `app.py` connects to Hugging Face to download the MuQ-related weights. We recommend creating an empty folder in a suitable location and pointing `HF_HOME` at it with `export HF_HOME=XXX`, so the cache is stored there for easy cleanup and transfer.

Users in mainland China may also need `export HF_ENDPOINT=https://hf-mirror.com`; for details, refer to https://hf-mirror.com/

```bash
python app.py
```
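Equivalently, the cache location can be set from Python, as long as it happens before any Hugging Face library is imported (they read these variables at import time). A minimal sketch; the cache path here is only an example:

```python
import os

# Point the Hugging Face cache at a dedicated folder *before* importing
# huggingface_hub/transformers. setdefault keeps any value already exported.
os.environ.setdefault("HF_HOME", os.path.expanduser("~/hf_cache"))

# Users behind hf-mirror.com can redirect the endpoint the same way:
# os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

print(os.environ["HF_HOME"])
```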
### 3. Python Code

See `src/SongFormer/infer/infer.py`; the corresponding execution script is `src/SongFormer/infer.sh`, a ready-to-use, single-machine, multi-process annotation script.

The main configurable parameters of `src/SongFormer/infer.sh` are listed below. You can control which GPUs are used by setting the `CUDA_VISIBLE_DEVICES` environment variable:

```bash
-i            # Input SCP file, each line containing the absolute path to one audio file
-o            # Output directory for annotation results
--model       # Annotation model; the default is 'SongFormer', change it if using a fine-tuned model
--checkpoint  # Path to the model checkpoint file
--config_pat  # Path to the configuration file
-gn           # Total number of GPUs to use; should match the number specified in CUDA_VISIBLE_DEVICES
-tn           # Number of processes to run per GPU
```
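The `-i` input is a plain list of absolute audio paths, one per line. A small sketch for generating such a list from a folder of audio files; the function name and the extension set are assumptions, not part of the repository:

```python
from pathlib import Path

def write_scp(audio_dir: str, scp_path: str, exts=(".wav", ".mp3", ".flac")) -> int:
    """Write one absolute audio-file path per line, as the SCP input expects.

    Returns the number of audio files found.
    """
    files = sorted(
        p.resolve() for p in Path(audio_dir).rglob("*") if p.suffix.lower() in exts
    )
    Path(scp_path).write_text("\n".join(str(p) for p in files) + "\n")
    return len(files)
```

The resulting file can then be passed via `-i`, with annotation results collected from the `-o` directory.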
### 4. CLI Inference

Coming soon

### 5. Pitfalls

- You may need to modify line 121 in `src/third_party/musicfm/model/musicfm_25hz.py` to:
  `S = torch.load(model_path, weights_only=False)["state_dict"]`
## Training
## Citation
@@ -180,15 +59,4 @@ If our work and codebase is useful for you, please cite as:

````
## License

Our code is released under the CC-BY-4.0 License.

## Contact Us

<p align="center">
  <a href="http://www.nwpu-aslp.org/">
    <img src="figs/aslp.png" width="400"/>
  </a>
</p>
---

<p align="center">
  <img src="https://github.com/ASLP-lab/SongFormer/blob/main/figs/logo.png?raw=true" width="50%" />
</p>

Chunbo Hao<sup>*</sup>, Ruibin Yuan<sup>*</sup>, Jixun Yao, Qixin Deng, Xinyi Bai, Wei Xue, Lei Xie<sup>†</sup>

---

**For more information, please visit our [github repository](https://github.com/ASLP-lab/SongFormer)**

SongFormer is a music structure analysis framework that leverages multi-resolution self-supervised representations and heterogeneous supervision, accompanied by the large-scale multilingual dataset SongFormDB and the high-quality benchmark SongFormBench to foster fair and reproducible research.

![](https://github.com/ASLP-lab/SongFormer/blob/main/figs/archframework.png?raw=true)
## Citation
````
## License

Our code is released under the CC-BY-4.0 License.