## Training of MiniGPT-4

MiniGPT-4 is trained in two alignment stages.

**1. First pretraining stage**

In the first pretraining stage, the model is trained on image-text pairs from the LAION and CC datasets
to align the vision model with the language model. To download and prepare the datasets, please check
our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
After the first stage, the visual features are mapped so that the language model can understand them.

To launch the first stage training, run the following command. In our experiments, we use 4 A100 GPUs.
You can change the save path in the config file
[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml).
```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
```
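As a concrete illustration, the sketch below fills in the `NUM_GPU` placeholder with 4, matching the 4-GPU setup mentioned above. The shell variables `NUM_GPU`, `CFG`, and `CMD` are names chosen for this example and are not part of the repository. The launch line is left commented out, so the script only prints the command it would run.

```bash
#!/usr/bin/env bash
# Sketch: build the stage-one launch command with a concrete GPU count.
# NUM_GPU, CFG, and CMD are illustrative variable names (assumptions).
NUM_GPU=4   # set this to the number of GPUs available on your machine
CFG=train_configs/minigpt4_stage1_pretrain.yaml

CMD="torchrun --nproc-per-node ${NUM_GPU} train.py --cfg-path ${CFG}"
echo "${CMD}"

# Uncomment to actually start training (run from the repository root):
# ${CMD}
```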
A MiniGPT-4 checkpoint with only stage one training can be downloaded
[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
Compared to the model after stage two, this checkpoint frequently generates incomplete and repeated sentences.

**2. Second finetuning stage**

In the second stage, we use a small, high-quality image-text pair dataset that we created ourselves
and converted to a conversation format to further align MiniGPT-4.
To download and prepare our second stage dataset, please check our
[second stage dataset preparation instruction](dataset/README_2_STAGE.md).

To launch the second stage alignment,
first specify the path to the checkpoint file trained in stage 1 in
[train_configs/minigpt4_stage2_finetune.yaml](train_configs/minigpt4_stage2_finetune.yaml).
You can also specify the output path there.
Then, run the following command. In our experiments, we use 1 A100 GPU.
```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
```
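Because stage two depends on the stage-one checkpoint, it can help to confirm that the checkpoint file exists before launching. The following is a minimal sketch of such a check, assuming a single GPU as in the experiments above. `STAGE1_CKPT` and `stage2_cmd` are hypothetical names introduced here, and the path is a placeholder: the script does not read the YAML config, so the path must be kept in sync with the config by hand.

```bash
#!/usr/bin/env bash
# Sketch: refuse to build the stage-two command if the stage-one checkpoint is missing.
# STAGE1_CKPT is a hypothetical placeholder path (assumption); it is NOT read
# from the config file, so set it to the same path you wrote in the YAML.
STAGE1_CKPT="${STAGE1_CKPT:-/path/to/stage1/checkpoint.pth}"
NUM_GPU=1
CFG=train_configs/minigpt4_stage2_finetune.yaml

# Print the launch command if the checkpoint file ($1) exists; otherwise fail.
stage2_cmd() {
  if [ ! -f "$1" ]; then
    echo "stage-one checkpoint not found: $1" >&2
    return 1
  fi
  echo "torchrun --nproc-per-node ${NUM_GPU} train.py --cfg-path ${CFG}"
}

stage2_cmd "${STAGE1_CKPT}" || echo "fix the checkpoint path before launching"
```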
After the second stage alignment, MiniGPT-4 can talk about the image coherently and in a user-friendly way.