{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "a3PTFH-H9Ozk" }, "source": [ "# ๐Ÿ’ง LFM2 - SFT with TRL\n", "\n", "This tutorial demonstrates how to fine-tune our LFM2 models, e.g. [`LiquidAI/LFM2-1.2B`](https://huggingface.co/LiquidAI/LFM2-1.2B), using the TRL library.\n", "\n", "Follow along if it's your first time using trl, or take single code snippets for your own workflow\n", "\n", "## ๐ŸŽฏ What you'll find:\n", "- **SFT** (Supervised Fine-Tuning) - Basic instruction following\n", "- **LoRA + SFT** - Using LoRA (from PEFT) to SFT while on constrained hardware\n", "\n", "## ๐Ÿ“‹ Prerequisites:\n", "- **GPU Runtime**: Select GPU in `Runtime` โ†’ `Change runtime type`\n", "- **Hugging Face Account**: For accessing models and datasets\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "x0RPLu2h9ome" }, "source": [ "# ๐Ÿ“ฆ Installation & Setup\n", "\n", "First, let's install all the required packages:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3FIcp_wo9nsR", "tags": [] }, "outputs": [], "source": [ "!pip install transformers==4.54.1 trl>=0.18.2 peft>=0.15.2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install sentencepiece --upgrade" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install patchelf" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!patchelf --add-rpath '$ORIGIN/../../nvidia/cusparse/lib' /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_cuda.so" ] }, { "cell_type": "markdown", "metadata": { "id": "41UEf1uxCd6m" }, "source": [ "Let's now verify the packages are installed correctly" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bSJgYtHT_Os4", "outputId": "23f86c62-471c-4579-fc23-1df88e87698b", "tags": [] }, "outputs": [], "source": [ "import torch\n", "import transformers\n", "import trl\n", "import os\n", "os.environ[\"WANDB_DISABLED\"] = \"true\"\n", "\n", "print(f\"๐Ÿ“ฆ PyTorch version: {torch.__version__}\")\n", "print(f\"๐Ÿค— Transformers version: {transformers.__version__}\")\n", "print(f\"๐Ÿ“Š TRL version: {trl.__version__}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "v_uXLzxQ_rnK" }, "source": [ "# Loading the model from Transformers ๐Ÿค—\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "iA3erKM4-HhS", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ“š Loading tokenizer...\n", "๐Ÿง  Loading model...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.11/dist-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?\n", " warn(\n", "2025-08-19 00:26:28.995179: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
{ "cell_type": "markdown", "metadata": { "id": "6ABA6Yrm_lql" }, "source": [ "# 🎯 Part 1: Supervised Fine-Tuning (SFT)\n", "\n", "SFT teaches the model to follow instructions by training on input-output pairs (an instruction and its reference response). This is the foundation for creating instruction-following models." ] }, { "cell_type": "markdown", "metadata": { "id": "KufdgeypHtst" }, "source": [ "## Load an SFT Dataset\n", "\n", "We will use [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), limiting ourselves to the first 5k samples for brevity. Feel free to change the limit by adjusting the slicing index in the `split` parameter." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "XCe8O06-_Cps", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📥 Loading SFT dataset...\n" ] } ], "source": [ "from datasets import load_dataset\n", "\n", "print(\"📥 Loading SFT dataset...\")\n", "\n", "# Load the first 5k samples of the \"all\" config; the 90/10 eval split below is an illustrative choice\n", "dataset = load_dataset(\"HuggingFaceTB/smoltalk\", \"all\", split=\"train[:5000]\")\n", "dataset = dataset.train_test_split(test_size=0.1, seed=42)\n", "train_dataset_sft = dataset[\"train\"]\n", "eval_dataset_sft = dataset[\"test\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Preview one sample rendered with the LFM2 chat template (expected output shown as comments)\n", "print(tokenizer.apply_chat_template(train_dataset_sft[0][\"messages\"], tokenize=False))\n", "# <|im_start|>user\n", "# What is C. elegans?<|im_end|>\n", "# <|im_start|>assistant\n", "# C. elegans, also known as Caenorhabditis elegans, is a small, free-living\n", "# nematode worm (roundworm) that belongs to the phylum Nematoda.\n" ] },
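{ "cell_type": "markdown", "metadata": {}, "source": [ "## Launch Training\n", "\n", "We can now run full-parameter SFT with TRL's `SFTTrainer`. The configuration below is a minimal sketch: the output directory, batch size, and learning rate are illustrative assumptions; tune them for your hardware. Note that training updates `model` in place, so re-load the base checkpoint before Part 2 if you want LoRA applied to the original weights.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from trl import SFTConfig, SFTTrainer\n", "\n", "# Illustrative hyperparameters; adjust for your hardware and dataset size\n", "sft_config = SFTConfig(\n", "    output_dir=\"./lfm2-sft\",\n", "    num_train_epochs=1,\n", "    per_device_train_batch_size=1,\n", "    learning_rate=5e-5,\n", "    logging_steps=10,\n", "    save_strategy=\"epoch\",\n", "    eval_strategy=\"epoch\",\n", "    report_to=\"none\",\n", ")\n", "\n", "sft_trainer = SFTTrainer(\n", "    model=model,\n", "    args=sft_config,\n", "    train_dataset=train_dataset_sft,\n", "    eval_dataset=eval_dataset_sft,\n", "    processing_class=tokenizer,\n", ")\n", "\n", "sft_trainer.train()\n", "sft_trainer.save_model()\n", "print(f\"💾 Model saved to: {sft_config.output_dir}\")" ] },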
{ "cell_type": "markdown", "metadata": { "id": "08Y3TxKrBRXo" }, "source": [ "# 🎛️ Part 2: LoRA + SFT (Parameter-Efficient Fine-Tuning)\n", "\n", "LoRA (Low-Rank Adaptation) enables efficient fine-tuning by training only a small number of additional parameters. Perfect for limited compute resources!\n" ] }, { "cell_type": "markdown", "metadata": { "id": "-MfWfc-Pvl9q" }, "source": [ "## Wrap the model with PEFT\n", "\n", "We specify the target modules that will be fine-tuned while the rest of the model's weights remain frozen. Feel free to modify the `r` (rank) value:\n", "- higher -> closer approximation of full fine-tuning\n", "- lower -> requires even fewer compute resources" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "puYp_gTpBSsf" }, "outputs": [], "source": [ "from peft import LoraConfig, get_peft_model, TaskType\n", "\n", "# LFM2 module groups: gated feed-forward (GLU), attention (MHA), and short-convolution blocks\n", "GLU_MODULES = [\"w1\", \"w2\", \"w3\"]\n", "MHA_MODULES = [\"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\"]\n", "CONV_MODULES = [\"in_proj\", \"out_proj\"]\n", "\n", "lora_config = LoraConfig(\n", "    task_type=TaskType.CAUSAL_LM,\n", "    inference_mode=False,\n", "    r=8,  # <- lower values = fewer parameters\n", "    lora_alpha=16,\n", "    lora_dropout=0.1,\n", "    target_modules=GLU_MODULES + MHA_MODULES + CONV_MODULES,\n", "    bias=\"none\",\n", "    modules_to_save=None,\n", ")\n", "\n", "lora_model = get_peft_model(model, lora_config)\n", "lora_model.print_trainable_parameters()\n", "\n", "print(\"✅ LoRA configuration applied!\")\n", "print(f\"🎛️ LoRA rank: {lora_config.r}\")\n", "print(f\"📊 LoRA alpha: {lora_config.lora_alpha}\")\n", "print(f\"🎯 Target modules: {lora_config.target_modules}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "L1Hem_DOwHgY" }, "source": [ "## Launch Training\n", "\n", "We are now ready to launch SFT training again, this time with the LoRA-wrapped model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "u-VYQysHBY8-" }, "outputs": [], "source": [ "from trl import SFTConfig, SFTTrainer\n", "\n", "lora_sft_config = SFTConfig(\n", "    output_dir=\"./lfm2-sft-lora\",\n", "    num_train_epochs=1,\n", "    per_device_train_batch_size=1,\n", "    learning_rate=5e-5,\n", "    lr_scheduler_type=\"linear\",\n", "    warmup_steps=100,  # warmup_steps overrides warmup_ratio, so we set only one\n", "    logging_steps=10,\n", "    save_strategy=\"epoch\",\n", "    eval_strategy=\"epoch\",\n", "    load_best_model_at_end=True,\n", "    report_to=\"none\",\n", ")\n", "\n", "print(\"🏗️ Creating LoRA SFT trainer...\")\n", "lora_sft_trainer = SFTTrainer(\n", "    model=lora_model,\n", "    args=lora_sft_config,\n", "    train_dataset=train_dataset_sft,\n", "    eval_dataset=eval_dataset_sft,\n", "    processing_class=tokenizer,\n", ")\n", "\n", "print(\"\\n🚀 Starting LoRA + SFT training...\")\n", "lora_sft_trainer.train()\n", "\n", "print(\"🎉 LoRA + SFT training completed!\")\n", "\n", "lora_sft_trainer.save_model()\n", "print(f\"💾 LoRA model saved to: {lora_sft_config.output_dir}\")" ] },
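{ "cell_type": "markdown", "metadata": {}, "source": [ "Before merging, it's worth prompting the LoRA-wrapped model once to confirm it generates sensibly. This is a minimal sketch; the prompt and generation settings below are illustrative.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Generate with the LoRA adapters active (illustrative prompt and settings)\n", "prompt = [{\"role\": \"user\", \"content\": \"Explain LoRA in one sentence.\"}]\n", "inputs = tokenizer.apply_chat_template(\n", "    prompt, add_generation_prompt=True, return_tensors=\"pt\"\n", ").to(lora_model.device)\n", "\n", "with torch.no_grad():\n", "    output = lora_model.generate(inputs, max_new_tokens=64)\n", "\n", "print(tokenizer.decode(output[0], skip_special_tokens=True))" ] },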
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_rizEFUsvwce" }, "outputs": [], "source": [ "print(\"\\n๐Ÿ”„ Merging LoRA weights...\")\n", "merged_model = lora_model.merge_and_unload()\n", "merged_model.save_pretrained(\"./lfm2-lora-merged\")\n", "tokenizer.save_pretrained(\"./lfm2-lora-merged\")\n", "print(\"๐Ÿ’พ Merged model saved to: ./lfm2-lora-merged\")" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 4 }