TheOneWhoWill committed (verified)
Commit 2101d58 · 1 Parent(s): 93d7550

Version 2 of a locally trained model

README.md CHANGED
@@ -1,3 +1,99 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - allenai/c4
+ - databricks/databricks-dolly-15k
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - qwen2
+ - transformers
+ - text-generation
+ ---
+ # Bootstrap LLM
+
+ ## Introduction
+ Since releasing my first Qwen2-based model several weeks ago, I've taken what I learned and attempted to create a new model that is pre-trained more thoroughly and on a more diverse dataset. I settled on the unfiltered version of the English subset of C4, with entries shuffled in batches of 1,000 to break away from continuous streams of related training data. For fine-tuning, I initially opted for [agentlans/multiturn-chat](https://huggingface.co/datasets/agentlans/multiturn-chat) because it offers far more examples than [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k); however, I reverted to dolly-15k because the conversations in multiturn-chat are verbose and poorly suited to a short 1024-token-context model. A sketch of the shuffling step is shown below.
+
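+ As a rough illustration of that shuffling step, here is a minimal sketch using the `datasets` streaming API with a 1,000-entry shuffle buffer. It approximates the idea rather than reproducing the exact training script; the seed is an assumption.
+ ```python
+ # Minimal sketch: stream C4 (en.noblocklist) and shuffle with a
+ # 1,000-entry buffer, mirroring the batches of 1,000 described above.
+ # The seed is an assumption; the real training script is not shown here.
+ from datasets import load_dataset
+
+ c4 = load_dataset("allenai/c4", "en.noblocklist", split="train", streaming=True)
+ c4 = c4.shuffle(seed=42, buffer_size=1000)
+
+ for i, example in enumerate(c4.take(3)):
+     print(i, example["text"][:80])
+ ```
+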
+ ## Model Details
+ - **Model Name:** Bootstrap LLM
+ - **Architecture:** Qwen2-based
+ - **Context:** 1024 tokens
+ - **Vocab Size:** 50,262 tokens
+ - **Qwen2 Specific:** hidden size of 768, 6 layers, 6 attention heads
+
+ ## Training Details
+ - **GPU:** NVIDIA GeForce RTX 4070 Laptop GPU
+ - **CUDA:** used during both pre-training and fine-tuning
+ - **VRAM:** 8 GB
+
+ Like my previous model, the [AllenAI C4 English](https://huggingface.co/datasets/allenai/c4) dataset was used for pre-training, the key difference being that I used the "en.noblocklist" subset for more diversity. Instead of creating my own tokenizer, I opted to use the tokenizer from [GPT-2](https://huggingface.co/openai-community/gpt2), which saved a lot of extra computation and has proven effective in real-world use. The model was pre-trained for 280 thousand steps at a 1024-token context, with a per-device batch size of 4 and 4 gradient accumulation steps (an effective batch size of 16). Pre-training took about 60 hours with the GPU overclocked to its maximum capacity. Post-training involved 5 epochs of [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) formatted in ChatML, roughly as sketched below.
+
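+ A hedged sketch of that ChatML formatting step, using dolly-15k's actual field names (`instruction`, `context`, `response`); the exact prompt layout used in training is an assumption:
+ ```python
+ # Sketch: convert dolly-15k rows into ChatML strings for fine-tuning.
+ # The precise layout (context appended after the instruction) is an
+ # assumption, not necessarily the exact format used for this model.
+ from datasets import load_dataset
+
+ def to_chatml(example):
+     user = example["instruction"]
+     if example["context"]:
+         user += "\n\n" + example["context"]
+     return {"text": (
+         f"<|im_start|>user\n{user}<|im_end|>\n"
+         f"<|im_start|>assistant\n{example['response']}<|im_end|>\n"
+     )}
+
+ dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
+ dolly = dolly.map(to_chatml, remove_columns=dolly.column_names)
+ print(dolly[0]["text"])
+ ```
+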
+ ## How to use
+ Below is a simple Python script you can use. The model loads directly from the Hub through the transformers library, or you can change the model path to point at a local directory containing the model.
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+ model_path = "TheOneWhoWill/Bootstrap-LLM"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ stop_token_id = tokenizer.eos_token_id
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer
+ )
+
+ messages = []
+
+ temperature = float(input("Enter temperature (e.g., 0.9): ") or 1)
+ token_limit = 256
+
+ while True:
+     user_input = input("User: ")
+     if user_input.lower() in ["exit", "quit"]:
+         print("Exiting the chat.")
+         break
+     if user_input.lower().startswith("temperature:"):
+         temperature = float(user_input.lower().split("temperature:")[1] or temperature)
+         print(f"Temperature set to {temperature}")
+         continue
+     if user_input.lower().startswith("reset"):
+         messages = []
+         print("Conversation reset.")
+         continue
+     if user_input.lower().startswith("tokens:"):
+         token_limit = int(user_input.lower().split("tokens:")[1] or 1024)
+         print(f"Token limit set to {token_limit}")
+         continue
+     if user_input.lower().startswith("debug"):
+         # Inspect the most recent message; guard against an empty chat.
+         if messages:
+             tokens_in_last_response = tokenizer.tokenize(messages[-1]["content"])
+             print("Number of Tokens:", len(tokens_in_last_response))
+             for token in tokens_in_last_response:
+                 if token == "<|im_end|>":
+                     print("End of message token found.")
+         continue
+     messages.append({"role": "user", "content": user_input})
+     # Generate and print the assistant's reply
+     response = pipe(
+         messages,
+         max_new_tokens=token_limit,
+         do_sample=True,
+         temperature=temperature,
+         top_k=64,
+         top_p=0.95,
+         eos_token_id=stop_token_id
+     )
+     response = response[0]["generated_text"][-1]["content"]
+     messages.append({"role": "assistant", "content": response})
+     print("Assistant:", response)
+ ```
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "./rafikov_qwen_final_with_tokens",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "q_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
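This adapter is a LoRA (rank 16, alpha 32) applied to the attention `q_proj` and `v_proj` matrices. A minimal loading sketch with peft, assuming the adapter files in this commit can be loaded from the repo root (the config's `base_model_name_or_path` points at a local directory, so the paths here are assumptions):

```python
# Hedged sketch: attach the LoRA adapter to the base model with peft.
# Using the repo id as the adapter path is an assumption based on this
# commit's layout (adapter_config.json sits at the repo root).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TheOneWhoWill/Bootstrap-LLM")
model = PeftModel.from_pretrained(base, "TheOneWhoWill/Bootstrap-LLM")
model = model.merge_and_unload()  # optional: fold the LoRA deltas into the weights
```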
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37fc53ff9b8adb8798abae2d33cbb9d44788685de370cf20f980e4e9666225c0
+ size 1182784
added_tokens.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "<|im_end|>": 50261,
+   "<|im_start|>": 50260,
+   "<|pad|>": 50258,
+   "<|startoftext|>": 50257,
+   "<|unk|>": 50259
+ }
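These five tokens extend GPT-2's 50,257-entry vocabulary to the 50,262 listed in config.json. A hedged sketch of how such tokens can be added (the actual script is not part of this commit; ID assignment follows insertion order, which here reproduces the mapping above):

```python
# Sketch: extend the GPT-2 tokenizer with the five special tokens and
# resize the model's embeddings to match. Insertion order yields IDs
# 50257..50261, matching added_tokens.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.add_special_tokens({
    "bos_token": "<|startoftext|>",
    "pad_token": "<|pad|>",
    "unk_token": "<|unk|>",
    "additional_special_tokens": ["<|im_start|>", "<|im_end|>"],
})
print(len(tokenizer))  # 50262
# model.resize_token_embeddings(len(tokenizer))  # after loading the model
```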
chat_template.jinja ADDED
@@ -0,0 +1,8 @@
+ {% for message in messages %}{% if message['role'] == 'system' %}{{ '<|im_start|>system
+ ' + message['content'] + '<|im_end|>
+ ' }}{% elif message['role'] == 'user' %}{{ '<|im_start|>user
+ ' + message['content'] + '<|im_end|>
+ ' }}{% elif message['role'] == 'assistant' %}{{ '<|im_start|>assistant
+ ' + message['content'] + '<|im_end|>
+ ' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
+ ' }}{% endif %}
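This is a standard ChatML template. A small usage sketch through the tokenizer's chat-template support, assuming a transformers version recent enough to read chat_template.jinja:

```python
# Render a conversation through the ChatML template above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TheOneWhoWill/Bootstrap-LLM")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is C4?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
# <|im_start|>user
# What is C4?<|im_end|>
# <|im_start|>assistant
```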
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 50257,
+   "eos_token_id": 50256,
+   "hidden_act": "silu",
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_types": [
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 1024,
+   "max_window_layers": 28,
+   "model_type": "qwen2",
+   "num_attention_heads": 6,
+   "num_hidden_layers": 6,
+   "num_key_value_heads": 6,
+   "pad_token_id": 50258,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.4",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 50262
+ }
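These hyperparameters imply roughly 134M parameters, which lines up with the ~535 MB float32 model.safetensors below. A back-of-the-envelope check (the bias layout, biases on q/k/v only, is Qwen2's usual convention and an assumption here):

```python
# Parameter count implied by config.json. Assumes Qwen2's standard
# bias layout: q/k/v projections biased, o_proj and MLP bias-free.
h, inter, layers, vocab = 768, 3072, 6, 50262

embed = vocab * h                    # token embeddings
lm_head = vocab * h                  # untied output head (tie_word_embeddings: false)
attn = 3 * (h * h + h) + h * h       # q, k, v (with bias) + o_proj
mlp = 3 * h * inter                  # gate, up, down projections
norms = 2 * h                        # two RMSNorms per layer
total = embed + lm_head + layers * (attn + mlp + norms) + h  # + final norm

print(f"{total:,} params, {total * 4 / 1e6:.0f} MB in float32")
# 133,849,344 params, 535 MB in float32
```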
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 50257,
+   "eos_token_id": 50256,
+   "pad_token_id": 50258,
+   "transformers_version": "4.55.4"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d1482c9b83ba88c5aa2e3dcf1c8e5188a0b0476b4c91e274918b8736a52fdb1c
+ size 535405592
special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|pad|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|unk|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "50256": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50257": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50258": {
+       "content": "<|pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50259": {
+       "content": "<|unk|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50260": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50261": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 1024,
+   "pad_token": "<|pad|>",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|unk|>"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c09071a232378437018807102308f87058cf39f4c95d70ec9373f840fe94749
+ size 6161
vocab.json ADDED
The diff for this file is too large to render. See raw diff