TheOneWhoWill committed (verified)
Commit 2101d58 · 1 Parent(s): 93d7550

Version 2 of a locally trained model

README.md CHANGED
@@ -1,3 +1,99 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - allenai/c4
+ - databricks/databricks-dolly-15k
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - qwen2
+ - transformers
+ - text-generation
+ ---
+ # Bootstrap LLM
+
+ ## Introduction
+ Since releasing my first Qwen2-based model several weeks ago, I've taken what I learned and attempted to create a new model that is pre-trained more thoroughly and on a more diverse dataset. I settled on the unfiltered version of the English subset of C4, with entries shuffled in batches of 1,000 to break away from continuous streams of related training data. For fine-tuning, I initially opted for [agentlans/multiturn-chat](https://huggingface.co/datasets/agentlans/multiturn-chat) because it offers far more examples than [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k); however, I reverted to dolly-15k because the conversations in multiturn-chat are verbose and poorly suited to a short 1024-token-context model. A sketch of the shuffling step is shown below.
+
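+ As a rough illustration of that shuffling step, here is a minimal sketch using the `datasets` streaming API with a 1,000-entry shuffle buffer. It approximates the idea rather than reproducing the exact training script; the seed is an assumption.
+ ```python
+ # Minimal sketch: stream C4 (en.noblocklist) and shuffle with a
+ # 1,000-entry buffer, mirroring the batches of 1,000 described above.
+ # The seed is an assumption; the real training script is not shown here.
+ from datasets import load_dataset
+
+ c4 = load_dataset("allenai/c4", "en.noblocklist", split="train", streaming=True)
+ c4 = c4.shuffle(seed=42, buffer_size=1000)
+
+ for i, example in enumerate(c4.take(3)):
+     print(i, example["text"][:80])
+ ```
+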
+ ## Model Details
+ - **Model Name:** Bootstrap LLM
+ - **Architecture:** Qwen2-based
+ - **Context:** 1024 tokens
+ - **Vocab Size:** 50,262 tokens
+ - **Qwen2 Specific:** hidden size of 768, 6 layers, 6 attention heads
+
+ ## Training Details
+ - **GPU:** NVIDIA GeForce RTX 4070 Laptop GPU
+ - **CUDA:** used during both pre-training and fine-tuning
+ - **VRAM:** 8 GB
+
+ Like my previous model, the [AllenAI C4 English](https://huggingface.co/datasets/allenai/c4) dataset was used for pre-training, the key difference being that I used the "en.noblocklist" subset for more diversity. Instead of creating my own tokenizer, I opted to use the tokenizer from [GPT-2](https://huggingface.co/openai-community/gpt2), which saved a lot of extra computation and has proven effective in real-world use. The model was pre-trained for 280 thousand steps at a 1024-token context, with a per-device batch size of 4 and 4 gradient accumulation steps (an effective batch size of 16). Pre-training took about 60 hours with the GPU overclocked to its maximum capacity. Post-training involved 5 epochs of [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) formatted in ChatML, roughly as sketched below.
+
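+ A hedged sketch of that ChatML formatting step, using dolly-15k's actual field names (`instruction`, `context`, `response`); the exact prompt layout used in training is an assumption:
+ ```python
+ # Sketch: convert dolly-15k rows into ChatML strings for fine-tuning.
+ # The precise layout (context appended after the instruction) is an
+ # assumption, not necessarily the exact format used for this model.
+ from datasets import load_dataset
+
+ def to_chatml(example):
+     user = example["instruction"]
+     if example["context"]:
+         user += "\n\n" + example["context"]
+     return {"text": (
+         f"<|im_start|>user\n{user}<|im_end|>\n"
+         f"<|im_start|>assistant\n{example['response']}<|im_end|>\n"
+     )}
+
+ dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
+ dolly = dolly.map(to_chatml, remove_columns=dolly.column_names)
+ print(dolly[0]["text"])
+ ```
+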
+ ## How to use
+ Below is a simple Python script you can use. The model loads directly from the Hub through the transformers library, or you can change the model path to point at a local directory containing the model.
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+ model_path = "TheOneWhoWill/Bootstrap-LLM"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ stop_token_id = tokenizer.eos_token_id
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer
+ )
+
+ messages = []
+
+ temperature = float(input("Enter temperature (e.g., 0.9): ") or 1)
+ token_limit = 256
+
+ while True:
+     user_input = input("User: ")
+     if user_input.lower() in ["exit", "quit"]:
+         print("Exiting the chat.")
+         break
+     if user_input.lower().startswith("temperature:"):
+         temperature = float(user_input.lower().split("temperature:")[1] or temperature)
+         print(f"Temperature set to {temperature}")
+         continue
+     if user_input.lower().startswith("reset"):
+         messages = []
+         print("Conversation reset.")
+         continue
+     if user_input.lower().startswith("tokens:"):
+         token_limit = int(user_input.lower().split("tokens:")[1] or 1024)
+         print(f"Token limit set to {token_limit}")
+         continue
+     if user_input.lower().startswith("debug"):
+         # Inspect the most recent message; guard against an empty chat.
+         if messages:
+             tokens_in_last_response = tokenizer.tokenize(messages[-1]["content"])
+             print("Number of Tokens:", len(tokens_in_last_response))
+             for token in tokens_in_last_response:
+                 if token == "<|im_end|>":
+                     print("End of message token found.")
+         continue
+     messages.append({"role": "user", "content": user_input})
+     # Generate and print the assistant's reply
+     response = pipe(
+         messages,
+         max_new_tokens=token_limit,
+         do_sample=True,
+         temperature=temperature,
+         top_k=64,
+         top_p=0.95,
+         eos_token_id=stop_token_id
+     )
+     response = response[0]["generated_text"][-1]["content"]
+     messages.append({"role": "assistant", "content": response})
+     print("Assistant:", response)
+ ```
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "./rafikov_qwen_final_with_tokens",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "q_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
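This adapter is a LoRA (rank 16, alpha 32) applied to the attention `q_proj` and `v_proj` matrices. A minimal loading sketch with peft, assuming the adapter files in this commit can be loaded from the repo root (the config's `base_model_name_or_path` points at a local directory, so the paths here are assumptions):

```python
# Hedged sketch: attach the LoRA adapter to the base model with peft.
# Using the repo id as the adapter path is an assumption based on this
# commit's layout (adapter_config.json sits at the repo root).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TheOneWhoWill/Bootstrap-LLM")
model = PeftModel.from_pretrained(base, "TheOneWhoWill/Bootstrap-LLM")
model = model.merge_and_unload()  # optional: fold the LoRA deltas into the weights
```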
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37fc53ff9b8adb8798abae2d33cbb9d44788685de370cf20f980e4e9666225c0
+ size 1182784
added_tokens.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "<|im_end|>": 50261,
+   "<|im_start|>": 50260,
+   "<|pad|>": 50258,
+   "<|startoftext|>": 50257,
+   "<|unk|>": 50259
+ }
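These five tokens extend GPT-2's 50,257-entry vocabulary to the 50,262 listed in config.json. A hedged sketch of how such tokens can be added (the actual script is not part of this commit; ID assignment follows insertion order, which here reproduces the mapping above):

```python
# Sketch: extend the GPT-2 tokenizer with the five special tokens and
# resize the model's embeddings to match. Insertion order yields IDs
# 50257..50261, matching added_tokens.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.add_special_tokens({
    "bos_token": "<|startoftext|>",
    "pad_token": "<|pad|>",
    "unk_token": "<|unk|>",
    "additional_special_tokens": ["<|im_start|>", "<|im_end|>"],
})
print(len(tokenizer))  # 50262
# model.resize_token_embeddings(len(tokenizer))  # after loading the model
```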
chat_template.jinja ADDED
@@ -0,0 +1,8 @@
+ {% for message in messages %}{% if message['role'] == 'system' %}{{ '<|im_start|>system
+ ' + message['content'] + '<|im_end|>
+ ' }}{% elif message['role'] == 'user' %}{{ '<|im_start|>user
+ ' + message['content'] + '<|im_end|>
+ ' }}{% elif message['role'] == 'assistant' %}{{ '<|im_start|>assistant
+ ' + message['content'] + '<|im_end|>
+ ' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
+ ' }}{% endif %}
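This is a standard ChatML template. A small usage sketch through the tokenizer's chat-template support, assuming a transformers version recent enough to read chat_template.jinja:

```python
# Render a conversation through the ChatML template above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TheOneWhoWill/Bootstrap-LLM")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is C4?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
# <|im_start|>user
# What is C4?<|im_end|>
# <|im_start|>assistant
```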
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 50257,
+   "eos_token_id": 50256,
+   "hidden_act": "silu",
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_types": [
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 1024,
+   "max_window_layers": 28,
+   "model_type": "qwen2",
+   "num_attention_heads": 6,
+   "num_hidden_layers": 6,
+   "num_key_value_heads": 6,
+   "pad_token_id": 50258,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.4",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 50262
+ }
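These hyperparameters imply roughly 134M parameters, which lines up with the ~535 MB float32 model.safetensors below. A back-of-the-envelope check (the bias layout, biases on q/k/v only, is Qwen2's usual convention and an assumption here):

```python
# Parameter count implied by config.json. Assumes Qwen2's standard
# bias layout: q/k/v projections biased, o_proj and MLP bias-free.
h, inter, layers, vocab = 768, 3072, 6, 50262

embed = vocab * h                    # token embeddings
lm_head = vocab * h                  # untied output head (tie_word_embeddings: false)
attn = 3 * (h * h + h) + h * h       # q, k, v (with bias) + o_proj
mlp = 3 * h * inter                  # gate, up, down projections
norms = 2 * h                        # two RMSNorms per layer
total = embed + lm_head + layers * (attn + mlp + norms) + h  # + final norm

print(f"{total:,} params, {total * 4 / 1e6:.0f} MB in float32")
# 133,849,344 params, 535 MB in float32
```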
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 50257,
+   "eos_token_id": 50256,
+   "pad_token_id": 50258,
+   "transformers_version": "4.55.4"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d1482c9b83ba88c5aa2e3dcf1c8e5188a0b0476b4c91e274918b8736a52fdb1c
+ size 535405592
special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|pad|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|unk|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "50256": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50257": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50258": {
+       "content": "<|pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50259": {
+       "content": "<|unk|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50260": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50261": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 1024,
+   "pad_token": "<|pad|>",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|unk|>"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c09071a232378437018807102308f87058cf39f4c95d70ec9373f840fe94749
+ size 6161
vocab.json ADDED
The diff for this file is too large to render. See raw diff