Daemontatox committed on
Commit 6246fe6 · verified · 1 Parent(s): 5599100

Update README.md

Files changed (1)
  1. README.md +190 -6
README.md CHANGED
@@ -9,13 +9,197 @@ license: apache-2.0
  language:
  - en
  ---
-
- # Uploaded finetuned model
-
- - **Developed by:** Daemontatox
- - **License:** apache-2.0
- - **Finetuned from model :** deepseek-ai/DeepSeek-V2-Lite-Chat
-
- This deepseek_v2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ # DeepZirel-V2
+
+ An experimental fine-tune of deepseek-ai/DeepSeek-V2-Lite-Chat using novel training approaches aimed at improving older model architectures.
+
+ ## Model Details
+
+ - **Base Model:** deepseek-ai/DeepSeek-V2-Lite-Chat
+ - **Fine-tuned by:** Daemontatox
+ - **Purpose:** Architecture improvement research
+ - **Training:** Experimental data and methodology targeting legacy architecture enhancement
+ - **Language:** Multilingual
+
+ ## Training Approach
+
+ This model explores new training techniques designed to enhance the performance of older model architectures. The experimental approach focuses on the following (a generic fine-tuning sketch follows the list):
+ - Novel fine-tuning strategies for legacy architectures
+ - Custom training data optimization
+ - Architecture-specific improvements
+
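+ The exact data and methodology are not published. As a point of reference, the previous revision of this card notes the model was trained with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library; the sketch below shows a generic TRL SFT setup against the base model. The dataset file and all hyperparameters are illustrative placeholders, not the actual recipe.
+
+ ```python
+ from datasets import load_dataset
+ from trl import SFTConfig, SFTTrainer
+
+ # Placeholder dataset; the real training data is not released.
+ dataset = load_dataset("json", data_files="train.jsonl", split="train")
+
+ trainer = SFTTrainer(
+     model="deepseek-ai/DeepSeek-V2-Lite-Chat",  # base model
+     train_dataset=dataset,
+     args=SFTConfig(
+         output_dir="deepzirel-v2-sft",
+         model_init_kwargs={"trust_remote_code": True},  # DeepSeek-V2 ships custom modeling code
+         per_device_train_batch_size=1,
+         gradient_accumulation_steps=8,
+         learning_rate=2e-5,  # illustrative, not the actual value
+         num_train_epochs=1,
+         bf16=True,
+     ),
+ )
+ trainer.train()
+ ```
+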
+ ## Inference
+
+ ### Transformers
+
+ Basic chat inference with the Transformers library:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # trust_remote_code is required because DeepSeek-V2 ships custom modeling code
+ model = AutoModelForCausalLM.from_pretrained(
+     "Daemontatox/DeepZirel-V2",
+     device_map="auto",
+     torch_dtype="auto",
+     trust_remote_code=True
+ )
+ tokenizer = AutoTokenizer.from_pretrained("Daemontatox/DeepZirel-V2", trust_remote_code=True)
+
+ messages = [
+     {"role": "user", "content": "Hello, how are you?"}
+ ]
+
+ # Build the prompt with the model's chat template
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(
+     input_ids,
+     max_new_tokens=512,
+     temperature=0.7,
+     top_p=0.9,
+     do_sample=True
+ )
+
+ # Decode only the newly generated tokens, not the prompt
+ response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```
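+
+ For incremental output, Transformers' built-in `TextStreamer` can print tokens as they are generated (reusing `model`, `tokenizer`, and `input_ids` from the snippet above):
+
+ ```python
+ from transformers import TextStreamer
+
+ # Stream decoded tokens to stdout, skipping the prompt itself
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+
+ model.generate(
+     input_ids,
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9,
+     streamer=streamer
+ )
+ ```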
+
+ ### vLLM
+
+ Offline batch inference with vLLM:
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # tensor_parallel_size=2 assumes two GPUs; set to 1 for a single GPU
+ llm = LLM(
+     model="Daemontatox/DeepZirel-V2",
+     tensor_parallel_size=2,
+     dtype="auto",
+     trust_remote_code=True
+ )
+
+ sampling_params = SamplingParams(
+     temperature=0.7,
+     top_p=0.9,
+     max_tokens=512
+ )
+
+ prompts = ["Hello, how are you?"]
+ outputs = llm.generate(prompts, sampling_params)
+
+ for output in outputs:
+     print(output.outputs[0].text)
+ ```
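+
+ Note that `llm.generate` treats the strings as raw prompts. To apply the model's chat template instead, recent vLLM releases also expose `llm.chat` (a minimal sketch, reusing `llm` and `sampling_params` from above):
+
+ ```python
+ messages = [
+     {"role": "user", "content": "Hello, how are you?"}
+ ]
+
+ # llm.chat applies the chat template before generating
+ outputs = llm.chat(messages, sampling_params)
+ print(outputs[0].outputs[0].text)
+ ```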
+
+ ### vLLM OpenAI-Compatible Server
+
+ Serve the model over an OpenAI-compatible HTTP API:
+
+ ```bash
+ vllm serve Daemontatox/DeepZirel-V2 \
+     --tensor-parallel-size 2 \
+     --dtype auto \
+     --trust-remote-code \
+     --max-model-len 4096
+ ```
+
+ Then query it with the OpenAI Python client:
+
+ ```python
+ from openai import OpenAI
+
+ # The local vLLM server accepts any placeholder API key
+ client = OpenAI(
+     base_url="http://localhost:8000/v1",
+     api_key="token-abc123"
+ )
+
+ response = client.chat.completions.create(
+     model="Daemontatox/DeepZirel-V2",
+     messages=[
+         {"role": "user", "content": "Hello, how are you?"}
+     ],
+     temperature=0.7,
+     max_tokens=512
+ )
+
+ print(response.choices[0].message.content)
+ ```
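+
+ The same endpoint can be exercised directly with curl:
+
+ ```bash
+ curl http://localhost:8000/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+         "model": "Daemontatox/DeepZirel-V2",
+         "messages": [{"role": "user", "content": "Hello, how are you?"}],
+         "temperature": 0.7,
+         "max_tokens": 512
+     }'
+ ```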
+
+ ### TensorRT-LLM
+
+ Convert the checkpoint and build an engine (flag names vary between TensorRT-LLM releases; newer versions replace `--max_output_len` with `--max_seq_len`):
+
+ ```bash
+ # Convert to TensorRT-LLM checkpoint format
+ # (convert_checkpoint.py may require a local copy of the HF checkpoint)
+ python convert_checkpoint.py \
+     --model_dir Daemontatox/DeepZirel-V2 \
+     --output_dir ./trt_ckpt \
+     --dtype float16 \
+     --tp_size 2
+
+ # Build the TensorRT engine
+ trtllm-build \
+     --checkpoint_dir ./trt_ckpt \
+     --output_dir ./trt_engine \
+     --gemm_plugin float16 \
+     --max_batch_size 8 \
+     --max_input_len 2048 \
+     --max_output_len 512
+ ```
+
+ ```python
+ from tensorrt_llm import LLM, SamplingParams
+
+ llm = LLM(model="./trt_engine")
+
+ prompts = ["Hello, how are you?"]
+ # The high-level LLM API takes sampling options via SamplingParams
+ outputs = llm.generate(prompts, SamplingParams(max_tokens=512))
+
+ for output in outputs:
+     print(output.outputs[0].text)
+ ```
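+
+ Keep in mind that TensorRT engines are specific to the GPU architecture and TensorRT-LLM version they were built with, so the conversion and build steps must be repeated for each deployment target.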
+
+ ### Modular MAX
+
+ MAX's Python interface is still evolving; treat the snippets below as sketches and check the current Modular documentation for the exact APIs.
+
+ ```bash
+ # Serve with the MAX engine
+ max serve Daemontatox/DeepZirel-V2 \
+     --port 8000 \
+     --tensor-parallel-size 2
+ ```
+
+ ```python
+ from max import engine
+
+ # Load the model with the MAX engine
+ model = engine.InferenceSession(
+     "Daemontatox/DeepZirel-V2",
+     device="cuda",
+     tensor_parallel=2
+ )
+
+ # Run inference
+ prompt = "Hello, how are you?"
+ output = model.generate(
+     prompt,
+     max_tokens=512,
+     temperature=0.7,
+     top_p=0.9
+ )
+
+ print(output.text)
+ ```
+
+ ```python
+ # Using the MAX pipelines Python API
+ from max.pipelines import pipeline
+
+ # Create a text-generation pipeline
+ pipe = pipeline(
+     "text-generation",
+     model="Daemontatox/DeepZirel-V2",
+     device="cuda",
+     tensor_parallel=2
+ )
+
+ # Generate
+ result = pipe(
+     "Hello, how are you?",
+     max_new_tokens=512,
+     temperature=0.7,
+     top_p=0.9
+ )
+
+ print(result[0]["generated_text"])
+ ```
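+
+ `max serve` exposes an OpenAI-compatible endpoint, so the OpenAI client example from the vLLM section should also work against `http://localhost:8000/v1` (assuming the default port used above).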
+
+ ## Limitations
+
+ This is an experimental model that applies novel training approaches to a legacy architecture. Results may vary, and the model should be thoroughly evaluated before any production deployment.