@@ -9,13 +9,197 @@ license: apache-2.0
 language:
 - en
 ---
-# Uploaded finetuned  model
-- **Developed by:** Daemontatox
-- **License:** apache-2.0
-- **Finetuned from model :** deepseek-ai/DeepSeek-V2-Lite-Chat
-This deepseek_v2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+# DeepZirel-V2
+An experimental fine-tune of deepseek-ai/DeepSeek-V2-Lite-Chat using novel training approaches aimed at improving older model architectures.
+## Model Details
+- **Base Model:** deepseek-ai/DeepSeek-V2-Lite-Chat
+- **Fine-tuned by:** Daemontatox
+- **Purpose:** Architecture improvement research
+- **Training:** Experimental data and methodology targeting legacy architecture enhancement
+- **Language:** English
+## Training Approach
+This model explores new training techniques designed to enhance the performance of older model architectures. The experimental approach focuses on:
+- Novel fine-tuning strategies for legacy architectures
+- Custom training data optimization
+- Architecture-specific improvements
+## Inference
+### Transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    "Daemontatox/DeepZirel-V2",
+    device_map="auto",
+    torch_dtype="auto",
+    trust_remote_code=True
+)
+tokenizer = AutoTokenizer.from_pretrained("Daemontatox/DeepZirel-V2", trust_remote_code=True)
+messages = [
+    {"role": "user", "content": "Hello, how are you?"}
+]
+input_ids = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to(model.device)
+outputs = model.generate(
+    input_ids,
+    max_new_tokens=512,
+    temperature=0.7,
+    top_p=0.9,
+    do_sample=True
+)
+response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
+print(response)
+```
+### vLLM
+```python
+from vllm import LLM, SamplingParams
+llm = LLM(
+    model="Daemontatox/DeepZirel-V2",
+    tensor_parallel_size=2,  # number of GPUs to shard across; set to 1 on a single-GPU machine
+    dtype="auto",
+    trust_remote_code=True
+)
+sampling_params = SamplingParams(
+    temperature=0.7,
+    top_p=0.9,
+    max_tokens=512
+)
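+# Use the chat interface so the model's chat template is applied to the prompt
+messages = [{"role": "user", "content": "Hello, how are you?"}]
+outputs = llm.chat(messages, sampling_params)
+for output in outputs:
+    print(output.outputs[0].text)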
+```
+### vLLM OpenAI-Compatible Server
+```bash
+vllm serve Daemontatox/DeepZirel-V2 \
+    --tensor-parallel-size 2 \
+    --dtype auto \
+    --trust-remote-code \
+    --max-model-len 4096
+```
+```python
+from openai import OpenAI
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="token-abc123"  # placeholder; only checked if the server was started with --api-key
+)
+response = client.chat.completions.create(
+    model="Daemontatox/DeepZirel-V2",
+    messages=[
+        {"role": "user", "content": "Hello, how are you?"}
+    ],
+    temperature=0.7,
+    max_tokens=512
+)
+print(response.choices[0].message.content)
+```
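Where the `openai` package is not installed, the same endpoint can be reached with the standard library alone. The sketch below assumes the server started with `vllm serve` is listening on `localhost:8000`. `build_chat_request` and `chat` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed address of the local vLLM server
MODEL = "Daemontatox/DeepZirel-V2"


def build_chat_request(prompt, max_tokens=512, temperature=0.7, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /chat/completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello, how are you?")` returns the reply as a string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.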
+### TensorRT-LLM
+```bash
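+# Download the weights first; convert_checkpoint.py expects a local directory
+huggingface-cli download Daemontatox/DeepZirel-V2 --local-dir ./DeepZirel-V2
+# Convert to TensorRT-LLM format (run from the DeepSeek-V2 example directory of the TensorRT-LLM repo)
+python convert_checkpoint.py \
+    --model_dir ./DeepZirel-V2 \
+    --output_dir ./trt_ckpt \
+    --dtype float16 \
+    --tp_size 2
+# Build TensorRT engine
+# Recent trtllm-build releases take --max_seq_len (input + output tokens) in place of --max_output_len
+trtllm-build \
+    --checkpoint_dir ./trt_ckpt \
+    --output_dir ./trt_engine \
+    --gemm_plugin float16 \
+    --max_batch_size 8 \
+    --max_input_len 2048 \
+    --max_seq_len 2560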
+```
+```python
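+from tensorrt_llm import LLM, SamplingParams
+# Load the engine directory produced by trtllm-build
+llm = LLM(model="./trt_engine")
+# Generation limits go through SamplingParams, as in vLLM
+sampling_params = SamplingParams(max_tokens=512)
+prompts = ["Hello, how are you?"]
+outputs = llm.generate(prompts, sampling_params)
+for output in outputs:
+    print(output.outputs[0].text)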
+```
+### Modular MAX
+```bash
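+# Serve with MAX (exposes an OpenAI-compatible endpoint)
+# Flag names differ between MAX releases; confirm with `max serve --help`
+max serve --model-path Daemontatox/DeepZirel-V2 \
+    --port 8000 \
+    --devices gpu:0,1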
+```
+```python
+# `max serve` exposes an OpenAI-compatible API, so the standard OpenAI client can query it.
+# Assumes the MAX server is listening on port 8000.
+from openai import OpenAI
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="EMPTY"  # placeholder value for a local server
+)
+response = client.chat.completions.create(
+    model="Daemontatox/DeepZirel-V2",
+    messages=[
+        {"role": "user", "content": "Hello, how are you?"}
+    ],
+    max_tokens=512,
+    temperature=0.7,
+    top_p=0.9
+)
+print(response.choices[0].message.content)
+```
+## Limitations
+This is an experimental model that applies novel training approaches to a legacy architecture. Output quality may vary, so evaluate the model thoroughly on your own workload before deploying it to production.
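One lightweight starting point for such testing is a smoke test run before each deployment. The sketch below is backend-agnostic: `generate` is any function that maps a prompt to a completion, for example a thin wrapper around one of the inference snippets in this card. The helper name `smoke_test`, the prompts, and the expected keywords are illustrative assumptions, not part of any published evaluation of this model.

```python
from typing import Callable, Dict, List


def smoke_test(generate: Callable[[str], str], cases: Dict[str, List[str]]) -> List[str]:
    """Return the prompts whose completion is empty or contains none of the expected keywords."""
    failures = []
    for prompt, keywords in cases.items():
        text = generate(prompt).strip().lower()
        if not text or not any(k.lower() in text for k in keywords):
            failures.append(prompt)
    return failures


# Illustrative cases: each prompt maps to keywords, at least one of which should appear.
CASES = {
    "What is the capital of France?": ["paris"],
    "What is 2 + 2?": ["4", "four"],
}
```

An empty return value means every case passed. Keyword matching only catches gross regressions such as empty or off-topic output, so it complements a task-specific evaluation and does not replace one.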