
Libraries

How to use alireza7/GrepSeek-Qwen3.5-9B-GRPO with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alireza7/GrepSeek-Qwen3.5-9B-GRPO")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("alireza7/GrepSeek-Qwen3.5-9B-GRPO")
model = AutoModelForMultimodalLM.from_pretrained("alireza7/GrepSeek-Qwen3.5-9B-GRPO")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use alireza7/GrepSeek-Qwen3.5-9B-GRPO with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "alireza7/GrepSeek-Qwen3.5-9B-GRPO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alireza7/GrepSeek-Qwen3.5-9B-GRPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

SGLang

How to use alireza7/GrepSeek-Qwen3.5-9B-GRPO with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "alireza7/GrepSeek-Qwen3.5-9B-GRPO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alireza7/GrepSeek-Qwen3.5-9B-GRPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "alireza7/GrepSeek-Qwen3.5-9B-GRPO" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use alireza7/GrepSeek-Qwen3.5-9B-GRPO with Docker Model Runner:
```
docker model run hf.co/alireza7/GrepSeek-Qwen3.5-9B-GRPO
```

GrepSeek-Qwen3.5-9B-GRPO

The full GrepSeek model. GrepSeek is a Direct Corpus Interaction (DCI) search agent: rather than retrieving from a pre-computed dense or sparse index, it answers questions by issuing Unix shell commands (rg, grep, head, …) directly against a raw 21M-passage Wikipedia corpus, interleaving retrieval and reasoning in a single policy. This checkpoint is Qwen/Qwen3.5-9B, cold-start fine-tuned and then optimized with GRPO.

📄 Paper: GrepSeek: Training Search Agents for Direct Corpus Interaction (https://arxiv.org/abs/2605.29307) · 💻 Code: https://github.com/alirezasalemi7/grepseek

Why direct corpus interaction?

Index-based retrieval (dense or sparse) has three weaknesses: semantic smoothing, which blurs fine-grained entity and lexical distinctions; limited controllability, since the agent cannot enforce exact filters or iteratively refine results; and redundant re-retrieval in multi-hop settings. By executing exact-string shell pipelines (e.g. rg -F), GrepSeek preserves lexical precision, isolates rare symbolic patterns and exact entity names, and composes multi-stage retrieval programs for compositional reasoning. It needs no embedding index and no offline indexing, only the ~14 GB raw corpus.
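To make the idea of a composed exact-string pipeline concrete, here is a minimal Python sketch of a two-stage filter followed by a `head`-style cut. It only mimics the behaviour of a pipeline such as `rg -F A | rg -F B | head` on an in-memory list; the function names and the toy passages are made up for illustration and are not part of GrepSeek.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def fixed_string_filter(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Keep lines that contain `needle` verbatim (no regex, no stemming)."""
    return (line for line in lines if needle in line)

def head(lines: Iterable[str], n: int) -> List[str]:
    """Return at most the first n lines, in their original order."""
    return list(islice(lines, n))

# Toy passages, invented for this example only.
TOY_CORPUS = [
    "Marie Curie was a physicist and chemist born in Warsaw.",
    "Pierre Curie shared the 1903 Nobel Prize in Physics.",
    "Marie Curie won the Nobel Prize in Chemistry in 1911.",
    "The curie is a unit of radioactivity.",
]

# Stage 1 pins the exact entity name, stage 2 narrows by a second literal.
stage1 = fixed_string_filter(TOY_CORPUS, "Marie Curie")
stage2 = fixed_string_filter(stage1, "Nobel Prize")
hits = head(stage2, 5)
print(hits)
```

Because each stage is a literal substring test, "Pierre Curie" and the lowercase unit "curie" never leak into the result, which is the lexical precision the paragraph above describes.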

Training

Initialized from: alireza7/GrepSeek-Qwen3.5-9B-SFT (cold-start SFT on alireza7/GrepSeek-ColdStart-SFT-10k; base Qwen/Qwen3.5-9B).
RL: GRPO, group size n=5, reward = token-F1 × binary format gate (only structurally valid <think>/<tool_call>/<tool_response>/<answer> trajectories get non-zero reward), 200 steps, LR 5e-6, batch 256, KL disabled, Ulysses SP=2, on 4×A100-80GB. Trained only on NQ + HotpotQA.
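The reward described above (token-F1 multiplied by a binary format gate) can be sketched as follows. This is an illustration under stated assumptions, not the training code: the SQuAD-style answer normalization and the specific structural check (balanced tags, exactly one `<answer>` block closing the trajectory) are guesses at reasonable definitions, and the function names are hypothetical.

```python
import re
import string
from collections import Counter

TAGS = ("think", "tool_call", "tool_response", "answer")
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>\s*$", re.DOTALL)

def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction).split(), normalize(reference).split()
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def format_gate(trajectory: str) -> int:
    """Assumed validity check: balanced tags and one final <answer> block."""
    for tag in TAGS:
        if trajectory.count(f"<{tag}>") != trajectory.count(f"</{tag}>"):
            return 0
    if trajectory.count("<answer>") != 1:
        return 0
    return 1 if ANSWER_RE.search(trajectory) else 0

def reward(trajectory: str, gold_answer: str) -> float:
    """token-F1 of the extracted answer, zeroed for malformed trajectories."""
    if not format_gate(trajectory):
        return 0.0
    answer = ANSWER_RE.search(trajectory).group(1)
    return token_f1(answer, gold_answer)
```

The multiplicative gate means a trajectory with a correct answer but broken structure still earns nothing, so the policy cannot trade format validity for answer quality.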

⚠️ A tool-using agent, not a standalone chatbot

The model emits <tool_call> shell commands that must be executed against the corpus and returned as <tool_response> turns. Running it requires the corpus (PeterJinGo/wiki-18-corpus), a tool-calling vLLM server, and the GrepSeek inference harness; the code repo covers all three.
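One turn of that loop might look roughly like the sketch below: extract the command from the model output, run it, and wrap the result for the next turn. It assumes the `<tool_call>` tag wraps a plain shell command string; the real harness may use a structured payload, and the helper names, timeout, and truncation limit here are hypothetical. Model-emitted commands should only ever be run inside a sandbox.

```python
import re
import subprocess
from typing import Optional

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_call(model_output: str) -> Optional[str]:
    """Return the first command found in <tool_call> tags, or None."""
    match = TOOL_CALL_RE.search(model_output)
    return match.group(1).strip() if match else None

def run_tool(command: str, cwd: str = ".", timeout_s: float = 30.0,
             max_chars: int = 4000) -> str:
    """Run the command in a shell and wrap its output as a tool response.

    WARNING: shell=True executes arbitrary text; sandbox this in practice.
    """
    completed = subprocess.run(
        command, shell=True, cwd=cwd,
        capture_output=True, text=True, timeout=timeout_s,
    )
    output = completed.stdout + completed.stderr
    # Truncate so a broad match cannot flood the model's context window.
    return f"<tool_response>{output[:max_chars]}</tool_response>"

def step(model_output: str, corpus_dir: str = ".") -> Optional[str]:
    """Produce the next <tool_response> turn, or None if no call was made."""
    command = extract_tool_call(model_output)
    if command is None:
        return None  # e.g. the model produced a final <answer> instead
    return run_tool(command, cwd=corpus_dir)
```

A driver would call `step` after every generation, append the returned turn to the conversation, and stop once the model emits `<answer>`.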

Usage

git clone https://github.com/alirezasalemi7/grepseek && cd grepseek
# env: TRAINING_ENV.md  ·  corpus: cold_start_sft/download_corpus.py

# 1. serve this checkpoint
MODEL_PATH=alireza7/GrepSeek-Qwen3.5-9B-GRPO bash rl/serve_rl.sh        # -> http://localhost:10730/v1

# 2a. generation on your own questions
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
  bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
    --model grepseek --temperature 0.6 --input my_questions.jsonl --out_dir out

# 2b. reproduce the benchmark eval (token-F1 / EM on the Search-R1 suite)
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
  bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
    --model grepseek --temperature 0.6 --datasets all --out_dir eval

The inference harness also ships a semantics-preserving sharded-parallel execution engine (plus a persistent search daemon) that accelerates corpus search by up to 7.6× while remaining byte-exact with sequential grep.
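The invariant behind "byte-exact with sequential grep" can be illustrated with a small sketch: split the corpus into contiguous shards, search each shard concurrently, and concatenate the per-shard hits in shard order. This is not the repo's engine; the function names are hypothetical, and Python threads here show the ordering guarantee rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def search_sequential(lines: List[str], needle: str) -> List[str]:
    """Reference behaviour: literal matches in original file order."""
    return [line for line in lines if needle in line]

def split_into_shards(lines: List[str], num_shards: int) -> List[List[str]]:
    """Cut the corpus into contiguous, order-preserving chunks."""
    size = max(1, -(-len(lines) // num_shards))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def search_sharded(lines: List[str], needle: str, num_shards: int = 4) -> List[str]:
    """Search shards in parallel, then merge results in shard order."""
    shards = split_into_shards(lines, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # pool.map yields results in input order, whichever shard finishes first.
        per_shard = list(pool.map(lambda shard: search_sequential(shard, needle), shards))
    return [hit for hits in per_shard for hit in hits]

# Toy corpus, invented for this example only.
corpus = [f"passage {i} about {'Paris' if i % 3 == 0 else 'Rome'}" for i in range(50)]
assert search_sharded(corpus, "Paris") == search_sequential(corpus, "Paris")
```

Contiguous shards plus an order-preserving merge are what keep the output identical to a single pass; interleaved or hash-based sharding would need an explicit re-sort to give the same guarantee.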

Results (token-level F1)

Trained only on NQ + HotpotQA (marked *); the other five are out-of-distribution. GrepSeek gets the best micro-average and wins 4/7 benchmarks.

	NQ*	TriviaQA	PopQA	HotpotQA*	2Wiki	MuSiQue	Bamboogle	micro-avg
Search-R1 (Qwen3-Emb-4B, best baseline)	0.5067	0.7693	0.5101	0.5591	0.4299	0.2878	0.6989	0.5441
GrepSeek (this model)	0.5223	0.7673	0.4861	0.6231	0.5178	0.3006	0.6212	0.5691

Micro-average EM = 0.4948 (also best overall; full EM table in the paper). Gains are largest on multi-hop tasks (HotpotQA, 2Wiki, MuSiQue) that reward exact entity disambiguation and iterative evidence aggregation.

Limitations

Because retrieval is purely lexical, GrepSeek is weaker on queries with surface-form variation and on long-tail queries, e.g. PopQA (diacritics, name variants). grep also has no semantic relevance ranking, so an authoritative passage can be buried behind earlier file-order matches. Dense retrieval remains advantageous on heavily semantic or paraphrase-driven queries.
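The diacritics failure mode is easy to reproduce. The sketch below shows an ASCII query missing a passage that spells the name with diacritics, and how folding diacritics would recover the match. The folding step is only an illustration of why the surface form matters; nothing in this card says GrepSeek applies such normalization, and the helper name is hypothetical.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

passage = "Antonín Dvořák composed the New World Symphony."
query = "Antonin Dvorak"

exact_hit = query in passage                     # literal match on raw text
folded_hit = query in strip_diacritics(passage)  # match after folding

print(exact_hit, folded_hit)
```

A purely literal search therefore depends on the query reproducing the corpus spelling exactly, which is harder for long-tail names.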

License

Inherits the license of the base model Qwen/Qwen3.5-9B; confirm against the base model's license, and update the license field in this card's metadata if it differs.

Citation

@misc{salemi2026grepseektrainingsearchagents,
      title={GrepSeek: Training Search Agents for Direct Corpus Interaction},
      author={Alireza Salemi and Chang Zeng and Atharva Nijasure and Jui-Hui Chung and Razieh Rahimi and Fernando Diaz and Hamed Zamani},
      year={2026},
      eprint={2605.29307},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.29307},
}