GrepSeek-Qwen3.5-9B-GRPO
The full GrepSeek model. GrepSeek is a Direct Corpus Interaction (DCI)
search agent: rather than retrieving from a pre-computed dense or sparse index, it
answers questions by issuing Unix shell commands (rg, grep, head, …)
directly against a raw 21M-passage Wikipedia corpus, interleaving retrieval and
reasoning in a single policy. This checkpoint is Qwen/Qwen3.5-9B, cold-start
fine-tuned and then optimized with GRPO.
📄 GrepSeek: Training Search Agents for Direct Corpus Interaction · 💻 https://github.com/alirezasalemi7/grepseek
Why direct corpus interaction?
Index-based retrieval (dense or sparse) suffers from semantic smoothing
(blurring fine-grained entity/lexical distinctions), limited controllability
(the agent can't enforce exact filters or iteratively refine results), and
redundant re-retrieval in multi-hop settings. By executing exact-string shell
pipelines (e.g. rg -F), GrepSeek preserves lexical precision, isolates rare
symbolic patterns and exact entity names, and composes multi-stage retrieval
programs for compositional reasoning — while needing no embedding index (only
the ~14 GB raw corpus; no offline indexing).
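To make the idea of a multi-stage exact-string pipeline concrete, here is a minimal Python sketch. It only emulates literal (fixed-string) matching of the kind `rg -F` performs, over a three-passage toy list. The helper name `fixed_grep` and the passages are illustrative assumptions, not part of the GrepSeek code.

```python
from typing import Iterable, List


def fixed_grep(lines: Iterable[str], needle: str) -> List[str]:
    """Keep lines that contain `needle` as a literal substring (no regex)."""
    return [line for line in lines if needle in line]


# Toy stand-in for the corpus (hypothetical passages).
passages = [
    "Paris, Texas is a city in Lamar County.",
    "Paris is the capital and largest city of France.",
    "Paris Hilton is an American media personality.",
]

# Stage 1: match the exact entity name.
stage1 = fixed_grep(passages, "Paris")
# Stage 2: refine the stage-1 hits with a second exact filter,
# the same way one shell command is piped into the next.
stage2 = fixed_grep(stage1, "capital")

print(len(stage1), "hits after stage 1")
print(stage2)
```

Each stage narrows the previous result with an exact filter, which is the kind of control an index lookup that returns a fixed top-k list does not offer.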
Training
- Initialized from: alireza7/GrepSeek-Qwen3.5-9B-SFT (cold-start SFT on alireza7/GrepSeek-ColdStart-SFT-10k; base Qwen/Qwen3.5-9B).
- RL: GRPO, group size n=5, reward = token-F1 × binary format gate (only structurally valid <think>/<tool_call>/<tool_response>/<answer> trajectories get non-zero reward), 200 steps, LR 5e-6, batch 256, KL disabled, Ulysses SP=2, on 4×A100-80GB. Trained only on NQ + HotpotQA.
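The reward described above (token-F1 multiplied by a binary format gate) can be sketched as follows. Token-F1 here uses the common SQuAD-style normalization (lowercasing, stripping punctuation and articles). The gate rules in `format_gate` are an assumption made for illustration, since the card only says that structurally valid trajectories get non-zero reward. The actual checks live in the training code.

```python
import re
import string
from collections import Counter

TAGS = ("think", "tool_call", "tool_response", "answer")
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def normalize(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def format_gate(trajectory: str) -> int:
    """Assumed gate: balanced tags and exactly one final <answer> block."""
    for tag in TAGS:
        if trajectory.count(f"<{tag}>") != trajectory.count(f"</{tag}>"):
            return 0
    answers = ANSWER_RE.findall(trajectory)
    if len(answers) != 1 or not trajectory.rstrip().endswith("</answer>"):
        return 0
    return 1


def reward(trajectory: str, gold: str) -> float:
    if not format_gate(trajectory):
        return 0.0
    answer = ANSWER_RE.search(trajectory).group(1).strip()
    return token_f1(answer, gold)


print(reward("<think>look it up</think><answer>Paris, France</answer>", "Paris"))
print(reward("<think>unclosed<answer>Paris</answer>", "Paris"))
```

Because the gate is multiplicative, a malformed trajectory earns nothing even when its answer text is correct.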
⚠️ A tool-using agent, not a standalone chatbot
The model emits <tool_call> shell commands that must be executed against the
corpus and returned as <tool_response> turns. You need the corpus
(PeterJinGo/wiki-18-corpus),
a tool-calling vLLM server, and the GrepSeek inference harness — all in the
code repo.
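The control flow this implies can be sketched as a short loop: generate, execute any `<tool_call>`, feed the output back as a `<tool_response>`, and stop at `<answer>`. Everything below is a simplified assumption for illustration. The `generate` and `execute` callables, the `tool` message role, and the plain-text payload inside the tags are hypothetical stand-ins. The harness in the repo defines the real message format, sandboxing, and limits.

```python
import re
from typing import Callable, Dict, List, Optional

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_agent(
    question: str,
    generate: Callable[[List[Dict[str, str]]], str],  # hypothetical model call
    execute: Callable[[str], str],                     # hypothetical shell runner
    max_turns: int = 8,
) -> Optional[str]:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})

        answer = ANSWER_RE.search(reply)
        if answer:  # final answer reached
            return answer.group(1).strip()

        call = TOOL_CALL_RE.search(reply)
        if call is None:  # neither a tool call nor an answer: give up
            return None

        # Run the command against the corpus and return its output to the model.
        output = execute(call.group(1).strip())
        messages.append(
            {"role": "tool", "content": f"<tool_response>{output}</tool_response>"}
        )
    return None


# Stubbed usage: a scripted "model" and a canned tool output.
scripted = iter([
    "<think>search first</think><tool_call>rg -F 'capital of France'</tool_call>",
    "<think>found it</think><answer>Paris</answer>",
])
print(run_agent(
    "What is the capital of France?",
    generate=lambda messages: next(scripted),
    execute=lambda command: "Paris is the capital and largest city of France.",
))
```

Without the execution step the model's first turn ends at a `<tool_call>` and no answer is ever produced, which is why the checkpoint cannot be used as a plain chat model.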
Usage
git clone https://github.com/alirezasalemi7/grepseek && cd grepseek
# env: TRAINING_ENV.md · corpus: cold_start_sft/download_corpus.py
# 1. serve this checkpoint
MODEL_PATH=alireza7/GrepSeek-Qwen3.5-9B-GRPO bash rl/serve_rl.sh # -> http://localhost:10730/v1
# 2a. generation on your own questions
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
--model grepseek --temperature 0.6 --input my_questions.jsonl --out_dir out
# 2b. reproduce the benchmark eval (token-F1 / EM on the Search-R1 suite)
GREPSEEK_CORPUS_ROOT=/path/to/wiki_18_corpus \
bash inference/run_inference.sh --base_url http://localhost:10730/v1 \
--model grepseek --temperature 0.6 --datasets all --out_dir eval
The inference harness also ships the semantics-preserving sharded-parallel
execution engine (+ persistent search daemon) that accelerates corpus search by
up to 7.6× while remaining byte-exact with sequential grep.
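The property that makes sharded execution safe is that results are merged in shard order, so the output matches what a single sequential pass would print. The sketch below illustrates only that invariant, on an in-memory list with Python threads. It is not the repo's engine, it makes no claim about speed, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def sequential_grep(lines: List[str], needle: str) -> List[str]:
    """Reference behavior: literal substring match, file order preserved."""
    return [line for line in lines if needle in line]


def sharded_grep(lines: List[str], needle: str, num_shards: int = 4) -> List[str]:
    # Split into contiguous shards so concatenation restores the original order.
    size = max(1, -(-len(lines) // num_shards))  # ceiling division
    shards = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # Executor.map yields results in input order, not completion order.
        parts = list(pool.map(lambda shard: sequential_grep(shard, needle), shards))
    return [hit for part in parts for hit in part]


corpus = [f"passage {i}: {'Paris' if i % 3 == 0 else 'Lyon'}" for i in range(20)]
assert sharded_grep(corpus, "Paris") == sequential_grep(corpus, "Paris")
print(len(sharded_grep(corpus, "Paris")), "hits, identical to the sequential run")
```

Contiguous shards plus an order-preserving merge are what keep the parallel output identical. Interleaved shards would need an explicit re-sort by line position.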
Results (token-level F1)
Trained only on NQ + HotpotQA (marked *); the other five are out-of-distribution. GrepSeek gets the best micro-average and wins 4/7 benchmarks.
| | NQ* | TriviaQA | PopQA | HotpotQA* | 2Wiki | MuSiQue | Bamboogle | micro-avg |
|---|---|---|---|---|---|---|---|---|
| Search-R1 (Qwen3-Emb-4B, best baseline) | 0.5067 | 0.7693 | 0.5101 | 0.5591 | 0.4299 | 0.2878 | 0.6989 | 0.5441 |
| GrepSeek (this model) | 0.5223 | 0.7673 | 0.4861 | 0.6231 | 0.5178 | 0.3006 | 0.6212 | 0.5691 |
Micro-average EM = 0.4948 (also best overall; full EM table in the paper). Gains are largest on multi-hop tasks (HotpotQA, 2Wiki, MuSiQue) that reward exact entity disambiguation and iterative evidence aggregation.
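The per-benchmark comparison can be recomputed directly from the F1 table. The snippet below uses only the numbers in the table. The unweighted mean it prints is a macro-average and is shown for contrast. It will not match the reported micro-average, which weights each benchmark by its number of examples (sizes are not listed on this card).

```python
benchmarks = ["NQ", "TriviaQA", "PopQA", "HotpotQA", "2Wiki", "MuSiQue", "Bamboogle"]
search_r1 = [0.5067, 0.7693, 0.5101, 0.5591, 0.4299, 0.2878, 0.6989]
grepseek = [0.5223, 0.7673, 0.4861, 0.6231, 0.5178, 0.3006, 0.6212]

# Benchmarks where GrepSeek's token-F1 is higher than the baseline's.
wins = [name for name, s, g in zip(benchmarks, search_r1, grepseek) if g > s]
print(f"{len(wins)}/{len(benchmarks)} wins:", wins)

# Per-benchmark F1 difference (GrepSeek minus Search-R1).
for name, s, g in zip(benchmarks, search_r1, grepseek):
    print(f"{name:10s} {g - s:+.4f}")

# Unweighted (macro) mean, for contrast with the reported micro-average.
macro_grepseek = sum(grepseek) / len(grepseek)
macro_search_r1 = sum(search_r1) / len(search_r1)
print(f"macro-avg: GrepSeek {macro_grepseek:.4f}, Search-R1 {macro_search_r1:.4f}")
```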
Limitations
Because retrieval is purely lexical, GrepSeek is weaker on surface-form
variation / long-tail queries — e.g. PopQA (diacritics, name variants) — and
grep has no semantic relevance ranking, so an authoritative passage can be
buried behind earlier file-order matches. Dense retrieval remains advantageous on
heavily semantic or paraphrase-driven queries.
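The surface-form problem is easy to reproduce with a literal substring check. In the sketch below, the passage and query are made-up examples, and `strip_diacritics` is a hypothetical helper shown only to make the mismatch visible. The card does not say that GrepSeek applies any such normalization.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters and drop combining marks (e.g. 'á' -> 'a')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


passage = "M\u00e1laga is a port city in southern Spain."  # "Málaga"
query = "Malaga"  # the same name typed without the accent

exact_hit = query in passage                        # literal match on raw text
normalized_hit = query in strip_diacritics(passage)  # match after folding accents

print("exact match:", exact_hit)
print("match after stripping diacritics:", normalized_hit)
```

An exact-string search misses the passage unless the query reproduces the accented form, which is the kind of variation the PopQA results reflect.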
License
Inherits the license of the base model Qwen/Qwen3.5-9B — confirm and update the
license field above if needed.
Citation
@misc{salemi2026grepseektrainingsearchagents,
title={GrepSeek: Training Search Agents for Direct Corpus Interaction},
author={Alireza Salemi and Chang Zeng and Atharva Nijasure and Jui-Hui Chung and Razieh Rahimi and Fernando Diaz and Hamed Zamani},
year={2026},
eprint={2605.29307},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.29307},
}