|
|
--- |
|
|
title: Ragmint MCP Server |
|
|
emoji: 🧠 |
|
|
colorFrom: blue |
|
|
colorTo: purple |
|
|
sdk: gradio |
|
|
sdk_version: "5.49.1" |
|
|
app_file: app.py |
|
|
license: apache-2.0 |
|
|
pinned: true |
|
|
short_description: MCP server for Ragmint with RAG pipeline optimization |
|
|
tags: |
|
|
- building-mcp-track-enterprise |
|
|
- mcp |
|
|
- rag |
|
|
- llm |
|
|
- gradio |
|
|
- bayesian-optimization |
|
|
- embeddings |
|
|
- vector-search |
|
|
- gemini |
|
|
- retrievers |
|
|
- python-library |
|
|
--- |
|
|
|
|
|
# Ragmint MCP Server |
|
|
<p align="center"> |
|
|
<img src="https://raw.githubusercontent.com/andyolivers/ragmint/main/src/ragmint/assets/img/ragmint-banner70.png" height="70px" alt="Ragmint Banner"> |
|
|
</p> |
|
|
|
|
|
Gradio-based MCP server for Ragmint, enabling **Retrieval-Augmented Generation (RAG) pipeline optimization and tuning** via an MCP interface. |
|
|
|
|
|
[📢 Launch announcement on LinkedIn](https://www.linkedin.com/posts/andyolivers_ragmint-mcp-server-a-hugging-face-space-activity-7399028674261348352-P5wy?utm_source=share&utm_medium=member_desktop&rcm=ACoAABanwk4Bp0A-FVwO9wyzwVp0g_yqZoRDptI)
|
|
|
|
|
--- |
|
|
|
|
|
## 🧩 Overview |
|
|
|
|
|
Ragmint MCP Server exposes the full power of **Ragmint**, a modular Python library for **evaluating, optimizing, and tuning RAG pipelines**, through the **Model Context Protocol (MCP)**. This allows external clients (like Claude Desktop or Cursor) to **run experiments and tune RAG parameters programmatically**.
|
|
|
|
|
## Ragmint |
|
|
|
|
|
[Ragmint](https://github.com/andyolivers/ragmint) (Retrieval-Augmented Generation Model Inspection & Tuning) is a **modular Python library** for **evaluating, optimizing, and tuning RAG pipelines**. It’s designed for developers and researchers who want automated hyperparameter optimization, retriever selection, embedding tuning, explainability, and reproducible experiment tracking. |
|
|
|
|
|
|
|
[PyPI](https://pypi.org/project/ragmint/)


[Hugging Face Space](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server)
|
|
|
|
|
|
|
|
|
|
### Features exposed via MCP
|
|
|
|
|
* ✅ Automated hyperparameter optimization (Grid, Random, Bayesian via Optuna). |
|
|
* 🤖 Auto-RAG Tuner for dynamic retriever–embedding recommendations. |
|
|
* 🧮 Validation QA generation for corpora without labeled data. |
|
|
* 📦 Configuration of chunking, embeddings, retrievers, and rerankers.


* ⚙️ Programmatic control of the full RAG pipeline.
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Quick Start |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
### Running the MCP Server |
|
|
|
|
|
```bash |
|
|
python app.py |
|
|
``` |
|
|
|
|
|
The server will expose MCP-compatible endpoints, allowing clients to: |
|
|
|
|
|
* Perform optimization experiments. |
|
|
* Automatically autotune pipelines. |
|
|
* Generate validation QA sets with an LLM.
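
To connect an MCP client such as Claude Desktop, point it at the running server. Below is a minimal config sketch; it assumes Gradio's default MCP endpoint (`/gradio_api/mcp/sse`) on port 7860 and the `mcp-remote` bridge, so adjust host, port, and transport to match your setup:

```json
{
  "mcpServers": {
    "ragmint": {
      "command": "npx",
      "args": ["mcp-remote", "http://127.0.0.1:7860/gradio_api/mcp/sse"]
    }
  }
}
```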
|
|
|
|
|
|
|
|
### Environment Variables |
|
|
|
|
|
Set API keys for LLMs used in explainability and QA generation: |
|
|
|
|
|
```bash |
|
|
export GOOGLE_API_KEY="your_gemini_key" |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🧠 MCP Usage |
|
|
|
|
|
Ragmint MCP Server provides Python-callable interfaces for programmatic control. You can find an example of MCP usage in the [Ragmint MCP Server Space](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server) on Hugging Face. |
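
For quick experiments outside an MCP client, the Space can also be driven with `gradio_client`. The sketch below is illustrative: the `api_name` and argument order are assumptions based on the tool names in this README, so check the Space's "Use via API" page for the exact signature:

```python
# Illustrative sketch -- api_name and argument order are assumptions;
# verify them against the Space's "Use via API" page.
from gradio_client import Client

client = Client("andyolivers/ragmint-mcp-server")

result = client.predict(
    "data/docs",              # docs_path: folder with uploaded documents
    "gemini-2.5-flash-lite",  # llm_model used for QA generation
    api_name="/generate_qa_tool",
)
print(result)
```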
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## 🔤 Supported Embeddings |
|
|
|
|
|
* `sentence-transformers/all-MiniLM-L6-v2` |
|
|
* `sentence-transformers/all-mpnet-base-v2` |
|
|
* `BAAI/bge-base-en-v1.5` |
|
|
* `intfloat/multilingual-e5-base` |
|
|
|
|
|
### Configuration Example |
|
|
|
|
|
```yaml |
|
|
embedding_model: sentence-transformers/all-MiniLM-L6-v2 |
|
|
``` |
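
The `embedding_model` value is a Hugging Face model ID loaded at runtime. As a standalone illustration (not part of the server API), here is what such a model does, assuming `sentence-transformers` is installed:

```python
# Standalone illustration of an embedding model, independent of Ragmint.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = model.encode(["What does Ragmint optimize?"])
print(vectors.shape)  # (1, 384) -- all-MiniLM-L6-v2 produces 384-dim vectors
```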
|
|
|
|
|
--- |
|
|
|
|
|
## 🔍 Supported Retrievers |
|
|
|
|
|
| Retriever | Description | |
|
|
|--------------|------------------------------------------------------------------| |
|
|
| FAISS | Fast vector similarity search and indexing. | |
|
|
| Chroma | Persistent vector database with embeddings. | |
|
|
| BM25 | Classical lexical search based on term relevance (TF-IDF-style). |


| NumPy | Brute-force similarity search using raw vectors and matrix ops. |
|
|
|
|
|
### Configuration Example |
|
|
|
|
|
```yaml |
|
|
retriever: faiss |
|
|
``` |
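
For intuition, the sketch below shows the kind of brute-force vector search the `faiss` retriever performs under the hood. It is a generic FAISS example over random vectors, not Ragmint's actual retriever code:

```python
# Generic FAISS example with random vectors -- not Ragmint's retriever code.
import numpy as np
import faiss

dim = 384                                    # matches all-MiniLM-L6-v2
index = faiss.IndexFlatL2(dim)               # exact (brute-force) L2 index
index.add(np.random.rand(100, dim).astype("float32"))  # 100 chunk vectors
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)      # top-5 nearest chunks
print(ids)
```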
|
|
|
|
|
--- |
|
|
|
|
|
## 🧮 Dataset Options |
|
|
|
|
|
| Mode | Example | Description | |
|
|
|----------------------|------------------------------------|------------------------------------| |
|
|
| Default | `validation_set=None` | Uses the built-in `validation_qa.json`. |


| Custom File | `validation_set="data/my_eval.json"` | Your own QA dataset. |


| Hugging Face Dataset | `validation_set="squad"` | Downloads a benchmark dataset. |


| Generate | `validation_set="generate"` | Generates the QA dataset with an LLM. |
|
|
|
|
|
--- |
|
|
|
|
|
## 🧩 Folder Structure |
|
|
|
|
|
``` |
|
|
ragmint_mcp_server/ |
|
|
├── app.py # MCP server entrypoint |
|
|
├── models.py            # Request models (OptimizeRequest, AutotuneRequest, QARequest)


└── api.py               # FastAPI backend
|
|
``` |
|
|
--- |
|
|
## 🔧 MCP Tools (app.py) |
|
|
|
|
|
The `app.py` file provides the Gradio UI and also registers the functions exposed as **MCP Tools**, enabling external MCP clients (Claude Desktop, Cursor, VS Code MCP extension, etc.) to call Ragmint programmatically. |
|
|
|
|
|
`app.py` launches the FastAPI backend (`api.py`) in a background thread and exposes the following MCP tools: |
|
|
|
|
|
| MCP Tool | Python Function | Description | |
|
|
|-----------|------------------------|------------------------------------------------------------------------------------| |
|
|
| upload_docs | upload_docs_tool() | Uploads local `.txt` files into the configured `docs_path`. |
|
|
| upload_urls | upload_urls_tool() | Downloads remote files from external URLs and stores them inside `docs_path`. | |
|
|
| optimize_rag | optimize_rag_tool() | Runs explicit hyperparameter optimization for a RAG pipeline. | |
|
|
| autotune | autotune_tool() | Automatically recommends the best chunking + embedding configuration. |
|
|
| generate_qa | generate_qa_tool() | Generates synthetic QA validation dataset for evaluation. | |
|
|
| clear_cache | clear_cache_tool() | Deletes all docs inside `data/docs` to reset the workspace. | |
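
For context, Gradio turns exposed functions into MCP tools when an app is launched with `mcp_server=True` (the `gradio[mcp]` extra). The sketch below illustrates the mechanism with a placeholder body; it is not the actual `app.py`:

```python
# Illustrative sketch of Gradio's MCP tool registration -- not the real app.py.
import gradio as gr

def clear_cache_tool(docs_path: str = "data/docs") -> str:
    """Delete all documents inside docs_path to reset the workspace."""
    # Placeholder body; the real tool removes files under docs_path.
    return f"Cleared {docs_path}"

demo = gr.Interface(fn=clear_cache_tool, inputs="text", outputs="text")
demo.launch(mcp_server=True)  # function name + docstring become the MCP tool
```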
|
|
|
|
|
--- |
|
|
|
|
|
## 🎬 Demo |
|
|
|
|
|
Watch the demo on YouTube: <https://www.youtube.com/watch?v=DKtHBI3jYgQ>
|
|
|
|
|
--- |
|
|
|
|
|
## 📥 Inputs |
|
|
|
|
|
The Ragmint MCP Server exposes six MCP tools with the following inputs:
|
|
|
|
|
|
|
|
### 1. Upload Documents (`upload_docs`) |
|
|
|
|
|
Input: `.txt` files or file-like objects to upload to the documents directory (`docs_path`). |
|
|
|
|
|
<details> |
|
|
<summary>View Input Model</summary> |
|
|
|
|
|
| Field | Type | Description | Example | |
|
|
|--------|-------|-------------|---------| |
|
|
| files | File[] | Local `.txt` files selected or passed from MCP client | ["sample.txt"] | |
|
|
| docs_path | str | Directory where files are stored | data/docs | |
|
|
</details> |
|
|
|
|
|
|
|
|
### 2. Upload URLs (`upload_urls`) |
|
|
|
|
|
Input: List of URLs referencing `.txt` files to download and store in `docs_path`. |
|
|
|
|
|
<details> |
|
|
<summary>View Input Model</summary> |
|
|
|
|
|
| Field | Type | Description | Example | |
|
|
|--------|-------|-------------|---------| |
|
|
| urls | List[str] | List of URLs pointing to remote documents | ["https://example.com/doc.txt"] | |
|
|
| docs_path | str | Directory where downloaded files are saved | data/docs | |
|
|
|
|
|
</details> |
|
|
|
|
|
### 3. Optimize RAG (`optimize_rag`) |
|
|
|
|
|
Input: JSON object following the `OptimizeRequest` model. |
|
|
|
|
|
<details> |
|
|
<summary>View Input Model</summary> |
|
|
|
|
|
| Field | Type | Description | Example | |
|
|
|-------|------|-------------|---------| |
|
|
| docs_path | str | Folder containing documents | data/docs | |
|
|
| retriever | List[str] | Retriever type | ["faiss"] | |
|
|
| embedding_model | List[str] | Embedding model name or path | ["sentence-transformers/all-MiniLM-L6-v2"] | |
|
|
| strategy | List[str] | RAG strategy | ["fixed"] | |
|
|
| chunk_sizes | List[int] | Chunk sizes to evaluate | [200] | |
|
|
| overlaps | List[int] | Overlap values to test | [50] | |
|
|
| rerankers | List[str] | Rerankers to apply after retrieval | ["mmr"] | |
|
|
| search_type | str | Parameter search method (grid, random, bayesian) | "grid" | |
|
|
| trials | int | Number of optimization trials | 2 | |
|
|
| metric | str | Evaluation metric for optimization | "faithfulness" | |
|
|
| validation_choice | str | Validation data source (generate, local JSON path, HF dataset ID, etc.) | "generate" | |
|
|
| llm_model | str | LLM used to generate QA dataset when validation_choice=generate | "gemini-2.5-flash-lite" | |
|
|
|
|
|
</details> |
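
Assembled from the example values above, a complete `optimize_rag` request body might look like this:

```json
{
  "docs_path": "data/docs",
  "retriever": ["faiss"],
  "embedding_model": ["sentence-transformers/all-MiniLM-L6-v2"],
  "strategy": ["fixed"],
  "chunk_sizes": [200],
  "overlaps": [50],
  "rerankers": ["mmr"],
  "search_type": "grid",
  "trials": 2,
  "metric": "faithfulness",
  "validation_choice": "generate",
  "llm_model": "gemini-2.5-flash-lite"
}
```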
|
|
|
|
|
### 4. Autotune RAG (`autotune`) |
|
|
|
|
|
Input: JSON object following the `AutotuneRequest` model. |
|
|
|
|
|
<details> |
|
|
<summary>View Input Model</summary> |
|
|
|
|
|
| Field | Type | Description | Example | |
|
|
|-------|------|-------------|---------| |
|
|
| docs_path | str | Folder containing documents | data/docs | |
|
|
| embedding_model | str | Embedding model name or path | "sentence-transformers/all-MiniLM-L6-v2" | |
|
|
| num_chunk_pairs | int | Number of chunk pairs to analyze for tuning | 2 | |
|
|
| metric | str | Evaluation metric for optimization | "faithfulness" | |
|
|
| search_type | str | Search method (grid, random, bayesian) | "grid" | |
|
|
| trials | int | Number of optimization trials | 2 | |
|
|
| validation_choice | str | Validation data source (generate, local JSON, HF dataset) | "generate" | |
|
|
| llm_model | str | LLM used for generating QA dataset | "gemini-2.5-flash-lite" | |
|
|
|
|
|
</details> |
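
Using the example values above, a complete `autotune` request body might look like this:

```json
{
  "docs_path": "data/docs",
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
  "num_chunk_pairs": 2,
  "metric": "faithfulness",
  "search_type": "grid",
  "trials": 2,
  "validation_choice": "generate",
  "llm_model": "gemini-2.5-flash-lite"
}
```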
|
|
|
|
|
### 5. Generate QA (`generate_qa`) |
|
|
|
|
|
Input: JSON object following the `QARequest` model. |
|
|
<details> |
|
|
<summary>View Input Model</summary> |
|
|
|
|
|
| Field | Type | Description | Example | |
|
|
|-------|------|-------------|---------| |
|
|
| docs_path | str | Folder containing documents for QA generation | data/docs | |
|
|
| llm_model | str | LLM used for question generation | "gemini-2.5-flash-lite" | |
|
|
| batch_size | int | Number of documents processed per batch | 5 | |
|
|
| min_q | int | Minimum number of questions per document | 3 | |
|
|
| max_q | int | Maximum number of questions per document | 25 | |
|
|
|
|
|
</details> |
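
Using the example values above, a complete `generate_qa` request body might look like this:

```json
{
  "docs_path": "data/docs",
  "llm_model": "gemini-2.5-flash-lite",
  "batch_size": 5,
  "min_q": 3,
  "max_q": 25
}
```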
|
|
|
|
|
### 6. Clear Cache (`clear_cache`) |
|
|
|
|
|
Deletes all stored documents from `data/docs`. |
|
|
|
|
|
<details> |
|
|
<summary>View Input Model</summary> |
|
|
|
|
|
| Field | Type | Description | Example | |
|
|
|--------|-------|-------------|---------| |
|
|
| docs_path | str | Folder to wipe clean | data/docs | |
|
|
|
|
|
</details> |
|
|
|
|
|
--- |
|
|
|
|
|
## 📤 Outputs |
|
|
|
|
|
The Ragmint MCP Server's six MCP tools return the following example outputs:
|
|
|
|
|
### 1. Upload Documents Response (`upload_docs`) |
|
|
|
|
|
<details> |
|
|
<summary>View Response Example</summary> |
|
|
|
|
|
```json |
|
|
{ |
|
|
"status": "ok", |
|
|
"uploaded_files": ["sample.txt"], |
|
|
"docs_path": "data/docs" |
|
|
} |
|
|
``` |
|
|
|
|
|
</details> |
|
|
|
|
|
- **status**: `"ok"` → Indicates that the upload was successful. |
|
|
- **uploaded_files**: List of file names that were successfully uploaded. |
|
|
- **docs_path**: The directory where the uploaded documents are stored. |
|
|
|
|
|
✅ Confirms your documents are ready for RAG operations. |
|
|
|
|
|
|
|
|
### 2. Upload URLs Response (`upload_urls`) |
|
|
|
|
|
<details> |
|
|
<summary>View Response Example</summary> |
|
|
|
|
|
```json |
|
|
{ |
|
|
"status": "ok", |
|
|
"uploaded_files": ["doc.txt"], |
|
|
"docs_path": "data/docs" |
|
|
} |
|
|
``` |
|
|
</details> |
|
|
|
|
|
- **status**: `"ok"` → Indicates that the upload was successful. |
|
|
- **uploaded_files**: List of file names that were successfully uploaded. |
|
|
- **docs_path**: The directory where the uploaded documents are stored. |
|
|
|
|
|
✅ Confirms your documents are ready for RAG operations. |
|
|
|
|
|
|
|
|
### 3. Optimize RAG Response (`optimize_rag`) |
|
|
|
|
|
<details> |
|
|
<summary>View Response Example</summary> |
|
|
|
|
|
```json |
|
|
{ |
|
|
"status": "finished", |
|
|
"run_id": "opt_1763222218", |
|
|
"elapsed_seconds": 0.937, |
|
|
"best_config": { |
|
|
"retriever": "faiss", |
|
|
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", |
|
|
"reranker": "mmr", |
|
|
"chunk_size": 200, |
|
|
"overlap": 50, |
|
|
"strategy": "fixed", |
|
|
"faithfulness": 0.8659, |
|
|
"latency": 0.0333 |
|
|
}, |
|
|
"results": [ |
|
|
{ |
|
|
"retriever": "faiss", |
|
|
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", |
|
|
"reranker": "mmr", |
|
|
"chunk_size": 200, |
|
|
"overlap": 50, |
|
|
"strategy": "fixed", |
|
|
"faithfulness": 0.8659, |
|
|
"latency": 0.0333 |
|
|
} |
|
|
], |
|
|
"corpus_stats": { |
|
|
"num_docs": 1, |
|
|
"avg_len": 8.0, |
|
|
"corpus_size": 61 |
|
|
} |
|
|
} |
|
|
``` |
|
|
|
|
|
</details> |
|
|
|
|
|
- **status**: `"finished"` → Optimization process completed. |
|
|
- **run_id**: Unique identifier for this optimization run. |
|
|
- **elapsed_seconds**: How long the optimization took. |
|
|
- **best_config**: Configuration that gave the best performance. |
|
|
- **retriever** → The retrieval algorithm used (faiss). |
|
|
- **embedding_model** → Embedding model applied. |
|
|
- **reranker** → Reranking strategy after retrieval. |
|
|
- **chunk_size** → Size of document chunks used in RAG. |
|
|
- **overlap** → Overlap between consecutive chunks. |
|
|
- **strategy** → RAG retrieval strategy. |
|
|
- **faithfulness** → Evaluation score (higher = better). |
|
|
- **latency** → Time per query in seconds. |
|
|
- **results**: List of all tested configurations and their scores. |
|
|
- **corpus_stats**: Statistics about the uploaded documents. |
|
|
- **num_docs** → Number of documents in corpus. |
|
|
- **avg_len** → Average document length. |
|
|
- **corpus_size** → Total size in characters or tokens. |
|
|
|
|
|
|
|
|
### 4. Autotune RAG Response (`autotune`) |
|
|
|
|
|
<details> |
|
|
<summary>View Response Example</summary> |
|
|
|
|
|
```json |
|
|
{ |
|
|
"status": "finished", |
|
|
"run_id": "autotune_1763222228", |
|
|
"elapsed_seconds": 4.733, |
|
|
"recommendation": { |
|
|
"retriever": "BM25", |
|
|
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", |
|
|
"chunk_size": 100, |
|
|
"overlap": 30, |
|
|
"strategy": "fixed", |
|
|
"chunk_candidates": [[100, 30], [110, 30]] |
|
|
}, |
|
|
"chunk_candidates": [[90, 50], [70, 50]], |
|
|
"best_config": { |
|
|
"retriever": "BM25", |
|
|
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", |
|
|
"reranker": "mmr", |
|
|
"chunk_size": 70, |
|
|
"overlap": 50, |
|
|
"strategy": "fixed", |
|
|
"faithfulness": 1.0, |
|
|
"latency": 0.0272 |
|
|
}, |
|
|
"results": [ |
|
|
{ |
|
|
"retriever": "BM25", |
|
|
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", |
|
|
"reranker": "mmr", |
|
|
"chunk_size": 70, |
|
|
"overlap": 50, |
|
|
"strategy": "fixed", |
|
|
"faithfulness": 1.0, |
|
|
"latency": 0.0272 |
|
|
}, |
|
|
{ |
|
|
"retriever": "BM25", |
|
|
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", |
|
|
"reranker": "mmr", |
|
|
"chunk_size": 90, |
|
|
"overlap": 50, |
|
|
"strategy": "fixed", |
|
|
"faithfulness": 1.0, |
|
|
"latency": 0.0186 |
|
|
} |
|
|
], |
|
|
"corpus_stats": { |
|
|
"num_docs": 1, |
|
|
"avg_len": 8.0, |
|
|
"corpus_size": 61 |
|
|
} |
|
|
} |
|
|
``` |
|
|
|
|
|
</details> |
|
|
|
|
|
- **recommendation**: The tuned configuration suggested by the autotuner. |
|
|
- **chunk_candidates**: List of possible chunk_size/overlap pairs analyzed. |
|
|
- **best_config**: Best-performing configuration with metrics. |
|
|
- **results**: All tested configurations and their performance. |
|
|
- **corpus_stats**: Same as in optimize response. |
|
|
- **status, run_id, elapsed_seconds**: Same meaning as in the Optimize response.
|
|
|
|
|
🧠 **Difference from Optimize**: Autotune automatically selects the best hyperparameters, rather than testing all user-specified combinations. |
|
|
|
|
|
|
|
|
### 5. Generate QA Response (`generate_qa`) |
|
|
|
|
|
<details> |
|
|
<summary>View Response Example</summary> |
|
|
|
|
|
```json |
|
|
{ |
|
|
"status": "finished", |
|
|
"output_path": "data/docs/validation_qa.json", |
|
|
"preview_count": 3, |
|
|
"sample": [ |
|
|
{ |
|
|
"query": "What capability does Artificial Intelligence provide to machines?", |
|
|
"expected_answer": "Artificial Intelligence enables machines to learn from data." |
|
|
}, |
|
|
{ |
|
|
"query": "What is the primary source of learning for machines with Artificial Intelligence?", |
|
|
"expected_answer": "Machines with Artificial Intelligence learn from data." |
|
|
}, |
|
|
{ |
|
|
"query": "How does Artificial Intelligence facilitate machine learning?", |
|
|
"expected_answer": "Artificial Intelligence enables machines to learn from data." |
|
|
} |
|
|
] |
|
|
} |
|
|
``` |
|
|
|
|
|
</details> |
|
|
|
|
|
- **output_path**: Where the generated QA JSON file is saved. |
|
|
- **preview_count**: Number of QA pairs included in the response preview. |
|
|
- **sample**: Example QA pairs: |
|
|
- **query** → The question generated from the document. |
|
|
- **expected_answer** → The reference answer corresponding to that question. |
|
|
- **status**: `"finished"` → QA generation completed successfully. |
|
|
|
|
|
|
|
|
### 6. Clear Cache Response (`clear_cache`) |
|
|
|
|
|
<details> |
|
|
<summary>View Response Example</summary> |
|
|
|
|
|
```json |
|
|
{ |
|
|
"status": "ok", |
|
|
"deleted_files": 7, |
|
|
"docs_path": "data/docs" |
|
|
} |
|
|
``` |
|
|
</details> |
|
|
|
|
|
- **deleted_files**: Number of documents removed. |
|
|
- **status**: "ok" indicates successful workspace reset. |
|
|
|
|
|
--- |
|
|
|
|
|
## 📘 License |
|
|
|
|
|
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details. |
|
|
|
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<sub>Built with ❤️ by <a href="https://andyolivers.com">André Oliveira</a> | Apache 2.0 License</sub> |
|
|
</p> |
|
|
|