---
title: Ragmint MCP Server
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server for Ragmint with RAG pipeline optimization
tags:
- building-mcp-track-enterprise
- mcp
- rag
- llm
- gradio
- bayesian-optimization
- embeddings
- vector-search
- gemini
- retrievers
- python-library
---
# Ragmint MCP Server
<p align="center">
<img src="https://raw.githubusercontent.com/andyolivers/ragmint/main/src/ragmint/assets/img/ragmint-banner70.png" height="70px" alt="Ragmint Banner">
</p>
Gradio-based MCP server for Ragmint, enabling **Retrieval-Augmented Generation (RAG) pipeline optimization and tuning** via an MCP interface.
![Python](https://img.shields.io/badge/python-3.9%2B-blue) ![License](https://img.shields.io/badge/license-Apache%202.0-green) ![Status](https://img.shields.io/badge/Status-Active-success) ![MCP](https://img.shields.io/badge/MCP-enabled-brightgreen) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Post-blue)](https://www.linkedin.com/posts/andyolivers_ragmint-mcp-server-a-hugging-face-space-activity-7399028674261348352-P5wy?utm_source=share&utm_medium=member_desktop&rcm=ACoAABanwk4Bp0A-FVwO9wyzwVp0g_yqZoRDptI)
---
## 🧩 Overview
Ragmint MCP Server exposes the full power of **Ragmint**, a modular Python library for **evaluating, optimizing, and tuning RAG pipelines**, through the **Model Context Protocol (MCP)**. This allows external clients (such as Claude Desktop or Cursor) to **run experiments and tune RAG parameters programmatically**.
## Ragmint
[Ragmint](https://github.com/andyolivers/ragmint) (Retrieval-Augmented Generation Model Inspection & Tuning) is a **modular Python library** for **evaluating, optimizing, and tuning RAG pipelines**. It’s designed for developers and researchers who want automated hyperparameter optimization, retriever selection, embedding tuning, explainability, and reproducible experiment tracking.
![Python](https://img.shields.io/badge/python-3.9%2B-blue)
![License](https://img.shields.io/badge/license-Apache%202.0-green)
[![PyPI](https://img.shields.io/pypi/v/ragmint?color=blue)](https://pypi.org/project/ragmint/)
[![HF Space](https://img.shields.io/badge/HF-Space-blue)](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server)
![MCP](https://img.shields.io/badge/MCP-Enabled-green)
![Status](https://img.shields.io/badge/Status-Beta-orange)
![Optuna](https://img.shields.io/badge/Optuna-Bayesian%20Optimization-6f42c1?logo=optuna&logoColor=white)
![Google Gemini 2.5](https://img.shields.io/badge/Google%20Gemini-LLM-lightblue?logo=google&logoColor=white)
### Features exposed via MCP:
* ✅ Automated hyperparameter optimization (Grid, Random, Bayesian via Optuna).
* 🤖 Auto-RAG Tuner for dynamic retriever–embedding recommendations.
* 🧮 Validation QA generation for corpora without labeled data.
* 📦 Chunking, embeddings, retrievers, rerankers configuration.
* ⚙️ Full programmatic control of the RAG pipeline.
---
## 🚀 Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Running the MCP Server
```bash
python app.py
```
The server exposes MCP-compatible endpoints, allowing clients to:
* Run optimization experiments.
* Autotune pipelines automatically.
* Generate validation QA sets with an LLM.
### Environment Variables
Set API keys for LLMs used in explainability and QA generation:
```bash
export GOOGLE_API_KEY="your_gemini_key"
```
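Before launching, you can verify that the key is visible to the Python process with a quick check like the one below (a convenience snippet, not part of the server code):
```python
import os

# QA generation and explainability rely on a Gemini API key.
if not os.getenv("GOOGLE_API_KEY"):
    raise SystemExit("GOOGLE_API_KEY is not set - export it before running app.py")
print("GOOGLE_API_KEY found.")
```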
---
## 🧠 MCP Usage
Ragmint MCP Server provides Python-callable interfaces for programmatic control. You can find an example of MCP usage in the [Ragmint MCP Server Space](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server) on Hugging Face.
---
## 🔤 Supported Embeddings
* `sentence-transformers/all-MiniLM-L6-v2`
* `sentence-transformers/all-mpnet-base-v2`
* `BAAI/bge-base-en-v1.5`
* `intfloat/multilingual-e5-base`
### Configuration Example
```yaml
embedding_model: sentence-transformers/all-MiniLM-L6-v2
```
---
## 🔍 Supported Retrievers
| Retriever | Description |
|--------------|------------------------------------------------------------------|
| FAISS | Fast vector similarity search and indexing. |
| Chroma | Persistent vector database with embeddings. |
| BM25 | Classical lexical search based on term relevance (TF-IDF-style). |
| NumPy | Brute-force similarity search using raw vectors and matrix ops. |
### Configuration Example
```yaml
retriever: faiss
```
---
## 🧮 Dataset Options
| Mode | Example | Description |
|----------------------|------------------------------------|------------------------------------|
| Default | validation_set=None | Uses built-in validation_qa.json. |
| Custom File | validation_set="data/my_eval.json" | Your QA dataset. |
| Hugging Face Dataset | validation_set="squad" | Downloads benchmark dataset. |
| Generate | validation_set="generate" | Generates the QA dataset with LLM. |
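For illustration, these are the values behind the four modes (the table above uses `validation_set`; in the MCP input models below, the corresponding field appears as `validation_choice`):
```python
# Illustrative validation-set choices (see the table above):
default_mode = None                   # built-in validation_qa.json
custom_file  = "data/my_eval.json"    # your own QA dataset
hf_dataset   = "squad"                # Hugging Face benchmark dataset
generated    = "generate"             # synthesize a QA set with the LLM
```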
---
## 🧩 Folder Structure
```
ragmint_mcp_server/
├── app.py       # Gradio UI + MCP server entrypoint
├── models.py    # Request models (OptimizeRequest, AutotuneRequest, QARequest)
└── api.py       # FastAPI backend launched in a background thread by app.py
```
---
## 🔧 MCP Tools (app.py)
The `app.py` file provides the Gradio UI and also registers the functions exposed as **MCP Tools**, enabling external MCP clients (Claude Desktop, Cursor, VS Code MCP extension, etc.) to call Ragmint programmatically.
`app.py` launches the FastAPI backend (`api.py`) in a background thread and exposes the following MCP tools:
| MCP Tool | Python Function | Description |
|-----------|------------------------|------------------------------------------------------------------------------------|
| upload_docs | upload_docs_tool() | Uploads `.txt` files or remote URLs into the configured `docs_path`. |
| upload_urls | upload_urls_tool() | Downloads remote files from external URLs and stores them inside `docs_path`. |
| optimize_rag | optimize_rag_tool() | Runs explicit hyperparameter optimization for a RAG pipeline. |
| autotune | autotune_tool() | Automatically recommends best chunking + embedding configuration. |
| generate_qa | generate_qa_tool() | Generates synthetic QA validation dataset for evaluation. |
| clear_cache | clear_cache_tool() | Deletes all docs inside `data/docs` to reset the workspace. |
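For orientation, the underlying pattern is the standard Gradio MCP setup: Python functions are wrapped as Gradio endpoints and the app is launched with `mcp_server=True`, which exposes them as MCP tools. The sketch below illustrates that pattern with a stubbed tool; it is not the actual contents of `app.py`.
```python
import gradio as gr

def clear_cache_tool(docs_path: str = "data/docs") -> dict:
    """Delete all documents inside docs_path to reset the workspace."""
    # Stub body for illustration - the real logic lives in the Ragmint MCP server.
    return {"status": "ok", "deleted_files": 0, "docs_path": docs_path}

demo = gr.Interface(fn=clear_cache_tool, inputs="text", outputs="json")

# mcp_server=True (Gradio 5.x, gradio[mcp] extra) exposes the function
# as an MCP tool in addition to the web UI.
demo.launch(mcp_server=True)
```
MCP clients such as Claude Desktop or Cursor then connect to the server's MCP endpoint; see the Space page for the exact connection settings.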
---
## 🎬 Demo
YouTube: https://www.youtube.com/watch?v=DKtHBI3jYgQ
---
## 📥 Inputs
The Ragmint MCP Server exposes six MCP tools with the following inputs:
### 1. Upload Documents (`upload_docs`)
Input: `.txt` files or file-like objects to upload to the documents directory (`docs_path`).
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| files | File[] | Local `.txt` files selected or passed from MCP client | ["sample.txt"] |
| docs_path | str | Directory where files are stored | data/docs |
</details>
### 2. Upload URLs (`upload_urls`)
Input: List of URLs referencing `.txt` files to download and store in `docs_path`.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| urls | List[str] | List of URLs pointing to remote documents | ["https://example.com/doc.txt"] |
| docs_path | str | Directory where downloaded files are saved | data/docs |
</details>
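As a concrete illustration, the arguments an MCP client would pass to `upload_urls` look like this (values are examples only):
```python
upload_urls_args = {
    "urls": ["https://example.com/doc.txt"],  # remote .txt documents to fetch
    "docs_path": "data/docs",                 # where the downloaded files are stored
}
```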
### 3. Optimize RAG (`optimize_rag`)
Input: JSON object following the `OptimizeRequest` model.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| retriever | List[str] | Retriever type | ["faiss"] |
| embedding_model | List[str] | Embedding model name or path | ["sentence-transformers/all-MiniLM-L6-v2"] |
| strategy | List[str] | RAG strategy | ["fixed"] |
| chunk_sizes | List[int] | Chunk sizes to evaluate | [200] |
| overlaps | List[int] | Overlap values to test | [50] |
| rerankers | List[str] | Rerankers to apply after retrieval | ["mmr"] |
| search_type | str | Parameter search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| validation_choice | str | Validation data source (generate, local JSON path, HF dataset ID, etc.) | "generate" |
| llm_model | str | LLM used to generate QA dataset when validation_choice=generate | "gemini-2.5-flash-lite" |
</details>
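Put together, a minimal grid-search request built from the fields above looks like this (illustrative values; pass it as the tool arguments from your MCP client or as the JSON body of the request):
```python
optimize_request = {
    "docs_path": "data/docs",
    "retriever": ["faiss"],
    "embedding_model": ["sentence-transformers/all-MiniLM-L6-v2"],
    "strategy": ["fixed"],
    "chunk_sizes": [200],
    "overlaps": [50],
    "rerankers": ["mmr"],
    "search_type": "grid",            # or "random" / "bayesian"
    "trials": 2,
    "metric": "faithfulness",
    "validation_choice": "generate",  # synthesize a QA set with the LLM below
    "llm_model": "gemini-2.5-flash-lite",
}
```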
### 4. Autotune RAG (`autotune`)
Input: JSON object following the `AutotuneRequest` model.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| embedding_model | str | Embedding model name or path | "sentence-transformers/all-MiniLM-L6-v2" |
| num_chunk_pairs | int | Number of chunk pairs to analyze for tuning | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| search_type | str | Search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| validation_choice | str | Validation data source (generate, local JSON, HF dataset) | "generate" |
| llm_model | str | LLM used for generating QA dataset | "gemini-2.5-flash-lite" |
</details>
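Compared with `optimize_rag`, an `autotune` request is flatter (single values rather than lists to sweep); illustrative values below:
```python
autotune_request = {
    "docs_path": "data/docs",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "num_chunk_pairs": 2,             # chunk_size/overlap pairs to analyze
    "metric": "faithfulness",
    "search_type": "grid",
    "trials": 2,
    "validation_choice": "generate",
    "llm_model": "gemini-2.5-flash-lite",
}
```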
### 5. Generate QA (`generate_qa`)
Input: JSON object following the `QARequest` model.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents for QA generation | data/docs |
| llm_model | str | LLM used for question generation | "gemini-2.5-flash-lite" |
| batch_size | int | Number of documents processed per batch | 5 |
| min_q | int | Minimum number of questions per document | 3 |
| max_q | int | Maximum number of questions per document | 25 |
</details>
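And a `generate_qa` request, again with illustrative values:
```python
qa_request = {
    "docs_path": "data/docs",
    "llm_model": "gemini-2.5-flash-lite",
    "batch_size": 5,   # documents processed per batch
    "min_q": 3,        # minimum questions per document
    "max_q": 25,       # maximum questions per document
}
```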
### 6. Clear Cache (`clear_cache`)
Deletes all stored documents from `data/docs`.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| docs_path | str | Folder to wipe clean | data/docs |
</details>
---
## 📤 Outputs
Each of the six MCP tools returns a JSON response; example outputs are shown below:
### 1. Upload Documents Response (`upload_docs`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "ok",
"uploaded_files": ["sample.txt"],
"docs_path": "data/docs"
}
```
</details>
- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.
✅ Confirms your documents are ready for RAG operations.
### 2. Upload URLs Response (`upload_urls`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "ok",
"uploaded_files": ["doc.txt"],
"docs_path": "data/docs"
}
```
</details>
- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.
✅ Confirms your documents are ready for RAG operations.
### 3. Optimize RAG Response (`optimize_rag`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "finished",
"run_id": "opt_1763222218",
"elapsed_seconds": 0.937,
"best_config": {
"retriever": "faiss",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 200,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 0.8659,
"latency": 0.0333
},
"results": [
{
"retriever": "faiss",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 200,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 0.8659,
"latency": 0.0333
}
],
"corpus_stats": {
"num_docs": 1,
"avg_len": 8.0,
"corpus_size": 61
}
}
```
</details>
- **status**: `"finished"` → Optimization process completed.
- **run_id**: Unique identifier for this optimization run.
- **elapsed_seconds**: How long the optimization took.
- **best_config**: Configuration that gave the best performance.
- **retriever** → The retrieval algorithm used (faiss).
- **embedding_model** → Embedding model applied.
- **reranker** → Reranking strategy after retrieval.
- **chunk_size** → Size of document chunks used in RAG.
- **overlap** → Overlap between consecutive chunks.
- **strategy** → RAG retrieval strategy.
- **faithfulness** → Evaluation score (higher = better).
- **latency** → Time per query in seconds.
- **results**: List of all tested configurations and their scores.
- **corpus_stats**: Statistics about the uploaded documents.
- **num_docs** → Number of documents in corpus.
- **avg_len** → Average document length.
- **corpus_size** → Total size in characters or tokens.
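A small example of consuming this response in Python (the literal below is a trimmed copy of the JSON above, just to keep the snippet self-contained):
```python
import json

# Trimmed copy of an optimize_rag response for illustration.
raw = """{
  "status": "finished",
  "best_config": {"retriever": "faiss", "chunk_size": 200, "overlap": 50, "faithfulness": 0.8659},
  "results": [{"retriever": "faiss", "faithfulness": 0.8659}]
}"""
response = json.loads(raw)

best = response["best_config"]
print(f"Best retriever: {best['retriever']}")
print(f"Best chunking:  size={best['chunk_size']}, overlap={best['overlap']}")
print(f"Faithfulness:   {best['faithfulness']:.4f} across {len(response['results'])} tested configs")
```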
### 4. Autotune RAG Response (`autotune`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "finished",
"run_id": "autotune_1763222228",
"elapsed_seconds": 4.733,
"recommendation": {
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"chunk_size": 100,
"overlap": 30,
"strategy": "fixed",
"chunk_candidates": [[100, 30], [110, 30]]
},
"chunk_candidates": [[90, 50], [70, 50]],
"best_config": {
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 70,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 1.0,
"latency": 0.0272
},
"results": [
{
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 70,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 1.0,
"latency": 0.0272
},
{
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 90,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 1.0,
"latency": 0.0186
}
],
"corpus_stats": {
"num_docs": 1,
"avg_len": 8.0,
"corpus_size": 61
}
}
```
</details>
- **recommendation**: The tuned configuration suggested by the autotuner.
- **chunk_candidates**: List of possible chunk_size/overlap pairs analyzed.
- **best_config**: Best-performing configuration with metrics.
- **results**: All tested configurations and their performance.
- **corpus_stats**: Same as in optimize response.
- **status, run_id, elapsed_seconds**: Same meaning as Optimize endpoint.
🧠 **Difference from Optimize**: Autotune automatically selects the best hyperparameters, rather than testing all user-specified combinations.
### 5. Generate QA Response (`generate_qa`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "finished",
"output_path": "data/docs/validation_qa.json",
"preview_count": 3,
"sample": [
{
"query": "What capability does Artificial Intelligence provide to machines?",
"expected_answer": "Artificial Intelligence enables machines to learn from data."
},
{
"query": "What is the primary source of learning for machines with Artificial Intelligence?",
"expected_answer": "Machines with Artificial Intelligence learn from data."
},
{
"query": "How does Artificial Intelligence facilitate machine learning?",
"expected_answer": "Artificial Intelligence enables machines to learn from data."
}
]
}
```
</details>
- **output_path**: Where the generated QA JSON file is saved.
- **preview_count**: Number of QA pairs included in the response preview.
- **sample**: Example QA pairs:
- **query** → The question generated from the document.
- **expected_answer** → The reference answer corresponding to that question.
- **status**: `"finished"` → QA generation completed successfully.
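To reuse the generated file in your own evaluation code, it can be read back like any JSON file; this snippet assumes the file stores the same list of `query` / `expected_answer` objects shown in the sample above:
```python
import json
from pathlib import Path

qa_path = Path("data/docs/validation_qa.json")  # the output_path reported above
qa_pairs = json.loads(qa_path.read_text(encoding="utf-8"))

for pair in qa_pairs[:3]:
    print(pair["query"], "->", pair["expected_answer"])
```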
### 6. Clear Cache Response (`clear_cache`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "ok",
"deleted_files": 7,
"docs_path": "data/docs"
}
```
</details>
- **deleted_files**: Number of documents removed.
- **status**: `"ok"` → Indicates successful workspace reset.
---
## 📘 License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.
---
<p align="center">
<sub>Built with ❤️ by <a href="https://andyolivers.com">André Oliveira</a> | Apache 2.0 License</sub>
</p>