---
title: Ragmint MCP Server
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server for Ragmint with RAG pipeline optimization
tags:
  - building-mcp-track-enterprise
  - mcp
  - rag
  - llm
  - gradio
  - bayesian-optimization
  - embeddings
  - vector-search
  - gemini
  - retrievers
  - python-library
---

# Ragmint MCP Server
<p align="center">
  <img src="https://raw.githubusercontent.com/andyolivers/ragmint/main/src/ragmint/assets/img/ragmint-banner70.png" height="70px" alt="Ragmint Banner">
</p>

Gradio-based MCP server for Ragmint, enabling **Retrieval-Augmented Generation (RAG) pipeline optimization and tuning** via an MCP interface.

![Python](https://img.shields.io/badge/python-3.9%2B-blue) ![License](https://img.shields.io/badge/license-Apache%202.0-green) ![Status](https://img.shields.io/badge/Status-Active-success) ![MCP](https://img.shields.io/badge/MCP-enabled-brightgreen) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Post-blue)](https://www.linkedin.com/posts/andyolivers_ragmint-mcp-server-a-hugging-face-space-activity-7399028674261348352-P5wy?utm_source=share&utm_medium=member_desktop&rcm=ACoAABanwk4Bp0A-FVwO9wyzwVp0g_yqZoRDptI)

---

## 🧩 Overview

Ragmint MCP Server exposes **Ragmint**, a modular Python library for **evaluating, optimizing, and tuning RAG pipelines**, through the **Model Context Protocol (MCP)**. External MCP clients (such as Claude Desktop or Cursor) can then **run experiments and tune RAG parameters programmatically**.

## Ragmint

[Ragmint](https://github.com/andyolivers/ragmint) (Retrieval-Augmented Generation Model Inspection & Tuning) is a **modular Python library** for **evaluating, optimizing, and tuning RAG pipelines**. It’s designed for developers and researchers who want automated hyperparameter optimization, retriever selection, embedding tuning, explainability, and reproducible experiment tracking.

![Python](https://img.shields.io/badge/python-3.9%2B-blue)
![License](https://img.shields.io/badge/license-Apache%202.0-green)
[![PyPI](https://img.shields.io/pypi/v/ragmint?color=blue)](https://pypi.org/project/ragmint/)
[![HF Space](https://img.shields.io/badge/HF-Space-blue)](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server)
![MCP](https://img.shields.io/badge/MCP-Enabled-green) 
![Status](https://img.shields.io/badge/Status-Beta-orange) 
![Optuna](https://img.shields.io/badge/Optuna-Bayesian%20Optimization-6f42c1?logo=optuna&logoColor=white) 
![Google Gemini 2.5](https://img.shields.io/badge/Google%20Gemini-LLM-lightblue?logo=google&logoColor=white)


### Features exposed via MCP:

* ✅ Automated hyperparameter optimization (Grid, Random, Bayesian via Optuna).
* 🤖 Auto-RAG Tuner for dynamic retriever–embedding recommendations.
* 🧮 Validation QA generation for corpora without labeled data.
* 📦 Chunking, embeddings, retrievers, rerankers configuration.
* ⚙️ Full RAG pipeline control programmatically.

---

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Running the MCP Server

```bash
python app.py
```

The server exposes MCP-compatible endpoints, allowing clients to:

* Run optimization experiments.
* Autotune pipelines.
* Generate validation QA sets with an LLM.


### Environment Variables

Set the API key for the LLM (Google Gemini) used for explainability and QA generation:

```bash
export GOOGLE_API_KEY="your_gemini_key"
```
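A missing key may only surface once a tool actually calls the LLM, so it can help to check for it up front. The sketch below is a minimal example of such a check; `require_env` is a hypothetical helper, not part of Ragmint or `app.py`.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running app.py")
    return value
```

Calling `require_env("GOOGLE_API_KEY")` at startup fails fast with a clear message instead of a later LLM error.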

---

## 🧠 MCP Usage

Ragmint MCP Server provides Python-callable interfaces for programmatic control. You can find an example of MCP usage in the [Ragmint MCP Server Space](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server) on Hugging Face.


---

## 🔤 Supported Embeddings

* `sentence-transformers/all-MiniLM-L6-v2`
* `sentence-transformers/all-mpnet-base-v2`
* `BAAI/bge-base-en-v1.5`
* `intfloat/multilingual-e5-base`

### Configuration Example

```yaml
embedding_model: sentence-transformers/all-MiniLM-L6-v2
```

---

## 🔍 Supported Retrievers

| Retriever    | Description                                                      |
|--------------|------------------------------------------------------------------|
| FAISS        | Fast vector similarity search and indexing.                      |
| Chroma       | Persistent vector database with embeddings.                      |
| bm25         | Classical lexical search based on term relevance (TF-IDF-style). |
| numpy        | Brute-force similarity search using raw vectors and matrix ops.  |

### Configuration Example

```yaml
retriever: faiss
```

---

## 🧮 Dataset Options

| Mode                 | Example                            | Description                           |
|----------------------|------------------------------------|---------------------------------------|
| Default              | validation_set=None                | Uses the built-in validation_qa.json. |
| Custom File          | validation_set="data/my_eval.json" | Uses your own QA dataset.             |
| Hugging Face Dataset | validation_set="squad"             | Downloads a benchmark dataset.        |
| Generate             | validation_set="generate"          | Generates the QA dataset with an LLM. |
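
The four modes above can be told apart from the value alone. The sketch below shows one plausible way to do that; `classify_validation_set` and its rules (for example, treating anything ending in `.json` as a local file) are illustrative assumptions, not Ragmint's actual resolution logic.

```python
from typing import Optional


def classify_validation_set(value: Optional[str]) -> str:
    """Map a validation_set value to one of the four modes in the table.

    The rules are an illustrative guess, not Ragmint's real implementation.
    """
    if value is None:
        return "default"
    if value == "generate":
        return "generate"
    if value.endswith(".json"):
        return "custom_file"
    # Anything else is treated as a Hugging Face dataset ID, e.g. "squad".
    return "hf_dataset"
```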

---

## 🧩 Folder Structure

```
ragmint_mcp_server/
├── app.py  # MCP server entrypoint
├── models.py
└── api.py
```

---

## 🔧 MCP Tools (app.py)

The `app.py` file provides the Gradio UI and also registers the functions exposed as **MCP Tools**, enabling external MCP clients (Claude Desktop, Cursor, VS Code MCP extension, etc.) to call Ragmint programmatically.

`app.py` launches the FastAPI backend (`api.py`) in a background thread and exposes the following MCP tools:

| MCP Tool     | Python Function     | Description                                                                   |
|--------------|---------------------|-------------------------------------------------------------------------------|
| upload_docs  | upload_docs_tool()  | Uploads local `.txt` files into the configured `docs_path`.                   |
| upload_urls  | upload_urls_tool()  | Downloads remote files from external URLs and stores them inside `docs_path`. |
| optimize_rag | optimize_rag_tool() | Runs explicit hyperparameter optimization for a RAG pipeline.                 |
| autotune     | autotune_tool()     | Automatically recommends the best chunking + embedding configuration.         |
| generate_qa  | generate_qa_tool()  | Generates a synthetic QA validation dataset for evaluation.                   |
| clear_cache  | clear_cache_tool()  | Deletes all docs inside `data/docs` to reset the workspace.                   |
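
MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message by hand to show its shape; `build_tool_call` is a hypothetical helper, and the tool and argument names are taken from the tables in this README, so the names the server actually registers may differ (a client would normally discover them via `tools/list`).

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request IDs.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumed tool name and argument names, copied from this README.
print(build_tool_call(
    "upload_urls",
    {"urls": ["https://example.com/doc.txt"], "docs_path": "data/docs"},
))
```

In practice an MCP client library or host application (Claude Desktop, Cursor) constructs and sends these messages for you.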

---

## 🎬 Demo

YouTube: https://www.youtube.com/watch?v=DKtHBI3jYgQ

---

## 📥 Inputs

The Ragmint MCP Server exposes six MCP tools with the following inputs:


### 1. Upload Documents (`upload_docs`)

Input: `.txt` files or file-like objects to upload to the documents directory (`docs_path`).

<details>
<summary>View Input Model</summary>

| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| files | File[] | Local `.txt` files selected or passed from MCP client | ["sample.txt"] |
| docs_path | str | Directory where files are stored | data/docs |
</details>


### 2. Upload URLs (`upload_urls`)

Input: List of URLs referencing `.txt` files to download and store in `docs_path`.

<details>
<summary>View Input Model</summary>

| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| urls | List[str] | List of URLs pointing to remote documents | ["https://example.com/doc.txt"] |
| docs_path | str | Directory where downloaded files are saved | data/docs |

</details>

### 3. Optimize RAG (`optimize_rag`)

Input: JSON object following the `OptimizeRequest` model.

<details>
<summary>View Input Model</summary>

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| retriever | List[str] | Retriever types to evaluate | ["faiss"] |
| embedding_model | List[str] | Embedding model names or paths to evaluate | ["sentence-transformers/all-MiniLM-L6-v2"] |
| strategy | List[str] | RAG strategies to evaluate | ["fixed"] |
| chunk_sizes | List[int] | Chunk sizes to evaluate | [200] |
| overlaps | List[int] | Overlap values to test | [50] |
| rerankers | List[str] | Rerankers to apply after retrieval | ["mmr"] |
| search_type | str | Parameter search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| validation_choice | str | Validation data source (generate, local JSON path, HF dataset ID, etc.) | "generate" |
| llm_model | str | LLM used to generate QA dataset when validation_choice=generate | "gemini-2.5-flash-lite" |

</details>
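
The list-valued fields define the search space. As a rough sketch, assuming `search_type="grid"` enumerates the full Cartesian product of those lists (how the server combines this with `trials` is not documented here), the number of configurations can be estimated before submitting a request. `grid_combinations` and `SEARCH_FIELDS` are hypothetical names used only for this illustration.

```python
import itertools

# Example request built from the "Example" column of the input model.
optimize_request = {
    "docs_path": "data/docs",
    "retriever": ["faiss"],
    "embedding_model": ["sentence-transformers/all-MiniLM-L6-v2"],
    "strategy": ["fixed"],
    "chunk_sizes": [200],
    "overlaps": [50],
    "rerankers": ["mmr"],
    "search_type": "grid",
    "trials": 2,
    "metric": "faithfulness",
    "validation_choice": "generate",
    "llm_model": "gemini-2.5-flash-lite",
}

# List-valued fields that span the search space.
SEARCH_FIELDS = (
    "retriever", "embedding_model", "strategy",
    "chunk_sizes", "overlaps", "rerankers",
)


def grid_combinations(request: dict) -> list:
    """Enumerate every combination of the list-valued fields."""
    value_lists = [request[field] for field in SEARCH_FIELDS]
    return [
        dict(zip(SEARCH_FIELDS, combo))
        for combo in itertools.product(*value_lists)
    ]


print(len(grid_combinations(optimize_request)))  # 1: every list has one value
```

Adding a second value to both `chunk_sizes` and `overlaps` would raise the count to 4, which is a quick way to gauge how long a grid run might take.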

### 4. Autotune RAG (`autotune`)

Input: JSON object following the `AutotuneRequest` model.

<details>
<summary>View Input Model</summary>

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| embedding_model | str | Embedding model name or path | "sentence-transformers/all-MiniLM-L6-v2" |
| num_chunk_pairs | int | Number of chunk pairs to analyze for tuning | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| search_type | str | Search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| validation_choice | str | Validation data source (generate, local JSON, HF dataset) | "generate" |
| llm_model | str | LLM used for generating QA dataset | "gemini-2.5-flash-lite" |

</details>

### 5. Generate QA (`generate_qa`)

Input: JSON object following the `QARequest` model.

<details>
<summary>View Input Model</summary>

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents for QA generation | data/docs |
| llm_model | str | LLM used for question generation | "gemini-2.5-flash-lite" |
| batch_size | int | Number of documents processed per batch | 5 |
| min_q | int | Minimum number of questions per document | 3 |
| max_q | int | Maximum number of questions per document | 25 |

</details>
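
The numeric fields constrain each other: `min_q` should not exceed `max_q`, and `batch_size` determines how many batches a corpus is split into. The sketch below checks those constraints and estimates the workload; `QARequestSketch` is a hypothetical stand-in for the server's `QARequest` model, with defaults copied from the "Example" column.

```python
import math
from dataclasses import dataclass


@dataclass
class QARequestSketch:
    """Hypothetical stand-in for the server's QARequest model."""

    docs_path: str = "data/docs"
    llm_model: str = "gemini-2.5-flash-lite"
    batch_size: int = 5
    min_q: int = 3
    max_q: int = 25

    def __post_init__(self) -> None:
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if not 1 <= self.min_q <= self.max_q:
            raise ValueError("expected 1 <= min_q <= max_q")

    def estimate(self, num_docs: int) -> dict:
        """Estimate batch count and the range of generated questions."""
        return {
            "batches": math.ceil(num_docs / self.batch_size),
            "min_questions": num_docs * self.min_q,
            "max_questions": num_docs * self.max_q,
        }
```

For a corpus of 12 documents, `QARequestSketch().estimate(12)` reports 3 batches and between 36 and 300 questions.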

### 6. Clear Cache (`clear_cache`)

Deletes all stored documents from the configured `docs_path` (`data/docs` in the examples).

<details>
<summary>View Input Model</summary>

| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| docs_path | str | Folder to wipe clean | data/docs |

</details>

---

## 📤 Outputs

The Ragmint MCP Server exposes six MCP tools with the following example outputs:

### 1. Upload Documents Response (`upload_docs`)

<details>
<summary>View Response Example</summary>

```json
{
  "status": "ok",
  "uploaded_files": ["sample.txt"],
  "docs_path": "data/docs"
}
```

</details>

- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.


### 2. Upload URLs Response (`upload_urls`)

<details>
<summary>View Response Example</summary>

```json
{
  "status": "ok",
  "uploaded_files": ["doc.txt"],
  "docs_path": "data/docs"
}
```

</details>

- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.


### 3. Optimize RAG Response (`optimize_rag`)

<details>
<summary>View Response Example</summary>

```json
{
  "status": "finished",
  "run_id": "opt_1763222218",
  "elapsed_seconds": 0.937,
  "best_config": {
    "retriever": "faiss",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 200,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 0.8659,
    "latency": 0.0333
  },
  "results": [
    {
      "retriever": "faiss",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 200,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 0.8659,
      "latency": 0.0333
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}
```

</details>

- **status**: `"finished"` → Optimization process completed.
- **run_id**: Unique identifier for this optimization run.
- **elapsed_seconds**: How long the optimization took.
- **best_config**: Configuration that gave the best performance.
  - **retriever** → The retrieval algorithm used (faiss).
  - **embedding_model** → Embedding model applied.
  - **reranker** → Reranking strategy after retrieval.
  - **chunk_size** → Size of document chunks used in RAG.
  - **overlap** → Overlap between consecutive chunks.
  - **strategy** → RAG retrieval strategy.
  - **faithfulness** → Evaluation score (higher = better).
  - **latency** → Time per query in seconds.
- **results**: List of all tested configurations and their scores.
- **corpus_stats**: Statistics about the uploaded documents.
  - **num_docs** → Number of documents in corpus.
  - **avg_len** → Average document length.
  - **corpus_size** → Total size in characters or tokens.
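
Note that `best_config` uses singular keys (`chunk_size`, `reranker`), while the `OptimizeRequest` input uses list-valued plural keys (`chunk_sizes`, `rerankers`). The sketch below maps one onto the other so that a winning configuration can seed a follow-up run; `request_from_best` and `BEST_TO_REQUEST` are hypothetical names, and the mapping is inferred from the tables in this README.

```python
# Singular keys in best_config mapped to the list-valued request fields.
BEST_TO_REQUEST = {
    "retriever": "retriever",
    "embedding_model": "embedding_model",
    "strategy": "strategy",
    "chunk_size": "chunk_sizes",
    "overlap": "overlaps",
    "reranker": "rerankers",
}


def request_from_best(best_config: dict, **overrides) -> dict:
    """Build an OptimizeRequest-shaped dict pinned to best_config.

    Metric fields such as faithfulness and latency are dropped.
    """
    request = {
        target: [best_config[source]]
        for source, target in BEST_TO_REQUEST.items()
        if source in best_config
    }
    request.update(overrides)
    return request


# best_config copied from the example response.
best_config = {
    "retriever": "faiss",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 200,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 0.8659,
    "latency": 0.0333,
}
follow_up = request_from_best(
    best_config, docs_path="data/docs", metric="faithfulness"
)
```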


### 4. Autotune RAG Response (`autotune`)

<details>
<summary>View Response Example</summary>

```json
{
  "status": "finished",
  "run_id": "autotune_1763222228",
  "elapsed_seconds": 4.733,
  "recommendation": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "chunk_size": 100,
    "overlap": 30,
    "strategy": "fixed",
    "chunk_candidates": [[100, 30], [110, 30]]
  },
  "chunk_candidates": [[90, 50], [70, 50]],
  "best_config": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 70,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 1.0,
    "latency": 0.0272
  },
  "results": [
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 70,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0272
    },
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 90,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0186
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}
```

</details>

- **recommendation**: The tuned configuration suggested by the autotuner.
- **chunk_candidates**: List of possible chunk_size/overlap pairs analyzed.
- **best_config**: Best-performing configuration with metrics.
- **results**: All tested configurations and their performance.
- **corpus_stats**: Same as in the Optimize response.
- **status, run_id, elapsed_seconds**: Same meaning as in the Optimize response.

🧠 **Difference from Optimize**: Autotune proposes its own candidate configurations (see `recommendation` and `chunk_candidates`) and evaluates those, whereas Optimize tests the combinations you specify.
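
In the example response both entries in `results` reach a faithfulness of 1.0, and `best_config` reports the entry with `chunk_size` 70 even though the one with `chunk_size` 90 has lower latency. How the server breaks ties is not documented here; a client that cares about latency can re-rank `results` itself, as in this sketch (`rank_results` is a hypothetical helper).

```python
def rank_results(results: list, metric: str = "faithfulness") -> list:
    """Sort configurations by metric (descending), then latency (ascending)."""
    return sorted(results, key=lambda r: (-r[metric], r["latency"]))


# The two entries from the example response, trimmed to the relevant fields.
results = [
    {"chunk_size": 70, "overlap": 50, "faithfulness": 1.0, "latency": 0.0272},
    {"chunk_size": 90, "overlap": 50, "faithfulness": 1.0, "latency": 0.0186},
]
fastest_best = rank_results(results)[0]
print(fastest_best["chunk_size"])  # 90
```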


### 5. Generate QA Response (`generate_qa`)

<details>
<summary>View Response Example</summary>

```json
{
  "status": "finished",
  "output_path": "data/docs/validation_qa.json",
  "preview_count": 3,
  "sample": [
    {
      "query": "What capability does Artificial Intelligence provide to machines?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    },
    {
      "query": "What is the primary source of learning for machines with Artificial Intelligence?",
      "expected_answer": "Machines with Artificial Intelligence learn from data."
    },
    {
      "query": "How does Artificial Intelligence facilitate machine learning?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    }
  ]
}
```

</details>

- **output_path**: Where the generated QA JSON file is saved.
- **preview_count**: Number of QA pairs included in the response preview.
- **sample**: Example QA pairs:
  - **query** → The question generated from the document.
  - **expected_answer** → The reference answer corresponding to that question.
- **status**: `"finished"` → QA generation completed successfully.
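
Generated QA sets can contain empty or repeated items, so a quick sanity check before using them for evaluation may be worthwhile. The sketch below validates a list of QA pairs shaped like the `sample` array; `validate_qa_pairs` is a hypothetical helper, and it assumes the file at `output_path` holds a JSON list of the same objects.

```python
def validate_qa_pairs(pairs: list) -> list:
    """Return human-readable problems; an empty list means no issues found."""
    problems = []
    seen_queries = set()
    for index, pair in enumerate(pairs):
        if not isinstance(pair, dict):
            problems.append(f"item {index}: not an object")
            continue
        # Both fields must be non-empty strings.
        for key in ("query", "expected_answer"):
            value = pair.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{key}'")
        # Flag repeated questions (case-insensitive).
        query = str(pair.get("query") or "").strip().lower()
        if query and query in seen_queries:
            problems.append(f"item {index}: duplicate query")
        seen_queries.add(query)
    return problems
```

To check a generated file, load it with `json.load` and pass the resulting list to `validate_qa_pairs`.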


### 6. Clear Cache Response (`clear_cache`)

<details>
<summary>View Response Example</summary>

```json
{
  "status": "ok",
  "deleted_files": 7,
  "docs_path": "data/docs"
}
```

</details>

- **status**: `"ok"` → Indicates that the workspace was reset successfully.
- **deleted_files**: Number of documents removed.
- **docs_path**: The directory that was cleared.

---

## 📘 License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

---

<p align="center">
  <sub>Built with ❤️ by <a href="https://andyolivers.com">André Oliveira</a> | Apache 2.0 License</sub>
</p>