---
title: Ragmint MCP Server
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server for Ragmint with RAG pipeline optimization
tags:
- building-mcp-track-enterprise
- mcp
- rag
- llm
- gradio
- bayesian-optimization
- embeddings
- vector-search
- gemini
- retrievers
- python-library
---
# Ragmint MCP Server
<p align="center">
<img src="https://raw.githubusercontent.com/andyolivers/ragmint/main/src/ragmint/assets/img/ragmint-banner70.png" height="70px" alt="Ragmint Banner">
</p>
Gradio-based MCP server for Ragmint, enabling **Retrieval-Augmented Generation (RAG) pipeline optimization and tuning** via an MCP interface.
![Python](https://img.shields.io/badge/python-3.9%2B-blue) ![License](https://img.shields.io/badge/license-Apache%202.0-green) ![Status](https://img.shields.io/badge/Status-Active-success) ![MCP](https://img.shields.io/badge/MCP-enabled-brightgreen) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Post-blue)](https://www.linkedin.com/posts/andyolivers_ragmint-mcp-server-a-hugging-face-space-activity-7399028674261348352-P5wy?utm_source=share&utm_medium=member_desktop&rcm=ACoAABanwk4Bp0A-FVwO9wyzwVp0g_yqZoRDptI)
---
## 🧩 Overview
Ragmint MCP Server exposes the full power of **Ragmint**, a modular Python library for **evaluating, optimizing, and tuning RAG pipelines**, through the **Model Context Protocol (MCP)**. This allows external clients (such as Claude Desktop or Cursor) to **run experiments and tune RAG parameters programmatically**.
## Ragmint
[Ragmint](https://github.com/andyolivers/ragmint) (Retrieval-Augmented Generation Model Inspection & Tuning) is a **modular Python library** for **evaluating, optimizing, and tuning RAG pipelines**. It’s designed for developers and researchers who want automated hyperparameter optimization, retriever selection, embedding tuning, explainability, and reproducible experiment tracking.
![Python](https://img.shields.io/badge/python-3.9%2B-blue)
![License](https://img.shields.io/badge/license-Apache%202.0-green)
[![PyPI](https://img.shields.io/pypi/v/ragmint?color=blue)](https://pypi.org/project/ragmint/)
[![HF Space](https://img.shields.io/badge/HF-Space-blue)](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server)
![MCP](https://img.shields.io/badge/MCP-Enabled-green)
![Status](https://img.shields.io/badge/Status-Beta-orange)
![Optuna](https://img.shields.io/badge/Optuna-Bayesian%20Optimization-6f42c1?logo=optuna&logoColor=white)
![Google Gemini 2.5](https://img.shields.io/badge/Google%20Gemini-LLM-lightblue?logo=google&logoColor=white)
### Features exposed via MCP:
* ✅ Automated hyperparameter optimization (Grid, Random, Bayesian via Optuna).
* 🤖 Auto-RAG Tuner for dynamic retriever–embedding recommendations.
* 🧮 Validation QA generation for corpora without labeled data.
* 📦 Chunking, embeddings, retrievers, rerankers configuration.
* ⚙️ Full programmatic control of the RAG pipeline.
---
## 🚀 Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Running the MCP Server
```bash
python app.py
```
The server exposes MCP-compatible endpoints, allowing clients to:
* Run optimization experiments.
* Autotune pipelines automatically.
* Generate validation QA sets with an LLM.
### Environment Variables
Set API keys for LLMs used in explainability and QA generation:
```bash
export GOOGLE_API_KEY="your_gemini_key"
```
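Before launching, you can verify that the key is visible to the Python process with a quick check like the one below (a convenience snippet, not part of the server code):
```python
import os

# QA generation and explainability rely on a Gemini API key.
if not os.getenv("GOOGLE_API_KEY"):
    raise SystemExit("GOOGLE_API_KEY is not set - export it before running app.py")
print("GOOGLE_API_KEY found.")
```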
---
## 🧠 MCP Usage
Ragmint MCP Server provides Python-callable interfaces for programmatic control. You can find an example of MCP usage in the [Ragmint MCP Server Space](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server) on Hugging Face.
---
## 🔤 Supported Embeddings
* `sentence-transformers/all-MiniLM-L6-v2`
* `sentence-transformers/all-mpnet-base-v2`
* `BAAI/bge-base-en-v1.5`
* `intfloat/multilingual-e5-base`
### Configuration Example
```yaml
embedding_model: sentence-transformers/all-MiniLM-L6-v2
```
---
## 🔍 Supported Retrievers
| Retriever | Description |
|--------------|------------------------------------------------------------------|
| FAISS | Fast vector similarity search and indexing. |
| Chroma | Persistent vector database with embeddings. |
| BM25 | Classical lexical search based on term relevance (TF-IDF-style). |
| NumPy | Brute-force similarity search using raw vectors and matrix ops. |
### Configuration Example
```yaml
retriever: faiss
```
---
## 🧮 Dataset Options
| Mode | Example | Description |
|----------------------|------------------------------------|------------------------------------|
| Default | validation_set=None | Uses built-in validation_qa.json. |
| Custom File | validation_set="data/my_eval.json" | Your QA dataset. |
| Hugging Face Dataset | validation_set="squad" | Downloads benchmark dataset. |
| Generate | validation_set="generate" | Generates the QA dataset with LLM. |
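For illustration, these are the values behind the four modes (the table above uses `validation_set`; in the MCP input models below, the corresponding field appears as `validation_choice`):
```python
# Illustrative validation-set choices (see the table above):
default_mode = None                   # built-in validation_qa.json
custom_file  = "data/my_eval.json"    # your own QA dataset
hf_dataset   = "squad"                # Hugging Face benchmark dataset
generated    = "generate"             # synthesize a QA set with the LLM
```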
---
## 🧩 Folder Structure
```
ragmint_mcp_server/
├── app.py       # Gradio UI + MCP server entrypoint
├── models.py    # Request models (OptimizeRequest, AutotuneRequest, QARequest)
└── api.py       # FastAPI backend launched in a background thread by app.py
```
---
## 🔧 MCP Tools (app.py)
The `app.py` file provides the Gradio UI and also registers the functions exposed as **MCP Tools**, enabling external MCP clients (Claude Desktop, Cursor, VS Code MCP extension, etc.) to call Ragmint programmatically.
`app.py` launches the FastAPI backend (`api.py`) in a background thread and exposes the following MCP tools:
| MCP Tool | Python Function | Description |
|-----------|------------------------|------------------------------------------------------------------------------------|
| upload_docs | upload_docs_tool() | Uploads `.txt` files or remote URLs into the configured `docs_path`. |
| upload_urls | upload_urls_tool() | Downloads remote files from external URLs and stores them inside `docs_path`. |
| optimize_rag | optimize_rag_tool() | Runs explicit hyperparameter optimization for a RAG pipeline. |
| autotune | autotune_tool() | Automatically recommends best chunking + embedding configuration. |
| generate_qa | generate_qa_tool() | Generates synthetic QA validation dataset for evaluation. |
| clear_cache | clear_cache_tool() | Deletes all docs inside `data/docs` to reset the workspace. |
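For orientation, the underlying pattern is the standard Gradio MCP setup: Python functions are wrapped as Gradio endpoints and the app is launched with `mcp_server=True`, which exposes them as MCP tools. The sketch below illustrates that pattern with a stubbed tool; it is not the actual contents of `app.py`.
```python
import gradio as gr

def clear_cache_tool(docs_path: str = "data/docs") -> dict:
    """Delete all documents inside docs_path to reset the workspace."""
    # Stub body for illustration - the real logic lives in the Ragmint MCP server.
    return {"status": "ok", "deleted_files": 0, "docs_path": docs_path}

demo = gr.Interface(fn=clear_cache_tool, inputs="text", outputs="json")

# mcp_server=True (Gradio 5.x, gradio[mcp] extra) exposes the function
# as an MCP tool in addition to the web UI.
demo.launch(mcp_server=True)
```
MCP clients such as Claude Desktop or Cursor then connect to the server's MCP endpoint; see the Space page for the exact connection settings.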
---
## 🎬 Demo
YouTube: https://www.youtube.com/watch?v=DKtHBI3jYgQ
---
## 📥 Inputs
The Ragmint MCP Server exposes six MCP tools with the following inputs:
### 1. Upload Documents (`upload_docs`)
Input: `.txt` files or file-like objects to upload to the documents directory (`docs_path`).
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| files | File[] | Local `.txt` files selected or passed from MCP client | ["sample.txt"] |
| docs_path | str | Directory where files are stored | data/docs |
</details>
### 2. Upload URLs (`upload_urls`)
Input: List of URLs referencing `.txt` files to download and store in `docs_path`.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| urls | List[str] | List of URLs pointing to remote documents | ["https://example.com/doc.txt"] |
| docs_path | str | Directory where downloaded files are saved | data/docs |
</details>
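As a concrete illustration, the arguments an MCP client would pass to `upload_urls` look like this (values are examples only):
```python
upload_urls_args = {
    "urls": ["https://example.com/doc.txt"],  # remote .txt documents to fetch
    "docs_path": "data/docs",                 # where the downloaded files are stored
}
```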
### 3. Optimize RAG (`optimize_rag`)
Input: JSON object following the `OptimizeRequest` model.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| retriever | List[str] | Retriever type | ["faiss"] |
| embedding_model | List[str] | Embedding model name or path | ["sentence-transformers/all-MiniLM-L6-v2"] |
| strategy | List[str] | RAG strategy | ["fixed"] |
| chunk_sizes | List[int] | Chunk sizes to evaluate | [200] |
| overlaps | List[int] | Overlap values to test | [50] |
| rerankers | List[str] | Rerankers to apply after retrieval | ["mmr"] |
| search_type | str | Parameter search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| validation_choice | str | Validation data source (generate, local JSON path, HF dataset ID, etc.) | "generate" |
| llm_model | str | LLM used to generate QA dataset when validation_choice=generate | "gemini-2.5-flash-lite" |
</details>
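Put together, a minimal grid-search request built from the fields above looks like this (illustrative values; pass it as the tool arguments from your MCP client or as the JSON body of the request):
```python
optimize_request = {
    "docs_path": "data/docs",
    "retriever": ["faiss"],
    "embedding_model": ["sentence-transformers/all-MiniLM-L6-v2"],
    "strategy": ["fixed"],
    "chunk_sizes": [200],
    "overlaps": [50],
    "rerankers": ["mmr"],
    "search_type": "grid",            # or "random" / "bayesian"
    "trials": 2,
    "metric": "faithfulness",
    "validation_choice": "generate",  # synthesize a QA set with the LLM below
    "llm_model": "gemini-2.5-flash-lite",
}
```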
### 4. Autotune RAG (`autotune`)
Input: JSON object following the `AutotuneRequest` model.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| embedding_model | str | Embedding model name or path | "sentence-transformers/all-MiniLM-L6-v2" |
| num_chunk_pairs | int | Number of chunk pairs to analyze for tuning | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| search_type | str | Search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| validation_choice | str | Validation data source (generate, local JSON, HF dataset) | "generate" |
| llm_model | str | LLM used for generating QA dataset | "gemini-2.5-flash-lite" |
</details>
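Compared with `optimize_rag`, an `autotune` request is flatter (single values rather than lists to sweep); illustrative values below:
```python
autotune_request = {
    "docs_path": "data/docs",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "num_chunk_pairs": 2,             # chunk_size/overlap pairs to analyze
    "metric": "faithfulness",
    "search_type": "grid",
    "trials": 2,
    "validation_choice": "generate",
    "llm_model": "gemini-2.5-flash-lite",
}
```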
### 5. Generate QA (`generate_qa`)
Input: JSON object following the `QARequest` model.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents for QA generation | data/docs |
| llm_model | str | LLM used for question generation | "gemini-2.5-flash-lite" |
| batch_size | int | Number of documents processed per batch | 5 |
| min_q | int | Minimum number of questions per document | 3 |
| max_q | int | Maximum number of questions per document | 25 |
</details>
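And a `generate_qa` request, again with illustrative values:
```python
qa_request = {
    "docs_path": "data/docs",
    "llm_model": "gemini-2.5-flash-lite",
    "batch_size": 5,   # documents processed per batch
    "min_q": 3,        # minimum questions per document
    "max_q": 25,       # maximum questions per document
}
```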
### 6. Clear Cache (`clear_cache`)
Deletes all stored documents from `data/docs`.
<details>
<summary>View Input Model</summary>
| Field | Type | Description | Example |
|--------|-------|-------------|---------|
| docs_path | str | Folder to wipe clean | data/docs |
</details>
---
## 📤 Outputs
Each of the six MCP tools returns a JSON response; example outputs are shown below:
### 1. Upload Documents Response (`upload_docs`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "ok",
"uploaded_files": ["sample.txt"],
"docs_path": "data/docs"
}
```
</details>
- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.
✅ Confirms your documents are ready for RAG operations.
### 2. Upload URLs Response (`upload_urls`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "ok",
"uploaded_files": ["doc.txt"],
"docs_path": "data/docs"
}
```
</details>
- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.
✅ Confirms your documents are ready for RAG operations.
### 3. Optimize RAG Response (`optimize_rag`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "finished",
"run_id": "opt_1763222218",
"elapsed_seconds": 0.937,
"best_config": {
"retriever": "faiss",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 200,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 0.8659,
"latency": 0.0333
},
"results": [
{
"retriever": "faiss",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 200,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 0.8659,
"latency": 0.0333
}
],
"corpus_stats": {
"num_docs": 1,
"avg_len": 8.0,
"corpus_size": 61
}
}
```
</details>
- **status**: `"finished"` → Optimization process completed.
- **run_id**: Unique identifier for this optimization run.
- **elapsed_seconds**: How long the optimization took.
- **best_config**: Configuration that gave the best performance.
- **retriever** → The retrieval algorithm used (faiss).
- **embedding_model** → Embedding model applied.
- **reranker** → Reranking strategy after retrieval.
- **chunk_size** → Size of document chunks used in RAG.
- **overlap** → Overlap between consecutive chunks.
- **strategy** → RAG retrieval strategy.
- **faithfulness** → Evaluation score (higher = better).
- **latency** → Time per query in seconds.
- **results**: List of all tested configurations and their scores.
- **corpus_stats**: Statistics about the uploaded documents.
- **num_docs** → Number of documents in corpus.
- **avg_len** → Average document length.
- **corpus_size** → Total size in characters or tokens.
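A small example of consuming this response in Python (the literal below is a trimmed copy of the JSON above, just to keep the snippet self-contained):
```python
import json

# Trimmed copy of an optimize_rag response for illustration.
raw = """{
  "status": "finished",
  "best_config": {"retriever": "faiss", "chunk_size": 200, "overlap": 50, "faithfulness": 0.8659},
  "results": [{"retriever": "faiss", "faithfulness": 0.8659}]
}"""
response = json.loads(raw)

best = response["best_config"]
print(f"Best retriever: {best['retriever']}")
print(f"Best chunking:  size={best['chunk_size']}, overlap={best['overlap']}")
print(f"Faithfulness:   {best['faithfulness']:.4f} across {len(response['results'])} tested configs")
```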
### 4. Autotune RAG Response (`autotune`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "finished",
"run_id": "autotune_1763222228",
"elapsed_seconds": 4.733,
"recommendation": {
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"chunk_size": 100,
"overlap": 30,
"strategy": "fixed",
"chunk_candidates": [[100, 30], [110, 30]]
},
"chunk_candidates": [[90, 50], [70, 50]],
"best_config": {
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 70,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 1.0,
"latency": 0.0272
},
"results": [
{
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 70,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 1.0,
"latency": 0.0272
},
{
"retriever": "BM25",
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"reranker": "mmr",
"chunk_size": 90,
"overlap": 50,
"strategy": "fixed",
"faithfulness": 1.0,
"latency": 0.0186
}
],
"corpus_stats": {
"num_docs": 1,
"avg_len": 8.0,
"corpus_size": 61
}
}
```
</details>
- **recommendation**: The tuned configuration suggested by the autotuner.
- **chunk_candidates**: List of possible chunk_size/overlap pairs analyzed.
- **best_config**: Best-performing configuration with metrics.
- **results**: All tested configurations and their performance.
- **corpus_stats**: Same as in optimize response.
- **status, run_id, elapsed_seconds**: Same meaning as Optimize endpoint.
🧠 **Difference from Optimize**: Autotune automatically selects the best hyperparameters, rather than testing all user-specified combinations.
### 5. Generate QA Response (`generate_qa`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "finished",
"output_path": "data/docs/validation_qa.json",
"preview_count": 3,
"sample": [
{
"query": "What capability does Artificial Intelligence provide to machines?",
"expected_answer": "Artificial Intelligence enables machines to learn from data."
},
{
"query": "What is the primary source of learning for machines with Artificial Intelligence?",
"expected_answer": "Machines with Artificial Intelligence learn from data."
},
{
"query": "How does Artificial Intelligence facilitate machine learning?",
"expected_answer": "Artificial Intelligence enables machines to learn from data."
}
]
}
```
</details>
- **output_path**: Where the generated QA JSON file is saved.
- **preview_count**: Number of QA pairs included in the response preview.
- **sample**: Example QA pairs:
- **query** → The question generated from the document.
- **expected_answer** → The reference answer corresponding to that question.
- **status**: `"finished"` → QA generation completed successfully.
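To reuse the generated file in your own evaluation code, it can be read back like any JSON file; this snippet assumes the file stores the same list of `query` / `expected_answer` objects shown in the sample above:
```python
import json
from pathlib import Path

qa_path = Path("data/docs/validation_qa.json")  # the output_path reported above
qa_pairs = json.loads(qa_path.read_text(encoding="utf-8"))

for pair in qa_pairs[:3]:
    print(pair["query"], "->", pair["expected_answer"])
```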
### 6. Clear Cache Response (`clear_cache`)
<details>
<summary>View Response Example</summary>
```json
{
"status": "ok",
"deleted_files": 7,
"docs_path": "data/docs"
}
```
</details>
- **deleted_files**: Number of documents removed.
- **status**: `"ok"` → Indicates successful workspace reset.
---
## 📘 License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.
---
<p align="center">
<sub>Built with ❤️ by <a href="https://andyolivers.com">André Oliveira</a> | Apache 2.0 License</sub>
</p>