---
title: Ragmint MCP Server
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server for Ragmint with RAG pipeline optimization
tags:
  - building-mcp-track-enterprise
  - mcp
  - rag
  - llm
  - gradio
  - bayesian-optimization
  - embeddings
  - vector-search
  - gemini
  - retrievers
  - python-library
---

# Ragmint MCP Server


Gradio-based MCP server for Ragmint, enabling **Retrieval-Augmented Generation (RAG) pipeline optimization and tuning** via an MCP interface.

![Python](https://img.shields.io/badge/python-3.9%2B-blue) ![License](https://img.shields.io/badge/license-Apache%202.0-green) ![Status](https://img.shields.io/badge/Status-Active-success) ![MCP](https://img.shields.io/badge/MCP-enabled-brightgreen) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Post-blue)](https://www.linkedin.com/posts/andyolivers_ragmint-mcp-server-a-hugging-face-space-activity-7399028674261348352-P5wy?utm_source=share&utm_medium=member_desktop&rcm=ACoAABanwk4Bp0A-FVwO9wyzwVp0g_yqZoRDptI)

---

## 🧩 Overview

Ragmint MCP Server exposes **Ragmint**, a modular Python library for **evaluating, optimizing, and tuning RAG pipelines**, through the **Model Context Protocol (MCP)**. This allows external clients (like Claude Desktop or Cursor) to **run experiments and tune RAG parameters programmatically**.

## Ragmint

[Ragmint](https://github.com/andyolivers/ragmint) (Retrieval-Augmented Generation Model Inspection & Tuning) is a **modular Python library** for **evaluating, optimizing, and tuning RAG pipelines**. It's designed for developers and researchers who want automated hyperparameter optimization, retriever selection, embedding tuning, explainability, and reproducible experiment tracking.

![Python](https://img.shields.io/badge/python-3.9%2B-blue) ![License](https://img.shields.io/badge/license-Apache%202.0-green) [![PyPI](https://img.shields.io/pypi/v/ragmint?color=blue)](https://pypi.org/project/ragmint/) [![HF Space](https://img.shields.io/badge/HF-Space-blue)](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server) ![MCP](https://img.shields.io/badge/MCP-Enabled-green) ![Status](https://img.shields.io/badge/Status-Beta-orange) ![Optuna](https://img.shields.io/badge/Optuna-Bayesian%20Optimization-6f42c1?logo=optuna&logoColor=white) ![Google Gemini 2.5](https://img.shields.io/badge/Google%20Gemini-LLM-lightblue?logo=google&logoColor=white)

### Features exposed via MCP:

* ✅ Automated hyperparameter optimization (Grid, Random, Bayesian via Optuna).
* 🤖 Auto-RAG Tuner for dynamic retriever–embedding recommendations.
* 🧮 Validation QA generation for corpora without labeled data.
* 📦 Configuration of chunking, embeddings, retrievers, and rerankers.
* ⚙️ Full programmatic control of the RAG pipeline.

---

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Running the MCP Server

```bash
python app.py
```

The server exposes MCP-compatible endpoints, allowing clients to:

* Run optimization experiments.
* Autotune pipelines.
* Generate validation QA sets with an LLM.

### Environment Variables

Set API keys for the LLMs used in explainability and QA generation:

```bash
export GOOGLE_API_KEY="your_gemini_key"
```

---

## 🧠 MCP Usage

Ragmint MCP Server provides Python-callable interfaces for programmatic control. You can find an example of MCP usage in the [Ragmint MCP Server Space](https://huggingface.co/spaces/andyolivers/ragmint-mcp-server) on Hugging Face.

---

## 🔤 Supported Embeddings

* `sentence-transformers/all-MiniLM-L6-v2`
* `sentence-transformers/all-mpnet-base-v2`
* `BAAI/bge-base-en-v1.5`
* `intfloat/multilingual-e5-base`

### Configuration Example

```yaml
embedding_model: sentence-transformers/all-MiniLM-L6-v2
```

---

## 🔍 Supported Retrievers

| Retriever | Description |
|-----------|-------------|
| FAISS | Fast vector similarity search and indexing. |
| Chroma | Persistent vector database with embeddings. |
| bm25 | Classical lexical search based on term relevance (TF-IDF-style). |
| numpy | Brute-force similarity search using raw vectors and matrix ops. |

### Configuration Example

```yaml
retriever: faiss
```

---

## 🧮 Dataset Options

| Mode | Example | Description |
|------|---------|-------------|
| Default | `validation_set=None` | Uses the built-in `validation_qa.json`. |
| Custom File | `validation_set="data/my_eval.json"` | Your QA dataset. |
| Hugging Face Dataset | `validation_set="squad"` | Downloads a benchmark dataset. |
| Generate | `validation_set="generate"` | Generates the QA dataset with an LLM. |

---

## 🧩 Folder Structure

```
ragmint_mcp_server/
├── app.py     # MCP server entrypoint
├── models.py
└── api.py
```

---

## 🔧 MCP Tools (app.py)

The `app.py` file provides the Gradio UI and also registers the functions exposed as **MCP Tools**, enabling external MCP clients (Claude Desktop, Cursor, VS Code MCP extension, etc.) to call Ragmint programmatically.

`app.py` launches the FastAPI backend (`api.py`) in a background thread and exposes the following MCP tools:

| MCP Tool | Python Function | Description |
|----------|-----------------|-------------|
| upload_docs | upload_docs_tool() | Uploads `.txt` files or remote URLs into the configured `docs_path`. |
| upload_urls | upload_urls_tool() | Downloads remote files from external URLs and stores them inside `docs_path`. |
| optimize_rag | optimize_rag_tool() | Runs explicit hyperparameter optimization for a RAG pipeline. |
| autotune | autotune_tool() | Automatically recommends the best chunking + embedding configuration. |
| generate_qa | generate_qa_tool() | Generates a synthetic QA validation dataset for evaluation. |
| clear_cache | clear_cache_tool() | Deletes all docs inside `data/docs` to reset the workspace. |

---

## 🎬 Demo

YouTube: https://www.youtube.com/watch?v=DKtHBI3jYgQ

---

## 📥 Inputs

The Ragmint MCP Server exposes six endpoints with the following inputs:

### 1. Upload Documents (`upload_docs`)

Input: `.txt` files or file-like objects to upload to the documents directory (`docs_path`).

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| files | File[] | Local `.txt` files selected or passed from MCP client | ["sample.txt"] |
| docs_path | str | Directory where files are stored | data/docs |

### 2. Upload URLs (`upload_urls`)

Input: List of URLs referencing `.txt` files to download and store in `docs_path`.

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| urls | List[str] | List of URLs pointing to remote documents | ["https://example.com/doc.txt"] |
| docs_path | str | Directory where downloaded files are saved | data/docs |
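
The `upload_urls` response example later in this document reports `doc.txt` for `https://example.com/doc.txt`, which suggests the stored name comes from the last path segment of the URL. The sketch below shows one way to derive such a name with the standard library. The helper `filename_from_url` and its `default` fallback are hypothetical and not part of Ragmint; the server's actual naming rule is not documented here.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "download.txt") -> str:
    """Return the last path segment of a URL, or `default` when the path has none.

    Hypothetical helper for illustration; not Ragmint's implementation.
    """
    path = unquote(urlparse(url).path)  # query string and fragment are ignored
    name = posixpath.basename(path)
    return name or default


# Payload shaped like the upload_urls input model above.
payload = {"urls": ["https://example.com/doc.txt"], "docs_path": "data/docs"}
print([filename_from_url(u) for u in payload["urls"]])  # ['doc.txt']
```

A real downloader would also need to reject unsafe names (for example `..`) before writing into `docs_path`; that check is left out to keep the sketch short.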

### 3. Optimize RAG (`optimize_rag`)

Input: JSON object following the `OptimizeRequest` model.

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| retriever | List[str] | Retriever type | ["faiss"] |
| embedding_model | List[str] | Embedding model name or path | ["sentence-transformers/all-MiniLM-L6-v2"] |
| strategy | List[str] | RAG strategy | ["fixed"] |
| chunk_sizes | List[int] | Chunk sizes to evaluate | [200] |
| overlaps | List[int] | Overlap values to test | [50] |
| rerankers | List[str] | Rerankers to apply after retrieval | ["mmr"] |
| search_type | str | Parameter search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| validation_choice | str | Validation data source (generate, local JSON path, HF dataset ID, etc.) | "generate" |
| llm_model | str | LLM used to generate QA dataset when validation_choice=generate | "gemini-2.5-flash-lite" |
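
Because most `OptimizeRequest` fields are lists, the size of the search space grows as the product of their lengths. The sketch below builds a request dictionary from the fields in the table and enumerates the combinations a full grid would contain. The `expand_grid` helper is hypothetical, and how the server itself enumerates candidates or applies `trials` is an assumption, not something this README specifies.

```python
from itertools import product

# Request shaped like the OptimizeRequest table above (values are examples).
request = {
    "docs_path": "data/docs",
    "retriever": ["faiss", "bm25"],
    "embedding_model": ["sentence-transformers/all-MiniLM-L6-v2"],
    "strategy": ["fixed"],
    "chunk_sizes": [200, 400],
    "overlaps": [50],
    "rerankers": ["mmr"],
    "search_type": "grid",
    "trials": 2,
    "metric": "faithfulness",
    "validation_choice": "generate",
    "llm_model": "gemini-2.5-flash-lite",
}

# The list-valued fields that span the search space.
LIST_FIELDS = ["retriever", "embedding_model", "strategy",
               "chunk_sizes", "overlaps", "rerankers"]


def expand_grid(req: dict) -> list:
    """Enumerate every combination of the list-valued fields (hypothetical helper)."""
    values = (req[key] for key in LIST_FIELDS)
    return [dict(zip(LIST_FIELDS, combo)) for combo in product(*values)]


grid = expand_grid(request)
print(len(grid))  # 2 * 1 * 1 * 2 * 1 * 1 = 4
```

Counting combinations before submitting a request is a cheap way to judge whether `grid` is affordable or whether `random` or `bayesian` with a bounded `trials` value is the better choice.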

### 4. Autotune RAG (`autotune`)

Input: JSON object following the `AutotuneRequest` model.

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents | data/docs |
| embedding_model | str | Embedding model name or path | "sentence-transformers/all-MiniLM-L6-v2" |
| num_chunk_pairs | int | Number of chunk pairs to analyze for tuning | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| search_type | str | Search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| validation_choice | str | Validation data source (generate, local JSON, HF dataset) | "generate" |
| llm_model | str | LLM used for generating QA dataset | "gemini-2.5-flash-lite" |

### 5. Generate QA (`generate_qa`)

Input: JSON object following the `QARequest` model.

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder containing documents for QA generation | data/docs |
| llm_model | str | LLM used for question generation | "gemini-2.5-flash-lite" |
| batch_size | int | Number of documents processed per batch | 5 |
| min_q | int | Minimum number of questions per document | 3 |
| max_q | int | Maximum number of questions per document | 25 |
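
The numeric `QARequest` fields interact: `batch_size` controls how documents are grouped, and `min_q` must not exceed `max_q`. The sketch below checks those constraints on the client side and shows how 12 documents would split into batches of 5. Both helpers are hypothetical, and the exact bounds the server enforces (for example whether `min_q` may be 0) are assumptions.

```python
from typing import Iterator, List, Sequence


def check_qa_request(req: dict) -> None:
    """Raise ValueError when the numeric fields are inconsistent (assumed rules)."""
    if req["batch_size"] < 1:
        raise ValueError("batch_size must be at least 1")
    if not 0 <= req["min_q"] <= req["max_q"]:
        raise ValueError("expected 0 <= min_q <= max_q")


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Request shaped like the QARequest table above.
qa_request = {
    "docs_path": "data/docs",
    "llm_model": "gemini-2.5-flash-lite",
    "batch_size": 5,
    "min_q": 3,
    "max_q": 25,
}
check_qa_request(qa_request)

docs = [f"doc_{i}.txt" for i in range(12)]  # placeholder file names
print([len(b) for b in batches(docs, qa_request["batch_size"])])  # [5, 5, 2]
```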

### 6. Clear Cache (`clear_cache`)

Deletes all stored documents from `data/docs`.

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| docs_path | str | Folder to wipe clean | data/docs |
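
To make the effect of `clear_cache` concrete, the sketch below removes the files in a directory and returns a dictionary with the same keys as the `clear_cache` response documented under Outputs. It runs against a temporary directory so it is safe to try. `clear_docs` is a hypothetical stand-in, not the server's code; whether the real tool also removes subdirectories is not stated in this README, so the sketch only deletes regular files at the top level.

```python
import tempfile
from pathlib import Path


def clear_docs(docs_path: str) -> dict:
    """Delete regular files directly inside docs_path and report the count.

    Hypothetical stand-in for clear_cache; subdirectories are left untouched.
    """
    root = Path(docs_path)
    deleted = 0
    if root.is_dir():
        for entry in root.iterdir():
            if entry.is_file():
                entry.unlink()
                deleted += 1
    return {"status": "ok", "deleted_files": deleted, "docs_path": docs_path}


# Try it on a throwaway directory rather than on data/docs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        (Path(tmp) / name).write_text("example")
    result = clear_docs(tmp)
    print(result["deleted_files"])  # 2
```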

---

## 📤 Outputs

The Ragmint MCP Server exposes six endpoints with the following example outputs:

### 1. Upload Documents Response (`upload_docs`)

```json
{
  "status": "ok",
  "uploaded_files": ["sample.txt"],
  "docs_path": "data/docs"
}
```

- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.

### 2. Upload URLs Response (`upload_urls`)

```json
{
  "status": "ok",
  "uploaded_files": ["doc.txt"],
  "docs_path": "data/docs"
}
```

- **status**: `"ok"` → Indicates that the upload was successful.
- **uploaded_files**: List of file names that were successfully uploaded.
- **docs_path**: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.

### 3. Optimize RAG Response (`optimize_rag`)

```json
{
  "status": "finished",
  "run_id": "opt_1763222218",
  "elapsed_seconds": 0.937,
  "best_config": {
    "retriever": "faiss",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 200,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 0.8659,
    "latency": 0.0333
  },
  "results": [
    {
      "retriever": "faiss",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 200,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 0.8659,
      "latency": 0.0333
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}
```

- **status**: `"finished"` → Optimization process completed.
- **run_id**: Unique identifier for this optimization run.
- **elapsed_seconds**: How long the optimization took.
- **best_config**: Configuration that gave the best performance.
  - **retriever** → The retrieval algorithm used (faiss).
  - **embedding_model** → Embedding model applied.
  - **reranker** → Reranking strategy after retrieval.
  - **chunk_size** → Size of document chunks used in RAG.
  - **overlap** → Overlap between consecutive chunks.
  - **strategy** → RAG retrieval strategy.
  - **faithfulness** → Evaluation score (higher = better).
  - **latency** → Time per query in seconds.
- **results**: List of all tested configurations and their scores.
- **corpus_stats**: Statistics about the uploaded documents.
  - **num_docs** → Number of documents in corpus.
  - **avg_len** → Average document length.
  - **corpus_size** → Total size in characters or tokens.

### 4. Autotune RAG Response (`autotune`)

```json
{
  "status": "finished",
  "run_id": "autotune_1763222228",
  "elapsed_seconds": 4.733,
  "recommendation": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "chunk_size": 100,
    "overlap": 30,
    "strategy": "fixed",
    "chunk_candidates": [[100, 30], [110, 30]]
  },
  "chunk_candidates": [[90, 50], [70, 50]],
  "best_config": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 70,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 1.0,
    "latency": 0.0272
  },
  "results": [
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 70,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0272
    },
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 90,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0186
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}
```
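
Clients often want to re-rank `results` themselves, for instance to prefer lower latency when scores tie. The sketch below works on a trimmed copy of the autotune response above (only the fields needed for ranking are kept) and shows two selection rules. Both helper functions are hypothetical. In this example the first rule happens to match `best_config`, but how the server actually breaks ties is an assumption.

```python
import json

# Trimmed copy of the autotune response above; other fields omitted for brevity.
response = json.loads("""
{
  "best_config": {"chunk_size": 70, "overlap": 50, "faithfulness": 1.0, "latency": 0.0272},
  "results": [
    {"chunk_size": 70, "overlap": 50, "faithfulness": 1.0, "latency": 0.0272},
    {"chunk_size": 90, "overlap": 50, "faithfulness": 1.0, "latency": 0.0186}
  ]
}
""")


def pick_best(results: list, metric: str = "faithfulness") -> dict:
    """Return the first result with the highest metric value (max keeps the first tie)."""
    return max(results, key=lambda r: r[metric])


def pick_fastest_among_best(results: list, metric: str = "faithfulness") -> dict:
    """Among results tied for the top metric value, return the lowest-latency one."""
    top = max(r[metric] for r in results)
    tied = [r for r in results if r[metric] == top]
    return min(tied, key=lambda r: r["latency"])


best = pick_best(response["results"])
print(best["chunk_size"])  # 70
print(pick_fastest_among_best(response["results"])["chunk_size"])  # 90
```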

- **recommendation**: The tuned configuration suggested by the autotuner.
- **chunk_candidates**: List of possible chunk_size/overlap pairs analyzed.
- **best_config**: Best-performing configuration with metrics.
- **results**: All tested configurations and their performance.
- **corpus_stats**: Same as in optimize response.
- **status, run_id, elapsed_seconds**: Same meaning as Optimize endpoint.

🧠 **Difference from Optimize**: Autotune automatically selects the best hyperparameters, rather than testing all user-specified combinations.

### 5. Generate QA Response (`generate_qa`)

```json
{
  "status": "finished",
  "output_path": "data/docs/validation_qa.json",
  "preview_count": 3,
  "sample": [
    {
      "query": "What capability does Artificial Intelligence provide to machines?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    },
    {
      "query": "What is the primary source of learning for machines with Artificial Intelligence?",
      "expected_answer": "Machines with Artificial Intelligence learn from data."
    },
    {
      "query": "How does Artificial Intelligence facilitate machine learning?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    }
  ]
}
```
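
Before using a generated or hand-written QA set for evaluation, it can help to confirm that every entry carries the two keys shown in `sample`. The sketch below flags entries that lack a non-empty `query` or `expected_answer`. The `validate_qa_pairs` helper is hypothetical, and it assumes the QA data is a list of objects shaped like the `sample` entries; the on-disk layout of `validation_qa.json` is not specified in this README.

```python
REQUIRED_KEYS = ("query", "expected_answer")


def validate_qa_pairs(pairs: list) -> list:
    """Return the indices of entries missing a non-empty required string field."""
    bad = []
    for index, pair in enumerate(pairs):
        ok = isinstance(pair, dict) and all(
            isinstance(pair.get(key), str) and pair[key].strip()
            for key in REQUIRED_KEYS
        )
        if not ok:
            bad.append(index)
    return bad


sample = [
    {
        "query": "What capability does Artificial Intelligence provide to machines?",
        "expected_answer": "Artificial Intelligence enables machines to learn from data.",
    },
    {"query": "How does Artificial Intelligence facilitate machine learning?"},  # no answer
    {"query": "   ", "expected_answer": "Blank question."},  # whitespace-only query
]
print(validate_qa_pairs(sample))  # [1, 2]
```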

- **output_path**: Where the generated QA JSON file is saved.
- **preview_count**: Number of QA pairs included in the response preview.
- **sample**: Example QA pairs:
  - **query** → The question generated from the document.
  - **expected_answer** → The reference answer corresponding to that question.
- **status**: `"finished"` → QA generation completed successfully.

### 6. Clear Cache Response (`clear_cache`)

```json
{
  "status": "ok",
  "deleted_files": 7,
  "docs_path": "data/docs"
}
```

- **deleted_files**: Number of documents removed.
- **status**: `"ok"` indicates successful workspace reset.

---

## 📘 License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

---

Built with ❤️ by André Oliveira | Apache 2.0 License