metadata
title: Ragmint MCP Server
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server for Ragmint with RAG pipeline optimization
tags:
  - building-mcp-track-enterprise
  - mcp
  - rag
  - llm
  - gradio
  - bayesian-optimization
  - embeddings
  - vector-search
  - gemini
  - retrievers
  - python-library

Ragmint MCP Server


Gradio-based MCP server for Ragmint, enabling Retrieval-Augmented Generation (RAG) pipeline optimization and tuning via an MCP interface.



🧩 Overview

Ragmint MCP Server exposes Ragmint, a modular Python library for evaluating, optimizing, and tuning RAG pipelines, through the Model Context Protocol (MCP). External clients such as Claude Desktop or Cursor can then run experiments and tune RAG parameters programmatically.

Ragmint

Ragmint (Retrieval-Augmented Generation Model Inspection & Tuning) is a modular Python library for evaluating, optimizing, and tuning RAG pipelines. It's designed for developers and researchers who want automated hyperparameter optimization, retriever selection, embedding tuning, explainability, and reproducible experiment tracking.


Features exposed via MCP:

  • ✅ Automated hyperparameter optimization (Grid, Random, Bayesian via Optuna).
  • 🤖 Auto-RAG Tuner for dynamic retriever–embedding recommendations.
  • 🧮 Validation QA generation for corpora without labeled data.
  • 📦 Configuration of chunking, embeddings, retrievers, and rerankers.
  • ⚙️ Full programmatic control of the RAG pipeline.

🚀 Quick Start

Installation

pip install -r requirements.txt

Running the MCP Server

python app.py

The server will expose MCP-compatible endpoints, allowing clients to:

  • Run optimization experiments.
  • Autotune pipelines.
  • Generate validation QA sets with an LLM.

Environment Variables

Set API keys for LLMs used in explainability and QA generation:

export GOOGLE_API_KEY="your_gemini_key"
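
A startup check can surface a missing key before any QA generation is attempted. The sketch below is illustrative only: `require_api_key` is a hypothetical helper, not part of Ragmint or of this server, and it assumes nothing beyond the `GOOGLE_API_KEY` variable named above.

```python
import os


def require_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message.

    Hypothetical helper for illustration; the server may handle this differently.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the server")
    return key
```

Calling `require_api_key()` at the top of a launch script would stop early with a readable error instead of failing later inside an LLM call.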

🧠 MCP Usage

Ragmint MCP Server provides Python-callable interfaces for programmatic control. You can find an example of MCP usage in the Ragmint MCP Server Space on Hugging Face.
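
To connect an MCP client to a locally running server, many clients accept a JSON entry that names the server and gives its URL. The sketch below builds such an entry. It rests on assumptions not stated in this README: that the server listens on Gradio's default port 7860, that Gradio serves MCP at the `/gradio_api/mcp/sse` path, and that the client reads an `mcpServers` mapping. Check your client's documentation for the exact shape it expects.

```python
import json


def mcp_client_config(base_url: str = "http://127.0.0.1:7860",
                      name: str = "ragmint") -> dict:
    """Build an MCP client config entry for a Gradio-hosted MCP server.

    Assumptions: the "/gradio_api/mcp/sse" path and the "mcpServers" layout;
    neither is confirmed by this README.
    """
    url = base_url.rstrip("/") + "/gradio_api/mcp/sse"
    return {"mcpServers": {name: {"url": url}}}


# Print the entry so it can be pasted into a client's config file.
print(json.dumps(mcp_client_config(), indent=2))
```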


🔤 Supported Embeddings

  • sentence-transformers/all-MiniLM-L6-v2
  • sentence-transformers/all-mpnet-base-v2
  • BAAI/bge-base-en-v1.5
  • intfloat/multilingual-e5-base

Configuration Example

embedding_model: sentence-transformers/all-MiniLM-L6-v2

🔍 Supported Retrievers

| Retriever | Description |
| --- | --- |
| FAISS | Fast vector similarity search and indexing. |
| Chroma | Persistent vector database with embeddings. |
| bm25 | Classical lexical search based on term relevance (TF-IDF-style). |
| numpy | Brute-force similarity search using raw vectors and matrix ops. |

Configuration Example

retriever: faiss
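
To make the brute-force option concrete, the sketch below scores a query vector against every stored vector by cosine similarity and returns the top matches. It is written in plain Python so it runs without dependencies; it is not Ragmint's implementation, and `cosine` and `brute_force_search` are hypothetical names. A numpy version would replace the loops with one matrix product.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_search(query: Sequence[float],
                       vectors: List[Sequence[float]],
                       k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, score) for the k vectors most similar to the query."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]
```

Every query compares against the whole corpus, so cost grows linearly with the number of chunks. That is fine for small corpora and is the trade-off an index such as FAISS avoids.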

🧮 Dataset Options

| Mode | Example | Description |
| --- | --- | --- |
| Default | validation_set=None | Uses built-in validation_qa.json. |
| Custom File | validation_set="data/my_eval.json" | Uses your own QA dataset. |
| Hugging Face Dataset | validation_set="squad" | Downloads a benchmark dataset. |
| Generate | validation_set="generate" | Generates the QA dataset with an LLM. |
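
The four modes can be told apart from the value alone. The sketch below shows one plausible way to classify a `validation_set` value. The dispatch rule (in particular, treating any string ending in `.json` as a local file and everything else as a Hugging Face dataset ID) is an assumption for illustration, and `classify_validation_set` is a hypothetical name, not a Ragmint function.

```python
from typing import Optional


def classify_validation_set(validation_set: Optional[str]) -> str:
    """Map a validation_set value to one of the four modes in the table.

    Assumed rule, not confirmed by Ragmint: ".json" suffix means a local file,
    any other string is treated as a Hugging Face dataset ID.
    """
    if validation_set is None:
        return "default"
    if validation_set == "generate":
        return "generate"
    if validation_set.lower().endswith(".json"):
        return "custom_file"
    return "hf_dataset"
```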

🧩 Folder Structure

ragmint_mcp_server/
├── app.py  # MCP server entrypoint
├── models.py
└── api.py

🔧 MCP Tools (app.py)

The app.py file provides the Gradio UI and also registers the functions exposed as MCP Tools, enabling external MCP clients (Claude Desktop, Cursor, VS Code MCP extension, etc.) to call Ragmint programmatically.

app.py launches the FastAPI backend (api.py) in a background thread and exposes the following MCP tools:

| MCP Tool | Python Function | Description |
| --- | --- | --- |
| upload_docs | upload_docs_tool() | Uploads .txt files or remote URLs into the configured docs_path. |
| upload_urls | upload_urls_tool() | Downloads remote files from external URLs and stores them inside docs_path. |
| optimize_rag | optimize_rag_tool() | Runs explicit hyperparameter optimization for a RAG pipeline. |
| autotune | autotune_tool() | Automatically recommends best chunking + embedding configuration. |
| generate_qa | generate_qa_tool() | Generates synthetic QA validation dataset for evaluation. |
| clear_cache | clear_cache_tool() | Deletes all docs inside data/docs to reset the workspace. |

🎬 Demo

YouTube: https://www.youtube.com/watch?v=DKtHBI3jYgQ


📥 Inputs

The Ragmint MCP Server exposes six MCP tools with the following inputs:

1. Upload Documents (upload_docs)

Input: .txt files or file-like objects to upload to the documents directory (docs_path).

View Input Model
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| files | File[] | Local .txt files selected or passed from MCP client | ["sample.txt"] |
| docs_path | str | Directory where files are stored | data/docs |

2. Upload URLs (upload_urls)

Input: List of URLs referencing .txt files to download and store in docs_path.

View Input Model
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| urls | List[str] | List of URLs pointing to remote documents | ["https://example.com/doc.txt"] |
| docs_path | str | Directory where downloaded files are saved | data/docs |
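
Each URL has to be mapped to a file name inside docs_path before it can be stored. The sketch below shows one way to derive that target path from the URL without downloading anything. It is an assumption about how such a mapping could work, not the server's actual logic; `target_path` is a hypothetical helper, and rejecting non-.txt names is a choice made here for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def target_path(url: str, docs_path: str = "data/docs") -> str:
    """Derive the local path a remote .txt document would be saved to.

    Hypothetical helper: uses the last segment of the URL path as the file
    name and ignores any query string.
    """
    name = PurePosixPath(unquote(urlparse(url).path)).name
    if not name.lower().endswith(".txt"):
        raise ValueError(f"URL does not point to a .txt file: {url}")
    return f"{docs_path.rstrip('/')}/{name}"
```

With the example input from the table, `target_path("https://example.com/doc.txt")` yields `data/docs/doc.txt`, which matches the `doc.txt` entry reported in the upload response.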

3. Optimize RAG (optimize_rag)

Input: JSON object following the OptimizeRequest model.

View Input Model
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| docs_path | str | Folder containing documents | data/docs |
| retriever | List[str] | Retriever type | ["faiss"] |
| embedding_model | List[str] | Embedding model name or path | ["sentence-transformers/all-MiniLM-L6-v2"] |
| strategy | List[str] | RAG strategy | ["fixed"] |
| chunk_sizes | List[int] | Chunk sizes to evaluate | [200] |
| overlaps | List[int] | Overlap values to test | [50] |
| rerankers | List[str] | Rerankers to apply after retrieval | ["mmr"] |
| search_type | str | Parameter search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| validation_choice | str | Validation data source (generate, local JSON path, HF dataset ID, etc.) | "generate" |
| llm_model | str | LLM used to generate QA dataset when validation_choice=generate | "gemini-2.5-flash-lite" |
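
The list-valued fields define a search space: with `search_type` set to grid, every combination of the listed values is a candidate configuration. The sketch below illustrates that expansion. The dataclass is a stand-in that mirrors the field names in the table; the real OptimizeRequest model lives in the server code and may differ, and `grid_candidates` is a hypothetical helper rather than Ragmint's search routine.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List


@dataclass
class OptimizeRequest:
    """Stand-in for the request model; defaults are the example values above."""
    docs_path: str = "data/docs"
    retriever: List[str] = field(default_factory=lambda: ["faiss"])
    embedding_model: List[str] = field(
        default_factory=lambda: ["sentence-transformers/all-MiniLM-L6-v2"])
    strategy: List[str] = field(default_factory=lambda: ["fixed"])
    chunk_sizes: List[int] = field(default_factory=lambda: [200])
    overlaps: List[int] = field(default_factory=lambda: [50])
    rerankers: List[str] = field(default_factory=lambda: ["mmr"])
    search_type: str = "grid"
    trials: int = 2
    metric: str = "faithfulness"
    validation_choice: str = "generate"
    llm_model: str = "gemini-2.5-flash-lite"


def grid_candidates(req: OptimizeRequest) -> List[Dict[str, object]]:
    """Enumerate every combination of the list-valued fields (full grid)."""
    keys = ("retriever", "embedding_model", "strategy",
            "chunk_size", "overlap", "reranker")
    axes = (req.retriever, req.embedding_model, req.strategy,
            req.chunk_sizes, req.overlaps, req.rerankers)
    return [dict(zip(keys, combo)) for combo in product(*axes)]
```

The example request has one value per field, so the grid holds a single candidate. Adding a second chunk size and a second overlap would raise that to four, which is why grid search gets expensive quickly and why random or Bayesian search with a fixed `trials` budget is offered as an alternative.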

4. Autotune RAG (autotune)

Input: JSON object following the AutotuneRequest model.

View Input Model
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| docs_path | str | Folder containing documents | data/docs |
| embedding_model | str | Embedding model name or path | "sentence-transformers/all-MiniLM-L6-v2" |
| num_chunk_pairs | int | Number of chunk pairs to analyze for tuning | 2 |
| metric | str | Evaluation metric for optimization | "faithfulness" |
| search_type | str | Search method (grid, random, bayesian) | "grid" |
| trials | int | Number of optimization trials | 2 |
| validation_choice | str | Validation data source (generate, local JSON, HF dataset) | "generate" |
| llm_model | str | LLM used for generating QA dataset | "gemini-2.5-flash-lite" |

5. Generate QA (generate_qa)

Input: JSON object following the QARequest model.

View Input Model
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| docs_path | str | Folder containing documents for QA generation | data/docs |
| llm_model | str | LLM used for question generation | "gemini-2.5-flash-lite" |
| batch_size | int | Number of documents processed per batch | 5 |
| min_q | int | Minimum number of questions per document | 3 |
| max_q | int | Maximum number of questions per document | 25 |
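
The `batch_size`, `min_q`, and `max_q` fields suggest two simple mechanics: documents are grouped into fixed-size batches, and the number of questions per document is kept within a range. The sketch below illustrates both. It is an assumption about how these parameters could be applied, not the generator's actual code; `batches` and `clamp_question_count` are hypothetical names.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int = 5) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def clamp_question_count(requested: int, min_q: int = 3, max_q: int = 25) -> int:
    """Keep a per-document question count within [min_q, max_q]."""
    if min_q > max_q:
        raise ValueError("min_q must not exceed max_q")
    return max(min_q, min(requested, max_q))
```

With twelve documents and the example `batch_size` of 5, this produces batches of 5, 5, and 2 documents.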

6. Clear Cache (clear_cache)

Deletes all stored documents from data/docs.

View Input Model
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| docs_path | str | Folder to wipe clean | data/docs |
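
A workspace reset of this kind amounts to deleting the files in docs_path and reporting how many were removed. The sketch below shows that behavior and returns a dictionary shaped like the Clear Cache response documented in this README. It is not the server's implementation: `clear_docs` is a hypothetical function, and leaving subdirectories untouched is a choice made here for safety.

```python
from pathlib import Path


def clear_docs(docs_path: str = "data/docs") -> dict:
    """Delete regular files directly inside docs_path and report the count.

    Hypothetical sketch: subdirectories are left in place, and a missing
    folder counts as already empty.
    """
    root = Path(docs_path)
    deleted = 0
    if root.is_dir():
        for entry in root.iterdir():
            if entry.is_file():
                entry.unlink()
                deleted += 1
    return {"status": "ok", "deleted_files": deleted, "docs_path": docs_path}
```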

📤 Outputs

The six tools return responses like the following examples:

1. Upload Documents Response (upload_docs)

View Response Example
{
  "status": "ok",
  "uploaded_files": ["sample.txt"],
  "docs_path": "data/docs"
}
  • status: "ok" → Indicates that the upload was successful.
  • uploaded_files: List of file names that were successfully uploaded.
  • docs_path: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.

2. Upload URLs Response (upload_urls)

View Response Example
{
  "status": "ok",
  "uploaded_files": ["doc.txt"],
  "docs_path": "data/docs"
}
  • status: "ok" → Indicates that the upload was successful.
  • uploaded_files: List of file names that were successfully uploaded.
  • docs_path: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.

3. Optimize RAG Response (optimize_rag)

View Response Example
{
  "status": "finished",
  "run_id": "opt_1763222218",
  "elapsed_seconds": 0.937,
  "best_config": {
    "retriever": "faiss",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 200,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 0.8659,
    "latency": 0.0333
  },
  "results": [
    {
      "retriever": "faiss",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 200,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 0.8659,
      "latency": 0.0333
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}
  • status: "finished" → Optimization process completed.
  • run_id: Unique identifier for this optimization run.
  • elapsed_seconds: How long the optimization took.
  • best_config: Configuration that gave the best performance.
    • retriever → The retrieval algorithm used (faiss).
    • embedding_model → Embedding model applied.
    • reranker → Reranking strategy after retrieval.
    • chunk_size → Size of document chunks used in RAG.
    • overlap → Overlap between consecutive chunks.
    • strategy → RAG retrieval strategy.
    • faithfulness → Evaluation score (higher = better).
    • latency → Time per query in seconds.
  • results: List of all tested configurations and their scores.
  • corpus_stats: Statistics about the uploaded documents.
    • num_docs → Number of documents in corpus.
    • avg_len → Average document length.
    • corpus_size → Total size in characters or tokens.
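
Because `results` carries the metric for every tested configuration, a client can recompute or re-rank the best configuration itself. The sketch below picks the entry with the highest score for a given metric. It assumes only the response shape shown above; `pick_best` is a hypothetical client-side helper, and how the server breaks ties is not documented here (this sketch keeps the first of any tied entries).

```python
from typing import Dict, List


def pick_best(results: List[Dict[str, object]],
              metric: str = "faithfulness") -> Dict[str, object]:
    """Return the result with the highest value for `metric`.

    On ties, max() keeps the earliest entry in the list.
    """
    if not results:
        raise ValueError("results is empty")
    return max(results, key=lambda r: r[metric])


# Two entries shaped like the documented response, with illustrative scores.
example_results = [
    {"retriever": "faiss", "chunk_size": 200, "overlap": 50,
     "faithfulness": 0.8659, "latency": 0.0333},
    {"retriever": "faiss", "chunk_size": 400, "overlap": 50,
     "faithfulness": 0.7, "latency": 0.02},
]
best = pick_best(example_results)
```

To rank by speed instead, sort on `latency` in ascending order rather than taking the maximum.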

4. Autotune RAG Response (autotune)

View Response Example
{
  "status": "finished",
  "run_id": "autotune_1763222228",
  "elapsed_seconds": 4.733,
  "recommendation": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "chunk_size": 100,
    "overlap": 30,
    "strategy": "fixed",
    "chunk_candidates": [[100, 30], [110, 30]]
  },
  "chunk_candidates": [[90, 50], [70, 50]],
  "best_config": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 70,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 1.0,
    "latency": 0.0272
  },
  "results": [
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 70,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0272
    },
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 90,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0186
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}
  • recommendation: The tuned configuration suggested by the autotuner.
  • chunk_candidates: List of possible chunk_size/overlap pairs analyzed.
  • best_config: Best-performing configuration with metrics.
  • results: All tested configurations and their performance.
  • corpus_stats: Same as in optimize response.
  • status, run_id, elapsed_seconds: Same meaning as Optimize endpoint.

🧠 Difference from Optimize: Autotune automatically selects the best hyperparameters, rather than testing all user-specified combinations.

5. Generate QA Response (generate_qa)

View Response Example
{
  "status": "finished",
  "output_path": "data/docs/validation_qa.json",
  "preview_count": 3,
  "sample": [
    {
      "query": "What capability does Artificial Intelligence provide to machines?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    },
    {
      "query": "What is the primary source of learning for machines with Artificial Intelligence?",
      "expected_answer": "Machines with Artificial Intelligence learn from data."
    },
    {
      "query": "How does Artificial Intelligence facilitate machine learning?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    }
  ]
}
  • output_path: Where the generated QA JSON file is saved.
  • preview_count: Number of QA pairs included in the response preview.
  • sample: Example QA pairs:
    • query → The question generated from the document.
    • expected_answer → The reference answer corresponding to that question.
  • status: "finished" → QA generation completed successfully.
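
Before using a generated or hand-written QA set for evaluation, it can help to confirm that every entry has both fields. The sketch below checks a list of pairs shaped like the `sample` array above. It assumes the saved file is a JSON list of such objects, which this README does not state explicitly; `validate_qa` and `load_qa` are hypothetical helpers.

```python
import json
from typing import Dict, List


def validate_qa(pairs: List[Dict[str, str]]) -> int:
    """Check that each pair has non-empty 'query' and 'expected_answer' strings.

    Returns the number of pairs; raises ValueError on the first bad entry.
    """
    for i, pair in enumerate(pairs):
        for key in ("query", "expected_answer"):
            value = pair.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"entry {i} has no usable {key!r}")
    return len(pairs)


def load_qa(path: str) -> List[Dict[str, str]]:
    """Load a QA file (assumed to be a JSON list of pairs) and validate it."""
    with open(path, encoding="utf-8") as handle:
        pairs = json.load(handle)
    validate_qa(pairs)
    return pairs
```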

6. Clear Cache Response (clear_cache)

View Response Example
{
  "status": "ok",
  "deleted_files": 7,
  "docs_path": "data/docs"
}
  • deleted_files: Number of documents removed.
  • status: "ok" indicates successful workspace reset.

📘 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.


Built with ❤️ by André Oliveira | Apache 2.0 License