This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .cursorrules +0 -240
  2. .env.example +0 -107
  3. .github/README.md +0 -56
  4. .github/scripts/deploy_to_hf_space.py +0 -391
  5. .github/workflows/ci.yml +0 -127
  6. .github/workflows/deploy-hf-space.yml +0 -47
  7. .gitignore +0 -84
  8. .pre-commit-config.yaml +0 -64
  9. .pre-commit-hooks/run_pytest.ps1 +0 -19
  10. .pre-commit-hooks/run_pytest.sh +0 -20
  11. .pre-commit-hooks/run_pytest_embeddings.ps1 +0 -14
  12. .pre-commit-hooks/run_pytest_embeddings.sh +0 -15
  13. .pre-commit-hooks/run_pytest_unit.ps1 +0 -14
  14. .pre-commit-hooks/run_pytest_unit.sh +0 -15
  15. .pre-commit-hooks/run_pytest_with_sync.ps1 +0 -25
  16. .pre-commit-hooks/run_pytest_with_sync.py +0 -235
  17. .python-version +0 -1
  18. AGENTS.txt +0 -236
  19. CONTRIBUTING.md +0 -494
  20. Dockerfile +0 -52
  21. LICENSE.md +0 -25
  22. README.md +8 -56
  23. deployments/README.md +0 -46
  24. deployments/modal_tts.py +0 -97
  25. dev/.cursorrules +0 -241
  26. dev/AGENTS.txt +0 -236
  27. dev/docs_plugins.py +0 -74
  28. docs/LICENSE.md +0 -35
  29. docs/api/agents.md +0 -211
  30. docs/api/models.md +0 -191
  31. docs/api/orchestrators.md +0 -149
  32. docs/api/services.md +0 -279
  33. docs/api/tools.md +0 -259
  34. docs/architecture/agents.md +0 -293
  35. docs/architecture/graph_orchestration.md +0 -302
  36. docs/architecture/middleware.md +0 -146
  37. docs/architecture/orchestrators.md +0 -201
  38. docs/architecture/services.md +0 -146
  39. docs/architecture/tools.md +0 -167
  40. docs/architecture/workflow-diagrams.md +0 -655
  41. docs/configuration/index.md +0 -564
  42. docs/contributing/code-quality.md +0 -120
  43. docs/contributing/code-style.md +0 -83
  44. docs/contributing/error-handling.md +0 -54
  45. docs/contributing/implementation-patterns.md +0 -67
  46. docs/contributing/index.md +0 -254
  47. docs/contributing/prompt-engineering.md +0 -55
  48. docs/contributing/testing.md +0 -115
  49. docs/getting-started/examples.md +0 -198
  50. docs/getting-started/installation.md +0 -152
.cursorrules DELETED
@@ -1,240 +0,0 @@
- # DeepCritical Project - Cursor Rules
-
- ## Project-Wide Rules
-
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
-
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
-
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
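The async rule in the deleted `.cursorrules` can be sketched with plain stdlib `asyncio`; `cpu_bound_function` here is an illustrative stand-in, not code from the repository:

```python
import asyncio
import hashlib


def cpu_bound_function(data: bytes) -> str:
    # CPU-bound work that would block the event loop if run inline.
    return hashlib.sha256(data).hexdigest()


async def main() -> str:
    loop = asyncio.get_running_loop()
    # Offload to the default thread-pool executor so the loop stays responsive.
    return await loop.run_in_executor(None, cpu_bound_function, b"evidence")


digest = asyncio.run(main())
```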
-
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
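The exception-chaining rule above can be shown with a minimal hierarchy; the class names come from the rule, while `fetch` and the triggering `TimeoutError` are illustrative:

```python
class DeepCriticalError(Exception):
    """Base error, mirroring the hierarchy described in the rule."""


class SearchError(DeepCriticalError):
    """Raised when a search backend fails."""


def fetch(url: str) -> str:
    try:
        raise TimeoutError("connect timed out")  # simulated I/O failure
    except TimeoutError as e:
        # Chain the original exception so the traceback keeps the root cause.
        raise SearchError(f"search failed for {url}") from e


try:
    fetch("https://example.org")
except SearchError as err:
    caught = err
    cause = err.__cause__
```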
-
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
-
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
-
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
-
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
-
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
-
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
-
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
-
- ---
-
- ## src/agents/ - Agent Implementation Rules
-
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
-
- **Agent Structure**:
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- - Agent class with `__init__(model: Any | None = None)`
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
-
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
-
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
-
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
-
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
-
- **Agent-Specific Rules**:
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- - `thinking.py`: Returns observation string from conversation history.
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
-
- ---
-
- ## src/tools/ - Search Tool Rules
-
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
-
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
-
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
-
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
-
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
-
- **Tool-Specific Rules**:
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
-
- ---
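The `SearchTool` protocol described above can be sketched with `typing.Protocol`; the `Evidence` dataclass and `DummyTool` here are simplified stand-ins for the real models in `src/utils/models.py`:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Evidence:
    # Minimal stand-in for the richer Evidence model.
    url: str
    relevance: float


class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...


class DummyTool:
    @property
    def name(self) -> str:
        return "dummy"

    async def search(self, query: str, max_results: int) -> list[Evidence]:
        return [Evidence(url="https://example.org", relevance=0.9)][:max_results]


tool: SearchTool = DummyTool()
results = asyncio.run(tool.search("aspirin cardioprotection", 5))
```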
-
- ## src/middleware/ - Middleware Rules
-
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
-
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
-
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
-
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
-
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
-
- ---
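The token-estimation heuristic named in the `BudgetTracker` rule (~4 characters per token) is simple enough to sketch directly; the exact rounding here is an assumption:

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from the rule above: roughly 4 characters per token.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    # Budget both sides of an LLM call.
    return estimate_tokens(prompt) + estimate_tokens(response)


budget_used = estimate_llm_call_tokens("What is CRISPR?", "CRISPR is a gene-editing tool.")
```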
-
- ## src/orchestrator/ - Orchestration Rules
-
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
-
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
-
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
-
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
-
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
-
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
-
- ---
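The "don't fail all loops if one fails" behavior noted for parallel execution can be sketched with `asyncio.gather(..., return_exceptions=True)`; `run_loop` is an illustrative stand-in for a research loop:

```python
import asyncio


async def run_loop(section: str) -> str:
    # Stand-in for one iterative research loop over a report section.
    if section == "bad":
        raise RuntimeError("loop failed")
    return f"report for {section}"


async def run_loops_parallel(sections: list[str]) -> list[str]:
    # return_exceptions=True keeps one failing loop from sinking the others.
    outcomes = await asyncio.gather(
        *(run_loop(s) for s in sections), return_exceptions=True
    )
    return [o for o in outcomes if isinstance(o, str)]


reports = asyncio.run(run_loops_parallel(["intro", "bad", "methods"]))
```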
-
- ## src/services/ - Service Rules
-
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
-
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
-
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
-
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
-
- ---
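The `@lru_cache(maxsize=1)` singleton pattern above can be shown with a toy service; `EmbeddingService` here is a hypothetical stand-in whose construction is treated as expensive:

```python
from functools import lru_cache


class EmbeddingService:
    # Hypothetical stand-in; in the real project construction loads a model.
    def __init__(self) -> None:
        self.model_name = "sentence-transformers/all-MiniLM-L6-v2"


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    # Lazy singleton: built on first call, cached for every later call.
    return EmbeddingService()


first = get_embedding_service()
second = get_embedding_service()
```

Because the cache holds one entry and the function takes no arguments, every caller shares the same instance without any import-time construction.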
-
- ## src/utils/ - Utility Rules
-
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
-
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
-
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
-
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
-
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
-
- ---
-
- ## src/orchestrator_factory.py Rules
-
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
-
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
-
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
-
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
-
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
-
- ---
-
- ## src/orchestrator_hierarchical.py Rules
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
-
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
-
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
-
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
-
- ---
-
- ## src/orchestrator_magentic.py Rules
-
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
-
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
-
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
-
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
-
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
-
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
-
- ---
-
- ## src/agent_factory/ - Factory Rules
-
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
-
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
-
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
-
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
-
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
-
- ---
-
- ## src/prompts/ - Prompt Rules
-
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
-
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
-
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
-
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
-
- ---
-
- ## Testing Rules
-
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
-
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
-
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
-
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
-
- ---
-
- ## File-Specific Agent Rules
-
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
-
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
-
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
-
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
-
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
-
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
-
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
-
.env.example DELETED
@@ -1,107 +0,0 @@
- # HuggingFace
- HF_TOKEN=your_huggingface_token_here
-
- # OpenAI (optional)
- OPENAI_API_KEY=your_openai_key_here
-
- # Anthropic (optional)
- ANTHROPIC_API_KEY=your_anthropic_key_here
-
- # Model names (optional - sensible defaults set in config.py)
- # ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
- # OPENAI_MODEL=gpt-5.1
-
-
- # ============================================
- # Audio Processing Configuration (TTS)
- # ============================================
- # Kokoro TTS Model Configuration
- TTS_MODEL=hexgrad/Kokoro-82M
- TTS_VOICE=af_heart
- TTS_SPEED=1.0
- TTS_GPU=T4
- TTS_TIMEOUT=60
-
- # Available TTS Voices:
- # American English Female: af_heart, af_bella, af_nicole, af_aoede, af_kore, af_sarah, af_nova, af_sky, af_alloy, af_jessica, af_river
- # American English Male: am_michael, am_fenrir, am_puck, am_echo, am_eric, am_liam, am_onyx, am_santa, am_adam
-
- # Available GPU Types (Modal):
- # T4 - Cheapest, good for testing (default)
- # A10 - Good balance of cost/performance
- # A100 - Fastest, most expensive
- # L4 - NVIDIA L4 GPU
- # L40S - NVIDIA L40S GPU
- # Note: GPU type is set at function definition time. Changes require app restart.
-
- # ============================================
- # Audio Processing Configuration (STT)
- # ============================================
- # Speech-to-Text API Configuration
- STT_API_URL=nvidia/canary-1b-v2
- STT_SOURCE_LANG=English
- STT_TARGET_LANG=English
-
- # Available STT Languages:
- # English, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, Ukrainian
-
- # ============================================
- # Audio Feature Flags
- # ============================================
- ENABLE_AUDIO_INPUT=true
- ENABLE_AUDIO_OUTPUT=true
-
- # ============================================
- # Image OCR Configuration
- # ============================================
- OCR_API_URL=prithivMLmods/Multimodal-OCR3
- ENABLE_IMAGE_INPUT=true
-
- # ============== EMBEDDINGS ==============
-
- # OpenAI Embedding Model (used if LLM_PROVIDER is openai and performing RAG/Embeddings)
- OPENAI_EMBEDDING_MODEL=text-embedding-3-small
-
- # Local Embedding Model (used for local/offline embeddings)
- LOCAL_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
-
- # ============== HUGGINGFACE (FREE TIER) ==============
-
- # HuggingFace Token - enables Llama 3.1 (best quality free model)
- # Get yours at: https://huggingface.co/settings/tokens
- #
- # WITHOUT HF_TOKEN: Falls back to ungated models (zephyr-7b-beta)
- # WITH HF_TOKEN: Uses Llama 3.1 8B Instruct (requires accepting license)
- #
- # For HuggingFace Spaces deployment:
- # Set this as a "Secret" in Space Settings -> Variables and secrets
- # Users/judges don't need their own token - the Space secret is used
- #
- HF_TOKEN=hf_your-token-here
-
- # ============== AGENT CONFIGURATION ==============
-
- MAX_ITERATIONS=10
- SEARCH_TIMEOUT=30
- LOG_LEVEL=INFO
-
- # ============================================
- # Modal Configuration (Required for TTS)
- # ============================================
- # Modal credentials are required for TTS (Text-to-Speech) functionality
- # Get your credentials from: https://modal.com/
- MODAL_TOKEN_ID=your_modal_token_id_here
- MODAL_TOKEN_SECRET=your_modal_token_secret_here
-
- # ============== EXTERNAL SERVICES ==============
-
- # PubMed (optional - higher rate limits)
- NCBI_API_KEY=your-ncbi-key-here
-
- # Vector Database (optional - for LlamaIndex RAG)
- CHROMA_DB_PATH=./chroma_db
- # Neo4j Knowledge Graph
- NEO4J_URI=bolt://localhost:7687
- NEO4J_USER=neo4j
- NEO4J_PASSWORD=your_neo4j_password_here
- NEO4J_DATABASE=your_database_name
.github/README.md DELETED
@@ -1,56 +0,0 @@
-
- > [!IMPORTANT]
- > **You are reading the GitHub README!**
- >
- > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
- > - 📖 **Demo README**: Check out the [Demo README](../README.md) for more information
- > - 🏆 **Demo**: Kindly consider using our [Free Demo](https://hf.co/DataQuests/GradioDemo)
-
-
- <div align="center">
-
- [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
- [![Documentation](https://img.shields.io/badge/Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
- [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
- [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
- [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
-
- </div>
-
- ## Quick Start
-
- ### 1. Environment Setup
-
- ```bash
- # Install uv if you haven't already
- pip install uv
-
- # Sync dependencies
- uv sync --all-extras
- ```
-
- ### 2. Run the UI
-
- ```bash
- # Start the Gradio app
- gradio src/app.py
- ```
-
- Open your browser to `http://localhost:7860`.
-
- ### 3. Connect via MCP
-
- This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
-
- **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
-
- **Claude Desktop Configuration**:
- Add this to your `claude_desktop_config.json`:
- ```json
- {
-   "mcpServers": {
-     "deepcritical": {
-       "url": "http://localhost:7860/gradio_api/mcp/"
-     }
-   }
- }
- ```
.github/scripts/deploy_to_hf_space.py DELETED
@@ -1,391 +0,0 @@
- """Deploy repository to Hugging Face Space, excluding unnecessary files."""
-
- import os
- import shutil
- import subprocess
- import tempfile
- from pathlib import Path
-
- from huggingface_hub import HfApi
-
-
- def get_excluded_dirs() -> set[str]:
-     """Get set of directory names to exclude from deployment."""
-     return {
-         "docs",
-         "dev",
-         "folder",
-         "site",
-         "tests",  # Optional - can be included if desired
-         "examples",  # Optional - can be included if desired
-         ".git",
-         ".github",
-         "__pycache__",
-         ".pytest_cache",
-         ".mypy_cache",
-         ".ruff_cache",
-         ".venv",
-         "venv",
-         "env",
-         "ENV",
-         "node_modules",
-         ".cursor",
-         "reference_repos",
-         "burner_docs",
-         "chroma_db",
-         "logs",
-         "build",
-         "dist",
-         ".eggs",
-         "htmlcov",
-         "hf_space",  # Exclude the cloned HF Space directory itself
-     }
-
-
- def get_excluded_files() -> set[str]:
-     """Get set of file names to exclude from deployment."""
-     return {
-         ".pre-commit-config.yaml",
-         "mkdocs.yml",
-         "uv.lock",
-         "AGENTS.txt",
-         ".env",
-         ".env.local",
-         "*.local",
-         ".DS_Store",
-         "Thumbs.db",
-         "*.log",
-         ".coverage",
-         "coverage.xml",
-     }
-
-
- def should_exclude(path: Path, excluded_dirs: set[str], excluded_files: set[str]) -> bool:
-     """Check if a path should be excluded from deployment."""
-     # Check if any parent directory is excluded
-     for parent in path.parents:
-         if parent.name in excluded_dirs:
-             return True
-
-     # Check if the path itself is a directory that should be excluded
-     if path.is_dir() and path.name in excluded_dirs:
-         return True
-
-     # Check if the file name matches excluded patterns
-     if path.is_file():
-         # Check exact match
-         if path.name in excluded_files:
-             return True
-         # Check pattern matches (simple wildcard support)
-         for pattern in excluded_files:
-             if "*" in pattern:
-                 # Simple pattern matching (e.g., "*.log")
-                 suffix = pattern.replace("*", "")
-                 if path.name.endswith(suffix):
-                     return True
-
-     return False
-
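The hand-rolled wildcard matching in `should_exclude` could also lean on the stdlib `fnmatch` module, which handles shell-style patterns like `*.log` directly; a minimal sketch, with `matches_excluded` and the pattern set as illustrative names:

```python
from fnmatch import fnmatch
from pathlib import Path

EXCLUDED_FILES = {".env", "*.log", ".coverage"}


def matches_excluded(path: Path, patterns: set[str]) -> bool:
    # fnmatch covers exact names and shell-style wildcards alike.
    return any(fnmatch(path.name, pattern) for pattern in patterns)


hit = matches_excluded(Path("logs/app.log"), EXCLUDED_FILES)
miss = matches_excluded(Path("src/app.py"), EXCLUDED_FILES)
```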
-
- def deploy_to_hf_space() -> None:
-     """Deploy repository to Hugging Face Space.
-
-     Supports both user and organization Spaces:
-     - User Space: username/space-name
-     - Organization Space: organization-name/space-name
-
-     Works with both classic tokens and fine-grained tokens.
-     """
-     # Get configuration from environment variables
-     hf_token = os.getenv("HF_TOKEN")
-     hf_username = os.getenv("HF_USERNAME")  # Can be username or organization name
-     space_name = os.getenv("HF_SPACE_NAME")
-
-     # Check which variables are missing and provide helpful error message
-     missing = []
-     if not hf_token:
-         missing.append("HF_TOKEN (should be in repository secrets)")
-     if not hf_username:
-         missing.append("HF_USERNAME (should be in repository variables)")
-     if not space_name:
-         missing.append("HF_SPACE_NAME (should be in repository variables)")
-
-     if missing:
-         raise ValueError(
-             f"Missing required environment variables: {', '.join(missing)}\n"
-             f"Please configure:\n"
-             f"  - HF_TOKEN in Settings > Secrets and variables > Actions > Secrets\n"
-             f"  - HF_USERNAME in Settings > Secrets and variables > Actions > Variables\n"
-             f"  - HF_SPACE_NAME in Settings > Secrets and variables > Actions > Variables"
-         )
-
-     # HF_USERNAME can be either a username or organization name
-     # Format: {username|organization}/{space_name}
-     repo_id = f"{hf_username}/{space_name}"
-     local_dir = "hf_space"
-
-     print(f"🚀 Deploying to Hugging Face Space: {repo_id}")
-
-     # Initialize HF API
-     api = HfApi(token=hf_token)
-
-     # Create Space if it doesn't exist
-     try:
-         api.repo_info(repo_id=repo_id, repo_type="space", token=hf_token)
-         print(f"✅ Space exists: {repo_id}")
-     except Exception:
-         print(f"⚠️ Space does not exist, creating: {repo_id}")
-         # Create new repository
-         # Note: For organizations, repo_id should be "org/space-name"
-         # For users, repo_id should be "username/space-name"
-         api.create_repo(
-             repo_id=repo_id,  # Full repo_id including owner
-             repo_type="space",
-             space_sdk="gradio",
-             token=hf_token,
-             exist_ok=True,
-         )
-         print(f"✅ Created new Space: {repo_id}")
-
-     # Configure Git credential helper for authentication
-     # This is needed for Git LFS to work properly with fine-grained tokens
-     print("🔐 Configuring Git credentials...")
-
-     # Use Git credential store to store the token
-     # This allows Git LFS to authenticate properly
-     temp_dir = Path(tempfile.gettempdir())
-     credential_store = temp_dir / ".git-credentials-hf"
-
-     # Write credentials in the format: https://username:token@huggingface.co
-     credential_store.write_text(
-         f"https://{hf_username}:{hf_token}@huggingface.co\n", encoding="utf-8"
-     )
-     try:
-         credential_store.chmod(0o600)  # Secure permissions (Unix only)
-     except OSError:
-         # Windows doesn't support chmod, skip
-         pass
-
-     # Configure Git to use the credential store
-     subprocess.run(
-         ["git", "config", "--global", "credential.helper", f"store --file={credential_store}"],
-         check=True,
-         capture_output=True,
-     )
-
-     # Also set environment variable for Git LFS
-     os.environ["GIT_CREDENTIAL_HELPER"] = f"store --file={credential_store}"
-
-     # Clone repository using git
-     # Use the token in the URL for initial clone, but LFS will use credential store
-     space_url = f"https://{hf_username}:{hf_token}@huggingface.co/spaces/{repo_id}"
-
-     if Path(local_dir).exists():
-         print(f"🧹 Removing existing {local_dir} directory...")
-         shutil.rmtree(local_dir)
-
-     print("📥 Cloning Space repository...")
-     try:
-         result = subprocess.run(
-             ["git", "clone", space_url, local_dir],
-             check=True,
-             capture_output=True,
-             text=True,
-         )
-         print("✅ Cloned Space repository")
-
-         # After clone, configure the remote to use credential helper
-         # This ensures future operations (like push) use the credential store
-         os.chdir(local_dir)
-         subprocess.run(
-             ["git", "remote", "set-url", "origin", f"https://huggingface.co/spaces/{repo_id}"],
-             check=True,
-             capture_output=True,
-         )
-         os.chdir("..")
-
-     except subprocess.CalledProcessError as e:
-         error_msg = e.stderr if e.stderr else e.stdout if e.stdout else "Unknown error"
209
- print(f"❌ Failed to clone Space repository: {error_msg}")
210
-
211
- # Try alternative: clone with LFS skip, then fetch LFS files separately
212
- print("🔄 Trying alternative clone method (skip LFS during clone)...")
213
- try:
214
- env = os.environ.copy()
215
- env["GIT_LFS_SKIP_SMUDGE"] = "1" # Skip LFS during clone
216
-
217
- subprocess.run(
218
- ["git", "clone", space_url, local_dir],
219
- check=True,
220
- capture_output=True,
221
- text=True,
222
- env=env,
223
- )
224
- print("✅ Cloned Space repository (LFS skipped)")
225
-
226
- # Configure remote
227
- os.chdir(local_dir)
228
- subprocess.run(
229
- ["git", "remote", "set-url", "origin", f"https://huggingface.co/spaces/{repo_id}"],
230
- check=True,
231
- capture_output=True,
232
- )
233
-
234
- # Try to fetch LFS files with proper authentication
235
- print("📥 Fetching LFS files...")
236
- subprocess.run(
237
- ["git", "lfs", "pull"],
238
- check=False, # Don't fail if LFS pull fails - we'll continue without LFS files
239
- capture_output=True,
240
- text=True,
241
- )
242
- os.chdir("..")
243
- print("✅ Repository cloned (LFS files may be incomplete, but deployment can continue)")
244
- except subprocess.CalledProcessError as e2:
245
- error_msg2 = e2.stderr if e2.stderr else e2.stdout if e2.stdout else "Unknown error"
246
- print(f"❌ Alternative clone method also failed: {error_msg2}")
247
- raise RuntimeError(f"Git clone failed: {error_msg}") from e
248
-
249
- # Get exclusion sets
250
- excluded_dirs = get_excluded_dirs()
251
- excluded_files = get_excluded_files()
252
-
253
- # Remove all existing files in HF Space (except .git)
254
- print("🧹 Cleaning existing files...")
255
- for item in Path(local_dir).iterdir():
256
- if item.name == ".git":
257
- continue
258
- if item.is_dir():
259
- shutil.rmtree(item)
260
- else:
261
- item.unlink()
262
-
263
- # Copy files from repository root
264
- print("📦 Copying files...")
265
- repo_root = Path(".")
266
- files_copied = 0
267
- dirs_copied = 0
268
-
269
- for item in repo_root.rglob("*"):
270
- # Skip if in .git directory
271
- if ".git" in item.parts:
272
- continue
273
-
274
- # Skip if in hf_space directory (the cloned Space directory)
275
- if "hf_space" in item.parts:
276
- continue
277
-
278
- # Skip if should be excluded
279
- if should_exclude(item, excluded_dirs, excluded_files):
280
- continue
281
-
282
- # Calculate relative path
283
- try:
284
- rel_path = item.relative_to(repo_root)
285
- except ValueError:
286
- # Item is outside repo root, skip
287
- continue
288
-
289
- # Skip if in excluded directory
290
- if any(part in excluded_dirs for part in rel_path.parts):
291
- continue
292
-
293
- # Destination path
294
- dest_path = Path(local_dir) / rel_path
295
-
296
- # Create parent directories
297
- dest_path.parent.mkdir(parents=True, exist_ok=True)
298
-
299
- # Copy file or directory
300
- if item.is_file():
301
- shutil.copy2(item, dest_path)
302
- files_copied += 1
303
- elif item.is_dir():
304
- # Directory will be created by parent mkdir, but we track it
305
- dirs_copied += 1
306
-
307
- print(f"✅ Copied {files_copied} files and {dirs_copied} directories")
308
-
309
- # Commit and push changes using git
310
- print("💾 Committing changes...")
311
-
312
- # Change to the Space directory
313
- original_cwd = os.getcwd()
314
- os.chdir(local_dir)
315
-
316
- try:
317
- # Configure git user (required for commit)
318
- subprocess.run(
319
- ["git", "config", "user.name", "github-actions[bot]"],
320
- check=True,
321
- capture_output=True,
322
- )
323
- subprocess.run(
324
- ["git", "config", "user.email", "github-actions[bot]@users.noreply.github.com"],
325
- check=True,
326
- capture_output=True,
327
- )
328
-
329
- # Add all files
330
- subprocess.run(
331
- ["git", "add", "."],
332
- check=True,
333
- capture_output=True,
334
- )
335
-
336
- # Check if there are changes to commit
337
- result = subprocess.run(
338
- ["git", "status", "--porcelain"],
339
- check=False,
340
- capture_output=True,
341
- text=True,
342
- )
343
-
344
- if result.stdout.strip():
345
- # There are changes, commit and push
346
- subprocess.run(
347
- ["git", "commit", "-m", "Deploy to Hugging Face Space [skip ci]"],
348
- check=True,
349
- capture_output=True,
350
- )
351
- print("📤 Pushing to Hugging Face Space...")
352
- # Ensure remote URL uses credential helper (not token in URL)
353
- subprocess.run(
354
- ["git", "remote", "set-url", "origin", f"https://huggingface.co/spaces/{repo_id}"],
355
- check=True,
356
- capture_output=True,
357
- )
358
- subprocess.run(
359
- ["git", "push"],
360
- check=True,
361
- capture_output=True,
362
- )
363
- print("✅ Deployment complete!")
364
- else:
365
- print("ℹ️ No changes to commit (repository is up to date)")
366
- except subprocess.CalledProcessError as e:
367
- error_msg = e.stderr if e.stderr else (e.stdout if e.stdout else str(e))
368
- if isinstance(error_msg, bytes):
369
- error_msg = error_msg.decode("utf-8", errors="replace")
370
- if "nothing to commit" in error_msg.lower():
371
- print("ℹ️ No changes to commit (repository is up to date)")
372
- else:
373
- print(f"⚠️ Error during git operations: {error_msg}")
374
- raise RuntimeError(f"Git operation failed: {error_msg}") from e
375
- finally:
376
- # Return to original directory
377
- os.chdir(original_cwd)
378
-
379
- # Clean up credential store for security
380
- try:
381
- if credential_store.exists():
382
- credential_store.unlink()
383
- except Exception:
384
- # Ignore cleanup errors
385
- pass
386
-
387
- print(f"🎉 Successfully deployed to: https://huggingface.co/spaces/{repo_id}")
388
-
389
-
390
- if __name__ == "__main__":
391
- deploy_to_hf_space()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
.github/workflows/ci.yml DELETED
@@ -1,127 +0,0 @@
- name: CI
- 
- on:
-   push:
-     branches: [main, dev, develop]
-   pull_request:
-     branches: [main, dev, develop]
- 
- jobs:
-   test:
-     runs-on: ubuntu-latest
-     strategy:
-       matrix:
-         python-version: ["3.11"]
- 
-     steps:
-       - uses: actions/checkout@v4
- 
-       - name: Set up Python ${{ matrix.python-version }}
-         uses: actions/setup-python@v5
-         with:
-           python-version: ${{ matrix.python-version }}
- 
-       - name: Install dependencies
-         run: |
-           python -m pip install --upgrade pip
-           pip install -e ".[dev]"
- 
-       - name: Lint with ruff
-         run: |
-           ruff check . --exclude tests
-           ruff format --check . --exclude tests
-         continue-on-error: true
- 
-       - name: Type check with mypy
-         run: |
-           mypy src
-         continue-on-error: true
- 
-       - name: Install embedding dependencies
-         run: |
-           pip install -e ".[embeddings]"
- 
-       - name: Run unit tests (excluding OpenAI and embedding providers)
-         env:
-           HF_TOKEN: ${{ secrets.HF_TOKEN }}
-         run: |
-           pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term
- 
-       - name: Run local embeddings tests
-         env:
-           HF_TOKEN: ${{ secrets.HF_TOKEN }}
-         run: |
-           pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
-         continue-on-error: true # Allow failures if dependencies not available
- 
-       - name: Run HuggingFace integration tests
-         env:
-           HF_TOKEN: ${{ secrets.HF_TOKEN }}
-         run: |
-           pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
-         continue-on-error: true # Allow failures if HF_TOKEN not set
- 
-       - name: Run non-OpenAI integration tests (excluding embedding providers)
-         env:
-           HF_TOKEN: ${{ secrets.HF_TOKEN }}
-         run: |
-           pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
-         continue-on-error: true # Allow failures if dependencies not available
- 
-       - name: Upload coverage reports to Codecov
-         uses: codecov/codecov-action@v5
-         with:
-           token: ${{ secrets.CODECOV_TOKEN }}
-           slug: DeepCritical/GradioDemo
-           files: ./coverage.xml
-           fail_ci_if_error: false
-         continue-on-error: true
- 
-   docs:
-     runs-on: ubuntu-latest
-     permissions:
-       contents: write
-     if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/develop')
-     steps:
-       - uses: actions/checkout@v4
-         with:
-           fetch-depth: 0
- 
-       - name: Set up Python
-         uses: actions/setup-python@v5
-         with:
-           python-version: '3.11'
- 
-       - name: Install uv
-         uses: astral-sh/setup-uv@v5
-         with:
-           version: "latest"
- 
-       - name: Install dependencies
-         run: |
-           uv sync --extra dev
- 
-       - name: Configure Git
-         run: |
-           git config user.name "github-actions[bot]"
-           git config user.email "github-actions[bot]@users.noreply.github.com"
-           git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}.git
- 
-       - name: Deploy to GitHub Pages
-         run: |
-           # mkdocs gh-deploy automatically creates .nojekyll, but let's verify
-           uv run mkdocs gh-deploy --force --message "Deploy docs [skip ci]" --strict
-           # Verify .nojekyll was created in gh-pages branch
-           git fetch origin gh-pages:gh-pages || true
-           git checkout gh-pages || true
-           if [ -f .nojekyll ]; then
-             echo "✓ .nojekyll file exists"
-           else
-             echo "⚠ .nojekyll file missing, creating it..."
-             touch .nojekyll
-             git add .nojekyll
-             git commit -m "Add .nojekyll to disable Jekyll [skip ci]" || true
-             git push origin gh-pages || true
-           fi
-         env:
-           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.github/workflows/deploy-hf-space.yml DELETED
@@ -1,47 +0,0 @@
- name: Deploy to Hugging Face Space
- 
- on:
-   push:
-     branches: [main]
-   workflow_dispatch: # Allow manual triggering
- 
- jobs:
-   deploy:
-     runs-on: ubuntu-latest
-     permissions:
-       contents: read
-       # No write permissions needed for GitHub repo (we're pushing to HF Space)
- 
-     steps:
-       - name: Checkout Repository
-         uses: actions/checkout@v4
-         with:
-           fetch-depth: 0
- 
-       - name: Set up Python
-         uses: actions/setup-python@v5
-         with:
-           python-version: '3.11'
- 
-       - name: Install dependencies
-         run: |
-           pip install --upgrade pip
-           pip install huggingface-hub
- 
-       - name: Deploy to Hugging Face Space
-         env:
-           # Token from secrets (sensitive data)
-           HF_TOKEN: ${{ secrets.HF_TOKEN }}
-           # Username/Organization from repository variables (non-sensitive)
-           HF_USERNAME: ${{ vars.HF_USERNAME }}
-           # Space name from repository variables (non-sensitive)
-           HF_SPACE_NAME: ${{ vars.HF_SPACE_NAME }}
-         run: |
-           python .github/scripts/deploy_to_hf_space.py
- 
-       - name: Verify deployment
-         if: success()
-         run: |
-           echo "✅ Deployment completed successfully!"
-           echo "Space URL: https://huggingface.co/spaces/${{ vars.HF_USERNAME }}/${{ vars.HF_SPACE_NAME }}"
- 

.gitignore DELETED
@@ -1,84 +0,0 @@
- folder/
- site/
- .cursor/
- .ruff_cache/
- # Python
- __pycache__/
- *.py[cod]
- *$py.class
- *.so
- .Python
- build/
- develop-eggs/
- dist/
- downloads/
- eggs/
- .eggs/
- lib/
- lib64/
- parts/
- sdist/
- var/
- wheels/
- *.egg-info/
- .installed.cfg
- *.egg
- 
- # Virtual environments
- .venv/
- venv/
- ENV/
- env/
- 
- # IDE
- .vscode/
- .idea/
- *.swp
- *.swo
- 
- # Environment
- .env
- .env.local
- *.local
- 
- # Claude
- .claude/
- 
- # Burner docs (working drafts, not for commit)
- burner_docs/
- 
- # Reference repos (clone locally, don't commit)
- reference_repos/autogen-microsoft/
- reference_repos/claude-agent-sdk/
- reference_repos/pydanticai-research-agent/
- reference_repos/pubmed-mcp-server/
- reference_repos/DeepCritical/
- 
- # Keep the README in reference_repos
- !reference_repos/README.md
- 
- # Development directory
- dev/
- 
- # OS
- .DS_Store
- Thumbs.db
- 
- # Logs
- *.log
- logs/
- 
- # Testing
- .pytest_cache/
- .mypy_cache/
- .coverage
- htmlcov/
- test_output*.txt
- 
- # Database files
- chroma_db/
- *.sqlite3
- 
- 
- # Trigger rebuild Wed Nov 26 17:51:41 EST 2025
- .env

.pre-commit-config.yaml DELETED
@@ -1,64 +0,0 @@
- repos:
-   - repo: https://github.com/astral-sh/ruff-pre-commit
-     rev: v0.4.4
-     hooks:
-       - id: ruff
-         args: [--fix, --exclude, tests]
-         exclude: ^reference_repos/
-       - id: ruff-format
-         args: [--exclude, tests]
-         exclude: ^reference_repos/
- 
-   - repo: https://github.com/pre-commit/mirrors-mypy
-     rev: v1.10.0
-     hooks:
-       - id: mypy
-         files: ^src/
-         exclude: ^folder|^src/app.py
-         additional_dependencies:
-           - pydantic>=2.7
-           - pydantic-settings>=2.2
-           - tenacity>=8.2
-           - pydantic-ai>=0.0.16
-         args: [--ignore-missing-imports]
- 
-   - repo: local
-     hooks:
-       - id: pytest-unit
-         name: pytest unit tests (no OpenAI)
-         entry: uv
-         language: system
-         types: [python]
-         args: [
-           "run",
-           "pytest",
-           "tests/unit/",
-           "-v",
-           "-m",
-           "not openai and not embedding_provider",
-           "--tb=short",
-           "-p",
-           "no:logfire",
-         ]
-         pass_filenames: false
-         always_run: true
-         require_serial: false
-       - id: pytest-local-embeddings
-         name: pytest local embeddings tests
-         entry: uv
-         language: system
-         types: [python]
-         args: [
-           "run",
-           "pytest",
-           "tests/",
-           "-v",
-           "-m",
-           "local_embeddings",
-           "--tb=short",
-           "-p",
-           "no:logfire",
-         ]
-         pass_filenames: false
-         always_run: true
-         require_serial: false

.pre-commit-hooks/run_pytest.ps1 DELETED
@@ -1,19 +0,0 @@
- # PowerShell pytest runner for pre-commit (Windows)
- # Uses uv if available, otherwise falls back to pytest
- 
- if (Get-Command uv -ErrorAction SilentlyContinue) {
-     # Sync dependencies before running tests
-     uv sync
-     uv run pytest $args
- } else {
-     Write-Warning "uv not found, using system pytest (may have missing dependencies)"
-     pytest $args
- }
- 

.pre-commit-hooks/run_pytest.sh DELETED
@@ -1,20 +0,0 @@
- #!/bin/bash
- # Cross-platform pytest runner for pre-commit
- # Uses uv if available, otherwise falls back to pytest
- 
- if command -v uv >/dev/null 2>&1; then
-     # Sync dependencies before running tests
-     uv sync
-     uv run pytest "$@"
- else
-     echo "Warning: uv not found, using system pytest (may have missing dependencies)"
-     pytest "$@"
- fi
- 

.pre-commit-hooks/run_pytest_embeddings.ps1 DELETED
@@ -1,14 +0,0 @@
- # PowerShell wrapper to sync embeddings dependencies and run embeddings tests
- 
- $ErrorActionPreference = "Stop"
- 
- if (Get-Command uv -ErrorAction SilentlyContinue) {
-     Write-Host "Syncing embeddings dependencies..."
-     uv sync --extra embeddings
-     Write-Host "Running embeddings tests..."
-     uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
- } else {
-     Write-Error "uv not found"
-     exit 1
- }
- 

.pre-commit-hooks/run_pytest_embeddings.sh DELETED
@@ -1,15 +0,0 @@
- #!/bin/bash
- # Wrapper script to sync embeddings dependencies and run embeddings tests
- 
- set -e
- 
- if command -v uv >/dev/null 2>&1; then
-     echo "Syncing embeddings dependencies..."
-     uv sync --extra embeddings
-     echo "Running embeddings tests..."
-     uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
- else
-     echo "Error: uv not found"
-     exit 1
- fi
- 

.pre-commit-hooks/run_pytest_unit.ps1 DELETED
@@ -1,14 +0,0 @@
- # PowerShell wrapper to sync dependencies and run unit tests
- 
- $ErrorActionPreference = "Stop"
- 
- if (Get-Command uv -ErrorAction SilentlyContinue) {
-     Write-Host "Syncing dependencies..."
-     uv sync
-     Write-Host "Running unit tests..."
-     uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
- } else {
-     Write-Error "uv not found"
-     exit 1
- }
- 

.pre-commit-hooks/run_pytest_unit.sh DELETED
@@ -1,15 +0,0 @@
- #!/bin/bash
- # Wrapper script to sync dependencies and run unit tests
- 
- set -e
- 
- if command -v uv >/dev/null 2>&1; then
-     echo "Syncing dependencies..."
-     uv sync
-     echo "Running unit tests..."
-     uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
- else
-     echo "Error: uv not found"
-     exit 1
- fi
- 

.pre-commit-hooks/run_pytest_with_sync.ps1 DELETED
@@ -1,25 +0,0 @@
- # PowerShell wrapper for pytest runner
- # Ensures uv is available and runs the Python script
- 
- param(
-     [Parameter(Position=0)]
-     [string]$TestType = "unit"
- )
- 
- $ErrorActionPreference = "Stop"
- 
- # Check if uv is available
- if (-not (Get-Command uv -ErrorAction SilentlyContinue)) {
-     Write-Error "uv not found. Please install uv: https://github.com/astral-sh/uv"
-     exit 1
- }
- 
- # Get the script directory
- $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
- $PythonScript = Join-Path $ScriptDir "run_pytest_with_sync.py"
- 
- # Run the Python script using uv
- uv run python $PythonScript $TestType
- 
- exit $LASTEXITCODE
- 

.pre-commit-hooks/run_pytest_with_sync.py DELETED
@@ -1,235 +0,0 @@
- #!/usr/bin/env python3
- """Cross-platform pytest runner that syncs dependencies before running tests."""
- 
- import shutil
- import subprocess
- import sys
- from pathlib import Path
- 
- 
- def clean_caches(project_root: Path) -> None:
-     """Remove pytest and Python cache directories and files.
- 
-     Comprehensively removes all cache files and directories to ensure
-     clean test runs. Only scans specific directories to avoid resource
-     exhaustion from scanning large directories like .venv on Windows.
-     """
-     # Directories to scan for caches (only project code, not dependencies)
-     scan_dirs = ["src", "tests", ".pre-commit-hooks"]
- 
-     # Directories to exclude (to avoid resource issues)
-     exclude_dirs = {
-         ".venv",
-         "venv",
-         "ENV",
-         "env",
-         ".git",
-         "node_modules",
-         "dist",
-         "build",
-         ".eggs",
-         "reference_repos",
-         "folder",
-     }
- 
-     # Comprehensive list of cache patterns to remove
-     cache_patterns = [
-         ".pytest_cache",
-         "__pycache__",
-         "*.pyc",
-         "*.pyo",
-         "*.pyd",
-         ".mypy_cache",
-         ".ruff_cache",
-         ".coverage",
-         "coverage.xml",
-         "htmlcov",
-         ".hypothesis",  # Hypothesis testing framework cache
-         ".tox",  # Tox cache (if used)
-         ".cache",  # General Python cache
-     ]
- 
-     def should_exclude(path: Path) -> bool:
-         """Check if a path should be excluded from cache cleanup."""
-         # Check if any parent directory is in exclude list
-         for parent in path.parents:
-             if parent.name in exclude_dirs:
-                 return True
-         # Check if the path itself is excluded
-         if path.name in exclude_dirs:
-             return True
-         return False
- 
-     cleaned = []
- 
-     # Only scan specific directories to avoid resource exhaustion
-     for scan_dir in scan_dirs:
-         scan_path = project_root / scan_dir
-         if not scan_path.exists():
-             continue
- 
-         for pattern in cache_patterns:
-             if "*" in pattern:
-                 # Handle glob patterns for files
-                 try:
-                     for cache_file in scan_path.rglob(pattern):
-                         if should_exclude(cache_file):
-                             continue
-                         try:
-                             if cache_file.is_file():
-                                 cache_file.unlink()
-                                 cleaned.append(str(cache_file.relative_to(project_root)))
-                         except OSError:
-                             pass  # Ignore errors (file might be locked or already deleted)
-                 except OSError:
-                     pass  # Ignore errors during directory traversal
-             else:
-                 # Handle directory patterns
-                 try:
-                     for cache_dir in scan_path.rglob(pattern):
-                         if should_exclude(cache_dir):
-                             continue
-                         try:
-                             if cache_dir.is_dir():
-                                 shutil.rmtree(cache_dir, ignore_errors=True)
-                                 cleaned.append(str(cache_dir.relative_to(project_root)))
-                         except OSError:
-                             pass  # Ignore errors (directory might be locked)
-                 except OSError:
-                     pass  # Ignore errors during directory traversal
- 
-     # Also clean root-level caches (like .pytest_cache in project root)
-     root_cache_patterns = [
-         ".pytest_cache",
-         ".mypy_cache",
-         ".ruff_cache",
-         ".coverage",
-         "coverage.xml",
-         "htmlcov",
-         ".hypothesis",
-         ".tox",
-         ".cache",
-         ".pytest",
-     ]
-     for pattern in root_cache_patterns:
-         cache_path = project_root / pattern
-         if cache_path.exists():
-             try:
-                 if cache_path.is_dir():
-                     shutil.rmtree(cache_path, ignore_errors=True)
-                 elif cache_path.is_file():
-                     cache_path.unlink()
-                 cleaned.append(pattern)
-             except OSError:
-                 pass
- 
-     # Also remove any .pyc files in root directory
-     try:
-         for pyc_file in project_root.glob("*.pyc"):
-             try:
-                 pyc_file.unlink()
-                 cleaned.append(pyc_file.name)
-             except OSError:
-                 pass
-     except OSError:
-         pass
- 
-     if cleaned:
-         print(
-             f"Cleaned {len(cleaned)} cache items: {', '.join(cleaned[:10])}{'...' if len(cleaned) > 10 else ''}"
-         )
-     else:
-         print("No cache files found to clean")
- 
- 
- def run_command(
-     cmd: list[str], check: bool = True, shell: bool = False, cwd: str | None = None
- ) -> int:
-     """Run a command and return exit code."""
-     try:
-         result = subprocess.run(
-             cmd,
-             check=check,
-             shell=shell,
-             cwd=cwd,
-             env=None,  # Use current environment, uv will handle venv
-         )
-         return result.returncode
-     except subprocess.CalledProcessError as e:
-         return e.returncode
-     except FileNotFoundError:
-         print(f"Error: Command not found: {cmd[0]}")
-         return 1
- 
- 
- def main() -> int:
-     """Main entry point."""
-     import os
- 
-     # Get the project root (where pyproject.toml is)
-     script_dir = Path(__file__).parent
-     project_root = script_dir.parent
- 
-     # Change to project root to ensure uv works correctly
-     os.chdir(project_root)
- 
-     # Clean caches before running tests
-     print("Cleaning pytest and Python caches...")
-     clean_caches(project_root)
- 
-     # Check if uv is available
-     if run_command(["uv", "--version"], check=False) != 0:
-         print("Error: uv not found. Please install uv: https://github.com/astral-sh/uv")
-         return 1
- 
-     # Parse arguments
-     test_type = sys.argv[1] if len(sys.argv) > 1 else "unit"
-     extra_args = sys.argv[2:] if len(sys.argv) > 2 else []
- 
-     # Sync dependencies - always include dev
-     # Note: embeddings dependencies are now in main dependencies, not optional
-     # Use --extra dev for [project.optional-dependencies].dev (not --dev which is for [dependency-groups])
-     sync_cmd = ["uv", "sync", "--extra", "dev"]
- 
-     print(f"Syncing dependencies for {test_type} tests...")
-     if run_command(sync_cmd, cwd=project_root) != 0:
-         return 1
- 
-     # Build pytest command - use uv run to ensure correct environment
-     if test_type == "unit":
-         pytest_args = [
-             "tests/unit/",
-             "-v",
-             "-m",
-             "not openai and not embedding_provider",
-             "--tb=short",
-             "-p",
-             "no:logfire",
-             "--cache-clear",  # Clear pytest cache before running
-         ]
-     elif test_type == "embeddings":
-         pytest_args = [
-             "tests/",
-             "-v",
-             "-m",
-             "local_embeddings",
-             "--tb=short",
-             "-p",
-             "no:logfire",
-             "--cache-clear",  # Clear pytest cache before running
-         ]
-     else:
-         pytest_args = []
- 
-     pytest_args.extend(extra_args)
- 
-     # Use uv run python -m pytest to ensure we use the venv's pytest
-     # This is more reliable than uv run pytest which might find system pytest
-     pytest_cmd = ["uv", "run", "python", "-m", "pytest", *pytest_args]
- 
-     print(f"Running {test_type} tests...")
-     return run_command(pytest_cmd, cwd=project_root)
- 
- 
- if __name__ == "__main__":
-     sys.exit(main())

.python-version DELETED
@@ -1 +0,0 @@
- 3.11

AGENTS.txt DELETED
@@ -1,236 +0,0 @@
1
- # DeepCritical Project - Rules
2
-
3
- ## Project-Wide Rules
4
-
5
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
6
-
7
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
8
-
9
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
10
-
11
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
12
-
13
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
14
-
15
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
-
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
-
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
-
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
-
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
-
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
-
- ---
-
- ## src/agents/ - Agent Implementation Rules
-
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
-
- **Agent Structure**:
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- - Agent class with `__init__(model: Any | None = None)`
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
-
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
-
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
-
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
-
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
-
- **Agent-Specific Rules**:
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- - `thinking.py`: Returns observation string from conversation history.
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
-
- ---
-
- ## src/tools/ - Search Tool Rules
-
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
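The protocol can be sketched with `typing.Protocol` as follows. The `Evidence` dataclass and `DummyTool` are stand-ins for illustration; the real protocol lives in `src/tools/base.py`:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass
class Evidence:  # placeholder for the real Pydantic model
    title: str
    url: str

@runtime_checkable
class SearchTool(Protocol):
    """Structural shape of the tool protocol described above."""
    @property
    def name(self) -> str: ...
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]: ...

class DummyTool:
    """Satisfies the protocol structurally; no inheritance required."""
    @property
    def name(self) -> str:
        return "dummy"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        return [Evidence(title=f"result for {query}", url="https://example.org")][:max_results]

assert isinstance(DummyTool(), SearchTool)  # runtime structural check
assert asyncio.run(DummyTool().search("metformin"))[0].name if False else True
```

Because the protocol is structural, new tools only need the right attribute shapes, which keeps the tool registry decoupled from any base class.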
-
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
-
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
-
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
-
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
-
- **Tool-Specific Rules**:
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
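The `search_handler.py` fan-out pattern can be sketched as below (the two tool coroutines are hypothetical; the real handler aggregates `Evidence` objects into a `SearchResult` and logs a warning per failed tool):

```python
import asyncio

async def search_a(query: str) -> list[str]:
    return [f"a:{query}"]

async def search_b(query: str) -> list[str]:
    raise RuntimeError("backend down")  # simulate one tool failing

async def run_parallel(query: str) -> list[str]:
    # return_exceptions=True: a single failing tool must not sink the whole search.
    results = await asyncio.gather(search_a(query), search_b(query), return_exceptions=True)
    evidence: list[str] = []
    for r in results:
        if isinstance(r, Exception):
            continue  # real code would logger.warning("Search tool failed", ...) here
        evidence.extend(r)
    return evidence

assert asyncio.run(run_parallel("metformin")) == ["a:metformin"]
```

Without `return_exceptions=True`, the first raised exception would cancel the sibling searches and propagate, losing the partial results.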
-
- ---
-
- ## src/middleware/ - Middleware Rules
-
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
-
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
-
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
-
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
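The token side of the budget logic can be sketched like this (a minimal illustration of the ~4 chars/token heuristic; the real tracker also tracks wall-clock time and iterations, per loop and globally):

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from the rule above: roughly 4 characters per token.
    return max(1, len(text) // 4)

def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)

class BudgetTracker:
    """Token-only sketch; method names follow the list above, bodies are assumptions."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def add_tokens(self, n: int) -> None:
        self.used += n

    def can_continue(self) -> bool:
        return self.used < self.max_tokens

tracker = BudgetTracker(max_tokens=100)
tracker.add_tokens(estimate_llm_call_tokens("p" * 200, "r" * 200))  # 50 + 50 tokens
assert tracker.can_continue() is False
```

A character-count heuristic is deliberately cheap: it is checked before every iteration, so an exact tokenizer would add latency without changing the stop decision much.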
-
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
-
- ---
-
- ## src/orchestrator/ - Orchestration Rules
-
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
-
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
-
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
-
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
-
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
-
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
-
- ---
-
- ## src/services/ - Service Rules
-
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
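The async-safety rule above amounts to wrapping the synchronous encoder in `run_in_executor()`. A dependency-free sketch (the `_embed_sync` body stands in for sentence-transformers encoding):

```python
import asyncio

def _embed_sync(text: str) -> list[float]:
    # CPU-bound stand-in for sentence-transformers encoding.
    return [float(len(text))]

async def embed(text: str) -> list[float]:
    loop = asyncio.get_running_loop()
    # Off-load CPU-bound work so the event loop stays responsive.
    return await loop.run_in_executor(None, _embed_sync, text)

assert asyncio.run(embed("hello")) == [5.0]
```

Running the encoder inline in a coroutine would block every other search and UI event for the duration of the forward pass; the executor keeps the loop free.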
-
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
-
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
-
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: decorate a module-level accessor such as `def get_service() -> Service: return Service()` with `@lru_cache(maxsize=1)`. Lazy initialization avoids requiring dependencies at import time.
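Spelled out, the accessor pattern looks like this (the `EmbeddingService` body is a placeholder for the real heavy setup):

```python
from functools import lru_cache

class EmbeddingService:
    def __init__(self) -> None:
        # Heavy setup (model load, DB connection) happens once, on first use.
        self.ready = True

@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()

assert get_embedding_service() is get_embedding_service()  # same instance every call
```

`lru_cache(maxsize=1)` on a zero-argument function memoizes the single return value, giving a singleton without module-level globals or `global` statements.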
-
- ---
-
- ## src/utils/ - Utility Rules
-
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
-
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
-
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
-
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
-
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
-
- ---
-
- ## src/orchestrator_factory.py Rules
-
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
-
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
-
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
-
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
-
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
-
- ---
-
- ## src/orchestrator_hierarchical.py Rules
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
-
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
-
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
-
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
-
- ---
-
- ## src/orchestrator_magentic.py Rules
-
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
-
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
-
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
-
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
-
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
-
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
-
- ---
-
- ## src/agent_factory/ - Factory Rules
-
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
-
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
-
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
-
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
-
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
-
- ---
-
- ## src/prompts/ - Prompt Rules
-
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
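The date-injection and per-item truncation conventions can be sketched together (`SYSTEM_PROMPT` wording and the `format_evidence` helper are illustrative, not the real prompt modules):

```python
from datetime import datetime

# Module-level system prompt constant with date injection, per the pattern above.
SYSTEM_PROMPT = (
    "You are a research assistant. Today's date is "
    f"{datetime.now().strftime('%Y-%m-%d')}."
)

def format_evidence(items: list[str], limit: int = 1500) -> str:
    # Truncate each evidence item (1500 chars per item) to keep prompts bounded.
    return "\n".join(item[:limit] for item in items)

assert datetime.now().strftime("%Y-%m-%d") in SYSTEM_PROMPT
```

Injecting the current date at module import keeps "recent findings" style instructions accurate without threading a clock through every call site.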
-
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
-
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
-
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
-
- ---
-
- ## Testing Rules
-
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
-
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
-
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
-
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
-
- ---
-
- ## File-Specific Agent Rules
-
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
-
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
-
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
-
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
-
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
-
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
-
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
-
CONTRIBUTING.md DELETED
@@ -1,494 +0,0 @@
- # Contributing to The DETERMINATOR
-
- Thank you for your interest in contributing to The DETERMINATOR! This guide will help you get started.
-
- ## Table of Contents
-
- - [Git Workflow](#git-workflow)
- - [Getting Started](#getting-started)
- - [Development Commands](#development-commands)
- - [MCP Integration](#mcp-integration)
- - [Common Pitfalls](#common-pitfalls)
- - [Key Principles](#key-principles)
- - [Pull Request Process](#pull-request-process)
-
- > **Note**: Additional sections (Code Style, Error Handling, Testing, Implementation Patterns, Code Quality, and Prompt Engineering) are available as separate pages in the [documentation](https://deepcritical.github.io/GradioDemo/contributing/).
- > **Note on Project Names**: "The DETERMINATOR" is the product name, "DeepCritical" is the organization/project name, and "determinator" is the Python package name.
-
- ## Repository Information
-
- - **GitHub Repository**: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo) (source of truth, PRs, code review)
- - **HuggingFace Space**: [`DataQuests/DeepCritical`](https://huggingface.co/spaces/DataQuests/DeepCritical) (deployment/demo)
- - **Package Name**: `determinator` (Python package name in `pyproject.toml`)
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Use feature branches: `yourname-dev`
- - **NEVER** push directly to `main` or `dev` on HuggingFace
- - GitHub is source of truth; HuggingFace is for deployment
-
- ### Dual Repository Setup
-
- This project uses a dual repository setup:
-
- - **GitHub (`DeepCritical/GradioDemo`)**: Source of truth for code, PRs, and code review
- - **HuggingFace (`DataQuests/DeepCritical`)**: Deployment target for the Gradio demo
-
- #### Remote Configuration
-
- When cloning, set up remotes as follows:
-
- ```bash
- # Clone from GitHub
- git clone https://github.com/DeepCritical/GradioDemo.git
- cd GradioDemo
-
- # Add HuggingFace remote (optional, for deployment)
- git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/DeepCritical
- ```
-
- **Important**: Never push directly to `main` or `dev` on HuggingFace. Always work through GitHub PRs. GitHub is the source of truth; HuggingFace is for deployment/demo only.
-
- ## Getting Started
-
- 1. **Fork the repository** on GitHub: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo)
- 2. **Clone your fork**:
-
- ```bash
- git clone https://github.com/yourusername/GradioDemo.git
- cd GradioDemo
- ```
-
- 3. **Install dependencies**:
-
- ```bash
- uv sync --all-extras
- uv run pre-commit install
- ```
-
- 4. **Create a feature branch**:
-
- ```bash
- git checkout -b yourname-feature-name
- ```
-
- 5. **Make your changes** following the guidelines below
- 6. **Run checks**:
-
- ```bash
- uv run ruff check src tests
- uv run mypy src
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
- ```
-
- 7. **Commit and push**:
-
- ```bash
- git commit -m "Description of changes"
- git push origin yourname-feature-name
- ```
-
- 8. **Create a pull request** on GitHub
-
- ## Package Manager
-
- This project uses [`uv`](https://github.com/astral-sh/uv) as the package manager. All commands should be prefixed with `uv run` to ensure they run in the correct environment.
-
- ### Installation
-
- ```bash
- # Install uv if you haven't already (recommended: standalone installer)
- # Unix/macOS/Linux:
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Windows (PowerShell):
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-
- # Alternative: pipx install uv
- # Or: pip install uv
-
- # Sync all dependencies including dev extras
- uv sync --all-extras
-
- # Install pre-commit hooks
- uv run pre-commit install
- ```
-
- ## Development Commands
-
- ```bash
- # Installation
- uv sync --all-extras              # Install all dependencies including dev
- uv run pre-commit install         # Install pre-commit hooks
-
- # Code Quality Checks (run all before committing)
- uv run ruff check src tests       # Lint with ruff
- uv run ruff format src tests      # Format with ruff
- uv run mypy src                   # Type checking
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with coverage
-
- # Testing Commands
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire   # Run unit tests (excludes OpenAI tests)
- uv run pytest tests/ -v -m "huggingface" -p no:logfire       # Run HuggingFace tests
- uv run pytest tests/ -v -p no:logfire                        # Run all tests
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with terminal coverage
- uv run pytest --cov=src --cov-report=html -p no:logfire      # Generate HTML coverage report (opens htmlcov/index.html)
-
- # Documentation Commands
- uv run mkdocs build               # Build documentation
- uv run mkdocs serve               # Serve documentation locally (http://127.0.0.1:8000)
- ```
-
- ### Test Markers
-
- The project uses pytest markers to categorize tests. See [Testing Guidelines](docs/contributing/testing.md) for details:
-
- - `unit`: Unit tests (mocked, fast)
- - `integration`: Integration tests (real APIs)
- - `slow`: Slow tests
- - `openai`: Tests requiring OpenAI API key
- - `huggingface`: Tests requiring HuggingFace API key
- - `embedding_provider`: Tests requiring API-based embedding providers
- - `local_embeddings`: Tests using local embeddings
-
- **Note**: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.
-
- ## Code Style & Conventions
-
- ### Type Safety
-
- - **ALWAYS** use type hints for all function parameters and return types
- - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
- - Use `TYPE_CHECKING` imports for circular dependencies:
-
- <!--codeinclude-->
- [TYPE_CHECKING Import Pattern](../src/utils/citation_validator.py) start_line:8 end_line:11
- <!--/codeinclude-->
-
- ### Pydantic Models
-
- - All data exchange uses Pydantic models (`src/utils/models.py`)
- - Models are frozen (`model_config = {"frozen": True}`) for immutability
- - Use `Field()` with descriptions for all model fields
- - Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints
-
- ### Async Patterns
-
- - **ALL** I/O operations must be async (`async def`, `await`)
- - Use `asyncio.gather()` for parallel operations
- - CPU-bound work (embeddings, parsing) must use `run_in_executor()`:
-
- ```python
- loop = asyncio.get_running_loop()
- result = await loop.run_in_executor(None, cpu_bound_function, args)
- ```
-
- - Never block the event loop with synchronous I/O
-
- ### Linting
-
- - Ruff with 100-char line length
- - Ignore rules documented in `pyproject.toml`:
-   - `PLR0913`: Too many arguments (agents need many params)
-   - `PLR0912`: Too many branches (complex orchestrator logic)
-   - `PLR0911`: Too many return statements (complex agent logic)
-   - `PLR2004`: Magic values (statistical constants)
-   - `PLW0603`: Global statement (singleton pattern)
-   - `PLC0415`: Lazy imports for optional dependencies
-
- ### Pre-commit
-
- - Pre-commit hooks run automatically on commit
- - Must pass: lint + typecheck + test-cov
- - Install hooks with: `uv run pre-commit install`
- - Note: `uv sync --all-extras` installs the pre-commit package, but you must run `uv run pre-commit install` separately to set up the git hooks
-
- ## Error Handling & Logging
-
- ### Exception Hierarchy
-
- Use custom exception hierarchy (`src/utils/exceptions.py`):
-
- <!--codeinclude-->
- [Exception Hierarchy](../src/utils/exceptions.py) start_line:4 end_line:31
- <!--/codeinclude-->
-
- ### Error Handling Rules
-
- - Always chain exceptions: `raise SearchError(...) from e`
- - Log errors with context using `structlog`:
-
- ```python
- logger.error("Operation failed", error=str(e), context=value)
- ```
-
- - Never silently swallow exceptions
- - Provide actionable error messages
-
- ### Logging
-
- - Use `structlog` for all logging (NOT `print` or `logging`)
- - Import: `import structlog; logger = structlog.get_logger()`
- - Log with structured data: `logger.info("event", key=value)`
- - Use appropriate levels: DEBUG, INFO, WARNING, ERROR
-
- ### Logging Examples
-
- ```python
- logger.info("Starting search", query=query, tools=[t.name for t in tools])
- logger.warning("Search tool failed", tool=tool.name, error=str(result))
- logger.error("Assessment failed", error=str(e))
- ```
-
- ### Error Chaining
-
- Always preserve exception context:
-
- ```python
- try:
-     result = await api_call()
- except httpx.HTTPError as e:
-     raise SearchError(f"API call failed: {e}") from e
- ```
-
- ## Testing Requirements
-
- ### Test Structure
-
- - Unit tests in `tests/unit/` (mocked, fast)
- - Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- - Use markers: `unit`, `integration`, `slow`
-
- ### Mocking
-
- - Use `respx` for httpx mocking
- - Use `pytest-mock` for general mocking
- - Mock LLM calls in unit tests (use `MockJudgeHandler`)
- - Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`
-
- ### TDD Workflow
-
- 1. Write failing test in `tests/unit/`
- 2. Implement in `src/`
- 3. Ensure test passes
- 4. Run checks: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
-
- ### Test Examples
-
- ```python
- @pytest.mark.unit
- async def test_pubmed_search(mock_httpx_client):
-     tool = PubMedTool()
-     results = await tool.search("metformin", max_results=5)
-     assert len(results) > 0
-     assert all(isinstance(r, Evidence) for r in results)
-
- @pytest.mark.integration
- async def test_real_pubmed_search():
-     tool = PubMedTool()
-     results = await tool.search("metformin", max_results=3)
-     assert len(results) <= 3
- ```
-
- ### Test Coverage
-
- - Run `uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire` for coverage report
- - Run `uv run pytest --cov=src --cov-report=html -p no:logfire` for HTML coverage report (opens `htmlcov/index.html`)
- - Aim for >80% coverage on critical paths
- - Exclude: `__init__.py`, `TYPE_CHECKING` blocks
-
- ## Implementation Patterns
-
- ### Search Tools
-
- All tools implement `SearchTool` protocol (`src/tools/base.py`):
-
- - Must have `name` property
- - Must implement `async def search(query, max_results) -> list[Evidence]`
- - Use `@retry` decorator from tenacity for resilience
- - Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- - Error handling: Raise `SearchError` or `RateLimitError` on failures
-
- Example pattern:
-
- ```python
- class MySearchTool:
-     @property
-     def name(self) -> str:
-         return "mytool"
-
-     @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         # Implementation
-         return evidence_list
- ```
-
- ### Judge Handlers
-
- - Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- - Use pydantic-ai `Agent` with `output_type=JudgeAssessment`
- - System prompts in `src/prompts/judge.py`
- - Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- - Always return valid `JudgeAssessment` (never raise exceptions)
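The "never raise" contract can be sketched with a dependency-free stand-in (the `JudgeAssessment` fields and the sufficiency heuristic are assumptions, not the real model or the real `MockJudgeHandler`):

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

@dataclass
class JudgeAssessment:  # simplified stand-in for the real Pydantic model
    sufficient: bool
    reasoning: str

class JudgeHandlerProtocol(Protocol):
    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment: ...

class MockJudgeHandler:
    """Fallback handler: always returns an assessment, never raises."""

    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        try:
            sufficient = len(evidence) >= 3  # hypothetical heuristic for the sketch
            return JudgeAssessment(sufficient=sufficient, reasoning="mock heuristic")
        except Exception:
            # The contract above: degrade to a conservative default instead of raising.
            return JudgeAssessment(sufficient=False, reasoning="assessment failed")

result = asyncio.run(MockJudgeHandler().assess("Does metformin activate AMPK?", ["e1"]))
assert result.sufficient is False
```

Because the orchestrator drives its loop off the assessment, a judge that raises would abort the whole research run; returning a conservative default keeps the loop alive.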
-
- ### Agent Factory Pattern
-
- - Use factory functions for creating agents (`src/agent_factory/`)
- - Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- - Check requirements before initialization:
-
- <!--codeinclude-->
- [Check Magentic Requirements](../src/utils/llm_factory.py) start_line:152 end_line:170
- <!--/codeinclude-->
-
- ### State Management
-
- - **Magentic Mode**: Use `ContextVar` for thread-safe state (`src/agents/state.py`)
- - **Simple Mode**: Pass state via function parameters
- - Never use global mutable state (except singletons via `@lru_cache`)
-
- ### Singleton Pattern
-
- Use `@lru_cache(maxsize=1)` for singletons:
-
- <!--codeinclude-->
- [Singleton Pattern Example](../src/services/statistical_analyzer.py) start_line:252 end_line:255
- <!--/codeinclude-->
-
- - Lazy initialization to avoid requiring dependencies at import time
-
- ## Code Quality & Documentation
-
- ### Docstrings
-
- - Google-style docstrings for all public functions
- - Include Args, Returns, Raises sections
- - Use type hints in docstrings only if needed for clarity
-
- Example:
-
- <!--codeinclude-->
- [Search Method Docstring Example](../src/tools/pubmed.py) start_line:51 end_line:58
- <!--/codeinclude-->
-
- ### Code Comments
-
- - Explain WHY, not WHAT
- - Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials)
- - Mark critical sections: `# CRITICAL: ...`
- - Document rate limiting rationale
- - Explain async patterns when non-obvious
-
- ## Prompt Engineering & Citation Validation
-
- ### Judge Prompts
-
- - System prompt in `src/prompts/judge.py`
- - Format evidence with truncation (1500 chars per item)
- - Handle empty evidence case separately
- - Always request structured JSON output
- - Use `format_user_prompt()` and `format_empty_evidence_prompt()` helpers
-
- ### Hypothesis Prompts
-
- - Use diverse evidence selection (MMR algorithm)
- - Sentence-aware truncation (`truncate_at_sentence()`)
- - Format: Drug → Target → Pathway → Effect
- - System prompt emphasizes mechanistic reasoning
- - Use `format_hypothesis_prompt()` with embeddings for diversity
-
- ### Report Prompts
-
- - Include full citation details for validation
- - Use diverse evidence selection (n=20)
- - **CRITICAL**: Emphasize citation validation rules
- - Format hypotheses with support/contradiction counts
- - System prompt includes explicit JSON structure requirements
-
- ### Citation Validation
-
- - **ALWAYS** validate references before returning reports
- - Use `validate_references()` from `src/utils/citation_validator.py`
- - Remove hallucinated citations (URLs not in evidence)
- - Log warnings for removed citations
- - Never trust LLM-generated citations without validation
-
- ### Citation Validation Rules
-
- 1. Every reference URL must EXACTLY match a provided evidence URL
- 2. Do NOT invent, fabricate, or hallucinate any references
- 3. Do NOT modify paper titles, authors, dates, or URLs
- 4. If unsure about a citation, OMIT it rather than guess
- 5. Copy URLs exactly as provided - do not create similar-looking URLs
-
- ### Evidence Selection
-
- - Use `select_diverse_evidence()` for MMR-based selection
- - Balance relevance vs diversity (lambda=0.7 default)
- - Sentence-aware truncation preserves meaning
- - Limit evidence per prompt to avoid context overflow
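The MMR selection rule above can be sketched as follows. This is a toy version under stated assumptions: relevance scores and the similarity function are supplied by the caller, whereas the real `select_diverse_evidence()` computes similarity from embeddings:

```python
def select_diverse(candidates: list[str], relevance: dict[str, float],
                   sim, k: int, lam: float = 0.7) -> list[str]:
    """MMR: score = lam * relevance - (1 - lam) * max similarity to already picked."""
    selected: list[str] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr(c: str) -> float:
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(pool, key=mmr)   # greedily take the best relevance/diversity trade-off
        selected.append(best)
        pool.remove(best)
    return selected

# Toy similarity: 1.0 when the first word matches, else 0.0.
sim = lambda a, b: 1.0 if a.split()[0] == b.split()[0] else 0.0
docs = ["drug A trial", "drug A review", "pathway B study"]
rel = {"drug A trial": 0.9, "drug A review": 0.85, "pathway B study": 0.6}
assert select_diverse(docs, rel, sim, k=2) == ["drug A trial", "pathway B study"]
```

With `lam=0.7` the second pick skips the near-duplicate "drug A review" (penalized by its similarity to the first pick) in favor of the less relevant but novel "pathway B study", which is exactly the relevance/diversity balance the bullet list describes.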
432
-
433
- ## MCP Integration
434
-
435
- ### MCP Tools
436
-
437
- - Functions in `src/mcp_tools.py` for Claude Desktop
438
- - Full type hints required
439
- - Google-style docstrings with Args/Returns sections
440
- - Formatted string returns (markdown)
441
-
442
- ### Gradio MCP Server
443
-
444
- - Enable with `mcp_server=True` in `demo.launch()`
445
- - Endpoint: `/gradio_api/mcp/`
446
- - Use `ssr_mode=False` to fix hydration issues in HF Spaces
447
-
- ## Common Pitfalls
-
- 1. **Blocking the event loop**: Never use sync I/O in async functions
- 2. **Missing type hints**: All functions must have complete type annotations
- 3. **Hallucinated citations**: Always validate references
- 4. **Global mutable state**: Use ContextVar or pass via parameters
- 5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
- 6. **Rate limiting**: Always implement for external APIs
- 7. **Error chaining**: Always use `from e` when raising exceptions
-
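Pitfalls 1 and 7 in one runnable sketch: CPU-bound work is offloaded with `run_in_executor()` so the event loop stays responsive, and failures are re-raised with `from e` to preserve the traceback. `SearchError` here is a stand-in for the project's exception class:

```python
import asyncio
import hashlib

class SearchError(Exception):
    """Stand-in for src/utils/exceptions.SearchError."""

def expensive_digest(data: bytes) -> str:
    # CPU-bound: must not run directly inside a coroutine
    return hashlib.sha256(data).hexdigest()

async def fetch_and_digest(data: bytes) -> str:
    loop = asyncio.get_running_loop()
    try:
        # Offload to the default executor so the event loop is not blocked
        return await loop.run_in_executor(None, expensive_digest, data)
    except Exception as e:
        # Chain the exception: the original context survives in __cause__
        raise SearchError("digest failed") from e
```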
- ## Key Principles
-
- 1. **Type Safety First**: All code must pass `mypy --strict`
- 2. **Async Everything**: All I/O must be async
- 3. **Test-Driven**: Write tests before implementation
- 4. **No Hallucinations**: Validate all citations
- 5. **Graceful Degradation**: Support free tier (HF Inference) when no API keys
- 6. **Lazy Loading**: Don't require optional dependencies at import time
- 7. **Structured Logging**: Use structlog, never print()
- 8. **Error Chaining**: Always preserve exception context
-
- ## Pull Request Process
-
- 1. Ensure all checks pass: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
- 2. Update documentation if needed
- 3. Add tests for new features
- 4. Update CHANGELOG if applicable
- 5. Request review from maintainers
- 6. Address review feedback
- 7. Wait for approval before merging
-
- ## Project Structure
-
- - `src/`: Main source code
- - `tests/`: Test files (`unit/` and `integration/`)
- - `docs/`: Documentation source files (MkDocs)
- - `examples/`: Example usage scripts
- - `pyproject.toml`: Project configuration and dependencies
- - `.pre-commit-config.yaml`: Pre-commit hook configuration
-
- ## Questions?
-
- - Open an issue on [GitHub](https://github.com/DeepCritical/GradioDemo)
- - Check existing [documentation](https://deepcritical.github.io/GradioDemo/)
- - Review code examples in the codebase
-
- Thank you for contributing to The DETERMINATOR!
Dockerfile DELETED
@@ -1,52 +0,0 @@
- # Dockerfile for DeepCritical
- FROM python:3.11-slim
-
- # Set working directory
- WORKDIR /app
-
- # Install system dependencies (curl needed for HEALTHCHECK)
- RUN apt-get update && apt-get install -y \
-     git \
-     curl \
-     && rm -rf /var/lib/apt/lists/*
-
- # Install uv
- RUN pip install uv==0.5.4
-
- # Copy project files
- COPY pyproject.toml .
- COPY uv.lock .
- COPY src/ src/
- COPY README.md .
-
- # Install runtime dependencies only (no dev/test tools)
- RUN uv sync --frozen --no-dev --extra embeddings --extra magentic
-
- # Create non-root user BEFORE downloading models
- RUN useradd --create-home --shell /bin/bash appuser
-
- # Set cache directory for HuggingFace models (must be writable by appuser)
- ENV HF_HOME=/app/.cache
- ENV TRANSFORMERS_CACHE=/app/.cache
-
- # Create cache dir with correct ownership
- RUN mkdir -p /app/.cache && chown -R appuser:appuser /app/.cache
-
- # Pre-download the embedding model during build (as appuser to set correct ownership)
- USER appuser
- RUN uv run python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
-
- # Expose port
- EXPOSE 7860
-
- # Health check
- HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-     CMD curl -f http://localhost:7860/ || exit 1
-
- # Set environment variables
- ENV GRADIO_SERVER_NAME=0.0.0.0
- ENV GRADIO_SERVER_PORT=7860
- ENV PYTHONPATH=/app
-
- # Run the app
- CMD ["uv", "run", "python", "-m", "src.app"]
LICENSE.md DELETED
@@ -1,25 +0,0 @@
- # License
-
- DeepCritical is licensed under the MIT License.
-
- ## MIT License
-
- Copyright (c) 2024 DeepCritical Team
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
README.md CHANGED
@@ -1,63 +1,15 @@
  ---
- title: The DETERMINATOR
- emoji: 🐉
- colorFrom: red
- colorTo: yellow
  sdk: gradio
- sdk_version: "6.0.1"
- python_version: "3.11"
  app_file: src/app.py
- hf_oauth: true
- hf_oauth_expiration_minutes: 480
- hf_oauth_scopes:
-   # Required for HuggingFace Inference API (includes all third-party providers)
-   # This scope grants access to:
-   #   - HuggingFace's own Inference API
-   #   - Third-party inference providers (nebius, together, scaleway, hyperbolic, novita, nscale, sambanova, ovh, fireworks, etc.)
-   #   - All models available through the Inference Providers API
-   - inference-api
-   # Optional: Uncomment if you need to access user's billing information
-   # - read-billing
- pinned: true
  license: mit
- tags:
-   - mcp-in-action-track-enterprise
-   - mcp-hackathon
-   - deep-research
-   - biomedical-ai
-   - pydantic-ai
-   - llamaindex
-   - modal
-   - building-mcp-track-enterprise
-   - building-mcp-track-consumer
-   - mcp-in-action-track-consumer
-   - building-mcp-track-modal
-   - building-mcp-track-blaxel
-   - building-mcp-track-llama-index
-   - building-mcp-track-HUGGINGFACE
  ---

- > [!IMPORTANT]
- > **You are reading the Gradio demo README!**
- >
- > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
- > - 📖 **Complete README**: Check out the [GitHub README](.github/README.md) for setup, configuration, and contribution guidelines
- > - ⚠️ **This README is for the Gradio demo only!**

- <div align="center">
-
- [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
- [![Documentation](https://img.shields.io/badge/Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
- [![Demo](https://img.shields.io/badge/Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
- [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
- [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
-
- </div>
-
- # The DETERMINATOR
-
- ## About
-
- The DETERMINATOR is a powerful generalist deep research agent system that stops at nothing until finding precise answers to complex questions. It uses iterative search-and-judge loops to comprehensively investigate any research question from any domain.
 
  ---
+ title: DeepCritical
+ emoji: 📈
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 6.0.0
  app_file: src/app.py
+ pinned: false
  license: mit
+ short_description: Deep Search for Critical Research [BigData] -> [Actionable]
  ---

+ ### DeepCritical
deployments/README.md DELETED
@@ -1,46 +0,0 @@
- # Deployments
-
- This directory contains infrastructure deployment scripts for DeepCritical services.
-
- ## Modal Deployments
-
- ### TTS Service (`modal_tts.py`)
-
- Deploys the Kokoro TTS (Text-to-Speech) function to Modal's GPU infrastructure.
-
- **Deploy:**
- ```bash
- modal deploy deployments/modal_tts.py
- ```
-
- **Features:**
- - Kokoro 82M TTS model
- - GPU-accelerated (T4)
- - Voice options: af_heart, af_bella, am_michael, etc.
- - Configurable speech speed
-
- **Requirements:**
- - Modal account and credentials (`MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET` in `.env`)
- - GPU quota on Modal
-
- **After Deployment:**
- The function will be available at:
- - App: `deepcritical-tts`
- - Function: `kokoro_tts_function`
-
- The main application (`src/services/tts_modal.py`) will call this deployed function.
-
- ---
-
- ## Adding New Deployments
-
- When adding new deployment scripts:
-
- 1. Create a new file: `deployments/<service_name>.py`
- 2. Use Modal's app pattern:
-    ```python
-    import modal
-    app = modal.App("deepcritical-<service-name>")
-    ```
- 3. Document in this README
- 4. Test deployment: `modal deploy deployments/<service_name>.py`
deployments/modal_tts.py DELETED
@@ -1,97 +0,0 @@
- """Deploy Kokoro TTS function to Modal.
-
- This script deploys the TTS function to Modal so it can be called
- from the main DeepCritical application.
-
- Usage:
-     modal deploy deployments/modal_tts.py
-
- After deployment, the function will be available at:
-     App: deepcritical-tts
-     Function: kokoro_tts_function
- """
-
- import modal
- import numpy as np
-
- # Create Modal app
- app = modal.App("deepcritical-tts")
-
- # Define Kokoro TTS dependencies
- KOKORO_DEPENDENCIES = [
-     "torch>=2.0.0",
-     "transformers>=4.30.0",
-     "numpy<2.0",
- ]
-
- # Create Modal image with Kokoro
- tts_image = (
-     modal.Image.debian_slim(python_version="3.11")
-     .apt_install("git")  # Install git first for pip install from GitHub
-     .pip_install(*KOKORO_DEPENDENCIES)
-     .pip_install("git+https://github.com/hexgrad/kokoro.git")
- )
-
-
- @app.function(
-     image=tts_image,
-     gpu="T4",
-     timeout=60,
- )
- def kokoro_tts_function(text: str, voice: str, speed: float) -> tuple[int, np.ndarray]:
-     """Modal GPU function for Kokoro TTS.
-
-     This function runs on Modal's GPU infrastructure.
-     Based on: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
-
-     Args:
-         text: Text to synthesize
-         voice: Voice ID (e.g., af_heart, af_bella, am_michael)
-         speed: Speech speed multiplier (0.5-2.0)
-
-     Returns:
-         Tuple of (sample_rate, audio_array)
-     """
-     import numpy as np
-
-     try:
-         import torch  # noqa: F401 - ensures torch is present before kokoro imports
-         from kokoro import KModel, KPipeline
-
-         # Initialize model (cached on GPU)
-         model = KModel().to("cuda").eval()
-         pipeline = KPipeline(lang_code=voice[0])
-         pack = pipeline.load_voice(voice)
-
-         # Generate audio - accumulate all chunks
-         audio_chunks = []
-         for _, ps, _ in pipeline(text, voice, speed):
-             ref_s = pack[len(ps) - 1]
-             audio = model(ps, ref_s, speed)
-             # Move to CPU before .numpy(): the output tensor lives on the GPU
-             audio_chunks.append(audio.cpu().numpy())
-
-         # Concatenate all audio chunks
-         if audio_chunks:
-             full_audio = np.concatenate(audio_chunks)
-             return (24000, full_audio)
-
-         # If no audio generated, return empty
-         return (24000, np.zeros(1, dtype=np.float32))
-
-     except ImportError as e:
-         raise RuntimeError(
-             f"Kokoro not installed: {e}. "
-             "Install with: pip install git+https://github.com/hexgrad/kokoro.git"
-         ) from e
-     except Exception as e:
-         raise RuntimeError(f"TTS synthesis failed: {e}") from e
-
-
- # Optional: Add a test entrypoint
- @app.local_entrypoint()
- def test():
-     """Test the TTS function."""
-     print("Testing Modal TTS function...")
-     sample_rate, audio = kokoro_tts_function.remote("Hello, this is a test.", "af_heart", 1.0)
-     print(f"Generated audio: {sample_rate}Hz, shape={audio.shape}")
-     print("✓ TTS function works!")
dev/.cursorrules DELETED
@@ -1,241 +0,0 @@
- # DeepCritical Project - Cursor Rules
-
- ## Project-Wide Rules
-
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
-
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
-
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
-
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
-
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
-
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
-
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
-
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
-
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
-
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
-
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
-
- ---
-
- ## src/agents/ - Agent Implementation Rules
-
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
-
- **Agent Structure**:
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- - Agent class with `__init__(model: Any | None = None)`
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
-
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
-
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
-
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
-
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
-
- **Agent-Specific Rules**:
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- - `thinking.py`: Returns observation string from conversation history.
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
-
- ---
-
- ## src/tools/ - Search Tool Rules
-
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
-
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
-
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
-
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
-
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
-
- **Tool-Specific Rules**:
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
-
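The shared-limiter idea behind `_rate_limit()` can be sketched with the standard library alone (the project's `src/tools/rate_limiter.py` may differ in detail); an interval of 0.34s keeps PubMed traffic under NCBI's 3-requests-per-second limit:

```python
import asyncio
import time

class RateLimiter:
    """Minimal async rate limiter: at most one request per `interval` seconds."""

    def __init__(self, interval: float) -> None:
        self._interval = interval
        self._last = 0.0
        self._lock = asyncio.Lock()  # serializes concurrent callers

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._interval - (now - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()
```

A tool would call `await limiter.wait()` immediately before each HTTP request, sharing one limiter instance per upstream API.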
- ---
-
- ## src/middleware/ - Middleware Rules
-
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
-
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
-
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
-
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
-
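The ~4-characters-per-token heuristic used by `BudgetTracker` can be sketched as follows (illustrative, not the project's exact code):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    """Estimate total tokens consumed by one LLM round-trip."""
    return estimate_tokens(prompt) + estimate_tokens(response)
```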
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
-
- ---
-
- ## src/orchestrator/ - Orchestration Rules
-
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
-
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
-
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
-
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
-
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
-
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
-
- ---
-
- ## src/services/ - Service Rules
-
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
-
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
-
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
-
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
-
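The singleton pattern above as a runnable sketch; the `EmbeddingService` body is a stand-in for the real service, which wraps sentence-transformers:

```python
from functools import lru_cache

class EmbeddingService:
    """Illustrative stand-in for src/services/embeddings.EmbeddingService."""

    def __init__(self) -> None:
        # Heavy initialization (model load, DB connection) happens exactly once
        self.ready = True

@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    # Lazy singleton: nothing is constructed until the first call,
    # so optional dependencies are not required at import time
    return EmbeddingService()
```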
- ---
-
- ## src/utils/ - Utility Rules
-
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
-
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
-
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
-
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
-
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
-
- ---
-
- ## src/orchestrator_factory.py Rules
-
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
-
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
-
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
-
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
-
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
-
- ---
-
- ## src/orchestrator_hierarchical.py Rules
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
-
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
-
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
-
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
-
- ---
-
- ## src/orchestrator_magentic.py Rules
-
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
-
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
-
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
-
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
-
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
-
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
-
- ---
-
- ## src/agent_factory/ - Factory Rules
-
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
-
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
-
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
-
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
-
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
-
- ---
-
- ## src/prompts/ - Prompt Rules
-
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
-
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
-
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
-
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
-
- ---
-
- ## Testing Rules
-
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
-
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
-
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
-
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
-
- ---
-
- ## File-Specific Agent Rules
-
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
-
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
-
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
-
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
-
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
-
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
-
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
-
 
dev/AGENTS.txt DELETED
@@ -1,236 +0,0 @@
- # DeepCritical Project - Rules
-
- ## Project-Wide Rules
-
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
-
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
-
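The `TYPE_CHECKING` idiom quoted in that rule expands to roughly the following sketch (the `src.services.embeddings` path is the one quoted above; the function is illustrative):

```python
from __future__ import annotations  # annotations stay strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by mypy/IDEs, never executed, so a circular or
    # heavyweight import costs nothing at runtime.
    from src.services.embeddings import EmbeddingService


def count_texts(service: EmbeddingService, texts: list[str]) -> int:
    """Sketch: annotate with the lazily-imported type."""
    return len(texts)


# Runs fine even though src.services.embeddings is never imported here.
n = count_texts(None, ["alpha", "beta"])  # type: ignore[arg-type]
```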
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
-
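The `run_in_executor()` snippet in that rule, made runnable (the CPU-bound function is a stand-in):

```python
import asyncio


def cpu_bound_function(n: int) -> int:
    # Stand-in for CPU-heavy work (parsing, scoring, embedding math).
    return sum(i * i for i in range(n))


async def main() -> int:
    loop = asyncio.get_running_loop()
    # Off-load CPU-bound work to the default thread pool so the
    # event loop stays free to schedule other coroutines.
    return await loop.run_in_executor(None, cpu_bound_function, 1_000)


result = asyncio.run(main())
```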
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
-
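The `raise ... from e` chaining rule can be sketched as follows; the exception names mirror the hierarchy named above, while `fetch` and `search` are hypothetical:

```python
class DeepCriticalError(Exception):
    """Base class, mirroring the hierarchy described above."""


class SearchError(DeepCriticalError):
    """Raised when a search backend fails."""


def fetch(url: str) -> str:
    raise TimeoutError(f"timed out fetching {url}")


def search(url: str) -> str:
    try:
        return fetch(url)
    except TimeoutError as e:
        # `from e` stores the original error as __cause__, so the
        # traceback shows both the backend failure and the wrapper.
        raise SearchError(f"search failed for {url}") from e


cause = None
try:
    search("https://example.org")
except SearchError as err:
    cause = err.__cause__
```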
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
-
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
-
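A stdlib stand-in for the frozen-model rule (the real code uses Pydantic's `model_config = {"frozen": True}` and `Field(..., ge=0.0, le=1.0)`; a frozen dataclass emulates the same immutability and constraint checking, and the field names are assumptions):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """Immutable record; __post_init__ emulates ge=/le= constraints."""

    content: str
    relevance: float = 0.5
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Mirrors Field(ge=0.0, le=1.0) on the relevance score.
        if not 0.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must be in [0.0, 1.0]")


ev = Evidence(content="finding", relevance=0.9)

bad = None
try:
    Evidence(content="x", relevance=2.0)  # out of range -> rejected
except ValueError as exc:
    bad = exc
```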
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
-
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
-
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
-
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
-
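The `ContextVar` isolation rule can be sketched like this; the state dict and function names only approximate the real `WorkflowState` API:

```python
from contextvars import ContextVar, copy_context

_workflow_state = ContextVar("workflow_state", default=None)


def get_workflow_state() -> dict:
    # Auto-initializes on first access, like the real get_workflow_state().
    state = _workflow_state.get()
    if state is None:
        state = {"evidence": []}
        _workflow_state.set(state)
    return state


def add_evidence(url: str) -> int:
    state = get_workflow_state()
    if url not in state["evidence"]:  # deduplicate by URL
        state["evidence"].append(url)
    return len(state["evidence"])


# Each copied context gets its own state: no cross-task leakage.
n_a = copy_context().run(add_evidence, "https://example.org/a")
n_b = copy_context().run(add_evidence, "https://example.org/b")
```

Because each `copy_context()` run is isolated, both calls see an empty state and return 1, which is exactly the thread-safety property the rule asks for.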
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
-
- ---
-
- ## src/agents/ - Agent Implementation Rules
-
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
-
- **Agent Structure**:
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- - Agent class with `__init__(model: Any | None = None)`
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
-
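The four structural elements above can be sketched in one stdlib-only skeleton. The real agents wrap a Pydantic AI `Agent`; here the model is a plain callable, and the prompt text and fallback values are illustrative:

```python
import asyncio
from dataclasses import dataclass, field
from datetime import datetime

# 1. Date-injected system prompt as a module-level constant.
SYSTEM_PROMPT = f"You evaluate research gaps. Today is {datetime.now().strftime('%Y-%m-%d')}."


@dataclass
class KnowledgeGapOutput:
    research_complete: bool
    outstanding_gaps: list = field(default_factory=list)


class KnowledgeGapAgent:
    # 2. Agent class taking an optional model.
    def __init__(self, model=None):
        self.model = model or (lambda prompt: KnowledgeGapOutput(False, ["no evidence yet"]))

    # 3. Main async method with a fallback on failure.
    async def evaluate(self, query: str) -> KnowledgeGapOutput:
        try:
            return self.model(f"{SYSTEM_PROMPT}\n\n{query}")
        except Exception:
            return KnowledgeGapOutput(research_complete=False, outstanding_gaps=["evaluation failed"])


# 4. Factory function.
def create_knowledge_gap_agent(model=None) -> KnowledgeGapAgent:
    return KnowledgeGapAgent(model)


out = asyncio.run(create_knowledge_gap_agent().evaluate("Does drug X lower HbA1c?"))
```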
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
-
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
-
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
-
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
-
- **Agent-Specific Rules**:
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- - `thinking.py`: Returns observation string from conversation history.
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
-
- ---
-
- ## src/tools/ - Search Tool Rules
-
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
-
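The `SearchTool` protocol described above can be sketched with `typing.Protocol`; the dummy tool and its return values are invented for illustration:

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class SearchTool(Protocol):
    """Sketch of the protocol in src/tools/base.py."""

    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list: ...


class DummyTool:
    # Satisfies the protocol structurally: no inheritance needed.
    @property
    def name(self) -> str:
        return "dummy"

    async def search(self, query: str, max_results: int) -> list:
        return [f"{self.name}: {query}"][:max_results]


tool = DummyTool()
hits = asyncio.run(tool.search("metformin", 5))
is_tool = isinstance(tool, SearchTool)  # runtime_checkable allows this
```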
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
-
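What tenacity's `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))` does can be approximated stdlib-only; this minimal stand-in is not tenacity's API, just the same retry-with-exponential-backoff behavior:

```python
import functools
import time


def retry(stop_after: int = 3, base_wait: float = 0.01):
    """Minimal stand-in for tenacity's stop/wait combination."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, stop_after + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == stop_after:
                        raise  # budget exhausted: re-raise the last error
                    # Exponential backoff: base * 2^(attempt - 1).
                    time.sleep(base_wait * 2 ** (attempt - 1))

        return wrapper

    return decorator


calls = {"n": 0}


@retry(stop_after=3)
def flaky_search() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("HTTP 429")  # simulated rate limit
    return "ok"


result = flaky_search()
```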
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
-
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
-
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
-
- **Tool-Specific Rules**:
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
-
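The `search_handler.py` pattern above (parallel tools, one failure must not sink the rest) can be sketched as follows; the tool names are real, but the stub coroutine and result strings are invented:

```python
import asyncio


async def search_tool(name: str, fail: bool = False) -> list:
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return [f"{name}-hit"]


async def run_all() -> list:
    results = await asyncio.gather(
        search_tool("pubmed"),
        search_tool("clinicaltrials", fail=True),
        search_tool("europepmc"),
        return_exceptions=True,  # failures come back as values, not raises
    )
    evidence = []
    for r in results:
        if isinstance(r, Exception):
            continue  # the real handler logs a warning here
        evidence.extend(r)
    return evidence


evidence = asyncio.run(run_all())
```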
- ---
-
- ## src/middleware/ - Middleware Rules
-
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
-
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
-
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
-
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
-
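The ~4-chars-per-token heuristic and the `BudgetTracker` accounting can be sketched as follows; only the named methods and the heuristic come from the rule above, the rest (the `max_tokens` field, the floor of 1) is assumed:

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from the rule above: roughly 4 characters per token.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)


class BudgetTracker:
    """Token-budget slice of the tracker described above."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def add_tokens(self, n: int) -> None:
        self.used += n

    def can_continue(self) -> bool:
        return self.used < self.max_tokens


tracker = BudgetTracker(max_tokens=100)
# 200 chars of prompt (~50 tokens) + 100 chars of response (~25 tokens).
tracker.add_tokens(estimate_llm_call_tokens("q" * 200, "a" * 100))
```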
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
-
- ---
-
- ## src/orchestrator/ - Orchestration Rules
-
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
-
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
-
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
-
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
-
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
-
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
-
- ---
-
- ## src/services/ - Service Rules
-
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
-
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
-
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
-
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
-
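The `@lru_cache(maxsize=1)` singleton pattern above, expanded into a runnable sketch (the service class is a placeholder):

```python
from functools import lru_cache


class EmbeddingService:
    """Placeholder service; construction is deliberately deferred."""

    def __init__(self) -> None:
        self.ready = True


@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    # Constructed on first call only; every later call returns the
    # cached instance, so heavy dependencies load lazily and exactly once.
    return EmbeddingService()


a = get_embedding_service()
b = get_embedding_service()
```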
- ---
-
- ## src/utils/ - Utility Rules
-
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
-
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
-
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
-
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
-
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
-
- ---
-
- ## src/orchestrator_factory.py Rules
-
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
-
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
-
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
-
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
-
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
-
- ---
-
- ## src/orchestrator_hierarchical.py Rules
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
-
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
-
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
-
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
-
- ---
-
- ## src/orchestrator_magentic.py Rules
-
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
-
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
-
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
-
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
-
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
-
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
-
- ---
-
- ## src/agent_factory/ - Factory Rules
-
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
-
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
-
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
-
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
-
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
-
- ---
-
- ## src/prompts/ - Prompt Rules
-
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
-
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
-
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
-
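One way the MMR-style diverse evidence selection named above can work, sketched with a toy word-overlap similarity (the real implementation presumably scores with embeddings; the λ value, similarity function, and example documents are all assumptions):

```python
def similarity(a: str, b: str) -> float:
    # Toy Jaccard similarity over words; a stand-in for cosine
    # similarity between embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def mmr_select(query: str, docs: list, k: int, lam: float = 0.3) -> list:
    """Greedy Maximal Marginal Relevance selection."""
    selected = []
    candidates = list(docs)
    while candidates and len(selected) < k:

        def score(d: str) -> float:
            rel = similarity(query, d)
            red = max((similarity(d, s) for s in selected), default=0.0)
            # Trade relevance off against redundancy with picks so far.
            return lam * rel - (1 - lam) * red

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


picked = mmr_select(
    "metformin dosage",
    ["metformin dosage trial", "metformin dosage study", "aspirin safety review"],
    k=2,
)
```

With a low λ the second pick favors the dissimilar document over a near-duplicate of the first, which is the diversity property the prompt rule relies on.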
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
-
- ---
-
- ## Testing Rules
-
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
-
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
-
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
-
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
-
- ---
-
- ## File-Specific Agent Rules
-
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
-
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
-
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
-
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
-
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
-
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
-
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
-
 
dev/docs_plugins.py DELETED
@@ -1,74 +0,0 @@
- """Custom MkDocs extension to handle code anchor format: ```start:end:filepath"""
-
- import re
- from pathlib import Path
-
- from markdown import Markdown
- from markdown.extensions import Extension
- from markdown.preprocessors import Preprocessor
-
-
- class CodeAnchorPreprocessor(Preprocessor):
-     """Preprocess code blocks with anchor format: ```start:end:filepath"""
-
-     def __init__(self, md: Markdown, base_path: Path):
-         super().__init__(md)
-         self.base_path = base_path
-         self.pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)
-
-     def run(self, lines: list[str]) -> list[str]:
-         """Process lines and convert code anchor format to standard code blocks."""
-         text = "\n".join(lines)
-         new_text = self.pattern.sub(self._replace_code_anchor, text)
-         return new_text.split("\n")
-
-     def _replace_code_anchor(self, match) -> str:
-         """Replace code anchor format with standard code block + link."""
-         start_line = int(match.group(1))
-         end_line = int(match.group(2))
-         file_path = match.group(3).strip()
-         existing_code = match.group(4)
-
-         # Determine language from file extension
-         ext = Path(file_path).suffix.lower()
-         lang_map = {
-             ".py": "python",
-             ".js": "javascript",
-             ".ts": "typescript",
-             ".md": "markdown",
-             ".yaml": "yaml",
-             ".yml": "yaml",
-             ".toml": "toml",
-             ".json": "json",
-             ".html": "html",
-             ".css": "css",
-             ".sh": "bash",
-         }
-         language = lang_map.get(ext, "python")
-
-         # Generate GitHub link
-         repo_url = "https://github.com/DeepCritical/GradioDemo"
-         github_link = f"{repo_url}/blob/main/{file_path}#L{start_line}-L{end_line}"
-
-         # Return standard code block with source link
-         return (
-             f'[View source: `{file_path}` (lines {start_line}-{end_line})]({github_link}){{: target="_blank" }}\n\n'
-             f"```{language}\n{existing_code}\n```"
-         )
-
-
- class CodeAnchorExtension(Extension):
-     """Markdown extension for code anchors."""
-
-     def __init__(self, base_path: str = ".", **kwargs):
-         super().__init__(**kwargs)
-         self.base_path = Path(base_path)
-
-     def extendMarkdown(self, md: Markdown):  # noqa: N802
-         """Register the preprocessor."""
-         md.preprocessors.register(CodeAnchorPreprocessor(md, self.base_path), "codeanchor", 25)
-
-
- def makeExtension(**kwargs):  # noqa: N802
-     """Create the extension."""
-     return CodeAnchorExtension(**kwargs)
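The anchor syntax this deleted preprocessor rewrote can be illustrated stdlib-only with the same regex (the sample document string is invented; running the full extension would require the `markdown` package):

```python
import re

# Same pattern as CodeAnchorPreprocessor: ```start:end:filepath ... ```
pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)

doc = "```10:20:src/utils/models.py\nclass Evidence: ...\n```"

match = pattern.search(doc)
start, end, path = int(match.group(1)), int(match.group(2)), match.group(3)
body = match.group(4)  # the code the anchor wraps
```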
 
docs/LICENSE.md DELETED
@@ -1,35 +0,0 @@
- # License
-
- DeepCritical is licensed under the MIT License.
-
- ## MIT License
-
- Copyright (c) 2024 DeepCritical Team
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
-
 
docs/api/agents.md DELETED
@@ -1,211 +0,0 @@
- # Agents API Reference
-
- This page documents the API for DeepCritical agents.
-
- ## KnowledgeGapAgent
-
- **Module**: `src.agents.knowledge_gap`
-
- **Purpose**: Evaluates research state and identifies knowledge gaps.
-
- ### Methods
-
- #### `evaluate`
-
- <!--codeinclude-->
- [KnowledgeGapAgent.evaluate](../src/agents/knowledge_gap.py) start_line:66 end_line:74
- <!--/codeinclude-->
-
- Evaluates research completeness and identifies outstanding knowledge gaps.
-
- **Parameters**:
- - `query`: Research query string
- - `background_context`: Background context for the query (default: "")
- - `conversation_history`: History of actions, findings, and thoughts as string (default: "")
- - `iteration`: Current iteration number (default: 0)
- - `time_elapsed_minutes`: Elapsed time in minutes (default: 0.0)
- - `max_time_minutes`: Maximum time limit in minutes (default: 10)
-
- **Returns**: `KnowledgeGapOutput` with:
- - `research_complete`: Boolean indicating if research is complete
- - `outstanding_gaps`: List of remaining knowledge gaps
-
- ## ToolSelectorAgent
-
- **Module**: `src.agents.tool_selector`
-
- **Purpose**: Selects appropriate tools for addressing knowledge gaps.
-
- ### Methods
-
- #### `select_tools`
-
- <!--codeinclude-->
- [ToolSelectorAgent.select_tools](../src/agents/tool_selector.py) start_line:78 end_line:84
- <!--/codeinclude-->
-
- Selects tools for addressing a knowledge gap.
-
- **Parameters**:
- - `gap`: The knowledge gap to address
- - `query`: Research query string
- - `background_context`: Optional background context (default: "")
- - `conversation_history`: History of actions, findings, and thoughts as string (default: "")
-
- **Returns**: `AgentSelectionPlan` with list of `AgentTask` objects.
-
- ## WriterAgent
-
- **Module**: `src.agents.writer`
-
- **Purpose**: Generates final reports from research findings.
-
- ### Methods
-
- #### `write_report`
-
- <!--codeinclude-->
- [WriterAgent.write_report](../src/agents/writer.py) start_line:67 end_line:73
- <!--/codeinclude-->
-
- Generates a markdown report from research findings.
-
- **Parameters**:
- - `query`: Research query string
- - `findings`: Research findings to include in report
- - `output_length`: Optional description of desired output length (default: "")
- - `output_instructions`: Optional additional instructions for report generation (default: "")
-
- **Returns**: Markdown string with numbered citations.
-
- ## LongWriterAgent
-
- **Module**: `src.agents.long_writer`
-
- **Purpose**: Long-form report generation with section-by-section writing.
-
- ### Methods
-
- #### `write_next_section`
-
- <!--codeinclude-->
- [LongWriterAgent.write_next_section](../src/agents/long_writer.py) start_line:94 end_line:100
- <!--/codeinclude-->
-
- Writes the next section of a long-form report.
-
- **Parameters**:
- - `original_query`: The original research query
- - `report_draft`: Current report draft as string (all sections written so far)
- - `next_section_title`: Title of the section to write
- - `next_section_draft`: Draft content for the next section
-
- **Returns**: `LongWriterOutput` with formatted section and references.
-
- #### `write_report`
-
- <!--codeinclude-->
- [LongWriterAgent.write_report](../src/agents/long_writer.py) start_line:263 end_line:268
- <!--/codeinclude-->
-
- Generates final report from draft.
-
- **Parameters**:
- - `query`: Research query string
- - `report_title`: Title of the report
- - `report_draft`: Complete report draft
-
- **Returns**: Final markdown report string.
-
- ## ProofreaderAgent
-
- **Module**: `src.agents.proofreader`
-
- **Purpose**: Proofreads and polishes report drafts.
-
- ### Methods
-
- #### `proofread`
-
- <!--codeinclude-->
- [ProofreaderAgent.proofread](../src/agents/proofreader.py) start_line:72 end_line:76
- <!--/codeinclude-->
-
- Proofreads and polishes a report draft.
-
- **Parameters**:
- - `query`: Research query string
- - `report_title`: Title of the report
- - `report_draft`: Report draft to proofread
-
- **Returns**: Polished markdown string.
-
- ## ThinkingAgent
-
- **Module**: `src.agents.thinking`
-
- **Purpose**: Generates observations from conversation history.
-
- ### Methods
-
- #### `generate_observations`
-
- <!--codeinclude-->
- [ThinkingAgent.generate_observations](../src/agents/thinking.py) start_line:70 end_line:76
- <!--/codeinclude-->
-
- Generates observations from conversation history.
-
- **Parameters**:
- - `query`: Research query string
- - `background_context`: Optional background context (default: "")
- - `conversation_history`: History of actions, findings, and thoughts as string (default: "")
- - `iteration`: Current iteration number (default: 1)
-
- **Returns**: Observation string.
-
- ## InputParserAgent
-
- **Module**: `src.agents.input_parser`
-
- **Purpose**: Parses and improves user queries, detects research mode.
-
- ### Methods
-
- #### `parse`
-
- <!--codeinclude-->
- [InputParserAgent.parse](../src/agents/input_parser.py) start_line:82 end_line:82
- <!--/codeinclude-->
-
- Parses and improves a user query.
-
- **Parameters**:
- - `query`: Original query string
-
- **Returns**: `ParsedQuery` with:
- - `original_query`: Original query string
- - `improved_query`: Refined query string
- - `research_mode`: "iterative" or "deep"
- - `key_entities`: List of key entities
- - `research_questions`: List of research questions
-
- ## Factory Functions
-
- All agents have factory functions in `src.agent_factory.agents`:
-
- <!--codeinclude-->
- [Factory Functions](../src/agent_factory/agents.py) start_line:30 end_line:50
- <!--/codeinclude-->
-
- **Parameters**:
- - `model`: Optional Pydantic AI model. If None, uses `get_model()` from settings.
- - `oauth_token`: Optional OAuth token from HuggingFace login (takes priority over env vars)
-
- **Returns**: Agent instance.
-
- ## See Also
-
- - [Architecture - Agents](../architecture/agents.md) - Architecture overview
- - [Models API](models.md) - Data models used by agents
 
docs/api/models.md DELETED
@@ -1,191 +0,0 @@
- # Models API Reference
-
- This page documents the Pydantic models used throughout DeepCritical.
-
- ## Evidence
-
- **Module**: `src.utils.models`
-
- **Purpose**: Represents evidence from search results.
-
- <!--codeinclude-->
- [Evidence Model](../src/utils/models.py) start_line:33 end_line:44
- <!--/codeinclude-->
-
- **Fields**:
- - `citation`: Citation information (title, URL, date, authors)
- - `content`: Evidence text content
- - `relevance`: Relevance score (0.0-1.0)
- - `metadata`: Additional metadata dictionary
-
- ## Citation
-
- **Module**: `src.utils.models`
-
- **Purpose**: Citation information for evidence.
-
- <!--codeinclude-->
- [Citation Model](../src/utils/models.py) start_line:12 end_line:30
- <!--/codeinclude-->
-
- **Fields**:
- - `source`: Source name (e.g., "pubmed", "clinicaltrials", "europepmc", "web", "rag")
- - `title`: Article/trial title
- - `url`: Source URL
- - `date`: Publication date (YYYY-MM-DD or "Unknown")
- - `authors`: List of authors (optional)
-
- ## KnowledgeGapOutput
-
- **Module**: `src.utils.models`
-
- **Purpose**: Output from knowledge gap evaluation.
-
- <!--codeinclude-->
- [KnowledgeGapOutput Model](../src/utils/models.py) start_line:494 end_line:504
- <!--/codeinclude-->
-
- **Fields**:
- - `research_complete`: Boolean indicating if research is complete
- - `outstanding_gaps`: List of remaining knowledge gaps
-
- ## AgentSelectionPlan
-
- **Module**: `src.utils.models`
-
- **Purpose**: Plan for tool/agent selection.
-
- <!--codeinclude-->
- [AgentSelectionPlan Model](../src/utils/models.py) start_line:521 end_line:526
- <!--/codeinclude-->
-
- **Fields**:
- - `tasks`: List of agent tasks to execute
-
- ## AgentTask
-
- **Module**: `src.utils.models`
-
- **Purpose**: Individual agent task.
-
- <!--codeinclude-->
- [AgentTask Model](../src/utils/models.py) start_line:507 end_line:518
- <!--/codeinclude-->
-
- **Fields**:
- - `gap`: The knowledge gap being addressed (optional)
- - `agent`: Name of agent to use
- - `query`: The specific query for the agent
- - `entity_website`: The website of the entity being researched, if known (optional)
-
- ## ReportDraft
-
- **Module**: `src.utils.models`
-
- **Purpose**: Draft structure for long-form reports.
-
- <!--codeinclude-->
- [ReportDraft Model](../src/utils/models.py) start_line:538 end_line:545
- <!--/codeinclude-->
-
- **Fields**:
- - `sections`: List of report sections
-
- ## ReportSection
-
- **Module**: `src.utils.models`
-
- **Purpose**: Individual section in a report draft.
-
- <!--codeinclude-->
- [ReportDraftSection Model](../src/utils/models.py) start_line:529 end_line:535
- <!--/codeinclude-->
-
- **Fields**:
- - `section_title`: The title of the section
- - `section_content`: The content of the section
-
- ## ParsedQuery
-
- **Module**: `src.utils.models`
-
- **Purpose**: Parsed and improved query.
-
- <!--codeinclude-->
- [ParsedQuery Model](../src/utils/models.py) start_line:557 end_line:572
- <!--/codeinclude-->
-
- **Fields**:
- - `original_query`: Original query string
- - `improved_query`: Refined query string
- - `research_mode`: Research mode ("iterative" or "deep")
- - `key_entities`: List of key entities
- - `research_questions`: List of research questions
-
- ## Conversation
-
- **Module**: `src.utils.models`
-
- **Purpose**: Conversation history with iterations.
-
- <!--codeinclude-->
- [Conversation Model](../src/utils/models.py) start_line:331 end_line:337
- <!--/codeinclude-->
-
- **Fields**:
- - `history`: List of iteration data
-
- ## IterationData
-
- **Module**: `src.utils.models`
-
- **Purpose**: Data for a single iteration.
-
- <!--codeinclude-->
- [IterationData Model](../src/utils/models.py) start_line:315 end_line:328
- <!--/codeinclude-->
-
- **Fields**:
- - `gap`: The gap addressed in the iteration
- - `tool_calls`: The tool calls made
- - `findings`: The findings collected from tool calls
- - `thought`: The thinking done to reflect on the success of the iteration and next steps
-
- ## AgentEvent
-
- **Module**: `src.utils.models`
-
- **Purpose**: Event emitted during research execution.
-
- <!--codeinclude-->
- [AgentEvent Model](../src/utils/models.py) start_line:104 end_line:125
- <!--/codeinclude-->
-
- **Fields**:
- - `type`: Event type (e.g., "started", "search_complete", "complete")
- - `iteration`: Iteration number (optional)
- - `data`: Event data dictionary
-
- ## BudgetStatus
-
- **Module**: `src.middleware.budget_tracker`
-
- **Purpose**: Current budget status.
-
- <!--codeinclude-->
- [BudgetStatus Model](../src/middleware/budget_tracker.py) start_line:15 end_line:25
- <!--/codeinclude-->
-
- **Fields**:
- - `tokens_used`: Total tokens used
- - `tokens_limit`: Token budget limit
- - `time_elapsed_seconds`: Time elapsed in seconds
- - `time_limit_seconds`: Time budget limit (default: 600.0 seconds / 10 minutes)
- - `iterations`: Number of iterations completed
- - `iterations_limit`: Maximum iterations (default: 10)
- - `iteration_tokens`: Tokens used per iteration (iteration number -> token count)
-
- ## See Also
-
- - [Architecture - Agents](../architecture/agents.md) - How models are used
- - [Configuration](../configuration/index.md) - Model configuration
docs/api/orchestrators.md DELETED
@@ -1,149 +0,0 @@
- # Orchestrators API Reference
-
- This page documents the API for DeepCritical orchestrators.
-
- ## IterativeResearchFlow
-
- **Module**: `src.orchestrator.research_flow`
-
- **Purpose**: Single-loop research with search-judge-synthesize cycles.
-
- ### Methods
-
- #### `run`
-
- <!--codeinclude-->
- [IterativeResearchFlow.run](../src/orchestrator/research_flow.py) start_line:134 end_line:140
- <!--/codeinclude-->
-
- Runs iterative research flow.
-
- **Parameters**:
- - `query`: Research query string
- - `background_context`: Background context (default: "")
- - `output_length`: Optional description of desired output length (default: "")
- - `output_instructions`: Optional additional instructions for report generation (default: "")
- - `message_history`: Optional user conversation history in Pydantic AI `ModelMessage` format (default: None)
-
- **Returns**: Final report string.
-
- **Note**: The `message_history` parameter enables multi-turn conversations by providing context from previous interactions.
-
- **Note**: `max_iterations`, `max_time_minutes`, and `token_budget` are constructor parameters, not `run()` parameters.
-
- ## DeepResearchFlow
-
- **Module**: `src.orchestrator.research_flow`
-
- **Purpose**: Multi-section parallel research with planning and synthesis.
-
- ### Methods
-
- #### `run`
-
- <!--codeinclude-->
- [DeepResearchFlow.run](../src/orchestrator/research_flow.py) start_line:778 end_line:778
- <!--/codeinclude-->
-
- Runs deep research flow.
-
- **Parameters**:
- - `query`: Research query string
- - `message_history`: Optional user conversation history in Pydantic AI `ModelMessage` format (default: None)
-
- **Returns**: Final report string.
-
- **Note**: The `message_history` parameter enables multi-turn conversations by providing context from previous interactions.
-
- **Note**: `max_iterations_per_section`, `max_time_minutes`, and `token_budget` are constructor parameters, not `run()` parameters.
-
- ## GraphOrchestrator
-
- **Module**: `src.orchestrator.graph_orchestrator`
-
- **Purpose**: Graph-based execution using Pydantic AI agents as nodes.
-
- ### Methods
-
- #### `run`
-
- <!--codeinclude-->
- [GraphOrchestrator.run](../src/orchestrator/graph_orchestrator.py) start_line:177 end_line:177
- <!--/codeinclude-->
-
- Runs graph-based research orchestration.
-
- **Parameters**:
- - `query`: Research query string
- - `message_history`: Optional user conversation history in Pydantic AI `ModelMessage` format (default: None)
-
- **Yields**: `AgentEvent` objects during graph execution.
-
- **Note**:
- - `research_mode` and `use_graph` are constructor parameters, not `run()` parameters.
- - The `message_history` parameter enables multi-turn conversations by providing context from previous interactions. Message history is stored in `GraphExecutionContext` and passed to agents during execution.
-
- ## Orchestrator Factory
-
- **Module**: `src.orchestrator_factory`
-
- **Purpose**: Factory for creating orchestrators.
-
- ### Functions
-
- #### `create_orchestrator`
-
- <!--codeinclude-->
- [create_orchestrator](../src/orchestrator_factory.py) start_line:44 end_line:50
- <!--/codeinclude-->
-
- Creates an orchestrator instance.
-
- **Parameters**:
- - `search_handler`: Search handler protocol implementation (optional, required for simple mode)
- - `judge_handler`: Judge handler protocol implementation (optional, required for simple mode)
- - `config`: Configuration object (optional)
- - `mode`: Orchestrator mode ("simple", "advanced", "magentic", "iterative", "deep", "auto", or None for auto-detect)
- - `oauth_token`: Optional OAuth token from HuggingFace login (takes priority over env vars)
-
- **Returns**: Orchestrator instance.
-
- **Raises**:
- - `ValueError`: If requirements not met
-
- **Modes**:
- - `"simple"`: Legacy orchestrator
- - `"advanced"` or `"magentic"`: Magentic orchestrator (requires OpenAI API key)
- - `None`: Auto-detect based on API key availability
-
- ## MagenticOrchestrator
-
- **Module**: `src.orchestrator_magentic`
-
- **Purpose**: Multi-agent coordination using Microsoft Agent Framework.
-
- ### Methods
-
- #### `run`
-
- <!--codeinclude-->
- [MagenticOrchestrator.run](../src/orchestrator_magentic.py) start_line:101 end_line:101
- <!--/codeinclude-->
-
- Runs Magentic orchestration.
-
- **Parameters**:
- - `query`: Research query string
-
- **Yields**: `AgentEvent` objects converted from Magentic events.
-
- **Note**: `max_rounds` and `max_stalls` are constructor parameters, not `run()` parameters.
-
- **Requirements**:
- - `agent-framework-core` package
- - OpenAI API key
-
- ## See Also
-
- - [Architecture - Orchestrators](../architecture/orchestrators.md) - Architecture overview
- - [Graph Orchestration](../architecture/graph_orchestration.md) - Graph execution details
docs/api/services.md DELETED
@@ -1,279 +0,0 @@
- # Services API Reference
-
- This page documents the API for DeepCritical services.
-
- ## EmbeddingService
-
- **Module**: `src.services.embeddings`
-
- **Purpose**: Local sentence-transformers for semantic search and deduplication.
-
- ### Methods
-
- #### `embed`
-
- <!--codeinclude-->
- [EmbeddingService.embed](../src/services/embeddings.py) start_line:55 end_line:55
- <!--/codeinclude-->
-
- Generates embedding for a text string.
-
- **Parameters**:
- - `text`: Text to embed
-
- **Returns**: Embedding vector as list of floats.
-
- #### `embed_batch`
-
- ```python
- async def embed_batch(self, texts: list[str]) -> list[list[float]]
- ```
-
- Generates embeddings for multiple texts.
-
- **Parameters**:
- - `texts`: List of texts to embed
-
- **Returns**: List of embedding vectors.
-
- #### `similarity`
-
- ```python
- async def similarity(self, text1: str, text2: str) -> float
- ```
-
- Calculates similarity between two texts.
-
- **Parameters**:
- - `text1`: First text
- - `text2`: Second text
-
- **Returns**: Similarity score (0.0-1.0).
-
- #### `find_duplicates`
-
- ```python
- async def find_duplicates(
-     self,
-     texts: list[str],
-     threshold: float = 0.85
- ) -> list[tuple[int, int]]
- ```
-
- Finds duplicate texts based on similarity threshold.
-
- **Parameters**:
- - `texts`: List of texts to check
- - `threshold`: Similarity threshold (default: 0.85)
-
- **Returns**: List of (index1, index2) tuples for duplicate pairs.
-
- #### `add_evidence`
-
- ```python
- async def add_evidence(
-     self,
-     evidence_id: str,
-     content: str,
-     metadata: dict[str, Any]
- ) -> None
- ```
-
- Adds evidence to vector store for semantic search.
-
- **Parameters**:
- - `evidence_id`: Unique identifier for the evidence
- - `content`: Evidence text content
- - `metadata`: Additional metadata dictionary
-
- #### `search_similar`
-
- ```python
- async def search_similar(
-     self,
-     query: str,
-     n_results: int = 5
- ) -> list[dict[str, Any]]
- ```
-
- Finds semantically similar evidence.
-
- **Parameters**:
- - `query`: Search query string
- - `n_results`: Number of results to return (default: 5)
-
- **Returns**: List of dictionaries with `id`, `content`, `metadata`, and `distance` keys.
-
- #### `deduplicate`
-
- ```python
- async def deduplicate(
-     self,
-     new_evidence: list[Evidence],
-     threshold: float = 0.9
- ) -> list[Evidence]
- ```
-
- Removes semantically duplicate evidence.
-
- **Parameters**:
- - `new_evidence`: List of evidence items to deduplicate
- - `threshold`: Similarity threshold (default: 0.9, where 0.9 = 90% similar is duplicate)
-
- **Returns**: List of unique evidence items (not already in vector store).
-
- ### Factory Function
-
- #### `get_embedding_service`
-
- ```python
- @lru_cache(maxsize=1)
- def get_embedding_service() -> EmbeddingService
- ```
-
- Returns singleton EmbeddingService instance.
-
- ## LlamaIndexRAGService
-
- **Module**: `src.services.llamaindex_rag`
-
- **Purpose**: Retrieval-Augmented Generation using LlamaIndex.
-
- ### Methods
-
- #### `ingest_evidence`
-
- <!--codeinclude-->
- [LlamaIndexRAGService.ingest_evidence](../src/services/llamaindex_rag.py) start_line:290 end_line:290
- <!--/codeinclude-->
-
- Ingests evidence into RAG service.
-
- **Parameters**:
- - `evidence_list`: List of Evidence objects to ingest
-
- **Note**: Supports multiple embedding providers (OpenAI, local sentence-transformers, Hugging Face).
-
- #### `retrieve`
-
- ```python
- def retrieve(
-     self,
-     query: str,
-     top_k: int | None = None
- ) -> list[dict[str, Any]]
- ```
-
- Retrieves relevant documents for a query.
-
- **Parameters**:
- - `query`: Search query string
- - `top_k`: Number of top results to return (defaults to `similarity_top_k` from constructor)
-
- **Returns**: List of dictionaries with `text`, `score`, and `metadata` keys.
-
- #### `query`
-
- ```python
- def query(
-     self,
-     query_str: str,
-     top_k: int | None = None
- ) -> str
- ```
-
- Queries RAG service and returns synthesized response.
-
- **Parameters**:
- - `query_str`: Query string
- - `top_k`: Number of results to use (defaults to `similarity_top_k` from constructor)
-
- **Returns**: Synthesized response string.
-
- **Raises**:
- - `ConfigurationError`: If no LLM API key is available for query synthesis
-
- #### `ingest_documents`
-
- ```python
- def ingest_documents(self, documents: list[Any]) -> None
- ```
-
- Ingests raw LlamaIndex Documents.
-
- **Parameters**:
- - `documents`: List of LlamaIndex Document objects
-
- #### `clear_collection`
-
- ```python
- def clear_collection(self) -> None
- ```
-
- Clears all documents from the collection.
-
- ### Factory Function
-
- #### `get_rag_service`
-
- ```python
- def get_rag_service(
-     collection_name: str = "deepcritical_evidence",
-     oauth_token: str | None = None,
-     **kwargs: Any
- ) -> LlamaIndexRAGService
- ```
-
- Gets or creates a RAG service instance.
-
- **Parameters**:
- - `collection_name`: Name of the ChromaDB collection (default: "deepcritical_evidence")
- - `oauth_token`: Optional OAuth token from HuggingFace login (takes priority over env vars)
- - `**kwargs`: Additional arguments for LlamaIndexRAGService (e.g., `use_openai_embeddings=False`)
-
- **Returns**: Configured LlamaIndexRAGService instance.
-
- **Note**: By default, uses local embeddings (sentence-transformers) which require no API keys.
-
- ## StatisticalAnalyzer
-
- **Module**: `src.services.statistical_analyzer`
-
- **Purpose**: Secure execution of AI-generated statistical code.
-
- ### Methods
-
- #### `analyze`
-
- ```python
- async def analyze(
-     self,
-     query: str,
-     evidence: list[Evidence],
-     hypothesis: dict[str, Any] | None = None
- ) -> AnalysisResult
- ```
-
- Analyzes a research question using statistical methods.
-
- **Parameters**:
- - `query`: The research question
- - `evidence`: List of Evidence objects to analyze
- - `hypothesis`: Optional hypothesis dict with `drug`, `target`, `pathway`, `effect`, `confidence` keys
-
- **Returns**: `AnalysisResult` with:
- - `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- - `confidence`: Confidence in verdict (0.0-1.0)
- - `statistical_evidence`: Summary of statistical findings
- - `code_generated`: Python code that was executed
- - `execution_output`: Output from code execution
- - `key_takeaways`: Key takeaways from analysis
- - `limitations`: List of limitations
-
- **Note**: Requires Modal credentials for sandbox execution.
-
- ## See Also
-
- - [Architecture - Services](../architecture/services.md) - Architecture overview
- - [Configuration](../configuration/index.md) - Service configuration
-
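The `similarity`/`find_duplicates` semantics documented in the deleted services page reduce to a cosine-similarity threshold over embedding vectors. A dependency-free sketch of that logic (the real `EmbeddingService` produces its vectors with sentence-transformers; the vectors below are made up):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_duplicates(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return (i, j) index pairs whose similarity meets the threshold."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Two near-parallel vectors and one orthogonal vector:
vecs = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]
print(find_duplicates(vecs))  # [(0, 1)]
```

`deduplicate` applies the same test against vectors already in the store, keeping only items below the threshold.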
docs/api/tools.md DELETED
@@ -1,259 +0,0 @@
- # Tools API Reference
-
- This page documents the API for DeepCritical search tools.
-
- ## SearchTool Protocol
-
- All tools implement the `SearchTool` protocol:
-
- ```python
- class SearchTool(Protocol):
-     @property
-     def name(self) -> str: ...
-
-     async def search(
-         self,
-         query: str,
-         max_results: int = 10
-     ) -> list[Evidence]: ...
- ```
-
- ## PubMedTool
-
- **Module**: `src.tools.pubmed`
-
- **Purpose**: Search peer-reviewed biomedical literature from PubMed.
-
- ### Properties
-
- #### `name`
-
- ```python
- @property
- def name(self) -> str
- ```
-
- Returns tool name: `"pubmed"`
-
- ### Methods
-
- #### `search`
-
- ```python
- async def search(
-     self,
-     query: str,
-     max_results: int = 10
- ) -> list[Evidence]
- ```
-
- Searches PubMed for articles.
-
- **Parameters**:
- - `query`: Search query string
- - `max_results`: Maximum number of results to return (default: 10)
-
- **Returns**: List of `Evidence` objects with PubMed articles.
-
- **Raises**:
- - `SearchError`: If search fails (timeout, HTTP error, XML parsing error)
- - `RateLimitError`: If rate limit is exceeded (429 status code)
-
- **Note**: Uses NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Handles single vs. multiple articles.
-
- ## ClinicalTrialsTool
-
- **Module**: `src.tools.clinicaltrials`
-
- **Purpose**: Search ClinicalTrials.gov for interventional studies.
-
- ### Properties
-
- #### `name`
-
- ```python
- @property
- def name(self) -> str
- ```
-
- Returns tool name: `"clinicaltrials"`
-
- ### Methods
-
- #### `search`
-
- ```python
- async def search(
-     self,
-     query: str,
-     max_results: int = 10
- ) -> list[Evidence]
- ```
-
- Searches ClinicalTrials.gov for trials.
-
- **Parameters**:
- - `query`: Search query string
- - `max_results`: Maximum number of results to return (default: 10)
-
- **Returns**: List of `Evidence` objects with clinical trials.
-
- **Note**: Only returns interventional studies with status: COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, ENROLLING_BY_INVITATION. Uses `requests` library (NOT httpx - WAF blocks httpx). Runs in thread pool for async compatibility.
-
- **Raises**:
- - `SearchError`: If search fails (HTTP error, request exception)
-
- ## EuropePMCTool
-
- **Module**: `src.tools.europepmc`
-
- **Purpose**: Search Europe PMC for preprints and peer-reviewed articles.
-
- ### Properties
-
- #### `name`
-
- ```python
- @property
- def name(self) -> str
- ```
-
- Returns tool name: `"europepmc"`
-
- ### Methods
-
- #### `search`
-
- ```python
- async def search(
-     self,
-     query: str,
-     max_results: int = 10
- ) -> list[Evidence]
- ```
-
- Searches Europe PMC for articles and preprints.
-
- **Parameters**:
- - `query`: Search query string
- - `max_results`: Maximum number of results to return (default: 10)
-
- **Returns**: List of `Evidence` objects with articles/preprints.
-
- **Note**: Includes both preprints (marked with `[PREPRINT - Not peer-reviewed]`) and peer-reviewed articles. Handles preprint markers. Builds URLs from DOI or PMID.
-
- **Raises**:
- - `SearchError`: If search fails (HTTP error, connection error)
-
- ## RAGTool
-
- **Module**: `src.tools.rag_tool`
-
- **Purpose**: Semantic search within collected evidence.
-
- ### Initialization
-
- ```python
- def __init__(
-     self,
-     rag_service: LlamaIndexRAGService | None = None,
-     oauth_token: str | None = None
- ) -> None
- ```
-
- **Parameters**:
- - `rag_service`: Optional RAG service instance. If None, will be lazy-initialized.
- - `oauth_token`: Optional OAuth token from HuggingFace login (for RAG LLM)
-
- ### Properties
-
- #### `name`
-
- ```python
- @property
- def name(self) -> str
- ```
-
- Returns tool name: `"rag"`
-
- ### Methods
-
- #### `search`
-
- ```python
- async def search(
-     self,
-     query: str,
-     max_results: int = 10
- ) -> list[Evidence]
- ```
-
- Searches collected evidence using semantic similarity.
-
- **Parameters**:
- - `query`: Search query string
- - `max_results`: Maximum number of results to return (default: 10)
-
- **Returns**: List of `Evidence` objects from collected evidence.
-
- **Raises**:
- - `ConfigurationError`: If RAG service is unavailable
-
- **Note**: Requires evidence to be ingested into RAG service first. Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results.
-
- ## SearchHandler
-
- **Module**: `src.tools.search_handler`
-
- **Purpose**: Orchestrates parallel searches across multiple tools.
-
- ### Initialization
-
- ```python
- def __init__(
-     self,
-     tools: list[SearchTool],
-     timeout: float = 30.0,
-     include_rag: bool = False,
-     auto_ingest_to_rag: bool = True,
-     oauth_token: str | None = None
- ) -> None
- ```
-
- **Parameters**:
- - `tools`: List of search tools to use
- - `timeout`: Timeout for each search in seconds (default: 30.0)
- - `include_rag`: Whether to include RAG tool in searches (default: False)
- - `auto_ingest_to_rag`: Whether to automatically ingest results into RAG (default: True)
- - `oauth_token`: Optional OAuth token from HuggingFace login (for RAG LLM)
-
- ### Methods
-
- #### `execute`
-
- <!--codeinclude-->
- [SearchHandler.execute](../src/tools/search_handler.py) start_line:86 end_line:86
- <!--/codeinclude-->
-
- Searches multiple tools in parallel.
-
- **Parameters**:
- - `query`: Search query string
- - `max_results_per_tool`: Maximum results per tool (default: 10)
-
- **Returns**: `SearchResult` with:
- - `query`: The search query
- - `evidence`: Aggregated list of evidence
- - `sources_searched`: List of source names searched
- - `total_found`: Total number of results
- - `errors`: List of error messages from failed tools
-
- **Raises**:
- - `SearchError`: If search times out
-
- **Note**: Uses `asyncio.gather()` for parallel execution. Handles tool failures gracefully (returns errors in `SearchResult.errors`). Automatically ingests evidence into RAG if enabled.
-
- ## See Also
-
- - [Architecture - Tools](../architecture/tools.md) - Architecture overview
- - [Models API](models.md) - Data models used by tools
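The deleted tools page defines a `SearchTool` protocol and notes that `SearchHandler` fans searches out with `asyncio.gather()` while isolating per-tool failures. A self-contained sketch of both ideas (the `StubTool` class, the simplified `Evidence` stand-in, and the `execute` helper are illustrative, not the real implementations):

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Evidence:
    """Simplified stand-in for src.utils.models.Evidence."""

    content: str
    source: str


class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]: ...


class StubTool:
    """Hypothetical tool that structurally satisfies the SearchTool protocol."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> str:
        return self._name

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        return [Evidence(content=f"{query} via {self._name}", source=self._name)]


async def execute(tools: list[SearchTool], query: str) -> list[Evidence]:
    # Parallel fan-out with per-tool error isolation, in the spirit of SearchHandler.
    results = await asyncio.gather(
        *(t.search(query) for t in tools), return_exceptions=True
    )
    evidence: list[Evidence] = []
    for r in results:
        if not isinstance(r, BaseException):
            evidence.extend(r)
    return evidence


ev = asyncio.run(execute([StubTool("pubmed"), StubTool("europepmc")], "statins"))
print([e.source for e in ev])  # ['pubmed', 'europepmc']
```

`return_exceptions=True` is what lets one failing tool surface as an error entry instead of aborting the whole search round.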
docs/architecture/agents.md DELETED
@@ -1,293 +0,0 @@
1
- # Agents Architecture
2
-
3
- DeepCritical uses Pydantic AI agents for all AI-powered operations. All agents follow a consistent pattern and use structured output types.
4
-
5
- ## Agent Pattern
6
-
7
- ### Pydantic AI Agents
8
-
9
- Pydantic AI agents use the `Agent` class with the following structure:
10
-
11
- - **System Prompt**: Module-level constant with date injection
12
- - **Agent Class**: `__init__(model: Any | None = None)`
13
- - **Main Method**: Async method (e.g., `async def evaluate()`, `async def write_report()`)
14
- - **Factory Function**: `def create_agent_name(model: Any | None = None, oauth_token: str | None = None) -> AgentName`
15
-
16
- **Note**: Factory functions accept an optional `oauth_token` parameter for HuggingFace authentication, which takes priority over environment variables.
17
-
18
- ## Model Initialization
19
-
20
- Agents use `get_model()` from `src/agent_factory/judges.py` if no model is provided. This supports:
21
-
22
- - OpenAI models
23
- - Anthropic models
24
- - HuggingFace Inference API models
25
-
26
- The model selection is based on the configured `LLM_PROVIDER` in settings.
27
-
28
- ## Error Handling
29
-
30
- Agents return fallback values on failure rather than raising exceptions:
31
-
32
- - `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`
33
- - Empty strings for text outputs
34
- - Default structured outputs
35
-
36
- All errors are logged with context using structlog.
37
-
38
- ## Input Validation
39
-
40
- All agents validate inputs:
41
-
42
- - Check that queries/inputs are not empty
43
- - Truncate very long inputs with warnings
44
- - Handle None values gracefully
45
-
46
- ## Output Types
47
-
48
- Agents use structured output types from `src/utils/models.py`:
49
-
50
- - `KnowledgeGapOutput`: Research completeness evaluation
51
- - `AgentSelectionPlan`: Tool selection plan
52
- - `ReportDraft`: Long-form report structure
53
- - `ParsedQuery`: Query parsing and mode detection
54
-
55
- For text output (writer agents), agents return `str` directly.
56
-
57
- ## Agent Types
58
-
59
- ### Knowledge Gap Agent
60
-
61
- **File**: `src/agents/knowledge_gap.py`
62
-
63
- **Purpose**: Evaluates research state and identifies knowledge gaps.
64
-
65
- **Output**: `KnowledgeGapOutput` with:
66
- - `research_complete`: Boolean indicating if research is complete
67
- - `outstanding_gaps`: List of remaining knowledge gaps
68
-
69
- **Methods**:
70
- - `async def evaluate(query, background_context, conversation_history, iteration, time_elapsed_minutes, max_time_minutes) -> KnowledgeGapOutput`
71
-
72
- ### Tool Selector Agent
73
-
74
- **File**: `src/agents/tool_selector.py`
75
-
76
- **Purpose**: Selects appropriate tools for addressing knowledge gaps.
77
-
78
- **Output**: `AgentSelectionPlan` with list of `AgentTask` objects.
79
-
80
- **Available Agents**:
81
- - `WebSearchAgent`: General web search for fresh information
82
- - `SiteCrawlerAgent`: Research specific entities/companies
83
- - `RAGAgent`: Semantic search within collected evidence
84
-
85
- ### Writer Agent
86
-
87
- **File**: `src/agents/writer.py`
88
-
89
- **Purpose**: Generates final reports from research findings.
90
-
91
- **Output**: Markdown string with numbered citations.
92
-
93
- **Methods**:
94
- - `async def write_report(query, findings, output_length, output_instructions) -> str`
95
-
96
- **Features**:
97
- - Validates inputs
98
- - Truncates very long findings (max 50000 chars) with warning
99
- - Retry logic for transient failures (3 retries)
100
- - Citation validation before returning
101
-
102
- ### Long Writer Agent
103
-
104
- **File**: `src/agents/long_writer.py`
105
-
106
- **Purpose**: Long-form report generation with section-by-section writing.
107
-
108
- **Input/Output**: Uses `ReportDraft` models.
109
-
110
- **Methods**:
111
- - `async def write_next_section(query, draft, section_title, section_content) -> LongWriterOutput`
112
- - `async def write_report(query, report_title, report_draft) -> str`
113
-
114
- **Features**:
115
- - Writes sections iteratively
116
- - Aggregates references across sections
117
- - Reformats section headings and references
118
- - Deduplicates and renumbers references
119
-
120
- ### Proofreader Agent
121
-
122
- **File**: `src/agents/proofreader.py`
123
-
124
- **Purpose**: Proofreads and polishes report drafts.
125
-
126
- **Input**: `ReportDraft`
127
- **Output**: Polished markdown string
128
-
129
- **Methods**:
130
- - `async def proofread(query, report_title, report_draft) -> str`
131
-
132
- **Features**:
133
- - Removes duplicate content across sections
134
- - Adds executive summary if multiple sections
135
- - Preserves all references and citations
136
- - Improves flow and readability
137
-
138
- ### Thinking Agent
139
-
140
- **File**: `src/agents/thinking.py`
141
-
142
- **Purpose**: Generates observations from conversation history.
143
-
144
- **Output**: Observation string
145
-
146
- **Methods**:
147
- - `async def generate_observations(query, background_context, conversation_history) -> str`
148
-
149
- ### Input Parser Agent
150
-
151
- **File**: `src/agents/input_parser.py`
152
-
153
- **Purpose**: Parses and improves user queries, detects research mode.
154
-
155
- **Output**: `ParsedQuery` with:
156
- - `original_query`: Original query string
157
- - `improved_query`: Refined query string
158
- - `research_mode`: "iterative" or "deep"
159
- - `key_entities`: List of key entities
160
- - `research_questions`: List of research questions
161
-
- ## Magentic Agents
-
- The following agents use the `BaseAgent` pattern from `agent-framework` and are used exclusively with `MagenticOrchestrator`:
-
- ### Hypothesis Agent
-
- **File**: `src/agents/hypothesis_agent.py`
-
- **Purpose**: Generates mechanistic hypotheses based on evidence.
-
- **Pattern**: `BaseAgent` from `agent-framework`
-
- **Methods**:
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
-
- **Features**:
- - Uses an internal Pydantic AI `Agent` with a `HypothesisAssessment` output type
- - Accesses the shared `evidence_store` for evidence
- - Uses the embedding service for diverse evidence selection (MMR algorithm)
- - Stores hypotheses in the shared context
-
- ### Search Agent
-
- **File**: `src/agents/search_agent.py`
-
- **Purpose**: Wraps `SearchHandler` as an agent for the Magentic orchestrator.
-
- **Pattern**: `BaseAgent` from `agent-framework`
-
- **Methods**:
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
-
- **Features**:
- - Executes searches via `SearchHandlerProtocol`
- - Deduplicates evidence using the embedding service
- - Searches for semantically related evidence
- - Updates the shared evidence store
-
- ### Analysis Agent
-
- **File**: `src/agents/analysis_agent.py`
-
- **Purpose**: Performs statistical analysis using a Modal sandbox.
-
- **Pattern**: `BaseAgent` from `agent-framework`
-
- **Methods**:
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
-
- **Features**:
- - Wraps the `StatisticalAnalyzer` service
- - Analyzes evidence and hypotheses
- - Returns a verdict (SUPPORTED/REFUTED/INCONCLUSIVE)
- - Stores analysis results in the shared context
-
- ### Report Agent (Magentic)
-
- **File**: `src/agents/report_agent.py`
-
- **Purpose**: Generates structured scientific reports from evidence and hypotheses.
-
- **Pattern**: `BaseAgent` from `agent-framework`
-
- **Methods**:
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
-
- **Features**:
- - Uses an internal Pydantic AI `Agent` with a `ResearchReport` output type
- - Accesses the shared evidence store and hypotheses
- - Validates citations before returning
- - Formats the report as markdown
-
- ### Judge Agent
-
- **File**: `src/agents/judge_agent.py`
-
- **Purpose**: Evaluates evidence quality and determines whether it is sufficient for synthesis.
-
- **Pattern**: `BaseAgent` from `agent-framework`
-
- **Methods**:
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
- - `async def run_stream(messages, thread, **kwargs) -> AsyncIterable[AgentRunResponseUpdate]`
-
- **Features**:
- - Wraps `JudgeHandlerProtocol`
- - Accesses the shared evidence store
- - Returns a `JudgeAssessment` with a sufficient flag, confidence, and recommendation
-
- ## Agent Patterns
-
- DeepCritical uses two distinct agent patterns:
-
- ### 1. Pydantic AI Agents (Traditional Pattern)
-
- These agents use the Pydantic AI `Agent` class directly and are used in the iterative and deep research flows:
-
- - **Pattern**: `Agent(model, output_type, system_prompt)`
- - **Initialization**: `__init__(model: Any | None = None)`
- - **Methods**: Agent-specific async methods (e.g., `async def evaluate()`, `async def write_report()`)
- - **Examples**: `KnowledgeGapAgent`, `ToolSelectorAgent`, `WriterAgent`, `LongWriterAgent`, `ProofreaderAgent`, `ThinkingAgent`, `InputParserAgent`
-
- ### 2. Magentic Agents (Agent-Framework Pattern)
-
- These agents use the `BaseAgent` class from `agent-framework` and are used in the Magentic orchestrator:
-
- - **Pattern**: `BaseAgent` from `agent-framework` with an `async def run()` method
- - **Initialization**: `__init__(evidence_store, embedding_service, ...)`
- - **Methods**: `async def run(messages, thread, **kwargs) -> AgentRunResponse`
- - **Examples**: `HypothesisAgent`, `SearchAgent`, `AnalysisAgent`, `ReportAgent`, `JudgeAgent`
-
- **Note**: Magentic agents are used exclusively with the `MagenticOrchestrator` and follow the agent-framework protocol for multi-agent coordination.
-
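A minimal sketch of the Magentic pattern, under stated assumptions: `AgentRunResponse` here is a stand-in dataclass for the agent-framework type, and the search logic is stubbed. What the sketch shows is the contract itself: shared dependencies injected at construction, and all work done in an async `run()` that reads and writes the shared stores.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentRunResponse:
    """Stand-in for agent-framework's response type (assumption)."""

    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class SearchAgentSketch:
    """Sketch of the Magentic pattern: dependencies at init, work in run()."""

    def __init__(self, evidence_store: dict[str, Any],
                 embedding_service: Any = None) -> None:
        self.evidence_store = evidence_store
        self.embedding_service = embedding_service

    async def run(self, messages: list[str], thread: Any = None,
                  **kwargs: Any) -> AgentRunResponse:
        query = messages[-1]
        # Real agent: search via SearchHandlerProtocol, deduplicate with the
        # embedding service, then update the shared evidence store.
        self.evidence_store.setdefault("evidence", []).append({"query": query})
        return AgentRunResponse(text=f"searched: {query}")
```

Because every participant exposes the same `run()` signature, the orchestrator's manager can route messages between them without knowing their internals.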
- ## Factory Functions
-
- All agents have factory functions in `src/agent_factory/agents.py`:
-
- <!--codeinclude-->
- [Factory Functions](../src/agent_factory/agents.py) start_line:79 end_line:100
- <!--/codeinclude-->
-
- Factory functions:
- - Use `get_model()` if no model is provided
- - Accept an `oauth_token` parameter for HuggingFace authentication
- - Raise `ConfigurationError` if creation fails
- - Log agent creation
-
- ## See Also
-
- - [Orchestrators](orchestrators.md) - How agents are orchestrated
- - [API Reference - Agents](../api/agents.md) - API documentation
- - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
docs/architecture/graph_orchestration.md DELETED
@@ -1,302 +0,0 @@
- # Graph Orchestration Architecture
-
- ## Overview
-
- DeepCritical implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
-
- ## Conversation History
-
- DeepCritical supports multi-turn conversations through Pydantic AI's native message history format. The system maintains two types of history:
-
- 1. **User Conversation History**: Multi-turn user interactions (from the Gradio chat interface) stored as `list[ModelMessage]`
- 2. **Research Iteration History**: Internal research process state (the existing `Conversation` model)
-
- ### Message History Flow
-
- ```
- Gradio Chat History → convert_gradio_to_message_history() → GraphOrchestrator.run(message_history)
-
- GraphExecutionContext (stores message_history)
-
- Agent Nodes (receive message_history via agent.run())
-
- WorkflowState (persists user_message_history)
- ```
-
- ### Usage
-
- Message history is automatically converted from Gradio format and passed through the orchestrator:
-
- ```python
- # In app.py - automatic conversion
- message_history = convert_gradio_to_message_history(history) if history else None
- async for event in orchestrator.run(query, message_history=message_history):
-     yield event
- ```
-
- Agents receive message history through their `run()` methods:
-
- ```python
- # In agent execution
- if message_history:
-     result = await agent.run(input_data, message_history=message_history)
- ```
-
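A hedged sketch of what the conversion step does. The real helper returns Pydantic AI `ModelMessage` objects; plain role/content dicts stand in for them here, and the handling of both Gradio history formats is an assumption.

```python
def convert_gradio_to_message_history(history: list) -> list[dict]:
    """Flatten Gradio chat history into an ordered message list (sketch).

    Accepts either [[user, assistant], ...] pair format or
    [{"role": ..., "content": ...}, ...] messages format.
    """
    messages: list[dict] = []
    for turn in history:
        if isinstance(turn, dict):
            # Messages format: pass the role/content pair through.
            messages.append({"role": turn["role"], "content": turn["content"]})
        else:
            # Pair format: [user_text, assistant_text]; skip empty slots.
            user_text, assistant_text = turn
            if user_text:
                messages.append({"role": "user", "content": user_text})
            if assistant_text:
                messages.append({"role": "assistant", "content": assistant_text})
    return messages
```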
- ## Graph Patterns
-
- ### Iterative Research Graph
-
- The iterative research graph follows this pattern:
-
- ```
- [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
-                                             ↓ No          ↓ Yes
-                                      [Tool Selector]   [Writer]
-                                             ↓
-                                      [Execute Tools] → [Loop Back]
- ```
-
- **Node IDs**: `thinking` → `knowledge_gap` → `continue_decision` → `tool_selector`/`writer` → `execute_tools` → (loop back to `thinking`)
-
- **Special Node Handling**:
- - `execute_tools`: State node that uses `search_handler` to execute searches and add evidence to workflow state
- - `continue_decision`: Decision node that routes based on the `research_complete` flag from `KnowledgeGapOutput`
-
- ### Deep Research Graph
-
- The deep research graph follows this pattern:
-
- ```
- [Input] → [Planner] → [Store Plan] → [Parallel Loops] → [Collect Drafts] → [Synthesizer]
-                                       ↓       ↓       ↓
-                                   [Loop1] [Loop2] [Loop3]
- ```
-
- **Node IDs**: `planner` → `store_plan` → `parallel_loops` → `collect_drafts` → `synthesizer`
-
- **Special Node Handling**:
- - `planner`: Agent node that creates a `ReportPlan` with the report outline
- - `store_plan`: State node that stores the `ReportPlan` in context for the parallel loops
- - `parallel_loops`: Parallel node that executes `IterativeResearchFlow` instances for each section
- - `collect_drafts`: State node that collects section drafts from the parallel loops
- - `synthesizer`: Agent node that calls `LongWriterAgent.write_report()` directly with a `ReportDraft`
-
- ### Deep Research
-
- ```mermaid
- sequenceDiagram
-     actor User
-     participant GraphOrchestrator
-     participant InputParser
-     participant GraphBuilder
-     participant GraphExecutor
-     participant Agent
-     participant BudgetTracker
-     participant WorkflowState
-
-     User->>GraphOrchestrator: run(query)
-     GraphOrchestrator->>InputParser: detect_research_mode(query)
-     InputParser-->>GraphOrchestrator: mode (iterative/deep)
-     GraphOrchestrator->>GraphBuilder: build_graph(mode)
-     GraphBuilder-->>GraphOrchestrator: ResearchGraph
-     GraphOrchestrator->>WorkflowState: init_workflow_state()
-     GraphOrchestrator->>BudgetTracker: create_budget()
-     GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
-
-     loop For each node in graph
-         GraphExecutor->>Agent: execute_node(agent_node)
-         Agent->>Agent: process_input
-         Agent-->>GraphExecutor: result
-         GraphExecutor->>WorkflowState: update_state(result)
-         GraphExecutor->>BudgetTracker: add_tokens(used)
-         GraphExecutor->>BudgetTracker: check_budget()
-         alt Budget exceeded
-             GraphExecutor->>GraphOrchestrator: emit(error_event)
-         else Continue
-             GraphExecutor->>GraphOrchestrator: emit(progress_event)
-         end
-     end
-
-     GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
- ```
-
- ### Iterative Research
-
- ```mermaid
- sequenceDiagram
-     participant IterativeFlow
-     participant ThinkingAgent
-     participant KnowledgeGapAgent
-     participant ToolSelector
-     participant ToolExecutor
-     participant JudgeHandler
-     participant WriterAgent
-
-     IterativeFlow->>IterativeFlow: run(query)
-
-     loop Until complete or max_iterations
-         IterativeFlow->>ThinkingAgent: generate_observations()
-         ThinkingAgent-->>IterativeFlow: observations
-
-         IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
-         KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
-
-         alt Research complete
-             IterativeFlow->>WriterAgent: create_final_report()
-             WriterAgent-->>IterativeFlow: final_report
-         else Gaps remain
-             IterativeFlow->>ToolSelector: select_agents(gap)
-             ToolSelector-->>IterativeFlow: AgentSelectionPlan
-
-             IterativeFlow->>ToolExecutor: execute_tool_tasks()
-             ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
-
-             IterativeFlow->>JudgeHandler: assess_evidence()
-             JudgeHandler-->>IterativeFlow: should_continue
-         end
-     end
- ```
-
- ## Graph Structure
-
- ### Nodes
-
- Graph nodes represent different stages in the research workflow:
-
- 1. **Agent Nodes**: Execute Pydantic AI agents
-    - Input: Prompt/query
-    - Output: Structured or unstructured response
-    - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
-
- 2. **State Nodes**: Update or read workflow state
-    - Input: Current state
-    - Output: Updated state
-    - Examples: Update evidence, update conversation history
-
- 3. **Decision Nodes**: Make routing decisions based on conditions
-    - Input: Current state/results
-    - Output: Next node ID
-    - Examples: Continue research vs. complete research
-
- 4. **Parallel Nodes**: Execute multiple nodes concurrently
-    - Input: List of node IDs
-    - Output: Aggregated results
-    - Examples: Parallel iterative research loops
-
- ### Edges
-
- Edges define transitions between nodes:
-
- 1. **Sequential Edges**: Always traversed (no condition)
-    - From: Source node
-    - To: Target node
-    - Condition: None (always True)
-
- 2. **Conditional Edges**: Traversed based on a condition
-    - From: Source node
-    - To: Target node
-    - Condition: Callable that returns bool
-    - Example: If research complete → go to writer, else → continue loop
-
- 3. **Parallel Edges**: Used for parallel execution branches
-    - From: Parallel node
-    - To: Multiple target nodes
-    - Execution: All targets run concurrently
-
- ## State Management
-
- State is managed via `WorkflowState`, which uses `ContextVar` for thread-safe isolation:
-
- - **Evidence**: Collected evidence from searches
- - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
- - **Embedding Service**: For semantic search
-
- State transitions occur at state nodes, which update the global workflow state.
-
- ## Execution Flow
-
- 1. **Graph Construction**: Build the graph from nodes and edges using `create_iterative_graph()` or `create_deep_graph()`
- 2. **Graph Validation**: Ensure the graph is valid (no cycles, all nodes reachable) via `ResearchGraph.validate_structure()`
- 3. **Graph Execution**: Traverse the graph from the entry node using `GraphOrchestrator._execute_graph()`
- 4. **Node Execution**: Execute each node based on type:
-    - **Agent Nodes**: Call `agent.run()` with transformed input
-    - **State Nodes**: Update workflow state via the `state_updater` function
-    - **Decision Nodes**: Evaluate the `decision_function` to get the next node ID
-    - **Parallel Nodes**: Execute all parallel nodes concurrently via `asyncio.gather()`
- 5. **Edge Evaluation**: Determine the next node(s) based on edges and conditions
- 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
- 7. **State Updates**: Update state at state nodes via `GraphExecutionContext.update_state()`
- 8. **Event Streaming**: Yield `AgentEvent` objects during execution for the UI
-
- ### GraphExecutionContext
-
- The `GraphExecutionContext` class manages execution state during graph traversal:
-
- - **State**: Current `WorkflowState` instance
- - **Budget Tracker**: `BudgetTracker` instance for budget enforcement
- - **Node Results**: Dictionary storing results from each node execution
- - **Visited Nodes**: Set of node IDs that have been executed
- - **Current Node**: ID of the node currently being executed
-
- Methods:
- - `set_node_result(node_id, result)`: Store the result from a node execution
- - `get_node_result(node_id)`: Retrieve a stored result
- - `has_visited(node_id)`: Check whether a node was visited
- - `mark_visited(node_id)`: Mark a node as visited
- - `update_state(updater, data)`: Update the workflow state
-
- ## Conditional Routing
-
- Decision nodes evaluate conditions and return next node IDs:
-
- - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
- - **Budget Decision**: If budget exceeded → exit, else → continue
- - **Iteration Decision**: If max iterations reached → exit, else → continue
-
- ## Parallel Execution
-
- Parallel nodes execute multiple nodes concurrently:
-
- - Each parallel branch runs independently
- - Results are aggregated after all branches complete
- - State is synchronized after parallel execution
- - Errors in one branch don't stop other branches
-
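The error-isolation behavior above is exactly what `asyncio.gather(..., return_exceptions=True)` provides: a failing branch yields its exception as a result instead of cancelling the siblings. A minimal sketch (branch names and failure logic are illustrative):

```python
import asyncio


async def run_branch(name: str) -> str:
    if name == "loop2":
        raise RuntimeError("branch failed")
    return f"{name}: ok"


async def run_parallel(names: list[str]) -> list[str]:
    # return_exceptions=True keeps one failing branch from stopping the rest;
    # the exception object appears in the results list in order.
    results = await asyncio.gather(*(run_branch(n) for n in names),
                                   return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]


# asyncio.run(run_parallel(["loop1", "loop2", "loop3"])) -> ["loop1: ok", "loop3: ok"]
```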
- ## Budget Enforcement
-
- Budget constraints are enforced at decision nodes:
-
- - **Token Budget**: Track LLM token usage
- - **Time Budget**: Track elapsed time
- - **Iteration Budget**: Track iteration count
-
- If any budget is exceeded, execution routes to the exit node.
-
- ## Error Handling
-
- Errors are handled at multiple levels:
-
- 1. **Node Level**: Catch errors in individual node execution
- 2. **Graph Level**: Handle errors during graph traversal
- 3. **State Level**: Roll back state changes on error
-
- Errors are logged and yield error events for the UI.
-
- ## Backward Compatibility
-
- Graph execution is optional via a feature flag:
-
- - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
- - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
-
- This allows gradual migration and fallback if needed.
-
- ## See Also
-
- - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
- - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/architecture/middleware.md DELETED
@@ -1,146 +0,0 @@
- # Middleware Architecture
-
- DeepCritical uses middleware for state management, budget tracking, and workflow coordination.
-
- ## State Management
-
- ### WorkflowState
-
- **File**: `src/middleware/state_machine.py`
-
- **Purpose**: Thread-safe state management for research workflows
-
- **Implementation**: Uses `ContextVar` for thread-safe isolation
-
- **State Components**:
- - `evidence: list[Evidence]`: Collected evidence from searches
- - `conversation: Conversation`: Iteration history (gaps, tool calls, findings, thoughts)
- - `embedding_service: Any`: Embedding service for semantic search
-
- **Methods**:
- - `add_evidence(new_evidence: list[Evidence]) -> int`: Adds evidence with URL-based deduplication. Returns the number of new items added (excluding duplicates).
- - `async search_related(query: str, n_results: int = 5) -> list[Evidence]`: Semantic search for related evidence using the embedding service
-
- **Initialization**:
-
- <!--codeinclude-->
- [Initialize Workflow State](../src/middleware/state_machine.py) start_line:98 end_line:110
- <!--/codeinclude-->
-
- **Access**:
-
- <!--codeinclude-->
- [Get Workflow State](../src/middleware/state_machine.py) start_line:115 end_line:129
- <!--/codeinclude-->
-
- ## Workflow Manager
-
- **File**: `src/middleware/workflow_manager.py`
-
- **Purpose**: Coordinates parallel research loops
-
- **Methods**:
- - `async add_loop(loop_id: str, query: str) -> ResearchLoop`: Add a new research loop to manage
- - `async run_loops_parallel(loop_configs: list[dict], loop_func: Callable, judge_handler: Any | None = None, budget_tracker: Any | None = None) -> list[Any]`: Run multiple research loops in parallel. Takes configuration dicts and a loop function.
- - `async update_loop_status(loop_id: str, status: LoopStatus, error: str | None = None)`: Update loop status
- - `async sync_loop_evidence_to_state(loop_id: str)`: Synchronize evidence from a specific loop to global state
-
- **Features**:
- - Uses `asyncio.gather()` for parallel execution
- - Handles errors per loop (doesn't fail all if one fails)
- - Tracks loop status: `pending`, `running`, `completed`, `failed`, `cancelled`
- - Evidence deduplication across parallel loops
-
- **Usage**:
- ```python
- from src.middleware.workflow_manager import WorkflowManager
-
- manager = WorkflowManager()
- await manager.add_loop("loop1", "Research query 1")
- await manager.add_loop("loop2", "Research query 2")
-
- async def run_research(config: dict) -> str:
-     loop_id = config["loop_id"]
-     query = config["query"]
-     # ... research logic ...
-     return "report"
-
- results = await manager.run_loops_parallel(
-     loop_configs=[
-         {"loop_id": "loop1", "query": "Research query 1"},
-         {"loop_id": "loop2", "query": "Research query 2"},
-     ],
-     loop_func=run_research,
- )
- ```
-
- ## Budget Tracker
-
- **File**: `src/middleware/budget_tracker.py`
-
- **Purpose**: Tracks and enforces resource limits
-
- **Budget Components**:
- - **Tokens**: LLM token usage
- - **Time**: Elapsed time in seconds
- - **Iterations**: Number of iterations
-
- **Methods**:
- - `create_budget(loop_id: str, tokens_limit: int = 100000, time_limit_seconds: float = 600.0, iterations_limit: int = 10) -> BudgetStatus`: Create a budget for a specific loop
- - `add_tokens(loop_id: str, tokens: int)`: Add token usage to a loop's budget
- - `start_timer(loop_id: str)`: Start time tracking for a loop
- - `update_timer(loop_id: str)`: Update elapsed time for a loop
- - `increment_iteration(loop_id: str)`: Increment the iteration count for a loop
- - `check_budget(loop_id: str) -> tuple[bool, str]`: Check whether a loop's budget has been exceeded. Returns `(exceeded: bool, reason: str)`
- - `can_continue(loop_id: str) -> bool`: Check whether a loop can continue based on its budget
-
- **Token Estimation**:
- - `estimate_tokens(text: str) -> int`: ~4 characters per token
- - `estimate_llm_call_tokens(prompt: str, response: str) -> int`: Estimate tokens for an LLM call
-
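The ~4-characters-per-token heuristic from the list above is simple enough to sketch directly; the clamp to a minimum of one token is an assumption about how the real helper treats empty strings.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic from the docs: ~4 characters per token."""
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    """Estimate the total tokens consumed by one LLM call."""
    return estimate_tokens(prompt) + estimate_tokens(response)
```

This is deliberately crude: it avoids loading a tokenizer, which is good enough for budget enforcement where only the order of magnitude matters.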
- **Usage**:
- ```python
- from src.middleware.budget_tracker import BudgetTracker
-
- tracker = BudgetTracker()
- budget = tracker.create_budget(
-     loop_id="research_loop",
-     tokens_limit=100000,
-     time_limit_seconds=600,
-     iterations_limit=10,
- )
- tracker.start_timer("research_loop")
- # ... research operations ...
- tracker.add_tokens("research_loop", 5000)
- tracker.update_timer("research_loop")
- exceeded, reason = tracker.check_budget("research_loop")
- if exceeded:
-     # Budget exceeded, stop research
-     pass
- if not tracker.can_continue("research_loop"):
-     # Budget exceeded, stop research
-     pass
- ```
-
- ## Models
-
- All middleware models are defined in `src/utils/models.py`:
-
- - `IterationData`: Data for a single iteration
- - `Conversation`: Conversation history with iterations
- - `ResearchLoop`: Research loop state and configuration
- - `BudgetStatus`: Current budget status
-
- ## Thread Safety
-
- All middleware components use `ContextVar` for thread-safe isolation:
-
- - Each request/thread has its own workflow state
- - No global mutable state
- - Safe for concurrent requests
-
- ## See Also
-
- - [Orchestrators](orchestrators.md) - How middleware is used in orchestration
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
- - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
docs/architecture/orchestrators.md DELETED
@@ -1,201 +0,0 @@
- # Orchestrators Architecture
-
- DeepCritical supports multiple orchestration patterns for research workflows.
-
- ## Research Flows
-
- ### IterativeResearchFlow
-
- **File**: `src/orchestrator/research_flow.py`
-
- **Pattern**: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete
-
- **Agents Used**:
- - `KnowledgeGapAgent`: Evaluates research completeness
- - `ToolSelectorAgent`: Selects tools for addressing gaps
- - `ThinkingAgent`: Generates observations
- - `WriterAgent`: Creates the final report
- - `JudgeHandler`: Assesses evidence sufficiency
-
- **Features**:
- - Tracks iterations, time, and budget
- - Supports graph execution (`use_graph=True`) and agent chains (`use_graph=False`)
- - Iterates until research is complete or constraints are met
-
- **Usage**:
-
- <!--codeinclude-->
- [IterativeResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:57 end_line:80
- <!--/codeinclude-->
-
- ### DeepResearchFlow
-
- **File**: `src/orchestrator/research_flow.py`
-
- **Pattern**: Planner → Parallel iterative loops per section → Synthesizer
-
- **Agents Used**:
- - `PlannerAgent`: Breaks the query into report sections
- - `IterativeResearchFlow`: Per-section research (parallel)
- - `LongWriterAgent` or `ProofreaderAgent`: Final synthesis
-
- **Features**:
- - Uses `WorkflowManager` for parallel execution
- - Budget tracking per section and globally
- - State synchronization across parallel loops
- - Supports graph execution and agent chains
-
- **Usage**:
-
- <!--codeinclude-->
- [DeepResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:709 end_line:728
- <!--/codeinclude-->
-
- ## Graph Orchestrator
-
- **File**: `src/orchestrator/graph_orchestrator.py`
-
- **Purpose**: Graph-based execution using Pydantic AI agents as nodes
-
- **Features**:
- - Uses graph execution (`use_graph=True`) or agent chains (`use_graph=False`) as a fallback
- - Routes based on research mode (iterative/deep/auto)
- - Streams `AgentEvent` objects for the UI
- - Uses `GraphExecutionContext` to manage execution state
-
- **Node Types**:
- - **Agent Nodes**: Execute Pydantic AI agents
- - **State Nodes**: Update or read workflow state
- - **Decision Nodes**: Make routing decisions
- - **Parallel Nodes**: Execute multiple nodes concurrently
-
- **Edge Types**:
- - **Sequential Edges**: Always traversed
- - **Conditional Edges**: Traversed based on a condition
- - **Parallel Edges**: Used for parallel execution branches
-
- **Special Node Handling**:
-
- The `GraphOrchestrator` has special handling for certain nodes:
-
- - **`execute_tools` node**: State node that uses `search_handler` to execute searches and add evidence to workflow state
- - **`parallel_loops` node**: Parallel node that executes `IterativeResearchFlow` instances for each section in deep research mode
- - **`synthesizer` node**: Agent node that calls `LongWriterAgent.write_report()` directly with a `ReportDraft` instead of using `agent.run()`
- - **`writer` node**: Agent node that calls `WriterAgent.write_report()` directly with findings instead of using `agent.run()`
-
- **GraphExecutionContext**:
-
- The orchestrator uses `GraphExecutionContext` to manage execution state:
- - Tracks the current node, visited nodes, and node results
- - Manages workflow state and the budget tracker
- - Provides methods to store and retrieve node execution results
-
- ## Orchestrator Factory
-
- **File**: `src/orchestrator_factory.py`
-
- **Purpose**: Factory for creating orchestrators
-
- **Modes**:
- - **Simple**: Legacy orchestrator (backward compatible)
- - **Advanced**: Magentic orchestrator (requires an OpenAI API key)
- - **Auto-detect**: Chooses based on API key availability
-
- **Usage**:
-
- <!--codeinclude-->
- [Create Orchestrator](../src/orchestrator_factory.py) start_line:44 end_line:66
- <!--/codeinclude-->
-
- ## Magentic Orchestrator
-
- **File**: `src/orchestrator_magentic.py`
-
- **Purpose**: Multi-agent coordination using the Microsoft Agent Framework
-
- **Features**:
- - Uses `agent-framework-core`
- - ChatAgent pattern with internal LLMs per agent
- - `MagenticBuilder` with participants:
-   - `searcher`: SearchAgent (wraps SearchHandler)
-   - `hypothesizer`: HypothesisAgent (generates hypotheses)
-   - `judge`: JudgeAgent (evaluates evidence)
-   - `reporter`: ReportAgent (generates the final report)
- - Manager orchestrates agents via a chat client (OpenAI or HuggingFace)
- - Event-driven: converts Magentic events to `AgentEvent` for UI streaming via the `_process_event()` method
- - Supports max rounds, stall detection, and reset handling
-
- **Event Processing**:
-
- The orchestrator processes Magentic events and converts them to `AgentEvent`:
- - `MagenticOrchestratorMessageEvent` → `AgentEvent` with type based on message content
- - `MagenticAgentMessageEvent` → `AgentEvent` with type based on agent name
- - `MagenticAgentDeltaEvent` → `AgentEvent` for streaming updates
- - `MagenticFinalResultEvent` → `AgentEvent` with type "complete"
-
- **Requirements**:
- - `agent-framework-core` package
- - OpenAI API key or HuggingFace authentication
-
- ## Hierarchical Orchestrator
-
- **File**: `src/orchestrator_hierarchical.py`
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams
-
- **Features**:
- - Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`
- - Adapts Magentic ChatAgent to the `SubIterationTeam` protocol
- - Event-driven via `asyncio.Queue` for coordination
- - Supports sub-iteration patterns for complex research tasks
-
- ## Legacy Simple Mode
-
- **File**: `src/legacy_orchestrator.py`
-
- **Purpose**: Linear search-judge-synthesize loop
-
- **Features**:
- - Uses `SearchHandlerProtocol` and `JudgeHandlerProtocol`
- - Generator-based design yielding `AgentEvent` objects
- - Backward compatibility for simple use cases
-
- ## State Initialization
-
- All orchestrators must initialize workflow state:
-
- <!--codeinclude-->
- [Initialize Workflow State](../src/middleware/state_machine.py) start_line:98 end_line:112
- <!--/codeinclude-->
-
- ## Event Streaming
-
- All orchestrators yield `AgentEvent` objects:
-
- **Event Types**:
- - `started`: Research started
- - `searching`: Search in progress
- - `search_complete`: Search completed
- - `judging`: Evidence evaluation in progress
- - `judge_complete`: Evidence evaluation completed
- - `looping`: Iteration in progress
- - `hypothesizing`: Generating hypotheses
- - `analyzing`: Statistical analysis in progress
- - `analysis_complete`: Statistical analysis completed
- - `synthesizing`: Synthesizing results
- - `complete`: Research completed
- - `error`: Error occurred
- - `streaming`: Streaming update (delta events)
-
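A consumer of this event stream can be sketched as below. The `AgentEvent` stand-in and the fake orchestrator are assumptions used to make the sketch self-contained; the real model lives in `src/utils/models.py` and yields richer payloads.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentEvent:
    """Minimal stand-in for the real event model (fields assumed)."""

    type: str
    message: str = ""


async def fake_orchestrator_run(query: str):
    # Stand-in for orchestrator.run(query), which is an async generator.
    for t in ("started", "searching", "search_complete", "complete"):
        yield AgentEvent(type=t, message=query)


async def collect(query: str) -> list[str]:
    seen: list[str] = []
    async for event in fake_orchestrator_run(query):
        seen.append(event.type)
        # Terminal event types end the stream for the UI.
        if event.type in ("complete", "error"):
            break
    return seen


# asyncio.run(collect("q")) -> ["started", "searching", "search_complete", "complete"]
```

Treating `complete` and `error` as terminal keeps UI code from waiting on a generator that has nothing more to yield.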
- **Event Structure**:
-
- <!--codeinclude-->
- [AgentEvent Model](../src/utils/models.py) start_line:104 end_line:126
- <!--/codeinclude-->
-
- ## See Also
-
- - [Graph Orchestration](graph_orchestration.md) - Graph-based execution details
- - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
-
docs/architecture/services.md DELETED
@@ -1,146 +0,0 @@
1
- # Services Architecture
2
-
3
- DeepCritical provides several services for embeddings, RAG, and statistical analysis.
4
-
5
- ## Embedding Service
6
-
7
- **File**: `src/services/embeddings.py`
8
-
9
- **Purpose**: Local sentence-transformers for semantic search and deduplication
10
-
11
- **Features**:
12
- - **No API Key Required**: Uses local sentence-transformers models
13
- - **Async-Safe**: All operations use `run_in_executor()` to avoid blocking the event loop
14
- - **ChromaDB Storage**: In-memory vector storage for embeddings
15
- - **Deduplication**: 0.9 similarity threshold by default (90% similarity = duplicate, configurable)
16
-
17
- **Model**: Configurable via `settings.local_embedding_model` (default: `all-MiniLM-L6-v2`)
18
-
19
- **Methods**:
20
- - `async def embed(text: str) -> list[float]`: Generate embeddings (async-safe via `run_in_executor()`)
21
- - `async def embed_batch(texts: list[str]) -> list[list[float]]`: Batch embedding (more efficient)
22
- - `async def add_evidence(evidence_id: str, content: str, metadata: dict[str, Any]) -> None`: Add evidence to vector store
23
- - `async def search_similar(query: str, n_results: int = 5) -> list[dict[str, Any]]`: Find semantically similar evidence
24
- - `async def deduplicate(new_evidence: list[Evidence], threshold: float = 0.9) -> list[Evidence]`: Remove semantically duplicate evidence
25
-
26
- **Usage**:
27
- ```python
28
- from src.services.embeddings import get_embedding_service
29
-
30
- service = get_embedding_service()
31
- embedding = await service.embed("text to embed")
32
- ```
33
-
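The 0.9-threshold deduplication reduces to a pairwise cosine-similarity check over embeddings; a self-contained sketch of the idea (not the service's actual code):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(embeddings: list[list[float]], threshold: float = 0.9) -> list[int]:
    """Return indices of embeddings to keep; later near-duplicates are dropped."""
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


vectors = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
print(deduplicate(vectors))  # [0, 2]: the second vector is ~1.0 similar to the first
```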
34
- ## LlamaIndex RAG Service
35
-
36
- **File**: `src/services/llamaindex_rag.py`
37
-
38
- **Purpose**: Retrieval-Augmented Generation using LlamaIndex
39
-
40
- **Features**:
41
- - **Multiple Embedding Providers**: OpenAI embeddings (requires `OPENAI_API_KEY`) or local sentence-transformers (no API key)
42
- - **Multiple LLM Providers**: HuggingFace LLM (preferred) or OpenAI LLM (fallback) for query synthesis
43
- - **ChromaDB Storage**: Vector database for document storage (supports in-memory mode)
44
- - **Metadata Preservation**: Preserves source, title, URL, date, authors
45
- - **Lazy Initialization**: Graceful fallback if dependencies not available
46
-
47
- **Initialization Parameters**:
48
- - `use_openai_embeddings: bool | None`: Force OpenAI embeddings (None = auto-detect)
49
- - `use_in_memory: bool`: Use in-memory ChromaDB client (useful for tests)
50
- - `oauth_token: str | None`: Optional OAuth token from HuggingFace login (takes priority over env vars)
51
-
52
- **Methods**:
53
- - `async def ingest_evidence(evidence: list[Evidence]) -> None`: Ingest evidence into RAG
54
- - `async def retrieve(query: str, top_k: int = 5) -> list[Document]`: Retrieve relevant documents
55
- - `async def query(query: str, top_k: int = 5) -> str`: Query with RAG
56
-
57
- **Usage**:
58
- ```python
59
- from src.services.llamaindex_rag import get_rag_service
60
-
61
- service = get_rag_service(
62
- use_openai_embeddings=False, # Use local embeddings
63
- use_in_memory=True, # Use in-memory ChromaDB
64
- oauth_token=token # Optional HuggingFace token
65
- )
66
- if service:
67
- documents = await service.retrieve("query", top_k=5)
68
- ```
69
-
70
- ## Statistical Analyzer
71
-
72
- **File**: `src/services/statistical_analyzer.py`
73
-
74
- **Purpose**: Secure execution of AI-generated statistical code
75
-
76
- **Features**:
77
- - **Modal Sandbox**: Secure, isolated execution environment
78
- - **Code Generation**: Generates Python code via LLM
79
- - **Library Pinning**: Version-pinned libraries in `SANDBOX_LIBRARIES`
80
- - **Network Isolation**: `block_network=True` by default
81
-
82
- **Libraries Available**:
83
- - pandas, numpy, scipy
84
- - matplotlib, scikit-learn
85
- - statsmodels
86
-
87
- **Output**: `AnalysisResult` with:
88
- - `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
89
- - `code`: Generated analysis code
90
- - `output`: Execution output
91
- - `error`: Error message if execution failed
92
-
93
- **Usage**:
94
- ```python
95
- from src.services.statistical_analyzer import StatisticalAnalyzer
96
-
97
- analyzer = StatisticalAnalyzer()
98
- result = await analyzer.analyze(
99
- hypothesis="Metformin reduces cancer risk",
100
- evidence=evidence_list
101
- )
102
- ```
103
-
104
- ## Singleton Pattern
105
-
106
- Services use singleton patterns for lazy initialization:
107
-
108
- **EmbeddingService**: Uses a global variable pattern:
109
-
110
- <!--codeinclude-->
111
- [EmbeddingService Singleton](../src/services/embeddings.py) start_line:164 end_line:172
112
- <!--/codeinclude-->
113
-
114
- **LlamaIndexRAGService**: Direct instantiation (no caching):
115
-
116
- <!--codeinclude-->
117
- [LlamaIndexRAGService Factory](../src/services/llamaindex_rag.py) start_line:440 end_line:466
118
- <!--/codeinclude-->
119
-
120
- The EmbeddingService singleton ensures:
121
- - Single instance per process
122
- - Lazy initialization
123
- - No dependencies required at import time
124
-
125
- ## Service Availability
126
-
127
- Services check availability before use:
128
-
129
- ```python
130
- from src.utils.config import settings
131
-
132
- if settings.modal_available:
133
- # Use Modal sandbox
134
- pass
135
-
136
- if settings.has_openai_key:
137
- # Use OpenAI embeddings for RAG
138
- pass
139
- ```
140
-
141
- ## See Also
142
-
143
- - [Tools](tools.md) - How services are used by search tools
144
- - [API Reference - Services](../api/services.md) - API documentation
145
- - [Configuration](../configuration/index.md) - Service configuration
146
-
 
docs/architecture/tools.md DELETED
@@ -1,167 +0,0 @@
1
- # Tools Architecture
2
-
3
- DeepCritical implements a protocol-based search tool system for retrieving evidence from multiple sources.
4
-
5
- ## SearchTool Protocol
6
-
7
- All tools implement the `SearchTool` protocol from `src/tools/base.py`:
8
-
9
- <!--codeinclude-->
10
- [SearchTool Protocol](../src/tools/base.py) start_line:8 end_line:31
11
- <!--/codeinclude-->
12
-
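A structural sketch of what such a protocol typically looks like (the exact signature in `src/tools/base.py` may differ):

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class SearchTool(Protocol):
    """Anything with a name and an async search() satisfies the protocol."""

    name: str

    async def search(self, query: str, max_results: int = 10) -> list[dict]: ...


class DummyTool:
    name = "dummy"

    async def search(self, query: str, max_results: int = 10) -> list[dict]:
        return [{"query": query}]


# Structural typing: no inheritance required.
assert isinstance(DummyTool(), SearchTool)
print(asyncio.run(DummyTool().search("metformin")))  # [{'query': 'metformin'}]
```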
13
- ## Rate Limiting
14
-
15
- All tools use the `@retry` decorator from tenacity:
16
-
17
- <!--codeinclude-->
18
- [Retry Decorator Pattern](../src/tools/pubmed.py) start_line:46 end_line:50
19
- <!--/codeinclude-->
20
-
21
- Tools with API rate limits implement a `_rate_limit()` method and use shared rate limiters from `src/tools/rate_limiter.py`.
22
-
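A `_rate_limit()` helper is usually just an async minimum-interval guard; a simplified sketch (the shared limiters in `src/tools/rate_limiter.py` are more elaborate):

```python
import asyncio
import time


class MinIntervalLimiter:
    """Ensure at least `interval` seconds between successive acquires."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            wait = self._last + self.interval - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
            self._last = time.monotonic()


async def main() -> float:
    limiter = MinIntervalLimiter(0.05)
    start = time.monotonic()
    for _ in range(3):
        await limiter.acquire()   # call before each outbound request
    return time.monotonic() - start


print(asyncio.run(main()))  # roughly >= 0.1s: two enforced gaps between three calls
```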
23
- ## Error Handling
24
-
25
- Tools raise custom exceptions:
26
-
27
- - `SearchError`: General search failures
28
- - `RateLimitError`: Rate limit exceeded
29
-
30
- Tools handle HTTP errors (429, 500, timeout) and return empty lists on non-critical errors (with warning logs).
31
-
32
- ## Query Preprocessing
33
-
34
- Tools use `preprocess_query()` from `src/tools/query_utils.py` to:
35
-
36
- - Remove noise from queries
37
- - Expand synonyms
38
- - Normalize query format
39
-
40
- ## Evidence Conversion
41
-
42
- All tools convert API responses to `Evidence` objects with:
43
-
44
- - `Citation`: Title, URL, date, authors
45
- - `content`: Evidence text
46
- - `relevance_score`: 0.0-1.0 relevance score
47
- - `metadata`: Additional metadata
48
-
49
- Missing fields are handled gracefully with defaults.
50
-
51
- ## Tool Implementations
52
-
53
- ### PubMed Tool
54
-
55
- **File**: `src/tools/pubmed.py`
56
-
57
- **API**: NCBI E-utilities (ESearch → EFetch)
58
-
59
- **Rate Limiting**:
60
- - 0.34s between requests (3 req/sec without API key)
61
- - 0.1s between requests (10 req/sec with NCBI API key)
62
-
63
- **Features**:
64
- - XML parsing with `xmltodict`
65
- - Handles single vs. multiple articles
66
- - Query preprocessing
67
- - Evidence conversion with metadata extraction
68
-
69
- ### ClinicalTrials Tool
70
-
71
- **File**: `src/tools/clinicaltrials.py`
72
-
73
- **API**: ClinicalTrials.gov API v2
74
-
75
- **Important**: Uses the `requests` library (not `httpx`) because the site's WAF blocks httpx's TLS fingerprint.
76
-
77
- **Execution**: Runs in thread pool: `await asyncio.to_thread(requests.get, ...)`
78
-
79
- **Filtering**:
80
- - Only interventional studies
81
- - Status: `COMPLETED`, `ACTIVE_NOT_RECRUITING`, `RECRUITING`, `ENROLLING_BY_INVITATION`
82
-
83
- **Features**:
84
- - Parses nested JSON structure
85
- - Extracts trial metadata
86
- - Evidence conversion
87
-
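The thread-pool pattern keeps the blocking `requests` call off the event loop; a self-contained sketch with a stand-in for `requests.get` (URL and helper names are illustrative):

```python
import asyncio
import time


def blocking_fetch(url: str) -> dict:
    """Stand-in for requests.get(url).json(): blocks the calling thread."""
    time.sleep(0.01)
    return {"url": url, "status": 200}


async def search_trials(query: str) -> dict:
    url = f"https://clinicaltrials.gov/api/v2/studies?query.term={query}"
    # Run the blocking call in the default thread pool so the loop stays free.
    return await asyncio.to_thread(blocking_fetch, url)


result = asyncio.run(search_trials("metformin"))
print(result["status"])  # 200
```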
88
- ### Europe PMC Tool
89
-
90
- **File**: `src/tools/europepmc.py`
91
-
92
- **API**: Europe PMC REST API
93
-
94
- **Features**:
95
- - Handles preprint markers: `[PREPRINT - Not peer-reviewed]`
96
- - Builds URLs from DOI or PMID
97
- - Checks `pubTypeList` for preprint detection
98
- - Includes both preprints and peer-reviewed articles
99
-
100
- ### RAG Tool
101
-
102
- **File**: `src/tools/rag_tool.py`
103
-
104
- **Purpose**: Semantic search within collected evidence
105
-
106
- **Implementation**: Wraps `LlamaIndexRAGService`
107
-
108
- **Features**:
109
- - Returns Evidence from RAG results
110
- - Handles evidence ingestion
111
- - Semantic similarity search
112
- - Metadata preservation
113
-
114
- ### Search Handler
115
-
116
- **File**: `src/tools/search_handler.py`
117
-
118
- **Purpose**: Orchestrates parallel searches across multiple tools
119
-
120
- **Initialization Parameters**:
121
- - `tools: list[SearchTool]`: List of search tools to use
122
- - `timeout: float = 30.0`: Timeout for each search in seconds
123
- - `include_rag: bool = False`: Whether to include RAG tool in searches
124
- - `auto_ingest_to_rag: bool = True`: Whether to automatically ingest results into RAG
125
- - `oauth_token: str | None = None`: Optional OAuth token from HuggingFace login (for RAG LLM)
126
-
127
- **Methods**:
128
- - `async def execute(query: str, max_results_per_tool: int = 10) -> SearchResult`: Execute search across all tools in parallel
129
-
130
- **Features**:
131
- - Uses `asyncio.gather()` with `return_exceptions=True` for parallel execution
132
- - Aggregates results into `SearchResult` with evidence and metadata
133
- - Handles tool failures gracefully (continues with other tools)
134
- - Deduplicates results by URL
135
- - Automatically ingests results into RAG if `auto_ingest_to_rag=True`
136
- - Can add RAG tool dynamically via `add_rag_tool()` method
137
-
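The parallel fan-out with graceful failure handling and URL deduplication can be sketched as follows (the tool functions here are stubs):

```python
import asyncio


async def tool_ok(query: str) -> list[dict]:
    return [{"url": "https://example.org/a", "content": query}]


async def tool_dup(query: str) -> list[dict]:
    return [{"url": "https://example.org/a", "content": query}]  # same URL


async def tool_broken(query: str) -> list[dict]:
    raise RuntimeError("API down")


async def execute(query: str) -> list[dict]:
    tools = [tool_ok, tool_dup, tool_broken]
    # return_exceptions=True: one failing tool does not cancel the others.
    results = await asyncio.gather(
        *(t(query) for t in tools), return_exceptions=True
    )
    evidence: list[dict] = []
    seen: set[str] = set()
    for r in results:
        if isinstance(r, Exception):     # failed tool: skip, keep the rest
            continue
        for item in r:
            if item["url"] not in seen:  # deduplicate by URL
                seen.add(item["url"])
                evidence.append(item)
    return evidence


print(len(asyncio.run(execute("q"))))  # 1
```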
138
- ## Tool Registration
139
-
140
- Tools are registered in the search handler:
141
-
142
- ```python
143
- from src.tools.pubmed import PubMedTool
144
- from src.tools.clinicaltrials import ClinicalTrialsTool
145
- from src.tools.europepmc import EuropePMCTool
146
- from src.tools.search_handler import SearchHandler
147
-
148
- search_handler = SearchHandler(
149
- tools=[
150
- PubMedTool(),
151
- ClinicalTrialsTool(),
152
- EuropePMCTool(),
153
- ],
154
- include_rag=True, # Include RAG tool for semantic search
155
- auto_ingest_to_rag=True, # Automatically ingest results into RAG
156
- oauth_token=token # Optional HuggingFace token for RAG LLM
157
- )
158
-
159
- # Execute search
160
- result = await search_handler.execute("query", max_results_per_tool=10)
161
- ```
162
-
163
- ## See Also
164
-
165
- - [Services](services.md) - RAG and embedding services
166
- - [API Reference - Tools](../api/tools.md) - API documentation
167
- - [Contributing - Implementation Patterns](../contributing/implementation-patterns.md) - Development guidelines
 
docs/architecture/workflow-diagrams.md DELETED
@@ -1,655 +0,0 @@
1
- # DeepCritical Workflow - Simplified Magentic Architecture
2
-
3
- > **Architecture Pattern**: Microsoft Magentic Orchestration
4
- > **Design Philosophy**: Simple, dynamic, manager-driven coordination
5
- > **Key Innovation**: Intelligent manager replaces rigid sequential phases
6
-
7
- ---
8
-
9
- ## 1. High-Level Magentic Workflow
10
-
11
- ```mermaid
12
- flowchart TD
13
- Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
14
-
15
- Manager -->|Plans| Task1[Task Decomposition]
16
- Task1 --> Manager
17
-
18
- Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
19
- Manager -->|Selects & Executes| SearchAgent[Search Agent]
20
- Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
21
- Manager -->|Selects & Executes| ReportAgent[Report Agent]
22
-
23
- HypAgent -->|Results| Manager
24
- SearchAgent -->|Results| Manager
25
- AnalysisAgent -->|Results| Manager
26
- ReportAgent -->|Results| Manager
27
-
28
- Manager -->|Assesses Quality| Decision{Good Enough?}
29
- Decision -->|No - Refine| Manager
30
- Decision -->|No - Different Agent| Manager
31
- Decision -->|No - Stalled| Replan[Reset Plan]
32
- Replan --> Manager
33
-
34
- Decision -->|Yes| Synthesis[Synthesize Final Result]
35
- Synthesis --> Output([Research Report])
36
-
37
- style Start fill:#e1f5e1
38
- style Manager fill:#ffe6e6
39
- style HypAgent fill:#fff4e6
40
- style SearchAgent fill:#fff4e6
41
- style AnalysisAgent fill:#fff4e6
42
- style ReportAgent fill:#fff4e6
43
- style Decision fill:#ffd6d6
44
- style Synthesis fill:#d4edda
45
- style Output fill:#e1f5e1
46
- ```
47
-
48
- ## 2. Magentic Manager: The 6-Phase Cycle
49
-
50
- ```mermaid
51
- flowchart LR
52
- P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
53
- P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
54
- P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
55
- P4 --> Decision{Quality OK?<br/>Progress made?}
56
- Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
57
- Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
58
- P5 --> P2
59
- P6 --> Done([Complete])
60
-
61
- style P1 fill:#fff4e6
62
- style P2 fill:#ffe6e6
63
- style P3 fill:#e6f3ff
64
- style P4 fill:#ffd6d6
65
- style P5 fill:#fff3cd
66
- style P6 fill:#d4edda
67
- style Done fill:#e1f5e1
68
- ```
69
-
70
- ## 3. Simplified Agent Architecture
71
-
72
- ```mermaid
73
- graph TB
74
- subgraph "Orchestration Layer"
75
- Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
76
- SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
77
- Manager <--> SharedContext
78
- end
79
-
80
- subgraph "Specialist Agents"
81
- HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
82
- SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
83
- AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
84
- ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
85
- end
86
-
87
- subgraph "MCP Tools"
88
- WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
89
- CodeExec[Code Execution<br/>Sandboxed Python]
90
- RAG[RAG Retrieval<br/>Vector DB • Embeddings]
91
- Viz[Visualization<br/>Charts • Graphs]
92
- end
93
-
94
- Manager -->|Selects & Directs| HypAgent
95
- Manager -->|Selects & Directs| SearchAgent
96
- Manager -->|Selects & Directs| AnalysisAgent
97
- Manager -->|Selects & Directs| ReportAgent
98
-
99
- HypAgent --> SharedContext
100
- SearchAgent --> SharedContext
101
- AnalysisAgent --> SharedContext
102
- ReportAgent --> SharedContext
103
-
104
- SearchAgent --> WebSearch
105
- SearchAgent --> RAG
106
- AnalysisAgent --> CodeExec
107
- ReportAgent --> CodeExec
108
- ReportAgent --> Viz
109
-
110
- style Manager fill:#ffe6e6
111
- style SharedContext fill:#ffe6f0
112
- style HypAgent fill:#fff4e6
113
- style SearchAgent fill:#fff4e6
114
- style AnalysisAgent fill:#fff4e6
115
- style ReportAgent fill:#fff4e6
116
- style WebSearch fill:#e6f3ff
117
- style CodeExec fill:#e6f3ff
118
- style RAG fill:#e6f3ff
119
- style Viz fill:#e6f3ff
120
- ```
121
-
122
- ## 4. Dynamic Workflow Example
123
-
124
- ```mermaid
125
- sequenceDiagram
126
- participant User
127
- participant Manager
128
- participant HypAgent
129
- participant SearchAgent
130
- participant AnalysisAgent
131
- participant ReportAgent
132
-
133
- User->>Manager: "Research protein folding in Alzheimer's"
134
-
135
- Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
136
-
137
- Manager->>HypAgent: Generate 3 hypotheses
138
- HypAgent-->>Manager: Returns 3 hypotheses
139
- Note over Manager: ASSESS: Good quality, proceed
140
-
141
- Manager->>SearchAgent: Search literature for hypothesis 1
142
- SearchAgent-->>Manager: Returns 15 papers
143
- Note over Manager: ASSESS: Good results, continue
144
-
145
- Manager->>SearchAgent: Search for hypothesis 2
146
- SearchAgent-->>Manager: Only 2 papers found
147
- Note over Manager: ASSESS: Insufficient, refine search
148
-
149
- Manager->>SearchAgent: Refined query for hypothesis 2
150
- SearchAgent-->>Manager: Returns 12 papers
151
- Note over Manager: ASSESS: Better, proceed
152
-
153
- Manager->>AnalysisAgent: Analyze evidence for all hypotheses
154
- AnalysisAgent-->>Manager: Returns analysis with code
155
- Note over Manager: ASSESS: Complete, generate report
156
-
157
- Manager->>ReportAgent: Create comprehensive report
158
- ReportAgent-->>Manager: Returns formatted report
159
- Note over Manager: SYNTHESIZE: Combine all results
160
-
161
- Manager->>User: Final Research Report
162
- ```
163
-
164
- ## 5. Manager Decision Logic
165
-
166
- ```mermaid
167
- flowchart TD
168
- Start([Manager Receives Task]) --> Plan[Create Initial Plan]
169
-
170
- Plan --> Select[Select Agent for Next Subtask]
171
- Select --> Execute[Execute Agent]
172
- Execute --> Collect[Collect Results]
173
-
174
- Collect --> Assess[Assess Quality & Progress]
175
-
176
- Assess --> Q1{Quality Sufficient?}
177
- Q1 -->|No| Q2{Same Agent Can Fix?}
178
- Q2 -->|Yes| Feedback[Provide Specific Feedback]
179
- Feedback --> Execute
180
- Q2 -->|No| Different[Try Different Agent]
181
- Different --> Select
182
-
183
- Q1 -->|Yes| Q3{Task Complete?}
184
- Q3 -->|No| Q4{Making Progress?}
185
- Q4 -->|Yes| Select
186
- Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
187
- Replan --> Plan
188
-
189
- Q3 -->|Yes| Synth[Synthesize Final Result]
190
- Synth --> Done([Return Report])
191
-
192
- style Start fill:#e1f5e1
193
- style Plan fill:#fff4e6
194
- style Select fill:#ffe6e6
195
- style Execute fill:#e6f3ff
196
- style Assess fill:#ffd6d6
197
- style Q1 fill:#ffe6e6
198
- style Q2 fill:#ffe6e6
199
- style Q3 fill:#ffe6e6
200
- style Q4 fill:#ffe6e6
201
- style Synth fill:#d4edda
202
- style Done fill:#e1f5e1
203
- ```
204
-
205
- ## 6. Hypothesis Agent Workflow
206
-
207
- ```mermaid
208
- flowchart LR
209
- Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
210
- Domain --> Context[Retrieve Background<br/>Knowledge]
211
- Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
212
- Generate --> Refine[Refine for<br/>Testability]
213
- Refine --> Rank[Rank by<br/>Quality Score]
214
- Rank --> Output[Return Top<br/>Hypotheses]
215
-
216
- Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
217
-
218
- style Input fill:#e1f5e1
219
- style Output fill:#fff4e6
220
- style Struct fill:#e6f3ff
221
- ```
222
-
223
- ## 7. Search Agent Workflow
224
-
225
- ```mermaid
226
- flowchart TD
227
- Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
228
-
229
- Strategy --> Multi[Multi-Source Search]
230
-
231
- Multi --> PubMed[PubMed Search<br/>via MCP]
232
- Multi --> ArXiv[arXiv Search<br/>via MCP]
233
- Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
234
-
235
- PubMed --> Aggregate[Aggregate Results]
236
- ArXiv --> Aggregate
237
- BioRxiv --> Aggregate
238
-
239
- Aggregate --> Filter[Filter & Rank<br/>by Relevance]
240
- Filter --> Dedup[Deduplicate<br/>Cross-Reference]
241
- Dedup --> Embed[Embed Documents<br/>via MCP]
242
- Embed --> Vector[(Vector DB)]
243
- Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
244
- RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
245
-
246
- style Input fill:#fff4e6
247
- style Multi fill:#ffe6e6
248
- style Vector fill:#ffe6f0
249
- style Output fill:#e6f3ff
250
- ```
251
-
252
- ## 8. Analysis Agent Workflow
253
-
254
- ```mermaid
255
- flowchart TD
256
- Input1[Hypotheses] --> Extract
257
- Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
258
-
259
- Extract --> Methods[Determine Analysis<br/>Methods Needed]
260
-
261
- Methods --> Branch{Requires<br/>Computation?}
262
- Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
263
- Branch -->|No| Qual[Qualitative<br/>Synthesis]
264
-
265
- GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
266
- Execute --> Interpret1[Interpret<br/>Results]
267
- Qual --> Interpret2[Interpret<br/>Findings]
268
-
269
- Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
270
- Interpret2 --> Synthesize
271
-
272
- Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
273
- Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
274
- Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
275
- Gaps --> Output[Return Analysis<br/>Report]
276
-
277
- style Input1 fill:#fff4e6
278
- style Input2 fill:#e6f3ff
279
- style Execute fill:#ffe6e6
280
- style Output fill:#e6ffe6
281
- ```
282
-
283
- ## 9. Report Agent Workflow
284
-
285
- ```mermaid
286
- flowchart TD
287
- Input1[Query] --> Assemble
288
- Input2[Hypotheses] --> Assemble
289
- Input3[Search Results] --> Assemble
290
- Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
291
-
292
- Assemble --> Exec[Executive Summary]
293
- Assemble --> Intro[Introduction]
294
- Assemble --> Methods[Methods]
295
- Assemble --> Results[Results per<br/>Hypothesis]
296
- Assemble --> Discussion[Discussion]
297
- Assemble --> Future[Future Directions]
298
- Assemble --> Refs[References]
299
-
300
- Results --> VizCheck{Needs<br/>Visualization?}
301
- VizCheck -->|Yes| GenViz[Generate Viz Code]
302
- GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
303
- ExecViz --> Combine
304
- VizCheck -->|No| Combine[Combine All<br/>Sections]
305
-
306
- Exec --> Combine
307
- Intro --> Combine
308
- Methods --> Combine
309
- Discussion --> Combine
310
- Future --> Combine
311
- Refs --> Combine
312
-
313
- Combine --> Format[Format Output]
314
- Format --> MD[Markdown]
315
- Format --> PDF[PDF]
316
- Format --> JSON[JSON]
317
-
318
- MD --> Output[Return Final<br/>Report]
319
- PDF --> Output
320
- JSON --> Output
321
-
322
- style Input1 fill:#e1f5e1
323
- style Input2 fill:#fff4e6
324
- style Input3 fill:#e6f3ff
325
- style Input4 fill:#e6ffe6
326
- style Output fill:#d4edda
327
- ```
328
-
329
- ## 10. Data Flow & Event Streaming
330
-
331
- ```mermaid
332
- flowchart TD
333
- User[👤 User] -->|Research Query| UI[Gradio UI]
334
- UI -->|Submit| Manager[Magentic Manager]
335
-
336
- Manager -->|Event: Planning| UI
337
- Manager -->|Select Agent| HypAgent[Hypothesis Agent]
338
- HypAgent -->|Event: Delta/Message| UI
339
- HypAgent -->|Hypotheses| Context[(Shared Context)]
340
-
341
- Context -->|Retrieved by| Manager
342
- Manager -->|Select Agent| SearchAgent[Search Agent]
343
- SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
344
- WebSearch -->|Results| SearchAgent
345
- SearchAgent -->|Event: Delta/Message| UI
346
- SearchAgent -->|Documents| Context
347
- SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
348
-
349
- Context -->|Retrieved by| Manager
350
- Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
351
- AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
352
- CodeExec -->|Results| AnalysisAgent
353
- AnalysisAgent -->|Event: Delta/Message| UI
354
- AnalysisAgent -->|Analysis| Context
355
-
356
- Context -->|Retrieved by| Manager
357
- Manager -->|Select Agent| ReportAgent[Report Agent]
358
- ReportAgent -->|MCP Request| CodeExec
359
- ReportAgent -->|Event: Delta/Message| UI
360
- ReportAgent -->|Report| Context
361
-
362
- Manager -->|Event: Final Result| UI
363
- UI -->|Display| User
364
-
365
- style User fill:#e1f5e1
366
- style UI fill:#e6f3ff
367
- style Manager fill:#ffe6e6
368
- style Context fill:#ffe6f0
369
- style VectorDB fill:#ffe6f0
370
- style WebSearch fill:#f0f0f0
371
- style CodeExec fill:#f0f0f0
372
- ```
373
-
374
- ## 11. MCP Tool Architecture
375
-
376
- ```mermaid
377
- graph TB
378
- subgraph "Agent Layer"
379
- Manager[Magentic Manager]
380
- HypAgent[Hypothesis Agent]
381
- SearchAgent[Search Agent]
382
- AnalysisAgent[Analysis Agent]
383
- ReportAgent[Report Agent]
384
- end
385
-
386
- subgraph "MCP Protocol Layer"
387
- Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
388
- end
389
-
390
- subgraph "MCP Servers"
391
- Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
392
- Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
393
- Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
394
- Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
395
- end
396
-
397
- subgraph "External Services"
398
- PubMed[PubMed API]
399
- ArXiv[arXiv API]
400
- BioRxiv[bioRxiv API]
401
- Modal[Modal Sandbox]
402
- ChromaDB[(ChromaDB)]
403
- end
404
-
405
- SearchAgent -->|Request| Registry
406
- AnalysisAgent -->|Request| Registry
407
- ReportAgent -->|Request| Registry
408
-
409
- Registry --> Server1
410
- Registry --> Server2
411
- Registry --> Server3
412
- Registry --> Server4
413
-
414
- Server1 --> PubMed
415
- Server1 --> ArXiv
416
- Server1 --> BioRxiv
417
- Server2 --> Modal
418
- Server3 --> ChromaDB
419
-
420
- style Manager fill:#ffe6e6
421
- style Registry fill:#fff4e6
422
- style Server1 fill:#e6f3ff
423
- style Server2 fill:#e6f3ff
424
- style Server3 fill:#e6f3ff
425
- style Server4 fill:#e6f3ff
426
- ```
427
-
428
- ## 12. Progress Tracking & Stall Detection
429
-
430
- ```mermaid
431
- stateDiagram-v2
432
- [*] --> Initialization: User Query
433
-
434
- Initialization --> Planning: Manager starts
435
-
436
- Planning --> AgentExecution: Select agent
437
-
438
- AgentExecution --> Assessment: Collect results
439
-
440
- Assessment --> QualityCheck: Evaluate output
441
-
442
- QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
443
- QualityCheck --> Planning: Poor quality<br/>(try different agent)
444
- QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
445
- QualityCheck --> Synthesis: Good quality<br/>(task complete)
446
-
447
- NextAgent --> AgentExecution: Select next agent
448
-
449
- state StallDetection <<choice>>
450
- Assessment --> StallDetection: Check progress
451
- StallDetection --> Planning: No progress<br/>(stall count < max)
452
- StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
453
-
454
- ErrorRecovery --> PartialReport: Generate partial results
455
- PartialReport --> [*]
456
-
457
- Synthesis --> FinalReport: Combine all outputs
458
- FinalReport --> [*]
459
-
460
- note right of QualityCheck
461
- Manager assesses:
462
- • Output completeness
463
- • Quality metrics
464
- • Progress made
465
- end note
466
-
467
- note right of StallDetection
468
- Stall = no new progress
469
- after agent execution
470
- Triggers plan reset
471
- end note
472
- ```
473
-
474
- ## 13. Gradio UI Integration
475
-
476
- ```mermaid
477
- graph TD
478
- App[Gradio App<br/>DeepCritical Research Agent]
479
-
480
- App --> Input[Input Section]
481
- App --> Status[Status Section]
482
- App --> Output[Output Section]
483
-
484
- Input --> Query[Research Question<br/>Text Area]
485
- Input --> Controls[Controls]
486
- Controls --> MaxHyp[Max Hypotheses: 1-10]
487
- Controls --> MaxRounds[Max Rounds: 5-20]
488
- Controls --> Submit[Start Research Button]
489
-
490
- Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
491
- Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
492
-
493
- Output --> Tabs[Tabbed Results]
494
- Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
495
- Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
496
- Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
497
- Tabs --> Tab4[Report Tab<br/>Final research report]
498
- Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
499
-
500
- Submit -.->|Triggers| Workflow[Magentic Workflow]
501
- Workflow -.->|MagenticOrchestratorMessageEvent| Log
502
- Workflow -.->|MagenticAgentDeltaEvent| Log
503
- Workflow -.->|MagenticAgentMessageEvent| Log
504
- Workflow -.->|MagenticFinalResultEvent| Tab4
505
-
506
- style App fill:#e1f5e1
507
- style Input fill:#fff4e6
508
- style Status fill:#e6f3ff
509
- style Output fill:#e6ffe6
510
- style Workflow fill:#ffe6e6
511
- ```
512
-
513
- ## 14. Complete System Context
514
-
515
- ```mermaid
516
- graph LR
517
- User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
518
-
519
- DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
520
- DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
521
- DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
522
- DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
523
- DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
524
- DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
525
-
526
- DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
-
- PubMed -->|Results| DC
- ArXiv -->|Results| DC
- BioRxiv -->|Results| DC
- Claude -->|Responses| DC
- Modal -->|Output| DC
- Chroma -->|Context| DC
-
- DC -->|Research report| User
-
- style User fill:#e1f5e1
- style DC fill:#ffe6e6
- style PubMed fill:#e6f3ff
- style ArXiv fill:#e6f3ff
- style BioRxiv fill:#e6f3ff
- style Claude fill:#ffd6d6
- style Modal fill:#f0f0f0
- style Chroma fill:#ffe6f0
- style HF fill:#d4edda
- ```
-
- ## 15. Workflow Timeline (Simplified)
-
- ```mermaid
- gantt
- title DeepCritical Magentic Workflow - Typical Execution
- dateFormat mm:ss
- axisFormat %M:%S
-
- section Manager Planning
- Initial planning :p1, 00:00, 10s
-
- section Hypothesis Agent
- Generate hypotheses :h1, after p1, 30s
- Manager assessment :h2, after h1, 5s
-
- section Search Agent
- Search hypothesis 1 :s1, after h2, 20s
- Search hypothesis 2 :s2, after s1, 20s
- Search hypothesis 3 :s3, after s2, 20s
- RAG processing :s4, after s3, 15s
- Manager assessment :s5, after s4, 5s
-
- section Analysis Agent
- Evidence extraction :a1, after s5, 15s
- Code generation :a2, after a1, 20s
- Code execution :a3, after a2, 25s
- Synthesis :a4, after a3, 20s
- Manager assessment :a5, after a4, 5s
-
- section Report Agent
- Report assembly :r1, after a5, 30s
- Visualization :r2, after r1, 15s
- Formatting :r3, after r2, 10s
-
- section Manager Synthesis
- Final synthesis :f1, after r3, 10s
- ```
-
- ---
-
- ## Key Differences from Original Design
-
- | Aspect | Original (Judge-in-Loop) | New (Magentic) |
- |--------|-------------------------|----------------|
- | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
- | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
- | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
- | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
- | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
- | **Progress Tracking** | Manual state management | Built-in round/stall detection |
- | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
- | **Error Recovery** | Retry same phase | Try different agent or replan |
-
- ---
-
- ## Simplified Design Principles
-
- 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
- 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
- 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
- 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
- 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
- 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
- 7. **Shared Context**: Centralized state accessible to all agents
- 8. **Progress Awareness**: Manager tracks what's been done and what's needed
-
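The round and stall guards in principle 4 can be sketched as a plain loop (names here are illustrative, not the actual orchestrator API):

```python
# Illustrative sketch of max_round_count / max_stall_count guards;
# function and variable names are hypothetical, not DeepCritical's API.
def run_rounds(progress_flags, max_round_count=15, max_stall_count=3):
    """Return (stop_reason, rounds_used) for a sequence of per-round progress flags."""
    stalls = 0
    for rounds, progressed in enumerate(progress_flags, start=1):
        stalls = 0 if progressed else stalls + 1
        if stalls >= max_stall_count:
            return ("stalled", rounds)      # too many rounds without progress
        if rounds >= max_round_count:
            return ("max_rounds", rounds)   # hard cap on total rounds
    return ("completed", len(progress_flags))
```

A workflow that keeps making progress never trips either guard; three fruitless rounds in a row stop it early.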
- ---
-
- ## Legend
-
- - 🔴 **Red/Pink**: Manager, orchestration, decision-making
- - 🟡 **Yellow/Orange**: Specialist agents, processing
- - 🔵 **Blue**: Data, tools, MCP services
- - 🟣 **Purple/Pink**: Storage, databases, state
- - 🟢 **Green**: User interactions, final outputs
- - ⚪ **Gray**: External services, APIs
-
- ---
-
- ## Implementation Highlights
-
- **Simple 4-Agent Setup:**
-
- <!--codeinclude-->
- [Magentic Workflow Builder](../src/orchestrator_magentic.py) start_line:72 end_line:99
- <!--/codeinclude-->
-
- **Manager handles quality assessment in its instructions:**
- - Checks hypothesis quality (testable, novel, clear)
- - Validates search results (relevant, authoritative, recent)
- - Assesses analysis soundness (methodology, evidence, conclusions)
- - Ensures report completeness (all sections, proper citations)
-
- No separate Judge Agent needed - manager does it all!
-
- ---
-
- **Document Version**: 2.0 (Magentic Simplified)
- **Last Updated**: 2025-11-24
- **Architecture**: Microsoft Magentic Orchestration Pattern
- **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
- **License**: MIT
-
- ## See Also
-
- - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
- - [Graph Orchestration](graph_orchestration.md) - Graph-based execution overview
- - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/configuration/index.md DELETED
@@ -1,564 +0,0 @@
- # Configuration Guide
-
- ## Overview
-
- DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in the `Settings` class in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
-
- The configuration system provides:
-
- - **Type Safety**: Strongly-typed fields with Pydantic validation
- - **Environment File Support**: Automatically loads from `.env` file (if present)
- - **Case-Insensitive**: Environment variables are case-insensitive
- - **Singleton Pattern**: Global `settings` instance for easy access throughout the codebase
- - **Validation**: Automatic validation on load with helpful error messages
-
- ## Quick Start
-
- 1. Create a `.env` file in the project root
- 2. Set at least one LLM API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`)
- 3. Optionally configure other services as needed
- 4. The application will automatically load and validate your configuration
-
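A minimal `.env` covering those steps might look like this (placeholder values; Anthropic chosen arbitrarily):

```bash
# .env - minimal working configuration (values are placeholders)
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
LOG_LEVEL=INFO
```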
- ## Configuration System Architecture
-
- ### Settings Class
-
- The [`Settings`][settings-class] class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
-
- <!--codeinclude-->
- [Settings Class Definition](../src/utils/config.py) start_line:13 end_line:21
- <!--/codeinclude-->
-
- [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L13-L21)
-
- ### Singleton Instance
-
- A global `settings` instance is available for import:
-
- <!--codeinclude-->
- [Singleton Instance](../src/utils/config.py) start_line:234 end_line:235
- <!--/codeinclude-->
-
- [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L234-L235)
-
- ### Usage Pattern
-
- Access configuration throughout the codebase:
-
- ```python
- from src.utils.config import settings
-
- # Check if API keys are available
- if settings.has_openai_key:
-     # Use OpenAI
-     pass
-
- # Access configuration values
- max_iterations = settings.max_iterations
- web_search_provider = settings.web_search_provider
- ```
-
- ## Required Configuration
-
- ### LLM Provider
-
- You must configure at least one LLM provider. The system supports:
-
- - **OpenAI**: Requires `OPENAI_API_KEY`
- - **Anthropic**: Requires `ANTHROPIC_API_KEY`
- - **HuggingFace**: Optional `HF_TOKEN` or `HUGGINGFACE_API_KEY` (can work without key for public models)
-
- #### OpenAI Configuration
-
- ```bash
- LLM_PROVIDER=openai
- OPENAI_API_KEY=your_openai_api_key_here
- OPENAI_MODEL=gpt-5.1
- ```
-
- The default model is defined in the `Settings` class:
-
- <!--codeinclude-->
- [OpenAI Model Configuration](../src/utils/config.py) start_line:29 end_line:29
- <!--/codeinclude-->
-
- #### Anthropic Configuration
-
- ```bash
- LLM_PROVIDER=anthropic
- ANTHROPIC_API_KEY=your_anthropic_api_key_here
- ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
- ```
-
- The default model is defined in the `Settings` class:
-
- <!--codeinclude-->
- [Anthropic Model Configuration](../src/utils/config.py) start_line:30 end_line:32
- <!--/codeinclude-->
-
- #### HuggingFace Configuration
-
- HuggingFace can work without an API key for public models, but an API key provides higher rate limits:
-
- ```bash
- # Option 1: Using HF_TOKEN (preferred)
- HF_TOKEN=your_huggingface_token_here
-
- # Option 2: Using HUGGINGFACE_API_KEY (alternative)
- HUGGINGFACE_API_KEY=your_huggingface_api_key_here
-
- # Default model
- HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
- ```
-
- The HuggingFace token can be set via either environment variable:
-
- <!--codeinclude-->
- [HuggingFace Token Configuration](../src/utils/config.py) start_line:33 end_line:35
- <!--/codeinclude-->
-
- <!--codeinclude-->
- [HuggingFace API Key Configuration](../src/utils/config.py) start_line:57 end_line:59
- <!--/codeinclude-->
-
- ## Optional Configuration
-
- ### Embedding Configuration
-
- DeepCritical supports multiple embedding providers for semantic search and RAG:
-
- ```bash
- # Embedding Provider: "openai", "local", or "huggingface"
- EMBEDDING_PROVIDER=local
-
- # OpenAI Embedding Model (used by LlamaIndex RAG)
- OPENAI_EMBEDDING_MODEL=text-embedding-3-small
-
- # Local Embedding Model (sentence-transformers, used by EmbeddingService)
- LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
-
- # HuggingFace Embedding Model
- HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
- ```
-
- The embedding provider configuration:
-
- <!--codeinclude-->
- [Embedding Provider Configuration](../src/utils/config.py) start_line:47 end_line:50
- <!--/codeinclude-->
-
- **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
-
- ### Web Search Configuration
-
- DeepCritical supports multiple web search providers:
-
- ```bash
- # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
- # Default: "duckduckgo" (no API key required)
- WEB_SEARCH_PROVIDER=duckduckgo
-
- # Serper API Key (for Google search via Serper)
- SERPER_API_KEY=your_serper_api_key_here
-
- # SearchXNG Host URL (for self-hosted search)
- SEARCHXNG_HOST=http://localhost:8080
-
- # Brave Search API Key
- BRAVE_API_KEY=your_brave_api_key_here
-
- # Tavily API Key
- TAVILY_API_KEY=your_tavily_api_key_here
- ```
-
- The web search provider configuration:
-
- <!--codeinclude-->
- [Web Search Provider Configuration](../src/utils/config.py) start_line:71 end_line:74
- <!--/codeinclude-->
-
- **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
-
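How a provider choice interacts with its required credential can be sketched like this (a simplified illustration, not the actual resolution logic in `src/utils/config.py`):

```python
# Simplified illustration: each non-default provider needs one setting;
# anything unconfigured falls back to the keyless DuckDuckGo default.
REQUIRED_SETTING = {
    "serper": "SERPER_API_KEY",
    "searchxng": "SEARCHXNG_HOST",
    "brave": "BRAVE_API_KEY",
    "tavily": "TAVILY_API_KEY",
}

def resolve_provider(provider: str, env: dict) -> str:
    if provider == "duckduckgo":
        return provider                      # no key required
    if env.get(REQUIRED_SETTING.get(provider, "")):
        return provider                      # credential present
    return "duckduckgo"                      # keyless fallback
```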
- ### PubMed Configuration
-
- PubMed search supports an optional NCBI API key for higher rate limits:
-
- ```bash
- # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
- NCBI_API_KEY=your_ncbi_api_key_here
- ```
-
- The PubMed tool uses this configuration:
-
- <!--codeinclude-->
- [PubMed Tool Configuration](../src/tools/pubmed.py) start_line:22 end_line:29
- <!--/codeinclude-->
-
- ### Agent Configuration
-
- Control agent behavior and research loop execution:
-
- ```bash
- # Maximum iterations per research loop (1-50, default: 10)
- MAX_ITERATIONS=10
-
- # Search timeout in seconds
- SEARCH_TIMEOUT=30
-
- # Use graph-based execution for research flows
- USE_GRAPH_EXECUTION=false
- ```
-
- The agent configuration fields:
-
- <!--codeinclude-->
- [Agent Configuration](../src/utils/config.py) start_line:80 end_line:85
- <!--/codeinclude-->
-
- ### Budget & Rate Limiting Configuration
-
- Control resource limits for research loops:
-
- ```bash
- # Default token budget per research loop (1000-1000000, default: 100000)
- DEFAULT_TOKEN_LIMIT=100000
-
- # Default time limit per research loop in minutes (1-120, default: 10)
- DEFAULT_TIME_LIMIT_MINUTES=10
-
- # Default iterations limit per research loop (1-50, default: 10)
- DEFAULT_ITERATIONS_LIMIT=10
- ```
-
- The budget configuration with validation:
-
- <!--codeinclude-->
- [Budget Configuration](../src/utils/config.py) start_line:87 end_line:105
- <!--/codeinclude-->
-
- ### RAG Service Configuration
-
- Configure the Retrieval-Augmented Generation service:
-
- ```bash
- # ChromaDB collection name for RAG
- RAG_COLLECTION_NAME=deepcritical_evidence
-
- # Number of top results to retrieve from RAG (1-50, default: 5)
- RAG_SIMILARITY_TOP_K=5
-
- # Automatically ingest evidence into RAG
- RAG_AUTO_INGEST=true
- ```
-
- The RAG configuration:
-
- <!--codeinclude-->
- [RAG Service Configuration](../src/utils/config.py) start_line:127 end_line:141
- <!--/codeinclude-->
-
- ### ChromaDB Configuration
-
- Configure the vector database for embeddings and RAG:
-
- ```bash
- # ChromaDB storage path
- CHROMA_DB_PATH=./chroma_db
-
- # Whether to persist ChromaDB to disk
- CHROMA_DB_PERSIST=true
-
- # ChromaDB server host (for remote ChromaDB, optional)
- CHROMA_DB_HOST=localhost
-
- # ChromaDB server port (for remote ChromaDB, optional)
- CHROMA_DB_PORT=8000
- ```
-
- The ChromaDB configuration:
-
- <!--codeinclude-->
- [ChromaDB Configuration](../src/utils/config.py) start_line:113 end_line:125
- <!--/codeinclude-->
-
- ### External Services
-
- #### Modal Configuration
-
- Modal is used for secure sandbox execution of statistical analysis:
-
- ```bash
- # Modal Token ID (for Modal sandbox execution)
- MODAL_TOKEN_ID=your_modal_token_id_here
-
- # Modal Token Secret
- MODAL_TOKEN_SECRET=your_modal_token_secret_here
- ```
-
- The Modal configuration:
-
- <!--codeinclude-->
- [Modal Configuration](../src/utils/config.py) start_line:110 end_line:112
- <!--/codeinclude-->
-
- ### Logging Configuration
-
- Configure structured logging:
-
- ```bash
- # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
- LOG_LEVEL=INFO
- ```
-
- The logging configuration:
-
- <!--codeinclude-->
- [Logging Configuration](../src/utils/config.py) start_line:107 end_line:108
- <!--/codeinclude-->
-
- Logging is configured via the `configure_logging()` function:
-
- <!--codeinclude-->
- [Configure Logging Function](../src/utils/config.py) start_line:212 end_line:231
- <!--/codeinclude-->
-
- ## Configuration Properties
-
- The `Settings` class provides helpful properties for checking configuration state:
-
- ### API Key Availability
-
- Check which API keys are available:
-
- <!--codeinclude-->
- [API Key Availability Properties](../src/utils/config.py) start_line:171 end_line:189
- <!--/codeinclude-->
-
- **Usage:**
-
- ```python
- from src.utils.config import settings
-
- # Check API key availability
- if settings.has_openai_key:
-     # Use OpenAI
-     pass
-
- if settings.has_anthropic_key:
-     # Use Anthropic
-     pass
-
- if settings.has_huggingface_key:
-     # Use HuggingFace
-     pass
-
- if settings.has_any_llm_key:
-     # At least one LLM is available
-     pass
- ```
-
- ### Service Availability
-
- Check if external services are configured:
-
- <!--codeinclude-->
- [Modal Availability Property](../src/utils/config.py) start_line:143 end_line:146
- <!--/codeinclude-->
-
- <!--codeinclude-->
- [Web Search Availability Property](../src/utils/config.py) start_line:191 end_line:204
- <!--/codeinclude-->
-
- **Usage:**
-
- ```python
- from src.utils.config import settings
-
- # Check service availability
- if settings.modal_available:
-     # Use Modal sandbox
-     pass
-
- if settings.web_search_available:
-     # Web search is configured
-     pass
- ```
-
- ### API Key Retrieval
-
- Get the API key for the configured provider:
-
- <!--codeinclude-->
- [Get API Key Method](../src/utils/config.py) start_line:148 end_line:160
- <!--/codeinclude-->
-
- For OpenAI-specific operations (e.g., Magentic mode):
-
- <!--codeinclude-->
- [Get OpenAI API Key Method](../src/utils/config.py) start_line:162 end_line:169
- <!--/codeinclude-->
-
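The behavior these methods describe, returning the configured provider's key or raising, can be mimicked in a few lines (a stand-in sketch; the real implementation is the included `config.py` source above):

```python
# Stand-in sketch; ConfigurationError here shadows the real class in
# src/utils/exceptions.py, and the lookup table is hypothetical.
class ConfigurationError(Exception):
    """Raised when configuration is invalid."""

def get_api_key(provider: str, keys: dict) -> str:
    key = keys.get(provider)
    if not key:
        raise ConfigurationError(f"No API key configured for provider '{provider}'")
    return key
```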
- ## Configuration Usage in Codebase
-
- The configuration system is used throughout the codebase:
-
- ### LLM Factory
-
- The LLM factory uses settings to create appropriate models:
-
- <!--codeinclude-->
- [LLM Factory Usage](../src/utils/llm_factory.py) start_line:129 end_line:144
- <!--/codeinclude-->
-
- ### Embedding Service
-
- The embedding service uses the local embedding model configuration:
-
- <!--codeinclude-->
- [Embedding Service Usage](../src/services/embeddings.py) start_line:29 end_line:31
- <!--/codeinclude-->
-
- ### Orchestrator Factory
-
- The orchestrator factory uses settings to determine mode:
-
- <!--codeinclude-->
- [Orchestrator Factory Mode Detection](../src/orchestrator_factory.py) start_line:69 end_line:80
- <!--/codeinclude-->
-
- ## Environment Variables Reference
-
- ### Required (at least one LLM)
-
- - `OPENAI_API_KEY` - OpenAI API key (required for OpenAI provider)
- - `ANTHROPIC_API_KEY` - Anthropic API key (required for Anthropic provider)
- - `HF_TOKEN` or `HUGGINGFACE_API_KEY` - HuggingFace API token (optional, can work without for public models)
-
- #### LLM Configuration Variables
-
- - `LLM_PROVIDER` - Provider to use: `"openai"`, `"anthropic"`, or `"huggingface"` (default: `"huggingface"`)
- - `OPENAI_MODEL` - OpenAI model name (default: `"gpt-5.1"`)
- - `ANTHROPIC_MODEL` - Anthropic model name (default: `"claude-sonnet-4-5-20250929"`)
- - `HUGGINGFACE_MODEL` - HuggingFace model ID (default: `"meta-llama/Llama-3.1-8B-Instruct"`)
-
- #### Embedding Configuration Variables
-
- - `EMBEDDING_PROVIDER` - Provider: `"openai"`, `"local"`, or `"huggingface"` (default: `"local"`)
- - `OPENAI_EMBEDDING_MODEL` - OpenAI embedding model (default: `"text-embedding-3-small"`)
- - `LOCAL_EMBEDDING_MODEL` - Local sentence-transformers model (default: `"all-MiniLM-L6-v2"`)
- - `HUGGINGFACE_EMBEDDING_MODEL` - HuggingFace embedding model (default: `"sentence-transformers/all-MiniLM-L6-v2"`)
-
- #### Web Search Configuration Variables
-
- - `WEB_SEARCH_PROVIDER` - Provider: `"serper"`, `"searchxng"`, `"brave"`, `"tavily"`, or `"duckduckgo"` (default: `"duckduckgo"`)
- - `SERPER_API_KEY` - Serper API key (required for Serper provider)
- - `SEARCHXNG_HOST` - SearchXNG host URL (required for SearchXNG provider)
- - `BRAVE_API_KEY` - Brave Search API key (required for Brave provider)
- - `TAVILY_API_KEY` - Tavily API key (required for Tavily provider)
-
- #### PubMed Configuration Variables
-
- - `NCBI_API_KEY` - NCBI API key (optional, increases rate limit from 3 to 10 req/sec)
-
- #### Agent Configuration Variables
-
- - `MAX_ITERATIONS` - Maximum iterations per research loop (1-50, default: `10`)
- - `SEARCH_TIMEOUT` - Search timeout in seconds (default: `30`)
- - `USE_GRAPH_EXECUTION` - Use graph-based execution (default: `false`)
-
- #### Budget Configuration Variables
-
- - `DEFAULT_TOKEN_LIMIT` - Default token budget per research loop (1000-1000000, default: `100000`)
- - `DEFAULT_TIME_LIMIT_MINUTES` - Default time limit in minutes (1-120, default: `10`)
- - `DEFAULT_ITERATIONS_LIMIT` - Default iterations limit (1-50, default: `10`)
-
- #### RAG Configuration Variables
-
- - `RAG_COLLECTION_NAME` - ChromaDB collection name (default: `"deepcritical_evidence"`)
- - `RAG_SIMILARITY_TOP_K` - Number of top results to retrieve (1-50, default: `5`)
- - `RAG_AUTO_INGEST` - Automatically ingest evidence into RAG (default: `true`)
-
- #### ChromaDB Configuration Variables
-
- - `CHROMA_DB_PATH` - ChromaDB storage path (default: `"./chroma_db"`)
- - `CHROMA_DB_PERSIST` - Whether to persist ChromaDB to disk (default: `true`)
- - `CHROMA_DB_HOST` - ChromaDB server host (optional, for remote ChromaDB)
- - `CHROMA_DB_PORT` - ChromaDB server port (optional, for remote ChromaDB)
-
- #### External Services Variables
-
- - `MODAL_TOKEN_ID` - Modal token ID (optional, for Modal sandbox execution)
- - `MODAL_TOKEN_SECRET` - Modal token secret (optional, for Modal sandbox execution)
-
- #### Logging Configuration Variables
-
- - `LOG_LEVEL` - Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, or `"ERROR"` (default: `"INFO"`)
-
- ## Validation
-
- Settings are validated on load using Pydantic validation:
-
- - **Type Checking**: All fields are strongly typed
- - **Range Validation**: Numeric fields have min/max constraints (e.g., `ge=1, le=50` for `max_iterations`)
- - **Literal Validation**: Enum fields only accept specific values (e.g., `Literal["openai", "anthropic", "huggingface"]`)
- - **Required Fields**: API keys are checked when accessed via `get_api_key()` or `get_openai_api_key()`
-
- ### Validation Examples
-
- The `max_iterations` field has range validation:
-
- <!--codeinclude-->
- [Max Iterations Validation](../src/utils/config.py) start_line:81 end_line:81
- <!--/codeinclude-->
-
- The `llm_provider` field has literal validation:
-
- <!--codeinclude-->
- [LLM Provider Literal Validation](../src/utils/config.py) start_line:26 end_line:28
- <!--/codeinclude-->
-
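In plain Python, the two rules those fields declare amount to the following checks (pydantic-settings enforces them declaratively; this sketch just mirrors the behavior):

```python
# Mirrors Field(ge=1, le=50) and Literal["openai", "anthropic", "huggingface"].
VALID_PROVIDERS = ("openai", "anthropic", "huggingface")

def check_settings(max_iterations: int, llm_provider: str) -> None:
    if not 1 <= max_iterations <= 50:
        raise ValueError("max_iterations must be between 1 and 50")
    if llm_provider not in VALID_PROVIDERS:
        raise ValueError(f"llm_provider must be one of {VALID_PROVIDERS}")
```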
- ## Error Handling
-
- Configuration errors raise `ConfigurationError` from `src/utils/exceptions.py`:
-
- ```22:25:src/utils/exceptions.py
- class ConfigurationError(DeepCriticalError):
-     """Raised when configuration is invalid."""
-
-     pass
- ```
-
- ### Error Handling Example
-
- ```python
- from src.utils.config import settings
- from src.utils.exceptions import ConfigurationError
-
- try:
-     api_key = settings.get_api_key()
- except ConfigurationError as e:
-     print(f"Configuration error: {e}")
- ```
-
- ### Common Configuration Errors
-
- 1. **Missing API Key**: When `get_api_key()` is called but the required API key is not set
- 2. **Invalid Provider**: When `llm_provider` is set to an unsupported value
- 3. **Out of Range**: When numeric values exceed their min/max constraints
- 4. **Invalid Literal**: When enum fields receive unsupported values
-
- ## Configuration Best Practices
-
- 1. **Use `.env` File**: Store sensitive keys in `.env` file (add to `.gitignore`)
- 2. **Check Availability**: Use properties like `has_openai_key` before accessing API keys
- 3. **Handle Errors**: Always catch `ConfigurationError` when calling `get_api_key()`
- 4. **Validate Early**: Configuration is validated on import, so errors surface immediately
- 5. **Use Defaults**: Leverage sensible defaults for optional configuration
-
- ## Future Enhancements
-
- The following configurations are planned for future phases:
-
- 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
- 2. **Model Selection**: Reasoning/main/fast model configuration
- 3. **Service Integration**: Additional service integrations and configurations
docs/contributing/code-quality.md DELETED
@@ -1,120 +0,0 @@
- # Code Quality & Documentation
-
- This document outlines code quality standards and documentation requirements for The DETERMINATOR.
-
- ## Linting
-
- - Ruff with 100-char line length
- - Ignore rules documented in `pyproject.toml`:
-   - `PLR0913`: Too many arguments (agents need many params)
-   - `PLR0912`: Too many branches (complex orchestrator logic)
-   - `PLR0911`: Too many return statements (complex agent logic)
-   - `PLR2004`: Magic values (statistical constants)
-   - `PLW0603`: Global statement (singleton pattern)
-   - `PLC0415`: Lazy imports for optional dependencies
-   - `E402`: Module level import not at top (needed for pytest.importorskip)
-   - `E501`: Line too long (ignore line length violations)
-   - `RUF100`: Unused noqa (version differences between local/CI)
-
- ## Type Checking
-
- - `mypy --strict` compliance
- - `ignore_missing_imports = true` (for optional dependencies)
- - Exclude: `reference_repos/`, `examples/`
- - All functions must have complete type annotations
-
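For illustration, a function that satisfies `mypy --strict` annotates every parameter and the return value, with no implicit `Any`:

```python
from collections.abc import Sequence

def mean_score(scores: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty score sequence."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(scores) / len(scores)
```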
- ## Pre-commit
-
- Pre-commit hooks run automatically on commit to ensure code quality. Configuration is in `.pre-commit-config.yaml`.
-
- ### Installation
-
- ```bash
- # Install dependencies (includes pre-commit package)
- uv sync --all-extras
-
- # Set up git hooks (must be run separately)
- uv run pre-commit install
- ```
-
- **Note**: `uv sync --all-extras` installs the pre-commit package, but you must run `uv run pre-commit install` separately to set up the git hooks.
-
- ### Pre-commit Hooks
-
- The following hooks run automatically on commit:
-
- 1. **ruff**: Lints code and fixes issues automatically
-    - Runs on: `src/` (excludes `tests/`, `reference_repos/`)
-    - Auto-fixes: Yes
-
- 2. **ruff-format**: Formats code with ruff
-    - Runs on: `src/` (excludes `tests/`, `reference_repos/`)
-    - Auto-fixes: Yes
-
- 3. **mypy**: Type checking
-    - Runs on: `src/` (excludes `folder/`)
-    - Additional dependencies: pydantic, pydantic-settings, tenacity, pydantic-ai
-
- 4. **pytest-unit**: Runs unit tests (excludes OpenAI and embedding_provider tests)
-    - Runs: `tests/unit/` with `-m "not openai and not embedding_provider"`
-    - Always runs: Yes (not just on changed files)
-
- 5. **pytest-local-embeddings**: Runs local embedding tests
-    - Runs: `tests/` with `-m "local_embeddings"`
-    - Always runs: Yes
-
- ### Manual Pre-commit Run
-
- To run pre-commit hooks manually (without committing):
-
- ```bash
- uv run pre-commit run --all-files
- ```
-
- ### Troubleshooting
-
- - **Hooks failing**: Fix the issues shown in the output, then commit again
- - **Skipping hooks**: Use `git commit --no-verify` (not recommended)
- - **Hook not running**: Ensure hooks are installed with `uv run pre-commit install`
- - **Type errors**: Check that all dependencies are installed with `uv sync --all-extras`
-
- ## Documentation
-
- ### Building Documentation
-
- Documentation is built using MkDocs. Source files are in `docs/`, and the configuration is in `mkdocs.yml`.
-
- ```bash
- # Build documentation
- uv run mkdocs build
-
- # Serve documentation locally (http://127.0.0.1:8000)
- uv run mkdocs serve
- ```
-
- The documentation site is published at: <https://deepcritical.github.io/GradioDemo/>
-
- ### Docstrings
-
- - Google-style docstrings for all public functions
- - Include Args, Returns, Raises sections
- - Use type hints in docstrings only if needed for clarity
-
- Example:
-
- <!--codeinclude-->
- [Search Method Docstring Example](../src/tools/pubmed.py) start_line:51 end_line:70
- <!--/codeinclude-->
-
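Since the included source is not reproduced here, a generic function following the same Google-style convention looks like this (hypothetical helper, not from the codebase):

```python
def normalize_title(title: str, max_length: int = 120) -> str:
    """Normalize an article title for display.

    Args:
        title: Raw title text, possibly with surrounding whitespace.
        max_length: Maximum length of the returned string.

    Returns:
        The stripped title, truncated to max_length characters.

    Raises:
        ValueError: If the title is empty after stripping.
    """
    cleaned = title.strip()
    if not cleaned:
        raise ValueError("title is empty")
    return cleaned[:max_length]
```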
- ### Code Comments
-
- - Explain WHY, not WHAT
- - Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials)
- - Mark critical sections: `# CRITICAL: ...`
- - Document rate limiting rationale
- - Explain async patterns when non-obvious
-
- ## See Also
-
- - [Code Style](code-style.md) - Code style guidelines
- - [Testing](testing.md) - Testing guidelines
docs/contributing/code-style.md DELETED
@@ -1,83 +0,0 @@
- # Code Style & Conventions
-
- This document outlines the code style and conventions for The DETERMINATOR.
-
- ## Package Manager
-
- This project uses [`uv`](https://github.com/astral-sh/uv) as the package manager. All commands should be prefixed with `uv run` to ensure they run in the correct environment.
-
- ### Installation
-
- ```bash
- # Install uv if you haven't already (recommended: standalone installer)
- # Unix/macOS/Linux:
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Windows (PowerShell):
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-
- # Alternative: pipx install uv
- # Or: pip install uv
-
- # Sync all dependencies including dev extras
- uv sync --all-extras
- ```
-
- ### Running Commands
-
- All development commands should use the `uv run` prefix:
-
- ```bash
- # Instead of: pytest tests/
- uv run pytest tests/
-
- # Instead of: ruff check src
- uv run ruff check src
-
- # Instead of: mypy src
- uv run mypy src
- ```
-
- This ensures commands run in the correct virtual environment managed by `uv`.
-
- ## Type Safety
-
- - **ALWAYS** use type hints for all function parameters and return types
- - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
- - Use `TYPE_CHECKING` imports for circular dependencies:
-
- <!--codeinclude-->
- [TYPE_CHECKING Import Pattern](../src/utils/citation_validator.py) start_line:8 end_line:11
- <!--/codeinclude-->
-
- ## Pydantic Models
-
- - All data exchange uses Pydantic models (`src/utils/models.py`)
- - Models are frozen (`model_config = {"frozen": True}`) for immutability
- - Use `Field()` with descriptions for all model fields
- - Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints
-
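The immutability that frozen models buy can be seen with a stdlib stand-in (a frozen dataclass; Pydantic raises its own error on mutation, but the effect is the same):

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Citation:  # stand-in for a frozen Pydantic model
    title: str
    url: str

c = Citation(title="Example", url="https://example.org")
try:
    c.title = "changed"          # mutation is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```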
- ## Async Patterns
-
- - **ALL** I/O operations must be async (`async def`, `await`)
- - Use `asyncio.gather()` for parallel operations
- - CPU-bound work (embeddings, parsing) must use `run_in_executor()`:
-
- ```python
- loop = asyncio.get_running_loop()
- result = await loop.run_in_executor(None, cpu_bound_function, args)
- ```
-
- - Never block the event loop with synchronous I/O
-
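Both rules combined in a runnable form: two CPU-bound calls dispatched to the default executor and awaited in parallel with `asyncio.gather()`.

```python
import asyncio

def cpu_bound(n: int) -> int:
    # Stand-in for CPU-heavy work (embeddings, parsing)
    return sum(range(n))

async def main() -> list[int]:
    loop = asyncio.get_running_loop()
    # Off-load to the default thread pool and run both in parallel
    results = await asyncio.gather(
        loop.run_in_executor(None, cpu_bound, 10),
        loop.run_in_executor(None, cpu_bound, 100),
    )
    return list(results)

totals = asyncio.run(main())
```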
73
- ## Common Pitfalls
74
-
75
- 1. **Blocking the event loop**: Never use sync I/O in async functions
76
- 2. **Missing type hints**: All functions must have complete type annotations
77
- 3. **Global mutable state**: Use ContextVar or pass via parameters
78
- 4. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
79
-
80
- ## See Also
81
-
82
- - [Error Handling](error-handling.md) - Error handling guidelines
83
- - [Implementation Patterns](implementation-patterns.md) - Common patterns
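The removed page leaned on a `codeinclude` for the `TYPE_CHECKING` pattern, and the included source is not part of this diff. A minimal, self-contained sketch of the same idiom (the function and names below are illustrative, not taken from the codebase):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for the type checker; never executed at runtime,
    # so it cannot create a circular import.
    from collections.abc import Sequence


def first_url(urls: Sequence[str]) -> str | None:
    """Return the first URL, or None if the sequence is empty."""
    return urls[0] if urls else None
```

With `from __future__ import annotations`, the annotation is never evaluated at runtime, so the guarded import is safe even under `mypy --strict`.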
docs/contributing/error-handling.md DELETED
@@ -1,54 +0,0 @@
- # Error Handling & Logging
-
- This document outlines error handling and logging conventions for The DETERMINATOR.
-
- ## Exception Hierarchy
-
- Use the custom exception hierarchy (`src/utils/exceptions.py`):
-
- <!--codeinclude-->
- [Exception Hierarchy](../src/utils/exceptions.py) start_line:4 end_line:31
- <!--/codeinclude-->
-
- ## Error Handling Rules
-
- Always chain exceptions: `raise SearchError(...) from e`
- Log errors with context using `structlog`:
-
- ```python
- logger.error("Operation failed", error=str(e), context=value)
- ```
-
- Never silently swallow exceptions
- Provide actionable error messages
-
- ## Logging
-
- Use `structlog` for all logging (NOT `print` or `logging`)
- Import: `import structlog; logger = structlog.get_logger()`
- Log with structured data: `logger.info("event", key=value)`
- Use appropriate levels: DEBUG, INFO, WARNING, ERROR
-
- ## Logging Examples
-
- ```python
- logger.info("Starting search", query=query, tools=[t.name for t in tools])
- logger.warning("Search tool failed", tool=tool.name, error=str(result))
- logger.error("Assessment failed", error=str(e))
- ```
-
- ## Error Chaining
-
- Always preserve exception context:
-
- ```python
- try:
-     result = await api_call()
- except httpx.HTTPError as e:
-     raise SearchError(f"API call failed: {e}") from e
- ```
-
- ## See Also
-
- [Code Style](code-style.md) - Code style guidelines
- [Testing](testing.md) - Testing guidelines
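The `codeinclude` body for the exception hierarchy is not part of this diff, so here is a minimal sketch of the hierarchy-plus-chaining rules described above (class and function names are illustrative, not the real `src/utils/exceptions.py`):

```python
class DeterminatorError(Exception):
    """Base class for application errors (illustrative name)."""


class SearchError(DeterminatorError):
    """Raised when a search backend fails."""


def parse_result_count(raw: str) -> int:
    """Parse a count from an API response, chaining the original error."""
    try:
        return int(raw)
    except ValueError as e:
        # `from e` preserves the original traceback as __cause__,
        # so the root cause survives into logs and debuggers.
        raise SearchError(f"Invalid result count: {raw!r}") from e
```

Callers can catch `DeterminatorError` to handle any application failure, while `err.__cause__` still points at the underlying `ValueError`.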
docs/contributing/implementation-patterns.md DELETED
@@ -1,67 +0,0 @@
- # Implementation Patterns
-
- This document outlines common implementation patterns used in The DETERMINATOR.
-
- ## Search Tools
-
- All tools implement the `SearchTool` protocol (`src/tools/base.py`):
-
- Must have a `name` property
- Must implement `async def search(query, max_results) -> list[Evidence]`
- Use the `@retry` decorator from `tenacity` for resilience
- Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- Error handling: Raise `SearchError` or `RateLimitError` on failures
-
- Example pattern:
-
- ```python
- class MySearchTool:
-     @property
-     def name(self) -> str:
-         return "mytool"
-
-     @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         # Implementation
-         return evidence_list
- ```
-
- ## Judge Handlers
-
- Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- Use pydantic-ai `Agent` with `output_type=JudgeAssessment`
- System prompts live in `src/prompts/judge.py`
- Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- Always return a valid `JudgeAssessment` (never raise exceptions)
-
- ## Agent Factory Pattern
-
- Use factory functions for creating agents (`src/agent_factory/`)
- Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- Check requirements before initialization:
-
- <!--codeinclude-->
- [Check Magentic Requirements](../src/utils/llm_factory.py) start_line:152 end_line:170
- <!--/codeinclude-->
-
- ## State Management
-
- **Magentic Mode**: Use `ContextVar` for thread-safe state (`src/agents/state.py`)
- **Simple Mode**: Pass state via function parameters
- Never use global mutable state (except singletons via `@lru_cache`)
-
- ## Singleton Pattern
-
- Use `@lru_cache(maxsize=1)` for singletons:
-
- <!--codeinclude-->
- [Singleton Pattern Example](../src/services/statistical_analyzer.py) start_line:252 end_line:255
- <!--/codeinclude-->
-
- Lazy initialization avoids requiring dependencies at import time
-
- ## See Also
-
- [Code Style](code-style.md) - Code style guidelines
- [Error Handling](error-handling.md) - Error handling guidelines
-
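The singleton `codeinclude` resolves against the source tree, which this diff does not show. A stand-alone sketch of the `@lru_cache(maxsize=1)` idiom it references (the `StatisticalAnalyzer` stand-in is hypothetical):

```python
from functools import lru_cache


class StatisticalAnalyzer:
    """Stand-in for a service whose construction is expensive."""

    def __init__(self) -> None:
        self.ready = True


@lru_cache(maxsize=1)
def get_statistical_analyzer() -> StatisticalAnalyzer:
    # Constructed on the first call only; every later call returns
    # the same cached instance.
    return StatisticalAnalyzer()
```

Because construction happens inside the factory rather than at module level, optional dependencies are only required when the service is first used, not at import time.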
docs/contributing/index.md DELETED
@@ -1,254 +0,0 @@
- # Contributing to The DETERMINATOR
-
- Thank you for your interest in contributing to The DETERMINATOR! This guide will help you get started.
-
- > **Note on Project Names**: "The DETERMINATOR" is the product name, "DeepCritical" is the organization/project name, and "determinator" is the Python package name.
-
- ## Git Workflow
-
- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Use feature branches: `yourname-dev`
- **NEVER** push directly to `main` or `dev` on HuggingFace
- GitHub is the source of truth; HuggingFace is for deployment
-
- ## Repository Information
-
- **GitHub Repository**: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo) (source of truth, PRs, code review)
- **HuggingFace Space**: [`DataQuests/DeepCritical`](https://huggingface.co/spaces/DataQuests/DeepCritical) (deployment/demo)
- **Package Name**: `determinator` (Python package name in `pyproject.toml`)
-
- ### Dual Repository Setup
-
- This project uses a dual repository setup:
-
- **GitHub (`DeepCritical/GradioDemo`)**: Source of truth for code, PRs, and code review
- **HuggingFace (`DataQuests/DeepCritical`)**: Deployment target for the Gradio demo
-
- #### Remote Configuration
-
- When cloning, set up remotes as follows:
-
- ```bash
- # Clone from GitHub
- git clone https://github.com/DeepCritical/GradioDemo.git
- cd GradioDemo
-
- # Add HuggingFace remote (optional, for deployment)
- git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/DeepCritical
- ```
-
- **Important**: Never push directly to `main` or `dev` on HuggingFace. Always work through GitHub PRs. GitHub is the source of truth; HuggingFace is for deployment/demo only.
-
- ## Package Manager
-
- This project uses [`uv`](https://github.com/astral-sh/uv) as the package manager. All commands should be prefixed with `uv run` to ensure they run in the correct environment.
-
- ### Installation
-
- ```bash
- # Install uv if you haven't already (recommended: standalone installer)
- # Unix/macOS/Linux:
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Windows (PowerShell):
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-
- # Alternative: pipx install uv
- # Or: pip install uv
-
- # Sync all dependencies including dev extras
- uv sync --all-extras
-
- # Install pre-commit hooks
- uv run pre-commit install
- ```
-
- ## Development Commands
-
- ```bash
- # Installation
- uv sync --all-extras  # Install all dependencies including dev
- uv run pre-commit install  # Install pre-commit hooks
-
- # Code Quality Checks (run all before committing)
- uv run ruff check src tests  # Lint with ruff
- uv run ruff format src tests  # Format with ruff
- uv run mypy src  # Type checking
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with coverage
-
- # Testing Commands
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire  # Run unit tests (excludes OpenAI tests)
- uv run pytest tests/ -v -m "huggingface" -p no:logfire  # Run HuggingFace tests
- uv run pytest tests/ -v -p no:logfire  # Run all tests
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with terminal coverage
- uv run pytest --cov=src --cov-report=html -p no:logfire  # Generate HTML coverage report (opens htmlcov/index.html)
-
- # Documentation Commands
- uv run mkdocs build  # Build documentation
- uv run mkdocs serve  # Serve documentation locally (http://127.0.0.1:8000)
- ```
-
- ### Test Markers
-
- The project uses pytest markers to categorize tests. See [Testing Guidelines](testing.md) for details:
-
- `unit`: Unit tests (mocked, fast)
- `integration`: Integration tests (real APIs)
- `slow`: Slow tests
- `openai`: Tests requiring an OpenAI API key
- `huggingface`: Tests requiring a HuggingFace API key
- `embedding_provider`: Tests requiring API-based embedding providers
- `local_embeddings`: Tests using local embeddings
-
- **Note**: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.
-
- ## Getting Started
-
- 1. **Fork the repository** on GitHub: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo)
-
- 2. **Clone your fork**:
-
- ```bash
- git clone https://github.com/yourusername/GradioDemo.git
- cd GradioDemo
- ```
-
- 3. **Install dependencies**:
-
- ```bash
- uv sync --all-extras
- uv run pre-commit install
- ```
-
- 4. **Create a feature branch**:
-
- ```bash
- git checkout -b yourname-feature-name
- ```
-
- 5. **Make your changes** following the guidelines below
-
- 6. **Run checks**:
-
- ```bash
- uv run ruff check src tests
- uv run mypy src
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
- ```
-
- 7. **Commit and push**:
-
- ```bash
- git commit -m "Description of changes"
- git push origin yourname-feature-name
- ```
-
- 8. **Create a pull request** on GitHub
-
- ## Development Guidelines
-
- ### Code Style
-
- Follow [Code Style Guidelines](code-style.md)
- All code must pass `mypy --strict`
- Use `ruff` for linting and formatting
- Line length: 100 characters
-
- ### Error Handling
-
- Follow [Error Handling Guidelines](error-handling.md)
- Always chain exceptions: `raise SearchError(...) from e`
- Use structured logging with `structlog`
- Never silently swallow exceptions
-
- ### Testing
-
- Follow [Testing Guidelines](testing.md)
- Write tests before implementation (TDD)
- Aim for >80% coverage on critical paths
- Use markers: `unit`, `integration`, `slow`
-
- ### Implementation Patterns
-
- Follow [Implementation Patterns](implementation-patterns.md)
- Use factory functions for agent/tool creation
- Implement protocols for extensibility
- Use the singleton pattern with `@lru_cache(maxsize=1)`
-
- ### Prompt Engineering
-
- Follow [Prompt Engineering Guidelines](prompt-engineering.md)
- Always validate citations
- Use diverse evidence selection
- Never trust LLM-generated citations without validation
-
- ### Code Quality
-
- Follow [Code Quality Guidelines](code-quality.md)
- Google-style docstrings for all public functions
- Explain WHY, not WHAT in comments
- Mark critical sections: `# CRITICAL: ...`
-
- ## MCP Integration
-
- ### MCP Tools
-
- Functions in `src/mcp_tools.py` for Claude Desktop
- Full type hints required
- Google-style docstrings with Args/Returns sections
- Formatted string returns (markdown)
-
- ### Gradio MCP Server
-
- Enable with `mcp_server=True` in `demo.launch()`
- Endpoint: `/gradio_api/mcp/`
- Use `ssr_mode=False` to fix hydration issues in HF Spaces
-
- ## Common Pitfalls
-
- 1. **Blocking the event loop**: Never use sync I/O in async functions
- 2. **Missing type hints**: All functions must have complete type annotations
- 3. **Hallucinated citations**: Always validate references
- 4. **Global mutable state**: Use ContextVar or pass via parameters
- 5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
- 6. **Rate limiting**: Always implement for external APIs
- 7. **Error chaining**: Always use `from e` when raising exceptions
-
- ## Key Principles
-
- 1. **Type Safety First**: All code must pass `mypy --strict`
- 2. **Async Everything**: All I/O must be async
- 3. **Test-Driven**: Write tests before implementation
- 4. **No Hallucinations**: Validate all citations
- 5. **Graceful Degradation**: Support free tier (HF Inference) when no API keys
- 6. **Lazy Loading**: Don't require optional dependencies at import time
- 7. **Structured Logging**: Use structlog, never print()
- 8. **Error Chaining**: Always preserve exception context
-
- ## Pull Request Process
-
- 1. Ensure all checks pass: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
- 2. Update documentation if needed
- 3. Add tests for new features
- 4. Update CHANGELOG if applicable
- 5. Request review from maintainers
- 6. Address review feedback
- 7. Wait for approval before merging
-
- ## Project Structure
-
- `src/`: Main source code
- `tests/`: Test files (`unit/` and `integration/`)
- `docs/`: Documentation source files (MkDocs)
- `examples/`: Example usage scripts
- `pyproject.toml`: Project configuration and dependencies
- `.pre-commit-config.yaml`: Pre-commit hook configuration
-
- ## Questions?
-
- Open an issue on [GitHub](https://github.com/DeepCritical/GradioDemo)
- Check existing [documentation](https://deepcritical.github.io/GradioDemo/)
- Review code examples in the codebase
-
- Thank you for contributing to The DETERMINATOR!
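The MCP tool requirements above (full type hints, Google-style docstring with Args/Returns, markdown string return) can be sketched as follows. `search_pubmed` here is a stub that shows the required shape, not the real implementation in `src/mcp_tools.py`:

```python
async def search_pubmed(query: str, max_results: int = 5) -> str:
    """Search PubMed and return results as markdown.

    Args:
        query: Free-text search query.
        max_results: Maximum number of results to include.

    Returns:
        A markdown-formatted summary string.
    """
    # A real tool would call the PubMed API here; this stub only
    # demonstrates the signature and return convention.
    return f"## PubMed results for {query!r} (top {max_results})"
```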
docs/contributing/prompt-engineering.md DELETED
@@ -1,55 +0,0 @@
- # Prompt Engineering & Citation Validation
-
- This document outlines prompt engineering guidelines and citation validation rules.
-
- ## Judge Prompts
-
- System prompt in `src/prompts/judge.py`
- Format evidence with truncation (1500 chars per item)
- Handle the empty-evidence case separately
- Always request structured JSON output
- Use the `format_user_prompt()` and `format_empty_evidence_prompt()` helpers
-
- ## Hypothesis Prompts
-
- Use diverse evidence selection (MMR algorithm)
- Sentence-aware truncation (`truncate_at_sentence()`)
- Format: Drug → Target → Pathway → Effect
- System prompt emphasizes mechanistic reasoning
- Use `format_hypothesis_prompt()` with embeddings for diversity
-
- ## Report Prompts
-
- Include full citation details for validation
- Use diverse evidence selection (n=20)
- **CRITICAL**: Emphasize citation validation rules
- Format hypotheses with support/contradiction counts
- System prompt includes explicit JSON structure requirements
-
- ## Citation Validation
-
- **ALWAYS** validate references before returning reports
- Use `validate_references()` from `src/utils/citation_validator.py`
- Remove hallucinated citations (URLs not in evidence)
- Log warnings for removed citations
- Never trust LLM-generated citations without validation
-
- ## Citation Validation Rules
-
- 1. Every reference URL must EXACTLY match a provided evidence URL
- 2. Do NOT invent, fabricate, or hallucinate any references
- 3. Do NOT modify paper titles, authors, dates, or URLs
- 4. If unsure about a citation, OMIT it rather than guess
- 5. Copy URLs exactly as provided - do not create similar-looking URLs
-
- ## Evidence Selection
-
- Use `select_diverse_evidence()` for MMR-based selection
- Balance relevance vs. diversity (lambda=0.7 default)
- Sentence-aware truncation preserves meaning
- Limit evidence per prompt to avoid context overflow
-
- ## See Also
-
- [Code Quality](code-quality.md) - Code quality guidelines
- [Error Handling](error-handling.md) - Error handling guidelines
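The actual `select_diverse_evidence()` implementation is not shown in this diff; the following is a greedy MMR sketch using the documented default of lambda=0.7 (the signature and the score/similarity inputs are simplifying assumptions):

```python
def select_diverse_evidence(
    relevance: list[float],
    similarity: list[list[float]],
    n: int,
    lambda_: float = 0.7,
) -> list[int]:
    """Greedy MMR: trade relevance off against redundancy with picked items."""
    selected: list[int] = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < n:
        def mmr(i: int) -> float:
            # Redundancy is the max similarity to anything already selected.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lambda close to 1 the selection is pure relevance ranking; lowering it increasingly penalizes near-duplicate evidence items.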
docs/contributing/testing.md DELETED
@@ -1,115 +0,0 @@
- # Testing Requirements
-
- This document outlines testing requirements and guidelines for The DETERMINATOR.
-
- ## Test Structure
-
- Unit tests in `tests/unit/` (mocked, fast)
- Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- Use markers: `unit`, `integration`, `slow`, `openai`, `huggingface`, `embedding_provider`, `local_embeddings`
-
- ## Test Markers
-
- The project uses pytest markers to categorize tests. These markers are defined in `pyproject.toml`:
-
- `@pytest.mark.unit`: Unit tests (mocked, fast) - Run with `-m "unit"`
- `@pytest.mark.integration`: Integration tests (real APIs) - Run with `-m "integration"`
- `@pytest.mark.slow`: Slow tests - Run with `-m "slow"`
- `@pytest.mark.openai`: Tests requiring an OpenAI API key - Run with `-m "openai"` or exclude with `-m "not openai"`
- `@pytest.mark.huggingface`: Tests requiring a HuggingFace API key or using HuggingFace models - Run with `-m "huggingface"`
- `@pytest.mark.embedding_provider`: Tests requiring API-based embedding providers (OpenAI, etc.) - Run with `-m "embedding_provider"`
- `@pytest.mark.local_embeddings`: Tests using local embeddings (sentence-transformers, ChromaDB) - Run with `-m "local_embeddings"`
-
- ### Running Tests by Marker
-
- ```bash
- # Run only unit tests (excludes OpenAI tests by default)
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
-
- # Run HuggingFace tests
- uv run pytest tests/ -v -m "huggingface" -p no:logfire
-
- # Run all tests
- uv run pytest tests/ -v -p no:logfire
-
- # Run only local embedding tests
- uv run pytest tests/ -v -m "local_embeddings" -p no:logfire
-
- # Exclude slow tests
- uv run pytest tests/ -v -m "not slow" -p no:logfire
- ```
-
- **Note**: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.
-
- ## Mocking
-
- Use `respx` for httpx mocking
- Use `pytest-mock` for general mocking
- Mock LLM calls in unit tests (use `MockJudgeHandler`)
- Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`
-
- ## TDD Workflow
-
- 1. Write a failing test in `tests/unit/`
- 2. Implement in `src/`
- 3. Ensure the test passes
- 4. Run checks: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
-
- ### Test Command Examples
-
- ```bash
- # Run unit tests (default, excludes OpenAI tests)
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
-
- # Run HuggingFace tests
- uv run pytest tests/ -v -m "huggingface" -p no:logfire
-
- # Run all tests
- uv run pytest tests/ -v -p no:logfire
- ```
-
- ## Test Examples
-
- ```python
- @pytest.mark.unit
- async def test_pubmed_search(mock_httpx_client):
-     tool = PubMedTool()
-     results = await tool.search("metformin", max_results=5)
-     assert len(results) > 0
-     assert all(isinstance(r, Evidence) for r in results)
-
- @pytest.mark.integration
- async def test_real_pubmed_search():
-     tool = PubMedTool()
-     results = await tool.search("metformin", max_results=3)
-     assert len(results) <= 3
- ```
-
- ## Test Coverage
-
- ### Terminal Coverage Report
-
- ```bash
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
- ```
-
- This shows coverage with missing lines highlighted in the terminal output.
-
- ### HTML Coverage Report
-
- ```bash
- uv run pytest --cov=src --cov-report=html -p no:logfire
- ```
-
- This generates an HTML coverage report in `htmlcov/index.html`. Open this file in your browser to see detailed coverage information.
-
- ### Coverage Goals
-
- Aim for >80% coverage on critical paths
- Exclude: `__init__.py`, `TYPE_CHECKING` blocks
- Coverage configuration is in `pyproject.toml` under `[tool.coverage.*]`
-
- ## See Also
-
- [Code Style](code-style.md) - Code style guidelines
- [Implementation Patterns](implementation-patterns.md) - Common patterns
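The mocking guidance above can be sketched with the stdlib's `AsyncMock` standing in for a real `SearchTool` (in the project itself, `respx` would mock the HTTP layer instead; the names below are illustrative):

```python
import asyncio
from unittest.mock import AsyncMock


async def run_search(tool) -> list[str]:
    """Code under test: delegates to the tool's async search method."""
    return await tool.search("metformin", max_results=2)


def test_run_search_with_mock() -> None:
    # AsyncMock's attributes are themselves async mocks, so awaiting
    # tool.search(...) returns the configured return_value.
    tool = AsyncMock()
    tool.search.return_value = ["evidence-1", "evidence-2"]
    results = asyncio.run(run_search(tool))
    assert results == ["evidence-1", "evidence-2"]
    # Verify the exact call the production code made.
    tool.search.assert_awaited_once_with("metformin", max_results=2)
```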
docs/getting-started/examples.md DELETED
@@ -1,198 +0,0 @@
- # Examples
-
- This page provides examples of using The DETERMINATOR for various research tasks.
-
- ## Basic Research Query
-
- ### Example 1: Drug Information
-
- **Query**:
- ```
- What are the latest treatments for Alzheimer's disease?
- ```
-
- **What The DETERMINATOR Does**:
- 1. Searches PubMed for recent papers
- 2. Searches ClinicalTrials.gov for active trials
- 3. Evaluates evidence quality
- 4. Synthesizes findings into a comprehensive report
-
- ### Example 2: Clinical Trial Search
-
- **Query**:
- ```
- What clinical trials are investigating metformin for cancer prevention?
- ```
-
- **What The DETERMINATOR Does**:
-
- 1. Searches ClinicalTrials.gov for relevant trials
- 2. Searches PubMed for supporting literature
- 3. Provides trial details and status
- 4. Summarizes findings
-
- ## Advanced Research Queries
-
- ### Example 3: Comprehensive Review
-
- **Query**:
-
- ```
- Review the evidence for using metformin as an anti-aging intervention,
- including clinical trials, mechanisms of action, and safety profile.
- ```
-
- **What The DETERMINATOR Does**:
- 1. Uses deep research mode (multi-section)
- 2. Searches multiple sources in parallel
- 3. Generates sections on:
-    - Clinical trials
-    - Mechanisms of action
-    - Safety profile
- 4. Synthesizes comprehensive report
-
- ### Example 4: Hypothesis Testing
-
- **Query**:
- ```
- Test the hypothesis that regular exercise reduces Alzheimer's disease risk.
- ```
-
- **What The DETERMINATOR Does**:
- 1. Generates testable hypotheses
- 2. Searches for supporting/contradicting evidence
- 3. Performs statistical analysis (if Modal configured)
- 4. Provides verdict: SUPPORTED, REFUTED, or INCONCLUSIVE
-
- ## MCP Tool Examples
-
- ### Using search_pubmed
-
- ```
- Search PubMed for "CRISPR gene editing cancer therapy"
- ```
-
- ### Using search_clinical_trials
-
- ```
- Find active clinical trials for "diabetes type 2 treatment"
- ```
-
- ### Using search_all
-
- ```
- Search all sources for "COVID-19 vaccine side effects"
- ```
-
- ### Using analyze_hypothesis
-
- ```
- Analyze whether vitamin D supplementation reduces COVID-19 severity
- ```
-
- ## Code Examples
-
- ### Python API Usage
-
- ```python
- from src.orchestrator_factory import create_orchestrator
- from src.tools.search_handler import SearchHandler
- from src.agent_factory.judges import create_judge_handler
-
- # Create orchestrator
- search_handler = SearchHandler()
- judge_handler = create_judge_handler()
- ```
-
- <!--codeinclude-->
- [Create Orchestrator](../src/orchestrator_factory.py) start_line:44 end_line:66
- <!--/codeinclude-->
-
- ```python
- # Run research query
- query = "What are the latest treatments for Alzheimer's disease?"
- async for event in orchestrator.run(query):
-     print(f"Event: {event.type} - {event.data}")
- ```
-
- ### Gradio UI Integration
-
- ```python
- import gradio as gr
- from src.app import create_research_interface
-
- # Create interface
- interface = create_research_interface()
-
- # Launch
- interface.launch(server_name="0.0.0.0", server_port=7860)
- ```
-
- ## Research Patterns
-
- ### Iterative Research
-
- Single-loop research with search-judge-synthesize cycles:
-
- ```python
- from src.orchestrator.research_flow import IterativeResearchFlow
- ```
-
- <!--codeinclude-->
- [IterativeResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:56 end_line:77
- <!--/codeinclude-->
-
- ```python
- async for event in flow.run(query):
-     # Handle events
-     pass
- ```
-
- ### Deep Research
-
- Multi-section parallel research:
-
- ```python
- from src.orchestrator.research_flow import DeepResearchFlow
- ```
-
- <!--codeinclude-->
- [DeepResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:674 end_line:697
- <!--/codeinclude-->
-
- ```python
- async for event in flow.run(query):
-     # Handle events
-     pass
- ```
-
- ## Configuration Examples
-
- ### Basic Configuration
-
- ```bash
- # .env file
- LLM_PROVIDER=openai
- OPENAI_API_KEY=your_key_here
- MAX_ITERATIONS=10
- ```
-
- ### Advanced Configuration
-
- ```bash
- # .env file
- LLM_PROVIDER=anthropic
- ANTHROPIC_API_KEY=your_key_here
- EMBEDDING_PROVIDER=local
- WEB_SEARCH_PROVIDER=duckduckgo
- MAX_ITERATIONS=20
- DEFAULT_TOKEN_LIMIT=200000
- USE_GRAPH_EXECUTION=true
- ```
-
- ## Next Steps
-
- Read the [Configuration Guide](../configuration/index.md) for all options
- Explore the [Architecture Documentation](../architecture/graph_orchestration.md)
- Check out the [API Reference](../api/agents.md) for programmatic usage
-
docs/getting-started/installation.md DELETED
@@ -1,152 +0,0 @@
- # Installation
-
- This guide will help you install and set up DeepCritical on your system.
-
- ## Prerequisites
-
- Python 3.11 or higher
- `uv` package manager (recommended) or `pip`
- At least one LLM API key (OpenAI, Anthropic, or HuggingFace)
-
- ## Installation Steps
-
- ### 1. Install uv (Recommended)
-
- `uv` is a fast Python package installer and resolver. Install it using the standalone installer (recommended):
-
- **Unix/macOS/Linux:**
- ```bash
- curl -LsSf https://astral.sh/uv/install.sh | sh
- ```
-
- **Windows (PowerShell):**
- ```powershell
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- ```
-
- **Alternative methods:**
- ```bash
- # Using pipx (recommended if you have pipx installed)
- pipx install uv
-
- # Or using pip
- pip install uv
- ```
-
- After installation, restart your terminal or add `~/.cargo/bin` to your PATH.
-
- ### 2. Clone the Repository
-
- ```bash
- git clone https://github.com/DeepCritical/GradioDemo.git
- cd GradioDemo
- ```
-
- ### 3. Install Dependencies
-
- Using `uv` (recommended):
-
- ```bash
- uv sync
- ```
-
- Using `pip`:
-
- ```bash
- pip install -e .
- ```
-
- ### 4. Install Optional Dependencies
-
- For embeddings support (local sentence-transformers):
-
- ```bash
- uv sync --extra embeddings
- ```
-
- For Modal sandbox execution:
-
- ```bash
- uv sync --extra modal
- ```
-
- For Magentic orchestration:
-
- ```bash
- uv sync --extra magentic
- ```
-
- Install all extras:
-
- ```bash
- uv sync --all-extras
- ```
-
- ### 5. Configure Environment Variables
-
- Create a `.env` file in the project root:
-
- ```bash
- # Required: At least one LLM provider
- LLM_PROVIDER=openai  # or "anthropic" or "huggingface"
- OPENAI_API_KEY=your_openai_api_key_here
-
- # Optional: Other services
- NCBI_API_KEY=your_ncbi_api_key_here  # For higher PubMed rate limits
- MODAL_TOKEN_ID=your_modal_token_id
- MODAL_TOKEN_SECRET=your_modal_token_secret
- ```
-
- See the [Configuration Guide](../configuration/index.md) for all available options.
-
- ### 6. Verify Installation
-
- Run the application:
-
- ```bash
- uv run python src/app.py
- ```
-
- Open your browser to `http://localhost:7860` to verify the installation.
-
- ## Development Setup
-
- For development, install dev dependencies:
-
- ```bash
- uv sync --all-extras --dev
- ```
-
- Install pre-commit hooks:
-
- ```bash
- uv run pre-commit install
- ```
-
- ## Troubleshooting
-
- ### Common Issues
-
- **Import Errors**:
- Ensure you've installed all required dependencies
- Check that Python 3.11+ is being used
-
- **API Key Errors**:
- Verify your `.env` file is in the project root
- Check that API keys are correctly formatted
- Ensure at least one LLM provider is configured
-
- **Module Not Found**:
- Run `uv sync` or `pip install -e .` again
- Check that you're in the correct virtual environment
-
- **Port Already in Use**:
- Change the port in `src/app.py` or use an environment variable
- Kill the process using port 7860
-
- ## Next Steps
-
- Read the [Quick Start Guide](quick-start.md)
- Learn about [MCP Integration](mcp-integration.md)
- Explore [Examples](examples.md)
-