Ebisu: Benchmarking Large Language Models in Japanese Finance Paper • 2602.01479 • Published 2 days ago • 16
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs Paper • 2510.08886 • Published Oct 10, 2025 • 20
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents Paper • 2510.11695 • Published Oct 13, 2025 • 2
FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation Paper • 2511.14998 • Published Nov 19, 2025
All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection Paper • 2601.04160 • Published 27 days ago • 4
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection Paper • 2601.05403 • Published 26 days ago • 10
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models Paper • 2601.03425 • Published 28 days ago • 16