EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI Paper • 2509.11648 • Published Sep 15 • 1
D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning Paper • 2509.06771 • Published Sep 8 • 5
Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering Paper • 2508.04683 • Published Aug 6
DSBC : Data Science task Benchmarking with Context engineering Paper • 2507.23336 • Published Jul 31 • 2
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models Paper • 2506.07731 • Published Jun 9 • 2
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30 • 66
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home Paper • 2501.12835 • Published Jan 22 • 4
LLM-Independent Adaptive RAG: Let the Question Speak for Itself Paper • 2505.04253 • Published May 7 • 14
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 27 • 140
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs Paper • 2504.08192 • Published Apr 11 • 3
Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs Paper • 2505.20254 • Published May 26 • 5
Uncovering Cultural Representation Disparities in Vision-Language Models Paper • 2505.14729 • Published May 20 • 1
Robust and Fine-Grained Detection of AI Generated Texts Paper • 2504.11952 • Published Apr 16 • 12
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance Paper • 2504.09753 • Published Apr 13 • 6
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation Paper • 2504.07072 • Published Apr 9 • 9
Class Incremental Learning via Likelihood Ratio Based Task Prediction Paper • 2309.15048 • Published Sep 26, 2023
MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft Paper • 2310.08367 • Published Oct 12, 2023 • 1
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models Paper • 2311.05997 • Published Nov 10, 2023 • 37
Selecting Large Language Model to Fine-tune via Rectified Scaling Law Paper • 2402.02314 • Published Feb 4, 2024 • 2