SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Paper • 2310.06770 • Published Oct 10, 2023 • 9
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback Paper • 2306.14898 • Published Jun 26, 2023
DevBench: A Comprehensive Benchmark for Software Development Paper • 2403.08604 • Published Mar 13, 2024 • 2
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents Paper • 2207.01206 • Published Jul 4, 2022 • 3
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering Paper • 2405.15793 • Published May 6, 2024 • 7
Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference Paper • 2110.05362 • Published Oct 11, 2021
Mind the Gap! Static and Interactive Evaluations of Large Audio Models Paper • 2502.15919 • Published Feb 21, 2025 • 4
Distilling an End-to-End Voice Assistant Without Instruction Training Data Paper • 2410.02678 • Published Oct 3, 2024 • 23
Can Large Language Models Transform Computational Social Science? Paper • 2305.03514 • Published Apr 12, 2023 • 1
Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers Paper • 2210.05709 • Published Oct 11, 2022
DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules Paper • 2305.13406 • Published May 22, 2023
Unintended Impacts of LLM Alignment on Global Representation Paper • 2402.15018 • Published Feb 22, 2024 • 1
DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue Paper • 2212.08054 • Published Dec 15, 2022
Task-Agnostic Low-Rank Adapters for Unseen English Dialects Paper • 2311.00915 • Published Nov 2, 2023 • 1