Markus (PRO)
marksverdhei
33 followers · 88 following
AI & ML interests
NLP
Recent Activity
reacted to their post with 🔥 about 18 hours ago
reacted to their post with 👍 about 18 hours ago
posted an update about 18 hours ago
Poll: Will 2026 be the year of subquadratic attention?

The transformer architecture is cursed by its computational complexity. It is why you run out of tokens and have to compact. But some would argue that this is a feature, not a bug, and that it is also why these models are so good. We've been doing a lot of research on making equally good models that are computationally cheaper, but so far none of the approaches have stood the test of time. Or so it seems.

Please vote, don't be shy. Remember that the Dunning-Kruger effect is very real, so someone who knows less about transformers than you is going to vote. We want everyone's opinion, no matter their confidence.

👍 if you think at least one frontier model* will have no O(n^2) attention by the end of 2026
🔥 if you disagree

* Frontier models: models that match or outperform the flagship Claude, Gemini, or ChatGPT at the time on multiple popular benchmarks.
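As a rough illustration of the O(n^2) claim the poll hinges on (a minimal sketch of my own, not from the post; the function name and fp32 assumption are mine), the score matrix Q Kᵀ of standard attention has one entry per query/key pair, so its cost grows with the square of the context length:

```python
import numpy as np

def attention_score_cost(n: int, d: int = 64) -> tuple[float, float]:
    """Rough FLOPs and fp32 memory for the (n, n) score matrix of one attention head."""
    flops = 2.0 * n * n * d    # multiply-adds in Q @ K^T
    score_bytes = 4.0 * n * n  # one fp32 score per query/key pair
    return flops, score_bytes

# Small concrete check that the scores really form an (n, n) matrix.
n_small, d_small = 8, 4
q = np.random.randn(n_small, d_small)
k = np.random.randn(n_small, d_small)
assert (q @ k.T).shape == (n_small, n_small)

# Quadratic growth: 128x the context length means ~16,384x the score cost.
for n in (1_000, 8_000, 128_000):
    flops, mem = attention_score_cost(n)
    print(f"n={n:>7,}: ~{flops:.1e} FLOPs, ~{mem / 2**30:.1f} GiB of scores per head")
```

Subquadratic approaches (linear attention, state-space models, sliding windows, and similar) aim to avoid materializing that (n, n) term, which is exactly the trade-off the poll asks about.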
marksverdhei's datasets (6)
marksverdhei/foo128k • Viewer • Updated Nov 13, 2025 • 1 • 2
marksverdhei/wordnet-definitions-en-2021 • Viewer • Updated May 23, 2025 • 43.8k • 94 • 11
marksverdhei/data-by-countries • Viewer • Updated Nov 24, 2024 • 201 • 5 • 1
marksverdhei/hdi-ihdi-democracy-by-country • Viewer • Updated Nov 23, 2024 • 194 • 1.29k
marksverdhei/reddit-syac-urls • Viewer • Updated Jun 7, 2022 • 8.61k • 32
marksverdhei/clickbait_title_classification • Viewer • Updated Mar 29, 2022 • 32k • 44 • 6