Markus (PRO)
marksverdhei
33 followers · 88 following
AI & ML interests
NLP
Recent Activity
reacted to their post with 🔥 about 18 hours ago
reacted to their post with 👍 about 18 hours ago
posted an update about 18 hours ago
Poll: Will 2026 be the year of subquadratic attention?

The transformer architecture is cursed by its computational complexity. It is why you run out of tokens and have to compact. But some would argue that this is a feature, not a bug, and that it is also why these models are so good. We've been doing a lot of research on making equally good models that are computationally cheaper, but so far none of the approaches have stood the test of time. Or so it seems.

Please vote, don't be shy. Remember that the Dunning-Kruger effect is very real, so someone who knows less about transformers than you is going to vote. We want everyone's opinion, no matter their confidence.

👍 if you think at least one frontier model* will have no O(n^2) attention by the end of 2026
🔥 if you disagree

* Frontier models: models that match or outperform the flagship Claude, Gemini, or ChatGPT at the time on multiple popular benchmarks.
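As a rough illustration of the O(n^2) claim the poll hinges on (a minimal sketch of my own, not from the post; the function name and fp32 assumption are mine), the score matrix Q Kᵀ of standard attention has one entry per query/key pair, so its cost grows with the square of the context length:

```python
import numpy as np

def attention_score_cost(n: int, d: int = 64) -> tuple[float, float]:
    """Rough FLOPs and fp32 memory for the (n, n) score matrix of one attention head."""
    flops = 2.0 * n * n * d    # multiply-adds in Q @ K^T
    score_bytes = 4.0 * n * n  # one fp32 score per query/key pair
    return flops, score_bytes

# Small concrete check that the scores really form an (n, n) matrix.
n_small, d_small = 8, 4
q = np.random.randn(n_small, d_small)
k = np.random.randn(n_small, d_small)
assert (q @ k.T).shape == (n_small, n_small)

# Quadratic growth: 128x the context length means ~16,384x the score cost.
for n in (1_000, 8_000, 128_000):
    flops, mem = attention_score_cost(n)
    print(f"n={n:>7,}: ~{flops:.1e} FLOPs, ~{mem / 2**30:.1f} GiB of scores per head")
```

Subquadratic approaches (linear attention, state-space models, sliding windows, and similar) aim to avoid materializing that (n, n) term, which is exactly the trade-off the poll asks about.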
marksverdhei's datasets (6)
marksverdhei/foo128k • Viewer • Updated Nov 13, 2025 • 1 • 2
marksverdhei/wordnet-definitions-en-2021 • Viewer • Updated May 23, 2025 • 43.8k • 94 • 11
marksverdhei/data-by-countries • Viewer • Updated Nov 24, 2024 • 201 • 5 • 1
marksverdhei/hdi-ihdi-democracy-by-country • Viewer • Updated Nov 23, 2024 • 194 • 1.29k
marksverdhei/reddit-syac-urls • Viewer • Updated Jun 7, 2022 • 8.61k • 32
marksverdhei/clickbait_title_classification • Viewer • Updated Mar 29, 2022 • 32k • 44 • 6