Generated: 2026-05-06 · API: Gemini 2.5 Flash · Modes: Summary


SubQ AI: 12M Token Context, Sparse Attention Architecture, and Verification Concerns

Clip title: A New AI Model Just Dropped With A CRAZY Claim
Author / channel: Tim Carambat
URL: https://www.youtube.com/watch?v=34I9hKjJbSM

Summary

The video discusses SubQ, a new AI model from Subquadratic that claims a groundbreaking 12 million token context window. The model is touted as 52 times more efficient than FlashAttention at 1 million tokens, at less than 5% of the cost of models like Opus, and, crucially, without any loss in quality. The presenter, Timothy Carambat, emphasizes the potentially revolutionary impact of such a model, not only for large cloud-based LLMs but especially for smaller local models that could perform complex tasks on personal devices.
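To make the scaling claims concrete, here is a back-of-the-envelope count of query-key score computations, written as a minimal Python sketch. It assumes dense attention scores every token against every token (n² scores) and that a sparse scheme lets each token attend to a fixed budget of k keys (n·k scores). The helper names and the budget `K = 2_048` are hypothetical illustrations, not SubQ figures. The count ignores the cost of choosing which keys to keep, and it does not attempt to reproduce the 52x or 5% figures.

```python
def dense_score_count(n_tokens: int) -> int:
    """Scores computed by dense attention: every token vs. every token (n^2)."""
    return n_tokens * n_tokens


def sparse_score_count(n_tokens: int, k: int) -> int:
    """Scores computed if each token attends to at most k selected keys (n*k).

    Hypothetical model: ignores the cost of selecting those keys.
    """
    return n_tokens * min(k, n_tokens)


ONE_M = 1_000_000
TWELVE_M = 12_000_000
K = 2_048  # hypothetical per-token key budget, not a published SubQ value

# Dense: growing the context 12x multiplies the work by 12**2.
dense_ratio = dense_score_count(TWELVE_M) // dense_score_count(ONE_M)

# Sparse with a fixed budget: the same growth multiplies the work by only 12.
sparse_ratio = sparse_score_count(TWELVE_M, K) // sparse_score_count(ONE_M, K)

print(dense_ratio, sparse_ratio)
```

Under these assumptions, going from 1M to 12M tokens multiplies dense work by 144 but sparse work by only 12, which is why sub-quadratic attention matters at this scale.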

The core innovation behind SubQ is its “fully sub-quadratic sparse-attention architecture” (SSA). Carambat gives a simplified explanation of attention mechanisms to clarify this. Traditional “dense attention” is highly accurate but computationally expensive and slow: every token is compared with every other token, so the cost scales quadratically with context length. “Flash attention” is presented as improving speed by limiting the scope of tokens considered. Strictly speaking, FlashAttention still computes exact attention over all tokens and gets its speedup from memory-efficient tiling, so its compute still grows quadratically. Sparse attention, by contrast, aims to select only the semantically relevant tokens from anywhere in the context window. This allows much larger contexts to be processed at significantly lower computational cost while, in theory, maintaining accuracy. This selective attention is what SubQ claims to have mastered.
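The difference between dense and sparse attention can be sketched in plain Python for a single query. This is a toy illustration, not SubQ's architecture, which has not been published. `dense_attention` weights every value by a softmax over scaled dot-product scores. The hypothetical `topk_sparse_attention` keeps only the `k` highest-scoring keys and gives the rest zero weight. FlashAttention is not modeled separately, since it produces the same result as dense attention.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine value vectors using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]


def dense_attention(query, keys, values):
    """Exact (dense) attention for one query: every key is scored and weighted."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    return weighted_sum(softmax(scores), values)


def topk_sparse_attention(query, keys, values, k):
    """Hypothetical top-k sparse attention: only the k best-scoring keys get weight.

    This naive version still scores every key to find the top k, so it is not
    sub-quadratic by itself; it only illustrates the selection step.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return weighted_sum(weights, [values[i] for i in top])


# Toy data: keys 0 and 2 are closely aligned with the query; key 3 points away.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[10.0], [0.0], [8.0], [-5.0]]
query = [1.0, 0.0]

dense_out = dense_attention(query, keys, values)
sparse_out = topk_sparse_attention(query, keys, values, k=2)
print(dense_out, sparse_out)
```

With `k=2` the sparse result is built only from the two most relevant values (10.0 and 8.0), while the dense result is pulled toward the irrelevant ones. This naive version still scores every key before picking the top k, so it shows the selection idea but not the sub-quadratic cost. A practical sparse-attention system would need a cheaper way to find candidate keys without scoring them all.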

However, Carambat expresses significant skepticism about SubQ’s claims, primarily because of a lack of transparent, verifiable information. There is no publicly available technical report or paper detailing the model’s architecture or validation. The benchmarks presented by Subquadratic are mostly for a “SubQ 1M-Preview” (1 million tokens) rather than the full 12 million token model. The 1M-Preview shows strong performance on tasks like SWE-Bench Verified (81.8%) and RULER @ 128K (95.8%), but it is not consistently superior to existing models like Claude Opus 4.6/4.7 at similar context lengths. Carambat also points out discrepancies between the benchmark figures presented in the video and those on Subquadratic’s website, which adds to the ambiguity. That the 12 million token model cannot be benchmarked against competitors reflects the context limits of other models, not a proven superiority for SubQ at that scale.

Despite the skepticism, the underlying concept of sparse attention for long-context reasoning is presented as a highly promising area of AI development. Carambat remains cautiously optimistic, acknowledging that breakthroughs in efficiency and context handling are vital for making advanced AI more accessible and affordable, especially for local, on-device applications. He has applied for early access to SubQ to verify its claims independently, underscoring the importance of transparency and robust testing in the rapidly evolving field of AI.

Description

Today, a company named “SubQuadratic” dropped a brand new AI model using a known but rarely used attention mechanism that should be 52x faster than anything else and boasts a 12M context window with no loss.

Safe to say I am skeptical when the website benchmarks only cover a fraction of the model's advertised context, there are no 12M benchmarks, and you cannot even run the model without signing up for early access.

That being said, I am cautiously optimistic about these kinds of new technologies, but there are people who make a ton of claims in AI that often turn out to be nothing or outright fabrications. So hopefully this is something we can all eventually benefit from, if it works!

If SSA is so cheap, why are we having to sign up for early access? This all doesn’t seem right.

Links:
SubQuadratic: https://subq.ai/
Tweet/X: https://x.com/alex_whedon/status/2051663268704636937
3Blue1Brown Attention: https://www.youtube.com/watch?v=eMlx5fFNoYc
AnythingLLM: https://anythingllm.com/

Chapters:
0:00 Snake Oil or Sparse Attention?
1:18 Intro to Me & AnythingLLM
2:00 Intro to SubQuadratic
2:37 Attention Overview: Dense Attention
4:14 Attention Overview: Flash Attention (FA)
5:21 Attention Overview: Sparse Attention (SSA)
6:30 Overview of Claims & Benchmarks
9:30 Problems with their claims
12:57 Why is this Early Access??
13:18 Snake Oil? Time Will tell.

URLs