self-attention

A core mechanism of Transformer architectures that lets a model weigh the relevance of every token in a sequence against every other token, enabling it to capture long-range dependencies.
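To make the mechanism concrete, below is a minimal sketch of single-head scaled dot-product self-attention in NumPy. The dimensions and projection names (d_model, W_q, W_k, W_v) are illustrative assumptions, not details drawn from the notes below; a trained Transformer would learn these projections rather than sample them randomly.

```python
# Minimal sketch of single-head scaled dot-product self-attention.
# Assumed shapes: x is (seq_len, d_model); output is (seq_len, d_model).
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    seq_len, d_model = x.shape
    rng = np.random.default_rng(0)
    # In a real model these projection matrices are learned parameters;
    # random values are used here purely for illustration.
    W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

    Q, K, V = x @ W_q, x @ W_k, x @ W_v

    # Every token scores every other token: a (seq_len, seq_len) matrix,
    # scaled by sqrt(d_model) to keep the softmax well-conditioned.
    scores = Q @ K.T / np.sqrt(d_model)

    # Row-wise softmax turns scores into each token's attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output is a weighted mix of all value vectors, so distant
    # tokens influence each other directly (long-range dependencies).
    return weights @ V

# Example: 5 tokens with 8-dimensional embeddings.
out = self_attention(np.random.default_rng(1).standard_normal((5, 8)))
print(out.shape)  # (5, 8)
```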

Recent Architectural Advancements


  • 2026-04-14: 256k context window LLM

Source Notes

  • 2026-04-13: EXPOSED: The Dirty Little Secret of AI (On a 1979 PDP-11)