Encoder-only transformers

Architectures that use only the encoder stack of the Transformer, characterized by bidirectional self-attention. Unlike decoder-only models, they process the entire input sequence at once, allowing each token to attend to both preceding and following tokens in the context.
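A toy sketch of this distinction in plain Python (names hypothetical; real implementations use learned query/key/value projections and tensor libraries): with the causal flag off, every position attends to the whole sequence, as in an encoder; with it on, future positions are masked out, as in a decoder.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens, causal=False):
    """Toy scaled dot-product self-attention over a list of vectors.

    causal=False (encoder-style): every token attends to every token.
    causal=True  (decoder-style): token i attends only to positions <= i.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        scores = []
        for j, k in enumerate(tokens):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out future positions
            else:
                scores.append(sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d))
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out
```

Note how the first token's output is unchanged under causal masking (it can only see itself), but mixes in later tokens under bidirectional attention.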

Core Functionality & Use Cases

Primary applications are discriminative, extractive, and sequence-labeling tasks within natural language processing (NLP):

  • Text classification (e.g., sentiment analysis, topic labeling)
  • Sequence labeling (e.g., named-entity recognition, part-of-speech tagging)
  • Extractive question answering (predicting an answer span within a passage)
  • Sentence-pair tasks such as natural-language inference and semantic similarity
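For the labeling tasks, the encoder's per-token hidden states feed a small classification head that emits one logit vector per token. A minimal sketch in plain Python (names hypothetical; a tensor library would do this as a single matrix multiply):

```python
def token_label_logits(hidden_states, weight, bias):
    """Per-token classification head for sequence labeling (e.g. NER).

    hidden_states: list of d-dim contextual embeddings, one per token.
    weight: num_labels x d matrix; bias: num_labels vector.
    Returns one logit vector per input token.
    """
    return [
        [sum(w_c * h_c for w_c, h_c in zip(row, h)) + b
         for row, b in zip(weight, bias)]
        for h in hidden_states
    ]
```

The key structural point: labeling heads produce an output per position, whereas sequence-classification heads pool the positions into a single prediction.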

Comparative Context

  • Attention Mechanism: Uses bidirectional context, whereas decoder-only models (e.g., Gemini, GPT) use causal/masked self-attention to prevent looking “ahead” in the sequence.
  • Task Specialization: While encoder-only models excel at understanding and labeling, generative large-language-models (LLMs) are optimized for autoregressive text generation.
  • Emerging Trends in Extraction:
    • New developments like LangExtract (Google) leverage generative Gemini models to perform information extraction from unstructured text.
    • This represents a shift from traditional NLP pipelines toward using generative power for specific, non-generative extraction tasks, despite the inherent challenges of using large-scale generative models for structured tasks.
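The attention-mechanism contrast above can be made concrete as mask matrices (a sketch; real implementations add the mask, as large negative values, to attention scores before the softmax):

```python
def attention_mask(n, causal):
    """Build an n x n attention mask: 1 = may attend, 0 = masked.

    Encoder-only (bidirectional): all ones, every token sees the full context.
    Decoder-only (causal): lower-triangular, token i sees positions <= i only.
    """
    return [[1 if (not causal or j <= i) else 0 for j in range(n)]
            for i in range(n)]
```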

Backlinks:

  • 2026 04 14 Langextract Sam Witteveen