Step Audio
Step Audio is an AI audio synthesis model developed by Stepfun that generates high-fidelity audio from text prompts. It specializes in realistic soundscapes, music, and voice effects, with precise control over timbre and spatial properties.
Key Developments
Step Audio 3 (May 2026)
- Released in May 2026 as part of a major update cycle that also covered Opus 4.8 and Bonsai Image.
- Significantly improved coherence in long-form audio generation.
- Enhanced ability to isolate and manipulate specific acoustic elements within complex mixes.
- Detailed analysis available in: Weekly AI Developments: Opus 4.8, Step Audio 3, Bonsai Image (May 2026)
Technical Specifications
- Modes: text-to-audio, audio-to-audio
- Latency: supports low-latency streaming
- Capabilities: Noise reduction, style transfer, multi-speaker dialog generation
See Also
- Stepfun
- Audio Synthesis
- AI Model Versions