What stands out:

- Uses diffusion-based generation instead of sequential token-by-token decoding
- Generates tokens in parallel and refines them over a few steps
- Claims 1,009 tokens/sec on NVIDIA Blackwell GPUs
- Pricing: $0.25 / 1M input tokens, $0.75 / 1M output tokens
- 128K context
- Tunable reasoning
- Native tool use + schema-aligned JSON output
- OpenAI API compatible

They’re positioning it heavily for:

- Coding assistants
- Agentic loops (multi-step inference chains)
- Real-time voice systems
- RAG/search pipelines with multi-hop retrieval
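To make the "parallel generation + refinement" claim concrete, here's a toy sketch of how diffusion-style decoding differs from left-to-right decoding: start with every position masked, then commit a batch of positions in parallel on each refinement step. This is purely illustrative (random choices stand in for the model's predictions; function and variable names are made up), not the model's actual algorithm.

```python
import random

def diffusion_decode(vocab, length, steps=4, seed=0):
    """Toy sketch of diffusion-style decoding: all positions start
    masked, and each step un-masks a batch of positions in parallel,
    instead of emitting one token at a time left-to-right."""
    rng = random.Random(seed)
    MASK = "<mask>"
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # commit a fraction of the remaining masked positions this step
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, min(k, len(masked))):
            tokens[i] = rng.choice(vocab)  # stand-in for a model prediction
    # final pass: fill anything still masked
    for i, t in enumerate(tokens):
        if t == MASK:
            tokens[i] = rng.choice(vocab)
    return tokens
```

The throughput win comes from filling many positions per forward pass over a few steps, rather than one position per pass.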
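Since it advertises OpenAI API compatibility with JSON output, requests should follow the standard Chat Completions shape. A minimal sketch of the request body — the base URL and model id below are placeholders, not the provider's real values:

```python
import json

# Hypothetical endpoint and model id -- check the provider's docs for
# the real values; only the request shape is standard Chat Completions.
BASE_URL = "https://api.example.com/v1"

payload = {
    "model": "example-diffusion-model",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize this ticket as JSON."}
    ],
    "response_format": {"type": "json_object"},  # JSON-mode output
    "max_tokens": 256,
}
body = json.dumps(payload)  # POST this to f"{BASE_URL}/chat/completions"
```

Because the shape matches, existing OpenAI SDK clients should work by pointing `base_url` at the provider's endpoint.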
Originally posted by u/TyedalWaves on r/ArtificialInteligence

