In a recent study, Apple researchers showcase a diffusion model that can generate text up to 128 times faster than comparable models. Here's how it works.
## The technical specifics
Here’s what you should know about this study: LLMs like ChatGPT are autoregressive models. They generate text sequentially, one token at a time, conditioning each new token on the user’s prompt and on every token generated so far.
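To make that concrete, here’s a minimal Python sketch of autoregressive sampling. Everything in it is a toy stand-in: the tiny vocabulary and the `next_token_probs` function take the place of a real neural network, but the loop shows the key point, which is that each new token gets fed back in as context for the next one.

```python
import random

VOCAB = ["the", "model", "writes", "text", "one", "token", "at", "a", "time", "<eos>"]

def next_token_probs(context):
    # Stand-in for a real LLM's neural network, which would score every
    # vocabulary token based on the context; here the scores are just
    # arbitrary toy numbers turned into a probability distribution.
    scores = [1.0 + 0.5 * (len(tok) % 3) for tok in VOCAB]
    total = sum(scores)
    return [s / total for s in scores]

def generate_autoregressively(prompt, max_tokens=12):
    context = list(prompt)
    for _ in range(max_tokens):
        probs = next_token_probs(context)
        token = random.choices(VOCAB, weights=probs, k=1)[0]
        if token == "<eos>":
            break
        context.append(token)  # the new token becomes part of the context
    return context

print(" ".join(generate_autoregressively(["explain", "diffusion", "models"])))
```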
Diffusion models work differently. They generate many tokens in parallel and then refine them over multiple iterative passes until the full response takes shape.
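Here’s an equally toy sketch of that idea (this is not how Apple’s model, or any production diffusion model, actually works): the draft starts out as a row of [MASK] placeholders, and each refinement pass fills some of them in, in parallel, until nothing is left masked.

```python
import random

VOCAB = ["apple", "released", "a", "new", "diffusion", "paper", "with", "fast", "text", "generation", "today", "."]
LENGTH = 12

def refine(tokens, step, total_steps):
    # Stand-in for the denoising network: each pass replaces some of the
    # remaining [MASK] placeholders with real tokens, all positions at once.
    out = list(tokens)
    masked = [i for i, t in enumerate(out) if t == "[MASK]"]
    if not masked:
        return out
    to_fill = max(1, len(masked) // (total_steps - step))
    for i in random.sample(masked, min(to_fill, len(masked))):
        out[i] = random.choice(VOCAB)
    return out

def generate_by_diffusion(total_steps=8):
    tokens = ["[MASK]"] * LENGTH            # start from a fully masked draft
    for step in range(total_steps):         # a handful of refinement rounds
        tokens = refine(tokens, step, total_steps)
        print("step", step + 1, ":", " ".join(tokens))
    return tokens

generate_by_diffusion()
```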
One particular flavor of diffusion model is the flow-matching model, which essentially skips the long chain of small refinement steps and instead learns to produce the final result in a single pass.
For a deeper dive into how diffusion models work, check out [this post](https://9to5mac.com/2025/07/04/apple-just-released-a-weirdly-interesting-coding-language-model/) about Apple’s diffusion-based coding model. And for more on flow-matching models, check out [this post](https://9to5mac.com/2025/09/24/apple-simplefold-protein-folding-prediction-ai/) about Apple’s flow-matching model for protein folding.
## Apple’s latest research
In a paper released today, titled “[FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models](https://machinelearning.apple.com/research/fs-dfm),” researchers from Apple and Ohio State University present a new model called Few-Step Discrete Flow-Matching, or FS-DFM.
In the study, the team shows that FS-DFM can write full-length passages in as few as eight quick refinement rounds, matching the quality of diffusion models that need over a thousand steps to achieve a similar result.
To pull this off, the researchers take an interesting three-step approach: first, the model is trained to handle different budgets of refinement iterations. Then, they use a guiding “teacher” model to help it make larger, more accurate updates at each iteration without “overshooting” the intended text. And finally, they tweak how each iteration works so the model can reach the final result in fewer, steadier steps.
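For a rough intuition about why the step budget matters, here’s a purely numeric analogy in Python. It has nothing to do with the paper’s actual discrete flow-matching math: a draft vector simply moves toward a target, and the fewer steps it’s allowed, the bigger and better-aimed each move has to be to still land in the same place.

```python
def refine_toward(target, draft, num_steps):
    # Each step moves a 1/remaining fraction of the way to the target,
    # so the trajectory lands on the target exactly at the final step.
    for step in range(num_steps):
        remaining = num_steps - step
        draft = [d + (t - d) / remaining for d, t in zip(draft, target)]
    return draft

target = [0.9, 0.1, 0.4, 0.7]
print(refine_toward(target, [0.0] * 4, num_steps=1000))  # many tiny updates
print(refine_toward(target, [0.0] * 4, num_steps=8))     # few large, well-aimed updates
```

Both runs end up at the same target; the hard part FS-DFM’s training tackles is making those few large updates accurate for real text rather than toy numbers.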
When measured against larger diffusion models, FS-DFM performed well on two key metrics: perplexity and entropy.
In a nutshell, perplexity is a standard measure of text quality in language models. The lower the perplexity, the more accurate and natural-sounding the text.
Entropy, in turn, essentially measures how confidently the model picks each word. In practice, if the entropy is too low, the text can become repetitive or predictable, while if it’s too high, it can start to sound random or incoherent.
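If you’re curious how those two numbers are actually computed, here’s a small Python sketch using made-up probabilities. The formulas are the standard ones; the numbers are not from the paper.

```python
import math

# Probabilities a toy model assigned to each token it actually generated.
token_probs = [0.42, 0.31, 0.08, 0.55, 0.12]

# Perplexity: exponential of the average negative log-probability.
# Lower perplexity means the model found the text less "surprising."
avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)

# Entropy of one predicted next-token distribution: how spread out the
# model's choices are. Too low tends toward repetition; too high toward noise.
next_token_dist = {"the": 0.6, "a": 0.25, "banana": 0.15}
entropy = -sum(p * math.log(p) for p in next_token_dist.values())

print(f"perplexity = {perplexity:.2f}, entropy = {entropy:.2f} nats")
```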
Compared with the 7-billion-parameter Dream diffusion model and the 8-billion-parameter LLaDA diffusion model, FS-DFM variants with 1.7, 1.3, and 0.17 billion parameters consistently achieved lower perplexity and kept entropy more stable across every iteration count.
Given the results and the promise this approach shows, along with the relative scarcity of similar models and studies, the researchers say they plan to “release code and model checkpoints to support reproducibility and further research.”
If you’d like to dig deeper into Apple’s approach and the more granular implementation details of its models, be sure to check out the [full paper](https://arxiv.org/abs/2509.20624) on arXiv. It includes several performance examples, including one that color-codes the iteration at which each word was last changed.
Read “FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models” on [arXiv](https://arxiv.org/abs/2509.20624).