We are thrilled to introduce Stable-DiffCoder, a robust code diffusion Large Language Model (LLM). Built directly on the Seed-Coder architecture, data, and training pipeline, it introduces a block diffusion continual pretraining (CPT) stage equipped with a tailored warmup strategy and a block-wise clipped noise schedule.
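To make the block diffusion CPT stage concrete, here is a minimal sketch of how a sequence might be noised block by block, with the mask ratio per block clipped to a bounded range. The function name `block_noise`, the `[MASK]` id, and the clip bounds `t_min`/`t_max` are illustrative assumptions, not the released implementation.

```python
import numpy as np

def block_noise(tokens, block_size, mask_id, rng, t_min=0.1, t_max=0.9):
    """Hypothetical block-wise clipped noising sketch.

    Partitions the sequence into fixed-size blocks, samples one
    noise level per block clipped to [t_min, t_max], and replaces
    that fraction of tokens with the mask id.
    """
    noisy = np.array(tokens).copy()
    for start in range(0, len(noisy), block_size):
        end = min(start + block_size, len(noisy))
        t = rng.uniform(t_min, t_max)          # clipped noise level
        mask = rng.random(end - start) < t     # per-token Bernoulli mask
        noisy[start:end][mask] = mask_id       # corrupt this block only
    return noisy
```

Clipping the per-block noise level avoids degenerate blocks that are fully masked (no signal) or fully clean (no learning target), which is one plausible way a clipped schedule can smooth training.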
Notably, with only CPT followed by supervised fine-tuning (SFT), Stable-DiffCoder surpasses many strong ~8B autoregressive (AR) and diffusion-based code models. These results demonstrate that diffusion-based training can improve code modeling quality beyond what AR training alone achieves, even under tightly controlled data and architecture constraints.
Traditional bidirectional training in diffusion LLMs (DLLMs) often introduces noise that hinders the model from learning clear reasoning patterns. Our analysis on 2.5B-scale models reveals what effective DLLMs require.
Our Solution: We initialize training from a pre-annealing AR checkpoint, which retains clean and malleable knowledge, and then run a small block diffusion stage that instills clear knowledge while making further use of the data.
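A defining property of block diffusion is its attention pattern: tokens attend bidirectionally within their own block but only causally to earlier blocks. The sketch below builds such a mask; the function name and boolean convention (1 = attention allowed) are my own assumptions for illustration.

```python
import numpy as np

def block_causal_mask(seq_len, block_size):
    """Sketch of a block-causal attention mask.

    Token i may attend to token j iff j's block index is <= i's
    block index: bidirectional inside a block, causal across blocks.
    """
    blocks = np.arange(seq_len) // block_size       # block index per token
    return (blocks[None, :] <= blocks[:, None]).astype(int)
```

This is the structural middle ground between a fully causal AR mask (block size 1) and fully bidirectional diffusion attention (one block spanning the sequence).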
We observed significant instability in gradient norms during the CPT of DLLMs. To address this and keep block diffusion training efficient, we introduced two key designs: a tailored warmup strategy and a block-wise clipped noise schedule.
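Two common stabilizers for a CPT stage with spiky gradient norms are a learning-rate warmup and global gradient-norm clipping. The sketch below shows both in a generic form; the specific warmup shape and clipping threshold here are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def warmup_lr(step, warmup_steps, peak_lr):
    """Linear learning-rate warmup, then constant (decay omitted)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

def clip_grad_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads]
```

Warmup eases the model from its AR initialization into the new diffusion objective, while clipping bounds the effect of any single unstable batch.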
Stable-DiffCoder demonstrates robust capabilities across both Base and Instruct versions. It consistently outperforms the AR baseline and maintains a competitive edge against other state-of-the-art ~8B AR and DLLM code models.