**Apple’s Latest Foundation Models: Insights from the Tech Report 2025**
At WWDC25, Apple introduced updated versions of its on-device and cloud foundation models, accompanied by a tech report describing how the models were trained, optimized, and evaluated. The document, titled “[Apple Intelligence Foundation Language Models – Tech Report 2025](https://machinelearning.apple.com/papers/apple_intelligence_foundation_language_models_tech_report_2025.pdf),” covers the models’ architecture, data sources, and performance.
### The Local Model: A Dual-Block Framework
Apple’s on-device model, with roughly 3 billion parameters, is split into two blocks. Block 1 contains 62.5% of the transformer layers, while Block 2 holds the remaining 37.5% but has its key and value projections removed, so its layers reuse the key-value (KV) cache produced by Block 1. This design cuts KV-cache memory requirements by 37.5% and shortens time-to-first-token by a similar margin, while maintaining output quality.
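To make the idea concrete, here is a minimal, hypothetical PyTorch-style sketch of a two-block split with a shared KV cache. The class names, toy layer counts, and the simplification that every Block 2 layer reuses the cache of the final Block 1 layer are illustrative assumptions, not details from Apple’s implementation.

```python
import torch
import torch.nn as nn

class Block1Layer(nn.Module):
    """Standard decoder layer: computes its own queries, keys, and values."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return x + self.out_proj(attn @ v), (k, v)  # (k, v) would be cached

class Block2Layer(nn.Module):
    """Decoder layer with K/V projections removed: attends over a shared KV cache."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, shared_kv):
        k, v = shared_kv  # borrowed from Block 1; nothing new is cached here
        q = self.q_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return x + self.out_proj(attn @ v)

# Toy 16-layer model with the 62.5% / 37.5% split (10 + 6 layers, invented sizes).
d_model = 64
block1 = nn.ModuleList(Block1Layer(d_model) for _ in range(10))
block2 = nn.ModuleList(Block2Layer(d_model) for _ in range(6))

x = torch.randn(1, 8, d_model)       # (batch, sequence, hidden)
for layer in block1:
    x, kv = layer(x)                 # only Block 1 layers produce KV entries
for layer in block2:
    x = layer(x, kv)                 # Block 2 layers reuse the shared KV
```

Because only Block 1 layers write KV entries, the cache grows with 62.5% of the layers rather than all of them, which is where the 37.5% memory saving in the report comes from.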
### Cloud-Based Model: Cutting-Edge Architecture
For its cloud model, Apple designed a custom architecture called Parallel-Track Mixture-of-Experts (PT-MoE). In a mixture-of-experts design, each token activates only a small subset of specialized subnetworks, or “experts,” selected by a router, rather than the full model, which improves efficiency without sacrificing quality. The parallel-track transformer additionally processes tokens across several largely independent tracks, each containing MoE layers, which avoids the synchronization bottlenecks of a single monolithic model.
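To illustrate the core mechanism, below is a small, hypothetical PyTorch sketch of top-k expert routing plus a loose stand-in for parallel tracks. The expert count, track count, and the way tracks are merged here are invented for illustration and do not reflect the report’s actual PT-MoE design.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts per token."""
    def __init__(self, d_model: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the selected experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# "Parallel tracks" here are simply independent stacks whose outputs are averaged;
# the report's tracks are more sophisticated, this is only a rough stand-in.
d_model, n_tracks = 32, 2
tracks = nn.ModuleList(nn.Sequential(MoELayer(d_model), MoELayer(d_model))
                       for _ in range(n_tracks))

x = torch.randn(10, d_model)                     # 10 tokens
y = torch.stack([track(x) for track in tracks]).mean(dim=0)
```

The key property to notice is that each token passes through only `top_k` of the experts, so compute per token stays roughly constant even as the total parameter count grows with the number of experts.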
### Improved Multilingual Representation
Apple has substantially expanded its multilingual capabilities, raising the share of multilingual data used in training from 8% to 30%. The tokenizer vocabulary also grew by 50%, from 100,000 to 150,000 tokens. These changes produced notable gains on non-English evaluations, particularly after reinforcement-learning fine-tuning.
### Data Acquisition Strategies
Apple’s training data comes mainly from web crawling with its Applebot, which respects `robots.txt` exclusions so that websites can opt out (a minimal sketch of such a check follows this list). The data sources comprise:
– **Publicly Accessible Web Data:** The largest segment of training data, curated for quality and pertinence.
– **Licensed Data:** A portion of the data was licensed from publishers, including major news outlets.
– **Synthetic Data:** Created for specific tasks, supporting fine-tuning and multilingual capabilities.
– **Visual Data:** More than 10 billion image-caption pairs were amassed to improve image comprehension.
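As a brief illustration of the `robots.txt` point above, the sketch below shows how a crawler can consult a site’s exclusion rules before fetching a page, using Python’s standard `urllib.robotparser`. The URL is a placeholder, and the snippet says nothing about how Applebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt rules (placeholder domain).
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

url = "https://example.com/some/article"
# Ask whether a crawler identifying as "Applebot" may fetch this URL.
if parser.can_fetch("Applebot", url):
    print(f"Allowed to crawl {url}")
else:
    print(f"Excluded by robots.txt: {url}")
```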
### Conclusion
Despite the perception that Apple is trailing in AI, the tech report shows considerable progress in its foundation models, highlighting novel architectures and a commitment to privacy. The detailed discussion of architecture and data sourcing underscores Apple’s ongoing effort to strengthen its AI capabilities while prioritizing user privacy.