Apple Unveils Comprehensive Research Dataset to Improve AI Image Editing Model Development

Apple has introduced Pico-Banana-400K, a research dataset of 400,000 images created with Google’s Gemini-2.5 models. The dataset accompanies a study titled “Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing,” published by Apple’s research division, and is released under a non-commercial research license that permits academic and AI research use only.

Pico-Banana-400K was motivated by shortcomings in existing datasets, which typically rely on synthetic generations or small human-curated collections. Apple’s researchers note that these datasets commonly suffer from domain shifts, imbalanced distributions of edit types, and inconsistent quality control, all of which hinder the development of effective editing models.

To build Pico-Banana-400K, Apple selected a range of real photographs from the OpenImages dataset, ensuring coverage of humans, objects, and scenes containing text. The researchers identified 35 distinct edit types that users might request, grouped into eight categories, including the following (a sketch of the taxonomy appears after the list):

– **Pixel & Photometric:** Incorporate film grain or a retro filter
– **Human-Centric:** Generate a Funko-Pop–style toy version of an individual
– **Scene Composition & Multi-Subject:** Alter weather conditions (sunny/rainy/snowy)
– **Object-Level Semantic:** Move an object (change its position/spatial relation)
– **Scale:** Zoom in
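
As a rough illustration, here is one way such a taxonomy might be represented in code. The category names and sample edits follow the list above; the dictionary layout itself is an assumption for exposition, not the paper’s actual data format.

```python
# Hypothetical representation of the edit-type taxonomy (not Apple's format).
# Only the five categories and sample edits named in the article are shown.
EDIT_TAXONOMY = {
    "Pixel & Photometric": ["incorporate film grain or a retro filter"],
    "Human-Centric": ["generate a Funko-Pop-style toy version of a person"],
    "Scene Composition & Multi-Subject": ["alter weather (sunny/rainy/snowy)"],
    "Object-Level Semantic": ["move an object (position/spatial relation)"],
    "Scale": ["zoom in"],
    # ...plus three further categories, for 35 edit types across 8 categories.
}
```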

The researchers fed the images to the Nano-Banana model (Gemini-2.5-Flash-Image) along with suitable editing prompts. Once the edited images were generated, they used Gemini-2.5-Pro to judge the results, accepting or rejecting each edit based on instruction adherence and visual quality.
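
The following sketch illustrates that generate-then-judge loop. The two model-call functions are placeholder stubs, since Apple’s paper does not publish pipeline code; every name here is an assumption for illustration, not Apple’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    follows_instruction: bool
    visual_quality_ok: bool

def edit_with_nano_banana(image: bytes, prompt: str) -> bytes:
    # Placeholder for a Gemini-2.5-Flash-Image (Nano-Banana) editing call.
    raise NotImplementedError

def judge_with_gemini_pro(original: bytes, edited: bytes, prompt: str) -> Verdict:
    # Placeholder for a Gemini-2.5-Pro judging call.
    raise NotImplementedError

def build_examples(pairs):
    """Generate an edit per (image, prompt) pair, then split by verdict."""
    accepted, rejected = [], []
    for image, prompt in pairs:
        edited = edit_with_nano_banana(image, prompt)
        verdict = judge_with_gemini_pro(image, edited, prompt)
        if verdict.follows_instruction and verdict.visual_quality_ok:
            accepted.append((image, prompt, edited))
        else:
            # Failed edits are kept too: they later serve as negative examples.
            rejected.append((image, prompt, edited))
    return accepted, rejected
```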

Pico-Banana-400K covers images produced through single-turn edits, multi-turn edit sequences, and preference pairs that contrast successful and unsuccessful results, enabling models to learn from negative examples. While acknowledging Nano-Banana’s limitations in fine-grained spatial editing and typography, the researchers hope Pico-Banana-400K will serve as a solid foundation for training and evaluating future text-guided image editing models.
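
To make the three example types concrete, here is a hedged sketch of what such records might look like. All field names and file paths are invented for illustration; the dataset’s real schema may differ.

```python
# Illustrative record shapes for the three example types described above.
single_turn = {
    "image": "openimages/0001.jpg",
    "instruction": "incorporate film grain",
    "edited_image": "edits/0001_grain.jpg",
}

multi_turn = {
    "image": "openimages/0002.jpg",
    "turns": [
        {"instruction": "zoom in", "edited_image": "edits/0002_a.jpg"},
        {"instruction": "make it rainy", "edited_image": "edits/0002_b.jpg"},
    ],
}

preference_pair = {
    "image": "openimages/0003.jpg",
    "instruction": "move the dog to the left",
    "chosen": "edits/0003_pass.jpg",    # accepted by the Gemini-2.5-Pro judge
    "rejected": "edits/0003_fail.jpg",  # failed edit kept as a negative example
}
```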

The study is accessible on arXiv, and the dataset is available on GitHub.