Apple trained compact AI models to describe images better than their larger competitors

08.04.2026

Apple Unveils New “RubiCap” Technology for Image Description

Apple researchers have developed a method called *RubiCap*, which enables small AI models to generate more accurate and detailed image descriptions than large-scale counterparts.

How RubiCap Works
1. Image Parsing

To produce a detailed description, the model first identifies the many objects and regions in the image. The result reflects the composition of the whole scene rather than a superficial summary.

2. Practical Value

Detailed descriptions of this kind are useful for training other AI models and text-to-image generators, and for specialized features (e.g., enhancing visual content).

3. Resource Challenge

Traditional approaches to training detailed description systems require substantial computational resources, both during initial training and during subsequent reinforcement learning.

Experimental Methodology
- Image Selection – 50,000 images were randomly chosen from the *PixMoCap* and *DenseFusion‑4V‑100K* datasets.

- Description Generation – Existing vision-language models generated candidate descriptions: Google Gemini 2.5 Pro, OpenAI GPT‑5, Alibaba Qwen 2.5‑VL‑72B‑Instruct, Google Gemma‑3‑27B‑IT, and Alibaba Qwen 3‑VL‑30B‑A3B‑Instruct, along with the Apple models being trained.

- Quality Assessment – Gemini 2.5 Pro served as the expert: it analyzed descriptions, identified matches and errors, and formulated clear evaluation criteria.

- Judge Scoring – The Qwen 2.5‑7B‑Instruct model scored each description against every criterion, and those scores formed the reward signal for the model being trained.
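
The scoring step above can be illustrated with a minimal sketch. The article does not say how the per-criterion scores are combined into a reward, so the weighted average below is an assumption, and the names `Criterion` and `rubric_reward` are hypothetical rather than taken from Apple's work. The sketch only shows the general idea of turning rubric scores from a judge model into a single number.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item written by the expert model (hypothetical structure)."""
    text: str
    weight: float = 1.0


def rubric_reward(criteria: Sequence[Criterion], judge_scores: Sequence[float]) -> float:
    """Combine per-criterion judge scores into one reward in [0, 1].

    Assumption: each score is in [0, 1] and the reward is a weighted average.
    The real aggregation used for RubiCap is not described in the article.
    """
    if len(criteria) != len(judge_scores):
        raise ValueError("need exactly one score per criterion")
    if any(not 0.0 <= s <= 1.0 for s in judge_scores):
        raise ValueError("scores must lie in [0, 1]")
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("total rubric weight must be positive")
    weighted = sum(c.weight * s for c, s in zip(criteria, judge_scores))
    return weighted / total_weight


# Illustrative rubric for one image; the criteria texts are made up.
rubric = [
    Criterion("names the main objects", weight=2.0),
    Criterion("describes the spatial layout"),
    Criterion("avoids invented details"),
]
# Scores a judge model might return for one candidate description.
print(rubric_reward(rubric, [1.0, 0.5, 0.0]))  # 0.625
```

Because the reward is built from several criteria, a description can earn partial credit for what it gets right, which matches the article's point that training does not depend on a single "correct" answer.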

Results
- The model being trained received specific feedback on each criterion, allowing rapid improvement in description accuracy without relying on a single “correct” answer.

- Apple ultimately created three proprietary models: RubiCap‑2B, RubiCap‑3B, and RubiCap‑7B (with 2, 3, and 7 billion parameters, respectively).

- In image-description tests, RubiCap outperformed competitors with 32 billion and even 72 billion parameters. In some cases, RubiCap‑3B achieved better results than RubiCap‑7B, confirming that model size does not always guarantee superior performance.

Thus, the RubiCap technology demonstrates how high-quality image descriptions can be achieved with fewer resources and more efficient training.

