Apple is developing its own AI assistant for the iPhone that can launch apps on behalf of the user

Apple is developing a compact local AI agent for working with user interfaces

Apple is working on a new model, Ferret‑UI Lite, that can “understand” application interfaces and interact with them on behalf of the user, with all of the processing happening on the device itself. The model has 3 billion parameters, yet in tests it matches or even surpasses models up to 24 times larger.

Origins of the project
In December 2023 a team of nine researchers published the paper FERRET: Refer and Ground Anything Anywhere at Any Granularity. It presented a multimodal language model trained on various types of data that can link textual descriptions to specific parts of an image.

Since then Apple has expanded the Ferret family:

| Model | Purpose |
| --- | --- |
| Ferret‑v2 | Improved base model |
| Ferret‑UI | Specialized MLLM for mobile interfaces |
| Ferret‑UI 2 | Multi‑platform support and higher resolution |

Ferret‑UI in particular addresses a known weakness of modern multimodal large language models (MLLMs): they recognize UI elements poorly. On top of Ferret, the model adds “any resolution” input handling, which increases image detail, and uses enhanced visual cues.
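The “any resolution” idea can be illustrated with a simple grid‑cropping sketch (a simplification for illustration, not Apple’s actual pipeline): the screenshot is split into tiles so that each tile can be encoded at the vision encoder’s native input size, preserving fine UI detail instead of downscaling the whole screen.

```python
def grid_crops(width, height, tile=336):
    """Split a screenshot of the given size into tile-sized crop boxes
    (left, top, right, bottom), so each crop can be fed to a vision
    encoder at its native resolution. Illustrative sketch only; the
    tile size 336 is a common ViT input size, not a confirmed detail."""
    cols = -(-width // tile)    # ceiling division
    rows = -(-height // tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            # Edge tiles are clipped to the screenshot bounds.
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes
```

For example, a 700×400 screenshot yields a 3×2 grid of six crops, each no larger than 336×336.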

New achievements
Apple recently introduced two additional versions:

1. Ferret‑UI Lite – a lightweight model with 3 billion parameters, optimized for local deployment on mobile devices.
2. Ferret‑UI 2 – an extended version that supports multiple platforms and higher resolution screenshots.

The main difference between Ferret‑UI Lite and large server models is that it remains competitive while requiring significantly fewer computational resources.

Why this matters
Most existing GUI agents are built on massive foundation models, whose powerful reasoning and planning capabilities achieve outstanding results in navigating graphical interfaces. However, such models are far too large to run directly on a device.

Ferret‑UI Lite solves this problem by combining:

- Several key components and ideas from training small LLMs;
- Real and synthetic data from various GUI domains;
- Dynamic cropping and optimization of interface‑segmentation quality;
- Supervised fine‑tuning and reinforcement learning.

The result is a model that is nearly on par with, or even outperforms, larger competing GUI agents in low‑level grounding of UI elements, screen understanding, multi‑step planning, and self‑reflection.

Comments (0)

Share your thoughts — please be polite and stay on topic.

No comments yet. Leave a comment — share your opinion!

To leave a comment, please log in.

Log in to comment