An experiment using Apple hardware and external storage has succeeded in running an ultra-large artificial intelligence (AI) model on-device, bringing renewed attention to the potential of on-device AI.
On March 24 local time, the online media outlet Gigazine reported that the attempt began with an experiment by AI researcher Dan Woods. Using a method called "LLM in a Flash", he succeeded in running a large language model (LLM) whose size exceeds the machine's DRAM capacity. The approach stores the model's weights in external flash memory and loads them into DRAM only when they are needed.
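The core mechanism can be sketched in a few lines. The example below is a minimal illustration, not the actual code from the experiment: it stands in for flash storage with an ordinary file and uses memory mapping, so the operating system reads weight pages from disk only when they are touched. All dimensions, file names, and the toy feed-forward layer are assumptions for illustration.

```python
"""Minimal sketch of the "LLM in a Flash" idea: keep weights in a file on
flash storage and memory-map them, so pages are read only when accessed.
All sizes and names here are illustrative, not from the experiment."""
import os
import tempfile
import numpy as np

D_MODEL, D_FF, N_LAYERS = 64, 256, 4  # toy dimensions (assumption)

# Write toy weights to a file standing in for flash storage.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
np.random.randn(N_LAYERS, D_MODEL, D_FF).astype(np.float32).tofile(path)

# memmap exposes the file as an array without loading it all into DRAM;
# the OS pages in only the regions that are actually read.
weights = np.memmap(path, dtype=np.float32, mode="r",
                    shape=(N_LAYERS, D_MODEL, D_FF))

def ffn_layer(x, layer):
    w = np.asarray(weights[layer])  # this slice is fetched from "flash" now
    return np.maximum(x @ w, 0.0)   # toy ReLU feed-forward step

x = np.random.randn(D_MODEL).astype(np.float32)
y = ffn_layer(x, layer=2)           # only layer 2's pages get loaded
print(y.shape)                      # (256,)
```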
Woods ran a custom model called "Qwen3.5-397B-A17B", with about 397 billion parameters, on a MacBook Pro with 48GB of RAM, the model occupying about 209GB on disk. The model uses a mixture-of-experts (MoE) structure that activates only a subset of its weights per token, enabling inference without loading the entire model into memory.
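The memory saving comes from the routing step in an MoE layer: a small router scores all experts for each token, and only the top-scoring experts' weights are used, so only those would need to be fetched from flash. The toy sketch below illustrates the idea under assumed dimensions and top-k; it is not the Qwen3.5-397B-A17B architecture.

```python
"""Toy mixture-of-experts routing: score all experts, compute with only the
top-k. Dimensions, expert count, and top-k are illustrative assumptions."""
import numpy as np

D, N_EXPERTS, TOP_K = 64, 16, 2
rng = np.random.default_rng(0)
router_w = rng.standard_normal((D, N_EXPERTS)).astype(np.float32)
# Stand-in for expert weights that would live on flash in a real setup.
experts = rng.standard_normal((N_EXPERTS, D, D)).astype(np.float32)

def moe_forward(x):
    scores = x @ router_w                 # router scores every expert
    top = np.argsort(scores)[-TOP_K:]     # keep only the best-scoring experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                  # softmax over the selected scores
    # In a flash-offloaded setup, only experts[top] would be read from disk.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.standard_normal(D).astype(np.float32)
print(moe_forward(x).shape)  # (64,) — computed while touching 2 of 16 experts
```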
The setup recorded a processing speed of about 5.7 tokens per second, peaking at 7.07 tokens per second, and maintained practically usable output quality while using only about 5.5GB of memory.
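For context, a tokens-per-second figure like these is ordinarily computed as the number of generated tokens divided by wall-clock decoding time. A minimal sketch of that measurement, with `generate_one_token` as a hypothetical stand-in for the real decode step:

```python
"""How a tokens/s figure is typically measured: generated tokens divided by
wall-clock decode time. The decode step below is a placeholder stub."""
import time

def generate_one_token():
    time.sleep(0.175)  # stand-in for one decode step (~5.7 tok/s)
    return "tok"

n_tokens = 20
start = time.perf_counter()
tokens = [generate_one_token() for _ in range(n_tokens)]
elapsed = time.perf_counter() - start
print(f"{n_tokens / elapsed:.2f} tokens/s")  # ≈ 5.7 with this stub
```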
Later, AI researcher ANEMLL conducted a similar experiment on an iPhone 17 Pro and succeeded in running the model at 0.7 tokens per second. Woods expressed his surprise at the news with a one-word reply: "WHAT".
The experiment is also drawing attention because most of its code was written by Claude Opus 4.6. Woods said he provided only the idea and reference materials while the AI handled the implementation, explaining that the technique itself has existed for some time but has been difficult to put into practice.
Industry observers view the case as a signal that AI could shift from being "cloud-centric" to "on-device". At the same time, they assess that further improvements in speed and efficiency are needed before commercialisation.
https://t.co/WEFb86xtnS