Google unveils Gemma 4 QAT for mobiles and laptops to cut AI memory use sharply
Google unveiled Gemma 4 QAT, a version of its Gemma 4 model family designed to sharply reduce memory use so that large AI models can run more easily on smartphones and standard laptops. The name refers to quantization-aware training (QAT), a technique that simulates low-precision quantization while the model is still being trained, so the model learns to tolerate the rounding and loses less quality once its weights are stored in fewer bits. Google said the approach can preserve response quality while lowering memory needs. The models are available free of charge under the Apache License 2.0 and are officially supported on local runtimes such as llama.cpp, Ollama and LM Studio.
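To make the idea of "simulating quantization during training" more concrete, the following is a minimal sketch of the rounding step that quantization-aware training typically inserts into the forward pass, together with the arithmetic behind the memory savings. It is an illustration only, not Google's implementation: the function names `fake_quantize` and `weight_memory_bytes`, the symmetric 4-bit scheme and the one-billion-parameter figure are all assumptions chosen for the example, and the article does not state which bit widths Gemma 4 QAT uses.

```python
def fake_quantize(weights, bits=4):
    """Round weights onto a signed `bits`-bit grid, then map them back to floats.

    Hypothetical helper: symmetric, per-tensor quantization is assumed here
    purely for illustration.
    """
    qmax = 2 ** (bits - 1) - 1                    # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Snap each weight to the nearest integer level, clamped to [-qmax, qmax].
    levels = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Return the de-quantized floats the model would actually compute with.
    return [q * scale for q in levels], scale


def weight_memory_bytes(num_params, bits):
    """Approximate storage for the weights alone, ignoring scales and activations."""
    return num_params * bits / 8


# Example weights (made up) and a hypothetical 1-billion-parameter model.
dequantized, scale = fake_quantize([7.0, -3.0, 1.4], bits=4)
fp16_bytes = weight_memory_bytes(1_000_000_000, 16)
int4_bytes = weight_memory_bytes(1_000_000_000, 4)
```

During training, a model prepared this way would compute its outputs with the rounded values while still updating full-precision weights, which is what lets it adapt to the rounding error. Under the assumptions above, moving from 16-bit to 4-bit weights cuts weight storage by a factor of four.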