[Photo: Shutterstock]

[DigitalToday reporter Hwang Chi-kyu (황치규)] Microsoft (MS) has released its 15-billion-parameter multimodal AI model, Phi-4-reasoning-vision-15B, as open source, SiliconANGLE reported on Thursday (local time).

The model combines two existing components, the SigLIP-2 vision encoder and the Phi-4 Reasoning language model, and is optimised for processing multimodal data such as science and mathematics graphs.

AI models typically process multimodal data across all layers, but Microsoft applies a "mid-fusion" approach in which only some layers handle multimodal processing. This significantly reduces the hardware burden at the cost of some output quality. Reasoning can also be switched on and off through specific prompts.
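
For readers who want a concrete picture, the following toy sketch (ours, not Microsoft's code) shows the general shape of a mid-fusion decoder in PyTorch: cross-attention over image tokens exists only in a middle band of layers, while the remaining layers run text-only. All layer counts, dimensions, and the choice of fusion band here are illustrative assumptions.

```python
# A minimal sketch of the "mid-fusion" idea described above, NOT Microsoft's
# actual implementation: image features are fused (via cross-attention) only
# in a middle band of decoder layers; all other layers run text-only.
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, fuse_vision: bool):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse_vision = fuse_vision
        if fuse_vision:
            # Cross-attention over image tokens exists only in fusion layers.
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, image_tokens=None):
        h, _ = self.self_attn(x, x, x)
        x = self.norm1(x + h)
        if self.fuse_vision and image_tokens is not None:
            h, _ = self.cross_attn(x, image_tokens, image_tokens)
            x = x + h
        return self.norm2(x + self.mlp(x))

class MidFusionDecoder(nn.Module):
    # Layer counts, width, and the fusion band (layers 4-7 of 12) are
    # invented for this example.
    def __init__(self, n_layers=12, d_model=256, n_heads=4, fusion_band=range(4, 8)):
        super().__init__()
        self.layers = nn.ModuleList(
            DecoderLayer(d_model, n_heads, fuse_vision=(i in fusion_band))
            for i in range(n_layers)
        )

    def forward(self, text_tokens, image_tokens):
        x = text_tokens
        for layer in self.layers:
            x = layer(x, image_tokens)
        return x

model = MidFusionDecoder()
text = torch.randn(1, 16, 256)   # dummy text-token embeddings
image = torch.randn(1, 64, 256)  # dummy image-patch embeddings (e.g. from a vision encoder)
print(model(text, image).shape)  # torch.Size([1, 16, 256])
```

In a design like this, layers outside the fusion band never allocate the extra attention parameters or compute needed for image tokens, which is where the hardware savings come from.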

Microsoft trained the model mainly on open-source data, including images paired with text descriptions. It selected high-quality datasets and corrected inaccurate captions using OpenAI's GPT-4o and o4-mini. It also added internally generated data, high-quality data obtained from specific companies, and example data designed to steer the model away from inappropriate behaviour.
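
As an illustration of what such a caption-correction pass could look like, the sketch below sends an image URL and a draft caption to GPT-4o through OpenAI's Python client and asks for a corrected caption. The prompt wording, data format, and helper function are assumptions made for this example; only the model name comes from the report.

```python
# Illustrative sketch of a caption-correction pass like the one described
# above; the prompt and dataset format are assumptions, not Microsoft's pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def correct_caption(image_url: str, draft_caption: str) -> str:
    """Ask GPT-4o to verify a draft image caption and rewrite it if inaccurate."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": (f'Here is an image and a draft caption: "{draft_caption}"\n'
                          "If the caption is inaccurate, reply with a corrected "
                          "caption; otherwise repeat it unchanged.")},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical usage over a small dataset of (url, caption) pairs:
dataset = [("https://example.com/graph.png", "A bar chart of monthly rainfall")]
cleaned = [(url, correct_caption(url, cap)) for url, cap in dataset]
```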

In performance evaluations, Phi-4-reasoning-vision-15B scored 17 percent higher than Google's "gemma-3-12b-it". It performed strongly in mathematics and science, and delivered results comparable to those of models that require more compute time and tokens, Microsoft said.

Microsoft released the model through Hugging Face, GitHub and Azure.

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.