Microsoft Corp. today expanded its Phi line of open-source language models with two new models optimized for multimodal processing and hardware efficiency.
The first addition is the text-only Phi-4-mini. The second new model, Phi-4-multimodal, is an upgraded version of Phi-4-mini that can also process visual and audio input. Microsoft says that both models significantly outperform comparably sized alternatives at certain tasks.
Phi-4-mini, the text-only model, features 3.8 billion parameters. That makes it compact enough to run on mobile devices. It’s based on the ubiquitous transformer neural network architecture that underpins most LLMs.
A standard transformer model analyzes the text before and after a word to understand its meaning. According to Microsoft, Phi-4-mini is based on a version of the architecture called a decoder-only transformer that takes a different approach. Such models analyze only the text that precedes a word when trying to determine its meaning, which lowers hardware usage and speeds up processing.
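The decoder-only behavior described above comes from causal masking: before the attention weights are computed, every position's scores for later positions are blanked out. Here's a minimal NumPy sketch of the idea (an illustration of the general technique, not Phi-4-mini's actual code; the function name and sizes are made up):

```python
import numpy as np

def causal_attention_weights(seq_len: int) -> np.ndarray:
    """Toy causal attention: each token may attend only to itself
    and to earlier tokens, as in a decoder-only transformer."""
    rng = np.random.default_rng(0)
    scores = rng.random((seq_len, seq_len))  # stand-in for raw attention scores

    # Mask out the "future": positions above the diagonal get -inf,
    # so softmax assigns them exactly zero weight.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -np.inf

    # Row-wise softmax over the remaining (past and current) positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights

w = causal_attention_weights(4)
```

Because later tokens never enter the computation, the model can cache and reuse work as it generates text left to right, which is one reason the approach lowers hardware usage.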
Phi-4-mini also uses a second performance optimization technique called grouped query attention, or GQA, which reduces the hardware usage of the algorithm's attention mechanism. A language model's attention mechanism helps it determine which data points are most relevant to a given processing task; in GQA, groups of the mechanism's query heads share a single set of key and value heads, which shrinks the amount of data the model must keep in memory.
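The sharing arrangement can be sketched in a few lines of NumPy. This is a simplified illustration of GQA in general, not Phi-4-mini's implementation; the head counts, projections, and function name are all hypothetical:

```python
import numpy as np

def grouped_query_attention(x: np.ndarray,
                            num_q_heads: int = 8,
                            num_kv_heads: int = 2) -> np.ndarray:
    """Toy GQA: num_q_heads query heads share num_kv_heads key/value
    heads (here 4 query heads per KV head), so far fewer key/value
    vectors must be stored than in standard multi-head attention."""
    seq_len, d_model = x.shape
    head_dim = d_model // num_q_heads
    group = num_q_heads // num_kv_heads  # query heads per shared KV head

    # Toy "projections": slice the input instead of learned weight matrices.
    q = x.reshape(seq_len, num_q_heads, head_dim)
    kv_dim = num_kv_heads * head_dim
    k = x[:, :kv_dim].reshape(seq_len, num_kv_heads, head_dim)
    v = x[:, :kv_dim].reshape(seq_len, num_kv_heads, head_dim)

    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group  # which shared KV head this query head maps to
        scores = (q[:, h] @ k[:, kv].T) / np.sqrt(head_dim)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[:, h] = w @ v[:, kv]
    return out.reshape(seq_len, d_model)
```

With 8 query heads but only 2 key/value heads, the cached key and value tensors are a quarter of their usual size, which is where GQA's memory savings come from.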