Glean Announces Support for NVIDIA Nemotron 3 Ultra, Expanding Model Choice for Cost-Effective Enterprise AI

Glean Adds Support for NVIDIA Nemotron 3 Ultra

Enterprise AI company Glean today announced support for NVIDIA Nemotron 3 Ultra, expanding the set of models available to customers on its platform. The addition gives organizations a strong, open model option for cost-effective agentic work. The model marks a major advance in open-source agentic capabilities, delivering 91% of frontier LLM performance on key metrics such as completeness. Rather than forcing every task through a single model family, businesses can orchestrate the best model for each task within a secure, context-aware environment.

“Enterprises are moving beyond the idea that one model should do everything,” said Emrecan Dogan, Chief Product Officer, Glean. “They want the ability to match the right model to the right task, and they need a cost-effective way to bring AI into everyday work. Our support for NVIDIA Nemotron 3 Ultra reflects that reality and gives customers a strong option as they scale AI across the enterprise.”

“Glean is bringing NVIDIA Nemotron 3 Ultra into enterprise AI workflows where model choice, cost, and performance are critical,” said Kari Briski, Vice President of Generative AI, NVIDIA. “Together, we’re helping companies deploy open models for everyday work at scale.”

Optimizing Token Economics With Multi-Model Collaboration

The announcement underscores Glean's long-standing model-agnostic platform strategy. Glean provides access to more than 30 open-source and proprietary models, which lets customers use the latest advances while avoiding provider lock-in.

The announcement also continues a collaboration across the Nemotron family. Glean Waldo, an agentic search model, is post-trained on NVIDIA Nemotron 3 Nano, a setup that delivers 50% lower latency and uses 25% fewer tokens. Waldo handles the search tasks that frontier models previously managed, which preserves computing capacity for tasks that require intensive reasoning. Working together, multiple models deliver frontier-level intelligence with fewer tokens.

News Source: Businesswire.com