Ant Group Launches High-Efficiency Ling-2.6-Flash AI Model


Ant Group officially released the Ling-2.6-Flash model today, aimed at developers who want to build faster applications. The model uses a sparse Mixture-of-Experts architecture to stay lean: of its 104 billion total parameters, only 7.4 billion are activated during a single task. This design is intended to deliver high capability without wasting computing power.

Token Efficiency and Stable Inference Performance

According to data from Artificial Analysis, Ling-2.6-Flash demonstrates a significant efficiency advantage. It achieved an Intelligence Index score of 26 while using only 15 million output tokens to complete the evaluation, whereas comparable models such as Nemotron-3-Super consumed over 110 million tokens. This token efficiency translates to an 86% reduction in inference cost for users. The model generates up to 340 tokens per second on H20 hardware, and its prefill throughput is 2.2 times that of Nemotron-3-Super.
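The 86% figure is consistent with the two token counts quoted above. A minimal sketch of the arithmetic follows, assuming cost scales linearly with tokens consumed at the same per-token price; the function name is illustrative and not part of any official tooling.

```python
def token_reduction(baseline_tokens: float, model_tokens: float) -> float:
    """Return the fractional reduction in tokens relative to a baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - model_tokens / baseline_tokens


# Figures quoted in the article (the baseline is a lower bound: "over 110 million").
LING_TOKENS = 15e6
NEMOTRON_TOKENS = 110e6

reduction = token_reduction(NEMOTRON_TOKENS, LING_TOKENS)
print(f"Token reduction: {reduction:.1%}")
```

Because the baseline is stated as "over 110 million" tokens, the computed value is a lower bound on the reduction.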

The model previously topped the charts on OpenRouter under the name “Elephant Alpha”, where usage during testing peaked at 100 billion tokens per day. Pricing is 0.1 USD per million input tokens and 0.3 USD per million output tokens. The official API is now open to the public.
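To show how the quoted prices translate into a bill, here is a rough cost estimator. It is a sketch only: the prices come from the article, while the workload numbers and the function name are hypothetical, and actual billing may include terms not covered here.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.1
OUTPUT_USD_PER_MILLION = 0.3


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload from its input and output token counts."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 2 million input tokens and 500,000 output tokens.
example_cost = estimate_cost_usd(2_000_000, 500_000)
print(f"Estimated cost: {example_cost:.2f} USD")
```

For this hypothetical workload the input side contributes about 0.20 USD and the output side about 0.15 USD.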

The 15 million token total covers the complete Artificial Analysis Intelligence Index evaluation, and the speed figures of up to 340 tokens per second and 2.2 times the prefill throughput of Nemotron-3-Super were measured under 4-card H20 conditions.

Unlike many models that rely on generating excessive tokens to achieve higher benchmark scores, Ling-2.6-Flash focuses on token efficiency. For developers and enterprises, this translates to an 86% reduction in inference cost, faster response times, and a smoother user experience.

The model improves the performance of complex AI agent tools. It leads the SWE-bench Verified rankings for coding tasks and also performs well on long documents and math. A one-week free trial starts today, with access available via Alipay Tbox or OpenRouter. Ling-2.6-Flash represents a major step forward in AI efficiency.


News Source: Businesswire.com