Just two months after the DeepSeek-R1 AI model shook up the tech industry, Alibaba Cloud has rolled out QwQ-32B, a new open-source large language model.
Alibaba calls this model a “compact reasoning model” with 32 billion parameters. Despite its smaller size, it performs on par with large language models that boast many more parameters. On its website, Alibaba Cloud shared benchmarks showing that QwQ-32B holds up well against models from DeepSeek and OpenAI, citing results on AIME24 for mathematical reasoning, LiveCodeBench for coding skills, and LiveBench for general evaluation.
Alibaba claims that with continual reinforcement learning (RL) scaling, QwQ-32B significantly improves both its mathematical reasoning and coding abilities. The company notes that the model can match the performance of DeepSeek-R1, which has 671 billion parameters. This shows how effective RL can be when applied to strong foundation models pretrained on vast amounts of data.
In its blog post, Alibaba stated that it has added agent-related capabilities to QwQ-32B, allowing the model to think critically while using tools and to adapt its reasoning based on feedback from its environment. This ability to learn through interaction is a core feature of reinforcement learning, and Alibaba credits it with making the model so capable at its relatively small size.
Alibaba acknowledged the potential of scaled RL and the untapped opportunities within pretrained language models. As it develops the next generation of Qwen, the company believes that combining stronger foundation models with RL powered by large-scale computational resources will bring it closer to Artificial General Intelligence (AGI).
Alibaba is also exploring the integration of agents with RL to enable “long-horizon reasoning,” which it believes will unlock greater intelligence over time. QwQ-32B was trained with feedback from a general reward model and from rule-based verifiers, which improved its instruction-following, its alignment with human preferences, and its performance as an agent.
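To make the idea of a rule-based verifier concrete, here is a minimal sketch in Python of what such a reward signal might look like for math problems. Everything here, including the function name and the answer-extraction heuristic, is a hypothetical illustration; Alibaba has not published its verifier code.

```python
# Illustrative sketch of a rule-based verifier reward of the kind the blog
# post describes: the reward is 1.0 only when the model's final answer
# matches a known ground truth. All names here are hypothetical.
import re

def math_verifier_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the last number in the output equals the ground truth."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

# The verifier checks outcomes, not reasoning steps, so under RL the model
# is free to discover whatever chain of thought reaches the right answer.
print(math_verifier_reward("... so the answer is 42", "42"))  # 1.0
print(math_verifier_reward("... the answer is 41", "42"))     # 0.0
```

Unlike a learned reward model, a verifier like this cannot be fooled by plausible-sounding but wrong answers, which is part of why the combination of the two is attractive for training reasoning models.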
China’s DeepSeek-R1 has been available since the start of the year, demonstrating how effective RL can be at matching the capabilities of US language models without relying on the latest GPU hardware. With the US banning exports of high-end AI chips such as the Nvidia H100 to China, local developers have turned to alternative methods, RL among them, to achieve competitive results.
What’s intriguing about QwQ-32B is that it uses far fewer parameters to reach outcomes similar to DeepSeek-R1’s, which means it should be able to run on less powerful AI hardware.
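Because QwQ-32B is open source, it can in principle be run locally. The sketch below shows one plausible way to load it with 4-bit quantization via Hugging Face transformers so that it fits on a single high-memory GPU; the model ID "Qwen/QwQ-32B" and the quantization settings are assumptions for illustration, not official guidance from Alibaba.

```python
# A minimal sketch of loading QwQ-32B with 4-bit quantization.
# Requires the transformers, accelerate, and bitsandbytes packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/QwQ-32B"  # assumed Hugging Face model ID

# 4-bit quantization roughly quarters the memory footprint relative to
# 16-bit weights, which is what makes a 32B-parameter model practical
# on a single high-memory GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available devices
)

# Simple chat-style prompt using the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

By contrast, a 671-billion-parameter model like DeepSeek-R1 is far beyond single-GPU territory even when quantized, which is why the parameter count matters so much for who can actually run these models.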