Nation's firms eye lightweight LLMs as AI race heats up
Smaller large models require fewer calculations, less powerful processors
More Chinese companies are developing lightweight large language models after US-based technology firm OpenAI launched a text-to-video model, Sora, last month, raising the stakes in the global AI race.
Lightweight models, also known as smaller large models, are those built with fewer parameters, which means they have less capacity to process and generate text than full-scale large models.
Simply put, these small models are like compact cars, while large models are like luxury sport utility vehicles.
In February, Chinese artificial intelligence startup ModelBest Inc launched its latest lightweight large model, generating much attention in the AI industry.
Dubbed MiniCPM-2B, the model packs 2 billion parameters, far fewer than the 1.7 trillion of OpenAI's massive GPT-4.0.
In December, US tech giant Microsoft released Phi-2, a small language model capable of common-sense reasoning and language understanding, which packs 2.7 billion parameters.
Li Dahai, CEO of ModelBest, said the new model's performance on open-sourced general benchmarks is close to that of Mistral-7B from French AI company Mistral, with stronger results in Chinese, mathematics and coding. Its overall performance exceeds that of some peer models in the 10-billion-parameter class, Li said.
"Both large and smaller large models have their advantages, depending on the specific requirements and constraints of a task, but Chinese companies may find an opening by leveraging small models amid the AI boom," said Li.
Zhou Hongyi, founder and chairman of 360 Security Technology and a member of the 14th National Committee of the Chinese People's Political Consultative Conference at the ongoing two sessions, said earlier in an interview that creating a universal large model that surpasses GPT-4.0 may be challenging at the moment.
Though GPT-4.0 currently "knows everything", it "is not specialized", he said.
"If we can excel in a particular business domain by training a model with unique business data and integrating it with many business tools within that sector, such a model will not only have intelligence, but also possess unique knowledge, even hands and feet," he said.
Li said that if such a lightweight model can be applied to industries, its commercial value will be huge.
"If the model is compressed, it will require fewer calculations to operate, which means it can run on less powerful processors and take less time to complete responses," Li said.
"As such on-device models become popular, the inference cost on more electronic devices, such as mobile phones, will decrease further in the future," he added.
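Li's point about compression can be illustrated with a toy sketch of post-training quantization, one common way model weights are shrunk for on-device inference. This is a generic illustration, not ModelBest's actual method; the layer size and scaling scheme here are arbitrary assumptions.

```python
import numpy as np

# A toy "layer" of fp32 weights (size chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
weights = rng.normal(size=(1024, 1024)).astype(np.float32)

# Map fp32 weights to int8 using a single per-tensor scale factor,
# so the largest weight lands at the int8 limit of 127.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)

# int8 storage is 4x smaller than fp32 ...
print(weights.nbytes // q.nbytes)  # 4

# ... and dequantizing recovers the weights to within one rounding step.
recovered = q.astype(np.float32) * scale
print(np.abs(weights - recovered).max() < scale)  # True
```

Smaller integer weights are also what let phone-class processors run such models: integer arithmetic is cheaper than floating point, and the reduced memory traffic cuts both latency and energy per response.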