Smaller models could help AI move from the cloud to the edge

  • LLMs have been all the rage, but there’s a push toward smaller AI models

  • Smaller models could be a perfect fit for the space, power and compute constraints at the edge

  • Applications could include training but will still vary by vertical

The thing about large language models (LLMs) is that they’re…large. That makes it hard to run certain artificial intelligence (AI) use cases, like training, at the edge, given the space, power and compute constraints there. But there’s a new trend emerging in the AI realm that could help more use cases move out of the data center and to the edge: smaller models.

“We’re seeing some of these models are shrinking in size pretty dramatically,” Francis Chow, Red Hat’s VP and GM for Edge and In-vehicle Operating Systems, told Silverlinings.

Indeed, Microsoft in January reportedly formed a new internal team to create a generative AI model that requires less compute power than something like OpenAI’s ChatGPT. And last month, Google unveiled its line of Gemma models, scaled-down versions of its Gemini technology that are small enough to run on a laptop.
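For a rough sense of what “small enough to run on a laptop” looks like in practice, here is a minimal sketch of loading one of the openly available Gemma checkpoints with the Hugging Face transformers library. The model ID, hardware assumptions and prompt are illustrative only, not something Google or Red Hat prescribes, and the checkpoint requires accepting Google’s license terms before download.

```python
# Minimal sketch: running a small open-weight model locally with Hugging Face
# transformers. Assumes the ~2B-parameter "google/gemma-2b" checkpoint, that
# license access has been granted, and a machine with enough RAM (roughly
# 8-16 GB). All details here are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # a small, laptop-class model (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize why smaller AI models suit edge deployments:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```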

According to Chow, that means that rather than just seeing inferencing at the edge – as has long been expected – we could also start to see training move from data centers to edge locations.

In addition to shrinking models enabling this shift, Chow said some edge applications “likely don’t require” a full LLM to function. Keeping the data at the edge, closer to where it’s generated, not only makes sense but can also enable these applications to leverage more specific data than a regular LLM is trained on.

Interestingly, spending on edge compute infrastructure, which would in theory host these smaller AI models, is expected to hit $232 billion this year and nearly $350 billion by 2027, according to IDC. That’s more than forecast spending on cloud compute and storage infrastructure, which is estimated to reach $153 billion in 2027.

"Edge computing will play a pivotal role in the deployment of AI applications," IDC Research VP Dave McCarthy stated. “OEMs, ISVs and service providers are taking advantage of this market opportunity by extending feature sets to enable AI in edge locations."

Applying AI

Chow said enterprises are “still trying to figure out what they can do with AI.” The name of the game right now is sifting through the noise to develop a solid strategy and figure out which solutions are mature enough and which will generate the right return on investment.

Another key task for the time being is working up an informed approach to AI governance, he said, especially in light of efforts around the globe to begin regulating the technology. That’s especially true for companies planning to leverage AI for edge use cases that cross borders – like self-driving cars, for instance.

In terms of what kind of applications might be on the (edge of the) table, Chow said it will depend on the vertical.

Financial companies might use AI to execute trades faster and more intelligently, while retailers might use it for loss prevention or real-time promotions based on customer purchases.

“In general terms, if something has to be decided in real time and it doesn’t take a lot of heavy analytics, that’s best run at the edge,” Chow concluded. “Something that can be aggregated, doesn’t have much of a need for a real-time response and is cheaper to run and train as a batch makes sense to run at the data center.”