The Next Phase Of AI Innovation: Inference Time Scaling

Where to place your bets on Huang's "third scaling law"

Happy Sunday and welcome to Investing in AI. Today I want to write about an idea that goes by many names in the AI community. You may hear it called "test time compute," "test time scaling," "inference time compute," "inference time scaling," or something else. Whatever you call it, the picture below from NVIDIA's last developer conference explains it best. For the purposes of this post, we will call it "inference time compute." At the conference, Jensen Huang called inference time compute the "third scaling law." What he means is that when you want to make AI models, and LLMs in particular, better, there are three things you can do: scale up pre-training, scale up post-training, or scale up the compute spent at inference time.
Inference time compute is a family of algorithms, a class that is growing all the time, with names like "chain of thought," "tree of thought," "best of N," "beam search," "diverse verifier tree search," "lookahead search" and more. Below is a visual example of beam search. Since LLMs operate in probabilities, you can think of beam search as exploring multiple paths through the LLM at the same time: on each cycle it expands the next step of every path, then prunes back down to the few best options.

Why is inference time compute so interesting? Because as we build AI agents, and these agents handle an array of different inputs, each input may require a different inference time compute algorithm to get the best output. Inference time compute also lets you trade time against cost at the same level of performance. As this post shows, a small language model with more inference time compute behind it can match or beat a much larger LLM. There are also classes of problems whose solutions are difficult to find but easy to verify. Using something open source like Llama or Qwen to solve the problem and then OpenAI to verify the solution can save time and money. From an investor perspective, the inference time compute world has a few interesting traits.
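To make the expand-and-prune cycle concrete, here is a minimal beam search sketch in Python. The `next_token_logprobs` function is a stand-in for an LLM's next-token distribution; the three-token vocabulary and its log-probabilities are made up for illustration, and a real system would score candidates with a model instead.

```python
# Toy stand-in for an LLM's next-token distribution: given a prefix
# of tokens, return (token, log-probability) candidates. The vocabulary
# and scores here are invented for illustration only.
def next_token_logprobs(prefix):
    vocab = {"a": -0.5, "b": -1.0, "c": -2.0}
    return vocab.items()

def beam_search(beam_width=2, max_steps=3):
    # Each beam is a (tokens, cumulative log-probability) pair.
    beams = [((), 0.0)]
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            # Expand: one new candidate path per possible next token.
            for tok, lp in next_token_logprobs(tokens):
                candidates.append((tokens + (tok,), score + lp))
        # Prune: keep only the beam_width highest-scoring paths.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Highest-scoring path after three steps, with its cumulative log-prob.
best_tokens, best_score = beam_search(beam_width=2, max_steps=3)[0]
```

The beam width is the dial described above: widening it explores more paths per step, spending more inference compute for a better chance of finding a high-scoring output.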
If you want to learn more, here is a good video about why this concept is so important and why it will have such a large impact on AI. If you are an enterprise and want to try various inference time compute algorithms yourself on open source LLMs, at Neurometric we have a platform to make that really easy. Thanks for reading.