The quiet benefit of open source LLMs
A.K.A. Why I don't care about the artificially deflated pricing of SOTA model API services
There is a lot of talk about leaning into open source models as a way to mitigate the eventual cranking up of prices from companies like Anthropic, OpenAI and Google. The line of thinking goes something like “The prices are artificially deflated and therefore you should not make these SOTA models a core part of your workflow, because the rug will eventually be pulled”.
There is a small but critical aspect missed in that line of thinking.
Right now, SOTA model API providers are cheaper and easier to use than rolling your own, so we use them. We’re getting a bunch of subsidized tokens; it would be stupid not to take them. But the fact that for ~$20k I could buy all the hardware I need to roll my own LLM stack and “approximate” the benefits of what I currently get from Sonnet 4.1 is all that needs to be true for us to have a massive security blanket.
The cost of using a SOTA model is capped to some function of the time and money it would cost me to set up my own stack. If prices begin to rise, at some point, a $20k down payment for my own state of the art hardware and the time involved in setting up my own ML stack will become “cheaper”.
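That cap argument is just break-even arithmetic. A minimal sketch, with entirely illustrative numbers (the $20k figure comes from the text; the monthly costs are hypothetical assumptions, not real pricing):

```python
# Illustrative break-even sketch. All dollar figures are assumptions,
# not actual provider pricing or hardware costs.
def breakeven_months(hardware_cost: float,
                     monthly_api_cost: float,
                     monthly_selfhost_cost: float) -> float:
    """Months until a one-time hardware buy beats paying for an API.

    hardware_cost:         one-time outlay (e.g. ~$20k in the text)
    monthly_api_cost:      what the SOTA provider charges per month
    monthly_selfhost_cost: power, maintenance, and your time, per month
    """
    savings = monthly_api_cost - monthly_selfhost_cost
    if savings <= 0:
        return float("inf")  # self-hosting never pays off at these prices
    return hardware_cost / savings

# If a provider cranked prices to $2,500/month and self-hosting ran
# $500/month, a $20k rig would pay for itself in 10 months.
print(breakeven_months(20_000, 2_500, 500))  # → 10.0
```

The point of the sketch is the shape of the function, not the numbers: as `monthly_api_cost` rises, the break-even horizon shrinks, which is exactly the ceiling on what providers can charge.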
That side of the equation makes Nvidia just as much money, if not more given the lack of bulk-purchase discounts, so it will always remain viable at some price point. And the OSS models continue to advance, driven by their own independent market and geopolitical forces.
I’m not going to waste my time setting up local models and OSS agents today. I’m just going to benefit from the fact that SOTA model API providers’ margins can never exceed the amount of money it would take me to tell them to fuck off and roll my own.
But what about all of the cool features they have that are hard to get out of local and OSS models?
One other factor we’re seeing emerge is that so many LLM products differentiate themselves on their “harness” — that is, system prompts, agentic loops, and stitching together disparate models. There are already tons of services that help you approach SOTA model performance with local models by combining different models and approximating similar behavior on weaker hardware.
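The “stitching together disparate models” part of a harness can be as simple as routing each request to the cheapest model that can handle it. A hypothetical sketch, where `query_strong` and `query_local` are stand-in functions (not any real API):

```python
# Hypothetical model-routing harness. query_strong / query_local are
# placeholders standing in for a hosted SOTA model and a local OSS model.
def query_strong(prompt: str) -> str:
    """Stand-in for a call to an expensive hosted SOTA model."""
    return f"[strong] {prompt}"

def query_local(prompt: str) -> str:
    """Stand-in for a call to a cheap local OSS model."""
    return f"[local] {prompt}"

def route(prompt: str, difficulty: float, threshold: float = 0.7) -> str:
    """Send hard prompts to the strong model, easy ones to the local one.

    difficulty is an assumed 0..1 score from some upstream classifier.
    """
    if difficulty >= threshold:
        return query_strong(prompt)
    return query_local(prompt)
```

The interesting design choice in real harnesses is where the `difficulty` signal comes from (heuristics, a small classifier model, or user intent), but the routing skeleton itself is this simple.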
The bottom line is that OpenAI, Anthropic, and Google are saving you time and money, not providing a service you can’t get by similar means working directly with hardware providers. The moment they aren’t, you go straight to the source and work directly with Nvidia and the latest open model. Also, buy Nvidia stock 🤷♂️.