31 points by swolpers 5 hours ago | 33 comments
Lerc 2 minutes ago
Framing GPT-5 as a loss because of its short run like that is a bit weird. They say "the R&D that went into GPT-5 likely informs future models like GPT-6", but that really understates what is happening here.

Barring solid evidence otherwise, you would think that GPT-5.2 was built largely on GPT-5, enough that possibly the majority of the cost of 5.2 was in developing GPT-5.

It would be like shipping v1.0 on day one, discovering a bug, and shipping v1.01 the next day. Then at the end of the year you report that v1.0 massively lost money, but you wouldn't believe the profit we made on v1.01: it was the single largest return on a single day of development we've ever seen.

xmcqdpt2 8 minutes ago
> And there are reasons to even be really bullish about AI’s long-run profitability — most notably, the sheer scale of value that AI could create. Many higher-ups at AI companies expect AI systems to outcompete humans across virtually all economically valuable tasks. If you truly believe that in your heart of hearts, that means potentially capturing trillions of dollars from labor automation. The resulting revenue growth could dwarf development costs even with thin margins and short model lifespans.

We keep seeing estimates like this repeated by AI companies and the like. There is something that really irks me about them, though: they assume that companies replacing labor with LLMs are willing to pay as much as (or at least a significant fraction of) the labor costs they are replacing.

In practice, I haven't seen that to be true anywhere. If Claude Code (for example) can replace 30% of a developer's job, you would expect companies to be willing to pay tens of thousands of dollars per seat for it. Anecdotally, at $WORK we get nickel-and-dimed on dev tools (somewhat less so for AI tools). I don't expect corporate to suddenly agree to pay Anthropic $50k per developer even if they can lay off a third of us. Will anyone pay enough to realize that "trillions of dollars" of capture?
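
As a rough sketch of that gap (illustrative, made-up numbers; the $150k figure is just an assumed fully loaded developer cost):

    # Illustrative numbers only: what "pay a fraction of the labor you replace"
    # would imply per developer seat, versus typical AI tool spend today.
    fully_loaded_cost = 150_000       # assumed annual cost of one developer, USD
    fraction_replaced = 0.30          # the "30% of the job" figure above

    labor_value_per_seat = fully_loaded_cost * fraction_replaced
    print(f"Labor value replaced per seat: ${labor_value_per_seat:,.0f}/year")  # $45,000

    typical_tool_spend = 200 * 12     # e.g. a $200/month seat
    print(f"Typical AI tool spend:         ${typical_tool_spend:,.0f}/year")    # $2,400

That order-of-magnitude gap between the value nominally replaced and what procurement will actually sign off on is exactly the question.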

steveBK123 12 minutes ago
The Uber comparison is funny because they burned $32B over 14 years before profitability. OpenAI alone is burning something like $10B/year and growing, so the US AI labs are probably close to 10x Uber's burn rate.
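
Annualizing those figures (same numbers as quoted, nothing more):

    # Rough annualization of the figures quoted above.
    uber_total_burn = 32e9            # $32B over ~14 years before profitability
    uber_annual = uber_total_burn / 14
    print(f"Uber average burn: ${uber_annual / 1e9:.1f}B/year")         # ~$2.3B/year

    openai_annual = 10e9              # "something like $10B/year"
    print(f"OpenAI alone vs Uber: {openai_annual / uber_annual:.1f}x")  # ~4.4x
    # Add the other US labs and the combined rate plausibly lands near 10x Uber's.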

Given that AI lab burn rates are so high that AI capex shows up in nationwide economic stats, they clearly cannot keep burning for that long.

So what happens first: labs figure out how to get compute costs down by an order of magnitude, they add enough value to raise prices by an order of magnitude (the Uber path), or some labs begin imploding?

Keep in mind another aspect of the comparison: there wasn't an entire supply-chain spending effect triggered by Uber. That is, you didn't have new car companies building new factories to produce 10x more cars, new roads being built, new gas stations being built, etc. the way you have for AI.

It's like the entire economy has taken one giant correlated bet here.

mellosouls 2 hours ago
sa-code 2 hours ago
Thanks, could the link for this post be replaced with the original?
xyzsparetimexyz 2 hours ago
What I don't understand is: why would a company pay tens of thousands of dollars a month to Anthropic for Claude in a situation where a Chinese LLM is 99% as good, is open weight, runs on US servers, and is 5% of the price?
willis936 2 hours ago
By what metrics are they 99% as good? There are a lot of benchmarks out there. Please share them.

I think the answer lies in "we actually care a lot about that 1%" (which is actually a lot more than 1%).

samuelknight 46 minutes ago
Open models have been about 6 to 9 months behind frontier models, and this has been the case since 2024. That is a very long time for this technology at its current rate of development. If fast-takeoff theory is right, this gap should widen (although with Kimi K2.5 it might have actually shortened).

If we consider what typically happens with other technologies, we would expect open models to catch up to proprietary models on general intelligence benchmarks in time. Sort of like how every brand of battery-powered drill you find at the store is very similar, despite being head and shoulders better than the best drill from 25 years ago.

hobofan 36 minutes ago
> That is a very long time for this technology at it's current rate of development.

Yes, and as long as that gap stays consistent, there is no problem with building on ~9-month-old tech from a business perspective. Heck, many companies are lagging behind tech advancements by decades and are doing fine.

SecretDreams 18 minutes ago
> Sort of like how every brand of battery-powered drill you find at the store is very similar, despite being head and shoulders better than the best drill from 25 years ago.

They all get made in China, mostly in the same facilities. Designs tend to converge under such conditions, especially since design is not open loop: you talk to the supplier that will make your drill, and the supplier might mention how they already make drills for others.

Topfi 2 hours ago
I'm still testing myself and cannot make a confident statement yet, but Artificial Analysis is a solid, independent (though, to be fair, somewhat imperfect) source for a general overview: https://artificialanalysis.ai/

Kimi K2.5 is rather competitive on pure output quality, its agentic evals are close to or beating US-made frontier models, and, lest we forget, the model is far more affordable than said competitors, to the point where it is frankly silly that we are even comparing them.

For what it's worth, of the models I have been able to test so far, judged purely on performance (meaning solely task adherence, output quality, and agentic capabilities; so discounting price, speed, and hosting flexibility), I have personally found the prior Kimi K2 Thinking model to be overall more usable and reliable than Gemini 3 Pro and Flash. Purely on output quality in very specific coding tasks, Opus 4.5 was in my testing leaps and bounds ahead of both the Gemini models and K2 Thinking, though its task adherence was surprisingly less reliable than Haiku 4.5 or K2 Thinking.

Given that it is many times more expensive and in some cases less reliable at adhering to tasks, I really cannot say that Opus 4.5 is superior or that Kimi K2 Thinking is inferior here. The latter is certainly better in my specific usage than any Gemini model, and again, I haven't yet gone through this with K2.5. I try not to presume from the outset that K2.5 is better than K2 Thinking, but even if K2.5 merely stays at the same level of quality and reliability while adding multimodal input, that would make it very competitive.

Lerc 16 minutes ago
I don't think 99% of the best is a good metric.

It is highly dependent on what the best represents.

If you had a 100% chance of not breaking your arm on any given day, what kind of value would you place on that over a 99% chance on any given day? I would imagine it to be pretty high.

The top models are not perfect, so they don't really represent 100% of anything on any scale.

If the best you could do is a 99% chance of not breaking your arm on any given day, then perhaps you might be more stoic about something that is 99% of 99%, which is close enough to 98% that you are 'only' going to double the number of broken arms you get in a year.
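
A quick expected-value sketch of that (assuming independent days, which is a simplification):

    # "99% of 99%" vs 99%: expected broken-arm days per year.
    days = 365
    p_best = 0.99                    # best available: 99% chance per day of no break
    p_copy = 0.99 * 0.99             # "99% of 99%" ~= 0.9801

    print(f"Expected bad days at 99%:    {days * (1 - p_best):.1f}")   # ~3.7
    print(f"Expected bad days at 98.01%: {days * (1 - p_copy):.1f}")   # ~7.3, roughly double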

I suspect adopting AI in production will be calculated more as a likelihood of pain than as increased widgets per hour. Recovery from disaster can easily eat any productivity gains.

eru 45 minutes ago
Usain Bolt's top speed is about 44.72 km/h. My top speed sprinting is about 25 km/h. That's at least 50% as good. But I'd have a hard time getting paid even half as much as Mr Bolt.
SecretDreams 16 minutes ago
Yeah, but you'd both be perfectly capable of walking to the grocery store.
senko 17 minutes ago
As a heavy Claude Code user, I would like to have that option.

But if it's just 33% as good, I wouldn't bother.

Top LLMs have passed a usability threshold in the past few months. I haven't had the feeling that the open models (from any country) have passed it as well.

When they do, we'll have a realistic choice between the best and most expensive option and the good and cheap one. That will be great.

Maybe in 2026.

energy123 37 minutes ago
I tried that but I'm back to paying OpenAI $200/month because the quality was significantly worse on my codebase.
re-thc 2 hours ago
How do they run on US servers? Self-host? That's not going to be cheap while the big AI players hoard resources like memory.
Topfi 46 minutes ago
There are many providers (Fireworks, Groq, Cerebras, Google Vertex), some using fairly common Nvidia hardware, others their own solutions focused solely on high-throughput inference. They often tend to be faster, cheaper and/or more reliable than the lab that trained the model [0], simply because there is some competition, unlike with US frontier models, which at best can be hosted on Azure, AWS or Google Cloud at the same price as the first party.

[0] https://openrouter.ai/moonshotai/kimi-k2-thinking
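
For a concrete sense of how interchangeable the hosting is: most of these providers, and aggregators like OpenRouter, expose an OpenAI-compatible endpoint, so switching hosts is roughly a base-URL change. A minimal sketch, assuming the openai Python package and an OPENROUTER_API_KEY environment variable (the variable name is just a convention):

    import os
    from openai import OpenAI

    # Point the standard OpenAI client at an OpenAI-compatible aggregator endpoint.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    response = client.chat.completions.create(
        model="moonshotai/kimi-k2-thinking",  # the model listed in [0]
        messages=[{"role": "user", "content": "One sentence on inference pricing, please."}],
    )
    print(response.choices[0].message.content)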

Arkhaine_kupo 2 hours ago
Aren't there pretty good indications that the Chinese LLMs have been trained on top of the expensive models?

Their cost is not real.

Plus, you have things like MCP or agents that are mostly being spearheaded by companies like Anthropic. So if it is "the future" and you believe in it, then you should pay a premium to spearhead it.

You want to bet on the first Boeing, not the cheapest copy of a Wright brothers plane.

(Full disclosure: I don't think it's the future, and I think we are over-leveraging on AI to a degree that is, no pun intended, misanthropic.)

malka1986 2 hours ago
> Aren't there pretty good indications that the Chinese LLMs have been trained on top of the expensive models?

So what?

fc417fc802 53 minutes ago
Well, it raises an interesting conundrum. Suppose there's a microcontroller that costs $5.00 and another that costs $0.50. The latter is a clone of the former. Are you better off worrying only about your short-term needs, or should you take the long view and direct your business towards the former despite it being more expensive?
blitzar 46 minutes ago
Suppose both microcontrollers will be out of date in a week and replaced by far more capable microcontrollers.

The long view is to see the microcontroller as a commodity piece of hardware that is rapidly changing. Now is not the time to go all in on Betamax and take out 10-year leases on physical Blockbuster stores when streaming is 2 weeks away.

AI is possibly the most open technological advance I have experienced; there is no excuse, this time, for skilled operators to be stuck for decades with AWS or some other proprietary blend of vendor lock-in.

FridgeSeal 11 minutes ago
Well, the company behind the former microcontroller has gone out of its way to make getting and developing on actual hardware as difficult and expensive as possible, and could reasonably be accused of "suspect financial shenanigans", while the other company will happily sell me the microcontroller for a reasonable price. And sure, they started off cloning the former, but their own stuff is getting really quite good these days.

So really, the argument pretty well makes itself in favour of the $0.50 microcontroller.

woadwarrior01 13 minutes ago
That's a very tenuous analogy. Microcontrollers are circuits that are designed. LLMs are circuits that are learned from vast amounts of data scraped from the internet, plus pirated e-books [1][2][3].

[1]: https://finance.yahoo.com/news/nvidia-accused-trying-cut-dea...

[2]: https://arstechnica.com/tech-policy/2025/12/openai-desperate...

[3]: https://www.businessinsider.com/anthropic-cut-pirated-millio...

ForHackernews 50 minutes ago
You're asking whether businesses will choose to pay a 1000% markup on commodities?
blitzar 2 hours ago
> Aren't there pretty good indications that the Chinese LLMs have been trained on top of the expensive models?

There are pretty good indications that the American LLMs have been trained on top of stolen data.

svara 37 minutes ago
This is proven. You can verify it yourself easily: take a novel from your bookshelf, type in any sentence from it, and ask the model what book it's from. Then ask it for the next sentence.

This works with every novel I've tried so far in Gemini 3.

My actual prompt was a bit more convoluted than this (it involved translation), so you may need to experiment a bit.

ForHackernews 51 minutes ago
This so-called "PC compatible" seems like a cheap copy; give me a real IBM every time.
re-thc 2 hours ago
> Their cost is not real.

They can't even officially account for any Nvidia GPUs they managed to buy outside the official channels.

daft_pink 18 minutes ago
I think the question is: as the technology matures, will the value of the models become more stable, and then what will happen to prices?

Compare it to phones or PCs: there was a time when each new version was a huge upgrade over the last, but eventually these AI models are going to mature and something else is going to happen.

asyncadventure 16 minutes ago
The discussion about Chinese vs US models misses a key enterprise reality: switching costs are enormous once you've built production systems around a specific API. Companies aren't just buying the model - they're buying reliability, compliance guarantees, and the ecosystem that reduces integration risk. Price matters, but in enterprise AI the "last mile" of trust and operational certainty often justifies significant premiums.
senko 7 minutes ago
The Responses API is a commodity.

That's why OpenAI tries to push the Assistants API, Agents SDK and ChatGPT Apps, which are more of a lock-in: https://senkorasic.com/articles/openai-product-strategy-2025

Funny thing is, even OpenAI seems to ignore the Assistants/Apps APIs internally. Codex (the CLI) uses the Responses API.
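
For context, the stateless shape being called a commodity here looks roughly like this (a minimal sketch, assuming a current openai Python SDK with Responses API support; the model id is a placeholder):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A single stateless request/response call: small surface area, easy to
    # re-point at another vendor, hence "commodity". Model id is a placeholder.
    response = client.responses.create(
        model="gpt-5.2",
        input="In one sentence: why are stateless APIs easy to switch away from?",
    )
    print(response.output_text)

The Assistants-style APIs, by contrast, keep assistants, threads and runs server-side, which is where the lock-in argument comes from.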

TSiege 10 minutes ago
I think AI has the potential to break this model. It reduces switching costs immensely.