Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.
150k sounds like a lot. I do have to wonder what the program does exactly to see if that's warranted, but it sounds bloated. `yes 'printf("a very important and useful line of code\n");' >> main.c`
in under a second! As such, this is high productivity! /s
Do you remember such a time or company? I have been developing professionally since the early 1990s (and as a hobbyist before then), and this "truth" was already a meme even back then.
I'm sure it happened, but I'm not sure it was ever as widespread as this legend would make it sound.
But, there were decades of programmers programming before I started, so maybe it just predated even me.
> They devised a form that each engineer was required to submit every Friday, which included a field for the number of lines of code that were written that week.
And it confuses Claude.
This way of running tests is also what Rails does, and AFAIK Django too. Tests are isolated and can be run in random order. Actually, Rails randomizes the order, so if there are tests that for any reason depend on the order of execution, they will eventually fail. To help debug those cases, it prints the seed, which can be used to rerun those tests deterministically, including the calls to methods returning random values.
I thought that this is how all test frameworks work in 2026.
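ExUnit, on the Elixir side of this thread, works the same way; the relevant knobs are sketched below (the seed value is made up):

```elixir
# test/test_helper.exs -- ExUnit shuffles test order by default
# and reports the seed it used at the end of the run, e.g.:
#   Randomized with seed 482113
ExUnit.start()

# Reproduce an order-dependent failure with the exact same ordering:
#   mix test --seed 482113
# Or run tests in the order they are defined:
#   mix test --seed 0
```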
I've never had this problem.
That's easier said than done. Simple example: an API that returns a count of all users in the database. The obvious implementation would be just `select count(*) from users`. But if some other test touches the users table beforehand, it won't work. There is no UUID to latch onto here.
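For the Elixir/Ecto case under discussion, the SQL sandbox addresses exactly this by wrapping each test in a transaction that gets rolled back; a sketch with stand-in module names (`MyApp.Repo`, `MyApp.Accounts.User`):

```elixir
defmodule MyApp.UserCountTest do
  use ExUnit.Case, async: true

  alias MyApp.Repo
  alias MyApp.Accounts.User

  setup do
    # Requires `Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, :manual)` in test_helper.exs.
    # Each test gets its own checked-out connection inside a transaction,
    # so rows inserted by other tests never show up in this test's count.
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(Repo)
  end

  test "count only sees this test's inserts" do
    assert Repo.aggregate(User, :count) == 0
    Repo.insert!(%User{email: "a@example.com"})
    assert Repo.aggregate(User, :count) == 1
  end
end
```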
That could run on developer machines, but maybe it runs only on a CI server and developers run only unit tests.
As for assertions, it's not that hard to think of a better way to check whether you made an insertion into the db than writing `assert user_count() == 0`.
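For example (a sketch; `Accounts.create_user/1` and the schema names are stand-ins): assert on the row you just created, or on a relative count, rather than the absolute total:

```elixir
{:ok, user} = Accounts.create_user(%{email: "a@example.com"})
assert Repo.get!(User, user.id).email == "a@example.com"

# If the count itself is the point, compare before and after:
before = Repo.aggregate(User, :count)
{:ok, _} = Accounts.create_user(%{email: "b@example.com"})
assert Repo.aggregate(User, :count) == before + 1
```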
I've been tweaking my skills to avoid nested cases, make better use of with/do for control flow, keep contexts well designed, etc.
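The nested-case-to-`with` rewrite in question looks roughly like this (function names are hypothetical):

```elixir
# Nested cases:
case fetch_user(id) do
  {:ok, user} ->
    case authorize(user, :edit) do
      :ok -> update_user(user, params)
      error -> error
    end

  error ->
    error
end

# Flattened with `with` -- any non-matching clause falls through and is returned:
with {:ok, user} <- fetch_user(id),
     :ok <- authorize(user, :edit) do
  update_user(user, params)
end
```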
What does your workflow look like?
Six of one, half a dozen of the other.
At the point where you have a Phoenix project in dev, you're already exposing an HTTP endpoint, so it's nice not to need the full-on "attach to the VM and do RPCs" infrastructure, and you just pull Tidewave in as a single dependency instead of downloading a bunch of scripts, etc.
We don't 100% AI it but this very much matches our experience, especially the bits about defensiveness.
Going to do some testing this week to see if a better agents file can't improve some of the author's testing struggles.
The new generation of code assistants is great. But when I dogmatically try to let only the AI work on a project, it usually fails and shoots itself in its proverbial foot.
If this is indeed 100% vibe coded, then there is some magic I would love to learn!
Overall, my process is: define a broad spec, including architecture. Heavy usage of standard libraries and frameworks is very helpful, as are typed languages. Create skills according to your needs, and use MCP to give CC a feedback mechanism; Playwright is a must for web development.
After the environment and initial seed are in place in the form of a clear spec, it's a process of iteration via conversation. My sessions tend to go: "Let's implement X, plan it." CC offers a few routes, I pick what makes the most sense, or on occasion I need to explain the route I want to take. After the feature is implemented, we go into a cleanup phase: we check if anything might be getting out of hand, recheck security stuff, and create tests. Repeat. Pick small battles instead of huge features. I'm doing quite a lot of hand-holding at the moment, saying a lot of "no", but the process is on another level compared with what I was doing before, and the speed at which I can get features out is insane.
I have been through Karpathy's work - however, I don't find that it helps with large scale development.
Your tactics work for me at smaller scale (around 10k LOC or so) and start to break down beyond that, especially when refactorings are involved.
Refactoring happens when I see that the LLM is stumbling over its own decisions _and_ when I get a new idea. So the ability to refactor is a hard requirement.
Alternatively, refactoring could be achieved by starting over? But I have a hard time accepting that idea for projects > 100k LOC.
Then on average your velocity is little better than if you just did it all by hand.
I'm doing some heavy-duty shit: almost everything is routed through a custom CQRS-style events table before being rolled up into the db tables (the events are sequentially hashed for lab-notebook integrity). Editing is done through a custom implementation of Quill.js's delta OT. 100% of my tests are async.
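Roughly what "sequentially hashed" events can look like, as a self-contained sketch (illustrative only, not the actual implementation; field names are made up):

```elixir
defmodule EventChain do
  @moduledoc """
  Each event's hash covers the previous event's hash plus its own payload,
  so rewriting history invalidates every later hash (lab-notebook integrity).
  """

  # Seal one event given the previous event's hash.
  def seal(event, prev_hash) do
    digest =
      :crypto.hash(:sha256, prev_hash <> :erlang.term_to_binary(event.payload))
      |> Base.encode16(case: :lower)

    Map.put(event, :hash, digest)
  end

  # Seal a list of events in order, threading the hash through.
  def seal_all(events, genesis \\ "") do
    {sealed, _last_hash} =
      Enum.map_reduce(events, genesis, fn event, prev ->
        sealed = seal(event, prev)
        {sealed, sealed.hash}
      end)

    sealed
  end
end

# EventChain.seal_all([%{payload: %{op: :insert, row: 1}}, %{payload: %{op: :update, row: 1}}])
```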
I've never once run into the ecto issues mentioned.
I haven't had issues with GenServers (but I have none* in my project).
Claude knows Oban really well. Honestly, I was always afraid to use Oban until Claude just suggesting "let's use Oban" gave me the courage. I'll be sending Parker and Shannon a first check when the startup's check comes in.
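For anyone who hasn't tried it, an Oban worker is just a module like this (queue name and args below are made up):

```elixir
defmodule MyApp.Workers.RollupEvents do
  use Oban.Worker, queue: :rollups, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"batch_id" => batch_id}}) do
    # Stand-in for real work: returning :ok completes the job;
    # {:error, reason} or a raise schedules a retry with backoff.
    IO.puts("rolling up batch #{batch_id}")
    :ok
  end
end

# Enqueue from anywhere:
# %{batch_id: 123} |> MyApp.Workers.RollupEvents.new() |> Oban.insert()
```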
The article is absolutely spot on about everything else. I think at this point what I've built in a month-ish would have taken me years to build out by myself.
The biggest annoyance is the over-defensiveness mentioned, and that Claude keeps trying to use Jason instead of JSON. Claude also has some bad habits around aliases that it keeps doing even though it's pretty explicitly called out in CLAUDE.md, and other annoying things like writing `case function_call() do nil -> ... end` instead of `if var = function_call() do ... else ... end`.
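Spelled out, the two patterns look like this (hypothetical `fetch_user/1`, illustrative only):

```elixir
# What Claude tends to write:
case fetch_user(id) do
  nil -> {:error, :not_found}
  user -> {:ok, user}
end

# The preferred form when nil is the only failure case:
if user = fetch_user(id) do
  {:ok, user}
else
  {:error, :not_found}
end
```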
*none that I've written myself, except LiveViews, and one ETS table cache.
[0] CQRS library: https://hexdocs.pm/spector/Spector.html
[1] Quill impl: https://hexdocs.pm/otzel/Otzel.html
- Silently closes the tab, and makes a remark to avoid the given software at any cost.
What if it doesn't? What if LLMs just stay at mostly the same level of usefulness they have now, but the costs continue to rise as subsidization wears off?
Is it still worth it? Maybe, but not worth abandoning having actual knowledge of what you’re doing.
Anyone can sell the future.
What I'd really like to see, though, is experiments on whether you can few-shot prompt an AI to in-context-learn a new language with any level of success.
It's certainly helpful, but it has a tendency to go for very non-idiomatic patterns (like using exceptions for control flow).
Plus, it has issues which I assume are an effect of reinforcement learning: it struggles with letting things crash and tends to silence things that should never fail silently.
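An illustration of that tendency (made-up example, not from the article): rescuing and hiding a failure versus returning tagged tuples or just letting it crash:

```elixir
defmodule Defensive do
  # The habit in question: rescue everything and quietly paper over the failure.
  def load_config(path) do
    try do
      {:ok, File.read!(path)}
    rescue
      _ -> {:ok, "{}"}
    end
  end
end

defmodule Idiomatic do
  # Return the tagged tuple and let the caller decide...
  def load_config(path), do: File.read(path)

  # ...or use the bang version and let the supervisor handle a genuine crash.
  def load_config!(path), do: File.read!(path)
end
```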
It tends to write Java even when it's supposed to be Elixir. Usage rules help: https://hexdocs.pm/usage_rules/readme.html
The SOTA models do a great job with all of them, but if I had to rank the capabilities for each language it would look like this:
JavaScript, Julia > Elixir > Python > C++
That's just a sample size of one, but I suspect that for all but the most esoteric programming languages there is more than enough code in the training data.
- https://github.com/agoodway/.claude/blob/main/skills/elixir-...
- https://github.com/agoodway/.claude/blob/main/agents/elixir-...
- https://github.com/agoodway/.claude/blob/main/agents/elixir-...
Getting pretty good results so far.
They could've been sorted out with precise context injection via CLAUDE.md files and/or dedicated subagents, no?
My experience using Claude suggests you should spend a good amount of time scaffolding its instructions in documents it can follow and refer to, if you don't want it to end up in the same loops over and over.
The author hasn't said whether this was tried.
An ERP is practically an OS.
It now has:
- pluggable modules with a core system
- Users/Roles/ACLs/etc.
- an event system (i.e. so we can roll up Sales Order journal entries into the G/L)
- G/L, SO, AR, AP
- rollback/retries on transactions
I haven't written a line of code.