We've been running agent workflows for a while now. The pattern that works: treat agents like junior team members. Clear scope, explicit success criteria, checkpoints to review output. The skills that matter are the same ones that make someone a good manager of people.
pglevy is right that many managers aren't good at this. But that's always been true. The difference now is that the feedback loop is faster. Bad delegation to an agent fails in minutes, not weeks. You learn quickly whether your instructions were clear.
The uncomfortable part: if your value was being the person who could grind through tedious work, that's no longer a moat. Orchestration and judgment are what's left.
Here's the thing - that feedback loop isn't a magic lamp. Actually understanding why an agent is failing (when it does) takes knowledge of the problem space. Actually guiding that feedback loop so it optimally handles tasks - segmenting work and composing agentic cores to focus on the right things with the right priority of decision making - that's something you need to be curious about the internals for. Engineering, basically.
One thing I've seen in using these models to write code is that they're myopic: they do whatever it takes to fix the problem right in front of them when asked. This causes a cascading failure mode where the code becomes a patchwork of one-off fixes and hardcoded solutions for problems that not only recur, but get worse as they compound. You'd only know this if you could spot it when the model says something like "I see the problem, this server configuration is blocking port 80 and that's blocking my test probes. Let me open that port in the firewall".
This assumes there aren't "engineers" involved in the busywork cycle, which I'm not sure is accurate.
E.g., the work wasn't for engineers in the first place, or engineers worth their salt would have already automated it.
You still need to do most of the grunt work, verifying and organizing the code; it's just that you're not editing the code directly. Speed of typing out code is hardly the bottleneck.
The bottleneck is visualizing it and then coming up with a way to figure out bugs or add features.
I've tried a bunch of agents, none of them can reasonably conduct a good architectural change in a medium size codebase.
There's a paper that came out in the latter half of last year about this. I wish I'd kept its name/publisher around, but the synopsis is: once you reach a certain amount of complexity in the task you're trying to achieve, you run out of context to process it, and compressing the context still loses enough details that the AI has to reconstitute those details on the next run, again running out of context.
Currently, at least, the task has to be able to be broken up into smaller chunks to work properly.
I disagree. Part of being a good manager of (junior) people is teaching them soft skills in addition to technical skills -- how to ask for help and do their own research, and how to build their own skills autonomously, how to think about requirements creatively, etc.
Clear specifications and validating output is only a part of good people management, but is 100% of good agent management.
Actually you can. Training data, and then the way you describe the task, goals, checkpoints, etc., is still training.
Yes, it's a crutch. But maybe the whole "NNs that can code and we don't really know why" thing is too.
Where are y'all working that "writing code" was ever the slow part of the process?
What kind of work do you think people who deal with LLMs everyday are doing? LLMs could maybe take something 60% of the way there. The remaining 40% is horrible tedious work that someone needs to grind through.
Curiously, this is where automated checks, that people have known are useful for years but haven't been implementing widely enough, come in really handy!
Not just linters and code tests, but also various checks in regards to the architecture - like how the code is organized, how certain abstractions are used (e.g. if you want to enforce Pinia Setup instead of Option stores, and Vue Composition instead of Options API; or a particular ASP.NET or Spring Boot way of structuring filters and API endpoints and access controls) and so on.
Previously we just expected a bunch of devs to do a lot of heavy lifting along the lines of: "Oh yeah, this doesn't match our ADR described there, please follow the existing structure" which obviously doesn't work when the LLM produces code at 10x the rate.
I think the projects that will truly work well with increased use of agentic LLM use will be those that will have hundreds of various checks and actually ENFORCE standards instead of just expecting them to be followed (which people don't do anyways).
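To make that concrete, here's a rough sketch of the kind of architecture check I mean, using the Pinia example above. The directory layout and the regex heuristic are made up for illustration; a real setup would more likely be an ESLint rule or a proper AST check:

```typescript
// Hypothetical CI check: flag Pinia "options" stores so that only
// setup-style stores (defineStore(id, () => { ... })) get merged.
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Recursively collect all files under a directory.
function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    return statSync(full).isDirectory() ? walk(full) : [full];
  });
}

const storeFiles = walk("src/stores").filter((f) => f.endsWith(".ts"));
const violations = storeFiles.filter((f) =>
  // A second argument that starts with "{" suggests an options store.
  /defineStore\s*\(\s*['"][^'"]+['"]\s*,\s*\{/.test(readFileSync(f, "utf8"))
);

if (violations.length > 0) {
  console.error("Options-style Pinia stores found (use setup stores):");
  for (const f of violations) console.error("  " + f);
  process.exit(1); // fail the CI step
}
```

Wire something like this (and dozens of similar checks) into CI, and the LLM gets the "please follow the existing structure" feedback automatically instead of from a tired reviewer.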
Sure, writing code was slower before the agentic coder era, but as people coded, their understanding of what they were building grew with them, and that allowed them to make informed decisions about what to do next and how to fix things when they went sideways.
Replacing the human who writes code with an agent that does it faster doesn't necessarily improve the speed of the overall process by the same amount. Some of the time saved in producing code is simply shifted elsewhere: to reading, validating, and reconstructing the understanding that previously emerged naturally while writing. If the human still needs a sufficiently deep mental model of the system in order to make correct decisions, diagnose failures, and decide what to do next, then that understanding must be acquired one way or another. When it no longer forms incrementally during the act of coding, it has to be rebuilt after the fact, often under worse conditions and with less context. In that sense, the apparent speedup only holds if we ignore the cost of comprehension and review; once those are included, the comparison becomes less about raw code throughput and more about where and how understanding is generated in the process.
Many people understand this tradeoff in general terms. Just like we generally understand the concept of technical debt.
But just as it's very hard to deal with classic technical debt, it will be very hard to counterbalance the short-term gains of AI producing endless streams of code.
Was it ever? If you don't care about correctness and just want the vibes, then hiring idiots for pennies and telling them to write unlimited code was always an option. Way before "AI" even existed.
And I mean pennies literally. Hell, people will do it for free. Just explain upfront that you only care that the code technically works.
OMG, I see you also deal with ______ Bank.
What I have seen in enterprise organizations is enough to turn a man pale and send him to an early grave.
I keep saying it on here. You become an architect with really smart and capable Junior developers. You can even have them do research to death and review their own code, and stop them once you're satisfied.
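A rough sketch of what that loop can look like, where `agent` is a placeholder for whatever model call you use and the stop check is deliberately crude; in practice you, the architect, read the review and decide when to stop:

```typescript
// Hypothetical "architect with junior AI devs" loop:
// generate, have the agent review its own output, revise, stop when satisfied.
async function agent(prompt: string): Promise<string> {
  return `stub output for: ${prompt}`; // placeholder for a real model call
}

async function iterate(task: string, maxRounds = 3): Promise<string> {
  let draft = await agent(`Do the task: ${task}`);
  for (let round = 0; round < maxRounds; round++) {
    const review = await agent(`Review this work critically:\n${draft}`);
    // Crude stop condition for the sketch; really this is your judgment call.
    if (review.includes("no major issues")) break;
    draft = await agent(`Revise the work based on this review:\n${review}\n\n${draft}`);
  }
  return draft; // the human architect still signs off on the result
}
```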
I feel like this was always true. Business still moves at the speed of high-level decisions.
> The uncomfortable part: if your value was being the person who could grind through tedious work, that's no longer a moat.
Even when junior devs were copy-pasting from stackoverflow over a decade ago they still had to be accountable for what they did. AI is ultimately a search tool, not a solution builder. We will continue to need junior devs. All devs regardless of experience level still have to push back when requirements are missing or poorly defined. How is picking up this slack and needing to constantly follow up and hold people's hands not "grinding through tedious work"?
AI didn't change anything other than how you find code. I guess it's nice that less technical people can now find it using their plain english ramblings instead of needing to know better keywords? AI has arguably made these search results worse, the need for good docs and examples even more important, and we've all seen how vibecoding goes off the rails.
The best code is still the least you can get away with. The skill devs get paid for has always been making the best choices for the use case, and that's way harder than just "writing code".
Same fallacy, new technology. I keep getting older, while managers fall for the same mythical power of numbers.
Actually I disagree. I've been experimenting with AI a lot, and the limiting factor is marketing. You can build things as fast as you want, but without a reliable and repeatable (and at least somewhat automated) marketing system, you won't get far. This is especially because all marketing channels are flooded with user-generated content (UGC) that is itself generated by AI.
But you can also think what would you want to build (for yourself or someone you know), that would otherwise take a team of people. Coding what used to be a professional app can now be a short hobby project.
I played with Claude Code Pro only a short while, but I already believe the mode of production of software will change to be more accessible to individuals (pro or amateur). It will be similar to the death of music labels.
Depends if you're talking about new client acquisition or expansion of existing products in order to assure your client doesn't leave.
The issue I see with this, at least in enterprise, is while we may fix some smaller plates of spaghetti, we're busy building massive tangled pasta apps that do even more.
Patently shocked to find this on their profile:
> I lead AI & Engineering at Boon AI (Startup building AI for Construction).
In my experience so far, AI prototyping has been a powerful force for breaking analysis paralysis.
In the last 10 years of my career, the slow execution speed at different companies wasn't due to slow code writing. It was due to management excesses trying to drive consensus and de-risk ideas before the developers were even allowed to write the code. Let's circle back and drive consensus in a weekly meeting with the stakeholders to get alignment on the KPIs for the design doc that goes through the approval and sign off process first.
Developers would then read the ream and realize that perfection was expected from their output, too, so development processes grew to be long and careful to avoid accidents. I landed on a couple teams where even small changes required meetings to discuss it, multiple rounds of review, and a lot of grandstanding before we were allowed to proceed.
Then AI comes along and makes it cheap to prototype something. If it breaks or it's the wrong thing, nobody feels like they're in trouble because we all agree it was a prototype and the AI wrote it. We can cycle through prototypes faster because it's happening outside of this messy human reputation-review-grandstanding loop that has become the norm.
Instead of months of meetings, we can have an LLM generate a UI and a backend with fake data and say "This is what I want to build, and this is what it will do". It's a hundred times more efficient than trying to describe it to a dozen people in 1-hour timeslots in between all of their other meetings for 12 weeks in a row.
The dark side of this same coin is when teams try to rely on the AI to write the real code, too, and then blame the AI when something goes wrong. You have to draw a very clear line between AI-driven prototyping and developer-driven code that developers must own. I think this article misses the mark on that by framing everything as a decision to DIY or delegate to AI. The real AI-assisted successes I see have developers driving with AI as an assistant on the side, not the other way around. I could see how an MBA class could come to believe that AI is going to do the jobs instead of developers, though, as it's easy to look at these rapid LLM prototypes and think that production ready code is just a few prompts away.
This is what's missing in most teams. There's a bright line between throwaway almost fully vibe-coded, cursorily architected features on a product and designing a scalable production-ready product and building it. I don't need a mental model of how to build a prototype, I absolutely need one for something I'm putting in production that is expected to scale, and where failures are acceptable but failure modes need to be known.
Almost everyone misses this in going the whole AI hog, or in going the no-AI hog.
Once I build a good mental model of how my service should work and design it properly, all the scaffolding is much easier to outsource, and that's a speed-up, but I still own the code because I know what everything does and my changes to the product are well thought out. For throwaway prototypes it's 5x this output, because the hard part of actually thinking the problem through doesn't really matter; it's just about getting everyone to agree on one direction of output.
So is an 8-ball.
Since shipping prototypes doesn't actually create value unless they're in some form of production environment to effect change, then either they work and are ZeroOps or they break and someone needs to operate on them and is accountable for them.
This means that at some point, your thesis that
"the dark side of this same coin is when teams try to rely on the AI to write the real code, too, and then blame the AI when something goes wrong" won't really work that way; whoever is accountable will get the blame and the operations.
The same principles for building software that we've always had apply more than ever to AI-related things.
Easy to change, reusable, composable, testable.
Prototypes need to be thrown away. Otherwise they're tracer bullets, and you don't want to have tech debt in your tracer bullets unless your approach is to throw it to someone else and make it their problem.
-----
Creating a startup or any code from scratch, in a way where you don't actually have to maintain it and find out the consequences of your unsustainable approaches (tech debt/bad design/excessive cost), is easy. You hide the hardest part. It's easy to do things that on the surface look good if you can't see how they will break.
The blog post is interesting but, unless I've missed something, it does gloss over the accountability aspect. If you can delegate accountability you don't worry about evals-first design, you can push harder on dates because you're not working backwards from the actual building and design and its blockers.
Evals (think promptfoo) for evals-first design will be key for any builder who is accountable for the decisions of their agents (automation).
I need to turn it into a small blog post, but the points of the talk https://alexhans.github.io/talks/airflow-summit/toward-a-sha...
- We can’t compare what we can’t measure
- Can I trust this to run on its own?
are crucial to having a live system that makes critical decisions. If you don't have this, you're just running with the --yolo flag.
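For illustration, a minimal sketch of the "measure before you trust it" idea; this is not promptfoo's actual config format, and `callAgent` plus the cases are placeholders you'd swap for your own:

```typescript
// Minimal evals-first gate: run fixed cases against the agent and fail CI
// (or block deploy) if the outputs miss required content.
type EvalCase = { input: string; mustContain: string[] };

async function callAgent(input: string): Promise<string> {
  return `stub answer for: ${input}`; // placeholder for the real agent call
}

const cases: EvalCase[] = [
  { input: "Refund order #123 per policy", mustContain: ["refund", "#123"] },
  { input: "Summarize the incident report", mustContain: ["summary"] },
];

async function runEvals(): Promise<void> {
  let failures = 0;
  for (const c of cases) {
    const output = (await callAgent(c.input)).toLowerCase();
    const missing = c.mustContain.filter((s) => !output.includes(s.toLowerCase()));
    if (missing.length > 0) {
      failures++;
      console.error(`FAIL: "${c.input}" missing: ${missing.join(", ")}`);
    }
  }
  // Gate on the measured result instead of the --yolo flag.
  if (failures > 0) process.exit(1);
}

void runEvals();
```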
At least ChatGPT, Gemini and Claude told me it was. I did so many rounds of each one evaluating the other, trying to poke holes etc. Reviewing the idea and the "research", the reasoning. Plugging the gaps.
Then I started talking to real people about their problems in this space to see if this was one of them. Nope, not really. It kinda was, but not often enough to pay for a dedicated service, and not enough of a pain to move on from free workarounds.
Beware of AI reviewing AI. Always talk to real people to validate.
I once solved a Leetcode problem in a somewhat unorthodox way, and ChatGPT and Gemini both said it was wrong in the same way. Then I asked both of them to give me a counterexample, and only Gemini was able to realize that the counterexample would have actually worked.
Similarly, it’s easy to think that the lowly peons in the engineering world are going to get replaced and we’ll all be doing the job of directors and CEOs in the future, but that doesn’t really make sense to me.
Being able to whip your army of AI employees 3% better than your competitor doesn’t (usually) give any lasting advantage.
What does give an advantage is: specialized deep knowledge, building relationships and trust with users and customers, and having a good sense of design/ux/etc.
Like maybe that’s some of the job of a manager/director/CEO, but not anyone that I’ve worked with.
What do you mean by "better"? The advantage is speed. Shipping a feature in 1 week instead of 1 month is a tremendous advantage.
I like his thinking but many professional managers are not good at management. So I'm not sure about the assumption that "many people" can easily pick this up.
5 years ago: ML-auto-complete → You had to learn coding in depth
Last Year: AI-generated suggestions → You had to be an expert to ask the right questions
Now: AI-generated code → You should learn how to be a PM
Future: AI-generated companies → You must learn how to be a CEO
Meta-future: AI-generated conglomerates → ?
Recently I realized that instead of just learning technical skills, I need to learn management skills. Specifically: project management, time management, writing specifications, setting expectations, writing tests, and in general, handling and orchestrating an entire workflow. And I think this will only shift to higher levels of the management hierarchy in the future. For example, in the future we will have AI models that can one-shot an entire platform like Twitter. Then the question is less about how to handle a database and more about how to handle several AI-generated companies!
While we're at the project manager level now, in the future we'll be at the CEO level. It's an interesting thing to think about.
This is the kind of half baked thought that seems profound to a certain kind of tech-brained poster on HN, but upon further consideration makes absolutely zero sense.
@dang
If you become just a manager, you don't have answers to these questions. You can just ask the AI agent for the answer, but at that point, what value are you actually providing to the whole process?
And what happens when, inevitably, the agent responds to your question with "You're absolutely right, I didn't consider that possibility! Let's redo the entire project to account for this?" How do you communicate that to your peers or clients?
It would not be shocking at all if in 10 years, "Let's redo the entire project to account for this" is exactly how things work.
Or let's make 3 or 4 versions of the project and see which one the customer likes best.
Or each decision point of the customer becomes multiple iterations of the project, with each time the project starting from scratch.
Of course, at some point there might not be a customer in this context. The "customer" that can't handle this internally might no longer be a viable business.
"You're absolutely right" feels so summer 2025 to me.
I bring the table, AI brings the value.
Basically I don't see how you can be an AI maximalist and a capitalist at the same time. They're contradictory, IMO.
Byung-Chul Han's Psychopolitics should be a standard text that everyone is discussing right now, but instead we will probably do nothing, and the future will suffer the consequences of our collective intellectual laziness.
Framing this as neoliberal ideas vs. Marxism is just incredibly intellectually lazy. We really need to think about this the way Marx thought about the industrial revolution, in a new way, without being lazy and just falling back on the standard Marxist orthodoxy as religious ideas.
We don't just need a Protestant Reformation, we need an entire new religion to deal with this. I think that will be too hard, so if I had to bet, my bet would be that we do absolutely nothing.
If anything, managing the project, writing the spec, setting expectations, and writing tests are things LLMs are incredibly well suited for. Getting their work "correct", and not just "functional enough that you don't know the difference", is where they struggle.
One-shot means you provide one full question/answer example (from the same distribution) in the context given to the LLM.
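Purely for illustration, the distinction looks like this; the sentiment task is an arbitrary made-up example:

```typescript
// "One-shot" as a prompting term: exactly one worked example sits in the
// context before the real query (zero-shot = none, few-shot = several).
const example =
  'Review: "The battery died after two days."\nSentiment: negative';
const query =
  'Review: "Setup took five minutes and it just works."\nSentiment:';

const oneShotPrompt = [
  "Classify the sentiment of each review as positive or negative.",
  example, // the single in-context example that makes this "one-shot"
  query,
].join("\n\n");

console.log(oneShotPrompt);
```

"One-shotting an entire platform" in the colloquial sense (getting it right in a single attempt) is a different use of the phrase.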
The cost of a model capable of running an entire company will be multiples of the market cap of the company it is capable of running.
Also, you're forgetting the decreasing cost of AI, as well as the fact that you can buy a $10k Mac Studio NOW and have it run 24/7 with some of the best models out there. The only costs would be the initial fixed cost and electricity (250W at peak GPU usage).
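Back-of-the-envelope on the electricity part (the $0.15/kWh rate is an assumption; plug in your local rate):

```typescript
// Rough monthly electricity cost for a 250 W machine running 24/7.
const watts = 250;
const hoursPerMonth = 24 * 30;
const kWhPerMonth = (watts / 1000) * hoursPerMonth; // 180 kWh
const ratePerKWh = 0.15; // assumed rate in USD
console.log(`~$${(kWhPerMonth * ratePerKWh).toFixed(2)} per month`); // ~$27.00
```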
AI is still being heavily subsidized. None of the major players have turned a profit, and they are all having to do 4D Chess levels of financing to afford the capex.
Fire MBAs and other “management” types. If they’re not technical and you’re building something technical, they need to go. Anyone who says otherwise gets fired too.
Keep the engineers who consistently get Exceeds Expectations. Fire everyone else. No PIP, just go, please.
Keep a few EE product managers. Fire the rest.
Hire a few QAs who can work with AI and work with product to ensure the stuff actually works. You don’t need that many people anymore and a couple of quality people can’t hurt. I don’t trust engineers enough, sorry. You need discerning eyes.
Fire everyone else. Give the best people AI and they will be able to put out more good work. If someone doesn’t get this, fire them too because they’re clearly not EE level.
Scale this to the whole org.
"AI labs"
Can we stop this misleading language. They're doing product development. It's not a "laboratory" doing scientific research. There's no attempt at the scientific method. It's a software firm and these are software developers/project managers.
Which brings me to point 2. These guys are selling AI tooling. Obviously there's a huge desire to dogfood the tooling. Plus, by joining the company, you are buying into the hype and the vision. It would be more surprising if they weren't using their own tools the whole time. If you can't even sell to yourself...
I don't know why you're trying to suggest some kind of restriction on the word "lab", or based on what. Calling them "labs" is perfectly normal, conventional, and justified terminology.