How current is the PyPI package? https://pypi.org/project/iroh/
If everyone expects specific behavior - ie it’s in the contract - you require that contract.
We generally did this in the avahi libraries, be fairly liberal with asserts that "shouldn't happen", it is a source of complaints though because basically you can be using a third party library that uses avahi and have your program crash due to a bug in that library, or in avahi. It's extra fun when using some historical libc systems such as "NSS" and you load a plugin to do hostname resolution, which nss-mdns does.. now you can have any program on the entire system crash if you are assert happy.
On the one hand I agree that if the result is going to be memory un-safety then perhaps you should assert, but more ideally you'd just fail gracefully and throw or return an error. That can sometimes be tricky though, if there is no good way to return an error or return a NULL value or similar. Depending on the API.
Of course, this is the entire reason behind the error return traditions of Golang and Rust, e.g: https://doc.rust-lang.org/book/ch09-00-error-handling.html
Which basically says what I said above :)
But in the case of curl_getenv, returning NULL seems a valid possibility (https://curl.se/libcurl/c/curl_getenv.html) as that is indicated to be done if you don't find the requested environment variable. Arguably the NULL environment variable is not found. so, this feels likely to be acceptable. Though I could see an argument for you now assuming the environment variable you were actually looking for not existing, but you didn't actually ask for one, and now your logic is broken and maybe you introduce a different class of security bug because you change your behaviour based on some environment variable not existing.
As always everything is a trade-off...
Ah but what high integrity computing, well there neither crashes nor memory corruption are welcomed, hence programming guidelines and certification workflows that would make most C devs cry with the language features they are allowed to use, and how each line of code gets analysed by tools and humans.
But it's >> 99.9% true that this will just crash even though it's acshually UB, nasal demons and so forth. Now raise this << 0.1% likelihood that it isn't true on some system with some compiler and build flags, to the power of the number of distinct deployed configurations out there, and you get the result which is the correct engineering decision of just moving on instead of spending your life filling straightforward code with pointless boilerplate assertions.
NB it can make sense to assert nonnull when the condition won't be tested on all code paths or the intention is otherwise not super obvious.
Is it though? Linux saw enough bugs from that kind of issue that they now build with -fno-delete-null-pointer-checks and accept the (supposed) performance penalty.
In general on a system where you trap when accessing the zero page, this optimization should be safe and a null pointer dereferences should (safely) trap.
If you mean that C compiler writers "should" prioritise sanity over high scores on microbenchmarks, then I agree. However in practice they do not and this optimization is not remotely safe.
(EDIT: what is not safe is indexing into a null pointer. For this you need to be safe you need -fsanitize=null)
We are talking about an extremely simple straightforward API with an obvious contract. It's good enough for this function to reliably surface almost all wrong uses with a segfault immediately. Wrong use will result in segfaults and otherwise bugs and crashes. The goal is not to work when used wrong but to work when used right. You cannot save the world from scratch in every little function. You still have a job to get done, and you have to move on.
Or you can take all of 10 minutes to put sanity-check assertions at the start of all your public-facing API functions, eliminating a source of security bugs, get on with your life, and worry about the performance implications as and when it becomes a problem (hint: it's never going to become a problem).
There are so many other interesting and relevant invariants that are usually in an API contract that are much harder or impossible to check upfront (let alone express formally in a type system), and even violations may be impossible to diagnose when they happen.
People focus on NULL because that's the only way they can apply their silly limited type systems. But NULL checks give very little return for investment. In practice, you'll see templated Option<T> types and whatnot, and when I have to look at or even work with such code I want to kill myself because it's so painful.
Given a, b, c input parameters to my func, it must hold that that a->m->t == b->t. c->mutex must be held, and c->cond is the condition variable that goes with c->mutex and will release any waiters on the buffer contained in a.
Or: Integer x is representable using 12 bits only, Integer y should be a multiple of N and I have a integer s is used as a bit-shift that should be less than 8.
Or: I need to guarantee that no locks have to be taken and no allocations have to be made on this complicated looking codepath. While holding a lock, we must not do any syscalls (syscall a, b, c are ok though), and surely not make any logging calls.
I know only one system that can express this, it's called STRAIGHTFORWARD CODE, and it requires doing engineering and casual logic out-of-band, and yes it does include making mistakes and repairing them incrementally.
I don't know a type system that would let me explain these things to me and tell me where I was wrong. But maybe you can show me, with 5 minutes of actually trying?
So define a wrapper type that represents that invariant (it's not going to take up space at runtime), where the only constructor enforces it?
> Integer x is representable using 12 bits only, Integer y should be a multiple of N and I have a integer s is used as a bit-shift that should be less than 8.
Those are all standard things that already exist?
> Or: I need to guarantee that no locks have to be taken and no allocations have to be made on this complicated looking codepath. While holding a lock, we must not do any syscalls (syscall a, b, c are ok though), and surely not make any logging calls.
Sounds like a pretty standard free monad case? Define a command algebra in which the "ok" syscalls are a subtype, and then require that the thing you want to only use the ok calls to have a type that reflects that?
> Define a command algebra in which the "ok" syscalls are a subtype
Dude, it's clear you're not doing any actual work. You are living in an ivory tower, and you underestimate the complexity and detail and volatility of real world applications by at least 3 orders of magnitude. You don't understand how to modularize and contain complexity.
You _cannot_ complete a project with this attitude.
You are ignorant of the fact that a type system is necessarily a blunt simplification of the real complexity. Therefore, use of types must be pragmatic, and actual logic must be coded in normal code (which should be obvious but it isn't to type theory weirdos). Otherwise, you require dependent typing or whatever, and you will have to write your code twice, once in a usable programming language and once in a very unusable programming language. Much more than only twice actually, given that all the implicit detail should apprently go explicitly formalized at the type level.
Just to make sure I'm not entirely talking out of my arse because I'm so incredibly annoyed by your otherworldly proposition, I asked an AI about the sel4 microkernel. It consists of 10,000 lines of C code (that says a lot about its practical utility, which is very limited), and of 1,3 million lines of manually written proof code (which says a lot about the practicality of proving).
My proposition is that "it's silently ok" isn't likely enough, which is in line with your position on "don't extend the contract to accept null". So what's left is crash, or something worse.
So if those are your choices, don't waste time justifying that a null can't get there, just add a check to ensure you get the better behaviour. It takes seconds.
You are requiring yourself to find a valid outcome for an input that doesn't make _any_ sense in the context of what your application is meant to achieve. How is that not a Sysiphean task?
You're actually pruning the valid state space; before, when the null value is passed to the function, there are more operations performed that have uncertain consequences. If you assert-and-fail when you get the null input, you've pruned those states.
"Because no one is expecting it to work if a null is passed", so you can do whatever. If you write an assert for every pointer passed to every function, that will be a lot of asserts, for pretty much the same outcome in practice. Asserts are just marginally more ergonomic when they trigger, but are a nuisance in the code often. So my position is to use them judiciously, but not overdo it, be instead focused on the actual task.
When the lack of non-null assertions is an actual problem during development, you have much larger structural issues.
Strangely, I've also observed that some customers respond very well to words dressed up by AI, even if the words oversimplify the truth. Now I'm working to understand why they want that. Are my customers not swimming in AI slop like the rest of us?
BTW, this doesn't mean I'm anti-AI. AI coding is an incredible superpower and I use it constantly, but it seems to me that AI coding works because code expresses the minutiae that is rightfully omitted from most other communication.
Some confuse this with LDS-OS, which makes the user weirdly and unquestionably `nice` by only accepting inputs from protected mode.
Seriously though, that's an overly-pedantic definition of a compiler. Broadly speaking, languages compile in a direction of decreasing abstraction. Crossing from one high-level abstraction to another is just asking for trouble, especially in this case where the target language makes very specific performance promises as long as certain abstractions are maintained.
That said, Pangram agrees and its track record is pretty good.
I'm not saying the quotes are fake, that would be horrific. I'm saying the rest of the article appears to have had minimal human intervention.
I agree so much with this. It's why I feel so stifled when an e.g. product manager tries to insulate and isolate me from the people who I'm trying to serve -- you (or a collective of yous) need to have access to both expertise in the domain you're serving, and expertise in the method of service, in order to develop an appropriate and satisfactory solution. Unnecessary games of telephone make it much harder for anyone to build an internal theory of the domain, which is absolutely essential for applying your engineering skills appropriately.
Another facet of this is my annoyance at other developers when they persistently incurious about the domain. (Thankfully, this has not been too common.)
I don't just mean when there are tight deadlines, or there's a customer-from-heck who insists they always know best, but as their default mode of operation. I imagine it's like a gardener who cares only about the catalogue of tools, and just wants the bare-minimum knowledge to deal with any particular set of green thingies in the dirt.
Edit: The main role of PM is to decide which features to build, not how those features should be built or how they should work. Someone has to decide what to build, that is the PM, but most PM are not very good at figuring out the best way for those features to work so its better if the programmers can talk to users directly there. Of course a PM could do that work if they are skilled at it, but most PM wont be.
So that we're on the same page, what I think should be PM responsibilities:
If I have a user story: "As a customer I want to purchase a product so that I can receive it at my address" - PM defines this user story as they have insight to decide if such feature is needed.
PM should then define acceptance criteria: "Given customer is logged in When they view Product page Then 'Add product to basket' button should appear", "Given 'Add product to basket' button When customers click on it Then Product information modal should appear" etc - PM should know what users actually want, ie whether modals should appears, or not; whether this feature should be available for logged users only, or not.
How this will work shouldn't matter to PM; these are AC they've defined.
Of course the process of defining AC should involve developers (and QA), because AC should be exhaustive to delivering given feature
In your example of an order placement - the PM has no special knowledge of what is a good customer order flow. Developers are usually way better at coming up with those by the dint of experience and technical knowledge of the current codebase and make the appropriate speed/polish trade-off.
PMs acts as an imperfect proxy for what the customer wants, making judgements off nothing more than their own taste. And though there are many great PMs, the taste of a PM is usually worse than that of developers and designers on average.
IMO the main business reason they exist is for organization accountability and ownership, despite the often negative value they bring.
Even the most verbose specifications too often have glaring ambiguities that are only found during implementation (or worse, interoperability testing!)
In practice, it isn't.
Product designers have to intuit the entire world model of the customer. Product managers have to intuit the business model that bridges both. And on and on.
Why do engineers constantly have these laughably mind blowing moments where they think they are the center of the universe.
Software people do what they do better than anyone else. I mean obviously! Just listening to a non-software person discuss software is embarrassing. As it should be.
There's something close to mathematics that SWEs do, and yet it's so much more useful and economically relevant than mathematics, and I believe that's the bulk of how the "center of the universe" mindset develops. But they don't care that they're outclassed by mathematicians in matters of abstract reasoning, because they're doers and builders, and they don't care that they're outclassed by people in effective but less intellectual careers, because they're decoding the fundamental invariants of the universe.
I don't know. I guess I care so much because I can feel myself infected by the same arrogance when I finally succeed in getting my silicon golems to carry out my whims. It's exhilarating.
If the programmer gets to intimately understand the user's experience software would be easier to use. That's why I support the idea of engineers taking support calls on rotation to understand the user.
Both can be true at the same time, a product manager who retains the big picture of the business and product, and engineers who understand tiny but important details of how the product is being used.
If there were indeed perfect product managers, there would no need for product support.
A lot of the error messages I'd write were for me, especially those errors I never expected to see.
The typical feedback I'd get from end users is "your software doesn't work". If they can send me a screenshot of the error I'm halfway to solving the problem.
Similarly, by siloing the world model in one or two heads, you disable the team dynamics from contributing to building a better solution: eg. a product manager/designer might think the right solution is an "offline mode" for a privacy need without communicating the need, the engineering might decide to build it with an eventual consistency model — sync-when-reconnected — as that might be easier in the incumbent architecture, and the whole privacy angle goes out the window. As with everything, assuming non-perfection from anyone leads to better outcomes.
Finally, many of the software engineers are the creative type who like solving customer problems in innovative ways, and taking it away in a very specialized org actually demotivates them. Many have worked in environments where this was not just accepted, but appreciated, and I've it seen it lead to better products built _faster_.
Broadly similar general concept to the team responsible for the DNSSSEC signing keys for an entire ccTLD.
Yeah a x509 PKI / root CA is a very different thing than DNSSSEC but they have a number of general logical similarities in that the chain of trust ultimately comes down to a "do not fuck this up" single point of failure.
I think most low-end projects done in KiCad are not tested beyond making sure there's no red squiggly underlines at a glance. You are your own F5 key and assembler/runtime crash reporter. Proper circuit verification through software simulation isn't needed for most digital designs unless you do your own wireless antenna, analog amps, and/or DRAM/PCIe/GbE/etc.
"File > New Project from Template"
KiCAD comes with all the usual suspects, including Arduino and the various hats. You can get pmod templates, etc. They're actually really nice.
I use the pmod template all the time because it saves time and they're convenient to plug into Arty dev boards. PCBs are so cheap and quick I'll often make a quick PCB with a template because I just want a cleaner connector system. PCBs are basically bread boards these days.
https://techexplorations.com/guides/kicad/3e/create-a-new-ki...
I guess in theory, the original question is whether this project allows a board to be sent of for construction at a company that makes and populates boards. Yes, you could do this if you wanted to. As numpad0 has said though, it's early days for these boards and if you wanted to do something commercially reliable, you will most likely run into issues with things not being completely tested on these boards yet.
These boards provide the ability to make your own boards to host the chipsets yourself, rather than relying on a third party providing the board. So what? What if you want USB-C? What if you want to make a square or a circular board? This project is a good step along the way to allowing you to make these kinds of things.
On the hobbyist and corporate side, they also provide a way to provide a modern design that can use USB-C, which is becoming very common and is better than older USB options.
As mentioned in the README.md "Available Development Boards" section, the Atmega16u2 chip was hard to come by for Hanqaqa in 2023. The Arduino guys (arduino.com ?) probably did a "lifetime buy" of these comms chips and they probably also have several shelves of fully built Arduino boards as well. Lifetime buys and keeping good stock levels mitigate the risk of difficulty building new boards... Just get one of the older working ones off the shelf and send it. However, for an organisation (even an open source board that becomes fairly popular) wanting to build their own board, not having a given comms chip is a problem. Replacing it with a commonly available one makes it much easier for people/companies wanting to build these boards in any kind of numbers.
Having the board design readily available is really useful for the reasons above. It does seem like overkill if you just want to fiddle with a board, but if you make something that becomes popular that needs any kind of hardware adjustment, having the design becomes almost essential.
Wonderful that there's a Free version of these designs out there. The bugs and kinks will get sorted out over time.
It doesn't really make sense economically for me to write software for work anymore. I'm a teacher, architect, and infrastructure maintainer now. I hand over most development to my experienced team of Claude sessions. I review everything, but so does Claude (because Claude writes thorough tests also.) It has no problem handling a large project these days.
I don't mean for this post to be an ad for Claude. (Who knows what Anthropic will do to Claude tomorrow?) I intend for this post to be a question: what am I doing that makes Claude profoundly effective?
Also, I'm never running out of tokens anymore. I really only use the Opus model and I find it very efficient with tokens. Just last week I landed over 150 non-trivial commits, all with Claude's help, and used only 1/3 of the tokens allotted for the week. The most commits I could do before Claude was 25-30 per week.
(Gosh, it's hard to write that without coming across as an ad for Anthropic. Sorry.)
I looked at some stats yesterday and was surprised to learn Cursor AI now writes 97% of my code at work. Mostly through cloud agents (watching it work is too distracting for me)
My approach is very simple: Just Talk To It
People way overthink this stuff. It works pretty good. Sharing .md files and hyperfocusing on various orchestrations and prompt hacks of the week feels as interesting as going deep on vim shortcuts and IDE skins.
Just ask for what you want, be clear, give good feedback. That’s it
Of course exceptions apply. Some basic information that will reliably be discovered is still worth adding to your AGENTS.md to cut down on token use. But after a couple obvious things you quickly get into the realm of premature optimization (unless you actually measure the effects)
If you find any, consider making them into skills or /commands or maybe even add them to AGENTS.md.
My time is worth more than tokens. I’m thinking of maybe creating some .md files to save me time in code review. If I do it right, it’s going to cost more in tokens because the robots will do more.
Personally, I tend to get crap quality code out of Claude. Very branchy. Very un-DRY. Consistently fails to understand the conventions of my codebase (e.g. keeps hallucinating that my arena allocator zero initializes memory - it does not). And sometimes after a context compaction it goes haywire and starts creating new regressions everywhere. And while you can prompt to fix these things, it can take an entire afternoon of whack-a-mole prompting to fix the fallout of one bad initial run. I've also tried dumping lessons into a project specific skill file, which sometimes helps, but also sometimes hurts - the skill file can turn into a footgun if it gets out of sync with an evolving codebase.
In terms of limits, I usually find myself hitting the rate limit after two or three requests. On bad days, only one. This has made Claude borderline unusable over the past couple weeks, so I've started hand coding again and using Claude as a code search and debugging tool rather than a code generator.
I'd absolutely love to see exactly what you're doing (...well, maybe in a world where I had unlimited time or could clone myself...) because as tight as the usage limits are I absolutely cannot fathom hitting them THAT early.
What are the requests like, and have you noticed what is Claude doing during them? Is it reading an entire massive codebase or files that are thousands of lines long? Or are you loaded up with many MCPs or have an ever-growing CLAUDE.md?
I have a shell script that automates this. If all tests pass, the shell script prints "200/200 passing" with very little token spend. If only 190/200 pass, the shell script reports the names of every test that failed, and now Claude does a process of
1) run the compiler binary -> 2) get assembly output and inspect for obvious errors -> 3) assemble -> 4) verify that the assembler did not report errors -> 5) run test binary, connect with gdb, and find the issue -> 6) edit the compiler source -> 7) recompile the compiler -> 8) back to 1
multiplied by 10 for the 10 failing tests. This eats up tokens very quickly. I realize that not every use case is going to look like this. But if I didn't have Claude verify against the test suite, then I'd be getting regressions left and right, and then what's the point?
The whole codebase (tests included) is less than 15k lines, so I don't think that's the issue. No MCPs. CLAUDE.md about 1.5k lines.
I've found this can be vastly reduced with AGENTS.md instructions, at least with codex/gpt-5.4.
In TFA they found that prompting mitigates over-editing up to about 10 percentage points.
I've used that ever since. Works most of the time, but other stuff is often failing and I've learned to become distrustful of an agent very quickly. One mistake where I point it out and the agent corrects itself is fine if it keeps working well after. A second mistake when it's trying to fix the first one or an inability to understand or a claim that it fixed it but it didn't is instant termination (after dumping context for the next agent).
What people have is radically different expectations.
I noticed engineers will review Claude's output and go "holy crap that's junior-level code". Coders will just commit because looking at the code is a waste of time. Move fast, break things, disrupt, drown yourself into tech debt: the investors won't care anyways.
And no, telling the agent to "be less shit" doesn't work. I have to painstakingly point every single shit architectural decision so Claude can even see and fix it. "Git gud" didn't work for people and doesn't work for LLMs.
It's not that the code isn't DRY, it's just DRY at the wrong points of abstraction, which is even worse than not being DRY. I manage to find better patterns in each and every single task I tell Claude or Copilot to autonomously work on, dropping tons of code in the process (DRY or not). You can't prompt Claude out of making these wrong decisions (at best from very basic mistakes) since they are too granular to even extract a rule.
This is what separates a senior from a junior.
If you think Claude writes good code either you're very lucky, I'm very bad at prompting, or your standards are too low.
Don't get me wrong. I love Claude Code, but it's just a tool in my belt, not an autonomous engineer. Seeing all these "Claude wrote 97% of my code" makes me shudder at the amount of crap I will have to maintain 5 years down the line.
I've thought about this and I think the reason is as follows: we hold code written by ourselves to a much higher standard than code written by somebody else. If you think of AI code as your own code, then it probably won't seem very acceptable because it lacks the beauty (partly subjective as all beauty tends to be) that we put into our own code. If you think of it as a coworker's code, then it's usually alright i.e. you wouldn't be wildly impressed with that coworker but it would also not be bad enough to raise a stink.
It follows from this that it also depends on how you regard the codebase that you're working on. Do you think of it as a personal masterpiece or is it some mishmash camel by committee as the codebases at work tend to be?
This means it can do anything in the VM, install dependencies, etc... So far, it managed to bork the VM once (unbootable), I could have spent a bit of time figuring out what happened but I had a script to rebuild the VM so didn't bother. To be entirely fair to claude, the VM runs arch linux which is definitely easier to break than other distros.
Maybe that's because the harness maybe (not sure; haven't looked at their source code) has it baked in? Doesn't matter; the point is that it works.
Now, the one thing I heavily dislike is the UI it generates...it doesn't seem to realize that matching UI patterns with the existing codebase is quite important.
Because it is that uneven. Some problems it nails at first go or with very little cosmetic changes.
In others it decides on solution, hallucinates parts that do not exist like adding API calls or config options that do not exists and gets the basics wrong.
Similarly you do something that's somewhat common pattern, it usually nails it. If you do something that subtly differs in certain way from a common pattern, it will just do the common pattern and you get something wrong.
It's bitten me several times at work, and I rather not waste any more of my limited time doing the re-prompt -> modify code manually cycle. I'm capable of doing this myself.
It's great for the simple tasks tho, most feature work are simple tasks IMO. They were only "costly" in the sense that it took a while to previously read the code, find appropriate changes, create tests for appropriate changes, etc. LLMs reduce that cycle of work, but that type of work in general isn't the majority of my time at my job.
I've worked at feature factories before, it's hell. I can't imagine how much more hell it has become since the introduction of these tools.
Feature factories treat devs as literal assembly line machines, output is the only thing that matters not quality. Having it mass induced because of these tools is just so shitty to workers.
I fully expect a backlash in the upcoming years.
---
My only Q to the OP of this thread is what kind of teacher they are, because if you teach people anything about software while admitting that you no longer write code because it's not profitable (big LOL at caring about money over people) is just beyond pathetic.
The view of Claude on HN is extremely positive and nearly every thread will have highly positive comment "that is not an ad".
I think people are seeing others just irked by the constant stream what feels like ads and reading it as Claude being somehow disliked.
I don't measure my productivity, but I see it in the sort of tasks I tackle after years of waiting. It's especially good at tedious tasks like turning 100 markdown files into 5 json files and updating the code that reads them, for example.
Are you working more on operational stuff or on "long-running product" stuff?
My personal headcanon: this tooling works well when built on simple patterns, and can handle complex work. This tooling has also been not great at coming up with new patterns, and if left unsupervised will totally make up new patterns that are going to go south very quickly. With that lens, I find myself just rewriting what Claude gives me in a good number of cases.
I sometimes race the robot and beat the robot at doing a change. I am "cheating" I guess cuz I know what I want already in many cases and it has to find things first but... I think the futzing fraction[0] is underestimated for some people.
And like in the "perils of laziness lost"[1] essay... I think that sometimes the machine trying too hard just offends my sensibilities. Why are you doing 3 things instead of just doing the one thing!
One might say "but it fixes it after it's corrected"... but I already go through this annoying "no don't do A,B, C just do A, yes just that it's fine" flow when working with coworkers, and it's annoying there too!
"Claude writes thorough tests" is also its own micro-mess here, because while guided test creation works very well for me, giving it any leeway in creativity leads to so many "test that foo + bar == bar + foo" tests. Applying skepticism to utility of tests is important, because it's part of the feedback loop. And I'm finding lots of the test to be mainly useful as a way to get all the imports I need in.
If we have all these machines doing this work for us, in theory average code quality should be able to go up. After all we're more capable! I think a lot of people have been using it in a "well most of the time it hits near the average" way, but depending on how you work there you might drag down your average.
[0]: https://blog.glyph.im/2025/08/futzing-fraction.html [1]: https://bcantrill.dtrace.org/2026/04/12/the-peril-of-lazines...
The first (and maybe even second) usage of a gnarly, badly thought out pattern might work fine. But you're only a couple steps away from if statement soup. And in the world where your agent's life is built around "getting the tests to pass", you can quickly find it doing _very_ gnarly things to "fix" issues.
I think you're likely in the silent majority. LLMs do some stupid things, but when they work it's amazing and it far outweighs the negatives IMHO, and they're getting better by leaps and bounds.
I respect some of the complaints against them (plagiarism, censorship, gatekeeping, truth/bias, data center arms race, crawler behavior, etc.), but I think LLMs are a leap forward for mankind (hopefully). A Young Lady's Illustrated Primer for everyone. An entirely new computing interface.
Much like giving a codebase to a newbie developer, whatever patterns exist will proliferate and the lack of good patterns means that patterns will just be made up in an ad-hoc and messy way.
I've been doing a greenfield project with Claude recently. The initial prototype worked but was very ugly (repeated duplicate boilerplate code, a few methods doing the same exact thing, poor isolation between classes)... I was very much tempted to rewrite it on my own. This time, I decided to try and get it to refactor so get the target architecture and fix those code quality issues, it's possible but it's very much like pulling teeths... I use plan mode, we have multiple round of reviews on a plan (that started based on me explaining what I expect), then it implements 95% of it but doesn't realize that some parts of it were not implemented... It reminds me of my experience mentoring a junior employee except that claude code is both more eager (jumping into implementation before understanding the problem), much faster at doing things and dumber.
That said, I've seen codebases created by humans that were as bad or worse than what claude produced when doing prototype.
I can’t wait for all the future vibe coded projects to be exploited by the black hats waiting in the shadows for things to reach a critical state. I don’t believe in anthropic because they love to lie.
I'm fascinated by this question.
I think the first two sections of this article point towards an answer: https://aphyr.com/posts/412-the-future-of-everything-is-lies...
I've personally had radically different experiences working on different projects, different features within the same project, etc.
This is the problem.
I think there is a huge gap between people on salaries getting effectively more responsibility by being given spend that they otherwise would not have had and people hustling on projects on their own.
Yes it is 100% what I use but I am never happy with usage. It burns up by sub fast and there is little feelings of control. Experiments like using lower tier models are hard to understand in reality. Graphify might work or it might not. I have no idea.
1. Is a product/software you develop novel? As in does it do something useful and unique? Or it's a product that already exists in many varietes and yours is just "one of ..."?
2. What if one day, LLMs will get regulated/become terrible/raise prices above your budget. Do you have plans for that?
2. Regulation? I'm sceptical that the cat can be put back into the bag. It's already out there. More realistic problem is the business model part - openweight/local provides a counterpoint to that.
1. Even really novel projects have large chunks of glue code and boring infrastructure that the novel bits depend on. claude means I spend 10% of my time on the borng stuff and 90% of time on stuff I previously onky had 10% of my day to work on. In my experience the software picked up our idioms fast and for context, we have a skill file explaining code standards.
2. codex and gemini are comparable when paired with a good harness (pi.dev). if things ever get really bad, I'll drop 8k on a dedicated agent coding server and run it locally. I tried it recently with my current system and it was sub par but I was running a drasticly simpler model.
Edit: The lurkers and the commenters must be a pretty different set of people I suppose.