Also, it seems very buggy with the visuals. I see weird artifacts.
I agree that I'm not sure what value I'd see, as an employer, in a "visual designer" whose CV rips off someone else's visual design. Much of the design doesn't belong to you (never mind the ROMs used in the Game Boy emulator) so my alarm bells about how much you respect IP would be going off ("this guy's gonna get us sued").
Now, on the other hand, if this were a display of some HTML, CSS, and JavaScript skills, I'd understand more, but then the title OP has given themselves seems off.
These people just destroy their ability to read and understand the systems they're working with. I actually see it as them making themselves redundant. Because if you can't understand anything without Claude, and Claude doesn't even give the right answers, then what are you worth?
Instead of using the LLM to create deterministic tools, we are using LLMs to replace them. It's completely backwards and I don't know why people (especially high ranking people in my company at least) seem to think that this is the way forward. No, I don't want a whole CI pipeline that is just LLM prompts. Yes it's very easy, but it's expensive, slow and prone to failure in ways you can't even predict.
Same thing with using LLMs for the code review process. What would have been a simple linting rule is now a pass with an LLM, rather than using the LLM to create the linting rule, which it is absolutely excellent at creating.
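To make "a simple linting rule" concrete, here is a minimal Go sketch of a deterministic check built on the standard `go/parser` and `go/ast` packages. The rule itself (flagging `fmt.Println` calls) and the name `findPrintln` are hypothetical, chosen only for illustration; the match is purely syntactic and does not resolve what the identifier `fmt` actually refers to.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findPrintln returns the line number of every call written as fmt.Println
// in src. It is a syntactic check only: it does not resolve imports.
func findPrintln(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // not a call; keep walking
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true // a call, but not of the form pkg.Func
		}
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "fmt" && sel.Sel.Name == "Println" {
			lines = append(lines, fset.Position(call.Pos()).Line)
		}
		return true
	})
	return lines, nil
}

func main() {
	src := "package p\n\nimport \"fmt\"\n\nfunc f() {\n\tfmt.Println(\"debug\")\n}\n"
	lines, err := findPrintln(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("fmt.Println found on lines:", lines)
}
```

Once written, a check like this runs in milliseconds and gives the same answer every time, which is the property the comment is arguing for.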
Yes, and we're also seeing lots of companies claiming they're using "AI" and it's just deterministic under the hood.
One of the first things we did when we got access to AI was make a Jira MCP. I try not to touch Jira anymore. I get Claude to just create the Jira issues, write comments, create subtasks, link issues together, etc.
I used to dread having to investigate how to implement something and break it down into tasks because the more granular I broke things down, the more Jira issues I had to create to capture each task. Now I can just write everything up in a file and send an LLM to do all the Jira crap.
Someone spends months or years of their life dedicated to writing a book. And people celebrate the fact they can get it for free, justify it by saying it's not free to search or host this content and offer to donate to piracy sites.
Rather than... Just supporting the author and buying their book?
It's different when this is American education and you're effectively being forced to buy books otherwise. I can understand fighting against that. But most stuff on the archive isn't that. It's just plain old piracy.
Yes a PDF or epub doesn't cost money to "print". Yes no one is "losing" money. But this isn't Netflix or Hollywood, who are still making billions regardless of piracy. Most of these authors are just regular people.
And the whole preservation angle makes sense when the books are no longer for sale. It's hard to argue preservation when you're linking to or hosting these works the second they are available to download. I'd be much more inclined to support projects that time-walled the data, so you could effectively argue it's for preservation.
Because we broke copyright. There is room to quibble about exactly where and when, but the result is quite clear. The best summation I know of is from a speech by Thomas Babington Macaulay in the British House of Commons in 1841[1],
"At present the holder of copyright has the public feeling on his side. Those who invade copyright are regarded as knaves who take the bread out of the mouths of deserving men. Everybody is well pleased to see them restrained by the law, and compelled to refund their ill-gotten gains. No tradesman of good repute will have anything to do with such disgraceful transactions. Pass this law: and that feeling is at an end. Men very different from the present race of piratical booksellers will soon infringe this intolerable monopoly. Great masses of capital will be constantly employed in the violation of the law. Every art will be employed to evade legal pursuit; and the whole nation will be in the plot. On which side indeed should the public sympathy be when the question is whether some book as popular as Robinson Crusoe, or the Pilgrim's Progress, shall be in every cottage, or whether it shall be confined to the libraries of the rich for the advantage of the great-grandson of a bookseller who, a hundred years before, drove a hard bargain for the copyright with the author when in great distress? Remember too that, when once it ceases to be considered as wrong and discreditable to invade literary property, no person can say where the invasion will stop. The public seldom makes nice distinctions. The wholesome copyright which now exists will share in the disgrace and danger of the new copyright which you are about to create. And you will find that, in attempting to impose unreasonable restraints on the reprinting of the works of the dead, you have, to a great extent, annulled those restraints which now prevent men from pillaging and defrauding the living."
Are libraries unethical to use? You can go to your library and read books without paying for them.
Libraries aren't unethical, because they're just letting you borrow stock of books. There's practical limits on how it scales, and any impatient users might just buy the book. Once you can infinitely duplicate a work, it's not borrowing.
So what? I think, if you read a good book, learn something or are well-entertained, it's a positive externality, so there is no problem with people doing it for free.
The only real issue with IP piracy is when someone gets money by copying the works. Which were originally the cases copyright tried to prevent.
Maybe you can clarify why you see people doing these things for free as a problem, when there is a net benefit to society and also to you.
When people around me ask about how to "get into reading" I tell them to just find stuff they like online (via AA) or at the library and go from there. If you don't pay initially you don't feel as bad about trying things that may be "bad" or that you aren't interested in.
Publishers aren't just stealing money that should go to authors. We can debate percentages and such, but buying a book also pays the editors (who any author will tell you are just as important to a book as they are), the typesetters, the designers, etc.
Moreover, many respected academic publishers no longer provide proofreading or typesetting: they expect the authors or editors to commission their own proofreading, and the editors to just send in a PDF with camera-ready output.
For monographs, the “editor” that the publisher provides is only there to guide the author in producing their own camera-ready output, and does not actually do any work on the contents of the book. The publisher will hand off the manuscript to 1–2 peer reviewers, but those peer reviewers are unpaid.
In the more indie fantasy scene authors often pay for editing themselves out of pocket. Often the only "publisher" they can get is direct publishing through Kindle, which then locks them into exclusivity with Kindle/Amazon. It's frankly disgusting but it's a way to help them get paid. I'd rather kick these people $20-50 directly than do anything else.
There's been a reasonable amount of research that suggests that piracy doesn't really cannibalise sales from those who can afford to pay.
But I do agree that for some of their categories a time wall would improve their optics.
There's also the fact that just because something is available to purchase in one country doesn't mean it's available in other countries. A lot of movies/books/games/etc are geo-restricted in sale, with many countries having no valid methods to acquire them.
The best (but unrealistic) solution would be for people who can purchase legally to do so, while leaving it available for download for everyone else.
Academics have never really made any money off their published research, but rather are paid via their institutions or grants. The publishers make money, but academics themselves are aghast at the publishers taking their edited collections and monographs, doing no proofreading or even no typesetting (that obligation is often on the authors and editors now), and selling the book for hundreds of euro. That’s why authors will almost always send you the PDF for free if you email them.
The celebration is easy to understand if you are a researcher. Getting ahold of publications that your institution doesn’t hold or subscribe to is always a hassle, it really slows you down during the writing process. The shadow libraries turbocharge research. Over the last several years, shadow libraries have gone from a niche to something that pretty much everyone in my field uses daily.
The normal distribution of music and stories was for others to repeat them, and only recently have we decided it's illegal. I understand that things are different now, and people make a living off of art, but at the same time I find it difficult to care too much for someone who chose to make their hobby their job and refuses to adapt when things change.
And it seems that piracy has become a net benefit to new and niche artists. (https://www.sciencedirect.com/science/article/abs/pii/S01676...)
I'd posit that the book industry will turn out to be the same. Piracy will harm the bottom line of the companies already at the top while giving exposure to the authors at the bottom. The latter are the ones who are often strong-armed into terrible financial deals just to gain access to the book industry's four big gatekeepers, and who likely need that exposure to help keep a roof over their heads.
Anecdotally, I'm one of those folks who end up purchasing many of the books I pirate or otherwise obtain for free, and I'm sure I'm not the only one who does this.
Like, no it doesn't seem like very high quality work... It just seems like a vibe coded tool.
Edit: yes it's wrapping Claude. It's BREAKING the TUI. Not sure what people aren't getting here...
The problem with being such a naysayer is that you're entirely disconnected from what's going on. You haven't tried an agent like Claude Code and experienced it for yourself, so you don't recognise what it looks like when it's in front of you.
1) This tool breaks the Claude TUI. Exactly as described by the comment.
2) The Claude TUI itself is broken. The comment is wrong, but assuming the "billion dollar TUI product" is capable of basic rendering and it's the wrapper that broke it, that is an entirely reasonable assumption
The fun here is that both of these pieces of software were made extensively using AI. No matter which of our options is the case here, the point stands. An AI-built product was shown, and it looks obviously ass.
Claude Code correctly reduces its display to 7-bit ASCII in response (still functional, although less pretty). Once I get around to fixing this, it will probably result in another section in https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe...
Edit: Looks like it's the terminal. That's a rabbit hole for another day.
Running through VS Code's terminal via VSCode tunnel, it looks like it normally does.
There's one major reason to have higher expectations for autonomous systems (of all kinds, not just LLM-powered) than for humans, at least those intended to be deployed at scale, and that's the scale. If a human makes a mistake, has biases, or even intentionally breaks the rules, the impact of their actions is limited by the nature of them being a human, whereas something like an autonomous driving system, a coding agent, etc. is intended to be deployed by the thousands, millions, or more, and any problematic behaviors happen at that scale.
There are obviously millions of bad drivers out there, but every one of the human ones is bad in different ways. If Waymo pushes a bad update there could be tens of thousands of "drivers" that suddenly become bad in identical ways.
Humans also have the ability to learn from our mistakes. The ones you'd want to have working for you usually don't make the same one twice. LLMs are pretty good at making the same mistake repeatedly, even the simplest things like basic math or counting letters.
Zero defects? Because you can always find at least one defect. But people don't naturally think statistically, so they grasp the thing that confirms their bias and then hang on tenaciously.
You'll notice the incredible amount of vitriol resulting from a purely cosmetic bug (which, it turns out, results from a missing TERM env in the base image - Claude is very conservative when it can't determine utf-8 support with 100% certainty).
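For illustration only, a conservative check of this kind might look like the Go sketch below. This is not Claude Code's actual detection logic, which isn't shown in the thread; `supportsUTF8` and `boxCorners` are hypothetical names, and the rule (require a usable TERM and a locale that names UTF-8, otherwise fall back to ASCII) is an assumption about what "conservative" could mean here.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// supportsUTF8 is a deliberately conservative guess: it reports true only
// when TERM is set to something other than "dumb" AND the effective locale
// names UTF-8. Anything uncertain is treated as "no".
func supportsUTF8(getenv func(string) string) bool {
	term := getenv("TERM")
	if term == "" || term == "dumb" {
		return false // e.g. a base image that never sets TERM
	}
	// POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
	for _, key := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := getenv(key); v != "" {
			u := strings.ToUpper(v)
			return strings.Contains(u, "UTF-8") || strings.Contains(u, "UTF8")
		}
	}
	return false
}

// boxCorners picks box-drawing glyphs, falling back to 7-bit ASCII.
func boxCorners(utf8 bool) string {
	if utf8 {
		return "┌┐└┘"
	}
	return "++++"
}

func main() {
	fmt.Println(boxCorners(supportsUTF8(os.Getenv)))
}
```

Passing the lookup function in as a parameter keeps the check testable without touching the real environment.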
> The Claude TUI itself is broken.
I mean this is also true. You forgot the third option, that 1 and 2 are both true (and a 4th, that neither is).
Seriously, the Claude TUI fucking sucks. I don't know how anyone thinks otherwise. It breaks constantly if you enter your editor (<C-g>), resize windows/panes, make another pane full screen, scroll, or do any number of other things. It is objectively a bad piece of software.
And honestly, are we surprised? Anthropic says themselves that a lot of code is written by Claude. They've been saying that for years. If you look at agents now and think "man, agents a few years ago sucked" then this shouldn't be surprising at all! I mean FFS the thing spits out text and they designed it like a fucking game engine. It is silly
I don't know what the project is. All I see is a TUI that looks completely broken.
Go and use Claude Code right now. Does it look like that? Random underscores all over the page. No it doesn't.
His tool wraps Claude and breaks the TUI. What's so hard to understand?
That's valid critique. What world have I woke up in today?
> The question is why are you so eager to give critique on unrelated work, appearing in a demo screencap, to someone who didn't produce it?
I guess the question was actually, why were you so eager to critique a critique based on a false assumption?
I wish people would be careful what they support with their rhetoric.
That is not the question. The topic of discussion had been defined multiple times before you commented!
That's like blaming the company making hammers because you're unable to build a lasting house with the hammer, it really isn't up to Anthropic, but all about how you use the tool you're holding.
Microsoft is pretty shit at launching products, does that mean "products" as a concept is wrong? No, it just means Microsoft is bad at products, not more than that. Not sure why you have to extrapolate over an entire ecosystem just because one actor is bad at something.
I wouldn't trust a toolmaker who doesn't know how to use the tools decently.
I agree but would extend that qualification:
I wouldn't trust a toolmaker who doesn't know how to use the tools decently for exactly the same field of expertise.
> No, it just means Microsoft is bad at products
FYI, that's what people are saying...
And if that's not true, then it's quite literally about how you're holding this hammer.
Just because the naked cowboy can paint well with just his penis, doesn't mean a penis is the right tool for painting. It doesn't matter how you hold your penis, it's not the right tool.
I can't decide which joke to make, either (little dick joke) "well yeah you'd have to be able to see your paintbrush in order to use it" or (big dick joke) "well yeah, if you can't even hold it in two hands, how are you supposed to paint with it?" so I'll just make both :-D
It is reasonable to both use the right tool for the right job, and demand better tools than you currently have. Success with the wrong tool in the wrong job doesn't mean it's the right tool for the right job.
Ok, I agree with this, don't use the wrong tool for the wrong job.
> It is reasonable to both use the right tool for the right job, and demand better tools than you currently have. Success with the wrong tool in the wrong job doesn't mean it's the right tool for the right job.
Yes, I agree with this too.
I'm still not sure how this relates to LLMs, and in particular to this specific context. I claimed that the output of your agents depends on the developer driving them. You're saying "not every tool is right for every job", and I agree with this too, but is that against or for what I said?
Could you just clearly write out exactly what you're arguing for here, no analogies or metaphors, just plain and simple, because I still feel like we're having two different conversations.
I'm sorry, but you need to look yourself in the mirror. You didn't like what they said so you jumped to the assumption that they must not have used CC (or any other agent). That if they had, they would have the same experience as you did/do. But this whole thread is exactly that conversation, that those experiences aren't shared. That this assumption is baseless. And you know what? That's okay. We're not robots. We're human. Each of us has our own unique world we live in. It's okay that people don't have the same experience as you. It's okay that their favorite color, food, activity, or whatever isn't the same as yours. I'm glad that we live in that kind of world. That's what makes things like culture. I don't want to live in a hive mind, and I don't think anyone else does either.
> I don't want to offend (it's AI coded anyway :)) but that does not scream "high quality" to me.
Honestly, I think this is where the big divide is. People have massively different opinions on what "quality" is. Which is okay, but it feels like everyone is working under some assumption that quality is this very clear objective measure that we all agree on. Clearly we don't. We didn't before AI and well... if you can't tell that we don't with AI... you need to take a step back.
FWIW, I agree with Philip here. I don't think this screams "high quality" to me. I'm also not trying to take a shit on your project. Nothing screams "terrible" to me, but yeah, it does look a bit sloppy. There's no polish to it. It looks like someone that grades on "it works" and that's fine. But it also isn't everyone's cup of tea. Where the sloppiness comes in is like what Philip said. First thing I saw was the gif and well... I think Claude Code is sloppy. But this is also a great example of how and where LLMs visibly fail. Creating a box in text is pretty simple. There's tons of tools to do it. And the LLM 100% knows about characters like ⌜⌝⌞⌟⎜, it just doesn't use them and doesn't care. The code itself also looks very LLM generated.
It's fine and I don't think you have any reason to be ashamed of it, but I also wouldn't go around boasting that it is an example of high quality work too. And FWIW, I can't think of a single heavily LLM assisted code where I don't have similar feelings. I've seen stuff with more polish, but yeah, they feel off.
> TUI
This is a space I feel weird in. I love the terminal. I love that there's a lot of new TUIs. But it also feels very weird because it is extremely clear that a lot of these new TUIs were written by people (or machines) that don't really have a lot of experience in the terminal itself. There's a real shared language among people like me who live in the CLI. There's a reason people like me can pick up a new tool and guess certain flags and certain ways to use them. It's because of a shared design language that we know of, and we end up writing that way because we know it reduces the cognitive load on our peers. But the LLMs? They don't have that shared experience.
I think this is true for a lot of stuff, not just TUIs or bash tools. Things just smell... off...
That's not what this product is; merely a tool it uses.
I also strongly suspect that you'd only taken a cursory glance at the top of the readme prior to passing judgment.
Now it was a long time ago I did Go professionally, but I'm also in the camp of "That doesn't really count as high-quality", although I know for a fact you can get quality code out of LLMs, but I don't think that's a good showcase of that.
Really? What duplication did you actually find? I count a few small ones in buildMounts and ReadPrompt, maybe 20 lines or so, but hardly anything worthy of such an epithet.
Admittedly, the parsing & escaping code and some utility functions could be moved outside to shrink the file, but otherwise I'm having trouble finding issues with the code.
Look for slight variations of the same thing but with different paths, variables, or modes and I think you'd be able to spot the rest as well.
But people are so quick to label their vibe-coded codebase as high quality and no grace is going to be given to a machine.
What comments are you seeing that are calling code from humans high-quality?
Because the end result is people committing bad code. For some random hobby project, sure who cares. But people are using this at work. The codebase is rotting in a new innovative way.
Either the bar has to be set at "actually good code comes out of vibe coding" or you have to accept that codebases are going to steadily become less usable by human coders who use their fingers to type in emacs.
Suddenly every dev needs an agent to even work with the slop. Seems like an outcome Anthropic would love though....
AI code is competent, but it's not great or high quality unless you have a good enough eye for quality to steer it with an iron hand. But if you do, you know the quality comes from proper guidance, so you still wouldn't say AI code is great. If you do say exactly that, it comes across as having low standards (which is fine if you own it) and people are going to jump on that just to bring you down a peg.
Because that is literally the hype being fed to us by the marketers at the AI companies and HN users promoting AI.
- AI promoters: "AI is doing Ph.D level work! LLMs are not just a token predictor, it is actually thinking and reasoning! It will replace all developers, including _you_, so get on board the AI hype train now!"
- AI promoters when confronted with blatant mistakes and reasoning errors from cutting edge models: "Why are you holding LLMs up to higher standards than humans? That's not fair or reasonable."
Be surprised then, because I, the one who left the critique, have probably programmed exclusively with agents for the last year or so, so it's unlikely that I think the code is bad because I "don't like AI". I don't love it either, but I wouldn't call myself an AI-hater by any measure; it would be weird to write articles like this if so: https://emsh.cat/en/one-human-one-agent-one-browser/
E.g.
https://github.com/kstenerud/yoloai/blob/main/internal/fileu... <- that recursively creates directories, but will only change permissions on the innermost dir (user may be unable to cd into intermediary directories)
https://github.com/kstenerud/yoloai/blob/main/internal/mcpsr... <- all the json.Marshal calls in this file just suppress errors, so if anything un-marshallable ends up in there the app will return empty strings with no errors logged
https://github.com/kstenerud/yoloai/blob/main/runtime/regist... <- `Register` embeds a copy of the code from `IsAvailable` because of the locking; that could be replaced with a private `isAvailable` that has no locking that both use (after doing their own locking)
https://github.com/kstenerud/yoloai/blob/main/runtime/exec.g... <- these functions are identical except for the strings.Trim, one should just call the other and then trim the output
Just out of curiosity, I enabled some other linters and it looks bad. Excluding test files, there are 110 functions with a cyclomatic complexity over 10 and 7 that are _over 50_. The worst is at 86, which is mind-boggling.
Could probably find more, but you get the drift. I'm sure it runs, but stylistically this is more along the lines of what I would expect an intern to do.
This is also sort of nit-picky, but like half the stuff in https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe... isn't idiosyncratic, it's just the way those things work, and a lot of them aren't even tricky. The one linked is particularly blatant; that's not limited to os.Stat, that's literally just how permissions work. Denying permission on inodes is a property of the folder, not the file.
Can't you see in the gif? It's completely broken. My Claude doesn't look like that. Neither does anyone else's.
Likely there are some terminal caps that aren't being properly preserved inside of the sandbox. It's never bothered me since the agent itself works fine.
"It's never bothered me". Cool. But your tool is bugged.
Or feel free to avoid the tool entirely if this UI issue shakes your faith in its overall quality down to its very foundations.
This is hardly a hill to die on.
You claimed high quality and provided a repo.
Did you not expect someone to actually look and critique it?
Whether the visual bugs are a deal breaker or not isn’t the point.
The point is that’s not high quality code, it may work. But it’s not code I would ship at my job and therefore it’s not high enough quality for anyone serious
But I still stand by the quality of my code, including here. You and I don't need to agree.
What decades of managing codebases (public and private, huge and small) has taught me is that there will always be an endless list of bugs and feature ideas and nice-to-haves and technical debt pressures in any given project. You'll never get to them all, so you prioritize (as I have done here). Functional bugs usually trump visual ones unless they're actually interfering with work.
Will I fix this bug? Probably, now that I'm aware of it. But there are more important matters to attend to first.
Edit: Turns out the bug comes from a mismatch with the terminal I'm using. With other terminals it looks fine. Term caps are surprisingly complicated, especially when you have multiple layers!
You aren’t having a disagreement with a person. You’re having a disagreement with reality.
How so? Are you going to instruct us all on how a termcaps mismatch bug is an indicator of poor code quality, rather than an unfortunate bug emerging from within the chaos of the many layers of disparate technologies that must somehow be stitched together (along with their idiosyncrasies) in order to make a project like this work?
You had a visual bug right at the top of the repo's README. Then insisted you hadn’t noticed it before.
What's important is not that specific visual bug, it’s what that bug says about the rest of the code.
How can we believe that this code is high quality if we see a glaring issue 5 seconds into opening the github?
We didn’t seek out your repo and start lobbing critiques at it. YOU POSTED IT as an example of high quality generated code. I’m telling you I am unimpressed
Really? So the discussion leading to the theory that there's likely a problem with termcaps disparity between layers didn't happen?
> What's important is not that specific visual bug, it’s what that bug says about the rest of the code.
Really? So you can tell from a single cosmetic bug which doesn't affect its ability to perform its task, that the rest of the codebase is deficient? That's a pretty damn impressive skill!
Haters gonna hate, I guess ¯\_(ツ)_/¯
The otherwise timid pack always circles after they sense a single drop of blood, no matter how small and insignificant.
Thanks for explaining it for me.
https://www.star-history.com/?repos=kstenerud%2Fyoloai&type=...
Also this reminds me of a principle I learned from a mentor. "People are visual buyers. If it looks good, people will think the code is good."
Unfortunately it doesn't matter whose fault the janky TUI is, people will see that and associate it with your software.
Early stage products will have some rough edges. We've seen that in Docker, Kubernetes, AWS, Azure, LXC, KVM, etc. And people griped and raged about the sheer incompetence of the maintainers and utter lack of quality, but they still used those tools even before the rough edges were polished away and folks finally settled down.
The less one pays for something, the more entitled one feels to whinge and heap on abuse.
I've been down this road so much now that it's no biggie if a few Karens want to blow off steam at my expense. I'm not above exposing their silliness though ;-)
Is your product really the same complexity as these?
Is it doing it to the same scale? No - it's a single user app. But have a look at https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe... and you'll see the kind of shit a project like this has to handle. It's not trivial.
I regularly get pieces of work some product guy has thought up in an afternoon. They only care about the happy path, and sometimes only part of the happy path. I work for a global company that has to abide by rules and regulations in each country we operate in. The product guy thinks up some feature, we implement the feature, then we're told "actually, we legally aren't allowed to do this in 90% of the markets we operate in". Cool, so we add an ability to disable it in those markets. Then they come back "We can do this in some of those markets if it's implemented with [regulatory bureaucracy], so can you do that please".
Then we have to hack away at the solution because the deadline is right around the corner.
This is not software engineering! None of this is related to the software. The job of a software engineer is to take a list of requirements and figure out the way we accomplish those requirements. Requirements gathering is NOT a software engineering problem. Software is implementation, product is behaviour. That's the split. The behaviour of the thing we're building needs to be known before we even try to seriously build it.
If someone had just held back for a week and done their due diligence, we would have been able to architect a solution that is scalable, extensible, easy to maintain and can make the future easier.
That's a theory but I've never seen this work in practice. A piece of software is unique. If it weren't, we'd just use the cp command.
What usually happens is you get a set of requirements that looks simple. Then you start thinking about a design and see 10 different possibilities, each corresponding to a slightly different interpretation of the requirements set. You iterate a few times, reviewing the designs with whoever set the requirements and a few peers, and see more possible variations of the requirements. You need to double check its parent requirements up to the master requirements. Then you need to make time/feature/quality tradeoffs, affecting the fulfillment of requirements.
Once you start implementing, you see dependencies on other software (framework, sdk, drivers, language features,...) and understand that the other software is not what you thought, or has bugs. Or you see an issue with performance or see that one particular feature becomes unfeasible.
That's where all the complexity goes. AI doesn't change that, but can make prototyping iterations and bug hunting faster, as long as someone holds it on a leash and understands its decisions.
It has to be someone's job to push back on the Product Guy's stupid idea and answer all the awkward questions about the not-so-happy path with it. Unfortunately, because of the way we've ended up with this process, that person is often the engineer tasked with building it, without any effective political power to challenge the design process.
If there is a "hierarchy" where product managers are seen as superiors to software development, i.e. where product managers decide what to do and then only delegate the implementation to software developers, that product will invariably fail. Don't do that.
It's not the software engineer's duty to know in which regulatory environments a given product is legal. That is something that must be hashed out upstream, well before tasking somebody to write a program.
Granted, an expert engineer with strong domain knowledge could be aware of those kinds of pitfalls, and offer insights during the product development phase. But again, that should be done before committing to a schedule or making implementation decisions.
And it is a given that not all requirements will have been discovered before development starts, or that they may change during development.
My senior year software engineering class had a whole section on requirements gathering.
Do existing companies run entire end-to-end product integration tests on every single change they make to a repo to make sure something hasn't broken? No, they just architect things in a way such that a minor change to something can be tested in isolation. And that can be automated, deterministically and efficiently.
Where I work we can release changes to our production site in minutes almost completely autonomously with high confidence with absolutely zero AI agents in the loop. How did we do it? With lessons learned from the past 5 decades of professional software development experience.
Let's not forget what OpenClaw is at its core. It's a glorified cron scheduler. Why on earth does any of this effort need to exist? It's not that deep, it's not that complex, it's all AI for AI's sake.
I run it in a firewalled VM and am very conscious about any tokens I give it access to - so far for all I know this was unnecessary.
PS. for me the core feature of OpenClaw isn't the cron, though that is nice. It's the memory and instant extensibility. Like it takes 5-15 minutes to add an SSH tool where all agent requests go through a manual review, together with a good auto loaded description that just works in all future sessions.
This is clearly an implementation and not a conceptual issue, as I had none of these issues using the same model with Hermes, for example.
Yes, that is _exactly_ the problem that is being solved. Is it easier to spin up some LLMs or pay a team of experienced engineers?
As inference costs fall, which will be cheaper?
Opencode has the same problems. They often do multiple releases of that app a day, yet within the span of a week or two I have had to update my config because some random change has altered the behaviour and my permissions broke. Or I've noticed the way the app renders is suddenly different.
Yet, my day to day usage has barely changed since the version I installed last year. It's like everything changes but nothing changes.