boscillator

Born on March 11, 2023•134 Karma

18 hours ago

•on: Springer Nature has removed two studies by Max Pla...

> detailed information about specific retractions is usually confidential and can only be shared with the relevant authors.

Good luck sharing that information with Max Planck. It's amazing how robotically humans can act sometimes. I suppose this could be an AI or automated response, but it's just as likely it's someone following the letter of the law without using any critical thought.

fhdkweig•

17 hours ago

I think this is a good example of Kafkaesque.

poizan42•

16 hours ago

I really wish they would have asked the representative to confirm that they can only share detailed information with the skeletal remains of an author who died 78 years ago. Not that I think it would make any difference, but it would force the representative to acknowledge the absurdity of the situation.

boscillator•

3 days ago

•on: The Coming Loop

> the right fix is not "handle every malformed case." ... [LLMs] will still attempt to handle now impossible errors.

This is the number one code smell from LLMs and I don't know why they are so obsessed with it. In Python, it often comes as `hasattr` checks on types that are defined to have that attribute, in a code base that is fully type-checked.

Why do they do that? Is it from pre-training or reinforcement? If the latter, can the labs please fix this?
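
For concreteness, here is a minimal sketch of the pattern, using a hypothetical `User` dataclass (the type and function names are made up for illustration and come from no real code base):

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical type: `email` is a required, typed field.
    name: str
    email: str


def contact_defensive(user: User) -> str:
    # The smell: every `User` has `email`, so the fallback branch is
    # unreachable in type-checked code and only hides future bugs.
    if hasattr(user, "email"):
        return user.email
    return ""


def contact_direct(user: User) -> str:
    # Trusting the declared type: if `email` were ever removed, a type
    # checker would flag this line instead of silently returning "".
    return user.email
```

Both functions behave identically for any real `User`; the difference is that the second one fails loudly if the type ever changes.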

rzmmm•

3 days ago

Likely just that they err on the side of unnecessary error handling rather than missing error handling. They likely penalize runtime errors harshly in training.

jerf•

3 days ago

I suspect it's mostly the training data. I am also on team "make illegal states unrepresentable". It may get talked about a lot on HN, but I'm still at the point that I'm surprised when I see a code base that I didn't write in the wild that does a really good job of it, either open source or at work. Most programmers still think in terms of picking up pieces and fixing errors at the point where the error message pops out rather than making it so the error can't happen and the data reflects that.

I say "mostly" because I think there's also a problem with AIs thinking this way in their current state. That last level of human understanding of a code base, where the human holistically understands the flow of those guarantees, is a challenge to give them right now. On the raw code level, this sort of thing often involves enough code to easily blow out their context window. Trying to summarize it in memories-style files has its own problems; just because there is text written down about the guarantees doesn't mean that the AI is going to get the right info out of it, any more than a human might from just reading the code. I won't say it's "impossible" to give an AI this understanding because I'm not sure it is, but it is a level of understanding of the code that even if you get them to have it, their practices tend to fight against it.

My own solution to this problem has largely been to give up on them getting this. I prompt a solution to the problem the way that most people do, then if I want to make illegal states unrepresentable I prompt the AI through the process of the necessary refactorings, unless it's so small that I just do it myself. Given a lot of code that uses maps/dicts and arrays and strings and ints, if you prompt it through making those more thoroughly typed, it's actually pretty good at it. I've not had a lot of luck getting good designs out of single prompts, even when I get detailed. Treating it as two separate tasks seems to work out well.

And watch the diffs on the types carefully; AI loves to sneak past a ".JustSetItAndIgnoreAllThePreAndPostConditions(string)" method. After all, I suspect there's plenty of training data of "types that are nicely structured to make error states unrepresentable and then a later maintainer came along and added a 'JustEffingDoIt' method that broke everything" in the field. One of the best defenses is to make sure that the type implementing these things is in its own file and you can easily look at all the methods it adds on those types and smack it when it does that. I've tried slathering warnings about not doing this and explaining the pre- and post-conditions being maintained in the docs but the change seems marginal.
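
A minimal sketch of what "make illegal states unrepresentable" can look like in Python, assuming a hypothetical order type with `Pending` and `Shipped` states (all names here are illustrative, not from any code base discussed above):

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pending:
    # No tracking number exists yet, so there is no field for one.
    pass


@dataclass(frozen=True)
class Shipped:
    tracking_number: str

    def __post_init__(self) -> None:
        # The invariant is checked once, at construction time.
        if not self.tracking_number:
            raise ValueError("a shipped order needs a tracking number")


OrderState = Union[Pending, Shipped]


def ship(state: Pending, tracking_number: str) -> Shipped:
    # The only transition into Shipped; it requires a Pending order.
    return Shipped(tracking_number)


def describe(state: OrderState) -> str:
    # No "shipped but tracking number is None" case to handle.
    if isinstance(state, Shipped):
        return f"shipped: {state.tracking_number}"
    return "pending"
```

Note what is absent: the types are frozen and there is no `set_tracking_number(str)` style method that bypasses the check. That kind of method is the addition to watch for in diffs.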

ambicapter•

3 days ago

Because the vast majority of the codebases in its training set aren't fully type-checked, or very clean at all. Or it's just snippets from Stack Overflow, so there's no surrounding context to tell it that null-checking is unnecessary.

skywhopper•

3 days ago

It’s because it matches the patterns they are trained to follow. They don’t understand the code. They can’t reason about the actual logic flow. They can only work with patterns.

efromvt•

3 days ago

million times this - getattr on every dataclass is a wild choice

CuriouslyC•

3 days ago

Sorry to say, but the solution is to stop using Python. The models are trained to code defensively assuming historically representative Python codebases. The models trust the types a lot more in languages where the canonical historical examples trust the types because the language is constructed around that premise.

zahlman•

3 days ago

I would expect a language model to do a better job of coping with that kind of uncertainty, inferring type from name and usage, etc.

boscillator•

7 days ago

•on: Show HN: Modeloop – From visual algorithms to micr...

Mostly unrelated, but what is going on with the "I specifically approve section ... of the terms and conditions" when you sign in without an account? Is this a new requirement somewhere?

Other than that, seems interesting! Simulink could always do with a competitor, although I'm always saying Simulink needs a text-based interface. Same signal flow programming model that supports scopes and continuous-time integrators, just with text instead of drag-and-drop.

lucamark•

7 days ago

To be legally binding and enforceable, they must be explicitly and individually approved by the user. Even when signing in as a guest/without an account, using the app creates a license agreement, so the clickwrap flow must legally capture this specific assent to protect the underlying engine's IP.

Regarding the need for a text-based interface, I totally agree with you! Most of the time you can create models directly within the terminal or with a simple script. That's why we actually provide a CLI that allows you to build and run models. You can initialize models, add blocks with parameters, connect ports, validate the graph, and trigger code generation all from the terminal. Nevertheless, we think the GUI is useful to review the model and inspect it graphically.

boscillator•

16 days ago

•on: RIP software hackathons. Long live the hardware ha...

> A bad design with a good presentation is doomed eventually. A good design with a bad presentation is doomed immediately. - Akin's 20th law of spacecraft design

I always really enjoyed making a slick presentation. It was a lot of fun figuring out how to scope the hardest problem you are sure you can finish in 24hr while still having time to polish your presentation and make the app look good. I find picking a problem that lets you put a big map on the screen helps with the latter.

afavour•

16 days ago

I get that… but that’s basically a startup pitching competition. It’s not a hackathon.

inigyou•

16 days ago

Aren't most hackathons pseudo-startup pitching competitions? At the very least, they've always been about established companies trying to extract value from newcomers.

boscillator•

18 days ago

•on: Dopamine Fracking

Huh, I've been to plenty of places where you could order it for an upcharge and it came in its own little bottle.

boscillator•

26 days ago

•on: Security Envelope Pattern collection – S.E.C.R.E.T

Thomas Pynchon has much to say about this in The Crying of Lot 49.

boscillator•

61 days ago

•on: GoDaddy gave a domain to a stranger without any do...

They show up as the #2 ad spot when you search "register a domain" and most people don't know any better.

boscillator•

5 months ago

•on: Notice of collective action lawsuit against Workda...

It will be fascinating to see the facts of this case, but if it is proven their algorithms are discriminatory, even by accident, I hope Workday is held accountable. Making sure your AI doesn't violate obvious discrimination laws should be basic engineering practice, and the courts should help remind people of that.

zugi•

5 months ago

An AI class that I took decades ago had just a 1 day session on "AI ethics". Somehow despite being short, it was memorable (or maybe because it was short...)

They said ethics demand that any AI that is going to pass judgment on humans must be able to explain its reasoning. An "if-then rule says this," or even "a statistical correlation between A and B indicates that," would be fine. Fundamental fairness requires that if an automated system denies you a loan, a house, or a job, it be able to explain something you can challenge, fix, or at least understand.

LLMs may be able to provide that, but it would have to be carefully built into the system.

nemomarx•

5 months ago

I'm sure you could get an LLM to create a plausible sounding justification for every decision? It might not be related to the real reason, but coming up with text isn't the hard part there surely

zugi•

5 months ago

> I'm sure you could get an LLM to create a plausible sounding justification for every decision.

That's a great point: funny, sad, and true.

My AI class predated LLMs. The implicit assumption was that the explanation had to be correct and verifiable, which may not be achievable with LLMs.

storystarling•

5 months ago

It seems solvable if you treat it as an architecture problem. I've been using LangGraph to force the model to extract and cite evidence before it runs any scoring logic. That creates an audit trail based on the flow rather than just opaque model outputs.
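
A rough sketch of the extract-then-score shape in plain Python. This is not LangGraph code; `Evidence`, `Decision`, `verify`, and `decide` are hypothetical names, and the extractor and scorer are passed in as ordinary functions where a real system would make model calls:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Evidence:
    quote: str  # text the extractor claims to have found
    start: int  # character offset of the quote in the source document


@dataclass(frozen=True)
class Decision:
    score: int
    evidence: List[Evidence]  # the audit trail


def verify(document: str, items: List[Evidence]) -> List[Evidence]:
    # Keep only citations that literally appear at the claimed offset.
    return [
        e for e in items
        if document[e.start:e.start + len(e.quote)] == e.quote
    ]


def decide(
    document: str,
    extract: Callable[[str], List[Evidence]],
    score: Callable[[List[Evidence]], int],
) -> Decision:
    # Step 1: extraction, followed by a mechanical citation check.
    cited = verify(document, extract(document))
    # Step 2: the scorer sees only verified evidence, never the raw text.
    return Decision(score=score(cited), evidence=cited)
```

The check only confirms that each cited quote exists in the document. It says nothing about whether the scorer weighed that evidence fairly.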

fwip•

5 months ago

It's not. If you actually look at any chain-of-thought stuff long enough, you'll see instances where what it delivers directly contradicts the "thoughts."

If your AI is *ist in effect but told not to be, it will just manifest as highlighting negative things more often for the people it has bad vibes for. Just like people will do.

nullc•

5 months ago

Yes, they will, they'll rationalize whatever. This is most obvious w/ transcript editing where you make the LLM 'say' things it wouldn't say and then ask it why.

SpaceNoodled•

5 months ago

It sounds like you're saying we should generate more bullshit to justify bullshit.

teraflop•

5 months ago

They said "could", not "should".

I believe the point is that it's much easier to create a plausible justification than an accurate justification. So simply requiring that the system produce some kind of explanation doesn't help, unless there are rigorous controls to make sure it's accurate.

rilindo•

5 months ago

> Fundamental fairness requires that if an automated system denies you a loan, a house, or a job, it be able to explain something you can challenge, fix, or at least understand.

That could get interesting, as most companies will not provide feedback if you are denied employment.

zugi•

5 months ago

Fair point. Maybe the requirement should be that the automated system provide an explanation that some human could review for fairness and correctness. While who receives the explanation may be a separate question, the drawback of LLMs judging people is that said explanation may not even exist.

direwolf20•

5 months ago

This is the law in the EU, I think

em-bee•

5 months ago

the way i understand it is that the law says decisions must be reviewed by a human (and i am guessing should also be overrideable), but this still leaves the question how the review is done and what information the human has to make the review.

ottah•

5 months ago

I hate this. An explanation is only meaningful if it comes with accountability; knowing why I was denied does me no good if I have no avenue for effective recourse outside of a lawsuit.

candiddevmike•

5 months ago

Would love to see some of the liability transfer to the companies using Workday too...

boscillator•

5 months ago

•on: Ask HN: Share your personal website

https://buchanan.one

boscillator•

5 months ago

•on: Show HN: Play poker with LLMs, or watch them play ...

No, I'm not super certain, but I believe most solvers are trained to be game theory optimal (GTO), which means they assume every other player is also playing GTO. This means there is no strategy which beats them in the long run, but they may not be playing the absolute best strategy.
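
The distinction can be illustrated with rock-paper-scissors as a toy stand-in (this is not a poker solver, and the opponent strategies below are made up). In this two-player game the equilibrium strategy is the uniform mix: it cannot lose in expectation, but it also earns nothing from a biased opponent, while an exploitative strategy earns more.

```python
from typing import Dict

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

Strategy = Dict[str, float]  # move -> probability of playing it


def payoff(a: str, b: str) -> int:
    # +1 if move a beats move b, -1 if it loses, 0 on a tie.
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def expected_value(ours: Strategy, theirs: Strategy) -> float:
    # Average payoff to us over both players' mixed strategies.
    return sum(
        ours[a] * theirs[b] * payoff(a, b)
        for a in MOVES
        for b in MOVES
    )


# Equilibrium ("GTO") strategy for this toy game: uniform mixing.
gto = {m: 1 / 3 for m in MOVES}
# Hypothetical opponent who plays rock too often.
rock_heavy = {"rock": 0.6, "paper": 0.2, "scissors": 0.2}
# Exploitative counter-strategy: always play paper.
always_paper = {"rock": 0.0, "paper": 1.0, "scissors": 0.0}

print(expected_value(gto, rock_heavy))           # about 0: unbeatable, but no profit
print(expected_value(always_paper, rock_heavy))  # about 0.4: exploits the bias
```

The catch is that `always_paper` is itself exploitable (an opponent who switches to scissors beats it every time), which is the trade-off the equilibrium strategy avoids.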