nl's comments

I think it's useful to be realistic about what you can do with a local model, especially something as small as the 9B the author is using. A 9B model is around the level of Sonnet 3.6 - it can do autocomplete and small functions but it loses track trying to understand large problems.

But they are interesting and fun to play with! I do a LOT of work on local agent harnesses etc., mostly for fun.

My current project is a zero install agent: https://gemma-agent-explainer.nicklothian.com/ - Python, SQL and React all run completely in browser. Gemma E4B is recommended for the best experience!

This is under heavy development and needs Chrome for both HTML5 Filesystem API support and LiteRT (although most Chromium-based browsers can be made to work with it).

It's different to most agents because it is zero install: the model runs in the browser using LiteRT/LiteLLM (which gives better performance than Transformers.js), and Filesystem API gives it optional sandbox access to a directory to read from.
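For anyone curious what that optional sandbox access looks like, here's a minimal sketch of the browser-side flow, assuming Chrome's File System Access API (the readProjectFile helper is a hypothetical name for illustration, not from my actual code):

    // Minimal sketch: ask the user to grant read access to one directory,
    // then read a file from it. Requires Chrome's File System Access API.
    async function readProjectFile(fileName: string): Promise<string> {
      // Opens the native directory picker; the page only ever sees
      // the directory the user explicitly grants. The `as any` cast is
      // because showDirectoryPicker isn't in all default DOM typings.
      const dir: FileSystemDirectoryHandle =
        await (window as any).showDirectoryPicker({ mode: "read" });
      const fileHandle = await dir.getFileHandle(fileName);
      const file = await fileHandle.getFile();
      return await file.text();
    }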

It is self-documenting - you can ask questions like "How is the system prompt used" in the live help pane, and it has access to its own source code.

There's quite a lot there: press "Tour" to see it all.

Will be open source next week.


But I was doing a lot more than autocomplete and small functions with Sonnet 3.5.

I agree, earlier Sonnet wasn't that great, but Sonnet 3.5 is where things really came together. The difference was night and day. Sonnet 3.7, 4.0, 4.5, etc. didn't feel like as drastic a change to me.

I remember even after 3.7 was released I kept using 3.5 in Cursor because it just did exactly what I wanted.

Not to be nitpicky, but many of the 4-12B models are somewhere between GPT-3.5 and GPT-4o-mini. It's hard to find a good comparison though, because the benchmarks people score models against change so often. For reference, Sonnet 3.6 came out about a year after GPT 3.5.

Don't worry about being nitpicky! I'm going to out-nitpick you....

Actually....

I write and publish my own benchmark for this stuff. It's an agentic SQL benchmark that isn't in the training data yet, and I've found it can separate frontier models from close followers (the only models to get 100% are Opus 4.6 and GPT 5.5).

The best small model I've found is a fine-tune of Opus-3.5 9B which scores 18/25: https://sql-benchmark.nicklothian.com/?highlight=Jackrong_Qw...

Haiku 4.5 scores 20/25, and Haiku is certainly better than Sonnet 3.6. GPT 3.5 scores 13/25.
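For a flavor of how a benchmark like this can be scored, here's a hypothetical sketch (not my actual harness): run the model's query against the test database and compare row sets with the reference query, assuming better-sqlite3.

    // Hypothetical scoring sketch for one agentic SQL benchmark task:
    // the model's query passes if it returns the same rows as the
    // reference query (order-insensitive).
    import Database from "better-sqlite3";

    function scoreTask(
      db: Database.Database,
      modelSql: string,
      referenceSql: string
    ): boolean {
      const normalize = (rows: unknown[]) =>
        JSON.stringify(rows.map((r) => JSON.stringify(r)).sort());
      try {
        // A real harness would also sandbox the query and enforce a timeout.
        return normalize(db.prepare(modelSql).all()) ===
               normalize(db.prepare(referenceSql).all());
      } catch {
        return false; // invalid SQL scores zero
      }
    }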




I think knowledge is power.

I think that the more people who try local models (especially the larger ones) the better.

I sometimes get the impression that many people claiming that local models are as good as frontier models work in "token poor" environments. If you can't build large-scale programs using at least Opus 4.5+ then it's difficult to compare. They compare something like Qwen 27B with Sonnet and see that it is nearly as good, but miss that the frontier models are a lot better.

That knowledge is power, too.

I personally can help make local models more accessible. I can't make Opus cheaper.


> I sometimes get the impression that many people claiming that local models are as good as frontier models work in "token poor" environments. If you can't build large-scale programs using at least Opus 4.5+ then it's difficult to compare.

I sometimes get the impression that people posting comments on HN don't realize that LLMs do more than vibe coding.


Yeah no kidding. For instance, if you are an independent inventor trying to write a patent while keeping your patent lawyer expenses to a minimum, you want to write as much of the first draft(s) of the patent as you can yourself. (You’ll save billable hours with your patent lawyer, and you’ll end up with a better patent because you’ll communicate your innovations more clearly to your lawyer.)

However, and this is the big thing, you absolutely do not want to be asking a SOTA LLM for help with the language in your patent application. This is because describing your invention to a web-based LLM could be considered a public “disclosure” of your invention, which (after a one-year grace period goes by) could put your invention in the public domain, basically, and thereby prevent you (or anyone else) from ever being able to patent the invention. Plus, you know, a random unscrupulous employee at the SOTA company could be reviewing logs and notice your great idea, and file a patent on it before you do, and remember, the United States patent office went to “first to file” in 2013.

Oh, and don’t take legal advice from random people on the internet, by the way.


> This is because describing your invention to a web-based LLM could be considered a public “disclosure” of your invention, which (after a one-year grace period goes by) could put your invention in the public domain, basically, and thereby prevent you (or anyone else) from ever being able to patent the invention.

This is simply not true. Even if it were true (and again, it's not) you could simply use zero data retention APIs.

No one at the big model companies is trawling through your chats to steal your patents. It's not only illegal and against their own terms of service, but these people have better uses of their time.


If a competitor to your business discovers that you used a free online AI to help draft your patent 1.5 years ago, that competitor could then cause your patent to be invalidated, which could be greatly to their benefit of course.

The Terms of Service (ToS) for Open/Public AI (e.g., free consumer versions of ChatGPT, Gemini, Claude) often reserve the right to store your prompts and use them to train and refine the model.

Doing an enabling disclosure of your patent draft to another party that is not bound by a non-disclosure agreement is a big mistake, at least while the case law has not yet been settled.

My post was meant to be encouraging to people that might be considering local LLM for this specific use case, where protecting confidential information is of particular importance.


1. This absolutely wouldn't count as disclosure in the eyes of the law.

2. ZDR + frontier LLMs would still be far more effective than local LLMs.

3. By your logic you can't upload patent drafts to Google Docs because Google hasn't signed an NDA, but this obviously is not the case.

The law is applied a lot less strictly to the letter than you think. Intent and spirit of the law matter. Any judge would throw such a case out.


But they don't?

Mythos is a 10T model. Opus is a 5T model.

That's not an exponentially growing amount of compute but it is achieving exponential improvements (eg from Mozilla: https://blog.mozilla.org/en/privacy-security/ai-security-zer... )


> but it is achieving exponential improvements

“Exponential” used here is pure hyperbole. Can you justify it?


Compute doesn't necessarily scale linearly with parameters. And how many active parameters do Mythos vs Opus get their effectiveness from? Is it 1x or 2x? We don't know. We don't even know the parameter counts (the 10T figure is more rumor than confirmed, iirc).

But even more so, who said the improvements are "exponential"? Mozilla's single metric, which doesn't even prove anything of the sort?


I know parameters don’t translate directly like that (and that linear and exponential aren’t the only types of growth) but a doubling as a go-to example of “not exponential growth” is pretty funny.

Wasn't 4.6 Sonnet a 1T model?

Parameters and compute aren't quite the same thing, but going from 1T to 5T to 10T is quite a ramp-up.


where the heck did you get those parameter numbers from?

The Sonnet and Opus numbers are from Elon Musk (given the people he's hired, it seems likely they're approximately true). The Mythos number is quite widely spoken about.

> Mythos

Ah yes, the marketing model that's ostensibly so powerful us mere mortals aren't allowed to use it. It's certainly led to exponential hype and speculation.


The labs are spending hundreds of millions of dollars hiring people to do many fairly random (but economically valuable) tasks to collect this tacit knowledge for RL.

It works really well.


It ceases to be tacit as soon as it is collected.

Maybe this rephrase will help: the proposed solution is to render all knowledge explicit.


> It ceases to be tacit as soon as it is collected.

I'm not sure.

If it is collected via preferences then it isn't necessarily something that can be communicated (except in the LLM's latent space).

That still feels tacit to me.

To simplify that argument: the relationship between King and Queen in the Word2Vec latent space can easily be explicitly labelled.
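To make that concrete, the labelled relationship is just vector arithmetic. A toy sketch (the 3-d vectors are invented for illustration; real embeddings have hundreds of dimensions, but the mechanics are the same):

    // Toy illustration of latent-space analogy arithmetic (vectors invented).
    type Vec = number[];
    const add = (a: Vec, b: Vec): Vec => a.map((x, i) => x + b[i]);
    const sub = (a: Vec, b: Vec): Vec => a.map((x, i) => x - b[i]);
    const cosine = (a: Vec, b: Vec): number => {
      const dot = a.reduce((s, x, i) => s + x * b[i], 0);
      const norm = (v: Vec) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
      return dot / (norm(a) * norm(b));
    };

    // Hypothetical embeddings; dimensions loosely "royalty, maleness, person-ness".
    const king  = [0.9, 0.8, 0.7];
    const queen = [0.9, 0.1, 0.7];
    const man   = [0.1, 0.8, 0.7];
    const woman = [0.1, 0.1, 0.7];

    // king - man + woman lands on queen: the "royalty" offset is the
    // explicitly labellable part of the relationship.
    console.log(cosine(add(sub(king, man), woman), queen)); // ~1.0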

But the relationship between Napoleon and Tsar Alexander I also exists and encodes much of the tacit knowledge about their relationship but isn't as easily labelled (eg, Google AI Mode says "Napoleon I and Tsar Alexander I had a volatile "bromance" that shifted from mutual admiration to deep animosity, acting as a defining conflict of the Napoleonic Wars".)

Word2Vec is a very simple model. In a more complex LLM that deeper knowledge can be queried by asking questions but you can never capture it all. Isn't that what "tacit knowledge" is?


It's a good question, yeah, and a lot of these boundaries get fuzzy when they're looked at closely enough.

It's certainly the case that LLMs already are able to represent and make use of some kinds of apparently still tacit knowledge, and that the scope of that is apparently expanding. I don't question that. I question two things: whether it is always desirable for that scope to expand, and whether it is possible for that scope to ever fully cover what it seeks to cover.


Say what you like about Sam Altman, but given how Anthropic is scrambling to sign capacity deals for compute, we can sure say he was right about the capacity build-out needed.

That’s correct, but from what I understand his move was also strategic: to choke the market.

Having said that, Anthropic’s position is fully understandable, as Sam took a very large risk here, and OpenAI’s future is anything but certain.


Scrambling? Seems to me xAI built too much capacity (for what they can use in 2026). Does that mean OpenAI built the right amount? I don't see how one AI company being willing to sell compute proves that. We don't even know the terms/pricing.

> Scrambling?

Yes.

To quote:

> Anthropic CEO Dario Amodei said his company tried to plan for 10-fold growth. But revenue and usage increased 80-fold in the first quarter on an annualized basis, which he says explains why it’s been so hard to keep up with demand.

> “That is the reason we have had difficulties with compute,” Amodei said Wednesday at his company’s developer conference in San Francisco. Amodei added that the company is “working as quickly as possible to provide more” capacity and will “pass that compute on to you as soon as we can.”

https://www.cnbc.com/2026/05/06/anthropic-ceo-dario-amodei-s...

I think "scrambling" is a fair characterization of the CEO saying "we have had difficulties with compute" and "working as quickly as possible to provide more"

They've also signed new compute deals with Google and AWS recently.


Or the bubble he was pumping hasn't popped yet. We won't be able to say how much of this capacity was actually "needed" until 10 years in the future, if ever.

The point is that Anthropic is already a decent way into eating through all that capacity, and it's based on real revenue.

Some entities are paying for it, sure. I'm still not convinced that's because it's "needed".

True. No one "needs" the internet or computers, right?

People are getting real stuff done with the internet. But there were also a whole lot of overhyped companies that rightly crashed back in '99.

Once we've gone through the AI equivalent of the dot-com crash, will Anthropic still be scrambling for more capacity, or will they have more than they can profitably use, like the dark fiber we were left with last time?


Depends when it happens. I'm sure they'll take up whatever capacity is available.

At the moment compute providers are charging more for outdated H100 capacity than when the H100s were new. That capacity is going to the smaller labs, not the frontier labs.

That hardware has already been depreciated financially, so even if all those small labs disappeared it's not sending compute providers bankrupt - they can just cut prices, and so long as they can charge more than electricity and maintenance they'll just keep the machines running.


I was there.

It's mostly rose-tinted glasses.

There were some amazing feats. But it was slow and frustrating. Like you wouldn't believe how long things took.

In the 90s most technical documentation was in actual physical books. If you wanted to learn something you had to order and buy the book (and Amazon wasn't a thing everywhere!), and it would take weeks or months to arrive. Or you did inter-library loans (which were amazing but also took weeks).

Or you relied on magazines which had a publication cycle. Writing actual physical letters about a program that was written out in the magazine was a thing.

When I got internet access in the mid-90s I remember emailing someone to ask about mirrors of their documentation project because I didn't want to use up their bandwidth.

I'd never ever want to go back. Bring on the future!


I was also there. It isn't rose-colored glasses. The tech truly was better back in the 90s and 2000s. Not perfect, certainly, but it was made by people who gave a damn and were trying to actually make good things. It wasn't like today, where most software is slop (not necessarily AI slop, but slop nonetheless) churned out by companies whose decisions are made solely by how much profit they can squeeze out no matter if the quality tanks.

No, we really did lose something along the way.


> It wasn't like today, where most software is slop

Did you ever use Windows 3.1? Or any Windows software?

Are you forgetting the multiple crashes per day? Or the incessant fiddling with config.sys to get MS-DOS drivers to work?

And Linux was only better because it didn't do much.

The only hardware damage software has ever caused was some 1994 Linux driver for a Trident video card getting a frequency wrong and frying a monitor.

> churned out by companies whose decisions are made solely by how much profit they can squeeze out no matter if the quality tanks.

As opposed to 1990's Microsoft or IBM?


That makes the title of another of his posts very ironic then:

"Automatic programming"

https://antirez.com/news/159


Not really:

> I started to refer to the process of writing software using AI assistance (soon to become just "the process of writing software", I believe) with the term "Automatic Programming"


Antirez wrote Redis. That is "production-code with critical concurrency".

To quote another of his posts:

> I fixed transient failures in the Redis test. This is very annoying work, timing related issues, TCP deadlock conditions, and so forth. Claude Code iterated for all the time needed to reproduce it, inspected the state of the processes to understand what was happening, and fixed the bugs.

...

> In the past weeks I operated changes to Redis Streams internals. I had a design document for the work I did. I tried to give it to Claude Code and it reproduced my work in, like, 20 minutes or less (mostly because I'm slow at checking and authorizing to run the commands needed).

From "Don't fall into the anti-AI hype" https://antirez.com/news/158


His summarized assessment from that very post: "...state of the art LLMs are able to complete large subtasks or medium size projects alone, almost unassisted, given a good set of hints about what the end result should be. The degree of success you'll get is related to the kind of programming you do (the more isolated, and the more textually representable, the better: system programming is particularly apt), and to your ability to create a mental representation of the problem to communicate to the LLM."

He's saying you should be writing up complex, highly detailed specs for the LLM to turn into code, stressing that it's critical to work in a self-contained and "textually representable" problem domain. This is not one-shotting complete products from a vague prompt. You're still going to need software architects, and they'll still be doing much the same work. Turning fully-specified design into code has never been a "10x" task, it was always regarded as a relatively straightforward, if often tricky part of the job. And the way he worked with Redis makes it clear that you can't take what the AI delivers at face value, either: you'll have to go through it yourself, and that will take time and effort.


First, he didn't write Redis with LLMs; that was way before. Second, I'm not speaking of him in that comment.

Also his whole blog is about how, in order to do a task, he would need to spec it properly, then do "code inpainting" with the LLM, then fix all the issues that he could spot only because he's a senior, then repeat, etc.

Did you read it?


The author said "You know what was the biggest realization of all that?"

> For high quality system programming tasks you have to still be fully involved, but I ventured to a level of complexity that I would have otherwise skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32 bit support that was added and tested later), and at the same time the virtual work force required to make sure there are no obvious bugs in complicated algorithms.


I think this is a bad framing.

Javascript running on a page can use a feature that requires a model to be downloaded.

I have pages that use it, or other LLM models via LiteRT or HuggingFace transformers.js.

I try to warn the user, but that is my responsibility as a page author. I like that this is enabling the web platform to remain competitive.
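As a sketch of what that looks like with transformers.js (the model name is illustrative, not one of my actual pages; the point is that the first call triggers the weight download, which is why the warning has to come first):

    import { pipeline } from "@huggingface/transformers";

    // Warn the user before kicking off a large (potentially multi-GB) download.
    // The model name is just an example of an ONNX-converted model on the Hub.
    async function loadGenerator() {
      if (!confirm("This will download a large model to your browser. Continue?")) {
        throw new Error("User declined model download");
      }
      // First run downloads and caches the weights; later runs hit the cache.
      return await pipeline("text-generation", "onnx-community/Qwen2.5-0.5B-Instruct");
    }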

The author is pulling a long bow by trying to claim this is some GDPR violation. Have they ever used the web? There are inefficient sites everywhere, with autoplaying video etc.

4GB isn't nothing, but if a page wants to use it then hopefully it is useful to the user!


Note that one large factor in this is Anthropic and OpenAI's lobbying to ban large open source models. They frame it as "China is stealing US intellectual property" which plays well in the current environment, but be very clear: it is just an anti-competitive play.

To make that very clear, contrast the loud public blog posts about how "Chinese Companies" were using multiple accounts to distill model data, and the complete silence when Elon Musk admitted in court they do the same for Grok.

(The side issue that LLM outputs are probably only protected under contract law, not as something that is copyrightable, is a distraction here, BTW.)

