
I use a smaller model, Gemma e2b, for most of my editing and it works surprisingly well. The workflow is planning with SOTA models and execution via small models. If you plan properly and don't leave ambiguity for the smaller model, it works well.
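
Roughly, the execution step looks like this (a simplified sketch; the endpoint, port, and model name are placeholders for whatever OpenAI-compatible local server you run, e.g. llama-server exposes one):

  // Sketch: send a plan written by a big model, plus the file, to a small local model.
  async function executeStep(plan: string, fileText: string): Promise<string> {
    const res = await fetch("http://localhost:8080/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gemma-e2b", // placeholder model name
        messages: [
          { role: "system", content: "Apply the edit plan exactly. Output only the edited file." },
          { role: "user", content: `PLAN:\n${plan}\n\nFILE:\n${fileText}` },
        ],
      }),
    });
    const data = await res.json();
    return data.choices[0].message.content;
  }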

Out of curiosity, have you tried other small models? e2b was unusable for me. Llama 3.2 3B was better, and that thing is a year old; I rarely use it now either.

Yes, I keep trying small models. I have also tried the Qwen 3.5 0.8B, 2B, and 4B and Gemma4 e4B models, but they either did not work reliably (thinking loops, issues following instructions) or had performance issues (prompt speed, tg speed, too much RAM). e2b was the sweet spot where I could give it a plan and it could edit files properly.

How did e2b compare to e4b?

I did not see much improvement for my use case, i.e. file editing tasks, but with e4b the tg/s is lower, so I stick with e2b.

That makes sense; it sounds like your computer isn't super powerful. Whatever works for you.

- Tool for organizing files, pasted data, and prompts into markdown snippets you can copy into different AI chats.

- Calculator that gives tg/s and VRAM required based on model params and DDR settings (see the sketch after this list).

- Auto-create dashboards from CSV/JSON files or APIs: Easyanalytica.com

- Snippet viewer for HTML/React that allows annotation and sharing based on URL fragments
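
Back-of-envelope of what the calculator computes (a simplified sketch; all numbers illustrative):

  // Memory = params * bits per weight / 8; decode is roughly memory-bandwidth-bound,
  // so tg/s ~ bandwidth / model bytes.
  function estimate(paramsB: number, bitsPerWeight: number, bandwidthGBs: number) {
    const modelGB = (paramsB * bitsPerWeight) / 8; // e.g. 7B at 4 bits -> 3.5 GB
    const tgPerSec = bandwidthGBs / modelGB;       // one full weight pass per token
    return { modelGB, tgPerSec };
  }
  // estimate(7, 4, 100) -> { modelGB: 3.5, tgPerSec: ~28.6 }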


Why do people want to continue using Anthropic despite their shitty service? It's not like they have some kind of lock-in; it is still a new company, and it has shown its colors before we are stuck with it, unlike Google/Meta etc.

Totally agree. This is why open source models and tooling are so important for the ecosystem. I would not want these companies deciding what we can or cannot do.

That's a great question. Maybe other services have flaws too.

I did a Show HN with a similar idea (got a whopping 1 point and was flagged as spam, which was later removed by mods). You paste your HTML and it encodes it into the URL, so you can share the URL without server involvement. I even added a URL shortener because, while technically feasible, the encoded URL becomes long and QR codes no longer work reliably. I also added annotation so you can add your comments and pass it to colleagues.

https://easyanalytica.com/tools/html-playground/
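
The core trick is small; a simplified sketch with browser-standard APIs (the "#d=" fragment parameter name is made up):

  // Sketch: deflate the HTML, base64url-encode it, and put it in the URL fragment.
  async function encodeForFragment(html: string): Promise<string> {
    const stream = new Blob([html]).stream().pipeThrough(new CompressionStream("deflate-raw"));
    const bytes = new Uint8Array(await new Response(stream).arrayBuffer());
    let binary = "";
    for (const b of bytes) binary += String.fromCharCode(b);
    return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
  }
  // Usage: location.hash = "#d=" + await encodeForFragment(html);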


1. How does this work? window.open('about:blank'); and then a document.write()?

2. The share SVG icons look very broken.


That guy is not including ffmpeg and is not encoding in the browser. What he is doing is generating an ffmpeg command that you can run in your CLI/scripts etc.

PS: I am that guy :-)


I was with them until

"We ran our own analysis sampling 150 profiles per repo across 20 projects and found repos where 36-76% of stargazers have zero followers and fork-to-star ratios 10x below organic baselines"

This does not look like an appropriate signal to use on GitHub; I doubt that this is an organic baseline. If this is used as a metric, then the study might be flawed.
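
Roughly what that sampling looks like against the public GitHub REST API (a sketch; unauthenticated calls hit rate limits fast, and "zero followers" as a bot proxy is exactly the signal I'm doubting):

  async function sampleRepo(owner: string, repo: string) {
    const gh = (path: string) => fetch(`https://api.github.com${path}`).then(r => r.json());
    const meta = await gh(`/repos/${owner}/${repo}`);
    const forkToStar = meta.forks_count / meta.stargazers_count;
    const stargazers = await gh(`/repos/${owner}/${repo}/stargazers?per_page=100`);
    let zeroFollowers = 0;
    for (const s of stargazers) {
      const user = await gh(`/users/${s.login}`);
      if (user.followers === 0) zeroFollowers++;
    }
    return { forkToStar, zeroFollowerShare: zeroFollowers / stargazers.length };
  }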


I don't get it; Macs have unified memory, so how would offloading experts to the CPU help?


I bet the poster just didn't remember that important detail about Macs; it is kind of unusual from a normal-computer point of view.

I wonder, though: do Macs have swap? Could unused experts be offloaded to swap?


Of course the swap is there as a fallback, but I hate using it, lol, as I don't want to degrade SSD longevity.


Can you elaborate? You can use a quantized version; would context still be an issue with it?


A usable quant, Q5_K_M imo, takes up ~26GB[0], which leaves around 6-7GB for context and running other programs, which is not much.

[0] https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF?show_fil...


Context is always an issue with local models and consumer hardware.


Correct, but it should be some ratio of model size: if the model size is x GB, max context would occupy x * some constant of RAM. For the quantized version, assuming it's 18GB at Q4, it should be able to support 64-128k context on this Mac.
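
Strictly speaking, the KV cache scales with the model architecture rather than file size; a sketch of the usual formula (parameter names illustrative, real values come from the model config):

  // KV cache bytes ~ 2 (K and V) * layers * kvHeads * headDim * bytesPerElem * tokens.
  function kvCacheGB(layers: number, kvHeads: number, headDim: number,
                     bytesPerElem: number, tokens: number): number {
    return (2 * layers * kvHeads * headDim * bytesPerElem * tokens) / 1024 ** 3;
  }
  // e.g. kvCacheGB(32, 8, 128, 2, 65536) = 8 GB for a 64k context with 2-byte KV entries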


For the 9B model, I can use the full context with Q8_0 KV. This uses around 16GB, while still leaving comfortable headroom.

Output after I exit the llama-server command:

  llama_memory_breakdown_print: | memory breakdown [MiB]  | total    free     self   model   context   compute    unaccounted |
  llama_memory_breakdown_print: |   - MTL0 (Apple M3 Pro) | 28753 = 14607 + (14145 =  6262 +    4553 +    3329) +           0 |
  llama_memory_breakdown_print: |   - Host                |                   2779 =   666 +       0 +    2112                |


What was your data size? I am surprised 800KB made a difference. Using StringZilla was a smart approach; my guess is it being unusually fast made all the difference.


Usually I'm dealing with about 20MB of compressed data, almost 100MB uncompressed. Even with only a couple MB of data, SQLite still has a startup time of a couple hundred milliseconds on my phone. And that's a couple hundred milliseconds when loading a database that's already decompressed. When loading 100MB, SQLite usually took a second or so, which I didn't really like for a PWA.

It took me quite a few attempts to get something faster than SQLite. My new format loads instantly because I'm just casting the data to a struct. The only thing that takes time is decompressing, but that's still faster than loading the uncompressed data via SQLite. My phone loads 100MB from 20MB compressed in about 400ms.

But writing my own format gives other benefits, like being able to extract all the HTML tags and capital letters beforehand for fast and sensible search and reconstruct them on render. It's also just way easier for me to edit TSVs with markers for which parts are indexed and have that transformed into an indexed format with 3 indexes.

Also, with SQLite I was just running one module, but with my new format I'm running about 20 instances of it because it keeps the data nicer and more manageable and makes everything very parallel. Though I keep the number of web workers to 2 because increasing it doesn't seem to help much.

https://github.com/tnelsond/peakslab
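
The "cast to a struct" idea in a browser context looks roughly like this (a simplified sketch: decompress once, then take zero-copy typed-array views over the buffer; the field layout below is hypothetical, not the actual .peak format):

  async function loadSlab(url: string) {
    const res = await fetch(url);
    const ds = res.body!.pipeThrough(new DecompressionStream("gzip"));
    const buf = await new Response(ds).arrayBuffer();
    const header = new Uint32Array(buf, 0, 4);           // e.g. [magic, version, count, flags]
    const offsets = new Uint32Array(buf, 16, header[2]); // record offsets, no per-row parsing
    return { header, offsets, bytes: new Uint8Array(buf) };
  }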


This is really cool. I'm working on stuff that is somewhat aligned with this - offline knowledge base/educational platform focused on things like appropriate technologies for rural people in the developing world. Storing in the browser and, more importantly, searching it is definitely one of the major challenges. (it's also just a much more dynamic app)

My main question about this is whether it can be dynamically/incrementally updated within the browser? E.g. new material is available or edits have been made, so sync it from the backend and it gets merged in.

I've been working on using RxDB to sync and store in the browser - it can use its own IndexedDB abstraction, SQLite, or its own OPFS-based DB. It can also load any of these into memory with its memory-mapped mechanism. I've also made a mechanism to load everything into FlexSearch in a SharedWorker, so that you can do full text search fairly performantly.

It's a lot of complexity though. I'd be curious to hear any of your thoughts. Or even to chat if you're open to it!
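
Roughly, the SharedWorker + FlexSearch piece looks like this (a simplified sketch; the message shape is made up for illustration):

  // Sketch: the index lives in the SharedWorker; tabs post "add"/"search" messages.
  import FlexSearch from "flexsearch";

  const index = new FlexSearch.Index({ tokenize: "forward" });

  (self as unknown as SharedWorkerGlobalScope).onconnect = (e) => {
    const port = e.ports[0];
    port.onmessage = (msg: MessageEvent) => {
      const { type, id, text, query } = msg.data;
      if (type === "add") index.add(id, text);
      else if (type === "search") port.postMessage(index.search(query, 20));
    };
  };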


I'm not sure I follow exactly, but if I understand, you mean that when the database file is updated, the app updates? Right now on app load it updates the service worker and shows the files in cache first. If there's a newer file, it fetches it in the background and then sends a message to the client that there is a new file. I haven't implemented the next part yet, but it should be able to invalidate the current file and load the new file without refreshing the page. Right now the new files load after a refresh, once the new service worker is activated.

But the page still has to be refreshed to load the new service worker. I'm looking into ways to cut the time to load the new files, because right now you have to refresh the page 3 times for the new files to take over.
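
The flow is roughly this (a simplified service-worker sketch; the cache name and message shape are made up for illustration):

  // Sketch: serve from cache immediately, refresh in the background, and tell
  // open clients a new file exists.
  self.addEventListener("fetch", (event: FetchEvent) => {
    event.respondWith((async () => {
      const cache = await caches.open("data-v1");
      const cached = await cache.match(event.request);
      const refresh = fetch(event.request).then(async (res) => {
        await cache.put(event.request, res.clone());
        const clients = await (self as any).clients.matchAll();
        clients.forEach((c: Client) => c.postMessage({ type: "new-file", url: event.request.url }));
        return res;
      });
      return cached ?? refresh; // cache first; the fetch still updates for next time
    })());
  });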

The .peak files aren't designed to be a database that you can just add to during runtime, they're rather static and highly efficient in that context. But it's easy to edit the source files and generate a new .peak file from that.

You can take a folder of any kind of files and run peakgen on it and it will create a compressed .slab file that you can search and fetch results from just like the .peak files. I first saw that done with SQLite and I really liked it, so I knew I could do it too.

If you want to chat you can shoot me an email.


I also used the fragment technique for sharing HTML snippets, but the URLs became very long; I had to implement an optional URL shortener after users complained. Unfortunately, that meant server interaction.

https://easyanalytica.com/tools/html-playground/


(I left a stand-alone comment, but:) A little update: I added privacy-focused optional shorter URLs to SDocs.

You can read more about the implementation here: https://sdocs.dev/#sec=short-links

Briefly:

  https://sdocs.dev/s/{short id}#k={encryption key}
                      └────┬───┘   └───────┬──────┘
                           │                │
                      sent to           never leaves
                       server           your browser

We encrypt your document client side. The encrypted document is sent to the server with an id to save it against. The encryption key stays client side in the URL fragment. (And - probably very obviously - the encryption key is required to make the server-stored text readable again.)

You can test this by opening your browser's developer tools, switching to the Network tab, clicking Generate next to the "Short URL" heading, and inspecting the request body. You will see a base64-encoded blob of random bytes, not your document.
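
In code, the scheme looks roughly like this (a simplified sketch using standard WebCrypto; the "/api/save" endpoint and its response shape are stand-ins, not the exact API):

  // Sketch: AES-GCM client side; only iv + ciphertext go to the server, the key
  // stays in the fragment.
  async function shareShort(doc: string): Promise<string> {
    const key = await crypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, true, ["encrypt"]);
    const iv = crypto.getRandomValues(new Uint8Array(12));
    const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, new TextEncoder().encode(doc));
    const { id } = await fetch("/api/save", {
      method: "POST",
      body: JSON.stringify({ iv: Array.from(iv), ct: Array.from(new Uint8Array(ct)) }),
    }).then(r => r.json());
    const raw = new Uint8Array(await crypto.subtle.exportKey("raw", key));
    const k = btoa(String.fromCharCode(...raw)).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
    return `https://sdocs.dev/s/${id}#k=${k}`; // fragment never leaves the browser
  }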


Really nice implementation by the way.

Re URL length: Yes... I have a feeling it could become an issue. I was wondering if a browser extension might give users the ability to have shorter URLs without losing privacy... but I haven't looked into it deeply/don't know if it would be possible (browser extensions are decent bridges between the local machine and the browser, so maybe some sort of decryption key could be used to allow for more compressed URLs...)


I doubt it would be possible; it boils down to a compression problem, compressing x amount of content into y bits. Since content is unpredictable, it cannot be done without an intermediary to store it.


For this use-case, maybe compression and then encoding would get more data into the URL before you hit a limit (or before users complain)?

I.e. .md -> gzip -> base64
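
The decode side of that pipeline, as a sketch with browser-standard APIs:

  // base64 -> gunzip -> markdown string.
  async function decodeFragment(b64: string): Promise<string> {
    const binary = atob(b64.replace(/-/g, "+").replace(/_/g, "/"));
    const bytes = Uint8Array.from(binary, c => c.charCodeAt(0));
    const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream("gzip"));
    return await new Response(stream).text();
  }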

