Jwahir Sundai

In search of a good local LLM

Published: May 18, 2026 · 2 min read

Local language models genuinely excite me, but it's worth being honest: they're not there yet, and finding a good one is still more work than it should be. Running a model on your own machine, with no subscriptions, no one storing your prompts, and no rate limits cutting you off mid-thought, feels like real ownership in a space that rarely offers it, and that's worth caring about. But the path to actually finding something useful is messier than most people let on.

I've been through the same cycle more times than I'd like: download something, run a quick test, decide it's not quite right, and go looking for the next one. At some point I had five or six models sitting on my machine and couldn't have told you why I'd kept any of them.

Model size is a hardware constraint, not a quality one

Bigger models aren't automatically better; on hardware that wasn't built for them, they're just slower. A 13B model might technically fit in your machine's memory but generate text so slowly it feels like watching something buffer, and anything larger running on CPU is genuinely painful to use. For most laptops without a dedicated GPU, the sweet spot is around 7B to 8B parameters, and models like Llama 3.1 8B or Mistral 7B are actually quite good at that size. The responsiveness alone makes them more useful day-to-day than a much larger model you'll quietly avoid opening.
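A rough way to sanity-check whether a model will fit before downloading it is to multiply the parameter count by the bytes stored per weight. The sketch below does that arithmetic. The function names and the 1.2 overhead factor (a stand-in for the context cache and runtime buffers) are my own assumptions for illustration, not measured values, so treat the output as a ballpark.

```python
def estimate_memory_gb(params_billion: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Ballpark memory needed to load a model, in GB (1 GB = 1e9 bytes).

    overhead is an assumed fudge factor for context cache and runtime
    buffers; real usage depends on the runner and the context length.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


def fits(params_billion: float, bits_per_weight: float,
         available_gb: float) -> bool:
    """True if the estimate is within the memory you can spare."""
    return estimate_memory_gb(params_billion, bits_per_weight) <= available_gb


if __name__ == "__main__":
    # Compare an 8B and a 13B model at common weight precisions.
    for size in (8, 13):
        for bits in (4, 8, 16):
            gb = estimate_memory_gb(size, bits)
            print(f"{size}B at {bits}-bit: about {gb:.1f} GB")
```

Under these assumptions an 8B model quantized to 4 bits lands just under 5 GB, while the same model at 16 bits needs roughly four times that. Fitting in memory is only half the story, though: a model that fits can still be too slow to enjoy using.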

Be honest about your actual use case

Writing help, coding help, and pasting in a PDF you really don't want to read yourself are genuinely different tasks, and different models have different strengths. Chasing benchmark rankings without first asking what you personally need is a reliable way to end up with something impressive that doesn't quite fit how you work. Once I got clear on that, a 7B or 8B model was almost always more than enough.

The tradeoffs are real

For complex reasoning or anything that genuinely requires the best possible output, a local 7B model isn't going to compete with something like GPT-4o or Claude, and pretending otherwise doesn't help anyone. The gap has been closing, but it hasn't closed. The hardware barrier is real too: not everyone has a machine that can run even a mid-sized model at a usable speed.
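Whether a given machine runs a model at a usable speed is something you can measure rather than guess. Here is a minimal sketch, assuming the model is served by Ollama on its default local port (11434) and using the eval_count and eval_duration fields its generate endpoint reports. The helper names are hypothetical, and what counts as "usable" is left to you.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")


def tokens_per_second(result: dict) -> float:
    """Generation speed from a response's eval_count and eval_duration.

    Ollama reports eval_duration in nanoseconds. Returns 0.0 when the
    fields are missing so a malformed response does not raise.
    """
    duration_ns = result.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return result.get("eval_count", 0) / (duration_ns / 1e9)


def measure(model: str, prompt: str, timeout: float = 300.0) -> float:
    """Send one prompt to a locally running Ollama and return tokens/sec."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return tokens_per_second(json.load(response))
```

With Ollama running and a model already pulled, calling measure("llama3.1:8b", "Explain what a hash map is.") returns the generation speed for that one prompt. Running it on a prompt from your own day-to-day work says more than any leaderboard about whether the model is fast enough for you.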

Pick something and actually use it

The hardest part is resisting the urge to keep searching. New models drop constantly and the temptation is always there, but the model you actually reach for every day is worth far more than the theoretically better one you're still evaluating. I use Ollama and have two models I actually like. Once I stopped chasing every new release, local LLMs stopped feeling like an ongoing project and started feeling like something I just rely on.