alexi.sh



What Is Ollama? Run LLMs Locally in 2026 (Beginner's Guide)

PrivSec Lab · 3 min read

Ollama is an open-source tool that downloads and runs large language models (Llama, Qwen, Mistral and more) on your own machine with one command. This guide covers what it is, how to install and use it, the REST API, and the honest limits versus cloud models.

If you have wanted to run AI on your own computer — no cloud, no API key, nothing leaving your machine — Ollama is the simplest way to do it in 2026. It is an open-source tool that downloads and runs large language models locally with a single command. This guide explains what Ollama is, how to install and use it, its local API, and the honest limits versus cloud models.

What Ollama is

Ollama bundles model weights, configuration and a runtime so that one command works:

ollama run qwen2.5

That downloads the model on first run and drops you into a local chat. It runs on macOS, Linux and Windows, supports many open models (Llama, Qwen, Mistral, Gemma, DeepSeek and more), and keeps everything on your machine. It is the easiest on-ramp to local AI.


Installing and using it

Download the installer for your OS (or run the Linux install script), then:

ollama run llama3.2     # chat with a model (downloads on first run)
ollama pull qwen2.5     # fetch a model without chatting
ollama list             # see installed models
ollama serve            # run the local API

It is deliberately minimal: one command to chat, one to pull, one to serve.

The local API

Ollama serves a REST API on http://localhost:11434 that apps and scripts call to generate text, chat or create embeddings, so you can build RAG pipelines, chatbots and editor assistants entirely on-device. Tools like the Continue extension (VS Code/JetBrains) integrate with it directly. In a standard install the server listens only on the loopback interface; keep it that way (do not bind it to 0.0.0.0) so the API isn't exposed to other machines on your network.
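As a minimal sketch of what calling that API can look like, the Python below posts a prompt to the /api/generate endpoint using only the standard library. The helper names (build_generate_request, generate) are hypothetical, the model name is the one used in the commands above, and the code assumes a server is already running at the default address with that model pulled.

```python
import json
import urllib.request

# Default address from the article; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt, model="llama3.2", base_url=OLLAMA_URL):
    """Build a POST request for the /api/generate endpoint (non-streaming)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.2", base_url=OLLAMA_URL, timeout=120):
    """Send the prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With "stream": False the server answers with a single JSON object
    # whose "response" field holds the generated text.
    return body["response"]
```

With the server up, generate("Explain what a REST API is in one sentence.") should return the model's answer as a string. Setting "stream" to False keeps the example short; streaming responses arrive as a sequence of JSON objects and need a read loop instead.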

Why people use Ollama

  • Privacy: prompts and documents stay local — nothing sent to a third party. See data sovereignty.
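  • Cost: the tool is free and there are no per-token fees; you pay only for the hardware you own and the electricity it uses.
  • Offline & reproducible: once a model is downloaded it works without internet, and the weights on your disk do not change unless you pull an update, so behaviour stays stable over time.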

For picking the right model to run, see the best local LLM for coding and best coding LLMs 2026.

The honest limits

  • Hardware: you need enough RAM/VRAM for the model size. As a rough guide, a 7B model at 4-bit quantization needs about 6–8 GB of memory, and larger models need more. Apple Silicon with unified memory does well.
  • Capability: local 7B–70B models are great for drafting, summarising, coding help and RAG, but the largest hosted models still lead on the hardest reasoning and longest context.
  • Licenses: the models have their own licenses — respect them for commercial use.
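The hardware figure above can be sanity-checked with simple arithmetic: the weights take roughly parameters × bits per weight ÷ 8 bytes. The sketch below is a back-of-envelope estimate only. The function names are hypothetical, and the flat 2 GB allowance for context cache and runtime buffers is an illustrative assumption, not a measured value.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def estimated_total_gb(num_params, bits_per_weight, overhead_gb=2.0):
    """Weights plus a flat allowance for context cache and runtime buffers.

    overhead_gb is an assumed placeholder; real overhead depends on the
    context length and the runtime.
    """
    return weight_memory_gb(num_params, bits_per_weight) + overhead_gb


for label, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    weights = weight_memory_gb(params, 4)
    print(f"{label}: ~{weights:.1f} GB of weights at 4-bit")
```

For a 7B model this gives about 3.5 GB of weights, which is consistent with the 6–8 GB guidance once the context cache, the runtime and headroom for the operating system are added. Real 4-bit model files typically average slightly more than 4 bits per weight, so treat these numbers as a lower bound.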

So the trade is clear: Ollama gives privacy, zero per-token cost and offline use; cloud gives peak capability. For the cloud side, see Cursor vs Copilot.

The bottom line

Ollama is the easiest way to run LLMs locally in 2026: one command, many open models, a local API, and full privacy because nothing leaves your machine. It will not match the absolute frontier of hosted models on the hardest tasks, but for private chat, coding help, RAG over your own files and offline use, it is genuinely excellent — and free. If local, private AI is your goal, Ollama is the place to start.

To go further, pair Ollama with the right model in the best local LLM for coding, and read why keeping inference local matters in data sovereignty.

Editorial guide based on Ollama's documented features (local model runtime, CLI, REST API on localhost, supported open models) and the documented trade-offs of local versus hosted LLMs. We state plainly that local models trail the largest hosted ones on the hardest tasks. No vendor relationship influences this guide.

Photo: Unsplash (source)


FAQ

What is Ollama?
Ollama is a free, open-source tool that lets you download and run large language models (LLMs) locally on your own computer with a single command. It bundles the model weights, configuration and a runtime so that 'ollama run llama3.2' just works — no cloud account, no API key, no data leaving your machine. It runs on macOS, Linux and Windows, exposes a local REST API for apps to use, and supports many open models (Llama, Qwen, Mistral, Gemma, DeepSeek and more). Think of it as the easiest on-ramp to local AI.
How do I install and use Ollama?
Download the installer for your OS from the official site (or use the Linux install script), then in a terminal run 'ollama run <model>', for example 'ollama run qwen2.5'. Ollama downloads the model on first run and drops you into a chat prompt. Other key commands: 'ollama pull <model>' to fetch a model, 'ollama list' to see installed ones, and 'ollama serve' to run the local API. It is deliberately minimal: one command to chat, one to pull, one to serve.
Does Ollama have an API?
Yes. Ollama runs a local REST API (by default on http://localhost:11434) that apps and scripts can call to generate text, chat, or create embeddings — so you can build RAG pipelines, editor assistants and chatbots entirely on-device. Many tools integrate with it out of the box, including the Continue extension for VS Code/JetBrains. Because the endpoint is local, your prompts and data never leave your machine unless you deliberately expose the port.
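To illustrate the retrieval step of an on-device RAG pipeline, here is a small sketch that ranks documents by cosine similarity between embedding vectors. The vectors and file names are made-up toy values standing in for what an embeddings call would return, and the function names are hypothetical; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document names sorted from most to least similar to the query."""
    scored = {name: cosine_similarity(query_vec, vec)
              for name, vec in doc_vecs.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
toy_docs = {
    "install.md": [0.9, 0.1, 0.0],
    "licenses.md": [0.0, 0.2, 0.9],
    "api.md": [0.6, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

print(rank_documents(toy_query, toy_docs))
```

In a full pipeline the top-ranked documents would be pasted into the prompt sent to the local model, so both the embedding step and the generation step stay on your machine.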
Is Ollama private and free?
Yes on both. Ollama is open-source and free, and it runs models entirely on your hardware, so your prompts and documents stay local — nothing is sent to a third-party API. That makes it a strong choice for sensitive or proprietary work. The two caveats: keep the API bound to localhost (not 0.0.0.0) so it isn't exposed on your network, and remember that the models themselves have their own licenses you should respect for commercial use.
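One way to check the first caveat in a script is to test whether the configured bind address is a loopback address. The sketch below assumes the address comes from the OLLAMA_HOST environment variable and that an unset variable means 127.0.0.1:11434; the function name is_loopback_only is hypothetical.

```python
import ipaddress
import os
from urllib.parse import urlsplit


def is_loopback_only(host_value):
    """Return True if an OLLAMA_HOST-style value points at a loopback address."""
    # Accept bare "host", "host:port" and full "http://host:port" forms.
    if "://" not in host_value:
        host_value = "http://" + host_value
    hostname = urlsplit(host_value).hostname
    if hostname is None:
        return False
    if hostname == "localhost":
        return True
    try:
        return ipaddress.ip_address(hostname).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): not proven local.
        return False


# Assumed default when the variable is unset.
configured = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
print(f"{configured}: loopback only = {is_loopback_only(configured)}")
```

A value such as 0.0.0.0 makes the check return False, which is the signal that the API may be reachable from other machines and should sit behind a firewall or a reverse proxy with authentication.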
Is Ollama good enough compared to ChatGPT or Claude?
For many tasks, yes — but honestly, not at the absolute frontier. Local models you run through Ollama (7B–70B class) are excellent for drafting, summarising, coding assistance, RAG over your own documents, and offline/private use. The largest hosted models still lead on the hardest reasoning and longest context. The trade is clear: Ollama gives you privacy, zero per-token cost and offline capability; cloud gives you peak capability. Many people use both.