Ollama for Beginners: Run Your First Local LLM in 10 Minutes
May 15, 2026 | 6 min read | Wise Technologies Team
What is Ollama?
Ollama is a free, open-source tool that makes running large language models on your own computer as simple as using a chat app. Think of it as Docker for AI models: one command to download, one command to run. No cloud subscriptions, no API keys, no data leaving your machine. Ollama supports over 100 models including Llama, Mistral, Qwen, DeepSeek, and Hermes.
Step 1: Download and Install
Visit ollama.com and download the installer for your operating system. Windows users: run the .exe and follow the wizard. Mac users: download the .dmg or run "brew install ollama". Linux users: run "curl -fsSL https://ollama.com/install.sh | sh". The installation takes about 2 minutes and uses roughly 500MB of disk space.
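To confirm the install worked, you can check that the "ollama" command is on your PATH. The following is a minimal Python sketch, assuming the installer added the CLI to PATH; the helper name ollama_version is hypothetical, and it simply returns None when the command cannot be found or run.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version() -> Optional[str]:
    """Return the text printed by `ollama --version`, or None if unavailable."""
    exe = shutil.which("ollama")  # look the CLI up on PATH
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some builds may print warnings to stderr, so fall back to it if stdout is empty.
    return (result.stdout or result.stderr).strip() or None
```

If this returns None right after installing, opening a new terminal (so PATH is reloaded) is usually the first thing to try.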
Step 2: Pull Your First Model
Open a terminal and run "ollama pull llama3.1". This downloads Meta's Llama 3.1 model (the 8B parameter version, about 4.7GB). For a smaller, faster option, try "ollama pull phi3" (Microsoft's Phi-3, only 2.3GB). For coding tasks, use "ollama pull codellama:7b-code"; this tag is the base code-completion variant, so for chat-style coding help "codellama:7b-instruct" is usually the better fit. Download time depends on your internet connection and is typically 5-15 minutes.
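The download estimate is simple arithmetic: model size divided by connection speed. Here is a rough sketch, assuming decimal gigabytes (1GB = 1000MB) and a steady connection; the function name download_minutes is made up for illustration.

```python
def download_minutes(size_gb: float, speed_mbps: float) -> float:
    """Estimate download time in minutes for a model of size_gb gigabytes
    over a connection of speed_mbps megabits per second."""
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = size_megabits / speed_mbps
    return seconds / 60
```

For the 4.7GB Llama 3.1 download, download_minutes(4.7, 50) comes to roughly 12.5 minutes, and a 100 Mbps connection halves that. Real downloads vary with server load and Wi-Fi quality.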
Step 3: Start Chatting
Run "ollama run llama3.1" and you will see a prompt. Type anything: "Explain quantum computing in simple terms" or "Write a Python function to sort a list". The model runs entirely on your machine — no internet connection required after the initial download. Press Ctrl+D or type "/bye" to exit.
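Besides the interactive prompt, "ollama run" also accepts the prompt as a command-line argument, which is handy in scripts. A minimal Python sketch, assuming the model has already been pulled; the helper names build_run_command and ask_once are hypothetical.

```python
import subprocess
from typing import List


def build_run_command(model: str, prompt: str) -> List[str]:
    """Build the argument list for a one-shot `ollama run MODEL PROMPT` call."""
    return ["ollama", "run", model, prompt]


def ask_once(model: str, prompt: str, timeout: int = 300) -> str:
    """Run a single prompt through the Ollama CLI and return the model's reply."""
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    return result.stdout.strip()
```

Calling ask_once("llama3.1", "Explain quantum computing in simple terms") would return the answer as a string. Passing the arguments as a list avoids shell-quoting problems with prompts that contain quotes.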
Step 4: Use the API
Ollama exposes a local HTTP API at http://localhost:11434. Test it with curl by running: curl http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Hello, how are you?"}'. By default the reply streams back as a series of JSON objects, one per line; add "stream": false to the request body to receive a single JSON object instead. For applications, use the official JavaScript library (npm install ollama) or Python library (pip install ollama). This lets you integrate local AI into your web apps, scripts, and automation workflows.
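The same request can be made from Python using only the standard library. The following is a sketch rather than a full client, assuming the Ollama server is running on the default port and the model is already pulled; the helper names build_request, generate, and join_stream are hypothetical.

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build a POST request for the /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": stream})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send a non-streaming request and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]


def join_stream(lines: Iterable[str]) -> str:
    """Combine a streamed reply (one JSON object per line) into one string."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk is marked with "done": true
    return "".join(parts)
```

generate("llama3.1", "Hello, how are you?") returns the whole answer at once. join_stream shows how the default streaming output fits together if you prefer to display text as it arrives.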
Understanding Model Sizes
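Models are measured in billions of parameters (B). A 7B model has 7 billion parameters and requires about 4-8GB of RAM, depending on how heavily its weights are quantized (compressed to fewer bits per parameter). A 70B model has 70 billion parameters and needs 40-80GB of RAM. For most users, 7B-13B models offer the best balance of quality and speed. Only use 70B+ models if you have powerful hardware (a Mac Studio with plenty of unified memory, multiple GPUs, or server GPUs). Note that a single RTX 4090 has 24GB of VRAM, so a 70B model will not fit entirely on the card and part of it will run from slower system RAM.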
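The RAM ranges above follow from a rule of thumb: parameters times bytes per weight, plus some headroom. Here is a rough sketch, assuming a 20% overhead for context and runtime buffers; that overhead factor and the function name estimate_memory_gb are illustrative assumptions, not Ollama's own numbers.

```python
def estimate_memory_gb(
    params_billions: float, bits_per_weight: int, overhead: float = 1.2
) -> float:
    """Rough memory estimate in GB for a model held entirely in memory.

    params_billions: model size, e.g. 7 for a 7B model
    bits_per_weight: 4 or 8 for quantized weights, 16 for half precision
    overhead: assumed multiplier for context cache and runtime buffers
    """
    weights_gb = params_billions * bits_per_weight / 8  # billions of bytes = GB
    return weights_gb * overhead
```

With these assumptions, estimate_memory_gb(7, 4) gives about 4.2 and estimate_memory_gb(7, 8) about 8.4, which lines up with the 4-8GB range; a 70B model comes out at about 42 to 84. Long prompts and larger context windows push the real figure higher.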
What Hardware Do You Need?
Minimum: any modern computer with 8GB RAM can run 3B-7B models (slower on CPU). Recommended: 16GB RAM for smooth 7B-13B model performance. Ideal: a dedicated GPU with 12GB+ VRAM (RTX 3060 12GB, RTX 4070, or better) for fast inference on larger models. Apple Silicon Macs (M1/M2/M3) with 16GB+ unified memory are excellent for local LLMs.
Next Steps
Once you are comfortable, explore: running multiple models, creating custom Modelfiles to customize behavior (a Modelfile sets things like the system prompt and sampling parameters; it does not retrain the model), building apps with the Ollama API, and trying specialized models like Hermes 3 for tool use or CodeLlama for programming. The model library at ollama.com/library has hundreds of models to experiment with.
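To give a feel for Modelfiles, here is a small Python sketch that writes one out. It uses three standard Modelfile instructions (FROM, PARAMETER, SYSTEM); the function name render_modelfile, the example prompt, and the model name "my-assistant" below are hypothetical.

```python
def render_modelfile(base_model: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Return the text of a simple Modelfile built on an existing model."""
    return (
        f"FROM {base_model}\n"                      # model to build on
        f"PARAMETER temperature {temperature}\n"    # lower = more predictable output
        f'SYSTEM """{system_prompt}"""\n'           # standing instructions for the model
    )


def write_modelfile(path: str, text: str) -> None:
    """Save the Modelfile text to disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

After calling write_modelfile("Modelfile", render_modelfile("llama3.1", "You are a concise assistant.", 0.3)), run "ollama create my-assistant -f Modelfile" and then "ollama run my-assistant" to chat with the customized model.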