How It Works

Our AI runs entirely on your computer using Ollama. Nothing is sent to external servers unless you explicitly enable cloud features.

📝 Your Request → 🖥️ Ollama (Local) → ✨ AI Response
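
Concretely, that middle hop is a plain HTTP call to the Ollama server running on your own machine. Here is a minimal TypeScript sketch using Ollama's standard /api/generate endpoint; the model name is just an example:

    // Send a prompt to the local Ollama server; nothing leaves your machine.
    async function askLocalAI(prompt: string): Promise<string> {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "llama3.2:3b", // any model you have pulled
          prompt,
          stream: false,        // return one complete JSON response
        }),
      });
      if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
      const data = await res.json();
      return data.response;     // the generated text
    }
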
💻 Local First

AI runs on your computer; no internet required.

🔒 No Data Collection

We never see or store your data.

☁️ Optional Cloud

Use your own API keys if desired.

🔄 Graceful Fallback

Core features keep working even without AI (see the sketch below).
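
What "graceful fallback" means in practice: try the local model first, and degrade to a plain, non-AI code path if nothing answers. A rough sketch; summarize and the heuristic inside it are hypothetical illustrations, not this app's actual implementation:

    // askLocalAI is the helper from the earlier sketch.
    declare function askLocalAI(prompt: string): Promise<string>;

    // Hypothetical graceful fallback: prefer local AI, degrade gracefully.
    async function summarize(text: string): Promise<string> {
      try {
        return await askLocalAI(`Summarize briefly:\n${text}`);
      } catch {
        // Ollama missing or model not pulled: use a simple heuristic instead.
        const sentences = text.split(". ").slice(0, 2).join(". ");
        return sentences.endsWith(".") ? sentences : sentences + ".";
      }
    }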

Local AI with Ollama

We use Ollama, a free tool that runs AI models directly on your computer.

🔐 Privacy: Data never leaves your device
📜 Compliance: Safe for regulated environments
💰 No Cost: No API fees or subscriptions
📡 Offline: Works without internet
⚡ Fast: No network latency
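
You can confirm what is installed locally yourself: Ollama serves a small HTTP API on your machine, and its /api/tags endpoint lists the models you have downloaded. A quick sketch:

    // List locally installed models via Ollama's /api/tags endpoint.
    async function listLocalModels(): Promise<string[]> {
      const res = await fetch("http://localhost:11434/api/tags");
      const data = await res.json();
      return data.models.map((m: { name: string }) => m.name);
    }

    listLocalModels().then((names) => console.log("Installed:", names));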

Recommended Models by VRAM

Select your GPU memory tier below to see the best models for your system. All recommendations use Q4_K_M quantization, the optimal balance of quality and efficiency. A short code sketch after the tables shows one way to turn these tiers into a default pick.

8GB VRAM: integrated GPUs, entry-level cards

Model             Size    Best For   Description
llama3.2:3b       2GB     Fast       Quick responses, basic tasks
phi3:mini         2GB     Fast       Compact and efficient
deepseek-r1:8b    5GB     Reasoning  Step-by-step problem solving
qwen2.5-coder:7b  4.5GB   Coding     Code specialist
moondream         1.7GB   Vision     Image understanding

12GB VRAM: RTX 3060 12GB, RTX 4070

Model              Size   Best For   Description
gemma2:9b          5GB    Fast       Google's efficient model
deepseek-r1:14b    9GB    Reasoning  Strong reasoning ability
qwen2.5-coder:14b  9GB    Coding     Professional grade
llava:13b          8GB    Vision     Full vision capabilities

16GB VRAM: RTX 4080, Mac 16GB+

Model              Size   Best For   Description
phi4:14b           8GB    Fast       Microsoft's latest; excellent quality
qwq:32b            18GB   Reasoning  Outstanding chain-of-thought
qwen2.5-coder:14b  9GB    Coding     Room for longer context
mistral:7b         4GB    Creative   Creative writing

24GB VRAM: RTX 3090/4090, Mac Studio

Model              Size   Best For   Description
deepseek-r1:32b    20GB   Reasoning  Near cloud quality
qwen2.5-coder:32b  20GB   Coding     Expert-level coding
llava:34b          20GB   Vision     Advanced visual analysis
llama3.1:70b       Q3     General    70B with quantization

48GB+ VRAM: multi-GPU rigs, Mac Studio 64GB+

Model              Size      Best For  Description
llama3.3:70b       40GB      General   Flagship open model
qwen2.5-coder:32b  +32K ctx  Coding    Full context for codebases
qwen2-vl:72b       Large     Vision    Best-in-class vision
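
As noted above, here is a hypothetical helper that distills these tables into a single default pick. The thresholds mirror the tier labels, and recommendModel is our illustrative name, not part of the app:

    // Illustrative only: map available VRAM (GB) to a sensible default
    // general-purpose model, following the tiers in the tables above.
    function recommendModel(vramGB: number): string {
      if (vramGB >= 48) return "llama3.3:70b";    // multi-GPU / Mac Studio 64GB+
      if (vramGB >= 24) return "deepseek-r1:32b"; // RTX 3090/4090 class
      if (vramGB >= 16) return "phi4:14b";        // RTX 4080, Mac 16GB+
      if (vramGB >= 12) return "deepseek-r1:14b"; // RTX 3060 12GB, RTX 4070
      return "llama3.2:3b";                       // integrated / entry-level
    }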

Quick Tips

Start Small

Use fast models for simple tasks, larger for complex reasoning

Context Window

More VRAM allows a longer context window, so the model can see more of your text at once and is less likely to invent details it couldn't see.

Specialized Models

Coder models beat general models at coding tasks

Cloud Providers (Optional)

Use your own API keys for more powerful capabilities. We never see your keys; they're stored locally and sent directly to providers (see the sketch below).

Anthropic Claude
Strengths: Coding, Academic, Analysis
Models: Haiku → Sonnet → Opus

OpenAI ChatGPT
Strengths: Creative, General, Conversational
Models: GPT-4o-mini → GPT-4o → GPT-4

Google Gemini
Strengths: Multimodal, Visual, Factual
Models: Flash → Pro

Perplexity Search
Strengths: Research, Citations, Current information
Models: Sonar → Sonar Pro
⚡ Faster = Cheaper

Haiku, GPT-4o-mini, and Flash cost less per token.

🎯 Larger = Better

Choose Opus or GPT-4 for complex, accuracy-critical tasks.
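
Mechanically, a cloud request is just your key plus a direct HTTPS call to the provider. A sketch against OpenAI's standard chat completions endpoint; reading the key from localStorage is an assumption for illustration, not necessarily how the app persists it:

    // Sketch: call a cloud provider directly with the user's own key.
    // Storing the key in localStorage is illustrative; either way the
    // key goes straight to the provider, never to our servers.
    async function askOpenAI(prompt: string): Promise<string> {
      const apiKey = localStorage.getItem("openai_api_key");
      if (!apiKey) throw new Error("No API key configured");
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: prompt }],
        }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }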

Privacy Guarantees

❌ What We Never Do

  • Collect or store your data
  • Send data to our servers
  • Track your AI usage
  • Share with third parties

✅ What Stays Local

  • All text you process
  • Documents you summarize
  • Images you analyze
  • Conversation history

GDPR: No personal data transmitted
FERPA: Student data stays on device
HIPAA: Patient information never leaves the browser

Quick Setup

1. Install Ollama

Download from ollama.ai and run the installer.

2. Pull a Model

Run ollama pull llama3.2:3b in your terminal to download a model.

3. Open Our App

The app automatically detects Ollama at localhost:11434.
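
Detection is a simple reachability check. One plausible sketch (Ollama's root endpoint answers "Ollama is running" when the server is up):

    // Check whether a local Ollama server is reachable.
    async function detectOllama(): Promise<boolean> {
      try {
        const res = await fetch("http://localhost:11434/", {
          signal: AbortSignal.timeout(1000), // don't hang if nothing is listening
        });
        return res.ok; // the root endpoint replies "Ollama is running"
      } catch {
        return false;
      }
    }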

Pull Commands by VRAM

    # 8GB VRAM - Entry level
    ollama pull llama3.2:3b
    ollama pull moondream

    # 12-16GB VRAM - Best experience
    ollama pull phi4:14b
    ollama pull deepseek-r1:14b

    # 24GB+ VRAM - Maximum quality
    ollama pull deepseek-r1:32b
    ollama pull qwen2.5-coder:32b
Cloud API Setup (Optional)
  1. Get an API key from: Anthropic, OpenAI, Google, or Perplexity
  2. Open app settings → AI Settings → Cloud Providers
  3. Select provider, paste key, choose model
Troubleshooting
Ollama Not Detected

Make sure Ollama is running, then open localhost:11434 in your browser; you should see "Ollama is running". If not, restart Ollama.

Slow Responses

Try a smaller model. Close other apps. Check available memory.

Model Download Failed

Check internet connection. Ensure enough disk space. Try smaller model first.

Cloud API Errors

Verify API key. Check account credits. Try a different model.