AI Features & Privacy
Learn how AssisT handles AI processing with local LLMs, browser-native WebLLM, and optional cloud APIs while keeping your data private.
Overview
AssisT uses a privacy-first hybrid AI system that gives you four ways to use AI features, from completely offline to cloud-powered. Your data stays on your device by default. All 9 AI features are routed through a shared AI client that consistently handles mode detection, availability checking, and response generation.
Four AI Modes
AssisT offers flexible AI processing with four distinct modes:
| Mode | Privacy | Cost | Performance | Requirements |
|---|---|---|---|---|
| Off | N/A | Free | Features disabled | None |
| Local AI (Ollama) | 100% Private | Free | Good (hardware-dependent) | Ollama installed |
| Browser AI (WebLLM) | 100% Private | Free | Good (GPU-dependent) | WebGPU-capable browser |
| Cloud AI (API) | Your API only | Free (Gemini) or paid | Excellent | API key |
You can switch between modes instantly using the inline AI mode switcher in the popup — click the Off, Cloud, Browser AI, or Local AI chip to switch without leaving the page.
Key Principles
- Your Choice: Pick the AI mode that matches your privacy and performance needs
- No Data Collection: We never see, store, or transmit your data
- Bring Your Own Key: Cloud mode uses your own API keys, not ours
- Graceful Fallback: Features work even without AI (with reduced functionality)
Local AI with Ollama
AssisT integrates with Ollama, a free, open-source tool that runs AI models directly on your computer.
Why Local AI?
| Benefit | Description |
|---|---|
| Privacy | Data never leaves your device |
| Compliance | Safe for GDPR, FERPA, and HIPAA environments |
| No Cost | No API fees or subscriptions |
| Offline | Works without internet connection |
| Speed | No network latency for requests |
Live Model Selector
AssisT auto-detects all models installed on your Ollama instance and shows them in a live dropdown in the popup’s Local AI panel. You can:
- See all your installed models at a glance
- Switch between models without restarting
- Keep your selection across sessions (it persists automatically)
Recommended Models
| Model | Size | Best For |
|---|---|---|
| qwen3:8b-q4_K_M | 5GB | Default — best JSON compliance, instruction following |
| llama3.1:8b | 5GB | Strong general-purpose, good reasoning |
| gemma3:4b | 3GB | Fast responses for basic tasks |
| deepseek-r1:8b | 5GB | Code and reasoning tasks |
| mistral:7b | 4GB | Complex analysis, detailed responses |
| llava | 4GB | Image understanding (vision) |
Task-Optimised Model Routing
AssisT automatically selects the best available model for each task:
| Task | Priority Models |
|---|---|
| Knowledge Graph | gemma3:4b, qwen3:8b |
| Socratic Tutor | qwen3:8b, llama3.1:8b |
| Citation Analyzer | qwen3:8b, mistral:7b |
| Study Path Generator | qwen3:8b, llama3.1:8b |
| Assignment Breakdown | qwen3:8b, llama3.1:8b |
| Summarization | Any available model |
| Text Simplification | Any available model |
Installing Ollama
- Download Ollama from ollama.com/download
- Install and run Ollama on your computer
- Pull a model: `ollama pull qwen3:8b-q4_K_M`
- AssisT will automatically detect it
How Local AI Works
Your Browser (AssisT)
↓
Message Bridge
↓
Ollama (localhost:11434)
↓
AI Response
↓
Back to AssisT
All communication happens locally on your machine. Nothing is sent to external servers.
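The local flow above can be sketched against Ollama's standard HTTP API. This is a minimal illustration, not AssisT's actual client code — the helper names are made up, but the `/api/generate` endpoint and payload shape are Ollama's documented defaults:

```javascript
// Build a request for Ollama's local generate endpoint.
// Endpoint and payload shape follow Ollama's HTTP API;
// the helper itself is illustrative.
function buildOllamaRequest(model, prompt) {
  return {
    url: "http://localhost:11434/api/generate",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Send the request entirely over localhost — nothing leaves the machine.
async function generateLocally(model, prompt) {
  const { url, options } = buildOllamaRequest(model, prompt);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Ollama error: ${res.status}`);
  const data = await res.json();
  return data.response; // the generated text
}
```

Because the URL is `localhost`, the browser never opens a connection to an external host for local-mode requests.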
Browser AI with WebLLM
AssisT supports browser-native AI powered by WebLLM and WebGPU. Models run entirely inside your browser tab — no server, no API key, no software to install.
Why Browser AI?
| Benefit | Description |
|---|---|
| Zero Install | No Ollama or other software needed |
| Privacy | All processing happens in your browser |
| No Cost | Completely free |
| Portable | Works on any WebGPU-capable device |
Available Models
| Model | Size | Best For |
|---|---|---|
| llama-3.2-1b | ~1GB | Ultra-fast, basic tasks |
| gemma-2b | ~2GB | Balanced small model |
| phi-3.5-mini | ~3GB | Strong reasoning for its size |
| qwen2.5-3b | ~3GB | Good multilingual support |
| llama-3.2-3b | ~3GB | Capable general model |
| mistral-7b | ~5GB | Complex analysis |
| llama-3.1-8b | ~6GB | Most capable browser model |
| gemma-7b | ~5GB | Strong reasoning |
Model Management
The popup shows three model states:
- Loaded (green) — Ready to use, currently in memory
- Cached (blue) — Downloaded to device, loads quickly
- Available (grey) — Needs downloading first
Models are downloaded once and cached in browser storage. Subsequent loads are fast.
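As a rule of thumb, the model table above maps onto available GPU memory. A hypothetical helper (not part of AssisT — the function name and cutoffs are illustrative assumptions based on the size column) might pick a tier like this:

```javascript
// Suggest a WebLLM model id based on free GPU memory (in GB).
// Thresholds mirror the size column in the table above.
function suggestWebLLMModel(freeGpuGB) {
  if (freeGpuGB >= 6) return "llama-3.1-8b"; // most capable browser model
  if (freeGpuGB >= 3) return "phi-3.5-mini"; // strong reasoning for its size
  if (freeGpuGB >= 2) return "gemma-2b";     // balanced small model
  return "llama-3.2-1b";                     // ultra-fast fallback
}
```

Integrated GPUs typically land in the 1-3B tiers; dedicated cards with 6GB+ of VRAM can run the 7-8B models.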
How Browser AI Works
Your Browser (AssisT)
↓
WebLLM Engine (WebGPU)
↓
GPU-Accelerated Inference
↓
AI Response (in-browser)
Everything happens inside your browser. No external connections whatsoever.
Requirements
- Chrome 113+ or any browser with WebGPU support
- Dedicated GPU recommended for larger models (integrated GPUs work for 1-3B models)
- Sufficient storage for model downloads (1-6GB per model)
Cloud Providers (Optional)
For users who want more powerful AI capabilities, AssisT supports multiple cloud providers through API keys you provide.
Supported Providers
| Provider | Strengths | Best For |
|---|---|---|
| Anthropic (Claude) | Coding, academic writing, analysis | Text simplification, tutoring |
| OpenAI (ChatGPT) | Creative, conversational | Brainstorming, general tasks |
| Google (Gemini) | Multimodal, visual, factual | Image understanding |
| Perplexity | Real-time web, citations | Research, fact-checking |
Bringing Your Own API Key
- Get an API key from your preferred provider:
  - Anthropic Console (Claude)
  - OpenAI Platform (ChatGPT)
  - Google AI Studio (Gemini)
  - Perplexity Settings
- Open AssisT settings
- Go to AI Settings > Cloud Providers
- Select your provider and enter your API key
- Choose your preferred model
Cost vs Quality
| Model Type | Examples | Cost | Best For |
|---|---|---|---|
| Fast | Haiku 4.5, GPT-5.4 mini, Gemini 2.5 Flash (free), Sonar | Cheaper per token | Simple tasks, high volume |
| Balanced | Sonnet 4.6, GPT-5.4 Thinking, Gemini 2.5 Pro (free), Sonar Pro | Moderate | Most use cases |
| Quality | Opus 4.6, GPT-5.4 Pro, Gemini 2.5 Pro (free), Sonar Deep Research | Higher per token | Complex tasks, accuracy critical |
Tip: Start with faster models for simple tasks. Use larger models when you need more nuanced or accurate responses.
API Key Security
- Your API keys are encrypted with AES-256 and stored locally in Chrome’s secure storage
- They are never sent to Fiavaion servers
- Only transmitted directly to the provider when you use cloud features
- You can remove them anytime from settings
Claude Models (Anthropic)
When using Cloud AI mode with an Anthropic API key, AssisT supports the latest Claude models for powerful language understanding and generation:
| Model | Model ID | Best For | Input Cost | Output Cost |
|---|---|---|---|---|
| Haiku 4.5 | claude-haiku-4-5 | Quick answers, simple tasks, high volume | $0.001/1K tokens | $0.005/1K tokens |
| Sonnet 4.6 | claude-sonnet-4-6 | Everyday tasks, balanced performance (recommended) | $0.003/1K tokens | $0.015/1K tokens |
| Opus 4.6 | claude-opus-4-6 | Complex analysis, critical work, highest quality | $0.015/1K tokens | $0.075/1K tokens |
Cost Example: A typical 500-word document summary using Sonnet 4.6 costs approximately $0.002-0.004 per request.
Recommendation: Start with Sonnet 4.6 for the best balance of quality and cost. Use Haiku 4.5 for simple, high-volume tasks. Reserve Opus 4.6 for complex analysis where accuracy is critical.
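The arithmetic behind the cost example is straightforward. A sketch using the Sonnet 4.6 rates from the table above — the token counts are rough assumptions (a 500-word summary request is very roughly 700 input and 120 output tokens):

```javascript
// Estimate a request's cost from token counts and per-1K-token rates.
function estimateCost(inputTokens, outputTokens, inRatePer1K, outRatePer1K) {
  return (inputTokens / 1000) * inRatePer1K + (outputTokens / 1000) * outRatePer1K;
}

// Sonnet 4.6: $0.003/1K input, $0.015/1K output (see table above).
// ~$0.0021 + ~$0.0018 ≈ $0.0039 — within the $0.002-0.004 range quoted.
const summaryCost = estimateCost(700, 120, 0.003, 0.015);
```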
Feature-Specific Defaults:
- Summarization: Haiku 4.5 (fast, sufficient for most summaries)
- Text Simplification: Sonnet 4.6 (better comprehension and clarity)
- Assignment Breakdown: Sonnet 4.6 (detailed task analysis)
- Socratic Tutor: Opus 4.6 (complex reasoning and questioning)
- Citation Analysis: Sonnet 4.6 (balanced accuracy and speed)
- Multi-Document Compare: Opus 4.6 (handles complexity well)
OpenAI Models (ChatGPT)
When using Cloud AI mode with an OpenAI API key, AssisT supports the latest GPT models. OpenAI is a paid service — you’ll need to add credit to your account.
| Model | Model ID | Best For |
|---|---|---|
| GPT-5.4 mini | gpt-5.4-mini | Fast responses, cost-effective, high volume |
| GPT-5.4 Thinking | gpt-5.4-thinking | Everyday tasks, balanced performance (recommended) |
| GPT-5.4 Pro | gpt-5.4-pro | Complex tasks, highest accuracy, difficult questions |
| GPT-5.4 nano | gpt-5.4-nano | Ultra-fast, lowest cost per token |
Recommendation: Start with GPT-5.4 mini for cost-effective everyday use. Use GPT-5.4 Thinking when you need deeper reasoning. Reserve GPT-5.4 Pro for complex analysis where accuracy is critical.
Gemini Models (Google)
When using Cloud AI mode with a Google API key, AssisT supports Google’s Gemini models. Gemini is the only major provider offering a genuinely free API tier — no credit card required, just sign in at aistudio.google.com and create a key.
| Model | Model ID | Free Tier Limits | Best For |
|---|---|---|---|
| Gemini 2.5 Flash | gemini-2.5-flash | 10 req/min, 250 req/day | Fast all-rounder — recommended default |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | Unlimited req/day | Lightweight tasks, highest throughput |
| Gemini 2.5 Pro | gemini-2.5-pro | 5 req/min, 100 req/day | Most capable, complex analysis |
All free-tier models share a 250,000 tokens-per-minute cap. Daily limits reset automatically — you never “run out” permanently, and you’re never charged unless you explicitly set up billing on Google Cloud (a separate, deliberate step).
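Staying under a per-minute cap like Flash's 10 requests/minute can be handled client-side with a sliding window. A minimal sketch — the class and its use here are illustrative, not AssisT's implementation:

```javascript
// Refuse to exceed `limit` requests per `windowMs` milliseconds.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // `now` is injectable for testing; defaults to the real clock.
  tryAcquire(now = Date.now()) {
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Gemini 2.5 Flash free tier: 10 requests per minute.
const flashLimiter = new SlidingWindowLimiter(10, 60_000);
```

When `tryAcquire()` returns `false`, the caller can queue the request or surface a "rate limited, retrying shortly" status instead of burning a failed API call.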
Recommendation: Start with Gemini 2.5 Flash — 250 requests per day is more than enough for regular student use (summarization, study questions, simplification, etc.). Use Gemini 2.5 Pro for complex analysis tasks like Knowledge Graph or Multi-Document Compare.
Perplexity Models
When using Cloud AI mode with a Perplexity API key, AssisT supports Perplexity’s Sonar models with built-in web search. Perplexity is a paid service.
| Model | Model ID | Best For |
|---|---|---|
| Sonar | sonar | Fast web search and summarization |
| Sonar Pro | sonar-pro | Deeper retrieval and analysis (recommended) |
| Sonar Reasoning | sonar-reasoning | Real-time reasoning with search |
| Sonar Reasoning Pro | sonar-reasoning-pro | Advanced reasoning with search (DeepSeek-R1 based) |
| Sonar Deep Research | sonar-deep-research | Long-form, source-dense research reports |
Perplexity models are unique in providing real-time web access with citations, making them ideal for research and fact-checking tasks.
Recommendation: Use Sonar Pro for most research tasks. Use Sonar Deep Research for comprehensive, source-heavy reports.
Gemini Nano (Experimental)
Chrome’s built-in Gemini Nano model provides on-device AI processing without installing anything.
Status
Gemini Nano support is currently experimental and has been deprioritized in favour of WebLLM (Browser AI), which offers more models, better control, and wider browser compatibility. Gemini Nano remains available for advanced users who have already enabled Chrome’s experimental flags.
Requirements
- Chrome 128 or later (Canary, Dev, Beta, or Stable)
- Feature flag enabled: Visit `chrome://flags/#optimization-guide-on-device-mode` and set it to “Enabled”
- Model download: Chrome downloads the model automatically on first use
Gemini Nano vs WebLLM vs Ollama
| Feature | WebLLM (Browser AI) | Gemini Nano | Ollama |
|---|---|---|---|
| Setup | None — just download a model | Chrome flag required | Install separate app |
| Model Choice | 8 models (1B-8B) | Single model (Google’s) | Unlimited models |
| Performance | Good, GPU-accelerated | Basic tasks only | Best for complex tasks |
| Browser Support | Chrome 113+ | Chrome 128+ with flags | Any browser |
| Customization | Choose model per task | Limited | Full control |
Recommendation: Use Browser AI (WebLLM) for zero-install private AI, or Ollama for maximum capability. Gemini Nano is best suited for users already familiar with Chrome feature flags.
How the AI Mode System Works
AssisT routes all AI requests through a shared AI feature client (ai-feature-client.js) that provides consistent mode detection, availability checking, and response generation across all 9 features.
Feature Request (any of 9 AI features)
↓
Shared AI Feature Client
↓
getAIMode() → reads aiMode from storage
↓
┌────────┬──────────┬──────────┬──────────┬──────────────┐
│        │          │          │          │
OFF      Cloud      WebLLM     Ollama     Gemini Nano
│        │          │          │          (experimental)
│        API Key    WebGPU     localhost
│        + Model    Engine     :11434
│        │          │          │
└→ Status Generate   Generate   Generate
   bar      ↓          ↓          ↓
         Response   Response   Response
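The routing above amounts to a switch on the stored mode. A simplified sketch — the function and backend descriptors are illustrative; the real dispatch lives in ai-feature-client.js:

```javascript
// Map the stored aiMode value to a backend descriptor.
// "off" (and anything unrecognised) returns null so callers
// can show the status bar instead of attempting generation.
function resolveBackend(aiMode) {
  switch (aiMode) {
    case "cloud":       return { kind: "cloud",       needs: "API key + model" };
    case "webllm":      return { kind: "webllm",      needs: "WebGPU engine" };
    case "ollama":      return { kind: "ollama",      needs: "localhost:11434" };
    case "gemini-nano": return { kind: "gemini-nano", needs: "Chrome flag" }; // experimental
    case "off":
    default:
      return null;
  }
}
```

Centralising the switch in one client is what keeps availability checks and status-bar behavior consistent across all 9 features.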
Persistent Status Bars
Every AI feature displays a persistent status bar showing:
- Orange warning when AI is unavailable (with setup links)
- Green success after a successful generation
- Mode indicator showing which AI backend processed the request
Feature Compatibility by Mode
| Feature | WebLLM | Ollama | Cloud |
|---|---|---|---|
| Summarization | ✅ | ✅ | ✅ |
| Text Simplification | ✅ | ✅ | ✅ |
| Assignment Breakdown | ✅ | ✅ | ✅ |
| Socratic Tutor | ⚠️ Basic | ✅ | ✅ |
| Multi-Doc Compare | ⚠️ Basic | ✅ | ✅ |
| Knowledge Graph | ⚠️ Small texts | ✅ | ✅ |
| Citation Analyzer | ✅ | ✅ | ✅ |
| Emotional TTS | ❌ | ✅ | ✅ |
| Study Path Generator | ⚠️ Basic | ✅ | ✅ |
| Image Understanding | ❌ | ✅ (llava) | ✅ |
| Research & Citations | ❌ | ❌ | ✅ (Perplexity) |
Fallback Behaviors
When AI isn’t available, features gracefully degrade:
| Feature | Fallback Behavior |
|---|---|
| Summarize | Shows first paragraph |
| Simplify | Feature disabled with status message |
| Image Describe | Requires vision model |
| Emotional TTS | Uses standard neutral TTS |
| Knowledge Graph | Disabled with status message |
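The Summarize fallback, for example, only needs the first paragraph of the input. A minimal sketch — the function name is illustrative, not AssisT's actual code:

```javascript
// Fallback when no AI backend is reachable: return the first
// non-empty paragraph of the input text, trimmed.
function fallbackSummary(text) {
  const paragraphs = text.split(/\n\s*\n/); // blank lines delimit paragraphs
  for (const p of paragraphs) {
    const trimmed = p.trim();
    if (trimmed.length > 0) return trimmed;
  }
  return "";
}
```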
Privacy Guarantees
What We Never Do
- Collect or store your data
- Send data to our servers
- Track your AI usage
- Share information with third parties
What Stays Local
- All text you process
- Documents you summarize
- Images you analyze
- Conversation history
GDPR/FERPA/HIPAA Compliance
Because AssisT processes everything locally:
- GDPR: No personal data is transmitted
- FERPA: Student data stays on the device
- HIPAA: Patient information never leaves the browser
This makes AssisT safe for educational institutions and healthcare settings.
Performance Tips
For Best Local AI Performance
- Use an SSD: Faster model loading
- 8GB+ RAM/VRAM: Required for larger models
- Keep Ollama Running: Faster first response
- Choose Appropriate Models: Match model size to your hardware
Why Memory Matters
- More VRAM = Better Models: With more video memory (or unified memory on Apple Silicon), you can run larger, more capable models
- More Memory = Longer Context: Additional memory allows longer context windows—the AI can “remember” more of your document
- Longer Context = Fewer Hallucinations: When AI sees more context, it makes fewer mistakes because it has more information to work with
Memory Types
| Type | What Matters | Notes |
|---|---|---|
| Dedicated GPU | VRAM (8GB good, 12GB+ great) | NVIDIA/AMD graphics cards |
| Apple Silicon | Unified memory (16GB good, 32GB+ excellent) | M1/M2/M3/M4 Macs |
| CPU-only | System RAM (16GB min, 32GB recommended) | Slower but works |
Recommended System Requirements
| Setup | RAM/VRAM | Storage | Models |
|---|---|---|---|
| Minimal | 8GB | 4GB free | gemma3:4b |
| Standard | 16GB | 8GB free | qwen3:8b-q4_K_M |
| Full | 32GB+ | 15GB free | Multiple models + longer context |
Troubleshooting
Ollama Not Detected
- Ensure Ollama is installed and running
- Check that it’s accessible at `localhost:11434`
- Open a browser and visit `localhost:11434` — you should see “Ollama is running”
- Restart Ollama if needed
- Refresh the AssisT extension
Ollama Model Not Showing in Dropdown
- Verify the model is installed: run `ollama list` in your terminal
- Ensure Ollama is running (the dropdown fetches models live)
- Try closing and reopening the AssisT popup
WebLLM Model Won’t Load
- Check that your browser supports WebGPU — visit `chrome://gpu` and look for “WebGPU” in the feature list
- Ensure you have enough GPU memory for the model (1-6GB depending on model)
- Try a smaller model (llama-3.2-1b or gemma-2b)
- Close other GPU-intensive tabs or applications
- Restart Chrome if the GPU context is corrupted
WebLLM Shows “Not Downloaded”
- Click the Download button next to the model
- Wait for the download to complete (shows progress bar)
- Larger models (7B+) may take several minutes depending on connection
- Downloaded models are cached in browser storage — they persist across sessions
Slow AI Responses
- Local AI: Try a smaller Ollama model (gemma3:4b is fastest)
- Browser AI: Use a smaller WebLLM model (1B-3B range)
- Ensure no other AI requests are processing simultaneously
- Check your system’s available memory/VRAM
- Close other resource-intensive applications
AI Feature Shows Orange “Unavailable” Bar
This status bar appears when the selected AI mode can’t be reached:
- Cloud mode: Check your API key is entered and valid
- Local mode: Ensure Ollama is running at localhost:11434
- Browser AI: Load a WebLLM model first (click the status bar for setup)
- Off mode: Switch to an active AI mode using the mode chips in the popup
AI Response Times Out
Some AI operations (especially Knowledge Graph and Multi-Doc Compare) may time out on slower hardware. AssisT uses a 25-second timeout for all async operations. If you experience timeouts:
- Try shorter input text
- Use a faster model
- Switch to Cloud AI for complex operations
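A 25-second cap like this is typically implemented with `Promise.race`. A sketch under that assumption — the wrapper name is illustrative:

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms = 25_000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`AI request timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so nothing is left pending.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage: `withTimeout(generateLocally(model, prompt))` rejects after 25 seconds even if the backend never answers, letting the feature fall back or show an error bar.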
Model Download Failed
- Check your internet connection
- Ensure enough disk space is available
- Try downloading a smaller model first
- Restart Ollama (for local) or Chrome (for WebLLM) and try again