AI Features & Privacy

Learn how AssisT handles AI processing with local LLMs, browser-native WebLLM, and optional cloud APIs while keeping your data private.

Overview

AssisT uses a privacy-first hybrid AI system that gives you four ways to use AI features, from completely offline to cloud-powered. Your data stays on your device by default. All 9 AI features are routed through a shared AI client that consistently handles mode detection, availability checking, and response generation.

Four AI Modes

AssisT offers flexible AI processing with four distinct modes:

| Mode | Privacy | Cost | Performance | Requirements |
|---|---|---|---|---|
| Off | N/A | Free | Features disabled | None |
| Local AI (Ollama) | 100% Private | Free | Good (hardware-dependent) | Ollama installed |
| Browser AI (WebLLM) | 100% Private | Free | Good (GPU-dependent) | WebGPU-capable browser |
| Cloud AI (API) | Your API only | Free (Gemini) or paid | Excellent | API key |

You can switch between modes instantly using the inline AI mode switcher in the popup — click Off, Cloud, Browser AI, or Local AI chips to switch without leaving the page.

Key Principles

  • Your Choice: Pick the AI mode that matches your privacy and performance needs
  • No Data Collection: We never see, store, or transmit your data
  • Bring Your Own Key: Cloud mode uses your own API keys, not ours
  • Graceful Fallback: Features work even without AI (with reduced functionality)

Local AI with Ollama

AssisT integrates with Ollama, a free, open-source tool that runs AI models directly on your computer.

Why Local AI?

| Benefit | Description |
|---|---|
| Privacy | Data never leaves your device |
| Compliance | Safe for GDPR, FERPA, and HIPAA environments |
| No Cost | No API fees or subscriptions |
| Offline | Works without internet connection |
| Speed | No network latency for requests |

Live Model Selector

AssisT auto-detects all models installed on your Ollama instance and shows them in a live dropdown in the popup’s Local AI panel. You can:

  • See all your installed models at a glance
  • Switch between models without restarting
  • Keep your selection across sessions; it persists automatically

| Model | Size | Best For |
|---|---|---|
| qwen3:8b-q4_K_M | 5GB | Default — best JSON compliance, instruction following |
| llama3.1:8b | 5GB | Strong general-purpose, good reasoning |
| gemma3:4b | 3GB | Fast responses for basic tasks |
| deepseek-r1:8b | 5GB | Code and reasoning tasks |
| mistral:7b | 4GB | Complex analysis, detailed responses |
| llava | 4GB | Image understanding (vision) |

Task-Optimised Model Routing

AssisT automatically selects the best available model for each task:

| Task | Priority Models |
|---|---|
| Knowledge Graph | gemma3:4b, qwen3:8b |
| Socratic Tutor | qwen3:8b, llama3.1:8b |
| Citation Analyzer | qwen3:8b, mistral:7b |
| Study Path Generator | qwen3:8b, llama3.1:8b |
| Assignment Breakdown | qwen3:8b, llama3.1:8b |
| Summarization | Any available model |
| Text Simplification | Any available model |
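The routing above amounts to picking the first preferred model that is actually installed, falling back to any available model. A minimal sketch (the priority map mirrors the table; function and key names are illustrative, not AssisT's actual internals):

```javascript
// Priority lists mirror the routing table above (illustrative names).
const TASK_PRIORITIES = {
  "knowledge-graph": ["gemma3:4b", "qwen3:8b"],
  "socratic-tutor": ["qwen3:8b", "llama3.1:8b"],
  "citation-analyzer": ["qwen3:8b", "mistral:7b"],
  "study-path": ["qwen3:8b", "llama3.1:8b"],
  "assignment-breakdown": ["qwen3:8b", "llama3.1:8b"],
  // Summarization and simplification accept any available model.
};

// Pick the first preferred model that is installed; otherwise fall back
// to whatever Ollama reports as available.
function pickModel(task, installedModels) {
  const preferred = TASK_PRIORITIES[task] ?? [];
  for (const name of preferred) {
    // Match loosely so "qwen3:8b-q4_K_M" satisfies a "qwen3:8b" preference.
    const hit = installedModels.find((m) => m.startsWith(name));
    if (hit) return hit;
  }
  return installedModels[0] ?? null;
}
```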

Installing Ollama

  1. Download Ollama from ollama.com/download
  2. Install and run Ollama on your computer
  3. Pull a model: ollama pull qwen3:8b-q4_K_M
  4. AssisT will automatically detect it
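Detection in step 4 works because Ollama exposes a local REST endpoint, GET /api/tags, that lists installed models. A sketch of reading that list (the parsing helper is hypothetical, not AssisT's code; the fetch call requires Ollama running at localhost:11434):

```javascript
// Extract model names from the JSON shape returned by Ollama's
// GET /api/tags endpoint: { "models": [{ "name": "qwen3:8b-q4_K_M", ... }] }
function listModelNames(tagsResponse) {
  return (tagsResponse.models ?? []).map((m) => m.name);
}

// Usage against a running Ollama instance (uncomment to try):
// const res = await fetch("http://localhost:11434/api/tags");
// console.log(listModelNames(await res.json()));
```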

How Local AI Works

Your Browser (AssisT)
    ↓ Message Bridge
Ollama (localhost:11434)
    ↓ AI Response
Back to AssisT

All communication happens locally on your machine. Nothing is sent to external servers.
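The round trip above boils down to one local HTTP call. A hedged sketch of the request a client sends to Ollama's documented /api/generate endpoint (the helper name is made up; the endpoint and body fields are Ollama's API):

```javascript
// Build the body for POST http://localhost:11434/api/generate.
// "stream: false" asks Ollama for a single JSON response instead of chunks.
function buildGenerateRequest(model, prompt) {
  return {
    url: "http://localhost:11434/api/generate",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Usage (requires Ollama running):
// const { url, options } = buildGenerateRequest("qwen3:8b-q4_K_M", "Summarize: ...");
// const data = await (await fetch(url, options)).json();
// console.log(data.response);
```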

Browser AI with WebLLM

AssisT supports browser-native AI powered by WebLLM and WebGPU. Models run entirely inside your browser tab — no server, no API key, no software to install.

Why Browser AI?

| Benefit | Description |
|---|---|
| Zero Install | No Ollama or other software needed |
| Privacy | All processing happens in your browser |
| No Cost | Completely free |
| Portable | Works on any WebGPU-capable device |

Available Models

| Model | Size | Best For |
|---|---|---|
| llama-3.2-1b | ~1GB | Ultra-fast, basic tasks |
| gemma-2b | ~2GB | Balanced small model |
| phi-3.5-mini | ~3GB | Strong reasoning for its size |
| qwen2.5-3b | ~3GB | Good multilingual support |
| llama-3.2-3b | ~3GB | Capable general model |
| mistral-7b | ~5GB | Complex analysis |
| llama-3.1-8b | ~6GB | Most capable browser model |
| gemma-7b | ~5GB | Strong reasoning |

Model Management

The popup shows three model states:

  • Loaded (green) — Ready to use, currently in memory
  • Cached (blue) — Downloaded to device, loads quickly
  • Available (grey) — Needs downloading first

Models are downloaded once and cached in browser storage. Subsequent loads are fast.
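The three states map naturally onto a small lookup table. An illustrative sketch (not the extension's actual UI code):

```javascript
// Map a WebLLM model's lifecycle state to the badge shown in the popup.
const STATE_BADGES = {
  loaded: { color: "green", label: "Ready to use, currently in memory" },
  cached: { color: "blue", label: "Downloaded to device, loads quickly" },
  available: { color: "grey", label: "Needs downloading first" },
};

// Unknown states fall back to "available" so the UI always renders a badge.
function modelBadge(state) {
  return STATE_BADGES[state] ?? STATE_BADGES.available;
}
```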

How Browser AI Works

Your Browser (AssisT)
    ↓ WebLLM Engine (WebGPU)
    ↓ GPU-Accelerated Inference
AI Response (in-browser)

Everything happens inside your browser. No external connections whatsoever.

Requirements

  • Chrome 113+ or any browser with WebGPU support
  • Dedicated GPU recommended for larger models (integrated GPUs work for 1-3B models)
  • Sufficient storage for model downloads (1-6GB per model)
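You can check WebGPU support before offering Browser AI: supporting browsers expose the API as `navigator.gpu`. A minimal sketch (the function takes the navigator object as a parameter so it can be exercised outside a browser; a thorough check would also await `navigator.gpu.requestAdapter()` to confirm a usable adapter):

```javascript
// WebGPU is exposed as navigator.gpu in supporting browsers (Chrome 113+).
// Presence of the API is the quick first gate.
function supportsWebGPU(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}

// In a browser: supportsWebGPU(navigator)
```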

Cloud Providers (Optional)

For users who want more powerful AI capabilities, AssisT supports multiple cloud providers through API keys you provide.

Supported Providers

| Provider | Strengths | Best For |
|---|---|---|
| Anthropic (Claude) | Coding, academic writing, analysis | Text simplification, tutoring |
| OpenAI (ChatGPT) | Creative, conversational | Brainstorming, general tasks |
| Google (Gemini) | Multimodal, visual, factual | Image understanding |
| Perplexity | Real-time web, citations | Research, fact-checking |

Bringing Your Own API Key

  1. Get an API key from your preferred provider
  2. Open AssisT settings
  3. Go to AI Settings > Cloud Providers
  4. Select your provider and enter your API key
  5. Choose your preferred model

Cost vs Quality

| Model Type | Examples | Cost | Best For |
|---|---|---|---|
| Fast | Haiku 4.5, GPT-5.4 mini, Gemini 2.5 Flash (free), Sonar | Cheaper per token | Simple tasks, high volume |
| Balanced | Sonnet 4.6, GPT-5.4 Thinking, Gemini 2.5 Pro (free), Sonar Pro | Moderate | Most use cases |
| Quality | Opus 4.6, GPT-5.4 Pro, Gemini 2.5 Pro (free), Sonar Deep Research | Higher per token | Complex tasks, accuracy critical |

Tip: Start with faster models for simple tasks. Use larger models when you need more nuanced or accurate responses.

API Key Security

  • Your API keys are encrypted with AES-256 and stored locally in Chrome’s secure storage
  • They are never sent to Fiavaion servers
  • Only transmitted directly to the provider when you use cloud features
  • You can remove them anytime from settings

Claude Models (Anthropic)

When using Cloud AI mode with an Anthropic API key, AssisT supports the latest Claude models for powerful language understanding and generation:

| Model | Model ID | Best For | Input Cost | Output Cost |
|---|---|---|---|---|
| Haiku 4.5 | claude-haiku-4-5 | Quick answers, simple tasks, high volume | $0.001/1K tokens | $0.005/1K tokens |
| Sonnet 4.6 | claude-sonnet-4-6 | Everyday tasks, balanced performance (recommended) | $0.003/1K tokens | $0.015/1K tokens |
| Opus 4.6 | claude-opus-4-6 | Complex analysis, critical work, highest quality | $0.015/1K tokens | $0.075/1K tokens |

Cost Example: A typical 500-word document summary using Sonnet 4.6 costs approximately $0.002-0.004 per request.

Recommendation: Start with Sonnet 4.6 for the best balance of quality and cost. Use Haiku 4.5 for simple, high-volume tasks. Reserve Opus 4.6 for complex analysis where accuracy is critical.

Feature-Specific Defaults:

  • Summarization: Haiku 4.5 (fast, sufficient for most summaries)
  • Text Simplification: Sonnet 4.6 (better comprehension and clarity)
  • Assignment Breakdown: Sonnet 4.6 (detailed task analysis)
  • Socratic Tutor: Opus 4.6 (complex reasoning and questioning)
  • Citation Analysis: Sonnet 4.6 (balanced accuracy and speed)
  • Multi-Document Compare: Opus 4.6 (handles complexity well)

OpenAI Models (ChatGPT)

When using Cloud AI mode with an OpenAI API key, AssisT supports the latest GPT models. OpenAI is a paid service — you’ll need to add credit to your account.

| Model | Model ID | Best For |
|---|---|---|
| GPT-5.4 mini | gpt-5.4-mini | Fast responses, cost-effective, high volume |
| GPT-5.4 Thinking | gpt-5.4-thinking | Everyday tasks, balanced performance (recommended) |
| GPT-5.4 Pro | gpt-5.4-pro | Complex tasks, highest accuracy, difficult questions |
| GPT-5.4 nano | gpt-5.4-nano | Ultra-fast, lowest cost per token |

Recommendation: Start with GPT-5.4 mini for cost-effective everyday use. Use GPT-5.4 Thinking when you need deeper reasoning. Reserve GPT-5.4 Pro for complex analysis where accuracy is critical.

Gemini Models (Google)

When using Cloud AI mode with a Google API key, AssisT supports Google’s Gemini models. Gemini is the only major provider offering a genuinely free API tier — no credit card required, just sign in at aistudio.google.com and create a key.

| Model | Model ID | Free Tier Limits | Best For |
|---|---|---|---|
| Gemini 2.5 Flash | gemini-2.5-flash | 10 req/min, 250 req/day | Fast all-rounder — recommended default |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | Unlimited req/day | Lightweight tasks, highest throughput |
| Gemini 2.5 Pro | gemini-2.5-pro | 5 req/min, 100 req/day | Most capable, complex analysis |

All free-tier models share a 250,000 tokens-per-minute cap. The daily limits reset automatically — you never “run out” permanently and you’re never charged unless you explicitly set up billing on Google Cloud (a separate step you’d have to go out of your way to do).
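A client can stay inside the Flash free tier (10 requests per minute) by gating calls with a sliding-window limiter. An illustrative sketch, not AssisT's actual throttling code (the injectable clock exists only to make the logic testable):

```javascript
// Sliding-window limiter: allow at most `limit` calls per `windowMs`.
function makeRateLimiter(limit, windowMs, now = Date.now) {
  const timestamps = [];
  return function tryAcquire() {
    const t = now();
    // Drop calls that have aged out of the window.
    while (timestamps.length && t - timestamps[0] >= windowMs) timestamps.shift();
    if (timestamps.length >= limit) return false; // over the cap, caller should wait
    timestamps.push(t);
    return true;
  };
}

// e.g. for Gemini 2.5 Flash's free tier: makeRateLimiter(10, 60_000)
```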

Recommendation: Start with Gemini 2.5 Flash — 250 requests per day is more than enough for regular student use (summarization, study questions, simplification, etc.). Use Gemini 2.5 Pro for complex analysis tasks like Knowledge Graph or Multi-Document Compare.

Perplexity Models

When using Cloud AI mode with a Perplexity API key, AssisT supports Perplexity’s Sonar models with built-in web search. Perplexity is a paid service.

| Model | Model ID | Best For |
|---|---|---|
| Sonar | sonar | Fast web search and summarization |
| Sonar Pro | sonar-pro | Deeper retrieval and analysis (recommended) |
| Sonar Reasoning | sonar-reasoning | Real-time reasoning with search |
| Sonar Reasoning Pro | sonar-reasoning-pro | Advanced reasoning with search (DeepSeek-R1 based) |
| Sonar Deep Research | sonar-deep-research | Long-form, source-dense research reports |

Perplexity models are unique in providing real-time web access with citations, making them ideal for research and fact-checking tasks.

Recommendation: Use Sonar Pro for most research tasks. Use Sonar Deep Research for comprehensive, source-heavy reports.

Gemini Nano (Experimental)

Chrome’s built-in Gemini Nano model provides on-device AI processing without installing anything.

Status

Gemini Nano support is currently experimental and has been deprioritized in favour of WebLLM (Browser AI), which offers more models, better control, and wider browser compatibility. Gemini Nano remains available for advanced users who have already enabled Chrome’s experimental flags.

Requirements

  1. Chrome 128 or later (Canary, Dev, Beta, or Stable)
  2. Feature flag enabled: Visit chrome://flags/#optimization-guide-on-device-model and set to “Enabled”
  3. Model download: Chrome downloads the model automatically on first use

Gemini Nano vs WebLLM vs Ollama

| Feature | WebLLM (Browser AI) | Gemini Nano | Ollama |
|---|---|---|---|
| Setup | None — just download a model | Chrome flag required | Install separate app |
| Model Choice | 8 models (1B-8B) | Single model (Google’s) | Unlimited models |
| Performance | Good, GPU-accelerated | Basic tasks only | Best for complex tasks |
| Browser Support | Chrome 113+ | Chrome 128+ with flags | Any browser |
| Customization | Choose model per task | Limited | Full control |

Recommendation: Use Browser AI (WebLLM) for zero-install private AI, or Ollama for maximum capability. Gemini Nano is best suited for users already familiar with Chrome feature flags.

How the AI Mode System Works

AssisT routes all AI requests through a shared AI feature client (ai-feature-client.js) that provides consistent mode detection, availability checking, and response generation across all 9 features.

Feature Request (any of 9 AI features)
    ↓
Shared AI Feature Client
    getAIMode() → reads aiMode from storage
    ↓
    ├─ Off → persistent status bar (no generation)
    ├─ Cloud → API key + model → generate response
    ├─ Browser AI (WebLLM) → WebGPU engine → generate response
    ├─ Local AI (Ollama) → localhost:11434 → generate response
    └─ Gemini Nano (experimental) → on-device model → generate response
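In code, that dispatch is essentially a switch on the stored mode plus an availability check per backend. A hedged sketch (the backend objects are injected here for illustration; the real client reads `aiMode` from extension storage):

```javascript
// Dispatch a generation request to the active backend, or report why no
// generation can happen. `backends` maps mode name -> { isAvailable, generate }.
async function generate(mode, prompt, backends) {
  if (mode === "off") {
    return { ok: false, status: "AI features are turned off" };
  }
  const backend = backends[mode];
  if (!backend || !(await backend.isAvailable())) {
    return { ok: false, status: `${mode} backend unavailable` };
  }
  return { ok: true, mode, text: await backend.generate(prompt) };
}
```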

Persistent Status Bars

Every AI feature displays a persistent status bar showing:

  • Orange warning when AI is unavailable (with setup links)
  • Green success after a successful generation
  • Mode indicator showing which AI backend processed the request

Feature Compatibility by Mode

| Feature | WebLLM | Ollama | Cloud |
|---|---|---|---|
| Summarization | ✅ | ✅ | ✅ |
| Text Simplification | ✅ | ✅ | ✅ |
| Assignment Breakdown | ✅ | ✅ | ✅ |
| Socratic Tutor | ⚠️ Basic | ✅ | ✅ |
| Multi-Doc Compare | ⚠️ Basic | ✅ | ✅ |
| Knowledge Graph | ⚠️ Small texts | ✅ | ✅ |
| Citation Analyzer | ✅ | ✅ | ✅ |
| Emotional TTS | ✅ | ✅ | ✅ |
| Study Path Generator | ⚠️ Basic | ✅ | ✅ |
| Image Understanding | — | ✅ (llava) | ✅ |
| Research & Citations | — | — | ✅ (Perplexity) |

Fallback Behaviors

When AI isn’t available, features gracefully degrade:

| Feature | Fallback Behavior |
|---|---|
| Summarize | Shows first paragraph |
| Simplify | Feature disabled with status message |
| Image Describe | Requires vision model |
| Emotional TTS | Uses standard neutral TTS |
| Knowledge Graph | Disabled with status message |
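Degradation like the Summarize row can be as simple as a guard around the AI call. An illustrative sketch, assuming a hypothetical `ai` backend object with `available` and `summarize`:

```javascript
// Fall back to the document's first paragraph when no AI backend is available.
// `ai` is a hypothetical backend with { available: boolean, summarize(text) }.
function summarizeWithFallback(text, ai) {
  if (ai && ai.available) return ai.summarize(text);
  // Graceful degradation: the first paragraph stands in for a summary.
  return text.split(/\n\s*\n/)[0].trim();
}
```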

Privacy Guarantees

What We Never Do

  • Collect or store your data
  • Send data to our servers
  • Track your AI usage
  • Share information with third parties

What Stays Local

  • All text you process
  • Documents you summarize
  • Images you analyze
  • Conversation history

GDPR/FERPA/HIPAA Compliance

Because AssisT processes everything locally:

  • GDPR: No personal data is transmitted
  • FERPA: Student data stays on the device
  • HIPAA: Patient information never leaves the browser

This makes AssisT safe for educational institutions and healthcare settings.

Performance Tips

For Best Local AI Performance

  1. Use an SSD: Faster model loading
  2. 8GB+ RAM/VRAM: Required for larger models
  3. Keep Ollama Running: Faster first response
  4. Choose Appropriate Models: Match model size to your hardware

Why Memory Matters

  • More VRAM = Better Models: With more video memory (or unified memory on Apple Silicon), you can run larger, more capable models
  • More Memory = Longer Context: Additional memory allows longer context windows—the AI can “remember” more of your document
  • Longer Context = Fewer Hallucinations: When AI sees more context, it makes fewer mistakes because it has more information to work with

Memory Types

| Type | What Matters | Notes |
|---|---|---|
| Dedicated GPU | VRAM (8GB good, 12GB+ great) | NVIDIA/AMD graphics cards |
| Apple Silicon | Unified memory (16GB good, 32GB+ excellent) | M1/M2/M3/M4 Macs |
| CPU-only | System RAM (16GB min, 32GB recommended) | Slower but works |

| Setup | RAM/VRAM | Storage | Models |
|---|---|---|---|
| Minimal | 8GB | 4GB free | gemma3:4b |
| Standard | 16GB | 8GB free | qwen3:8b-q4_K_M |
| Full | 32GB+ | 15GB free | Multiple models + longer context |

Troubleshooting

Ollama Not Detected

  1. Ensure Ollama is installed and running
  2. Check that it’s accessible at localhost:11434
  3. Open a browser and visit localhost:11434 — you should see “Ollama is running”
  4. Restart Ollama if needed
  5. Refresh the AssisT extension

Ollama Model Not Showing in Dropdown

  1. Verify the model is installed: run ollama list in your terminal
  2. Ensure Ollama is running (the dropdown fetches models live)
  3. Try closing and reopening the AssisT popup

WebLLM Model Won’t Load

  1. Check your browser supports WebGPU — visit chrome://gpu and look for “WebGPU” in the feature list
  2. Ensure you have enough GPU memory for the model (1-6GB depending on model)
  3. Try a smaller model (llama-3.2-1b or gemma-2b)
  4. Close other GPU-intensive tabs or applications
  5. Restart Chrome if the GPU context is corrupted

WebLLM Shows “Not Downloaded”

  1. Click the Download button next to the model
  2. Wait for the download to complete (shows progress bar)
  3. Larger models (7B+) may take several minutes depending on connection
  4. Downloaded models are cached in browser storage — they persist across sessions

Slow AI Responses

  1. Local AI: Try a smaller Ollama model (gemma3:4b is fastest)
  2. Browser AI: Use a smaller WebLLM model (1B-3B range)
  3. Ensure no other AI requests are processing simultaneously
  4. Check your system’s available memory/VRAM
  5. Close other resource-intensive applications

AI Feature Shows Orange “Unavailable” Bar

This status bar appears when the selected AI mode can’t be reached:

  • Cloud mode: Check your API key is entered and valid
  • Local mode: Ensure Ollama is running at localhost:11434
  • Browser AI: Load a WebLLM model first (click the status bar for setup)
  • Off mode: Switch to an active AI mode using the mode chips in the popup

AI Response Times Out

Some AI operations (especially Knowledge Graph and Multi-Doc Compare) may time out on slower hardware. AssisT uses a 25-second timeout for all async operations. If you experience timeouts:

  1. Try shorter input text
  2. Use a faster model
  3. Switch to Cloud AI for complex operations
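The 25-second ceiling can be reproduced with a generic timeout wrapper around any async operation. A sketch, not the extension's exact implementation:

```javascript
// Race an AI operation against a timeout so slow hardware fails fast
// instead of hanging. AssisT uses 25 000 ms for async operations.
function withTimeout(promise, ms = 25_000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Clearing the timer keeps a resolved call from holding the event loop open.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```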

Model Download Failed

  1. Check your internet connection
  2. Ensure enough disk space is available
  3. Try downloading a smaller model first
  4. Restart Ollama (for local) or Chrome (for WebLLM) and try again