How to use Spectra

Spectra sees your screen, speaks what matters, and acts on your voice command. No reading. No staring. No typing. Just talk.

1. Getting started

Press Q or say "Hey Spectra" to start.

Allow microphone access when your browser asks. Screen sharing is optional; press W to enable it for screen-aware responses.

Spectra will say "Connected" when ready.

2. Cloud mode (Gemini Live, default)

Cloud mode is the default at spectra.aqta.ai. It is powered by Gemini Live, Google's real-time multimodal API, which lets Spectra see your screen, hear your voice, and respond in speech simultaneously, with no perceptible delay.

No setup required. Open the site, press Q, allow microphone access, and you are connected.

What Gemini Live enables

  • Real-time screen understanding: Press W to share your screen. Gemini processes the live video stream, so you can ask "what's on screen?" or "click the submit button" and it responds to exactly what it sees.
  • Native voice in, voice out: Audio goes directly to Gemini Live. There is no intermediate transcription step, so responses feel instant.
  • Screen memory: Say "Remember this" to save a snapshot. Later, ask "What changed?" and Spectra compares the current screen to the saved one.
  • Guided tours: Say "Teach me this app" for a narrated walkthrough of whatever is on screen.
  • Interruption: Say "Stop" at any point during a response. Gemini Live handles barge-in natively, so audio cuts off immediately.

Screen and audio are streamed to the Gemini API only while your session is active and are never stored. See the privacy policy for details.

3. Local mode (offline, no API key)

Switch to Local in the mode toggle at the top of the page. In local mode, everything runs on your own machine; no data leaves your device.

Local mode uses Gemma 4 via Ollama for language, Whisper for voice recognition, and Piper for speech. You need to run these locally before connecting:

  1. Install Ollama and pull the model:
    ollama pull gemma4
  2. Start Ollama: ollama serve
  3. Run Spectra locally: ./run.sh
  4. Select Local in the mode toggle and press Q.
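
Before selecting Local, it can help to confirm that Ollama is reachable and that the model has been pulled. The sketch below is one way to run that check in Python; it is not part of Spectra. It assumes Ollama is listening on its default address, http://localhost:11434, and reads Ollama's /api/tags endpoint, which lists locally available models. The helper names (fetch_tags, model_is_pulled, status_message) are made up for this example.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> dict:
    """Return the parsed JSON from Ollama's /api/tags (locally available models)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_is_pulled(tags: dict, model: str) -> bool:
    """True if `model` appears in the listing, with or without a ':tag' suffix."""
    names = [m.get("name", "") for m in tags.get("models", [])]
    return any(n == model or n.split(":", 1)[0] == model for n in names)


def status_message(tags: Optional[dict], model: str = "gemma4") -> str:
    """Turn the check into a short hint; pass None when Ollama was unreachable."""
    if tags is None:
        return "Ollama is not reachable. Is `ollama serve` running?"
    if model_is_pulled(tags, model):
        return f"Ollama is up and {model} is available."
    return f"Ollama is up, but {model} is missing. Run: ollama pull {model}"
```

One way to use it: call fetch_tags() inside try/except OSError (a refused connection or a timeout raises a subclass of OSError), pass None to status_message on failure, and print the result before starting ./run.sh.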

Local mode is text-only: screen sharing is not available. Browser control requires the Spectra Bridge extension to be connected. The hosted site at spectra.aqta.ai runs cloud mode only; local mode requires running Spectra on your own machine.

4. Voice commands

  Command                       What it does
  "Where am I?"                 Describes the current screen
  "What's on screen?"           Full screen description
  "Click the [button name]"     Clicks an element
  "Type [your text]"            Types into the focused field
  "Scroll down / up"            Scrolls the page
  "Go to [website]"             Navigates to a URL
  "Press Enter / Tab / Escape"  Presses a key
  "Remember this"               Saves a screen snapshot (cloud mode only)
  "What changed?"               Compares to a saved snapshot (cloud mode only)
  "Teach me this app"           Guided tour of the screen (cloud mode only)
  "Stop / Cancel"               Interrupts the current action

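The command list above can be read as a mapping from a spoken phrase to an action plus an optional argument. The sketch below illustrates that mapping in Python. It is not how Spectra is implemented: Spectra understands conversational language, whereas this example matches the listed phrasings rigidly. The action names, the Command type, and the parse and is_available helpers are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    action: str               # hypothetical action name
    argument: Optional[str]   # text captured from the utterance, if any
    cloud_only: bool          # True for commands listed as cloud mode only


# (pattern, action, cloud_only). Fixed phrases tolerate trailing punctuation;
# commands with an argument capture the rest of the utterance verbatim.
_PATTERNS = [
    (r"where am i[?.!]*", "describe_current_screen", False),
    (r"what['’]?s on screen[?.!]*", "describe_full_screen", False),
    (r"click (?:the )?(.+)", "click", False),
    (r"type (.+)", "type_text", False),
    (r"scroll (down|up)", "scroll", False),
    (r"go to (.+)", "navigate", False),
    (r"press (enter|tab|escape)", "press_key", False),
    (r"remember this[.!]*", "save_snapshot", True),
    (r"what changed[?.!]*", "compare_snapshot", True),
    (r"teach me this app[.!]*", "guided_tour", True),
    (r"(?:stop|cancel)[.!]*", "interrupt", False),
]


def parse(utterance: str) -> Optional[Command]:
    """Return the first matching command, or None if nothing in the list matches."""
    text = utterance.strip()
    for pattern, action, cloud_only in _PATTERNS:
        match = re.fullmatch(pattern, text, flags=re.IGNORECASE)
        if match:
            argument = match.group(1) if match.groups() else None
            return Command(action, argument, cloud_only)
    return None


def is_available(command: Command, mode: str) -> bool:
    """Cloud-only commands are unavailable when mode is 'local'."""
    return not (command.cloud_only and mode == "local")
```

For example, parse("Click the submit button") yields the click action with "submit button" as its argument, and is_available(parse("Remember this"), "local") is False because snapshots are listed as cloud mode only.
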
5. Keyboard shortcuts

  Q    Start or stop Spectra
  W    Share your screen (Gemini Live only; not available in local mode)
  Esc  Stop Spectra
  Tab  Navigate between controls

6. Spectra Bridge extension

To let Spectra control other tabs (clicking, typing, and scrolling on any website), install the Spectra Bridge Chrome extension. Download it from the Spectra site, then:

  1. Open chrome://extensions
  2. Enable Developer mode (top-right toggle)
  3. Click Load unpacked
  4. Select the downloaded spectra-bridge folder

Without the extension, Spectra can still answer questions and respond by voice.

7. Privacy

In cloud mode, Spectra processes data only while a session is active. Screen and voice are streamed to the Gemini API in real time and never stored. When your session ends, everything is discarded. No accounts, no tracking, no analytics.

In local mode, nothing leaves your device at all. All processing (speech recognition, language understanding, and voice output) happens on your own hardware.

Full privacy policy

8. Tips

Speak naturally. Spectra understands conversational language, not just commands.

Say "Stop" at any time to interrupt.

Spectra confirms before destructive actions like deleting or sending.

If Spectra cannot find an element, it will scroll and try again automatically.

In local mode, keep requests concise. Gemma performs best with short, direct instructions.
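
Two of the behaviors described in these tips, confirming before destructive actions and scrolling to retry when an element is not found, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Spectra's code: the set of destructive actions contains only the two examples named above, the retry limit of 5 is arbitrary, and needs_confirmation and find_with_scroll are hypothetical helpers.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Only the two examples named in the tip; the real list is not documented.
DESTRUCTIVE_ACTIONS = {"delete", "send"}


def needs_confirmation(action: str) -> bool:
    """True if the action should be confirmed by the user before it runs."""
    return action in DESTRUCTIVE_ACTIONS


def find_with_scroll(
    find: Callable[[], Optional[T]],
    scroll: Callable[[], bool],
    max_scrolls: int = 5,
) -> Optional[T]:
    """Try `find`; on a miss, scroll and retry.

    Stops when the element is found, when `scroll` returns False (nothing
    left to scroll), or after `max_scrolls` scrolls.
    """
    for attempt in range(max_scrolls + 1):
        found = find()
        if found is not None:
            return found
        if attempt == max_scrolls or not scroll():
            break
    return None
```

Here find is any callable that returns the element or None, and scroll is any callable that scrolls one step and reports whether the page actually moved. Bounding the number of scrolls keeps a missing element from causing an endless loop.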