edison

A voice-activated web navigation extension for Chrome


Edison

Edison is a voice-activated way to navigate the Chrome browser.

Demos

The voice interface can be triggered by the wakeword “Hey Edison”, followed by one of the supported commands documented in the User Guide below.

Wakeword Demo

Download

You can download the extension from the Chrome Web Store here.

User Guide

The following commands are currently supported:

  • Open
    Opens the first Google search result that best matches the words spoken after the “open” command.
    Examples: “Hey Edison, Open News”, “Hey Edison, Open YouTube”, “Hey Edison, Open Netflix”.

  • Click
    Tries to click anything that approximately matches the words spoken after the “click” command.
    Examples: “Hey Edison, Click Sign-In”, “Hey Edison, Click the title of a video”, “Hey Edison, Click a Netflix profile”.

  • Close
    Closes the current tab. Useful if a tab was opened by mistake.
    Example: Just say “Hey Edison, Close tab”.

  • Scroll
    Scrolls the page up/down/left/right.
    Examples: “Hey Edison, Scroll Down”, “Hey Edison, Scroll Up”, “Hey Edison, Scroll Left”, “Hey Edison, Scroll Right”.

  • Media Controls for Video
    Plays or pauses the video in the current tab.
    Example: Just say “Hey Edison, Play” or “Hey Edison, Pause” when viewing a video.

  • Focus Next Tab
    Navigates to the next tab.
    Example: “Hey Edison, Focus Next Tab”.

  • Focus Previous Tab
    Navigates to the previous tab.
    Example: “Hey Edison, Focus Previous Tab”.

  • Go back
    Hits the browser back button.
    Example: “Hey Edison, Go Back”.

  • Go forward
    Hits the browser forward button.
    Example: “Hey Edison, Go Forward”.

  • Rewind
    Specific to Netflix. Rewinds the current Netflix title by 10 seconds.
    Example: Just say “Hey Edison, Rewind” when viewing a Netflix title.

  • Skip
    Specific to Netflix. Fast forwards the Netflix title by 10 seconds.
    Example: Just say “Hey Edison, Skip” when viewing a Netflix title.

Note that the interface currently handles one command at a time, so the wakeword must be spoken again before each command.

For accessibility use cases, it is recommended that passwords be saved for the most commonly used websites to improve the overall user experience.

Check out the Demos to see the tool in action!

Development Guide

To start up a development environment:

  1. Ensure you have Node.js installed.
  2. Clone the project and run:
     • npm install
     • npm run build
  3. Load the project directory as an unpacked Chrome extension:
     • Go to chrome://extensions.
     • Toggle on “Developer mode” in the top right corner of the page.
     • Click the “Load unpacked” button in the top left and select the directory you cloned in step 2.
  4. If you are making .jsx changes, you can run the watch command to automatically convert them to loadable .js files:
     • npm run watch

The entry point for all voice commands is located in the background script here.
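As a rough illustration of what a command entry point has to do, the sketch below turns a recognized phrase into a command name and an optional argument. It is not the project's actual code: `COMMANDS` and `parseCommand` are hypothetical names, the phrase list simply mirrors the User Guide, and the sketch assumes the wakeword has already been detected and stripped from the transcript.

```javascript
// Hypothetical sketch: map a recognized phrase to a command and its argument.
// Multi-word phrases are listed first so "focus next tab" matches as one command.
const COMMANDS = [
  "focus previous tab",
  "focus next tab",
  "go forward",
  "go back",
  "open",
  "click",
  "close",
  "scroll",
  "play",
  "pause",
  "rewind",
  "skip",
];

function parseCommand(transcript) {
  // Normalize case and collapse repeated whitespace.
  const text = transcript.trim().toLowerCase().replace(/\s+/g, " ");
  for (const phrase of COMMANDS) {
    // Match the whole phrase, or the phrase followed by an argument.
    if (text === phrase || text.startsWith(phrase + " ")) {
      return { command: phrase, argument: text.slice(phrase.length).trim() };
    }
  }
  return null; // not a supported command
}
```

For example, `parseCommand("Open Youtube")` yields the command `open` with the argument `youtube`, while an unsupported phrase yields `null`. In practice a speech library such as annyang can do this matching itself; the standalone function is only meant to make the flow easy to follow.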

Logging from extension-side JavaScript is viewable by inspecting the background.html view from the extension's entry under chrome://extensions. Note that developer mode must be enabled.

For injected content scripts, logging is viewable by opening the regular developer tools on the webpage the content script was injected into.
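To make the split between the background script and content scripts concrete, here is a minimal sketch of a content-script handler for a scroll command. The message shape `{ type: "scroll", direction }`, the names `scrollDelta` and `handleMessage`, and the step size are all assumptions for illustration, not the extension's real protocol. The `console.log` call is the kind of output that would appear in the page's regular developer tools.

```javascript
// Hypothetical content-script sketch: react to a "scroll" message.
const SCROLL_STEP = 400; // pixels per command; an arbitrary illustrative value

// Translate a spoken direction into horizontal/vertical pixel offsets.
function scrollDelta(direction, step = SCROLL_STEP) {
  switch (direction) {
    case "up":
      return { left: 0, top: -step };
    case "down":
      return { left: 0, top: step };
    case "left":
      return { left: -step, top: 0 };
    case "right":
      return { left: step, top: 0 };
    default:
      return null; // unknown direction
  }
}

// Returns true if the message was handled. `win` is passed in so the
// function can be exercised without a real browser window.
function handleMessage(message, win) {
  if (!message || message.type !== "scroll") return false;
  const delta = scrollDelta(message.direction);
  if (!delta) return false;
  console.log("edison: scrolling", message.direction);
  win.scrollBy(delta.left, delta.top);
  return true;
}

// Register the listener only when running inside an extension content script.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message) => {
    handleMessage(message, window);
  });
}
```

Under these assumptions, the background script would deliver the message to the active tab with `chrome.tabs.sendMessage(tabId, { type: "scroll", direction: "down" })`.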

The extension currently relies on a few external dependencies:

  • Speech recognition with annyang.

  • Fuzzy search powered by fuse.

  • Wakeword detection powered by bumblebee.
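To give a feel for the approximate matching that a command like “click” depends on, the sketch below scores candidate labels against the spoken words using a bigram Dice coefficient. This is a simple stand-in chosen to keep the example self-contained; it does not reproduce how fuse scores matches. The names `bigrams`, `similarity`, and `bestMatch` and the 0.5 threshold are assumptions.

```javascript
// Count adjacent character pairs after lowercasing and stripping
// everything except letters and digits.
function bigrams(text) {
  const s = text.toLowerCase().replace(/[^a-z0-9]/g, "");
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const pair = s.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * shared bigrams / total bigrams, a value in [0, 1].
function similarity(a, b) {
  const left = bigrams(a);
  const right = bigrams(b);
  let shared = 0;
  let total = 0;
  for (const [pair, n] of left) {
    shared += Math.min(n, right.get(pair) || 0);
    total += n;
  }
  for (const n of right.values()) total += n;
  return total === 0 ? 0 : (2 * shared) / total;
}

// Return the highest-scoring label at or above the threshold, or null.
function bestMatch(spoken, labels, threshold = 0.5) {
  let best = null;
  for (const label of labels) {
    const score = similarity(spoken, label);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { label, score };
    }
  }
  return best;
}
```

With this sketch, `bestMatch("sign in", ["Sign-In", "Register", "Help"])` selects the label `Sign-In`, because punctuation and case are ignored before comparison. A content script could apply the same idea to the visible text of links and buttons on the page.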

Some useful resources:

(1) Chrome Extension Architecture Overview

(2) Chrome Extension Message Passing

If you have any questions, feel free to shoot me an email at klee2010@gmail.com.

Design Document

A design document for this project is available here.

You can also watch a presentation on the motivations behind the project here.

Suggestions and Feedback

If you have any ideas on how to improve the tool, or encounter any unexpected behaviour, please feel free to shoot me an email at klee2010@gmail.com.