Eye tracking has an aura of being a fantastic and somewhat futuristic technology with tons of potential. While I believe that to be true, over the past two decades it has also become an established means of solving some challenging problems. It is, for example, the cornerstone technology of assistive devices that give people with movement and speech disabilities a means to communicate, helping them live independent lives.
But our goal at Tobii has always been to positively impact everyone’s lives — not just the people who rely on our technology to live a normal life. We aim to make devices better and more intuitive, and in order to do that, enabling technologies like eye tracking need to be universal — and that’s what I’m going to talk about in this post.
To be universally applicable, a technology needs to adhere to norms and standards. Ideally, it shouldn’t take up too much space to ensure portability and mobility. Low computational load is always a consideration for maximizing battery life and ensuring performance. And it goes without saying that if you put a technology into a consumer device, it needs to work for everyone, everywhere, all the time.
It’s relatively easy to build a decent eye tracker that works for most people in most situations. At a basic level, all you need is a camera, a light source, and a processing unit. The light illuminates the person’s eyes, increasing the contrast between the pupil and the iris and creating reflections on the cornea. The camera takes images of a person’s eyes, and the processing unit finds the pupil and these reflections in the cornea. With this information, the known positions of the camera and light source, and the anatomy of the human eye, it is possible to calculate the position and angle of rotation of each eye. Calibrate the eye tracking system by asking the user to look at an object whose position is known, and you have everything you need to determine where a person is looking.
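The paragraph above describes a model-based approach that relies on the known camera and light-source geometry. A simpler technique that illustrates the same idea is regression-based pupil-center/corneal-reflection (PCCR) mapping. The vector from the corneal reflection to the pupil center changes as the eye rotates, and calibration fits a mapping from that vector to the known target positions. The sketch below assumes an affine mapping, 2D pupil-glint vectors that have already been extracted from the images, and at least three non-collinear calibration targets. All function names are hypothetical, and this is not Tobii's method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve_3x3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration points are degenerate (e.g. collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_affine_calibration(
    pg_vectors: Sequence[Point], targets: Sequence[Point]
) -> Tuple[List[float], List[float]]:
    """Least-squares fit of gaze_x = a0 + a1*dx + a2*dy (same form for gaze_y).

    pg_vectors: pupil-center minus corneal-reflection vectors (image pixels),
                one per calibration target the user looked at (hypothetical input).
    targets:    known target positions, e.g. display coordinates.
    """
    if len(pg_vectors) != len(targets) or len(targets) < 3:
        raise ValueError("need at least 3 matching calibration samples")
    rows = [[1.0, dx, dy] for dx, dy in pg_vectors]
    # Normal equations (A^T A) w = A^T t, solved once per output axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in range(2):
        atb = [sum(r[i] * t[axis] for r, t in zip(rows, targets)) for i in range(3)]
        coeffs.append(_solve_3x3(ata, atb))
    return coeffs[0], coeffs[1]


def estimate_gaze(pg_vector: Point, cx: List[float], cy: List[float]) -> Point:
    """Map a new pupil-glint vector to a gaze point using the fitted coefficients."""
    dx, dy = pg_vector
    return (cx[0] + cx[1] * dx + cx[2] * dy, cy[0] + cy[1] * dx + cy[2] * dy)


# Usage with synthetic data generated from an assumed true mapping
# x = 960 + 40*dx, y = 540 + 45*dy (five calibration targets).
calib_vectors = [(0.0, 0.0), (-10.0, -8.0), (10.0, -8.0), (-10.0, 8.0), (10.0, 8.0)]
calib_targets = [(960 + 40 * dx, 540 + 45 * dy) for dx, dy in calib_vectors]
cx, cy = fit_affine_calibration(calib_vectors, calib_targets)
print(estimate_gaze((5.0, -4.0), cx, cy))  # approximately (1160.0, 360.0)
```

An affine model is the smallest mapping that captures offset, scale and shear. Regression-based trackers often add second-order terms to absorb nonlinearity, at the cost of needing more calibration points.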
However, each new use case presents new challenges. I wish there were some kind of secret formula that solved everything, but unfortunately, there isn’t. Turning a basic eye tracking system into something reliable requires hard, dedicated work.
To start with, we usually need to generate massive datasets. We need to know what information to look for and how to slice the data for the target application. A research scenario, for example, doesn’t impose the same demanding population coverage requirements as a device-native feature in a mass-market product, such as foveated rendering in a VR headset.
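To illustrate why slicing matters, here is a minimal sketch that defines coverage as the share of participants whose mean gaze error after calibration stays under a threshold, and then breaks that figure down by one attribute. The record fields, the eyewear labels, the toy data and the 1.0° threshold are all hypothetical placeholders rather than Tobii's metric; a real dataset would be sliced along many more dimensions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Participant:
    # Hypothetical record: one row per recorded participant.
    eyewear: str           # e.g. "none", "glasses", "contacts"
    mean_error_deg: float  # mean gaze error after calibration, in degrees


def coverage(participants: Iterable[Participant], max_error_deg: float) -> float:
    """Fraction of participants whose mean error is within the threshold."""
    ps = list(participants)
    if not ps:
        return 0.0
    return sum(p.mean_error_deg <= max_error_deg for p in ps) / len(ps)


def coverage_by_slice(
    participants: Iterable[Participant], max_error_deg: float
) -> Dict[str, float]:
    """Coverage computed separately for each eyewear group."""
    groups: Dict[str, List[Participant]] = defaultdict(list)
    for p in participants:
        groups[p.eyewear].append(p)
    return {k: coverage(v, max_error_deg) for k, v in sorted(groups.items())}


data = (
    [Participant("none", e) for e in (0.5, 0.7, 0.9, 1.2)]
    + [Participant("glasses", e) for e in (0.8, 2.5)]
    + [Participant("contacts", e) for e in (0.6, 1.1)]
)
print(coverage(data, 1.0))           # 0.625
print(coverage_by_slice(data, 1.0))  # {'contacts': 0.5, 'glasses': 0.5, 'none': 0.75}
```

The aggregate figure hides the fact that, in this toy data, participants wearing glasses or contact lenses fare noticeably worse. A research prototype might accept that gap, while a mass-market feature could not.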
And then there’s the issue of latency. A graphics-heavy application that uses split rendering, performing some computing on the device and some in the cloud, for example, requires low-latency connections both to the network and to the eye tracker. On the other hand, an application that supports eye-controlled menu selection won’t have the same latency requirements, which leaves room for considerable temporal filtering to enhance the user experience.
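The trade-off between filtering and latency can be made concrete with an exponential moving average (EMA). It is chosen here only because it is the simplest temporal filter to show, not because any particular product uses it. The sketch smooths a simulated 10° saccade and counts how many samples the filtered gaze point needs to catch up with the eye. The function names, the step size and the 0.5° tolerance are illustrative assumptions.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def ema_filter(samples: Iterable[Point], alpha: float) -> List[Point]:
    """Exponential moving average over gaze samples (degrees).

    alpha close to 1.0 follows the raw signal (low lag, more jitter);
    alpha close to 0.0 smooths heavily (less jitter, more lag).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[Point] = []
    for x, y in samples:
        if not out:
            out.append((x, y))  # initialize with the first raw sample
        else:
            px, py = out[-1]
            out.append((px + alpha * (x - px), py + alpha * (y - py)))
    return out


def samples_to_settle(filtered: Sequence[Point], target_x: float, tolerance: float) -> int:
    """Index of the first sample whose x is within tolerance of target_x, or -1."""
    for i, (x, _) in enumerate(filtered):
        if abs(x - target_x) <= tolerance:
            return i
    return -1


saccade = [(0.0, 0.0)] + [(10.0, 0.0)] * 20  # 10° horizontal step at sample 1
for a in (1.0, 0.3):
    print(a, samples_to_settle(ema_filter(saccade, a), 10.0, 0.5))
# 1.0 1
# 0.3 9
```

With alpha = 0.3, the filtered point needs nine samples after the step to come within 0.5° of the new position. At an assumed 120 Hz sampling rate, that is roughly 75 ms of added lag, which a menu-selection interface may tolerate but a latency-sensitive rendering pipeline likely would not.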
Some might argue that eye tracking is a pure computer science problem, and that machine learning will solve everything for you. And although machine learning is a vital part of our solution, when designing eye tracking algorithms, you need to consider the anatomy of the eye, how the brain interprets visual signals, as well as the goals of the target application.
But I think the biggest struggle comes when you move from ideation to commercialization. Failure is not an option in a mass-market scenario where millions of devices rely on your technology to be fully functional. Reaching 99% population coverage and beyond means that scenarios and people who were considered outliers during ideation now need to be solved for. Droopy eyelids, make-up covering vital features, prescription glasses, contact lenses, and lazy/dominant eyes are all common cases. In addition, you will likely need to manage headset slippage, as well as variations in interpupillary distance (IPD), face shape, skin reflectance in near-infrared, iris color, and component and placement tolerances.
To give you an idea of what the challenges look like, see the example images above. During development, you need to consider how to handle distortion caused by the VR lens, how to address stray light, and how to filter out ghost reflections caused by prescription glasses. In all of these cases, you still need to find the pupil and the corneal reflections with sub-pixel accuracy, which is a complex problem, but definitely a solvable one.
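One common, simple way to locate a corneal reflection with sub-pixel accuracy is an intensity-weighted centroid over a small region of interest. The sketch below assumes the image is a list of rows of intensity values, that a region of interest near the pupil is already known, and that lens-distortion correction happens elsewhere. The function name, the threshold and the blob-area limits are hypothetical. The area check is only a crude stand-in for rejecting stray light or ghost reflections; in practice those need much more than this.

```python
from typing import Optional, Sequence, Tuple


def glint_centroid(
    image: Sequence[Sequence[float]],
    roi: Tuple[int, int, int, int],
    threshold: float,
    min_area: int = 1,
    max_area: int = 25,
) -> Optional[Tuple[float, float]]:
    """Sub-pixel (row, col) position of a bright reflection inside a region of interest.

    roi is (row0, col0, row1, col1), half-open. Pixels brighter than `threshold`
    are weighted by how far they exceed it, so the centroid can fall between
    pixel centers. Returns None if the bright blob's pixel count is outside
    [min_area, max_area] (hypothetical plausibility limits).
    """
    r0, c0, r1, c1 = roi
    total = sum_r = sum_c = 0.0
    area = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            w = image[r][c] - threshold
            if w > 0:
                total += w
                sum_r += w * r
                sum_c += w * c
                area += 1
    if area == 0 or not (min_area <= area <= max_area):
        return None
    return (sum_r / total, sum_c / total)


# Synthetic 8x8 image: dark background with a two-pixel glint of unequal brightness.
img = [[10.0] * 8 for _ in range(8)]
img[3][3] = 250.0
img[3][4] = 150.0
print(glint_centroid(img, (0, 0, 8, 8), threshold=100.0))  # (3.0, 3.25)
```

Because the brighter pixel carries more weight, the estimated column lands at 3.25 rather than at either pixel center, which is what allows positions finer than the sensor grid.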
So hopefully, you believe me when I say that creating a basic eye tracking system is simple, but building one that works for everyone, everywhere takes time and dedication. One thing I didn’t touch on in this post is performance evaluation: measuring the impact of changes on system performance so that you maintain the optimum design as you evolve it to cater for new use cases. I purposely left performance out because some of my colleagues have spent the past couple of months focusing on this area. They have created a set of metrics and a methodology for measuring the performance of eye tracking systems, which you can read about in our white paper, Eye tracking performance assessment for VR/AR headsets and wearables. If you want to try native eye tracking in a VR headset, take a look at the latest model to include Tobii’s technology, the Pico Neo 3 Pro Eye, which has recently been announced.