How does a computer know where you’re looking?

How much information is too much? And where should it go? Heads-up display image from shutterstock.com

Imagine driving a car, using a heads-up display projected on the windshield to navigate through an unfamiliar city. This is augmented reality (AR): the information not only guides you along a route, but also alerts you to salient details in your surroundings, such as cyclists or pedestrians. The correct placement of virtual content is crucial, perhaps even a matter of life and death.

Information must not obscure other material, and it should be displayed long enough for you to understand it, but not much longer than that. Computer systems have to make these determinations in real time, without letting any of the information become distracting or obtrusive. We certainly don’t want a warning about a cyclist about to cross in front of the car to obscure the cyclist herself!

As a researcher in AR, I spend a lot of time trying to figure out how to get the right information onto a user’s screen, in just the right place, at just the right moment. I’ve learned that showing too much information can confuse the user, but not showing enough can render an application useless. We have to find the sweet spot in between.

A crucial element of this, it turns out, is knowing where users are looking. Only then can we deliver the information they want in a location where they can process it. Our research involves measuring where the user is looking in the real scene, as a way to help decide where to place virtual content. With AR poised to infiltrate many areas of our lives – from driving to work to recreation – we’ll need to solve this problem before we can rely on AR to provide support for serious or critical actions.

Determining where to put information

It makes sense to have information appear where the user is looking. When navigating, a user could look at a building, street or other real object to reveal the associated virtual information; the system would know to hide all other displays to avoid cluttering the visible scene.

But how do we know what someone is looking at? It turns out that the nuances of human vision allow us to examine a person’s eyes and calculate where they are looking. By pairing those data with cameras capturing a person’s field of view, we can determine both what the person can see and what he or she is actually looking at.

Eye-tracking systems first emerged in the 1900s. Originally they were mostly used to study reading patterns; some could be very intrusive for the reader. More recently, real-time eye-tracking has emerged and become more affordable, easier to operate and smaller.

Eye-tracking spectacles can be relatively compact. Anatolich1, CC BY-SA

Eye trackers can be attached to a screen or integrated into wearable glasses or head-mounted displays. Eyes are tracked using a combination of cameras, light projections and computer vision algorithms to calculate the position of the eye and the gaze point on a monitor.

We generally look at two measures when examining eye tracking data. The first is called a fixation, and is used to describe when we pause our gaze, often on an interesting location in a scene because it has caught our attention. The second is a saccade, one of the rapid eye movements used to position the gaze. Basically, our eyes quickly dart from place to place taking in pieces of information about parts of a scene. Our brains then put the information from these fixations together to form a visual image in our minds.

Short periods of fixation are followed by quick movements, called saccades.
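For readers curious about the mechanics, the sketch below shows velocity-threshold classification, one standard way to separate fixations from saccades in a stream of gaze samples. It is illustrative only: the 30-degrees-per-second threshold and the sample format are assumptions for the example, not the settings of any particular tracker or of this research.

```python
# A minimal sketch of velocity-threshold classification of gaze data.
# The threshold and sample format are illustrative assumptions.

def classify_gaze_samples(samples, velocity_threshold=30.0):
    """Label each gaze sample as part of a 'fixation' or a 'saccade'.

    samples: list of (timestamp_s, x_deg, y_deg) tuples in visual degrees.
    """
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        # Angular velocity between consecutive samples (degrees per second).
        velocity = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt if dt > 0 else 0.0
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels

# Three closely spaced samples (a fixation) followed by a rapid jump (a saccade).
samples = [(0.00, 10.0, 5.0), (0.01, 10.1, 5.0), (0.02, 10.1, 5.1), (0.03, 18.0, 9.0)]
print(classify_gaze_samples(samples))  # ['fixation', 'fixation', 'saccade']
```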

Combining eye tracking with AR

Often AR content is anchored to a real-world object or location. For example, a virtual label containing a street name should be displayed on that street. Ideally, we would like each AR label to appear close to the real object it is associated with. But we also need to be careful not to let multiple AR labels overlap and become unreadable. There are many approaches to managing label placement. We’re exploring one option: calculating where the person is looking in the real scene and displaying AR labels only in that spot.
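As a rough illustration of that option, the sketch below hides every label whose on-screen anchor falls outside a small radius around the current gaze point. The 100-pixel radius and the label structure are assumptions made for the example, not values from any deployed system.

```python
# A minimal sketch of gaze-contingent label filtering: show only the AR labels
# whose anchor points fall within a radius of the current gaze point.
# The radius and label structure are illustrative assumptions.

from math import hypot

def visible_labels(labels, gaze_x, gaze_y, radius_px=100.0):
    """labels: list of dicts like {'text': 'Main St.', 'x': 420, 'y': 310} in screen pixels."""
    return [
        label for label in labels
        if hypot(label["x"] - gaze_x, label["y"] - gaze_y) <= radius_px
    ]

labels = [
    {"text": "Main St.", "x": 420, "y": 310},
    {"text": "City Museum", "x": 900, "y": 180},
]
# Only the label near where the user is looking gets drawn; the rest stay hidden.
print(visible_labels(labels, gaze_x=400, gaze_y=300))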

Augmented reality can provide additional information to shoppers. Augmented reality image via shutterstock.com

Say, for example, a user is interacting with a mobile application that helps him shop for low-calorie cereal in the grocery store. In the AR application, each cereal has calorie information associated with it. Rather than physically picking up each cereal box and reading the nutritional content, the user can hold up his mobile device and point it at a particular cereal box to reveal the relevant information.

But think about how crowded a store’s cereal aisle is with various packages. Without some way to manage the display of AR labels, the calorie information labels for all the cereal boxes would be displayed. It would be impossible to identify the calorie content for the cereal he is interested in.

By tracking his eyes, we can determine which individual cereal box the user is looking at. Then we display the calorie information for that particular cereal. When he shifts his gaze to another box, we display the figures for the next one he considers. His screen is uncluttered, the information he wants is readily available and when he needs additional information, we can display that.
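A toy version of that gaze-to-object step might look like the sketch below. It assumes some recognizer has already produced a screen-space bounding box for each cereal box; the box format and the calorie figures are made up purely for illustration.

```python
# A minimal sketch of gaze-to-object selection, assuming an object recognizer has
# already produced a 2D bounding box for each cereal box in the camera frame.
# The data below are illustrative, not from any real product database.

def gazed_object(gaze_x, gaze_y, boxes):
    """Return the box containing the gaze point, or None if the gaze hits nothing."""
    for box in boxes:
        if box["x0"] <= gaze_x <= box["x1"] and box["y0"] <= gaze_y <= box["y1"]:
            return box
    return None

boxes = [
    {"name": "Bran Flakes", "x0": 0, "y0": 0, "x1": 200, "y1": 300, "calories": 120},
    {"name": "Choco Puffs", "x0": 210, "y0": 0, "x1": 410, "y1": 300, "calories": 160},
]
hit = gazed_object(gaze_x=250, gaze_y=150, boxes=boxes)
if hit:
    # Only the label for the box the shopper is looking at is displayed.
    print(f"{hit['name']}: {hit['calories']} kcal per serving")
```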

This type of development makes it an exciting time for AR research. Our ability to integrate real-world scenes with computer graphics on mobile displays is improving. This fuels the prospect of creating stunning new applications that expand our ability to interact with, learn from and be entertained by the world around us.


Ann McNamara receives funding from National Science Foundation (NSF). This material is based upon work supported by the National Science Foundation under Grant No. 1253432. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

What you see is not always what you get: how virtual reality can manipulate our minds

Who are you really talking to in your virtual chat? Shutterstock

It is often said that you should not believe everything you see on the internet. But with the advent of immersive technology – like virtual reality (VR) and augmented reality (AR) – this becomes more than doubly true.

The full capabilities of these immersive technologies have yet to be explored, but already we can get a sense of how they can be used to manipulate us.

You may not think you are someone who is easily duped, but what if the techniques used are so subtle that you are not even aware of them? The truth is that once you’re in a VR world, you can be influenced without knowing it.

Unlike video conferencing, where video data is presented exactly as it is recorded, immersive technologies send only selected information, not necessarily the actual graphical content.

This has always been the case in multiplayer gaming, where the gaming server simply sends location and other information to your computer. It’s then up to your computer to translate that into a full picture.

Interactive VR is similar. In many cases, very little data is shared between the remote computer and yours, and the actual visual scene is constructed locally.

This means that what you are seeing on your end is not necessarily the same as what is being seen at the other end. If you are engaged in a VR chat, the facial features, expressions, gestures, bodily appearance and many other factors can be altered by software without you knowing it.
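A tiny sketch makes the point: only compact state crosses the network, each client rebuilds the scene locally, and nothing prevents the local software from adjusting what is rendered on the way to your eyes. The message fields and the "smile boost" below are hypothetical, purely to illustrate the idea.

```python
# A minimal sketch of why a VR chat can differ at each end: only compact state
# (pose, expression weights) is transmitted, and the receiving client reconstructs
# the avatar locally, so it can quietly alter it. All field names are hypothetical.

import json

def encode_update(user_id, head_pose, expression):
    """Pack the tracked state into a small message; no pixels are sent."""
    return json.dumps({"user": user_id, "pose": head_pose, "expr": expression})

def render_remote_avatar(message, smile_boost=0.0):
    """Rebuild the avatar locally; the client is free to tweak the expression."""
    state = json.loads(message)
    expr = dict(state["expr"])
    expr["smile"] = min(1.0, expr.get("smile", 0.0) + smile_boost)  # subtle local edit
    return state["user"], state["pose"], expr

msg = encode_update("alice", head_pose=[0.0, 1.6, 0.0, 0.0, 12.0, 0.0], expression={"smile": 0.2})
print(render_remote_avatar(msg, smile_boost=0.3))  # Alice appears friendlier than she is
```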

Stanford researchers examine the psychology of virtual reality.

Like you like me

In a positive sense, VR can be helpful in many fields. For example, research shows that eye contact increases the attentiveness of students, but a teacher lecturing a large class cannot make eye contact with every student.

With VR, though, the software can be programmed to make the teacher appear to be making eye contact with all of the students at the same time. So a physical impossibility becomes virtually possible.

But there will always be some people who will co-opt a tool and use it for something perhaps more nefarious. What if, instead of a teacher, we had a politician or lobbyist, and something more controversial or contentious was being said? What if the eye contact meant that you were more persuaded as a result? And this is only the beginning.

Research has shown that the appearance of ourselves and others in a virtual world can influence us in the real world.

This can also be coupled with techniques that are already used to boost influence. Mimicry is one example. If one person mimics the body language of another in a conversation, then the person being mimicked will become more favourably disposed towards them.

In VR it is easy to do this as the movements of each individual are tracked, so a speaker’s avatar could be made to mimic every person in the audience without them realising it.
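A minimal sketch of how such automated mimicry could work is shown below: each audience member’s own tracked head movements are replayed through the speaker’s avatar after a short delay, so the copying goes unnoticed. The four-second delay and the data format are illustrative assumptions, not the parameters of any particular study.

```python
# A minimal sketch of automated mimicry: buffer a viewer's tracked head movements
# and replay them through the speaker's avatar after a delay. The delay value and
# data format are illustrative assumptions.

from collections import deque

class DelayedMimic:
    def __init__(self, delay_s=4.0):
        self.delay_s = delay_s
        self.buffer = deque()  # (timestamp, head_rotation) pairs from the viewer's tracker

    def record(self, timestamp, head_rotation):
        """Store the viewer's tracked head rotation (e.g. pitch/yaw/roll in degrees)."""
        self.buffer.append((timestamp, head_rotation))

    def avatar_rotation(self, now):
        """Return the viewer's own movement from delay_s seconds ago, or None."""
        while self.buffer and self.buffer[0][0] <= now - self.delay_s:
            _, rotation = self.buffer.popleft()
            return rotation
        return None

mimic = DelayedMimic(delay_s=4.0)
mimic.record(0.0, (2.0, -5.0, 0.0))   # viewer nods slightly at t = 0 s
print(mimic.avatar_rotation(4.0))     # at t = 4 s the speaker's avatar repeats it
```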

More insidious still, all the features of a person’s face can easily be captured by software and turned into an avatar. Several studies from Stanford University have shown that if the features of a political figure could be changed even slightly to resemble each voter in turn, then that could have a significant influence on how people voted.

The experiments took pictures of study participants and real candidates in a mock-up of an election campaign. The pictures of each candidate were then morphed to resemble each participant in turn.

Stanford researcher Jeremy Bailenson explains how political manipulation was easily done in VR experiments.

They found that if 40% of the participant’s features were incorporated into the candidate’s face, the participants were entirely unaware the image had been manipulated. Yet the blended picture significantly shifted their intended votes in favour of the morphed candidate.
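To see what a “40% morph” amounts to, the sketch below blends two aligned face images with a 60/40 weighting. Real morphing software also warps the facial geometry using landmarks; this toy version is only an illustration, not the study’s actual pipeline.

```python
# A minimal sketch of a weighted face blend. Assumes the two photos are already
# aligned; real morphing also warps geometry via facial landmarks. Illustration only.

import numpy as np

def morph_faces(candidate_img, participant_img, participant_weight=0.4):
    """Blend pixel values: 60% candidate, 40% participant by default."""
    candidate = candidate_img.astype(np.float32)
    participant = participant_img.astype(np.float32)
    blended = (1.0 - participant_weight) * candidate + participant_weight * participant
    return blended.astype(np.uint8)

# Toy 2x2 grayscale "images" standing in for aligned face photos.
candidate = np.array([[100, 100], [100, 100]], dtype=np.uint8)
participant = np.array([[200, 200], [200, 200]], dtype=np.uint8)
print(morph_faces(candidate, participant))  # every pixel becomes 140
```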

What happens in the virtual world does not stay in the virtual world. We must therefore be mindful when we step into this new realm that what we see is not always what we get.


David Evans Bailey does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

Breaking the fourth wall in human-computer interaction: Really talking to each other

Hold a conversation with Harry Potter! Interactive Systems Group, The University of Texas at El Paso, CC BY-ND

Have you ever talked to your computer or smartphone? Maybe you’ve seen a coworker, friend or relative do it. It was likely in the form of a question, asking for some basic information, like the location of the best nearby pizza place or the start time of tonight’s sporting event. Soon, however, you may find yourself having entirely different interactions with your device – even learning its name, favorite color and what it thinks about while you are away.

It is now possible to interact with computers in ways that seemed beyond our dreams a few decades ago. Witness the huge success of applications as diverse as Siri, Apple’s voice-response personal assistant, and, more recently, the Pokémon Go augmented reality video game. These apps, and many others, enable technology to enhance people’s lives, jobs and recreation.

Yet the potential for future progress goes well beyond just the newest novelty game or gadget. When properly merged, these technologies can turn computers into virtual companions, performing many roles and tasks that require awareness of physical surroundings as well as human needs, preferences and even personality. In the near future, they could help us create virtual teachers, coaches, trainers, therapists and nurses, among others. They are not meant to replace human beings, but to enhance people’s lives, especially in places where real people who perform these roles are hard to find.

This is serious next-level augmented reality, allowing a machine to understand and react to you as you exist in the real physical world. My colleagues and I focus on breaking the fourth wall of human-computer interaction, letting you and the computer talk to each other – about yourselves.

Bringing computers to life

Our goal was to help people build rapport with virtual characters and analyze the importance of “natural interaction” – without controllers, keyboard, mouse, text or additional screens.

To make the technology relatable, we created a Harry Potter “clone” by using IBM’s Watson artificial intelligence systems and our own in-house software. Through a microphone, you could ask our virtual Harry anything about his life, provided there was a reference for it in one of the seven books.

A demonstration of the virtual Harry Potter.
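Conceptually, the character’s answers are grounded in a fixed body of text. The toy retriever below picks the passage that best overlaps the words of a question; it is only an illustration of that idea, not the actual system, which combined IBM’s Watson with our in-house software.

```python
# A minimal sketch of grounding a character's answers in a fixed text corpus via
# keyword overlap. Illustration only; the real system used IBM Watson and custom
# software, and these passages are just a toy example.

import re

def best_passage(question, passages):
    """Return the passage sharing the most words with the question, or None."""
    q_words = set(re.findall(r"[a-z']+", question.lower()))
    scored = [(len(q_words & set(re.findall(r"[a-z']+", p.lower()))), p) for p in passages]
    score, passage = max(scored)
    return passage if score > 0 else None

passages = [
    "Harry lived in the cupboard under the stairs at number four, Privet Drive.",
    "Quidditch is played on broomsticks with four balls and seven players per team.",
]
print(best_passage("Where is the cupboard under the stairs?", passages))
```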

Since then we have also built a museum guide that helps visually impaired people to experience art. Our prototype character, named Sara, resides in a gallery in Queretaro, Mexico, where people can talk to her and ask about the artwork also on display.

We also created a “Jeopardy”-style game host, with whom you can play the popular trivia game filled with questions about our university. You talk to the character as if he were a real host, choosing the category you want to play and answering questions.

A college freshman interacts with the game show host for the first time. Interactive Systems Group, The University of Texas at El Paso, CC BY-ND

We even have our own virtual tour guide at the Interactive Systems Group laboratory at UTEP. She answers any questions our hundreds of yearly visitors may have, or asks the researchers to help her out if it is a tough question.

Our most advanced project is a survival scenario where you need to talk, gesture and interact with a virtual character to survive on a deserted island for a fictional week (about an hour in real time). You befriend the character, build a fire, go fishing, find water and shelter, and escape other dangers until you get rescued, using just your voice and full-body gesture tracking.

A researcher interacts through speech and gesture with Adriana, the jungle survival virtual character. Interactive Systems Group, The University of Texas at El Paso, CC BY-ND

Understanding humans

These projects are fun to “play” for a reason. When we build human-like characters, we have to understand people – how we move, talk, gesture and what it means when you put everything together. This doesn’t happen in an instant. Our projects are fun and engaging to keep people interested in the interaction for a long time.

We try to make users forget that there are sensors and cameras hidden in the room helping our characters read body posture and listen to their words. While people interact, we analyze how they behave, and look for different reactions to controlled changes in the characters’ personalities, gestures, speech tones and rhythms, and even small things like breathing, blinking and gaze movement.

The next step, clearly, is bringing these characters out of their flat screens and virtual worlds, either by having people join them in their virtual environments through virtual reality, or by having the characters appear in the real world through augmented reality.

A student talks to Merlin, a character that recognizes speech and interacts in virtual reality. Inmerssion, CC BY-ND

We’re building on functions – particularly graphic enhancements – that have been around for several years. Several GPS-based games, like Pokémon Go, are available for mobile devices. Microsoft’s Kinect system for Xbox lets players try on different clothing articles, or adds an exotic location background to a video of the person, making it appear as if they were there.

More advanced systems can alter our perspective of the world more subtly – and yet more powerfully. For example, people can now touch, manipulate and even feel virtual objects. There are devices that can simulate smells, making visual scenes of beaches or forests far more immersive. Some systems even let a user choose how certain foods taste through a combination of visual effects and smell augmentation.

A vast and growing potential

All these are but rough sketches of what augmented reality technology could one day allow. So far most work is still heavily centered on video games, but many fields – such as health care, education, military simulation and training, and architecture – are already using it for professional purposes.

For now, most of these devices operate independently from one another, rather than as a whole ecosystem. What would happen if we combined haptic (touch), smell, taste, visuals and geospatial (GPS) information at the same time? And then what if we add in a virtual companion to share the experience with?

Unfortunately, it’s common for new technology to be met with fear, or portrayed as dangerous – as in movies like “The Matrix,” “Her” or “Ex Machina,” where people live in a dystopian virtual reality world, fall in love with their computers or get killed by robots designed to be indistinguishable from humans. But there is great potential too.

A sampling of our team’s developments in virtual characters.

One of the most common questions we get is about the potential misuse of our research, or if it is possible for the computers to attain a will of their own – think “I, Robot” and the “Terminator” movies, where the machines are actually built and operating in the physical world. I would like to think that our research as a community will be used to create incredible experiences, fun and engaging scenarios, and to help people in their daily lives. To that end, if you ask any of our characters if they are planning to take over the world, they will tease you and check their calendar out loud before saying, “No, I won’t.”


Iván Gris was funded by the Interactive Systems Group at the University of Texas at El Paso and is the founder of Inmerssion, but the company has not received funding.