Video conferencing had been around for years, after all, and the Stanford University professor had spent two decades studying and writing about digital communication and behavior. But video calls had always been an option rather than the rule, and Bailenson – along with the rest of the world – quickly found himself shocked by the impact of a complete shift to remote communication.
“After a week of shelter-in-place, I was just flabbergasted by how intense and exhausting it was,” says Bailenson, who lives in California, the first U.S. state to require residents to stay home to reduce the spread of COVID-19. “Most video conference studies are about how to improve productivity and collaboration, but the notion of it being draining hasn’t been studied.”
While Bailenson began re-reading “everything there was to read about video conferencing,” his friend at Microsoft, Jaron Lanier, was pondering a different angle to the problem. A late-night talk-show host in New York, whose band Lanier occasionally played in, was struggling to perform his monologue to a camera in his living room with no live audience to react to his jokes. Lanier cast a net into Microsoft’s sea of researchers, psychologists and programmers, and within weeks he had pulled together what he calls a “magical” new feature to help the TV host and his viewers feel connected. His idea evolved into a Teams feature, Together mode, that could potentially reduce the fatigue of video calls for everyone.
“It was a fortuitous coincidence of needs” that led to a dramatic leap in improving remote meetings, says Lanier, a computer scientist, musician, artist and author who coined the term “virtual reality” and is considered a pioneer in the field.
Together mode, now rolling out in Microsoft Teams, combines decades of research and product development to place all the participants on a video call in a shared virtual space, such as an auditorium, meeting room or coffee bar, so they look like they’re in the same room. The new feature ditches the traditional grid of boxes, creating an environment that users say has a profound impact on the feel of the video conference and gives the group more cohesion.
Together mode is built to give people the impression that everyone is looking at the entire group in a big virtual mirror, which Lanier says is the unique yet simple solution that changes the whole experience. People’s brains are used to being aware of others based on their locations, and the mirror effect makes it harder for the brain to notice eye contact irregularities. Those are some of the qualities that make it easier for participants to tell how others in the group are responding to one another.
“We’re social creatures, and the social and spatial awareness systems in the brain can finally function more naturally” within Together mode, Lanier says.
Scientists began studying problems with eye contact – or gaze misalignment – in earnest in the 1960s, and Lanier has been working to improve that element of video conferencing since the analog days of the 1970s. Yet while the technology has grown more robust and stable over the decades, there had been no real improvements to the human experience that were viable for widespread use. Together mode relies on cloud computing instead of the specialized cameras and screens that earlier efforts to improve video calls required.
To understand video-call fatigue, Bailenson, the founding director of Stanford’s Virtual Human Interaction Lab, combed through decades of studies on communication and found a few key causes.
For example, he says, if someone’s face looms large in your visual sphere in real life, it generally means you’re about to either fight or mate. So you’re alert and hyper-aware – reactions that are automatic and subconscious – and your heart rate goes up. In video calls, there’s often a grid of boxes, each filled with someone’s face. It’s a lot for your body’s nervous system to handle, he says.
In addition, people are constantly interpreting others’ eye movements, posture, how their heads are tilted and more, and attributing meaning to those non-verbal cues. Researchers in the 1960s watched videotapes of groups frame by frame, Bailenson says, and discovered a complex, intricate dance: One person would turn their head and the other would lean back a little, for example.