If you aren’t a born and bred Swede, you might not know about Hassan, a radio comedy show that was hugely popular in the ’90s. The sketches all revolved around prank phone calls. In one of my favourite episodes, the caller claims to be the CEO of a company selling patterns for cross-stitch embroidery. He’s calling a Norwegian competitor to see if this guy can share any business insights, since the caller plans to enter the Norwegian market.
It’s funny just to hear the Norwegian guy get confused and somewhat provoked (Swedes love making fun of Norwegians; it makes us smile just to hear Norwegians talk), but the real joke comes when the caller (a young Fredrik Lindström, who later went on to become a renowned comedian, author and general media personality) repeatedly interrupts his interlocutor to ask: “Are you talking right now, or am I?”
The question makes no sense, which is what makes the skit work of course. But as with all great comedy, it also works because it riffs on some vaguely familiar experience.
I thought of that the other day when I took part in an experiment where I was to brainstorm together with one other human being (whom I’d never met before) and where the facilitator of the session was a “social robot”.
That is to say: the robot was branded as being social, but really the purpose of the session was for the researchers who recorded us to understand whether the humans in the room felt OK with having their conversation guided by a non-human entity.
The answer, I think for both of us, was: not really.
And here’s a big part of why: from the way it (“he”? “she”? “they”?) asked follow-up questions, it was obvious that it didn’t understand the first thing about what we were saying. Here’s a teaser:
Human no 1: It’d be cool if you could use a robot to interact with your friends remotely.
Human no 2: Yeah, like, maybe the robot could show you if they were around, if they were still in bed or if they’d got up and were available, and then you could tell the robot to call them up on FaceTime.
Robot: That’s very interesting, I’m learning so much. Now on to the next question: Could you imagine using a robot like myself for interacting with friends via video conferencing?
[embarrassed silence…]
But the robot not getting it wasn’t the only problem. It also kept interrupting the humans in the room, which created uncertainty about whether one was supposed to talk or shut up and listen. It was a bit like that Hassan skit, just without the comedy.
The experience made me think of human factors engineering, a discipline that was born during the Second World War in an effort to make machines more intuitive to operate.
A recurring problem was that pilots would attempt to land without extending their landing gear, often with disastrous results.
The psychologists and cognitive scientists tasked with preventing these types of accidents started by abandoning the old explanatory model, where the “human factor” was always to blame.
Instead they tried the idea that machines should be built in such a way that they help the human operator understand them. The problem with the landing gear proved to be an excellent example: on some airplanes the gear was extended by pulling a lever forwards, while in other cockpits the lever had to be pulled backwards. The poor pilots, who routinely had to rotate through flying a number of different machines, didn’t have much time to figure out which airplane worked which way.
Once the engineers caught on and started applying more predictable designs to the cockpits of different types of airplanes, accident rates dropped drastically. Generally speaking, we have the human factors discipline to thank for the standardisation of machine interfaces. Think about that next time you rent a car and you’re up and running with the new control panel at a glance.
But human factors also gave us a specific concept that I think is highly pertinent when designing interaction with robots that are supposed to be perceived as social.
That concept is called mode confusion.
The term is self-explanatory: mode confusion occurs when it’s not clear which mode a machine is in, like whether a lever pulled backwards extends or retracts an airplane’s landing gear.
Finding ways to avoid mode confusion has been a hot topic in the autonomous vehicle community, where it’s obviously important that the driver knows exactly when the car takes over control and when it cedes it back to the human operator. There are a lot of nifty solutions here, including the framework proposed in this research article, where gaze tracking is used to detect mode confusion in the driver.
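To give a flavour of the general idea (and only the general idea: this is my own toy sketch, not the framework from the article, and the gaze regions, data structures and two-second window are assumptions I’ve made up for illustration), a detector along those lines might look something like this:

```python
# Toy sketch: flag a possible mode-confusion event when the vehicle changes
# mode but the driver's gaze never lands on the instrument cluster within a
# short acknowledgement window. Regions, fields and the 2-second threshold
# are illustrative assumptions, not anything from the cited article.

from dataclasses import dataclass

@dataclass
class GazeSample:
    t: float      # timestamp in seconds
    region: str   # e.g. "road", "instrument_cluster", "mirror"

@dataclass
class ModeChange:
    t: float      # when the vehicle switched mode
    new_mode: str # e.g. "autonomous", "manual"

def possible_confusions(mode_changes, gaze_samples, window=2.0):
    """Return mode changes the driver may not have registered."""
    flagged = []
    for change in mode_changes:
        acknowledged = any(
            change.t <= g.t <= change.t + window
            and g.region == "instrument_cluster"
            for g in gaze_samples
        )
        if not acknowledged:
            flagged.append(change)
    return flagged

# Example: the handover to manual control at t=10.0 s is never glanced at.
gaze = [GazeSample(9.5, "road"), GazeSample(10.8, "road"), GazeSample(12.5, "mirror")]
changes = [ModeChange(10.0, "manual")]
print(possible_confusions(changes, gaze))  # -> [ModeChange(t=10.0, new_mode='manual')]
```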
I think mode confusion explains a large part of what makes it so eerie to interact with robots. True, the one I workshopped with the other day did move its neck and blink its eyes to indicate that it was paying attention, but one could still never be certain whether it was listening or just waiting for a gap in the audio feed in order to say its next canned line.
I sometimes had similar experiences when I used to work in elderly care and tried to make myself understood by people afflicted by severe dementia. That made me understand how much of human interaction lives in the tiniest of gestures and sounds, and what a blow it is to us when these abilities are compromised.
I think the problem of mode confusion gets harder the more seamless we want the interaction to be. Pulling a lever is blunt and tactile; it provides relatively good affordance. Punching a button symbol on a screen, perhaps less so. Equally, holding down the side button on my iPhone to activate Siri is cumbersome, but at least it’s clear, whereas I can never be sure she’s there when I try to summon her with a voice command. If one day we get actual robots capable of moving around in our homes and addressing us like any other family member, then it’s going to be of paramount importance that we know exactly “where we have them”.