Analyzing Crew Speech Nuance
Can biosensors improve flight instruction?
It depends, ironically, on how much input instructors themselves provide when observing and grading flight crew behaviors in training.
In a research project examining crew audio from flight simulators, Ireland-based company Vocavio Civil Aviation System and B&P Consulting of Canada discovered that, on average, “for every 10 workload events identified by voice analytics, only 4 were observed and annotated by instructors.” The finding came from comparing the analytics’ detections of crew workload and stress levels against the instructors’ own session annotations.
“Communication competency has consistently posed a challenge to instructor standardization as it relies on subjective judgment calls about the presence or absence of specific aspects of communication,” the companies reported in a recent whitepaper, AI-based pilot training tools and their implications for instructors and pilot training outcomes.
[The paper was authored by Dr. Naila Ayala, who holds a PhD in neuroscience; Jerome Bresee, FRAeS, VP of Human Systems at Vocavio; and Conor McKenna, Vocavio Product Director.]
“As the aviation industry navigates the shift of pilot training from an hours-based training system to competency-based training and assessment (CBTA) solutions, there is an increasing need to understand how competencies are identified and assessed.”
One of Vocavio’s specialties is capturing and analyzing voice communication: not necessarily the spoken words, but rather ‘speech signals’ such as pitch, rhythm, tone and energy that can be extracted from a crew dialogue. Signals from both crew members are extracted and correlated using a patented method that outputs objective metrics for communication effectiveness and teamwork dynamics. The company’s statistical treatment of signal data is known as ‘Narrow AI’, whereby a narrow range of features in the voice is targeted; no new training data is required to calibrate the system for crews with different accents or procedures.
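Vocavio’s patented extraction method is proprietary, but a minimal sketch of the general technique, using the open-source librosa library, might look like the following. The feature set, sample rate and thresholds here are illustrative assumptions, not Vocavio’s actual pipeline.

```python
# Illustrative sketch only: extracts basic prosodic signals (pitch and
# energy) from one speaker's audio. Vocavio's patented method is
# proprietary; the feature choices here are assumptions.
import librosa
import numpy as np

def prosodic_signals(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=16000)

    # Fundamental frequency (pitch) contour via probabilistic YIN.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )

    # Short-time energy (RMS) as a rough proxy for vocal effort.
    rms = librosa.feature.rms(y=y)[0]

    voiced_f0 = f0[voiced_flag]  # keep only voiced frames
    return {
        "pitch_median_hz": float(np.nanmedian(voiced_f0)),
        "pitch_variability_hz": float(np.nanstd(voiced_f0)),  # 'tone' spread
        "energy_mean": float(np.mean(rms)),
        "voiced_fraction": float(np.mean(voiced_flag)),  # crude rhythm proxy
    }
```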
Voice communication during a flight simulation session is captured from headsets and passed to the vSIM Dialog software application, which converts the speech data into a JSON file (JavaScript Object Notation, a structured data format); the file can be visualized by a debriefing system or exported to a larger AI training system to augment the instructor debrief.
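The whitepaper does not publish the vSIM Dialog schema, but a purely hypothetical shape for such an export, with assumed field names throughout, might look like this:

```python
# Hypothetical export shape only; the real vSIM Dialog schema is not
# published. Illustrates how per-turn speech-signal metrics could be
# serialized as JSON for a debriefing system or an AI training pipeline.
import json

session = {
    "session_id": "sim-2024-001",   # all field names are assumptions
    "crew": ["PF", "PM"],           # pilot flying / pilot monitoring
    "turns": [
        {
            "speaker": "PF",
            "start_s": 412.3,
            "end_s": 415.1,
            "pitch_median_hz": 138.0,
            "energy_mean": 0.042,
        },
    ],
    "metrics": {"communication": 0.81, "turn_taking_balance": 0.64},
}

print(json.dumps(session, indent=2))  # ready for export or visualization
```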
Vocavio is focused on ‘how’ crew members speak to each other, as this is a reliable indicator of communication performance; speakers in a dialogue accommodate each other by modulating their voice tones to ensure effective communication and coordination.
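One common research approach to quantifying this kind of vocal accommodation (an assumption here, not Vocavio’s patented correlation method) is to correlate the two speakers’ pitch trajectories across successive turns:

```python
# Sketch of vocal accommodation measurement: correlate per-turn median
# pitch between two speakers. A standard research technique, not
# Vocavio's patented method, which is not publicly specified.
import numpy as np

def accommodation_score(pf_pitch: list[float], pm_pitch: list[float]) -> float:
    """Pearson correlation of per-turn median pitch for two speakers.

    Values near +1 suggest the speakers are converging (accommodating);
    values near 0 or below suggest little vocal coordination.
    """
    n = min(len(pf_pitch), len(pm_pitch))
    if n < 2:
        return float("nan")  # not enough turns to correlate
    return float(np.corrcoef(pf_pitch[:n], pm_pitch[:n])[0, 1])
```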
Interestingly, speech signal values that are extracted from a crew dialogue cannot be ‘reverse engineered’ into the words spoken by crew, adding a natural level of security and privacy.
These signals are ‘prosodic’ values from speech communication (non-verbal speech cues) that provide insights into communication, teamwork and workload levels of crew at critical phases of flight. Prosody reflects the nuanced emotional features of the speaker or of their utterances: their obvious or underlying emotional state, the form of the utterance (statement, question, or command), the presence of irony or sarcasm, emphasis on particular words or morphemes, contrast, focus…
According to McKenna, “The software has been deployed to solely meet the needs of safety and mission-critical environments, and the original algorithms that were trained on Japanese speech dialogue were retrained on over 300 hours of commercial and military pilot audio recordings (Tapoia Project, Trinity College, Dublin). This software is novel in its ability to take any multi-crew audio from training and generate performance metrics that directly relate to crew resource management skills deployed during high-stress, high-workload scenarios in a flight simulator.”
Performance metrics provide insights into the following (a simplified computational sketch follows the list):
- Communication performance
- Teamwork (team balance, effort levels, turn-taking, engagement)
- Workload management
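The whitepaper does not publish the formulas behind these metrics. A simplified sketch of how team balance, effort and turn-taking could be derived from speaker-labeled turns, under metric definitions that are this sketch’s own assumptions:

```python
# Illustrative teamwork metrics from speaker-labeled turns. Assumed input:
# a list of (speaker, duration_seconds) tuples. The metric definitions are
# assumptions for illustration, not Vocavio's published formulas.
from collections import Counter

def teamwork_metrics(turns: list[tuple[str, float]]) -> dict:
    talk_time = Counter()
    switches = 0
    for i, (speaker, dur) in enumerate(turns):
        talk_time[speaker] += dur
        if i > 0 and speaker != turns[i - 1][0]:
            switches += 1  # a change of speaker = one turn-taking event

    total = sum(talk_time.values())
    shares = [t / total for t in talk_time.values()]
    return {
        # 1.0 = perfectly balanced talk time; near 0 if one pilot dominates
        "team_balance": min(shares) / max(shares),
        # speaker changes per minute of talk time
        "turn_taking_per_min": 60.0 * switches / total,
        # raw effort levels: seconds spoken per crew member
        "effort_seconds": dict(talk_time),
    }
```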
“An airline from the Gulf region came to us and said, we’ve listened to all the tapes. The pilots all said the right thing (Standard Operating Procedures – SOPs) and they’ve still got it wrong. How was this possible?” asked McKenna. “This is where voice biosignals will play their part in flight operations alongside flight data monitoring data streams. Multi-crew pilots are 100 percent capable of saying the right thing and not being cognitively present in that moment – distraction and fatigue are most likely the culprits to failures in communication and coordination – and it’s not unique to the cockpit either.”
LOW- / HIGH-YIELD INSTRUCTOR DATA
The same project that discovered instructors either missed or did not record 6 of 10 observable communication behaviors (COM, in competency-based training and assessment jargon) also revealed a wide variance in instructor annotations of simulator sessions. What Vocavio describes as ‘Top Data Yield’ instructors annotated up to 82% of the events the voice analytics detected; in contrast, ‘Lower Data Yield’ instructors captured as little as 16%.
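Expressed as a simple ratio (the function and variable names below are assumptions for illustration), an instructor’s data yield is the share of analytics-detected events that the instructor also annotated:

```python
# Instructor 'data yield' as described in the article: the fraction of
# events detected by voice analytics that the instructor also annotated.
def data_yield(analytics_events: int, instructor_annotations: int) -> float:
    if analytics_events == 0:
        return float("nan")  # nothing detected, yield undefined
    return instructor_annotations / analytics_events

# Figures from the article: on average 4 of 10 workload events annotated.
print(data_yield(10, 4))    # 0.4  -> the reported average
print(data_yield(100, 82))  # 0.82 -> a 'Top Data Yield' instructor
print(data_yield(100, 16))  # 0.16 -> a 'Lower Data Yield' instructor
```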
“In other words,” McKenna said, “the [lower data yield] instructor is not providing enough data to support greater trust, accuracy and explainability that the industry needs for greater adoption of AI in pilot training.”
“One of the big challenges in any training organization is saying, this is how you operate your training exercise and we want you to use this e-grading application. Some instructors are more active in using that grading method than others. More capture is needed to make these new AI solutions more accurate and drive greater efficiencies for the training organization.”

“The challenge on our side is mapping observable behaviors to the signal data from voice. The data scientists will tell us that the yield is too low when there are few observable behaviours logged; it simply can’t map…”

The ability to identify low-yield instructors in terms of their data outputs is therefore critical to improving instructor standardization.
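The whitepaper does not describe how that mapping is performed. In principle, it could be a time-window match between voice-analytics events and instructor annotations; the sketch below is hypothetical, and the 30-second window and all names are this example’s assumptions:

```python
# Hypothetical alignment of voice-analytics events with instructor
# annotations by timestamp proximity. The companies' actual mapping
# method is not described; the 30 s window is an arbitrary choice.
def map_annotations(analytics_times: list[float],
                    instructor_times: list[float],
                    window_s: float = 30.0) -> list[tuple[float, float | None]]:
    """Pair each analytics event with the nearest instructor annotation
    within the window, or None if nothing was logged nearby."""
    pairs = []
    for t in analytics_times:
        nearby = [a for a in instructor_times if abs(a - t) <= window_s]
        pairs.append((t, min(nearby, key=lambda a: abs(a - t)) if nearby else None))
    return pairs

# Unmatched events (None) are exactly the low-yield gaps the data
# scientists flag: too few logged behaviors to map against the signal.
```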
To read the full report on Vocavio’s research, order your copy of The Robot in the Simulator.