
Russian scientists have created a technology that will allow AI not only to convey emotions with its voice but also to recognize them from human intonation. To do this, they analyzed people's non-verbal emotional vocalizations and identified their acoustic characteristics. The researchers confirmed the method's effectiveness by converting cat meows into sounds with different emotional shades that most people could recognize. The technology is intended for training artificial intelligence, as well as for creating techniques that help patients with autism better understand emotions and interact with others. Experts also note the approach's potential for enhancing emotional impact in cinema.

Emotional meowing

Experts from Skoltech and the Institute of Higher Nervous Activity and Neurophysiology of the Russian Academy of Sciences have experimentally identified the physical parameters of sounds characteristic of laughter, crying, and fright. They also found acoustic markers of the sincerity of these emotions. To verify their results, the scientists then used the data they had obtained to give neutral cat meows different emotional colorings. Most of the study participants were able to recognize joy, sadness, and fear in the animal's sounds. The characteristics obtained can be used to train AI to express emotions in its voice and to read them from a person's voice, as well as to create technology that helps people with autism and schizophrenia who struggle to understand other people's feelings and communicate with others.

— We examined the non-verbal sounds of crying, laughter, and fright and, using sophisticated mathematical methods, identified their specific physical characteristics: measures of volume, frequency spectrum, degree of randomness, and others. Joyful vocalizations were characterized by higher fractal dimensions, while sad sounds were louder and had reduced acoustic variability. Fear vocalizations were identified by their minimum and maximum volume levels and by increased spectral power density in the 1-2 kHz range. Sincerity in non-verbal sounds correlated with non-linear characteristics," said Galina Portnova, a leading researcher at the Laboratory of Higher Human Nervous Activity at the Institute of Higher Nervous Activity and Neurophysiology of the Russian Academy of Sciences.
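
The article does not publish the team's exact feature definitions, but the quantities Portnova lists correspond to standard signal-processing measures. A minimal sketch, assuming a mono recording and common estimator choices (Welch's method for the spectrum, the Higuchi algorithm for fractal dimension), might look like this; the frame size and band edges are illustrative, not the authors' settings.

```python
# Minimal sketch of the acoustic features named above, for a mono signal x
# sampled at fs Hz. Estimators and parameters are illustrative assumptions,
# not the authors' published method.
import numpy as np
from scipy.signal import welch

def band_power_1_2khz(x, fs):
    """Spectral power density integrated over the 1-2 kHz band (Welch PSD)."""
    f, pxx = welch(x, fs=fs, nperseg=1024)
    mask = (f >= 1000.0) & (f <= 2000.0)
    return float(np.sum(pxx[mask]) * (f[1] - f[0]))  # rectangle-rule integral

def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension, a common measure of a signal's 'randomness'."""
    n = len(x)
    lengths = []
    for k in range(1, kmax + 1):
        lk = []
        for m in range(k):
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            # normalized curve length at scale k, starting offset m
            lk.append(np.sum(np.abs(np.diff(x[idx]))) * (n - 1) / (len(idx) * k))
        lengths.append(np.mean(lk))
    k = np.arange(1, kmax + 1)
    # the slope of log L(k) vs. log(1/k) estimates the fractal dimension
    slope, _ = np.polyfit(np.log(1.0 / k), np.log(lengths), 1)
    return float(slope)

def acoustic_features(x, fs, frame=512):
    """Volume, volume extremes, variability, 1-2 kHz power, fractal dimension."""
    frames = x[: len(x) // frame * frame].reshape(-1, frame)  # assumes len(x) >= frame
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return {
        "rms": float(np.sqrt(np.mean(x ** 2))),   # overall volume
        "min_level": float(frame_rms.min()),      # minimum volume level
        "max_level": float(frame_rms.max()),      # maximum volume level
        "variability": float(frame_rms.std()),    # acoustic variability
        "band_1_2khz": band_power_1_2khz(x, fs),
        "fractal_dim": higuchi_fd(x),
    }
```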

The scientists collected a set of videos of natural situations in which people experience these emotions and extracted 664 sounds from them. Having identified the distinctive features of each emotion, they decided to test how those features affect human perception. To do this, they recorded a set of meows and gave it the specific characteristics of joy, sadness, and fear.
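
How the distinctive features of each emotion could be pulled out of such a corpus is not described in the article, but a conventional route would be to label each of the 664 clips, compute features like those sketched above, and inspect which of them a classifier leans on. The sketch below assumes precomputed feature and label files; the file names and feature list are placeholders.

```python
# Hypothetical sketch: identify which features separate the emotions by
# training a standard classifier on the 664 labeled clips. The .npy file
# names and the feature list are placeholders, not from the study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.load("features_664.npy")   # shape (664, 6): features as sketched above
y = np.load("labels_664.npy")     # per-clip labels: "joy", "sadness", "fear"

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

clf.fit(X, y)
# High importances point to the acoustic parameters that distinguish emotions.
names = ["rms", "min_level", "max_level", "variability", "band_1_2khz", "fractal_dim"]
for name, imp in sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>12}: {imp:.3f}")
```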

— People tend to "humanize" images and sounds: for example, we recognize faces in inanimate objects or hear feelings in animal cries. Cats, in turn, have learned to modify their natural vocalizations so that they resemble human ones. That is why we decided to use them in our experiments," the specialist said.

The scientists attached a microphone to a domestic cat, Doucet, and recorded its meowing over two days of interaction with humans, choosing moments when the animal was as neutral as possible and not frightened by anything. These recordings were then modified using the algorithms the team had derived, producing the joyful, frightened, and sad sounds that were played to the study participants.
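
The modification algorithms themselves are not spelled out in the article, so the following is only a sketch, assuming the transformation nudges a neutral recording's features toward an emotion's profile; here, the fear profile quoted above (raised 1-2 kHz power, a wider spread between minimum and maximum levels). File names are placeholders.

```python
# Illustrative "fear coloring" of a neutral recording: boost the 1-2 kHz
# band and exaggerate the dynamic range. An assumption about the kind of
# transformation involved, not the authors' actual algorithm.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfilt

fs, x = wavfile.read("neutral_meow.wav")             # placeholder file name
x = x.astype(np.float64) / np.iinfo(np.int16).max    # assumes 16-bit mono PCM

# Raise spectral power in the 1-2 kHz band, where fear vocalizations stood out.
sos = butter(4, [1000.0, 2000.0], btype="bandpass", fs=fs, output="sos")
x_fear = x + 0.5 * sosfilt(sos, x)

# Widen the spread between minimum and maximum levels (a simple expander).
x_fear = np.sign(x_fear) * np.abs(x_fear) ** 1.3
x_fear /= np.max(np.abs(x_fear))                     # renormalize to [-1, 1]

wavfile.write("fear_meow.wav", fs, (x_fear * 32767).astype(np.int16))
```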

— Joyful meowing was recognized well by 80% of listeners; recognition of sadness and fright was slightly lower, but still significant. Since the initial sound was neutral, the experiment confirmed our hypothesis about acoustic characteristics specific to each emotion," said Galina Portnova.
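
For context on why an 80% hit rate is meaningful: with three emotion labels, random guessing would succeed about a third of the time. The article does not report the number of participants, so the sample size in this quick check is purely hypothetical.

```python
# Sanity check on the 80% figure: with three emotions, chance is ~1/3.
# The participant count is not given in the article; n = 50 is hypothetical.
from scipy.stats import binomtest

n = 50                        # hypothetical number of listeners
successes = int(0.80 * n)     # 80% correctly labeled the "joy" meows
result = binomtest(successes, n, p=1/3, alternative="greater")
print(f"p-value vs. chance: {result.pvalue:.2e}")    # far below 0.05 for n = 50
```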

Applications of the technology

Since the scientists managed to identify, for each emotion, sound parameters that indicate its sincerity, the data can also be used in systems that assess a speaker's truthfulness in real time. At the next stage, the study's authors plan a series of experiments with patients with autism, schizophrenia, and depression, aiming to teach them to understand the feelings of their interlocutors.

— Our research shows that a person relies on certain universal characteristics of sound to recognize the emotional coloring of incoming information. Moreover, these characteristics are inherent in sounds in general, not just in human speech. Next, we can transfer the characteristics we found, for example using generative AI methods, onto any other initially neutral sounds and evoke the desired emotions in the person listening to the recording," said Maxim Sharaev, senior lecturer at the Skoltech Center for Artificial Intelligence.

Maxim Sharaev

Photo: Skoltech

Such a technique will be in great demand in clinical work and in psychophysiological experiments, but it may also prove useful, for example, in the entertainment industry, he added.

— Developments that improve mutual understanding between AI and humans move us forward. But it is not entirely clear how sensitive to emotions AI can actually be. This study describes only basic emotions; complex emotions are built from them, and those are far less clear-cut. For now, machines cannot grasp such nuances," said Irina Vetrova, a researcher at the Institute of Psychology of the Russian Academy of Sciences.

In her opinion, a technology that suggests what others are feeling would be useful for certain categories of patients. Demand for such devices is real, but it is important to use AI as an assistant rather than relying entirely on its judgments.

Artemy Kotov, a specialist in communication systems and cybernetic emotional systems, explained to Izvestia that research on how the emotional coloring of a voice affects the effectiveness of human-machine communication has been conducted for decades, and that well-chosen parameters really can make such communication easier.

— It is necessary and useful. Humans recognize emotions; the question is where and how a robot should express them appropriately. A machine's recognition of human feelings also matters. The problem is that people separate the emotions they experience from the ones they express. This is a very fine line, and it may prove too difficult to explain to a computer," he noted.

According to Said Dashuk, general producer of the RED MOON CINEMA film company, the sound parameters the researchers found could be used to add extra emotional coloring to actors' voices in film.

— This is quite feasible, since AI analyzes voice samples and learns from them well; it can identify patterns in particular sound characteristics. In theory it is realistic, but it needs to be tested in practice. Strengthening emotional accents in a voice, to emphasize, say, joy or laughter, would be very interesting for filmmakers," the expert said.

In his view, this can be compared to the "25th frame" effect, except that the latter is too obvious and easy to detect, whereas fine-tuning of sound remains imperceptible to the viewer and, by acting on the subconscious, can have a much stronger effect.
