How technology can capture human emotions
Alexa, Siri and co. are the best-known examples of voice assistants. What sets them apart from simpler applications is their use of sentiment analysis, which determines the linguistic and emotional context of what is said. The experts of the Reply Practice Voice Machine Interfaces make this technology available to clients so they can use it to gain deeper customer insights.
Voice recognition has existed since 1952, when Bell Labs presented an automatic digit recogniser that could recognise spoken numbers with great accuracy. Today, voice recognition systems and chatbots can do a good deal more and are used in numerous high-performance voice interfaces such as Alexa and co., which are convenient and user-friendly. However, one capability that is decisive for customer service and product reviews still causes problems. In voice technology it is referred to as “sentiment analysis”: voice assistants should be able to correctly recognise and interpret the mood or tone of what people say.
Sentiment analysis offers great added value for companies in every industry. Software can automatically evaluate written text or spoken content, so employees no longer need to read through long, convoluted or error-ridden text. Such applications save time and money, especially when the goal is to monitor social media or gather customer reviews and feedback on service.
What happens to our speech when microphones record it? Put simply, voice technology captures what people say and digitises it, converting the speech into binary data that pattern recognition can work on.
In this form, the individual sounds, words and their interrelationships no longer carry meaning for people. Machines compare these building blocks of language with stored digital models. This comparison can take place on many levels: simple numeric pattern recognition is enough to process a selection in a hotline queue, while highly complex semantic networks can recognise relational meanings in running text. Sentiment analysis is one example of the latter.
A voice application can use sentiment analysis to recognise the semantics of a sentence. The individual linguistic components of the sentence are linked correctly so that a context and meaning can be assigned to it. To interpret the sentence correctly, however, the technology must also be able to understand the user’s mood and feelings.
Complex machine learning models support high-performance applications. They record the context of spoken or written statements in order to quantify emotions, politeness, vehemence and, of course, factual content.
Sentiment analysis can be used at different performance levels. The simplest software versions search text for individual terms that can be attributed to an emotional state without a doubt, an approach known as “bag of words” because word order and context are ignored. “Today I feel great” or “Man, this weather is terrible!” are statements that are easy to classify using the adjectives they contain.
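The simplest, word-matching level can be sketched in a few lines of Python. This is a minimal illustration only: the two word lists and the function name `bag_of_words_sentiment` are hypothetical, and real systems rely on far larger, curated lexicons.

```python
import re

# Hypothetical mini-lexicons for illustration; not taken from any real product.
POSITIVE = {"great", "good", "wonderful", "happy"}
NEGATIVE = {"terrible", "bad", "awful", "sad"}


def bag_of_words_sentiment(text: str) -> str:
    """Classify text by counting lexicon hits, ignoring word order and context."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(bag_of_words_sentiment("Today I feel great"))
print(bag_of_words_sentiment("Man, this weather is terrible!"))
```

Because only individual words are counted, a sentence such as “not great at all” would still be scored as positive, which is exactly the limitation that the more advanced levels address.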
Things become more complicated when the application needs to recognise the overall meaning of longer statements or text and tonality that changes within a statement. Semantic networks that understand the relationships between different words are used for this purpose. For instance, if a user issues the voice command “I am looking for a place to stay for me and my 100 chickens”, the voice technology needs to recognise that they are not looking for a normal hotel.
Most applications deliver a relatively simple evaluation consisting of keywords and an associated probability. This result can be processed by algorithms, saved and passed on to other applications. For this purpose, the application determines both an emotional state on a polar scale (such as joy versus rage) and the respective probability, expressed as a value between zero and one. For instance, the evaluation “Joy: 0.78456” indicates that the user has very probably made a happy, positive statement.
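An evaluation of this kind might be represented as follows. The sketch assumes that the analysis produces a raw score on the joy-versus-rage scale and maps it to a probability with a logistic function. That mapping, the `Evaluation` class and the function names are illustrative assumptions, not a description of any particular product.

```python
import math
from dataclasses import dataclass


@dataclass
class Evaluation:
    label: str          # emotional state on the polar scale
    probability: float  # value between zero and one


def evaluate(raw_score: float,
             positive_label: str = "Joy",
             negative_label: str = "Rage") -> Evaluation:
    """Map a raw polar score to a label and a probability (assumed logistic mapping)."""
    p_positive = 1.0 / (1.0 + math.exp(-raw_score))
    if p_positive >= 0.5:
        return Evaluation(positive_label, p_positive)
    return Evaluation(negative_label, 1.0 - p_positive)


def format_evaluation(ev: Evaluation) -> str:
    """Render the evaluation in the 'Label: probability' form used in the text."""
    return f"{ev.label}: {ev.probability:.5f}"


print(format_evaluation(Evaluation("Joy", 0.78456)))  # Joy: 0.78456
```

Keeping the label and the probability together in one small record makes the result easy to save and to hand over to other applications.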
Ontologies add a further level of complexity: they treat individual terms as collections of properties that are conceptually connected with other terms. The statement “That was a total surprise!” shows why this matters and is easy to understand. Said about a film at the cinema, the statement would be positive. Said about using a software application, it would more likely be negative.
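The context dependence in this example can be illustrated with a deliberately tiny lookup table. A real ontology models far richer relationships between terms; the table contents, the domain names and the function `polarity_in_context` are hypothetical and only mirror the cinema and software cases from the text.

```python
# Hypothetical context table: the same term carries a different polarity per domain.
CONTEXT_POLARITY = {
    "surprise": {"cinema": "positive", "software": "negative"},
}


def polarity_in_context(term: str, domain: str, default: str = "neutral") -> str:
    """Look up the polarity of a term within a given domain."""
    return CONTEXT_POLARITY.get(term.lower(), {}).get(domain, default)


print(polarity_in_context("surprise", "cinema"))
print(polarity_in_context("surprise", "software"))
```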
Using interfaces that are specially programmed for these contexts, such as those used for Amazon Alexa or Google Home, it is possible to convert these kinds of statements via speech-to-text and then evaluate them with sentiment analysis APIs. Applications like these are able to interpret the emotionality and polarity of statements.
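The two-step flow of speech-to-text followed by sentiment analysis could be wired together roughly as shown here. The function `analyse_utterance` and both stand-in services are hypothetical placeholders; they do not reproduce the actual interfaces of Amazon Alexa, Google Home or any specific sentiment analysis API.

```python
from typing import Callable, Dict


def analyse_utterance(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    sentiment_api: Callable[[str], Dict[str, float]],
) -> Dict[str, object]:
    """Transcribe a short recording, then score the transcript."""
    transcript = speech_to_text(audio)
    scores = sentiment_api(transcript)
    return {"transcript": transcript, "scores": scores}


# Stand-ins for real services, used here only so the sketch runs on its own.
def fake_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_sentiment_api(text: str) -> Dict[str, float]:
    return {"joy": 1.0 if "great" in text.lower() else 0.0}


result = analyse_utterance(b"Today I feel great",
                           fake_speech_to_text, fake_sentiment_api)
print(result)
```

Passing the two services in as parameters keeps the flow independent of any one vendor, so either step can be swapped without touching the other.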
However, one disadvantage is that voice assistants generally only “listen” for a few seconds when processing statements. That means they are not able to perform the kind of deeper analysis that would be possible with running text, for instance. Nevertheless, they are suitable for recording short recommendations or opinions.
Companies frequently use sentiment analysis for opinion mining, which is to say, opinion analysis. For online retailers or financial service providers, for instance, it is important to know what people are writing about performance, products or services in social media. Additionally, it is possible to gather opinions about what the target group wants or what mood consumers are in when they ring the call centre. The company can use the knowledge gained to improve the products or services or use the benefits of voice technology in marketing.