Tech Team Stories

Call Quality at Aircall, Part 1: General Overview

Artur Trofymov
Last updated on April 5, 2023

8 min

Aircall is a cloud-based phone system and call center software that brings the main customer communication channels (e.g. phone, SMS) together in one consolidated interface. It integrates with popular CRM and Helpdesk tools to keep all conversations connected. At Aircall, we believe in the power of conversation, and we know that providing best-in-class call quality is fundamental to achieving that vision.

This is the first part of a series diving into Aircall call quality data. It briefly describes the call flow, reports on the main indicators we use for call quality monitoring, and provides a short list of the models we built.

Introduction and objectives

As discussed in an article published in Forbes magazine, understanding and modeling the relationship between quality of service and quality of experience has become an important task over the past decades. For the telecommunications sector, it is essential to understand, monitor and control the quality of telephony, as this is the main service provided to customers. Our main objective is therefore to show that data is the key element for a deep understanding and precise monitoring of the quality of all calls at Aircall.

1. Call flow

At Aircall, we use WebRTC (Web Real-Time Communication) technology for audio communication between a client and a peer. A client is a call participant who has an Aircall account with an associated phone number, while a peer is a participant with any external phone number. In this case, the WebRTC connection goes through the media server, although it is also possible to connect to another web-based client.

A simplified model of the client-peer call flow at Aircall can be visualized as follows:

We split a call flow into two parts (legs) depending on the method of call data transmission:

Client leg - the part that corresponds to the data exchange between the client and the media server over an internet connection (e.g. WiFi, LTE, 4G, 5G)
PSTN leg - the part between the media server and the peer, which uses Public Switched Telephone Network (PSTN) capabilities and Session Initiation Protocol (SIP) trunking

Such a split in the call flow requires Aircall to monitor both legs. Having information about data flow on Client and PSTN legs helps us reconstruct the overall call flow and its quality.

2. Call quality indicators

At Aircall, we have two main classes of call quality indicators:

Technical - based on the VoIP (Voice over Internet Protocol) industry indicators that monitor the accuracy, completeness and success of the information being sent and received over a connection
User-satisfaction - based on customer feedback

2.1 Technical indicators

To monitor the quality of VoIP calls, there are industry standard metrics that we can use at the finest granularity:

Latency - the time that it takes for data packets (formatted units of data carried by a network) to travel from their source to their destination (i.e. client-to-peer delay), usually measured in milliseconds. High latency has the following negative effects:
- one participant interrupts another due to delayed packets of voice data
- overlapping noises
- echo, when voices are repeated during a call or the participant hears their own words
RTT (Round-Trip Time) - the time taken for data to travel to the target destination and back. In practice, high RTT has effects similar to those of high Latency.
Jitter - the variation in latency of packets carrying voice data over a communication channel, measured in milliseconds. Being linked to Latency, high Jitter has a related list of symptoms:
- delayed or intermittent audio
- echo (similar to high latency effect)
- issues with connectivity: the call may be disconnected for a few seconds or dropped completely
Packet Loss - a measure of the packets that fail to reach their destination, expressed as the percentage of data packets lost on the way from source to destination. When many packets are dropped, the audio quality suffers:
- lost audio or long pauses in the conversation
- garbled audio
- choppy voice
- call may also be dropped
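
To make the definitions above concrete, here is a minimal Python sketch that derives average latency, jitter and packet loss from per-packet timestamps. The function name `summarize_packets` and the input format (two dictionaries keyed by packet sequence number) are assumptions made for illustration, not Aircall's data model. Jitter is simplified here to the mean absolute difference between consecutive packet latencies, and the sender and receiver clocks are assumed to be synchronized.

```python
from statistics import mean


def summarize_packets(sent_ms, received_ms):
    """Summarize one direction of a call from per-packet timestamps.

    sent_ms:     dict mapping sequence number -> send time (ms)
    received_ms: dict mapping sequence number -> receive time (ms);
                 a packet missing from this dict is treated as lost.
    Both structures are hypothetical and used only for illustration.
    """
    # One-way latency of every packet that arrived, in sending order.
    latencies = [
        received_ms[seq] - sent_ms[seq]
        for seq in sorted(sent_ms)
        if seq in received_ms
    ]
    # Simplified jitter: mean absolute change between consecutive latencies.
    changes = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    lost = len(sent_ms) - len(latencies)
    return {
        "latency_ms": mean(latencies) if latencies else None,
        "jitter_ms": mean(changes) if changes else 0.0,
        "packet_loss_pct": 100.0 * lost / len(sent_ms),
    }


sent = {1: 0, 2: 20, 3: 40, 4: 60, 5: 80}
received = {1: 50, 2: 75, 3: 85, 5: 140}  # packet 4 never arrived
print(summarize_packets(sent, received))
```

In this example one packet out of five is lost, so the packet loss is 20%, far above what a healthy call would show.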

Together, these technical indicators feed into the industry-standard metric called MOS, or Mean Opinion Score, which measures the overall quality of VoIP calls. MOS is a numerical measure of human-judged call quality and one of the universal metrics, aggregating Latency, Jitter and Packet Loss as sub-metrics. MOS has a scale from 1 to 5, where 1 is bad, 2 is poor, 3 is fair, 4 is good, and 5 is excellent quality. Additionally, a value of 3.5 is usually considered acceptable, while 4.7 is the highest value achievable in practice. This is an Absolute Category Rating scale, which is widely used in the industry.
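
How Latency, Jitter and Packet Loss can be aggregated into a single MOS value may be illustrated with a rule-of-thumb approximation that circulates in network-monitoring tooling. This is only a sketch and not necessarily how Aircall or its providers compute MOS: the function `estimate_mos` and its penalty coefficients are assumptions, and only the final conversion from R-factor to MOS follows the ITU-T G.107 E-model.

```python
def r_factor_to_mos(r):
    """Convert an E-model R-factor to MOS (conversion from ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


def estimate_mos(latency_ms, jitter_ms, packet_loss_pct):
    """Rough MOS estimate from three sub-metrics.

    The penalties below are a simplified heuristic (an assumption),
    not the full E-model computation.
    """
    # Jitter is weighted double, plus a fixed 10 ms allowance for codec delay.
    effective_latency = latency_ms + 2 * jitter_ms + 10
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10
    # Each percent of packet loss costs 2.5 R-factor points.
    r -= 2.5 * packet_loss_pct
    return r_factor_to_mos(r)


good_call = estimate_mos(latency_ms=20, jitter_ms=5, packet_loss_pct=0.0)
bad_call = estimate_mos(latency_ms=300, jitter_ms=50, packet_loss_pct=5.0)
print(round(good_call, 2), round(bad_call, 2))
```

Under this heuristic, the healthy call lands above 4 (good) while the degraded call falls below the 3.5 level usually considered acceptable.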

The thresholds commonly used across the telecommunications field for each of these indicators (following ITU-T recommendations) are: 150ms for Latency, 300ms for RTT, 30ms for Jitter and 1% for Packet Loss. These thresholds set the upper limit on the indicator values beyond which call quality issues are expected.
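
As a small illustration, the thresholds above can be encoded as a lookup table and checked against the averaged metrics of a call. The dictionary keys and the helper `exceeded_thresholds` are hypothetical names chosen for this sketch.

```python
# Upper limits quoted in the text; the key names are illustrative.
THRESHOLDS = {
    "latency_ms": 150.0,
    "rtt_ms": 300.0,
    "jitter_ms": 30.0,
    "packet_loss_pct": 1.0,
}


def exceeded_thresholds(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of the metrics that exceed their upper limit."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )


call_metrics = {
    "latency_ms": 120.0,
    "rtt_ms": 310.0,
    "jitter_ms": 35.0,
    "packet_loss_pct": 0.4,
}
print(exceeded_thresholds(call_metrics))  # ['jitter_ms', 'rtt_ms']
```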

To get an overview of the call quality metrics over the whole duration of a call, we use the average values of the listed indicators. In addition, Aircall has a more granular indicator for MOS evolution: the number of warnings raised during a call, where a warning is raised when 3 out of 5 samples (seconds of a call) have a MOS lower than 3.5. We call this indicator “Number of low-MOS alerts”. There is no strict threshold on the number of low-MOS alerts yet, so this is a topic for further study.
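
One possible reading of the low-MOS alert rule is sketched below. It assumes a sliding window over the last 5 per-second MOS samples and counts a new alert each time the "3 out of 5 below 3.5" condition becomes true after having been false. The windowing and the raise/clear behavior are assumptions, since the exact mechanics are not specified here, and `count_low_mos_alerts` is a hypothetical helper.

```python
from collections import deque


def count_low_mos_alerts(mos_per_second, threshold=3.5, window=5, min_low=3):
    """Count low-MOS alerts over a call, given one MOS sample per second."""
    recent = deque(maxlen=window)  # flags for the last `window` samples
    alerts = 0
    active = False  # True while the alert condition currently holds
    for mos in mos_per_second:
        recent.append(mos < threshold)
        triggered = sum(recent) >= min_low
        if triggered and not active:
            alerts += 1  # condition just switched on: a new alert is raised
        active = triggered
    return alerts


samples = [4.2, 4.1, 3.0, 3.1, 3.2, 4.3, 4.3, 4.3, 4.3, 3.0, 3.0, 3.0]
print(count_low_mos_alerts(samples))  # 2
```

Counting only the transitions avoids inflating the number of alerts when a single bad stretch lasts for many seconds.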

2.2 User-satisfaction indicators

To monitor call quality from the customer side, we use two indicators:

NPS (Net Promoter Score) - an index that measures the willingness of customers to recommend a company’s product to others on a scale of 0 to 10. Based on the surveyed rating, we classify customers into three categories: detractors (rating of 0 to 6), passives (7 to 8) and promoters (9 to 10).

The NPS is surveyed every 90 days. On top of the numeric rating, customers provide qualitative comments. These comments are of great interest to Aircall, since they contain details about customers’ satisfaction with the quality of calls. At Aircall, the data team developed a custom NLP-based model to categorize comments and assess whether call quality is the reason for promotion or detraction. Nevertheless, given the frequency of NPS surveys, it is hard to link the rating and comments to the specific call(s) that led to that rating.
PCM (Post-Call Modal) - a rating that the customer can provide after each call. It has a scale from 1 to 5 stars and aims to represent the customer’s level of satisfaction with the call quality. If a call is rated with 1 to 3 stars, the customer can specify the reason for their choice. The predefined reasons are: audio delay, audio breaking up, background noise, couldn’t hear others / others couldn’t hear me, call ended unexpectedly, echo. These reasons map to the metrics measured in the industry (e.g. Latency, Jitter, Packet Loss). At Aircall, a PCM rating is only available for 0.5% of all calls, reflecting the low response rate of our customers.
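
The NPS categories described above can be sketched in a few lines of Python. The overall score uses the standard NPS definition (percentage of promoters minus percentage of detractors). The function names are illustrative and the ratings are made-up sample data.

```python
def nps_category(rating):
    """Map a 0-10 survey rating to its NPS category."""
    if not 0 <= rating <= 10:
        raise ValueError("NPS ratings must be between 0 and 10")
    if rating <= 6:
        return "detractor"
    if rating <= 8:
        return "passive"
    return "promoter"


def net_promoter_score(ratings):
    """Standard NPS: percentage of promoters minus percentage of detractors."""
    categories = [nps_category(r) for r in ratings]
    promoters = categories.count("promoter")
    detractors = categories.count("detractor")
    return 100.0 * (promoters - detractors) / len(categories)


sample_ratings = [10, 9, 8, 7, 6, 3, 10, 9]  # made-up survey answers
print(net_promoter_score(sample_ratings))  # 25.0
```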

If by any chance you are an Aircall customer and you are reading this article, please use these surveys to send us feedback about your experience with our product. Customer centricity is our core value at Aircall and we truly value feedback and do our best to act on it.

Both the technical and the user-satisfaction indicators have several limitations:

Our reliable source of data for all the listed technical metrics is the Client leg. The PSTN leg has several constraints: it doesn’t provide us with MOS information, and its Latency calculation is not completely reliable. Overall, for each call we have data on technical indicators for at least one of the legs.
In the client-peer pair, only the client rates the call, so the rating represents the overall quality from the user’s side. As mentioned above, our clients leave PCM feedback for only half a percent of all calls (i.e. one call in 200 is rated).

As Aircall aims to provide the highest quality of service to its customers, it is essential to be able to explain the customer’s perception of call quality with technical indicators. In this way, we can describe the customer’s experience and also estimate it when we don’t have any feedback or rating.

3. Models overview

In order to describe the customer’s satisfaction with the call quality, we built several models using the existing VoIP industry indicators and the surveyed PCM quality. The overall idea is to find the most efficient and precise model among them.

All of the models adopt the same strategy:

correlate technical information and user rating on the call quality
use information at call level

For the user rating, all models use the same PCM quality scale to classify calls as bad or good quality. We label calls rated 1 to 3 stars as bad-quality calls and calls rated 4 to 5 stars as good-quality calls.
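
The labeling rule above is simple enough to express directly. The helper name `pcm_label` is hypothetical.

```python
def pcm_label(stars):
    """Turn a 1-5 star PCM rating into the binary label used by the models."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError("PCM ratings range from 1 to 5 stars")
    return "bad" if stars <= 3 else "good"


print([pcm_label(s) for s in range(1, 6)])
# ['bad', 'bad', 'bad', 'good', 'good']
```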

#	Model class	Model name	Use of call legs data	Technical metrics (thresholds used)	User-satisfaction metric
1	Client leg & MOS	Model 4.0	Client leg	MOS (4.0)	PCM (bad / good)
2		Model 4.13	Client leg	MOS (4.13)	PCM (bad / good)
3		Model 4.26	Client leg	MOS (4.26)	PCM (bad / good)
4		Model 4.36	Client leg	MOS (4.36)	PCM (bad / good)
5	Client + PSTN legs & MOS, Latency, RTT, Jitter, Packet Loss	Model 5	PSTN leg	Latency (150ms) RTT (300ms) Jitter (30ms) Packet Loss (1%)	PCM (bad / good)
6		Model 6	Client & PSTN legs	Latency (150ms, both legs) RTT (300ms, both legs) Jitter (30ms, both legs) Packet Loss (1%, both legs) MOS (4.36, both legs)	PCM (bad / good)
7		Model 7	Client leg	Latency (215ms) RTT (372ms) Jitter (52ms) Packet Loss (1.3%) MOS (4.36)	PCM (bad / good)
8		Model 8	PSTN leg	Latency (180ms) Jitter (40ms) Packet Loss (1.1%)	PCM (bad / good)
9		Model 9	Client & PSTN leg	Latency (170ms, both legs) RTT (370ms, Client leg) Jitter (30ms, both legs) Packet Loss (1.4%, both legs)	PCM (bad / good)
10	Client leg & #low-MOS alerts	Model 10	Client leg	#low-MOS alerts (threshold as a function of call duration)	PCM (bad / good)
11		Model 11	Client leg	#low-MOS alerts (at least 1 alert)	PCM (bad / good)
12		Model 12	Client leg	MOS (4.36) #low-MOS alerts (threshold as a function of call duration)	PCM (bad / good)
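
To illustrate how a threshold-based model from the table could be compared with PCM labels, here is a minimal sketch using the Model 7 thresholds. The combination rule (a call is predicted bad if any metric exceeds its limit or if MOS is below its threshold) is an assumption, since the table lists only the thresholds. The field names, helper functions and sample calls are hypothetical.

```python
# Model 7 thresholds from the table; the key names are illustrative.
MODEL_7_LIMITS = {
    "latency_ms": 215,
    "rtt_ms": 372,
    "jitter_ms": 52,
    "packet_loss_pct": 1.3,
}
MODEL_7_MOS_FLOOR = 4.36


def predict_bad(metrics, upper_limits, mos_floor=None):
    """Assumed rule: bad if any metric is over its limit or MOS is too low."""
    over_limit = any(metrics[name] > limit for name, limit in upper_limits.items())
    low_mos = mos_floor is not None and metrics["mos"] < mos_floor
    return over_limit or low_mos


def confusion_counts(calls, upper_limits, mos_floor=None):
    """Compare predictions with PCM labels; `calls` holds (metrics, rated_bad)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for metrics, rated_bad in calls:
        predicted = predict_bad(metrics, upper_limits, mos_floor)
        key = ("t" if predicted == rated_bad else "f") + ("p" if predicted else "n")
        counts[key] += 1
    return counts


healthy = {"latency_ms": 100, "rtt_ms": 200, "jitter_ms": 10,
           "packet_loss_pct": 0.2, "mos": 4.4}
calls = [
    (healthy, False),                        # predicted good, rated good
    ({**healthy, "latency_ms": 250}, True),  # predicted bad, rated bad
    ({**healthy, "mos": 4.2}, False),        # predicted bad, rated good
    (healthy, True),                         # predicted good, rated bad
]
print(confusion_counts(calls, MODEL_7_LIMITS, MODEL_7_MOS_FLOOR))
```

Counts like these are the raw material for comparing models against each other on the calls that have a PCM rating.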

The performance of all these models is evaluated using statistical methods, which are the subject of the next parts of this series.

Next: Statistical analysis to determine the MOS threshold

In this article, we outlined the call flow at Aircall, which is organized in two legs connected through the media server. We also walked you through the main indicators used in the VoIP industry to monitor call quality, as well as the models built with these indicators.

In Part 2, we will discuss the methodology used for statistical analyses with these models and walk you through one use case: determining the threshold for MOS.

Published on March 13, 2023.