To keep track of the usage of their infrastructure, telecommunications companies rely on Call Detail Records (CDRs). These are standardized (see https://www.fcs.org.uk/membergroups/billing) records that are created to document every exchange a given communication device makes with the network. CDR data sit at the base of billing calculations, which makes their analysis a fundamental element in the Telco industry.
A CDR contains a wealth of information that characterizes the usage of a network by its users. For instance, a phone call generates a CDR that contains the phone numbers involved in the call, the time, the date and the duration of the call. We will describe how to deal with this type of data using the Unicage method in 5 parts:
- Converting the data to “space-separated value” format;
- Cleaning the data and assessing its quality (part 1);
- Cleaning the data and assessing its quality (part 2);
- Performing summations over all CDRs;
- Joining the information and producing the results of the complete analysis.
This is part 4.
In part 2 and part 3 we filtered the CDR data and created a file that contains only 5 fields out of the 42 contained in a CDR. Furthermore, we applied filters to some of the selected fields to ensure the quality of the data, and we saved the result of these operations in a file named tmp_phone_calls_duration. We can see the original CDR data in CSV format (Bogus_CDR_Data.csv), the same CDR data in the Unicage format (Bogus_CDR_Data_UF) and the file with the data selection just mentioned:
[ user@unicage ]$ ls -lh
-rw-rw-r-- 1 user user 5.8G Jan 12 10:18 Bogus_CDR_Data.csv
-rw-rw-r-- 1 user user 4.5G Jan 12 10:20 Bogus_CDR_Data_UF
-rw-rw-r-- 1 user user 1.1G Jan 12 10:22 tmp_phone_calls_duration
-rw -rw -r-- 1 user user 215K Feb 11 10:23 tmp_number_total_duration
Here’s what the first five lines of tmp_phone_calls_duration look like:
[ user@unicage ]$ head -5 tmp_phone_calls_duration
+351999990000 12/02/2020 01:00:00 +351999994375 3558
+351999990000 12/02/2020 01:00:00 +351999991637 1076
+351999990000 12/02/2020 01:00:00 +351999990235 1271
+351999990000 12/02/2020 01:00:00 +351999993094 1804
+351999990000 12/02/2020 01:00:00 +351999990902 102
Next, we want to sum the duration of all the calls made by each phone number in this file. To do this, we use Unicage’s sm2 command. This command sums the values of a field for each record with the same key:
sm2 1 1 5 5 tmp_phone_calls_duration > tmp_number_total_duration
In this case, we summed the contents of the 5th field (duration) for the records with the same phone number (1st field) in the file tmp_phone_calls_duration. We then wrote the results of this action to the file tmp_number_total_duration, which looks like this:
[ user@unicage ]$ head -5 tmp_number_total_duration
As an example, we see that the sum of the duration of all the calls registered in this file for phone number +351999990000 is 4101500 seconds.
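For readers without access to the Unicage tools, the grouping-and-summing step described above can be approximated with standard awk. The sketch below is not how sm2 is implemented. It only reproduces the behaviour described in this article (sum field 5 for each distinct value of field 1) and assumes that the output is one line per phone number followed by its total. The function name sum_duration_by_number is hypothetical and chosen for illustration only.

```shell
# Hypothetical helper (not part of Unicage): approximates the effect of
#   sm2 1 1 5 5 <file>
# as described in the text, using only standard awk and sort.
sum_duration_by_number() {
    # $1 = space-separated input file; field 1 = caller number, field 5 = duration (s)
    awk '
        { total[$1] += $5 }                  # accumulate duration per phone number
        END {
            for (number in total)
                # %.0f avoids scientific notation for very large totals
                printf "%s %.0f\n", number, total[number]
        }
    ' "$1" | sort                            # awk iterates in arbitrary order, so sort by number
}
```

Running sum_duration_by_number on only the five sample lines shown earlier would report 3558 + 1076 + 1271 + 1804 + 102 = 7811 seconds for +351999990000. Unlike this sketch, which holds one total per phone number in memory, a dedicated command can work on the file as a stream, which matters for inputs of the size discussed here.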
Want to learn more?
Find out more about how Unicage can help telecommunications businesses handle large volumes of call detail records.