In order to keep track of the usage of their infrastructure, telecommunications companies rely on Call Detail Records (CDR). These are standardized records that are created to document every exchange a given communication device makes with the network. CDR data sits at the base of the billing calculations, which means that its analysis is a fundamental element in the Telco industry.

As we have seen in our previous articles about Reporting from CDR, these records contain a wealth of information that characterizes how users make use of a network. From them, we can extract the phone numbers involved in a call, its duration, and its date and time.

In Part 1, we introduced the data: a large .csv file with millions of records, each composed of more than 40 fields. The file was more than 5 GB in size. To tackle this challenge, we used Unicage commands to convert the file from .csv to a space-separated values file while, at the same time, reducing its size.
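To make the conversion step concrete, here is a minimal Python sketch of the same idea. It is not the Unicage command used in Part 1. The function name `csv_to_ssv`, the choice of `_` as a placeholder for empty fields, and the replacement of inner spaces with underscores are assumptions made for illustration only.

```python
import csv
import io


def csv_to_ssv(src, dst, placeholder="_"):
    """Copy CSV rows from src to dst as space-separated lines.

    Assumptions (not taken from the original series): empty fields
    become `placeholder`, and spaces inside a field become underscores,
    so that every output line has the same number of columns.
    """
    for row in csv.reader(src):
        fields = [(f.strip().replace(" ", "_") or placeholder) for f in row]
        dst.write(" ".join(fields) + "\n")


# Small in-memory example; a real file would be opened with open(..., newline="").
out = io.StringIO()
csv_to_ssv(io.StringIO("a,b c,,d\n"), out)
print(out.getvalue(), end="")  # a b_c _ d
```

Processing the file row by row, rather than loading it whole, is what keeps an approach like this workable for a multi-gigabyte input.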

In Parts 2 and 3, we combined Unicage commands with regular expressions to extract five fields of interest from the CDR file, along with the Portuguese phone numbers that made the calls. These regular expressions also served as validation checks, confirming whether each phone number was valid.
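The idea of a regular expression that both extracts and validates a number can be sketched in Python as follows. The pattern is a simplified assumption, not the expression used in Parts 2 and 3: it accepts an optional `+351` or `00351` prefix followed by nine digits starting with 2 or 9. The name `normalize_pt_number` is hypothetical.

```python
import re

# Simplified, assumed pattern: optional country prefix, then a
# nine-digit national number whose first digit is 2 or 9.
PT_NUMBER = re.compile(r"(?:\+351|00351)?([29]\d{8})")


def normalize_pt_number(raw):
    """Return the nine-digit national number, or None if raw does not match."""
    match = PT_NUMBER.fullmatch(raw.strip())
    return match.group(1) if match else None


print(normalize_pt_number("+351912345678"))  # 912345678
print(normalize_pt_number("12345"))          # None
```

Because the match must cover the whole string, a record whose number field contains extra or missing digits is rejected rather than partially extracted.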

In Part 4, we showed how simple it is to sum the duration of all the calls made by each phone number in our file using a single Unicage command. We stored these results in a separate file so that they could be used later.
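For readers who want to see the aggregation spelled out, the following Python sketch performs the same kind of per-number sum. It is an illustration, not the Unicage command from Part 4. The `(number, duration)` record layout and the name `sum_durations` are assumptions.

```python
from collections import defaultdict


def sum_durations(records):
    """Sum call durations per calling number.

    `records` is an iterable of (number, duration) pairs; this layout
    is assumed for the example.
    """
    totals = defaultdict(int)
    for number, duration in records:
        totals[number] += duration
    return dict(totals)


calls = [("912345678", 30), ("212345678", 60), ("912345678", 90)]
print(sum_durations(calls))  # {'912345678': 120, '212345678': 60}
```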

Finally, in Part 5, we brought together all the work developed in the previous parts. We started by adding the summed call duration to each record that we had extracted from the CDR. Then, we calculated the percentage of time that a given call took with respect to the total duration of the calls made by the same phone number. Lastly, we created a final file containing all the information we needed in a tabular format.
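The join and percentage steps can be sketched in Python as shown here. This is a sketch under stated assumptions, not the Unicage pipeline from Part 5: records are `(number, duration)` pairs, `totals` maps each number to its summed duration, the percentage is taken relative to the calling number's own total, and the name `add_share` is hypothetical.

```python
def add_share(records, totals):
    """Attach the per-number total and the call's share of it to each record.

    Returns rows of (number, duration, total, percentage). The percentage
    is rounded to two decimals; a zero total yields 0.0 to avoid dividing
    by zero (an assumed convention).
    """
    rows = []
    for number, duration in records:
        total = totals[number]
        pct = 100.0 * duration / total if total else 0.0
        rows.append((number, duration, total, round(pct, 2)))
    return rows


calls = [("912345678", 30), ("912345678", 90), ("212345678", 60)]
totals = {"912345678": 120, "212345678": 60}
for row in add_share(calls, totals):
    print(*row)  # one space-separated line per call
```

For the first record, a 30-second call out of a 120-second total gives 100 × 30 / 120 = 25 percent.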

This is just one of the approaches we can take with Unicage to process CDR data and transform it into easy-to-use tabular data. Other approaches could be considered, and other solutions designed, depending on the customer's needs.

