Reporting from CDR Data:

Use Case (Part 1)

To keep track of the usage of their infrastructure, telecommunications companies rely on Call Detail Records (CDRs). These are standardized records (see https://www.fcs.org.uk/membergroups/billing) created to document every exchange a given communication device makes with the network. CDR data sits at the base of billing calculations, which makes its analysis a fundamental element of the telco industry.

A CDR contains a wealth of information that characterizes the usage of a network by its users. For instance, a phone call generates a CDR that contains the phone numbers involved in the call, the time, the date and the duration of the call. We will describe how to deal with this type of data using the Unicage method in 5 parts:

  • Converting the data to “space-separated value” format;
  • Cleaning the data and assessing its quality (part 1);
  • Cleaning the data and assessing its quality (part 2);
  • Performing summations over all CDRs;
  • Joining the information and producing the results of the complete analysis.

This is part 1 of the process. We will work with some bogus CDR data, stored in the file Bogus_CDR_Data.csv. The first line of our bogus data file is shown below:

[user@unicage]$ head -1 Bogus_CDR_Data.csv
"G","0","+351999990665","+351999994370","12/02/2020","01:00:00","2865","837384086",
"206787737","Faro","","Special3",".2577","1221","+351999990321","2747","BT312CR","Name",
"1","E","PT","NET2","","","","+351999996172","719","","EUR","","","","","1044171823",
"userID52260","","","361464","",""

Each CDR contains 42 fields, but for our analysis we will be using only two: the third one, which corresponds to the phone number that placed the call, and the seventh, which corresponds to the number of seconds that the phone call lasted. By focusing on these two fields, we will calculate the total number of seconds that each phone number spent on phone calls; along the way, we will apply data quality rules using regular expressions.
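To make the goal concrete, the aggregation described above can be sketched with standard awk and sort on a few made-up records that are already space-separated. This is only an illustration with plain POSIX tools, not the Unicage commands used in this series; the function name total_seconds_per_caller and the three sample records are hypothetical, and the records are cut down to their first seven fields for brevity.

```shell
# Sum the call duration (field 7) for each calling number (field 3).
# Input: space-separated records on stdin.
# Output: "<calling number> <total seconds>", one line per number, sorted.
total_seconds_per_caller() {
    awk '{ total[$3] += $7 } END { for (n in total) print n, total[n] }' | sort
}

# Three hypothetical records, truncated to the first seven fields.
total_seconds_per_caller <<'EOF'
G 0 +351999990665 +351999994370 12/02/2020 01:00:00 2865
G 0 +351999990321 +351999996172 12/02/2020 01:05:00 120
G 0 +351999990665 +351999996172 12/02/2020 02:00:00 35
EOF
# prints:
# +351999990321 120
# +351999990665 2900
```

The sketch assumes every record is well formed; the data quality rules covered in the cleaning steps exist precisely because real CDR files do not guarantee that.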

Before using the Unicage tools, we need to convert the CSV file to the Unicage format, i.e., a “space-separated value” file. This is done with the Unicage command fromcsv:

fromcsv Bogus_CDR_Data.csv > Bogus_CDR_Data_UF

After this, we have two files, as can be seen if we use the ls command:

[user@unicage]$ ls -lh
-rw-rw-r-- 1 user user 5.8G Jan 12 10:18 Bogus_CDR_Data.csv
-rw-rw-r-- 1 user user 4.5G Jan 12 10:20 Bogus_CDR_Data_UF

Notice that the converted file is smaller. One can understand why by looking at the first line of the converted file:

[user@unicage]$ head -1 Bogus_CDR_Data_UF
G 0 +351999990665 +351999994370 12/02/2020 01:00:00 2865 837384086 206787737 Faro _
Special3 .2577 1221 +351999990321 2747 BT312CR Name 1 E PT NET2 _ _ _ +351999996172 719
_ EUR _ _ _ _ 1044171823 userID52260 _ _ 361464 _ _

The data is exactly the same, but the quotation marks are gone, the commas have been replaced by single spaces, and each empty field is now represented by an underscore. Only the relevant data is kept (i.e., the actual information regarding the phone calls), while the unnecessary characters are left behind. The Unicage format focuses on that: keeping only what matters. This reduces the storage space required, here from 5.8G to 4.5G. When we deal with large amounts of data, the choice of the format in which we save it is important.
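The transformation visible in the sample output can be imitated with a few lines of standard awk. The sketch below is a rough stand-in, not the fromcsv implementation: the function name csv_to_ssv is hypothetical, and it assumes that every field is double-quoted and contains no commas, quotation marks, or spaces. It reproduces only what the sample shows: quotes dropped, commas turned into spaces, and empty fields written as an underscore.

```shell
# Rough imitation of the CSV to space-separated conversion (NOT the real fromcsv).
# Assumes: every field is double-quoted; no field contains commas, quotes or spaces.
csv_to_ssv() {
    awk '{
        sub(/^"/, "")                       # drop the opening quote of the first field
        sub(/"$/, "")                       # drop the closing quote of the last field
        n = split($0, f, /","/)             # the remaining "," sequences separate fields
        for (i = 1; i <= n; i++) {
            if (f[i] == "") f[i] = "_"      # empty field becomes an underscore
            printf "%s%s", f[i], (i < n ? " " : "\n")
        }
    }'
}

# A shortened, hypothetical record in the same style as the sample data.
sample='"G","0","+351999990665","","2865"'
printf '%s\n' "$sample" | csv_to_ssv
# prints: G 0 +351999990665 _ 2865

# Byte counts before and after conversion, for comparison.
printf '%s\n' "$sample" | wc -c
printf '%s\n' "$sample" | csv_to_ssv | wc -c
```

Under these assumptions, every non-empty field loses its two quotation marks and every empty field shrinks from two characters to one, which is where the size reduction comes from.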
