Financial Report using xml (Part 3)

In part 1 and 2 of this 3 part series we have demonstrated  how to turn a typical xml file into a tabular format file. In the third and final section we will articulate how to validate for missing fields and mismatch datatypes. This is an example of the “Checks” file already in the tabular format:

DevOperacoesDev_montante DevOperacoesDev_quantidade ImpLegal_montante ImpLegal_quantidade OperacoesDev_montante OperacoesDev_quantidade Operacoes_montante Operacoes_quantidade chs_id         divisa_operacao pais_operacao processador scheme setor_institucional tipo_cheque
8.42                     898                        7.77              7                   1040.03               671                     75191.47           30337                CHS.000000000a EUR             PT            1           1      S7                  9          
2286.91                  948                        1.85              1                   2357.33               801                     44864.07           75483                CHS.0000000002 EUR             PT            1           1      S4                  7          
6890.53                  878                        14.47             2                   3972.38               568                     52403.68           @                    CHS.0000000003 EUR             PT            1           1      S4                  7          
647.40                   173                        7.76              6                   3414.78               726                     302043.31          89854                CHS.0000000004 EUR             PT            1           1      S4                  9          
898.11                   183                        10.27             6                   1929.11               448                     479434.94          67094                CHS.0000000005 EUR             PT            1           1      S2                  7          
1282.50                  382                        21.54             3                   10.19                 10                      242343.41          79813                CHS.0000000006 EUR             PT            1           1      S1                  9          
805.10                   325                        83.92             10                  891.31                273                     98729.25           11235                CHS.0000000007 EUR             PT            1           1      S5                  3 

First we replace every missing field with “@”, with this line of code:

cat teste.xml | sed 's/></>@</g' 

We use the “sed” command to identify empty tags in the xml file and fill them with the “@” character. After this is done, we run the following code:

# --- Reads file ---
cat $file                                                                                           |
# --- Selects every field
self 1/NF                                                                                          |
# --- Regular expression to catch every mismatch data type ---
grep -vP "^[0-9]*\.\d\d [0-9]* [0-9]*\.\d\d [0-9]* [0-9]*\.\d\d [0-9]* [0-9]*\.\d\d [0-9]* [A-Z]{3}\.[0-9]*
[A-Z]{2,4} [A-Z]{2,4} \d \d [A-Z][0-9] \d$"         |
fcols --                                                                                                                                                        > INVALID_DATA/$filename

We use a regular expression to instruct the “grep” command to exclude all the records that do not present the desired format. The result of this command is a list all the records that conform to a given data format.

In the first example, the column “chs_id” contains the value “CHS.000000000a”. This value does not conform to the format the we specified using the regular expression. After we run this code we get the following file:

DevOperacoesDev_montante DevOperacoesDev_quantidade ImpLegal_montante ImpLegal_quantidade OperacoesDev_montante OperacoesDev_quantidade Operacoes_montante Operacoes_quantidade chs_id         divisa_operacao pais_operacao processador scheme setor_institucional tipo_cheque
8.42                     898                        7.77              7                   1040.03               671                     75191.47           30337                CHS.000000000a EUR             PT            1           1      S7                  9

6890.53                  878                        14.47             2                   3972.38               568                     52403.68           @                    CHS.0000000003 EUR             PT            1           1      S4                  7

As one can see, we were able to isolate the records with “@”, i.e., records with missing fields, and the record with the incorrect code “CHS.000000000a”, i.e., a mismatched datatype. This file is now in a ready state to be analyzed.

If you want to know more about this demonstration, you can send an email in our contacts section.

Find out more

Request a demo and speak with our team about how you can leverage the power of Unicage in your organization.

Privacy Policy