Processing Financial Reports in XBRL

Part 2

The XBRL file format is crucial in today's communications between many types of businesses and financial institutions, and it has become a standard format for reporting financial data and other information. Many national banks and financial regulators, such as the European Banking Authority (EBA), use XBRL files on a regular basis, which requires efficient and reliable systems to process and validate this type of file.

In Part 1, we performed two initial validations on XBRL reports:

  • Validation of the file namespaces
  • Validation of the indicators declared in the report

In this second part, we will start by parsing the entire XBRL file. Then, we will validate the identifier sequences and, finally, extract and validate all values regarding financial transactions in Euros.

After the initial validations, we can finally parse the XBRL file to extract its contents. Because XBRL is based on XML, Unicage’s xmldir command is the right tool for the job. However, since the files carry many definitions and declarations in their top-level elements, it is better to apply a few sed commands to improve readability before using xmldir.

# PARSES XBRL FILE
# removes first 2 lines
tail -n+3 DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21.xbrl |
# removes schema info for easy reading
sed -e 's/^<xbrli:xbrl.*/<xbrli:xbrl>/' |
# removes empty lines
sed '/^[[:space:]]*$/d' |
# parsing using xmldir
xmldir xbrli:xbrl - > DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21-FULL_XBRL

The result will be a more comprehensible file as seen in the following sample:

xbrli:xbrl link:schemaRef xlink:type simple xlink:href http://www.eba.europa.eu/eu/fr/xbrl/crr/fws/res/cir-2018-1624/2021-07-15/mod/resol_con.xsd
xbrli:xbrl xbrli:unit xbrli:measure xbrli:pure
xbrli:xbrl xbrli:context xbrli:entity xbrli:identifier scheme http://standards.iso.org/iso/17442 DUMMYLEI123456789012
xbrli:xbrl xbrli:context xbrli:period xbrli:instant 2021-12-31
xbrli:xbrl find:fIndicators find:filingIndicator contextRef c1 T_01.00
xbrli:xbrl find:fIndicators find:filingIndicator contextRef c1 T_02.00
xbrli:xbrl find:fIndicators find:filingIndicator contextRef c1 T_03.01
xbrli:xbrl find:fIndicators find:filingIndicator contextRef c1 T_03.02
xbrli:xbrl find:fIndicators find:filingIndicator contextRef c1 T_03.03
(…)
xbrli:xbrl xbrli:context xbrli:entity xbrli:identifier scheme http://standards.iso.org/iso/17442 DUMMYLEI123456789012
xbrli:xbrl xbrli:context xbrli:period xbrli:instant 2021-12-31
xbrli:xbrl xbrli:context xbrli:scenario xbrldi:typedMember eba_typ:LE 1
xbrli:xbrl eba_met:si168 contextRef c2 8880
xbrli:xbrl eba_met:si288 contextRef c2 1883
xbrli:xbrl eba_met:ei555 contextRef c2 eba_ZZ:x428
xbrli:xbrl eba_met:ei152 contextRef c2 eba_GA:DK
xbrli:xbrl eba_met:bi628 contextRef c2 true
xbrli:xbrl eba_met:ei326 contextRef c2 eba_ZZ:x64
xbrli:xbrl eba_met:bi629 contextRef c2 true
xbrli:xbrl eba_met:ei4 contextRef c2 eba_AS:x2
xbrli:xbrl eba_met:bi630 contextRef c2 true
xbrli:xbrl xbrli:context xbrli:entity xbrli:identifier scheme http://standards.iso.org/iso/17442 DUMMYLEI123456789012
xbrli:xbrl xbrli:context xbrli:period xbrli:instant 2021-12-31
xbrli:xbrl xbrli:context xbrli:scenario xbrldi:typedMember eba_typ:LE 2
(…)
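
To see what the cleanup steps that run before xmldir actually do, the following minimal sketch applies the same tail and sed commands to a tiny, hypothetical XBRL fragment (the sample content is invented for illustration and is not taken from a real report). It assumes, as the pipeline above does, that the first two lines are the XML prolog and that the opening xbrli:xbrl tag sits on a single line. The xmldir step is left out because it requires the Unicage tools.

```shell
#!/bin/sh
# Hypothetical miniature XBRL instance, invented for illustration only.
sample='<?xml version="1.0" encoding="UTF-8"?>
<!-- generated sample -->
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance">

  <xbrli:context id="c1">
  </xbrli:context>
</xbrli:xbrl>'

cleaned=$(printf '%s\n' "$sample" |
  # removes first 2 lines
  tail -n +3 |
  # replaces the opening root tag and its namespace declarations with a bare tag
  sed -e 's/^<xbrli:xbrl.*/<xbrli:xbrl>/' |
  # removes empty lines
  sed '/^[[:space:]]*$/d')

printf '%s\n' "$cleaned"
```

The closing tag is left untouched because it starts with "</" and therefore does not match the sed pattern.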

After the file has been parsed, we can perform other technical validations that the EBA applies. In this case, we validate the identifier sequence of the XBRL file. This sequence must be identical across the entire file and comprises the link that identifies the schema (the scheme attribute of xbrli:identifier), the file identifier, which is usually part of the file name, and the instant, which corresponds to a reference date. Here is an example of the identifier sequence in the original XBRL file and in the parsed file:

XBRL file:

  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://standards.iso.org/iso/17442">DUMMYLEI123456789012</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:instant>2021-12-31</xbrli:instant>
    </xbrli:period>
  </xbrli:context>

Parsed file:

xbrli:xbrl xbrli:context xbrli:entity xbrli:identifier scheme http://standards.iso.org/iso/17442 DUMMYLEI123456789012
xbrli:xbrl xbrli:context xbrli:period xbrli:instant 2021-12-31

To extract the identifier sequence from the parsed XBRL file, we use a combination of grep and awk to keep only the lines of interest and join them in pairs. Then, using self, we select only the fields we need and store them in a temporary file for validation.

cat DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21-FULL_XBRL |
# Retrieves only the lines containing the identifier sequence
grep -E '(xbrli:identifier|xbrli:instant)' |
# Uses Awk to join each 2 lines that comprise the identifier sequence
awk 'NR%2{printf "%s ",$0;next;}1' |
# 1-5: xbrl tags 6:schema 7:identifier 8-11:xbrl tags 12:instant
self 6 7 NF > tmp-IDENTIFIER-SEQUENCE
# 1:schema 2:identifier 3:instant

The result will be a temporary file containing all the identifier sequences used across the XBRL file:

http://standards.iso.org/iso/17442 DUMMYLEI123456789012 2021-12-31
http://standards.iso.org/iso/17442 DUMMYLEI123456789012 2021-12-31
http://standards.iso.org/iso/17442 DUMMYLEI123456789012 2021-12-31
http://standards.iso.org/iso/17442 DUMMYLEI123456789012 2021-12-31
(…)
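
For readers who do not have the Unicage commands at hand, the extraction can be sketched with standard tools only. In this sketch, awk's field printing stands in for self (a swapped-in technique, not the Unicage implementation), and the input is a few parsed lines copied from the sample above.

```shell
#!/bin/sh
# A few lines in the parsed format shown earlier in this article.
parsed='xbrli:xbrl xbrli:context xbrli:entity xbrli:identifier scheme http://standards.iso.org/iso/17442 DUMMYLEI123456789012
xbrli:xbrl xbrli:context xbrli:period xbrli:instant 2021-12-31
xbrli:xbrl eba_met:si168 contextRef c2 8880
xbrli:xbrl xbrli:context xbrli:entity xbrli:identifier scheme http://standards.iso.org/iso/17442 DUMMYLEI123456789012
xbrli:xbrl xbrli:context xbrli:period xbrli:instant 2021-12-31'

sequences=$(printf '%s\n' "$parsed" |
  # keeps only the identifier and instant lines
  grep -E '(xbrli:identifier|xbrli:instant)' |
  # joins every pair of lines into a single record
  awk 'NR%2{printf "%s ",$0;next}1' |
  # 6:schema 7:identifier last field:instant
  awk '{print $6, $7, $NF}')

printf '%s\n' "$sequences"
```

This relies on every identifier line being immediately followed by its instant line, which is the same assumption the pairwise join in the original pipeline makes.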

The first step in validating the identifier sequence is to check whether it is unique across the file. Using a combination of uniq and lcnt, we count the distinct sequences and check whether the result differs from 1. If it does, there is more than one identifier sequence and the file is invalid.

# compares the full sequence (1:schema 2:identifier 3:instant)
if [ "$(cat tmp-IDENTIFIER-SEQUENCE | uniq | lcnt)" != 1 ]
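
As a rough equivalent using only standard tools, the uniqueness check could look like the sketch below. The helper name count_distinct and the sample data are hypothetical. Here sort -u and wc -l replace uniq and lcnt; unlike uniq on its own, sort -u also collapses duplicates that are not adjacent.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of distinct lines in a file.
count_distinct() {
  sort -u "$1" | wc -l | tr -d ' '
}

# Illustrative input: the third record carries a different instant.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
http://standards.iso.org/iso/17442 DUMMYLEI123456789012 2021-12-31
http://standards.iso.org/iso/17442 DUMMYLEI123456789012 2021-12-31
http://standards.iso.org/iso/17442 DUMMYLEI123456789012 2021-06-30
EOF

if [ "$(count_distinct "$tmp")" != 1 ]; then
  status="invalid"   # more than one identifier sequence
else
  status="valid"
fi
echo "$status"
rm -f "$tmp"
```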

If there is only one sequence, we need to verify that the date is valid and that the XML schema corresponds to a link. Both checks can be done with a few if conditions and regular expressions. If the identifier sequence is valid, we can move on to extracting the values that correspond to financial transactions and operations in Euros.
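
The article does not show these two checks, so the following is only one possible sketch. The function names is_valid_date and is_link are hypothetical, and the patterns are assumptions: the date pattern checks the YYYY-MM-DD shape and the month and day ranges, but does not know how many days each month has (2021-02-31 would still pass), and the link pattern only requires an http or https prefix.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 looks like a YYYY-MM-DD date.
is_valid_date() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}

# Hypothetical helper: succeeds if $1 looks like an http(s) link.
is_link() {
  printf '%s\n' "$1" | grep -Eq '^https?://[^[:space:]]+$'
}

schema='http://standards.iso.org/iso/17442'
instant='2021-12-31'

if is_link "$schema" && is_valid_date "$instant"; then
  result="valid"
else
  result="invalid"
fi
echo "identifier sequence is $result"
```

A stricter date check would need a calendar-aware tool, which is outside the scope of this sketch.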

Using the parsed XBRL file, we can use grep to extract the records that contain monetary values and their related information. In this case, we only care about values in Euros, so we filter on the Euro unit identifier, ‘uEUR’, and store the result in a temporary file for further validation.

cat DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21-FULL_XBRL |
# selects the lines that contain only the monetary values in Euros
grep 'uEUR' |
# 1:xbrli:xbrl 2:eba-tag 3:unitRef 4:uEUR 5:decimals-tag 6:decimals 7:contextRef 8:id 9:value
delf 1 > tmp-NUMERICS
# 1:eba-tag 2:unitRef 3:uEUR 4:decimals-tag 5:decimals 6:contextRef 7:id 8:value

After creating this file, we focus on two distinct validations: missing fields or data, and non-numeric values. For the first, we check for missing fields using ccnt. Each correct record must have 8 columns. Therefore, if ccnt returns a value other than 8, data is missing and the report is invalid.

size=$(cat tmp-NUMERICS | ccnt)
# if the record size is not equal to 8, that means that there are missing values
if [[ $size != 8 ]];
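
Without the Unicage commands, the same idea can be sketched with awk's NF variable, which holds the number of fields in each record. The second sample record below is invented and deliberately lacks its value; reporting the offending line number is an addition for illustration, not something the ccnt check above does.

```shell
#!/bin/sh
# 1:eba-tag 2:unitRef 3:uEUR 4:decimals-tag 5:decimals 6:contextRef 7:id 8:value
numerics='eba_met:mi53 unitRef uEUR decimals -3 contextRef c5 2900000
eba_met:mi53 unitRef uEUR decimals -3 contextRef c6'

# lists every record that does not have exactly 8 fields
bad_records=$(printf '%s\n' "$numerics" |
  awk 'NF != 8 { print NR ": " NF " fields" }')

if [ -n "$bad_records" ]; then
  echo "invalid report, incomplete records:"
  echo "$bad_records"
fi
```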

If the previous validation passes, we then check for non-numeric values. To do so, we first use self to select only the field that holds the monetary value. Then, using grep, we store in a temporary file every value that is not a number. If the temporary file is not empty, there are non-numeric values and the report is considered invalid. If the file is empty, every value is numeric and we can create the final file.

cat tmp-NUMERICS |
# 1:eba-tag 2:unitRef 3:uEUR 4:decimals-tag 5:decimals 6:contextRef 7:id 8:value
self 8 |
# 1:value
# selects each record whose value is not a number and stores it in a NIL tmp file
grep -v -E '^-?[0-9]+(\.[0-9]+)?$' > tmp-NIL

if [ -s tmp-NIL ];
then
    # If the NIL tmp file is not empty, then it means that
    # there are non-numeric values and the report is invalid
    (…)
else
    # If the NIL tmp file is empty, then it means that
    # all values in the report are valid
    cat tmp-NUMERICS |
    # 1:eba-tag 2:unitRef 3:uEUR 4:decimals-tag 5:decimals 6:contextRef 7:id 8:value
    self 7 1 8 3 5 |
    # 1:id 2:eba-tag 3:value 4:uEUR 5:decimals
    # adds header to file
    awk 'BEGIN{print "id reason amount currency decimals"}1' |
    fcols > DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21-NUMERIC
fi
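
The numeric check can also be sketched with awk alone. The pattern below is an assumption about what counts as a number here (an optional minus sign, digits, and an optional decimal part); it is anchored at both ends so that a value such as 12a4 is rejected even though it contains digits. The sample records are partly invented for illustration.

```shell
#!/bin/sh
# 1:eba-tag 2:unitRef 3:uEUR 4:decimals-tag 5:decimals 6:contextRef 7:id 8:value
numerics='eba_met:mi53 unitRef uEUR decimals -3 contextRef c5 2900000
eba_met:mi53 unitRef uEUR decimals -3 contextRef c6 12a4
eba_met:mi235 unitRef uEUR decimals -3 contextRef c8 -15.5'

# prints the id and value of every record whose value is not a number
non_numeric=$(printf '%s\n' "$numerics" |
  awk '$8 !~ /^-?[0-9]+(\.[0-9]+)?$/ { print $7, $8 }')

if [ -n "$non_numeric" ]; then
  echo "invalid report, non-numeric values:"
  echo "$non_numeric"
else
  echo "all values are numeric"
fi
```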

This concludes our example of XBRL file processing. We end up with a tabular file containing only the information on financial transactions in Euros, which other processing or visualization software can use directly, with no further parsing or validation. The information is also much easier to read than in the original XBRL file:

id     reason      amount   currency   decimals
c5   eba_met:mi53  2900000    uEUR       -3
c6   eba_met:mi53  9686000    uEUR       -3
c7   eba_met:mi53  6386000    uEUR       -3
c8  eba_met:mi235  9022000    uEUR       -3
c8  eba_met:mi236  7568000    uEUR       -3
(…)

Additional parsing can be performed on XBRL files, along with the extraction of other specific fields and information. For example, one might want to extract all values related to a specific topic, such as Risk Exposure or Market Values. However, we chose this small example to show some of the possibilities of using Unicage to handle this type of file. If you want to find out more about us and our technology, please feel free to contact us.
