Communication between financial regulators and institutions, such as banks and central banks, is crucial for the world’s economy. For this communication to be efficient, a standard file format capable of encapsulating all the required information was needed to replace the old paper-based reporting. The answer was XBRL, which stands for eXtensible Business Reporting Language. XBRL is a file format based on XML that focuses on storing data for business and financial reports, allowing accurate and efficient reporting between businesses and financial institutions.
In this set of articles, we are going to show how Unicage can be used to validate and extract the information contained in the XBRL financial reports used by the European Banking Authority (EBA). EBA uses a vast set of validation rules to check whether an XBRL report is valid. This process is not trivial, not only because of the structure of XBRL files, but also because their contents require specific validations. These validations are generally performed by third-party software that banks and businesses use to check their XBRL reports.
Taking this into account, we chose a small set of the validation rules used by EBA and built a small demonstration to show that Unicage can also be a powerful tool for XBRL validation and processing.
We are going to start by validating the namespaces used in the document. The namespaces form a technical header declared at the beginning of the file and contain essential information for the whole report, from the XML schema information to the different taxonomies used within the report. Here is a small excerpt of this header:
<xbrli:xbrl xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:eba_model="http://www.eba.europa.eu/xbrl/ext/model" (…)
With some small transformations using tr, in combination with Unicage’s makec command, we can store the namespaces in a file where each record is a namespace. Then, after sorting the records with msort, we use cjoin0 with the +ng option to save to a temporary file each namespace that is not present in the Master file, which contains the list of valid namespaces for each type of report.
# Retrieves only the namespace information line
tail -n+3 DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21.xbrl |
head -n1 |
# Removes the '<' and '>' delimiters of the line
tr -d '<' |
tr -d '>' |
# Removes '\r' if they exist
tr -d '\r' |
# Makes a record with each namespace
makec -1 num=1 |
# 1:xbrli_tag 2:namespace
delf 1 |
# 1:namespace
# Removes records that still contain blank characters
sed '/[[:blank:]]/d' |
# Sorts and saves to tmp file
msort key=1 > tmp-NAMESPACES
Now, we check whether the temporary file produced by cjoin0 is empty. If it is not empty, the report contains invalid namespaces and is considered invalid. If it is empty, every namespace in the report is valid, but we must still check that the number of namespaces in the report equals the number of namespaces in the Master file. If the numbers differ, the file has fewer namespaces than required and is therefore invalid. To do this, we can use the lcnt command to obtain the number of namespaces in each file.
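The decision logic described above can be sketched as follows. This is only an illustration using standard POSIX tools (grep, sort, wc) as stand-ins for cjoin0 +ng and lcnt, not the Unicage commands themselves. The function name check_namespaces, the file arguments, and the one-namespace-per-line layout are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: grep/sort/wc stand in for Unicage's cjoin0 +ng and lcnt.
# Assumes both files hold one namespace declaration per line.

# check_namespaces REPORT_LIST MASTER_LIST
# Prints "valid" or the reason for rejection; returns 0 only when valid.
check_namespaces() {
  report=$1
  master=$2

  # Namespaces found in the report but absent from the master list
  # (-F fixed strings, -x whole-line match, -v invert, -f patterns from file)
  unknown=$(grep -vxFf "$master" "$report" || true)
  if [ -n "$unknown" ]; then
    echo "invalid: unknown namespaces"
    return 1
  fi

  # Every namespace is known, so the counts must match as well.
  # sort -u keeps a repeated namespace from masking a missing one.
  n_report=$(sort -u "$report" | wc -l | tr -d ' ')
  n_master=$(sort -u "$master" | wc -l | tr -d ' ')
  if [ "$n_report" -ne "$n_master" ]; then
    echo "invalid: missing namespaces"
    return 1
  fi

  echo "valid"
}
```

Called as check_namespaces tmp-NAMESPACES MASTER-NAMESPACES (the second file name is hypothetical), the function reports the first failing check, which mirrors the order of the checks described above.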
The next step, before we parse the full XBRL files, is to validate the indicators. Indicators are the tables reported in the financial reports, and each type of report has its own characteristic group of tables. Thus, we need to check that each reported table belongs to that type of report and that no table is repeated. Here is an example of the indicators declared in an XBRL report:
<find:fIndicators>
  <find:filingIndicator contextRef="c1">T_01.00</find:filingIndicator>
  <find:filingIndicator contextRef="c1">T_02.00</find:filingIndicator>
  <find:filingIndicator contextRef="c1">T_03.01</find:filingIndicator>
  <find:filingIndicator contextRef="c1">T_03.02</find:filingIndicator>
  <find:filingIndicator contextRef="c1">T_03.03</find:filingIndicator>
  (…)
</find:fIndicators>
First, we parse the XBRL file, extracting only the section that comprises the indicators. This can easily be done with Unicage’s xmldir command, since the structure of XBRL is based on XML. Then, using self, we can select the indicators and store them in a temporary file for validation.
tail -n+3 DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21.xbrl |
# Removes schema info for easy reading
sed -e 's/^<xbrli:xbrl.*/<xbrli:xbrl>/' |
# Removes empty lines
sed '/^[[:space:]]*$/d' |
# Fetches the indicators using xmldir
xmldir xbrli:xbrl/find:fIndicators - |
# 1-4:xbrl tags 5:id 6:indicator
self 6 > tmp-INDICATORS
In this validation, we check for repeated tables in the report through a combination of the uniq and lcnt commands.
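# Checks if there are repeated values
# (input is sorted first because uniq -d only detects adjacent duplicates)
if [ "$(sort tmp-INDICATORS | uniq -d | lcnt)" -ne 0 ]; then
  echo "Invalid report: repeated tables" >&2
fi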
If the same table is reported more than once, the file is invalid. If there are no repeated values, we follow the same approach as in the namespace validation: we use cjoin0 with the +ng option to check whether the reported tables are correct according to a Master file containing the valid tables for each type of report.
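As a rough sketch of these two indicator checks, the function below again uses standard POSIX tools in place of cjoin0 +ng and lcnt. The name check_indicators and the master file layout (one table code per line) are assumptions for the example. We also assume here that a report does not have to contain every table in the Master file, so only membership is checked.

```shell
#!/bin/sh
# Sketch only: sort/uniq/grep stand in for the Unicage commands.
# Assumes both files hold one table code (e.g. T_01.00) per line.

# check_indicators INDICATORS MASTER_TABLES
# Prints "valid" or the offending tables; returns 0 only when valid.
check_indicators() {
  indicators=$1
  master=$2

  # uniq -d only sees adjacent duplicates, so the input is sorted first
  dups=$(sort "$indicators" | uniq -d)
  if [ -n "$dups" ]; then
    printf '%s\n' "$dups" | sed 's/^/repeated: /'
    return 1
  fi

  # Tables reported but not listed for this type of report
  extra=$(grep -vxFf "$master" "$indicators" || true)
  if [ -n "$extra" ]; then
    printf '%s\n' "$extra" | sed 's/^/not allowed: /'
    return 1
  fi

  echo "valid"
}
```

Printing the offending table codes, rather than only a pass or fail result, makes it easier to see why a given report was rejected.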
With this, we conclude the initial validations of the XBRL report. They allow us to make a first division between valid and invalid XBRL reports before we start parsing the files.