Processing Financial Reports in XBRL

Part 1

Communication between financial regulators and institutions, such as banks and central banks, is crucial for the world’s economy. For this communication to be efficient, a standard file format capable of encapsulating all the required information was needed to replace the old paper-based reporting. The answer was XBRL, which stands for Extensible Business Reporting Language. XBRL is an XML-based format for storing business and financial report data, allowing accurate and efficient reporting between businesses and financial institutions.

In this set of articles, we look at how Unicage can be used to validate and extract the information contained in XBRL financial reports used by the European Banking Authority (EBA). The EBA uses a vast set of validation rules to check whether an XBRL report is valid. This process is not trivial, both because of the structure of XBRL files and because their contents require specific validations. These validations are generally performed by third-party software that banks and businesses use to check their XBRL reports.

Taking this into account, we chose a small set of the validation rules used by the EBA and built a small demonstration to show that Unicage can also be a powerful tool for XBRL validation and processing.

We start by validating the namespaces used in the document. The namespaces form a technical header declared at the beginning of the file and contain essential information for the whole report, from the XML schema to the different taxonomies used within the report. Here is a small excerpt of this header:

<xbrli:xbrl xmlns:xsi=""

With a few small transformations using tr, combined with Unicage’s makec command, we can store the namespaces in a file where each record is a namespace. Then, after sorting the records with msort, we use cjoin0 with the +ng option to save to a temporary file every namespace that is not present in the Master file, which contains the list of valid namespaces for each type of report.

# Retrieves only the namespace information line
tail -n+3 DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21.xbrl | head -n1 |
# Removes the '<' and '>' delimiters of the line
tr -d '<' | tr -d '>' |
# Removes '\r' if they exist
tr -d '\r' |
# Makes a record with each namespace
makec -1 num=1 |
# 1:xbrli_tag 2:namespace
delf 1 |
# 1:namespace
# Drops any record that still contains a blank character
sed '/[[:blank:]]/d' |
# sorts and saves to tmp file
msort key=1 > tmp-NAMESPACES

Next, we check whether the temporary file produced by cjoin0 is empty. If it is not empty, the report contains invalid namespaces and is considered invalid. If it is empty, every namespace is valid, but we must still check that the number of namespaces equals the number in the Master file. If the counts differ, the file has fewer namespaces than required and is therefore invalid. To do this, we can use the lcnt command to obtain the number of namespaces in each file.
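
The decision logic described here can be sketched with standard POSIX tools standing in for the Unicage commands: comm -23 plays the role of cjoin0 with +ng, and wc -l the role of lcnt. This is only an illustration under assumptions: the function name check_namespaces and the file names are hypothetical, and both input files are assumed to hold one namespace per line, sorted with LC_ALL=C.

```shell
#!/bin/sh
# Sketch only: POSIX stand-ins for the Unicage commands named in the text.
# check_namespaces REPORT_NS MASTER_NS   (hypothetical helper)
#   REPORT_NS: namespaces found in the report, one per line, sorted (LC_ALL=C)
#   MASTER_NS: valid namespaces for this report type, same format
check_namespaces() {
    report_ns=$1
    master_ns=$2

    # Lines present in the report but absent from the master list
    unknown=$(LC_ALL=C comm -23 "$report_ns" "$master_ns")
    if [ -n "$unknown" ]; then
        echo "invalid: unknown namespace"
        return 1
    fi

    # No unknown entries, so the two counts must also match
    n_report=$(($(wc -l < "$report_ns")))
    n_master=$(($(wc -l < "$master_ns")))
    if [ "$n_report" -ne "$n_master" ]; then
        echo "invalid: missing namespace"
        return 1
    fi

    echo "valid"
}
```

Called as check_namespaces tmp-NAMESPACES master-namespaces (the second file name is an assumption), the function prints either valid or the reason for rejection and returns a non-zero status for an invalid report.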

The next step before we parse the full XBRL files is to validate the indicators. Indicators are the tables reported in the financial reports, and each type of report has its own characteristic group of tables. Thus, we need to check that each reported table belongs to that type of report and that no table is repeated. Here is an example of the indicators declared in an XBRL report:

    <find:filingIndicator contextRef="c1">T_01.00</find:filingIndicator>
    <find:filingIndicator contextRef="c1">T_02.00</find:filingIndicator>
    <find:filingIndicator contextRef="c1">T_03.01</find:filingIndicator>
    <find:filingIndicator contextRef="c1">T_03.02</find:filingIndicator>
    <find:filingIndicator contextRef="c1">T_03.03</find:filingIndicator>

First, we parse the XBRL file, extracting only the section that contains the indicators. This is easily done with Unicage’s xmldir command, since XBRL is based on XML. Then, using self, we select the indicators and store them in a temporary file for validation.

tail -n+3 DUMMYLEI123456789012_PT_RES010200_RESOLCON_2021-07-21.xbrl |
# removes schema info for easy reading
sed -e 's/^<xbrli:xbrl.*/<xbrli:xbrl>/' |
# removes empty lines
sed '/^[[:space:]]*$/d' |
# fetches the indicators using xmldir
xmldir xbrli:xbrl/find:fIndicators - |
# 1-4:xbrl tags 5:id 6:indicator
self 6 > tmp-INDICATORS
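
For readers without xmldir at hand, the same extraction can be approximated with sed. This is a line-oriented sketch and not an XML parser: it assumes each filingIndicator element sits alone on a single line, as in the excerpt above, and extract_indicators is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: approximates the xmldir + self step with plain sed.
# Assumes one <find:filingIndicator> element per line.
# Reads the files given as arguments, or standard input if none are given.
extract_indicators() {
    # -n plus the p flag prints only the lines where the substitution matched;
    # \1 is the text between the opening and closing tags
    sed -n 's|.*<find:filingIndicator[^>]*>\([^<]*\)</find:filingIndicator>.*|\1|p' "$@"
}
```

Redirecting its output to tmp-INDICATORS gives one table code per line, the same shape the later checks expect.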

In this validation, we check for repeated tables in the report through a combination of the uniq and lcnt commands.

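# Checks if there are repeated values; sorts first, since uniq -d only detects adjacent duplicates
if [ "$(sort tmp-INDICATORS | uniq -d | lcnt)" != 0 ]; then
  echo "invalid report: repeated table" >&2
fi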

If the same table is reported more than once, the file is invalid. If there are no repeated values, we follow the same approach as in the namespace validation: we use cjoin0 with the +ng option to check whether the reported tables are correct for that type of report, according to a Master file containing the valid tables for each type of report.
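
Both indicator checks can be sketched with standard POSIX tools in place of the Unicage commands: sort with uniq -d for repeated tables, and comm -23 for tables missing from the master list. The helper name check_indicators and the file layout are assumptions. The text does not say whether a count comparison also applies to indicators, so the sketch tests membership only.

```shell
#!/bin/sh
# Sketch only: POSIX stand-ins for the Unicage commands named in the text.
# check_indicators REPORTED ALLOWED   (hypothetical helper)
#   REPORTED: table codes found in the report, one per line, any order
#   ALLOWED:  valid table codes for this report type, sorted (LC_ALL=C)
check_indicators() {
    reported=$1
    allowed=$2

    # uniq -d only sees adjacent duplicates, so sort before calling it
    dups=$(LC_ALL=C sort "$reported" | uniq -d)
    if [ -n "$dups" ]; then
        echo "invalid: repeated table"
        return 1
    fi

    # Table codes that were reported but are not in the master list
    extra=$(LC_ALL=C sort "$reported" | LC_ALL=C comm -23 - "$allowed")
    if [ -n "$extra" ]; then
        echo "invalid: unexpected table"
        return 1
    fi

    echo "valid"
}
```

Sorting inside the function means the reported tables can be passed in document order, while the master list is assumed to be kept sorted.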

This concludes the initial validations of the XBRL report. With them, we can make a first division between valid and invalid XBRL reports before we start parsing the files.
