AS400 to Tabular

Simple and Efficient conversion with Unicage and Bash

AS400 files and Copybooks

In this article, we are going to show how we can use Unicage tools to convert data from files in mainframe AS400 format to Tabular text format. An AS400 file is a text file where each record is composed of a sequential line of text, that can contain several fields, but without any type of separator between those fields. This characteristic leads to an increased difficulty when reading the file that, usually, requires a translation medium. This translation medium will be a file called copybook. 

A copybook is a file that contains the structure of an AS400 file. In each line of the file, we have the definition of each field, ordered from first to last, the type of data that the field contains and its corresponding dimension. All this information is clearly structured, so that it can be easily understandable. An example of an AS400 file and its respective copybook file are shown below:

# Example of an AS400 file

0000000006207.210000001924EUR MV.0000000001 S4933113
0000000301852.060000060665EUR MV.0000000002 S8923221
0000000456970.340000090083EUR MV.0000000003 S8922129
# Example copybook
01 Mv_Detalhe.
05 Operacoes-montante                                PIC S9(13)V9(2).
05 Mv_Detalhe-Operacoes-quantidade          PIC 9(10).
05 Mv_Detalhe-divisa_operacao                        PIC X(3).
05 Mv_Detalhe-mv_id                                  PIC X(14).
05 Mv_Detalhe-setor_institucional                    PIC X(4).
05 Mv_Detalhe-tipo_conta                             PIC X(2).
05 Mv_Detalhe-tipo_movimento                        PIC X(1).
05 Mv_Detalhe-tipo_operacao                          PIC X(3).

As we can see by the file above, the fields are written in an ordered way, with levels and sublevels that help with the organization. The PIC keywords and the values that follow them correspond to the type of data that can be present in the field – X corresponds to alphanumeric data and 9 correspond to numeric data – and to the dimension of each field, respectively.

For example, the field “Mv_Detalhe-Operacoes-quantidade” has PIC 9(10), which means that it is a numeric field with 10 characters long, while the field “Mv_Detalhe-divisa_operacao” has PIC X(3), which means that it is an alphanumeric field with 3 characters long. Let’s look at a more complex field, for example field “Operacoes_montante” which has PIC S9(13)V9(2). This PIC value can be translated to the following: The S9(13) means that we have a numeric value, with 13 characters long, which can have a plus or minus sign, and the V9(2) indicates that the value has a decimal point with 2 more characters after it. In total, this field can occupy up to 17 characters. 

Our Solution

Now that we understand the basics of AS400 files and copybooks, we can start at looking at our solution to convert the AS400 files to tabular format.

Our key file will be the copybook, since it will be thanks to it that we can split the lines of data present in the AS400 file with the Unicage commands.

The first step will be selecting only the fields that have a PIC value associated with them. This is easily achieved with Unicage’s uawk command, which is an optimized version of GNU awk. With this command, we can select the lines that only have 2 fields – the name of the field and the corresponding PIC value. Then, we use Unicage’s self command to select the value present in the PIC field.

The next step is removing the ‘( )’ and ‘.’ from the PIC field, leaving only the numbers and the PIC and letters such as the ‘S’ or ‘V’.  Then based on the number of fields present on each line after this operation, we can know if the field has decimal places or not.

We apply some more transformations using uawk to generate a string of numbers that represents the field along with the position of the starting character and the lenght of that field. This will be crucial for the final step of the converter.

A code snippet of this explanation is shown bellow:

cat $copybook | 
# Replaces all tabs by spaces
sed 's/\t/ /g' |
# Selects only the records with more than 2 fields since these will correspond to the fields
# that actually have a size (PIC) and are part of the file
uawk 'NF>2' |
# 1:field_name 2:PIC 3:PIC_value/field_dimension
self NF |
# 1:PIC_value/field_dimension
# Removes the ( ) and .
fsed -e 's/[\(\)\.]/ /1' |
# Calculates the total dimension of each field.
# -> If the record has more than 2 fields, then we have a field with decimal places, so we have to sum
# the 2nd and 4th fields +1 to have place for the decimal separator.
# -> If the record only has 2 fields, then there is no need for additional operations.
uawk '{if(NF==2){print $2}else{print $2+$4+1}}' |
# 1:total_field_dim
# In each line, we shall have 1, number of starting character on string, length of the field
# this is needed to make the self command further ahead
uawk 'BEGIN{var=1}{print 1,var,$1; var+=$1}' |
# 1:1 2:starting_char 3:length
# replaces spaces by '.'
sed 's/ /\./g' > $dir_tmp/aux_$copybook_name
# 1: 1.starting_char.length

After that, we use store the values inside a variable that will be used as argument to Unicage’s self command. Self is a command that allows the manipulation of text fields in a file, including splitting a single field into several individual fields, which is exactly what we need to do if we want to convert from an AS400 file to a tabular file.

# Input for the final step of the converter. 
# We will have a line where each field will correspond to a field of the copybook, its starting position
# and the length of the field.
self_args=$(cat $dir_tmp/aux_$copybook_name | maker)
1.1.16 1.17.10 1.27.3 1.30.14 1.44.4 1.48.2 1.50.1 1.51.3

The final step will be the conversion of the AS400 file. To achieve this, we first use the Linux built-in sed command to replace all blank spaces by ‘_’ in the AS400 file so that we can garantee that we have a single field on each record, and then we use Unicage’s self  command with the variable that we have created in the previous step. This will split this single field into the several fields that are specified in the copybook, using their corresponding dimensions.

The code and the results are as follows:

# Converts to TABULAR
cat $as400_file |
# replaces all spaces by '_' so that each record has only a single column
sed 's/ /_/g' |
# applies self with the args that were calculated in the previous step in order to separate
# the data in the different fields
self $self_args |
# adds header
uawk -v var="$file_header" 'NR==1 {print var} 1' > $dir_conv/CONVERTED.$as400_file_name
Mv_Detalhe-Operacoes-montante Mv_Detalhe-Operacoes-quantidade Mv_Detalhe-divisa_operacao Mv_Detalhe-mv_id Mv_Detalhe-setor_institucional Mv_Detalhe-tipo_conta Mv_Detalhe-tipo_movimento Mv_Detalhe-tipo_operacao
0000000006207.21 0000001924 EUR _MV.0000000001 __S4 93 3 113
0000000301852.06 0000060665 EUR _MV.0000000002 __S8 92 3 221
0000000456970.34 0000090083 EUR _MV.0000000003 __S8 92 2 129

And that’s it!

With this simple and short program, made mostly of Bash and Unicage commands, we are able to efficiently convert an AS400 file to a tabular format. This conversion, will allow the user to easily understand the data present in the file, manipulate it and/or use it in a visualization software tool with ease.

If this example sparked your curiosity and you want to find more about us and our technology, please feel free to contact us by filling the form bellow or by Booking a Meeting.

Find out more

Request a demo and speak with our team about how you can leverage the power of Unicage in your organization.

Privacy Policy