How to combine legacy sensor data with new sensor data systems for reporting

Use Case (Part 2)

Taming data in different formats is a common challenge in any data pipeline. In what follows, we will consider a scenario in which such a challenge occurs: combining data coming from different types of sensors. Sensors from different manufacturers follow different standards, which demands a normalization procedure that reconciles all the data formats, so that the full information content of the data can be extracted.
 
In part 1, we:
  • Identified the differences between the two datasets from the two sensor types
  • Converted the JSON file to the tabular SSV file format
  • Translated field names that were in different languages
  • Completed the conversion to tabular format, creating a complete data pipeline

In part 2, we will convert the XML file to tabular format, i.e., to the space-separated value (SSV) file format. This format is very close to the CSV file format and is the preferred format of the Unicage tools.
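
To make the difference concrete, here is the same record (built from the field values that appear in the examples below) in CSV form and in SSV form. This is only an illustration, not part of the pipeline; the standard tr command is enough to turn one into the other:

[user@unicage]$ echo "2021-04-21,11:05,C012,87WV2,upper" | tr ',' ' '
2021-04-21 11:05 C012 87WV2 upper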

Using Unicage’s xmldir command, we can parse the XML file:

[user@unicage]$ xmldir events/event model02_file.xml 
events event date 2021-04-21
events event time 11:05
events event equipment C012
events event item 87WV2
events event shelf upper
events event date 2021-04-21
events event time 03:23
...

where the first two columns contain the hierarchy tags of the XML file. Notice that the xmldir command requires the specification of the hierarchy of the XML file, which is done with the string “events/event”. We are not interested in the first two columns, so we discard them with Unicage’s self command:

[user@unicage]$ xmldir events/event model02_file.xml | self 3 4
date 2021-04-21
time 11:05
equipment C012
item 87WV2
shelf upper
date 2021-04-21
time 03:23
...

At this stage, the data is very similar to the data obtained in part 1 of this series of articles, where we converted the data in the JSON file to tabular form. The difference is that in this case we don’t have a number to identify each event. We add the numbering of the events by noticing that the data fields associated with a given event are grouped in sets of 5, i.e., lines 1 to 5 correspond to one event, lines 6 to 10 correspond to another event, and so on. To implement this logic, we use the awk command:

[user@unicage]$ xmldir events/event model02_file.xml | self 3 4 |
awk 'BEGIN{event_number=1;event_counter=0}{if(event_counter == 5){event_counter = 1;event_number++}
else{event_counter++}; print event_number, $0}'
1 date 2021-04-21
1 time 11:05
1 equipment C012
1 item 87WV2
1 shelf upper
2 date 2021-04-21
2 time 03:23
...
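
As a side note, the same grouping can be written more compactly with awk’s built-in record counter NR. The one-liner below is a sketch that, like the command above, assumes exactly 5 fields per event, and it produces the same output:

[user@unicage]$ xmldir events/event model02_file.xml | self 3 4 |
awk '{print int((NR-1)/5)+1, $0}'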

We can now save the data in a file named converted_xml_UF. Here is the complete pipeline of commands with comments:

# 
# **** PART 2 ****
# ================
#
# --- a) parse the xml file
xmldir events/event model02_file.xml                    |
# 1:hierarchy_tag 2:hierarchy_tag 3:field_name 4:field_value
# --- b) discard the hierarchy tags
self 3 4                                                |
# 1:field_name 2:field_value
# --- c) add an event number to the data
awk 'BEGIN{event_number=1;event_counter=0}
          {if(event_counter == 5){event_counter = 1;event_number++}
          else{event_counter++};
          print event_number, $0}'                     > converted_xml_UF
# 1:event 2:field_name 3:field_value
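
As an optional sanity check (not part of the Unicage pipeline above), we can confirm that every event in converted_xml_UF really carries 5 fields. The plain awk sketch below counts the lines per event number and reports any event that deviates; no output means all events are complete:

[user@unicage]$ awk '{count[$1]++} END{for(e in count) if(count[e] != 5) print "event", e, "has", count[e], "fields"}' converted_xml_UF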

Find out more

Request a demo and speak with our team about how you can leverage the power of Unicage in your organization.
