More Data, Less Space

A lighter local repository

Informed decisions rely on robust analytics, and robust analytics depend on data that is well organized, clean and readily available. To keep data within “arm’s reach”, companies maintain repositories of data outside their databases. As the amount of data in these repositories grows, how can the resources set aside for them be optimized?

Unicage’s approach to this challenge is to strike a balance between a small storage footprint and ease of access. All the files handled by the Unicage system are plain-text files, the “go-to solution” for storing data outside databases. But plain-text files come in different flavours. The Unicage system attains its maximum efficiency when the plain-text files are in a “space-separated values” (SSV) format, where each line holds one record and the fields are separated by single spaces. Because an SSV file stores little beyond the values themselves, it keeps the space taken up by the data to a minimum.

Take the example of an XML file. When an XML file is converted to SSV, the XML tags become unnecessary, which reduces the storage space needed for the local repository. In general, XML files organize data as follows:

<COLUMN.1>data_value_1_1</COLUMN.1>
<COLUMN.2>data_value_2_1</COLUMN.2>
<COLUMN.1>data_value_1_2</COLUMN.1>
<COLUMN.2>data_value_2_2</COLUMN.2>
<COLUMN.1>data_value_1_n</COLUMN.1>
<COLUMN.2>data_value_2_n</COLUMN.2>

After conversion to SSV, the same data looks like this:

COLUMN.1 COLUMN.2
data_value_1_1 data_value_2_1
data_value_1_2 data_value_2_2
data_value_1_n data_value_2_n
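To make the conversion concrete, here is a minimal Python sketch built on the standard library's xml.etree.ElementTree. It is not Unicage's own converter: the <DATA> root element, the xml_to_ssv function and its columns parameter are hypothetical, and the sketch assumes that records are flat runs of column elements in a fixed order, as in the sample, and that no value contains a space.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the first two records of the sample, wrapped in a
# <DATA> root element because a well-formed XML document needs a single root.
XML_SAMPLE = """<DATA>
<COLUMN.1>data_value_1_1</COLUMN.1>
<COLUMN.2>data_value_2_1</COLUMN.2>
<COLUMN.1>data_value_1_2</COLUMN.1>
<COLUMN.2>data_value_2_2</COLUMN.2>
</DATA>"""


def xml_to_ssv(xml_text: str, columns: list[str]) -> str:
    """Turn a flat run of column elements into SSV text with a header line.

    Assumes every record lists its columns in the given order and that no
    value contains a space (a space would be read as a field separator).
    """
    root = ET.fromstring(xml_text)
    # Keep only the text of the column elements; the tags themselves are dropped.
    values = [(elem.text or "").strip() for elem in root if elem.tag in columns]
    width = len(columns)
    if len(values) % width != 0:
        raise ValueError("incomplete record in input")
    lines = [" ".join(columns)]  # header line
    for start in range(0, len(values), width):
        # One record per line, fields separated by single spaces.
        lines.append(" ".join(values[start:start + width]))
    return "\n".join(lines) + "\n"


print(xml_to_ssv(XML_SAMPLE, ["COLUMN.1", "COLUMN.2"]), end="")
```

Running it prints a header line followed by one space-separated line per record, the same layout as the SSV example above.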

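For each value, an XML file stores the data itself plus an opening <TAG> and a closing </TAG>, whereas an SSV file stores only the data and a separator. Taking the smallest possible case, a tag name of 1 character and a data value of 1 digit, an element such as <A>1</A> takes 8 characters plus a line break, while the same value in SSV takes 1 character plus a separator. Two characters replace nine, so the information in an XML file with two different tags can be stored using approximately 80% (7/9, about 78%) less space in an SSV file. This figure becomes even more dramatic for files with more tags and longer tag names.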
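The space estimate can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not a benchmark: the helper names are hypothetical, one character is assumed to take one byte (plain ASCII), each XML element is assumed to sit on its own line, and each SSV value is assumed to be followed by exactly one separator. The XML declaration, any root or record wrapper elements, and the SSV header line are ignored.

```python
def xml_size(tag: str, value: str) -> int:
    """Characters for one value in XML: <tag>value</tag> plus a line break."""
    return len(f"<{tag}>{value}</{tag}>\n")


def ssv_size(value: str) -> int:
    """Characters for one value in SSV: the value plus one separator."""
    return len(value) + 1


def savings(tag: str, value: str) -> float:
    """Fraction of space saved by storing the value in SSV instead of XML."""
    return 1 - ssv_size(value) / xml_size(tag, value)


# Smallest case from the text: a 1-character tag and a 1-digit value (1 - 2/9).
print(f"{savings('A', '1'):.0%}")              # 78%
# A longer, hypothetical tag name (1 - 2/33).
print(f"{savings('CUSTOMER_NAME', '1'):.0%}")  # 94%
```

The second line illustrates why longer tag names widen the gap: the XML overhead grows with the length of the tag name, while the SSV cost of a value stays fixed.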

Using SSV as the standard format for data stored outside databases saves space while keeping the data readily available. This makes the Unicage system an ideal management system for any data repository.

