Why is the processing speed so fast?
Each command is written in C, and its input/output buffering, memory handling and calculation algorithms are designed for high-speed processing.
The shell calls kernel functions directly. Because there is no middleware layer, there is no middleware processing overhead.
The Unicage shell script programming methodology avoids slow programming styles built around variables and functions. Instead it follows a data flow programming style that takes advantage of the processing speed of each command.
In Unicage shell programming, data is organized in advance for better performance. Unicage has developed high-speed commands for complex sorting.
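The data flow style described above can be illustrated with standard POSIX tools. The Unicage commands are proprietary and are not shown here, so sort and awk stand in for them as an assumption; the file name sales.txt and its two-field layout (item code, quantity) are hypothetical.

```shell
#!/bin/sh
# Hypothetical input: space-separated fields "item_code quantity".
workdir=$(mktemp -d)
cat > "$workdir/sales.txt" <<'EOF'
A001 3
B002 5
A001 4
C003 1
B002 2
EOF

# Data flow style: every stage reads a stream and writes a stream.
# Sorting first (organizing the data in advance) lets the next stage
# total each key in a single pass, holding only one key in memory.
totals=$(sort -k1,1 "$workdir/sales.txt" |
  awk '$1 != key { if (NR > 1) print key, sum; key = $1; sum = 0 }
       { sum += $2 }
       END { if (NR > 0) print key, sum }')

printf '%s\n' "$totals"
rm -rf "$workdir"
```

For the sample data this prints one total per item code (A001 7, B002 7, C003 1).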
Comparison with mainstream technologies
Performance benchmark vs mainstream data storage and data processing frameworks (Spark, SparkSQL, Kudu/HDFS, Hadoop)
Research studies at prestigious higher education institutions in the United States (MIT), Japan (Kanazawa University) and Europe (IST Lisbon) report that Unicage runs 3 to 50 times faster than these frameworks.
Gains in development productivity
Unicage significantly reduces the number of lines of code. The ratio depends on the language being converted; for example, rewriting a COBOL application gives a 20:1 reduction. Unicage scripts are also easy to read and understand, and usage of the system is easy to audit and measure.
How is the data organized at the storage level (column,row)?
Data is stored in a UNIX file system as regular flat text files. The Unicage methodology defines how data is organized and then processed by our proprietary commands. The uniqueness of the solution lies in its ability to use flat text files instead of requiring middleware or relational database engines, which slow down execution.
How is the data distributed across data units?
Parallel processing follows a leader-follower model: the flat text files are divided across all available nodes according to the logic of the script. Each node executes the script on its share of the data, and the partial results are then merged and consolidated into a new text file containing the result of the execution.
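The divide, execute and merge flow can be sketched on a single machine. This is only an illustration under stated assumptions: background processes stand in for follower nodes, the standard split, awk and cat utilities stand in for the Unicage cluster commands, and the file names are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical input: the numbers 1..100, one per line.
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > "$workdir/input.txt"

# Leader step: divide the flat file into chunks of 25 lines each.
split -l 25 "$workdir/input.txt" "$workdir/chunk."

# Follower step: process every chunk in parallel (here: sum the values).
for chunk in "$workdir"/chunk.*; do
  awk '{ s += $1 } END { print s }' "$chunk" > "$workdir/part.${chunk##*.}" &
done
wait   # block until all background jobs have finished

# Merge step: consolidate the partial results into one final value.
merged=$(cat "$workdir"/part.* | awk '{ s += $1 } END { print s }')

printf '%s\n' "$merged"
rm -rf "$workdir"
```

The merged total equals the sum over the undivided file (5050), which is the property a split-and-merge job has to preserve.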
Data file management (levels)
Unicage organizes data files into business units along five levels:
- Level 1 (event data)
- Level 2 (confirmed data)
- Level 3 (organized data)
- Level 4 (application reference data)
- Level 5 (application output data)
Unicage does not require any special file management software.
How does the Unicage system join files?
Unicage joins multiple files by using the “join command” such as join1 (inner join) and join2 (outer join).
For example, joining a sales data file with an item master file generates a new file that combines data from both.
Unicage provides several commands for each of the following functions: sorting, inner join, outer join, multi join and full join.
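The difference between an inner and an outer join can be shown with the POSIX join utility. This is a stand-in only: the syntax of join1 and join2 is not given in this text, and the file names and layouts below are hypothetical. Like many key-based joins, POSIX join expects both inputs to be sorted on the key field.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical item file: item_code item_name (sorted by item_code).
cat > "$workdir/item.txt" <<'EOF'
A001 apple
B002 banana
EOF

# Hypothetical sales file: item_code quantity (sorted by item_code).
cat > "$workdir/sales.txt" <<'EOF'
A001 3
B002 5
C003 1
EOF

# Inner join: keep only sales rows whose key exists in the item file.
inner=$(join "$workdir/sales.txt" "$workdir/item.txt")

# Left outer join: keep every sales row and pad a missing name with "_".
outer=$(join -a 1 -e _ -o 0,1.2,2.2 "$workdir/sales.txt" "$workdir/item.txt")

printf 'inner:\n%s\nouter:\n%s\n' "$inner" "$outer"
rm -rf "$workdir"
```

The inner result drops the C003 row because it has no match, while the outer result keeps it with a placeholder in the name field.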
Access control of data files
In a database, the DBMS automatically locks the physical record for exclusive access control. In Unicage, by contrast, you explicitly specify the range that you want to process exclusively, using the ulock command in your program.
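Explicitly marking an exclusive section in a script can be sketched as follows. The Unicage locking command is proprietary and its syntax is not shown here, so this sketch substitutes the common mkdir lock idiom (creating a directory is atomic); the lock and data file names are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
lockdir="$workdir/stock.lock"   # hypothetical lock name
counter="$workdir/stock.txt"    # hypothetical shared data file
echo 0 > "$counter"

increment() {
  # mkdir is atomic: exactly one process succeeds in creating the directory.
  until mkdir "$lockdir" 2>/dev/null; do
    sleep 1
  done
  # Exclusive section: read, modify and write back the shared file.
  n=$(cat "$counter")
  echo $((n + 1)) > "$counter"
  # Release the lock so the next writer can proceed.
  rmdir "$lockdir"
}

# Five concurrent writers update the same file.
for i in 1 2 3 4 5; do
  increment &
done
wait

final=$(cat "$counter")
printf '%s\n' "$final"
rm -rf "$workdir"
```

Without the lock, concurrent read-modify-write cycles could overwrite each other; with it, every increment is preserved.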
Backup of data files
Standard Linux backup commands such as tar and cpio can be used.
For example, to back up the directory user1 under /home:
# cd /home
# tar cvf /dev/nst0 user1
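A variant of the backup above that can be tried without special hardware is sketched here. It writes the archive to a regular file and then lists its contents to verify it; the temporary directory layout and the archive name user1.tar are assumptions made so the sketch runs anywhere.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical stand-in for /home/user1.
mkdir -p "$workdir/home/user1"
echo "sample" > "$workdir/home/user1/data.txt"

# Archive the directory relative to its parent, as in the example above.
( cd "$workdir/home" && tar cf "$workdir/user1.tar" user1 )

# Verify the backup by listing the archive contents.
listing=$(tar tf "$workdir/user1.tar")

printf '%s\n' "$listing"
rm -rf "$workdir"
```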
Security of application
Applications (access tools) with access rights can be developed by using the “getpermission” command, which determines a user's permissions by looking them up in a permission table.
Security of data
Non-developers cannot directly access Unicage text files without specific permission. Developers can also be restricted through OS settings (for example, SELinux). Likewise, access by the system administrator (the person who configured the OS) can be controlled by encrypting those files so that the security administrator’s password is required.
Interoperability, locking, compression and memory management
Unicage is based on UNIX fundamentals. Interoperability is available through frameworks such as Tivoli (IBM) or JP1 (Hitachi).
Unicage does not require locking unless files are accessed concurrently. For those cases, the command “ulock” is available.
Unicage is compatible with multiple compression tools. For example, the .gz and .Z formats can be used for data compression.
Unicage processes data as a stream, which reduces memory usage compared with technologies such as Java or Python.
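Compression and stream processing combine naturally in a pipeline. The sketch below uses gzip and awk as stand-ins for the Unicage commands, and the file name data.txt.gz and its layout are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical compressed data: 1000 lines of "id bucket".
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i, i % 10 }' |
  gzip > "$workdir/data.txt.gz"

# Decompress to a pipe and aggregate line by line. The uncompressed
# data is never written to disk and never held in memory as a whole.
count=$(gzip -dc "$workdir/data.txt.gz" |
  awk '$2 == 0 { n++ } END { print n + 0 }')

printf '%s\n' "$count"
rm -rf "$workdir"
```

Because each stage handles one line at a time, memory use stays roughly constant regardless of the file size.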
Security (authentication, authorization and encryption)
Unicage utilizes segregation mechanisms of the underlying UNIX Operating System: filesystem permissions, memory stack protection and role-based access controls.
Encryption can be achieved on several levels – either native filesystem encryption mechanisms (F2FS in Linux or ZFS in BSD/UNIX) or 3rd party mechanisms offered by a number of vendors. Self-encrypting disk is also a possibility as Unicage just uses the Operating System POSIX infrastructure to access the data storage.
Security can be increased through file checksumming tools (native capabilities or products such as Tripwire) and a rule-based firewall.
A typical Linux/UNIX node running Unicage needs only SSH as an open port. This can be ensured by service minimization (disabling and/or uninstalling unnecessary services) and enforced by a host-based firewall that permits access only from known hosts. SSH access is restricted to a set of known users (no administrator/superuser access is granted).
Essential configuration files are then protected by checksumming to ensure no alteration of content.
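Checksum protection of a configuration file can be sketched as below. The POSIX cksum utility is used only because it is available everywhere; it computes a CRC, which detects accidental changes but is not tamper-resistant, so a real deployment would more likely rely on a cryptographic hash or a product such as Tripwire. The file name and contents are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
conf="$workdir/sshd_config"   # hypothetical stand-in for an essential file
printf 'PermitRootLogin no\n' > "$conf"

# Record a baseline checksum while the file is known to be good.
cksum "$conf" > "$workdir/baseline.cksum"

# Compare the current checksum with the recorded baseline.
check() {
  if cksum "$conf" | cmp -s - "$workdir/baseline.cksum"; then
    echo unchanged
  else
    echo ALTERED
  fi
}

before=$(check)                           # file still matches the baseline
printf 'PermitRootLogin yes\n' > "$conf"  # simulate an unauthorized edit
after=$(check)                            # the edit is detected

printf '%s %s\n' "$before" "$after"
rm -rf "$workdir"
```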
File systems can be encrypted to prevent data loss, and processes run only with the privileges they need, inside protected memory.
Parallelism and Scalability
UNIX parallel processing
UNIX is a multi-user, multi-tasking OS. You can run multiple jobs for multiple users at the same time.
- Parallel processing commands used:
- Specify “&” (background) when the job starts, to run the job in parallel
- “bg” and “fg” commands switch a job between background (parallel) and foreground (sequential) execution
- “nice” command changes the priority of a parallel job
- “stop” and “kill” commands suspend or terminate the job
- “jobs”, “ps” and “pstree” commands allow monitoring the parallel processing
Using the job control commands above, a shell script can perform parallel processing across any number of processes.
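A minimal sketch of several of these commands working together in one script follows. The jobs themselves (sleep and a trivial sh -c) are placeholders chosen for illustration.

```shell
#!/bin/sh
# Start a long-running job in the background at reduced priority.
nice -n 10 sleep 30 &
slow_pid=$!

# Start a short job in parallel with it.
sh -c 'exit 0' &
quick_pid=$!

# Wait for the short job and record its exit status.
quick_status=0
wait "$quick_pid" || quick_status=$?

# Terminate the long job, then collect its status.
# A job killed by a signal reports 128 + the signal number.
kill -TERM "$slow_pid"
slow_status=0
wait "$slow_pid" || slow_status=$?

printf 'quick=%s slow=%s\n' "$quick_status" "$slow_status"
```

Recording each process ID with $! lets the script wait for or signal individual jobs, which scales to any number of background processes.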
Partitioning, indexing and concurrency
There are no special requirements for partitioning. Nodes can be independent servers or virtualized (e.g. Docker). The only recommendation is to keep 10% of the disk space free for Unicage data operations.
There are no special requirements for indexing as everything is executed on the UNIX file system as a text file.
Unicage is based on UNIX fundamentals where multithreading, concurrent and exclusive processes are allowed. Our suite of commands contains blocking commands as well as atomic writing.
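Atomic writing can be illustrated with a common UNIX idiom: write to a temporary file in the same directory, then rename it over the target. This is a sketch of the general technique, not the Unicage implementation; the function name atomic_write and the file names are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
target="$workdir/result.txt"
echo "old contents" > "$target"

atomic_write() {
  # Create the temporary file next to the target so that the final
  # rename stays within one filesystem, where it is atomic.
  tmp=$(mktemp "$(dirname "$1")/.tmp.XXXXXX") || return 1
  if cat > "$tmp"; then
    mv -f "$tmp" "$1"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Readers see either the old file or the new one, never a partial write.
printf 'new contents\n' | atomic_write "$target"

result=$(cat "$target")
printf '%s\n' "$result"
rm -rf "$workdir"
```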
Scalability and workload management support
Unicage scales quasi-linearly with extra hardware. The cluster version commands provide an automatic map and reduce process.
Workload Management support
Multiple frameworks exist for controlling UNIX processes. For general workload management, commands such as “ulimit” or “nice” can be used.
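As a small sketch of these two commands, the job below runs inside a subshell so that the resource limit applies only to that job. The limit value and the awk placeholder job are arbitrary choices for illustration.

```shell
#!/bin/sh
# The command substitution runs in a subshell, so the limit set here
# does not affect the rest of the script.
limit_seen=$(
  ulimit -f 1024   # cap the size of files this job may create (in blocks)
  ulimit -f        # report the limit now in effect
)

# Run a placeholder job at lower scheduling priority.
job_output=$(nice -n 10 awk 'BEGIN { print "done" }')

printf 'limit=%s job=%s\n' "$limit_seen" "$job_output"
```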
Handling of image files
Use the image processing commands available in UNIX / Linux distributions (examples of ImageMagick functions):
- Conversion of the image format
convert <original file.extension> <file name after conversion.extension> (e.g. to convert from JPG to PPM, use convert test.jpg test.ppm)
When converting the image format, a color image can be converted to a monochrome format (e.g. convert test.jpg test.pgm)
- Image scaling
Use the “convert -scale” command: convert -scale 30% test.jpg test.ppm
- Image cropping
convert -crop <width>x<height>+<x of upper-left>+<y of upper-left> filename newfilename
- Create an animated GIF from multiple images
convert image1 image2 image3 … output_animation.gif
- Combining multiple images
Create a catalog sheet of multiple images: montage Image1 Image2 Image3 <output file>