Columnstore

Table TBL used in the example:

CREATE TABLE TBL
(
  I INT,  -- integer column
  T TEXT  -- text column
 );

When the above “create table” statement is executed Sprout adds number of columns + 1 rows into the repository of entities: one for table entity and one for each column entity. Each entity, row and value in the database is assigned a unique numeric identifier with help of DBMS auto-increment sequence. In the example the sequences will start with entity=0, row=10, and value=100. After Sprout installation the following exists in its repository:

-- column COMMAND_HEX - address in Sprout interpreter
SELECT * FROM COMMANDS
;
COMMAND_HEX | DESCRIPTION
==============================
         00 | 'root'
         0A | 'table'
         0B | 'column'
         0C | 'columnar flag y/n'

-- help views for use in the example
CREATE VIEW ENTITY_VIEW AS
 (
  SELECT S.ASCENDANT_ID, E.ENTITY_ID, E.COMMAND_HEX
  FROM ENTITIES E INNER JOIN STRUCTURES S
    ON E.ENTITY_ID = S.ENTITY_ID
  ORDER BY S.ORDER
 );

CREATE VIEW VALUE_VIEW AS
 (
  SELECT VALUE_ID, TEXT, INT, BOOL FROM 
   (
    SELECT VALUE_ID, VALUE_TEXT, null FROM VALUES_TEXT
    UNION
    SELECT VALUE_ID, null, VALUE_INT, null FROM VALUES_INT
    UNION
    SELECT VALUE_ID, null, null, VALUE_BOOL FROM VALUES_BOOL
   ) ALL_VALUES
 );

After the “create table” statement is executed tables of Sprout repository contain the following data:

SELECT * FROM ENTITY_VIEW
;
-- root record identifiable where ascendant_id = entity_id
-- Sprout installation contains 1 root record
-- entity_id: root=0, TBL=1, I=2, T=3
ASCENDANT_ID | ENTITY_ID | COMMAND
================================== 
           0 |         0 |      00
           0 |         1 |      0A
           1 |         2 |      0B
           1 |         3 |      0B

SELECT * FROM VALUE_VIEW
;
VALUE_ID | TEXT  | INT | BOOL
=============================
     100 | 'TBL' |     |
     101 | 'I'   |     |
     102 | 'T'   |     |

SELECT * FROM ROWS
;
ROW_ID | ENTITY_ID
==================
    10 |         1
    11 |         2
    12 |         3

SELECT * FROM CELLS
;
ROW_ID | ENTITY_ID | VALUE_ID
==============================
    10 |         1 |      100
    11 |         2 |      101
    12 |         3 |      102

In case of rows 10, 11, and 12 there exists a condition:

ROWS.ENTITY_ID = CELLS.ENTITY_ID AND ROWS.ROW_ID = CELLS.ROW_ID

The interpreter uses it condition to differentiate between single value data nodes and stores of column data. It is true for single value data nodes as in the example above is used to store table and column names. When storing column data of a table the condition is false.

Methods of storing data columns: integral and separable. Integral data column is stored in the main repository and occupies physically same allocation along with other data nodes and columns and their metadata descriptions. Separable data column is stored in a physically separate from the main repository schema section and is intended for large volumes of data.

Insert a row with integral data column storage in both columns of the table TBL:

INSERT INTO TBL VALUES
(1, 'T')
;

Data in the repository will be applied as follows:

SELECT * FROM VALUE_VIEW
;
-- 'T' - existing value, skipped
--   0 - new value, added
VALUE_ID | TEXT  | INT | BOOL
=============================
     100 | 'TBL' |     |
     101 | 'I'   |     |
     102 | 'T'   |     |          
     103 |       |   0 |          

SELECT * FROM ROWS
;
-- table row (rows.entity_id <> cells.entity_id)
ROW_ID | ENTITY_ID
==================
    10 |         1
    11 |         2
    12 |         3
    13 |         1

SELECT * FROM CELLS
;
ROW_ID | ENTITY_ID | VALUE_ID
==============================
    10 |         1 |      100
    11 |         2 |      101
    12 |         3 |      102
    13 |         2 |      103
    13 |         3 |      102

Change storage of column T to separable:

ALTER TABLE TBL ALTER COLUMN T SET separable = TRUE; 

Metadata description of the table changes:

SELECT * FROM ENTITY_VIEW
;
-- entity_id=4 instructs the interpreter that 
-- T is a separable column
ASCENDANT_ID | ENTITY_ID | COMMAND
================================== 
           0 |         0 | 00        
           0 |         1 | 0A        
           1 |         2 | 0B        
           1 |         3 | 0B
           3 |         4 | 0C

After the table structure is updated a data column object is created and the data is moved (in one transaction):

-- data column T is now stored in the table named
-- COL_<column identifier>_<column datatype>
SELECT * FROM COL_3_TXT
;
ROW_ID | VALUE_TXT
==================
    13 | 'T'

SELECT * FROM VALUE_VIEW
;
-- value_id=04 added after "alter table"
VALUE_ID | TEXT  | INT | BOOL
=============================
     100 | 'TBL' |     |
     101 | 'I'   |     |
     102 | 'T'   |     |
     103 |       |   0 |
     104 |       |     | TRUE

SELECT * FROM ROWS
;
ROW_ID | ENTITY_ID
==================
    10 |         1
    11 |         2
    12 |         3
    13 |         1
    14 |         4

SELECT * FROM CELLS
;
ROW_ID | ENTITY_ID | VALUE_ID
==============================
    10 |         1 |      100
    11 |         2 |      101
    12 |         3 |      102
    13 |         2 |      103
    14 |         4 |      104

Part of the Sprout repository table schema used in the example:

colstore

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s