ROW STORE and COLUMN STORE -- MEMORY USAGE http://debajitb.wix.com/debajitbanerjee/apps/blog/compari http://debajitb.wix.com/d ebajitbanerjee/apps/blog/comparison-between-row-and-columnson-between-row-and-column-store store With the new Business Suite, load times have been cut by a factor of 20 and batch programming eliminated completely, he explained, and mathematical functions have been slimmed down in the quest for faster updates, with a goal of making three second response times the norm. But if we look into within the database, why & how it is fast? - Exploitation of current hardware developments - Main Memory is the New Disk - Non-Uniform Memory Access (NUMA) - Multi-core processor parallelism - Efficient communication between database layer with the application layer - Pushing more application semantics into data management layer - Data compression achieves a reduction in disk space - Different compression techniques, Light-Weight/Heavy-Weight - Compression-aware query execution - Data-Dependent Optimization SAP HANA Database supports Row Store and Column Store. Some of the HANA Administrative tables in row store (e.g. SYS schema and tables from Statistics Server) whereas other Administrative tables in column store(e.g. _SYS_BI, _SYS_BIC, _SYS_REPO schema, etc.). Transactional data stored in the physical tables of SAP HANA Database used for Analytical purposes. HANA Analytical Data Modeling is only possible for Columnar Tables; i.e, Information Modeler only works works with column storage tables. Replication Server(SLT) and Data Services create tables in column store by default. If transaction data is stored in a column-based table, then it enables - fast on-the-fly aggregations, - ad-hoc reporting. How data stored in column format? How it differs from row format? Here, the below diagram will help you to understand very easily.
In lab, I have tried out the following: Comparison between ROW store and COLUMN store table [but not for ECC on HANA, this is purely a HANA Database in SAP HANA Appliance] I have created TWO tables in HANA database. First Table, CDHDR_ROW CDHDR_ROW : Table type in Row Format Second Table, CDHDR_COLUMN CDHDR_COLUMN : Table type in Column Format Filled both the tables with same data/records. No of records : 8,413,932
After data load into CDHDR_ROW CDHDR_ROW and CDHDR_COLUMN CDHDR_COLUMN from flat file, within HANA result result shows (I didn’t do anything extra):
Now you can see, ROW stored CDHDR_ROW table occupies how much memory & disk-size ? One thing, ROW store table automatically loaded into memory when HANA starts-up. But ROW store table can not be unloaded from memory.
Whereas, columnar table CDHDR_COLUMN taking very less memory & disk-size. Again, later I have converted ROW store table CDHDR_ROW into COLUMNAR table and record the output:
Now, both tables are in column store.
Now you can see, it occupies less memory and disk-size compared to its ROW based table state.
Demystifying the Column Store: Store Statistics in SAP HANA and the Benefits for Business Insights http://www.agilityworks.co.uk/our-blog/demystifying-the-column-store-%E2%80%93-viewingcolumn-store-statistics-in-sap-hana-and-understanding-the-benefits-for-business-insights/
Although there have been a number of articles already written about the column store in SAP HANA, it’s still a question that gets asked regularly, particularly around data r econstruction. The aim of this blog is to explain: 1. How we can access the statistics behind a table stored in column store format within SAP HANA, such as the main memory size and compression rate. 2. The workings of a column store, including data reconstruction. 3. Some of the advantages of the column store for business insights and analytics. In order to discuss these aims, let’s work with an example of a row based table from a typic al relational database. We will load the data into SAP HANA in order to store it in a column store table within main memory. A sample of the table we will work with is in Figure 1. In total we have 650,216 records. It shows New York Stock Exchange Data. For each company that trades (stock symbol) it shows the prices for each day.
Figure 1 –Source table with 9 sample records
In SAP HANA, we can choose to store the table in a row store or column store, both of which are stored in memory. For this blog we have loaded the same data into two separate tables, one row store, one column store, in order to analyse them both. The size of a table in main memory can be seen in its definition view (see Figure 2 for how to access this in the HANA Studio) as well as the type of store it uses.
Figure 2 – Accessing the table definition If we loaded the table into row store the table is essentially stored as-is from source. The screenshot in Figure 3 shows the size of the table in memory (45,948 Kb = 44.9 Mb) for our 650,216 records. No fields in a column have been merged/compressed.
Figure 3 – Seeing the table definition of the row store table If we loaded the data into column store the size of the table in memory is 15,754 Kb (15.4 Mb) which can be seen in Figure 4. This size is significantly lower than the memory size of the same table loaded into row store. The way the data is stored in column store to enable this reduced size is discussed shortly.
Figure 4 – Seeing the table definition of the column store table Compression has occurred against t he columns. We can see the compression rate against the columns of the table in Figure 5. This info is also visible in the table definition view within the HANA Studio.
Figure 5 – Seeing the table definition of the column store table But how does the compression in column store actually work? SAP HANA has a number of different compression techniques (Run-length encoding, Cluster encoding and Dictionary encoding) deep down in its inner workings. We will be looking theoretically at how Dictionary Encoding works. For each column in the table, the database is storing a Dictionary and an Attribute Vector . We’ll take a look at both of these objects in Figure 6 for our Stock Symbol field based on our table sample of 9 records.
Figure 6 – The original column for Stock Symbol and how it is now represented in column store format The next question to answer is how we reconstruct the row when querying the data in column store. Let’s suppose we want to run a query showing us the sum of the Stock Volume for the Stock Symbol ABB. First we would read the dictionary of the Stock Symbol to get the Value ID for ABB (value ID 2) and use the inverted index to access the record IDs that use Value ID 2 (records 1, 4 & 7). See reference 7 for a visual reference of this.
Figure 7 – Reading of the Stock Symbol Dictionary and Index for ABB records Then with the index of the Stock Volume we can map the records (1, 4 & 7) to the Stock Volume Value IDs (4, 5 & 6) and then map these to the actual Stock Volumes values in the dictionary (see Figure 8). Then we apply the sum function against the values to give us 8,804,800.
Figure 8 – Reading of the Stock Volume Dictionary and Index Key thing with our query is that we only read the columns we needed, i.e. Stock Symbol and Stock Volume. From a business insights and analytics perspective, here are some of the main benefits in us being able to store our data in column store format. Store more data, in more detail than before We can achieve very efficient compression because a large proportion of fields have a low number of distinct values, that are used many times, which we can merge together. The compressed data means the storage space is reduced, meaning more main memory is available whilst helping to eliminate data redundancy. We can store large detailed and historic data sets that were too large for previous data marts and warehouses from both storage and read/write performance perspectives. Quicker access to this larger and more detailed dataset The compressed data can be moved quicker from main memory into the CPUs in order to be processed. SAP HANA already allows data to be moved to the CPU quickly because everything is in the main memory layer and not from disk. With the compressed data the transfer from main memory to the CPU can happen even quicker, meaningful rapid response times for users. Only read what we need When querying the data we only read the columns we need. Research shows¹ that we only use a small number of columns/fields within a table. Also, we would typically want to scan the whole column for each field (e.g. Sum the Sales Revenue, for all Countries). In row store we read all of the columns regardless of whether we actually need them. Column store eliminates unnecessary scanning. Conclusion Of course, we need to be aware that the write performance of a column s tore compared to row store isn’t as good but we can counter this with the current differentiation of read and write areas within the column store (and the subsequent delta merge process). SAP is developing the capabilities of SAP HANA in order to improve write performance into the column store. Ultimately from a business insights perspective we no longer need to compromise or make a decision on what type of data (detailed or aggregated, or at which level of aggregation) we need to store in our analytical database. Neither do we need to think about the most common granularity of data access requests in order to build are indexes optimally (e.g. is most retail data accessed by region and material category or by store and material?). With SAP HANA we can
store the complete dataset as it is and the column store technology will ensure everything is optimised for storage and retrieval (we do still need however to think about lifecycle of data, we shouldn’t load data that isn’t going to be needed). The column store is removing the boundaries of what’s possible, ultimately meaning better insights leading to better decisions.