DataFocus | star model & data model in multidimensional analysis of snowflake model

Today we’re going to talk about the two most common data models used in multi-dimensional business intelligence solutions: the star model and the snowflake model.

First, let’s look at the concepts of star schema and snowflake schema.

Star schema:

The star model is centered on the fact table. When all dimension tables are connected to the fact table, the whole diagram is like a star, so it is called star model. The basic form of the star schema must implement the multidimensional space (often referred to as squares) to use the basic functionality of the relational database. Domo and Qilk are modeled using star models in foreign markets.

Snowflake model

When there are multiple dimension tables that are not directly connected to the fact table but are connected to the fact table through other dimension tables, the graph is like a snowflake, hence the snowflake model. When the dimensions of the star schema need to be normalized, the star schema evolves into the snowflake schema. Our DataFocus is modeled using the snowflake schema.

Snowflake model in sales data warehouse

Therefore,rence between the star model and the snowflake model is whether the dimension tables are directly connected to the fact table or to other dimension tables.

So how do you choose between the two data models?

From the perspective of data optimization, star model architecture is an irregular structure, and there is no gradual dimension, so the data is somewhat redundant and the execution efficiency is relatively high. In the star model, dimensions refer directly to the fact table, and the business hierarchy is not deployed through referential integrity between dimensions; The snowflake model USES normalized data, meaning that the data is organized inside the database to eliminate redundancy, so it can effectively reduce the amount of data. Through referential integrity, its business hierarchy and dimensions are stored in the data model.

From the perspective of query performance: in the oltp-dw link, the performance is lower than that of star schema due to multiple table joins made by snowflake. However, from the dw-olap, the performance of the snowflake schema is higher than that of the star schema because it is more conducive to aggregation of measurements.

From the perspective of ETL operation: the snowflake model loads the data mart, so the ETL operation is more complex in design and cannot be parallelized due to the limitations of the attached model. The star model loads the dimension table without requiring any additional model dependencies between dimensions, so ETL is relatively simple and can be highly parallelized.

To sum up, let’s make a summary.

The star model works well if you don’t have a lot of disk space requirements, because it’s efficient to query.

However, the snowflake model can be used when the dimension table has a large amount of data and needs to save storage space, or when the business logic is more complex and the hierarchical concepts must be presented clearly.

In this era of exponential growth, the amount of data will increase in the future, so the snowflake model will become more and more practical.