This blog will show you everything about Star Schema Data Modelling and its comparison with different types of Schemas including Star Schema.

  • Below are the three primary types of Multidimensional Schemes, each having its unique set of advantages.
    • Star Schema 
    • Snowflake Schema
    • Fact Constellation Schema

What is a Data Warehouse?

  • A Data Warehouse is a combined collection of data from various/many sources.
  • As a result, a Data Warehouse serves as a Single, Central, and consistent Data Storage that aids in Data Mining, Data Analysis, Machine Learning, Artificial Intelligence, and other processes.

What is a Star Schema Data Modelling?

  • This schema is commonly used to create a Data Warehouse and Dimensional Data Marts.
  • It consists of one or more Fact Tables that index an unlimited number of Dimensional Tables.
  • The Snowflake Schema cannot exist without the Star Schema. It’s also good at answering simple questions. Star Schema Data Modelling has a lot of benefits and helps in data modeling for warehouses.

1) Example of Star Schema Data Modelling

It’s called a Star because its physical model is shaped like a Star, with a Fact Table in the center and Dimension Tables on the periphery symbolizing the star’s points. Here’s an example of the Star Schema in action:

2) What are Fact Tables? 

There are two types of columns in a Fact Table: 

  • Fact Columns and Foreign Keys to the Dimension Table
  • Fact Tables’ Primary Key is usually a Composite Key made up of all of the Foreign Keys.

A Fact Table could contain either Detail Level Facts or Aggregated Facts (Fact Tables that include Aggregated facts are often instead called Summary Tables). Facts with the same level of aggregation are usually found in a Fact Table.

SALES is a Fact Table with properties such as (Product ID, Order ID, Customer ID, Employer ID, Total, Quantity, Discount) that refer to Dimension Tables in the example above. 

3) What are Dimension Tables?

  • A Dimension is a Data Classification Architecture made up of one or more hierarchies. A Flat Dimension or List is one that does not have any hierarchies or levels.
  • The Composite Primary Keys of the Fact Table include the Primary Keys of each Dimension Table. The Dimensional Value is defined by Dimensional Qualities.
  • They’re typically textual values that describe something. Fact Tables are frequently larger than dimensional tables. Data about sales is stored in Fact Tables, whereas data about the Geographic Region (markets, cities), customers, items, times, and channels are stored in Dimension Tables.

How does a Star Schema Data Modelling work?

  • Numeric values and dimension attribute values are both stored in the fact table. As an example, consider the following:
  • Numeric value cells are unique to each row or data point, and they have no correlation or relationship to data in other rows. These could be transactional details like the order ID, total amount, net profit, order quantity, or exact time.
  • The foreign key value for a row in a related dimensional table is stored in the dimension attribute values, rather than data. This type of information will be referenced in many rows of the fact table. It might store the sales employee ID, a date value, a product ID, or a branch office ID, for example.
  • The fact table’s supporting data are stored in dimension tables. Each star schema database has at least one dimension table, but many more are frequently present. Each dimension table will be linked to a fact table column containing a dimension value and will store additional information about that value.

How to Querying the Star Schema Data Modelling?

  • To submit report requests, end-users will most likely use business intelligence tools. Each request may result in a series of SELECT statements that are required to build the entire report in Oracle.
  • Your users’ business intelligence tool, for example, could have its own server process that processes the returned SQL SELECT results using advanced OLAP functions and logic not yet available in Oracle.
  • As a result, you’ll only be able to see the intermediate queries by scanning the SGA as the DBA. But if we can speed up the intermediate queries, the overall report performance will improve.
SELECT prod.category_name,
       sum (fact.sales_unit) Units,
       sum (fact.sales_retail) Retail
FROM   pos_day              fact,
       period               per,
       location             loc,
       product              prod
WHERE  fact.period_id     = per.period_id
  AND  fact.location_id   = loc.location_id
  AND  fact.product_id    = prod.product_id
  AND  per.levelx         = 'DAY'
  AND  per.period_month   = 12
  AND  per.period_year    = 1998
  AND  loc.levelx         = 'STORE'
  AND  loc.city           = 'DALLAS'
  AND  loc.state          = 'TX'
  AND  prod.levelx        = 'ITEM'
  AND  prod.category_name in ('BEER','COFFEE')
GROUP BY prod.category_name;

as a result of:

CATEGORY_NAME                 UNITS         RETAIL
---------------------- ------------ --------------
BEER                         11,613      64,490.81
COFFEE                       22,808      20,462.92

Even this example reveals important information about what the DBA can expect from queries against star schema warehouses. A star schema SELECT will, in general, do the following:

  • Make use of GROUP functions and, as a result, GROUP BY.
  • Include a JOIN of one or more dimensions with a fact.
  • Use dimension columns to create a lot of WHERE restrictions.
  • Scans a large number of rows to return a small number of results.

What are the Characteristics of Star Schema Data Modelling?

  • Business Process Data, which contains quantitative data about a company, is distributed in Fact Tables and Dimensions, which are descriptive features associated with Fact Data, in Star Schema. 
  • In the Star Schema, Fact Data includes things like Sale Price, Number, Distance, Speed, Height, and Weight measures.
  • Centipede Schema is the word used to describe a Star Schema with several dimensions. A Star Schema with few attributes dimensions is simple to manage.
  • In a Star Schema, each Dimension is represented by a Single One-Dimension Table.
  • The collection of Attributes should be stored in the Dimension Table.
  • Using a Foreign Key, the Dimension Table is linked to the Fact Table.
  • The Dimension Tables are not connected in any way.
  • The Star structure is simple to Comprehend and Optimises Disc Consumption.
  • There is no normalization in the Dimension Tables. 
  • BI Tools generally support the Star Schema.

1) Simple & Efficient Querying

  • In comparison to other join logic needed to fetch data from a Transactional Schema that is well normalized, Star Schema joins logic is a breeze.
  • Since a Star Schema Database has fewer Tables and clearer join pathways, queries perform faster than on OLTP systems.
  • Small single-table searches, like those involving a Dimension Table, are nearly instantaneous. Large join queries involving numerous tables can be completed in seconds or minutes.
  • Only the core Fact Table connects the Dimensions in a Star Schema Database System. When a two-dimension table is utilized in a query, there is only one join path between the two tables, which intersects the Fact Tables. This design component ensures that query results are genuine and consistent.

2) Streamlined Business Reporting Logic

  • In comparison to a highly standardized Transactional Schema, the Star Schema simplifies basic Business Reporting logic such as-of reporting and period-over-period reporting.

3) In-built Referential Integrity

When Data is imported into a Star Schema, Referential Integrity is built-in. Since each row in a Dimensional Table has a unique Primary Key, and all keys in the Fact Table are authentic Foreign Keys derived from the Dimension Table, Referential Integrity is guaranteed. The correct key value cannot be assigned to a record in the Fact Table that is not appropriately associated with a Dimension.

4) Maintenance & Load Performance

  • The time it takes to load big batches of records into a Star Schema Database is also reduced by Structural Simplicity.
  • The influence of a load structure is decreased by providing Facts and Dimensions and breaking them into different tables.
  • The Dimension Table just has to be filled once and then renewed on a regular basis. By appending records to a Fact Database, you can add new facts on a regular and selective basis.

5) Easy to Understand

  • A Star Schema is easy to understand and navigate, with simply the Fact Table connecting the Dimensions.
  • Since they represent the essential relationship between components of the underlying business, these joins are more important to the end-user. Before generating a query, customers can browse Dimension Table properties.

What are the Disadvantages of Star Schema Data Modelling?

  • Since the Schema is severely de-normalized, Data Integrity is not properly enforced.
  • As a normalized Data Model, it is not as versatile in terms of analytical demands.
  • Star Schemas don’t usually encourage many-to-many linkages among Business Entities.

Explore how to choose between Star and Snowflake Schemas to enhance your data modeling approach for warehouses effectively.

What is a multi-dimensional schema for Data Warehouses?

  • Star Schema 
  • Snowflake Schema
  • Fact Constellation Schema

How to Define a Multi-Dimensional Schema?

Data Mining Query Language is used to define the Multidimensional Schema (DMQL). Cube Definition and Dimension Definition are two primitives that can be used to define Data Warehouses and Data Marts.

Cube Definition Syntax

define cube < cube_name > [ < dimension-list > }: < measure_list >

Dimension Definition Syntax

define dimension < dimension_name > as ( < attribute_or_dimension_list > )

Syntax for Star Schema

The Star Schema discussed can be expressed as follows in Data Mining Query Language (DMQL):

define cube sales star [time, item, branch, location]:   

dollars sold = sum(sales in dollars), units sold = count(*)      

define dimension time as (time key, day, day of week, month, quarter, year)

define dimension item as (item key, item name, brand, type, supplier type)       

define dimension branch as (branch key, branch name, branch type)             

define dimension location as (location key, street, city, province or state, country)

Star Schema vs Other Schemas for Data Warehouse Modelling

1) Star Schema vs Snowflake Schema 

  1. In a Data Warehouse, a Snowflake Schema is the logical arrangement of Tables in a Multidimensional Database that resembles a Snowflake shape on the ER diagram.
  2. A Snowflake Schema is a Star Schema that has been expanded to include more dimensions.
  3. The data is split into new tables after the dimension tables are standardized.

Following is a key difference between Star Schema vs Snowflake Schema:

Star SchemaSnowflake Schema
Hierarchies for the dimensions are stored in the dimensional table.Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables.One fact table is surrounded by a dimension table which is in turn surrounded by a dimension table
In a star schema, only a single join creates the relationship between the fact table and any dimension tables.A snowflake schema requires many joins to fetch the data.
Simple DB Design.Very Complex DB Design.
Denormalized Data structure and queries also run faster.Normalized Data Structure.
High level of data redundancyVery low-level data redundancy
The single Dimension table contains aggregated data.Data Split into different Dimension Tables.
Cube processing is faster.Cube processing might be slow because of the complex join.
Offers higher-performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.The Snowflake schema is represented by a centralized fact table which is unlikely connected with multiple dimensions.

2) Star Schema vs Fact Constellation Schema 

  • As you can see in the example below, there are two tables of facts.
    • Revenue 
    • Product

Following is a key difference between Star Schema vs Fact Constellation Schema:

Star SchemaFact constellation schema
A star schema depicts each dimension with only a one-dimension table.While in this, dimension tables are shared by many fact tables.
In a star schema, tables can be maintained easily in comparison to a fact constellation schema.While in fact constellation schema, tables cannot be maintained easily comparatively.
Star schema does not use normalization.Whereas it is a normalized form of star and snowflake schema.
In a star schema, simple queries are used to access data from the database.While in this, heavily complex queries are used to access data from the database.
Star schema is easy to operate as compared to fact constellation schema as it has less number of joins between the tables.While fact constellation schema is not easy to operate as compared to star schema as it has many joins between the tables.
Star schema uses less space as compared to fact constellation schema.While fact constellation schema uses more space comparatively.
It is very simple to understand due to its simplicity.While it is very difficult to understand due to its complexity.
Table Source

Helpful Resources on Star Schema Data Modelling

Conclusion

  • Star Schema is a straightforward and efficient data modeling approach for building Data Warehouses and Dimensional Data Marts.
  • It simplifies querying and business reporting, ensures data integrity, and supports faster data loading.
  • Despite its advantages, it may suffer from data redundancy and limitations in handling complex relationships compared to more normalized schemas like Snowflake or Fact Constellation.
  • Understanding these trade-offs helps in choosing the most suitable schema for specific analytical and reporting needs in data warehouse design.
Harsh Varshney
Research Analyst, Hevo Data

Harsh is a data enthusiast with over 2.5 years of experience in research analysis and software development. He is passionate about translating complex technical concepts into clear and engaging content. His expertise in data integration and infrastructure shines through his 100+ published articles, helping data practitioners solve challenges related to data engineering.