Understanding Star Schema in Data Warehouse Modelling

What’s the simplest way to model complex data?

It’s Star Schema!

A star schema is a data modelling technique that organizes data in data warehouses to streamline analytical query performance. The schema’s structure includes a central fact table linked to multiple dimension tables. This intuitive modelling technique is widely preferred due to its simplicity and support for fast querying.

It benefits users by simplifying complex database relationships and adapting to online analytical processing (OLAP) models. Some key advantages include:

Data management: Separation of facts and dimensions for organizing data storage.
Easy-to-use: A straightforward layout encourages self-service analytics.
Analytics: Enables robust trend analysis and informed decision-making.

In this article, we’ll discuss star schema, its workflow, advantages & disadvantages, and other types of schemas. Whether you’re new to data warehousing or a professional, this blog will help you understand the essentials of the star schema.

What is Star Schema?

A star schema, also known as a star model, is a dimensional data modelling approach used in data warehouses and business intelligence (BI) systems to organize data in a simplified manner, facilitating fast and efficient querying of large datasets.

As mentioned above, it contains two types of tables: a fact table and dimension tables.

Fact table: The fact table stores quantitative data like transaction counts and sales amounts.

Dimension table: The dimension table stores descriptive attributes such as places, products, and people.

Here’s an example:

In a hospital, the fact table stores the number of visits and treatment costs with dimensions like age, patient ID, and insurance type. The fact table joins with relevant dimension tables to assist analysts in querying total visits based on the department and the consultant. Therefore, the separation between facts and dimensions enhances data management, facilitating efficient data analysis.

Users can query data with multiple perspectives:

Total number of visits by patient age group
Average treatment cost by treatment type
Number of beds used by department type
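
The hospital example can be sketched as a tiny star schema. This is a minimal illustration using Python's built-in sqlite3 module; the table names (fact_visit, dim_patient, dim_department), the columns, and the sample rows are all assumptions made up for the sketch, not part of any real hospital system.

```python
import sqlite3

# Hypothetical hospital star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (
    patient_id     INTEGER PRIMARY KEY,
    age_group      TEXT,
    insurance_type TEXT
);
CREATE TABLE dim_department (
    department_id   INTEGER PRIMARY KEY,
    department_name TEXT
);
CREATE TABLE fact_visit (
    visit_id       INTEGER PRIMARY KEY,
    patient_id     INTEGER REFERENCES dim_patient(patient_id),
    department_id  INTEGER REFERENCES dim_department(department_id),
    treatment_cost REAL
);
INSERT INTO dim_patient VALUES
    (1, '0-17', 'Public'), (2, '18-64', 'Private'), (3, '18-64', 'Public');
INSERT INTO dim_department VALUES (10, 'Cardiology'), (20, 'Pediatrics');
INSERT INTO fact_visit VALUES
    (1, 1, 20, 120.0),
    (2, 2, 10, 300.0),
    (3, 3, 10, 500.0),
    (4, 2, 10, 100.0);
""")

# Perspective 1: total number of visits by patient age group.
visits_by_age_group = conn.execute("""
    SELECT p.age_group, COUNT(*) AS total_visits
    FROM fact_visit AS f
    JOIN dim_patient AS p ON p.patient_id = f.patient_id
    GROUP BY p.age_group
    ORDER BY p.age_group
""").fetchall()

# Perspective 2: average treatment cost by department.
avg_cost_by_department = conn.execute("""
    SELECT d.department_name, AVG(f.treatment_cost) AS avg_cost
    FROM fact_visit AS f
    JOIN dim_department AS d ON d.department_id = f.department_id
    GROUP BY d.department_name
    ORDER BY d.department_name
""").fetchall()

print(visits_by_age_group)
print(avg_cost_by_department)
```

Both questions are answered from the same fact table; only the dimension joined and the grouping column change.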

What Are the Components of a Star Schema?

The origin of the name “star schema” becomes clearer when you map out its structure. Its structure comprises four components interlinked with each other.

The key components are:

Fact table

In the star structure, the fact table is the centerpiece of the schema. This component stores measurable data termed “facts” or “measures”, representing a specific business event or transaction.

The primary aspects of the fact table include:

Measures: Referred to as “facts”, these are numeric values that illustrate key business metrics, including sales amount, revenue, profit, and units sold. These measures or facts quantify the business process and streamline analytics.
Granularity: Each row in the fact table represents data at the lowest level of detail (the grain) required for analysis. Examples include one row per sales transaction or per product.
Dimensions: Each fact table contains dimension key columns linked to multiple dimension tables. These columns connect each event to its descriptive context.
Primary key: The fact table features a primary key, often either a surrogate key (an artificially generated unique identifier) or a composite key (made up of multiple foreign keys).

Dimension table

Following the structure of the star schema, the dimension table surrounds the fact table, providing standardized information required to categorize events and objects for detailed analysis.

Key points include:

Function: The primary function of the dimension table is to filter, group, and summarize the data in the fact table along with the measures.
Descriptive attributes: These attributes are categorical or textual fields describing business entities, such as customer demographics, location, dates, and product names. They are used to “slice and dice” the data in the fact table through a foreign key relationship.
Denormalized layout: Dimension tables are denormalized, which means they store all descriptive data for an entity in a single table to simplify querying and enhance performance.
Hierarchy: The table often facilitates a natural hierarchy to enable drill-down and roll-up analysis (OLAP operations).
Primary key: Dimension tables also have a primary key to form a relationship with the foreign key of the fact table.

Keys

The entire schema workflow relies on keys that join the fact table with the dimension table to enable efficient analysis. The key features include:

Primary key: Each dimension table must have a primary key that relates to the foreign key in the fact table. The fact table, in turn, typically has a composite primary key made up of a combination of its foreign keys.

The rule is that each foreign key of the fact table should have its counterpart in a dimension table.

Foreign key: In general, a foreign key is a column or a group of columns consisting of values that align with the primary key of another table. For instance, the fact table contains foreign key columns that match the primary keys of the dimension tables.
Composite key: As mentioned above, the fact table contains a composite key. This ensures each row in the fact table is unique based on the combination of dimension values.
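
The composite key idea can be shown with a small sketch. It uses Python's built-in sqlite3 module, and the table and column names (fact_daily_sales, dim_product, dim_date) are hypothetical. The fact table's primary key is the combination of its two foreign keys, so a second row for the same product and date is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, full_date    TEXT);

-- The primary key is composite: one row per (product, date) combination.
CREATE TABLE fact_daily_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    units_sold INTEGER NOT NULL,
    PRIMARY KEY (product_id, date_id)
);

INSERT INTO dim_product VALUES (1, 'Running shoe');
INSERT INTO dim_date VALUES (20240101, '2024-01-01'), (20240102, '2024-01-02');
INSERT INTO fact_daily_sales VALUES (1, 20240101, 5), (1, 20240102, 7);
""")

# A second row with the same dimension combination violates the composite key.
try:
    conn.execute("INSERT INTO fact_daily_sales VALUES (1, 20240101, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM fact_daily_sales").fetchone()[0]
print(duplicate_rejected, row_count)
```

A surrogate key would allow the duplicate row unless a separate uniqueness constraint were added, which is one reason to choose the key type based on the intended grain.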

Attribute hierarchies

Attributes are nothing but the descriptive columns in the dimension table. When these attributes are organized in a hierarchical structure, it is referred to as an attribute hierarchy.

These hierarchies are implemented either within a single dimension table or by linking to additional tables that represent distinct levels, such as separate month and quarter tables related to the original time dimension.

For example, a time dimension might have a hierarchy: Year → Quarter → Month → Day. This allows analysts to view data at different granularities, such as total sales by year, quarter, or month.
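
As a rough sketch of roll-up and drill-down over this hierarchy, the example below keeps all levels in a single date dimension and aggregates at different depths. It uses Python's built-in sqlite3 module; dim_date, fact_sales, the helper total_sales_by, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- All hierarchy levels live in one denormalized date dimension.
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER,
    quarter TEXT,
    month   INTEGER,
    day     INTEGER
);
CREATE TABLE fact_sales (
    date_id      INTEGER REFERENCES dim_date(date_id),
    sales_amount REAL
);
INSERT INTO dim_date VALUES
    (20240115, 2024, 'Q1', 1, 15),
    (20240220, 2024, 'Q1', 2, 20),
    (20240405, 2024, 'Q2', 4, 5);
INSERT INTO fact_sales VALUES
    (20240115, 100.0), (20240115, 50.0), (20240220, 25.0), (20240405, 200.0);
""")

# Fixed, trusted list of hierarchy levels (Year -> Quarter -> Month -> Day).
HIERARCHY = ("year", "quarter", "month", "day")

def total_sales_by(depth):
    """Aggregate sales over the first `depth` levels of the hierarchy."""
    cols = ", ".join(f"d.{level}" for level in HIERARCHY[:depth])
    sql = f"""
        SELECT {cols}, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()

by_year = total_sales_by(1)      # most rolled-up view
by_quarter = total_sales_by(2)   # drill down one level
by_month = total_sales_by(3)     # drill down two levels

print(by_year)
print(by_quarter)
print(by_month)
```

Because the levels are columns of the same dimension table, moving between granularities only changes the GROUP BY list, not the joins.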

To sum it up, keys are a foundational linkage path between fact and dimension tables that maintain data integrity and support multi-dimensional analysis.

What are the Advantages of Star Schema?

1) Simple & Efficient Querying

Star Schema join logic is much simpler than the join logic required to fetch data from a highly normalized Transactional Schema.
Since a Star Schema Database has fewer Tables and clearer join pathways, queries typically perform faster than on OLTP systems.
Small single-table queries against a dimension table are nearly instantaneous, while large join queries involving numerous tables complete in seconds or minutes.
Only the core Fact Table connects the Dimensions in a Star Schema Database System. When two dimension tables are used in a query, there is only one join path between them, and it passes through the Fact Table. This design ensures that query results are accurate and consistent.

2) Streamlined Business Reporting Logic

In comparison to a highly normalized Transactional Schema, the Star Schema simplifies common Business Reporting logic, such as as-of reporting and period-over-period reporting.

3) In-built Referential Integrity

When data is loaded into a star schema, referential integrity is straightforward to maintain. Each row in a dimension table has a unique primary key, and every key in the fact table is a foreign key drawn from a dimension table. As long as those foreign keys are enforced, a record in the fact table cannot be assigned a key value that has no matching row in a dimension.
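
A minimal sketch of this check, assuming SQLite as the engine (accessed through Python's built-in sqlite3 module) and hypothetical table names: once foreign key enforcement is switched on, a fact row that points at a non-existent dimension key is rejected. Other databases and load tools enforce this differently, so treat it as an illustration rather than a general rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL REFERENCES dim_product(product_id),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Running shoe');
INSERT INTO fact_sales VALUES (1, 1, 79.99);  -- valid: product 1 exists
""")

# Product 999 has no row in dim_product, so this fact row is an orphan.
try:
    conn.execute("INSERT INTO fact_sales VALUES (2, 999, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(orphan_rejected, fact_rows)
```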

4) Maintenance & Load Performance

Structural simplicity reduces the time taken to load large batches of records into a star schema database.
Separating Facts and Dimensions into different tables reduces the impact of each load on the overall structure.
A Dimension Table only has to be populated once and then refreshed periodically. New facts can be added regularly and selectively by appending records to the Fact Table.
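
The load pattern described above might look like the following sketch, written with Python's built-in sqlite3 module. The helpers refresh_dimension and append_facts, the table names, and the sample batches are hypothetical; the refresh simply overwrites a dimension row that has the same key, which is only one of several possible refresh strategies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    city       TEXT
);
CREATE TABLE fact_sales (
    store_id     INTEGER REFERENCES dim_store(store_id),
    sale_date    TEXT,
    sales_amount REAL
);
""")

def refresh_dimension(rows):
    """Insert new stores and overwrite existing ones that share a store_id."""
    conn.executemany("INSERT OR REPLACE INTO dim_store VALUES (?, ?, ?)", rows)

def append_facts(rows):
    """Facts are only ever appended, never rewritten."""
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

# Initial load: fill the dimension, then append the first batch of facts.
refresh_dimension([(1, "Downtown", "Toronto"), (2, "Harbour", "Vancouver")])
append_facts([(1, "2024-11-01", 120.0), (2, "2024-11-01", 80.0)])

# Later batch: one store was renamed, and only new facts are appended.
refresh_dimension([(1, "Downtown Flagship", "Toronto")])
append_facts([(1, "2024-11-02", 95.0)])
conn.commit()

store_count = conn.execute("SELECT COUNT(*) FROM dim_store").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
store_1_name = conn.execute(
    "SELECT store_name FROM dim_store WHERE store_id = 1"
).fetchone()[0]
print(store_count, fact_count, store_1_name)
```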

5) Easy to Understand

A Star Schema is easy to understand and navigate by simply connecting the Fact Table with multiple Dimensions.
Since they represent the essential relationships between components of the underlying business, these joins are more meaningful to the end user. Before generating a query, users can browse the attributes of the Dimension Tables.

What are the Disadvantages of Star Schema Data Modelling?

Since the schema is heavily denormalized, data integrity is not enforced as strongly as in a normalized model, and redundant attribute values can drift out of sync.
Compared to a normalized data model, it is not as flexible in terms of analytical demands.
Star Schemas don’t easily support many-to-many relationships among business entities.


Real-Life Example of a Star Schema

Given the strategic design, the star schema is used in finance, healthcare, retail, and other industries. It is mainly leveraged for data warehousing and BI solutions. Here we consider a retail scenario as a real-life example for better understanding.

Retail store sales analysis

The star schema structure for retail positions the fact table at the centre of the schema to record sales transactions. The dimension tables surround the fact table, providing the required descriptive data for each transaction. This structure will support robust querying to monitor sales trends and product performance.

The foreign key establishes a relationship between the fact table and each dimension table, associating every sales transaction with descriptive data sourced from the dimension table.

[Figure: Star schema modelling example]

Here’s how the fact and dimension tables work together:

Fact table: Each row of the fact table represents a single sale and includes:

Transaction_ID: Unique identifier
Product_ID: Foreign key identifying the product sold
Customer_ID: Foreign key identifying the customer
Store_ID: Foreign key identifying the store location
Date_ID: Foreign key identifying the date of sale

Dimension table: Each dimension table provides attributes used to filter, group, and label facts.

Dim_Date: Information about dates, such as day, month, year, and week.
Dim_Product: Product details, including category, name, brand, and price.
Dim_Store: Provides store information like store name, city, and manager.
Dim_Customer: Customer specifications like name, city, and phone number.

Now, if a retail analyst wants the answer to:

“What were the total sales of shoes in Toronto in November?”

The analyst can run a query that joins the Fact_Sales table to the relevant dimension tables on their keys. The query filters for:

Category: Shoes
City: Toronto
Month: November

This approach boosts efficiency, as the star schema reduces the number of joins and keeps queries straightforward as data volume grows. Retailers can easily write and understand queries without requiring deep technical knowledge.
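
The analyst's question can be sketched as a single query. The example below uses Python's built-in sqlite3 module with made-up sample rows. The Store_ID foreign key and the Sales_Amount measure on Fact_Sales are assumptions added so that the query has a store to filter on and an amount to sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Product (Product_ID INTEGER PRIMARY KEY, Name TEXT, Category TEXT);
CREATE TABLE Dim_Store   (Store_ID   INTEGER PRIMARY KEY, Store_Name TEXT, City TEXT);
CREATE TABLE Dim_Date    (Date_ID    INTEGER PRIMARY KEY, Day INTEGER, Month TEXT, Year INTEGER);
CREATE TABLE Fact_Sales (
    Transaction_ID INTEGER PRIMARY KEY,
    Product_ID     INTEGER REFERENCES Dim_Product(Product_ID),
    Store_ID       INTEGER REFERENCES Dim_Store(Store_ID),  -- assumed column
    Date_ID        INTEGER REFERENCES Dim_Date(Date_ID),
    Sales_Amount   REAL                                     -- assumed measure
);
INSERT INTO Dim_Product VALUES (1, 'Trail Runner', 'Shoes'), (2, 'Sun Cap', 'Hats');
INSERT INTO Dim_Store VALUES (1, 'Queen Street', 'Toronto'), (2, 'Old Port', 'Montreal');
INSERT INTO Dim_Date VALUES
    (20241105, 5, 'November', 2024),
    (20241120, 20, 'November', 2024),
    (20241203, 3, 'December', 2024);
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 20241105, 100.0),  -- shoes, Toronto, November
    (2, 1, 1, 20241120, 150.0),  -- shoes, Toronto, November
    (3, 2, 1, 20241105, 30.0),   -- hats, Toronto, November
    (4, 1, 2, 20241105, 90.0),   -- shoes, Montreal, November
    (5, 1, 1, 20241203, 110.0);  -- shoes, Toronto, December
""")

# One join per dimension, then filter on descriptive attributes.
total_sales = conn.execute("""
    SELECT SUM(f.Sales_Amount)
    FROM Fact_Sales AS f
    JOIN Dim_Product AS p ON p.Product_ID = f.Product_ID
    JOIN Dim_Store   AS s ON s.Store_ID   = f.Store_ID
    JOIN Dim_Date    AS d ON d.Date_ID    = f.Date_ID
    WHERE p.Category = ? AND s.City = ? AND d.Month = ?
""", ("Shoes", "Toronto", "November")).fetchone()[0]

print(total_sales)
```

Changing the question (a different city, category, or month) only changes the filter values, not the shape of the query.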

How Does Star Schema Data Modelling Work?

Numeric values and dimension attribute values are both stored in the fact table. As an example, consider the following:

Numeric value cells are unique to each row or data point, and they have no correlation or relationship to data in other rows. These could be transactional details like the order ID, total amount, net profit, order quantity, or exact time.

Dimension attribute values do not hold the descriptive data itself. Instead, each one stores the foreign key of a row in a related dimension table. The same key value is typically referenced by many rows of the fact table. It might store the sales employee ID, a date value, a product ID, or a branch office ID, for example.

The fact table’s supporting data are stored in dimension tables. Each star schema database has at least one dimension table, but many more are frequently present. Each dimension table will be linked to a fact table column containing a dimension value and will store additional information about that value.

How to Query the Star Schema?

To submit report requests, end-users will most likely use business intelligence tools. Each request may result in a series of SELECT statements that are required to build the entire report in Oracle.
Your users’ business intelligence tool, for example, could have its own server process that processes the returned SQL SELECT results using advanced OLAP functions and logic not yet available in Oracle.
As a result, you’ll only be able to see the intermediate queries by scanning the SGA as the DBA. But if we can speed up the intermediate queries, the overall report performance will improve.

SELECT prod.category_name,
       sum (fact.sales_unit) Units,
       sum (fact.sales_retail) Retail
FROM   pos_day              fact,
       period               per,
       location             loc,
       product              prod
WHERE  fact.period_id     = per.period_id
  AND  fact.location_id   = loc.location_id
  AND  fact.product_id    = prod.product_id
  AND  per.levelx         = 'DAY'
  AND  per.period_month   = 12
  AND  per.period_year    = 1998
  AND  loc.levelx         = 'STORE'
  AND  loc.city           = 'DALLAS'
  AND  loc.state          = 'TX'
  AND  prod.levelx        = 'ITEM'
  AND  prod.category_name in ('BEER','COFFEE')
GROUP BY prod.category_name;

This query returns:

CATEGORY_NAME                 UNITS         RETAIL
---------------------- ------------ --------------
BEER                         11,613      64,490.81
COFFEE                       22,808      20,462.92

Even this example reveals important information about what the DBA can expect from queries against star schema warehouses. A star schema SELECT will, in general, do the following:

Make use of GROUP functions and, as a result, GROUP BY.
Include a JOIN of one or more dimensions with a fact.
Use dimension columns to create a lot of WHERE restrictions.
Scan a large number of rows to return a small number of results.
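
To make these characteristics concrete, here is a small runnable variant of the query above. It is a sketch that uses SQLite through Python's built-in sqlite3 module instead of Oracle, rewrites the comma joins as explicit JOIN clauses, and runs against a handful of invented rows, so its output does not match the figures shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period   (period_id INTEGER PRIMARY KEY, levelx TEXT,
                       period_month INTEGER, period_year INTEGER);
CREATE TABLE location (location_id INTEGER PRIMARY KEY, levelx TEXT,
                       city TEXT, state TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, levelx TEXT,
                       category_name TEXT);
CREATE TABLE pos_day  (period_id INTEGER, location_id INTEGER, product_id INTEGER,
                       sales_unit INTEGER, sales_retail REAL);

-- Invented sample data for illustration only.
INSERT INTO period   VALUES (1, 'DAY', 12, 1998), (2, 'DAY', 11, 1998);
INSERT INTO location VALUES (1, 'STORE', 'DALLAS', 'TX'), (2, 'STORE', 'AUSTIN', 'TX');
INSERT INTO product  VALUES (1, 'ITEM', 'BEER'), (2, 'ITEM', 'COFFEE'), (3, 'ITEM', 'TEA');
INSERT INTO pos_day VALUES
    (1, 1, 1, 10, 55.0),  -- beer, Dallas, December
    (1, 1, 1,  5, 27.5),  -- beer, Dallas, December
    (1, 1, 2,  8, 12.0),  -- coffee, Dallas, December
    (1, 1, 3,  4,  6.0),  -- tea: filtered out by category
    (2, 1, 1,  7, 38.5),  -- November: filtered out by period
    (1, 2, 2,  3,  4.5);  -- Austin: filtered out by location
""")

rows = conn.execute("""
    SELECT prod.category_name,
           SUM(fact.sales_unit)   AS units,
           SUM(fact.sales_retail) AS retail
    FROM   pos_day  AS fact
    JOIN   period   AS per  ON fact.period_id   = per.period_id
    JOIN   location AS loc  ON fact.location_id = loc.location_id
    JOIN   product  AS prod ON fact.product_id  = prod.product_id
    WHERE  per.levelx = 'DAY' AND per.period_month = 12 AND per.period_year = 1998
      AND  loc.levelx = 'STORE' AND loc.city = 'DALLAS' AND loc.state = 'TX'
      AND  prod.levelx = 'ITEM' AND prod.category_name IN ('BEER', 'COFFEE')
    GROUP BY prod.category_name
    ORDER BY prod.category_name
""").fetchall()

fact_rows_scanned = conn.execute("SELECT COUNT(*) FROM pos_day").fetchone()[0]
print(rows)
print(fact_rows_scanned, "fact rows ->", len(rows), "result rows")
```

Even at this tiny scale the pattern is visible: aggregate functions with GROUP BY, one join per dimension, filters expressed on dimension columns, and fewer result rows than fact rows.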

What are the Characteristics of Star Schema Data Modelling?

In a Star Schema, business process data is split into Fact Tables, which hold quantitative data about the business, and Dimensions, which hold the descriptive attributes associated with the Fact Data.
In the Star Schema, Fact Data includes measures such as sale price, quantity, distance, speed, height, and weight.
A Star Schema with many dimensions is sometimes called a Centipede Schema. Dimensions with only a few attributes are simple to maintain.
In a Star Schema, each Dimension is represented by a single Dimension Table.
The collection of Attributes should be stored in the Dimension Table.
Using a Foreign Key, the Dimension Table is linked to the Fact Table.
The Dimension Tables are not connected in any way.
The Star structure is simple to understand and optimises disk usage.
There is no normalization in the Dimension Tables.
BI Tools generally support the Star Schema.

Star Schema vs Other Schemas for Data Warehouse Modelling

1) Star Schema vs Snowflake Schema

In a Data Warehouse, a Snowflake Schema is the logical arrangement of Tables in a Multidimensional Database that resembles a Snowflake shape on the ER diagram.
A Snowflake Schema is an extension of a Star Schema in which the dimension tables are normalized.
After normalization, the dimension data is split into additional tables.

The key differences between Star Schema and Snowflake Schema are:

Star Schema	Snowflake Schema
Hierarchies for the dimensions are stored in the dimensional table.	Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables.	One fact table is surrounded by dimension tables, which are in turn surrounded by further dimension tables.
In a star schema, only a single join creates the relationship between the fact table and any dimension tables.	A snowflake schema requires many joins to fetch the data.
Simple DB Design.	Very Complex DB Design.
Denormalized Data structure and queries also run faster.	Normalized Data Structure.
High level of data redundancy	Very low-level data redundancy
The single Dimension table contains aggregated data.	Data Split into different Dimension Tables.
Cube processing is faster.	Cube processing might be slow because of the complex join.
Offers higher-performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.	It is represented by a centralized fact table that is unlikely to be connected with multiple dimensions.
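
The join difference in the comparison above can be illustrated with a short sketch. It uses Python's built-in sqlite3 module, and every table name (star_dim_product, snow_dim_product, snow_dim_category, fact_sales) and row is hypothetical. The same question, total sales by category, needs one join in the star layout and two in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is stored redundantly in the product dimension.
CREATE TABLE star_dim_product (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

-- Snowflake: the category level is normalized into its own table.
CREATE TABLE snow_dim_category (
    category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_dim_product (
    product_id INTEGER PRIMARY KEY, product_name TEXT,
    category_id INTEGER REFERENCES snow_dim_category(category_id));

CREATE TABLE fact_sales (product_id INTEGER, sales_amount REAL);

INSERT INTO star_dim_product VALUES
    (1, 'Sneaker', 'Shoes'), (2, 'Boot', 'Shoes'), (3, 'Cap', 'Hats');
INSERT INTO snow_dim_category VALUES (10, 'Shoes'), (20, 'Hats');
INSERT INTO snow_dim_product VALUES
    (1, 'Sneaker', 10), (2, 'Boot', 10), (3, 'Cap', 20);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 150.0), (3, 20.0), (1, 50.0);
""")

# Star schema: a single join from the fact table to the dimension.
star_result = conn.execute("""
    SELECT p.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN star_dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

# Snowflake schema: an extra join to reach the normalized category level.
snowflake_result = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN snow_dim_product  AS p ON p.product_id  = f.product_id
    JOIN snow_dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(star_result)
print(snowflake_result)
```

Both layouts return the same totals. The trade-off is that the star version repeats 'Shoes' on every shoe product, while the snowflake version stores it once and pays for that with the extra join.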

2) Star Schema vs Fact Constellation Schema

A fact constellation schema contains multiple fact tables that share dimension tables. In this example, there are two fact tables:
- Revenue
- Product

The key differences between Star Schema and Fact Constellation Schema are:

Star Schema	Fact Constellation Schema
Each dimension is represented by only one dimension table.	Dimension tables are shared by multiple fact tables.
Tables are comparatively easy to maintain.	Tables are comparatively harder to maintain.
It does not use normalization.	It is a normalized form of the star and snowflake schemas.
Simple queries are used to access data from the database.	More complex queries are used to access data from the database.
It is easier to operate because it has fewer joins between the tables.	It is harder to operate because it has many joins between the tables.
It uses less space.	It uses more space.
It is simple to understand.	It is harder to understand because of its complexity.
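
As a rough illustration of shared dimensions, the sketch below builds two fact tables that both reference the same product dimension, using Python's built-in sqlite3 module. The names fact_revenue, fact_inventory, and dim_product, along with the sample rows, are hypothetical. Each fact table is aggregated separately before the results are combined through the shared dimension, which avoids double counting when the two facts have different numbers of rows per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension shared by two fact tables.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_revenue (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue    REAL
);
CREATE TABLE fact_inventory (
    product_id    INTEGER REFERENCES dim_product(product_id),
    units_on_hand INTEGER
);
INSERT INTO dim_product VALUES (1, 'Sneaker'), (2, 'Cap');
INSERT INTO fact_revenue VALUES (1, 100.0), (1, 50.0), (2, 20.0);
INSERT INTO fact_inventory VALUES (1, 30), (2, 12), (2, 8);
""")

# Aggregate each fact table on its own, then join on the shared dimension key.
combined = conn.execute("""
    SELECT p.product_name, r.total_revenue, i.total_units
    FROM dim_product AS p
    JOIN (SELECT product_id, SUM(revenue) AS total_revenue
          FROM fact_revenue GROUP BY product_id) AS r
      ON r.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_on_hand) AS total_units
          FROM fact_inventory GROUP BY product_id) AS i
      ON i.product_id = p.product_id
    ORDER BY p.product_name
""").fetchall()

print(combined)
```

The query is noticeably more involved than a single-fact star query, which matches the point in the table about query complexity.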


Making Data Work with Star Schema

Star Schema is a straightforward and efficient data modeling approach for building data warehouses and dimensional data marts.

It simplifies querying and business reporting, ensures data integrity, and supports faster data loading.

Despite its advantages, it may suffer from data redundancy and limitations in handling complex relationships compared to more normalized schemas like Snowflake or Fact Constellation.

Understanding these trade-offs helps in choosing the most suitable schema for specific analytical and reporting needs in data warehouse design.


Frequently Asked Questions

What is a star schema in data modeling?

A star schema is a data modeling technique used in data warehouses where a central fact table links to multiple dimension tables, making queries fast and reporting simple.

What is the difference between snowflake and star data modeling?

Star schema uses denormalized dimension tables for simpler, faster queries, while snowflake schema normalizes them to reduce redundancy but adds complexity.

What is the difference between ERD and star schema?

ERD (Entity-Relationship Diagram) models operational databases with complex relationships, while star schema is designed for analytical databases with a central fact table and simpler, denormalized dimensions.

Does OLAP use star schema?

Yes, OLAP systems commonly use star schemas for fast, efficient querying and reporting.

Harsh Varshney Research Analyst, Hevo Data

Harsh is a data enthusiast with over 2.5 years of experience in research analysis and software development. He is passionate about translating complex technical concepts into clear and engaging content. His expertise in data integration and infrastructure shines through his 100+ published articles, helping data practitioners solve challenges related to data engineering.
