Star Schema Data Modelling for Warehouses: The Complete Guide 101

on Data Modelling, Data Storage, Data Warehouses, Dimensional Data Modelling • January 27th, 2022 • Write for Hevo

Star Schema for Warehouse Data Modelling | Hevo Data

Data Warehousing Systems were the inspiration for Multidimensional Schema. The schemas are designed to fulfill the needs of large &complex databases that are utilized for analytical reasons (OLAP). Data Warehouse Schemas exist in a range of sizes and shapes. Below are the three primary types of Multidimensional Schemas, each having its unique set of advantages.

  • Star Schema 
  • Snowflake Schema
  • Fact Constellation Schema

This blog will show you everything about the Star Schema Data Modelling and its comparison with different types of Schemas.

Table of Contents

What is a Data Warehouse?

A Data Warehouse is a combined collection of data from various/many sources. As a result, a Data Warehouse serves as a Single, Central, and consistent Data Storage that aids in Data Mining, Data Analysis, Machine Learning, Artificial Intelligence, and other processes. A Data Warehouse is a storehouse for current and historical data that has been gathered. It is a type of information technology that is at the heart of a company’s Business Intelligence Architecture.

The Data Warehouse is your company’s Central Data Repository. Since a Data Warehouse is the “Single Source of Truth” for all the data in the company, it is required for all businesses who seek to make data-driven choices. The ETL (Extract Load Transform) method is used to save data from numerous sources such as APIs, Databases, Cloud Storage, and so on. The best Data Warehouses in the market are Google BigQuery, Amazon Redshift, etc.

Key Features of a Data Warehouse

To emphasize the two most essential features of Data Warehousing, consider the following.

  • Richer Data: This eliminates the issues of Inconsistency and Data Accessibility. With consistent and relevant data from diverse data sources, the Data Warehouse becomes a Single Source of Truth.
  • Faster Decisions: The ready-to-use data combined with the built-in analytics tool makes it the ideal option for making fact-based decisions quickly. Decision-makers can place a high level of trust in their decisions since they are based on complete, high-quality data, decreasing risk.

What is a Star Schema Data Modelling?

Among the DWH Schema, the Star Schema Data Modelling is the most basic and straightforward. This schema is commonly used to create a Data Warehouse and Dimensional Data Marts. It consists of one or more Fact Tables that index an unlimited number of Dimensional Tables. The Snowflake Schema cannot exist without the Star Schema. It’s also good at answering simple questions. Star Schema Data Modelling has a lot of benefits and helps in data modeling for warehouses.

1) Example of Star Schema Data Modelling

It’s called a Star because its physical model is shaped like a Star, with a Fact Table in the center and Dimension Tables on the periphery symbolizing the star’s points. Here’s an example of the Star Schema in action:

Star Schema: Example | Hevo Data
Image Source

2) What are Fact Tables? 

In a Star Schema, a Table that includes Facts and is linked to Dimensions is known as a Fact Table. There are two types of columns in a Fact Table: 

  • Fact Columns and Foreign Keys to the Dimension Table
  • Fact Tables’ Primary Key is usually a Composite Key made up of all of the Foreign Keys.

A Fact Table could contain either Detail Level Facts or Aggregated Facts (Fact Tables that include Aggregated facts are often instead called Summary Tables). Facts with the same level of aggregation are usually found in a Fact Table.

SALES is a Fact Table with properties such as (Product ID, Order ID, Customer ID, Employer ID, Total, Quantity, Discount) that refer to Dimension Tables in the example above. 

3) What are Dimension Tables?

A Dimension is a Data Classification Architecture made up of one or more hierarchies. A Flat Dimension or List is one that does not have any hierarchies or levels. The Composite Primary Keys of the Fact Table include the Primary Keys of each Dimension Table. The Dimensional Value is defined by Dimensional Qualities. They’re typically textual values that describe something. Fact Tables are frequently larger than dimensional tables. Data about sales is stored in Fact Tables, whereas data about the Geographic Region (markets, cities), customers, items, times, and channels are stored in Dimension Tables.

The following attributes are found in the Employee Dimension Table in the above example:  Employee ID, Employee Name, Title, Department, and Region. Product ID, Product Name, Product Category, and Unit Price are all attributes in the Product Dimension Table. Customer ID, Customer Name, Address, City, and Zip are all properties in the Customer Dimension table. Order ID, Order Date, Year, Quarter, and Month are all properties in the Time Dimension table.

How does a Star Schema Data Modelling work?

Numeric values and dimension attribute values are both stored in the fact table. As an example, consider the following:

Numeric value cells are unique to each row or data point, and they have no correlation or relationship to data in other rows. These could be transactional details like the order ID, total amount, net profit, order quantity, or exact time.

The foreign key value for a row in a related dimensional table is stored in the dimension attribute values, rather than data. This type of information will be referenced in many rows of the fact table. It might store the sales employee ID, a date value, a product ID, or a branch office ID, for example.

The fact table’s supporting data are stored in dimension tables. Each star schema database has at least one dimension table, but many more are frequently present. Each dimension table will be linked to a fact table column containing a dimension value and will store additional information about that value.

Consider the following scenario:

  • The employee dimension table can have information like the employee’s name, gender, address, or phone number, and it can use the employee ID as a key value.
  • A product dimension table can hold data such as the product name, manufacturing cost, color, and first date on market.
star schema: working example
Image Source

A star schema should be built with care by organizations. Avoid mixing fact and dimension data in each table. To maximize performance, take into account the total number of dimension tables. In order to optimize for the types of queries that will be run, consider the granularity of the data captured. Determine whether the exact time or the date should be used and whether monetary values should be recorded to the dollar or rounded to the thousandth place, for example.

Star schemas are optimized for querying large data sets, data warehouses, and data marts. They support OLAP cubes, analytic applications, ad hoc queries, and business intelligence (BI). They also allow you to quickly aggregate a large number of fact records using count, sum, average, and other methods. These aggregations can be filtered and grouped (sliced and diced) by users. Users can quickly create queries like “find all sales records for June” or “get the total revenue for the Texas office from 2020.”

How to Querying the Star Schema Data Modelling?

To submit report requests, end-users will most likely use business intelligence tools. Each request may result in a series of SELECT statements that are required to build the entire report in Oracle. Your users’ business intelligence tool, for example, could have its own server process that processes the returned SQL SELECT results using advanced OLAP functions and logic not yet available in Oracle. As a result, you’ll only be able to see the intermediate queries by scanning the SGA as the DBA. But if we can speed up the intermediate queries, the overall report performance will improve.

The following is an example of a common user query: In December 1998, how much beer and coffee did we sell in Dallas? A typical ad-hoc business intelligence intermediate query sent to Oracle, based on our example dimensional model in Figure 4-1, might look like this:

SELECT prod.category_name,
       sum (fact.sales_unit) Units,
       sum (fact.sales_retail) Retail
FROM   pos_day              fact,
       period               per,
       location             loc,
       product              prod
WHERE  fact.period_id     = per.period_id
  AND  fact.location_id   = loc.location_id
  AND  fact.product_id    = prod.product_id
  AND  per.levelx         = 'DAY'
  AND  per.period_month   = 12
  AND  per.period_year    = 1998
  AND  loc.levelx         = 'STORE'
  AND  loc.city           = 'DALLAS'
  AND  loc.state          = 'TX'
  AND  prod.levelx        = 'ITEM'
  AND  prod.category_name in ('BEER','COFFEE')
GROUP BY prod.category_name;

as a result of:

CATEGORY_NAME                 UNITS         RETAIL
---------------------- ------------ --------------
BEER                         11,613      64,490.81
COFFEE                       22,808      20,462.92

Even this example reveals important information about what the DBA can expect from queries against star schema warehouses. A star schema SELECT will, in general, do the following:

  • Make use of GROUP functions and, as a result, GROUP BY.
  • Include a JOIN of one or more dimensions with a fact.
  • Use dimension columns to create a lot of WHERE restrictions.
  • Scans a large number of rows to return a small number of results.

In contrast, SQL generated by a business intelligence tool may target the incorrect level of table summarization (i.e., example query should have used the month table instead of the day table, and therefore offers opportunities for tuning or utilizing query rewrites).

That’s all there is to it when it comes to star schema queries. In reality, they’re just table searches using some lookup tables. For years, Oracle has been able to handle similar queries, but the sheer size of data warehouses makes this something entirely different.

What are the Characteristics of Star Schema Data Modelling?

  • Business Process Data, which contains quantitative data about a company, is distributed in Fact Tables and Dimensions, which are descriptive features associated with Fact Data, in Star Schema. 
  • In the Star Schema, Fact Data includes things like Sale Price, Number, Distance, Speed, Height, and Weight measures.
  • Centipede Schema is the word used to describe a Star Schema with several dimensions. A Star Schema with few attributes dimensions is simple to manage.
  • In a Star Schema, each Dimension is represented by a Single One-Dimension Table.
  • The collection of Attributes should be stored in the Dimension Table.
  • Using a Foreign Key, the Dimension Table is linked to the Fact Table.
  • The Dimension Tables are not connected in any way.
  • The Star structure is simple to Comprehend and Optimises Disc Consumption.
  • There is no normalization in the Dimension Tables. 
  • BI Tools generally support the Star Schema.

Simplify Data Analysis with Hevo’s No-code Data Pipeline!

Hevo Data, a No-code Data Pipeline, helps you directly transfer data from 100+ data sources to Data Warehouses, BI tools, or a destination of your choice in a completely hassle-free & automated manner.

Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready format without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Get Started with Hevo for Free

Check out why Hevo is the Best:

  1. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  2. Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  3. Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  4. Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  5. Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  6. Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
  7. Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

1) Simple & Efficient Querying

In comparison to other join logic needed to fetch data from a Transactional Schema that is well normalized, Star Schema joins logic is a breeze. Since a Star Schema Database has fewer Tables and clearer join pathways, queries perform faster than on OLTP systems. Small single-table searches, like those involving a Dimension Table, are nearly instantaneous. Large join queries involving numerous tables can be completed in seconds or minutes.

Only the core Fact Table connects the Dimensions in a Star Schema Database System. When a two-dimension table is utilized in a query, there is only one join path between the two tables, which intersects the Fact Tables. This design component ensures that query results are genuine and consistent.

2) Streamlined Business Reporting Logic

In comparison to a highly standardized Transactional Schema, the Star Schema simplifies basic Business Reporting logic such as-of reporting and period-over-period reporting.

3) In-built Referential Integrity

When Data is imported into a Star Schema, Referential Integrity is built-in. Since each row in a Dimensional Table has a unique Primary Key, and all keys in the Fact Table are authentic Foreign Keys derived from the Dimension Table, Referential Integrity is guaranteed. The correct key value cannot be assigned to a record in the Fact Table that is not appropriately associated with a Dimension.

4) Maintenance & Load Performance

The time it takes to load big batches of records into a Star Schema Database is also reduced by Structural Simplicity. The influence of a load structure is decreased by providing Facts and Dimensions and breaking them into different tables. The Dimension Table just has to be filled once and then renewed on a regular basis. By appending records to a Fact Database, you can add new facts on a regular and selective basis.

5) Easy to Understand

A Star Schema is easy to understand and navigate, with simply the Fact Table connecting the Dimensions. Since they represent the essential relationship between components of the underlying business, these joins are more important to the end-user. Before generating a query, customers can browse Dimension Table properties.

What are the Disadvantages of Star Schema Data Modelling?

  • Since the Schema is severely de-normalized, Data Integrity is not properly enforced.
  • As a normalized Data Model, it is not as versatile in terms of analytical demands.
  • Star Schemas don’t usually encourage many-to-many linkages among Business Entities.

What is a MultiDimensional Schema for Data Warehouses?

Multidimensional Schema was created with Data Warehousing Systems in mind. The Schemas are tailored to meet the specific requirements of very large Databases used for analytical purposes (OLAP). Schemas for Data Warehouses come in a variety of shapes and sizes. The three main forms of Multidimensional Schemas are listed below, each with its own set of benefits.

  • Star Schema 
  • Snowflake Schema
  • Fact Constellation Schema

How to Define a MultiDimensional Schema?

Data Mining Query Language is used to define the Multidimensional Schema (DMQL). Cube Definition and Dimension Definition are two primitives that can be used to define Data Warehouses and Data Marts.

Cube Definition Syntax

define cube < cube_name > [ < dimension-list > }: < measure_list >

Dimension Definition Syntax

define dimension < dimension_name > as ( < attribute_or_dimension_list > )

Syntax for Star Schema

The Star Schema discussed can be expressed as follows in Data Mining Query Language (DMQL):

define cube sales star [time, item, branch, location]:   

dollars sold = sum(sales in dollars), units sold = count(*)      

define dimension time as (time key, day, day of week, month, quarter, year)

define dimension item as (item key, item name, brand, type, supplier type)       

define dimension branch as (branch key, branch name, branch type)             

define dimension location as (location key, street, city, province or state, country)

Star Schema vs Other Schemas for Data Warehouse Modelling

1) Star Schema vs Snowflake Schema 

In a Data Warehouse, a Snowflake Schema is the logical arrangement of Tables in a Multidimensional Database that resembles a Snowflake shape on the ER diagram. A Snowflake Schema is a Star Schema that has been expanded to include more dimensions. The data is split into new tables after the dimension tables are standardized.

The country is further standardized into a separate table in the following Snowflake Schema example.

Star Schema: Snowflake Schema Example | Hevo Data
Image Source

Following is a key difference between Star Schema vs Snowflake Schema:

Star SchemaSnowflake Schema
Hierarchies for the dimensions are stored in the dimensional table.Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables.One fact table is surrounded by a dimension table which is in turn surrounded by a dimension table
In a star schema, only a single join creates the relationship between the fact table and any dimension tables.A snowflake schema requires many joins to fetch the data.
Simple DB Design.Very Complex DB Design.
Denormalized Data structure and queries also run faster.Normalized Data Structure.
High level of data redundancyVery low-level data redundancy
The single Dimension table contains aggregated data.Data Split into different Dimension Tables.
Cube processing is faster.Cube processing might be slow because of the complex join.
Offers higher-performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.The Snowflake schema is represented by a centralized fact table which is unlikely connected with multiple dimensions.
Table Source 

2) Star Schema vs Fact Constellation Schema 

A Fact Constellation Schema is made up of Two Fact tables that share Dimension Tables. It is also called Galaxy Schema. The Galaxy Schema is named for the way the Schema is regarded as a collection of stars. Conformed Dimensions are shared Dimensions in the Fact Constellation Schema.

As you can see in the example below, there are two tables of facts.

  • Revenue 
  • Product
Star Schema: Fact Constellation Schema Example | Hevo Data
Image Source

Following is a key difference between Star Schema vs Fact Constellation Schema:

Star SchemaFact constellation schema
A star schema depicts each dimension with only a one-dimension table.While in this, dimension tables are shared by many fact tables.
In a star schema, tables can be maintained easily in comparison to a fact constellation schema.While in fact constellation schema, tables cannot be maintained easily comparatively.
Star schema does not use normalization.Whereas it is a normalized form of star and snowflake schema.
In a star schema, simple queries are used to access data from the database.While in this, heavily complex queries are used to access data from the database.
Star schema is easy to operate as compared to fact constellation schema as it has less number of joins between the tables.While fact constellation schema is not easy to operate as compared to star schema as it has many joins between the tables.
Star schema uses less space as compared to fact constellation schema.While fact constellation schema uses more space comparatively.
It is very simple to understand due to its simplicity.While it is very difficult to understand due to its complexity.
Table Source

Conclusion

From this blog, you now know about Data Warehouse starting from what it is and its features. You have a decent knowledge of Star Schema and a comparison of different Schemas with Star Schema. In case you want to export data from various sources into your desired Data Warehouse/Database/destination, then Hevo Data is the right choice for you! You can have a look at Star and Snowflake Schema Analytics.

Visit our Website to Explore Hevo

Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouses, Database, or a destination of your choice. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of learning about Star Schema! Let us know in the comments section below!

No-code Data Pipeline for Your Data Warehouse