Integer, Decimal and Floating-Point form three key Amazon Redshift Numeric data types. Leveraging these data types effectively while creating Amazon Redshift tables allows you to store and analyze data at scale with high performance. With the correct data type assigned to your business data in Amazon Redshift, you can store, manipulate, and query data with ease.
This article provides you with an in-depth guide to Amazon Redshift Numeric data types. Upon a complete walkthrough of the article, you’ll clearly understand each data type and the kind of data you can store in it. You’ll also come across a few examples of queries that’ll help you set up Amazon Redshift tables with columns of each data type.
What is Amazon Redshift?
AWS Redshift is a Data Warehouse service offered by Amazon. It follows a Relational structure with a capable Querying Layer that follows the PostgreSQL standard. Amazon Redshift’s uniqueness comes from its ability to store petabytes of data and execute queries with sub-second response time.
A Massively Parallel Processing (MPP) architecture is used to implement Amazon Redshift. It is made up of Leader and Worker nodes that collaborate to achieve super-fast querying speeds. Client Communication, Query Planning, and Task Assignment to Worker nodes are all handled by Leader nodes. Worker nodes are in charge of actually running the queries on the data chunks that they handle.
Amazon Redshift can be configured using Dense Compute or Dense Storage nodes. SSDs are used in Dense Compute nodes, which helps in use cases where customers are willing to pay for faster performance. HDD-based Dense Storage nodes are designed for use cases that require inexpensive storage. Depending on their budget and requirements, organizations can implement their Cluster using Dense Compute or Dense Storage instances. A detailed guide on Amazon Redshift pricing that can help you choose the right instance can be found here.
One key feature of Amazon Redshift is Concurrency Scaling, which automatically scales the Cluster at peak load times. This is a separate fee-based service; however, for every 24 hours that the Cluster is operating, Amazon provides one hour of free Concurrency Scaling.
A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ different sources (including 40+ free sources) to a Data Warehouse such as Amazon Redshift or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance.
Get Started with Hevo for Free
Its strong integration with a multitude of sources allows users to bring in data of different kinds in a smooth fashion without having to write a single line of code.
Check out some of the cool features of Hevo:
Sign up here for a 14-Day Free Trial!
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.
- Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Amazon Redshift Numeric Types: 3 Key Data Formats
Amazon Redshift Numeric types encompass the following three data types:
1) Integer Numeric Type
Amazon Redshift Numeric supports three kinds of Integer data types: SMALLINT, INTEGER, and BIGINT. These are as follows:
- SMALLINT or INT2: It requires 2 bytes of storage space and can be used to represent numbers from -32768 to 32767.
- INTEGER, INT, or INT4: It requires 4 bytes of storage space and can be used to represent numbers from -2147483648 to +2147483647.
- BIGINT or INT8: It requires 8 bytes of storage space and can be used to represent numbers from -9223372036854775808 to 9223372036854775807.
In all variants of an Amazon Redshift Numeric Integer, you can only store a value that falls within the mentioned range; any value outside the range results in an error. You can create a table with an Integer column using the following SQL command:
CREATE TABLE test(id INT, name VARCHAR(10));
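To illustrate the range limits described above, here is a hedged sketch (the table and column names are hypothetical) showing how each Integer variant behaves at its boundary:

```sql
-- Hypothetical table with one column per Integer variant.
CREATE TABLE int_demo(small_val SMALLINT, int_val INT, big_val BIGINT);

-- Values at the upper bound of each type are accepted.
INSERT INTO int_demo VALUES (32767, 2147483647, 9223372036854775807);

-- A value outside the range results in an error; for example,
-- 32768 overflows SMALLINT and the INSERT below would fail.
-- INSERT INTO int_demo VALUES (32768, 0, 0);
```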
2) Decimal Numeric Type
The Decimal Amazon Redshift Numeric data type is used to represent values that contain a decimal point. It can use up to 128 bits of storage. Amazon Redshift requires you to specify Precision and Scale while defining Numeric Data Types.
Precision represents the total number of digits including the ones on the left and right sides of the decimal point. Scale represents the number of digits on the right side of the decimal point.
CREATE TABLE SALES(id INT, price DECIMAL(6,2));
For example, in the above SQL statement, the column “Price” can store a total of 6 digits, with 2 digits on the right side of the decimal point. If a value with a higher scale is inserted into the column, Amazon Redshift will round it off to the scale defined for the column.
For example, if you try to insert a (6,4) decimal into Price, it will be rounded to a (6,2) value. Consider the statement below:
INSERT INTO SALES VALUES(1,3456.5689);
Here, it will result in 3456.57 being inserted as the price value. The Default Precision is 18 and the Maximum Precision supported is 38. For 64-bit decimal columns (Precision of 19 or less), the maximum positive value that can be inserted is 2^63 - 1. As a best practice, do not use 128-bit decimal values or define the maximum Precision of 38 for your columns unless absolutely required: maximum precision leads to higher storage usage and slower query execution times.
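The precision defaults mentioned above can be sketched as follows (table names are hypothetical; the DECIMAL(18,0) default follows the behavior described in this section):

```sql
-- A bare DECIMAL column uses the default precision, i.e. DECIMAL(18,0).
CREATE TABLE totals(amount DECIMAL);

-- Precision above 19 forces 128-bit storage; define it only when the
-- data genuinely requires it, as it costs space and query performance.
CREATE TABLE ledger(amount DECIMAL(38,10));
```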
3) Floating-Point Numeric Type
Floating-point types differ from Decimal types in that they don’t allow users to define Precision: they are created with either 6 or 15 digits of Precision, and everything else is stored as a rounded approximation. Hence, there may be slight discrepancies if you try to store values of higher Precision because of rounding errors.
There are two floating-point types, Real and Double Precision.
- Real: It is also known as FLOAT4 and has 6 digits of Precision.
- Double Precision: It is also known as FLOAT or FLOAT8 and has 15 digits of Precision.
For example, you can try and create a table with the Real data type as follows:
CREATE TABLE SALES(id INT, price REAL);
Once you’ve created the table, you can insert some records as follows:
INSERT INTO SALES VALUES(1,3456.5689);
INSERT INTO SALES VALUES(2,345678.1);
Executing a SELECT query on the table will show that both values have been rounded off to 6 digits of Precision.
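The read-back query can be sketched as follows (the displayed values assume Redshift’s default rendering of REAL with at most 6 significant digits):

```sql
-- Read back the REAL values inserted above.
SELECT id, price FROM SALES ORDER BY id;

-- price is displayed with at most 6 significant digits,
-- e.g. approximately 3456.57 and 345678 for the two rows above.
```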
That’s all there is to know about Redshift Numeric types. It should be noted that selecting the right data type for your columns has a big effect on the Storage Space and Querying Performance of Amazon Redshift.
As a rule of thumb, always choose the smallest data type that meets your requirements. Selecting a high-precision Redshift Numeric data type for future-proofing will always be tempting, but keep in mind that if you are using Amazon Redshift, you are likely dealing with at least terabytes of data, and even a single extra digit of Precision can affect Performance.
It’s all about the numbers. You don’t want to lose Data Integrity by choosing the wrong Amazon Redshift Numeric data type. Integers, Decimals, and Floating-point numbers are among the typical numeric data types provided by Amazon Redshift. When querying data, you must also be cautious about how Conversion and Compatibility between Amazon Redshift Numeric data types work.
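As a hedged sketch of the conversion behavior mentioned above (the literals are illustrative; exact display formatting may vary by client):

```sql
-- Casting to a DECIMAL with a smaller scale rounds the value,
-- mirroring the INSERT rounding shown earlier: yields 3456.57.
SELECT CAST(3456.5689 AS DECIMAL(6,2));

-- With two integer operands, division stays in the Integer type
-- and the fractional part is dropped: yields 3.
SELECT 10 / 3;

-- Making one operand a decimal promotes the result to Decimal.
SELECT 10 / 3.0;
```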
Amazon Redshift provides a great alternative for setting up Data Warehouses without spending on infrastructure or maintenance. It works seamlessly with components in the AWS ecosystem. If you are primarily an AWS-based organization, moving data in and out from Redshift is very easy. Unfortunately, Modern Hybrid-Cloud strategies do not allow enterprises to stick to a single service provider and most large organizations have data coming from Multiple Cloud and Software-as-a-Service (SaaS) providers.
If you have data coming from multiple sources and want a simple solution to load and transform the data to Amazon Redshift, you should check out Hevo Data. Hevo provides a completely code-free ETL tool that can accomplish data transfer from most standard sources to Amazon Redshift.
Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.
Visit our Website to Explore Hevo
Hevo Data with its strong integration with 100+ data sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Amazon Redshift, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.
Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.
Share your experience of understanding the Amazon Redshift Numeric Data Types in the comment section below! We would love to hear your thoughts.