Snowflake Create Database Simplified 101: Usage and Examples Explained

on Data Warehouse, Snowflake, Tutorials • December 20th, 2021 • Write for Hevo

Have you just started with Snowflake? Are you confused about how to create a database in Snowflake? If yes, then you are in the right place. In this blog post, you’ll learn about Snowflake, databases, and how to create a database in Snowflake.

Table of Contents

Introduction to Snowflake

Snowflake is a cloud-based data warehouse that uses Amazon web services or Azure as its cloud infrastructure and is entirely a Software-as-a-Service offering. Snowflake handles all the maintenance and performance tuning activities, and you don’t need to install, configure, or manage any hardware or software.

Snowflake Create Database: Snowflake

There are three main components in the Snowflake architecture (assuming AWS as its cloud infrastructure):

  • Database Storage: Snowflake uses Amazon S3 as the file system to store the encrypted, compressed, and distributed data to optimize performance. The data stored in these file systems are not visible to the user and can only be accessed via the SQL interface provided by Snowflake. Snowflake handles all the aspects of the data like metadata, compression, encryption, data storage, statistics, etc.
  • Query Processing: Snowflake has the concept of a virtual warehouse, which are Amazon EC2 clusters that provide compute power to the queries. Each virtual warehouse contains an MPP EC2 compute cluster with multiple nodes to perform quick analysis over data. Snowflake automatically manages the scale up or down of these virtual warehouses on demand and allows you to pause these clusters when not in use.
  • Cloud Services: This layer is a collection of all the services offered by Snowflake. It coordinates and handles services in Snowflake that includes sessions, authentication, metadata management, infrastructure management, SQL compilation, encryption, etc.

Key Features of Snowflake

  • Query Optimization: Snowflake can optimize the query by clustering and partitioning on its own. You don’t need to worry about query optimization.
  • Result Caching: Snowflake works on the caching model so that when the same query is re-issued, it quickly returns the results from the cache. Snowflake uses persisted (within the session) query results to avoid re-generation of the output when nothing has changed.
  • Secure Data Sharing: Secure data sharing enables account-to-account sharing of data through Snowflake database tables, views, and UDFs.
  • Support for Different File Formats:  Snowflake allows you to import semi-structured data such as JSON, Avro, ORC, Parquet, and XML data. It provides a particular column type, VARIANT, which allows you to store semi-structured data.
  • Standard and Extended SQL Support: Snowflake has excellent support for ANSI SQL and supports advanced SQL functionality like merge, lateral view, statistical functions, and many more.
  • Fault-Tolerant: Snowflake provides exceptional fault-tolerant capabilities to recover the Snowflake object (tables, views, database, schema, etc.) in case of failure.

Introduction to Database

A database controls and organizes structured information or data with the help of a database management system (DBMS). The database allows you to easily access, monitor, organize, and manage the data. A large number of databases use structured query language (SQL) for writing and querying data. Snowflake is a data warehouse that is used to manage and analyze data from different sources. The key differences between database and data warehouse are as follows:

  1. The database is typically used to store or record the data, whereas a data warehouse is designed to analyze the data with the help of its programming interface. In the case of Snowflake, SQL is used to perform complex calculations.
  2. The database is application-oriented and is used based on the application, whereas, a data warehouse is a subject-oriented collection of data.
  3. The database uses Online Transactional Processing (OLTP), whereas the data warehouse uses Online Analytical Processing (OLAP).
  4. The database is designed to hold the data up to gigabytes, whereas data warehouses can store data up to terabytes.
  5. The database is used for query processing, whereas a data warehouse is used when there is a need for data analytics and decision making.

Need for Database

In the relational world, databases organize and manage database objects like tables, views, etc. Snowflake is a fully managed data warehouse that manages data centrally to perform faster analysis. In Snowflake, schema organizes the data stored in Snowflake, whereas the database logically groups the data. In contrast, schema logically arranges the database object like tables, views, etc. In Snowflake, the database and schema together comprise namespace. A namespace is explicitly required when you want to execute any operation on Snowflake. If the schema and database are already selected, then this parameter is optional.

Hevo Data: An Alternative Approach to Load Data in Snowflake

Hevo is a No-code Data Pipeline. It can efficiently load data in Snowflake in a few simple steps. It is a fully automated platform and it completely automates the process of data migration.

GET STARTED WITH HEVO FOR FREE

Let’s see some unbeatable features of Hevo Data:

  1. Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
  2. Pre-Built Integrations: Hevo Data has various connectors incorporated with it, which can connect to multiple sources with ease. 
  3. Schema Management: Hevo Data automatically maps the source schema to perform analysis without worrying about the changing schema.
  4. Real-Time: Hevo Data works on the batch as well as real-time data transfer so that your data is analysis-ready always.  
  5. Advanced Monitoring: Hevo Data offers advanced monitoring that gives you a one-stop view to watch all the activities that occur within pipelines.
  6. Live Support: With 24/7 support, Hevo provides customer-centric solutions to the business use case.

Simplify your Data Analysis with Hevo today! 

SIGN UP HERE FOR A 14-DAY FREE TRIAL!

Snowflake Create Database Command

Let’s understand the semantics of creating databases in Snowflake. The create database command creates a new database in the system as well as can be used:

  1. To clone the existing database.
  2. To create a database from another Snowflake account.
  3. To create a replica of the current database.

Let’s learn about the various syntax of Snowflake create database command:

  1. The Syntax for Standard Database
  2. The Syntax for Shared Database
  3. The Syntax for Database Replication

1. The Syntax for Standard Database

Snowflake Create Database: Standard Database
  • <name>: Specifies the case-sensitive unique identifier name for the database.
  • TRANSIENT: Specifies a transient (temporary) database. It is not fail-safe and is session-based. Once you terminate the session, complete data, i.e., tables, views, objects, are wiped out.
  • CLONE: This is used to create the clone of the existing database.
  • AT | BEFORE ( TIMESTAMP | OFFSET | STATEMENT ): When cloning a database, “AT | BEFORE”  specifies time travel to the clone database at or before a specific period in the past.
  • DATA_RETENTION_TIME_IN_DAYS: The default value is 1 (Standard database), and it specifies the period to retain the objects created in the database.
  • COMMENTS: It specifies an explanation for the database. 

2. The Syntax for Shared Database

Snowflake Create Database: Shared Database
  • <name>: It specifies the case-sensitive unique identifier name for the database.
  • <provider_account.share_name>: It specifies the account name and shares the name of the sharing database.

3. The Syntax for Database Replication

Snowflake Create Database: Database Replication
  • <name>: It specifies the case-sensitive unique identifier name for the database.
  • AS REPLICA OF Snowflake_region.account_name.primary_db_name: It specifies a namespace along with the region and account name while cloning the primary database.
  • AUTO_REFRESH_MATERIALIZED_VIEWS_ON_SECONDARY =  TRUE | FALSE: This command enables the automatic refresh of the views/database in the secondary database. By default, it is false. When set true, this enables automatic maintenance of materialized views in the secondary database. When set false, it disables the automatic background maintenance of materialized views.

Examples of Snowflake Create Database Command

In this section, you will see some examples of using Snowflake’s “create database” command.

  1. Create a Permanent Database
  2. Create a Transient Database
  3. Create a Database Replication

1. Create a Permanent Database

You can create a standard database alone, or you can create a standard database with the retention period.

  • To create standard database:
create database db_test1;
  • To create standard database with retention period:
create database db_test2 data_retention_time_in_days = 10;
  • To show the created database: 
show databases like 'db%';
Snowflake Create Database: Permanent  Database

2. Create a Transient Database

Here’s an example of a transient database.

create transient database db_transient_1;
show databases like ‘db%';
Snowflake Create Database: Transient Database

3. Create a Database Replication

To create a replication of a database, below command can be used:

create database mydb1
  as replica of aws_us_west_2.myaccount.mydb1
  auto_refresh_materialized_views_on_secondary = true;
Snowflake Create Database: Database Replication

Limitations

Although Snowflake is a fully managed advanced data warehouse, certain shortcomings cannot be overlooked. Some of them are as follows:

  1. Snowflake owns the development infrastructure, and in case of any failure, you have to be dependent on the Snowflake support team.
  2. Snowflake also holds complete data of the user in its private cloud, and all the services are shared. In case of any security-related issue, there is a risk of data exposure.
  3. Snowflake has a small ecosystem, and a lot of advanced functionality is missing. It has no geospatial support, minimal user interface functionality, and immature ETL tool integration.
  4. Snowflake UDF’s work only on SQL and Javascript. You cannot use python or other languages to create UDF’s.

Conclusion

Snowflake is a powerful, fully managed platform used for data warehousing and analytics. In this blog post, you have learned about the Snowflake’s database, its uses, and ways to create various types of databases. However, when it comes to fully managed ETL, you can’t find a better solution than Hevo.

Hevo is a No-code Data Pipeline product that will help you move data from multiple data sources to your destination. It is consistent and reliable. It has pre-built integrations from 100+ sources.

VISIT OUR WEBSITE TO EXPLORE HEVO

Want to take Hevo for a spin?

SIGN UP and experience the feature-rich Hevo suite first hand.

Share your experience of using the Snowflake Create Database in the comment section below.

No-Code Data Pipeline for Snowflake