With large volumes of data generated daily, it might be challenging for you to store and manage the data efficiently. Database replication can solve this problem by creating and storing copies of a database in different locations, with high availability and redundancy.
Combine it with a SaaS platform like Snowflake for the best database replication. Snowflake replication ensures various benefits, such as integrating database objects and data across multiple accounts, distributing data across different regions, and providing disaster recovery.
This article is an elaborative guide to mastering Snowflake database replication and maximizing its benefits.
Overview of Snowflake
Snowflake is a fully managed SaaS application that stores and manages all your data in one place. The platform can automatically scale its computational resources, which lets you load and analyze data. It helps you run multiple workloads across multiple users simultaneously without any resource conflict.
If you want to improve your data analytics capabilities to enhance customer experiences and optimize operations, Snowflake is a sound option. Its benefits include enhanced performance, scalability, collaboration, and cost efficiency.
Importance of Database Replication
Database replication is essential for the following reasons:
- Disaster Recovery: Database replication is an effective disaster recovery solution that provides copies of the latest database updates, ensuring minimal data loss and quick recovery in case of any server failure.
- Data Availability and Reliability: Database replication enhances data availability by creating multiple copies and maintaining access to data during server crashes or any extraordinary conditions. With backups available in different locations, database replication ensures that data remains accessible even if one server fails.
- Enhanced Network Performance: Storing copies of data in various locations reduces data access latency by allowing users to retrieve data from the nearest location. Placing a replica closer to the users enhances access times and balances the network load.
Database Replication across Multiple Snowflake Accounts
Replicating databases across multiple Snowflake accounts is a valuable strategy for ensuring high availability, geographical distribution, and disaster recovery. Snowflake facilitates a Database Replication feature, which allows you to replicate databases from one Snowflake account to another in the same or different regions. Let us go through an example to understand the concept better.
The diagram illustrates the Snowflake replicas hosted in different regions: Azure Cloud in Amsterdam and AWS Cloud in Frankfurt.
- Source account: Azure Cloud Amsterdam
This Snowflake account is a primary database in Amsterdam, hosted on Azure Cloud.
DP_SHARE
represents a shared database that shares data between Snowflake accounts or regions.
- The Primary Database consists of
DYNAMIC_PRICING
, which stores and manages the data.
- The stream object captures changes, such as inserts, updates, or deletes, made to the
DYNAMIC_PRICING
database. It is essential for change data capture and incremental data replication.
- SQL Stored Procedure processes the changes captured by the stream and prepares the data for replication. It may involve data transformations or aggregations.
- Task automates the execution of the stored procedure at regular intervals, ensuring that the changes are continuously captured and prepared for replication.
2. Replication Across Clouds and Regions
This link indicates replicating the DYNAMIC_PRICING
database from the Azure Cloud in Amsterdam to the AWS Cloud in Frankfurt.
3. Target Account: AWS Cloud Frankfurt
This Snowflake account is the secondary database in Frankfurt, hosted on AWS Cloud.
- The secondary database also comprises
DYNAMIC_PRICING
, which receives the replicated data from the primary database in Amsterdam.
REFRESH_SECONDARY_DATABASE
is the intermediate database that holds the replicated data temporarily before it is distributed to the target environments.
- The scheduled task is to refresh the replication process, ensuring the secondary database is kept up-to-date with the primary database.
- You can utilize the replicated data in target environments such as
DEV_EDW
, PREPROD_EDW
, and PROD_EDW
. The data from the REFRESH_SECONDARY_DATABASE
is distributed to these environments, ensuring consistency and availability across different databases.
Key Differences Between Account and Database Replication
Both account and database replication are crucial for Snowflake’s data management, which targets different object types in data replication. Here are some key differences between Snowflake account and database replication:
- Access Control: Privileges granted on account objects can be replicated with Snowflake account replication. However, privileges granted on database objects, including existing and future grants, cannot be replicated in a secondary database.
- Parameters: Snowflake account parameters cannot be replicated with database replication; they can only be replicated with account replication. Some specific object parameters, such as
MAX_DATA_EXTENSION_TIME_IN_DAYS
, PIPE_EXECUTION_PAUSED
, and others, can be set at the schema level to be replicated.
However, parameters set on objects in the primary database can overwrite those set on objects in the secondary database. In contrast, parameters set on secondary databases cannot be overwritten.
- Object replication: Account replication can replicate objects in a Snowflake account. Here is a list of the supported objects for account replication. Database replication applies to databases only.
Implement Database Replication In Snowflake
Snowflake supports database replication and Failover/Failback across different platforms, such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure. It provides web interfaces like Snowsight and Classic Console to monitor and manage database replication and failover/failback operations.
The Snowsight web interface enables database replication by managing primary and secondary databases. Frequent monitoring and refreshing status keep the data up-to-date in source and target databases.
The Classic Console web interface enables database replication through the Databases > Replication section. You can also manage refreshes and promote/demote databases.
Let us now explore how to configure database replication across different Snowflake accounts and regions:
Prerequisites:
- Database replication should be enabled and monitored by the ‘ORGADMIN’ at the organization.
- Snowflake account replication should be managed by the ‘ACCOUNTADMIN’ in each account.
- The source and target accounts should be of the same Snowflake organization.
Step 1: View the List of Accounts in Your Organization
Use the following command to view the list of replica Snowflake accounts in your organization.
SHOW REPLICATION ACCOUNTS [ LIKE '<pattern>' ]
For example,
SHOW REPLICATION ACCOUNTS [ LIKE '<%myaccount%>' ]
Step 2: Promote a Local Database as a Primary Database
The following SQL command allows you to promote a local database as a primary database.
ALTER DATABASE mydatabase ENABLE REPLICATION TO ACCOUNTS org.account1, org.account2;
Step 3: Enable Failover Operation for a Primary Database
The following SQL command allows you to enable failover for a primary database so that a secondary database can be promoted to a new primary database.
ALTER DATABASE mydatabase ENABLE FAILOVER TO ACCOUNTS org.account2, org.account3;
Step 4: Create a Secondary Database
The following SQL command enables you to create a Snowflake replica of read-only secondary databases.
CREATE DATABASE mydatabase AS REPLICA OF org.account1.mydatabase;
Ensure the secondary database’s name is the same as the primary database.
Step 5: Refresh the Secondary Database
Execute a refresh command to replicate the changes from primary to secondary databases.
ALTER DATABASE mydatabase REFRESH;
Step 6: Monitor the Refresh Status
To check the refresh status of the secondary database, execute the following
DATABASE_REFRESH_PROGRESS( '<secondary_db_name>' )
Please note that only account administrators can execute the SQL commands.
Snowflake Database Replication Simplified by Hevo
Database replication across different Snowflake accounts will need Snowflake configuration, account admin access, organization admin access, source and target server access, and knowledge of SQL commands. However, there is an easy way to replicate databases in different environments without worrying about technical expertise, maintenance, and consistency.
Hevo is a no-code, real-time ELT data pipeline platform that cost-effectively automates your data integration process to create flexible solutions per your needs. Here are some of the key features of Hevo that will help you perform database replication in Snowflake in a better way:
- Data Transformation: Hevo Data allows you to transform your data for analysis with a simple Python-based drag-and-drop data transformation technique. You can leverage this feature to minimize the use of Snowflake’s compute resources for database replication.
- Automated Schema Mapping: Hevo Data automatically arranges the destination schema to match the incoming data, improving data quality by standardizing the data structure. This feature can regulate the utilization of storage and compute resources for database replication.
- Incremental Data Load: Hevo’s incremental data loading capabilities facilitate loading only new or updated data to the target Snowflake data warehouse. You can use it to eliminate the storage and querying of unnecessary data, reducing storage and computing resources of database replication.
By carrying out Snowflake database replication through the Hevo Data platform, you can ensure high availability, data distribution at different geographical locations, and disaster recovery.
Get Started with Hevo for Free
Best Practices of Database Replication in Snowflake
The best practices that can improve the overall Snowflake replication process include:
- Monitoring the refresh progress: The
DATABASE_REFRESH_PROGRESS
function can be used to monitor the current status of the database replication operation.
- Increase the statement timeout period: By default, a primary database replication can be completed in two days, but you can increase the statement timeout period using the following command:
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;
Please note that 604800 seconds is seven days. You must run this command before executing the ALTER DATABASE <em>secondary_db_name</em> REFRESH.
- Compare the primary and secondary databases: To maintain data consistency across the primary and secondary databases, use the HASH_AGG function and compare the hash values to detect discrepancies.
- Drop the databases: Secondary databases can be dropped using the
DROP DATABASE
command. To drop the primary database, you must first drop all the secondary databases for the primary ones and then drop the primary one.
- Manage database replication cost: Use the Snowsight and Classic Console web interfaces to view your account’s database replication costs. You can also use SQL queries to calculate the replication cost history by day and object.
Things to Avoid in Snowflake Database Replication
Snowflake database replication is a complex process that involves many steps in storing and managing the data. Here are some key points that you should avoid in Snowflake replication:
- You cannot replicate databases created from shares. If the database contains dynamic tables, event tables, or any external tables, the primary database refresh also does not work.
- The
CREATE DATABASE…AS REPLICA
command is incompatible with the ‘WITH TAG’ clause because secondary databases are essentially read-only. Also, replication operations fail if the primary database includes streams with unsupported source objects or any source object stream has been dropped.
Conclusion
You now clearly understand how Snowflake database replication works and how to implement it effectively for your organization. However, if you need a robust database replication platform that can save time and effort, schedule a demo of Hevo.
Interested in mastering Snowflake Change Data Capture? Read our comprehensive guide to discover how it can streamline your data operations and improve accuracy.
It is a more efficient and economical choice that offers 150+ data source connectors from which you can seamlessly integrate data into the destination of your choice.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also checkout our unbeatable pricing to choose the best plan for your organization.
Share your experience of Snowflake Replication in the comments section below!
FAQs
- What cannot be replicated in Snowflake?
You cannot replicate secrets in Snowflake. Replicating secrets through standard database replication may expose sensitive information and pose security risks. Secrets may include API keys, encryption keys, and service account credentials.
- Is cloning and replication the same in Snowflake?
Cloning and replication are two distinct features of Snowflake. Cloning creates a copy of the structure, schema, table, and file format using the underlying metadata to avoid physical duplication of data and save space. Replication involves creating and maintaining a copy of the database and its objects across different Snowflake accounts or regions.
- How do you handle replication for Business Critical or higher editions in Snowflake?
For Business Critical or higher editions, it’s important to note that these accounts have specific compliance requirements (e.g., HIPAA and HITRUST CSF). While these editions support replication, some accounts approved for replication might operate on lower editions and may have different compliance agreements. Therefore, ensuring the replication target meets the necessary compliance standards is crucial.
- What is a primary replication group and a secondary replication group in Snowflake?
- Primary Replication Group: This group manages the original, writable database objects to be replicated.
- Secondary Replication Group: This group contains the replicated, read-only database objects in the target account or region.
5.What is the role of data sharing in Snowflake replication?
Data sharing in Snowflake allows replicated data to be securely shared with other Snowflake accounts, facilitating collaboration and data utilization across different teams or organizations. You can create shares from the secondary (replicated) database and grant access to other accounts.
Sarthak is a skilled professional with over 2 years of hands-on experience in JDBC, MongoDB, REST API, and AWS. His expertise has been instrumental in driving Hevo's success, where he excels in adept problem-solving and superior issue management. Sarthak's technical proficiency and strategic approach have consistently contributed to optimizing operations and ensuring seamless performance, making him a vital asset to the team.