Databricks CREATE TABLE Command: 3 Comprehensive Aspects

on Data Integration, Data Storage, Databricks, Tutorials • November 8th, 2021 • Write for Hevo

Databricks is one of the most popular Cloud platforms used for data storage. Today, most organizations use it to store their Big Data. This makes it an important platform especially at a time when business systems, applications, and devices are generating huge volumes of data. Databricks also supports many features related to Data Analytics. It comes with Machine Learning features that enable you to create and train Machine Learning models from your data. This facilitates Data Analytics and the extraction of insights from data for decision-making. 

Databricks store data in Databases, and a single Database contains numerous tables. Databricks table stores structured data and it is easy to insert new data into a Table and Query it for viewing. This article will introduce you to Databricks and will discuss the Databricks CREATE TABLE command in detail. Read along to learn the syntax and examples associated with the Databricks CREATE TABLE command!

Table of Contents

Prerequisites

  • A Databricks account. 
  • Understanding of SQL.
  • Working knowledge of Databases.

Introduction to Databricks

Databricks CREATE TABLE:  Databricks Logo
Image Source

Databricks is a Cloud-based Data Engineering platform used for processing and transforming huge quantities of data and exploring data using Machine Learning models. It was recently added to Azure, making it the latest Big Data processing tool for Microsoft Cloud. Databricks allow organizations to achieve the full potential of merging their data, Machine Learning, and ELT processes. 

Databricks uses a distributed system, meaning that it automatically divides workloads across different processors and scales up and down depending on demand. It is also an efficient platform, which helps you to save time and costs when doing massive tasks. 

Key Features of Databricks

The following features make Databricks a popular Data Storage option in the market:

  • Data Compression: Databricks uses the unified Spark engine to compress data at large scales. It supports Data Streaming, SQL queries, and Machine Learning. Moreover, it simplifies the task of managing such processes and makes it developer-friendly.
  • Collaboration: Databricks allows stakeholders working in different programming languages ​​such as Python, SQL, R, and Scala to collaborate. Moreover, the interactivity of the platform makes it easier for users to view point-and-click and script options such as D3, and matplotlib.
  • High Security: Databricks uses layered data security that offers options to regulate access for identity management, fine-grained controls, Data Encryption, and compliance standards.

To learn more about Databricks, visit here

Introduction to Databricks CREATE TABLE Statement

The Databricks CREATE TABLE statement is used to define a table in an existing Database. You can use the statement in the following three ways to create tables for different purposes:

  1. CREATE TABLE [USING]: This syntax should be used when:
  • The table will be based on a column definition that you will provide. 
  • The table will be based on data stored in a particular storage location.
  • The table will be derived from a query. 
  1. CREATE TABLE LIKE: Use this syntax when you need to create a table based on a definition rather than another table’s data. 
  2. CREATE TABLE CLONE: In Databricks, table cloning helps you to achieve two major goals:
  • Create a complete and independent copy of a table including its data and definition in a particular version. It is known as a DEEP CLONE. 
  • Create a copy of the table definition which refers to the original storage of the table for the initial data at a particular version. Updates made to the new or source table won’t affect the other. However, the new table will be depending on the existence of the source table and column definition. 

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to Load Data from any data source such as Databases, SaaS applications, Cloud Storage, SDK,s, its and Streaming Services to destinations like Databricks, Data Warehouses, etc., and simplifies the ETL process. It supports 100+ data sources and loads the data onto the desired Data Warehouse, enriches the data, and transforms it into an analysis-ready form without writing a single line of code.

Get Started with Hevo for Free

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Working with Databricks CREATE TABLE Command

Databricks CREATE TABLE:  Example of Databricks Table.
Image Source

You can understand the Databricks CREATE TABLE command by studying its following 2 aspects:

1) Syntax of Databricks CREATE TABLE Command

The Databricks CREATE TABLE statement takes the following syntax:

{ { [CREATE OR REPLACE TABLE] | CREATE TABLE [ IF NOT EXISTS ] }
  tableName
  [ column_definition ] [ USING data_source ]
  [ table_clauses ]
  [ AS query ] }

column_definition
  ( { column_name data_type [ NOT NULL ]
      [ GENERATED ALWAYS AS ( expres ) ] [ COMMENT colComment ] } [, ...] )

table_clauses
  { OPTIONS ( { option_key [ = ] option_value } [, ...] ) |
    PARTITIONED BY clause |
    clustered_by_clause |
    LOCATION path |
    COMMENT tableComment |
    TBLPROPERTIES ( { property_key [ = ] property_value } [, ...] ) } [...]

clustered_by_clause
  { CLUSTERED BY ( cluster_column [, ...] )
    [ SORTED BY ( { sortColumn [ ASC | DESC ] } [, ...] ) ]
    INTO num_buckets BUCKETS }

The following parameters have been used:

  • REPLACE: If used, it replaces the table (as well as its contents) if it already exists. 
  • IF NOT EXISTS: If there is a table with the specified name, the statement will be ignored. 
  • tableName: The name of the table to be created. 
  • column_definition: The names of column names, their data types, properties, and descriptions. 
  • column_name: A unique name for a column. 
  • data_type: The data type of the column. 
  • NOT NULL: Used when the column should not accept null values. 
  • GENERATED ALWAYS AS (express): The specified expres will determine the value of the column. 
  • COMMENT colComment: A string that describes the table.
  • USING data_source: The file format to be used for the table. It must be TEXT, CSV, AVRO, JSON, PARQUET, ORC, JDBC, or DELTA.
  • table_clauses: Clauses that describe the location, clustering, partitioning, comments, options, and user-defined properties for the table. 

2) Examples of Databricks CREATE TABLE Command

Consider the following Databricks CREATE TABLE examples:

The following Databricks CREATE TABLE statement will create a delta table:

> CREATE TABLE students (admission INT, name STRING, age INT);

The query will create a table named students with three columns namely admission, name, and age

Let’s create a new table using data from another table:

> CREATE TABLE students2 AS SELECT * FROM students;

The query will create a table named students2 which is a copy of the students table, that is, the 2 will have same data. 

Let’s create a CSV table:

> CREATE TABLE students USING CSV LOCATION '/mnt/files';

The following Databricks CREATE TABLE command shows how to create a table and specify a comment and properties:

> CREATE TABLE students (admission INT, name STRING, age INT)
    COMMENT 'A table comment'
    TBLPROPERTIES ('foo'='bar');

You can also change the order of the comment and the properties:

> CREATE TABLE students (admission INT, name STRING, age INT)
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'A table comment';

Let’s create a table and partition it by a column:

> CREATE TABLE students (admission INT, name STRING, age INT)
    PARTITIONED BY (age);

The partition has been created using the column named age

Let’s also create a table that has a generated column:

> CREATE TABLE squares(x INT, y INT,
                          area INT GENERATED ALWAYS AS (x * y));

The values of the area column will be the result of the multiplication of the other two columns. You can also create a table based on the metadata and definition of an already existing table. Use the LIKE clause for this as shown below:

> CREATE TABLE Students3 LIKE Students LOCATION '/mnt/my_files';

Conclusion

This blog introduced Databricks and explained its CREATE TABLE command. It further provided the syntax that you can follow to create your tables in Databricks. Furthermore, it also discussed the examples showing the practical application of the Databricks CREATE TABLE command.

Visit our Website to Explore Hevo

You may want to use Databricks as your primary Data Storage option. This will require you to transfer data from your different sources to your Databricks account using complex ETL processes. Hevo Data will automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ multiple sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your understanding of the Databricks CREATE TABLE Command in the comments below!

No Code Data Pipeline for Databricks