Databricks is one of the most popular Cloud platforms used for data storage. Today, most organizations use it to store their Big Data. This makes it an important platform especially at a time when business systems, applications, and devices are generating huge volumes of data. Databricks also supports many features related to Data Analytics. It comes with Machine Learning features that enable you to create and train Machine Learning models from your data. This facilitates Data Analytics and the extraction of insights from data for decision-making. 

Databricks store data in Databases, and a single Database contains numerous tables. Databricks table stores structured data and it is easy to insert new data into a Table and Query it for viewing. This article will introduce you to Databricks and will discuss the Databricks CREATE TABLE command in detail. Read along to learn the syntax and examples associated with the Databricks CREATE TABLE command!

Prerequisites

  • A Databricks account. 
  • Understanding of SQL.
  • Working knowledge of Databases.

Introduction to Databricks

Databricks CREATE TABLE: Databricks Logo
Image Source

Databricks is a Cloud-based Data Engineering platform used for processing and transforming huge quantities of data and exploring data using Machine Learning models. It was recently added to Azure, making it the latest Big Data processing tool for Microsoft Cloud. Databricks allow organizations to achieve the full potential of merging their data, Machine Learning, and ELT processes. 

Databricks uses a distributed system, meaning that it automatically divides workloads across different processors and scales up and down depending on demand. It is also an efficient platform, which helps you to save time and costs when doing massive tasks. 

Key Features of Databricks

The following features make Databricks a popular Data Storage option in the market:

  • Data Compression: Databricks uses the unified Spark engine to compress data at large scales. It supports Data Streaming, SQL queries, and Machine Learning. Moreover, it simplifies the task of managing such processes and makes it developer-friendly.
  • Collaboration: Databricks allows stakeholders working in different programming languages ​​such as Python, SQL, R, and Scala to collaborate. Moreover, the interactivity of the platform makes it easier for users to view point-and-click and script options such as D3, and matplotlib.
  • High Security: Databricks uses layered data security that offers options to regulate access for identity management, fine-grained controls, Data Encryption, and compliance standards.

Introduction to Databricks CREATE TABLE Statement

The Databricks CREATE TABLE statement is used to define a table in an existing Database. You can use the statement in the following three ways to create tables for different purposes:

  1. CREATE TABLE [USING]: This syntax should be used when:
  • The table will be based on a column definition that you will provide. 
  • The table will be based on data stored in a particular storage location.
  • The table will be derived from a query. 
  1. CREATE TABLE LIKE: Use this syntax when you need to create a table based on a definition rather than another table’s data. 
  2. CREATE TABLE CLONE: In Databricks, table cloning helps you to achieve two major goals:
  • Create a complete and independent copy of a table including its data and definition in a particular version. It is known as a DEEP CLONE. 
  • Create a copy of the table definition which refers to the original storage of the table for the initial data at a particular version. Updates made to the new or source table won’t affect the other. However, the new table will be depending on the existence of the source table and column definition. 

Working with Databricks CREATE TABLE Command

Databricks CREATE TABLE: Example of Databricks Table.
Image Source

You can understand the Databricks CREATE TABLE command by studying its following 2 aspects:

1) Syntax of Databricks CREATE TABLE Command

The Databricks CREATE TABLE statement takes the following syntax:

{ { [CREATE OR REPLACE TABLE] | CREATE TABLE [ IF NOT EXISTS ] }
  tableName
  [ column_definition ] [ USING data_source ]
  [ table_clauses ]
  [ AS query ] }

column_definition
  ( { column_name data_type [ NOT NULL ]
      [ GENERATED ALWAYS AS ( expres ) ] [ COMMENT colComment ] } [, ...] )

table_clauses
  { OPTIONS ( { option_key [ = ] option_value } [, ...] ) |
    PARTITIONED BY clause |
    clustered_by_clause |
    LOCATION path |
    COMMENT tableComment |
    TBLPROPERTIES ( { property_key [ = ] property_value } [, ...] ) } [...]

clustered_by_clause
  { CLUSTERED BY ( cluster_column [, ...] )
    [ SORTED BY ( { sortColumn [ ASC | DESC ] } [, ...] ) ]
    INTO num_buckets BUCKETS }

The following parameters have been used:

  • REPLACE: If used, it replaces the table (as well as its contents) if it already exists. 
  • IF NOT EXISTS: If there is a table with the specified name, the statement will be ignored. 
  • tableName: The name of the table to be created. 
  • column_definition: The names of column names, their data types, properties, and descriptions. 
  • column_name: A unique name for a column. 
  • data_type: The data type of the column. 
  • NOT NULL: Used when the column should not accept null values. 
  • GENERATED ALWAYS AS (express): The specified expres will determine the value of the column. 
  • COMMENT colComment: A string that describes the table.
  • USING data_source: The file format to be used for the table. It must be TEXT, CSV, AVRO, JSON, PARQUET, ORC, JDBC, or DELTA.
  • table_clauses: Clauses that describe the location, clustering, partitioning, comments, options, and user-defined properties for the table. 

2) Examples of Databricks CREATE TABLE Command

Consider the following Databricks CREATE TABLE examples:

The following Databricks CREATE TABLE statement will create a delta table:

> CREATE TABLE students (admission INT, name STRING, age INT);

The query will create a table named students with three columns namely admission, name, and age

Let’s create a new table using data from another table:

> CREATE TABLE students2 AS SELECT * FROM students;

The query will create a table named students2 which is a copy of the students table, that is, the 2 will have same data. 

Let’s create a CSV table:

> CREATE TABLE students USING CSV LOCATION '/mnt/files';

The following Databricks CREATE TABLE command shows how to create a table and specify a comment and properties:

> CREATE TABLE students (admission INT, name STRING, age INT)
    COMMENT 'A table comment'
    TBLPROPERTIES ('foo'='bar');

You can also change the order of the comment and the properties:

> CREATE TABLE students (admission INT, name STRING, age INT)
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'A table comment';

Let’s create a table and partition it by a column:

> CREATE TABLE students (admission INT, name STRING, age INT)
    PARTITIONED BY (age);

The partition has been created using the column named age

Let’s also create a table that has a generated column:

> CREATE TABLE squares(x INT, y INT,
                          area INT GENERATED ALWAYS AS (x * y));

The values of the area column will be the result of the multiplication of the other two columns. You can also create a table based on the metadata and definition of an already existing table. Use the LIKE clause for this as shown below:

> CREATE TABLE Students3 LIKE Students LOCATION '/mnt/my_files';

Conclusion

This blog introduced Databricks and explained its CREATE TABLE command. It further provided the syntax that you can follow to create your tables in Databricks. Furthermore, it also discussed the examples showing the practical application of the Databricks CREATE TABLE command.

Visit our Website to Explore Hevo

You may want to use Databricks as your primary Data Storage option. This will require you to transfer data from your different sources to your Databricks account using complex ETL processes. Hevo Data will automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 150+ multiple sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your understanding of the Databricks CREATE TABLE Command in the comments below!

Nicholas Samuel
Technical Content Writer, Hevo Data

Nicholas Samuel is a technical writing specialist with a passion for data, having more than 14+ years of experience in the field. With his skills in data analysis, data visualization, and business intelligence, he has delivered over 200 blogs. In his early years as a systems software developer at Airtel Kenya, he developed applications, using Java, Android platform, and web applications with PHP. He also performed Oracle database backups, recovery operations, and performance tuning. Nicholas was also involved in projects that demanded in-depth knowledge of Unix system administration, specifically with HP-UX servers. Through his writing, he intends to share the hands-on experience he gained to make the lives of data practitioners better.

No Code Data Pipeline for Databricks