Are you trying to access and query your Amazon Redshift data using Python? Well, you have landed on the right article. Amazon Redshift has become one most popular Data Warehousing and Analytics platforms for several businesses. After you have consolidated your data from all your sources into Redshift, you’ll need to analyze it to gain important business insights. You can easily access your data by setting up a Python Redshift connection.

Python is a popular open-source programming language that contains libraries to perform advanced statistical operations for data analysis. By setting up the Python Redshift connection you can query your data and visualize it by generating graphs & charts using the inbuilt Python libraries.

This article will walk you through the process of setting up a Python Redshift connector using 3 popular methods. It will also provide you with a brief overview of Python and Amazon Redshift in further sections.

Integrate your data within minutes!

With Hevo’s no-code data pipeline platform, you can seamlessly integrate data from multiple sources in minutes without any technical expertise.

Why Hevo?

  • Ensure real-time data analytics
  • Connect to over 150+ sources, including databases and SaaS applications.
  • Provides an auto-mapping feature for hassle-free schema mapping.

Explore various features of Hevo and experience why Hevo is rated as 4.3 on G2.Try out the 14-day free trial today to experience seamless data integration.

Get Started with Hevo for Free

What Prerequisites Do You Need for this?

  • An active AWS account.
  • Working knowledge of Python and SQL.
  • Working knowledge of Databases and Data Warehouses.
  • Clear idea regarding the type of data to be transferred.

What is Amazon Redshift?

Pyhton Redshift Connection - Amazon Redshift Logo | Hevo Data

Amazon Redshift is a fast, fully managed cloud Data Warehouse that makes it simple & cost-effective to analyze all of the data using standard SQL and existing Business Intelligence (BI) tools. The Redshift architecture is made up of a number of computing resources known as Nodes, which are then grouped into Clusters.

The major benefit of Redshift is its great scalability and quick query processing, which has made it one of the most popular Data Warehouses even today. By adding Nodes in just a few clicks, you can easily scale up your storage and processing performance needs using the AWS Console or Cluster APIs. It’s that simple.

Best Practices for High-Performance ETL to Redshift
Download Your Free EBook Now

What is Python?

Python is a language of Web Development, Data Engineering, Data Science, & Machine Learning. It is one of the most used programming languages nowadays. Python provides a ton of libraries that may be used in a range of fields, including Data Engineering, Data Science, Artificial Intelligence, and many more. Pandas, NumPy, and SciPy are some examples of prominent Python packages.

Python allows you to work more quickly and efficiently integrate systems. Python also offers interactive debugging and testing of code snippets, as well as access to all major commercial databases. Additionally, it has a large and growing worldwide community, with major tech giants including Google, Facebook, Netflix, and IBM relying on it.

Why Connect Python to Redshift?

  • Easily access your Amazon Redshift data using Python and execute queries for your business analysis.
  • After querying the data via the Python Redshift connection, employ the powerful statistical techniques available in Python to gain important insights from your Amazon Redshift Data.
  • Use Python libraries such as Numpy to perform advanced numerical computing.
  • With visually stunning & informative graphs, you can visualize your data using Python libraries such as pandas and matplotlib.

What are the Methods to Set Up Python Redshift Connection?

  • Method 1: Python Redshift Connection using psycopg Driver
  • Method 2: Python Redshift Connection using ODBC Driver
  • Method 3: Python Redshift Connector by AWS

Method 1: Python Redshift Connection using Python psycopg Driver

  • Step 1: You can install psycopg driver using the following command:- 
pip install psycopg2
  • Step 2: The next step in order to set up a Python Redshift connection is to set up proper configuration using standard database access points as shown below by the script of code. 
import psycopg2
conf = { 'dbname' : 'database_name', 
      'host' : 'redshoft_endpoint', 
      port= 'redshift_port', 
      user= 'usr_name', 
      password= 'user_password'}
  • Step 3: You can now wrap the above configuration into a function so that any errors might be handled properly. 
def create_conn(*args, **kwargs):
 
          config = kwargs['config']
          try:
              conn=psycopg2.connect(dbname = config['dbname'],
                                    host = config['host']
                                    port = config['port']
                                    user = config['user']
                                    password = config['password'])
            except Exception as err:
                print err.code, err
            return conn
  • Step 4: Finally, you need to finish the Python Redshift connection part & retrieve all the tables in the search path. This snippet will complete the Python Redshift connection process & get connected to the Redshift database. 
def select(*args,**kwargs):
    # need a connection with dbname=<username>_db
    cur = kwargs['cur']
 
    try:
        # retrieving all tables in search_path
        cur.execute("""select tablename""")
    except Exception as err:
            print err.code,err
 
    rows = cur.fetchall()
    for row in rows:
        print row
 
print 'start'
conn = create_conn(config=configuration)
cursor = conn.cursor()
print 'start select'
select(cur=cursor)
print 'finish'

Executing Queries on Redshift Data using psycopg

  • Step 1: You will have to use the cursor that was obtained in the last step of the Python Redshift connection by writing the following command:
cursor = conn.cursor()
  • Step 2: After getting the cursor in the ‘cur’ variable, you can use it to execute SQL queries to fetch data.  
cursor.execute("SELECT * FROM `table`;")
  • Step 3: After fetching the required data, you can use psycopg to fetch the data from the database. To fetch the complete data, you can simply run this command.  
cursor.execute("SELECT * FROM `table`;")

You can also try out the following Psycopg Update Query:  

cursor.execute("UPDATE Table_name SET price=%s WHERE id=%s", (uPrice, uId))
  • Step 4: In the end, finally, the cursor & Python Redshift connection can be closed by using the following commands.
cursor.close() 
conn.close()
Connect Google Analytics to Redshift
Connect MySQL to Redshift
Connect MS SQL Server to Redshift

Method 2: Python Redshift Connection using Python ODBC Driver

  • Step 1: ODBC Driver needs to get installed first & it needs configuration as well. Once you have ODBC Driver configured, you can use Python commands to run the Python Redshift connection.  
import pyodbc
cnxn = pyodbc.connect('DRIVER={ODBC Driver for Amazon Redshift};
Server=myserver;Database=mydatabase;User_ID=myuser_id;Password=my_password;Port=myport;String Types=Unicode')
  • Step 2: After this, you can run a simple query to get your Python Redshift connection tested. Below is an example of the insert query to insert a record in the table. 
cursor = cxn.cursor()
cursor.execute("INSERT INTO CITY (CNAME, CPLACE, CPOSTNO.) VALUES ('Karachi', 'Port Sea', 'JCX12')")

When talking about data analysis in Python, NumPy & Pandas are the most popular libraries.

  • Numpy: Python Library is used mainly for computing purposes. 
  • Pandas: A widely used Data Analysis library. Provides a high-performance data structure known as DataFrame.

Data Loading in Numpy

It’s a very simple process. Use the following commands. 

import numpy as np
data = np.array(cur.fetchall())

Data Loading into Pandas

Similarly, you can use Pandas as well for further data analysis but the data loading process is a bit different as compared to Numpy. 

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://fz:demo@redshift_host:<port_no>/mydatabase')
data_frame = pd.read_sql('SELECT * FROM `table`;', engine)

With this, you have successfully set up a Python Redshift connection.

Method 3: Python Redshift Connector by AWS

  • Step 1: To get started with the Python Redshift connection, firstly you need to install the package for the Python Connect to Redshift from the source by cloning this repository.
$ git clone https://github.com/aws/amazon-redshift-python-driver.git
$ cd redshift_connector
$ pip install .
  • Step 2: After the DB-API specification, you will notice that the autocommit is off by default. You can toggle it ON using the autocommit property of the Python Redshift Connection. While performing this operation, ensure that you are not in a transaction.
con.rollback()
con.autocommit = True
con.run("VACUUM")
con.autocommit = False
  • Step 3: You can configure the paramstyle of the cursor using cursor.paramstyle.Set to Format by default, you can use qmark, numeric, named, format, pyformat for paramstyle.  
# qmark
redshift_connector.paramstyle = 'qmark'
sql = 'insert into foo(bar, jar) VALUES(?, ?)'
cursor.execute(sql, (1, "hello world"))

# numeric
redshift_connector.paramstyle = 'numeric'
sql = 'insert into foo(bar, jar) VALUES(:1, :2)'
cursor.execute(sql, (1, "hello world"))

# named
redshift_connector.paramstyle = 'named'
sql = 'insert into foo(bar, jar) VALUES(:p1, :p2)'
cursor.execute(sql, {"p1":1, "p2":"hello world"})

# format
redshift_connector.paramstyle = 'format'
sql = 'insert into foo(bar, jar) VALUES(%s, %s)'
cursor.execute(sql, (1, "hello world"))

# pyformat
redshift_connector.paramstyle = 'pyformat'
sql = 'insert into foo(bar, jar) VALUES(%(bar)s, %(jar)s)'
cursor.execute(sql, {"bar": 1, "jar": "hello world"})

You can consider the following basic example for setting the Python Redshift connection using the Redshift connector.

import redshift_connector

# Connects to Redshift cluster using AWS credentials
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='mydatabase',
    user='AmazonUser',
    password='new_password'
 )

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table employee(firstname varchar,lastname varchar)")
cursor.executemany("insert into employee (firstname, lastname) values (%s, %s)",
                    [
                        ('John', 'Smith'),
                        ('Mike', 'Tatum')
                    ]
                  )
cursor.execute("select * from employee")

result: tuple = cursor.fetchall()
print(result)
>> (['John', 'Smith'], ['Mike', 'Tatum'])

Conclusion

This article provides an introduction to Python and Amazon Redshift. Moreover, you learned about 3 methods that you can use to connect to Redshift using Python. Setting up Python Redshift Integration can help you to access and query your Amazon Redshift data with ease.

Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. You can leverage Hevo to seamlessly transfer data from various sources to Redshift in real-time.Try a 14-day free trial to explore all features, and check out our unbeatable pricing for the best plan for your needs.

Share your experience of setting up a Python Redshift connection in the comments section below!

Frequently Asked Questions

1. What languages are supported by Redshift?

Amazon Redshift is mostly using SQL for query purposes. You can even use Python and R to load and transform data, especially with AWS Lambda. Redshift Spectrum enables the ability to query data in Amazon S3 using standard SQL as well.

2. Which tool is used to connect to Redshift database?

You can connect an Amazon Redshift database to SQL Workbench/J, DBeaver, and Aginity Pro. Other BI tools that work with Redshift include Tableau and Power BI. AWS offers the Redshift Query Editor, which you can directly access through the AWS Management Console.

3. Is Redshift an ETL tool?

No, Amazon Redshift is not ETL. It is actually a data-warehousing service mainly intended for data storage and analytics. It can be integrated with AWS Glue or Talend to extract data, transform it, and load it into Redshift for analysis.

Muhammad Faraz
Technical Content Writer, Hevo Data

Muhammad Faraz is an AI/ML and MLOps expert with extensive experience in cloud platforms and new technologies. With a Master's degree in Data Science, he excels in data science, machine learning, DevOps, and tech management. As an AI/ML and tech project manager, he leads projects in machine learning and IoT, contributing extensively researched technical content to solve complex problems.