Amazon S3 is one of the most popular AWS services. It allows you to store and retrieve very large amounts of data in S3 buckets. Because Amazon S3 provides high scalability, security, and performance, data professionals widely use it to build effective Data Pipelines. The data stored in Amazon S3 buckets can also be processed, analyzed, and visualized to extract meaningful insights and reports, allowing companies to make data-driven decisions.
To analyze data stored in Amazon S3, developers can use external visualization tools, frameworks, and web interfaces. One such data visualization tool is Tableau, which allows you to analyze, interpret, and visualize data to make data-driven decisions.
In this article, you will learn about Amazon S3 and Tableau, and follow a step-by-step guide to setting up a Tableau S3 connection.
Prerequisites
- Fundamental knowledge of data analysis and visualization.
What is Amazon S3?
Amazon S3 (Simple Storage Service) is a low-latency, high-throughput object storage service that allows developers to store massive volumes of data. In other words, Amazon S3 is a virtually unlimited object storage space in which you can store any kind of data file, such as documents, mp3, mp4, applications, pictures, and more. You can access Amazon S3 through an easy-to-use Web Interface for configuring S3 buckets to store, organize, and manage various data files.
Amazon S3 is highly fault-tolerant since it automatically makes copies of data objects on multiple devices or servers across various clusters, ensuring the high availability of data. With versioning enabled on a bucket, you can preserve, retrieve, and restore previous versions of every object in that bucket, so you can easily recover when data is accidentally deleted by users or when an application fails.
Amazon S3 can also be connected with Third-Party software, such as data processing frameworks, to securely run queries on S3 data without moving it to a separate analytics platform. In addition to these features and capabilities, Amazon S3 charges you only for the storage space that you actually use, with no setup cost or minimum fee.
What is Tableau?
Developed by Pat Hanrahan, Christian Chabot, and Chris Stolte in 2003, Tableau is one of the most popular Data Visualization tools. It allows you to create attractive Charts, Graphs, Dashboards, and Reports from user-specified data. Tableau lets you create high-level graphs and dashboards just by dragging and dropping the necessary fields, parameters, or columns of datasets. It can also be connected with external data sources and third-party applications via drivers and connectors to provide accurate insights into various datasets.
The Tableau product suite includes four offerings: Tableau Desktop, Tableau Online, Tableau Server, and Tableau Reader. Each product is designed to let users integrate various Data Sources and create Data Visualizations that can be shared internally throughout the organization or publicly.
Methods to Connect Tableau S3
Method 1: Manually Connecting Tableau S3 using Athena JDBC Connector
To establish the connection between Tableau and S3, you have to satisfy four prerequisites:
- a pre-installed Tableau Server,
- an AWS account registered for Amazon S3,
- an active bucket in Amazon S3,
- and an access key ID and secret access key for an AWS IAM (Identity and Access Management) user.
Since you are about to establish a Tableau S3 connection via the Athena JDBC driver, make sure that you have installed the latest version of 64-bit Java. At least JDK 7.0 (Java 1.7) is required to make a proper Tableau S3 connection.
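One way to confirm the Java requirement is to parse the version string that `java -version` reports. The following is only a sketch: the helper names `java_major_version` and `meets_minimum` are hypothetical, and it assumes the two common version-string schemes (legacy `1.x`, where `1.7` means Java 7, and modern `x.y.z`).

```python
import re


def java_major_version(version_string: str) -> int:
    """Return the major Java version from a string such as '1.7.0_80' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized Java version: {version_string!r}")
    first = int(match.group(1))
    second = int(match.group(2)) if match.group(2) else 0
    # Legacy scheme: "1.7" means Java 7. Modern scheme: "11.0.2" means Java 11.
    return second if first == 1 else first


def meets_minimum(version_string: str, minimum: int = 7) -> bool:
    """Check the version against the minimum stated in this article (Java 7)."""
    return java_major_version(version_string) >= minimum
```

For example, `meets_minimum("1.6.0_45")` evaluates to `False`, while `meets_minimum("11.0.2")` evaluates to `True`.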
Step 1: Downloading Athena JDBC Driver
To download the Athena JDBC driver, visit the official Amazon Athena website, where you can find various Athena JDBC drivers in the form of JAR files. Download the driver that suits your JDK (Java Development Kit) version. Then, move the downloaded JAR file to the Tableau drivers folder for your operating system.
The appropriate JDBC storage locations according to the operating system are given below.
1. For Windows: C:\Program Files\Tableau\Drivers
2. For macOS: ~/Library/Tableau/Drivers
3. For Linux: /opt/tableau/tableau_driver/jdbc
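The locations above can be captured in a small lookup so that a setup script picks the right folder automatically. This is a minimal sketch; the names `TABLEAU_DRIVER_DIRS` and `tableau_driver_dir` are hypothetical, and the Windows path is written with its backslashes.

```python
import platform

# Driver folders as listed in this article, keyed by the value that
# platform.system() returns on each operating system.
TABLEAU_DRIVER_DIRS = {
    "Windows": r"C:\Program Files\Tableau\Drivers",
    "Darwin": "~/Library/Tableau/Drivers",  # macOS
    "Linux": "/opt/tableau/tableau_driver/jdbc",
}


def tableau_driver_dir(system=None):
    """Return the Tableau driver folder for the given (or current) OS."""
    system = system or platform.system()
    if system not in TABLEAU_DRIVER_DIRS:
        raise ValueError(f"unsupported operating system: {system!r}")
    return TABLEAU_DRIVER_DIRS[system]
```

Calling `tableau_driver_dir()` without arguments uses the operating system that the script is running on.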
Step 2: Setting up Athena
Before setting up and configuring Athena, you have to create a “student” table in CSV format that points to a student-db.csv file in the Amazon S3 bucket. You should also create a view named “student_view” on top of the student table created before. For creating a student table and student view easily, you can download the respective files from the GitHub repository.
- Now, open the Amazon S3 console and upload the student-db.csv in the bucket you created before.
- In the next step, create a “studentdb” database using the following DDL statement in your Athena Console.
CREATE DATABASE studentdb;
- After creating the database, execute the DDL statement given below to create a “student” table inside the “studentdb” database. You should also provide the name of your Amazon S3 bucket in the LOCATION parameter.
CREATE EXTERNAL TABLE student(
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
- Now, you have to create a view named “student_view.” When creating the view, you can limit it to the fields or columns required to build dashboards in Tableau.
- Execute the following command to create a View.
CREATE OR REPLACE VIEW student_view AS
- After creating the database, table, and views, you should now check whether the entities are created properly. You can check by executing SQL queries in the Athena console.
- Execute the following command to check whether the student_view is created correctly.
SELECT * FROM "studentdb"."student_view" LIMIT 10;
- After executing the above command, you will get the following output.
- From the above output, you can confirm that the view is successfully created since it only displays the columns that are mentioned inside the SQL view query.
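The Athena setup steps above can be sketched as a set of statements assembled in Python. This is not the article's original DDL, and several parts are assumptions: the column list is limited to the four fields that appear in the Tableau-generated queries later in this article (age, sex, studytime, country), their types and order are guesses about the CSV layout, the bucket name and prefix are placeholders, and the `skip.header.line.count` property assumes the file has a header row. The function name `build_athena_statements` is hypothetical.

```python
def build_athena_statements(bucket: str, prefix: str = "student/") -> dict:
    """Build the Athena statements for the studentdb example as plain strings."""
    if not bucket:
        raise ValueError("bucket name must not be empty")
    if not prefix.endswith("/"):
        prefix += "/"
    # Athena reads every file under this S3 prefix as table data.
    location = f"s3://{bucket}/{prefix}"

    create_table = f"""CREATE EXTERNAL TABLE IF NOT EXISTS studentdb.student (
  age INT,
  sex STRING,
  studytime INT,
  country STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{location}'
TBLPROPERTIES ('skip.header.line.count' = '1');"""

    # The view exposes only the columns needed for the dashboards.
    create_view = (
        "CREATE OR REPLACE VIEW studentdb.student_view AS\n"
        "SELECT age, sex, studytime, country\n"
        "FROM studentdb.student;"
    )

    return {
        "create_database": "CREATE DATABASE IF NOT EXISTS studentdb;",
        "create_table": create_table,
        "create_view": create_view,
        "verify": 'SELECT * FROM "studentdb"."student_view" LIMIT 10;',
    }


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder, not a real bucket.
    for name, sql in build_athena_statements("my-example-bucket").items():
        print(f"-- {name}\n{sql}\n")
```

Each statement would be run on its own in the Athena console, in the order in which the dictionary lists them.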
Step 3: Connecting Tableau S3 via Athena Connector
Using the Amazon Athena connector, you can connect Tableau to S3 data quickly and with little effort. After establishing the connection, you can perform data visualization operations on data present in Amazon S3 with drag-and-drop flexibility.
- Now, open the Tableau Desktop that was previously installed and configured on your local machine.
- Navigate to Connect > More, and search for Amazon Athena.
- Now, the Amazon Athena dialog box will appear, where you have to enter the connection configuration details.
- In the Server field, enter the endpoint in the athena.<region>.amazonaws.com format, where <region> is your AWS Region (for example, us-east-1), not an Availability Zone. Then, enter the port number in the Port field (443 by default).
- In the S3 Staging Directory field, enter the path to the Amazon S3 location where you wish to store query results.
- You can find the S3 Staging Directory path on the Settings page of the Athena Console, in the Query result location field.
- Then, fill in the Access Key ID and Secret Access Key fields with the values associated with your AWS IAM user.
- After filling in all the fields, click on the “Sign in” button.
- Now, you are directed to the data source pane of Tableau Desktop where you can see the previously created “student_view” and “student” tables.
- Drag and drop the “student_view” table from the left side panel to the workspace on the right side. Now, the respective table is ready for you to analyze and visualize using the Tableau Desktop.
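The connection details entered in the dialog box can be checked with a short helper before you type them in. The sketch below is illustrative: the names `AthenaConnection` and `build_connection` are hypothetical, port 443 (the standard HTTPS port) is assumed as the default, and the region check is a rough pattern for codes such as `us-east-1` that does not cover partitions with a different domain suffix.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AthenaConnection:
    server: str
    port: int
    s3_staging_dir: str


def build_connection(region: str, s3_staging_dir: str, port: int = 443) -> AthenaConnection:
    """Derive the Server value and validate the S3 Staging Directory."""
    # Region codes look like "us-east-1"; a trailing letter such as
    # "us-east-1a" would be rejected by this pattern.
    if not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d", region):
        raise ValueError(f"not an AWS Region code: {region!r}")
    # The staging directory must be an S3 URI with at least a bucket name.
    if not s3_staging_dir.startswith("s3://") or len(s3_staging_dir) <= len("s3://"):
        raise ValueError(f"not an S3 path: {s3_staging_dir!r}")
    return AthenaConnection(
        server=f"athena.{region}.amazonaws.com",
        port=port,
        s3_staging_dir=s3_staging_dir,
    )
```

For instance, `build_connection("us-east-1", "s3://my-example-bucket/athena-results/")` yields a server value of `athena.us-east-1.amazonaws.com`; the bucket name here is a placeholder.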
Step 4: Analyzing Amazon S3 data using Tableau
- You can create a new worksheet named “country-wise” to analyze the student data based on the country column or field, as shown below.
- Then, you can create another worksheet named “age-wise” to analyze students’ age using the bar chart.
- You can combine the previously created country-wise and age-wise worksheets into a single dashboard for easy visualization. In the Dashboard menu, choose the new dashboard option. Now, drag and drop the country-wise and age-wise worksheets from the left side panel.
- By following the above steps, you have successfully created a Tableau dashboard from data present in Amazon S3.
- You can also share the dashboard with your colleagues or anyone across the organization by publishing it. Before publishing, you must also decide how the Athena data sources used by the Tableau dashboard will be refreshed.
- There are two ways to keep the Athena data sources up to date. One is a Live connection, and the other is a Data extract.
- Tableau Live connections provide real-time updates, with any changes in the data source reflected in Tableau right away, while Data extracts are snapshots of data that are optimized for loading into system memory and can be retrieved rapidly for viewing. In complex visualizations with vast datasets, filters, and computations, Data extracts are likely to be significantly quicker than Live connections. You can choose between the two based on your use cases and preferences.
- With Tableau Desktop, you can also view the raw SQL query generated from the visualizations that were created previously.
- In the Athena Console, click on the “History” tab. There you can see the auto-generated SQL queries for all the visualizations you created using Tableau.
- The following queries were generated from the age-wise and country-wise visualizations, respectively.
SELECT "student_view"."age" AS "age",
"student_view"."sex" AS "sex",
SUM("student_view"."studytime") AS "sum:studytime:ok"
FROM "studentdb"."student_view" "student_view"
GROUP BY "student_view"."age",
"student_view"."sex"

SELECT "student_view"."country" AS "country",
SUM("student_view"."studytime") AS "sum:studytime:ok"
FROM "studentdb"."student_view" "student_view"
GROUP BY "student_view"."country"
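To make the generated queries concrete, the sketch below reproduces their aggregation (the sum of studytime per group) in plain Python. The sample rows are made up for illustration and do not come from student-db.csv; the function name `sum_studytime_by` is hypothetical.

```python
from collections import defaultdict


def sum_studytime_by(rows, keys):
    """Sum the studytime column per group, like SUM(...) with GROUP BY."""
    totals = defaultdict(int)
    for row in rows:
        # The group is identified by the tuple of its grouping-column values.
        group = tuple(row[key] for key in keys)
        totals[group] += row["studytime"]
    return dict(totals)


# Hypothetical sample data with the four columns used by the queries.
sample_rows = [
    {"age": 15, "sex": "F", "studytime": 2, "country": "India"},
    {"age": 15, "sex": "F", "studytime": 3, "country": "Brazil"},
    {"age": 16, "sex": "M", "studytime": 1, "country": "India"},
    {"age": 16, "sex": "M", "studytime": 4, "country": "Brazil"},
]

if __name__ == "__main__":
    print(sum_studytime_by(sample_rows, ["country"]))
    print(sum_studytime_by(sample_rows, ["age", "sex"]))
```

Grouping by `["country"]` mirrors the country-wise query, and grouping by `["age", "sex"]` mirrors the age-wise one.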
By following the steps above, you have successfully established a Tableau S3 connection.
Method 2: Using Hevo Data to Connect Tableau S3
Hevo is a No-code Data Pipeline. It supports pre-built data integrations from 100+ data sources, including Tableau. Hevo offers a fully managed, automated pipeline for setting up Tableau S3 integration and lets you load data directly from Amazon S3 to Tableau. It automates your data flow in minutes without requiring you to write a single line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo gives you an efficient and fully automated way to manage data in real-time and always have analysis-ready data in Tableau.
Hevo focuses on three simple steps to get you started:
- Connect: Connect Hevo with Stripe and various other payments, sales & marketing data sources by simply logging in with your credentials.
- Integrate: Consolidate your payments & customer data from several sources in Hevo’s Managed Data Integration Platform and automatically transform it into an analysis-ready form.
- Visualize: Connect Hevo with your desired BI tool such as Tableau and easily visualize your unified payments and sales data to gain better insights.
As can be seen, you are simply required to enter the corresponding credentials to implement this fully automated data pipeline without using any code.
Let’s look at some salient features of Hevo:
- Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
- Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
- Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
In this article, you learned about Amazon S3, Tableau, the steps to establish a Tableau S3 connection, and how to analyze S3 data using Tableau Desktop. This article mainly focused on integrating S3 and Tableau using the Athena JDBC connector or driver. However, you can also use third-party drivers or subscribe to online data pipelining platforms like Hevo to integrate Amazon S3 with Tableau.
Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources like Tableau and a wide variety of Desired Destinations, with a few clicks. With its strong integration with 100+ sources (including 40+ free sources), Hevo Data allows you to not only export data from your desired data sources and load it to the destination of your choice, but also transform and enrich your data to make it analysis-ready, so that you can focus on your key business needs and perform insightful analysis using BI tools.