Data and Visualization are becoming an essential aspect of today’s digital world. Businesses often use data to interpret the current market scenario and make decisions based on that. Visualization helps companies analyze data in charts and graphs and allows them to make data-driven decisions. This blog post will discuss Tableau and connect via Python to perform various business transformations.
In this article, you will be provided with an introduction to Tableau, Python, and the procedure to integrate Python scripts with Tableau using TabPy.
What is Tableau?
Tableau is one of the leading visualization tools available in the market, with several in-house connectors that help users connect to various data sources. Users can create visual masterpieces with a perfect blend of graphical elements like charts, tables, colors, and labels to help the business make a market or data-driven decisions with a vast collection of functions.
Tableau is available as different products, and users can use it based on their requirements. The various product offerings are:
- Tableau Desktop
- Tableau Reader
- Tableau Public
- Tableau Online
- Tableau Server
Key Features of Tableau
- Tableau provides extensive features in Dashboard to perform Advanced Analytics on the data and allows users to create a visual masterpiece.
- With Tableau, users can collaborate sheets among colleagues and other team members to review the designs.
- Tableau supports real-time data as well as batch data with robust in-memory computation.
- Tableau has over 200+ connectors available in its library, which can connect to any relational and non-relational databases, CSV files, Excel, Hive, Snowflake, etc.
- Tableau allows users to develop advanced charts and graphs to create high-quality visuals.
What is Python?
Python is a high-level, interpreted, and dynamically typed programming language. It has a built-in data structure that allows users to dynamically type the code, thereby making it very attractive for Rapid Application Development. Python is platform-independent and can be used without change in major platforms.
Python has vast libraries to connect various tools, and hence it is becoming popular in automation and data science projects. One such module that we will be discussing here is TabPy which can be used to connect Python with Tableau.
Key Features of Python
- Python is easy to type and is very similar to typing in the English language with some mathematical semantics.
- Python doesn’t use braces to club the statement; instead, it uses indentation to define scope, and club statements for loops, functions, and classes.
- Python uses newlines to complete a command instead of other programming languages, which often use semicolons or parentheses.
- Python is platform-independent, which means code written on windows can run on a Unix machine without any effect.
- Python allows developers to write code in fewer lines as compared to other programming languages.
With the help of the TabPy module, Python can easily be connected to Tableau as an external service. It allows you to perform actions like row numbers, ranking, cleaning operations, and many more.
Pre-Requisites
- Download and install Python version 3.6 or later.
- Download and install the Tableau Python server.
- Install Pandas. Pandas data frame can be used to integrate with Tableau Prep Builder.
What is TabPy?
TabPy, an Analytics Extension from Tableau, allows users to execute Python scripts and saved functions using Tableau. With the integration of Python in Tableau, you can incorporate your Python Scripts and use Tableau to visualize the results from your analysis. TabPy utilizes a large number of Machine Learning libraries and allows you to perform customized analyses using Python.
You can further control the data you send to TabPy by working with Tableau Worksheet, or Dashboard. With this integration, you get the ability to customize dashboards, and also leverage the Analytics Capability of Tableau to get a better understanding of the project you’re working on.
Data Preparation is an important aspect of Data Analysis, and Data Scientists tend to spend a lot of their time doing that. This integration automates the Data Exploration and Data Visualization part for the developers to focus more on the Data Science logic.
Installing and Running TabPy
To use Python in Tableau, you need to install TabPy. You can download TabPy from the GitHub Repository Once the download is finished, run the command line to install TabPy by using pip:
pip install tabpy
Once the installation is successful, you can run TabPy using the following command:
tabpy
Now, you need to go to Tableau to configure the connection. Go to the “Help” menu from the navigation bar and choose “Settings and Performance“. Select “Manage Analytics Extension Connection“.
In the pop-up window, set up the Server and Port. Select “TabPy/External API” as the Analytics Extension. By default, TabPy will be running in your localhost on port 9004. Once done, click on “Test Connection“, and Tableau will display a “Successfully connected to the analytics extension.” message.
How to Use TabPy?
After Tableau is linked to TabPy, you can now execute different operations on your dataset using Python Script. To do so, you need to do the following:
- Import/load your data set into a new Worksheet in Tableau.
- Create a Calculated Field in Tableau.
- Insert the Python Script into the Calculated Field.
Passing Expressions to PythonThe expressions in Tableau must be passed through any one of the following four functions: SCRIPT_BOOL, SCRIPT_REAL, SCRIPT_INT, and SCRIPT_STR, to go to Python.
Procedure to Integrate Python Scripts with Tableau
To integrate Python scripts in the flow, you need to create a connection between Tableau and TabPy server. Once the connection is established, you can use Python scripts to apply functions to the data using pandas data frame.
You have to specify connection details to securely pass the data to the TabPy server, apply the transformation, and return the data in the data frame format.
Part 1: Connect Tableau Python Server & Tableau Server
Connecting Tableau Python Server and Tableau server is a straightforward process. Use the below instructions to set up the connection.
Step 1: Launch the shell to execute the below commands and enter the below commands:
tsm security maestro-tabpy-ssl enable --connection-type {maestro-tabpy/maestro-tabpy-secure} --tabpy-host <IP address or host name for TabPy> --tabpy-port <TabPy port> --tabpy-username <username> --tabpy-password <password> --tabpy-connect-timeout-ms <TabPy connect timeout>
Step 2: For secure connection use – {maestro-tabpy-secure}
and for unsecure connection use – {maestro-tabpy}
.
Step 3: You have to specify the certificate path by using -cf<certificate file path>
if you selected secure mode.
Step 4: Specify the timeout setting in the variable --tabpy-connect-timeout-ms <TabPy connect timeout>
in ms.
Step 5: To disconnect TabPy connection use the command - tsm security maestro-tabpy-ssl disable
.
Hevo Data, a No-code Data Pipeline helps to transfer data from 150+ sources to a Data Warehouse/Destination of your choice and visualize it in your desired BI tool such as Tableau. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using a BI tool of your choice.
Check out what makes Hevo amazing:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Simplify your data analysis with Hevo today! Sign up here for a 14-day free trial!
Part 2: Create Python Script to Connect Tableau Server
Below is the snippet of the function that connects to Tableau Prep Builder and calls the data. The process takes a data frame as input and outputs data frames.
def encode(input):
le = preproces2ing.LabelEncoder()
return pd.DataFrame({
'Opportunity Number' : input['Opportunity Number'],
'Supplies Subgroup Encoded' : le.fit_transform(input['Supplies Subgroup']),
'Region Encoded' : le.fit_transform(input['Region']),
'Route To Market Encoded' : le.fit_transform(input['Route To Market']),
'Opportunity Result Encoded' : le.fit_transform(input['Opportunity Result']),
'Competitor Type Encoded' : le.fit_transform(input['Competitor Type']),
'Supplies Group Encoded' : le.fit_transform(input['Supplies Group']),
})
The following data types are supported:
Data type in Tableau Prep Builder | Data type in Python |
String | Standard UTF-8 string |
Decimal | Double |
Int | Integer |
Bool | Boolean |
Date | Date in the format “YYYY-MM-DD” |
DateTime | Date time in the format “YYYY-MM-DDT:HH:mm:ss” with zone offset (optional) |
If you wish to return different fields that you get in input, a function named get_output_schema needs to be implemented to define output data types.
Following syntax can be used to specify the data types:
Function in Python | Resulting data type |
prep_string () | String |
prep_decimal () | Decimal |
prep_int () | Integer |
prep_bool () | Boolean |
prep_date () | Date |
prep_datetime () | DateTime |
Example for get_output_schema function:
def get_output_schema():
return pd.DataFrame({
'Opportunity Number' : prep_int(),
'Supplies Subgroup Encoded' : prep_int(),
'Region Encoded' : prep_int(),
'Route To Market Encoded' : prep_int (),
'Opportunity Result Encoded' : prep_int (),
'Competitor Type Encoded' : prep_int()
'Supplies Group Encoded' : prep_int()
})
Part 3: Connect Tableau Python (TabPy) Server
Step 1: To connect to the TabPy server, launch Tableau Python (TabPy) server.
Step 2: Click on Help > Settings and Performance > Manage Analytics Extension Connection.
Step 3: In the drop-down list, Select an Analytics Extension
drop-down list, select Tableau Python (TabPy) Server
.
Step 4: Provide the credentials as below:
- The default port would be 9004.
- Provide the username and password.
- Specify the SSL certification if your server uses it.
- On clicking Sign In, it will test the provided credentials, and if the credentials are valid, it will successfully sign in. Otherwise, it will throw an error.
Part 4: Add a Script to Your Flow
Step 1: TabPy requires an extension named tornado 5.1.1 version.
Step 2: Run the below command to check the installed tornado version and install the correct version:
pip list
//If the installed version is not 5.1.1, then run the following command to uninstall the package.
pip uninstall tornado
pip install tornado=5.1.1
Step 3: Start your TabPy server and open the Tableau prep builder.
Step 4: Click on the Add connection button.
Step 5: From the Home page, click on Create > Flow to connect to data.
Step 6: Select the file type or server from the list of connectors.
Step 7: Click the plus icon, and select Add Script from the context menu
Step 8: In the connection type section, select Tableau Python (TabPy) Server.
Step 9: In the File Name section, click Browse to select the script file.
Step 10: Enter the Function Name; then press Enter to run your Script.
That’s it; now you have connected Python with Tableau Server using TabPy.
Conclusion
In this blog post, we have discussed how to connect Tableau using Python (TabPy) to perform various business transformations.
Integrating and analyzing data from a huge set of diverse sources can be challenging, this is where Hevo comes into the picture. Hevo Data, a No-code Data Pipeline helps you transfer data from a source of your choice in a fully automated and secure manner without having to write the code repeatedly. Hevo with its strong integration with 150+ sources & BI tools, allows you to not only export & load Data but also transform & enrich your Data & make it analysis-ready in a jiffy.
Get started with Hevo today! Sign up here for a 14-day free trial!
Vishal Agarwal is a Data Engineer with 10+ years of experience in the data field. He has designed scalable and efficient data solutions, and his expertise lies in AWS, Azure, Spark, GCP, SQL, Python, and other related technologies. By combining his passion for writing and the knowledge he has acquired over the years, he wishes to help data practitioners solve the day-to-day challenges they face in data engineering. In his article, Vishal applies his analytical thinking and problem-solving approaches to untangle the intricacies of data integration and analysis.