Python’s flexibility and dynamic capabilities make it the ideal choice for deployment and maintenance. Likewise, MS Excel is also termed a traditional BI (Business Intelligence) tool. By setting up Python Excel Automation, you can transfer data automatically from various sources to multiple destinations seamlessly.

This article provides a step-by-step guide to help you automate Excel with Python. It also gives you an overview of Python and Excel to understand these technologies better individually. Read along for more!

Steps to Set Up Python Excel Automation

You can follow the below steps to set Python Excel Automation up:

Step 1: Analyzing the Excel Dataset

The first step in Python Excel Automation is to analyze the Dataset. The Dataset used in this tutorial is a Sales Dataset. This Dataset is also available on Kaggle. As this Dataset is in .csv format, you need to change it to .xslx format. The data will be used to create the below report after setting up our Python Excel Automation.

Final Report
Image Source: towardsdatascience.com

Step 2: Making Pivot Tables using Pandas

The next step in Python Excel Automation is to design Pivot Tables. Before doing that, you need to import the following libraries:

import pandas as pd
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.chart import BarChart, Reference
import string

Pandas is used to read the Excel file, create the Pivot table and export it to Excel. You can then use the Openpyxl library in Python to write Excel formulas, make charts and spreadsheets in Python.

To read your Excel file, make sure that the file is in the same place where your Python script is located and run the following code in Excel:

excel_file = pd.read_excel('supermarket_sales.xlsx')
excel_file[['Gender', 'Product line', 'Total']]
  • The code uses the pandas library to read an Excel file named supermarket_sales.xlsx.
  • pd.read_excel() loads the data from the specified Excel file into a DataFrame called excel_file.
  • The expression excel_file[['Gender', 'Product line', 'Total']] selects specific columns from the DataFrame: Gender, Product line, and Total.
  • This filtered DataFrame will display only the selected columns, allowing for easier analysis of sales data related to gender and product lines.
  • This code is useful for examining relevant parts of the dataset without displaying all the information at once.

To create the pivot table, you need to access the excel_file data frame that you created previously. You can use the “.pivot_table()” to create the table. If you want to create a pivot table to show the total money split up for males and females you can run the below code:

report_table = excel_file.pivot_table(index='Gender',columns='Product line',values='Total',aggfunc='sum').round(0)
  • The code creates a pivot table from the excel_file DataFrame, summarizing sales data by Gender and Product line.
  • The index='Gender' argument specifies that the rows of the pivot table will be categorized by gender.
  • The columns='Product line' argument indicates that the columns of the pivot table will represent different product lines.
  • The values='Total' argument specifies that the data being aggregated in the table is the Total sales amount.
  • The aggfunc='sum' argument tells the pivot table to calculate the total sales for each combination of gender and product line, and round(0) rounds the resulting values to the nearest whole number.

Finally, to export the Pivot Table, we will use the “.to_excel()” method as shown below:

report_table.to_excel('report_2021.xlsx',sheet_name='Report',startrow=4)

The Excel folder is now exported in the same location as your Python scripts.

Automate Data Integration

With Hevo you can integrate data from 150+ sources to any destination of your choice effortlessly. Automating Excel with Python can be a tedious task, instead, you can move your data somewhere safe with zero maintenance.

Get Started with Hevo for Free

Step 3: Designing the Reports using Openpyxl

The next step in Python Excel Automation is to design the reports. To make the report, you need to use the “load_workbook” method, that is imported from Openpyxl and save it using the “.save()” method. This is shown below:

wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
# cell references (original spreadsheet)
min_column = wb.active.min_column
max_column = wb.active.max_column
min_row = wb.active.min_row
max_row = wb.active.max_row
  • The code uses the load_workbook function to open an Excel file named report_2021.xlsx.
  • The variable sheet references a specific sheet called Report within the loaded workbook.
  • The min_column variable stores the index of the first column that contains data in the active sheet.
  • The max_column variable stores the index of the last column that contains data in the active sheet.
  • The min_row and max_row variables store the indices of the first and last rows that contain data, respectively, in the active sheet.

Python Excel Automation allows you to create Excel charts using Pivot Tables. To create an Excel chart using a Pivot Table, you need to use the Barchart Module and to identify the position of the data and category values, you can use the Reference Module. Both of these were imported before in Step 1. You can write Excel-based formulas in Python, the same way you write them in Excel. An example of this is shown below:

sheet['B7'] = '=SUM(B5:B6)'
sheet['B7'].style = 'Currency

Step 4: Automating the Report with Python

The next step in Python Excel Automation is to automate your report. You can write all the code into one function so that it will be easy to automate the report. That code is shown below:

import pandas as pd
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.chart import BarChart, Reference
import string

def automate_excel(file_name):
   """The file name should have the following structure: sales_month.xlsx"""
   # read excel file
   excel_file = pd.read_excel(file_name)
   # make pivot table
   report_table = excel_file.pivot_table(index='Gender', columns='Product line', values='Total', aggfunc='sum').round(0)
   # splitting the month and extension from the file name
   month_and_extension = file_name.split('_')[1]
   # send the report table to excel file
   report_table.to_excel(f'report_{month_and_extension}', sheet_name='Report', startrow=4)
   # loading workbook and selecting sheet
   wb = load_workbook(f'report_{month_and_extension}')
   sheet = wb['Report']
   # cell references (original spreadsheet)
   min_column = wb.active.min_column
   max_column = wb.active.max_column
   min_row = wb.active.min_row
   max_row = wb.active.max_row
   # adding a chart
   barchart = BarChart()
   data = Reference(sheet, min_col=min_column+1, max_col=max_column, min_row=min_row, max_row=max_row) #including headers
   categories = Reference(sheet, min_col=min_column, max_col=min_column, min_row=min_row+1, max_row=max_row) #not including headers
   barchart.add_data(data, titles_from_data=True)
   barchart.set_categories(categories)
   sheet.add_chart(barchart, "B12") #location chart
   barchart.title = 'Sales by Product line'
   barchart.style = 2 #choose the chart style
   # applying formulas
   # first create alphabet list as references for cells
   alphabet = list(string.ascii_uppercase)
   excel_alphabet = alphabet[0:max_column] #note: Python lists start on 0 -> A=0, B=1, C=2. #note2 the [a:b] takes b-a elements
   # sum in columns B-G
   for i in excel_alphabet:
       if i!='A':
           sheet[f'{i}{max_row+1}'] = f'=SUM({i}{min_row+1}:{i}{max_row})'
           sheet[f'{i}{max_row+1}'].style = 'Currency'
   sheet[f'{excel_alphabet[0]}{max_row+1}'] = 'Total'
   # getting month name
   month_name = month_and_extension.split('.')[0]
   # formatting the report
   sheet['A1'] = 'Sales Report'
   sheet['A2'] = month_name.title()
   sheet['A1'].font = Font('Arial', bold=True, size=20)
   sheet['A2'].font = Font('Arial', bold=True, size=10)
   wb.save(f'report_{month_and_extension}')
   return
  • The automate_excel function reads an Excel file containing sales data and creates a pivot table summarizing total sales by gender and product line.
  • It splits the file name to extract the month and saves the pivot table to a new Excel file named report_{month}.
  • The function adds a bar chart to the Excel sheet that visually represents sales by product line.
  • It calculates the total sales for each product line and formats the report with a title and month name, applying currency style to the totals.
  • Finally, the modified report is saved back to the same Excel file.

Step 5: Scheduling the Python Script

The final step in Python Excel Automation is to run the Python script at different schedules as per data requirements. You just need to use the task scheduler or cron on Windows and Mac respectively.

That’s it! You have successfully set up Python Excel Automation in 5 easy steps!

How to Create and Perform Modifications in a New Workbook

A new spreadsheet, known as a workbook in Openpyxl, is initially generated. After importing the Openpyxl workbook module, we use the Workbook() function to create a new workbook.

The Workbook Import Process

The function load_Workbook(), which takes the filename as an argument, is used to load the file. The file must be kept in the same working directory.

Modifying the Name of the Sheet

Once the workbook has loaded, choose the active sheet. You may modify the workbook’s sheet name by using sheet.title.

How to Alter the Sheet Name Tab’s Colour

You may adjust the tab colour using sheet._properties.tabColor.

How to Interpret a Workbook Cell Value

You may read cell value by using the.value suffix.

How to Use Python to Combine Two Different Spreadsheets

Working on several spreadsheets is a common task for data analysts, which can grow complicated when merging such files. The following code facilitates the merging of two distinct files.

import pandas as pd
# Read in the two Excel files
file1 = pd.read_excel('file1.xlsx')file2 = pd.read_excel('file2.xlsx')
# Merge the two files using the concat() method
merged_file = pd.concat([file1, file2], ignore_index=True)
# Write the merged file to a new Excel file
merged_file.to_excel('merged_file.xlsx', index=False)

To read in and work with the Excel files, we first import the Pandas library into our code.

How to Use Python to Import and Export Data

This work entails reading Excel files into a DataFrame object using Python packages like Pandas. Python may then be used to modify and examine it.

With the same libraries, you can export data from Python back into an Excel file.

import pandas as pd
# Import Excel file

df = pd.read_excel('filename.xlsx', sheet_name='Sheet1')

# Export to Excel file
df.to_excel('new_filename.xlsx', index=False)

In essence, the code creates a new file with the contents of the old Excel file copied across.

Using Python to Clean and Transform Data

This assignment involves cleaning and transforming data in Excel using Python modules like Pandas.

This might entail purging the data of duplicates, filtering it according to predetermined standards, and calculating it.

import pandas as pd
# Remove duplicates
df = df.drop_duplicates()
# Filter data
df = df[df['column_name'] > 10]
# Perform calculations
df['new_column'] = df['column1'] + df['column2']

Using the Pandas library, the code snippet above manipulates and cleans data on a Pandas dataframe called “df.”

How to Use Python for Data Analysis

In order to complete this work, you will need to use Python tools like NumPy and Pandas to analyse Excel data.

This might be figuring up summary statistics like mean and standard deviation or assembling data into unique reports according to predetermined standards.

import pandas as pd
import numpy as np
# Calculate summary statistics
df.describe()
# Create custom reports
df.pivot_table(values='column_name', index='category_name', columns='date')

The code analyses and reports data on a Pandas dataframe called “df” by using the Pandas and NumPy libraries.

Learn More About:

How to Load Data into Python

Conclusion

This article gave you a step-by-step guide on setting up Python Excel Automation. It also gave you an overview of both Python and MS Excel, along with their features. Overall, Python Excel Automation is an innovative process that you can use to create visual reports on Python just the same way as you would on Excel seamlessly. Organizations can leverage Python Excel Automation to leverage their business tasks according to their needs.

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Want to take Hevo for a spin? Sign up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

FAQ on Automate Excel with Python

Can Python be used for Excel automation?

Yes, Python can be effectively used for Excel automation and report generation.

How to automate reports in Excel using Python?

1. Using Openpyxl Library
2. Using Pandas Library

Can Python generate an Excel file?

You can create entirely new Excel files or modify existing ones using Python libraries like Openpyxl or xlxswriter.

Can Python replace VBA?

Python can replace VBA (Visual Basic for Applications) for many tasks, especially when dealing with complex data processing, automation, or integration with other systems.

Can we run Python script in Excel?

Yes, you can run Python scripts within Excel using various methods.

Aakash Raman
Business Associate, Hevo Data

Aakash is a research enthusiast who was involved with multiple teaming bootcamps including Web Application Pen Testing, Network and OS Forensics, Threat Intelligence, Cyber Range and Malware Analysis/Reverse Engineering. His passion to the field drives him to create in-depth technical articles related to data industry. He holds a Undergraduate Degree from Vellore Institute of Technology in Computer Science & Engineering with a Specialization in Information Security and is keen to help data practitioners with his expertise in the related topics.