How to merge multiple Excel files using Python

Luca Liu
4 min readMay 26, 2022

Introduction

In today’s data-driven landscape, consolidating information scattered across multiple Excel files is a common challenge. Fortunately, Python provides powerful tools to streamline this process, offering a more efficient way to merge and organize data. In this guide, we’ll explore how to leverage Python’s capabilities to effortlessly combine multiple Excel files into a cohesive and structured dataset.

In the world of data, things get messy when you have multiple Excel files with the same columns. If you’re looking to tidy up and merge them all into one file, especially when you need to go through them one by one, you’re in the right place. This guide will show you an easy way to make sense of it all and keep your data organized.

Python Solution

Python solution with a function

Merge multiple Excel files effortlessly with this Python code! 🚀 Using ‘pandas’ and ‘os’, the script navigates through files in a specified folder, combining them into a neat ‘merged_excel.xlsx’ file. Just plug in your folder path, run the code, and voila — streamlined data! 📊💻

# import packages
import pandas as pd
import os

# Define a function 'append' to merge Excel files in a specified path
def append(path):
frames = []
for root, dirs, files in os.walk(path):
for file in files:
file_with_path = os.path.join(root, file)
df = pd.read_excel(file_with_path)
frames.append(df)
df = pd.concat(frames, axis=0)
return df

# use the function
df = append(path) # path:The folder path where storage all the excel files
df.to_excel("merged_excel.xlsx")

In this code snippet, we’re using two powerful tools: ‘pandas’ and ‘os’ (a module for working with the operating system).

The ‘append’ function is the star here. It digs through all the Excel files in a specified folder (‘path’) and collects them into a ‘DataFrame,’ which is like a neat table for our data.

Now, for the magic moment: the last two lines! They use our ‘append’ function to merge all the Excel data in the specified folder into one consolidated file called ‘merged_excel.xlsx.’

Python solution without a function

Here’s a quick and easy way to merge multiple Excel files using Python without the need to write a function. Simply copy and paste the code into your Python environment, replace ‘path’ with the folder path where your Excel files are stored, and run the script.

import pandas as pd
import os

frames = []
# replace 'path' with the folder path where your Excel files are stored
for root, dirs, files in os.walk(path):
for file in files:
file_with_path = os.path.join(root, file)
df = pd.read_excel(file_with_path)
frames.append(df)
df = pd.concat(frames, axis=0)
df.to_excel("merged_excel.xlsx")

Bonus: Handling Unique Data Situations

While we’ve explored solutions for straightforward data merging, let’s navigate a slight detour to address more nuanced scenarios.

Picture this: your Excel files carry distinct dates in their names, demanding not just consolidation but a thoughtful integration into a final summary so that you can know the date of data in the final version.

Situation 1: Date-Driven Excel Filenames: Crafting a Comprehensive Summary Excel

For example, if the file name follows a pattern like “filename_YYYY-MM-DD.xlsx”, and you want to extract both the date and the filename, you can adapt the code as follows:

import pandas as pd
import os

def append(path):
frames = [] # Create an empty list to store individual DataFrames
for root, dirs, files in os.walk(path):
for file in files:
file_with_path = os.path.join(root, file)

# Extract filename and date information from the file name
file_info = os.path.splitext(file)[0].split('_')

# Read each Excel file using pandas
df = pd.read_excel(file_with_path)

# Add filename and date columns to the DataFrame
df["filename"] = file_info[0] # Assuming filename is the first part
df["date"] = pd.to_datetime(file_info[1]) # Assuming date is the second part

# Append the DataFrame to the list
frames.append(df)

# Concatenate all DataFrames in the list along the rows
df = pd.concat(frames, axis=0)

return df

Situation 2: Streamlining Data: Merging Multiple Excel Files into One with Distinct Sheets Named After Each File

Now, let’s pivot to another scenario. Suppose your goal is a consolidated master file, but with a twist — each file contributes to a separate sheet, named after its original file.

import pandas as pd
import os

def combine(path):
with pd.ExcelWriter("merged_excel.xlsx") as writer:
for root, dirs, files in os.walk(path):
for file in files:
filename = os.path.join(root, file)
df = pd.read_excel(filename)
# Delete the file name suffix, sometimes it could be xlsv/.xlsx
df.to_excel(writer, sheet_name=file.strip(
'.csv'))
return df

Thank you for taking the time to explore data-related insights with me. I appreciate your engagement. If you find this information helpful, I invite you to follow me or connect with me on LinkedIn or X(@Luca_DataTeam). You can also catch glimpses of my personal life on Instagram, Happy exploring!👋

--

--

Luca Liu

Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau, SAP BO.