TIL Simple Merge of two CSVs with Python

I generate a lot of CSVs for my jobs, mostly as a temporary storage mechanism for data. So I make report A about this thing, I make report B for that thing and then I produce some sort of consumable report for the organization at large. Part of this is merging the CSVs so I don't need to overload each scripts to do all the pieces.

For a long time I've done this in Excel/LibreOffice, which totally works. But I recently sat down with the pandas library and I had no idea how easy it is use for this particular use case. Turns out this is a pretty idiot-proof way to do the same thing without needing to deal with the nightmare that is Excel.

Steps to Run

  1. Make sure Python is installed
  2. Run python3.13 -m venv venv
  3. source venv/bin/activate
  4. pip install pandas
  5. Change file_one to the first file you want to consider. Same with file_two
  6. The most important thing to consider here: I only want the output if the value in the column is in BOTH files. If you want all the output from file_one and then enrich it with the values from file_two if it is present, change how='inner' to how='left'
import pandas as pd
import os

# Define the filenames
file_one = 'one.csv'
file_two = 'two.csv'
output_file = 'combined_report.csv'

# Define the column names to use for joining
# These should match the headers in your CSVs exactly
deploy_join_col = 'Deployment Name'
stacks_join_col = 'name'

try:
    # Check if input files exist
    if not os.path.exists(file_one):
        raise FileNotFoundError(f"Input file not found: {file_one}")
    if not os.path.exists(file_two):
        raise FileNotFoundError(f"Input file not found: {file_two}")

    # Read the CSV files into pandas DataFrames
    print(f"Reading {file_one}...")
    df_deploy = pd.read_csv(file_one)
    print(f"Read {len(df_deploy)} rows from {file_one}")

    print(f"Reading {file_two}...")
    df_stacks = pd.read_csv(file_two)
    print(f"Read {len(df_stacks)} rows from {file_two}")

    # --- Data Validation (Optional but Recommended) ---
    if deploy_join_col not in df_deploy.columns:
        raise KeyError(f"Join column '{deploy_join_col}' not found in {file_one}")
    if stacks_join_col not in df_stacks.columns:
        raise KeyError(f"Join column '{stacks_join_col}' not found in {file_two}")
    # ----------------------------------------------------

    # Perform the inner merge based on the specified columns
    # 'how="inner"' ensures only rows with matching keys in BOTH files are included
    # left_on specifies the column from the left DataFrame (df_deploy)
    # right_on specifies the column from the right DataFrame (df_stacks)
    print(f"Merging dataframes on '{deploy_join_col}' (from deployment) and '{stacks_join_col}' (from stacks)...")
    df_combined = pd.merge(
        df_deploy,
        df_stacks,
        left_on=deploy_join_col,
        right_on=stacks_join_col,
        how='inner'
    )
    print(f"Merged dataframes, resulting in {len(df_combined)} combined rows.")

    # Sort the combined data by the join column for grouping
    # You can sort by either join column name as they are identical after the merge
    print(f"Sorting combined data by '{deploy_join_col}'...")
    df_combined = df_combined.sort_values(by=deploy_join_col)
    print("Data sorted.")

    # Write the combined and sorted data to a new CSV file
    # index=False prevents pandas from writing the DataFrame index as a column
    print(f"Writing combined data to {output_file}...")
    df_combined.to_csv(output_file, index=False)
    print(f"Successfully created {output_file}")

except FileNotFoundError as e:
    print(f"Error: {e}")
except KeyError as e:
     print(f"Error: Expected column not found in one of the files. {e}")
     print(f"Please ensure the join columns ('{deploy_join_col}' and '{stacks_join_col}') exist and are spelled correctly in your CSV headers.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Just a super easy to hook up script that has saved me a ton of time from having to muck around with Excel.