How to Automate Your Folder Cleanup

In our previous lesson, you built an intelligent file organizer that automatically sorted files into appropriate folders. Now, let’s take that one step further by cleaning up the unnecessary clutter that remains—empty folders and duplicate files.
Manually hunting for these leftovers is tedious. This script automates that final cleanup, scanning your directories to delete empty folders and remove duplicate files (like images, videos, and documents), ensuring your newly organized space is truly tidy.
This lesson will show you how to make your workspace cleaner, lighter, and more efficient with just a few lines of Python.
Step 1: Import Required Modules
Let’s start by importing the libraries we need.
import os
import hashlib
import shutil
- os — lets Python walk through directories and check files or folders.
- hashlib — helps us generate a unique fingerprint (called a hash) for each file.
- shutil — helps with file operations like moving files. We’ll use it later as a safer alternative for handling duplicates (moving them instead of deleting them).
Step 2: Deleting Empty Folders
Often when organizing files, you end up with empty folders. Let’s write a function to help us automatically detect and remove them.
def delete_empty_folders(directory_path):
    """
    Deletes all empty folders in the specified directory.
    """
    print(f"\nChecking for empty folders in: {directory_path}")
    # Walk through the directory from the bottom up so we delete inner folders first
    for folder_path, subfolders, files in os.walk(directory_path, topdown=False):
        try:
            # Re-check the folder's contents so folders that became empty after
            # their subfolders were removed are caught as well
            if not os.listdir(folder_path):
                os.rmdir(folder_path)
                print(f"Deleted empty folder: {folder_path}")
        except Exception as e:
            print(f"Error deleting {folder_path}: {e}")
    print("Empty folder cleanup complete!\n")
- We used os.walk() to loop through every folder in the directory.
- The topdown=False parameter makes Python visit the deepest folders first, so folders that become empty after their subfolders are removed are also caught.
- os.rmdir() then removes a folder only once it is confirmed empty.
You can test this function by creating a few empty folders and running:
delete_empty_folders("/path/to/your/folder")
Step 3: Detecting Duplicate Files
Now let’s find duplicate files — files that have the exact same content, even if their names are different.
We can use hashing to find duplicate files. Think of it like this: Python’s hashlib.md5() generates an ID card for each file based on its content. If two files have the same ID card, they are exact copies of each other.
Let’s write a helper function to generate a hash (a unique fingerprint) for each file’s content:
def get_file_hash(file_path):
    """
    Returns a unique MD5 hash for a file's content.
    """
    hash_md5 = hashlib.md5()
    try:
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_md5.update(chunk)
        return hash_md5.hexdigest()
    except Exception as e:
        print(f"Could not hash {file_path}: {e}")
        return None
What each part does:
- Opens the file in binary mode ("rb").
- Reads the file in 4 KB chunks instead of loading it into memory all at once, so even large files like videos won’t crash your system.
- Feeds each chunk into the MD5 hashing algorithm, building the fingerprint piece by piece as the file is read.
- Converts the completed hash into a readable hexadecimal string that serves as the file’s fingerprint for comparison, and returns it.
- Catches any problems (like permission errors or missing files) and returns None instead of crashing the program.
This approach ensures reliable duplicate detection while being memory-efficient and robust enough to handle any file type or size.
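To see the fingerprint idea in action, here is a small, hypothetical check: it writes the same bytes to two differently named files and confirms their hashes match (the file names are placeholders).
# Hypothetical quick check: identical content produces identical hashes
with open("report_v1.txt", "wb") as f:      # placeholder file names
    f.write(b"quarterly numbers")
with open("report_copy.txt", "wb") as f:
    f.write(b"quarterly numbers")

print(get_file_hash("report_v1.txt"))                                      # a 32-character hex string
print(get_file_hash("report_v1.txt") == get_file_hash("report_copy.txt"))  # True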
Step 4: Finding and Removing Duplicates
Now that we can generate a hash for each file, let’s scan the directory to find and remove duplicates.
def remove_duplicates(directory_path):
    """
    Finds and removes duplicate files in a given directory.
    Keeps one copy of each unique file.
    """
    print(f"\nScanning for duplicate files in: {directory_path}")
    hashes = {}
    duplicates = []
    for folder_path, subfolders, files in os.walk(directory_path):
        for file_name in files:
            file_path = os.path.join(folder_path, file_name)
            file_hash = get_file_hash(file_path)
            if file_hash:
                if file_hash in hashes:
                    duplicates.append(file_path)
                else:
                    hashes[file_hash] = file_path
    # Delete duplicates safely
    if duplicates:
        print("\nDuplicate files found:")
        for file_path in duplicates:
            try:
                os.remove(file_path)
                print(f"🗑 Deleted duplicate: {file_path}")
            except Exception as e:
                print(f"Could not delete {file_path}: {e}")
    else:
        print("No duplicates found.")
    print("Duplicate cleanup complete!\n")
Explanation:
- We use a dictionary (hashes) to store each unique file hash we encounter, mapped to the first path seen with that content.
- When a file’s hash is already in the dictionary, Python knows the file is a duplicate.
- Duplicate file paths are collected in a list rather than deleted mid-scan, which keeps the walk safe.
- Finally, we loop through the collected duplicates and delete them one by one, handling errors so a single failure doesn’t stop the cleanup.
To be extra safe, you can also replace os.remove() with shutil.move() to move duplicates into a “Duplicates” folder instead of deleting them outright.
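Here is a minimal sketch of that safer variant, assuming a Duplicates folder created inside the directory being scanned; only the deletion loop from remove_duplicates changes.
# Sketch: quarantine duplicates instead of deleting them.
# Assumes `duplicates` and `directory_path` as defined in remove_duplicates above.
duplicates_dir = os.path.join(directory_path, "Duplicates")
os.makedirs(duplicates_dir, exist_ok=True)

for file_path in duplicates:
    try:
        # Move the duplicate into the quarantine folder, keeping its file name
        shutil.move(file_path, os.path.join(duplicates_dir, os.path.basename(file_path)))
        print(f"Moved duplicate: {file_path}")
    except Exception as e:
        print(f"Could not move {file_path}: {e}")
Note that two duplicates sharing the same file name but coming from different folders would collide inside Duplicates, so in a real run you may want to rename on collision.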
Step 5: Combine Everything into One Cleanup Script
Now, let’s combine both features (deleting empty folders and removing duplicates) into one automated cleanup function.
def cleanup_folder(directory_path):
    """
    Cleans up a directory by deleting empty folders and removing duplicate files.
    """
    print("=" * 50)
    print(f"Starting cleanup for: {directory_path}")
    print("=" * 50)
    remove_duplicates(directory_path)
    delete_empty_folders(directory_path)
    print("Folder cleanup complete!")
Now just call it inside the main block:
if __name__ == "__main__":
    target_directory = os.path.expanduser("~/Desktop")  # Example path
    # target_directory = "/path/to/test/folder"
    cleanup_folder(target_directory)
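If you prefer not to hard-code the path, an optional variant (sketched below, not part of the original script) reads the target directory from the command line and falls back to the Desktop example when none is given.
import sys

if __name__ == "__main__":
    # Optional: take the target folder from the command line,
    # e.g. python folder_cleanup.py /path/to/test/folder
    if len(sys.argv) > 1:
        target_directory = sys.argv[1]
    else:
        target_directory = os.path.expanduser("~/Desktop")  # Example path
    cleanup_folder(target_directory)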
Step 6: Testing the Script
It’s time to test your script safely before using it on important folders.
- Create a test environment by making a new folder called test_cleanup. Inside this folder, create:
  - A few empty folders (Empty1, Empty2).
  - Some duplicate files (you can copy-paste the same image or text file several times).
- Run your script from the terminal:
  python folder_cleanup.py
- Watch Python do its magic:
  - The script will identify and remove duplicate files, keeping only one copy of each.
  - It will then scan for and delete all empty folders.
  - You’ll see real-time updates in the terminal showing what’s being cleaned.
This test ensures your script works correctly before using it on your actual documents or photos folder.
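If you’d rather script the test setup instead of creating the files by hand, the sketch below builds the test_cleanup folder described above (the file names and contents are just placeholders) and runs the full cleanup on it.
import os

# Hypothetical helper: build the test environment, then clean it up
test_dir = "test_cleanup"
os.makedirs(os.path.join(test_dir, "Empty1"), exist_ok=True)
os.makedirs(os.path.join(test_dir, "Empty2"), exist_ok=True)

# Two files with identical content: the second is a duplicate of the first
with open(os.path.join(test_dir, "photo.txt"), "w") as f:
    f.write("pretend this is an image")
with open(os.path.join(test_dir, "copy_of_photo.txt"), "w") as f:
    f.write("pretend this is an image")

cleanup_folder(test_dir)
# One of the two identical files is removed, and both empty folders are deleted.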
Expected output (your paths will differ):
==================================================
Starting cleanup for: /path/to/test_cleanup
==================================================

Scanning for duplicate files in: /path/to/test_cleanup

Duplicate files found:
🗑 Deleted duplicate: /path/to/test_cleanup/copy_of_photo.jpg
Duplicate cleanup complete!

Checking for empty folders in: /path/to/test_cleanup
Deleted empty folder: /path/to/test_cleanup/Empty1
Deleted empty folder: /path/to/test_cleanup/Empty2
Empty folder cleanup complete!

Folder cleanup complete!
Once you confirm everything works perfectly, you can confidently run the script on any cluttered directory.
In this lesson, you learned how to:
- Use os.walk() to navigate folders.
- Use hashlib to find duplicate files.
- Delete empty folders and redundant files automatically.
- Combine everything into a single cleanup function!
Your folder is now both organized and optimized — no wasted space, no clutter.
👉 Next step: Explore how to automate your cleanup to run daily with Python in the next lesson, Schedule File Organizer.