Software engineer, data guy, Open Source enthusiast, New Hampshire resident, husband, father. Fan of guitars, hiking, photography, homebrewing, sarcasm.
Using PHP and MD5 to find duplicate images in iPhoto and view/compare the results
For this article I’ll share some old-school procedural PHP scripts I used to scan a directory for duplicate images and display the results for comparison. A while back, I had a hardware failure and had to write some rsync commands to manually pull my iPhoto images off of a dying Time Machine external hard drive. The basic gist of these scripts is simple: find all the images, compute an MD5 hash of each image's contents, collect some other details, write the records to a MySQL database, execute some SQL to find rows that share an MD5 hash, and show the results side by side for comparison. Since I was executing this code on my iMac, I used MAMP to provide the Apache and MySQL services.
The first script, which is included by all the rest, just sets up a MySQL database connection.
Script: db.php
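The db.php listing is not shown here, so the following is only a minimal sketch of what such a connection script might look like. It uses PDO, which may not be what the original script used. The database name `photo_dupes` and the helper names `build_dsn` and `connect_db` are hypothetical, and the host, port, and credentials are the values MAMP typically ships with, so adjust them for your own setup.

```php
<?php
// db.php (sketch): one shared MySQL connection for the other scripts.
// Assumptions: MAMP's usual defaults (user "root", password "root",
// MySQL on port 8889) and a made-up database name, "photo_dupes".

// Build the PDO connection string from its parts.
function build_dsn(string $host, int $port, string $dbname): string
{
    return sprintf('mysql:host=%s;port=%d;dbname=%s;charset=utf8mb4', $host, $port, $dbname);
}

// Open the connection; the other scripts would call this after including db.php.
function connect_db(): PDO
{
    $pdo = new PDO(build_dsn('127.0.0.1', 8889, 'photo_dupes'), 'root', 'root');
    // Throw exceptions on SQL errors instead of failing silently.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $pdo;
}
```

Each of the other scripts could then start with `require 'db.php';` followed by `$pdo = connect_db();`.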
I then created the script to find all the images, compute the MD5 hash of each one, and store the data in MySQL. I put this script outside my Apache vhost docroot and only had to run it once.
Script: scan.php
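The scan.php listing is not shown here either. As a rough sketch of the scanning step, the code below walks a directory tree, hashes every image file with `md5_file()`, and collects one record per file. The function names, the list of extensions, and the `images` table layout are all assumptions made for illustration, not the original code.

```php
<?php
// scan.php (sketch): walk a directory tree and hash every image file.
// Assumed table (hypothetical):
//   CREATE TABLE images (id INT AUTO_INCREMENT PRIMARY KEY,
//                        path TEXT, md5 CHAR(32), bytes BIGINT);

// Return one record (path, md5, bytes) per image found under $root.
function scan_images(string $root): array
{
    $extensions = ['jpg', 'jpeg', 'png', 'gif', 'tif', 'tiff'];
    $records = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Skip anything that is not a regular file with an image extension.
        if (!$file->isFile()) {
            continue;
        }
        if (!in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $records[] = [
            'path'  => $file->getPathname(),
            'md5'   => md5_file($file->getPathname()), // hash of the file contents
            'bytes' => $file->getSize(),
        ];
    }
    return $records;
}

// Insert the records with a single prepared statement, reused per row.
function store_records(PDO $pdo, array $records): void
{
    $stmt = $pdo->prepare('INSERT INTO images (path, md5, bytes) VALUES (?, ?, ?)');
    foreach ($records as $record) {
        $stmt->execute([$record['path'], $record['md5'], $record['bytes']]);
    }
}
```

Because the hash is computed from the file contents, two copies of the same photo get the same MD5 value even if their names or folders differ, which is what the later duplicate query relies on.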
I then ran the script on the command line. It took a while to go through all 25K+ images in my directory.
The next script I wrote helps display the images. I needed it because the absolute path of my images was outside my Apache vhost docroot, so Apache could not serve them directly. It checks for two $_GET variables: the MD5 hash and an integer representing which duplicate image to show. The image is read and sent to the browser, so this script can be used as the “src” attribute of an img tag.
Script: view-image.php
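With the view-image.php listing not shown, here is a minimal sketch of the idea: validate the two query-string values, look up the matching file path, and stream the file with an image content type. The parameter names `md5` and `num`, the function names, and the `images` table are hypothetical; the original script may have named and structured things differently.

```php
<?php
// view-image.php (sketch): validate the request, then stream the image.

// Return ['md5' => ..., 'num' => ...] for a valid request, or null otherwise.
function parse_image_request(array $get): ?array
{
    $md5 = $get['md5'] ?? null;
    $num = $get['num'] ?? null;
    // An MD5 hash is exactly 32 hexadecimal characters.
    if (!is_string($md5) || !preg_match('/^[0-9a-f]{32}\z/i', $md5)) {
        return null;
    }
    // The duplicate index must be a non-negative integer.
    if (!is_string($num) || !preg_match('/^[0-9]+\z/', $num)) {
        return null;
    }
    return ['md5' => strtolower($md5), 'num' => (int) $num];
}

// Find the path of the Nth file (zero-based) that has the given hash.
function lookup_path(PDO $pdo, string $md5, int $num): ?string
{
    $stmt = $pdo->prepare('SELECT path FROM images WHERE md5 = ? ORDER BY path');
    $stmt->execute([$md5]);
    $paths = $stmt->fetchAll(PDO::FETCH_COLUMN);
    return $paths[$num] ?? null;
}

// Send the file to the browser with a matching Content-Type header.
function send_image(string $path): void
{
    $types = ['jpg' => 'image/jpeg', 'jpeg' => 'image/jpeg',
              'png' => 'image/png', 'gif' => 'image/gif'];
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset($types[$ext]) || !is_readable($path)) {
        http_response_code(404);
        return;
    }
    header('Content-Type: ' . $types[$ext]);
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```

Looking the path up from the hash, rather than accepting a file path in the URL, means the script can only ever serve files that the scan recorded.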
The last script ties everything together. It determines which duplicates exist and allows you to view them. For my environment, I decided to store the list of MD5 duplicates in the $_SESSION, so the duplicate-finding SQL does not have to run again on every page load.
Script: view.php
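Since the view.php listing is not shown, the sketch below illustrates the three pieces described: a `GROUP BY ... HAVING` query that finds hashes occurring more than once, a small cache so the query runs once per session, and a row of img tags for side-by-side comparison. The `images` table, the function names, the `duplicates` session key, and the `md5`/`num` query parameters passed to view-image.php are assumptions for illustration.

```php
<?php
// view.php (sketch): list duplicate hashes and show the copies side by side.

// Return [md5 => number of copies] for every hash stored more than once.
function find_duplicate_hashes(PDO $pdo): array
{
    $sql = 'SELECT md5, COUNT(*) AS copies FROM images
            GROUP BY md5 HAVING COUNT(*) > 1 ORDER BY md5';
    return $pdo->query($sql)->fetchAll(PDO::FETCH_KEY_PAIR);
}

// Run $loader only when the session has no cached list yet.
function cached_duplicates(array &$session, callable $loader): array
{
    if (!isset($session['duplicates'])) {
        $session['duplicates'] = $loader();
    }
    return $session['duplicates'];
}

// Build one <img> tag per copy, each pointing at view-image.php.
function render_duplicate_row(string $md5, int $copies): string
{
    $html = '';
    for ($i = 0; $i < $copies; $i++) {
        $src = 'view-image.php?' . http_build_query(['md5' => $md5, 'num' => $i]);
        $html .= '<img src="' . htmlspecialchars($src) . '" width="300">';
    }
    return $html;
}
```

In a page, this could be wired up by calling `session_start()` and then `cached_duplicates($_SESSION, function () use ($pdo) { return find_duplicate_hashes($pdo); })`, where `$pdo` is an open PDO connection, and echoing `render_duplicate_row()` for each entry in the result.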
Finally, I opened view.php in my browser to see the results.