For this article, I'll share some old-school procedural PHP scripts I used to scan a directory for duplicate images and display the results for comparison. A while back, I had a hardware failure and had to write some rsync commands to manually pull my iPhoto images off a dying Time Machine external hard drive. The gist of these scripts is simple: find all the images, create an MD5 hash of each image, collect some other details, write the records to a MySQL database, execute some SQL to find MD5 duplicates, and show the results side by side for comparison. Since I was executing this code on my iMac, I used MAMP to provide the Apache and MySQL services.
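
Before getting into the database-backed scripts, the core idea (files with identical contents produce identical MD5 hashes) can be sketched in a few lines without MySQL. This is only an illustration: the helper name group_files_by_md5() and the temporary files are my own assumptions, not part of the scripts that follow.

```php
<?php
// Sketch: group file paths by the MD5 hash of their contents.
// group_files_by_md5() is a hypothetical helper, not used by the scripts below.
function group_files_by_md5(array $paths) {
  $groups = array();
  foreach ($paths as $path) {
    $hash = md5_file($path);
    if ($hash === false) {
      // unreadable file, skip it
      continue;
    }
    $groups[$hash][] = $path;
  }
  // keep only hashes shared by more than one file
  return array_filter($groups, function ($group) {
    return count($group) > 1;
  });
}

// demo: three temporary files, two of them with identical contents
$a = tempnam(sys_get_temp_dir(), 'dup');
$b = tempnam(sys_get_temp_dir(), 'dup');
$c = tempnam(sys_get_temp_dir(), 'dup');
file_put_contents($a, 'same bytes');
file_put_contents($b, 'same bytes');
file_put_contents($c, 'different bytes');

print_r(group_files_by_md5(array($a, $b, $c)));

// clean up the demo files
array_map('unlink', array($a, $b, $c));
```

Holding everything in an array works for a quick check; with tens of thousands of images, storing the records in a database (as done below) makes it easier to revisit the results later.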

The first script, which is included by all the rest, just sets up a MySQL database connection.

Script: db.php

<?php
// define mysql credentials
$db_user = 'picture_data';
$db_pass = 'picture_data';
$db_database = 'picture_data';
$db_table = 'picture_data';
$db_host = 'localhost';

// connect to mysql database
$db = mysql_connect($db_host, $db_user, $db_pass);

// check for mysql connection
if (!$db) {
  die('Could not connect to database.');
}
?>

I then created the script to find all the images, create the md5 hash, and store the data in MySQL. I put this script outside my Apache vhost docroot and only had to execute it once.

Script: scan.php

<?php
//////////////////////////////////////////////////
// DATABASE SETUP

require_once('db.php');

setup_database();

//////////////////////////////////////////////////
// PROCESSING IMAGES

// specify path to images
$images_path = '/Users/Eric/Pictures/iPhoto Library/Originals';

// ensure directory exists
if (!is_dir($images_path)) {
  die('Directory does not exist.');
}

// change directory
chdir($images_path);

// get a list of files
$files = `find . -type f | sed 's/^\.\///'`;

// explode files list on newline
$files = explode("\n", trim($files));

// define a list of file extensions to process
$file_extensions = array(
  'jpg',
  'jpeg',
  'png',
  'bmp',
  'gif',
  'tiff',
);

// loop through files
foreach ($files as $file_path) {

  // get path info
  $path_info = pathinfo($file_path);
  $file_name = $path_info['basename'];
  // files without an extension have no 'extension' key
  $file_extension = isset($path_info['extension']) ? strtolower($path_info['extension']) : '';

  // check file extension
  if (!in_array($file_extension, $file_extensions)) {
    continue;
  }

  // get md5 hash of file
  $file_md5 = md5_file($file_path);

  // get file modified time
  $file_modified = date('Y-m-d H:i:s', filemtime($file_path));

  // create sql to insert record
  $sql = sprintf(
    "insert into `%s` (file_path, file_name, file_extension, file_md5, file_modified) values ('%s','%s','%s','%s','%s')",
    mysql_real_escape_string($db_table),
    mysql_real_escape_string($images_path . '/' . $file_path),
    mysql_real_escape_string($file_name),
    mysql_real_escape_string($file_extension),
    mysql_real_escape_string($file_md5),
    mysql_real_escape_string($file_modified)
  );

  // execute sql
  $result = mysql_query($sql, $db);

}

//////////////////////////////////////////////////
// FUNCTIONS

function setup_database() {

  global $db;
  global $db_database;
  global $db_table;

  // create database if it does not exist
  $sql = sprintf(
    "create database if not exists `%s`",
    mysql_real_escape_string($db_database)
  );
  $result = mysql_query($sql, $db);

  // check for error
  if (!$result) {
    die(mysql_error());
  }

  // select database
  $result = mysql_select_db($db_database, $db);

  // check for error
  if (!$result) {
    die(mysql_error());
  }

  // create table if it does not exist
  $sql = sprintf("
    CREATE TABLE IF NOT EXISTS `%s` (
      `fid` int(11) NOT NULL AUTO_INCREMENT,
      `file_path` varchar(255) NOT NULL,
      `file_name` varchar(255) NOT NULL,
      `file_extension` varchar(10) NOT NULL,
      `file_md5` varchar(32) NOT NULL,
      `file_modified` datetime NOT NULL,
      PRIMARY KEY (`fid`),
      KEY `idx_file_md5` (`file_md5`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1",
    mysql_real_escape_string($db_table)
  );
  $result = mysql_query($sql, $db);

  // check for error
  if (!$result) {
    die(mysql_error());
  }

  // drop existing records from table
  $sql = sprintf(
    "truncate table `%s`",
    mysql_real_escape_string($db_table)
  );
  $result = mysql_query($sql, $db);

  // check for error
  if (!$result) {
    die(mysql_error());
  }

}
?>

I then ran the script on the command line. It took a while to go through all 25K+ images in my directory.

$ php scan.php
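
The scan script shells out to find and sed to build its file list. If those tools are not available, a rough equivalent can be sketched with PHP's SPL directory iterators. The helper name list_image_files() is my own, and unlike the script above it returns full paths rather than paths relative to the image directory.

```php
<?php
// Sketch: list image files under a directory using SPL iterators instead of `find`.
// list_image_files() is a hypothetical helper name.
function list_image_files($dir, array $extensions) {
  $files = array();
  $iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
  );
  foreach ($iterator as $file) {
    // $file is an SplFileInfo object; compare extensions case-insensitively
    if ($file->isFile() && in_array(strtolower($file->getExtension()), $extensions)) {
      $files[] = $file->getPathname();
    }
  }
  sort($files);
  return $files;
}
```

Calling list_image_files($images_path, $file_extensions) would then stand in for the find command and the extension check inside the loop.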

The next script aids in displaying the images. I wrote it because the absolute path of my images was outside my Apache vhost docroot. It checks for two $_GET variables: the MD5 hash and an integer representing which duplicate image to show. The image is read and output, so this script can be used as the "src" attribute of an img tag.

Script: view-image.php

<?php
//////////////////////////////////////////////////
// DATABASE

require_once('db.php');

// select database
$result = mysql_select_db($db_database, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

//////////////////////////////////////////////////
// PROCESS REQUEST

$md5 = isset($_GET['md5']) ? $_GET['md5'] : '';
$index = isset($_GET['index']) ? intval($_GET['index']) : 0;

// fetch images with md5 index
$sql = sprintf("
  select *
  from `%s`
  where file_md5 = '%s'
  order by fid asc
  ",
  mysql_real_escape_string($db_table),
  mysql_real_escape_string($md5)
);

$result = mysql_query($sql, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

// fetch results
$rows = array();
while ($row = mysql_fetch_object($result)) {
  $rows[] = $row;
}

// ensure the requested duplicate exists
if (!isset($rows[$index])) {
  die('Image not found.');
}

// get image data
$file_path = $rows[$index]->file_path;
$file_extension = $rows[$index]->file_extension;

header("Content-type: image/$file_extension");
readfile($file_path);
?>
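
One caveat: the script above builds the Content-type header directly from the file extension, which yields values such as image/jpg that are not registered MIME types (browsers tend to tolerate them anyway). A small lookup can be sketched as follows; image_mime_type() is a hypothetical helper, and the map only covers the extensions scanned in this article.

```php
<?php
// Sketch: map a file extension to a registered image MIME type.
// image_mime_type() is a hypothetical helper, not part of view-image.php.
function image_mime_type($extension) {
  $map = array(
    'jpg'  => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'png'  => 'image/png',
    'bmp'  => 'image/bmp',
    'gif'  => 'image/gif',
    'tiff' => 'image/tiff',
  );
  $extension = strtolower($extension);
  // fall back to a generic binary type for anything unexpected
  return isset($map[$extension]) ? $map[$extension] : 'application/octet-stream';
}

print image_mime_type('JPG') . "\n";
```

The header call would then read header('Content-type: ' . image_mime_type($file_extension));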

The last script ties everything together. It determines which duplicates exist and allows you to view them. For my environment, I decided to store the list of MD5 duplicates in $_SESSION to avoid repeating the SQL query on every page load.

Script: view.php

<?php
//////////////////////////////////////////////////
// DATABASE SETUP

require_once('db.php');

// select database
$result = mysql_select_db($db_database, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

//////////////////////////////////////////////////
// FETCHING MD5S

// start session
session_start();

// check for session data
if (empty($_SESSION['md5s'])) {
  fetch_md5s();
}

// determine which md5 to show
$md5_index = isset($_GET['md5_index']) ? intval($_GET['md5_index']) : 0;

// stop when there are no (more) duplicates to show
if (!isset($_SESSION['md5s'][$md5_index])) {
  die('No more duplicates.');
}

// fetch images with md5 index
$sql = sprintf("
  select *
  from `%s`
  where file_md5 = '%s'
  order by fid asc
  ",
  mysql_real_escape_string($db_table),
  mysql_real_escape_string($_SESSION['md5s'][$md5_index])
);

$result = mysql_query($sql, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

// fetch results
$rows = array();
while ($row = mysql_fetch_object($result)) {
  $rows[] = $row;
}

// create image output in a table. note the image src is calling the view-image.php script with $_GET arguments.
$output = "";
$output .= "<table><tr>";
foreach ($rows as $index => $data) {
  $output .= "<td style='width: " . (100/count($rows)) . "%'>";
  $output .= "<img style='width: 100%' src='/view-image.php?md5=" . $data->file_md5 . "&index=" . $index . "' />";
  $output .= $data->file_name . "<br/>";
  $output .= $data->file_path . "<br/>";
  $output .= "</td>";
}
$output .= "</tr></table>";

$output .= "<a href='/view.php?md5_index=" . ($md5_index+1) . "'>Next >></a>";

print $output;

//////////////////////////////////////////////////
// FUNCTIONS

function fetch_md5s() {

  global $db;
  global $db_table;

  // get a list of md5 hashes with dupes
  $sql = sprintf("
    select file_md5
    from `%s`
    group by file_md5
    having count(*) > 1
    ",
    mysql_real_escape_string($db_table)
  );

  $result = mysql_query($sql, $db);

  // check for error
  if (!$result) {
    die(mysql_error());
  }

  // fetch results
  $md5s = array();
  while ($row = mysql_fetch_object($result)) {
    $md5s[] = $row->file_md5;
  }

  // store md5s in session
  $_SESSION['md5s'] = $md5s;

}
?>
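
The mysql_* functions used throughout these scripts were removed in PHP 7, so on a current PHP install the duplicate-finding query has to go through another API. Here is a minimal sketch of the same GROUP BY / HAVING query using PDO. To keep it self-contained I assume the pdo_sqlite extension and an in-memory SQLite database rather than the MySQL table above, and the rows and hash values are made-up sample data.

```php
<?php
// Sketch only: in-memory SQLite through PDO (assumes pdo_sqlite is available),
// standing in for the MySQL picture_data table used in this article.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('create table picture_data (fid integer primary key, file_path text, file_md5 text)');

// made-up sample rows: two paths share the hash "aaa"
$insert = $pdo->prepare('insert into picture_data (file_path, file_md5) values (?, ?)');
$sample = array(
  array('/a/1.jpg', 'aaa'),
  array('/b/1.jpg', 'aaa'),
  array('/a/2.jpg', 'bbb'),
);
foreach ($sample as $row) {
  $insert->execute($row);
}

// same query as fetch_md5s(): hashes that occur more than once
$md5s = $pdo
  ->query('select file_md5 from picture_data group by file_md5 having count(*) > 1')
  ->fetchAll(PDO::FETCH_COLUMN);

print_r($md5s);
```

Prepared statements with placeholders also replace the sprintf() and mysql_real_escape_string() pattern used above.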

Now, I went to my browser to execute the view.php script and view the results.

Picture Duplicates

My Drupal photo gallery implements a total of 9 imagecache presets, but due to a PHP memory_limit cap of 90MB in my previous hosting contract, I frequently encountered issues generating images.

Since I was on a shared hosting plan, PHP was executed as CGI and a service ran on the server to kill PHP scripts that reached the PHP memory_limit. This resulted in the following error, which prevented imagecache from creating missing images and redirecting the user properly.

Premature end of script headers: index.php

After changing my hosting plan and increasing my server resources, I decided to write a PHP script to programmatically generate all the missing images. When executed on the shell using Drush, this script gets a list of every image in the files directory and makes an HTTP request for each imagecache preset URL.

<?php
// store original working directory
define('ORIGINAL_DIRECTORY', getcwd());

// define site http host
define('HTTP_HOST','pics.ericlondon.com');

// get imagecache presets data
$presets_data = imagecache_presets();

// loop through presets data and collect preset names
$presets = array();
foreach ($presets_data as $key => $value) {
  $presets[] = $value['presetname'];
}

// get a list of pictures
chdir(file_directory_path());
// NOTE: the following line uses find, grep, and sed to generate a list of image files.
// It will vary depending on which file extensions to include and which directories to ignore
$command = 'find . -type f | egrep -i "\.(gif|jpeg|jpg|png)$" | sed "s/^\.\///" | egrep -iv "^(imagecache|imagefield_thumbs)\/"';
$output = `$command`;
$files = explode("\n", trim($output));
chdir(ORIGINAL_DIRECTORY);

// loop through file list
foreach ($files as $file) {

  // loop through each preset
  foreach ($presets as $preset) {

    // define url
    $url = 'http://' . HTTP_HOST . '/' . file_directory_path() . '/imagecache/' . $preset . '/' . $file;

    // request url
    $http_request_data = drupal_http_request($url);

    // log entry
    $log_entry = $http_request_data->code . " " . $http_request_data->status_message . " " . $url;
    file_put_contents('ic_preset_log.txt', $log_entry . "\n", FILE_APPEND);

  }

}
?>

I executed this script on the shell using the following drush command:

drush scr generate_presets.php

The script logs every request to ic_preset_log.txt, so the results can be reviewed afterward.
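
The URL-building part of the script can be tried outside Drupal. The sketch below is an illustration only: build_imagecache_urls(), the host name, and the file path are hypothetical, and it URL-encodes each path segment, which the script above does not do (I assume that is useful for file names containing spaces).

```php
<?php
// Sketch: build the imagecache URLs for every file/preset combination.
// build_imagecache_urls() and its arguments are hypothetical; in the script
// above the values come from imagecache_presets() and file_directory_path().
function build_imagecache_urls($host, $files_dir, array $presets, array $files) {
  $urls = array();
  foreach ($files as $file) {
    // encode each path segment so spaces and similar characters survive
    $encoded = implode('/', array_map('rawurlencode', explode('/', $file)));
    foreach ($presets as $preset) {
      $urls[] = 'http://' . $host . '/' . $files_dir . '/imagecache/' . $preset . '/' . $encoded;
    }
  }
  return $urls;
}

$urls = build_imagecache_urls(
  'example.com',
  'sites/default/files',
  array('thumbnail', 'large'),
  array('2010/my photo.jpg')
);
print implode("\n", $urls) . "\n";
```

With 9 presets and thousands of images this produces a long list of requests, so logging each response code, as the script does, is what makes failures easy to spot afterward.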

I created a Drupal site to host my photography in CCK Imagefield nodes and used Lucene to enhance my search functionality. By default Drupal's search results are text-based so I decided to add some code to show image thumbnails in my search results. I checked out Drupal Lucene's hooks and decided to implement a hook_luceneapi_result_alter() function in my existing module.

<?php
function MYMODULE_luceneapi_result_alter(&$result, $module, $type = NULL) {

  // check for node results
  if ($type == 'node') {

    // check node type
    if ($result['node']->type == 'image') {

      // define an imagecache image path for image thumbnail
      $imagecache_path_thumbnail = file_directory_path() . '/imagecache/thumbnail' . str_replace(file_directory_path(),'',$result['node']->field_image[0]['filepath']);

      // define an imagecache image path for image (large)
      $imagecache_path_large = file_directory_path() . '/imagecache/large' . str_replace(file_directory_path(),'',$result['node']->field_image[0]['filepath']);

      // define theme_image() variables
      $alt = check_plain($result['node']->title);
      $title = check_plain($result['node']->title);
      // add rel=lightbox to enable lightbox2 module
      $attributes = array(
        'rel' => 'lightbox',
      );
      // let imagecache define the size
      $getsize = FALSE;
      // generate the image html
      $image_html = theme('image', $imagecache_path_thumbnail, $alt, $title, $attributes, $getsize);

      if ($image_html) {

        // define lightbox link
        $image_link = l(
          $image_html,
          $imagecache_path_large,
          array(
            'html' => true,
            'attributes' => array(
              'rel' => 'lightbox',
            )
          )
        );

        // add data to the result variable, passed by reference
        $result['image_thumbnail'] = $image_link;

      }

    }

  }

}
?>

The above code adds additional data to my search results variables. I then implemented a hook_preprocess_search_result() function in my theme's template.php file to pass this data to the search-result.tpl.php template file.

<?php
function MYTHEME_preprocess_search_result(&$variables) {

  // ...snip...

  // check for lucene node search results
  if ($variables['type']=='luceneapi_node') {

    // check for image (not every result has a thumbnail)
    if (!empty($variables['result']['image_thumbnail'])) {

      // pass additional data to theme template file
      $variables['image_thumbnail'] = $variables['result']['image_thumbnail'];

    }

  }

}
?>

And in my theme's search-result.tpl.php template file, I added the following PHP to show the new variable.

<div class="search-result <?php print $search_zebra; ?>">

  <?php if($image_thumbnail): ?>
    <?php print $image_thumbnail; ?>
  <?php endif; ?>

  <!-- ...snip... -->

I also added a few lines of CSS in my theme's style.css file to tidy up the layout.

.search-results.luceneapi_node-results .search-result {
  clear: both;
}

.search-results.luceneapi_node-results .search-result img {
  float: left;
  margin: 0px 20px 20px 0px;
}

The visual results can be seen here on my photo gallery.

Visual search results
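
The path handling in the result-alter hook can also be looked at in isolation. The hook uses str_replace() to remove the files directory from the original file path, which would also remove the same string if it happened to appear later in the path. A sketch that strips it only as a leading prefix follows; imagecache_path_for() and the sample paths are hypothetical and not part of any Drupal module.

```php
<?php
// Sketch: derive an imagecache path from an original file path.
// imagecache_path_for() is a hypothetical helper for illustration.
function imagecache_path_for($files_dir, $preset, $filepath) {
  // strip the files directory only when it is a prefix of the path
  if (strpos($filepath, $files_dir . '/') === 0) {
    $filepath = substr($filepath, strlen($files_dir) + 1);
  }
  return $files_dir . '/imagecache/' . $preset . '/' . ltrim($filepath, '/');
}

print imagecache_path_for('sites/default/files', 'thumbnail', 'sites/default/files/photos/cat.jpg') . "\n";
```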

In this tutorial I'll show you how to upload an image using the Forms API, create a new node, and attach the image to the CCK (filefield/imagefield) field. I wrote this code to work with the modules I primarily use for image processing: cck, filefield, imageapi, imagecache, imagefield, mimedetect, and transliteration.

After I installed those modules, I created a new node type (admin/content/types/add) called "Image" and added a single imagefield field.

Image node fields

Next, I created a custom module with a hook_menu() implementation:

<?php
// NOTE: this variable is used through the code,
// so I thought it would be better to put it in a constant
define('IMAGE_UPLOAD_CONTAINER', 'image_upload');

/**
* Implements hook_menu()
*/
function helper_menu() {

  // create a blank array of menu items
  $items = array();

  // define page callback for upload form
  // NOTE: you'll want to restrict permission better [see: access arguments]
  $items['upload'] = array(
    'title' => t('Upload'),
    'description' => t('Upload'),
    'page callback' => 'drupal_get_form',
    'page arguments' => array('helper_page_callback_upload_form'),
    'access arguments' => array('access content'),
    'type' => MENU_CALLBACK,
  );

  // return menu items
  return $items;

}
?>

I defined the form function page callback:

<?php
/**
* Implements page callback for upload form
*/
function helper_page_callback_upload_form() {

  // create an empty form array
  $form = array();

  // set the form encoding type
  $form['#attributes']['enctype'] = "multipart/form-data";

  // add a file upload field
  $form[IMAGE_UPLOAD_CONTAINER] = array(
    '#type' => 'file',
    '#title' => t('Upload an image'),
  );

  // add a submit button
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => 'Submit',
  );

  // return form array
  return $form;

}
?>

This page callback function results in the following form:

Image node form

Then I added the form validation and submit handler functions:

<?php
/**
* Implements form validation handler
*/
function helper_page_callback_upload_form_validate($form, &$form_state) {

  // if a file was uploaded, process it.
  if (isset($_FILES['files']) && is_uploaded_file($_FILES['files']['tmp_name'][IMAGE_UPLOAD_CONTAINER])) {

    // validate file type (this is the MIME type reported by the browser)
    // NOTE: you can elaborate on this code and add additional validation
    if ($_FILES['files']['type'][IMAGE_UPLOAD_CONTAINER] != 'image/jpeg') {
      form_set_error(IMAGE_UPLOAD_CONTAINER, 'Invalid file extension.');
      return;
    }

    // attempt to save the uploaded file
    $file = file_save_upload(IMAGE_UPLOAD_CONTAINER, array(), file_directory_path());

    // set error if file was not uploaded
    if (!$file) {
      form_set_error(IMAGE_UPLOAD_CONTAINER, 'Error uploading file.');
      return;
    }

    // set files to form_state, to process when form is submitted
    $form_state['storage'][IMAGE_UPLOAD_CONTAINER] = $file;

  }
  else {
    // set error
    form_set_error(IMAGE_UPLOAD_CONTAINER, 'Error uploading file.');
    return;
  }

}

/**
* Implements form submit handler
*/
function helper_page_callback_upload_form_submit($form, &$form_state) {

  // create new node object
  $new_node = (object) array(
    'type' => 'image',
    'uid' => $GLOBALS['user']->uid,
    'name' => $GLOBALS['user']->name,
    'title' => t('YOUR NODE TITLE'),
    'status' => 1,
    'field_image' => array(
      (array) $form_state['storage'][IMAGE_UPLOAD_CONTAINER],
    ),
  );

  // save node
  node_save($new_node);

  // clear form storage, to allow form to submit
  $form_state['storage'] = array();

  // redirect user, set message, etc!

}
?>
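
The validation handler above relies on the MIME type reported by the browser in $_FILES, which the client controls. One way to elaborate on it, sketched here under my own assumptions, is to inspect the file contents with PHP's getimagesize(). The helper name is_allowed_image() is hypothetical, and the demo writes a 1x1 GIF to a temporary file purely to have something to check.

```php
<?php
// Sketch: check that a file really is an image of an allowed type by
// inspecting its contents. is_allowed_image() is a hypothetical helper.
function is_allowed_image($path, array $allowed_mime_types) {
  // getimagesize() returns FALSE when the file is not a recognized image
  $info = @getimagesize($path);
  return $info !== FALSE && in_array($info['mime'], $allowed_mime_types);
}

// demo: a 1x1 GIF written to a temporary file
$tmp = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($tmp, base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'));

var_dump(is_allowed_image($tmp, array('image/gif')));
var_dump(is_allowed_image($tmp, array('image/jpeg')));

unlink($tmp);
```

In the handler, a check like this could run against $_FILES['files']['tmp_name'][IMAGE_UPLOAD_CONTAINER] before calling file_save_upload().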

After using the form to upload an image, the following node was created:

New image node

In this tutorial I'll show you how you can add jQuery image carousel functionality to your CCK node. This tutorial requires you to install the following modules:

cck
filefield
imageapi
imagecache
imagefield

NOTE: The next few steps assume you have not already set up a content type and ImageCache preset. You can ignore them if you already have.

Once you have installed the above modules and set permissions accordingly, you'll need to define an ImageCache preset (/admin/build/imagecache/add). I called mine "thumbnail" for this example. I added an "Add Scale" action to set the width to 100 (pixels). This ensures that when you upload an image, a thumbnail is created automatically.

Create a new content type (/admin/content/types/add). I called mine "Carousel" for this example. I then clicked on "Manage Fields" (/admin/content/node-type/carousel/fields) to set up a new Image field. I called my new field "field_images", selected "File" for field type, and selected "Image" for the form element.

On the next screen, scroll down to the Global fieldset region and enable the "Required" checkbox, and set the "Number of Values" to "Unlimited". This will allow the user to upload numerous image files to the same CCK field.

Next, you'll want to set which image preset is shown when viewing the node, by clicking on "Display fields" (/admin/content/node-type/carousel/display). I chose the "thumbnail image" preset for both the teaser and full node displays.

I then created a new carousel node (/node/add/carousel). I used the "Add another item" button to upload three images to this node.

If you view the node, the images will be stacked vertically.

This is where the fun starts. You'll need to download the jQuery Carousel library (http://sorgalla.com/projects/jcarousel/). I downloaded the jcarousel.zip file and unpacked it. Copy the entire unpacked jcarousel folder into your theme directory.

Now, you'll need to add a preprocess_node function to your template.php file:

<?php
function YOURTHEME_preprocess_node(&$variables) {

  // test for carousel node type
  if ($variables['type'] == 'carousel') {

    // include jCarousel javascript
    drupal_add_js(path_to_theme() . '/jcarousel/lib/jquery.jcarousel.js');

    // add jQuery to enable jQuery carousel on <ul>
    $js = "
      $(document).ready(function(){
        $('#mycarousel').jcarousel();
      });
    ";
    drupal_add_js($js, 'inline');

    // include jCarousel css
    drupal_add_css(path_to_theme() . '/jcarousel/lib/jquery.jcarousel.css');
    drupal_add_css(path_to_theme() . '/jcarousel/skins/tango/skin.css');

    // loop through images and create an item list
    $items = array();
    foreach ($variables['field_images'] as $key => $value) {
      $items[] = $value['view'];
    }

    // ensure images exist
    if (count($items)) {
      // add jQuery carousel html to $content variable
      $variables['content'] .= theme('item_list', $items, NULL, 'ul', array('id' => 'mycarousel', 'class' => 'jcarousel-skin-tango'));
    }

  }

}
?>

The previous code snippet should result in the following:

If you'd like, you can hide the old image HTML by adding a single line of CSS (or by adding more elaborate code in your template preprocess function).

Example:

div.field-field-images {
  display: none;
}

Adding the previous CSS will result in this example:
