Configuring a server to parse email via a PHP script

In this tutorial I’ll show how you can setup a server to parse email with a PHP script. This tutorial assumes that your server is configured to receive email (I wrote this using a virtual machine running postfix).

The first thing you’ll need to do is configure an alias to direct email to a PHP script (instead of an email box). I added the following entry to the bottom of my /etc/aliases file and then ran the “newaliases” command to refresh my aliases database:

phpscript: "|php -q /usr/local/bin/email.php"

The above entry will pipe email sent to phpscript@MYDOMAIN to the designated PHP script.

And here’s the script:

#!/usr/bin/php
<?php

// fetch data from stdin
$data = file_get_contents("php://stdin");

// extract the body
// NOTE: a properly formatted email's first empty line defines the separation between the headers and the message body
list($data, $body) = explode("\n\n", $data, 2);

// explode on new line
$data = explode("\n", $data);

// define a variable map of known headers
$patterns = array(
  'Return-Path',
  'X-Original-To',
  'Delivered-To',
  'Received',
  'To',
  'Message-Id',
  'Date',
  'From',
  'Subject',
);

// define a variable to hold parsed headers
$headers = array();

// loop through data
foreach ($data as $data_line) {

  // for each line, assume a match does not exist yet
  $pattern_match_exists = false;

  // check for lines that start with white space
  // NOTE: if a line starts with a white space, it signifies a continuation of the previous header
  if ((substr($data_line,0,1)==' ' || substr($data_line,0,1)=="\t") && $last_match) {

    // append to last header
    $headers[$last_match][] = $data_line;
    continue;

  }

  // loop through patterns
  foreach ($patterns as $key => $pattern) {

    // create preg regex
    $preg_pattern = '/^' . $pattern .': (.*)$/';

    // execute preg
    preg_match($preg_pattern, $data_line, $matches);

    // check if preg matches exist
    if (count($matches)) {

      $headers[$pattern][] = $matches[1];
      $pattern_match_exists = true;
      $last_match = $pattern;

    }

  }

  // check if a pattern did not match for this line
  if (!$pattern_match_exists) {
    $headers['UNMATCHED'][] = $data_line;
  }

}

?>

At this point in the code, the body of the message will be contained in the $body variable and the headers will be in $headers.

Here is an example of the parsed headers (using print_r()):

Array
(
    [UNMATCHED] => Array
        (
            [0] => From root@Eric-Centos.localdomain  Sun Jan 10 21:49:50 2010
        )

    [Return-Path] => Array
        (
            [0] => <root@Eric-Centos.localdomain>
        )

    [X-Original-To] => Array
        (
            [0] => phpscript
        )

    [Delivered-To] => Array
        (
            [0] => phpscript@Eric-Centos.localdomain
        )

    [Received] => Array
        (
            [0] => by Eric-Centos.localdomain (Postfix, from userid 0)
            [1] => 	id 4D03F30131; Sun, 10 Jan 2010 21:49:50 -0500 (EST)
        )

    [To] => Array
        (
            [0] => phpscript@Eric-Centos.localdomain
        )

    [Subject] => Array
        (
            [0] => This is the subject
        )

    [Message-Id] => Array
        (
            [0] => <20100111024950.4D03F30131@Eric-Centos.localdomain>
        )

    [Date] => Array
        (
            [0] => Sun, 10 Jan 2010 21:49:50 -0500 (EST)
        )

    [From] => Array
        (
            [0] => root@Eric-Centos.localdomain (root)
        )

)

Now you have all the email headers and message body parsed. You can do whatever your heart desires with the data, like insert it into a database or even create nodes!

Updated: