Eric.London's picture

I recently found some time to switch my site's search framework from Lucene to Apache Solr. The module's README.txt makes installation for small production sites easy and straightforward.

Following the installation guide, I started the Java Solr process by entering the right directory and executing the jar:

$ java -jar start.jar

Everything was up and running in minutes... until I closed my terminal and the Java service ended with my shell process. As a short-term fix, I decided to write a bash shell script that ensures Solr is running, and to run it from cron every five minutes.

Here are the contents of my bash shell script:

#!/bin/bash

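# check for process id
# (the pattern is quoted so the shell cannot glob-expand it; -r is dropped because grep reads stdin here)
pid=$(ps ax | grep -i 'java.*jar.*start\.jar' | grep -iv grep | awk '{print $1}')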

# check if pid is not an integer
if ! [[ "$pid" =~ ^[0-9]+$ ]] ; then

  # start service
  cd /path/to/my/apache-solr-1.4.1/installation
  java -jar start.jar &

  # send email notification
  message='ericlondon.com: starting solr service'
  subject='ericlondon.com: starting solr service'
  to='myemail@example.com'
  echo "$message" | mail -s "$subject" "$to"

  exit 1;

fi
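
As a variation on the script above, here is a minimal sketch that tracks Solr with a pidfile instead of grepping the process list, and starts it with nohup so it survives a closed terminal. The pidfile path and the function names (is_alive, start_detached) are my own assumptions, not part of the original setup, and the kill -0 check assumes the watchdog runs as the same user that owns the Java process.

```shell
#!/bin/bash
# Sketch only: pidfile-based liveness check (alternative to grepping ps output).
# PIDFILE's default path and both function names are hypothetical.
PIDFILE="${PIDFILE:-/tmp/solr.pid}"

# Return 0 if the given pidfile names a live process, non-zero otherwise.
is_alive() {
  local pid
  [[ -r "$1" ]] || return 1
  pid=$(<"$1")
  # reject empty or non-numeric content
  [[ "$pid" =~ ^[0-9]+$ ]] || return 1
  # kill -0 sends no signal; it only tests whether the process can be signalled
  kill -0 "$pid" 2>/dev/null
}

# Start the given command detached from the terminal and record its PID.
start_detached() {
  nohup "$@" >/dev/null 2>&1 &
  echo $! > "$PIDFILE"
}

# Example use from a cron-driven watchdog (not executed here):
#   is_alive "$PIDFILE" || { cd /path/to/my/apache-solr-1.4.1/installation && start_detached java -jar start.jar; }
```

The pidfile avoids false matches against unrelated Java processes, at the cost of trusting that the file is not stale or overwritten by something else.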

And I added the following cronjob:

$ crontab -l
*/5 * * * * /path/to/my/scripts/folder/check_solr.sh

A better option would be to set up an initialization script for the process (/etc/init.d/), or to install Solr in a more permanent way, but I guess this will do for the time being :) ...


Part 2, Using Supervisor (updated: 2011/04/12)

As mentioned above, using a cronjob is probably not the best solution. I decided to install and configure supervisord to monitor the process.

Unfortunately, supervisor was not available for CentOS 5.5 (RHEL):

$ yum search supervisor
Finished
Warning: No matches found for: supervisor
No Matches found

Luckily, I found some RPMs via http://rpmfind.net. I installed supervisor and its one dependency:

# downloading RPMs
$ wget ftp://rpmfind.net/linux/epel/5/x86_64/supervisor-2.1-3.el5.noarch.rpm
$ wget ftp://rpmfind.net/linux/epel/5/x86_64/python-meld3-0.6.3-1.el5.x86_64.rpm

# installing RPMs
$ rpm -Uvh python-meld3-0.6.3-1.el5.x86_64.rpm
$ rpm -Uvh supervisor-2.1-3.el5.noarch.rpm

# setting run level for supervisord
$ chkconfig --level 2345 supervisord on

# starting supervisor
$ /etc/init.d/supervisord start

Next, I created a simple shell script to start the Solr process and made the script executable. NOTE: the file contents have been simplified:

#!/bin/bash

# enter solr dir
cd /path/to/my/apache-solr-1.4.1/installation

# start solr (exec replaces the shell, so supervisor manages the java process directly)
exec java -jar start.jar

Lastly, I added a few lines to my supervisor conf file (/etc/supervisord.conf):

[program:apache_solr]
command=/path/to/my/scripts/folder/apache-solr-supervisor-run.sh

Upon restarting supervisor, Solr started automatically:

$ /etc/init.d/supervisord restart

$ ps aux | grep -i java | grep -iv grep
root     28670  0.1  8.5 1041076 43548 ?       Sl   13:30   0:02 java -jar start.jar

I killed the process and it immediately came back (with a different process ID)!

$ kill 28670

$ ps aux | grep -i java | grep -iv grep
root     28869 62.0  5.3 1021532 27016 ?       Sl   13:50   0:00 java -jar start.jar

Rsync is a great command line program for copying and sync'ing data. It can use the standard SSH protocol (default port 22) to copy files from computer to computer, or copy locally from one path to another. It comes preinstalled on most Linux/Unix systems, but if you're using Windoze, I suggest installing Cygwin.

Part One
The first step in this tutorial is to set up passwordless SSH. Open a terminal on the computer you want to copy files from, referred to in this article as "local".

# use the ssh-keygen command to generate a public and private key
# I left the passphrase empty, and used the default path: ~/.ssh/id_dsa
local$ ssh-keygen -t dsa

# the above command will create two files (public and private keys)
local$ ls -l ~/.ssh/id_dsa*
-rw-------  1 Eric  staff  668 Feb 26 11:32 /Users/Eric/.ssh/id_dsa
-rw-r--r--  1 Eric  staff  611 Feb 26 11:32 /Users/Eric/.ssh/id_dsa.pub

SCP the public key file (id_dsa.pub) to the computer that will receive the files, referred to as "remote".

# NOTE: you'll need to replace "Eric@remote" with your remote username and IP address
local$ scp ~/.ssh/id_dsa.pub Eric@remote:~/.ssh/id_dsa.pub.transferred

SSH to the remote system and execute a few commands to enable passwordless SSH:

# SSH to remote system
local$ ssh Eric@remote

# append public key to "authorized_keys"
remote$ cat ~/.ssh/id_dsa.pub.transferred >> ~/.ssh/authorized_keys

# remove obsolete public key
remote$ rm ~/.ssh/id_dsa.pub.transferred

# exit remote system
remote$ exit

To verify that the public/private keys are working, SSH to the remote system. You should not be prompted for a password this time.
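
For a check that can also run unattended, here is a small sketch built on ssh's BatchMode option, which makes ssh fail instead of falling back to a password prompt. The helper name check_passwordless and the five-second timeout are my own choices for illustration.

```shell
#!/bin/bash
# Sketch only: non-interactive check that key-based login works.
# check_passwordless is a hypothetical helper name.
check_passwordless() {
  local target="$1"
  # BatchMode=yes disables password prompts, so a missing key shows up as a failure.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
    echo "key login OK for $target"
  else
    echo "key login FAILED for $target" >&2
    return 1
  fi
}

# Example (not executed here):
#   check_passwordless Eric@remote
```

If the check fails even though the key was appended, overly open permissions on ~/.ssh or authorized_keys on the remote side are a common cause worth ruling out.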

Part Two
The second step of this tutorial is creating an executable shell script that will transfer the files. I chose to put my scripts in the folder "~/scripts/", but you could put them anywhere you want.

Open up your favorite text editor (emacs, vi, nano, etc) and enter your rsync command.

#!/bin/bash
rsync -avz --delete /path/on/local/computer/ Eric@remote:/path/on/remote/computer/

Please note, the "--delete" flag is optional, and will remove files on the remote computer that do not exist on the local computer. Please use caution.
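
One way to exercise that caution is rsync's dry-run flag (-n, or --dry-run), which reports what would be transferred or deleted without touching anything. A minimal sketch, with preview_sync as a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: preview an rsync --delete run without changing any files.
# preview_sync is a hypothetical helper name.
preview_sync() {
  local src="$1" dest="$2"
  # -n (--dry-run) lists what would be copied or deleted, but does nothing.
  rsync -avzn --delete "$src" "$dest"
}

# Example (not executed here):
#   preview_sync /path/on/local/computer/ Eric@remote:/path/on/remote/computer/
```

Files that would be removed show up in the output as "deleting ..." lines; once the list looks right, run the real command without -n.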

For my real life example, I set up a script to rsync my iTunes library from my iMac to my MacBookPro.

#!/bin/bash
rsync -avz --delete --exclude '*.m4v' --exclude '*.mp4' ~/Music/iTunes/ Eric@remote:~/Music/iTunes/

After saving the script, set it to be executable using chmod.

local$ chmod u+x /path/to/local/rsync.script.sh

Test your script on the command line, and then SSH to the remote computer to verify the copied files.

local$ /path/to/local/rsync.script.sh

If all is working well, you can set up a cron job to run at your desired time interval. Remember, both computers must be running for this to be automated, so choose a time you know they'll both be on. For example, to run this script daily:

local$ crontab -e

# min hour dayMonth month dayWeek command
0 0 * * * /path/to/local/rsync.script.sh

In certain situations you need more control of your cron tasks, execution times, and methods. Check out the SuperCron and Elysia Cron modules as alternative solutions. Both give you the ability to order, manage, disable, and execute your cron tasks, among a slew of other functionality. If you are familiar with Linux crontabs and need that level of scheduling and control, see Elysia Cron.

On a recent project, I struggled to get a massive cron hook to complete, and at one point, some of the tasks required a Batch API implementation to manage system resources. I eventually decided to pull the cron task out of the Drupal/Apache/web environment to execute it on the shell in a separate crontab. Here's the gist of the PHP script I put in my Drupal docroot.

<?php
// check if this is web traffic, versus CLI
if (isset($_SERVER['HTTP_USER_AGENT'])) {
  header('HTTP/1.1 403 Forbidden');
  die('You are not authorized to access this page.');
}

// set environment variables:

// remove time limit of script
set_time_limit(0);

// increase memory usage
ini_set('memory_limit', '256M');

// ensure script is being executed from Drupal docroot
// (__FILE__ is used because SCRIPT_NAME can be a relative path on the CLI)
$docroot_path = dirname(__FILE__);
if (getcwd() != $docroot_path) {
  chdir($docroot_path);
}

// set bootstrap variables:
// these lines ensure multi-site environments will work with a bootstrap
$_SERVER['HTTP_HOST'] = 'www.myhostname.com';
$_SERVER['SCRIPT_NAME'] = '/' . basename(__FILE__);

// include Drupal bootstrap
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// check for custom module and cron hook
// (module_exists() takes the module name as a string)
if (!module_exists('MYMODULE') || !function_exists('MYMODULE_cron')) {
  die('Module is not enabled.');
}

// execute custom cron hook
MYMODULE_cron();
?>

I then added a line to my crontab on the shell:

$ crontab -e

# file contents:

#min hour dayMonth month dayWeek command

0    0    *        *     *       /usr/bin/php /path/to/drupal/docroot/cron.custom.php

The above crontab executes my custom cron hook once a day.


I recently wrote a quick BASH shell script to FTP a log file to another server monthly. First, I modified the logrotate configuration to rotate a service's logs monthly. Then I added a cron job to execute the following script once a month. NOTE: It's important to give logrotate enough time to finish rotating the logs. Here's my script:

#!/bin/bash

_user="MYFTPUSER"
_password="MYFTPPASSWORD"

# create a date string in the format YYYYMM for last month
_date=$(date +%Y%m --date="-1 month")

# Create FTP connection and put the log in the user's home folder
ftp -n MYFTPSERVER <<EOF
user $_user $_password
binary
put /var/log/MYROTATEDLOG.log.1 ~/MYROTATEDLOG.$_date.log
bye
EOF
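
One caveat with the date line in the script above: GNU date's "-1 month" arithmetic can land back in the current month when the script runs on the 29th through the 31st, because the previous month may not have that day. That does not matter if cron fires early in the month, but here is a sketch of a sturdier variant that anchors the calculation to the 15th. The function name prev_month and its optional base-date argument are hypothetical, and GNU date is assumed, as in the original script.

```shell
#!/bin/bash
# Sketch only: compute last month's YYYYMM string, anchored to mid-month.
# prev_month is a hypothetical helper; it assumes GNU date.
prev_month() {
  # optional argument: a base date in YYYY-MM-DD form (defaults to today)
  local base="${1:-$(date +%Y-%m-%d)}"
  # keep only YYYY-MM, pin the day to the 15th, then step back one month
  date --date="${base:0:7}-15 -1 month" +%Y%m
}

# Example (not executed here):
#   _date=$(prev_month)
```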


You may have a lot of Drupal sites installed on the same server. Instead of creating a cron job for each individual site, you could write a script like this to loop through your sites and execute each cron job automatically. Here's the script I created using PHP, lynx, and find:

#!/usr/bin/php

<?php

// define where vhosts exist
$sitesDir = '/var/www/vhosts';

// change dir
chdir($sitesDir);

// get a list of directories
// Pipe 1: get all directories
// Pipe 2: remove "./" from beginning of each line
// Pipe 3: remove "."
$command = "find . -maxdepth 1 -type d | sed 's/^\.\///' | sed 's/^\.$//'";

// execute command
$dirs = `$command`;

// convert string into array
$dirs = explode("\n", trim($dirs));

// loop through directories
foreach ($dirs as $d) {

  // ensure cron.php exists
  if (file_exists($sitesDir . '/' . $d . '/httpdocs/cron.php')) {
    $command = "/usr/bin/lynx -source http://$d/cron.php > /dev/null 2>&1";
    $output = `$command`;
  }
}

?>

I then added the following cron job to root:

#min    hour    dMonth  month   dWeek   command
*/5     *       *       *       *       ~/scripts/drupal-crons.php
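
For comparison, the directory-discovery half of that script can also be sketched in plain bash, without find or sed. This assumes the same layout as above (each vhost directory is named after its hostname and keeps its docroot in httpdocs); the function name list_cron_sites is my own.

```shell
#!/bin/bash
# Sketch only: list vhost directories that contain httpdocs/cron.php.
# list_cron_sites is a hypothetical helper name.
list_cron_sites() {
  local root="$1" dir
  for dir in "$root"/*/; do
    # the glob stays literal when nothing matches, so the -f test also covers that case
    if [[ -f "${dir}httpdocs/cron.php" ]]; then
      basename "$dir"
    fi
  done
  return 0
}

# Example (not executed here): hit each site's cron.php with lynx
#   for site in $(list_cron_sites /var/www/vhosts); do
#     /usr/bin/lynx -source "http://$site/cron.php" > /dev/null 2>&1
#   done
```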
