Posts tagged with mysql

Avatar-eric-london
Created by Eric.London on 2013-02-21
Tags:
New Comment
 
Please note: the content on this page orginates from ericlondon.com.
In this blog article, I'll demonstrate a proof of concept: MySQL replication to a Redis cache server. To outline the components used:
  • Ubuntu - linux
  • Mysql - primary database
  • Redis - cache database
  • Ruby - Gearman worker + client
  • Gearman - job queue service
  • MySQL JSON UDF
  • MySQL Gearman UDF
At FreePriceAlerts.com we implemented a similar configuration to produce high volume streams of data for Ziftr.

I started with a clean and minimal installation of Ubuntu 12.10 Server.

# upgrade packages
apt-get update
apt-get upgrade -y

# install ssh server
apt-get install openssh-server -y


Installed Mysql Server

apt-get install mysql-server -y


Install Ruby via RVM

curl -L https://get.rvm.io | bash -s stable
source /etc/profile.d/rvm.sh
# rvm requirements...
apt-get --no-install-recommends install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev libgdbm-dev ncurses-dev automake libtool bison subversion pkg-config libffi-dev
rvm install ruby


Install Redis

apt-get install tcl8.5 -y
wget http://redis.googlecode.com/files/redis-2.6.9.tar.gz
tar -xzf redis-2.6.9.tar.gz
cd redis-2.6.9
make
make test
cd src
cp redis-benchmark redis-check-aof redis-check-dump redis-cli redis-sentinel redis-server /usr/local/bin/
cd ..
cp redis.conf /etc

# start redis
redis-server /etc/redis.conf &

# test redis
redis-cli ping
PONG


Install Gearman

apt-get install gearman gearman-server -y

# check if service is running
/etc/init.d/gearman-job-server status
 * gearmand is running


Installed MySQL JSON UDF

apt-get install libmysqlclient-dev -y
cd
mkdir ~/lib_mysqludf_json
cd ~/lib_mysqludf_json
wget http://www.mysqludf.org/lib_mysqludf_json/lib_mysqludf_json_0.0.2.tar.gz
tar -xzf lib_mysqludf_json_0.0.2.tar.gz

# remove shared object, and recompile
rm lib_mysqludf_json.so
gcc $(mysql_config --cflags) -shared -fPIC -o lib_mysqludf_json.so lib_mysqludf_json.c

# locate plugin directory
mysql -u root -pPASSWORD --execute="show variables like '%plugin%';"
+---------------+------------------------+
| Variable_name | Value                  |
+---------------+------------------------+
| plugin_dir    | /usr/lib/mysql/plugin/ |
+---------------+------------------------+

# copy shared object to plugin directory
cp lib_mysqludf_json.so /usr/lib/mysql/plugin/

# enable json_object method
mysql -u root -pPASSWORD --execute="create function json_object returns string soname 'lib_mysqludf_json.so'"


Installed MySQL Gearman UDF

apt-get install libgearman-dev -y
cd
wget https://launchpad.net/gearman-mysql-udf/trunk/0.6/+download/gearman-mysql-udf-0.6.tar.gz
tar -xzf gearman-mysql-udf-0.6.tar.gz
cd gearman-mysql-udf-0.6
./configure --with-mysql=/usr/bin/mysql_config --libdir=/usr/lib/mysql/plugin/
make
make install

# enabled udf functions
mysql -u root -pPASSWORD --execute="CREATE FUNCTION gman_do_background RETURNS STRING SONAME 'libgearman_mysql_udf.so'"
mysql -u root -pPASSWORD --execute="CREATE FUNCTION gman_servers_set RETURNS STRING SONAME 'libgearman_mysql_udf.so'"

# set gearman server
mysql -u root -pPASSWORD --execute="SELECT gman_servers_set('127.0.0.1')"


Ruby setup code

# setup RVM gemset
mkdir ~/ruby
echo "rvm use --create ruby-1.9.3@redis_gearman" > ~/ruby/.rvmrc
cd ~/ruby

# new file: Gemfile; contents:
source 'https://rubygems.org'
gem 'gearman-ruby'
gem 'redis'

# execute bundle to install gems
bundle


Created redis gearman worker, new file: redis_worker.rb

#!/usr/bin/env ruby

require 'rubygems'
require 'gearman'
require 'redis'
require 'json'

servers = ['localhost']
worker = Gearman::Worker.new(servers)

REDIS_DELIMITER = ':'
$redis = Redis.new

module RedisWorker
  def RedisWorker.work(data, job)

    # decode json
    json_data = JSON.parse data

    # create redis key
    redis_key = "user_page_views#{REDIS_DELIMITER}#{json_data['user_id']}"

    $redis.lpush redis_key, data

    true
  end
end

worker.add_ability('redis_worker') do |data,job|
  RedisWorker::work data,job
end

loop {worker.work}


Set file executable

chmod +x redis_worker.rb


Created redis gearman [test] client, new file: redis_client.rb

#!/usr/bin/env ruby

require 'rubygems'
require 'gearman'
require 'json'

servers = ['localhost']
client = Gearman::Client.new(servers)
taskset = Gearman::TaskSet.new(client)

data = '{"user_id":1,"timestamp":"2013-02-14 19:13:15","page":"http://www.google.com"}'

result = client.do_task('redis_worker', data)
puts result


Set file executable

chmod +x redis_client.rb


Testing Gearman worker & Redis

# in terminal 1, start worker
./redis_worker.rb

# in terminal 2, check gearman status & verify worker
(echo status ; sleep 0.1) | netcat 127.0.0.1 4730
redis_worker	0	0	1

# in terminal 3, monitor redis
redis-cli monitor
OK

# in terminal 4, run client test script
./redis_client.rb 
true

# in terminal 3, verify redis lpush:
redis-cli monitor
OK
1361012555.700504 [0 127.0.0.1:34135] "lpush" "user_page_views:1" "{\"user_id\":1,\"timestamp\":\"2013-02-14 19:13:15\",\"page\":\"http://www.google.com\"}"


MySQL data setup

# add database & table
mysql -u root -pPASSWORD
mysql> create database redisgearman;
mysql> use redisgearman;
mysql> CREATE TABLE `user_page_views` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `page` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
mysql> exit

# add test record
mysql -u root -pPASSWORD redisgearman --execute="insert into user_page_views (user_id, page) values (1, 'http://www.google.com')"

# ensure json udf is working
mysql -u root -pPASSWORD redisgearman --execute="select json_object(user_id as \`user_id\`, timestamp as \`timestamp\`, page as \`page\`) as json from user_page_views"
+--------------------------------------------------------------------------------+
| json                                                                           |
+--------------------------------------------------------------------------------+
| {"user_id":1,"timestamp":"2013-02-14 19:13:15","page":"http://www.google.com"} |
+--------------------------------------------------------------------------------+


Add MySQL trigger, new file: ~/trigger.sql

DELIMITER $$
CREATE TRIGGER redisgearman.redis_gearman AFTER INSERT ON redisgearman.user_page_views
  FOR EACH ROW BEGIN
    SET @ret=gman_do_background('redis_worker', json_object(NEW.user_id as `user_id`, NEW.timestamp as `timestamp`, NEW.page as `page`)); 
  END$$
DELIMITER ;


Enable trigger

mysql -u root -pPASSWORD redisgearman < ~/trigger.sql


Trigger + Gearman worker + Redis test.

# example insert statement
insert into user_page_views (user_id, page) values (1, 'http://ericlondon.com/recent-posts');

# mysql insert => mysql trigger => gearman udf => ruby redis worker => redis insert..

# output from redis-cli monitor
redis-cli monitor
OK
1361327500.888805 [0 127.0.0.1:33649] "lpush" "user_page_views:1" "{\"user_id\":1,\"timestamp\":\"2013-02-19 21:31:40\",\"page\":\"http://ericlondon.com/recent-posts\"}"


Now the MySQL table replicates to the Redis cache database. For future queries, check Redis first, and fall back on MySQL. Here's an example class to do so:

#!/usr/bin/env ruby

require 'rubygems'
require 'redis'
require 'json'
require 'mysql2'

class RedisMysql

  def initialize
    @redis = Redis.new
    @mysql = Mysql2::Client.new(:host => 'localhost', :username => 'root', :password => 'PASSWORD', :database => 'redisgearman')
    @redis_results = []
  end

  def query(key, limit)
    @redis_results = query_redis key,limit
    return @redis_results if @redis_results.size >= limit

    @mysql_results = query_mysql key, (limit-@redis_results.size)
    @redis_results.concat @mysql_results

  end

  def query_redis(key, limit)
    results = @redis.lrange key, 0, limit
    return [] if results.nil?
    results.collect {|r| JSON.parse r}
  end

  def query_mysql(key, limit)

    # parse args
    parts = key.split ':'
    mysql_table = parts[0]
    user_id = parts[1]

    # get last timestamp from redis results
    last_timestamp = @redis_results.last['timestamp'] unless @redis_results.empty?

    where = []
    where << "user_id = '#{@mysql.escape user_id}'"
    where << "timestamp < '#{@mysql.escape last_timestamp}'" unless last_timestamp.nil?

    sql = "
      select *
      from user_page_views
      where #{where.join ' and '}
      order by id desc
      limit #{limit}"

    results = @mysql.query sql
    return [] if results.nil?
    results.collect {|r| r}

  end
end


Class usage

rm = RedisMysql.new
results = rm.query 'user_page_views:1', 10

# debug
puts results


Source code on GitHub
Avatar-eric-london
Created by Eric.London on 2011-07-28
Tags:
New Comment
 
Please note: the content on this page orginates from ericlondon.com.
For this article, I'll share some old school procedural PHP scripts I used to scan a directory for duplicate images and display the results for comparison. A while back, I had a hardware failure and had to write some rsync commands to manually pull my iPhoto images off of a dying Time Machine external harddrive. The basic gist of these script is simple: find all the images, create a unique MD5 hash of the image, collect some other details, write the records to a MySQL database, execute some SQL to find MD5 duplicates, and show the results side by side for comparison. Since I was executing this code on my iMac, I used MAMP to provide the Apache and MySQL services.

The first script, which will be included all the rest just sets up a MySQL database connection.

Script: db.php
<?php
// define mysql credentials
$db_user = 'picture_data';
$db_pass = 'picture_data';
$db_database = 'picture_data';
$db_table = 'picture_data';
$db_host = 'localhost';

// connect to mysql database
$db = mysql_connect($db_host, $db_user, $db_pass);

// check for mysql connection
if (!$db) {
  die('Could not connect to database.');
}
?>


I then created the script to find all the images, create the md5 hash, and store the data in MySQL. I put this script outside my Apache vhost docroot and only had to execute it once.

Script: scan.php
<?php
//////////////////////////////////////////////////
// DATABASE SETUP

require_once('db.php');

setup_database();

//////////////////////////////////////////////////
// PROCESSING IMAGES

// specify path to images
$images_path = '/Users/Eric/Pictures/iPhoto Library/Originals';

// ensure directory exists
if (!is_dir($images_path)) {
  die('Directory does not exist.');
}

// change directory
chdir($images_path);

// get a list of files
$files = `find . -type f | sed 's/^\.\///'`;

// explode files list on newline
$files = explode("\n", trim($files));

// define a list of file extensions to process
$file_extensions = array(
  'jpg',
  'jpeg',
  'png',
  'bmp',
  'gif',
  'tiff',
);

// loop through files
foreach ($files as $file_path) {

  // get path info
  $path_info = pathinfo($file_path);
  $file_name = $path_info['basename'];
  $file_extension = strtolower($path_info['extension']);
  
  // check file extension 
  if (!in_array($file_extension, $file_extensions)) {
    continue;
  }

  // get md5 hash of file
  $file_md5 = md5_file($file_path);

  // get file modified time
  $file_modified = date('Y-m-d H:i:s', filemtime($file_path));

  // create sql to insert record
  $sql = sprintf(
    "insert into `%s` (file_path, file_name, file_extension, file_md5, file_modified) values ('%s','%s','%s','%s','%s')",
    mysql_real_escape_string($db_table),
    mysql_real_escape_string($images_path . '/' . $file_path),
    mysql_real_escape_string($file_name),
    mysql_real_escape_string($file_extension),
    mysql_real_escape_string($file_md5),
    mysql_real_escape_string($file_modified)
  );

  // execute sql
  $result = mysql_query($sql, $db);

}

//////////////////////////////////////////////////
// FUNCTIONS

function setup_database() {

  global $db;
  global $db_database;
  global $db_table;

  // create database if it is does not exist
  $sql = sprintf(
    "create database if not exists `%s`",
    mysql_real_escape_string($db_database)
  );
  $result = mysql_query($sql, $db);
  
  // check for error
  if (!$result) {
    die(mysql_error());
  }
  
  // select database
  $result = mysql_select_db($db_database, $db);
  
  // check for error
  if (!$result) {
    die(mysql_error());
  }
  
  // create table if it does not exist
  $sql = sprintf("
    CREATE TABLE IF NOT EXISTS `%s` (
      `fid` int(11) NOT NULL AUTO_INCREMENT,
      `file_path` varchar(255) NOT NULL,
      `file_name` varchar(255) NOT NULL,
      `file_extension` varchar(10) NOT NULL,
      `file_md5` varchar(32) NOT NULL,
      `file_modified` datetime NOT NULL,
      PRIMARY KEY (`fid`),
      KEY `idx_file_md5` (`file_md5`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1",
    mysql_real_escape_string($db_table)
  );
  $result = mysql_query($sql, $db);
  
  // check for error
  if (!$result) {
    die(mysql_error());
  }
  
  // drop existing records from table
  $sql = sprintf(
    "truncate table `%s`",
    mysql_real_escape_string($db_table)
  );
  $result = mysql_query($sql, $db);
  
  // check for error
  if (!$result) {
    die(mysql_error());
  }

}
?>


I then ran the script on the command line. It took a while to go through all 25K+ images in my directory.


$ php scan.php


The next script I wrote will aid in the display of the images. I wrote this script because the absolute path of my images was outside my Apache vhost docroot. It checks for 2 $_GET variables: the md5 hash and a integer representing which duplicate image to show. The images is read and displayed, so this script can be inserted into the "scr" attribute of an img tag.

Script: view-image.php
<?php
//////////////////////////////////////////////////
// DATABASE

require_once('db.php');

// select database
$result = mysql_select_db($db_database, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

//////////////////////////////////////////////////
// PROCESS REQUEST

$md5 = $_GET['md5'];
$index = intval($_GET['index']);

// fetch images with md5 index
$sql = sprintf("
  select *
  from `%s`
  where file_md5 = '%s'
  order by fid asc
  ",
  mysql_real_escape_string($db_table),
  mysql_real_escape_string($md5)
);

$result = mysql_query($sql, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

// fetch results
$rows = array();
while ($row = mysql_fetch_object($result)) {
  $rows[] = $row;
}

// get image data
$file_path = $rows[$index]->file_path;
$file_extension = $rows[$index]->file_extension;

header("Content-type: image/$file_extension");
readfile($file_path);
?>


The last script ties everything together. It determines which duplicates exist and allows you to view them. For my environment, I decided to store the list of MD5 duplicates in the $_SESSION, to prevent repeat SQL.

Script: view.php
<?php
//////////////////////////////////////////////////
// DATABASE SETUP

require_once('db.php');

// select database
$result = mysql_select_db($db_database, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

//////////////////////////////////////////////////
// FETCHING MD5S

// start session
session_start();

// check for session data
if (!is_array($_SESSION['md5s']) || empty($_SESSION['md5s'])) {
  fetch_md5s();
}

// determine which md5 to show
$md5_index = intval($_GET['md5_index']);

// fetch images with md5 index
$sql = sprintf("
  select *
  from `%s`
  where file_md5 = '%s'
  order by fid asc
  ",
  mysql_real_escape_string($db_table),
  mysql_real_escape_string($_SESSION['md5s'][$md5_index])
);

$result = mysql_query($sql, $db);

// check for error
if (!$result) {
  die(mysql_error());
}

// fetch results
$rows = array();
while ($row = mysql_fetch_object($result)) {
  $rows[] = $row;
}

// create image output in a table. note the image scr is calling the view-image.php script with $_GET arguments.
$output = "";
$output .= "<table><tr>";
foreach ($rows as $index => $data) {
  $output .= "<td style='width: " . (100/count($rows)) . "%'>";
  $output .= "<img style='width: 100%' src='/view-image.php?md5=" . $data->file_md5 . "&index=" . $index . "' />";
  $output .= $data->file_name . "<br/>";
  $output .= $data->file_path . "<br/>";
  $output .= "</td>";
}
$output .= "</tr></table>";

$output .= "<a href='/view.php?md5_index=" . ($md5_index+1) . "'>Next >></a>";

print $output;

//////////////////////////////////////////////////
// FUNCTIONS

function fetch_md5s() {

  global $db;
  global $db_table;

  // get a list of md5 hashes with dupes
  $sql = sprintf("
    select file_md5
    from `%s`
    group by file_md5
    having count(*) > 1
    ",
    mysql_real_escape_string($db_table)
  );
  
  $result = mysql_query($sql, $db);
  
  // check for error
  if (!$result) {
    die(mysql_error());
  }
  
  // fetch results
  $md5s = array();
  while ($row = mysql_fetch_object($result)) {
    $md5s[] = $row->file_md5;
  }
  
  // store md5s in session
  $_SESSION['md5s'] = $md5s;

}
?>


Now, I went to my browser to execute the view.php script and view the results.

Picture Duplicates
Avatar-eric-london
Created by Eric.London on 2011-04-30
Tags:
New Comment
 
Please note: the content on this page orginates from ericlondon.com.
In this article, I'll show the commands I have been using to set up a fresh Centos server, configured for Apache, MySQL, PHP, Tomcat, Drupal, and Apache Solr. For my article, I used Parallels to create a virtual machine from the Centos 5.6 64bit ISOs I downloaded. To simply this article, all commands are being executed as root, firewall configurations and performance tweaks are not accounted for.

Once the distribution is installed, the first thing I do is upgrade all packages.

$ yum update


Install PHP, Apache, and MySQL

$ yum install php53 php53-gd php53-mbstring php53-mysql php53-xml mysql-server httpd


Set runlevels for Apache and MySQL

$ chkconfig --level 2345 httpd on
$ chkconfig --level 2345 mysqld on


Install subversion. I chose to use subversion for this article because the Drupal 6.x installation works well with svn:externals to fetch the SolrPhpClient library. All subversion commands are connecting to a local subversion repository. If you are using an external server (like Beanstalk), you will have to transpose all commands from using "file://" to "https://".

$ yum install subversion


Add a new local subversion repository (OPTIONAL).

$ mkdir /var/subversion
$ svnadmin create /var/subversion/example.com
$ svn mkdir file:///var/subversion/example.com/trunk -m "added trunk"
$ svn mkdir file:///var/subversion/example.com/branches -m "added branches"
$ svn mkdir file:///var/subversion/example.com/tags -m "added tags"


Download/setup drush

$ cd /var/www
$ wget http://ftp.drupal.org/files/projects/drush-7.x-4.4.tar.gz
$ tar -xzf drush-7.x-4.4.tar.gz
$ ln -s /var/www/drush/drush /usr/local/bin/drush


Create a vhost location on the server for the Drupal installation.

$ mkdir /var/www/vhosts
$ cd /var/www/vhosts
$ drush dl drupal
$ mv drupal-7.0/ example.com 


Integrate the Drupal files with subversion

$ cd /var/www/vhosts/example.com
$ svn co file:///var/subversion/example.com/trunk .
$ svn add * .htaccess
$ svn commit -m "downloaded drupal"


Download the Drupal apachesolr module

# make a folder for contrib modules
$ mkdir /var/www/vhosts/example.com/sites/all/modules/contrib
$ cd /var/www/vhosts/example.com/sites/all/modules/contrib

# note: in the below command, you may be prompted to choose which version of the Solr module to install. I choose option 2 for the Supported version
$ drush dl apachesolr

# commit to subversion
$ cd /var/www/vhosts/example.com/sites/all/modules
$ svn add contrib
$ svn commit -m "added contrib folder and apachesolr module"


Setup MySQL

# start mysql
$ /etc/init.d/mysqld start

# set root password
$ /usr/bin/mysqladmin -u root password 'new-password'

# create new database, user, and set permissions
$ mysql --execute="create database db_example"
$ mysql --execute="grant all privileges on db_example.* to 'example-user'@'localhost' identified by 'some_password'"


Setup Apache vhost

$ cd /etc/httpd/conf.d

# create a new file "example.com.conf", with the contents:

NameVirtualHost *:80

<Directory /var/www/vhosts>
  AllowOverride All
</Directory>

<VirtualHost *:80>
  ServerName example.com
  DocumentRoot /var/www/vhosts/example.com
  ErrorLog logs/example.com-error_log
  CustomLog logs/example.com-access_log common
</VirtualHost>



Reset Apache file permissions. NOTE: you will need a more solid/secure configuration for this!

$ cd /var/www
$ chown -R apache.apache drush*
$ chown -R apache.apache vhosts


Start Apache

$ /etc/init.d/httpd start


Install Drupal via drush

$ cd /var/www/vhosts/example.com

# note: you can set your user 1 username, password, email, etc in the following command if desired. type "drush help si" for more install options
$ drush site-install standard --sites-subdir=example.com --db-url=mysqli://example-user:some_password@localhost/db_example


At this point, you should be able to browse to your site and it will be up and running.
Drupal Installed

Now, we move onto Tomcat and Solr!

Installing Tomcat and Java. The default Centos yum repositories provide Tomcat5. I prefer Tomcat6, so there are some extras steps below and a dependency issue I had to resolve.

# added repo file to get tomcat6:
$ cd /etc/yum.repos.d/
$ wget http://www.jpackage.org/jpackage50.repo

# install Java JDK:
$ yum install java-1.6.0-openjdk

# install tomcat6:
$ yum install tomcat6 tomcat6-admin-webapps tomcat6-webapps

# dang, dependency issue... (!)

java-1.4.2-gcj-compat-1.4.2.0-40jpp.115.x86_64 from base has depsolving problems
  --> Missing Dependency: /usr/bin/rebuild-security-providers is needed by package java-1.4.2-gcj-compat-1.4.2.0-40jpp.115.x86_64 (base)
Error: Missing Dependency: /usr/bin/rebuild-security-providers is needed by package java-1.4.2-gcj-compat-1.4.2.0-40jpp.115.x86_64 (base)
 You could try using --skip-broken to work around the problem
 You could try running: package-cleanup --problems
                        package-cleanup --dupes
                        rpm -Va --nofiles --nodigest
The program package-cleanup is found in the yum-utils package.

# Fixing dependency issue (OPTIONAL):
$ mkdir ~/downloads
$ cd ~/downloads
$ wget http://plone.lucidsolutions.co.nz/linux/centos/images/jpackage-utils-compat-el5-0.0.1-1.noarch.rpm
$ rpm -Uvh jpackage-utils-compat-el5-0.0.1-1.noarch.rpm
$ yum install tomcat6 tomcat6-admin-webapps tomcat6-webapps

# setting tomcat runlevels
$ chkconfig --level 2345 tomcat6 on

# starting tomcat
$ /etc/init.d/tomcat6 start


At this point, you should be able to access Tomcat in your browser (http://example.com:8080)
Tomcat homepage

Downloading Solr Java library.

$ cd ~/downloads
# note: you may need to choose a different mirror to download
$ wget http://www.fightrice.com/mirrors/apache//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
$ tar -xzf apache-solr-1.4.1.tgz

# copy/rename solr war file into Tomcat webapps directory
$ cp ~/downloads/apache-solr-1.4.1/dist/apache-solr-1.4.1.war /var/lib/tomcat6/webapps/solr.war

# copy solr files
$ cp -r ~/downloads/apache-solr-1.4.1/example/solr/ /var/lib/tomcat6/solr/


Copying the java war file into the Tomcat webapps folder will create this directory automatically:

/var/lib/tomcat6/webapps/solr


Create Catalina config file to link war file to solr directory:

$ cd /etc/tomcat6/Catalina/localhost

# create new file: "solr.xml", with the contents:

<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="/var/lib/tomcat6/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/var/lib/tomcat6/solr" override="true" />
</Context>


Setup Tomcat admin user(s):

# edit file: /etc/tomcat6/tomcat-users.xml, ensure similar contents exist:

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
<role rolename="admin"/>
<role rolename="manager"/>
<user username="eric" password="supersecretpassword" roles="admin,manager"/>
</tomcat-users>


Update WEB-INF/web.xml file:

# edit file: /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml, update section to reflect solr path:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/var/lib/tomcat6/solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>


Copy conf files from Drupal apachesolr module into Tomcat Solr conf directory (overwrite):

$ cp /var/www/vhosts/example.com/sites/all/modules/contrib/apachesolr/protwords.txt /var/lib/tomcat6/solr/conf/
$ cp /var/www/vhosts/example.com/sites/all/modules/contrib/apachesolr/schema.xml /var/lib/tomcat6/solr/conf/
$ cp /var/www/vhosts/example.com/sites/all/modules/contrib/apachesolr/solrconfig.xml /var/lib/tomcat6/solr/conf/


Reset Tomcat permissions/ownership:

$ cd /var/lib
$ chown -R tomcat.tomcat tomcat6/


Restart Tomcat

$ /etc/init.d/tomcat6 restart


At this point, you should be able to access the solr/admin tomcat Page (http://example.com:8080/solr/admin)
Solr Admin

If things are not working well at this point, check the Tomcat logs:

/var/log/tomcat6/catalina.out

And, ensure the solr java module is listed in the Tomcat Web Application Manager: http://example.com:8080/manager/html

If all is well, you can now enable the Drupal apachesolr modules:

$ cd /var/www/vhosts/example.com
$ drush en apachesolr apachesolr_search apachesolr_taxonomy apachesolr_access --uri=example.com


Log into your Drupal site. NOTE: default account (via drush): admin/admin

Edit default Apache Solr Host Settings:
URL: http://example.com/admin/config/search/apachesolr/server/solr/edit
Change url to: http://example.com:8080/solr, and save form.

Go to Drupal search settings page:
URL: http://example.com/admin/config/search/settings
Change the default search mode to "Apache Solr search", and save form.

Now, you are ready to test the indexing and integration.

Add a new piece of content to test indexing.
# Example:
# http://example.com/node/add/article
# title: Test Article
# Body: test test test

Browse to solr index page:
URL: http://example.com/admin/config/search/apachesolr/index
Select: Index queued content, and click on Begin button
You should see a status message like: "1 item processed successfully." and "Number of documents in index: 0 (1 sent but not yet processed)"
A few minutes later, refreshing the index page should show: "Number of documents in index: 1"

Search for "test" to verify Solr results.
URL: http://example.com/search/site/test
Solr Search Results

You can also review search results via solr/admin
URL: http://example.com:8080/solr/admin/
Enter "test" in query string box and click search

Part 2, Multicore Configuration (OPTIONAL)

If you need to run multiple sites off a single Solr Tomcat installation, you can setup multicore..

Copy the multicore xml file into your solr directory:

$ cp ~/downloads/apache-solr-1.4.1/example/multicore/solr.xml /var/lib/tomcat6/solr/


Create a new directory for each multisite in the solr directory:

$ mkdir /var/lib/tomcat6/solr/example.com


Replicate the solr conf directory into the new multisite directory:

cp -r /var/lib/tomcat6/solr/conf /var/lib/tomcat6/solr/example.com/conf/


Update the solr.xml file:

# edit file: /var/lib/tomcat6/solr/solr.xml, added <core> section for each site:
<cores adminPath="/admin/cores">
  <core name="example.com" instanceDir="example.com" />
</cores>


Restart Tomcat

$ /etc/init.d/tomcat6 restart


Now, your new multicore site will be accessible here: http://example.com:8080/solr/example.com/admin/
Avatar-eric-london
Created by Eric.London on 2011-04-10
Tags:
New Comment
 
Please note: the content on this page orginates from ericlondon.com.
In this article, I'll share a bash shell script I use periodically to import a directory of CSV files into MySQL tables. This script is most helpful when you need to process a large number of CSV files, and the tables have not yet been created. Of course you could use a GUI tool to accomplish this, but what's the fun in that?

The following script will get a list of CSV files, loop through them, add each table, add each column to the table (based on the first row), and then use the mysqlimport command to load all the CSV records. There are a few caveats though: 1. the first row of each CSV file must contain the column names; 2. it works best when your column names are simple text; and 3. your MySQL user must have permission to process files (see: File_priv).


#!/bin/bash

# show commands being executed, per debug
set -x

# define database connectivity
_db="csv_imports"
_db_user="csv_imports"
_db_password="changeme"

# define directory containing CSV files
_csv_directory="/path/to/the/csv/files"

# go into directory
cd $_csv_directory

# get a list of CSV files in directory
_csv_files=`ls -1 *.csv`

# loop through csv files
for _csv_file in ${_csv_files[@]}
do
    
  # remove file extension
  _csv_file_extensionless=`echo $_csv_file | sed 's/\(.*\)\..*/\1/'`
  
  # define table name
  _table_name="${_csv_file_extensionless}"
  
  # get header columns from CSV file
  _header_columns=`head -1 $_csv_directory/$_csv_file | tr ',' '\n' | sed 's/^"//' | sed 's/"$//' | sed 's/ /_/g'`
  _header_columns_string=`head -1 $_csv_directory/$_csv_file | sed 's/ /_/g' | sed 's/"//g'`
  
  # ensure table exists
  mysql -u $_db_user -p$_db_password $_db << eof
    CREATE TABLE IF NOT EXISTS \`$_table_name\` (
      id int(11) NOT NULL auto_increment,
      PRIMARY KEY  (id)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1
eof
  
  # loop through header columns
  for _header in ${_header_columns[@]}
  do

    # add column
    mysql -u $_db_user -p$_db_password $_db --execute="alter table \`$_table_name\` add column \`$_header\` text"

  done

  # import csv into mysql
  mysqlimport --fields-enclosed-by='"' --fields-terminated-by=',' --lines-terminated-by="\n" --columns=$_header_columns_string -u $_db_user -p$_db_password $_db $_csv_directory/$_csv_file  
  
done
exit


After creating my shell script file, and making it executable, I executed it. Since I added the line "set -x" to the script, a lot of helpful info is shown to debug.

./import.sh


Lastly I executed some SQL on the command line to verify the results. nice.

$ mysql -u csv_imports -pchangeme csv_imports --execute="select * from albums"
+----+-------------+---------------------+
| id | band        | album               |
+----+-------------+---------------------+
|  1 | band        | album               |
|  2 | black keys  | attack & release    |
|  3 | the dodos   | no color            |
|  4 | the xx      | xx                  |
|  5 | surf city   | kudos               |
|  6 | toro y moi  | underneath the pine |
|  7 | cut copy    | zonoscope           |
|  8 | twin shadow | forget              |
+----+-------------+---------------------+
Avatar-eric-london
Created by Eric.London on 2010-08-25
Tags:
New Comment
 
Please note: the content on this page orginates from ericlondon.com.
In this tutorial, I'll show how you can use awk, grep, and sed (my favorite command line tools) to backup and archive your MySQL databases. This can be useful to schedule a cron job, transfer your databases to another server, or any other type of scripting.

First, you'll have to get acquainted with connecting to and dumping your database on the command line. Depending on your user, credentials, and where the databases are located, your command might look something like this. Please note, there is no space between the password and the "-p" flag.


$ mysqldump -u user -pPASSWORD -h hostname database > database.sql


To simplify my example, I'm going to shorten the mysqldump command to the follow.


$ mysqldump database > database.sql


Now that we're MySQL command line pros, I'll break down each command. I'll start by showing all the databases.


Eric-Londons-MacBook-Pro:backup Eric$ mysql --execute="show databases"
+---------------------+
| Database            |
+---------------------+
| customers           |
| db_pics_ericlondon  |
| db_thedrupalblog_d6 |
| drupal              |
| drupal-pics         |
| drupalmusicproject  |
| itunes              |
+---------------------+


Now, I'll "pipe" the output from the previous command into awk to show the first column data.


Eric-Londons-MacBook-Pro:backup Eric$ mysql --execute="show databases" | awk '{print $1}'
Database
customers
db_pics_ericlondon
db_thedrupalblog_d6
drupal
drupal-pics
drupalmusicproject
itunes


And use grep to remove the first line that says "Database".


Eric-Londons-MacBook-Pro:backup Eric$ mysql --execute="show databases" | awk '{print $1}' | grep -iv ^Database$
customers
db_pics_ericlondon
db_thedrupalblog_d6
drupal
drupal-pics
drupalmusicproject
itunes


And use sed to build the mysqldump command. This one is kinda tricky, sorry. As you can see, I also embedded the date command in there to generate today's date in the format: YYYYMMDD.


Eric-Londons-MacBook-Pro:backup Eric$ mysql --execute="show databases" | awk '{print $1}' | grep -iv ^Database$ | sed 's/\(.*\)/mysqldump \1 > \1.'$(date +"%Y%m%d")'.sql/'
mysqldump customers > customers.20100825.sql
mysqldump db_pics_ericlondon > db_pics_ericlondon.20100825.sql
mysqldump db_thedrupalblog_d6 > db_thedrupalblog_d6.20100825.sql
mysqldump drupal > drupal.20100825.sql
mysqldump drupal-pics > drupal-pics.20100825.sql
mysqldump drupalmusicproject > drupalmusicproject.20100825.sql
mysqldump itunes > itunes.20100825.sql


Lastly, if everything looks good, you can pipe the output back to the command line.


Eric-Londons-MacBook-Pro:backup Eric$ mysql --execute="show databases" | awk '{print $1}' | grep -iv ^Database$ | sed 's/\(.*\)/mysqldump \1 > \1.'$(date +"%Y%m%d")'.sql/' | sh

Eric-Londons-MacBook-Pro:backup Eric$ ls -1
customers.20100825.sql
db_pics_ericlondon.20100825.sql
db_thedrupalblog_d6.20100825.sql
drupal-pics.20100825.sql
drupal.20100825.sql
drupalmusicproject.20100825.sql
itunes.20100825.sql


You could even take this one step further and pipe the output through gzip to compress the dumps :)