background image
HomeRecent PostsDrupalSearchTagsRSSContactAboutAccount
Eric.London's picture

In this article, I'll share a bash shell script I use periodically to import a directory of CSV files into MySQL tables. This script is most helpful when you need to process a large number of CSV files, and the tables have not yet been created. Of course you could use a GUI tool to accomplish this, but what's the fun in that?

The following script will get a list of CSV files, loop through them, add each table, add each column to the table (based on the first row), and then use the mysqlimport command to load all the CSV records. There are a few caveats though: 1. the first row of each CSV file must contain the column names; 2. it works best when your column names are simple text; and 3. your MySQL user must have permission to process files (see: File_priv).

#!/bin/bash

# show commands being executed, per debug
set -x

# define database connectivity
_db="csv_imports"
_db_user="csv_imports"
_db_password="changeme"

# define directory containing CSV files
_csv_directory="/path/to/the/csv/files"

# go into directory
cd $_csv_directory

# get a list of CSV files in directory
_csv_files=`ls -1 *.csv`

# loop through csv files
for _csv_file in ${_csv_files[@]}
do
   
  # remove file extension
  _csv_file_extensionless=`echo $_csv_file | sed 's/\(.*\)\..*/\1/'`
 
  # define table name
  _table_name="${_csv_file_extensionless}"
 
  # get header columns from CSV file
  _header_columns=`head -1 $_csv_directory/$_csv_file | tr ',' '\n' | sed 's/^"//' | sed 's/"$//' | sed 's/ /_/g'`
  _header_columns_string=`head -1 $_csv_directory/$_csv_file | sed 's/ /_/g' | sed 's/"//g'`
 
  # ensure table exists
  mysql -u $_db_user -p$_db_password $_db << eof
    CREATE TABLE IF NOT EXISTS \`$_table_name\` (
      id int(11) NOT NULL auto_increment,
      PRIMARY KEY  (id)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1
eof
 
  # loop through header columns
  for _header in ${_header_columns[@]}
  do

    # add column
    mysql -u $_db_user -p$_db_password $_db --execute="alter table \`$_table_name\` add column \`$_header\` text"

  done

  # import csv into mysql
  mysqlimport --fields-enclosed-by='"' --fields-terminated-by=',' --lines-terminated-by="\n" --columns=$_header_columns_string -u $_db_user -p$_db_password $_db $_csv_directory/$_csv_file 
 
done
exit

After creating my shell script file, and making it executable, I executed it. Since I added the line "set -x" to the script, a lot of helpful info is shown to debug.

./import.sh

Lastly I executed some SQL on the command line to verify the results. nice.

$ mysql -u csv_imports -pchangeme csv_imports --execute="select * from albums"
+----+-------------+---------------------+
| id | band        | album               |
+----+-------------+---------------------+
|  1 | band        | album               |
|  2 | black keys  | attack & release    |
|  3 | the dodos   | no color            |
|  4 | the xx      | xx                  |
|  5 | surf city   | kudos               |
|  6 | toro y moi  | underneath the pine |
|  7 | cut copy    | zonoscope           |
|  8 | twin shadow | forget              |
+----+-------------+---------------------+

I recently found some time to switch my site's search framework from Lucene to Apache Solr. The module's README.txt makes installation for small production sites easy and straight forward.

Following the installation guide, I started the java Solr process by entering the right directory and executing the java jar..

$ java -jar start.jar

Everything was up and running in minutes.. until I closed my terminal and the java service ended with my shell process. Short term, I decided to writing a bash shell script to ensure Solr is running, and cron it to run every five minutes.

Here are the contents of my bash shell script:

#!/bin/bash

# check for process id
pid=`ps ax | grep -ir java.*jar.*start\.jar | grep -iv grep | awk '{print $1}'`

# check if pid is not an integer
if ! [[ "$pid" =~ ^[0-9]+$ ]] ; then

  # start service
  cd /path/to/my/apache-solr-1.4.1/installation
  java -jar start.jar &

  # send email notification
  message='ericlondon.com: starting solr service'
  subject='ericlondon.com: starting solr service'
  to='myemail@example.com'
  echo "$message" | mail -s "$subject" $to

  exit 1;

fi

And I added the following cronjob:

$ crontab -l
*/5 * * * * /path/to/my/scripts/folder/check_solr.sh

A better option would be to setup initialization scripts for the process (/etc/init.d/), or install Solr as a more permanent solution, but I guess this will do for the time being :) ...


Part 2, Using Supervisor (updated: 2011/04/12)

As mentioned above, using a cronjob is probably not the best solution. I decided to install and configure supervisord to monitor the process.

Unfortunately supervisor was not available for for Centos 5.5 (RHEL):

$ yum search supervisor
Finished
Warning: No matches found for: supervisor
No Matches found

Luckily, I found some RPMs via http://rpmfind.net. I installed supervisor and its one dependency:

# downloading RPMs
$ wget ftp://rpmfind.net/linux/epel/5/x86_64/supervisor-2.1-3.el5.noarch.rpm
$ wget ftp://rpmfind.net/linux/epel/5/x86_64/python-meld3-0.6.3-1.el5.x86_64.rpm

# installing RPMs
$ rpm -Uvh python-meld3-0.6.3-1.el5.x86_64.rpm
$ rpm -Uvh supervisor-2.1-3.el5.noarch.rpm

# setting run level for supervisord
$ chkconfig --level 2345 supervisord on

# starting supervisor
$ /etc/init.d/supervisord start

Next, I create a simple shell script to start the Solr process and made the script executable. NOTE: file contents have been simplified:

#!/bin/bash

# enter solr dir
cd /path/to/my/apache-solr-1.4.1/installation

# start solr
java -jar start.jar

Lastly, I added a few line to my supervisor conf file (/etc/supervisord.conf):

[program:apache_solr]
command=/path/to/my/scripts/folder/apache-solr-supervisor-run.sh

Upon restarting supervisor, solr started automatically

$ /etc/init.d/supervisord restart

$ ps aux | grep -ir java | grep -iv grep
root     28670  0.1  8.5 1041076 43548 ?       Sl   13:30   0:02 java -jar start.jar

I killed the script and it immediately came back (with a different process ID)!

$ kill 28670

$ ps aux | grep -ir java | grep -iv grep
root     28869 62.0  5.3 1021532 27016 ?       Sl   13:50   0:00 java -jar start.jar

Eric.London's picture

I recently wrote a quick BASH shell script to FTP a log file to another server monthly. First, I modified the logrorate configuration to rotate a service's logs monthly. Then I added a cron job to be executed the following script once a month. NOTE: It's important to give logrotate enough time to finish rotating the logs. Here's my script:

#!/bin/bash

_user="MYFTPUSER"
_password="MYFTPPASSWORD"

# create a date string in the format YYYYMM for last month
_date=$(date +%Y%m --date="-1 month")

# Create FTP connection and put the log in the user's home folder
ftp -n MYFTPSERVER <<EOF
user $_user $_password
binary
put /var/log/MYROTATEDLOG.log.1 ~/MYROTATEDLOG.$_date.log
bye
EOF

Sometimes I am forced to edit PHP files outside Eclipse. Here's a quick guide to make your text editor (in this case, Emacs) a little more user friendly by enabled php-mode and syntax highlighting.

First, download php-mode and stick it in your ~/.emacs.d folder:

cd ~/.emacs.d
wget http://php-mode.svn.sourceforge.net/svnroot/php-mode/tags/php-mode-1.4.0/php-mode.el

Next, paste the following code into your ~/.emacs file. This will enable php-mode and syntax highlighting. As you can see, I also added a default file extension for .module files.

(global-font-lock-mode 1)

(require 'php-mode)
(setq auto-mode-alist
  (append '(("\\.php$" . php-mode)
            ("\\.module$" . php-mode))
              auto-mode-alist))

Now, when you open .php or .module files, your code will be syntax highlighted and emacs will be tailored to editing PHP code. Screen shot:

Eric.London's picture

Here's a command to add all the new files in your current path to subversion:

svn stat | grep ^? | sed 's/?      /svn add "/' | sed 's/$/"/' | sh
svn commit -m "added all my new files"

Syndicate content