Bash Programming


Put one of these at the top of your scripts to direct bash to send all standard output and standard error produced after the statement to $log_file. Whichever form you choose, everything written by the current script, its subshells, and the commands it runs ends up in $log_file. Your code will be cleaner and easier to maintain because individual commands no longer need their own redirections.

Overwrite the contents of $log_file
exec > "$log_file" 2>&1
Append to $log_file
exec >> "$log_file" 2>&1
Overwrite the contents of $log_file and send output to the screen using tee
exec > >(tee "$log_file") 2>&1
Append to the contents of $log_file and send output to the screen using tee
exec > >(tee -a "$log_file") 2>&1
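A minimal sketch of the append form, assuming bash. Here mktemp stands in for whatever $log_file you actually use, and the statements run in a subshell only so the demo does not redirect the rest of the script:

```shell
#!/usr/bin/env bash
# Hypothetical log location; mktemp keeps the example self-contained.
log_file=$(mktemp)

(
  # From this line on, stdout and stderr of this subshell (and of every
  # command it runs) are appended to "$log_file".
  exec >> "$log_file" 2>&1
  echo "this goes to standard output"
  echo "this goes to standard error" >&2
  ls /no/such/path || true   # the error message from ls is captured too
)

echo "Log contents:"
cat "$log_file"
```

Nothing inside the subshell needed its own `>> "$log_file"`, which is the maintenance win the statements above are meant to give you.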

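Number Format

Print a value with commas delimiting the thousands (the ' flag uses the thousands separator of the current locale, such as en_US.UTF-8; the C locale defines none):
$ printf "%'.3f\n" 12345678.901
12,345,678.901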
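Because the separator comes from the locale, the same command can print with or without commas depending on the environment. A small sketch; it assumes en_US.UTF-8 is installed (if it is not, bash prints a setlocale warning and the number comes out without commas):

```shell
#!/usr/bin/env bash
# The ' flag asks printf for the LC_NUMERIC thousands separator.
# With en_US.UTF-8 installed this prints 12,345,678.901.
LC_ALL=en_US.UTF-8 printf "%'.3f\n" 12345678.901

# The C/POSIX locale has no thousands separator, so no commas appear.
plain=$(LC_ALL=C printf "%'.3f" 12345678.901)
echo "$plain"
```

Setting the locale only for the one printf call avoids changing number formatting for the rest of the script.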

Process IDs

Process ID of the current script
PID of a script that another script put in the background (using &)
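Bash exposes both values as special parameters: $$ expands to the process ID of the current shell, and $! to the process ID of the most recent background job. A short sketch, with sleep standing in for a hypothetical background script:

```shell
#!/usr/bin/env bash
# $$ is the PID of the shell running this script.
echo "PID of this script: $$"

# Start a stand-in background job with &.
sleep 1 &
# $! is the PID of the most recently started background job.
bg_pid=$!
echo "PID of background job: $bg_pid"

# wait blocks until that process exits and returns its exit status.
wait "$bg_pid"
bg_status=$?
echo "Background job exited with status $bg_status"
```

Capturing $! into a variable right away matters, because starting another background job replaces it.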


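Use variables for ranges (brace expansion happens before variable expansion, so {$min..$max} needs eval):
for x in $(eval echo "{$min..$max}"); do echo "$x"; done
A better way to use variables for ranges is seq:
for i in $(seq "$min" "$max"); do echo "$i"; done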
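Bash also has an arithmetic for loop, which accepts variables directly and needs neither eval nor the external seq command. This is a different technique from the two above; the sketch collects the values in an array so they are easy to inspect:

```shell
#!/usr/bin/env bash
min=3
max=6
nums=()

# C-style arithmetic loop: min and max are evaluated as numbers,
# so no brace expansion, eval, or extra process is involved.
for (( i = min; i <= max; i++ )); do
  nums+=("$i")
done

echo "${nums[@]}"
```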

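Return Status

When using something like 2>&1 | tee -a $FILE

Because the output is piped to tee, $? holds the exit status of tee (the last command in the pipeline), not the exit status of the script. To get the exit status of the script, use the PIPESTATUS array, which holds the exit status of each command in the most recent pipeline; element 0 is the first command. Read it immediately after the pipeline, because the next command overwrites it. Example:

ret=${PIPESTATUS[0]}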
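A self-contained sketch; run_job is a hypothetical stand-in for the script whose output is being logged. It also shows set -o pipefail, a bash option that makes the pipeline's own status reflect a failure anywhere in the pipeline:

```shell
#!/usr/bin/env bash
log_file=$(mktemp)   # hypothetical log location

# Stand-in for "the script": prints something and fails with status 3.
run_job() {
  echo "working..."
  return 3
}

run_job 2>&1 | tee -a "$log_file"
# Copy the whole array at once, before any other command resets it.
statuses=("${PIPESTATUS[@]}")
echo "exit status of run_job: ${statuses[0]}"
echo "exit status of tee:     ${statuses[1]}"   # what $? would have reported

# With pipefail, the pipeline returns the status of the rightmost
# command that failed (or 0 if none did).
set -o pipefail
pipefail_status=0
run_job 2>&1 | tee -a "$log_file" || pipefail_status=$?
set +o pipefail
echo "pipeline status with pipefail: $pipefail_status"
```

PIPESTATUS is handy when you need each command's status; pipefail is simpler when you only care whether anything in the pipeline failed.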


When writing functions, returning a value may not be as simple as using echo.

One approach is to write your function so that it accepts a variable name as an argument and then sets that variable to the result of the function:

  function myfunc() {
    local  __resultvar=$1
    local  myresult='some value'
    eval $__resultvar="'$myresult'"   # assign to the variable named by $1
  }

  myfunc result
  echo $result

For more flexibility, you may want to write your functions so that they combine both result variables and command substitution:

  function myfunc() {
      local  __resultvar=$1
      local  myresult='some value'
      if [[ "$__resultvar" ]]; then
          eval $__resultvar="'$myresult'"   # a variable name was given: set it
      else
          echo "$myresult"                  # no name given: write to stdout
      fi
  }

  myfunc result
  echo $result
  result2=$(myfunc)
  echo $result2

Here, if no variable name is passed to the function, the value is output to the standard output.
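The eval form breaks if the value itself contains a single quote. Two bash features avoid eval altogether: namerefs (local -n, bash 4.3 or later) and printf -v (bash 3.1 or later). These are alternatives to the technique above, and the function names are illustrative:

```shell
#!/usr/bin/env bash
# Variant 1: a nameref makes __result_ref an alias for the caller's variable.
myfunc_nameref() {
  local -n __result_ref=$1
  __result_ref="it's some value"   # the quote would break the eval version
}

# Variant 2: printf -v writes the formatted value into the named variable.
myfunc_printf() {
  local __resultvar=$1
  local myresult="it's some value"
  printf -v "$__resultvar" '%s' "$myresult"
}

myfunc_nameref result_a
myfunc_printf result_b
echo "$result_a"
echo "$result_b"
```

As with the eval version, pick local names the caller is unlikely to pass in; a caller variable with the same name as a local inside the function gets shadowed instead of set.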


To access the last argument of the previous command line, use !$. Example:

$ ls -l /tmp/data.txt
-rw-r--r--  1 dan.brown  wheel  95 Nov 26 22:39 /tmp/data.txt
$ rm !$
rm /tmp/data.txt
$ ls -l /tmp/data.txt
ls: /tmp/data.txt: No such file or directory
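History expansion such as !$ is enabled by default only in interactive shells. Inside a script, the special parameter $_ gives a similar result: it expands to the last argument of the previous simple command, after expansion. A minimal sketch; the temporary file plays the role of /tmp/data.txt:

```shell
#!/usr/bin/env bash
# Throwaway file standing in for /tmp/data.txt.
tmp_file=$(mktemp)

ls -l "$tmp_file"
# $_ now holds the last argument of the previous command: the file's path.
rm -- "$_"
```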

To access the entire previous line, use !! Example:

$ which ls
$ ls -l `!!`
ls -l `which ls`
-rwxr-xr-x  1 root  wheel  34736 Oct 31  2013 /bin/ls

To access other words of the previous line, use “word designators”, separated from the event (such as !!) by a colon. Words are numbered from zero, so word 0 is the command name.

$ echo a b c d e
a b c d e
$ echo !!:2
echo b

Ranges can also be accessed:

  $ echo a b c d e
  a b c d e
  $ echo !!:3-4
  c d

There are various shortcuts, such as ‘:$’ to refer to the last argument, ‘:^’ to refer to the first argument, and ‘:*’ to refer to all the arguments (a synonym for ‘:1-$’). See the cheat sheet for a complete list.

Modifiers, each preceded by a colon, can follow a word designator to change the word it selected. For example:

  $ tar -xvzf software-1.0.tgz
  $ cd !!:$:r

Here the ‘r’ modifier was applied to a word designator that picked the last argument from the previous command line. The ‘r’ modifier removed the trailing suffix ‘.tgz’, so the command that ran was cd software-1.0.

The “h” modifier removes the trailing pathname component, leaving the head:

  $ echo /usr/local/apache
  /usr/local/apache
  $ echo !!:$:h
  echo /usr/local
  /usr/local

The “e” modifier removes all but the trailing suffix:

  $ ls -la /usr/src/software-4.2.messy-Extension
  $ echo /usr/src/*!!:$:e
  /usr/src/*.messy-Extension    # ls could have been used instead of echo

Another interesting modifier is the substitute modifier ‘:s/old/new/’, which substitutes new for old. It can be combined with the ‘g’ modifier to do global substitution. For example:

  $ ls /urs/local/software-4.2 /urs/local/software-4.3
  /usr/bin/ls: /urs/local/software-4.2: No such file or directory
  /usr/bin/ls: /urs/local/software-4.3: No such file or directory
  $ !!:gs/urs/usr/
ls /usr/local/software-4.2 /usr/local/software-4.3
ls: /usr/local/software-4.2: No such file or directory
ls: /usr/local/software-4.3: No such file or directory

This example replaces all occurrences of ‘urs’ with ‘usr’ and makes the command correct.

There are a few other modifiers, such as the ‘p’ modifier, which prints the resulting command after history expansion but does not execute it. See the cheat sheet for all of the modifiers.
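In scripts, where history expansion is normally off, bash parameter expansion can do much the same work as these modifiers. This is a different mechanism, not history expansion; the sketch reuses the example values from above, and the variable names are illustrative:

```shell
#!/usr/bin/env bash
archive="software-1.0.tgz"
dir_path="/usr/local/apache"
messy="/usr/src/software-4.2.messy-Extension"
typo_cmd="ls /urs/local/software-4.2 /urs/local/software-4.3"

root="${archive%.*}"          # like :r, drops the shortest ".*" suffix
head="${dir_path%/*}"         # like :h, drops the last "/component"
suffix=".${messy##*.}"        # like :e, which keeps the leading dot
fixed="${typo_cmd//urs/usr}"  # like :gs/urs/usr/, replaces every match

printf '%s\n' "$root" "$head" "$suffix" "$fixed"
```

These are approximations: for example, ${dir_path%/*} leaves a name without any slash unchanged, and ${messy##*.} returns the whole string when there is no dot.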

Remote Copying

Copy from HDFS to a local laptop

hadoop fs -cat /path/dir/file.txt | ssh dan.brown@my-machine 'cat > /tmp/file.txt'

Copy from HDFS to HDFS on another machine (the - tells hadoop fs -put to read from standard input)

hadoop fs -cat /path/dir/file.txt | ssh dan.brown@my-machine 'hadoop fs -put - /data/file.txt'

AWK Scripts

I need to process text every day. At times it is useful to find counts, and I've found that AWK is the easiest way to do so. Say I have a data file that looks like this:

row col1 col2 col3
0 3500.352 10l1.1 2356.4
1 292.2 3100.0 1997.99
2 1.354 2.3001 3354.2342523

This AWK statement will print information about col1:

$ awk ' NR == 1  {next} { s += $2 } END {  print "sum: ", s, " average: ", s/(NR-1), " samples: ", NR-1 }' /tmp/data.txt

sum:  3793.91  average:  1264.64  samples:  3

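To do the same thing but print numbers with thousands separators (gawk supports the ' printf flag, and the separator comes from the locale, e.g. en_US.UTF-8). Each '\'' ends the single-quoted awk program, adds a literal quote, and reopens the program. As in the previous example, NR-1 excludes the header row:

$ awk 'NR == 1 {next} { s += $2 } END { printf("sum: %'\''d average: %'\''.2f samples: %'\''d\n", s, s/(NR-1), NR-1) }' /tmp/data.txt

sum: 3,793 average: 1,264.64 samples: 3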

To print out the line that has the greatest number of characters:

awk '{ if (length($0) > max) {max = length($0); maxline = $0} } END { print maxline }' /tmp/data.txt

2 1.354 2.3001 3354.2342523

To print out the number of characters that appear in the line with the greatest number of characters:

awk '{ if (length($0) > max) max = length($0) } END { print max }' /tmp/data.txt

27
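Rather than hard-coding $2, the column to summarize can be passed in with awk's -v option. A sketch, assuming a throwaway copy of the sample data in a temporary file; data_file, col, and stats are illustrative names. Counting data rows in n also guards against dividing by zero when the file has only a header:

```shell
#!/usr/bin/env bash
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
row col1 col2 col3
0 3500.352 10l1.1 2356.4
1 292.2 3100.0 1997.99
2 1.354 2.3001 3354.2342523
EOF

col=2   # field number to summarize; field 2 is col1 because field 1 is "row"

stats=$(awk -v col="$col" '
  NR == 1 { next }          # skip the header row
  { s += $col; n++ }        # $col is the field whose number is stored in col
  END {
    if (n > 0)              # no data rows means no average
      printf "sum: %.2f average: %.2f samples: %d\n", s, s/n, n
  }' "$data_file")

echo "$stats"
```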


Installing A Single Instance of SolrCloud

After quite a bit of searching, I couldn’t find a simple example of setting up a single instance of SolrCloud on a local machine. I work with Solr every day. I do most of my development on my laptop, and when everything is good I commit it and deploy. We recently hired someone who will be working with Solr, and I wanted to get his laptop set up to run SolrCloud locally, too. However, I found it difficult to locate a document that I could just point him to. This is that document.

Note that I understand the installation described below is likely to be useful only when doing development. What’s the point of having a distributed SolrCloud when it’s only running on one machine?

See for a list of required SolrCloud configurations.


ZooKeeper is a centralized service that is used to coordinate configuration information. You will be telling ZooKeeper where to find SolrCloud configuration files.

SolrCloud comes with an embedded ZooKeeper. However, our production configuration uses ZooKeeper as a stand-alone system and I want to mimic production.


  • Download ZooKeeper from Apache’s site
  • Extract the downloaded file.
  • Follow the steps outlined in the getting started guide. Here are the basics; be aware that they may change with future versions of ZooKeeper.
    • Copy ZOOKEEPER_DIR/conf/zoo_sample.cfg to ZOOKEEPER_DIR/conf/zoo.cfg
    • I changed the value of dataDir in ZOOKEEPER_DIR/conf/zoo.cfg to an existing empty directory
    • Start zookeeper: ZOOKEEPER_DIR/bin/ start


Verify that ZooKeeper is running:

ZOOKEEPER_DIR/bin/ -server

You should see a command prompt that looks something like this:

[zk: 0]

Enter quit to exit the client:

[zk: 0] quit

If you get a Connection refused error you know the server is not running.

ZooKeeper and Solr’s Configuration Files

Use the SOLR_DIR/example/scripts/cloud-scripts/ script to upload Solr configuration files to ZooKeeper:

SOLR_DIR/example/scripts/cloud-scripts/ -zkhost localhost:2181 -cmd upconfig -confdir SOLR_DIR/example/solr/my-collection/conf -confname my-collection-config

From this you should see a bunch of output including a list of the configuration files found in the directory pointed to by the “-confdir” flag.

Start SolrCloud

Start up an instance of SolrCloud:

SOLR_DIR/bin/solr start -z localhost:2181 -cloud

The “-cloud” flag tells Solr to start in a cloud configuration.  The “-z localhost:2181” flag tells Solr how to connect to ZooKeeper where it will find configuration information.

You may now look at the SolrCloud admin page found here: http://localhost:8983/solr/#/

Create New Solr Collection

So far we’ve uploaded a set of Solr configuration files to ZooKeeper and started an instance of SolrCloud.  Next we need to create a new Solr collection telling Solr how to find its configuration in ZooKeeper.

curl 'localhost:8983/solr/admin/collections?action=CREATE&numShards=1&name=my-collection&collection.configName=my-collection-config'

Notice that the value of “collection.configName” is the same as the -confname value used in the “upconfig” command sent to ZooKeeper: my-collection-config. This tells Solr which name to use when asking ZooKeeper for the configuration of this new collection.

The “numShards” parameter is required. The documentation is a little confusing: the table says it is not required, but the description says otherwise. I found that if I do not provide the “numShards” parameter, the response from Solr is

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: numShards is a required param

As such, I just set the value to 1 and everything works as expected.
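Since collection.configName has to match the -confname given to upconfig, it can help to keep the names in variables. A hedged sketch that only builds the Collections API CREATE request shown above; the variable names and the dry_run switch are hypothetical, and the upload step is left out:

```shell
#!/usr/bin/env bash
# Hypothetical settings for a local, single-node SolrCloud.
solr_url="http://localhost:8983/solr"
collection="my-collection"
config_name="my-collection-config"   # must equal the -confname used with upconfig
num_shards=1                          # Solr rejected CREATE without numShards
dry_run=true                          # set to false to actually call Solr

create_url="${solr_url}/admin/collections?action=CREATE&numShards=${num_shards}&name=${collection}&collection.configName=${config_name}"

if [[ "$dry_run" == true ]]; then
  # Only show the request that would be sent.
  echo "curl '$create_url'"
else
  curl "$create_url"
fi
```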

To make this process easier I created a few scripts which can be found here on github:

Using Hadoop to Create SOLR Indexes

One of the most challenging projects I faced at work recently was to create an Apache Solr index of approximately 15 million records. This index had been created only once in the history of the company, using a MySQL database and Solr’s Data Import Handler (DIH). It had not been attempted since then because the original indexing process was time consuming (12-14 hours), required human supervision, and on failure had to be restarted from the very beginning.

Android apps on my phone

I was the first one at work to get an Android phone. As word got around, I became the go-to guy for Android. People ask me about Android before they buy, and they ask me more questions after they’ve bought their new phone. One thing people seem to appreciate is when I give them a list of apps to get them started.

Without further ado, here is a list of most of the apps I have on my phone. I’ve not included some of the apps (e.g., OEM pre-installed crap).

How I managed to install CyanogenMod 6.0.0-Droid-RC2 on my Motorola Droid from a MacBook Pro (OS X 10.6.4)

First I read the wiki entry. As I was reading, I noticed that there were no directions for OS X, only for Windows and Linux. It turns out that the flash recovery tools are not available for OS X, which meant I would have to do this from a virtual machine.

R cannot be resolved

In the Google group Android Beginners I frequently see messages asking what the Eclipse error "R cannot be resolved" means.

Eclipse generates the file for you using the aapt tool (Official Guide to the Android Asset Packaging Tool). The generated file contains a mapping to all the resources your application will use. Note that the Build Automatically option under the Project menu should be checked.