Bash Programming

Redirection

Put one of these statements at the top of your script to direct bash to send all standard output and standard error produced after that point to $log_file. Each of the following will send everything produced by the current script, its sub-shells, etc. to $log_file, and your code will be cleaner and easier to maintain. A short example script follows these statements.

Overwrite the contents of $log_file
exec > $log_file 2>&1
Append to $log_file
exec >> $log_file 2>&1
Overwrite the contents of $log_file and send output to screen using tee
exec > >(tee $log_file) 2>&1
Append to the contents of $log_file and send output to screen using tee
exec > >(tee -a $log_file) 2>&1
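
A minimal sketch of how one of these might be used in a real script; the script body and the $log_file path here are made up for illustration:

  #!/bin/bash
  # everything printed after the exec line goes to the log (and the screen)
  log_file=/tmp/example.log

  # append to $log_file and also show output on the screen
  exec > >(tee -a "$log_file") 2>&1

  echo "this line goes to both the screen and $log_file"
  ls /no/such/path   # error output is captured as well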

Number Format (see: http://stackoverflow.com/questions/9374868/number-formatting-in-bash-with-thousand-separator)

Print a value with commas as the thousands separators
$ printf "%'.3f\n" 12345678.901
12,345,678.901
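
The ' flag takes the thousands separator from the current locale, so this assumes a locale such as en_US.UTF-8; it works for plain integers as well:

  $ printf "%'d\n" 1234567
  1,234,567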

Process IDs

Process ID of the current script
$$
Process ID of the most recent process the script put into the background (with &)
$!
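
A small sketch showing both in one script; sleep stands in for whatever long-running command you might put in the background:

  #!/bin/bash
  echo "this script's PID is $$"

  sleep 30 &               # hypothetical long-running background job
  bg_pid=$!
  echo "background job's PID is $bg_pid"

  wait "$bg_pid"           # block until the background job finishes
  echo "background job exited with status $?"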

Variables

Brace expansion (e.g. {1..10}) happens before variables are expanded, so using variables in a range requires eval
for x in $(eval echo "{$min..$max}");
A better way to use variables for ranges is seq
for i in $(seq $min $max)
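
A minimal runnable sketch of the seq form; min and max here are just example values:

  #!/bin/bash
  min=1
  max=5
  # iterate over the range held in the two variables
  for i in $(seq "$min" "$max"); do
    echo "iteration $i"
  done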

Return Status (see http://www.unix.com/shell-programming-scripting/92163-command-does-not-return-exit-status-due-tee.html)

When using something like

script.sh 2>&1 | tee -a $FILE

Because the output is piped to tee, checking $? will give you the exit status of tee, not the exit status of the script. To get the script's exit status, use the PIPESTATUS array. Example:

ret=${PIPESTATUS[0]}
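
A sketch of how this might look in a wrapper script; script.sh and $FILE are placeholders carried over from the example above:

  #!/bin/bash
  FILE=/tmp/run.log                      # hypothetical log file

  ./script.sh 2>&1 | tee -a "$FILE"
  ret=${PIPESTATUS[0]}                   # exit status of script.sh, not tee

  if [ "$ret" -ne 0 ]; then
    echo "script.sh failed with status $ret" >&2
    exit "$ret"
  fi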

Functions

When writing functions, getting a return value back to the caller may not be as simple as using echo (see: http://www.linuxjournal.com/content/return-values-bash-functions).

Write your function so that it accepts a variable name as part of its command line and then set that variable to the result of the function:

  function myfunc()
  {
    local  __resultvar=$1
    local  myresult='some value'
    eval $__resultvar="'$myresult'"
  }

  myfunc result
  echo $result

For more flexibility, you may want to write your functions so that they combine both result variables and command substitution:

  function myfunc()
  {
      local  __resultvar=$1
      local  myresult='some value'
      if [[ "$__resultvar" ]]; then
          eval $__resultvar="'$myresult'"
      else
          echo "$myresult"
      fi
  }

  myfunc result
  echo $result
  result2=$(myfunc)
  echo $result2

Here, if no variable name is passed to the function, the value is written to standard output instead.

History

To access the last argument of the previous command line, use !$. Example:

$ ls -l /tmp/data.txt
-rw-r--r--  1 dan.brown  wheel  95 Nov 26 22:39 /tmp/data.txt
$ rm !$
rm /tmp/data.txt
$ ls -l /tmp/data.txt
ls: /tmp/data.txt: No such file or directory

To access the entire previous line, use !! Example:

$ which ls
/bin/ls
$ ls -l `!!`
ls -l `which ls`
-rwxr-xr-x  1 root  wheel  34736 Oct 31  2013 /bin/ls

To access other elements of the previous line, use “word designators”, which follow a colon (word numbering is zero-based).

$ echo a b c d e
a b c d e
$ echo !!:2
echo b
b

Ranges can also be accessed:

  $ echo a b c d e
  a b c d e
  $ echo !!:3-4
  c d

There are various shortcuts, such as, ‘:$’ to refer to the last argument, ‘:^’ to refer to the first argument, ‘:*’ to refer to all the arguments (synonym to ‘:1-$’), and others. See the cheat sheet for a complete list.

Modifiers can be used to modify the behavior of word designators (see: http://www.catonmat.net/blog/the-definitive-guide-to-bash-command-line-history/). For example:

  $ tar -xvzf software-1.0.tgz
  software-1.0/file
  ...
  $ cd !!:$:r
  software-1.0$

Here the ‘r’ modifier was applied to the ‘:$’ word designator, which picked the last argument from the previous command line; ‘r’ then removed the trailing suffix ‘.tgz’.

The “h” modifier removes the trailing pathname component, leaving the head:

  $ echo /usr/local/apache
  /usr/local/apache
  $ echo !!:$:h
  /usr/local

The “e” modifier removes all but the trailing suffix:

  $ ls -la /usr/src/software-4.2.messy-Extension
  ...
  $ echo /usr/src/*!!:$:e
  /usr/src/*.messy-Extension    # ls could have been used instead of echo

Another interesting modifier is the substitute ‘:s/old/new/’ modifier which substitutes new for old. It can be used in conjunction with ‘g’ modifier to do global substitution. For example,

  $ ls /urs/local/software-4.2 /urs/local/software-4.3
  /usr/bin/ls: /urs/local/software-4.2: No such file or directory
  /usr/bin/ls: /urs/local/software-4.3: No such file or directory
  $ !!:gs/urs/usr/
  ls /usr/local/software-4.2 /usr/local/software-4.3
  ls: /usr/local/software-4.2: No such file or directory
  ls: /usr/local/software-4.3: No such file or directory

This example replaces all occurrences of ‘urs’ with ‘usr’, making the command correct.

There are a few other modifiers, such as the ‘p’ modifier, which prints the resulting command after history expansion but does not execute it. See the cheat sheet for all of the modifiers.

Remote Copying

Copy from HDFS to a local laptop

hadoop fs -cat /path/dir/file.txt | ssh [email protected] 'cat > /tmp/file.txt'

Copy from HDFS to HDFS on another machine

hadoop fs -cat /path/dir/file.txt | ssh [email protected] 'hadoop fs -put - /data/file.txt'

AWK Scripts

I need to process text every day. At times it is useful to compute counts, sums, and averages, and I have found that AWK is the easiest way to do so. Say I have a data file that looks like this:

row col1 col2 col3
0 3500.352 1011.1 2356.4
1 292.2 3100.0 1997.99
2 1.354 2.3001 3354.2342523

This AWK statement will print information about col1:

$ awk ' NR == 1  {next} { s += $2 } END {  print "sum: ", s, " average: ", s/(NR-1), " samples: ", NR-1 }' /tmp/data.txt

sum:  3793.91  average:  1264.64  samples:  3

To do the same thing but print out comma-delimited numbers:

$ awk ' NR == 1 {next} { s += $2 } END { printf("sum: %'\''d average: %'\''.2f samples: %'\''d\n", s, s/(NR-1), NR-1)}' /tmp/data.txt

sum: 3,793 average: 1,264.64 samples: 3

To print out the line that has the greatest number of characters:

awk '{ if (length($0) > max) {max = length($0); maxline = $0} } END { print maxline }' /tmp/data.txt

2 1.354 2.3001 3354.2342523

To print out the number of characters that appear in the line with the greatest number of characters:

awk '{ if (length($0) > max) max = length($0) } END { print max }' /tmp/data.txt

27

Comparing Hash Tables

I was recently asked in a comment how to compare two hash tables in Perl. Furthermore, the commenter mentioned that this would be used in a subroutine.

There is a module, Data::Compare (http://search.cpan.org/~dcantrell/Data-Compare-1.19/lib/Data/Compare.pm). I’ve never used it other than to learn what it can do. From what I can tell, it will not provide details; it will just tell you whether the data structures are the same or not.

Hash tables

I have found a number of potentially unconventional uses for hash tables (aka “associative arrays”). I suppose the first thing that comes to mind when thinking of hash tables is as a way to map a given value to another value. As a very simple example, say you have a list of items and want to keep track of how many of those items you have.

my %items = ();
$items{shoes} = 2;
$items{pants} = 1;
$items{dogs} = 5;
$items{cats} = 50;

We often refer to this arrangement as a “key/value” pair. Now, if you want to know how many shoes you have, you can find out by referencing $items{shoes}. If you want to know just how crazy the cat person is, look at $items{cats}.