Hash tables

I have found a number of potentially unconventional uses for hash tables (aka “associative arrays”). I suppose the first thing that comes to mind when thinking of hash tables is as a way to map a given value to another value. As a very simple example, say you have a list of items and want to keep track of how many of those items you have.

my %items = ();
$items{shoes} = 2;
$items{pants} = 1;
$items{dogs} = 5;
$items{cats} = 50;

We often refer to this arrangement as a “key/value” pair. Now, if you want to know how many shoes you have you can do so by referencing $items{shoes}. If you want to know just how crazy the cat person is, look at $items{cats}.

If you want to print out the item and count, loop through the hash as follows:

for my $key ( keys %items ) {
  print "I have $items{$key} $key\n";
}

On my machine, I see this output:

I have 50 cats
I have 1 pants
I have 2 shoes
I have 5 dogs

One thing to note is that there is no guarantee that the hash table will store the keys in a particular order. That is, there is nothing that says that every time that loop gets run the output will appear in the same order. In practice, for a list that small it probably will print out in the same order. However, you cannot bet on it, and once the list gets larger and items are added and removed the likelyhood of an unpredictable order will increase. Part of it has to do with how perl determines the most efficient way to store/retrieve the values.

If the order is important, it is useful to sort the keys before using them.

A nice thing about the for loop is that a sort routine can be inserted in the parentheses. As long as a list of elements appears in the parentheses after all processing is done perl doesn’t care. In the above example, if I wanted to sort the hash based on the keys:

for my $key ( sort { $a cmp $b } keys %items ) {
  print "I have $items{$key} $key\n";
}

This produces the output:

I have 50 cats
I have 5 dogs
I have 1 pants
I have 2 shoes

Since there is a sort done you can be certain that this is how the ouptut will appear every time.

If, on the other hand you would like to sort by the value instead of the key:

for my $key ( sort { $items{$a} <=> $items{$b} } keys %items ) {
  print "I have $items{$key} $key\n";
}

This produces:

I have 1 pants
I have 2 shoes
I have 5 dogs
I have 50 cats

You may have noticed that I used cmp and <=> We use cmp to compare alpha entities. The <=> is used when comparing numbers.

If you want the order reversed, simply revers the $a and $b (or $item{$a} and $item{$b}). The $a and $b variables are special variables in perl. When sorting, 2 elements are compared to see which one should come before the other. Two elements are tested. Then two more elements and so on until the list of elements is exhausted. For now, just know that in sort $a represents the first element and $b represents the second element. If you are curious, you can print out the elements as they are being sorted. Remember when I said that as long as a list of elements appear within the parentheses perl doesn’t care? Printing doesn’t affect the final list so this is valid:

for my $key ( sort { print "$a cmp $b\n"; $a cmp $b } keys %items ) {
  print "I have $items{$key} $key\n";
}

I see this output:

cats cmp pants
shoes cmp dogs
cats cmp dogs
dogs cmp pants
pants cmp shoes
I have 50 cats
I have 5 dogs
I have 1 pants
I have 2 shoes

Now, back to some of the less common uses of hash tables. The keys of a hash are unique. That means that in the following code, there will only be 2 keys: cats and dogs. The second time cats apppears the existing entry for cats gets overwritten with the new value.

my %items = ();
$items{cats} = 10;
$items{dogs} = 1;
$items{cats} = 50;
for my $key ( sort { $a cmp $b } keys %items ) {
  print "I have $items{$key} $key\n";
}

This is the output I see:

I have 50 cats
I have 1 dogs

One way we can take advantage of this feature is to ensure the uniqueness of a list.

Say we have this list:

my @list = qw(apple banana pear apple peach apple cherry apple berry);

By passing the values of this list to a hash table, we can remove dupliate values:

my %fruit = ();
foreach my $value ( @list ) {
  $fruit{$value}++;
}
my @unique_list = keys %fruit;
print "unique list: ", join( ', ', sort {$a cmp $b} keys %fruit ), "\n";

I get this as the output:

unique list: apple, banana, berry, cherry, peach, pear

Again you can stick a sort in there to get a sorted list.

Not only did you get a unique list of fruit, you also got a count of each fruit in the original list. That’s what the $fruit{$value}++ did. When not explicitly initialized the value of an entry in a hash table will be set to 0. In case you are unfamiliar, $fruit{$value}++ is a short-hand way of doing:

foreach my $value ( @list ) {
  if( ! exists $fruit{$value} ) {
    $fruit{$value} = 0;
  }
  $fruit{$value} = $fruit{$value} + 1;
}

In fact, if you do not use the ++ notation and use warnings are in effect (along with use strict, use warnings is a good habit to get into), perl will give a warning stating something like: Use of uninitialized value in addition (+) This means that $fruit{$value} has never been set to a value so perl is warning you that you might be attempting to add a value to an uninitialized value.

Back to the counts. You can access the counts, the value part of the key/value pair, in the same way as we accessed the number of shoes at the top of this article.

print "unique list: ", join( ', ', sort {$a cmp $b} keys %fruit ), "\n\n";
foreach my $key ( sort { $a cmp $b } keys %fruit ) {
  print "$key appears $fruit{$key} time(s) in the original list\n";
}

I get this output:

unique list: apple, banana, berry, cherry, peach, pear

apple appears 4 time(s) in the original list
banana appears 1 time(s) in the original list
berry appears 1 time(s) in the original list
cherry appears 1 time(s) in the original list
peach appears 1 time(s) in the original list
pear appears 1 time(s) in the original list

There you have some ideas about how to use a hash table in ways that may not have crossed you mind.

Leave a Reply