Comparing Hash Tables

I was recently asked in a comment how to compare two hash tables in Perl. The commenter also mentioned that this would be used in a subroutine.

There is a module, Data::Compare (http://search.cpan.org/~dcantrell/Data-Compare-1.19/lib/Data/Compare.pm), that does this. I’ve never used it beyond learning what it can do. From what I can tell, it will not provide details: it just tells you yes, the data structures are the same, or no, they are not.
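Going by the module's documentation, a yes/no check could look something like the sketch below. It assumes Data::Compare has been installed from CPAN (it is not part of core Perl); %hash3 is a made-up copy added only to show a matching case.

```perl
use strict;
use warnings;
use Data::Compare;   # CPAN module, assumed to be installed

my %hash1 = ('a' => 'b', 'b' => 'c', 'c' => 'd');
my %hash2 = ('a' => 'b1', 'b1' => 'c', 'c' => 'd');
my %hash3 = %hash1;  # hypothetical copy with identical keys and values

# Compare() takes two references and returns true when the structures match
my $same_1_2 = Compare(\%hash1, \%hash2);
my $same_1_3 = Compare(\%hash1, \%hash3);

print "hash1 vs hash2: ", ($same_1_2 ? "same" : "different"), "\n";
print "hash1 vs hash3: ", ($same_1_3 ? "same" : "different"), "\n";
```

Note that the result says nothing about which keys or values differ.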

If you want to get any detailed information you can always roll your own.

use strict;
use warnings;

my %hash1 = ('a' => 'b', 'b' => 'c', 'c' => 'd');
my %hash2 = ('a' => 'b1', 'b1' => 'c', 'c' => 'd');

compare(\%hash1, \%hash2 );

sub compare {
  my $hash1 = shift;
  my $hash2 = shift;

  if( keys %{$hash1} != keys %{$hash2} ) {
    print "Hash1 has ", scalar(keys %{$hash1}),
          " keys but hash2 has ", scalar(keys %{$hash2}), "\n";
  }

  # Compare hash1 to hash2
  # Sorted for clear output
  # Case insensitive comparison for sorting
  # Assumes all keys and values are strings
  #
  for my $key_hash1 (sort {lc($a) cmp lc($b) } keys %{$hash1} ) {
    if( ! exists $hash2->{$key_hash1} ) {
      print "Hash1 contains key '$key_hash1' but hash2 has no such key\n";
      next;
    }
    if( $hash1->{$key_hash1} ne $hash2->{$key_hash1} ) {
      print "Both hashes have '$key_hash1' as a key ",
            "but hash1's value is '$hash1->{$key_hash1}' ",
            "and hash2's value is '$hash2->{$key_hash1}'\n";
    }
  }
}

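To complete the process you’d also loop through the keys of $hash2 in a manner similar to the above loop. Values for keys that both hashes share have already been compared, so the second loop only needs to report keys that exist in $hash2 but not in $hash1.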
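A minimal sketch of that second pass might look like this. The helper name missing_from_first is made up for illustration, and the sketch only reports keys that hash2 has and hash1 lacks, on the assumption that values for shared keys were already compared in the first loop.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the keys present in the second hash
# but missing from the first, sorted case-insensitively
sub missing_from_first {
  my ($hash1, $hash2) = @_;
  return grep { ! exists $hash1->{$_} }
         sort { lc($a) cmp lc($b) } keys %{$hash2};
}

my %hash1 = ('a' => 'b', 'b' => 'c', 'c' => 'd');
my %hash2 = ('a' => 'b1', 'b1' => 'c', 'c' => 'd');

my @missing = missing_from_first(\%hash1, \%hash2);
for my $key_hash2 (@missing) {
  print "Hash2 contains key '$key_hash2' but hash1 has no such key\n";
}
```

With the sample data this reports only the key 'b1'.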

References

The second thing to comment on is that hash tables are closely related to arrays (Perl also calls them associative arrays). In an array the index is an integer. In a hash the index, called a key, is a string; any scalar can be used as a key, but it is converted to a string first. The only thing that the => symbol does in the above example is make it more readable by humans. I could have replaced every => with a comma and it would work.
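A short sketch to illustrate the point about => and commas. The variable names here are made up for the example.

```perl
use strict;
use warnings;

# The same hash written with => and with plain commas
my %with_arrow = ('a' => 'b', 'b' => 'c', 'c' => 'd');
my %with_comma = ('a', 'b', 'b', 'c', 'c', 'd');

# Join each hash into a sorted "key=value" string so the two can be compared
my $arrow_text = join ',', map { "$_=$with_arrow{$_}" } sort keys %with_arrow;
my $comma_text = join ',', map { "$_=$with_comma{$_}" } sort keys %with_comma;
print "arrow: $arrow_text\n";
print "comma: $comma_text\n";

# A numeric key is stored in its string form, so 1 and '1' are the same key
my %numbers = (1 => 'one');
print "key 1 and key '1' are the same key\n" if exists $numbers{'1'};
```

Both strings come out as a=b,b=c,c=d, which shows the two ways of writing the list build the same hash.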

The problem with passing two hashes or arrays to a routine is that they get flattened into one list of values, so the routine sees one list instead of two separate hashes or arrays. To maintain their individuality, the hashes are passed as references (\%hash1). Inside the routine they then have to be dereferenced (%{$hash1} or $hash1->{key}) before use. In case you were wondering what the extra symbols in the code mean, they are there because we’re working with references.
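The flattening is easy to see in a small sketch. The subroutine names count_flat and count_refs and the sample data are made up for illustration.

```perl
use strict;
use warnings;

my %hash1 = ('a' => 'b', 'b' => 'c');
my %hash2 = ('x' => 'y');

# Passing the hashes directly: @_ receives one flat list of keys and values
sub count_flat { return scalar @_ }

# Passing references: @_ receives exactly two scalars, one per hash
sub count_refs {
  my ($first, $second) = @_;
  return (scalar(keys %{$first}), scalar(keys %{$second}));
}

my $flat_items = count_flat(%hash1, %hash2);
my @key_counts = count_refs(\%hash1, \%hash2);

print "Flattened call saw $flat_items list items\n";
print "Reference call saw $key_counts[0] and $key_counts[1] keys\n";
```

The flattened call sees 6 items with no way to tell where one hash ends and the other begins, while the reference call sees 2 keys and 1 key.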

Another Option

The commenter mentioned that he is working with two files, each containing key/value pairs. I got to thinking that he might not have to use Perl at all. He might try sorting the lines in each file (Unix, Linux and Cygwin have a command called sort with a number of sorting options) and then use a tool such as diff to show the differences. Depending on how large the files are, this might be a more efficient solution. If both files are sorted in the same way, the keys, being the first entry on each line, appear in the same order. Example: file one contains

abc: def
abd: fed
efg: hello
xyz: pdq

and file two contains

abc: def
efg: good bye
xyz: pdq

So diff would show something like

2,3c2
< abd: fed
< efg: hello
---
> efg: good bye

You can see from this that file 1, noted by the lines beginning with <, contains abd: fed and efg: hello, and that these are not in file 2. It also tells you that file 2, noted by the lines beginning with >, contains efg: good bye, which is not in file 1.

If you’re not comfortable with this kind of output you could use something like TkDiff to view the differences in a more visual manner (http://sourceforge.net/projects/tkdiff/).
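If you would rather stay in Perl, the same key/value lines could be loaded into hashes and handed to a compare routine like the one earlier. This is only a sketch: the helper name load_pairs is made up, it assumes every line has the form "key: value", and it reads from in-memory strings standing in for the two files.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "key: value" lines from a filehandle
# and returns a reference to a hash of the pairs
sub load_pairs {
  my ($fh) = @_;
  my %pairs;
  while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;                    # skip blank lines
    my ($key, $value) = split /:\s*/, $line, 2;  # split on the first colon only
    $pairs{$key} = $value;
  }
  return \%pairs;
}

# In-memory stand-ins for the two files, so the sketch runs without touching disk
my $file1_text = "abc: def\nabd: fed\nefg: hello\nxyz: pdq\n";
my $file2_text = "abc: def\nefg: good bye\nxyz: pdq\n";
open my $fh1, '<', \$file1_text or die "Cannot open file 1 data: $!";
open my $fh2, '<', \$file2_text or die "Cannot open file 2 data: $!";

my $hash1 = load_pairs($fh1);
my $hash2 = load_pairs($fh2);

print "File 1 has ", scalar(keys %{$hash1}),
      " keys; file 2 has ", scalar(keys %{$hash2}), "\n";
```

For real files, the two open calls would take file names instead of references to strings.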

Additional Reading

This article might be useful for understanding references in Perl: http://www.perlmeme.org/howtos/using_perl/dereferencing.html.

You might also find Tom Christiansen’s Object Oriented Tutorial useful: http://www.perl.com/doc/manual/html/pod/perltoot.html.
