One Fork: February 2011

Friday, February 25, 2011

Scala Permutations 2

My second attempt at permutations in Scala (see first attempt):

def perms3[T](L: List[T]):List[List[T]] = L match {
   //permutations of a single element is simply a list containing a list with that element
   case head :: Nil => List(List(head))
   case head :: tail => {         
     (List[List[T]]() /: L)((result, head) => (List[List[T]]() /: perms3(L-head))((accum, tailperm) => (head :: tailperm) :: accum) ::: result)     
   }
    case Nil => List[List[T]]()
  }

I actually wound up with three versions (perms, perms2, perms3, all shown below) while trying to get rid of the mutable variable from the first version:

object Perm {
  def main(args : Array[String]) : Unit = {
   tryPerms ("A" :: Nil)
   tryPerms ("A" :: "B" :: Nil)
   tryPerms ("A" :: "B" :: "C" :: Nil)
  }
  
  def tryPerms(input: List[String]) {
   println ("Permutations of " + input.mkString(",") + ", v1")   
   for (perm <- perms (input))
     println (perm.mkString(","))  
   println ("Permutations of " + input.mkString(",") + ", v2")   
   for (perm <- perms2 (input))
     println (perm.mkString(","))  
   println ("Permutations of " + input.mkString(",") + ", v3")   
   for (perm <- perms3 (input))
     println (perm.mkString(","))       
   println
  }
  
  def perms[T](L: List[T]):List[List[T]] = L match {
   //permutations of a single element is simply a list containing a list with that element
   case head :: Nil => List(List(head))
   case head :: tail => {          
    var result = List[List[T]]()
    for (head <- L; tail <- perms(L-head))  
     result = (head :: tail) :: result
    result     
   }
    case Nil => List[List[T]]()
  }
  
  def perms2[T](L: List[T]):List[List[T]] = L match {
   //permutations of a single element is simply a list containing a list with that element
   case head :: Nil => List(List(head))
   case head :: tail => {          
     L.foldLeft(List[List[T]]())((result, current) => {
       perms2(L-current).foldLeft(List[List[T]]())((r, c) => {
        (current :: c) :: r        
       }) ::: result
     })
   }
    case Nil => List[List[T]]()
  }
  
  def perms3[T](L: List[T]):List[List[T]] = L match {
   //permutations of a single element is simply a list containing a list with that element
   case head :: Nil => List(List(head))
   case head :: tail => {         
     (List[List[T]]() /: L)((result, head) => (List[List[T]]() /: perms3(L-head))((accum, tailperm) => (head :: tailperm) :: accum) ::: result)     
   }
    case Nil => List[List[T]]()
  }  
}

Version two gets rid of the offensive mutable result variable from version one. Version three re-expresses version two more concisely (perhaps to the point of mild illegibility!).

Output is:

Permutations of A, v1
A
Permutations of A, v2
A
Permutations of A, v3
A

Permutations of A,B, v1
B,A
A,B
Permutations of A,B, v2
B,A
A,B
Permutations of A,B, v3
B,A
A,B

Permutations of A,B,C, v1
C,A,B
C,B,A
B,A,C
B,C,A
A,B,C
A,C,B
Permutations of A,B,C, v2
C,A,B
C,B,A
B,A,C
B,C,A
A,B,C
A,C,B
Permutations of A,B,C, v3
C,A,B
C,B,A
B,A,C
B,C,A
A,B,C
A,C,B

Thursday, February 24, 2011

Scala vs Erlang vs Java permutations

Erlang permutations is ... terse:

perms([]) -> [[]];
perms(L) -> [[H|T] || H <- L, T <- perms(L--[H])].

I have entire post about this terse little gem, including the Java equivalent (rather less terse!); see Erlang: so awesome it makes my brain bleed.

It seemed rather reasonable to try this in Scala. It's a functional programming language with list comprehensions so surely it will be pretty terse! My first draft as a Scala newb:

def perms[T](L: List[T]):List[List[T]] = L match {
   //permutations of a single element is simply a list containing a list with that element
   case head :: Nil => List(List(head))
   case head :: tail => {          
    //not quite sure how to dodge use of the result var to accumulate results :(
    var result = List[List[T]]()
    for (head <- L; tail <- perms(L-head))  
     result = (head :: tail) :: result
    result
   }
    case Nil => List[List[T]]()
  }

Much nicer than the Java version but it doesn't hold a candle to Erlang for terse! It does however have the marked benefit of being vastly easier to read and comprehend ;) I wish I knew how to get rid of the result var :(

For reference, the Java equivalent (written to clearly express what is happening more than to be as lean as possible):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
 
 
public class Perms {
 public static void main(String[] argv) {
  System.out.println(perms(Arrays.asList("A", "B", "C")));
 }
 
 public static List<List<String>> perms(List<String> input) {
  List<List<String>> output = new ArrayList<List<String>>();
  for (String H : input) {
   //L--[H]
   List<String> LminusH = new ArrayList<String>(input);
   LminusH.remove(H);   
   
   if (LminusH.isEmpty()) {
    //[H|T] when T is empty
    output.add(new ArrayList<String>(Arrays.asList(H)));
   } else {
    for (List<String> T : perms(LminusH)) {    
     //a list made up of [H|T]
     List<String> HT = new ArrayList<String>();
     HT.add(H);
     HT.addAll(T);
     output.add(HT);
    }
   }
  }
  return output;
 }
}

Tuesday, February 22, 2011

JUnit 4 Scala

Having just tried the very basics of getting ScalaIDE up and running for Scala I naturally immediately want unit tests because like any good Java developer I believe my code doesn't work until proven otherwise.

The internets turned up a number of fairly ugly sounding solutions for getting unit testing going in Scala so I thought rather than try that, why not just see if I can use ye olde JUnit "just like Java" (does that make me a bad Scala user?). Scala compiles to normal .class files so provided I get both Scala and JUnit on classpath it should work just fine. And it does!

First I expressed my dependence on JUnit. As I'm using a simple Eclipse test project I just right-clicked the project, selected Build Path>Add Libraries, picked JUnit and then selected to use JUnit 4. This added the JUnit 4 entry to my .classpath for me:

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
 <classpathentry kind="src" path="src"/>
 <classpathentry kind="con" path="org.scala-ide.sdt.launching.SCALA_CONTAINER"/>
 <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
 <classpathentry kind="con" path="org.eclipse.jdt.junit.JUNIT_CONTAINER/4"/>
 <classpathentry kind="output" path="bin"/>
</classpath>

Next I created a test class, imported the bits of JUnit I needed, slapped @Test onto a few methods, held my breath, and hit Alt+Shift+X, T to run as JUnit test. And like magic it worked perfectly!

The code:

import org.junit.Assert._
import org.junit.Test

class FizzBuzzTestCase {
    @Test
    def meFail() = assertEquals(1, 2) 
 
    @Test
    def meSucceed() = assertEquals(1, 1)
}

The outcome:

Hooray! Super simple, familiar, and effective :)

ScalaIDE vs Me

I like Eclipse, I like functional programming languages, and I hear great things about Scala so I thought I'd try it out by way of the ScalaIDE.

Initial installation was a little painful. The (experimental) Eclipse 3.6 Helios update site wouldn't work so I grabbed the latest Eclipse 3.5 Galileo (Java Dev, SR2) and tried to use the update site for that. Sadly this just gave me a "we hate your certificate" message with no option to trust anyway and aborted the install process. Arglefarg. Downloading the zip file of the update site and pointing Eclipse at the provided content.jar also produced a "we hate your certificate" message but this time there was an option to trust anyway and finally the damn thing got installed. Finally!

Having the IDE installed adds New Project>Scala Wizards>Scala Project. This sets up the .classpath and .project files appropriately for Scala. I like editing these files manually at times so I tend to inspect them whenever I try a new project type.

.classpath

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
 <classpathentry kind="src" path="src"/>
 <classpathentry kind="con" path="org.scala-ide.sdt.launching.SCALA_CONTAINER"/>
 <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
 <classpathentry kind="output" path="bin"/>
</classpath>

.project

<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
 <name>helloscala</name>
 <comment></comment>
 <projects>
 </projects>
 <buildSpec>
  <buildCommand>
   <name>org.scala-ide.sdt.core.scalabuilder</name>
   <arguments>
   </arguments>
  </buildCommand>
 </buildSpec>
 <natures>
  <nature>org.scala-ide.sdt.core.scalanature</nature>
  <nature>org.eclipse.jdt.core.javanature</nature>
 </natures>
</projectDescription>

Using the commandline interpreter hello world looks like this:

println "hello, world"

However, ScalaIDE doesn't like that; it demands a proper object declaration:

object helloapp {
  def main(args : Array[String]) = println("Hello, World")
}

ScalaIDE installs the expected keyboard shortcuts (eg Alt+Shift+X, S to run a Scala app) but sadly it doesn't actually work; the IDE just spits out no class def if you try to run:

java.lang.NoClassDefFoundError: helloapp
Caused by: java.lang.ClassNotFoundException: helloapp
 at java.net.URLClassLoader$1.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
Exception in thread "main"

A little poking around revealed that the projects /bin directory had nothing in it. As this is where it looks for .class files the NoClassDef issue makes sense ...except that the file I tried to run was valid and should have compiled into /bin damnit!

After playing around a bit I finally figured out that the first file I created (the minimal hello world that the IDE doesn't like) doesn't compile in the IDE and apparently that prevented other files added to the project from being compiled either! WTF! In any case, once the offending file was fixed and the Problems window reports no Scala issues the expected .class files show up in /bin and it becomes possible to run as Scala App from within Eclipse.

In painful detail:

Create a few valid Scala files in Eclipse, with Project>Build Automatically turned on

.class files show up in the /bin dir, we can run as Scala App just fine

Add a Scala file with a problem

The IDE reports it in Problems; we can run the other files as Scala App just fine (because their .class files are in /bin)
Other (valid) files can be run OK ... unless you do a Project>Clean, in which case all your .class files are wiped out and none are replaced because one bad file = no class files are created for any file
Additionally, if you modify a valid file (in a valid way) it's .class file is removed but not replaced, again because apparently one bad file = no class files for all

It's not that big of a deal as I usually keep my files in a compilable state, but it definitely gives the impression that the tool (ScalaIDE, not Scala itself) is still pretty immature. Apparently such problems plague others as well - http://www.assembla.com/spaces/scala-ide/tickets/1000073-java-lang-noclassdeffounderror-when-launching-application. Amusingly other features that one might think of as "advanced", such as refactor support for extract variable and the like are working for me, it's just the basic compilation management that seems weak.

On the plus side Scala itself is awesome. Anytime a language lets me pass a function it makes me happy ;)

object fizzbuzz {
  def main(args : Array[String]) = {
    //OMG I can pass a function like a first class citizen!
    //Damnit Java ... let me use functions this way
    (1 to 100).foreach(fizzbuzz)
  }
  
  def fizzbuzz(i : Int) = {   
    val isFizz =  i % 3 == 0
    val isBuzz =  i % 5 == 0
    if (isFizz) print ("Fizz")
    if (isBuzz) print ("Buzz")
    if (!isFizz && !isBuzz) print (i)
    println
  }
}

I hope the ScalaIDE stabilizes a bit because so far Scala looks awesome and I am fond of working in Eclipse.

Monday, February 21, 2011

Java multi-monitor screenshots

Random snippets I might want again one day ;)

Create a "C:\\CanDelete" directory before running any of this, or change the path building.

Take a screenshot of each monitor individually:

package com.blogspot.whileonefork.screencapture;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;


public class EachMonitor {
 public static void main(String[] argv) throws Exception {
  //Take a screenshot of every monitor
  GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
  GraphicsDevice[] screens = ge.getScreenDevices();
  
  for (GraphicsDevice screen : screens) {
   Robot robotForScreen = new Robot(screen);
   Rectangle screenBounds = screen.getDefaultConfiguration().getBounds();
   
   //The screen bounds have an annoying tendency to have an offset x/y; we want (0,0) => (width, height)
   screenBounds.x = 0;
   screenBounds.y = 0;
   BufferedImage screenShot = robotForScreen.createScreenCapture(screenBounds);
   String ext = "jpg";
   String outputFile = "C:\\CanDelete\\screenshot_" 
    + screen.getIDstring().replaceAll("[^A-Za-z0-9]", "_")
    + "."
    + ext;   
   ImageIO.write(screenShot, ext, new File(outputFile));
   System.out.println("Saved " + outputFile 
    + " of " + screen.getIDstring() 
    + " xy=(" + screenBounds.x + "," + screenBounds.y + ")"
    + " bounds=("+ screenBounds.width + "," + screenBounds.height + ")");
  }
 }
}

Take a single big screenshot of all monitors:

package com.blogspot.whileonefork.screencapture;

import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

public class AllMonitors {
 public static void main(String[] argv) throws Exception {      
  /**
   * Take one big screenshot of the entire UI. 
   * On my Windows box monitors seem to act like they are side by side on X, extending on Y.
   * Seems, at least on windows, to ignore any offset config setup in display properties.
   */
  GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
  GraphicsDevice[] screens = ge.getScreenDevices();
  
  Rectangle allScreenBounds = new Rectangle();
  for (GraphicsDevice screen : screens) {
   Rectangle screenBounds = screen.getDefaultConfiguration().getBounds();
   
   allScreenBounds.width += screenBounds.width;
   allScreenBounds.height = Math.max(allScreenBounds.height, screenBounds.height);      
  }
  
  Robot robot = new Robot();
  BufferedImage screenShot = robot.createScreenCapture(allScreenBounds);
  
  String ext = "jpg";
  String outputFile = "C:\\CanDelete\\AllMonitors" 
   + "."
   + ext;   
  ImageIO.write(screenShot, ext, new File(outputFile));
  System.out.println("Saved " + outputFile 
   + " of all monitors " 
   + " xy=(" + allScreenBounds.x + "," + allScreenBounds.y + ")"
   + " bounds=("+ allScreenBounds.width + "," + allScreenBounds.height + ")");  
 } 
}

All this has been tested exactly once on a Windows box with two monitors of substantially different size; ymmv :)

Thursday, February 17, 2011

Bash delete directories except that one

Ran into a scenario where I needed to delete all subdirectories of target except target/keepme/. So I want rm -rf target/subdir1 target/subdir2 ... target/subdirN with one directory omitted. I want to use grep to remove the directory I want to keep from the list so something like this should work:

Print the list of directories echo target/*/
Flip space separator to newline tr " " "\n"
Remove the line for the directory to keep grep -v keepme
Flip newline back to space tr "\n" " "
Run rm -rf on the results from steps 1-4

As a 1-liner:

rm -rf $(echo target/*/ | tr " " "\n" | grep -v keepme | tr "\n" " ")

Noting it in case I need to do it again someday ;)

Thursday, February 10, 2011

Terse vs Verbose conditional execution in Bash

As today is apparently my day for remedial Bash (see previous posts about it) I found myself with a complete script (not published) that I felt could be much more terse. This led me to start looking into ways to compress things. In the do or die post we saw one such example, a few more follow.

My examples for this post are largely from the excellent Bash by example, part 3 post on IBM. Refer to The Advanced Bash Scripting Guide sections 7.1 Test Constructs and 7.3 Other Comparison Operators for details on the comparisons used. The examples shown were tested on Solaris 5.10.

Correct number of arguments
Verbose:

if [ $# -ne 2 ]
then
  echo "Please specify ebuild file and unpack, compile or all"
  exit 1
fi

Terse:

[ $# == 2 ] || {echo "Please specify ebuild file and unpack, compile or all"; exit 1;}

This works because Bash short-circuits. That is for A || B we only execute B if A is true, if A is false the result is already known to be false and we don't have to run B at all.

NOTE: Example above was corrected to {} instead of (); this is important as you do NOT want a subshell or else exit won't actually break out of your script. { commands; } doesn't create a subshell and thus works as expected.

Set a variable to a default value if unset
Verbose:

if [ -z "$DISTDIR" ]
then
  # set DISTDIR to /usr/src/distfiles if not already set
  DISTDIR=/usr/src/distfiles
fi

Terse:

# set DISTDIR to /usr/src/distfiles if not already set
[ -z "$DISTDIR" ] && DISTDIR=/usr/src/distfiles

This time we use short-circuited &&: if DISTDIR is null the first part is true and the second part executes.

Terse option 2:

[ -n "$DISTDIR" ] || DISTDIR=/usr/src/distfiles

If DISTDIR is NOT null the first part is true and we skip the second. If DISTDIR is null the first part is false and we have to execute the second part.

Make sure we have a clean directory to work in
Verbose:

if [ -d ${WORKDIR} ]
then    
  rm -rf ${WORKDIR}
fi

mkdir ${WORKDIR}

Terse:

[ ! -d ${WORKDIR} ] || rm -rf ${WORKDIR}
mkdir ${WORKDIR}

Note the addition of ! in the terse version: we want the first part to check false if the directory exists to ensure the second part executes, removing the directory.

Closing Notes
We can obviously continue with just about every if statement that only has one or two short actions.

A huge advantage of the verbose versions is that people are more used to them. Terse is nice in some ways but I strongly suspect more developers understand if [ blah ] then echo exit fi than [ something ] || (echo; exit). There may be additional ramifications to the reduction; I'm not Bash guru enough to say they are exact equivalents.

Another possible advantage to the short-form is it makes error checking easy enough to validate commands we usually just assume to work. For example, don't just tar xzf whatever and assume it unpacked, actually check it worked rather than detecting the failure later when some subsequent step failed.

Anyway, I'm far from ready to suggest that the terse form should be used everywhere but it is a useful tool to have for one-liners in the terminal and possibly to shorten scripts that have piles of dead simple if/then checks.

do or die()

Apparently today is my day to learn Bash tidbits (see prev posts). I've always been fond of concise ways to script do x or die with an informative message. Eg PHPs something() or die("even though I can enter a helpful message I won't damnit");.

So how do we do this in Bash? Well this type of thing pops up fairly regularly:

mycommandhere
if [ "$?" -ne "0" ]; then 
  echo $cmd " failed :("
  exit 1; 
fi

You can use A || B to chain things on failure (if A succeeds B doesn't execute) ... so why don't people do this:

mycommandthatmightfail || (echo "informativeness!"; exit 1)

Note that as we used ; to separate the commands to run if mycommandthatmightfail does indeed fail they will all run even if some of them fail.

*/ for directories

Want to do something to only the directories but not the files? Just use Bash parameter expansion: */ is "everything whose name ends with a /", eg directories.

For example:

#print all the directories
echo */
#also print all the directories (without -d it recurses)
ls -d */
#zip all the directories in target
zip myoutputfile.zip target/*/

Kinda cool.

Bash for-each line 1-liners

Note: Examples run on Solaris 5.10. input.txt contains the following three lines:

A B
C D
E F

Suppose you want to do something for each line in file. The most obvious way is to use a for loop - it almost matches how you think about it! So lets try (input line followed by output):

(/tmp ): for line in $(cat input.txt); do echo $line; done;
A
B
C
D
E
F

Not quite ideal; for uses the Bash IFS variable (ref) to detect token boundaries. IFS defaults to any whitespace (eg space, tab, line-break) so we get each token rather than each line. Handy in some circumstances but here we specifically want each line.

We can set and unset IFS (unset will return it to default; if we were really feeling careful we could store and restore the value in case someone else set it already):

(/tmp ): IFS='\n'; for line in $(cat input.txt); do echo $line; done; unset IFS;
A B
C D
E F
(/tmp ): for line in $(cat input.txt); do echo $line; done;
A
B
C
D
E
F

Note that the first for sets & unsets IFS; this ensure the second for (which doesn't set IFS) still behaves as expected. That's ugly damnit! More important, we failed we could leave IFS set and disrupt every future script run in this terminal.

We can use the builtin read command (ref) instead and construct our for-each line using while read:

(/tmp ): while read -r line; do echo $line; done < input.txt;
A B
C D
E F

Splendiferous! We get each line and we don't have to futz about with IFS. The -r argument to read disables backslash escaping to se read the literal content of the line without any special mangling.

Tuesday, February 8, 2011

Solaris: Who is talking to memcached?

In our environment we have memcached using tcp connections on Solaris 5.10. Sometimes we wonder (particularly in dev environments) what would screw up if I took down this instance? Today it came up that people were unsure how to figure out who was actively communicating with memcached anyway. Most of our team is more familiar with Linux than Solaris

The text protocol provides stats but the output doesn't really help. For example with exactly two live connections to memcached (telnet sessions) stats says this on my dev copy:

STAT curr_connections 11
STAT total_connections 31550
STAT connection_structures 13

Not really what I had in mind. There are NOT 11 current connections in the sense I'm interested in. OK, so how about netstat? Well ... it doesn't seem to provide the linkage from PID to connection we want. I want a list of all the potentially interesting TCP connections to the memcached process, including the IP of the peer.

A little research reveals that the pfiles command lists vast swathes of spiffy information about the files (sockets are essentially a file handle and can be used in the ubiquotous read/write C APIs if you wish), including the socket and peer!

...blah blah blah... 31: S_IFSOCK mode:0666 dev:294,0 ino:46252 uid:0 gid:0 size:0 O_RDWR|O_NONBLOCK SOCK_STREAM SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49232),IP_NEXTHOP(1.2.3.4) sockname: AF_INET 1.2.3.4 port: 11211 peername: AF_INET 2.3.4.5 port: 1057

That seems perfect; so now all we need to do is snag the PID of memcached, toss that over to pfiles, and grep out the bits we are interested in.

First up, PID of memcached. /usr/ucb/ps w will list our processes, including memcached (depending on how you started it you may need slightly different arguments):

/usr/ucb/ps x PID TT S TIME COMMAND 15973 ? S 0:51 /my/path/to/memcached ...args 16982 ? S 0:00 /usr/lib/ssh/sshd 16989 pts/11 S 0:00 -bash 18017 pts/11 O 0:00 /usr/ucb/ps -x

We can pluck out the memcached line with grep, and grab the first token however we prefer. In my case I'll just use awk. So to get the PID for memcached (if only one instance is running on the node) we can do this:

/usr/ucb/ps x | grep memcached | grep -v grep | awk '{print $1}'15973

We can use command substitution to pass this to pfiles:

pfiles $(/usr/ucb/ps x | grep memcached | grep -v grep | awk '{print $1}')

Then grep out the parts we are interested in (only IPv4 - AF_INET - items that have known peers; this is basically sockets showing who is connected to them). In the example below I have exactly two connections to my memcached from two different client machines (whose IPs have been modified)

pfiles $(/usr/ucb/ps x | grep memcached | grep -v grep | awk '{print $1}') | grep "peername: AF_INET"

peername: AF_INET 1.2.3.4 port: 46009

peername: AF_INET 1.2.3.5 port: 1057

In the example above the clients connected to memcached are 1.2.3.4 and 1.2.3.5.

Very handy!

Monday, February 7, 2011

Hashmap 101: Build your own!

Recently I saw an article on dzone talking about harassing future minions about HashMap details. I have never personally asked an interview victim this particular question but I might just start!

As I prefer to see code (hmm ... perhaps I should ask candidates to code their own HashMap on a whiteboard?) I have taken it upon myself to build a very simple one as an example. The sample code is all for Java 6, tests are for JUnit 4.

First, an overview of how a basic hashtable works. Every entry has a key that has a hash code and a value. Keys must be comparable in some fashion. Many different keys can have the same hash code, though in practice we want this to tend to be true only of keys that are also considered equivalent.

In Java Object helpfully provides hashCode() and equals() so we know that any object will be usable as a key in a hashtable.

Internally a hashtable stores a set of buckets, typically in a simple array. Each bucket contains 0..N entries, each of which has a key and a value. To find what bucket an entry goes into we generate a numeric hashcode for the key, and determine what bucket index from hashcode mod the number of buckets. In Java, something like:

int bucketIndex = key.hashCode() % buckets.length;

In a crude drawing we can see that we have an array of buckets, each of which contains 0..N entries. As shown below, probably in a linked list, though this is very much an implementation detail:

Anyway, as I'm pro-code anti-word (50% of that is true!), lets see some damn code. First we'll need a simple entry class. We need a key, which can't change, a value, which can change, and a pointer to next so we can make our array that has 0..N entries in each spot. Something like this should do it:

package whileonefork.map;

public class AnEntry<K, V> {
 private AnEntry<K, V> next;
 private final K key;
 private V value;
 
 public AnEntry(K key, V value) {
  this.key = key;
  this.setValue(value);
 }

 public K getKey() {
  return key;
 }

 public void setValue(V value) {
  this.value = value;
 }

 public V getValue() {
  return value;
 }

 public void setNext(AnEntry<K, V> next) {
  this.next = next;
 }

 public AnEntry<K, V> getNext() {
  return next;
 } 
}

As a complete aside ... what a lot of bloody boilerplate. If only Java had a keyword or other shortcut for properties that don't have any logic. private final property K key anyone?

Anyway, returning to the point, next we'll need the skeleton of our map class. We'll provide a dead simple API: no null keys, only get/put as operations.

package whileonefork.map;

public class SimpleHashMap<K, V> {
 
 private int DEFAULT_BUCKET_COUNT = 128;
 
 private AnEntry<K, V>[] buckets;
 
 public SimpleHashMap() {
  buckets = new AnEntry[DEFAULT_BUCKET_COUNT];
 }
 
 public V get(K key) {
  //TODO
 }
 
 public void put(K key, V value) {
  //TODO
 }
}

The first operation we'll implement is get. We'll need to identify what bucket to use, then cycle through the 0..N entries for that bucket to see if any of them have the exact key we are interested in:

public V get(K key) {
  throwIfKeyNull(key);
  
  AnEntry<K, V> entry = buckets[bucketIndexForKey(key)];
  while (entry != null && !key.equals(entry.getKey())) 
   entry = entry.getNext();
  return entry != null ? entry.getValue() : null;
 }

Next, put. If there is nothing at all in the bucket we can just create a new entry and shove it in. If there is already 1..N things there we need to search through them all. If we find one with the exact same key we'll update it, otherwise we'll just shove a new entry on the end:

public void put(K key, V value) {
  throwIfKeyNull(key);
  
  int bucketIndex = bucketIndexForKey(key);
  AnEntry<K, V> entry = buckets[bucketIndex];
  
  if (null != entry) {
   boolean done = false;
   while(!done) {
    if (key.equals(entry.getKey())) {
     entry.setValue(value);
     done = true;
    } else if (entry.getNext() == null) {
     entry.setNext(new AnEntry<K, V>(key, value));
     done = true;
    }
    entry = entry.getNext();
   }
  } else {
   //nothing there at all; just shove the new entry in
   buckets[bucketIndex] = new AnEntry<K, V>(key, value);
  }
 }

Note that the extent to which hash codes are widely distributed will significantly influence how many entries tend to be in the same bucket. If people implement hashCode() as 'return 1;' or some other minimally unique value then our hashtable will tend to be quite slow as all the entries will go into the same bucket. Ideally the vast majority of buckets have 0 or 1 entries.

A real implementation may do fancy things like monitor it's own stats and resize the buckets array for optimal operation if it thinks doing so might help. The most common heuristic for this is how many buckets have something in them, eg if more than 60% of our buckets are non-null (have 1..N entries) then we should rebuild a new, larger, buckets array. hashCode mod buckets.length changes if you alter the bucket count so you have to migrate all the entries from the old buckets into the new ones at potentially different indices. This functionality is omitted from the example.

Putting it all together our final hashmap looks like this:

package whileonefork.map;

public class SimpleHashMap<K, V> {
 
 private int DEFAULT_BUCKET_COUNT = 128;
 
 private AnEntry<K, V>[] buckets;
 
 public SimpleHashMap() {
  buckets = new AnEntry[DEFAULT_BUCKET_COUNT];
 }
 
 public V get(K key) {
  throwIfKeyNull(key);
  
  AnEntry<K, V> entry = buckets[bucketIndexForKey(key)];
  while (entry != null && !key.equals(entry.getKey())) 
   entry = entry.getNext();
  return entry != null ? entry.getValue() : null;
 }
 
 public void put(K key, V value) {
  throwIfKeyNull(key);
  
  int bucketIndex = bucketIndexForKey(key);
  AnEntry<K, V> entry = buckets[bucketIndex];
  
  if (null != entry) {
   boolean done = false;
   while(!done) {
    if (key.equals(entry.getKey())) {
     entry.setValue(value);
     done = true;
    } else if (entry.getNext() == null) {
     entry.setNext(new AnEntry<K, V>(key, value));
     done = true;
    }
    entry = entry.getNext();
   }
  } else {
   //nothing there at all; just shove the new entry in
   buckets[bucketIndex] = new AnEntry<K, V>(key, value);
  }
 }

 /**
  * THIS SHOULDN'T ACTUALLY BE PUBLIC; IT IS SO JUST FOR CLARIFICATION UNIT TEST PURPOSES
  * 
  */
 public int bucketIndexForKey(K key) {
  int bucketIndex = key.hashCode() % buckets.length;
  return bucketIndex;
 }
 
 private void throwIfKeyNull(K key) {
  if (key == null) {
   throw new IllegalArgumentException("key may not be null");
  }
 } 
}

We can even write some basic tests for it:

package whileonefork.map;

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class SimpleHashMapTestCase { 
 @Test
 public void readWriteSimpleValue() {
  SimpleHashMap<String, Integer> map = new SimpleHashMap<String, Integer>();
  
  map.put("duck", 7);
  map.put("goose", 42);
  
  assertEquals(Integer.valueOf(7), map.get("duck"));
  assertEquals(Integer.valueOf(42), map.get("goose"));
 }
 
 @Test
 public void getSomethingThatIsntThere() {
  SimpleHashMap<Integer, String> map = new SimpleHashMap<Integer, String>();  
  assertEquals(null, map.get(42));
 }
 
 @Test
 public void overWriteValue() {
  SimpleHashMap<Integer, String> map = new SimpleHashMap<Integer, String>();
  
  map.put(42, "meh");
  assertEquals("meh", map.get(42));
  map.put(42, "we have the technology to rebuild him");
  assertEquals("we have the technology to rebuild him", map.get(42));
 }
 
 @Test
 public void doNotOverwriteValueJustBecauseSameHashCode() {
  String diffButSameHash1 = "Ea";
        String diffButSameHash2 = "FB";
        
        //prove they really have the same hash
        assertEquals(diffButSameHash1.hashCode(), diffButSameHash2.hashCode());
                
        SimpleHashMap<String, String> map = new SimpleHashMap<String, String>();
        
        //prove they really go into the same bucket (yes, might be overdoing it here!)
        assertEquals(map.bucketIndexForKey(diffButSameHash1), map.bucketIndexForKey(diffButSameHash2));
                
        map.put(diffButSameHash1, "We Are All Unique");
        map.put(diffButSameHash2, "I'm Not!");
        
        //same hash but amazingly still different values!
        assertEquals("We Are All Unique", map.get(diffButSameHash1));
        assertEquals("I'm Not!", map.get(diffButSameHash2));
 }
}

Disclaimer: This code is garbage, purely for example purposes. It almost certainly has horrible bugs. You'd be crazy to use it for anything.

One Fork

Pages

Friday, February 25, 2011

Scala Permutations 2

Thursday, February 24, 2011

Scala vs Erlang vs Java permutations

Tuesday, February 22, 2011

JUnit 4 Scala

ScalaIDE vs Me

Monday, February 21, 2011

Java multi-monitor screenshots

Thursday, February 17, 2011

Bash delete directories except that one

Thursday, February 10, 2011

Terse vs Verbose conditional execution in Bash

do or die()

*/ for directories

Bash for-each line 1-liners

Tuesday, February 8, 2011

Solaris: Who is talking to memcached?

Monday, February 7, 2011

Hashmap 101: Build your own!

Blog Archive

Pages

Friday, February 25, 2011

Thursday, February 24, 2011

Tuesday, February 22, 2011

Monday, February 21, 2011

Thursday, February 17, 2011

Thursday, February 10, 2011

Tuesday, February 8, 2011

Monday, February 7, 2011

Subscribe

Blog Archive