Recently I saw an
article on
dzone talking about harassing future minions about HashMap details. I have never personally asked an interview victim this particular question but I might just start!
As I prefer to see code (hmm ... perhaps I should ask candidates to code their own HashMap on a whiteboard?) I have taken it upon myself to build a very simple one as an example. The sample code is all for Java 6, tests are for JUnit 4.
First, an overview of how a basic hashtable works. Every entry has a key that has a hash code and a value. Keys must be comparable in some fashion. Many different keys can have the same hash code, though in practice we want this to tend to be true only of keys that are also considered equivalent.
In Java Object helpfully provides hashCode() and equals() so we know that
any object will be usable as a key in a hashtable.
Internally a hashtable stores a set of buckets, typically in a simple array. Each bucket contains 0..N entries, each of which has a key and a value. To find what bucket an entry goes into we generate a numeric hashcode for the key, and determine what bucket index from hashcode mod the number of buckets. In Java, something like:
int bucketIndex = key.hashCode() % buckets.length;
In a crude drawing we can see that we have an array of buckets, each of which contains 0..N entries. As shown below, probably in a linked list, though this is very much an implementation detail:
Anyway, as I'm pro-code anti-word (50% of that is true!), lets see some damn code. First we'll need a simple entry class. We need a key, which can't change, a value, which can change, and a pointer to next so we can make our array that has 0..N entries in each spot. Something like this should do it:
package whileonefork.map;
public class AnEntry<K, V> {
private AnEntry<K, V> next;
private final K key;
private V value;
public AnEntry(K key, V value) {
this.key = key;
this.setValue(value);
}
public K getKey() {
return key;
}
public void setValue(V value) {
this.value = value;
}
public V getValue() {
return value;
}
public void setNext(AnEntry<K, V> next) {
this.next = next;
}
public AnEntry<K, V> getNext() {
return next;
}
}
As a complete aside ... what a lot of bloody boilerplate. If only Java had a keyword or other shortcut for properties that don't have any logic.
private final property K key anyone?
Anyway, returning to the point, next we'll need the skeleton of our map class. We'll provide a dead simple API: no null keys, only get/put as operations.
package whileonefork.map;
public class SimpleHashMap<K, V> {
private int DEFAULT_BUCKET_COUNT = 128;
private AnEntry<K, V>[] buckets;
public SimpleHashMap() {
buckets = new AnEntry[DEFAULT_BUCKET_COUNT];
}
public V get(K key) {
//TODO
}
public void put(K key, V value) {
//TODO
}
}
The first operation we'll implement is get. We'll need to identify what bucket to use, then cycle through the 0..N entries for that bucket to see if any of them have the exact key we are interested in:
public V get(K key) {
throwIfKeyNull(key);
AnEntry<K, V> entry = buckets[bucketIndexForKey(key)];
while (entry != null && !key.equals(entry.getKey()))
entry = entry.getNext();
return entry != null ? entry.getValue() : null;
}
Next, put. If there is nothing at all in the bucket we can just create a new entry and shove it in. If there is already 1..N things there we need to search through them all. If we find one with the exact same key we'll update it, otherwise we'll just shove a new entry on the end:
public void put(K key, V value) {
throwIfKeyNull(key);
int bucketIndex = bucketIndexForKey(key);
AnEntry<K, V> entry = buckets[bucketIndex];
if (null != entry) {
boolean done = false;
while(!done) {
if (key.equals(entry.getKey())) {
entry.setValue(value);
done = true;
} else if (entry.getNext() == null) {
entry.setNext(new AnEntry<K, V>(key, value));
done = true;
}
entry = entry.getNext();
}
} else {
//nothing there at all; just shove the new entry in
buckets[bucketIndex] = new AnEntry<K, V>(key, value);
}
}
Note that the extent to which hash codes are widely distributed will significantly influence how many entries tend to be in the same bucket. If people implement hashCode() as 'return 1;' or some other minimally unique value then our hashtable will tend to be quite slow as all the entries will go into the same bucket. Ideally the vast majority of buckets have 0 or 1 entries.
A real implementation may do fancy things like monitor it's own stats and resize the buckets array for optimal operation if it thinks doing so might help. The most common heuristic for this is how many buckets have something in them, eg if more than 60% of our buckets are non-null (have 1..N entries) then we should rebuild a new, larger, buckets array.
hashCode mod buckets.length changes if you alter the bucket count so you have to migrate all the entries from the old buckets into the new ones at potentially different indices. This functionality is omitted from the example.
Putting it all together our final hashmap looks like this:
package whileonefork.map;
public class SimpleHashMap<K, V> {
private int DEFAULT_BUCKET_COUNT = 128;
private AnEntry<K, V>[] buckets;
public SimpleHashMap() {
buckets = new AnEntry[DEFAULT_BUCKET_COUNT];
}
public V get(K key) {
throwIfKeyNull(key);
AnEntry<K, V> entry = buckets[bucketIndexForKey(key)];
while (entry != null && !key.equals(entry.getKey()))
entry = entry.getNext();
return entry != null ? entry.getValue() : null;
}
public void put(K key, V value) {
throwIfKeyNull(key);
int bucketIndex = bucketIndexForKey(key);
AnEntry<K, V> entry = buckets[bucketIndex];
if (null != entry) {
boolean done = false;
while(!done) {
if (key.equals(entry.getKey())) {
entry.setValue(value);
done = true;
} else if (entry.getNext() == null) {
entry.setNext(new AnEntry<K, V>(key, value));
done = true;
}
entry = entry.getNext();
}
} else {
//nothing there at all; just shove the new entry in
buckets[bucketIndex] = new AnEntry<K, V>(key, value);
}
}
/**
* THIS SHOULDN'T ACTUALLY BE PUBLIC; IT IS SO JUST FOR CLARIFICATION UNIT TEST PURPOSES
*
*/
public int bucketIndexForKey(K key) {
int bucketIndex = key.hashCode() % buckets.length;
return bucketIndex;
}
private void throwIfKeyNull(K key) {
if (key == null) {
throw new IllegalArgumentException("key may not be null");
}
}
}
We can even write some basic tests for it:
package whileonefork.map;
import static org.junit.Assert.assertEquals;
import org.junit.Test;
public class SimpleHashMapTestCase {
@Test
public void readWriteSimpleValue() {
SimpleHashMap<String, Integer> map = new SimpleHashMap<String, Integer>();
map.put("duck", 7);
map.put("goose", 42);
assertEquals(Integer.valueOf(7), map.get("duck"));
assertEquals(Integer.valueOf(42), map.get("goose"));
}
@Test
public void getSomethingThatIsntThere() {
SimpleHashMap<Integer, String> map = new SimpleHashMap<Integer, String>();
assertEquals(null, map.get(42));
}
@Test
public void overWriteValue() {
SimpleHashMap<Integer, String> map = new SimpleHashMap<Integer, String>();
map.put(42, "meh");
assertEquals("meh", map.get(42));
map.put(42, "we have the technology to rebuild him");
assertEquals("we have the technology to rebuild him", map.get(42));
}
@Test
public void doNotOverwriteValueJustBecauseSameHashCode() {
String diffButSameHash1 = "Ea";
String diffButSameHash2 = "FB";
//prove they really have the same hash
assertEquals(diffButSameHash1.hashCode(), diffButSameHash2.hashCode());
SimpleHashMap<String, String> map = new SimpleHashMap<String, String>();
//prove they really go into the same bucket (yes, might be overdoing it here!)
assertEquals(map.bucketIndexForKey(diffButSameHash1), map.bucketIndexForKey(diffButSameHash2));
map.put(diffButSameHash1, "We Are All Unique");
map.put(diffButSameHash2, "I'm Not!");
//same hash but amazingly still different values!
assertEquals("We Are All Unique", map.get(diffButSameHash1));
assertEquals("I'm Not!", map.get(diffButSameHash2));
}
}
Disclaimer: This code is garbage, purely for example purposes. It almost certainly has horrible bugs. You'd be crazy to use it for anything.