Friday, October 01, 2004

Safer collections

In my post on IndexMap I mentioned in passing SafeHashMap. What is that?

The JDK is very useful but has some warts. One of the worst is this inconsistency: if you try to index into a list with a non-existent index (i.e., beyond the end of the list) the collections throw IndexOutOfBoundsException. But what happens when you try to fetch from a map with a non-existent key? The JDK map implementations silent insert a null value into the map for you and return it. This leads to the following anti-idiom:

Value getValue(Map map, Key key) {
    return (Value) map.get(key);
}

See the problem? Now all code dealing with map anywhere in the program needs to test for null values and decide how to handle them, or else be happy with NullPointerException:

for (Iterator it = map.values().iterator(); it.hasNext(); ) {
    Value value = (Value) it.next();

    if (null == value)
        handleNullValue();
    else
        doTheRealWorkWhichIsTheWholePoint(value);
}

Try coding that 10 times real fast.

What is the solution? Force the correct idiom in the first code fragment:

Value getValue(Map map, Key key) {
    if (!map.containsKey(key))
        map.put(key, createMissingValue(key));

    return (Value) map.get(key);
}

And to enforce this idiom, extend a concrete class such as HashMap with a safe wrapper, hence SafeHashMap, and forbid missing keys or null inserts:

public Object get(Object key) {
    if (null == key) throw new NullPointerException();
    if (!containsKey(key)) throw new IllegalArgumentException();

    return super.get(key);
}

public Object put(Object key, Object value) {
    if (null == key) throw new NullPointerException();
    if (null == value) throw new NullPointerException();

    return super.put(key, value);
}

3 comments:

Anonymous said...

If you don't want null values then perhaps you should be using a hashtable not a map.

Your solution unfortunately breaks many of the desired performance characteristics that define a map's behavour. A map is commonly implemented as a hashtable or a red black binary tree. In particular a red black binary tree provides guaranteed O(log n) performance on lookup and insertion.

The following code makes the performance of the lookup operation 3 times worse, or O( 3(log n)) and therfore breaks many of the design requirements of a map.

Value getValue(Map map, Key key) {
if (!map.containsKey(key))
map.put(key, createMissingValue(key));

return (Value) map.get(key);
}

The reason for the degredation in performance is becuase you only get O(log n) when you traverse the tree once and only once per operation. By calling containsKey() put() and then get() your doing three times the work.

It also makes perfect sense for a map to return null, since its saying theres nothing mapped to that key.

To acheive your desired effectyou simply need a small wrapper class which holds a key and value and returns the key as the hashcode. Then you can safely use a hashtable and stop making a hash of the collections API :)

public class Pair
{
public Key;
public Value;

int hashCode(){
return Key.hashCode();
}
}

Brian Oxley said...

The performance consideration is a good observation. Sadly, the JDK APIs don't have bounds such as the C++ STL does.

As for noting that you can check for null for a missing key -- you can't! Since null is both a permitted value and the default value, it's presence does not tell you if the key is present. That's what containsKey(Object) is for.

My beef is one about safety and catching coding errors. In performance-bounded spots where profiling shows a problem, I'd switch from SafeHashMap to HashMap, but otherwise I prefer to keep SafeHashMap to catch problems.

The actual implementation I use also includes checking the classes of the keys and values to ensure you can't stick a Dog into a Map of Cats by String.

Catching bugs like this is much more important for the sort of business-logic programming I normally do than is performance considerations, so you and I might well have very different priorities. At the extreme, a bug-free program which never completes is pretty useless, however. :-)

Anonymous said...

May be I missed somethign, you are throwing a RuntimeException on accessing a non-existent value. Your solution does not make much of a difference to a sloppy programmer who forgets to check the null return value of Map.get, he will get the stack trace anyway, only a little bit earlier.