Thursday, December 09, 2004

What is remove in a map?

While coding a utility map class I was struck by this problem: what is the meaning of remove? That is, what does does it mean to remove a key when the map has more than one value for that key? How could that happen, you ask. My map isn't the ordinary sort: it's a union map, or a view of a list of maps.

Picture a stack of maps like a column of egg cartons over which you stand. As you look down through a particular cell from above, some cartons have an egg there, some do not. My union map is as if you collapsed the stack of egg cartons. Into each cell goes some kind of combination of the eggs from the corresponding cells in the original stack. The usual rule selects the top-most egg; remove just the top-most egg, and the rule selects the next egg down in the stack of cells; if there are no eggs in any cell, the collapsed cell is also lacking.

So I am faced with the question, what does it mean to remove a key from the union map? Do I remove all eggs in the matching cell from all cartons, or do I just remove the top-most one and the cell has a new value? Either way seems to me not quite right. As a practical matter, I chose for remove to remove all values for the key in all stacked maps so that collectively they would still follow the requirements of remove in the JDK, and I provided a removeOne to remove just the top-most value. Linguistically, I'd prefer for remove to perform this function, and for a removeAll to remove the key-value pairs from all stacked maps, but that would break expectations.

Suggestions?

10 comments:

Solly said...

How about making the removal policy an object you pass to the map's ctor?

Brian Oxley said...

Good suggestion but more complexity than I want to chew off.

Anonymous said...

I'm not sure you're not breaking the Map contract in other places. From the java.util.Map api docs: "each key can map to at most one value"; from your post: "...when the map has more than one value for that key". Sounds like you shouldn't be implementing Map at all.

Brian Oxley said...

Ah, but that's the clever bit. The union map *does* map to just one value per key. It is just that the value is synthetic and (potentially) depends on previous values. It is useful for storing, say, a history of maps. Imagine the PATH environment variable -- finding a program in the PATH is very similar to looking up a value (program location) by key (program name) and each layer in the union map is like a directory component of the PATH. Lookup isn't the problem; it is removal.

Solly said...

It doesn't have to be as complex as you might think. The "policy" will likely be an inner class since you don't want to expose too much of the map internals to the outside world -- so for the time being, you can just declare final static instances of the inner class for the two policies you have described (remove top only and remove all) and case on them inside the map class. At least that way you'd have the API done, and if you needed to add new policies down the road you could do so without changing the way in which the two existing policies get used.

Anonymous said...

I understand what you're saying, and it is clever, but I'm still not buying it. Here are more situations which are ambiguous with your impl:

clear() - clears the top egg carton? clears all "visible" entries? what's left after clear()? no keys but still entries(i.e. the "obscured" values)

entrySet() - all entries, even obscured entries?

remove() - as discussed

size() - number of cartons? number of entries incl. obscured?

values() - similar to entrySet()

What happens when this Map impl is passed to, say, the HashMap copy constructor? the resulting instance will have lost a bunch of information.

If you pass someone else an instance of your Map impl as a java.util.Map, how do you explain:

{
m.remove(key)
assert !m.contains(key) : "wha happened?"
}

I just think there are too many leakages. Further - for clients to use removeOne(), they'll need to know the exact type anyway so why the need to implement Map? Is there a problem with looking like Map but not implementing it?

BTW, my name is Matt - no blogger account.

Brian Oxley said...

I fully implement map. Presently remove(Object key) removes all values for the key in all maps in the union. But you gave me a hint: I also provide an extra "List layers()" method which returns the list of maps which compose the union.

Why am I worried about providing extra methods to distinguish between removing a key from the union and removing it from just the topmost occurrence (rolling back, if you will)?

I can just rely on the user to grab the top layer with a call to layers().get(layers.size() - 1) (or some similar convenience method) and remove the key from there.

I suppose I can provide convenience methods to simply that (technically, the key-value pair might not be in the topmost map in the union: one needs to search backwards from the top to bottom to find where the pair really is).

clear() does not bother me. If one wanted to clear just the topmost map, that is identical with popping off the top map from layers().

Anonymous said...

I have a feeling I'm missing your point and I know I'm not getting my point across. Could it be that blog comments are not the best place for a technical discussion? Anyway, I enjoy your blog so, uh, carry on. -Matt

Solly said...

At the risk of flogging an already dead horse, I had a few more thoughts that might be worth setting down. I think I can explain what it is that Matt is trying to say (and please correct me if I'm wrong, Matt). The fundamental problem is that what you're getting from the map is synthetic -- it was never put there in that form, and instead is a dependent variable derived from other stuff that was put into the map. Moreover, based on the logic you have presented, you can't infer the values of the independent variables from the value of the dependent one, so attempting to manipulate the dependent variable directly is indeterminate.

It's as though I said, "a + b = c; now, if c increases by 2, what are the values of a and b?" Without more information, or at least more constraints, it's an insoluble problem.

What I proposed earlier, i.e. passing the removal policy to the ctor, was one way of supplying the additional constraints in order to allow you to backtrack and figure out what had to be done to the indepent variables when the value of the dependent variable changed. If (as you indicated) you don't want to do that, then based on what Matt has said it sounds to me like attempting to manipulate the dependent variable like you want to do cannot be done in a general manner. If you really want to make a general map object with these semantics, then the most general thing to do is to disallow manipulating the dependent variable directly, say by throwing an UnsupportedOperationException if you invoke the remove() method (the way Collections.unmodifiableMap does), and to require you to manipulate the egg cartons instead of the result you get by stacking them.

Brian Oxley said...

Solly and Matt, good insights. I hadn't considered failing to support remove(), but that might be just the thing. And for me, that would be a great use of policies -- does remove(): (a) remove most recent, (b) remove all, or (c) throw. Thanks for the ideas!