Saturday, June 19, 2010

Tracking changing data in Java

I need to track changing data in Java. For example, say I am collecting data sets of some real world behavior, and I'd like to know what has changed between individual readings: additions, removals, changes.

I start with a single datum:

public static class Example {
    public final int index; // primary key
    public final double value; // satelite data
    public final long access; // other data

    public Example(final int index, final double value,
            final long access) {
        this.index = index;
        this.value = value;
        this.access = access;
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o)
            return true;
        if (o == null || getClass() != o.getClass())
            return false;

        final Example example = (Example) o;
        if (index != example.index)
            return false;

        return true;
    }

    @Override
    public int hashCode() {
        return index;
    }
}

Reducing to essentials:

  • Primary key values: different Example instances with the same index represent the same "thing"
  • Satellite data: these are the measurements I want to track
  • "Other" data: these are part of my data set but are not part of my change tracking

Now equals and hashCode take care of matching equivalent data so I can compare values between data sets. But I also need to know if the satellite data has changed:

public static class ExampleComparator
        implements Comparator<Example> {
    @Override
    public int compare(final Example a, final Example b) {
        final int c = Integer.valueOf(a.index).compareTo(b.index);
        if (0 != c)
            return c;
        return Double.valueOf(a.value).compareTo(b.value);
    }
}

An important point to consider: why not have Example implement Comparable<Example> instead of the separate comparator class? Read the Comparator javadocs closely: you really want equals and compareTo to agree (the discussion is good — I have a new source of bugs to comb out of my code base).

But here equals looks at only primary keys and the comparator looks at both primary keys and satellite data: they do not agree.

Having all the tools I need, time for comparing data sets (please be kind about the weak class name):

public class Differator<T, C extends Comparator<T>> {
    private final Map<T, T> items = new HashMap<T, T>();

    private final C comparator;

    public static <T, C extends Comparator<T>> Differator<T, C> newDifferator(
            final C comparator) {
        return new Differator<T, C>(comparator);
    }

    public Differator(final C comparator) {
        this.comparator = comparator;
    }

    public Changes<T> replaceAll(final Set<T> newItems) {
        final Changes<T> changes = new Changes<T>();

        for (final T item : items.keySet())
            if (!newItems.contains(item)) {
                changes.deleted.add(item);
                items.remove(item);
            }
        for (final T newItem : newItems) {
            if (!items.containsKey(newItem))
                changes.inserted.add(newItem);
            else {
                final T item = items.get(newItem);
                if (0 != comparator.compare(item, newItem))
                    changes.updated.add(new Delta<T>(item, newItem));
            }
            items.put(newItem, newItem);
        }

        return changes;
    }

    public static class Delta<T> {
        public final T item;
        public final T newItem;

        public Delta(final T item, final T newItem) {
            this.item = item;
            this.newItem = newItem;
        }

        @Override
        public boolean equals(final Object o) {
            if (this == o)
                return true;
            if (o == null || getClass() != o.getClass())
                return false;

            final Delta delta = (Delta) o;

            if (!item.equals(delta.item))
                return false;
            if (!newItem.equals(delta.newItem))
                return false;

            return true;
        }

        @Override
        public int hashCode() {
            int result = item.hashCode();
            result = 31 * result + newItem.hashCode();
            return result;
        }
    }

    public static class Changes<T> {
        public final Set<T> inserted = new HashSet<T>();
        public final Set<Delta<T>> updated = new HashSet<Delta<T>>();
        public final Set<T> deleted = new HashSet<T>();

        @Override
        public boolean equals(final Object o) {
            if (this == o)
                return true;
            if (o == null || getClass() != o.getClass())
                return false;

            final Changes changes = (Changes) o;

            if (!deleted.equals(changes.deleted))
                return false;
            if (!inserted.equals(changes.inserted))
                return false;
            if (!updated.equals(changes.updated))
                return false;

            return true;
        }

        @Override
        public int hashCode() {
            int result = inserted.hashCode();
            result = 31 * result + updated.hashCode();
            result = 31 * result + deleted.hashCode();
            return result;
        }
    }
}

It was not much effort to write Differator. replaceAll swaps out a new data set for an old one, returning how they two differed. Credit for immediately spotting that Differator is not merely thread unsafe, it is very unsafe. But this is not hard to address in a simple way.

Lastly, some tests:

public class DifferatorTest {
    private Differator<Example, ExampleComparator> differator;
    private Example item;
    private Changes<Example> changes;

    @Before
    public void setUp()
            throws Exception {
        differator = newDifferator(new ExampleComparator());
        item = new Example(1, 3.14159, 123);
        changes = differator.replaceAll(singleton(item));
    }

    @Test
    public void testFirstInsert() {
        assertThat(changes.inserted, is(equalTo(singleton(item))));
        assertThat(changes.updated.isEmpty(), is(true));
        assertThat(changes.deleted.isEmpty(), is(true));
    }

    @Test
    public void testUpdateUnimportantData() {
        final Changes<Example> changes = differator
                .replaceAll(singleton(new Example(1, 3.14159, 456)));
        assertThat(changes.inserted.isEmpty(), is(true));
        assertThat(changes.updated.isEmpty(), is(true));
        assertThat(changes.deleted.isEmpty(), is(true));
    }

    @Test
    public void testUpdateSatelliteData() {
        final Example newItem = new Example(1, 2.71828, 123);
        final Changes<Example> changes = differator
                .replaceAll(singleton(newItem));
        assertThat(changes.inserted.isEmpty(), is(true));
        assertThat(changes.updated, is(equalTo(
                singleton(new Differator.Delta<Example>(item, newItem)))));
        assertThat(changes.deleted.isEmpty(), is(true));
    }

    @Test
    public void testUpdatePrimaryKey() {
        final Example newItem = new Example(2, 3.14159, 123);
        final Changes<Example> changes = differator
                .replaceAll(singleton(newItem));
        assertThat(changes.inserted, is(singleton(newItem)));
        assertThat(changes.updated.isEmpty(), is(true));
        assertThat(changes.deleted, is(singleton(item)));
    }
}

No comments: