Thursday, December 28, 2017

Push early, push often, push on green

(This post follows up on Frequent commits, a post about git command line help for TDD. I assume you are already following good TDD practices. Also, please recall git normally requires pulling before pushing if your repo is behind on commits from the remote.)

Prologue

I was chatting with a colleague who is new in her role as an Agile Coach (she comes from a PM background). We were talking about ways to organize a team's story board (card wall), and turned to Desk Checks and when they fit into a story's life.

An interesting remark by her: a team had agreed to share work (push) only after the desk check was successful; that is, they did not push code until a story was almost done: work lay fallow on individuals' machines for potentially days at a stretch.

I was surprised. Why would they wait days to push? What did they do about merge conflicts, complex refactorings, integration failures in the pipeline, and so on?

Lecture

Entropy

To me this was clearly a smell. Martin Fowler specifically addresses this in Everyone Commits To the Mainline Every Day, and I would go further: Push commits at the earliest responsible moment. This is the opposite of the advice for refactoring, or especially emergent design, where the "Rule of 3" and the last responsible moment caution you to wait for more information before committing to a course of action.

And you can see why early pushes differ from the other two: waiting will not get you more information. On the contrary, waiting will only increase the entropy of the code base! Commits lie fallow in the local repo, increasing the size of potential merge conflicts for others.

                      | Benefit from more information? | Principle                   | Entropy from waiting
----------------------|--------------------------------|-----------------------------|--------------------------
Early push            | None available                 | Earliest responsible moment | Rises from fallow commits
Refactoring           | Get more code examples         | Rule of 3                   | Falls after refactoring
Architecture decision | Learn more about system        | Last responsible moment     | Falls if responsible

(The "information" in the case of pushes is the pulled commits themselves.)

I've definitely experienced this firsthand, when I'd eventually discard my local commits after waiting too long, and letting them grow too much in a different direction from how the rest of the team progressed. Waste!

Complexity

Consider this work cycle:

  1. Local commit
  2. Fetch commits from remote
  3. Merge, if needed
  4. Local commit again, if needed
  5. Push commits to remote

I've grouped these to emphasize what is local to you (the first four) and what is global to your team (the last one).

Considered only locally, you minimize entropy with frequent pulls for yourself, and likewise for your teammates individually, so you can catch merge conflicts early and resolve them when they are small. But considered globally, you need frequent pushes so those local pulls have only small changes in them. The longer you wait to push, the more work for those who pull.

Early push

You        | Rest of team | Work for others
-----------|--------------|------------------------------------
You commit |              |
You push   |              |
           | They pull    | Less complexity of merge (1 commit)
You commit |              |
You push   |              |
           | They pull    | Less complexity of merge (1 commit)

Each single push can be treated on its own. There are two opportunities for merge conflict, but each is a small amount of work.

Late push[1]

You        | Rest of team | Work for others
-----------|--------------|----------------------------------------
You commit |              |
           | They pull    | No changes to merge
You commit |              |
You push   |              |
           | They pull    | Greater complexity of merge (2 commits)

In each scenario, there are two commits for others to contend with. The larger, combined push has a greater opportunity for merge conflict, and a greater chance for a large amount of work, because of the combined interactions of the two commits.

And as teams work in parallel, there are more opportunities for merge conflicts.

Push early, push often, push on green

From the above discussion, the safest course is to push early rather than wait as commits pile up locally. But when to push—what is the "earliest responsible moment"?

If your codebase is well-tested, an answer presents itself: Push when tests are green and changes alter the complexity.

The goal is to avoid complex commit interactions that lead to merge conflicts. Tests are the safety net. Further, if all else fails and a commit is bad, it is easy to throw away the last commit until things are right again: only a small amount of work is lost, not days' worth.

Understanding what kind of changes alter complexity takes skill: skills improve with experience and coaching. The cost of early pushes is low, and the occasional penalty of late pushes high, so this would be a good topic for a "team norms" ("dev practices") discussion.

For example, the team might agree that changes to comments are not in themselves worth a push. At the other end, your refactorings which impact more than one source file almost certainly should be pushed early: discover their impact on others before you add more refactorings.

A good work cycle:

  1. Pull
  2. Build, run tests
  3. Edit sources
  4. Build, run tests
  5. Commit
  6. Pull and push

After a preliminary sanity check (#1 and #2), get in the cycle of #3 to #6.

Epilogue

I checked with other teams: it is a minority practice to wait until a successful desk check to push changes. That's a relief. Hopefully this practice can be made rarer.

One rational reason for waiting (itself a smell) is when tests take too long to run frequently. When I design a pipeline, I recommend breaking out "unit" tests from "integration" tests for this exact reason: even when integration tests run long, the initial CI stage with just unit tests should be fast enough to give quick feedback on frequent pushes, and encourage Push early, Push often, Push on (local) green.

Further reading

Footnotes

[1] The simple statement, "a greater chance for a large amount of work", has rather complex reasoning behind it, beyond the scope of this post.

For example, any particular commit can be viewed as applying an exponent to the overall complexity of a program. A neutral change (say, correcting a typo in a comment) has the exponent 1: it does not change the overall complexity; a positive change (say, removing code duplication) has an exponent between 0 and 1: it lowers the overall complexity; a negative change (say, adding a new dependency) has an exponent greater than 1: it raises the overall complexity.

Consider then that these complexity changes are not simple numbers, but distributions ("odds"), and change with time ("bitrot"), and involve more than the code (people or requirements changes).
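Setting aside distributions and time, the exponent idea can be sketched as a toy calculation. This is only an illustration: the starting score of 100 and the exponents 0.5 and 1.1 are made-up numbers, and the model assumes a complexity score greater than 1.

```java
final class ComplexityExponent {
    /** Applies a commit's exponent to a complexity score (assumed > 1). */
    static double afterCommit(final double complexity, final double exponent) {
        return Math.pow(complexity, exponent);
    }

    public static void main(final String[] args) {
        final double start = 100.0;  // arbitrary starting complexity

        // Neutral change (typo in a comment): exponent 1, complexity unchanged
        System.out.println(afterCommit(start, 1.0));
        // Positive change (removing duplication): exponent between 0 and 1
        System.out.println(afterCommit(start, 0.5));
        // Negative change (adding a dependency): exponent greater than 1
        System.out.println(afterCommit(start, 1.1));
    }
}
```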

[2] In Seth's post, do not confuse "publish once" with "wait to push": it means "don't publish the same commit twice" (which does sometimes happen accidentally, even for experts, from merging or rebasing).

Update

Sure enough, right after posting I read an interesting discussion on the value of the statistical mean (average) relevant to the discussion on two commits taken separately or together.

Essentially, even when the merge conflict work averages out over time for pushing two commits separately versus pushing them together, the outliers for pushing them together are significantly worse than for pushing them separately because of interactions and complexity.

Wednesday, November 01, 2017

Old scripting

The landscape

I'm helping someone with a scripting problem on an old system and an old shell. How old? Try IBM AIX 6.1, first released in 2007, and ksh93 "e" released in ... 1993. At least the AIX is a version of 6.1 from 2014! (Kudos to IBM for treating long-term support seriously.)

A second point to ponder. The goal is to improve remote scripting: running scripts on a remote machine. In this environment, ssh exists but is not used. The remote execution tool chosen is rexec, considered one of the most dangerous tools possible. But my remit is not to address the insecurity, just to improve the scripting. (They know this is bad, and are actively working to resolve it.)

So, given these constraints, what problem am I solving?

Example problem

This environment makes extensive use of remotely executed scripts to wire together a distributed, locally-hosted system. Current scripts duplicate the same approach, each implemented as a one-off: Copy a script to a remote machine with rcp; use rexec to invoke the script, capturing the output to a file on the remote host; copy the captured file back to the local host; process the output file; sometimes clean up the remote host afterwards.

Some gotchas to watch out for with ksh93e or rexec:

  • Function tracing - Using the standard xtrace setting to trace script execution in ksh93 has problems with tracing functions, and requires using old-style function syntax
  • Variable scope - To keep variables local to a function in ksh93, you must use the new-style function syntax (note the conflict with tracing)
  • Exit broken with trap - When calling exit to quit a remote script, trap does not get a correct $? variable (it is always 0, as exit succeeded in returning a non-0 exit status). Instead one must "set" $? with the code of a failing command, and then leave with a plain call to exit
  • No pipefail - Release "e" of ksh93 just does not know anything about set -o pipefail, and there is no unobtrusive workaround. This now common feature showed up in release "g"
  • No exit code - Would you believe rexec does not itself exit with the exit code of the remote command, never has, and never will? It always exits 0 if the remote command could be started.
  • Buffered stderr - Empirically, rexec (at least the version with this AIX) buffers the stderr stream of remote commands, and only flushes when rexec exits, so the sense of ordering between stdout, stderr and the command-line prompt is even worse than usual (the actual handling is unspecified)

This problem and environment trigger a memory: The last time I worked on AIX was in 1994, and it was almost the same problem! I really thought I had escaped those days.

A solution

So I refactored. I couldn't change the use of rexec (this environment is not ready for SSH key management), and I couldn't replace ksh93 with Bash or replace AIX with Linux, but I could do something about the imperfect duplication and random detritus files.

The solution

Note the need to call a fail function instead of exit directly because of poor interaction with trap.

Assuming some help, such as a global progname variable (which could simply be $0), and avoiding remote temporary files:

_transfer_exit_code() {
    while read line
    do
        case $line in
            # Matches ^0 through ^127 (the original pattern skipped 100-109)
            ^[0-9] | ^[1-9][0-9] | ^1[01][0-9] | ^12[0-7] ) return ${line#^} ;;
            * ) printf '%s\n' "$line" ;;
        esac
    done
    return 1  # ksh93e lacks pipefail; we get here when 'rscript' failed
}

rscript() {
    case $# in
        0 | 1 )
            echo "$progname: BUG: Usage: rscript SCRIPT-NAME HOSTNAME [ARGS]..." >&2
            return 2 ;;
        * ) script_name=$1 ; shift
            hostname=$1 ; shift ;;
    esac
    # Trace caller's script if we ourselves are being traced
    case $- in
        *x* ) _set_x='set -x' ;;
    esac

    rexec $hostname /usr/bin/ksh93 -s "$@" <<EOS | _transfer_exit_code
set - "$@"  # Only reasonable way to pass through function arguments

# Work around AIX ksh93 return code of exit ignored by trap
fail() {
    return \$1
}

# Our hook to capture the exit code for rexec who dumbly swallows it
trap 'rc=\$?; echo ^\$rc; exit \$rc' EXIT

PS4='+$script_name:\$(( LINENO - 14 )) (\$SECONDS) '
$_set_x

# The callers script
$(cat)
EOS
}

Example use

#!/usr/bin/ksh93

progname=${0##*/}

PS4='+$progname:$LINENO ($SECONDS) '

usage() {
    echo "Usage: $0 [-d] HOSTNAME"
}

. rexec.ksh

debug=false
while getopts :d opt
do
    case $opt in
        d ) debug=true ;;
        * ) usage >&2 ; exit 2 ;;
    esac
done
shift $(( OPTIND - 1 ))

case $# in
    1 ) hostname=$1 ;;
    * ) usage >&2 ; exit 2 ;;
esac

$debug && set -x

script_name=My-Remote-Script

tmp=${TMPDIR-/tmp}/$progname.$RANDOM
trap 'rm -f $tmp' EXIT

rscript $script_name $hostname Katy <<'EOS' >$tmp
echo $#: $1
fail 3
EOS

case $? in
    3 ) ;;
    * ) echo "$0: Did not pass through exit code" >&2 ; exit 1 ;;
esac

case "$(<$tmp)" in
    '1: Katy' ) ;;
    * ) echo "$0: Did not pass through arguments" >&2 ; exit 1 ;;
esac

Source

The code is in GitHub.

Tuesday, September 19, 2017

Help for JDBC with Java streams

We wanted to use JDBC with Java Streams, but encountered several difficulties. Fortunately we found solutions with rather small bits of code.

Checked exceptions

The main obstacle was the JDBC API throwing SQLException for all API methods used in our code. SQLException is a checked exception, so it must be declared in our method signatures, or otherwise caught. However, the Streams API only accepts lambdas and method references that throw no checked exceptions, so something simple like this will not compile:

stream(results).
        map(row -> row.getString("label")).  // checked exception
        forEach(this::processLabel);

The call to ResultSet.getString(String) throws a checked exception. The usual approach is to wrap the call, and handle the exception in the wrapping method:

private String streamGetLabel(final ResultSet results) {
    try {
        return results.getString("label");
    } catch (final SQLException e) {
        throw new UncheckedSQLException(e);
    }
}

(Here UncheckedSQLException is an unchecked exception wrapper we wrote for SQLException, similar to UncheckedIOException in the JDK for IOException.)

Then the stream becomes:

stream(results).
        map(this::streamGetLabel).
        forEach(this::processLabel);

This is OK; however, needing to write a wrapper method each time we wanted to use JDBC in a stream became tedious.

Solution

First we wrote a SAM (more on SAM interfaces/classes) interface as a lookalike for the JDK Function interface: this is what Stream.map(Function) wants. The lookalike is different in that it throws SQLException:

@FunctionalInterface
public interface SQLFunction<T, R> {
    R apply(final T t) throws SQLException;

    // Other default methods - no more abstract methods
}

Then we used this in a closed Function implementation to wrap and delegate to the lookalike, and throw UncheckedSQLException if the lookalike throws SQLException:

@RequiredArgsConstructor(staticName = "applyUnchecked")
public final class UncheckedSQLFunction<T, R>
        implements Function<T, R> {
    private final SQLFunction<T, R> wrapped;

    @Override
    public R apply(final T t) {
        try {
            return wrapped.apply(t);
        } catch (final SQLException e) {
            throw new UncheckedSQLException(e);
        }
    }
}

(Here we use the excellent Lombok library to generate our constructor, and give us a static convenience method, "applyUnchecked".)

Finally some static importing, and our example streams use becomes:

stream(results).
        map(applyUnchecked(row -> row.getString("label"))).
        forEach(this::processLabel);

Or with more help:

stream(results).
        map(getString("label")).
        forEach(this::processLabel);

We wrote similar lookalikes and wrappers for Predicate and Consumer. It would be easy enough to write them for other Java functional interfaces, such as BiFunction.
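For readers not using Lombok, the same idea can be sketched with a plain static method. This is a minimal sketch, not the code from the project: the holder class SqlFunctions, the shape of UncheckedSQLException, and the getString helper are all assumptions made for illustration.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Function;

final class SqlFunctions {
    /** Lookalike of Function whose apply may throw SQLException. */
    @FunctionalInterface
    interface SQLFunction<T, R> {
        R apply(T t) throws SQLException;
    }

    /** Assumed shape of the unchecked wrapper for SQLException. */
    static final class UncheckedSQLException extends RuntimeException {
        UncheckedSQLException(final SQLException cause) {
            super(cause);
        }
    }

    /** Adapts a throwing lookalike to a plain Function for Stream.map. */
    static <T, R> Function<T, R> applyUnchecked(
            final SQLFunction<T, R> wrapped) {
        return t -> {
            try {
                return wrapped.apply(t);
            } catch (final SQLException e) {
                throw new UncheckedSQLException(e);
            }
        };
    }

    /** Hypothetical helper of the kind behind map(getString("label")). */
    static Function<ResultSet, String> getString(final String columnLabel) {
        return applyUnchecked(row -> row.getString(columnLabel));
    }
}
```

The lambda passed to applyUnchecked may throw SQLException because its target type is the lookalike, not Function; only the returned Function is handed to the stream.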

Streaming result sets

The next difficulty we tackled was how to loop over a ResultSet, and use it with Streams.

(A side note: ResultSet is not a set but a list: rows are ordered, and they can duplicate each other in their column data. However, they were named after the SQL concept of sets, not the Java one.)

Fundamentally, a ResultSet is not an iterator, but is close:

Iterator  | ResultSet | Returns
----------|-----------|---------------------------------------------------------
hasNext() | next()    | boolean
next()    | this      | ResultSet (Yes, the ResultSet itself is the equivalent)

Solution

To provide a ResultSet as an iterator:

final List<String> values = new ArrayList<>();
for (final ResultSet row : iterable(results)) {
    values.add(row.getString("value"));
}

Moreover, to provide one as a stream:

final List<String> values = stream(results).
                map(getString("value")).
                collect(toList());

Any SQL failures are thrown as unchecked exceptions. The stream has the characteristics: immutable, nonnull, and ordered.
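One way such stream and iterable helpers could be written is with a spliterator over the result set. This is a sketch under assumptions, not the project's source: the class name ResultSetStreams is made up, and IllegalStateException stands in for the unchecked SQL exception wrapper.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class ResultSetStreams {
    /** Each stream element is the same ResultSet, moved to the next row. */
    static Stream<ResultSet> stream(final ResultSet results) {
        final int characteristics = Spliterator.IMMUTABLE
                | Spliterator.NONNULL | Spliterator.ORDERED;
        return StreamSupport.stream(
                new Spliterators.AbstractSpliterator<ResultSet>(
                        Long.MAX_VALUE, characteristics) {
                    @Override
                    public boolean tryAdvance(
                            final Consumer<? super ResultSet> action) {
                        try {
                            // ResultSet.next() plays the role of hasNext()
                            if (!results.next()) return false;
                        } catch (final SQLException e) {
                            // Stand-in for an unchecked SQL exception
                            throw new IllegalStateException(e);
                        }
                        // The ResultSet itself plays the role of next()
                        action.accept(results);
                        return true;
                    }
                }, false);
    }

    /** Adapts the stream for use in a for-each loop. */
    static Iterable<ResultSet> iterable(final ResultSet results) {
        return () -> stream(results).iterator();
    }
}
```

Because every element is the same object, read the column values inside the stream (for example with map) rather than collecting the rows themselves.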

Transactions

We found JDBC transactions to be tricky. Fundamentally they are tied to a connection; there is no proper nesting. (To simulate nesting, use separate connections. Even then, there is no guarantee of ordering from the database engine.) And they have a baroque API, relying on diddling of the "auto-commit" setting with care needed to restore its setting after the transaction concludes. Several bugs ensued before we switched to using small helper interfaces and methods.

Further, some programming languages do not distinguish void from other return types (e.g., Unit type): Java is not one of them. Likewise for user vs primitive types (Boolean vs boolean). Hence, there are separate transaction blocks for consumers, functions, and predicates.

Solution

One example explains them all. Consider functions and the SQLFunction lookalike interface:

@RequiredArgsConstructor(staticName = "applyTransacted")
public final class TransactedFunction<T, R>
        implements SQLFunction<T, R> {
    private final Connection connection;
    private final SQLFunction<T, R> wrapped;

    @Override
    public R apply(final T in)
            throws SQLException {
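        final boolean autoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            final R out = wrapped.apply(in);
            connection.commit();
            return out;
        } catch (final SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            // Restore the caller's setting rather than forcing it to true
            connection.setAutoCommit(autoCommit);
        }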
    }
}

(The pattern is the same for other collection operations.)

With a helper and static importing:

final Integer value = applyTransacted(connection, in -> 0).apply("string");

Or when the transaction fails:

applyTransacted(connection, in -> {
        throw new SQLException("Something went wrong");
    }).apply("string");

Some convenience

Many places in these examples are improved with helper functions, or for transactions, with currying (similar to the builder pattern). Hence, the wide use of Lombok static constructors. Transactions are another example as they need a Connection for begin/commit/rollback.

Solution

A simple helper curries connection for transactions:

@RequiredArgsConstructor(staticName = "with")
public final class WithConnection {
    private final Connection connection;

    public <T> Predicate<T> testTransacted(final SQLPredicate<T> wrapped) {
        return UncheckedSQLPredicate.testUnchecked(
                TransactedPredicate.<T>testTransacted(connection, wrapped));
    }

    public <T, R> Function<T, R> applyTransacted(
            final SQLFunction<T, R> wrapped) {
        return UncheckedSQLFunction.applyUnchecked(
                TransactedFunction.<T, R>applyTransacted(connection,
                        wrapped));
    }

    public <T> Consumer<T> acceptTransacted(final SQLConsumer<T> wrapped) {
        return UncheckedSQLConsumer.acceptUnchecked(
                TransactedConsumer.<T>acceptTransacted(connection, wrapped));
    }
}

Example use:

final Optional<Integer> value = Stream.of(0).
        filter(with(connection).testTransacted(in -> true)).
        findFirst();

(Yes, we might also describe the code as partial application. The "object-oriented" implementation confuses matters with the hidden this reference.)

Conclusion

There is nothing we did that was difficult or complex: simple one-liner interfaces, simple wrapper implementations of Java functional interfaces, some rote JDBC best practices. The main difficulty was conceptual: seeing the duplication of many, small wrapper methods, and pulling out their commonality. This is a good pattern to keep in mind throughout your code.

UPDATE:

And the source: Java helpers for JDBC and Streams.

Friday, August 11, 2017

How to write clean Java

I cannot tell you how to write good Java, but I can help you write clean Java. As usual, automation is key. It's all about the tooling.

The best thing about good tools is that they work together: each covers a different area, does not impede another tool, and fixing one complaint a tool reveals often fixes complaints from other tools.

This advice applies to any programming language, not just Java. Java, having the most mature ecosystem, is instructive.

The tools

Use good source control
Git is your best choice. Use either trunk-based development with feature toggles (feature flags), or gitflow patterns, depending on your needs and organization. Require full builds and testing before pushing to a shared repository; hooks can help together with your build tool.
Use good build automation
Maven or Gradle are your best choices; either is excellent. Teach the tool to be strict and fail builds if anything is not just right. Treat your build configuration as code, part of the same repository and held to the same hygiene standards. Keep your local build fast.
Use a good editor
IntelliJ is the best choice. Use inspections, intentions, and reformatting obsessively: code diffs should only show real changes, not whitespace or formatting. Use plugins to ease integrating IntelliJ with other tooling.
Keep your style consistent
Checkstyle is your best choice; other choices are needed for non-Java JVM languages. Fail your build if Checkstyle complains. Keep your IntelliJ formatting and Checkstyle rules consistent. Generally, choose either Sun or Google coding standards, and be leery of deviating from them; if unsure, pick Sun.
Fully test your code
JaCoCo works well. Fail your build if coverage drops below a threshold; ratchet up the threshold as coverage improves. Start low and aim for a 95%+ threshold. Follow the Test Pyramid; save your end-to-end tests for CI, or run locally only once before pushing commits.
Be zealous in looking for problems
FindBugs, PMD, or Error Prone are your best choices (pick one); other choices may work better for non-Java JVM languages. Fail your build if your choice complains. Be judicious in disabling complaints (for example, FindBugs "experimental" checks should likely be disabled).
Use code generation
Lombok is your first choice. Add others as needed (or build domain-specific code generators). Generated code does not need test coverage, style checks, or bug detection if the generator is clean and well-tested: trust it.

Update

Dan Wallach reminded me of Error Prone, added above.

Cygwin terminal in IntelliJ

IntelliJ sports an excellent terminal emulator (the "Terminal" tab at bottom of the editor). By default it brings up a terminal native to your Operating System: CMD.EXE on Windows, $SHELL on Linux and Mac.

However I prefer Cygwin when I work on Windows. WSL is incredible, but there are still interoperability issues between its filesystem and Windows-native programs, and IntelliJ (which relies on java.exe, a Windows-native program) is still working on it.

So, how to open a Cygwin terminal in IntelliJ? Setting the program to start in Settings|Tools|Terminal|Shell path, the most obvious thing to do, does not quite work:

C:\cygwin64\bin\bash.exe

This is a non-interactive shell, and does not source your profile. The next try is:

C:\cygwin64\bin\bash.exe --login -i

This produces an error from IntelliJ that it cannot start the program correctly. A little checking says the leading command needs to be quoted, else IntelliJ treats the entire line as the name of the command, not as a command followed by flags. OK:

"C:\cygwin64\bin\bash.exe" --login -i

Hey, I have a shell! Unfortunately, it starts in my home directory, not in my project root. Starting in the project root is one of the nice features of the terminal in IntelliJ. Finally, two changes. First the IntelliJ setting:

"C:\cygwin64\bin\bash" -c "exec /usr/bin/env INTELLIJ=true $SHELL --login -i"

And an addition to my ~/.bashrc:

${INTELLIJ-false} && cd "${OLDPWD-.}"

Ipso presto!

Monday, June 12, 2017

Example JVM agent in Kotlin

Oleg Shelajev wrote an excellent tutorial post on writing JVM agents. These are bits of code which run before your main() method. Why do this? It permits some interesting tricks, chiefly modifying classes as they are loaded, but also estimating the actual memory used by Java objects.

I gave this a try myself, but rather than writing my agent in Java, I wrote it in Kotlin. It was straightforward, with only one gotcha.

AgentX.kt

@file:JvmName("AgentX")

package hm.binkley.labs.skratch.jvmagent

import java.lang.instrument.Instrumentation

fun premain(arguments: String?, instrumentation: Instrumentation) {
    println("Hello from AgentX 'premain'!")
}

fun main(args: Array<String>) {
    println("Hello from AgentX 'main'!")
}

OK, the non-gotcha. You can declare functions at the package level. This acts just like static methods in Java, with simpler syntax (no potentially artificial wrapper class to hold the static method). The two obvious examples in the above code are main() and premain().

But when calling Kotlin from Java, you use a wrapping class name in the fully-qualified method name. My Kotlin file is named "AgentX.kt", so the default class name for Java is "AgentXKt". I'm lazy, wanted to save some typing, so I used a Kotlin file-level annotation to name the wrapping class just "AgentX".

Output

The JVM requires an absolute path to any agent jar, and I'm running Cygwin, so a little help to get a full path. Similarly, I used the Maven shade plugin to build a single uber-jar holding my own classes, and those of my dependencies (the Kotlin standard library).

$ java -javaagent:$(cygpath -m $PWD/target/skratch-0-SNAPSHOT.jar) -jar target/skratch-0-SNAPSHOT.jar
Hello from AgentX 'premain'!
Hello from AgentX 'main'!

Project is here: https://github.com/binkley/skratch.

Gotcha

Enough preamble, now the gotcha. Unlike Java, Kotlin helps you protect yourself from nulls without boilerplate code. So in premain() for the "arguments" parameter, you need to use String? rather than String as the parameter type as the JVM may pass you a null. The first time I tried the code, I didn't realize this and it blew up:

$ java -javaagent:$(cygpath -m $PWD/target/skratch-0-SNAPSHOT.jar) -cp target/skratch-0-SNAPSHOT.jar hm.binkley.labs.skratch.jvmagent.AgentX
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.lang.IllegalArgumentException: Parameter specified as non-null is null: method hm.binkley.labs.skratch.jvmagent.AgentX.premain, parameter arguments
        at hm.binkley.labs.skratch.jvmagent.AgentX.premain(AgentX.kt)
        ... 6 more
FATAL ERROR in native method: processing of -javaagent failed

Interesting! Kotlin found the issue at runtime. It can't find it at compile time as the JVM API for "premain" is pure convention without an interface or class to inspect.
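To illustrate what "pure convention" means, here is a rough Java sketch of finding and calling premain by name and signature through reflection. It is only a model for illustration, not the JVM's actual loading code; the class names PremainByConvention and JavaAgent are made up, and a null stands in for the Instrumentation instance.

```java
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Method;

final class PremainByConvention {
    /** Hypothetical stand-in for an agent class written in Java. */
    public static final class JavaAgent {
        static String seen = "not called";

        public static void premain(final String arguments,
                final Instrumentation instrumentation) {
            // Java accepts a null argument without complaint
            seen = String.valueOf(arguments);
        }
    }

    /** Looks up premain purely by name and parameter types. */
    static void callPremain(final Class<?> agentClass,
            final String arguments) {
        try {
            final Method premain = agentClass.getMethod(
                    "premain", String.class, Instrumentation.class);
            // No Instrumentation is available in this sketch, so pass null
            premain.invoke(null, arguments, null);
        } catch (final ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Nothing in the lookup says whether "arguments" may be null, which is why a compiler has nothing to check the Kotlin parameter type against.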

Let's try running the agent a different way. The command-line lets us pass options, and these become the "arguments" parameter:

$ java -javaagent:$(cygpath -m $PWD/target/skratch-0-SNAPSHOT.jar)= -cp target/skratch-0-SNAPSHOT.jar hm.binkley.labs.skratch.jvmagent.AgentX
Hello from AgentX 'premain'!
Hello from AgentX 'main'!

Sneaky. The mere presence of the "=" on the command line turns the "arguments" parameter from null to an empty string.
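The behavior observed above matches the documented form -javaagent:jarpath[=options], and can be modeled in a few lines. This is a sketch of what I saw, not the JVM's own parsing code; the class and method names are made up.

```java
final class AgentOptions {
    /** Models the observed rule: text after the first '=' is the options. */
    static String optionsOf(final String javaagentValue) {
        final int equals = javaagentValue.indexOf('=');
        // No '=' at all: premain receives null
        // '=' with nothing after it: premain receives an empty string
        return equals < 0 ? null : javaagentValue.substring(equals + 1);
    }
}
```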

Wednesday, May 17, 2017

SVG takeaways

I started playing with SVG for another blog post I'm working on. I wanted to learn more about SVG, an XML application. Some of my takeaways:

Z ordering

Elements are drawn in the order they appear in the XML. So if you want element B to overlay element A, then write A earlier in XML, and write B later. Then B will be drawn after A, and lie on top of it (if they overlap), hiding bits of A if B isn't transparent.

Scaling

You define your own coordinate system, and X/Y coordinates are all relative to this drawing area, called the "view box". So if you declare the view box to have an X ranging 0-100 "units" and a Y ranging 0-20 "units", then your elements will use an X between 0 and 100 and a Y between 0 and 20.

Recall the "S" in "SVG" stands for scalable. So when I say "units", these are relative units in context of the view box, and so in the example, "50%" of the X coordinate is 50 "units". You don't write down "units" in the XML, just provide the X and Y ranges (the viewBox attribute lists minimum X, minimum Y, width, and height).

What size is this on screen? Your browser or device or PDF will render the SVG view box to fit, and scale your elements accordingly. This is great for diagrams and drawings. If you do nothing special to stretch or squeeze the view box, your elements will retain perspective and always fit the page.

Embedding SVG in HTML5, I found this produced nice standalone diagrams within the vertical flow of text (using the X/Y from the example). The "style" bit is important:

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 100 20"
    width="80%" style="margin: auto; display: block">
    <!-- Write elements here -->
</svg>

Styling

Element attributes are consistent. So when I figured out that stroke="black" drew black lines, I could use the same attribute on any element to assign color. This was nice for learning the SVG language piecewise, picking up bits as I went along.

Fractional values are OK in several contexts. So when I wanted skinnier lines, I could write stroke-width="0.5".

A full example

Here's the SVG for that post I mentioned at start. I provide it without explanation; take your best guess what it's for!

[SVG diagram; its text labels read: Work, Env, Dev, Prod]

Saturday, April 29, 2017

Maven color logging on Cygwin

WORKAROUND: Thanks to comment by Zart Colwing, there is a workaround. Add -Djansi.passthrough=true to MAVEN_OPTS. It is not enough to add the flag to the command line; it needs to be seen by JANSI before maven begins parsing the command line. See No color for maven on Cygwin to track the issue.

Post

I'm quite happy Maven 3.5.0 has added color logging!

With earlier versions of maven, I used Jean-Christophe Guy's excellent maven-color extension. On Windows that involved some manual hacking of my maven installation, but on Mac, it was trivial with homebrew.

So now I'm getting color output from maven out of the box. Except when I don't.

You see, this new feature relies on the good JAnsi library. And JAnsi presently has an issue with color on Cygwin. When I'm at home, Cygwin is my mainstay, used on my gaming-cum-programming rig, so this is significant to me. What is the issue? No color—JAnsi detects I'm on Windows, and uses the native Windows console color handling, which doesn't work in Mintty or Xterm. Those use standard ANSI escape sequences, etc. rather than an OS-specific library.

Digging through the source for JAnsi, I find the trouble spot:

private static final boolean IS_WINDOWS = System.getProperty("os.name").toLowerCase(Locale.ENGLISH).contains("win");

Aha! The special Windows-handling code kicks in when the os.name system property contains "win" (in any case). Using the jps and jinfo tools that come with the JDK, I double-checked against a long-running maven build (16848 was just the PID used by the JVM for this maven build; use jps to list all current PIDs):

$ jinfo -sysprops 16848 | grep os.name
os.name = Windows 10

Some experimenting found a way to work around that:

MAVEN_OPTS=-Dos.name=Cygwin mvn clean

(You need to use MAVEN_OPTS rather than passing -Dos.name=Cygwin to maven; once maven starts the value is immutable.)

Color is back. It turns out, any value for os.name will work (for example, "Bob") as long as it doesn't contain "win". I picked "Cygwin" for explicitness.

UPDATE: I have to rethink this. Sure I get color now, but at the expense of maven test failing as maven no longer believes I'm on Windows, so is unhappy at the lack of a /bin/sh and general UNIX filesystem. One step forward, two steps back.

Friday, April 21, 2017

Quick Kotlin idiom: naming things

Kotlin is very much a better Java. A great example is this simple idiom for naming things:

open class Named<in T>(val name: String, check: (T) -> Boolean)
    : (T) -> Boolean by check {
    override fun toString() = name
}

So I can write something like this:

fun maybe_three(check: (Int) -> Boolean) {
    if (check(3)) do_the_three_thing()
    else println("\"$check\" says, Do not pass go.")
}

maybe_three(Named("Is it three?") { i -> 3 == i })
maybe_three(Named("Is it four?") { i -> 4 == i })

The first call of maybe_three prints: Breakfast, lunch, dinner. The second call prints: "Is it four?" says, Do not pass go.

Many variations on this are possible. Two things make this example work nicely. One is delegation (the magical by keyword), which lets Named add the general feature of naming things, by overriding toString(), to whatever it wraps. The other is the elegant lambda (anonymous function) syntax for the last parameter, used here for the function delegated to. You can delegate any interface, not just function types, so you could make named maps, named business objects, et al, by using delegation on existing types without needing to change them.

Thursday, April 13, 2017

WSL, first experiences

I first tried Windows Subsystem for Linux in December 2016, but was not successful in installing, so I held off.

After getting the Windows 10 Creators Update, I saw how much work and love went into improving WSL, and decided to try again. I was rewarded.

On the whole, the installation was smooth, brief, you might even say trivial. There were Windows reboots to enable Developer Mode, and after installing WSL, but much solid effort has gone into making Windows reboots quick and painless, and with a regular Linux distro I'd have rebooted anyhow after upgrading, so no disgruntlement.

And what did I get for my efforts? WSL bash is bash. Just bash. Really, it is just plain old bash, with all the command line tools I've grown accustomed to over 30 years. The best praise I can give a tool: It just works. And WSL just works. (But see Almost there, below.)

Out of the box WSL runs Ubuntu 16.04 (Xenial), the official LTS distribution (long-term support). This is a sane choice for Microsoft. It's stable, reliable, secure, tested, trusted. For anyone wanting a working Linux command line, this is a go-to choice. Still, I updated it.

Things I changed

Even with all the goodness, there were some things I had to change:

The terminal
I immediately installed Mintty for WSL. I've grown to love Mintty on Cygwin, trusting it as a reliable and featureful terminal emulator without going overboard. It's a tasteful balance, well executed. And CMD.EXE, though much improved, still is not there (but may head there; we'll see if PowerShell wins out).
DBus
Not to get into flamewars, I just accept that Ubuntu uses DBus. By default it doesn't run on WSL, but this was easy to fix, and it made upgrading Ubuntu smoother. Using sudo, edit /etc/dbus-1/session.conf as others have suggested (I did it by hand, not with sed). You may have to repeat after upgrading Ubuntu.
The Ubuntu version
It seems trivial, but I was unhappy that diff --color didn't work. Am I shallow to care about color? Some of the scripts I write for open source provide colorized diff output, and I'd like to work on them in WSL without disabling this feature. Microsoft made much hay over 24-bit color support in CMD.EXE. So I updated to Ubuntu 17.04, which includes diffutils 3.5 (the --color option arrived in diffutils 3.4). Microsoft does not officially support upgrading Ubuntu, but I ran into no real problems.

Upgrading WSL Ubuntu

Caveat coder — there is a reason this is unsupported by Microsoft at present. I just never ran into those reasons myself. For example, I used DBus to make upgrading happier; I am not using any Linux desktop (graphical) programs, so maybe this could be a reason.

Researching several helpful Internet sources, I:

  1. Edited /etc/update-manager/release-upgrades to use "normal" releases, not just LTS
  2. Fixed /etc/dbus-1/session.conf
  3. Ran sudo do-release-upgrade to move to 16.10 from 16.04
  4. Re-fixed /etc/dbus-1/session.conf
  5. Ran sudo do-release-upgrade -d to move to 17.04 from 16.10

(Pay attention: there are many "yN" prompts where the default is to abort: you must enter "y" to these!)
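Step 1 can be sketched as a one-line sed edit. This assumes the stock file contains the line Prompt=lts, as it did on my understanding of Ubuntu 16.04, and that GNU sed is available; the helper name use_normal_releases is hypothetical. The path is a parameter so you can try it on a scratch copy before touching the real file.

```shell
# Hypothetical helper: switch the release-upgrade policy from LTS-only
# to every release. Assumes the file has a "Prompt=lts" line.
use_normal_releases() {
    sed -i 's/^Prompt=lts$/Prompt=normal/' "$1"
}

# Try it on a scratch copy first
scratch=$(mktemp)
printf '[DEFAULT]\nPrompt=lts\n' > "$scratch"
use_normal_releases "$scratch"
cat "$scratch"
```

For the real thing you would run the same sed command with sudo against /etc/update-manager/release-upgrades.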

When I am prompted to reboot, I quit the upgrade, close all WSL terminals, and start a fresh one. There is no actual kernel to reboot: it remains 4.4.0-42-Microsoft throughout. The kernel is emulated by Windows, not an actual file to boot, so upgrades only change the packages bundled with the kernel, not the kernel itself. The underlying abstraction is quite elegant.

Almost there

Can I drop Cygwin and make WSL my daily development environment? Not quite yet. For shell script work, WSL is excellent. But for my other projects (Kotlin, Java, Ruby, et al), I rely on IntelliJ IDEA as my editor (though Emacs might return into my life again). Filesystem interop between Windows programs (such as java.exe) and WSL is good but not perfect.

Other options

Cygwin on Windows
This is and has been my solution for bash on Windows for many years. I will move to WSL when I'm ready, but I'm not ready yet. I need my regular development cycle to work first. (See Almost there.) There are downsides to Cygwin, for example, coping with line endings, but it's been reliable for me.
Homebrew on Mac
This is work. My company issues me a Mac laptop, and I use it. For the most part, it is fine for work with colleagues and clients, though at times the Mac is a little strange, and much of the user experience feels counterintuitive. Still, the software mostly works, and the hardware is incredibly good.

But why not just use Linux? Well, my daily machine at home is a Windows box. Because it's my gaming rig, and games I play don't run well in Linux, and getting a Mac desktop is not currently a pretty story.

UPDATE: More on how syscalls work.

UPDATE: Slightly dated (Microsoft is moving very fast on WSL—kudos!), this is a good video presentation on what happens under the hood.

Wednesday, April 12, 2017

Quick diff tip, make, et al

I'm using make for a simple shell project, to run tests before committing. The check was trivial:

SHELL = bash

test:
	@./run-tests t | grep 'Summary: 16 PASSED, 0 FAILED, 0 ERRORED' >/dev/null

This has the nice quality of Silence is Golden: say nothing when all is good. However, it loses the quality of Complain on Failure: it simply fails without saying why.

A better solution, preserving both qualities:

SHELL = bash

test:
	@diff --color=auto \
	    <(./run-tests t | grep 'Summary: .* PASSED, .* FAILED, .* ERRORED') \
	    <(echo 'Summary: 16 PASSED, 0 FAILED, 0 ERRORED')

It still says nothing when all is good, but now shows on failure how many tests went awry. Bonus: color for programmers who like that sort of thing.

Why set SHELL to bash? I'm taking advantage of Process Substitution. Essentially the command outputs inside the subshells are turned into special kinds of files, and diff likes to compare files. Ksh and Zsh also support process substitution, so I'm going with the most widely available option.
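A small sketch of process substitution outside of make, for trying the idea at a bash prompt. The function names check_summary and fake_run_tests are hypothetical; fake_run_tests stands in for ./run-tests with made-up output.

```shell
# Requires bash (or ksh/zsh): each <(...) is replaced by a file name
# that diff can open and read the command's output from.
check_summary() {
    local expected=$1 ; shift
    diff <("$@" | grep 'Summary:') <(echo "$expected")
}

# Stand-in for ./run-tests (hypothetical output)
fake_run_tests() {
    echo 'some test noise'
    echo 'Summary: 16 PASSED, 0 FAILED, 0 ERRORED'
}

check_summary 'Summary: 16 PASSED, 0 FAILED, 0 ERRORED' fake_run_tests && echo same
```

When the summaries match, diff prints nothing and exits zero; when they differ, diff shows both lines and exits non-zero, which is what fails the make target.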

UPDATE:

Why are my arguments to diff ordered like that? In usual testing language, I'm comparing "actual" vs "expected", and more commonly you'll see programmers list "expected" first.

diff by default colors the left-hand input in RED, and the right-hand input in GREEN. On failure, it makes more sense to color "actual" in red and "expected" in green. Example output on failure:

$ make
1c1
< Summary: 17 PASSED, 1 FAILED, 0 ERRORED
---
> Summary: 19 PASSED, 0 FAILED, 0 ERRORED
make: *** [Makefile:4: test] Error 1

Tuesday, April 04, 2017

Maven logging and the command line

I usually set up my Maven-build projects to be as quiet as possible. My preference is "Silence is Golden": if the command says nothing on the command line, it worked; if it failed, it prints to STDERR.

However, sometimes I want to see some output while I'm tracking down a problem. How best to reconcile these?

Maven 3.3.1 (maybe earlier) introduced the .mvn directory for your project root (note leading DOT). In here you can keep a jvm.config file which has the flags to the java command used when running mvn. Here's my usual jvm.config:

-Dorg.slf4j.simpleLogger.defaultLogLevel=WARN
-Dorg.slf4j.simpleLogger.log.Sisu=WARN

This quiets maven down quite a bit, using the properties to control Maven's more recent logger, SLF4J. And I normally commit that into my project code repository.

And for those times I'd like more output? I could edit the file, but I don't trust myself enough not to accidentally commit those temporary changes. So I use the command line:

$ MAVEN_OPTS='-Dorg.slf4j.simpleLogger.defaultLogLevel=INFO' mvn

Ultimately mvn reads .mvn/jvm.config, puts the contents into the variable MAVEN_OPTS, and uses MAVEN_OPTS in the invocation of java. Because of this, you can override a setting yourself by supplying MAVEN_OPTS on the command line.
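A simplified imitation of that combining step, to make the ordering visible. This is a sketch, not the real mvn launcher: the function name effective_maven_opts is hypothetical, and it assumes the file's flags are placed before whatever MAVEN_OPTS the caller supplied.

```shell
# Simplified imitation of the launcher, NOT the real mvn script.
effective_maven_opts() {
    local config=$1 from_file=
    if [ -f "$config" ]
    then
        # Join the lines of jvm.config with single spaces
        from_file=$(tr -s '\n' ' ' < "$config")
    fi
    # File contents first, caller's MAVEN_OPTS last
    echo "${from_file% } ${MAVEN_OPTS:-}"
}

config=$(mktemp)
printf '%s\n' '-Dorg.slf4j.simpleLogger.defaultLogLevel=WARN' > "$config"
MAVEN_OPTS='-Dorg.slf4j.simpleLogger.defaultLogLevel=INFO' effective_maven_opts "$config"
```

The INFO setting ends up after the WARN one on the resulting line, which is consistent with the command-line override working, assuming the JVM honors the last occurrence of a repeated -D property.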

Sunday, April 02, 2017

DDD, A Handmaid's Tale

(No, this is not a post about the venerable and excellent GNU DDD.)

Documentation Driven Development—DDD—is a term I just made up (not really; read on). I was working on some code TDD-style ("first, write a failing test"), and also thinking about my user documentation. My usual practice is to get my tests and code into good shape, push-worthy, and then update the documentation with my improvements (one hopes). Then the thought struck me: I'm doing this wrong!

We write tests first as miniature specifications for the code. But my documentation is conveying to the public my specifications. In the world of closed-source software, this makes sense. You prepare the documentation to ship to customers (internal or external); generally holding off until the code is stable so your documentation is mostly accurate. After all, with closed source, users can't see your tests or the code: the documentation is their only view into how to use your code.

With open-source software, this picture is radically changed. Your users can see your tests and code; in fact, you generally encourage them to look, or fork! So now your tests are little visible public specifications. Why documentation then?

Personally I still like solid documentation on open source projects. True, I could just browse the tests. But that isn't the best way to start with code that is new to me. I'd like to see examples, some explanation, perhaps some architecture or high-level pictures. Hence, documentation.

So, back to DDD. If I'm pushing out my tests and code to a public repository as soon as they pass (or near enough), how is my documentation ever to keep up? How do I encourage others to clone or fork my code, and contribute? I still want new users to have good documentation for getting started; I still want my tests to ultimately define my specifications. The answer is easy: First write failing documentation.

This is not at all a new idea! See Steve Richert, Zach Supalla, and many others. An early form of this idea is Knuth's Literate Programming.

Failing documentation

What is "failing documentation"?

Firstly, just as with "failing tests", you start with documentation of how your code should behave, but which isn't actually the case. The ways to do this are the usual suspects:

  • Write examples which don't work, or possibly don't even compile
  • Write explanations which don't fit your code
  • Write step-by-step walkthroughs which can't be followed
  • Write architecture diagrams which are wrong
  • Etc, etc, etc, anything you'd put in documentation which is invalid for your current code

Then you fix it:

  1. Write failing documentation
  2. Write failing tests which correspond to the documentation
  3. Fix the code to make the tests pass, and the documentation correct

Afterwards you have:

  • Current, accurate documentation
  • Current, passing tests
  • Current, working code

Supporting ecosystems

As straightforward as DDD is to explain, some software ecosystems make it easier to actually do than others. A standout example is Python and doctest. In doctest you write your tests directly in the API documentation as examples. This is a perfect marriage of documentation and tests.

Swagger is an interesting case. It's generally a documentation-first approach tailored for REST API specifications. But the documentation is "live documentation"—i.e., an executable web form for exploratory testing—rather than text and code examples to read. Using DDD, you would write your REST API specification first in Swagger, then write failing tests around that before fixing the code to implement. Clever people have leveraged this.

About the post title

The Handmaid's Tale is a sly reference to Chaucer's The Wife of Bath's Tale (featuring a strong protagonist balancing among bickering companions), and The Merchant's Tale sequence. Documentation has often been treated as subservient to code, an afterthought, when really it is the first thing most new users see about a system. Give it its due.

Saturday, April 01, 2017

Kotlinc on Cygwin

There may be a better way, but I found that running kotlinc to bring up the Kotlin REPL, while in a Cygwin BASH shell using Mintty, did not respond to keyboard input. A little research indicated the issue is with JLine, which has some understandable difficulties reconciling running under Cygwin with running under CMD.

The workaround I used:

$ JAVA_OPTS='-Djline.terminal=unix' kotlinc
Welcome to Kotlin version 1.1.1 (JRE 1.8.0_121-b13)
Type :help for help, :quit for quit
>>> println("FOOBAR")
println("FOOBAR")
FOOBAR

Requesting JLine to use UNIX-y primitives for terminal access solved the problem. I would like to hear about other solutions.

UPDATE: Edited for clarity. And some additional reading:

Saturday, March 18, 2017

Followup on Bash long options

A followup on Bash long options.

The top-level option parsing while-loop I discussed works fine for regular options. Sometimes you need special parsing for subcommand options. A hypothetical example might be:

$ my-script --toplevel-thing my-subcommand --something-wonderful option-arg

Here the --toplevel-thing option is for my-script, and the --something-wonderful option and its option-arg are for my-subcommand. Regular getopts parsing will try to handle all options for the top level, failing to distinguish subcommand options as separate. Further, getopts in a function does not behave quite as expected.
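The post does not say which getopts surprise it has in mind. One well-known candidate, shown here as a sketch with hypothetical function names, is that getopts keeps its position in the shell-wide OPTIND variable, so a second call to the same function finds nothing left to parse unless OPTIND is made local.

```shell
# Requires bash. parse_shared relies on the shell-wide OPTIND.
parse_shared() {
    local opt seen=
    while getopts 'ab' opt ; do seen+=$opt ; done
    echo "seen: ${seen:-nothing}"
}

# parse_local gives each call its own position counter.
parse_local() {
    local OPTIND=1 opt seen=
    while getopts 'ab' opt ; do seen+=$opt ; done
    echo "seen: ${seen:-nothing}"
}

parse_shared -a -b   # seen: ab
parse_shared -a -b   # seen: nothing
parse_local -a -b    # seen: ab
parse_local -a -b    # seen: ab
```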

One solution is simple and hearkens back to the pre-getopts days. For the top level:

while (( 0 < $# ))
do
    case $1 in
        --toplevel-thing ) _toplevel_thing=true ; shift ;;
        -* ) usage >&2 ; exit 2 ;;
        * ) break ;;
    esac
done

Using a while-loop with explicit breaks avoids looking too far along the command line, and wrongly consuming options meant for subcommands. Rechecking $# each time through the loop breaks gracefully. Similarly, for subcommands expressed in a function:

function my-subcommand {
    while (( 0 < $# ))
    do
        case $1 in
            --something-wonderful ) local option_arg="$2" ; shift 2 ;;
            * ) usage >&2 ; exit 2 ;;
        esac
    done
    # Rest of my-subcommand, using `option_arg` if provided
}

This uses the same pattern as the top level so you avoid needing to remember to handle top level one way, and subcommand another.
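Putting the two loops together, here is a self-contained sketch you can paste into bash. The names main, my_subcommand and the usage text are hypothetical. It uses return rather than exit so it is safe to try in an interactive shell, and it adds a guard for a missing option argument, since shift 2 with only one argument left shifts nothing and the loop would never end.

```shell
# Requires bash for (( ... )).
usage() {
    echo "Usage: my-script [--toplevel-thing] my-subcommand [--something-wonderful ARG]"
}

my_subcommand() {
    local option_arg=
    while (( 0 < $# ))
    do
        case $1 in
            --something-wonderful )
                (( 2 <= $# )) || { usage >&2 ; return 2 ; }
                option_arg=$2 ; shift 2 ;;
            * ) usage >&2 ; return 2 ;;
        esac
    done
    echo "toplevel_thing=$_toplevel_thing option_arg=$option_arg"
}

main() {
    _toplevel_thing=false
    while (( 0 < $# ))
    do
        case $1 in
            --toplevel-thing ) _toplevel_thing=true ; shift ;;
            -* ) usage >&2 ; return 2 ;;
            * ) break ;;  # first non-option: stop, leave the rest alone
        esac
    done
    (( 0 < $# )) || { usage >&2 ; return 2 ; }
    local subcommand=$1 ; shift
    case $subcommand in
        my-subcommand ) my_subcommand "$@" ;;
        * ) usage >&2 ; return 2 ;;
    esac
}

main --toplevel-thing my-subcommand --something-wonderful option-arg
# prints: toplevel_thing=true option_arg=option-arg
```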

An example script using this pattern.

Monday, March 13, 2017

Frequent commits

Pair posting with guest Sarah Krueger!

A source control pattern for TDD

At work we recently revisited our commit practices. One issue spotted: we didn't commit often enough. To address this, we adopted the source control pattern in this post. There are lots of benefits; the one that mattered to me most: No more throwing the baby out with the bathwater, that is, no more two-hour coding sessions only to start again and lose the good with the bad.

So we worked out this command-line pattern using single unit-of-work commits (without git rebase -i!):

# TDD cycle: edit code, run tests, lather-rinse-repeat until green
$ git pull --rebase --autostash && run tests && git commit -a --amend --no-edit
# Simple unit-of-work commit, push, begin TDD cycle again
$ git commit --amend && git push && git commit --allow-empty -m WIP

What is this?

  1. Start with a pull. This ensures you are always current, and find conflicts as soon as possible.
  2. Run your full tests. This depends on your project, for example, mvn verify or rake. If some tests are slow, split them out, and add a full test run before pushing.
  3. Amend your work to the current commit. This gives you a safe fallback known to pass tests. Worst case you might lose some recent work, but not hours' worth. (Hint: run tests often.)
  4. When ready to push, update the commit message to the final message for public push.
  5. Push. Share. Make the world better.
  6. Restart the TDD cycle with an empty commit using a message that makes sense to you, for example "WIP" (work in progress); the message should be obvious not to push. Key: the TDD cycle command line only amends commits, so you need a first, empty commit to amend against.

Why?

The key feature of this source control pattern is: Always commit after reaching green on tests; never commit without testing. When tests fail, the commit fails (the && is short-circuit logical and).
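A tiny sketch of that short-circuit behavior, with no git involved. The function tdd_step is hypothetical, and the true and false commands stand in for a passing and a failing test run.

```shell
# Hypothetical stand-in for the TDD cycle: the "commit" (here just an echo)
# only runs when the test command exits zero.
tdd_step() {
    local run_tests=$1
    "$run_tests" && echo "committed"
}

tdd_step true                                           # prints: committed
tdd_step false || echo "tests failed, nothing committed"
```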

In the math sense, this pattern makes testing and committing one-to-one and onto. Since TDD requires frequent running of tests, this means frequent commits when those tests pass. To avoid a long series of tiny commits when pushing, amend to collect a unit of work.

Bootstrapping

The TDD cycle depends on an initial, empty commit. The first time using this source control pattern:

# Do this after the most recent commit, before any edits
$ git commit --allow-empty -m WIP

Adding files

This pattern, though very useful, does not address new files. You do need to run git add with new files to include them in the commit. Automatically adding new files can be dangerous if .gitignore isn't set up right.

It depends on your style

The exact command line depends on your style. You could include a script to run before tests, or before commit (though the latter might be better done with a git pre-commit hook). You might prefer merge pulls instead of rebase pulls. If your editor runs from the command line you might toss $EDITOR at the front of the TDD cycle.

The command lines assume git, but this source control pattern works with any system that supports similar functionality.

Fine-grained commits

An example of style choice. If you prefer fine-grained commits to unit-of-work single commits (depending on your taste or project; they're both good practice):

# TDD cycle: edit code, run tests, lather-rinse-repeat until green
$ git pull --rebase --autostash && run tests && git commit -a
# Fine-grained commits, push, begin TDD cycle again
$ git push

Improving your life

No matter your exact command line, it can be made friendlier for you. Yes, shell history can store your long chain of commands. What if they vary slightly between programmers sharing a project, or what if there is a common standard approach? Extend git. Let's call our example subcommand "tdd". Save this in a file named git-tdd in your $PATH:

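#!/bin/sh
set -e
# "run tests" is a placeholder for your project's test command
case $1 in
    test ) git pull --rebase --autostash && run tests && git commit -a --amend --no-edit ;;
    accept ) git commit --amend && git push && git commit --allow-empty -m WIP ;;
    * ) echo "Usage: git tdd test|accept" >&2 ; exit 2 ;;
esac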

Now your command line becomes:

$ git tdd test  # Repeat until unit of work is ready
$ git tdd accept

The source is in GitHub.

Updated:

An editing error left out the Why? section when initially posted.

Remember to autostash.

A follow up post: Push early, push often, push on green.

Saturday, March 11, 2017

Two BDD styles in Kotlin

Experimenting with BDD syntax in Kotlin, I tried these two styles:

fun main(args: Array<String>) {
    println(So
            GIVEN "an apple"
            WHEN "it falls"
            THEN "Newton thinks")
}

data class BDD constructor(
        val GIVEN: String, val WHEN: String, val THEN: String) {
    companion object {
        val So = So()
    }

    class So {
        infix fun GIVEN(GIVEN: String) = Given(GIVEN)
        data class Given(private val GIVEN: String) {
            infix fun WHEN(WHEN: String) = When(GIVEN, WHEN)
            data class When(private val GIVEN: String, private val WHEN: String) {
                infix fun THEN(THEN: String) = BDD(GIVEN, WHEN, THEN)
            }
        }
    }
}

And:

fun main(args: Array<String>) {
    println(GIVEN `an apple`
            WHEN `it falls`
            THEN `Newton thinks`
            QED)
}

infix fun Given.`an apple`(WHEN: When) = When()
infix fun When.`it falls`(THEN: Then) = Then(GIVEN)
infix fun Then.`Newton thinks`(QED: Qed) = BDD(GIVEN, WHEN)

inline fun whoami() = Throwable().stackTrace[1].methodName

data class BDD(val GIVEN: String, val WHEN: String, val THEN: String = whoami()) {
    companion object {
        val GIVEN = Given()
        val WHEN = When()
        val THEN = Then("")
        val QED = Qed()
    }

    class Given
    class When(val GIVEN: String = whoami())
    class Then(val GIVEN: String, val WHEN: String = whoami())
    class Qed
}

Comparing main() methods, which is easier to read or use? I haven't tried implementing, just have looked at testing code style. Note that I'm using the infix feature of Kotlin to have my BDD "GIVEN/WHEN/THEN" as punctuation free as I'm able.

In the one case (using strings to describe cases), an implementation would be more similar to Spec or Cucumber, which usually uses pattern matching to associate text with implementation. In the other case (using functions to describe cases), an implementation goes directly into the function definition. In either case, Kotlin only supports binary infix functions, not unary (of course, you say, that's what "infix" means!), so I need either an initial starting token (So in the strings case) or an ending one (QED in the functions case).

I'm curious how implementation sorts out.

(Code here.)

UPDATE:

I have working code now that runs these BDD sentences but remain unclear which of the two styles (strings vs functions) would be easier to work with:

fun main(args: Array<String>) {
    var apple: Apple? = null
    upon("an apple") {
        apple = Apple(Newton(thinking = false))
    }
    upon("it falls") {
        apple?.falls()
    }
    upon("Newton thinks") {
        assert(apple?.physicist?.thinking ?: false) {
            "Newton is sleeping"
        }
    }

    println(So
            GIVEN "an apple"
            WHEN "it falls"
            THEN "Newton thinks")
}

Vs:

fun main(args: Array<String>) {
    println(GIVEN `an apple`
            WHEN `it falls`
            THEN `Newton thinks`
            QED)
}

var apple: Apple? = null

infix fun Given.`an apple`(WHEN: When) = upon(this) {
    apple = Apple(Newton(thinking = false))
}

infix fun When.`it falls`(THEN: Then) = upon(this) {
    apple?.falls()
}

infix fun Then.`Newton thinks`(QED: Qed) = upon(this) {
    assert(apple?.physicist?.thinking ?: false) {
        "Newton is sleeping"
    }
}

The strings style is certainly more familiar. However, mistakes in registering matches of "GIVEN/WHEN/THEN" clauses appear at runtime and do not provide much help.

The functions style is more obscure. However, mistakes cause compile-time errors that are easier to understand, and your code editor can navigate between declaration and usage.