Thursday, December 28, 2017

Push early, push often, push on green

(This post follows up on Frequent commits, a post about git command line help for TDD. I assume you are already following good TDD practices. Also, please recall git normally requires pulling before pushing if your repo is behind on commits from the remote.)

Prologue

I was chatting with a colleague who is new in her role as an Agile Coach (she comes from a PM background). We were talking about ways to organize a team's story board (card wall), and turned to Desk Checks and when they fit into a story's life.

An interesting remark by her: a team had agreed to share work (push) only after the desk check was successful; that is, they did not push code until a story was almost done: work lay fallow on individuals' machines for potentially days at a stretch.

I was surprised. Why would they wait days to push? What did they do about merge conflicts, complex refactorings, integration failures in the pipeline, and the like?

Lecture

Entropy

To me this was clearly a smell. Martin Fowler specifically addresses this in Everyone Commits To the Mainline Every Day, and I would go further: Push commits at the earliest responsible moment. This is the opposite of the advice for refactoring, or especially emergent design, where the "Rule of 3" and the last responsible moment caution you to wait for more information before committing to a course of action.

And you can see why early pushes differ from the other two: waiting will not get you more information. On the contrary, waiting will only increase the entropy of the code base! Commits lie fallow in the local repo, increasing the size of potential merge conflicts for others.

                      | Benefit from more information? | Principle                   | Entropy from waiting
Early push            | None available                 | Earliest responsible moment | Rises from fallow commits
Refactoring           | Get more code examples         | Rule of 3                   | Falls after refactoring
Architecture decision | Learn more about system        | Last responsible moment     | Falls if responsible

(The "information" in the case of pushes is the pulled commits themselves.)

I've definitely experienced this firsthand, when I'd eventually discard my local commits after waiting too long, and letting them grow too much in a different direction from how the rest of the team progressed. Waste!

Complexity

Consider this work cycle:

  1. Local commit
  2. Fetch commits from remote
  3. Merge, if needed
  4. Local commit again, if needed
  5. Push commits to remote

I've grouped these to emphasize what is local to you (the first four) and what is global to your team (the last one).

Considered only locally, you minimize entropy with frequent pulls for yourself, and likewise for your teammates individually, so you can catch merge conflicts early and resolve them when they are small. But considered globally, you need frequent pushes so those local pulls have only small changes in them. The longer you wait to push, the more work for those who pull.

Early push
You        | Rest of team | Work for others
You commit |              |
You push   |              |
           | They pull    | Less complexity of merge (1 commit)
You commit |              |
You push   |              |
           | They pull    | Less complexity of merge (1 commit)

Each single push can be treated on its own. There are two opportunities for merge conflict, but each is a small amount of work.

Late push[1]
You        | Rest of team | Work for others
You commit |              |
           | They pull    | No changes to merge
You commit |              |
You push   |              |
           | They pull    | Greater complexity of merge (2 commits)

In each scenario, there are two commits for others to contend with. The larger, combined push has a greater opportunity for merge conflict, and a greater chance for a large amount of work, because of the combined interactions of the two commits.
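To put rough numbers on "combined interactions", here is a toy model in Python. It is my own simplification, not a measurement: I assume a teammate who pulls a batch of commits has to consider each commit on its own, plus every pair of commits in that batch that might interact. The function names are hypothetical.

```python
from math import comb


def merge_touchpoints(batch_size):
    """Toy count of what a teammate must consider when pulling one batch.

    Assumption: each commit is one touchpoint, and every pair of commits
    arriving in the same batch is one more (a possible interaction).
    """
    return batch_size + comb(batch_size, 2)


def total_touchpoints(batches):
    """Sum the toy count over a sequence of pulls, one batch per pull."""
    return sum(merge_touchpoints(size) for size in batches)


early = [1, 1]  # two pushes of one commit each
late = [2]      # one push holding both commits

print(total_touchpoints(early))  # 2
print(total_touchpoints(late))   # 3
```

Under this assumption the gap widens quickly: five commits pushed one at a time give 5 touchpoints, while the same five pushed together give 15.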

And as teams work in parallel, there are more opportunities for merge conflicts.

Push early, push often, push on green

From the above discussion, the safest course is to push early rather than wait as commits pile up locally. But when to push—what is the "earliest responsible moment"?

If your codebase is well-tested, an answer presents itself: Push when tests are green and changes alter the complexity.

The goal is to avoid complex commit interactions that lead to merge conflicts. Tests are the safety net. Further, if all else fails and a commit is bad, it is easy to throw away the last commit until things are right again: only a small amount of work is lost, not days' worth.

Understanding what kinds of changes alter complexity takes skill, and skills improve with experience and coaching. The cost of early pushes is low, and the occasional penalty of late pushes is high, so this would be a good topic for a "team norms" ("dev practices") discussion.

For example, the team might agree that changes to comments are not in themselves worth a push. At the other end, your refactorings which impact more than one source file almost certainly should be pushed early: discover their impact on others before you add more refactorings.

A good work cycle:

  1. Pull
  2. Build, run tests
  3. Edit sources
  4. Build, run tests
  5. Commit
  6. Pull and push

After a preliminary sanity check (#1 and #2), get into the cycle of #3 to #6.

Epilogue

I checked with other teams: it is a minority practice to wait until a successful desk check to push changes. That's a relief. Hopefully this practice can be made even rarer.

One rational reason for waiting (itself a smell) is that tests take too long to run frequently. When I design a pipeline, I recommend breaking out "unit" tests from "integration" tests for this exact reason: even when integration tests run long, the initial CI stage with just unit tests should be fast enough to give quick feedback on frequent pushes, and encourage Push early, Push often, Push on (local) green.

Further reading

Footnotes

[1] The simple statement, "a greater chance for a large amount of work", has rather complex reasoning behind it, beyond the scope of this post.

For example, any particular commit can be viewed as applying an exponent to the overall complexity of a program. A neutral change (say, correcting a typo in a comment) has the exponent 1: it does not change the overall complexity; a positive change (say, removing code duplication) has an exponent between 0 and 1: it lowers the overall complexity; a negative change (say, adding a new dependency) has an exponent greater than 1: it raises the overall complexity.

Consider then that these complexity changes are not simple numbers but distributions ("odds"), that they change with time ("bitrot"), and that they involve more than the code (changes in people or requirements).
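The exponent picture in footnote [1] can be tried out with plain numbers, ignoring the distributions and time effects just mentioned. This is only an illustration: the complexity score of 100 and the exponents are made up, the function names are hypothetical, and the model only behaves as described when the score is greater than 1.

```python
import math


def apply_commits(complexity, exponents):
    """Apply each commit's exponent in turn to a complexity score (> 1)."""
    for exponent in exponents:
        complexity = complexity ** exponent
    return complexity


def combined_exponent(exponents):
    """Exponents compound by multiplication: (c ** a) ** b == c ** (a * b)."""
    return math.prod(exponents)


base = 100.0  # made-up complexity score

print(apply_commits(base, [1.0]))       # neutral change: stays at 100.0
print(apply_commits(base, [0.9]))       # positive change: roughly 63
print(apply_commits(base, [1.1, 1.2]))  # two negative changes: roughly 437
print(combined_exponent([1.1, 1.2]))    # close to 1.32
```

In this toy version, two negative changes compound: together they act like a single exponent of about 1.32, larger than either alone.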

[2] In Seth's post, do not confuse "publish once" with "wait to push": it means "don't publish the same commit twice" (which does sometimes happen accidentally, even for experts, from merging or rebasing).

Update

Sure enough, right after posting I read an interesting discussion on the value of the statistical mean (average) relevant to the discussion on two commits taken separately or together.

Essentially, even when the merge conflict work averages out over time for pushing two commits separately versus pushing them together, the outliers for pushing them together are significantly worse than for pushing them separately, because of interactions and complexity.