Git First-Parent: Have your messy history and eat it too

February 18, 2018


The first thing I encountered when learning about git: there's a lot of conflict about whether it's important to keep a "clean" git history by squashing, rebasing instead of merging, etc.

In favor of "cleanliness" [1]:


Then I came across a blog post (sorry, can't find it now) arguing that this disagreement is stupid and git is stupid because the clean-history side is having a problem with data display, not data collection. The problem isn't the extra information (messy commits): it's that the information isn't displayed in a way that shows them what they're interested in (larger changes that are code-reviewed).

The post struck a chord with me, but didn't suggest any solution so it just made me grumpy and otherwise I ignored it.

Now I know about --first-parent (because my last company used it), which gets you the best of both worlds:

First Parent

When git log encounters a merge commit, it normally follows the history backwards through both parents.

For example, after branching off master, adding some commits, and merging into master [4], we might see [5]:

> git log
7e066db David Chudzicki Sat Feb 10 14:23:43 2018 -0500  Merge branch 'feature-branch'
29061db David Chudzicki Sat Feb 10 14:23:30 2018 -0500  2nd commit on master
04bb2b7 David Chudzicki Sat Feb 10 14:22:32 2018 -0500  2nd commit on feature
0d1561a David Chudzicki Sat Feb 10 14:21:43 2018 -0500  first commit on feature
725034c David Chudzicki Sat Feb 10 14:21:23 2018 -0500  first commit on master

But if we say --first-parent, git log will ignore all of the history in the second parent of a merge commit:

> git log --first-parent
7e066db David Chudzicki Sat Feb 10 14:23:43 2018 -0500  Merge branch 'feature-branch'
29061db David Chudzicki Sat Feb 10 14:23:30 2018 -0500  2nd commit on master
725034c David Chudzicki Sat Feb 10 14:21:23 2018 -0500  first commit on master

Which is the "first" and which is the "second" parent of a merge? If you're on branch some-branch and say git merge some-other-branch, then the first parent is the latest commit on some-branch and the second is the latest commit on some-other-branch.

You can also use --first-parent with other git commands, like blame and rev-list. By default, blame shows the two individual feature-branch commits that introduced the file's lines:

> git blame feature
0d1561a2 (David Chudzicki 2018-02-10 14:21:43 -0500 1) cats
04bb2b70 (David Chudzicki 2018-02-10 14:22:32 -0500 2) dogs

... but with --first-parent we only see the merge commit:

> git blame feature --first-parent
7e066dba (David Chudzicki 2018-02-10 14:23:43 -0500 1) cats
7e066dba (David Chudzicki 2018-02-10 14:23:43 -0500 2) dogs

Don't fast-forward

By default, git uses a "fast-forward" when the two branches haven't diverged, since in that case no merge commit is needed: the new commits are just applied as-is. If you're hoping to get a clean history from --first-parent, you should avoid fast-forwards, because the individual commits from your feature branch would then show up in git log --first-parent. Only merge with the --no-ff option, which matches Github's default merge behavior.

You can turn fast-forward off (for a particular local repository) by running:

git config --local --add merge.ff false

Give your merge commits a good top-line message

Merge commits should always have nice messages summarizing, in the first line, what was introduced/changed.

If you're using Github, it will by default put the pull request title in the 2nd line of the merge commit's message (and something like Merge pull request #<number> from <username>/<branch> in the first line), which is not as good as putting the title in the first line.

At the company where I was introduced to --first-parent, I think we used a custom script that made more useful topline merge commit messages based on pull request titles. But I don't expect everyone to do that, so I'm not sure what to suggest. Unfortunately, this alone might be enough reason to use the squash option instead, if you're using Github for your merges.

Where I work now, if you merge using our (internal) code review tool, the default commit message for the merge will have its top line be the title of your pull request. That's great for this workflow, because a good commit message for posterity is the same as a good title for reviewers. Just by looking at the title of your request, reviewers are reviewing the commit message.

--first-parent is useful in feature branches too

Let's say you're in a feature branch and have merged changes from master a few times. Then using --first-parent here in your feature branch has the same benefits, but in reverse: You can see the history of your work on this feature, but with merges from master grouped into a single commit.

I hope tooling gets better

The workflow I've described here seems good enough that I would suggest it for my team. But not all tools will nicely support --first-parent. For example:


Thanks to Ben Kuhn for comments on an earlier draft, and showing me this when we worked together.

  1. If it weren't for the --first-parent option discussed in this post, I would find these considerations decisive. 

  2. If you use Github's "Squash and merge" feature, this won't be a problem: You can just commit freely as you go and squash/merge after code review is all over. Other teams I've been on asked people to squash before posting a pull request, but had a strong norm against changing history once anyone else has seen the code. My current company's code review tool supports "revisions", so it's possible to see what's changed even if every new version is completely squashed. But that means we're using this internal tool to track history, which is a job much better suited for git. 

  3. If you make changes to the first branch (based on code review feedback) and squash them into the previous commit, it can be tricky to merge those changes into your 2nd feature branch. 

  4. In an empty directory, you can do this to reproduce my git history:

     git init && touch hi && git add hi
     git commit -m 'first commit on master'
     git checkout -b feature-branch
     echo "cats" > feature && git add feature
     git commit -m 'first commit on feature'
     echo 'dogs' >> feature && git add feature
     git commit -m '2nd commit on feature'
     git checkout master
     echo "hello" >> hi
     git commit -am "2nd commit on master"
     git merge feature-branch

  5. My log is showing up concisely formatted, with one commit per line, because I did git config format.pretty "format:%h%x09%an%x09%ad%x09%s" 

Decentralized Triggered Secret Release

January 21, 2018


Have you ever had to confront a villain with your discoveries, knowing that unless you take precautions they might just kill you? Or gone on a dangerous adventure, wanting to keep the details secret until you come back (or unless you don't)? Or wanted your passwords to be shared with a friend if you became long-term unavailable?

I haven't either, but who knows.

These problems can be solved with a single trusted secret-holder: You give your secret to someone, and tell them not to look at it or share it until some condition is triggered. (The trigger might be e.g. failure to receive an I'm-still-here message from you every week.)

But that's error-prone. Even if you 100% trust their good intentions, it's still easy for a single point of failure to make a mistake (in either direction: they might be compromised and release the secret too early, or they might disappear or lose the secret).


Can we achieve the same kind of triggered release without trusting a single intermediary? I think so, and here's the process I'm imagining:

  1. The secret-sharer gives (different) data to each of the secret-holders such that some subset of them could reconstruct the secret
  2. The secret-sharer makes public a condition under which the secret-holders should release the secret publicly
  3. If the condition is met, the secret-holders publish their parts of the secret.
  4. Now anyone [1] can reconstruct the secret.

There are some incentives we need to get right in order for this to work:

The last condition rules out a straightforward application of Shamir's secret sharing. We can't have the secret-sharer compute the secret-holders' shares of the secret, since then the secret-holders couldn't trust that they won't be wrongly punished (if the secret-sharer accidentally or purposefully released some or all of the shares).

The punishments must consist of significant losses for the secret-holders (not just failure to get paid), because a secret-holder who doesn't follow the rules undermines the integrity of the whole system. We'll require deposits, which they lose if the punishment conditions are triggered.

I think we can use Ethereum to achieve these incentives. This procedure will require all $N$ secret-holders for the secret-reconstruction phase, but I think we can modify it later so that only a proper subset of the $N$ are needed:

(For convenience, let's say $p_0$ is the secret-sharer, and $p_1$ through $p_N$ are the secret-holders.)

  1. The secret-holders make a deposit to the contract.
  2. Each secret-holder $p_i$ ($1 \leq i \leq N$) generates a random string $s_i$ of the same length as the secret.
  3. For convenience, let's say the secret is $s_0$.
  4. Everyone sends the hash of their $s_i$ to the contract.
  5. Then everyone computes and releases the sum $S = \sum_{i=0}^{N} s_i$ without revealing anyone's $s_i$ (see below for more on how to do this). $S$ is now public, but doesn't reveal anything about the secret.
  6. If the contract's "release" condition is triggered, then the secret-holders each release $s_i$ ($1 \leq i \leq N$).
  7. Now the secret $s_0$ is public, since $s_0 = S - \sum_{i=1}^{N} s_i$.
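The arithmetic in these steps is easy to sketch in Python. This is a toy model, not contract code: secrets are modeled as 256-bit integers, SHA-256 stands in for the contract's hash commitments, and all names are mine:

```python
import hashlib
import secrets

MODULUS = 2 ** 256  # toy choice: treat every s_i as a 256-bit integer

def commit(s):
    # hash commitment each party sends to the contract (step 4)
    return hashlib.sha256(s.to_bytes(32, "big")).hexdigest()

secret = 42                                             # s_0, the sharer's secret
masks = [secrets.randbelow(MODULUS) for _ in range(3)]  # s_1..s_3, one per holder

commitments = [commit(secret)] + [commit(m) for m in masks]

# Step 5: the public sum S reveals nothing about s_0 on its own
S = (secret + sum(masks)) % MODULUS

# Steps 6-7: once the holders publish their s_i, anyone recovers s_0
recovered = (S - sum(masks)) % MODULUS
assert recovered == secret

# The published values can be checked against the earlier commitments
assert all(commit(m) == c for m, c in zip(masks, commitments[1:]))
```

The hash commitments are what lets the contract later verify that a holder released the same $s_i$ they committed to at the start.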


This is pretty rough, but here's a basic idea of the kinds of functions the Ethereum contract would need:

How to Add

How do we compute the sum $S = \sum_{i=0}^{N} s_i$ without revealing anyone's $s_i$? Here's one way [2]:

  1. Each $p_i$ generates a random $r_{ij}$ to send to each $p_j$.
  2. Each $p_i$ now knows $r_{ij}$ and $r_{ji}$ for all other players $j$.
  3. Each $p_i$ computes and publishes $s_i + \sum_{j=0}^{N} r_{ij} - \sum_{j=0}^{N} r_{ji}$; the masks cancel when everything is added up.
  4. Everyone can add up the numbers in (3) to obtain $S$.
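A minimal sketch of this masking trick, with a hypothetical helper name masked_sum and arithmetic mod $2^{256}$ (an arbitrary choice for the sketch):

```python
import secrets

MODULUS = 2 ** 256

def masked_sum(values):
    # values[i] is party i's private s_i; returns S without any single
    # published number revealing an s_i, following the steps above.
    n = len(values)
    # r[i][j]: random mask that p_i generates and sends to p_j (step 1)
    r = [[secrets.randbelow(MODULUS) for _ in range(n)] for _ in range(n)]
    # step 3: each p_i publishes s_i + sum_j r[i][j] - sum_j r[j][i]  (mod M)
    published = [
        (values[i] + sum(r[i]) - sum(r[j][i] for j in range(n))) % MODULUS
        for i in range(n)
    ]
    # step 4: every mask appears once with + and once with -, so they cancel
    return sum(published) % MODULUS

assert masked_sum([5, 7, 11]) == 23
```

Each published number is uniformly random on its own; only the total is meaningful.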

What about cheaters?

What if someone provides the wrong input to the sum computation, an input they don't even store? Then the secret won't be recoverable, but we can't punish them for this, or even know that they're behaving inconsistently (publishing the hash of a different $s_i$ than the one they used in the sum).

I can think of two ways to address this risk:

We could do a random number of "trial runs" first, where the secret-sharer uses a random $s_0$ instead of their secret. Only the secret-sharer will know whether we're in a trial run. After the secret-sharer announces "that was a trial run", everyone (including the secret-sharer) reveals their $s_i$. If the sum is $S$ and the hashes match up, then everyone followed the protocol. If everyone followed the protocol in a bunch of trial runs (and they don't know when the real one is coming), then they'll probably follow it in the real run too.

Alternatively, we could switch to something fancier. A lot has been written about secure multiparty computation, including with error-detection.

M of N

As described, this scheme requires all secret-holders to follow the protocol for the secret release. This means that even a single one could hold out for extra payment (or just lose their key).

We really want the secret to be reconstructable from just a smaller subset of the secret-holders.

I think it's probably possible to do this using polynomials like in Shamir's secret sharing - but a very inefficient (probably too inefficient) way would be to repeat the procedure for each subset of $M$ secret-holders.
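For reference, the polynomial idea is easy to sketch on its own. This is plain Shamir splitting and reconstruction, not the combined scheme above; the field choice and helper names are mine:

```python
import secrets

PRIME = 2 ** 127 - 1  # a Mersenne prime; secrets must be smaller than this

def split(secret, m, n):
    # Random degree-(m-1) polynomial with constant term = secret;
    # share i is the point (i, f(i)). Any m shares determine f(0).
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation of the polynomial at x = 0
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total

shares = split(12345, 3, 5)        # 3-of-5 sharing
assert reconstruct(shares[:3]) == 12345
assert reconstruct(shares[1:4]) == 12345
```

The hard part the post gestures at is combining this M-of-N structure with the mask-and-commit incentives above, which this sketch doesn't attempt.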


It might make sense for a secret-holder to be able to use the same deposit for multiple different people's secrets. The benefit would be having a larger deposit overall, so perhaps greater security. The drawback would be that once a secret-holder has lost their deposit for revealing one $s_i$ too early, they have no incentive to keep the others secret or reveal them at the right time.

Time limits

The scheme requires the secret to be released within some fixed time period, because the depositors need to get their deposit back at some point.

Alternatively, we could set things up so that the secret-sharer has the option to pay the holders for each additional time period during which they don't release the secret. As with the original payment, the continuing payments would have to be at least a bit higher than the interest the secret-holders could receive on their deposit if they hadn't made it.

The secret-holders would not have the option to withdraw their deposit unless the secret-sharer stops paying.


Thanks to my friend Jeff Kaufman for earlier conversations about this.

  1. Note that this same process can be used for releasing a secret to a single recipient rather than the public: first the secret-sharer would encrypt the secret with the recipient's public key, and then the rest of the process is followed as usual. 

  2. Adding this way isn't original to me. I'm not sure where I came across this, but there's a large literature on this kind of thing. 

Regexes for replacing ugly unittest-style assertions

December 30, 2017

In case they help anyone else, here are some regular expressions I used once to convert some ugly unittest-style assertions (e.g. self.assertEqual(something, something_else)) to the pytest style (simply assert something == something_else):

sed -i ".bak" -E 's/self\.assertFalse\((.*)\)/assert not \1/g' tests/*.py
sed -i ".bak" -E 's/self\.assertTrue\((.*)\)/assert \1/g' tests/*.py
sed -i ".bak" -E 's/self\.assertEqual\(([^,]*), (.*)\)$/assert \1 == \2/g' tests/*.py
sed -i ".bak" -E 's/self\.assertIn\(([^,]*), (.*)\)$/assert \1 in \2/g' tests/*.py
sed -i ".bak" -E 's/self\.assertNotEqual\(([^,]*), (.*)\)$/assert \1 != \2/g' tests/*.py
sed -i ".bak" -E 's/self\.assertNotIn\(([^,]*), (.*)\)$/assert \1 not in \2/g' tests/*.py
sed -i ".bak" -E 's/self\.assertIsNone\((.*)\)$/assert \1 is None/g' tests/*.py
sed -i ".bak" -E 's/self\.assertIsNotNone\((.*)\)$/assert \1 is not None/g' tests/*.py

(Pytest gives nice informative error messages even if you just use the prettier form.)
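If sed isn't handy, the same line-based rewrites can be done in Python. Like the sed commands, this sketch (helper name mine) will mangle assertions whose first argument itself contains a comma:

```python
import re

# The same rewrite rules as the sed commands above, applied line by line.
RULES = [
    (re.compile(r'self\.assertFalse\((.*)\)'), r'assert not \1'),
    (re.compile(r'self\.assertTrue\((.*)\)'), r'assert \1'),
    (re.compile(r'self\.assertEqual\(([^,]*), (.*)\)$'), r'assert \1 == \2'),
    (re.compile(r'self\.assertIn\(([^,]*), (.*)\)$'), r'assert \1 in \2'),
    (re.compile(r'self\.assertNotEqual\(([^,]*), (.*)\)$'), r'assert \1 != \2'),
    (re.compile(r'self\.assertNotIn\(([^,]*), (.*)\)$'), r'assert \1 not in \2'),
    (re.compile(r'self\.assertIsNone\((.*)\)$'), r'assert \1 is None'),
    (re.compile(r'self\.assertIsNotNone\((.*)\)$'), r'assert \1 is not None'),
]

def convert_line(line):
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line

assert convert_line('self.assertEqual(x, y)') == 'assert x == y'
assert convert_line('self.assertIsNone(foo())') == 'assert foo() is None'
```

To convert whole files you'd map convert_line over each line, keeping a .bak copy as the sed -i ".bak" commands do.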


An Interaction or Not? How a few ML Models Generalize to New Data

March 4, 2015

Source code for this post is here.

This post examines how a few statistical and machine learning models respond to a simple toy example where they're asked to make predictions on new regions of feature space. The key question the models will answer differently is whether there's an "interaction" between two features: does the influence of one feature differ depending on the value of another?

In this case, the data won't provide information about whether there's an interaction or not. Interactions are often real and important, but in many contexts we treat interaction effects as likely to be small (without evidence otherwise). I'll walk through why decision trees and bagged ensembles of decision trees (random forests) can make the opposite assumption: they can strongly prefer an interaction, even when the evidence is equally consistent with including or not including an interaction.

I'll look at point estimates from:

I'll also look at two models that capture uncertainty about whether there's an interaction:

BART has the advantage of expressing uncertainty while still being a "machine learning" type model that learns interactions, non-linearities, etc. without the user having to decide which terms to include or the particular functional form.

Whenever possible, I recommend using models like BART that explicitly allow for uncertainty.

The Example

Suppose you're given this data and asked to make a prediction at $X_1 = 0$, $X_2 = 1$ (where there isn't any training data):


X1  X2  Y               N Training Rows
0   0   Y = 5 + noise   52
1   0   Y = 15 + noise  23
1   1   Y = 19 + noise  25
0   1   ?               0

(Click here for the rest of this post)

Covariance As Signed Area Of Rectangles

March 26, 2014

A colleague at work recently pointed me to a wonderful stats.stackexchange answer with an intuitive explanation of covariance: For each pair of points, draw the rectangle with these points at opposite corners. Treat the rectangle's area as signed, with the same sign as the slope of the line between the two points. If you add up all of the areas, you have the (sample) covariance, up to a constant that depends only on the number of points.

Here's an example with 4 points. Each spot on the plot is colored by the sum corresponding to that point. For example, the dark space in the lower left has three "positively" signed rectangles going through it, but for the white space in the middle, one positive and one negative rectangle cancel out.

In this next example, x and y are drawn from independent normals, so we have roughly an even amount of positive and negative:
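The rectangle rule is easy to check numerically. A quick sketch (function names mine): with $n$ points, the sum of signed areas over unordered pairs works out to exactly $n(n-1)$ times the sample covariance.

```python
import random

def signed_area_sum(xs, ys):
    # Sum of signed rectangle areas over all unordered pairs of points:
    # (x_i - x_j) * (y_i - y_j) is positive exactly when the slope
    # between the two points is positive.
    n = len(xs)
    return sum((xs[i] - xs[j]) * (ys[i] - ys[j])
               for i in range(n) for j in range(i + 1, n))

def sample_cov(xs, ys):
    # Ordinary sample covariance with the n - 1 denominator
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(50)]
ys = [x + random.gauss(0, 1) for x in xs]
n = len(xs)
assert abs(signed_area_sum(xs, ys) - n * (n - 1) * sample_cov(xs, ys)) < 1e-6
```

Expanding the pairwise sum gives $\sum_{i,j} (x_i - x_j)(y_i - y_j) = 2n^2 \,\mathrm{cov_{pop}}$ over ordered pairs, and halving and switching to the $n-1$ denominator yields the $n(n-1)$ constant.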

Formal Explanation

The formal way to speak about multiple draws from a distribution is with a set of independent and identically distributed (i.i.d.) random variables. If we have a random variable X, saying that X1, X2, … are i.i.d. means that they are all independent and all follow the same distribution.

(Click here for the rest of this post)

Previous Posts

February 16, 2014

Simulated Knitting (post) Knit3D

I created a KnittedGraph class (subclassing Python's igraph graph class) with methods corresponding to common operations performed while knitting:

g = KnittedGraph()
g.ConnectToZero() # join with the first stitch for a circular shape 
g.NewRow() # start a new row of stitches
g.Increase() # two stitches in new row connect to one stitch in old

I then embed the graphs in 3D space. Here's a hat I made this way:


2D Embeddings from Unsupervised Random Forests (1, 2) random_forest_visualizations

There are all sorts of ways to embed high-dimensional data in low dimensions for visualization. Here's one:

  1. Given some set of high dimensional examples, build a random forest to distinguish examples from non-examples.
  2. Assign similarities to pairs of examples based on how often they are in leaf nodes together.
  3. Map examples to 2D in such a way that similarity decreases with Euclidean 2D distance (I used multidimensional scaling for this).
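Those three steps can be sketched with scikit-learn. This is my reconstruction, not the original code; the "non-examples" come from permuting each feature column independently, the usual trick for unsupervised random forests:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

def rf_embed(X, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: non-examples = each column shuffled independently,
    # which destroys the dependence structure between features.
    fake = np.column_stack([rng.permutation(col) for col in X.T])
    data = np.vstack([X, fake])
    labels = np.r_[np.ones(len(X)), np.zeros(len(fake))]
    forest = RandomForestClassifier(n_estimators=n_trees,
                                    random_state=seed).fit(data, labels)
    # Step 2: similarity = fraction of trees in which two examples
    # land in the same leaf.
    leaves = forest.apply(X)  # shape (n_examples, n_trees)
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    # Step 3: multidimensional scaling on dissimilarity = 1 - proximity.
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(1 - prox)
```

Pairs that often share a leaf end up close together in the 2D embedding.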

Here's the result of doing this on a set of diamond shapes I constructed. I like how it turned out:


A Bayesian Model for a Function Increasing by Chi-Squared Jumps (in Stan) (post) stan_increasing_function

In this paper, Andrew Gelman mentions a neat example where there's a big problem with a naive approach to putting a Bayesian prior on functions that are constrained to be increasing. So I thought about what sort of prior would make sense for such functions, and fit the models in Stan.

I enjoyed Andrew's description of my attempt: "... it has a charming DIY flavor that might make you feel that you too can patch together a model in Stan to do what you need."


Lissajous Curves JSFiddle

Some JavaScript I wrote (using d3) to mimic what an oscilloscope I saw at the Exploratorium was doing:


Visualization of the Weierstrass Elliptic Function as a Sum of Terms


John Baez used this in his AMS blog Visual Insight.