A while back, a PhD student in our group asked me whether the sum over all elements of a stabilizer is a projector. If you know what this means, you're probably (a) a quantum information theorist (in which case, stop reading here) and (b) already know the answer and how to prove it (unlike me, who'd forgotten both).

The answer is no doubt well-known to anyone who works on quantum stabilizer codes, and we could have just googled for the result. It seemed like a nice, self-contained mathematical question, though. So rather than googling, we tried to figure it out for ourselves at the blackboard.

If you just want to see the simple final answer, skip to the end. But then you'll miss all the fun and the main point of this post. The way we came up with the solution makes for a nice toy example of the convoluted, messy and inelegant process by which mathematical results are really proven. Before they get polished up into the simple, elegant, pristine proofs "from The Book" that are all you ever get to see in textbooks and research papers. The unspoken (or at least unpublished) reality is that elegant proofs invariably emerge after following numerous blind alleys, unjustified intuitive leaps, and inelegant, round-the-houses arguments. All of which get simplified away in time for publication. (Or maybe that's just my proofs!)

Instead of just explaining the elegant final answer, I'm going to explain the inelegant process we went through to reach it.

I'm going to assume some basic knowledge of group and representation theory in this post:^{1}
^{1}Thereby reducing the audience by a factor of about 100,000. But 0/100,000 = 0 anyway.
not much more than what a group is, what a representation is, what an irreducible representation (irrep) is, and what a unitary matrix is. But let me quickly summarise the small amount of quantum information terminology needed to understand the question.

The Pauli matrices are a particular set of four 2x2 complex matrices that crop up all over the place in quantum mechanics:^{2}
^{2}The Pauli matrices are often denoted I,X,Y,Z in the quantum information literature, or \(\mathbf{1}\), \(\sigma_x\), \(\sigma_y\), \(\sigma_z\) in the physics literature. But indexing them by integers is also common, and often more convenient.

\begin{equation*} \sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \end{equation*}If we include scalar multiples of these matrices by \(\pm 1\) and \(\pm i\), they form a group under matrix multiplication.

"The Pauli group" is a misnomer, however. When a quantum information theorist says "the Pauli group", they're not talking about this particular group, but *any* of the infinite family of groups formed by tensor products of Pauli matrices (and \(\pm 1, \pm i\) multiples of these), of which the above group is just the simplest case. I.e.

\begin{equation*} \mathcal{P}_n := \left\{ c\, \sigma_{i_1} \otimes \sigma_{i_2} \otimes \cdots \otimes \sigma_{i_n} \;\middle|\; c \in \{\pm 1, \pm i\},\; i_k \in \{0,1,2,3\} \right\} \end{equation*}for any \(n \in \mathbb{N}\).

When a quantum information theorist talks about "a stabilizer", they mean an abelian subgroup of (one of) the Pauli group(s).^{3}
^{3}In group theory, "stabilizer" has a more general meaning, of which the usual quantum information meaning is a particular case.

So the student was asking me whether, for any stabilizer \(S = \{S_i\}\), \(\sum_i S_i\) is always a projector (i.e. a matrix which squares to itself). Checking a few simple examples, such as \(S = \{\sigma_0, \sigma_3\}\), \(S = \{\sigma_0, \sigma_1\}\), \(S = \{\sigma_0\otimes\sigma_0, \sigma_1\otimes\sigma_0, \sigma_0\otimes\sigma_1, \sigma_1\otimes\sigma_1\}\) etc., it quickly becomes clear that this isn't true unless we normalise the sum appropriately. From these examples, the correct normalisation appears to be to divide by the number of elements in the stabilizer, which I'll write as \(|S|\):

For any stabilizer \(S = \{S_i\}\),

\begin{equation*} P := \frac{1}{|S|} \sum_i S_i \end{equation*}is a projector, i.e. \(P^2 = P\).

Once upon a time, long, long ago, when I was a PhD student myself, I co-authored a paper about stabilizers. Maybe I knew the answer to the student's question back then. But if so, I'd long since forgotten it. Group and representation theory haven't cropped up much in my research since. They're very important in many areas of quantum information, just not the areas I've happened to work in, which tend to have more of an analysis flavour than an algebra flavour.

To make progress on a maths conjecture, you have to start off by picking sides: true or false. After a while trying and failing to prove a result, it pays to flip sides and instead try to find a counterexample. After some time trying and failing to construct a counterexample, it often pays to flip back and try to prove the result again.

It doesn't matter much if you pick the wrong side initially. Trying and failing to prove a result gives useful insight into how to construct a counterexample. Trying and failing to construct a counterexample gives useful insight into proof approaches. (Typical undergraduate problems completely fail to teach this aspect of "real" mathematics, by making it obvious which side you're supposed to pick.)

Gut instinct also plays a bigger role in mathematics research than you might imagine. Especially in picking an initial side. In this case, my gut was telling me the conjecture could well be true. It just had the flavour of the kind of simple, elegant result that tends to crop up in representation theory.

Another good way of picking an initial side is to test the conjecture numerically on a computer. (Another technique rarely taught at undergraduate level.) An even better way is to test the conjecture by hand on some simple examples before jumping into numerics. You often gain useful insight from working out examples by hand, that you don't get just from seeing an answer spat out by a computer. In this case, the student had already done some numerics before coming to me, and had verified the conjecture for a large number of stabilizers.
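I don't know what the student's code actually looked like, but a minimal NumPy sketch of this kind of numerical check might be the following (the particular example stabilizers and helper names are my own):

```python
import numpy as np

# Pauli matrices, indexed 0..3 as in the post.
s0 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(mats):
    """Tensor product of a list of matrices."""
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def is_projector(P, tol=1e-12):
    """Check the defining property P^2 = P, up to numerical tolerance."""
    return np.allclose(P @ P, P, atol=tol)

# A few example stabilizers (abelian subgroups of a Pauli group).
stabilizers = [
    [s0, s3],
    [s0, s1],
    [kron_all([s0, s0]), kron_all([s1, s0]),
     kron_all([s0, s1]), kron_all([s1, s1])],
    # The stabilizer of the two-qubit Bell state: {I⊗I, X⊗X, -Y⊗Y, Z⊗Z}.
    [kron_all([s0, s0]), kron_all([s1, s1]),
     -kron_all([s2, s2]), kron_all([s3, s3])],
]

for S in stabilizers:
    P = sum(S) / len(S)       # the normalised sum from the conjecture
    assert is_projector(P)
```

Note that without the \(1/|S|\) normalisation, every one of these examples fails the check, which is exactly how the correct normalisation reveals itself.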

So my instinct was to start off trying to prove the conjecture, rather than trying to disprove it. But first, I wanted to change the question. (Unlike in undergraduate problems, you get to do this in real research!) It seemed to me there were three reasons the conjecture could be true:

1. It was a special property of stabilizers.
2. It was true for any abelian group.
3. It was true for any group.

My gut instinct was to go for 2. I didn't fully think this through at the time, but I suspect there were two factors behind this. Firstly, apart from being abelian, the only other special property of stabilizers that sprang to mind was that elements of the Pauli group are Hermitian matrices. But if the result hinged on this, it would mean a linear algebra proof rather than a pure representation theory proof. Whereas if it were true for all abelian groups, there would be less structure to work with, which promised either a simpler proof, or an easily-found counterexample.

Secondly, one thing I did remember is that the irreps of abelian groups are very simple: they're all 1-dimensional. I.e. they're just scalars (or \(1\times 1\) matrices, if you prefer). Which suggests it ought to be easier to prove the result for abelian groups than for arbitrary groups.

We'll see by the end that neither of these reasons was very good. I told you the real process of proving things was messy!

So, I was now trying to prove the more general conjecture:

For any representation \(\varphi(G)\) of an abelian group \(G\),

\begin{equation*} P := \frac{1}{|G|} \sum_{g\in G} \varphi(g) \end{equation*}is a projector, i.e. \(P^2 = P\).

The best way to get going on a proof is often not to go for the result right away, but to first see what interesting initial observations you can make, even if they're seemingly unrelated to the problem. One thing that immediately leapt out at me here is that \(\sum_{g\in G} \varphi(g)\) commutes with every element of the representation.

Why is this true? And more importantly, why did it leap out at me? About the only group theory result I remember from the grand total of two weeks of group theory taught in my undergraduate degree^{4}
^{4}My undergraduate degree is in physics, specifically the Cambridge Natural Sciences tripos. Group and representation theory are to this day taught in two weeks at the very end of the second year Maths for Natural Sciences course, just before exams. Guess how much group and representation theory I learned? To be fair, they cram quite a lot of group theory into those two weeks. But I didn't learn even the basics properly until I was teaching that very same course 15 years later.
is the Rearrangement theorem: multiplying all group elements by some fixed group element just shuffles the elements around. A theorem which is pleasingly concise when stated precisely:^{5}
^{5}The Rearrangement theorem is also an easy one to prove, using just the defining properties of a group. Try it!

(Rearrangement theorem)

For any group \(G\) and any fixed \(h \in G\), \(\{hg \mid g \in G\} = G\).
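The theorem is easy to illustrate numerically, too. Here's a two-line check on the cyclic group \(\mathbb{Z}_6\) under addition mod 6 (a toy example of my own):

```python
# Rearrangement theorem on Z_6: combining every element with a fixed h
# (here, addition mod 6) just permutes the elements, leaving the same set.
G = set(range(6))
for h in G:
    assert {(h + g) % 6 for g in G} == G
```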

By applying the Rearrangement theorem to our group representation^{6}
^{6}Remember that a representation is itself a group, so the Rearrangement theorem applies equally well to representations as well as to the groups themselves.
\(\varphi\) we see that, for any group element \(h\),

\begin{equation*} \varphi(h) \sum_{g\in G} \varphi(g) = \sum_{g\in G} \varphi(hg) = \sum_{g'\in G} \varphi(g') = \sum_{g\in G} \varphi(gh) = \Bigl(\sum_{g\in G} \varphi(g)\Bigr) \varphi(h), \end{equation*}i.e. \(\sum_{g\in G} \varphi(g)\) commutes with every element \(\varphi(h)\) of the representation, as claimed.

Why did this seem interesting to me? OK, I lied. The only other group representation theory result I remember without looking it up is Schur's lemma, which I remember in the following form:

(Schur's lemma)

If a matrix commutes with every element of an irreducible representation, it must be proportional to the identity matrix.

If you're facing any problem involving group representations, it's usually a good bet Schur's lemma will turn out to be useful in proving it.
But I couldn't immediately see any way to make use of it here. Schur's lemma only applies to irreducible representations. But nothing says our \(\varphi\) is irreducible. And anyway, a matrix that's proportional to the identity isn't a projector unless it's exactly the identity matrix. And trying a couple of simple examples is enough to see that our projector *isn't* always the identity. I parked this observation for now. (Turns out I should have come back to it sooner. Did I warn you constructing real proofs is a messy process?)

Still, rewriting the conjecture in terms of irreps seemed like a good thing to do. One of the first things you learn in representation theory is that any (unitary) representation can be decomposed into a sum of irreducible representations, or "irreps" for short. I.e. there is always a unitary matrix \(U\) that we can conjugate by, which simultaneously brings all of the matrices \(\varphi(g)\) into block-diagonal form, with one irrep \(\varphi_\lambda\) in each block:

\begin{equation*} U \varphi(g) U^\dagger = \bigoplus_\lambda \varphi_\lambda(g) = \begin{pmatrix} \boxed{\mspace{6mu}\varphi_1(g)\mspace{6mu}\vphantom{\sum_x^a}} \\ & \boxed{\mspace{6mu}\varphi_2(g)\mspace{6mu}\vphantom{\sum_x^a}} \\ & & \ddots \\ & & & \boxed{\mspace{6mu}\varphi_\lambda(g)\mspace{6mu}\vphantom{\sum_x^a}} \end{pmatrix} \end{equation*}The direct sum symbol \(\bigoplus\) is just a convenient notation for "put these down the diagonal", as shown in the matrix here. I'll use this more compact notation from now on, rather than drawing out big block-matrices.

We can transform \(\varphi(g)\) to a sum over its irreps \(\varphi_\lambda(g)\) by conjugating with the appropriate unitary matrix \(U\). Here's the thing we're trying to prove is a projector in our conjecture again:

\begin{equation*} P := \frac{1}{|G|} \sum_{g\in G} \varphi(g). \end{equation*}Conjugating the right hand side of this by \(U\) gives:

\begin{align*} U \biggl(\sum_{g\in G} \varphi(g)\biggr) U^\dagger &= \sum_{g\in G} U\varphi(g)U^\dagger \\ &= \sum_{g\in G} \bigoplus_\lambda \varphi_\lambda(g) \\ &= \bigoplus_\lambda \Bigl(\sum_{g\in G} \varphi_\lambda(g)\Bigr). \end{align*}Conjugating the left hand side by \(U\) gives \(U P U^\dagger\). Since \(U^\dagger U = I\) (the defining property of unitary matrices), and assuming that \(P\) is indeed a projector so that \(P^2 = P\), we have

\begin{equation*} (U P U^\dagger)^2 = U P U^\dagger U P U^\dagger = U P U^\dagger. \end{equation*}This means that if our conjecture is true and \(P\) really is a projector, then \(U P U^\dagger\) is also a projector. If we call it \(P' := U P U^\dagger\), then our original conjecture transformed to irreps becomes:

For a representation \(\varphi(G) = \bigoplus_\lambda \varphi_\lambda(G)\) of an abelian group \(G\) with irreps \(\varphi_\lambda\),

\begin{equation*} P' := \bigoplus_\lambda \Bigl(\sum_{g\in G} \varphi_\lambda(g)\Bigr) \end{equation*}is a projector, i.e. \(P'^2 = P'\).

If the original conjecture is true, then so is this one. But if we conjugate this one by \(U^\dagger\), we get back the original conjecture. So these two conjectures are in fact completely equivalent: proving either one will prove the other.

At first sight, this doesn't seem to have simplified things much. But the fact that all the irreps of abelian groups are 1d makes this more useful than might appear. Another group-theoretic concept I vaguely remembered was the "characters" of a representation, which are the traces of the matrices in the representation: \(\chi_\varphi(g) = \mathrm{tr}\;\varphi(g)\).

The trace of a matrix is the sum of its diagonal elements.^{7}
^{7}Despite appearances, the trace of a matrix is basis-independent: it doesn't change if you conjugate by a unitary, or indeed apply any other similarity transformation.
In the special case of a 1d representation, there is only one matrix element to sum. So for 1d representations, representations and characters are exactly the same thing. Which means we can also make use of results about characters here.

I dimly remembered that there were various "orthogonality theorems" in representation theory that involved representations and their characters. But I don't have these orthogonality theorems committed to memory, so I had to look them up. Lo and behold, the "character row-orthogonality theorem" tells us:

(Character row-orthogonality theorem)

For any irreducible representations \(\lambda\) and \(\mu\) of a group \(G\) with characters \(\chi_\lambda(g)\) and \(\chi_\mu(g)\),

\begin{equation*} \frac{1}{|G|} \sum_{g\in G} \chi_\lambda(g)\chi_\mu(g)^* = \delta_{\lambda,\mu} \end{equation*}where

\begin{equation*} \delta_{\lambda,\mu} = \begin{cases} 1 \quad & \lambda = \mu \\ 0 & \lambda \neq \mu \end{cases}. \end{equation*}As soon as I saw this, I was very happy. Because if we take the special case where \(\mu\) is the trivial representation, i.e. the one that represents all group elements by 1 (which is always a representation of any group), the row-orthogonality theorem reduces to:

\begin{equation*} \frac{1}{|G|} \sum_g \chi_\lambda(g) = \begin{cases} 1 \quad & \lambda = \text{trivial representation} \\ 0 & \text{otherwise}. \end{cases} \end{equation*}With this, we're in fact done proving the conjecture for abelian groups!

Why? This equation says that the sum over the characters of any irrep is either 0 or 1. We've seen that characters of irreps of abelian groups are the same thing as the irreps themselves. So the sum over any irrep is also either 0 or 1. The sum over the irrep decomposition of a general representation must therefore be a diagonal matrix with only 0s and 1s down the diagonal. But it's easy to see that any such matrix squares to itself, hence is a projector. And we already saw that proving the conjecture for the irrep decomposition is equivalent to proving it in general, so we're done.
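To see this on a small example (my own, for illustration): the irreps/characters of \(\mathbb{Z}_4\) are \(\chi_k(j) = i^{jk}\), and their normalised sums come out as 0 or 1 exactly as the orthogonality theorem says:

```python
import numpy as np

# For Z_4 the irreps (= characters, since they're 1-dimensional) are
# chi_k(j) = i**(j*k). The normalised sum over each irrep should be 1
# for the trivial irrep (k = 0) and 0 for all the others.
n = 4
for k in range(n):
    total = sum(1j ** (j * k) for j in range(n)) / n
    expected = 1 if k == 0 else 0
    assert np.isclose(total, expected)
```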

Since stabilizers are a special type of abelian group, the proof for abelian groups answers the original question about stabilizers (and more). I could have stopped there. But having proven the conjecture for abelian groups, I was curious whether it was also true for all groups. The proof I'd constructed certainly made use of special properties of abelian groups (namely, that their irreps are the same as their characters). But was this essential to the result?

You haven't fully understood a mathematical result until you've understood how general it is. Or whether your proof is exploiting special properties that aren't actually required, masking some deeper mathematical reason the result is true.

At this point, since I already had an answer to the original question about stabilizers, I got lazy, googled "sum group representation" (or something like that), found this math.stackexchange answer…and kicked myself for not having thought about it a little more before googling.

Remember that observation I made at the very beginning, that \(\sum_g \varphi(g)\) commutes with everything? The one I parked because I didn't immediately see how to make use of it? All that was needed was to apply it to the representation's irrep decomposition, instead of the representation itself. Here's the complete proof, building on that math.stackexchange answer:

Let \(\varphi_\lambda\) be any irreducible representation of a group \(G\), and let \(P_\lambda := \frac{1}{|G|} \sum_{g\in G} \varphi_\lambda(g)\) to save writing out the same sum over and over again. Our earlier observation applies just as well to \(\varphi_\lambda\) as to any other representation. Namely, for any group element \(g\in G\),

\begin{equation}\label{eq:commute} \varphi_\lambda(g) P_\lambda = P_\lambda = P_\lambda \varphi_\lambda(g) \end{equation}by the Rearrangement theorem. In particular, this means \(P_\lambda\) commutes with all \(\varphi_\lambda(g)\). But now we're dealing with an irreducible representation! So Schur's lemma tells us that

\begin{equation*} P_\lambda = c I \end{equation*}for some scalar \(c\).

Substituting this in both sides of eq. \eqref{eq:commute} gives

\begin{equation*} \varphi_\lambda(g) c I = c I. \end{equation*}The only way to satisfy this equation is either to have \(c=0\), hence \(P_\lambda = 0\), or to have \(\varphi_\lambda(g) = I\) for all group elements \(g\), which implies \(\varphi_\lambda\) is the trivial representation. So we have

\begin{equation}\label{eq:irrepsums} P_\lambda = \frac{1}{|G|} \sum_{g\in G} \varphi_\lambda(g) = \begin{cases} 1 & \lambda = \text{trivial representation} \\ 0 & \text{otherwise}. \end{cases} \end{equation}
This should look very familiar! It's exactly the result we arrived at for abelian groups using the character row-orthogonality theorem (remember when comparing these two results that irreps and characters are the same thing for abelian groups). Except we've now shown that it holds for *any* group.

From here, the argument is exactly the same as in the abelian case: we already know it's enough to prove the result for the irrep decomposition, applying eq. \eqref{eq:irrepsums} to the irrep decomposition shows it must sum to a diagonal matrix with 1s and 0s down the diagonal, and any such matrix is a projector.
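As a sanity check that the irrep-sum result really does hold beyond the abelian case, here's the 2-dimensional irrep of the nonabelian group \(S_3\), realised (my choice of realisation) as the symmetries of an equilateral triangle. Averaging over this nontrivial irrep should give the zero matrix:

```python
import numpy as np

# The 2d irrep of S_3: rotations by multiples of 120 degrees, plus
# reflections. Since this irrep is nontrivial, the group average of its
# matrices should be exactly zero.
theta = 2 * np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation by 120 degrees
F = np.array([[1.0, 0.0], [0.0, -1.0]])          # a reflection
I = np.eye(2)

elements = [I, R, R @ R, F, R @ F, R @ R @ F]    # all six group elements
P = sum(elements) / len(elements)
assert np.allclose(P, np.zeros((2, 2)))
```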

Our blackboard discussion ended with this proof for general groups. But the story doesn't quite end there.

The general proof seems simpler than the proof I originally came up with for the special case of abelian groups. But it still uses a crafty Schur's lemma trick which I got from stackexchange. Writing this blog post made me wonder if there was an even simpler proof for the original conjecture about stabilizers.

I'd shied away from focusing on stabilizers during our blackboard discussion, because at the time the most useful special property of stabilizers that occurred to me was that they were abelian. Later, I realised that I'd overlooked some other rather special properties of the Pauli group. Such as the fact that stabilizer elements square to the identity, so are self-inverse, which implies that all their eigenvalues are \(\pm 1\). (Strictly, this needs the usual quantum information convention that a stabilizer doesn't contain \(-I\), which rules out the \(\pm i\) scalar multiples.)

Here's the best proof I could come up with that exploits special properties of the Pauli group. In some ways this proof is more "elementary" (it only uses maths you might conceivably have learned at school.) But it's debatable whether it's simpler than the proof for general groups. I strongly suspect there must be an even simpler way to prove it for stabilizers. But you'll see in a moment that that isn't really the point of the story…

Recall that we're trying to prove \(P := \frac{1}{|S|} \sum_i S_i\) is a projector for any stabilizer \(S = \{S_i\}\). Since the \(S_i\) all commute, they have a common eigenbasis which I'll label^{8}
^{8}I'll use Dirac notation for vectors, since this question came from quantum information theory. If you don't like Dirac notation, just replace \(|\psi\rangle\) by \(\vec{\psi}\), or whatever your favourite notation for vectors is.
\(\{|\psi_k\rangle\}\). Since elements of the Pauli group are unitaries, this is even an orthonormal basis. Since the \(S_i\) have eigenvalues \(\pm 1\), each \(|\psi_k\rangle\) is a +1 eigenvector of some of the \(S_i\) and a -1 eigenvector of the rest.

First, consider the case of an eigenvector \(|\psi_k\rangle\) which is a +1 eigenvector^{9}
^{9}In quantum error correction, a *stabilizer code* is precisely the subspace spanned by the +1 eigenvectors, i.e. the common +1 eigenspace of all elements of a stabilizer.
of *all* of the \(S_i\). In this case,

\begin{equation*} P |\psi_k\rangle = \frac{1}{|S|} \sum_i S_i |\psi_k\rangle = \frac{1}{|S|} \sum_i |\psi_k\rangle = |\psi_k\rangle. \end{equation*}So \(P\) just acts as the identity on these eigenvectors.

If I could show that \(P |\psi_k\rangle = 0\) for all other eigenvectors, i.e. for any \(|\psi_k\rangle\) that is *not* a +1 eigenvector of every \(S_i\), then we'd be done. Because this would imply that \(P\) written in the \(\{|\psi_k\rangle\}\) basis is a diagonal matrix of 1s and 0s, which is a projector (as we've said many times by now!)

Let's expand out \(P |\psi_k\rangle\) – the thing we're trying to show is equal to 0. Since \(|\psi_k\rangle\) is a \(\pm 1\) eigenvector of each \(S_i\), we get

\begin{equation*} P |\psi_k\rangle = \frac{1}{|S|} \sum_i S_i |\psi_k\rangle = \frac{1}{|S|} \sum_i (\pm 1) |\psi_k\rangle. \end{equation*}If this is going to sum to 0, we need equal numbers of +1s and -1s.

This isn't too difficult to show, given what we know about stabilizers and the Pauli group. Let's call the set of \(S_i\) for which \(|\psi_k\rangle\) is a +1 eigenvector \(M := \left\{S_i\; |\; S_i |\psi_k\rangle = |\psi_k\rangle\right\}\), and the set for which it's a -1 eigenvector \(N := \left\{S_i\; |\; S_i |\psi_k\rangle = -|\psi_k\rangle\right\}\). If \(N\) is empty, \(|\psi_k\rangle\) is a +1 eigenvector of every \(S_i\) – a case we've already dealt with. So let's assume \(N\) contains at least one element \(S_n\).

Multiplying any stabilizer element \(S_i\) by \(S_n\) flips the sign of the \(|\psi_k\rangle\) eigenvalue (remember that \(S_i\) and \(S_n\) commute because they're stabilizer elements):

\begin{equation*} S_n S_i |\psi_k\rangle = S_i S_n |\psi_k\rangle = -S_i |\psi_k\rangle. \end{equation*}So multiplying all \(\{S_i\}\) by \(S_n\) swaps the sets \(M\) and \(N\).

We also know from the Rearrangement theorem that multiplying all \(\{S_i\}\) by \(S_n\) just "reshuffles" the group elements, still leaving us with one copy of each element. In other words, elements that are related by (left) multiplication by \(S_n\) are in one-to-one correspondence. Since this gives a one-to-one correspondence between the elements of \(M\) and \(N\), they must have the same number of elements: \(|M| = |N|\). I.e. \(|\psi_k\rangle\) is a +1 eigenvector of exactly half of the \(S_i\), and a -1 eigenvector of the other half. So for any \(|\psi_k\rangle\) that *isn't* a +1 eigenvector of all the \(S_i\),

\begin{equation*} P |\psi_k\rangle = \frac{1}{|S|} \Bigl( \sum_{S_i \in M} |\psi_k\rangle - \sum_{S_i \in N} |\psi_k\rangle \Bigr) = \frac{|M| - |N|}{|S|}\, |\psi_k\rangle = 0, \end{equation*}which is exactly what we wanted to show to finish proving that \(P\) is a projector.
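Here's a small numerical illustration of this counting argument (my own toy example), using the stabilizer \(S = \{\sigma_0\otimes\sigma_0,\; \sigma_3\otimes\sigma_0,\; \sigma_0\otimes\sigma_3,\; \sigma_3\otimes\sigma_3\}\), whose common eigenbasis is just the computational basis:

```python
import numpy as np

# S = {I⊗I, Z⊗I, I⊗Z, Z⊗Z}: all diagonal, so the computational basis is
# a common eigenbasis. Each basis vector should be a +1 eigenvector of
# either all four elements, or (by the |M| = |N| argument) exactly half.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
S = [np.kron(I2, I2), np.kron(Z, I2), np.kron(I2, Z), np.kron(Z, Z)]

for k in range(4):
    v = np.zeros(4)
    v[k] = 1.0
    plus = sum(1 for s in S if np.allclose(s @ v, v))  # size of the set M
    assert plus in (len(S), len(S) // 2)
```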

The proof for the Pauli group is perhaps more elementary (i.e. uses slightly less sophisticated maths), but the general proof is certainly simpler and more elegant. And Schur's lemma is one of the first results you learn in representation theory, so hardly counts as extremely advanced. I would have considered my attempt to come up with a simpler proof for the special case of the Pauli group a failure, and not told you about it…except that in the process of coming up with it, I realised there's a much simpler, one-line proof of the general case using just the Rearrangement theorem and nothing else!

Remember the definition of a projector: a matrix that squares to itself. So, once again letting \(P := \frac{1}{|G|} \sum_{g\in G} \varphi(g)\), we want to prove that \(P^2 = P\) for any representation \(\varphi\) of any group \(G\).

Expanding out the square and using the Rearrangement theorem, we can prove this very easily:

\begin{align*} \left(\frac{1}{|G|} \sum_{g\in G} \varphi(g)\right)^2 &= \frac{1}{|G|^2} \left(\sum_{h\in G} \varphi(h)\right) \left(\sum_{g\in G} \varphi(g)\right) \\ &= \frac{1}{|G|^2} \sum_{h\in G} \sum_{g\in G} \varphi(h)\varphi(g) \\ &= \frac{1}{|G|^2} \sum_{h\in G} \sum_{g'\in G} \varphi(g') \quad \text{by the Rearrangement theorem} \\ &= \frac{1}{|G|^2} |G| \sum_{g'\in G} \varphi(g') \\ &= \frac{1}{|G|} \sum_{g\in G} \varphi(g). \end{align*}OK, not quite one-line. But that's only because I wrote out each individual step separately for clarity. In a research paper or textbook, one would almost certainly leave out the first and fourth lines, and I claim the remaining three equalities would fit on one line of a printed page.
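As a final numerical check of the general result, here's the group average for a *reducible* representation (my own example: the permutation representation of \(S_3\), where each permutation of \(\{0,1,2\}\) becomes a \(3\times 3\) permutation matrix):

```python
import itertools
import numpy as np

# Build the permutation representation of S_3: one 3x3 permutation
# matrix per permutation of {0, 1, 2}.
mats = []
for perm in itertools.permutations(range(3)):
    M = np.zeros((3, 3))
    for col, row in enumerate(perm):
        M[row, col] = 1.0
    mats.append(M)

P = sum(mats) / len(mats)
assert np.allclose(P @ P, P)  # the group average is a projector
# Here it's the rank-1 projector onto the uniform vector: every entry 1/3.
assert np.allclose(P, np.full((3, 3), 1 / 3))
```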

This result is well-known, certainly in its quantum information theory guise. And better mathematicians than me would undoubtedly have spotted the simple, one-line proof right away.

But the convoluted way I arrived at that final, simple proof closely mirrors the way proofs typically emerge in "real" research – at least my research! A messy mixture of gut instinct, numerics, approaches suggested by bits of maths I happen to remember, going fishing in google with likely-sounding keywords. Changing the problem along the way to go for a more general result; or for a special case, as the case may be. Flip-flopping between trying to prove the result and trying to construct a counterexample.^{10}
^{10}Not much flip-flopping in this case, because I got lucky and not only backed the right side initially, but also came up with an ugly first proof fairly quickly. Very, very occasionally this happens in my research. Far more often, I spend months (seemingly) bashing my head against a wall, failing to either prove the result or to construct a counterexample. Usually the key to making progress is taking a break from thinking about the problem.
Coming up with an initial, ugly proof that gets the job done. Coming up with a second, cleaner, more elegant proof. Coming back to the problem later and suddenly spotting a new approach that gives a much simpler proof. Some or all of this features in almost every piece of research I've done.

That final, elegantly simple proof is invariably all that makes it into the published paper or textbook. Sometimes I wonder if research papers and textbooks skip the most interesting parts.

Nice post! I believe the process of seeing someone 'think aloud' is very helpful pedagogically for a student, and I cherish those courses I take where professors follow this method. Real research is messy, involves flip-flopping both about what the answer likely is and the best way to go about proving it, and far from being neatly wrapped up the way it is presented in papers. Seeing this post is reassuring that even advanced researchers go through the same process. I would love to see more such posts. I have been asking people how they arrive at a mathematical statement they would like to prove/disprove and would like to know your thoughts. For example, do you go: "I wonder if there are families of Hamiltonians for which the spectral gap is uncomputable…", or is it more of something you hit upon by noticing something interesting elsewhere and relating it to a body of literature you are already an expert in?
