Connecting mathematics and philosophy

Alain Badiou, part II

[I recently stumbled across this series of video lectures on Being and Event and was quite impressed by the presenter’s clarity and passion. Unfortunately they are somewhat marred by the presence of one asshole who is constantly interjecting with pretentious comments; but I chalked this up to the ubiquity of “that kid” in any conference class. I was understandably upset when the asshole turned out to be the lecturer for Parts II and III of the book. At any rate, I recommend the lecture on the first part as a great introduction to Badiou’s program, which after all is very ambitious and takes a good half-hour to summarize coherently.]

In the last post we looked at how Badiou thinks math gets Being right: that mathematics is ontology. Another major theme of Being and Event is, of course, the event. Unsurprisingly, Badiou thinks that math has a lot to tell us about what the event must be as well.

Badiou’s event

What does Badiou mean by the event? He is not talking about everyday happenings (the sum of which, I think, is what he terms “history”); rather, Badiou means revolutionary events in any of the conditions of art, love, politics, or science. Examples include the development of atonal music under Schoenberg, the development of set theory under Cantor and Frege, and, Badiou’s favorite example, the French Revolution. (Badiou never gives an example of revolutionary love, or indeed has much to say about this condition in Being and Event; but I recommend the delicious and short interview In Praise of Love for anyone who wants a refreshing perspective on the possibility of real human contact.)

To Badiou, the event is ontologically interesting because it contains itself. Consider the French Revolution. The Revolution contains more than just the Declaration of the Rights of Man and of the Citizen, the assassination of Jean-Paul Marat, and so on; it contains more than just all the happenings in France between 1789 and 1799; it contains, in an important way, itself. Even as it was ongoing, participants would refer to the French Revolution by name, as a conceived whole. In this way, the French Revolution is different from an arbitrary collection like the set that contains Alaska, the Moon, and Plato. The participants were aware that they were creating a particular Revolution; compare this to the way an art movement like Impressionism is only fully formed when it acquires its name and artists identify themselves as belonging (or not) to the movement. Movements that fail to be revolutionary often dissolve into merely a collection of happenings.

Badiou denotes the event X as follows: X = {eX, X}. Here eX is the evental site (e.g. France 1789-1799), and X is the proper name of the event. Badiou wants to say that the event is a paradoxical object because of how it belongs to itself.

Self-containment and Russell’s paradox

Early set theorists (Cantor, Frege) had a naive conception of set whereby a set is merely the collection of entities picked out by a concept, and any concept defines some (possibly empty) set. This is, after all, how most working mathematicians still conceive of sets when they use, e.g., the set of prime numbers. But such a permissive definition in fact leads to many famous paradoxes, chief among them Russell’s paradox.

Russell’s paradox proceeds as follows. Consider the concept ‘is not an element of itself.’ Surely this is a valid property, one which most ‘normal’ sets (like the set of all prime numbers) enjoy. However, consider the potential set S corresponding to this property; that is, let S be the set of all sets that don’t contain themselves. The paradox arises from asking whether S contains itself. If it contains itself, it ought not to; and if it doesn’t, it should. Another way of putting it is as follows: the Barber of Seville cuts the beards of all men who do not cut their own beards. Does the Barber of Seville cut his own beard? (Solution: the Barber of Seville is a woman.)

Russell’s paradox is the basis of all analytic philosophy. This is not because it is especially profound; in fact, it is an inane and obvious problem (so obvious that apparently Zermelo knew of it before Russell communicated it to Frege in 1902). Indeed, Russell’s paradox leads to analytic thinking precisely because it so inanely demonstrates that the use of naive language in any field (even math, where we would expect things to be clearest!) is suspect and likely riddled with incoherence and vagueness. Thus we must carefully build up an ideal language free of paradoxes, full of clarity.

It is sometimes said that Russell’s paradox shows self-containment to be an inherently contradictory concept. But this is not quite right. Russell’s solution to his paradox was to create a system of types for sets so that sets can only contain sets of lower type; this is like saying rank 1 barbers can cut the beards of rank 0 barbers only, and rank 2 barbers can cut the beards of rank 1 or 0 barbers, and so on. So Russell certainly wants to outlaw self-containment. However, Zermelo put forward a simpler and more lasting solution, which is to allow properties not to define sets but only to carve out sets from other sets. For instance, if A is some already existing set, we can let B be the set of all things in A which have property P. Now letting P = ‘does not contain itself’ is no problem.
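Separation has a direct analogue in programming: a property never conjures a set from nothing, it only filters an already-existing one. A minimal sketch in Python (my example, not Badiou’s or Zermelo’s):

```python
# Separation in miniature: a property carves a subset out of an
# already-existing set A; it never creates a set from thin air.
A = frozenset(range(20))

# P = "is prime": the property only selects elements from within A.
primes_in_A = frozenset(
    x for x in A
    if x > 1 and all(x % d != 0 for d in range(2, x))
)
```

The property is harmless here precisely because it is only ever asked of things that already exist.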

But note that this separation axiom does not explicitly disallow self-containment. It is a separate axiom, another choice on the part of the set theorist, called the Axiom of Foundation, that says no set can contain itself (indeed, it rules out any infinite descending chain of membership). Whereas separation is constantly used in ‘everyday’ math, the Axiom of Foundation is almost never invoked outside of logic, and its truth is not at all self-evident.

What does this mean for Badiou?

Badiou wants to say the event is not a part of Being because it is self-containing and self-containment is disallowed by set theory, the language of ontology. It would be better to say that self-containment is outlawed by certain set theories like ZFC (the most commonly used one). However, other set theories, like Quine’s New Foundations, allow for a universal ‘set of all sets’ that in particular contains itself.

Badiou recognizes all this, I think. (Certainly New Foundations is mentioned in the introduction.) He would say that the Axiom of Foundation is the right choice to make; it is a true axiom that no structure contains itself. Perhaps. But often it would be convenient in math to think of a structure as containing itself (see for instance this recent question on a forum dedicated to research math questions). And self-similarity is a very important concept in math; indeed, Dedekind’s definition of an infinite set is one which is similar (in number) to a proper part of itself.

What’s next

Right now I’m reading Badiou’s Number and Numbers which is more straightforwardly a book on the philosophy of mathematics. In it, Badiou proposes that the surreal numbers of John Conway finally describe all that is meant by “number.” I may talk about the surreal numbers in a future post, because at any rate they are a fun concept and are more related to combinatorics (my research interest) than a lot of what I’ve talked about lately.

I also hope to discuss the philosophical writings of Gian-Carlo Rota at some point, but first I have to read them!

Alain Badiou, part I

In a major departure from the largely analytic discussions in my previous post, today I’m gonna talk about a philosopher who draws as much inspiration from Martin Heidegger as from modern mathematics: Alain Badiou. What I find interesting in Badiou is that he does engage with modern math; indeed, he says the “mathematico-logical revolution of Frege-Cantor sets new orientations for thought.” This declaration addresses one of the three trajectories of modern philosophical thought Badiou is concerned with, the other two being the ontological question as posed by Heidegger and the new theory of the subject visible in Marx, Freud, and Lacan. Badiou does more than engage with math on a surface level. It is clear that he has read and understood the difficult set-theoretic proofs of Cantor, Gödel, Skolem, Cohen, etc., at least in broad outline. And he uses specific results, like the independence of the continuum hypothesis from the accepted axioms, to draw conclusions about the possibility of revolutionary events in politics, art, science, and love.

(My assessment of Alain Badiou is entirely based off my reading of Being and Event, supposedly his magnum opus.)

On the one hand, the area of mathematics that Badiou centers on (set theory and foundations) is well-worn territory for philosophers. But what he tries to do with set theory is quite different from what analytic philosophers have done. He often derides analytic philosophers (whom he variously identifies as “logical positivists”, “the Americans”, “Anglo-Saxons”, and even sometimes “working mathematicians” or “Bourbakists”) for considering the classical problems of philosophy as vacuous “language games.” These big problems interest him (as they do me!). But Badiou recognizes that logic and math are powerful tools and wants to end the American monopoly on logic in philosophy. Badiou thinks the division between continental and analytic thought is senseless. He attempts to use the methods of analytic philosophy (e.g. formal logic) to answer the questions of continental philosophy (e.g. ontology). This is clear from the thesis of Being and Event: mathematics is ontology.

What the hell does this mean? It is a bold response to Heidegger’s ontological question. Heidegger saw that traditional philosophy often investigated particular beings without critically investigating the more basic problem of Being: how it is possible that things be at all. And Badiou responds by saying, well, mathematicians have thought about this. They have a system whereby something is built up out of nothing; this system is set theory. Of course, Heidegger would hate this response; he was critical of the Cartesian viewpoint for encountering the world “with its skin off”, and set theory is Cartesian abstraction to the extreme.

Maybe it is best to quote Badiou himself explaining what his thesis means:

The thesis that I support does not in any way declare that being is mathematical, which is to say composed of mathematical objectivities. It is not a thesis about the world but about discourse. It affirms that mathematics, throughout the entirety of its historical becoming, pronounces what is expressible of being qua being. Far from reducing itself to tautologies (being is that which is) or to mysteries (a perpetually postponed approximation of a Presence), ontology is a rich, complex, unfinishable science, submitted to the difficult constraint of a fidelity (deductive fidelity in this case). As such, in merely trying to organize the discourse of what subtracts itself from any presentation, one faces an infinite and rigorous task.

Our conception of how things be has to obey the rules of set theory. What is set theory? Loosely, it deals with sets, which are just collections of objects; for example, the collection of all red socks is a set. Of course, sets can contain other sets as members, and in the formal axiomatization of set theory the term set remains undefined. Rather, we say which sets must exist and how they must behave. For instance, one rule of set theory is that if A and B are sets, then there is a set that is the union of A and B, that is, one which contains every element in either A or B. Normally we use curly braces to denote sets, so for instance {1,2,3} is the set that contains the first three counting numbers. Badiou claims these rules, discovered by mathematicians searching for a foundational and universal language of discourse for describing all mathematical entities, are not just technical tricks that allow working mathematicians to describe what they want to describe, but actually give meaning to Being in general. I will now give a specific example of set theory applied to ontology, because what I’m saying is more easily appreciated with concrete cases.
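The union rule, incidentally, is exactly mirrored in Python’s set notation (a trivial sketch, my example):

```python
A = {1, 2, 3}
B = {3, 4}

# The union axiom guarantees this set exists: it contains every
# element that is in A or in B (shared elements appear once).
union = A | B

# Curly braces denote sets, just as in the mathematical notation {1,2,3}.
```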

The empty set and multiplicity


The problem of the One and the Multiple is a classic philosophical problem; here is how Badiou describes it: “what presents itself is essentially multiple; what presents itself is essentially one.” What is the ‘what’ here? I think it is best, considering the above block-quote, to let our domain of ‘what’s be concepts. Badiou’s point is clear: in so far as a concept shows itself to us, it only shows us its parts, its multiplicity; but in so far as a concept is a concept at all, it has to be a single thing, a one. I agree with Badiou that set theory easily resolves this ‘paradox’ (which is not really paradoxical, but maybe a bit quizzical) through the empty set.

The empty set, denoted ∅, is the set that contains no elements. It is not ‘nothing’ but rather the set that contains nothing: the set of all four-sided triangles, the set of all living Kings of France, and so on. Sometimes the existence of the empty set is assumed by an axiom, but at any rate, as long as any set at all exists, the empty set must exist. And once we have the empty set, we can create the set that contains the empty set, {∅}, which is different from the empty set. Let’s identify zero with the empty set, i.e. let’s set 0 = ∅. The number one can be the set containing the empty set, i.e. 1 = {0} = {∅}. Continuing in this way, 2 = {0,1} = {∅,{∅}}, 3 = {0,1,2} = {∅,{∅},{∅,{∅}}}, and so on. The numbers 0,1,2,3 are now Ones and Multiples: each is a unique set and thus one, but each contains only other sets as elements and thus is a nested multiple. This process, by which we only consider sets of sets built ultimately out of the empty set, yields the von Neumann universe: a model of, but also a motivation for, standard set theory.
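This construction is concrete enough to run. Here is a little Python sketch (mine, not from the book) that builds the von Neumann numerals as nested frozensets, so that each number literally is the set of its predecessors:

```python
def von_neumann(n):
    """Build the von Neumann numeral n: each number is the set of
    all smaller numbers, starting from 0 = the empty set."""
    predecessors = []
    current = frozenset()                  # 0 = ∅
    for _ in range(n):
        predecessors.append(current)
        current = frozenset(predecessors)  # next = {0, 1, ..., n-1}
    return current

zero = von_neumann(0)   # frozenset()          i.e. ∅
one = von_neumann(1)    # {∅}
two = von_neumann(2)    # {∅, {∅}}
```

Note that each numeral is both one thing (a single frozenset) and a multiple (a nest of sets of sets), which is exactly the resolution described above.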

Georg Cantor is the father of set theory. Before him, no mathematician thought there was anything interesting to be said about collections of objects in general. Cantor showed this view to be utterly false; as Badiou puts it, set theory is “a rich, complex, unfinishable science.” One of the first things Cantor showed was that there are different sizes of infinite sets, that the real numbers are in some sense more infinite than the counting numbers. Already this should shake our faith in the obvious reality of mathematics. Badiou is right to see a similarity in kind between Cantor and Heidegger: they both questioned the foundational assumptions of their field and brought forth a wave of skepticism. Before Cantor, mathematical Platonism was ubiquitous, but that completely changed as a result of his consideration of mathematical Being. In the next few posts I want to talk more about what mathematicians might have to learn from Badiou’s writings.

Computational complexity, part II

In 1994, Peter Shor shocked the computer science community by demonstrating that a new kind of machine, a quantum computer, if it were ever built, could quickly factor integers. The presumed intractability of the integer factorization problem is the basis of most online cryptography systems. This means that if quantum computers were ever realized, spies could open your emails, read your credit card numbers, and so on, not to mention steal state secrets. Major cryptography organizations have literally bet hundreds of thousands of dollars that big numbers can’t be factored. But these practical considerations pale in comparison to the philosophical issues that quantum computing raises.

Quantum computing sits at the intersection of mathematics, computer science, and physics; the confluence of these varied fields is what is so remarkable and new in quantum computing- although the basic physical theory upon which it rests (quantum mechanics) dates back to the ’20s, it was only in the ’80s that theorists began wondering whether quantum effects might speed up computation. To the best of our knowledge, they do! Quantum computing represents the first computational model where our understanding of the physical world has informed what we think is efficiently computable, and therefore represents the first serious challenge to the Extended Church-Turing Thesis.

Recall from a few blog posts ago that the Church-Turing Thesis is the claim that a problem is decidable by mechanical means if and only if it is decidable by a Turing machine, i.e. any other model of computation is equivalent to a Turing machine in terms of the problems it can solve. Few have ever seriously disputed the basic Church-Turing Thesis. The Extended Church-Turing Thesis is the stronger claim that any reasonable model of computation is polynomial-time equivalent to a Turing machine. The Extended Thesis says that it shouldn’t matter whether I’m working on an electronic computer or one powered by steam (or doing the math by hand, for that matter): tractable problems stay tractable and intractable ones remain intractable. Although there have been some bizarre assertions that soap bubbles can quickly solve NP-complete problems, the Extended Thesis seemed unassailable before quantum computing.

Quantum computing

What exactly is quantum computing? Instead of formally describing the model and explaining the von Neumann axioms of quantum mechanics that it relies on, I’ll give a broad overview that hopefully illuminates the ideas behind it. A classical computer stores information as bits, which are 0s or 1s. By encoding numbers in binary, a classical computer can store arbitrary numeric information as bits. A quantum computer stores information in qubits, which can store the classical values 0 or 1, but also can take values “in-between” 0 and 1. A qubit is mathematically notated as a vector |φ⟩ = α |0⟩ + β |1⟩. Here α and β are complex numbers which we call amplitudes, and so a single qubit could be said to store an infinite amount of information. However, this last assertion is extremely misleading, because we are not free to arbitrarily access the amplitudes of our qubits. Rather, when we measure a qubit, we obtain only classical information (either 0 or 1). The real values |α|² and |β|² determine the probabilities of measuring 0 or 1 respectively. One way of representing a qubit is with the Bloch sphere, pictured below. Although the qubit may lie anywhere on this sphere before we observe it, at measurement it collapses to the north or south pole:


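The measurement rule is easy to simulate classically for a single qubit. A toy sketch of the math, not of real hardware (the names are mine):

```python
import math
import random

def measure(alpha, beta):
    """Measure a qubit α|0⟩ + β|1⟩: returns 0 with probability |α|²
    and 1 with probability |β|². The amplitudes themselves are
    never revealed, only the classical outcome."""
    p0 = abs(alpha) ** 2
    return 0 if random.random() < p0 else 1

# An equal superposition: α = β = 1/√2, so |α|² = |β|² = 1/2.
r = 1 / math.sqrt(2)
outcomes = [measure(r, r) for _ in range(10_000)]
# roughly half the outcomes are 0 and half are 1
```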
The incorrect account of quantum computing contends that a quantum algorithm can solve an intractable problem because “it tests all the possibilities at once.” But if we used a single system of qubits to compute all possible solutions to some problem and then attempted to read off the solution by measuring this system, instead we would only measure the result of a random computation. This is no better than the classical probabilistic approach of randomly guessing the answer and checking if it works. However, quantum computers have one tremendous advantage over classical probabilistic computers: unlike classical probabilities, which are nonnegative reals, the amplitudes of a quantum state are complex numbers and thus can cancel out. The true explanation for why quantum algorithms work is that they exploit the structure of the problems they solve and cleverly manipulate their quantum states so that the amplitudes of bad solutions destructively interfere while the amplitudes of the correct solutions constructively interfere.
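The cancellation is visible in the smallest possible example: applying the Hadamard transform twice. A quick sketch (my code; the math is standard textbook material):

```python
import math

def hadamard(state):
    """The Hadamard transform on one qubit:
    (a, b) -> ((a + b)/√2, (a - b)/√2)."""
    a, b = state
    r = 1 / math.sqrt(2)
    return (r * (a + b), r * (a - b))

state = (1.0, 0.0)        # start in |0⟩
state = hadamard(state)   # equal superposition: both amplitudes 1/√2
state = hadamard(state)   # the two paths into |1⟩ carry amplitudes
                          # +1/2 and -1/2: they cancel exactly,
                          # leaving us back in |0⟩ with certainty
```

Classical probabilities, being nonnegative, could never cancel like this; that minus sign is the quantum advantage in miniature.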

Integer factorization, one of the few problems that is believed to be classically intractable but known to have an efficient quantum solution, nevertheless does have some structure because the integers have an algebraic structure. This is hinted at by results like Fermat’s little theorem that I mentioned in the last blog post on complexity. If you want a nice, low-level explanation of what is going on in Shor’s algorithm, check out this blog post by Scott Aaronson; but I hope I have at least hinted at how the collusion of qubits can allow for a computational speed-up for problems when there is some structure to work with.

Does quantum computing tell us something about quantum mechanics?

What I’ve talked about so far is how our knowledge of the physical world can inform our mathematical (and philosophical) conclusions about what kinds of problems are efficiently decidable. Already quantum computing is pretty remarkable in this respect, because back in the ’30s when computation was first being explored, it was seen as a philosophical question that ought to have nothing to do with what kind of world we live in. But we might flip perspectives and ask if our mathematical theorems and intuition related to computation can tell us anything about quantum mechanics.

Physicists have been interested in interpreting quantum mechanics for as long as they have studied quantum phenomena. The so-called Copenhagen interpretation, perhaps the most standard account, contends that the act of measurement causes a collapse of the quantum world to the classical one and that quantum states only describe probabilities of measurement. An alternative view is the many-worlds interpretation, which denies that quantum collapse ever happens and instead contends that all possible pasts and futures that quantum mechanics allows for are real. David Deutsch, one of the original proponents of quantum computing, believes that the spectacular results of quantum computing lend credence to the many-worlds interpretation:

Logically, the possibility of complex quantum computations adds nothing to a case [for the Many-Worlds Interpretation] that is already unanswerable. But it does add psychological impact. With Shor’s algorithm, the argument has been writ very large. To those who still cling to a single-universe world-view, I issue this challenge: explain how Shor’s algorithm works. I do not merely mean predict that it will work, which is merely a matter of solving a few uncontroversial equations. I mean provide an explanation. When Shor’s algorithm has factorized a number, using 10^500 or so times the computational resources that can be seen to be present, where was the number factorized? There are only about 10^80 atoms in the entire visible universe, an utterly minuscule number compared with 10^500. So if the visible universe were the extent of physical reality, physical reality would not even remotely contain the resources required to factorize such a large number. Who did factorize it, then? How, and where, was the computation performed?

But I think Aaronson has a very nice response to this: why couldn’t it just be that there is only one world, and that world is fundamentally quantum? We can’t rely on our intuition of what is classically efficiently computable to lend us intuition about what the quantum world is like. While it is true that quantum computing involves exponential speed-ups, that might represent a kind of exponentiality inherent in the quantum world and require no appeal to an exponential number of worlds.

Here’s another way that computational complexity could inform our physical theories: the impossibility of solving NP-complete problems in polynomial time should be an a priori requirement of physical theories, in the same way that the second law of thermodynamics or the impossibility of superluminal communication are. Aaronson has put forward this view, and I like it a lot because of its boldness. Importantly, there are no known efficient quantum algorithms for solving NP-complete problems! Integer factorization is thought to be harder than anything in P but not as hard as the NP-complete problems.

But, to play the devil’s advocate, I have to pose the following: what if some brilliant mathematician were to come up with an efficient quantum solution to an NP-complete problem? What would Scott Aaronson say? He often defends quantum computing against myriad skeptics, who range from cranks to very intelligent and nuanced thinkers like Gil Kalai. I think he would have to reject the possibility of a quantum computer in such a case. But why is NP-completeness so important? Why doesn’t the assumed difficulty of integer factorization give us a reason to doubt the possibility of quantum computing? The best response I can think of makes some reference to the ideas of “structure” and lack of structure that I have been talking about in loose terms throughout my discussion of complexity. Making rigorous the distinction between structured and unstructured problems seems immensely difficult but important, and of course lies at the heart of the million dollar question.

Traveling Salesman…

This week I’m visiting open houses at various math PhD programs, so I don’t have time for an update. Next week I’ll blog about quantum computing, the subject of my undergraduate thesis at Reed College.

Computational complexity, part I

Logicomix ends with the narrators attending a performance of the Oresteia. Aeschylus’s trilogy becomes a metaphor for the quest for the foundations of mathematics: although formally a tragedy, and although it involves a great deal of bloodshed, the Oresteia ends on a happy note in which Orestes is reprieved by a popular vote and democracy is introduced to Athens. [Warning: this is a very simple reading of a complex piece of literature from a culture far removed from our own.] The theorems of Gödel and Turing, which say that not all true statements in arithmetic are provable, and that the provable statements cannot be algorithmically separated from the unprovable ones, might be seen as a tragic resolution to Hilbert’s foundational quest. However, as in the Oresteia, out of this tragedy comes a new hope: the field of computer science is born from considerations of what one can do “algorithmically”, and eventually actual machines are built which take the place of Turing’s theoretical constructions.

Regardless of how well the metaphor works literarily, the point is apt: after mathematicians discovered that certain problems cannot be mechanically decided, they started to look into which problems could be, and in particular built machines which now solve thousands of practical problems. Once you start to worry about solving practical problems with a computer, you immediately have to wonder not only whether a question is in principle decidable but also how much space and time it takes to decide it. As Scott Aaronson points out, this might seem to be only a pragmatic concern. However, natural problems seem to divide between those that are tractable and those that are intractable, and this gap carries philosophical import. Before I touch on that import, however, I need to explain the mathematics.

The P versus NP problem

[This video by Michael Sipser is a wonderful introduction to the P versus NP problem and you can watch it instead of or as a supplement to the following.]

Problems that admit of solutions whose running times scale reasonably with input size are in complexity class P, which stands for “polynomial time.” On an input of size n, a problem in P can be solved in time like n or n² or in general n^c for some constant c. Problems in P include multiplying numbers together and optimizing over linear constraints. These are the problems that computers are great at; these problems are considered tractable.

NP (which stands for “nondeterministic polynomial time” and not “not polynomial time”) is the class of problems whose solutions can be verified quickly. As an example, say I give you a map of the major cities of some country and ask if there is a loop that connects all of them using less than 1000 kilometers total. The best known approaches to this problem are little better than the naive approach of exhaustively searching all possible loops. Such solutions take time that grows exponentially with the size of the map. However, it is easy to verify that there is such a loop if I give it to you: all you have to do is add up the distances of the steps in the loop. The shortest loop through the approximately 25,000 cities of Sweden was only recently found via intense computation. I think the following image will convince you that the optimal path does not obviously jump out at you:


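Verification really is that easy. A sketch in Python, with a toy distance matrix invented for illustration:

```python
def tour_length(dist, tour):
    """Total length of a closed loop visiting the cities in `tour`,
    returning to the start. Checking a proposed loop is one pass."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

# A made-up symmetric distance table for 4 cities (kilometers).
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]

# Verifying the claim "there is a loop under 100 km" is instant,
# but finding the best loop naively means trying all (n-1)!/2 orders.
claim_holds = tour_length(dist, [0, 1, 3, 2]) < 100
```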
For another example of a problem in NP, consider the reverse of multiplying a series of numbers together: the problem of breaking an integer into its prime factors. The best known approaches to integer factorization don’t do much better than the naive approach of checking whether any of the numbers smaller than the input number divide it. However, it is easy to verify that a factorization is correct if I give it to you, because all you have to do is multiply the factors together. Integer factorization is a hugely important problem because the major online cryptography systems rely on its being intractable; it will come up again in my next blog post when I talk about the potential of quantum computation.
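The asymmetry between finding and checking a factorization is stark even in a few lines of Python (a sketch of the naive approach, nothing clever):

```python
def naive_factor(n):
    """Factor n by trial division. The loop runs up to √n times,
    which is exponential in the number of digits of n."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def verify_factorization(n, factors):
    """Checking is easy: just multiply the factors back together."""
    product = 1
    for f in factors:
        product *= f
    return product == n
```

For a 300-digit n the verifier still finishes instantly, while trial division would run for longer than the age of the universe.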

The question of whether the class P is the same as the class NP is the greatest open problem in computer science. Indeed, the problem comes with a million dollar award to anyone who can solve it. It is even the subject of this feature film (which I have not seen and which looks cheesy)!

It is widely believed that P does not equal NP, i.e. that some search problems essentially require exhaustive search. Aaronson makes the analogy that if P were to equal NP, this would be like saying that the ability to appreciate a great symphony implies the ability to compose one. It is worth stressing how great a divide there is between polynomial-time solutions and exponential-time solutions: this is often lucidly demonstrated by a professor starting to draw the exponential function e^x on the left side of a standard blackboard and then telling the class that by the right side the function will have reached the moon. For input sizes well within the realm of practical import, exponential-time solutions require more time than there are particles in the universe.
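The blackboard demonstration is easy to redo numerically. For a modest input size n = 100 (numbers mine, chosen purely for illustration):

```python
n = 100

poly_steps = n ** 3   # a cubic-time algorithm: a million steps,
                      # a fraction of a second on any laptop

expo_steps = 2 ** n   # exhaustive search over n binary choices:
                      # about 1.27 × 10^30 steps

# At a billion steps per second, the exponential search takes
# roughly 4 × 10^13 years, thousands of times the age of the universe.
years = expo_steps / 1e9 / (3600 * 24 * 365)
```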

It seems obvious to many that these unstructured search problems are intractable. What choice do we have but to consider all possibilities? So then why is it so hard for us to prove that this is the case? Well, to prove that P is not equal to NP we have to rule out cleverness; we have to say that no cleverly efficient algorithm can find a hidden structure in our search problems. And sometimes, there is such a hidden structure.

For example, consider a problem similar to integer factorization: the problem of deciding whether a number is prime. It turns out that this problem does have an efficient solution (although this was only proven in 2002). To get an idea of why that might be the case, consider the following result about prime numbers, called Fermat’s little theorem, which dates back to the 17th century:

If p is prime and a is any number not divisible by p, then the remainder of a^(p−1) when divided by p is 1.

If p is not prime, then (with the rare exception of the so-called Carmichael numbers) it is very unlikely that this remainder will be 1 for a random choice of a. So by picking a few random values of a, we can convince ourselves with high probability that p is prime. “Randomness” is not allowed for deterministic Turing machines, so it takes a lot of work to turn an idea like this into a fast, deterministic algorithm, but at any rate it is possible, and it shows that certain problems may have non-obvious solutions.
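Here is the idea as a short sketch (a plain Fermat test, my code; real primality algorithms such as Miller–Rabin or the 2002 deterministic one are more careful, and Carmichael numbers can fool this simple version):

```python
import random

def fermat_test(p, trials=20):
    """Probably-prime test based on Fermat's little theorem.
    Returns False only when p is certainly composite; returns True
    when p is prime with high probability."""
    if p < 4:
        return p in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, p - 1)
        if pow(a, p - 1, p) != 1:   # a is a witness that p is composite
            return False
    return True
```

The three-argument `pow` computes a^(p−1) mod p without ever forming the astronomically large number a^(p−1), which is what makes each trial fast.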

But primality testing is just one example, and surely it does not suggest other NP problems are actually P problems. However, there is one single problem in NP called boolean satisfiability, involving some technical set-up about logical formulae and variables, for which an efficient solution implies an efficient solution to any NP problem. Such a problem is called NP-complete. After boolean satisfiability was shown to be NP-complete, several other problems were shown also to be NP-complete via the method of reduction.

Reduction involves giving a polynomial-time algorithm to translate one problem of type A into another of type B: for each possible instance of a type-A problem, the reduction gives an instance of a type-B problem that allows us to read off an answer for the type-A problem. This often involves the use of gadgets, which are small bits of a B-type problem that correspond to bits of an A-type problem. As an example, the following image comes from a paper about the problem of whether we can tile 2D shapes with Tetris-like pieces:

[Image: gadget pieces from the tiling paper]

We can see how these three-pronged shapes represent logical connectives like AND and OR, allowing a reduction of boolean satisfiability to the tiling problem. In a similar way, the loop-finding problem I mentioned earlier was shown to be NP-complete in the ’70s.
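The tiling reduction itself is intricate, but the basic mechanics of a reduction can be seen in a toy example of my own (not from the paper): a graph has an independent set of size k exactly when the remaining n − k vertices form a vertex cover, so translating between the two problems is nearly free, even though both are NP-complete.

```python
from itertools import combinations

def has_independent_set(n, edges, k):
    """Brute force: is there a set of k vertices with no edge between them?"""
    return any(all(not (u in S and v in S) for u, v in edges)
               for S in map(set, combinations(range(n), k)))

def has_vertex_cover(n, edges, k):
    """Brute force: is there a set of k vertices touching every edge?"""
    return any(all(u in S or v in S for u, v in edges)
               for S in map(set, combinations(range(n), k)))

def reduce_independent_set_to_vertex_cover(n, edges, k):
    # S is independent iff every edge has an endpoint outside S, i.e. the
    # complement of S is a vertex cover. So the type-A instance (n, edges, k)
    # maps to the type-B instance (n, edges, n - k), and the yes/no answer
    # carries over unchanged. The translation itself takes constant time.
    return n, edges, n - k

# A 4-cycle has an independent set of size 2 ({0, 2}), and correspondingly
# a vertex cover of size 2 ({1, 3}):
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(has_independent_set(4, cycle, 2))                                        # True
print(has_vertex_cover(*reduce_independent_set_to_vertex_cover(4, cycle, 2)))  # True
```

The brute-force solvers here are exponential; only the translation needs to be fast for the reduction to count.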

Philosophical issues

In his article’s introduction, Aaronson remarks that for a philosophical argument to truly involve computational complexity, it is important that it does more than merely mention complexity or the fact that there exist some hard problems. It has to use mathematical ideas like P versus NP. However, I think Aaronson himself falls into this trap in one of the early examples he gives involving the philosophy of mind.

The philosophical debate is about the possibility of artificial intelligence. The Turing test, proposed by Alan Turing, says that we ought to call an entity intelligent if it can, in conversation, convince a human that it too is human. Maybe some of you have played with jabberwacky or similar “chat bots” that attempt to hold conversations with humans. This is basically what Turing imagined.

One objection to the test goes as follows: since humans decide whether another entity is intelligent in a few minutes if not a matter of seconds, there are only a finite (but huge) number of potential conversations humans could have in this time. If a computer simply stored all these conversations, it could look them up as needed and perfectly mimic a human.
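A back-of-the-envelope calculation (the numbers here are my own illustration) shows just how huge “finite but huge” is:

```python
import math

# Count the entries a lookup table would need in order to store every
# possible "conversation" of at most n characters, drawn from the roughly
# 95 printable ASCII symbols (illustrative numbers only):
def table_entries(n, alphabet=95):
    return sum(alphabet ** k for k in range(1, n + 1))

# Even a short, 300-character exchange needs about 10^593 entries,
# dwarfing the roughly 10^80 atoms in the observable universe:
print(round(math.log10(table_entries(300))))  # 593
```

So the look-up-table "intelligence" is mathematically possible but physically absurd, which is exactly the opening for a complexity-based amendment to the test.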

Aaronson points out that the test could be amended by requiring that the computer use only polynomial resources, measured against the conversation length, say, rather than exponential resources like these huge look-up tables. Such a requirement might force the computer to have subroutines corresponding to concepts and consciousness rather than just an enormous memory.

That’s a fine suggestion as far as it goes, but it does not really use any ideas from complexity theory, because Aaronson does not suggest that, for example, humans can solve NP problems efficiently (and thus are differentiated in this way from computers). In fact he thinks that computers are probably better at most math problems like these. And making mathematically rigorous notions like “capable of writing poetry”, which we might hope would separate humans from computers, seems hopeless.

Reduction and waterfalls

But in another philosophical debate, also in the philosophy of mind, I think Aaronson offers a very nice application of some real mathematical ideas. Here the debate is about whether computations can ever be “about” anything or whether they must always be merely syntactic. Similar to Searle’s famous Chinese Room argument, the “waterfall argument” attempts to reject computationalism, the idea that the mind is a computer. It goes as follows. Any sufficiently complicated physical system, such as a waterfall, translates via deterministic physical laws certain input (the state of the waterfall at one time) to certain output (the state at a later time). Because the waterfall contains so much information, we can come up with a way of encoding any problem, like finding good chess moves, into the waterfall: we simply set up the waterfall in a certain state via some translator, wait five minutes, and then use the translator to read off the right move from the state of the waterfall. But it would be crazy to attribute intelligence to a waterfall! So computation can never involve semantics.

Aaronson’s response is to point out that this “translator” from chess moves to waterfall states must be doing all of the computational work. We can see the translator as a reduction of the chess problem to the waterfall problem, except that it is overwhelmingly likely that this reduction does not take polynomial time. Indeed, if there were some efficient reduction of chess to waterfall, we would have to think that the waterfall (or at least the laws of physics that govern it) was really in some way knowledgeable about chess. But because there is no such reduction, the waterfall is just acting chaotically and knows nothing.

The Halting Problem

At the beginning of the article I mentioned in the last blog post, Scott Aaronson laments the fact that although philosophers have paid a lot of attention to results in computability theory dating back to the ’30s, they have shown little interest in more recent developments in computational complexity theory. In order to work up to issues of complexity, which I agree deserve attention, it’s important for us to understand the history of theoretical computer science, and in particular to discuss the fundamental results about computability to which Aaronson alludes.

Hilbert and foundations

David Hilbert, the greatest mathematician of the last century, founded the school of mathematical formalism: he preached that all mathematical statements are ultimately statements about the manipulation of formal strings of symbols. These symbols correspond informally to notions like “space” and “number”, but ultimately it is only the symbolic axioms and rules of inference that give us formal proofs. In 1899, Hilbert came up with a set of 21 axioms that completely characterize Euclidean geometry. Two thousand years earlier, Euclid famously wrote down five axioms in an attempt to describe the behavior of points, lines, and planes; but Hilbert showed that Euclid’s list was incomplete and that more, and in particular more rigorous, axioms were required. Hilbert’s axiomatization of geometry gave mathematicians hope that all of math could be put on solid foundations. Indeed, “Hilbert’s program” was to prove that such a set of axioms exists and is consistent.

The “foundational crisis of mathematics” is lucidly illustrated in Logicomix, which is, in my professor’s words, simply the best graphic novel about the birth of analytic philosophy there is. I think the panel below gives you a taste of the style:

[Image: panel from Logicomix]

To cut to the point: the foundations were shown to be inherently shaky. Gödel in 1931 demonstrated that any sufficiently strong formal system, if it is consistent, cannot prove its own consistency. This is largely seen as resolving Hilbert’s question from 1900 asking for a formal proof of the consistency of arithmetic. But in 1928, Hilbert put forward a more refined question, called the Entscheidungsproblem, which asks whether there can exist some algorithm that takes as input any statement in first-order logic and accepts or rejects the statement according to whether it is universally valid. This is the problem which led to Turing machines and, consequently, to computers as we know them.

Turing machines

In 1936, Alan Turing, in order to address the Entscheidungsproblem and make precise what is meant by an “algorithm”, described what would become known as a Turing machine. A Turing machine is an abstract model of a computer. When Turing was writing, “computer” referred to a person employed to perform arithmetic computations. The Turing machine describes what this “human computer” is capable of doing by purely mechanical means. It consists of an infinite tape on which it can mark 0s and 1s (think of this as its memory) and a set of instructions. Given a certain input on its tape, a Turing machine can run for a while and then halt and accept the input or reject it. It is also possible that the machine gets stuck in an infinite loop and attempts to run forever. A certain problem is decidable by a Turing machine if, given the appropriate input, the Turing machine correctly accepts or rejects the input. For instance, the problem of whether a number n is prime is decidable because a Turing machine can easily loop through all the numbers between 2 and n-1 and check if these divide n. In code, we have,

        def is_prime(n):                 # assume n >= 2
                for i in range(2, n):    # try each candidate divisor from 2 to n-1
                        if n % i == 0:
                                return False    # reject: i divides n
                return True                     # accept: no divisor found

Turing machines are robust and universal. “Robust” means that giving a Turing machine more tools doesn’t change the set of problems it can solve: we could define Turing machines with multiple tapes, tapes that are infinite in both directions, access to randomness, or arbitrarily large alphabets instead of just 0s and 1s, but all of these machines solve the same problems as the simpler device I described above. Robustness allows us to think of all sufficiently strong notions of computer (including the laptop I’m typing on right now) as more-or-less equivalent. “Universal” means that there exists a single Turing machine that takes as input the description of another machine along with that machine’s input and emulates its behavior. Universal Turing machines are basically programming languages, and universality tells us that as long as my system is sufficiently programmable, it can address any problem that any Turing machine can.
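To make this less abstract, here is a tiny Turing machine simulator of my own (a sketch; textbook formalizations track a few more details), together with a machine that accepts binary strings containing an even number of 1s:

```python
def run_tm(transitions, accept, tape, state, max_steps=10_000):
    """Simulate a one-tape Turing machine. `transitions` maps
    (state, symbol) -> (new_state, symbol_to_write, move), where move is +1
    or -1 and "_" stands for a blank cell. Returns True if the machine
    reaches the accepting state, False if it halts without accepting, and
    None if it is still running after max_steps (it may be looping forever)."""
    cells = dict(enumerate(tape))
    head = 0
    for _ in range(max_steps):
        if state == accept:
            return True
        key = (state, cells.get(head, "_"))
        if key not in transitions:
            return False  # no applicable rule: halt and reject
        state, cells[head], move = transitions[key]
        head += move
    return None

# A machine that sweeps right, tracking the parity of the 1s seen so far,
# and accepts at the end of the tape if that parity is even:
parity = {
    ("even", "0"): ("even", "0", +1),
    ("even", "1"): ("odd",  "1", +1),
    ("odd",  "0"): ("odd",  "0", +1),
    ("odd",  "1"): ("even", "1", +1),
    ("even", "_"): ("accept", "_", +1),
}
print(run_tm(parity, "accept", "1001", "even"))  # True: two 1s
print(run_tm(parity, "accept", "1011", "even"))  # False: three 1s
```

Using a dictionary for the tape is one way to fake the infinite tape: any cell never written to reads as blank.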

The Halting Problem

With such nice properties as robustness and universality, we might wonder whether there is anything these Turing machines can’t do. Well, there is! The Halting Problem asks whether we can do just a little bit better than a universal Turing machine and not only emulate input code but also predict its behavior. Specifically, it asks if there is a Turing machine which, given as input another machine and its input, tells us whether that machine will eventually halt (either accepting or rejecting) or will get stuck in a loop and run forever. It turns out that such a machine cannot exist.

Indeed, imagine that we had such a machine M that takes as input another machine N and some input t and accepts if N halts on t but rejects if N runs forever given input t. Then we can define an evil machine X that runs on input t as follows:

        if M accepts on input of (t,t) then run forever
        otherwise accept

How does M behave given an input of (X,X)? If it accepts, that means the first line of the code above is true and the evil machine X runs forever given an input of itself, which means that actually M should have rejected! Conversely, if M rejects (X,X), then X halts on an input of X and so M should have accepted! Either way we are led to a contradiction and so such a machine M cannot exist.
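The same diagonal trick can be sketched in Python (a conceptual illustration of mine, not a proof; the input t is dropped for brevity): hand any claimed halting decider to the function below and it constructs a program the decider must misjudge.

```python
def make_contrary(halts):
    """Given any claimed halting decider `halts(program)`, build the evil
    program X: if the decider says X halts, X loops forever; if the decider
    says X loops, X halts immediately. Either way, `halts` is wrong about X."""
    def X():
        if halts(X):
            while True:   # decider said "halts", so run forever
                pass
        # decider said "runs forever", so halt right away
    return X

# Try it on a decider that claims every program runs forever:
def pessimist(program):
    return False

X = make_contrary(pessimist)
X()  # returns immediately, so X halts -- and pessimist was wrong about X
```

No matter how clever the decider, `make_contrary` manufactures a counterexample from the decider itself, which is exactly the self-reference at the heart of the proof.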


The Halting Problem gives us an example of a decision problem that is not decidable by a Turing machine. The general consensus is that this resolves the Entscheidungsproblem in the negative: it shows that certain statements formalizable in symbolic logic are not decidable by an algorithm. The Church-Turing Thesis is the claim that the Turing machine accurately captures all that is resolvable by mechanical procedures. Some philosophers have taken this thesis much further, claiming that a Turing machine could simulate any physical system; thus, in particular, human brains are ultimately describable by Turing machines, and fundamental limitations like the existence of undecidable problems apply to the way we think. Others have taken the opposite position and contend that precisely because we are able to recognize undecidability (something that a Turing machine cannot), human brains are not reducible to computers and are in some way transcendent. I’m not going to weigh in on this debate, but I will point out that it shows the importance of communicating mathematical results to philosophers, because it is a debate that only philosophers can have; working mathematicians are not engaged in speculation about the limitations of human knowledge.

Again, I want to stress that the Halting Problem is about what kinds of mathematical problems are resolvable in principle. Of course many problems may be difficult to resolve in practice (such as for instance the computation of Ramsey numbers), but the Halting Problem says that even given unlimited resources there are just some problems I cannot solve by mechanical means. This is indeed a startling result. Hilbert, at least until he saw the results of Gödel and Turing, believed that all of mathematics could be resolved by computation. But it is important also to recognize that the kinds of problems that are not decidable by computers are weird and often, like the Halting Problem, hinge crucially on self-reference. Practical problems like finding optimal paths from one place to another, which tend to always be decidable, will be the subject of our next blog post.

What this blog is

This blog is my attempt to communicate interesting results from mathematics to those who like ideas but are scared of numbers. It’s important that mathematicians reach out to the wider intellectual community; some mathematicians might claim that the required technical expertise is what keeps their field insulated, but I think a simpler explanation is how little mathematicians advertise what’s cool about modern math. Keith Devlin, a popular math writer whose books are universally excellent, points out that while newspapers generally have no trouble explaining medical research or even many ideas from theoretical physics, they are at a complete loss when it comes to math research. Indeed, many people are surprised that there is such a thing as math research; when I tell them that what I do most of the day is “think about 7 and 18”, their quizzical looks indicate that they don’t realize I’m joking. As Devlin has also remarked, since 1960 we have entered a new mathematical Golden Age, marked by deep insights into long-standing problems. This Golden Age has a number of different roots, but one primary one is the emergence of computer science (which I will not distinguish from math). As computers come to occupy a more and more central place in our society, it is all the more important that mathematicians explain the theory behind these mysterious devices.

What this blog is not

This blog is not about the “philosophy of mathematics”, at least not as that term is traditionally used by analytic philosophers. Philosophy of mathematics attempts to answer questions like, “Do numbers exist?” and, “Are mathematical statements literally true or false?” Various schools have emerged in response to these questions. Platonists contend that the objects of mathematical discourse are real, abstract entities that exist independent of human experience. Intuitionists think that all of mathematics is ultimately mental construction. Formalists would deflect the questions a bit and say that mathematics is only the correct manipulation of certain formulas. Fictionalists think mathematics is a kind of grand fiction: a (convincing) story that mathematicians tell one another, but one which is ultimately untrue. I’m being very simplistic in my treatment of these different positions, but that’s because I think these questions are only interesting to those who already believe math itself is very interesting, a belief I am not assuming in my audience.

This blog is also not about the application of mathematics to other scientific fields. So while the recent advances in the application of statistics and computing to biology are remarkable, I will not comment on these at all. This is not meant as elitism; it’s because no one disputes that math is a useful tool. What’s at issue is whether math is independently interesting. Again, I’m hedging a bit here because I am including theoretical computer science within math where not everyone would. But theoretical computer science is formal and distanced from the material world in the same way as math and these are the properties that make math intimidating and exciting, so I feel okay treating them as synonymous.

This blog is more interested in results like Gödel’s incompleteness theorems, which say, roughly, that no consistent formal theory can prove its own consistency. Gödel’s theorems are relatively well-known even among non-experts, and philosophers have drawn from them some wild conclusions about the limitations of human knowledge. However, I won’t be talking directly about Gödel’s theorems, both because I think a lot of these philosophical conclusions are overblown and because the theorems are too technical to treat here with any justice. In a future blog post I plan to discuss the halting problem, which is morally very close to Gödel’s theorems and is much easier to explain informally.

Ramsey Theory

“Complete disorder is impossible” – Theodore Motzkin

But before I address the halting problem, I thought I’d warm up with a problem that is on the one hand completely elementary and on the other quite deep.

Imagine you are at a party of six people, yourself included. Then there is some group of three who are all mutual strangers or who are all mutually acquainted.

To see this, look from your own perspective. Break up the five others into two groups: those you know and those you don’t. The bigger of these two groups has at least three people in it; let’s say you know all three (the other case is symmetric). Then either two of them know each other, and you with those two form a trio of acquaintances, or the three all do not know one another and they themselves form a trio of non-acquaintances. Lovely!

If this party only had five people, we couldn’t conclude that there was such a trio, as the following graph demonstrates:

[Image: the complete graph on five vertices, with the outer pentagon in one color and the inner pentagram in the other]

A graph is a collection of vertices (black dots) and edges (lines between them). This graph is called complete because there is an edge between every pair of vertices. Here the vertices represent people and the edges the relationships between them. The edges have been colored red and blue to denote whether the two people are acquaintances or strangers. As you can check, no three vertices of this graph give you a triangle where all the edges are the same color.
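Both facts are small enough to verify by brute force; here is a quick script of my own that checks every red/blue coloring:

```python
from itertools import combinations, product

def has_monochromatic_triangle(n, coloring):
    """`coloring` assigns 0 (red) or 1 (blue) to each edge of the complete
    graph on n vertices, listed in combinations order."""
    color = dict(zip(combinations(range(n), 2), coloring))
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_triangle(n):
    num_edges = n * (n - 1) // 2
    return all(has_monochromatic_triangle(n, c)
               for c in product((0, 1), repeat=num_edges))

print(every_coloring_has_triangle(6))  # True: any party of six has a trio
print(every_coloring_has_triangle(5))  # False: the pentagon/pentagram coloring escapes
```

Checking six people means examining all 2^15 = 32768 colorings of the fifteen edges, which a computer does in moments; as we will see below, this kind of exhaustive search stops being an option very quickly.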

What’s amazing about this result is that it extends to as big a group of acquaintances or strangers as we’d like. Suppose we look for a group of four instead of three. Then some slightly more advanced local analysis like that above shows that eighteen people at a party is always enough. On the other hand, there is an edge-coloring of the complete graph on seventeen vertices that, like the pentagram above, shows that seventeen people will just not do. Already the result for four people can lead to some startling conclusions about the emergence of order in random groups:

In the course of an examination of friendship between children some fifty years ago, the Hungarian sociologist Sandor Szalai observed that among any group of about twenty children he checked he could always find four children any two of whom were friends, or else four children no two of whom were friends. Despite the temptation to try to draw sociological conclusions, Szalai realized that this might well be a mathematical phenomenon rather than sociological one. Indeed, a brief discussion with the mathematicians Erdös, Turán, and Sós convinced him this was the case. – Princeton Companion to Mathematics, p. 562

The nth Ramsey number, R(n), denotes the smallest number k such that for any edge-coloring of the complete graph on k vertices with the colors red and blue, there is always some set of n vertices whose connecting edges all have the same color. Ramsey’s theorem says that Ramsey numbers always exist: for any n, the value R(n) is finite. The proof is a straightforward application of mathematical induction, where a bigger case is repeatedly reduced to a smaller case until the statement is trivial. Above we showed that R(3) = 6 and R(4) = 18. Incredibly, the exact value of R(5) is unknown! Figuring out this number would be a huge breakthrough. Although in principle finding R(5) involves only a finite number of computations, because for any fixed graph there are only finitely many ways to color its edges with two colors, the number of colorings to check grows exponentially in the number of edges, far beyond even our fastest supercomputers. Paul Erdős, the 20th century’s most prolific and eccentric mathematician, had this to say about the computation of Ramsey numbers:
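To see the scale of the brute-force search (my own arithmetic): the complete graph on k vertices has k(k−1)/2 edges, each independently red or blue.

```python
import math

def num_colorings(k):
    # each of the k*(k-1)/2 edges is independently red or blue
    return 2 ** (k * (k - 1) // 2)

print(num_colorings(6))  # 32768: checkable in moments
# For a graph in the size range relevant to R(5), the count is astronomical --
# around 10^298 colorings, far more than the ~10^80 atoms in the universe:
print(round(math.log10(num_colorings(45))))  # 298
```

This is why "only finitely many cases" is cold comfort: the number of cases explodes long before the answer is reached.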

“Erdős asks us to imagine an alien force, vastly more powerful than us, landing on Earth and demanding the value of R(5) or they will destroy our planet. In that case, he claims, we should marshal all our computers and all our mathematicians and attempt to find the value. But suppose, instead, that they ask for R(6). In that case, he believes, we should attempt to destroy the aliens.” – Joel Spencer

Ramsey’s theorem has two important philosophical consequences. The first is that, as Motzkin puts it, order emerges in large structures even if those structures are allowed to be as random and apparently disordered as possible. Ramsey theory is a branch of mathematics concerned with finding similar results in other structures, like sequences of numbers, points on lattices, or even infinite collections of objects. It shows that this phenomenon of emergent order is universal across a wide range of structures. As the case of the Hungarian sociologist shows, the order can be so striking that it might prompt us to look for external explanations!

But secondly, the fact that the fifth Ramsey number remains unknown tells us that computational complexity, the amount of resources like time and space required to carry out a mathematical procedure, is intimately linked to knowledge. It took me only a few paragraphs to explain the problem, and I assumed absolutely no background. That a puzzle so simply stated, and in principle so “easily” resolved, remains an important open problem tells us that devising efficient procedures to decide mathematical statements is not just a practically important but also a philosophically important issue. Scott Aaronson has an excellent paper about many fascinating aspects of computational complexity; Aaronson’s paper is the reading for our next blog post.

