Wednesday, May 1, 2013

Scones, Logical Relations, and Parametricity

Scones, Logical Relations, and Parametricity:
MathML-enabled post (click for more details).

Consider a formal system, such as logic, type theory, or some programming language. Suppose you want to prove some “meta-theoretic property” of this system, such as the following.

  • Consistency: There is no proof of false. This is obviously a desirable property of a formal system.
  • The existence property: if it is provable that “there exists an x such that P(x)”, then there is some particular t such that P(t) is provable.
  • The disjunction property: if “P or Q” is provable, then either P is provable or Q is provable. The existence and disjunction properties are not true of all formal systems: they are characteristic of constructive logics. (Excluded middle obviously violates the disjunction property, while the axiom of choice obviously violates the existence property.) In fact, arguably they are a defining feature of constructive logics, where proving that something exists ought to involve “constructing” a particular such thing.
  • Canonicity: any term of natural-number type evaluates to (or, is provably equal to) some numeral (and various similar statements). This is an important property for any system that can function as a programming language: we want our programs to compute results!
  • Parametricity: I’ll explain this later on.

How might you go about proving properties like this? If you’re a syntactically minded type theorist, you might try to use induction over types. That is, you first prove something for “base types” such as the natural numbers N, then prove that if it holds for A and B then it holds for derived types such as A×B, A+B, and A→B, etc. After a lot of work reformulating the thing you’re trying to prove as a sufficiently general statement for such an induction to go through, you finally make it all work and write QED. Then, because the reformulation you ended up with involves assigning “relations” (sometimes unary, sometimes binary) to types in an inductive way, you call the method logical relations.

I’ve gathered that this sort of thing is perfectly intuitive to a type theorist. However, when I first encountered it, being a category theorist, I had a very hard time understanding it. Certain steps seemed extremely unmotivated, and I didn’t see any principled reason for what was going on. I think I understand the type theorists’ point of view a little better now — but fortunately there’s another way to structure the whole argument, which I think makes much more sense to a category theorist.

MathML-enabled post (click for more details).

As category theorists, we like to use universal properties to avoid doing work. (Or, more precisely, we like to use them to package up work into an easily reusable theorem, so that we only have to do the work once.) Fortunately, the syntax of type theory has a nice universal property: it constructs free categories with specified structure.

A little more precisely, from any type theory T, we can build a category Syn T whose objects are, roughly, the types in the theory, and whose morphisms are terms with free variables. That is, a morphism A→B is a term of type B involving a free variable of type A. Some more details, and a few examples, can be found at the nLab page on syntactic categories.

Now whatever structure our type theory T had, it is usually visible as some categorical property of the syntactic category Syn T. For instance, if T had product types, then Syn T has categorical products. If T also had function types, then Syn T is cartesian closed. And so on. Moreover, Syn T is the initial such category: for any other category C with the appropriate categorical structure, there is an essentially unique functor Syn T→C preserving that structure. (I’m sweeping a lot of details under the rug here, having to do with weakness and coherence and strictification and what categorical level this universal property lives at, but the basic idea is the important thing.)

Now the question becomes, how can we use this universal property to prove meta-theoretic properties of the type theory? Clearly, we are going to have to cook up some particular structured category C (or perhaps more than one) so that the existence of a structure-preserving functor Syn T→C tells us something about T. Thus, the construction of C will have to involve T somehow—but C must also be different from Syn T. Moreover, we should expect to have to use the uniqueness aspect of the universal property too.

When I first wrote the above paragraph, I said that we were going to have to cleverly cook up some category C. Then I felt embarrassed, because I remembered Tom’s comment that “if cleverness is the first quality that comes to mind, then it suggests to me that something is insufficiently understood.” So I thought for a bit, and now I can say more about where this C comes from. (I can actually say even more than I’m about to right now, but the rest will have to wait for another post.)

Let’s take a clue from type theory itself. Consider a type like N, the natural numbers type. Semantically, we want to think of this as a natural numbers object, which is an initial object in some category. More specifically, for any X equipped with morphisms z:1→X and s:X→X, there is supposed to be a unique morphism N→X respecting this structure.

Inside of type theory, the existence of such a morphism N→X is easy to specify: it’s called recursion and is the natural “elimination rule” for the type N. But uniqueness of that morphism is not as natural, type-theoretically: it would be asserting an equality, which is less “computational”. Instead, we generalize the recursion rule to say that not only does every such X come with a morphism out of N, every such X over N comes with a section. This turns out to be equivalent to asking N to be initial. For if N is initial, then when X is over N, the map N→X must be a section (by uniqueness of the composite N→X→N); and conversely, for any two morphisms N⇉X, if their equalizer admits a section, then they are equal. A homotopical version of this equivalence was proven recently by Awodey, Gambino, and Sojakova.

(You may wonder whether talking about “sections” is any better than talking about equality, since f being a section of p means that pf=1. However, dependent type theory has a way to talk about sections of particular kinds of maps—called “display maps” or “dependent projections” or “fibrations”—without referring explicitly to any kind of equality.)

This suggests that to use the uniqueness as well as the existence part of the universal property of Syn T, we might look for a structured category C over Syn T, for which it would be interesting to have a section. Moreover, we might expect the functor C→Syn T to be some sort of category-theoretic fibration.

Now how can we build an interesting fibration over Syn T, without knowing much about Syn T itself? Well, if we had a functor Γ:Syn T→D for some other category D, and some other functor G:E→D, then we could build the comma category of Γ over or under G. The most obvious choice of G, of course, would be the identity functor of D. But what sort of functor Γ:Syn T→D can we build without knowing much about Syn T? Well, as Tom also reminded us last year, about the only functors out of an arbitrary category that we can even mention are the representables, hom(x,−):Syn T→Set. And finally, what’s the most canonical object x∈Syn T? One very canonical object is the terminal one.

There are still a few choices that we might make differently, but I hope I’ve convinced you that at least one reasonable possibility to consider for C is the scone of Syn T: the comma category of Id Set over the global sections functor Γ=hom(1,−):Syn T→Set. Explicitly, an object of C=scn(Syn T) consists of an object A∈Syn T together with a function A′→ΓA. Elements of ΓA=hom(1,A) are called global sections of A, so we can view A′ as a family of sets indexed by the global sections of A. Similarly, a morphism from (A,A′) to (B,B′) consists of a morphism f:A→B in Syn T together with a function f′:A′→B′ making the evident square commute. The projection scn(Syn T)→Syn T simply forgets the primed data.

Over a year ago, I talked about scones from a “geometric” point of view. At that time, I said that given a “forgetful functor” U:S→Set, we can think of the scone as a “concretization” of S: its objects are “sets equipped with S-structure”. It’s at least interesting to ponder this in our current situation: the objects of scn(Syn T) are “sets equipped with the structure of a type”. I’m not really sure what that means, but it sounds profound.

Anyway, in order to apply the universal property of Syn T to get a functor R:Syn T→scn(Syn T), we’ll need to know that scn(Syn T) inherits all the categorical structure that Syn T has. And to conclude that this map is a section of the projection, we’ll need to know that the projection also preserves all this structure. But suppose we take these two facts as given for a moment, so that R exists and is a section of the projection. That last means that for A∈Syn T, R(A) is the object A itself equipped with a family of its global sections. Thus, what R does is just to equip every object of Syn T with a family of sets indexed by its global sections—that is, a map R′(A)→A—in a way which is functorial and structure-preserving.

In particular, insofar as the maps R′(A)→A are injective (as often happens), what R does is assign to each type a predicate—that is, a unary relation—on its set of global sections. And a global section of a type A in Syn T in just a “closed term” of type A (meaning a term involving no free variables). Thus, in general we may think of the elements of R′(A) over a:A as “witnesses” or “proofs” that some relation “R(a)” holds.

Finally, if we look “under the hood” into the actual construction of R, we see that it’s built inductively over the structure of types (since this is how we prove the universal property of Syn T). Thus, these unary relations R′(A)↪A are precisely the syntactically-minded type theorist’s “logical relations”. What category theory does for us is (1) fold all inductions over the structure of types into the single statement that Syn T is initial in some category, so that we only have to do it once; and even more importantly, (2) tell us automatically what each inductive step in the construction of R′ should look like: it’s determined by the corresponding categorical structure in scn(Syn T), which in turn is characterized by a universal property.

In order to get any mileage out of this, of course, we need an explicit description of how scn(Syn T) inherits structure from Syn T and from Set, in a way that is preserved by the forgetful functor scn(Syn T)→Syn T. This is basically always completely straightforward. Here are some easy examples that you can check for yourself:

  • The terminal object of scn(Syn T) is 1→1≅Γ(1).
  • The initial object of scn(Syn T) is ∅→Γ(0).
  • The product of A′→Γ(A) and B′→Γ(B) is A′×B′→Γ(A)×Γ(B)≅Γ(A×B). In other words, a witness of (a,b):A×B is just a witness of a:A together with a witness of b:B.
  • A morphism (f,f′):(A,A′)→(B,B′) is a monomorphism just when both f and f′ are so. In particular, a subobject of A′→Γ(A) is given by a subobject m:B↪A in Syn T, together with for each b∈Γ(B), a subset B′(b)⊆A′(m(b)).
  • If T includes (“truncated”) existential quantification, so that Syn T is a regular category, then (f,f′) is a regular epi just when both f and f′ are so.
  • If T includes binary unions of subobjects, then the union of (B,B′)↪(A,A′) and (C,C′)↪(A,A′) is (B∪C,B′∪C′).
  • If T includes a type N of natural numbers, then the natural numbers object of scn(Syn T) is ℕ→Γ(N), where ℕ is the ordinary set of natural numbers and the function sends each n to the numeral n̲. In other words, a witness of t:N is a natural number n such that t is equal to the numeral n̲.

This is enough to prove consistency, canonicity, and the disjunction and existence properties! Consistency is the easiest: if there were a term of type false, it would be a morphism 1→0 in Syn T. But then R would take this to a morphism R(1)→R(0). Since R preserves all the categorical structure, R(1) is the terminal object 1→Γ(1) and R(0) is the initial object ∅→Γ(0). But then we would have a morphism 1→∅ in Set, which is impossible. (Of course, this is a relative consistency result, as always – if our axioms for Set were inconsistent, then there would be a morphism 1→∅.)

For the existence property, suppose we can prove ∃(x:A).P(x). That means exactly that the composite P↪A→1 in Syn T is a regular epi. Since R preserves the relevant categorical structure in all cases, in this case it preserves regular epis. It follows that (P,R′(P))→(1,R′(1)) is regular epi, and in particular R′(P)→R′(1) is a surjective function. But R also preserves the terminal object, so R′(1)=1, and hence R′(P) must be nonempty. Therefore, Γ(P) is nonempty, so P has a global element in Syn T. In other words, there is a closed term t:A such that P(t) is provable. The argument for the disjunction property is similar; I’ll leave it as an exercise. (I think Peter Freyd was the first to prove these two properties categorically in this way.)

Finally, for canonicity, suppose t:N is a term of natural-number type, hence a global section t:1→N in Syn T. Then R(t) is a morphism (1,1)→(N,ℕ) in scn(Syn T), since R preserves the terminal object and the natural numbers object. But ℕ is the usual set of natural numbers, so this means that t must be equal to some numeral (namely, the primed part of R(t)).

I think this much is already pretty awesome, but now we get to parametricity. For this we need to know, first of all, what exponentials look like in scn(Syn T). Of course, they must be preserved by the forgetful functor, so the underlying Syn T-object of (B,B′) (A,A′) is the exponential B A in Syn T. But now the elements of the set (B A)′ which lie over f:1→B A must be equivalent to maps 1→(B,B′) (A,A′) in scn(Syn T) lying over f:1→B A in Syn T, and hence equivalently to maps (A,A′)→(B,B′) lying over f:A→B. In other words, if f:A→B is a morphism in Syn T, hence a global element of B A, then the fiber of (B A)′→(B A) over f consists of all functions A′→B′ lifting Γ(f).

In particular, if A′→Γ(A) and B′→Γ(B) are monic, hence merely predicates on the global sections of A and B respectively, then (B A)′ is also a predicate on global sections of B A, which holds of f:A→B precisely when Γ(f) maps A′ into B′. In other words, a witness of a function f is a function exhibiting the fact that f preserves witnesses. (This isn’t a proof that scn(Syn T) has exponentials, of course, only a characterization of what they must be if they exist. But as usual with this sort of thing, it’s easy to prove that this definition works.)

More generally, we’ll need local exponentials (exponentials in slice categories) and dependent products, but these work in essentially the same way: to every “function term” we assign the set of “liftings” of it to witnesses.

The other thing we need to know about is universe objects. Thus, suppose Type is a type of (small) types, giving a universe object in Syn T. In particular, we have the type Type˜=∑ X:TypeX of pointed types, with the first projection Type˜→Type being the “universal fibration” in Syn T. That is, for any type A, dependent types over A are classified by maps A→Type, with the resulting display map B→A being the corresponding pullback of Type˜→Type.

Now the first thing we need is a family of sets Type′→Γ(Type) indexed by the global sections of Type. Of course, a global section of Type is just a type dependent over 1, which is to say a type defined in the empty context. Now given such an A, a witness of A must correspond to a map (1,1)→(Type,Type′) in scn(Syn T) lifting A:1→Type in Syn T. But if (Type,Type′) is to be a universe object, such maps must be equivalent to objects of scn(Syn T) lying over A. In other words, a witness of a type A:Type is any lifting of A to an object of scn(Syn T). Any lifting at all.

In particular, since R:Syn T→scn(Syn T) must preserve all structure, we must have R(Type)=(Type,Type′). Note that this is not merely a unary relation on Γ(Type): a given type has many witnesses.

We also need the universal fibration (Type˜,Type˜′)→(Type,Type′). But this is likewise easy: suppose A′→Γ(A) is an object of scn(Syn T), hence a global element of (Type,Type′), and suppose also that a:1→A is a global section of A. Then we must have a commutative square of sets

Type˜′ → Γ(Type˜) ↓ ↓ Type′ → Γ(Type).

Elements of Type˜′ lying over (A′,a)∈Type′× Γ(Type)Γ(Type˜) should be the same as liftings of (A,A′):1→(Type,Type′) to (Type˜,Type˜′) which agree with a:1→A. But since the object (A,A′) must be the pullback of (Type˜,Type˜′) along its classifying map 1→(Type,Type′), this means precisely to give an element of A′ lying over a∈Γ(A).

That is, for a type A:Type with a witness A′→Γ(A), a witness of a:A is just an element of A′ over a∈Γ(A), i.e. a witness of a:A in the ordinary sense defined above.

Finally, we’re ready for parametricity. Consider the type ∏ X:Type(X→X), an inhabitant of which is a function assigning to every (small) type X an endomorphism of X. In classical mathematics, there are lots and lots of such functions. However, in constructive type theory, there’s essentially only one such function: the one which assigns to each X its identity function! The point is that without some additional assumption such as excluded middle or AC, we can’t write down such a function that behaves differently depending on what X is: e.g. we can’t say “if X has exactly 2 elements, then swap them, otherwise, …”. (I believe this is the origin of the word “parametricity” — we say the function has to be defined parametrically in X.) And if you just have an arbitrary X, without knowing anything about it, then as Tom would say, what’s the only function X→X that you can even mention? The identity function.

Of course, you can throw other garbage into the definition of such a function, but it won’t be able to change the fact that what comes out at the end of the day acts like the identity function. This is the theorem that we can now prove using the scone.

Categorically, the type ∏ X:Type(X→X) is built as the local exponential (Type˜→Type) (Type˜→Type) in the slice category of Syn T over Type, followed by the dependent product from this slice category to Syn T itself. And all these constructions must be preserved by R. Unraveling this, the first step gives us an object of scn(Syn T) over (Type,Type′), whose component in Syn T is the dependent type (X:Type)⊢(X→X):Type. The additional data assigned by R consists of

  • for every A:Type (this is a global section of Type in Syn T), and
  • for every choice of a lift A′→Γ(A) of A to scn(Syn T) (this is a witness of A:Type), and
  • for every term f:A→A (this is a global section of (Type˜→Type) (Type˜→Type) lying over the classifying map of A),

the set of liftings of f to witnesses. Note that these witnesses are precisely given by the specified A′→Γ(A) above. If we think only about relations, then at this stage, R is recording, for every term f:A→A, whether it preserves each unary relation on Γ(A).

Now at the second step, we consider the liftings of each g:∏ X:Type(X→X) to the witnesses defined above. In other words, a witness of such a g consists of, for every A:Type and every A′→Γ(A), a lifting of g A:A→A to these witnesses. In particular, this means that the function Γ(g A):Γ(A)→Γ(A) preserves every unary relation on the set Γ(A). Clearly this is only possible if Γ(g A) is the identity function (consider the characteristic relations of singletons).

Finally, any actual term g:∏ X:Type(X→X) yields a global section 1→∏ X:Type(X→X) in Syn T, hence a global section of R(∏ X:Type(X→X)), and hence a witness of g in the above sense. This is a precise form of the first parametricity theorem: any term of type ∏ X:Type(X→X) behaves like the identity function when applied to terms in the empty context.

What is the point of all this? I think the example of universe objects is where the category-theoretic point of view really shines. When you do “logical relations” from the syntactic type-theory point of view, you basically have to make up each inductive clause of the definition of the logical relation R as you go. The definition for function types is not too surprising, if you think about the unary relation R as a sort of “well-behavedness”: it’s not unreasonable to say that a function is well-behaved if it maps well-behaved inputs to well-behaved outputs. (I believe the usual word for “well-behaved” in this context is “reducible”.) But I think the definition for a universe is pretty opaque from this point of view: a witness to reducibility of a type is any relation on its global elements at all?

Moreover, if you insist on thinking of R as merely a relation on global elements, then you can’t even phrase it like this. Basic presentations of parametricity don’t actually include a universe object; instead they work in a “polymorphic type theory”, where you can form types that behave like “∏ X:Type(X→X)” but which are not literally dependent products over any type “Type”. Then you basically have to make up the combined rule for defining R on these types as above: g:∏ X:Type(X→X) is reducible if for every type A, the function Γ(g A) preserves every unary relation on Γ(A).

Let me end with a few words about generalizations. There are lots of other “parametricity” theorems – have a look at this classic paper. For instance, for any definable functors F and G, any term of type ∏ X:Type(F(X)→G(X)) must be a natural transformation. To prove this, you need not just unary logical relations but binary logical relations. But these is easy to do categorically too: instead of the scone of Syn T, we consider the category whose objects consist of an A∈Syn T, a set A′, and a function A′→Γ(A)×Γ(A). This category inherits all the relevant structure in essentially the same way.

The most general context that I know of in which to do this is the colax limit of a (pseudo)functor C:I op→Cat, where I is an inverse category. If I is the interval category 1→0 and C picks out Γ:Syn T→Set, then this colax limit is the scone. If I is the parallel pair 1⇉0 and C picks out Γ twice, then this colax limit is the place for binary relations I mentioned above. We obviously have n-ary relations for all n, and also potentially “higher” logical relations. (Are there any uses for those?)

The assumption that I is inverse is what gives things like exponentials and universe objects such a nice form in the colax limit. I made use of that same fact in this paper, focusing mainly on the special case of colax limits of constant functors C:I op→Cat — which is the same as categories of diagrams over I in some fixed category C. But I recently realized that the same techniques work for non-constant functors, so that the scones work homotopically as well (up to a point), and in particular, inherit things like the univalence axiom.

In principle, this means that we can extend all the above theorems to type theory with univalence. Canonicity in the presence of univalence, in particular, has been an important open question since its introduction: does the univalence axiom introduce new closed terms of natural number type whose numeric value isn’t determined? Dan Licata and Bob Harper showed that the answer is essentially no in the 1-truncated case, by adding definitional equality rules to make the type theory 1-truncated, and then giving a logical relations argument to show canonicity. The gluing approach, on the other hand, doesn’t modify the underlying type theory, but yields instead a homotopical answer: every term of natural number type is homotopic to a numeral. (That this is true was conjectured by Voevodsky when he introduced the univalence axiom.)

Unfortunately, the gluing approach also suffers from a coherence problem. In a homotopical setting, we’d like to glue along a “global sections” functor valued in simplicial sets (or some other model for ∞Gpd). However, it seems hard to obtain a strict functor of this sort. We do have global sections functors into Set or Gpd, and since Set and Gpd contain one and two univalent universes, respectively, we can derive homotopy canonicity for 1-truncated type theory (in which all types are 1-types) with one or two univalent universes. (The groupoid scone was also considered, without univalence, in this paper by Hofstra and Warren.) The new version of my paper (posted today) describes this; it’s an open question how any of these results could be extended to type theories without a truncation axiom, and with arbitrarily many univalent universes.

Acknowledgments: this post owes a lot to Bob Harper, from whom I first learned about logical relations at OPLSS; to Peter Lumsdaine, who first tried to tell me what they had to do with scones; and to Thierry Coquand, whose talk at IAS last term about regarding setoids as truncated semi-simplicial types finally crystallized for me the relation between inverse diagrams and logical relations.


DIGITAL JUICE

No comments:

Post a Comment

Thank's!