Why is servant a type-level DSL?


This post is an attempt at explaining servant’s design as an embedded domain specific language, and particularly why it had to be a type-level domain specific language, given our requirements. Along the way, we will discuss approaches for designing extensible EDSLs in Haskell and see why other simpler approaches just can’t meet the said requirements.

It all started with a problem

Back in 2014, Sönke Hahn, Julian Arni and myself were working together in “the Haskell team” at Zalora on all sorts of projects. Many of them involved serving web applications, querying external APIs or our own services from Haskell, PHP, JS and probably a few other languages. At the time, we were using a few of the well established “web frameworks”, among which scotty, whenever we had to offer some service over HTTP.

However, writing all those functions for hitting our own webservices was a lot of manual, error-prone, tedious work. The bigger web applications got, the more tedious it became. And it had to be done once per language in which we wanted to hit the application. This could not continue.

For reference, this is what a simple scotty application looks like:

How could we somewhat automate the creation of one client function per endpoint of the web application? In an ideal world, we would just show this application to some program or library and it would collect all the data it needs about the overall structure of the application from the code itself, in order to produce 2 client functions:

which would do all the hard work of preparing an HTTP request for us, even taking care of JSON encoding and decoding for us. But… the entire structure of the application is just hidden in the do block and we just cannot programmatically access it.

So… this is not realistically doable. We clearly need to change (a little bit? a lot?) the way we write our applications, making sure we get a description of the web’s application structure (the endpoints, the part of the request they use or depend on, what they return) that we could then hand over to something, which would get us our client functions.

We will now try implementating such a web application description DSL in the most straightforward way possible.

Note: We could define a DSL with an interpreter that spits out Haskell code for us (through Template Haskell or another mechanism). The result would be typed, but the process would not be. The translation from the DSL to the result would be untyped code, therefore easy to get wrong. As Haskell programmers, we prefer static type checking where possible. This therefore excludes Template Haskell and other code generation mechanisms.

A first, non-modular attempt

We want to produce client functions that look like the ones above, that prepare and send HTTP requests for us by taking some pieces of data given as arguments to those functions and encoding then storing them in the right places of the request (request path for URL captures, request body, headers, etc). Let’s perhaps start simple with a data type for describing an endpoint that can be served under some path (which can contain static string fragments and captures), for a given http method, ignoring everything else for now.

It could look like this:

and, if we want it to look a little more “servant-y”, we can define:

Unlike servant though, as you can see with the type of getHello and getHelloNew, our descriptions are good old Haskell values, both of the Endpoint type.

Given those few definitions, how could we go about, say, generating links to endpoints? Well, here is a straightforward attempt.

But… what should we put in place of those ???, if anything?

Well, we definitely want to add some path component, to fill the Capture slot. However, by definition, a captured path fragment is not fixed, it is allowed to vary. In other words, Capture :> Verb Post matches both POST /x and POST /y. We cannot just pick one value and hope that it is the one the user wanted. We need to take it as an argument. But what about Capture :> Capture :> Verb Post? We would need our linkTo function to take 2 arguments for that case. And zero additional argument for Static "hello" :> Verb Post. This is quite problematic.

Indeed, we would like the type of linkTo to be Endpoint -> Link, Endpoint -> String -> Link, Endpoint -> String -> String -> Link and so on depending on what the Endpoint argument is. In other words, we want the return type of linkTo (when really seen as a function of one argument, which it is anyway) to depend on the value of type Endpoint it gets as input. That is, we want a type that depends on a value, i.e dependent types.

The alternative would be:

This solution is very unsatisfactory, first and foremost because it is possible for it to error out if we don’t supply the right number of captures. However, it will also silently let us pass too many capture values without telling us that some of them are not used. Such a function should be total, we really don’t want an implementation that can let us down if we’re not very careful.

Fortunately, GADTs can help here. We could turn Endpoint into a GADT that tracks captures and then use some type-level computations to get the type of the link-making function from our list of captures, as well as define the link making function through typeclass instances that would go through the captures and add an argument for each of them. Request bodies, query parameters, headers? We could probably track them too, in a similar way. Or we could unify it all by basically building up and tracking actual servant API types through a GADT version of Endpoint’s type argument, and do some of what servant does at the type-level, with everything else at the value-level.

However, all those approaches have a big problem. Once you’ve made a decision, it is set in stone, in a way. Indeed, in all these approaches, if you want to extend your web app description language, you need to add a new constructor to your Endpoint type. You then need to handle this new constructor in all the functions that pattern match on Endpoint constructors since they’ve instantly become partial. Not to mention that just the act of adding a constructor requires rebuilding the entire library. You cannot explore two different directions simultaneously without breaking code, you cannot add new constructs you hadn’t thought of without touching the library, e.g just locally in a project of yours. Extensibility and modularity were central requirements as we had been bitten by the lack of them in libraries that we were using at the time.

Note that a GADT-based approach would work well (in addition to being more approachable) for very stable domains, and is not considered here because of the kind of flexibility we are asking for.

So… how do people build extensible/modular DSLs in Haskell? The next section talks about the general problem behind this and a solution that I read about that gets us halfway to servant.

The Expression Problem

To quote Phil Wadler: “the expression problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts)”.

In Haskell, the standard approach to representing some domain is to define an algebraic data type for it. For a simple type of expressions with additions and integers, we usually do:

and proceed to write what we call “interpreters”, which in this case are just functions that take expressions as input and do something interesting with them.

So, given an expression type, we can easily “add new functions over the data type”, to reuse Phil’s wording. We just write a new function. However, when the time comes to “add new cases to the data type”, this approach becomes painful. A “new case” here means a new constructor for our Expr data type. Let’s say we want to support multiplications too:

Now, we have to modify every single function that patterns matches on an Expr to handle the Mul constructor, including our eval and prettyPrint “interpreters”. For any non-trivial domain, this becomes very painful, very quickly. Fine, so what other options are there?

Ralf Lämmel’s slides on the topic have been of a great help for me, back when we were looking for a solution suitable to our needs. With Oleg Kiselyov, they show how we can reasonably easily (that is, in Haskell 98) achieve full extensibility in both directions (constructors and interpretations) in Haskell. It boils down to:

  • Turn what would be a constructor into its own little data type.
  • Turn what would be a simple function that operates on the data type into a typeclass with a method.
  • Write instances of those typeclasses for the data types representing the DSL’s constructs.

This effectively means that we won’t have a single type to represent all the valid “endpoint descriptions”. Instead, with this approach, we will be able to process any “reasonable” combination of “endpoint components”. The Expr typeclass below is exactly what lets us say what is a valid endpoint description and what isn’t. Using their approach for our expressions would look like this:

Every constructor that we had in our previous Expr data type is now turned into its own little type, and every interpretation becomes a type class that all those little types are then free to provide an instance for. In fact, we do not necessarily have to supply an instance of each interpretation for all of our constructs. If we try to interpret an expression that uses a construct not supported by this interpretation, we get a type error! This is much better than calling error in some corner cases that should in theory not be reached… In theory. Right.

Anyway, if we now want to add support for multiplications, we can do:

We didn’t have to change any existing function, that’s great! Let’s apply this approach to a very simplified web application description “language” that we could make out of tiny building blocks (static path fragments, captures, etc).

A first modular attempt

Adapting the approach from the previous section to our domain, we can give a shot at decomposing the kind of information we want to represent into a few different “constructs” (i.e data types).

OK, why not. Let’s now try to write an interpretation for generating links to endpoints like the one above. This is a lot simpler and self-contained than investigating client generation or server-side routing, while retaining many of the difficulties. The main one is that depending on what we find in the description of the endpoint, we need the type of the link-generating function to change: indeed, if we encounter Captures, then the user has to supply values for them. We will let the user do that through one additional argument per Capture we encounter.

Let’s start with something really simple.

We should be appending something in place of those ??? there. But since Capture represents variable path fragments (like :userid in /user/:userid, in many web frameworks), we do not want to pick a fixed string, we would like for this string to be supplied by the caller of link, as stated above. Let’s introduce a slightly fancier HasLink class to make it seemingly “variadic”.

Looks good. Except that this does not typecheck. The problem is with the Capture :> api and Static :> api instances. While we know that the link function will eventually return a Link, once given arguments for all the Captures, we don’t know whether there is another Capture later in api. If there is, then link api would have type e.g String -> Link, and we cannot cons a String on top of… a function.

We have to be a little smarter and accumulate the path components as we go without building up the final list directly. We will be accumulating the path components in reverse order, to make the accumulation efficient, and reverse the whole list at the end to give the final Link (= [String]) value.

We can finally generate links with the new approach:

This looks promising. Let’s now try to introduce some more types here, by allowing captures to not be specified just as simple strings, but any Showable type (this is terrible, but simple enough for this post). We need to modify Capture to track that Showable type we will use to specify the value of that path fragment.

We unfortunately cannot just “track” some type by storing it in a field (which is different from storing a value of that type). Instead we make Capture a clone of Proxy (from Data.Proxy) and just carry around a phantom type parameter. This is a little inconvenient as we will have to type annotate all Captures (or use the TypeApplications language extension), but let’s roll with this approach for now.

Let’s now see an endpoint description using this variant of Capture.

OK, interesting, why not. It does look a little bit ugly. It would look even uglier if we included the response type in Verb, turning it into data Verb a = Verb Method which would require the same kind of type annotations. And the same problem would manifest itself if we were to add all the similar types from servant (ReqBody, QueryParam, Header, etc). This is quite disappointing.

Unrelatedly, have you noticed that I have not given the type of any of our endpoint descriptions so far? This is on purpose, because those types are a little bit fancy. Fortunately, they should look familiar:

That’s right, not only do the descriptions (which are good old haskell values) look like servant’s API types, but their types too! We can see that we are only “hiding” the strings (in static path fragments) and the HTTP method (in verbs) from the type-level.

Most of the other bits of information we would want to see in API descriptions will also have to be reflected at the type-level. When we consider content types for example, we have no choice but to keep track of them at the type level too, even with this design. Because we need to make sure suitable encoding/decoding instances are available for the types that will be represented with those MIME types, and this cannot be done when discovering "application/json" in a list somewhere, at runtime.

All in all, there is no value in keeping anything at the value level at this point. And we are already traversing a bunch of types mixed together with funny symbols and computing the type of a link making function as we go, as evidenced by the HasLink instances from above, so we’ve already got one foot in type-level land.

An important tradeoff that we are making here is that while putting more information at the type-level indeed makes things more complex, it does however give a chance to our descriptions to influence more things, including other types. This is noticeable in the last HasLink instance we wrote, where making Capture track the type the url fragment is going to be decoded to allowed us to directly make the link-making function take a value of that type, instead of a string. This is strictly more powerful and will allow us to work in a very strongly typed world and where the typechecker “writes” a whole lot of code for us.

Let’s bite the bullet and finally take a quick look at what servant’s type-level approach looks like.

Servant’s approach (simplified)

First, let me emphasize that any of the designs we have considered so far are interesting on their own and are fruitful in different ways. They were not quite good enough to meet our requirements which were, again, dictated by the projects and needs we had at work. This whole project started because we were sick of getting things wrong when manually constructing (client) or deconstructing (server) HTTP requests and so on.

Now, let’s write our type-level DSL. If you want a longer version of just this section, with more explanations, you may want to read Implementing a minimal version of servant.

As you can see, there isn’t a single constructor in sight, all the types (but Method) are empty. And now, we proceed with the HasLink class. Since we don’t have any value to give to the link method, given that the description is now a type, we will use data Proxy a = Proxy to act as an intermediate between the value level, where the calls to link will happen, and the type level, where the descriptions live and drive the link interpretation through our typeclass instances.

It is not all that different from the code in the previous section. We can use it all as follows:

And that’s it! The key ingredients to servant’s design are all here. If you want to read more about actually implementing server/client interpretations for the DSL, the Going further section has got you covered with a few relevant links.

Conclusion

I hope this little tour of some of the designs we explored on our way to writing servant was useful and informative, whether from a Haskell EDSL writer perspective or for any Haskeller who has ever wondered about why the descriptions live at the type-level. The real servant libraries of course have a much richer vocabulary for describing endpoints and entire APIs, and offer many interpretations in addition to the type-safe links. But the core ideas behind the design and implementation are the same ones we progressively arrived at in this post.

Going further

  • Servant, Type Families, and Type-level Everything - A look at advanced GHC features used in Servant

    I suspect this is a rather useful resource for Haskellers who haven’t yet encountered type-level programming in (GHC) Haskell.

  • Implementing a minimal version of servant

    A more approchable and more narrowly focused alternative to the servant paper, which consists in implementing a very simplified version of servant, using however the same “API type” based approach for the EDSL as the real servant.

  • the servant paper, published at the Workshop on Generic Programming, 2015.

  • Software extensions and Integration with Type Classes

    by Ralf Lämmel and Klaus Ostermann talks in greater depth than the slides about the highly modular approach to embedded domain specific languages in Haskell that we’ve seen above, and uses it on several examples.

  • serv and solga are smaller, younger and (I think) humbler relatives of servant which make slightly different choices for the DSL.

    Somewhat relatedly, there is servant-0.1, which wasn’t anything like the current approach with its API types. The link leads to its README, with an example and some explanations about the approach, for the curious reader.