GraphQL

Foreword

The first part of this article is a translation of "REST 2.0 Is Here and Its Name Is GraphQL". The title is eye-catching, and I took the bait.

The rest consists of my reflections on GraphQL. Now, let's go through the translation while gathering our questions.

I. Translation

GraphQL is a query language for APIs. Although fundamentally different from REST, GraphQL can serve as an alternative to REST, providing high performance, a great developer experience, and powerful tools.

In this article, we will see how to handle some common scenarios using REST and GraphQL. This article is accompanied by three projects providing APIs for popular movies and actor information, along with a simple frontend application built with HTML and jQuery to view the corresponding REST and GraphQL source code.

We will examine the differences between these two technologies through these APIs to understand their pros and cons. Before starting, let's set the stage with a quick overview of how these technologies emerged.

The Early Days of the Web

The early days of the Web were simple. Early Web applications were static HTML documents. They evolved into websites that included dynamic content stored in databases (such as SQL databases) and added interactive features via JavaScript. Most Web content was accessed through desktop web browsers, and everything seemed great.

Figure: Web server fetches data and outputs HTML

REST: The Emergence of APIs

Fast forward to 2007 when Steve Jobs introduced the iPhone. Besides the profound impact of smartphones on the world, culture, and communication, they also made the work of developers much more complex. Smartphones disrupted the status quo of development; within a few short years, we suddenly had desktops, iPhones, Androids, and tablets.

In response, developers began using RESTful APIs to provide data to applications of various shapes and sizes. The new development model looked something like this:

Figure: REST maps URIs to resources

GraphQL: The Evolution of APIs

GraphQL is a query language for APIs, designed and open-sourced by Facebook. You can think of GraphQL as an alternative to REST for building APIs, whereas REST is a conceptual model used for designing and implementing APIs. GraphQL is a standardized language, a type system, and a specification that establishes a strong contract between the client and the server. It provides a standard language for communication between all devices, simplifying the process of creating large cross-platform applications.

With GraphQL, the diagram changes to this:

Figure: GraphQL is a query language for APIs

GraphQL vs REST

For the following section, it is recommended to follow along with the source code, which can be found in the accompanying GitHub repo.

The source code contains three projects:

RESTful API implementation
GraphQL API implementation
A simple client-side webpage built with jQuery and HTML

To compare these two technologies as simply as possible, the projects were intentionally designed to be simple.

If you want to follow along, first open three terminal windows, then cd into the RESTful, GraphQL, and Client directories of the repository, and start the services with npm run dev. Once ready, continue reading :)

Querying via REST

Our RESTful API has several endpoints:

Endpoint	Description
/movies	returns an array of objects containing links to our movies (e.g. [ { href: 'http://localhost/movie/1' } ])
/movie/:id	returns a single movie with id = :id
/movie/:id/actors	returns an array of objects containing links to actors in the movie with id = :id
/actors	returns an array of objects containing links to actors
/actor/:id	returns a single actor with id = :id
/actor/:id/movies	returns an array of objects containing links to movies that the actor with id = :id has acted in

Note: Even with such a simple data model, we already have six endpoints to maintain and document.

Imagine we are client-side developers needing to build a simple webpage with HTML and jQuery using these movie APIs. To construct this page, we need information about movies and their corresponding actors. Our API has all the features we need, so we'll fetch the data directly.

Open a new terminal and run:

curl localhost:3000/movies

You should get a response like this:

[
  {
    "href": "http://localhost:3000/movie/1"
  },
  {
    "href": "http://localhost:3000/movie/2"
  },
  {
    "href": "http://localhost:3000/movie/3"
  },
  {
    "href": "http://localhost:3000/movie/4"
  },
  {
    "href": "http://localhost:3000/movie/5"
  }
]

In a RESTful manner, the API returns an array of links, each corresponding to an actual movie object. Then execute curl http://localhost:3000/movie/1 to get the first movie, curl http://localhost:3000/movie/2 for the second... and so on.

In app.js, you can see the method we use to fetch all the data needed for the page:

const API_URL = 'http://localhost:3000/movies';
function fetchDataV1() {

  // 1 call to get the movie links
  $.get(API_URL, movieLinks => {
    movieLinks.forEach(movieLink => {

      // For each movie link, grab the movie object
      $.get(movieLink.href, movie => {
        $('#movies').append(buildMovieElement(movie))

        // One call (for each movie) to get the links to actors in this movie
        $.get(movie.actors, actorLinks => {
          actorLinks.forEach(actorLink => {

            // For each actor for each movie, grab the actor object
            $.get(actorLink.href, actor => {
              const selector = '#' + getMovieId(movie) + ' .actors';
              const actorElement = buildActorElement(actor);
              $(selector).append(actorElement);
            })
          })
        })
      })
    })
  })
}

As you may have noticed, this is less than ideal. To render the page, we made 1 + M + M + sum(Am) API calls, where M is the number of movies and sum(Am) is the total number of actors across all M movies. This might be fine for apps with small data needs, but not for large-scale production systems.
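To make the arithmetic concrete, here is a minimal sketch that counts the requests issued by the nested-callback approach. The function name and the sample catalogue (5 movies with 2 actors each) are illustrative assumptions, not part of the accompanying projects.

```javascript
// Counts the HTTP requests made by the nested-callback approach:
// 1 (movie links) + M (movies) + M (actor-link lists) + sum(Am) (actors).
function countRestRequests(actorsPerMovie) {
  const movieCount = actorsPerMovie.length; // M
  const actorFetches = actorsPerMovie.reduce((sum, n) => sum + n, 0); // sum(Am)
  return 1 + movieCount + movieCount + actorFetches;
}

// Hypothetical catalogue: 5 movies with 2 actors each.
console.log(countRestRequests([2, 2, 2, 2, 2])); // 21
```

Even this tiny catalogue needs 21 round trips, and the count grows with every movie and every cast member.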

The conclusion? Our simple RESTful approach isn't suitable. To optimize the API, we might ask the backend team for a dedicated /moviesAndActors endpoint to support this page. Once this endpoint is ready, we can replace the 1 + M + M + sum(Am) network requests with a single request.

curl http://localhost:3000/moviesAndActors

You should get a response like this:

[
  {
    "id": 1,
    "title": "The Shawshank Redemption",
    "release_year": 1993,
    "tags": [
      "Crime",
      "Drama"
    ],
    "rating": 9.3,
    "actors": [
      {
        "id": 1,
        "name": "Tim Robbins",
        "dob": "10/16/1958",
        "num_credits": 73,
        "image": "https://images-na.ssl-images-amazon.com/images/M/MV5BMTI1OTYxNzAxOF5BMl5BanBnXkFtZTYwNTE5ODI4._V1_.jpg",
        "href": "http://localhost:3000/actor/1",
        "movies": "http://localhost:3000/actor/1/movies"
      },
      {
        "id": 2,
        "name": "Morgan Freeman",
        "dob": "06/01/1937",
        "num_credits": 120,
        "image": "https://images-na.ssl-images-amazon.com/images/M/MV5BMTc0MDMyMzI2OF5BMl5BanBnXkFtZTcwMzM2OTk1MQ@@._V1_UX214_CR0,0,214,317_AL_.jpg",
        "href": "http://localhost:3000/actor/2",
        "movies": "http://localhost:3000/actor/2/movies"
      }
    ],
    "image": "https://images-na.ssl-images-amazon.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_UX182_CR0,0,182,268_AL_.jpg",
    "href": "http://localhost:3000/movie/1"
  },
  ...
]

Great! With just one request, we can fetch all the data the page needs. You can see the specific implementation of this optimization in app.js within the Client directory:

const MOVIES_AND_ACTORS_URL = 'http://localhost:3000/moviesAndActors';
function fetchDataV2() {
  $.get(MOVIES_AND_ACTORS_URL, movies => renderRoot(movies));
}
function renderRoot(movies) {
  movies.forEach(movie => {
    $('#movies').append(buildMovieElement(movie));
    movie.actors && movie.actors.forEach(actor => {
      const selector = '#' + getMovieId(movie) + ' .actors';
      const actorElement = buildActorElement(actor);
      $(selector).append(actorElement);
    })
  });
}

Our new application will be faster than the previous version, but it's still not perfect. If we open http://localhost:4000 to view our page, we'll see:

Figure: Movie demo page

If you look closely, you'll notice our page only uses the movie's title and image, and each actor's name and image (in fact, we only use 2 out of 8 fields for the movie object, and 2 out of 7 fields for the actor object). This means we are wasting roughly three quarters of the information retrieved from the network request! Such excessive bandwidth usage hurts performance and adds infrastructure costs.

A clever backend developer might smirk and quickly implement a special query parameter called fields, which accepts a list of field names to dynamically determine which fields a specific request should return.

For example, we might use curl 'http://localhost:3000/moviesAndActors?fields=title,image' instead of curl http://localhost:3000/moviesAndActors. There might even be another special query parameter, actor_fields, to specify which fields the actor model should contain, such as curl 'http://localhost:3000/moviesAndActors?fields=title,image&actor_fields=name,image'. (The quotes keep the shell from interpreting the & in the URL.)
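A server-side sketch of how such parameters could be applied might look like the following. The helper names (pick, parseFields, applyFields) and the sample data are assumptions made for illustration; the accompanying RESTful project does not necessarily implement it this way.

```javascript
// Keep only the requested keys of an object.
function pick(obj, fields) {
  const out = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(obj, field)) out[field] = obj[field];
  }
  return out;
}

// Parse a comma-separated query parameter such as "title,image".
// A missing parameter yields null, meaning "no filtering".
function parseFields(param) {
  return param ? param.split(',').map(s => s.trim()).filter(Boolean) : null;
}

// Trim movies (and their nested actors) to the requested fields.
// Design choice in this sketch: passing actor_fields also includes the actors list.
function applyFields(movies, fieldsParam, actorFieldsParam) {
  const fields = parseFields(fieldsParam);
  const actorFields = parseFields(actorFieldsParam);
  return movies.map(movie => {
    const trimmed = fields ? pick(movie, fields) : { ...movie };
    if (actorFields && Array.isArray(movie.actors)) {
      trimmed.actors = movie.actors.map(actor => pick(actor, actorFields));
    }
    return trimmed;
  });
}

// Hypothetical sample data, shortened from the response shown earlier.
const movies = [{
  id: 1, title: 'The Shawshank Redemption', rating: 9.3, image: 'm1.jpg',
  actors: [{ id: 1, name: 'Tim Robbins', dob: '10/16/1958', image: 'a1.jpg' }],
}];
console.log(JSON.stringify(applyFields(movies, 'title,image', 'name,image')));
```

Note how every nested model needs its own ad hoc parameter; the convention has to be reinvented and documented for each endpoint.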

Now, this is pretty much the best implementation for our simple application, but it introduces a bad habit of creating customized endpoints for specific pages in a client-side app. This problem becomes even more evident when you start building an iOS app that displays information differently from the web and Android apps.

Wouldn't it be great if we could build a universal API that explicitly describes the entities in our data model and the relationships between them, without incurring the performance issues of 1 + M + M + sum(Am)? Good news! We can!

Querying with GraphQL

With GraphQL, we can jump straight to the optimal query, fetching all the information we need without any redundancy through a simple and intuitive query:

query MoviesAndActors {
  movies {
    title
    image
    actors {
      image
      name
    }
  }
}

I highly recommend trying it yourself: open GraphiQL (a great in-browser GraphQL IDE) at http://localhost:5000 and execute the query above.
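Outside GraphiQL, a client typically sends the query as a JSON body in an HTTP POST. The sketch below only builds the request options. The /graphql path in the usage comment is an assumption, since the article does not say where the endpoint is mounted.

```javascript
const MOVIES_AND_ACTORS_QUERY = `
  query MoviesAndActors {
    movies {
      title
      image
      actors {
        image
        name
      }
    }
  }
`;

// Build the options for a GraphQL-over-HTTP POST request.
function buildGraphQLRequest(query, variables = {}) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables }),
  };
}

// Usage (assumed endpoint path; adjust to wherever the server is mounted):
// fetch('http://localhost:5000/graphql', buildGraphQLRequest(MOVIES_AND_ACTORS_QUERY))
//   .then(res => res.json())
//   .then(({ data }) => console.log(data.movies));
```

One request, one endpoint, and the response mirrors the shape of the query.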

Now, let's dive a bit deeper.

Thinking in GraphQL

GraphQL takes an entirely different approach to APIs than REST. Instead of relying on HTTP constructs like verbs and URIs, it layers an intuitive query language and a powerful type system on top of your data, providing a strong contract between the client and the server. The query language gives client-side developers a mechanism to fetch exactly the data any given page needs.

GraphQL encourages thinking of data as a virtual information graph. Entities containing information are called types, and these types can be related to each other through fields. Queries start from the root and traverse this virtual graph for the required information.

This "virtual graph" is called a schema. A schema is a collection of types, interfaces, enums, and unions that make up an API's data model. GraphQL also includes a convenient schema language to define our API. For example, here is the schema for our movie API:

schema {
    query: Query
}

type Query {
    movies: [Movie]
    actors: [Actor]
    movie(id: Int!): Movie
    actor(id: Int!): Actor
    searchMovies(term: String): [Movie]
    searchActors(term: String): [Actor]
}

type Movie {
    id: Int
    title: String
    image: String
    release_year: Int
    tags: [String]
    rating: Float
    actors: [Actor]
}

type Actor {
    id: Int
    name: String
    image: String
    dob: String
    num_credits: Int
    movies: [Movie]
}
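To illustrate how a query "starts from the root and traverses the graph", here is a toy executor. It is a sketch under heavy assumptions: the selection is a plain nested object rather than a parsed GraphQL query, the data lives in memory, and none of these names (db, resolvers, fieldTypes, execute) come from a real GraphQL library.

```javascript
// Hypothetical in-memory data standing in for a database.
const db = {
  movies: [{ id: 1, title: 'The Shawshank Redemption', actorIds: [1, 2] }],
  actors: [{ id: 1, name: 'Tim Robbins' }, { id: 2, name: 'Morgan Freeman' }],
};

// One resolver per field that needs custom logic; every other field
// is read directly off the parent object.
const resolvers = {
  Query: { movies: () => db.movies },
  Movie: { actors: movie => db.actors.filter(a => movie.actorIds.includes(a.id)) },
};

// Type reached through each relationship field, mirroring the schema above.
const fieldTypes = {
  Query: { movies: 'Movie', actors: 'Actor' },
  Movie: { actors: 'Actor' },
  Actor: { movies: 'Movie' },
};

// Walk the selection from `parent`, resolving only the requested fields.
function execute(typeName, parent, selection) {
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    const resolver = (resolvers[typeName] || {})[field];
    const value = resolver ? resolver(parent) : parent[field];
    if (sub === true || value == null) {
      result[field] = value; // leaf (scalar) field, or nothing to descend into
    } else {
      const childType = fieldTypes[typeName][field];
      result[field] = Array.isArray(value)
        ? value.map(item => execute(childType, item, sub))
        : execute(childType, value, sub);
    }
  }
  return result;
}

// Equivalent in spirit to: { movies { title actors { name } } }
const selection = { movies: { title: true, actors: { name: true } } };
console.log(JSON.stringify(execute('Query', {}, selection)));
```

Fields that were not selected (id, image, and so on) are never read or returned, which is where the savings over the REST responses come from.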

The type system opens the door to a wealth of great things, including better tooling, better documentation, and more efficient applications. There's a lot to discuss here, but for now, let's skip ahead to focus on more scenarios showing the differences between REST and GraphQL.

GraphQL vs REST: Versioning

A quick Google search will yield many opinions on (or involving) REST API versioning. We won't delve deep into that here, but I just want to emphasize that it is a meaningful problem. One factor that makes versioning difficult is that it's usually hard to know what information is being used by which apps and devices.

Adding information is generally easy in both REST and GraphQL. A new field flows straight through to REST clients, and GraphQL clients safely ignore it until they change their queries. Deleting or changing information, however, is a different story.

In the REST approach, it's hard to know at the field level which information is being used. We may know that the /movies endpoint is in use, but not whether clients rely on title, image, or both. One possible solution is a query parameter that specifies the returned fields, but such parameters are usually optional. So in practice we often see changes made at the endpoint level, such as introducing a new /v2/movies endpoint. This works, but it increases the surface area of our API and burdens developers with keeping every version up to date and thoroughly documented.

Versioning in GraphQL is different. Every GraphQL query must state exactly which fields it requests. This requirement means we know precisely what information is being requested, and we can go on to ask how often each field is requested and by whom. GraphQL also supports primitives for marking schema fields as deprecated and attaching a deprecation reason.
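Because every query names its fields, usage can be counted per field. The sketch below assumes queries have already been reduced to plain nested objects (a stand-in for a parsed query), and the list of schema field paths is written by hand for illustration. In the schema language itself, the deprecation primitive is the @deprecated directive with a reason argument.

```javascript
// Record every field path requested by one query's selection.
function recordFieldUsage(selection, usage, prefix = '') {
  for (const [field, sub] of Object.entries(selection)) {
    const path = prefix ? `${prefix}.${field}` : field;
    usage.set(path, (usage.get(path) || 0) + 1);
    if (sub !== true) recordFieldUsage(sub, usage, path);
  }
  return usage;
}

// Two hypothetical queries observed from clients.
const usage = new Map();
recordFieldUsage({ movies: { title: true, image: true } }, usage);
recordFieldUsage({ movies: { title: true, actors: { name: true } } }, usage);

// Fields that no recorded query touched are candidates for deprecation.
const schemaFields = ['movies.title', 'movies.image', 'movies.rating', 'movies.actors.name'];
const unused = schemaFields.filter(path => !usage.has(path));
console.log(unused); // [ 'movies.rating' ]
```

With a REST endpoint that always returns the full object, there is no equivalent signal to collect.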

Versioning in GraphQL:

Figure: Evolve your API without versions

GraphQL vs REST: Caching

Caching in REST is straightforward and efficient. In fact, cacheability is one of the six REST constraints, built into RESTful design. If a response from the /movie/1 endpoint says it can be cached, any future request to /movie/1 can simply be served from the cache. Very simple.
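The reason this is so simple is that the URL alone identifies the resource. A minimal sketch of a URL-keyed cache follows. The class name and the explicit `now` parameter (used here to keep the example deterministic) are assumptions for illustration, and real HTTP caches honour many more rules than a single max-age.

```javascript
// A URL-keyed cache: the URL alone identifies the resource.
class UrlCache {
  constructor() {
    this.entries = new Map();
  }

  // Store a response body together with its expiry time (in milliseconds).
  set(url, body, maxAgeSeconds, now = Date.now()) {
    this.entries.set(url, { body, expiresAt: now + maxAgeSeconds * 1000 });
  }

  // Return the cached body, or undefined if it is missing or expired.
  get(url, now = Date.now()) {
    const entry = this.entries.get(url);
    if (!entry || now >= entry.expiresAt) return undefined;
    return entry.body;
  }
}

const cache = new UrlCache();
cache.set('/movie/1', { title: 'The Shawshank Redemption' }, 60, 0);
console.log(cache.get('/movie/1', 30000)); // still fresh after 30 seconds
console.log(cache.get('/movie/1', 60000)); // expired after 60 seconds: undefined
```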

Caching in GraphQL is handled slightly differently. Caching a GraphQL API typically requires introducing some form of unique identification for each object in the API. With a unique identifier for each object, clients can build standardized caches using this identifier for reliable caching, updates, and expiration. When a client initiates a downstream query referencing that object, it uses the cached version of that object. If you want to know more about GraphQL caching principles, there is a good article discussing this topic in depth.
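The idea of caching by object identity can be sketched as follows. This is an illustration only: the "Type:id" key format, the childTypes map, and the sample responses are assumptions, and real clients such as Relay or Apollo use their own, more elaborate schemes. It also presumes that every query asks for id.

```javascript
// Flatten a nested response into a table keyed by "<Type>:<id>".
function normalize(typeName, obj, childTypes, store = {}) {
  const key = `${typeName}:${obj.id}`;
  const record = {};
  for (const [field, value] of Object.entries(obj)) {
    const childType = (childTypes[typeName] || {})[field];
    if (childType && Array.isArray(value)) {
      // Replace nested objects with references to their own cache entries.
      record[field] = value.map(child => {
        normalize(childType, child, childTypes, store);
        return `${childType}:${child.id}`;
      });
    } else {
      record[field] = value;
    }
  }
  // Merge with whatever an earlier response already stored for this object.
  store[key] = { ...(store[key] || {}), ...record };
  return store;
}

const childTypes = { Movie: { actors: 'Actor' } };
const store = {};

// Two hypothetical responses that mention the same actor.
normalize('Movie', {
  id: 1, title: 'The Shawshank Redemption',
  actors: [{ id: 2, name: 'Morgan Freeman' }],
}, childTypes, store);
normalize('Movie', {
  id: 2, title: 'Another Movie',
  actors: [{ id: 2, name: 'Morgan Freeman', image: 'mf.jpg' }],
}, childTypes, store);

console.log(store['Actor:2']); // one shared entry, now including the image
```

Both movies point at the single Actor:2 entry, so an update from any later query is visible everywhere that actor is displayed.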

GraphQL vs REST: Developer Experience

Developer experience is a crucial aspect of application development and the reason we as engineers spend a lot of time building good tools. The comparison here is somewhat subjective, but I believe it's still worth mentioning.

REST has a rich ecosystem of tools that help developers document, test, and inspect RESTful APIs. That said, developers pay a huge price for extending REST APIs. The number of endpoints explodes, inconsistencies become more apparent, and versioning becomes more difficult.

GraphQL truly excels in developer experience. The type system has opened the door to all kinds of incredible tools, such as the GraphiQL IDE, and having documentation built into the schema. There is only one endpoint in GraphQL, and you don't rely on documentation to find what data is available. You have a type-safe language with autocomplete for available items, which you can use to build APIs rapidly. GraphQL also works well with popular frontend frameworks and tools like React and Redux. If you're considering building an application with React, I highly recommend checking out Relay or the Apollo client.

Summary

GraphQL provides a powerful set of tools for building efficient data-driven applications. REST isn't going away anytime soon, but GraphQL offers many desirable features, especially when building client-side applications.

If you want to dive further, check out Scaphold.io’s GraphQL Backend as a Service, where you can deploy a production-ready GraphQL API with AWS in minutes, and then customize and extend your own business logic.

I hope you enjoyed this article. I'm happy to exchange any thoughts or comments. Thanks for reading!

II. Reflections

Adding an extra layer of abstraction over interfaces certainly brings greater flexibility. For instance, you only need to implement atomic interfaces to freely combine return content.

Note: The translation above mentions that GraphQL is an abstraction over data, but it should actually be an abstraction over interfaces (it's just that the concept of an interface is weakened and not exposed; the weakened interface is closer to something like SQL statements). If every field corresponds to a query interface, then it would be easy to implement a universal interface management layer to fulfill all GraphQL functions. In fact, GraphQL provides such a universal definition.

Thus, the biggest issue is likely redundant queries, because freely combining the returned fields presupposes that data can be resolved at field granularity. In other words, where a strong interface used to return a whole bundle of fields in one go, every field must now expose its own weak interface so that the response can be assembled precisely from a custom query.

Of course, redundant queries can be partially alleviated through query optimization, such as batching queries for fields based on their dependency relationships. However, in complex scenarios, such optimization might not be easy to implement.
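A minimal sketch of the difference, using a hypothetical data layer that counts its own lookups (createDb, getActor, getActors and the sample movies are all made up for illustration):

```javascript
// Hypothetical data layer that counts how many lookups it serves.
function createDb() {
  const actors = new Map([
    [1, { id: 1, name: 'Tim Robbins' }],
    [2, { id: 2, name: 'Morgan Freeman' }],
  ]);
  const stats = { calls: 0 };
  return {
    stats,
    getActor(id) { stats.calls += 1; return actors.get(id); },
    getActors(ids) { stats.calls += 1; return ids.map(id => actors.get(id)); },
  };
}

const movies = [
  { title: 'Movie A', actorIds: [1, 2] },
  { title: 'Movie B', actorIds: [2] },
  { title: 'Movie C', actorIds: [1, 2] },
];

// Naive field-level resolution: one lookup per actor reference.
function resolveNaive(db, movies) {
  return movies.map(m => ({ title: m.title, actors: m.actorIds.map(id => db.getActor(id)) }));
}

// Batched resolution: gather every needed id first, fetch them in one call,
// then hand the results back to each movie.
function resolveBatched(db, movies) {
  const ids = [...new Set(movies.flatMap(m => m.actorIds))];
  const byId = new Map(db.getActors(ids).map(actor => [actor.id, actor]));
  return movies.map(m => ({ title: m.title, actors: m.actorIds.map(id => byId.get(id)) }));
}

const naiveDb = createDb();
resolveNaive(naiveDb, movies);
const batchedDb = createDb();
resolveBatched(batchedDb, movies);
console.log(naiveDb.stats.calls, batchedDb.stats.calls); // 5 1
```

Here the dependency is only one level deep. With several nested levels, each level can be batched only after its parent level has resolved, which is where the optimization starts to get complicated.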

If there were a database (or an abstract query layer) with this optimization built in to solve the performance issues, I believe GraphQL would gain an overwhelming advantage. First, you would never have to keep adding endpoints; second, maintaining one standardized schema is far better than maintaining n interfaces plus multiple versions of each. It's almost a no-brainer.

As for the integration with the frontend ecosystem (Redux isn't that universal after all), it's clearly not a big problem.