Here are some things to consider when working on the design of GraphQL schemas.

Naming

Does the schema use generic naming? In large systems, it’s common for several entities to compete for the same generic name, such as ‘Event’ or ‘User’. To avoid confusion and improve the overall readability of the schema, it’s a good idea to define a parent interface and implement it in specific types, such as TeamMember implements User and Viewer implements User. See the related section on Interfaces below.
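For example, a minimal sketch of that split (the extra fields are illustrative):

interface User {
  id: ID!
  name: String!
}

type TeamMember implements User {
  id: ID!
  name: String!
  role: String!
}

type Viewer implements User {
  id: ID!
  name: String!
  email: String!
}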

Scalars and Enums

Are there opportunities to use custom scalars or Enums instead of generic strings?

scalar EmailAddress
scalar CreditCardNumber

type User {
  id: ID!
  name: String!
  email: EmailAddress!
  status: UserStatus
  creditCard: CreditCardNumber
}

enum UserStatus {
  ACTIVE
  INACTIVE
  PENDING
}

GraphQL has recently improved its support for Custom Scalars: https://graphql.org/blog/2023-01-14-graphql-scalars/.
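Custom scalars can also point consumers at their specification via the built-in @specifiedBy directive (the URL below is illustrative):

scalar EmailAddress
  @specifiedBy(url: "https://example.com/specs/email-address")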

Interfaces

Interfaces are great for splitting up an object type that has several variants.

interface Event {
  id: ID!
  name: String!
  date: String!
}

type Birthday implements Event {
  id: ID!
  name: String!
  date: String!
  age: Int!
}

type Holiday implements Event {
  id: ID!
  name: String!
  date: String!
  location: String!
}

type GenericEvent implements Event {
  id: ID!
  name: String!
  date: String!
  description: String!
}

type Query {
  events: [Event]!
}

Use them for objects that have shared behaviors, not just shared fields. A good indication of the latter is a generic interface name (say, a HasTimestamps interface implemented by otherwise unrelated types). Interfaces of purely shared fields can lead to a schema that is difficult to evolve in the future. This also applies to Fragments.

Mutations

When designing ‘create’ mutations, think about their usage. Consumers will not always want to provide every field of a type. Rather than a long list of nullable fields, is there a better approach? One option is small, specific ‘create’ mutations paired with looser ‘update’ mutations.

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
}

type User {
  id: ID!
  name: String!
  email: String!
  password: String!
  address: String
  phone: String
}

input CreateUserInput {
  name: String!
  email: String!
  password: String!
}

input UpdateUserInput {
  name: String
  email: String
  password: String
  address: String
  phone: String
}

The result is clearer error messages, simpler resolvers, and, in some cases, less confusion for consumers.

Default Values

For queries with optional arguments, consider whether default values would be useful, for example for a search query’s sort order or date range arguments.
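A sketch of what this can look like, with illustrative argument names and defaults:

type Query {
  searchProducts(
    term: String!
    sort: SortOrder = NEWEST_FIRST
    limit: Int = 25
  ): [Product!]!
}

enum SortOrder {
  NEWEST_FIRST
  OLDEST_FIRST
}

Consumers who omit sort or limit get sensible, documented behavior instead of a null value they have to interpret.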

Global IDs

This idea comes from Relay: would it be useful for consumers to fetch any object via a global ID? Consumers can leverage this for caching. (Apollo Client also has other mechanisms, such as keying cache entries off a combination of fields.)

query {
  node(id: "VXNlcjox") {
    id
    ... on User {
      name
      email
    }
  }
}

interface Node {
  id: ID!
}

type User implements Node {
  id: ID!
  name: String
  email: String
}

type Query {
  node(id: ID!): Node
}

Should IDs map directly to the underlying ID or be ‘opaque’, for example base64-encoded? ("VXNlcjox" in the query above is base64 for "User:1".) If they map directly, users may attempt to ‘hack’ IDs. Opaque IDs are not meant as a serious obfuscation technique; rather, they discourage consumers from building logic based on how the ID is constructed.

The GitHub API settled on an opaque ID design where the first letters indicate what type of object it is, e.g. U_kgDOADP9xw (a user) or PR_kwDOAHz1OX4uYAah (a pull request): https://docs.github.com/en/graphql/guides/migrating-graphql-global-node-ids

Nullable or Required Fields

Keep in mind that changing an input field or argument from nullable to required is a breaking change; the opposite is not. (For output fields the direction reverses: making a non-null field nullable is the breaking change.)

Making arguments required is a good default choice. They can be paired with default values, as discussed previously. Adding additional nullable arguments in the future is a non-breaking change.
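For example, a non-null argument with a default value can still be omitted by the caller, and a later-added nullable argument does not break existing queries (the category filter below is an illustrative addition):

type Query {
  products(limit: Int! = 25, category: String): [Product!]!
}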

Pagination

I mostly come across ‘offset’ pagination:

type Query {
  products(limit: Int!, page: Int!): [Product!]!
}

Offset-style pagination is simple: query arguments specify how many items to fetch and which page to fetch. This is easy to implement, but it can result in poor performance on large datasets, as the underlying SQL queries become increasingly inefficient the deeper you page:

SELECT * FROM products
WHERE user_id = %user_id
LIMIT 250 OFFSET 500;

Cursor-style pagination is a more advanced approach that can provide better performance with large datasets. It uses a cursor, a unique identifier for a specific item in the dataset. The client specifies the cursor of the item to start from, along with the number of items to retrieve. This allows more efficient SQL queries, as the database can seek directly to the cursor’s position rather than scanning past skipped rows:

type Query {
  products(limit: Int!, after: String): [Product!]!
}

SELECT * FROM products
WHERE user_id = %user_id
AND id > 15
ORDER BY id ASC
LIMIT 10;

The Relay ‘connection’ pagination model is a good design principle to follow when using cursor-style pagination. This involves using a Connection type that contains an array of Edge objects, each of which contains a cursor and the data for a single item. The Connection type also includes a pageInfo object that contains information about the pagination.

type Query {
  allUsers(first: Int, after: String): UserConnection
}

type UserConnection {
  edges: [UserEdge]
  pageInfo: PageInfo!
}

type UserEdge {
  cursor: String!
  node: User
}

type User {
  id: ID!
  name: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
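Consumers page through by passing the last edge’s cursor back in as the after argument (the cursor value here is illustrative):

query {
  allUsers(first: 10, after: "Y3Vyc29yOjEw") {
    edges {
      cursor
      node {
        id
        name
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}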

Errors

Errors are often the most inconsistent parts of GraphQL APIs. There is a good blog post that discusses different approaches to handling errors in GraphQL: https://productionreadygraphql.com/2020-08-01-guide-to-graphql-errors.

A common design is to use ‘top-level’ errors for global issues like rate-limiting and syntax errors, and ‘lower-level’ errors in the schema for expected cases like a username being too long. The blog post describes this better than I can. I usually check that errors are treated in a consistent way, with consistent fields.
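One way to model those ‘lower-level’ errors is as schema types on the mutation payload; the payload and field names below are illustrative, loosely following the userErrors pattern discussed in the post:

type Mutation {
  createUser(input: CreateUserInput!): CreateUserPayload!
}

type CreateUserPayload {
  user: User
  userErrors: [UserError!]!
}

type UserError {
  path: [String!]
  message: String!
}

A failed validation then returns userErrors alongside a null user, while truly exceptional problems still surface in the top-level errors array.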