API Schemas

Sat Oct 12 2024

Overview

Say we have a simple application that manages users.

The frontend displays user information, such as the user’s name and email address. It gets this information from the backend (Users API).

import { useEffect, useState } from "react";
 
interface User {
  userId: string;
  name: string;
  email: string;
}
 
function UserInfo(props: { userId: string }) {
  const [user, setUser] = useState<User>();
 
  useEffect(() => {
    // Get the user's info
    fetch(`https://our-fancy-backend.io/v1/api/users/${props.userId}`)
      .then((response) => response.json())
      .then((data) => {
        setUser(data);
      });
  }, [props.userId]);
 
  if (!user) {
    return <p>Loading...</p>;
  }
 
  return (
    <div>
      <h1>{user.name}</h1>
      <p>{user.email}</p>
    </div>
  );
}

Taking a closer look, we see that the User interface is defined in the frontend codebase. Some questions arise:

How does the frontend team know what fields are available on the User object?
Where is the source of truth? It can’t possibly be the frontend codebase, since the backend controls it.

Loose data contracts

The backend team at some point had to have communicated the User schema to the frontend team. Depending on the organization, this could have been in the form of a simple Slack message, an email, a Confluence page, etc.

This established a loose contract between the backend and frontend teams - the backend team promises to return a User object in this shape. I use the term loose because the contract hinges on human commitment, rather than being guaranteed by technology.

Any system that relies on human commitment (i.e. manual work) is prone to errors and adds significant overhead to the software development lifecycle (SDLC). This introduces the worst kind of coupling - the kind that is implicit and obfuscated.

The pain is amplified when there are multiple downstream teams.
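To make the failure mode concrete, here is a minimal sketch of a loose contract breaking silently. The renamed field (emailAddress) and the response body are hypothetical, made up for illustration. JSON.parse, like response.json() in the component above, returns an untyped value, so the compiler has nothing to check the hand-written interface against.

```typescript
// Hand-written type, copied from a chat message or wiki page (the "loose" contract).
interface User {
  userId: string;
  name: string;
  email: string;
}

// Hypothetical response body after the backend renamed `email` to `emailAddress`.
const responseBody =
  '{"userId":"42","name":"Ada","emailAddress":"ada@example.com"}';

// JSON.parse returns `any`, so this assignment compiles without complaint.
const user: User = JSON.parse(responseBody);

// The type promises a string, but at runtime the field is simply missing.
const emailAtRuntime: unknown = user.email;
console.log(typeof emailAtRuntime); // "undefined"
```

Nothing fails until a user sees a blank email on the page, which is exactly the kind of error a loose contract lets through.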

Formal data contracts

API schemas are a way to formalize the data contract between teams.

The team defines their API schema in a machine-readable format (e.g. JSON Schema, OpenAPI, Protobuf, GraphQL).
Downstream teams can generate client code from the schema and use it to interact with the API.

The data contract is now explicit and enforced by technology.

Examples

OpenAPI is a specification for describing HTTP APIs, typically RESTful ones. An OpenAPI document defines the endpoints and their request/response schemas.

In our example, we have an OpenAPI document - a file named openapi.yaml - that defines the User API schema:

openapi: 3.0.0
info:
  title: User API
  description: API for retrieving user information by userId
  version: 1.0.0
 
paths:
  /user/{userId}:
    get:
      operationId: getUser
      summary: Retrieve a user by ID
      description: Fetches a user's information based on their unique userId
      parameters:
        - name: userId
          in: path
          required: true
          description: Unique identifier for the user
          schema:
            type: string
      responses:
        '200':
          description: Successfully retrieved user
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'

components:
  schemas:
    User:
      type: object
      properties:
        userId:
          type: string
          description: Unique identifier for the user
        name:
          type: string
          description: User's name
        email:
          type: string
          description: User's email address
      required:
        - userId
        - name
        - email

With this, we can generate client code in multiple languages to interact with the User API. Some examples include:

Language	Library
TypeScript	@hey-api/openapi-ts
Go	oapi-codegen
Java	openapi-generator
Python	openapi-python-client
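For a rough idea of what such tools produce, here is a hand-written approximation of a generated TypeScript client for the getUser operation. It is a sketch, not the output of any of the libraries above: real generators differ in naming and layout. The names UserClient, GetUserParams, FetchLike and getUserUrl are assumptions, userId is typed as a string to match the frontend component's props, and the fetch function is passed in explicitly only to keep the sketch self-contained.

```typescript
/** Mirrors components.schemas.User in openapi.yaml. */
interface User {
  userId: string; // assumption: string, to match the frontend's props
  name: string;
  email: string;
}

/** Mirrors the parameters of the getUser operation. */
interface GetUserParams {
  userId: string;
}

/** Minimal shape of a fetch-like function, so the sketch has no runtime dependencies. */
type FetchLike = (
  url: string
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/** Builds the URL for GET /user/{userId}. */
function getUserUrl(baseUrl: string, params: GetUserParams): string {
  return `${baseUrl}/user/${encodeURIComponent(params.userId)}`;
}

class UserClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl;
    this.fetchImpl = fetchImpl;
  }

  /** One method per operationId; the return type comes from the 200 response schema. */
  async getUser(params: GetUserParams): Promise<User> {
    const response = await this.fetchImpl(getUserUrl(this.baseUrl, params));
    if (!response.ok) {
      throw new Error(`getUser failed with status ${response.status}`);
    }
    // The cast is where the schema's promise becomes a compile-time type.
    return (await response.json()) as User;
  }
}
```

The important part is that nobody writes or maintains the User type by hand: it is derived from the schema, so it changes whenever the schema does.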

Updating our Frontend

import { useEffect, useState } from "react";
import { type User, UserClient } from "@api/users";  // Generated code
 
const client = new UserClient("https://our-fancy-backend.io/v1");
 
function UserInfo(props: { userId: string }) {
  const [user, setUser] = useState<User>();
 
  useEffect(() => {
    client
      .getUser({ userId: props.userId })
      .then(setUser);
  }, [props.userId]);
 
  if (!user) {
    return <p>Loading...</p>;
  }
 
  return (
    <div>
      <h1>{user.name}</h1>
      <p>{user.email}</p>
    </div>
  );
}

The frontend team now uses the generated client code to interact with the User API. The contract is enforced by the compiler rather than by convention: as long as the backend serves what the schema declares, the frontend team can be confident that the User object has the fields they expect.

Schema drift is also easier to detect: it surfaces at compile time and can be flagged automatically by CI/CD pipelines.

For example,

Backend team accidentally makes a backwards-incompatible change to the User schema, such as removing the email field.
The client is regenerated to reflect the new schema.
The line that renders the email in the component above, <p>{user.email}</p>, will cause a TypeScript compilation error, since the User type no longer has an email field.
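The steps above can be sketched in a few lines. The User type below is a hypothetical stand-in for the regenerated type, and formatUser is an illustrative helper rather than part of the original component. The @ts-expect-error comment is there only so the snippet compiles as a demonstration: remove it and the compiler reports "Property 'email' does not exist on type 'User'".

```typescript
// Hypothetical regenerated type: the backend has removed `email` from the schema.
interface User {
  userId: string;
  name: string;
}

function formatUser(user: User): string {
  // @ts-expect-error Property 'email' does not exist on type 'User'.
  return `${user.name} <${user.email}>`;
}

// Without the compiler check, this is what would reach users at runtime.
console.log(formatUser({ userId: "42", name: "Ada" })); // "Ada <undefined>"
```

Run the type check in CI and the broken build stops the change long before that string ever shows up in production.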

The proper way of removing a field from the schema is to deprecate it first, and then remove it in a future version. For example, OpenAPI supports marking fields as deprecated (deprecated: true). Generators can carry this over into the client code, for instance as @deprecated annotations, so that editors and linters warn wherever deprecated fields are used. Like magic! 🤯
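As a sketch of how that might look on the TypeScript side, assume the generator translates deprecated: true into a JSDoc @deprecated tag (whether and how it does so depends on the tool). The contactLine helper is hypothetical. The code still compiles and runs; the signal comes from tooling, since editors typically strike through deprecated members and lint rules can report them.

```typescript
// Hypothetical generated type for a schema where `email` is marked `deprecated: true`.
interface User {
  userId: string;
  name: string;
  /** @deprecated Will be removed in a future version of the API. */
  email: string;
}

function contactLine(user: User): string {
  // Still valid today, but tooling can flag this use of `email` as deprecated.
  return `${user.name} <${user.email}>`;
}

console.log(contactLine({ userId: "42", name: "Ada", email: "ada@example.com" }));
// "Ada <ada@example.com>"
```

This gives downstream teams a window to migrate before the field disappears and the compile errors begin.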

TLDR

The developer experience improvements of API schemas are significant:

Reduced manual communication of data contracts.
Elimination of manual syncing of data contracts between teams.
Automatic detection of schema drift, resulting in fewer runtime errors.
Easier API exploration: developers can consult a single source of truth that defines the API and the data it provides.

Use API schemas to formalize data contracts between teams and systems. The toolchains around API schemas help eliminate much of the manual work and pain brought about by loose data contracts.

@eli-lim

hello@elilim.dev