API Schemas
Overview
Say we have a simple application that manages users.
The frontend displays user information, such as the user’s name and email address. It gets this information from the backend (Users API).
import { useEffect, useState } from "react";

interface User {
  userId: string;
  name: string;
  email: string;
}

function UserInfo(props: { userId: string }) {
  const [user, setUser] = useState<User>();

  useEffect(() => {
    // Get the user's info
    fetch(`https://our-fancy-backend.io/v1/api/users/${props.userId}`)
      .then((response) => response.json())
      .then((data) => {
        setUser(data);
      });
  }, [props.userId]);

  if (!user) {
    return <p>Loading...</p>;
  }

  return (
    <div>
      <h1>{user.name}</h1>
      <p>{user.email}</p>
    </div>
  );
}
Taking a closer look, we see that the User interface is defined in the frontend codebase. Some questions arise:
- How does the frontend team know what fields are available on the User object?
- Where is the source of truth? It can’t possibly be the frontend codebase, since the backend controls it.
Loose data contracts
The backend team at some point had to have communicated the User schema to the frontend team. Depending on the organization, this could have been in the form of a simple Slack message, an email, a Confluence page, etc.
This established a loose contract between the backend and frontend teams - the backend team promises to return a User object in this shape. I use the term loose because the contract hinges on human commitment, rather than being guaranteed by technology.
Any system that relies on human commitment (i.e. manual work) is prone to errors and adds significant overhead to the software development lifecycle (SDLC). This introduces the worst kind of coupling - the kind that is implicit and obfuscated.
The pain is amplified when there are multiple downstream teams.
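To see why a loose contract is fragile, here is a minimal sketch of what happens when the backend renames a field without telling anyone. The payload and the renamed field emailAddress are hypothetical, made up for illustration. The point is only that parsed JSON is not checked against the hand-written interface, so the code compiles and the problem surfaces at runtime.

```typescript
// The frontend's hand-written copy of the contract.
interface User {
  userId: string;
  name: string;
  email: string;
}

// Hypothetical payload after the backend renamed `email` to `emailAddress`.
const payload = '{"userId":"42","name":"Ada","emailAddress":"ada@example.com"}';

// JSON.parse returns `any`, so this assignment is never checked against User.
const user: User = JSON.parse(payload);

// Compiles without complaint, but the field is missing at runtime.
console.log(user.email); // undefined
```

Nothing in the build fails here; the first sign of trouble is a blank email on the page.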
Formal data contracts
API schemas are a way to formalize the data contract between teams.
- The team that owns the API defines its schema in a machine-readable format (e.g. JSON Schema, OpenAPI, Protobuf, GraphQL).
- Downstream teams can generate client code from the schema and use it to interact with the API.
The data contract is now explicit and enforced by technology.
Examples
OpenAPI is a specification for describing RESTful APIs. An OpenAPI document defines the endpoints and request/response schemas.
In our example, we have an OpenAPI document, a file named openapi.yaml, that defines the User API schema:
openapi: 3.0.0
info:
  title: User API
  description: API for retrieving user information by userId
  version: 1.0.0
paths:
  /user/{userId}:
    get:
      operationId: getUser
      summary: Retrieve a user by ID
      description: Fetches a user's information based on their unique userId
      parameters:
        - name: userId
          in: path
          required: true
          description: Unique identifier for the user
          schema:
            type: string
      responses:
        '200':
          description: Successfully retrieved user
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'
components:
  schemas:
    User:
      type: object
      properties:
        userId:
          type: string
          description: Unique identifier for the user
        name:
          type: string
          description: User's name
        email:
          type: string
          description: User's email address
      required:
        - userId
        - name
        - email
With this, we can generate client code in multiple languages to interact with the User API. Some examples include:
Language | Library |
---|---|
TypeScript | @hey-api/openapi-ts |
Go | oapi-codegen |
Java | openapi-generator |
Python | openapi-python-client |
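To make "generated client code" less abstract, here is a rough sketch of the kind of TypeScript a generator could derive from the document above. This is not the actual output of any of the listed tools. The UserClient class, the FetchLike type and the injectable fetchImpl parameter are assumptions made for illustration, and userId is typed as a string to match the component's props.

```typescript
// Hypothetical sketch of generated output; real generators emit different code.

/** Mirrors #/components/schemas/User. */
interface User {
  userId: string;
  name: string;
  email: string;
}

// Minimal shape of the fetch function the client depends on (injectable for tests).
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

class UserClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike = (url) => fetch(url)) {
    this.baseUrl = baseUrl;
    this.fetchImpl = fetchImpl;
  }

  /** Mirrors operationId `getUser` on the path /user/{userId}. */
  async getUser(params: { userId: string }): Promise<User> {
    const url = `${this.baseUrl}/user/${encodeURIComponent(params.userId)}`;
    const response = await this.fetchImpl(url);
    if (!response.ok) {
      throw new Error(`getUser failed with status ${response.status}`);
    }
    // The cast is only as trustworthy as the schema the code was generated from.
    return (await response.json()) as User;
  }
}
```

The frontend never writes this code by hand. It is regenerated whenever the schema changes, so the types and the paths stay in step with the backend.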
Updating our Frontend
import { useEffect, useState } from "react";
import { type User, UserClient } from "@api/users"; // Generated code

const client = new UserClient("https://our-fancy-backend.io/v1");

function UserInfo(props: { userId: string }) {
  const [user, setUser] = useState<User>();

  useEffect(() => {
    client
      .getUser({ userId: props.userId })
      .then(setUser);
  }, [props.userId]);

  if (!user) {
    return <p>Loading...</p>;
  }

  return (
    <div>
      <h1>{user.name}</h1>
      <p>{user.email}</p>
    </div>
  );
}
The frontend team now uses the generated client code to interact with the User API. The contract is now checked by tooling rather than by convention: as long as the backend honors its published schema, the frontend team can be confident that the User object has the fields they expect.
Schema drift is also easier to detect: it is catchable at compile time and can be automatically flagged by CI/CD pipelines.
For example:
- The backend team accidentally makes a backwards-incompatible change to the User schema, such as removing the email field.
- The client is regenerated to reflect the new schema.
- The line that renders user.email in the component above now causes a TypeScript compilation error, since the User object no longer has an email field.
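The CI side of this can be sketched as a small check that compares two versions of an object schema and reports removed or retyped properties. This is an illustrative toy, not a real diffing tool. The ObjectSchema type and the findBreakingChanges function are hypothetical names, and only a tiny subset of OpenAPI is modeled.

```typescript
// Minimal subset of an OpenAPI object schema, enough for this sketch.
interface ObjectSchema {
  properties: Record<string, { type: string }>;
}

// Report properties that were removed or changed type between two versions.
function findBreakingChanges(previous: ObjectSchema, next: ObjectSchema): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(previous.properties)) {
    const before = previous.properties[name];
    const after = next.properties[name];
    if (after === undefined) {
      problems.push(`property "${name}" was removed`);
    } else if (after.type !== before.type) {
      problems.push(`property "${name}" changed type from ${before.type} to ${after.type}`);
    }
  }
  return problems;
}

// The User schema before and after the accidental removal of `email`.
const before: ObjectSchema = {
  properties: { userId: { type: "string" }, name: { type: "string" }, email: { type: "string" } },
};
const after: ObjectSchema = {
  properties: { userId: { type: "string" }, name: { type: "string" } },
};

const problems = findBreakingChanges(before, after);
for (const problem of problems) {
  console.log(problem); // property "email" was removed
}
```

A pipeline step could fail the build whenever the returned list is non-empty, which stops the breaking change before it reaches any downstream team.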
The proper way of removing a field from the schema is to deprecate it first, and then remove it in a future version.
For example, OpenAPI supports marking fields as deprecated. The generated client code can then emit warnings wherever deprecated fields are used. Like magic! 🤯
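One plausible way for a TypeScript generator to carry the deprecation through is a JSDoc @deprecated tag on the generated property. The sketch below assumes that behavior rather than quoting any specific tool's output, and the contactLine helper is a hypothetical consumer.

```typescript
// Hypothetical generated type after `email` is marked `deprecated: true` in the schema.
interface User {
  userId: string;
  name: string;
  /** @deprecated Scheduled for removal in a future API version. */
  email: string;
}

// Still compiles and runs; editors that understand JSDoc show `email` struck through.
function contactLine(user: User): string {
  return `${user.name} <${user.email}>`;
}

console.log(contactLine({ userId: "42", name: "Ada", email: "ada@example.com" }));
// prints: Ada <ada@example.com>
```

Because the code keeps working during the deprecation window, downstream teams get time to migrate before the field is actually removed.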
TLDR
The developer experience improvements of API schemas are significant:
- Reduced manual communication of data contracts.
- Elimination of manual syncing of data contracts between teams.
- Automatic detection of schema drift, resulting in fewer runtime errors.
- Easier API exploration: developers can see a single source of truth that defines the API and the data it provides.
Use API schemas to formalize data contracts between teams and systems. The toolchains around API schemas help eliminate much of the manual work and pain brought about by loose data contracts.