Caching Strategies Decoded

01 Why we cache

Caching trades freshness for speed. The same data, served from a faster location, with the understanding that "faster" usually means "slightly older." Every cache decision is a freshness-vs-latency tradeoff. If you don't understand that explicitly, you'll get bugs that look like magic.

Caching also trades correctness for cost. If your application doesn't need the absolute latest value, serving a 10-second-old value from a cache can cost orders of magnitude less than recomputing it.

02 The four layers

A typical request hits up to four cache layers before reaching original storage:

Browser cache — closest to user, most variable
CDN edge cache — geographically distributed, controlled by you
Application cache — Redis or Memcached, in front of your database
Database cache — the database's own buffer pool

Each has its own invalidation model, its own scaling characteristics, and its own failure modes. Understanding all four lets you decide where to invest.

03 Browser / HTTP cache

Controlled by HTTP response headers. The browser will reuse cached responses according to what you tell it.

✓ headers that matter

# Static assets with content hash in URL — cache forever
Cache-Control: public, max-age=31536000, immutable

# HTML pages — revalidate every request
Cache-Control: no-cache

# API responses — short cache, with ETag for revalidation
Cache-Control: private, max-age=60
ETag: "v1.2.3-abc"

Three concepts to internalize:

max-age: seconds the response can be reused without checking the server.
immutable: tells the browser not to revalidate while the response is still fresh, even when the user reloads the page. Only safe when the URL contains a content hash.
ETag: opaque identifier. Browser sends If-None-Match on revalidation; server returns 304 Not Modified if unchanged. Saves bandwidth, not latency.
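
The revalidation exchange can be sketched from the server's side. This is a minimal sketch, not tied to any framework: the `CachedResponse` type and the `respondWithETag` function are hypothetical names, and weak validators (the `W/` prefix) are ignored for brevity.

```typescript
// Hypothetical minimal response shape; real frameworks have their own types.
interface CachedResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string | null;
}

// Decide between a full 200 response and a bodyless 304,
// based on the If-None-Match header the browser sent (if any).
function respondWithETag(
  ifNoneMatch: string | undefined,
  currentETag: string,
  body: string,
): CachedResponse {
  const headers = {
    "Cache-Control": "private, max-age=60",
    ETag: currentETag,
  };
  // If-None-Match may carry several comma-separated ETags, or "*".
  const candidates = (ifNoneMatch ?? "").split(",").map((tag) => tag.trim());
  const matches = candidates.includes(currentETag) || candidates.includes("*");
  return matches
    ? { status: 304, headers, body: null }
    : { status: 200, headers, body };
}
```

The 304 branch still costs a full round trip to the server, which is why ETags save bandwidth rather than latency.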

04 CDN edge cache

Sits between users and your origin servers. Caches responses at geographically-distributed edge nodes. Cloudflare, Fastly, CloudFront, Bunny — same fundamental model.

Two crucial concepts:

Cache key: what URL + headers combination uniquely identifies a cached response? By default, just the URL. Add the Vary header to include headers (like Accept-Encoding or Accept-Language) in the cache key.
Purge: CDNs can be told to invalidate specific URLs or everything at once. Purging by URL is simple and usually takes effect within seconds; purging by tag (where you've labeled responses with a custom header) is more flexible.
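
To make the cache-key idea concrete, here is a rough sketch of how a key could be derived from the URL plus the headers named in Vary. Real CDNs each have their own key format; `buildCacheKey` and the `|` separator are assumptions made for illustration.

```typescript
// Build a cache key from the URL plus the request headers named in Vary.
// Header names are case-insensitive, so everything is lower-cased first.
function buildCacheKey(
  url: string,
  requestHeaders: Record<string, string>,
  vary: string[],
): string {
  const normalized = new Map<string, string>();
  for (const name of Object.keys(requestHeaders)) {
    normalized.set(name.toLowerCase(), requestHeaders[name]);
  }
  const parts = vary
    .map((name) => name.toLowerCase())
    .sort() // fixed order so the same inputs always give the same key
    .map((name) => `${name}=${normalized.get(name) ?? ""}`);
  return [url, ...parts].join("|");
}
```

Headers not listed in Vary never reach the key, so two requests that differ only in, say, User-Agent share one cached response.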

Surrogate keys. The advanced CDN pattern: tag responses with Surrogate-Key: user-123 article-456 and later purge by tag. When user 123 updates their profile, purging that tag invalidates every response carrying it (homepage, sidebar, search results) in one call. Fastly popularized this; several other CDNs offer an equivalent, often called cache tags, though support varies by provider and plan.
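
The bookkeeping behind purge-by-tag can be sketched with an in-memory stand-in for the edge cache. `TaggedCache` is a hypothetical class for illustration only; it does not reflect any CDN's actual API.

```typescript
// In-memory stand-in for an edge cache that supports purge-by-tag.
class TaggedCache {
  private entries = new Map<string, string>(); // URL -> cached body
  private urlsByTag = new Map<string, Set<string>>(); // tag -> URLs carrying it

  // surrogateKey mimics the header value: space-separated tags.
  store(url: string, body: string, surrogateKey: string): void {
    this.entries.set(url, body);
    for (const tag of surrogateKey.split(/\s+/).filter(Boolean)) {
      if (!this.urlsByTag.has(tag)) this.urlsByTag.set(tag, new Set<string>());
      this.urlsByTag.get(tag)!.add(url);
    }
  }

  get(url: string): string | undefined {
    return this.entries.get(url);
  }

  // Remove every entry tagged with `tag`; returns how many were purged.
  purgeTag(tag: string): number {
    const urls = this.urlsByTag.get(tag) ?? new Set<string>();
    let purged = 0;
    urls.forEach((url) => {
      if (this.entries.delete(url)) purged++;
    });
    this.urlsByTag.delete(tag);
    return purged;
  }
}
```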

05 Application cache (Redis / Memcached)

In-memory key-value store. Application reads from cache; on miss, reads from database and populates the cache.

✓ cache-aside pattern

async function getUser(id: string): Promise<User> {
  // 1. Try cache
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);

  // 2. Miss — query database
  const user = await db.users.findById(id);
  if (!user) throw new Error('Not found');

  // 3. Populate cache for next time (60-second TTL)
  await redis.setex(`user:${id}`, 60, JSON.stringify(user));

  return user;
}

This pattern is called cache-aside. The application is responsible for cache management. Simple, predictable, easy to debug.

Other patterns:

Write-through: each write updates both the cache and the database before it is acknowledged. Reads of recently written data hit the cache. Higher write cost, lower read latency.
Write-behind: writes go to cache immediately, flushed to database asynchronously. Fastest writes, but data loss risk if cache crashes.
Read-through: cache library handles miss logic transparently. Cleaner code, less control.
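
The difference between write-through and write-behind is easiest to see side by side. In this sketch, plain Map objects stand in for the cache and the database, and `writeThrough` and `WriteBehind` are hypothetical names; a real implementation would also need error handling and retry on flush.

```typescript
// Map-backed stand-ins for the cache and the database (assumptions for illustration).
type Store = Map<string, string>;

// Write-through: update the database and the cache as part of the same write.
function writeThrough(cache: Store, db: Store, key: string, value: string): void {
  db.set(key, value);
  cache.set(key, value);
}

// Write-behind: update the cache now, remember the key, and flush later.
class WriteBehind {
  private dirty = new Set<string>();
  private cache: Store;
  private db: Store;

  constructor(cache: Store, db: Store) {
    this.cache = cache;
    this.db = db;
  }

  write(key: string, value: string): void {
    this.cache.set(key, value);
    this.dirty.add(key); // this pending write is lost if the process dies before flush()
  }

  // Copy pending writes to the database; returns how many were flushed.
  flush(): number {
    let flushed = 0;
    this.dirty.forEach((key) => {
      const value = this.cache.get(key);
      if (value !== undefined) {
        this.db.set(key, value);
        flushed++;
      }
    });
    this.dirty.clear();
    return flushed;
  }
}
```

The window between write() and flush() is exactly the data-loss risk the write-behind description warns about.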

06 The five invalidation patterns

Where caching goes wrong. The patterns, in increasing complexity:

1. TTL-based. Cache for N seconds, then expire. Simplest. Acceptable staleness window is the TTL. Use when slight staleness is fine.

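2. Delete-on-write invalidation. When data changes, delete the cache key. Next read repopulates. Works well for single-key invalidation. (Not to be confused with write-through caching, which updates the cached value instead of deleting it.)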

3. Tag-based invalidation. Store a list of tags with each cache entry. On update, purge all entries with the affected tag. Works for "this update affects many cached entries."

4. Versioned keys. Include a version in the cache key (user:123:v5). On update, increment the version. Old entries become unreachable and expire naturally. Eventually consistent.

5. Event-driven invalidation. Database change → event bus → cache invalidator. Most complex, most powerful. Required when cache spans multiple services.
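
Versioned keys (pattern 4) are the least obvious of the five, so here is a small sketch. The `VersionedKeys` class is hypothetical and keeps the counters in process memory; in a multi-server setup the counter would have to live in a shared store, for example a Redis counter.

```typescript
class VersionedKeys {
  private versions = new Map<string, number>(); // "entity:id" -> current version

  // Key used for both reads and writes, e.g. "user:123:v5".
  keyFor(entity: string, id: string): string {
    const version = this.versions.get(`${entity}:${id}`) ?? 1;
    return `${entity}:${id}:v${version}`;
  }

  // Call after the underlying data changes. Entries under the old key are
  // never read again and disappear when their TTL runs out.
  bump(entity: string, id: string): void {
    const current = this.versions.get(`${entity}:${id}`) ?? 1;
    this.versions.set(`${entity}:${id}`, current + 1);
  }
}
```

Nothing is ever deleted explicitly, which is why this pattern depends on every entry having a TTL.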

07 Cache stampede: the failure mode you'll hit

Popular item's cache entry expires. 1000 concurrent requests all miss simultaneously. All 1000 hit the database, all 1000 compute the same expensive operation, all 1000 try to write back to the cache. Database falls over.

Two solutions, depending on your tolerance for staleness:

Mutex / lock: when a cache miss occurs, acquire a lock. Only the lock-holder computes and writes. Others wait for the lock to release, then read the now-populated cache. Adds complexity.
Probabilistic early expiration: before the entry actually expires, a small percentage of reads recompute and refresh. Spreads the work over time so no single thundering herd hits.

✓ probabilistic early expiration

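async function getCached<T>(key: string, ttl: number, compute: () => Promise<T>): Promise<T> {
  const entry = await redis.get(key);
  if (entry) {
    const { value, expiresAt } = JSON.parse(entry);
    const timeLeft = expiresAt - Date.now();
    // Early refresh only applies in the last 10% of the TTL (tune to taste)
    const refreshWindow = ttl * 1000 * 0.1;
    const serveProbability = timeLeft / refreshWindow;

    // Outside the window this is >= 1, so we always serve the cached value.
    // Inside it, the chance of recomputing rises as expiration approaches.
    if (Math.random() < serveProbability) return value;
  }
  return recomputeAndCache(key, ttl, compute);
}

// Recompute, then store the value together with its expiry timestamp
async function recomputeAndCache<T>(key: string, ttl: number, compute: () => Promise<T>): Promise<T> {
  const value = await compute();
  const entry = { value, expiresAt: Date.now() + ttl * 1000 };
  await redis.setex(key, ttl, JSON.stringify(entry));
  return value;
}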
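
For comparison, the mutex option from the list above can be sketched as request coalescing: concurrent misses for the same key share one in-flight computation. This is a minimal in-process sketch with hypothetical names (`inFlight`, `singleFlight`); it only protects a single server process, and a fleet of servers would need a distributed lock on top.

```typescript
// Coalesce concurrent misses for the same key into one computation.
// In-process only: each server process keeps its own map.
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(key: string, compute: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>; // someone else holds the "lock"

  const pending = compute().finally(() => {
    inFlight.delete(key); // release whether compute succeeded or failed
  });
  inFlight.set(key, pending);
  return pending;
}
```

With this in place, 1000 simultaneous misses on one process trigger a single call to compute, and the other 999 callers wait on the same promise.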

08 Negative caching: the trap

Caching "not found" responses requires care. If you cache null with the same TTL as found values, then:

User requests /users/abc — doesn't exist. null cached for 60s.
5 seconds later, the user signs up with ID abc.
For the next 55 seconds, requests return null even though the user exists.

Fix: cache negative responses with much shorter TTLs (5-10 seconds). Or, on write, explicitly invalidate any negative cache entries for that key.
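
Both fixes fit in a few lines. The sketch below uses an in-memory map and an injected clock so the behavior is easy to follow; `NegativeAwareCache` and the two TTL constants are hypothetical, with values taken from the example above.

```typescript
const FOUND_TTL_MS = 60000; // matches the 60s example above
const MISSING_TTL_MS = 5000; // much shorter TTL for "not found"

interface Entry<T> {
  value: T | null;
  expiresAt: number;
}

class NegativeAwareCache<T> {
  private entries = new Map<string, Entry<T>>();

  // null means "known not to exist"; it is cached, but only briefly.
  set(key: string, value: T | null, now: number): void {
    const ttl = value === null ? MISSING_TTL_MS : FOUND_TTL_MS;
    this.entries.set(key, { value, expiresAt: now + ttl });
  }

  // Returns undefined on a miss or expired entry, null on a cached "not found".
  get(key: string, now: number): T | null | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) return undefined;
    return entry.value;
  }

  // Call on create/signup so a stale "not found" cannot outlive the write.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

Keeping undefined (miss) distinct from null (cached absence) is what lets the caller skip the database on a negative hit.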

09 When NOT to cache

Caching adds complexity. It's not free. Skip caching when:

The query is already fast (under 10ms). Cache overhead can exceed the gain.
The data must be perfectly fresh. Real-time pricing, inventory at checkout, security-sensitive checks. Even 1-second staleness can be a bug.
Each request is unique. Per-user, per-query, no overlap. Cache hit rate would be ~0%.
You haven't measured. Cache because profiling showed a bottleneck. Not because "caching is good."

10 Observability: measure the cache

You can't tune what you don't measure. The metrics that matter:

Hit rate. Percentage of reads served from cache. Should be 80%+ for things you bother caching. Below 50% suggests you're caching the wrong things.
Latency p99. Cache reads should be much faster than the origin. If p99 is climbing, suspect eviction pressure or memory issues first.
Evictions. Cache running out of memory means thrashing. Increase memory or shorten TTLs.
Origin requests. The whole point of caching. Lower is better.
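
As a rough sketch of how these numbers can be derived, the snippet below tracks hits and misses and computes a nearest-rank p99. `CacheStats` and `p99` are hypothetical helpers; in production you would more likely export counters and histograms to your metrics system than compute them by hand.

```typescript
class CacheStats {
  private hits = 0;
  private misses = 0;

  recordHit(): void {
    this.hits++;
  }

  // Each miss is also a request that reached the origin.
  recordMiss(): void {
    this.misses++;
  }

  // Fraction of reads served from cache, in [0, 1]; 0 when nothing was recorded.
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }

  originRequests(): number {
    return this.misses;
  }
}

// Nearest-rank 99th percentile over recorded latencies in milliseconds.
function p99(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) return 0;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil(0.99 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}
```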

∞ The discipline

Caching is one of the most leveraged engineering investments — and one of the most dangerous. The teams that win with caching treat it as infrastructure, not as an afterthought. They have metrics on hit rate. They have a documented invalidation strategy. They know exactly what happens when the cache layer goes down (the answer should be "everything still works, just slower").
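
"Everything still works, just slower" usually comes down to treating cache errors as misses. Here is a minimal sketch under that assumption; the `CacheClient` interface and `getFailOpen` are hypothetical and deliberately simpler than a real Redis client.

```typescript
// Minimal cache interface (hypothetical; real client libraries differ).
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// Read through the cache, but treat any cache error as a miss.
async function getFailOpen(
  cache: CacheClient,
  key: string,
  loadFromOrigin: () => Promise<string>,
): Promise<string> {
  try {
    const cached = await cache.get(key);
    if (cached !== null) return cached;
  } catch {
    // Cache is down: fall through to the origin instead of failing the request.
  }
  const value = await loadFromOrigin();
  try {
    await cache.set(key, value);
  } catch {
    // Failing to populate the cache must not fail the request either.
  }
  return value;
}
```

A real deployment would also put a short timeout on the cache calls, so that a slow cache degrades the same way a dead one does.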

Cache early in the request path. Invalidate aggressively. Measure constantly. Trust nothing.