
# Don't Use Large Strings as Cache Keys


Tags: markdown, node-js, cache

1. The Issue
2. Lessons Learned
3. The Fix

This is related to an older post, *The 50MB Markdown Files That Broke Our Server*, and the code in question was refactored a while ago. However, I was reminded of the issue while cleaning up some of our technical notes.

The setup: I had identified that we were spending a lot of CPU time parsing markdown into an AST.

```typescript
const parseMarkdown = (markdown: string): AST => {
  return ...;
};
```

I needed a quick fix, so I reached for the quickest one I could think of:

```typescript
import QuickLRU from 'quick-lru';

const cache = new QuickLRU<string, AST>({
  maxSize: 1000,
});

const parseMarkdown = (markdown: string): AST => {
  const cached = cache.get(markdown);

  if (cached) {
    return cached;
  }

  const ast = ...;

  cache.set(markdown, ast);

  return ast;
};
```

Overall, this reduced our CPU usage. Profiling confirmed that we were no longer spending much time in parseMarkdown. However, a new issue emerged: suddenly our GC (garbage collection) started to spike. I was also confused about why quick-lru seemed so slow; I even started benchmarking quick-lru and looking into whether something was off with its implementation.

## The Issue

If you haven't spotted it yet, the issue is that I used the entire markdown document as the cache key. With `maxSize: 1000` and markdown files that can reach 50MB, I could theoretically have had 50GB+ of keys alone sitting in memory.

Just to illustrate the point, here is a quick benchmark:

```typescript
import QuickLRU from 'quick-lru';

const cache = new QuickLRU<string, string>({
  maxSize: 100,
});

const smallKey = 'hello';
const largeKey = 'x'.repeat(10_000_000); // 10MB string

const iterations = 1_000;

// Warm up
cache.set(smallKey, 'value');
cache.set(largeKey, 'value');

// Test small key
console.time('small key get');
for (let i = 0; i < iterations; i++) {
  cache.get(smallKey);
}
console.timeEnd('small key get');

// Test large key - same reference
console.time('large key get (same ref)');
for (let i = 0; i < iterations; i++) {
  cache.get(largeKey);
}
console.timeEnd('large key get (same ref)');

// Test large key - new string each time (realistic scenario)
const largeKeys = Array.from({ length: iterations }, () =>
  'x'.repeat(10_000_000),
);

console.time('large key get (different refs)');
for (let i = 0; i < iterations; i++) {
  cache.get(largeKeys[i]);
}
console.timeEnd('large key get (different refs)');
```

| Scenario | Time | Why |
| --- | --- | --- |
| small key | 0.101ms | Tiny string comparison |
| large key (same ref) | 0.045ms | V8 short-circuits with reference equality - doesn't even look at content |
| large key (new string) | 1,275ms | Must compare 10M characters × 1000 iterations |

> **NOTE:** Large key (same ref) is actually faster than small key because V8 checks reference equality first - if it's the exact same object in memory, it skips character comparison entirely.
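To see the same effect outside of quick-lru, here is a minimal sketch (the variable names and sizes are mine, not from the benchmark above) that compares strings directly:

```typescript
// Minimal sketch: compare the same 10MB string by reference,
// then against a freshly allocated copy with identical content.
const big = 'x'.repeat(10_000_000);
const sameRef = big;
const freshCopy = 'x'.repeat(10_000_000);

console.time('compare same reference');
const equalByRef = big === sameRef; // identity check succeeds, no character scan
console.timeEnd('compare same reference');

console.time('compare fresh copy');
const equalByContent = big === freshCopy; // same content, so every character is compared
console.timeEnd('compare fresh copy');

console.log(equalByRef, equalByContent); // true true
```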

## Lessons Learned

Don't use large strings as cache keys. Hash them.

## The Fix

We used xxhash to hash the markdown content before using it as the cache key. Unlike cryptographic hashes like SHA-256, xxhash is designed for speed - it's roughly 10-50x faster while still providing enough collision resistance for cache key purposes.
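The post doesn't include the updated code, but a minimal sketch of the idea looks roughly like this. It assumes the xxhash-wasm package (one of several Node.js xxhash bindings) and its h64ToString helper; doParse is a hypothetical stand-in for the real parser:

```typescript
import QuickLRU from 'quick-lru';
import xxhash from 'xxhash-wasm';

type AST = unknown; // stand-in for the real AST type

// xxhash-wasm initializes its WASM module asynchronously and then exposes
// the hash functions; h64ToString returns a short hex digest.
const { h64ToString } = await xxhash();

const cache = new QuickLRU<string, AST>({
  maxSize: 1000,
});

// Hypothetical stand-in for the actual markdown parser.
const doParse = (markdown: string): AST => {
  return { type: 'root', raw: markdown };
};

const parseMarkdown = (markdown: string): AST => {
  // The cache key is now a fixed-length digest instead of the
  // (potentially multi-megabyte) markdown itself.
  const key = h64ToString(markdown);

  const cached = cache.get(key);

  if (cached) {
    return cached;
  }

  const ast = doParse(markdown);

  cache.set(key, ast);

  return ast;
};
```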

We also moved the cache from in-memory to Redis. This way, the AST cache persists across server restarts, can be shared across multiple instances, and doesn't contribute to Node.js memory pressure or GC spikes.
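Again as a rough sketch rather than the actual implementation: with ioredis as the client, a JSON-serializable AST, and a key prefix and TTL that are my own choices, the Redis-backed variant could look something like this:

```typescript
import Redis from 'ioredis';
import xxhash from 'xxhash-wasm';

type AST = unknown; // stand-in for the real AST type

const redis = new Redis(process.env.REDIS_URL ?? 'redis://127.0.0.1:6379');
const { h64ToString } = await xxhash();

// Only a short digest is sent to Redis as the key; the prefix is illustrative.
const astCacheKey = (markdown: string): string => {
  return `markdown-ast:${h64ToString(markdown)}`;
};

const getCachedAst = async (markdown: string): Promise<AST | null> => {
  const cached = await redis.get(astCacheKey(markdown));

  return cached ? (JSON.parse(cached) as AST) : null;
};

const setCachedAst = async (markdown: string, ast: AST): Promise<void> => {
  // Illustrative 24-hour TTL; entries survive restarts and are visible to
  // every instance that talks to the same Redis.
  await redis.set(astCacheKey(markdown), JSON.stringify(ast), 'EX', 60 * 60 * 24);
};
```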

Written by punkpeye (@punkpeye)