
# Don't Use Large Strings as Cache Keys


Tags: markdown, node-js, cache

1. The Issue
2. Lessons Learned
3. The Fix

This is related to an older post, *The 50MB Markdown Files That Broke Our Server*, and the code in question was refactored a while ago. However, I was reminded of the issue while cleaning up some of our technical notes.

The setup: I had identified that we were spending a lot of CPU time parsing markdown into an AST.

```typescript
const parseMarkdown = (markdown: string): AST => {
  return ...;
};
```

I needed a quick fix, so I reached for the quickest one I could think of:

```typescript
import QuickLRU from 'quick-lru';

const cache = new QuickLRU<string, AST>({
  maxSize: 1000,
});

const parseMarkdown = (markdown: string): AST => {
  const cached = cache.get(markdown);

  if (cached) {
    return cached;
  }

  const ast = ...;

  cache.set(markdown, ast);

  return ast;
};
```

Overall, this reduced our CPU usage. Profiling confirmed that we were no longer spending much time in parseMarkdown. However, a new issue emerged: suddenly our GC (garbage collection) started to spike. I was also confused about why quick-lru seemed so slow; I even started benchmarking quick-lru and looking into whether something was off with its implementation.

## The Issue

If you haven't spotted it yet, the issue is that I used the entire markdown document as the cache key. With `maxSize: 1000` and markdown files that can reach 50MB, I could theoretically have had 50GB+ of keys alone sitting in memory.

Just to illustrate the point, here is a quick benchmark:

```typescript
import QuickLRU from 'quick-lru';

const cache = new QuickLRU<string, string>({
  maxSize: 100,
});

const smallKey = 'hello';
const largeKey = 'x'.repeat(10_000_000); // 10MB string

const iterations = 1_000;

// Warm up
cache.set(smallKey, 'value');
cache.set(largeKey, 'value');

// Test small key
console.time('small key get');
for (let i = 0; i < iterations; i++) {
  cache.get(smallKey);
}
console.timeEnd('small key get');

// Test large key - same reference
console.time('large key get (same ref)');
for (let i = 0; i < iterations; i++) {
  cache.get(largeKey);
}
console.timeEnd('large key get (same ref)');

// Test large key - new string each time (realistic scenario)
const largeKeys = Array.from({ length: iterations }, () =>
  'x'.repeat(10_000_000),
);

console.time('large key get (different refs)');
for (let i = 0; i < iterations; i++) {
  cache.get(largeKeys[i]);
}
console.timeEnd('large key get (different refs)');
```

| Scenario | Time | Why |
| --- | --- | --- |
| small key | 0.101ms | Tiny string comparison |
| large key (same ref) | 0.045ms | V8 short-circuits with reference equality - doesn't even look at content |
| large key (new string) | 1,275ms | Must compare 10M characters × 1000 iterations |

> **NOTE:** Large key (same ref) is actually faster than small key because V8 checks reference equality first - if it's the exact same object in memory, it skips character comparison entirely.
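To see the same effect outside of quick-lru, here is a minimal sketch (the variable names and sizes are mine, not from the benchmark above) that compares strings directly:

```typescript
// Minimal sketch: compare the same 10MB string by reference,
// then against a freshly allocated copy with identical content.
const big = 'x'.repeat(10_000_000);
const sameRef = big;
const freshCopy = 'x'.repeat(10_000_000);

console.time('compare same reference');
const equalByRef = big === sameRef; // identity check succeeds, no character scan
console.timeEnd('compare same reference');

console.time('compare fresh copy');
const equalByContent = big === freshCopy; // same content, so every character is compared
console.timeEnd('compare fresh copy');

console.log(equalByRef, equalByContent); // true true
```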

## Lessons Learned

Don't use large strings as cache keys. Hash them.

## The Fix

We used xxhash to hash the markdown content before using it as the cache key. Unlike cryptographic hashes like SHA-256, xxhash is designed for speed - it's roughly 10-50x faster while still providing enough collision resistance for cache key purposes.
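The post doesn't include the updated code, but a minimal sketch of the idea looks roughly like this. It assumes the xxhash-wasm package (one of several Node.js xxhash bindings) and its h64ToString helper; doParse is a hypothetical stand-in for the real parser:

```typescript
import QuickLRU from 'quick-lru';
import xxhash from 'xxhash-wasm';

type AST = unknown; // stand-in for the real AST type

// xxhash-wasm initializes its WASM module asynchronously and then exposes
// the hash functions; h64ToString returns a short hex digest.
const { h64ToString } = await xxhash();

const cache = new QuickLRU<string, AST>({
  maxSize: 1000,
});

// Hypothetical stand-in for the actual markdown parser.
const doParse = (markdown: string): AST => {
  return { type: 'root', raw: markdown };
};

const parseMarkdown = (markdown: string): AST => {
  // The cache key is now a fixed-length digest instead of the
  // (potentially multi-megabyte) markdown itself.
  const key = h64ToString(markdown);

  const cached = cache.get(key);

  if (cached) {
    return cached;
  }

  const ast = doParse(markdown);

  cache.set(key, ast);

  return ast;
};
```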

We also moved the cache from in-memory to Redis. This way, the AST cache persists across server restarts, can be shared across multiple instances, and doesn't contribute to Node.js memory pressure or GC spikes.
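Again as a rough sketch rather than the actual implementation: with ioredis as the client, a JSON-serializable AST, and a key prefix and TTL that are my own choices, the Redis-backed variant could look something like this:

```typescript
import Redis from 'ioredis';
import xxhash from 'xxhash-wasm';

type AST = unknown; // stand-in for the real AST type

const redis = new Redis(process.env.REDIS_URL ?? 'redis://127.0.0.1:6379');
const { h64ToString } = await xxhash();

// Only a short digest is sent to Redis as the key; the prefix is illustrative.
const astCacheKey = (markdown: string): string => {
  return `markdown-ast:${h64ToString(markdown)}`;
};

const getCachedAst = async (markdown: string): Promise<AST | null> => {
  const cached = await redis.get(astCacheKey(markdown));

  return cached ? (JSON.parse(cached) as AST) : null;
};

const setCachedAst = async (markdown: string, ast: AST): Promise<void> => {
  // Illustrative 24-hour TTL; entries survive restarts and are visible to
  // every instance that talks to the same Redis.
  await redis.set(astCacheKey(markdown), JSON.stringify(ast), 'EX', 60 * 60 * 24);
};
```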

Written by punkpeye (@punkpeye)