Setting up efficient LLM applications requires more than just calling an API. In this session, Martin Visser explains how semantic caching can significantly reduce the cost and latency of Large Language Model (LLM) applications by reusing responses to semantically similar prompts rather than requiring exact matches.
Using Valkey and Redis as vector databases, Martin walks through how embeddings, similarity thresholds, and TTLs work together to cache LLM responses efficiently. Participants will learn about practical architecture decisions and configuration trade-offs, and see a real-world demo showing how semantic caching can cut LLM usage by up to 60% while improving response times from seconds to milliseconds.
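The mechanics described here — embed the incoming prompt, look for a cached entry whose vector falls within a similarity threshold, and fall back to the LLM on a miss — can be sketched in a few lines. The snippet below is a minimal illustration, not the demo code from the talk: `embed()` and `call_llm()` are hypothetical stubs, the cache is an in-memory list standing in for a Valkey/Redis vector index, and the 0.85 threshold and one-hour TTL are example values.

```python
import time
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # example value; tune per workload
TTL_SECONDS = 3600           # example TTL; expired entries are ignored

# Hypothetical stubs: swap in a real embedding model and LLM client.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(16)
    return v / np.linalg.norm(v)  # unit-normalize for cosine similarity

def call_llm(prompt: str) -> str:
    return f"LLM answer for: {prompt}"

# In-memory stand-in for a Valkey/Redis vector index:
# each entry holds (embedding, cached response, insertion time).
cache: list[tuple[np.ndarray, str, float]] = []

def cached_completion(prompt: str) -> str:
    query_vec = embed(prompt)
    now = time.time()
    # Find the closest non-expired entry (vectors are unit-normalized,
    # so cosine similarity reduces to a dot product).
    best_score, best_response = -1.0, None
    for vec, response, created in cache:
        if now - created > TTL_SECONDS:
            continue
        score = float(np.dot(query_vec, vec))
        if score > best_score:
            best_score, best_response = score, response
    if best_response is not None and best_score >= SIMILARITY_THRESHOLD:
        return best_response  # semantic hit: skip the LLM call entirely
    # Miss: call the LLM and cache the result under the prompt's vector.
    response = call_llm(prompt)
    cache.append((query_vec, response, now))
    return response
```

In a real deployment the linear scan would be replaced by the server-side vector search that Valkey and Redis provide, which returns the nearest neighbors and their distances in a single round trip, and expiry would be handled by the server's native per-key TTL rather than client-side timestamp checks.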
- 📅 Date: January 23, 2026
- 🎤 Speaker: Martin Visser - Valkey and Redis Tech Lead @ Percona
- 🎥 Watch the Recording: https://youtu.be/7vdFUJgGOSs