Background
So Ashok, can you tell me how you would design a Least Frequently Used cache that has O(1) operations?
I know, I know, we already have an old post about the LRU cache, but it’s been 3 years now, and this time the question of designing the LFU cache came as the second question in the first round of the interview. I was interviewing for the role of Staff Software Engineer in the fintech domain, and the company is very popular, especially for mutual funds!
LFU Cache
The LFU cache uses a caching strategy that removes items based on how often they are accessed. When the cache reaches its maximum capacity and needs to evict an item, it removes the entry with the lowest access frequency. This approach is based on the idea that data accessed less frequently is less likely to be needed again in the near future.
Internals
In an LFU cache, each item has a frequency count that shows how many times it has been accessed. Every time an item is read or updated, its count is increased. When the cache needs to remove an item, it looks for the one with the lowest frequency count and removes it.
If more than one item has the same lowest count, the cache uses a tie-breaker: it removes the item that was used least recently (yes, you heard that correctly, LFU uses LRU 😆).
I have written more about caching in the older blog post.
Naive Implementation
The naive LFU implementation is the simplest way to evict the item that has been used the fewest times, without optimizing for speed. We will use a HashTable storing key-value pairs, where the key will be a string and the value will be a pair of the integer value and the frequency count of that key.
Each time the key is accessed or updated, the frequency of that key will be incremented.
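The idea above can be sketched in C++ like this (a minimal sketch; `naive_lfu_cache` and its member names are my own illustration, not necessarily the exact code from the interview):

```cpp
#include <climits>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

class naive_lfu_cache {
    std::size_t capacity;
    // key -> (value, access frequency)
    std::unordered_map<std::string, std::pair<int, unsigned int>> cache;

public:
    explicit naive_lfu_cache(int cap) {
        if (cap < 0)
            throw std::invalid_argument("capacity cannot be negative");
        capacity = static_cast<std::size_t>(cap);
    }

    // Returns INT_MIN on a miss; bumps the frequency on a hit.
    int get(const std::string& key) {
        auto it = cache.find(key);
        if (it == cache.end()) return INT_MIN;
        ++it->second.second;
        return it->second.first;
    }

    void put(const std::string& key, int val) {
        if (capacity == 0) return;
        auto it = cache.find(key);
        if (it != cache.end()) {        // existing key: update value, bump frequency
            it->second.first = val;
            ++it->second.second;
            return;
        }
        if (cache.size() == capacity) { // full: O(n) scan for the minimum frequency
            auto victim = cache.begin();
            for (auto i = cache.begin(); i != cache.end(); ++i)
                if (i->second.second < victim->second.second)
                    victim = i;
            cache.erase(victim);
        }
        cache.emplace(key, std::make_pair(val, 1u));
    }
};
```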
Observations
The problem is that when the cache is full and we want to remove a key, we must scan all the key-value pairs and remove the key with the minimum frequency. This is an O(n) operation, but our aim is to achieve O(1), a constant-time operation.
Also, this time I have added the negative-capacity check and the exception throw, which I missed in the older writing.
Optimized Solution?
An optimized LFU cache achieves constant, O(1) time for both get and put. We have already discussed the optimized solution. Here are the things we need:
- Key Table: a HashTable of (K, N) key-node pairs for the string keys, where each Node contains the value, the frequency, and an iterator to the key's position in its frequency list.
- Frequency Table: a HashTable of (F, L) frequency-list pairs, holding for each frequency a (doubly linked) list of the keys ordered by recency of access, to allow eviction via LRU.
- Tracker Variable: an integer used to store the minimum frequency. It always points to the lowest frequency currently in the cache, which allows instant eviction (it acts as the key into the Frequency Table).
Implementation
Let’s start with the class design for the LFU cache.
Class Design
As explained above, we need two hash tables and a Node structure. Let’s add them:
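As a C++11 sketch, the skeleton could look like this (member names follow the description below; treat it as illustrative rather than the exact original):

```cpp
#include <cstddef>
#include <list>
#include <stdexcept>
#include <string>
#include <unordered_map>

class lfu_cache {
    // Holds the value stored against a key, the key's access frequency,
    // and its position inside the matching frequency list.
    struct Node {
        int value;
        unsigned int frequency;
        std::list<std::string>::iterator position;
    };

    std::size_t capacity;        // max capacity of the cache
    unsigned int min_frequency;  // lowest frequency currently present
    std::unordered_map<std::string, Node> cache;                               // <K, Node>
    std::unordered_map<unsigned int, std::list<std::string>> frequency_table;  // <F, L>

public:
    explicit lfu_cache(int cap) : min_frequency(0) {
        if (cap < 0)
            throw std::invalid_argument("capacity cannot be negative");
        capacity = static_cast<std::size_t>(cap);
    }
};
```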
- `capacity` is used for storing and comparing the max capacity of the cache.
- `min_frequency` will be an unsigned integer storing the minimum frequency currently present in the cache.
- `cache` is the hash table of `<K, Node>` which stores the node against a key.
- `frequency_table` is another hash table, used to store `<F, L>`: a list of string keys against each integer frequency.
- `Node` is the structure that will hold the value provided against the key, the frequency of that key, and the iterator (simply an address) of the key's node in the `list` of the `frequency_table`.
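The reason storing an iterator in `Node` works is that `std::list` iterators stay valid when other elements are inserted or erased elsewhere in the list. A tiny standalone demonstration (not part of the cache code):

```cpp
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Shows that a stored std::list iterator survives unrelated insertions,
// enabling O(1) removal via the remembered position later.
std::vector<std::string> demo() {
    std::list<std::string> freq_list = {"a", "b"};
    auto it_b = std::next(freq_list.begin()); // remember position of "b"
    freq_list.push_front("c");                // it_b is still valid
    freq_list.erase(it_b);                    // O(1) erase via the iterator
    return std::vector<std::string>(freq_list.begin(), freq_list.end());
}
```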
Putting Keys
When adding keys to an LFU cache (with LRU as a tie-breaker), we need to handle three main cases:
Key already present in the cache
- Check if the key exists in the cache.
- If present:
  - Increment its frequency since the key is accessed.
  - Remove it from the list corresponding to its old frequency in `frequency_table`.
  - If the old frequency list becomes empty:
    - Remove the entry for that frequency from `frequency_table`.
    - Update `min_frequency` only if the old frequency was equal to `min_frequency`.
  - Add the key to the front of the list corresponding to its new frequency in `frequency_table`.
    - Note: If the list for the new frequency does not exist, it is automatically created.
Cache is full
- If the cache has reached its capacity:
  - Use `min_frequency` to find the frequency list with the least frequently used keys.
  - Remove the key at the back of that list (LRU as the tie-breaker among keys with the same frequency).
  - Remove this key from the `cache`.
  - If the list for `min_frequency` becomes empty after eviction:
    - Remove the frequency entry from `frequency_table`.
  - `min_frequency` will be reset when the new key is inserted.
Key is new and cache is not full
- Set `min_frequency` to `1`.
- Insert the new key at the front of the list corresponding to frequency `1` in `frequency_table`.
  - If the list does not exist, it is created automatically.
- Add the key to the `cache` with its value, frequency `1`, and an iterator pointing to its position in the frequency list.
Let’s have a look at the code:
```cpp
void put(const std::string& key, int val)
```
Notes:
- Updating `min_frequency` is conditional and depends on the emptiness of the old frequency list.
- Using `push_front` ensures that among keys with the same frequency, the most recently used key is at the front, which preserves LRU order for tie-breaking.
- All frequency lists are created lazily, so there is no need to pre-initialize them in the constructor in C++.
Getting Values
When retrieving a key from the cache, we need to handle two main cases:
Key is present in the cache
- Check if the key exists in the `cache`.
- If present:
  - Retrieve the `Node` for the key.
  - Increment its frequency since the key has been accessed.
  - Remove it from the list corresponding to its old frequency in `frequency_table`.
    - If the old frequency list becomes empty:
      - Remove the frequency entry from `frequency_table`.
      - Update `min_frequency` only if the old frequency was equal to `min_frequency`.
  - Add the key to the front of the list corresponding to its new frequency.
    - If the list for the new frequency does not exist, it is created automatically.
  - Update the iterator in the `Node` to point to its new position in the list.
  - Return the value of the key.
Key is not present in the cache
- Return a special value indicating a miss; I am using `INT_MIN`.
Let’s look at the code:
```cpp
int get(const std::string& key)
```
Notes:
- Accessing a key via `get()` always increases its frequency, just like in `put()`.
- The front of each frequency list represents the most recently used key among that frequency.
- This ensures that LRU is used as a tie-breaker when multiple keys have the same frequency.
- `min_frequency` is only updated when the old frequency list becomes empty and it was the current `min_frequency`.
The Full Code
Okay, here is the full C++11 code for the lfu_cache.
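Here is a self-contained C++11 sketch of `lfu_cache` assembled from the steps described above. It is my reconstruction of the design, so the version on GitHub may differ in details:

```cpp
#include <climits>
#include <cstddef>
#include <list>
#include <stdexcept>
#include <string>
#include <unordered_map>

class lfu_cache {
    struct Node {
        int value;
        unsigned int frequency;
        std::list<std::string>::iterator position;  // spot in its frequency list
    };

    std::size_t capacity;
    unsigned int min_frequency;
    std::unordered_map<std::string, Node> cache;                               // <K, Node>
    std::unordered_map<unsigned int, std::list<std::string>> frequency_table;  // <F, L>

    // Move `key` from its current frequency list to the next one,
    // keeping the most recently used key at the front of each list.
    void bump(const std::string& key, Node& node) {
        unsigned int old_freq = node.frequency;
        frequency_table[old_freq].erase(node.position);   // O(1) via iterator
        if (frequency_table[old_freq].empty()) {
            frequency_table.erase(old_freq);
            if (min_frequency == old_freq) ++min_frequency;
        }
        ++node.frequency;
        auto& lst = frequency_table[node.frequency];      // created lazily
        lst.push_front(key);
        node.position = lst.begin();
    }

public:
    explicit lfu_cache(int cap) : min_frequency(0) {
        if (cap < 0)
            throw std::invalid_argument("capacity cannot be negative");
        capacity = static_cast<std::size_t>(cap);
    }

    int get(const std::string& key) {
        auto it = cache.find(key);
        if (it == cache.end()) return INT_MIN;  // miss
        bump(key, it->second);
        return it->second.value;
    }

    void put(const std::string& key, int val) {
        if (capacity == 0) return;
        auto it = cache.find(key);
        if (it != cache.end()) {                // existing key: update + bump
            it->second.value = val;
            bump(key, it->second);
            return;
        }
        if (cache.size() == capacity) {         // evict LFU, LRU as tie-breaker
            auto& lst = frequency_table[min_frequency];
            cache.erase(lst.back());            // back = least recently used
            lst.pop_back();
            if (lst.empty()) frequency_table.erase(min_frequency);
        }
        min_frequency = 1;                      // a new key always starts at 1
        auto& lst = frequency_table[1];
        lst.push_front(key);
        cache[key] = Node{val, 1u, lst.begin()};
    }
};
```

Both `get` and `put` touch only hash lookups and `std::list` operations at known iterators, so each runs in O(1) on average.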
Reflections
Frankly, though I had been aware of the LRU cache for a long time, I had never been exposed to the implementation of the LFU cache, particularly the trick of achieving constant-time operations with two HashTables. Initially I drafted the naive implementation and got some hints, but I performed poorly even after the hints were provided. Overall this went past me like a bouncer, and I exhausted the 40 minutes I was given.
I hope this proves helpful to you, and if you are actively interviewing, then all the best. You can find the implementation on GitHub as well.