Designing LFU Cache

Background

So Ashok, can you tell me how will you design a Least Frequently Used cache that have operations of O(1)?

I know, I know, we already have an old post about the LRU cache but it’s been 3 years now and this time the question for designing the LFU cache came as the second question in the first round of the interview. I was interviewing for the role of Staff Software Engineer in the Fintech domain and the company is very popular especially for the mutual funds!

LFU Cache

The LFU cache uses a caching strategy that removes items based on how often they are accessed. When the cache reaches its maximum capacity and needs to evict an item, it removes the entry with the lowest access frequency. This approach is based on the idea that data accessed less frequently is less likely to be needed again in the near future.

Internals

In an LFU cache, each item has a frequency count that shows how many times it has been accessed. Every time an item is read or updated, its count is increased. When the cache needs to remove an item, it looks for the one with the lowest frequency count and removes it.

If more than one item has the same lowest count, the cache uses a tie-breaker. It removes the item that was used least recently (yes, you hear that correctly, LFU uses LRU 😆).

I have wrote more about caching in the older blog.

Naive Implementation

The naive LFU implementation is the simplest way to evict the item that has been used the fewest times, without optimizing for speed. We will use a HashTable which will store key-value pair where the key will be string and the value will be a pair of the integer value and the frequency count of the that key.

Each time the key is accessed of updated, the frequency of that key will be incremented.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
#include <iostream>
#include<unordered_map>
#include<string>
#include<climits>
#include <stdexcept>

class naive_lfu_cache {
private:
size_t capacity;
std::unordered_map <std::string, std::pair<int, int>> cache;

std::string get_evict_key()
{
std::string min_freq_key;
int min_freq = INT_MAX;

for (const auto& item : cache) {
const std::string& key = item.first;
int freq = item.second.second;

if (freq < min_freq) {
min_freq = freq;
min_freq_key = key;
}
}

return min_freq_key;
}

public:
naive_lfu_cache(size_t cap = 3) : capacity(cap)
{
if (cap == 0) {
throw std::invalid_argument("capacity must be positive");
}
}

int get(const std::string& key)
{
// check if key present
auto it = cache.find(key);
if (it != cache.end()) {
// key is present - update frequency and return value
it->second.second += 1;
return it->second.first;
}
// when key is not present
return -INT_MIN;
}

void put(const std::string& key, int val)
{
// check if key present
auto it = cache.find(key);

if (it != cache.end()) {
// key is present - update frequency and value
it->second.second += 1;
it->second.first = val;

return;
}

if (cache.size() >= capacity) {
// we need to evict key
std::string evict_key = get_evict_key();
cache.erase(evict_key);
cache[key] = { val, 1};
} else {
// add new key
cache[key] = { val, 1};
}
}
};

Observations

The problem is that when the cache is full and we want to remove a key from the cache we will scan all the key-value pairs and remove the key with the minimum frequency. This will be O(n) operation but our aim is to achieve O(1), a constant time operation.

Also this time I have added the negative capacity check and throwing exception which I missed in the older writing.

Optimized Solution?

An optimized LFU cache achieves constant time, O(1) time for both get and put. We have already discussed about the optimized solution. Here’s are the things we need:

  • Key Table: HashTable with (K, N) key-node for the string keys and the Node will contain value, frequency and iterator to position in frequency list.
  • Frequency Table: HashTable with (F, L) frequency-list for the frequency of each values and a list (doubly list) holding the keys recently accessed to allow eviction via LRU.
  • Tracker Variable: We will use to store the minimum frequency as an integer. It will always points to the lowest frequency currently in cache which will allow instant eviction (will act as key for the Frequency Table).

Implementation

Let’s start with the class design for the LFU cache.

Class Design

As explained above, we need 2 hash tables and also a Node structure. Let’s add them:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include <iostream>
#include<unordered_map>
#include<string>
#include<climits>
#include <stdexcept>
#include<list>

struct Node {
int value;
int frequency;
std::list<std::string>::iterator it;

Node(int val, int freq, std::list<std::string>::iterator addr)
: value(val), frequency(freq), it(addr) {}
};

class lfu_cache {
private:
size_t capacity;
uint min_frequency;
std::unordered_map <std::string, Node> cache;
std::unordered_map<int, std::list<std::string>> frequency_table;

public:
// check the value of cap provided, also set min_frequency to 0
lfu_cache(size_t cap = 3) : capacity(cap), min_frequency(0) {
if (cap == 0) {
throw std::invalid_argument("capacity must be positive");
}
}
};
  • capacity is used for storing and comparing the max capacity of the cache
  • min_frequency will be unsigned integer to store the minimum frequency that is currently in the cache
  • cache is the hash table of <K, Node> which stores the node against a key
  • frequency_table is the another hash table used to store the <F, L> list of string keys against each integer frequency.
  • Node is the structure that will hold the value provided against the key, the frequency of that key and the iterator(simply address) of the key node from the list of the frequency_table.

Putting Keys

When adding keys to an LFU cache (with LRU as a tie-breaker), we need to handle three main cases:

Key already present in the cache

  • Check if the key exists in the cache.
  • If present:
    • Increment its frequency since the key is accessed.
    • Remove it from the list corresponding to its old frequency in frequency_table.
    • If the old frequency list becomes empty:
      • Remove the entry for that frequency from frequency_table.
      • Update min_frequency only if the old frequency was equal to min_frequency.
    • Add the key to the front of the list corresponding to its new frequency in frequency_table.
      • Note: If the list for the new frequency does not exist, it is automatically created.

Cache is full

  • If the cache has reached its capacity:
    • Use min_frequency to find the frequency list with the least frequently used keys.
    • Remove the key at the back of the list (LRU as tie-breaker for the same frequency keys).
    • Remove this key from the cache.
    • If the list for min_frequency becomes empty after eviction:
      • Remove the frequency entry from frequency_table.
    • min_frequency will be reset when a new key is inserted.

Key is new and cache is not full

  • Set min_frequency to 1.
  • Insert the new key at the front of the list corresponding to frequency 1 in frequency_table.
    • If the list does not exist, it is created automatically.
  • Add the key to the cache with its value, frequency 1, and an iterator pointing to its position in the frequency list.

Let’s have a look at the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
void put(const std::string& key, int val) 
{
auto it = cache.find(key);
// the key already exists
if (it != cache.end()) {
int old_freq = it->second.frequency;
int new_freq = old_freq + 1;

// remove key node from old frequency list
frequency_table[old_freq].erase(it->second.it);
// if the list for a feq becomes empty
if (frequency_table[old_freq].empty()) {
// remove the feq entry from the frequency_table
frequency_table.erase(old_freq);
// reset the min_frequency
if (min_frequency == old_freq)
min_frequency = new_freq;
}

// add to new frequency list for the new freq
frequency_table[new_freq].push_front(key);
it->second = Node(val, new_freq, frequency_table[new_freq].begin());
return;
}

// we need eviction
if (cache.size() >= capacity) {
// get the list with the min_frequency
auto &list = frequency_table[min_frequency];

// get the key having the min_frequency

std::string evict_key = list.back();

// but following LRU and removing the last key
// if multiple keys are present for min_frequency
list.pop_back();

// if the list for a feq becomes empty
// remove the feq entry from the frequency_table
if (list.empty()) frequency_table.erase(min_frequency);

// remove from cache
cache.erase(evict_key);
}

// add new key
min_frequency = 1;
frequency_table[1].push_front(key);
cache[key] = {Node(val, 1, frequency_table[1].begin())};
}

Notes:

  • Updating min_frequency is conditional and depends on the emptiness of the old frequency list.
  • Using push_front ensures that among keys with the same frequency, the most recently used key is at the front, which preserves LRU order for tie-breaking.
  • All frequency lists are created lazily so there is no need to pre-initialize them in the constructor in case of C++.

Getting Values

When retrieving a key from the cache, we need to handle two main cases:

Key is present in the cache

  • Check if the key exists in the cache.
  • If present:
    • Retrieve the Node for the key.
    • Increment its frequency since the key has been accessed.
    • Remove it from the list corresponding to its old frequency in frequency_table.
      • If the old frequency list becomes empty:
        • Remove the frequency entry from frequency_table.
        • Update min_frequency only if the old frequency was equal to min_frequency.
    • Add the key to the front of the list corresponding to its new frequency.
      • If the list for the new frequency does not exist, it is created automatically.
    • Update the iterator in the Node to point to its new position in the list.
    • Return the value of the key.

Key is not present in the cache

  • Return a special value indicating a miss, I am using -INT_MIN.

Let’s look at the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
int get(const std::string& key) 
{
// check if key is present
auto it = cache.find(key);

// key not found in cache
if (it == cache.end()) return -INT_MIN;

Node &node = it->second;
int old_freq = node.frequency;
int new_freq = old_freq + 1;

// remove key from old frequency list
frequency_table[old_freq].erase(node.it);

// if this was the last key with min_frequency, update it
if (frequency_table[old_freq].empty()) {
frequency_table.erase(old_freq);
if (min_frequency == old_freq) {
min_frequency = new_freq;
}
}

// add key to new frequency list
frequency_table[new_freq].push_front(key);
node.frequency = new_freq;
node.it = frequency_table[new_freq].begin();

return node.value;
}

Notes:

  • Accessing a key via get() always increases its frequency, just like in put().
  • The front of each frequency list represents the most recently used key among that frequency.
  • This ensures that LRU is used as a tie-breaker when multiple keys have the same frequency.
  • min_frequency is only updated when the old frequency list becomes empty and it was the current min_frequency.

The Full Code

Okay, here is the full C++11 code for the lfu_cache.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
#include <iostream>
#include<unordered_map>
#include<string>
#include<climits>
#include <stdexcept>
#include<list>

struct Node {
int value;
int frequency;
std::list<std::string>::iterator it;

Node(int val, int freq, std::list<std::string>::iterator addr)
: value(val), frequency(freq), it(addr) {}
};

class lfu_cache {
private:
size_t capacity;
uint min_frequency;
std::unordered_map <std::string, Node> cache;
std::unordered_map<int, std::list<std::string>> frequency_table;

public:
// check the value of cap provided, also set min_frequency to 0
lfu_cache(size_t cap = 3) : capacity(cap), min_frequency(0) {
if (cap == 0) {
throw std::invalid_argument("capacity must be positive");
}
}

int get(const std::string& key)
{
// check if key is present
auto it = cache.find(key);

// key not found in cache
if (it == cache.end()) return -INT_MIN;

Node &node = it->second;
int old_freq = node.frequency;
int new_freq = old_freq + 1;

// remove key from old frequency list
frequency_table[old_freq].erase(node.it);

// if this was the last key with min_frequency, update it
if (frequency_table[old_freq].empty()) {
frequency_table.erase(old_freq);
if (min_frequency == old_freq) {
min_frequency = new_freq;
}
}

// add key to new frequency list
frequency_table[new_freq].push_front(key);
node.frequency = new_freq;
node.it = frequency_table[new_freq].begin();

return node.value;
}

void put(const std::string& key, int val)
{
auto it = cache.find(key);
// the key already exists
if (it != cache.end()) {
int old_freq = it->second.frequency;
int new_freq = old_freq + 1;

// remove key node from old frequency list
frequency_table[old_freq].erase(it->second.it);
// if the list for a feq becomes empty
if (frequency_table[old_freq].empty()) {
// remove the feq entry from the frequency_table
frequency_table.erase(old_freq);
// reset the min_frequency
if (min_frequency == old_freq)
min_frequency = new_freq;
}

// add to new frequency list for the new freq
frequency_table[new_freq].push_front(key);
it->second = Node(val, new_freq, frequency_table[new_freq].begin());
return;
}

// we need eviction
if (cache.size() >= capacity) {
// get the list with the min_frequency
auto &list = frequency_table[min_frequency];

// get the key having the min_frequency

std::string evict_key = list.back();

// but following LRU and removing the last key
// if multiple keys are present for min_frequency
list.pop_back();

// if the list for a feq becomes empty
// remove the feq entry from the frequency_table
if (list.empty()) frequency_table.erase(min_frequency);

// remove from cache
cache.erase(evict_key);
}

// add new key
min_frequency = 1;
frequency_table[1].push_front(key);
cache[key] = {Node(val, 1, frequency_table[1].begin())};
}
};

Reflections

Frankly though I was aware of LRU cache since a long time I was not exposed to the implementation of the LFU cache and particularly the point of doing the constant operations involving two HashTables. Initially I drafted the naive implementation and got some hints but I performed poorly even after the hints provided. Overall this went as a bouncer and I exhausted the 40 mins I was provided.

I hope this will prove helpful to you and if you are actively interviewing then all the best. You can find the implementation on GitHub as well.