sharethis:
Caches
Caches are a simple way to improve the performance of an application that reads data from a "slow" source such as files on disk or rows of data from a database table, but may need to re-read the same data multiple times. The idea is simple: instead of discarding the data after using it, keep it in memory so that it doesn't have to be re-read later.
For example, a simple way to organize this for files might be to create a HashMap
that maps the file names to objects containing the file data. When your application needs a particular file it first checks the map to see if it already has the data; if it doesn't, it reads the file and places it in the map in case it needs it again later.
LRU caches
The problem with this simple method is that your application could use a vast amount of memory. Once read in, a file is in the cache for the lifetime of the application whether or not it is ever used again. For an application such as a server that is intended to stay up and running for weeks or months at a time, this is probably not acceptable.
What's needed is a cache that automatically discards entries that haven't been accessed for some time so as to limit the amount of space being used. Such as cache is called an LRU Cache. LRU stands for "Least Recently Used", which refers to the policy of removing the oldest, or least-recently used entries to make space for new data.
LRU caches have a maximum number of data items that they will hold and these items are usually arranged in a list. When an item is added to the cache, and every time it is accessed after that, it is automatically moved to the head of the list. If the cache is full and a slot is required for a new item, the cache makes room by discarding the entry at the tail of the list - the least-recently used item.
The java.util.LinkedHashMap
classLinkedHashMap
is a subclass of java.util.HashMap
that adds a couple of useful features. One is that by default the iteration order reflects the order that entries are added to the map, rather than the rather haphazard order of a HashMap.
The other feature - the one we're interested in here - is that LinkedHashMap
has an option to use access-order instead of insertion order, and includes a way to remove the least-recently accessed entries automatically. This makes it well suited for creating LRU caches.
Creating a cache class
The simplest way to create a cache using LinkedHashMap
is to extend it. Here is an example:
public class Cache extends LinkedHashMap
{
private final int capacity;
public Cache(int capacity)
{
super(capacity + 1, 1.1f, true);
this.capacity = capacity;
}
protected boolean removeEldestEntry(Entry eldest)
{
return size() > capacity;
}
}
Note that the constructor takes as an argument the maximum number of entries we want a Cache object to hold. The superclass constructor has three arguments: the initial capacity of the map, the load factor, and a boolean argument that tells the
LinkedHashMap
constructor to keep entries in access order instead of the default insertion order. (See the java.util.HashMap
API documentation for a description of the initial capacity and load factor.) In this case we set the initial capacity to be one more than our required cache size - this is because new entries are added before any are removed, so for example if we want to hold 100 entries the cache will actually contain 101 for an instant when new data is added. Setting the load factor to 1.1 ensures that the rehashing mechanism of the underlying HashMap
class isn't triggered - this isn't a vital point but helps a little with efficiency at run-time.The
removeEldestEntry()
method overrides a default implementation in LinkedHashMap
and is where we determine the policy for removing the oldest entry. In this case, we return true
when the cache has more entries than our defined capacity.Using the cache
Using the cache is simple - just use a suitable key to access the cache; if the data is in the cache we can read it from there. If it's not there we pull it from the slow medium and add it to the cache so that it's in place if needed later:
Cache cache = new Cache(100);
// ...
String filename = "file.txt";
String filedata = (String) cache
.get(filename);
if (filedata == null)
{
// Read filedata from file system here...
cache.put(filename, filedata);
}
How does it perform?
Our basic implementation should work just fine but we may need to know how effective it is in a given application. One measure of how well a cache is performing is Hit Rate, which tells us how many cache accesses are "hits" - that is, how many times the required data was found in the cache for a given number of accesses. Hit rate is usually expressed as a percentage - a hit rate above 80% is usually pretty good.
We can add a few things to our Cache class to make it possible to monitor performance so that we can tune the cache by setting an optimum size. We need a counter for the number of accesses and another for the number of hits. We will need getter methods to allow us to retrieve those values after the cache has been running for a time, and finally we must override the
get()
method to update the counts. Here is our Cache class complete with these new members:
public class Cache extends LinkedHashMap
{
private final int capacity;
private long accessCount = 0;
private long hitCount = 0;
public Cache(int capacity)
{
super(capacity + 1, 1.1f, true);
this.capacity = capacity;
}
public Object get(Object key)
{
accessCount++;
if (containsKey(key))
{
hitCount++;
}
Object value = super.get(key);
return value;
}
protected boolean removeEldestEntry(Entry eldest)
{
return size() > capacity;
}
public long getAccessCount()
{
return accessCount;
}
public long getHitCount()
{
return hitCount;
}
}
One point to note is that calls to the
containsKey()
method don't update the access counts; we may want to override that method also, so that the hit rate isn't skewed by code such as this:
if (cache.containsKey(filename))
{
filedata = (String) cache
.get(filename);
}
else
{
// Read filedata from file system here...
cache.put(filename, filedata);
}
bravo!!