If you're thinking about a server-side caching solution, it's likely that you've heard about Redis or Memcached.
- NoSQL key-value in-memory data storage systems
- open source
- used to speed up applications
- supported by the major cloud service providers
So, what sets them apart? That's exactly what I'll address in this article. Based on a project we developed for a client, I'm going to cover how they handle data storage, how they scale, and which one performs better in certain scenarios. But first, let's start with the basics.
What is Redis
Redis, which stands for Remote Dictionary Server, was created in 2009 by Salvatore Sanfilippo to improve the scalability of the web log analyzer his Italian startup was building. The first prototype was written in Tcl and later rewritten in C. The project started to gain traction once Sanfilippo open-sourced it, and giants like GitHub and Instagram were among the first companies to adopt it.
What is Memcached
Memcached was created a bit earlier, in 2003, by Brad Fitzpatrick for his LiveJournal website. It was initially developed in Perl and later rewritten in C. It is used by some of the biggest companies out there, such as Facebook, YouTube and Twitter.
Data storage: Redis vs Memcached
How Redis stores data
Redis has five core data types:
- String: a text value
- Hash: A hash table of string keys and values
- List: A list of string values
- Set: A non-repeating list of string values
- Sorted Set: A non-repeating list of string values ordered by a score value
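As a rough mental model, here is a sketch of how those five types might look as plain Python structures. This only illustrates the data shapes; it is not the redis-py client API, and all the keys are made up:

```python
# An in-memory illustration of Redis's five core data types using
# plain Python structures. This models the data shapes only; it is
# not a Redis client, and the keys are hypothetical.
store = {
    "page:title": "Hello, Redis",                 # String
    "user:42": {"name": "Ada", "plan": "pro"},    # Hash: field -> value
    "queue:jobs": ["job-1", "job-2", "job-3"],    # List: ordered, repeatable
    "tags:post:7": {"redis", "cache", "memory"},  # Set: unique members
    "scores": {"alice": 1200.0, "bob": 950.0},    # Sorted Set: member -> score
}

# The defining property of a Sorted Set: members read back ordered
# by their score (highest first here).
ranking = sorted(store["scores"], key=store["scores"].get, reverse=True)
```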
Redis supports operations on these data types. This means you can access or change part of a data object without having to load the entire object at the application level, modify it and re-store the updated version. Redis uses an encapsulated version of malloc/free memory management, a simpler approach than Memcached's slab mechanism, which I'll explain below. Redis supports keys up to 512MB in size and values up to 512MB as well; this limit applies per element in aggregate data types (Lists and Sets).
How Memcached stores data
Unlike Redis, Memcached has no data types: it stores strings indexed by a string key. Compared to Redis, it has less memory overhead. It is limited by the amount of memory on its machine and, when full, it starts to purge values in least recently used order. It uses an allocation mechanism called slab allocation, which segments the allocated memory into chunks of different sizes and stores each key-value record in a chunk of the corresponding size. This addresses the memory fragmentation problem. Memcached supports keys up to 250B and values up to 1MB, but note that the value limit is just the default and can be raised at startup by increasing the maximum slab size.
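The slab idea can be sketched in a few lines: memory is split into size classes, and an item goes into the smallest chunk that fits it. This is a simplified model, not Memcached's actual allocator; the growth factor of 1.25 and the 96-byte minimum mirror Memcached's defaults:

```python
# A minimal sketch of Memcached-style slab allocation. Chunk size
# classes grow geometrically, and each item is stored in the smallest
# chunk that can hold it. Real Memcached pre-allocates 1MB pages per
# class; this sketch only models the size-class selection.
def build_size_classes(min_size=96, max_size=1024 * 1024, growth=1.25):
    sizes, size = [], float(min_size)
    while size < max_size:
        sizes.append(int(size))
        size *= growth
    sizes.append(max_size)
    return sizes

def pick_chunk(item_len, sizes):
    # Return the smallest chunk size that fits the item, or None if
    # the item exceeds the largest slab (1MB by default).
    for s in sizes:
        if item_len <= s:
            return s
    return None

classes = build_size_classes()
```

An item of 100 bytes lands in the 120-byte class rather than getting an exact-sized allocation, which wastes a little space per item but avoids fragmenting the heap.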
Data type advantages
Let's take the simple example of using a cache to store a user session object.
If we use Memcached to modify a single field in the object, the string has to be loaded, deserialized, the field edited, and the object re-serialized and stored. With Redis, the hash data type can be used instead. It gives access to each field in the hash individually, so any CRUD (create, read, update, delete) operation can be executed on each of them. This removes the need to do the work at the application level, and it is more efficient because it requires fewer I/O operations and moves less data around.
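The contrast can be sketched like this, with plain dicts standing in for the two servers and JSON as the serialization format (both are assumptions for illustration):

```python
import json

# Contrast sketch: updating one field of a cached session object.
# With a string-only cache (Memcached-style), the whole object is
# deserialized, edited and re-serialized at the application level;
# with a Redis hash, only the one field needs to be touched.

string_cache = {"session:1": json.dumps({"user": "ada", "theme": "light"})}
hash_cache = {"session:1": {"user": "ada", "theme": "light"}}

# Memcached-style: full load / deserialize / edit / serialize / store.
obj = json.loads(string_cache["session:1"])
obj["theme"] = "dark"
string_cache["session:1"] = json.dumps(obj)

# Redis-style: the equivalent of a single HSET session:1 theme dark.
hash_cache["session:1"]["theme"] = "dark"
```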
How Redis and Memcached scale
How Redis scales
Since Redis is predominantly single-threaded and has native support for clustering, it scales well horizontally.
Redis clustering works with a master/slave architecture. Every master node has one or more slave nodes for redundancy, so if the master fails, the system automatically promotes one of the slaves to be the new master. This kind of scalability comes at the cost of operational complexity: it's harder to keep several nodes running in sync than to run a single one.
How Memcached scales
Memcached is easily scaled vertically, as it is multithreaded: the only requirement is to give it more cores and more memory. It can also be scaled horizontally, on the client side, by implementing a distributed algorithm. This is more complex to implement, whereas Redis has it out of the box.
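The simplest form of that client-side distribution can be sketched as hashing each key to one of the server nodes. The host names below are made up, and real Memcached clients usually use consistent hashing instead, so that adding or removing a node remaps fewer keys:

```python
import hashlib

# A minimal sketch of client-side sharding for Memcached: the client
# hashes each key and maps it to one of the server nodes. This naive
# modulo scheme remaps most keys when the node list changes, which is
# why production clients prefer consistent hashing.
NODES = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]  # hypothetical hosts

def node_for(key, nodes=NODES):
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]
```

Because the mapping is deterministic, every client that shares the same node list sends a given key to the same server, with no coordination between the servers themselves.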
When deciding whether to use Redis or Memcached, a major difference between the two is data persistence.
Redis is a (mostly) in-memory data store and is not volatile, whereas Memcached is an in-memory cache and is volatile.
Also, Memcached is limited to the LRU (least recently used) eviction policy, whilst Redis supports six different policies:
- No eviction: returns an error when the memory limit is reached.
- All keys LRU: removes the least recently used keys first.
- Volatile LRU: removes the least recently used keys first, among keys that have an expiration time set.
- All keys random: removes keys at random.
- Volatile random: removes keys at random, among keys that have an expiration time set.
- Volatile TTL: removes the keys with the shortest time to live first, among keys that have an expiration time set.
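To make the "all keys LRU" policy concrete, here is a toy cache that evicts the least recently used key when it runs out of capacity. It is a sketch of the policy only, not of either server's implementation:

```python
from collections import OrderedDict

# A toy cache implementing the "all keys LRU" eviction policy:
# when the cache is over capacity, the least recently used key is
# evicted first. Reads count as uses, so a get moves the key to
# the "most recent" end of the ordering.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.set("c", 3)  # over capacity: evicts "b", not "a"
```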
How Redis achieves persistence
Redis supports persistence (which is why it's called a data store) in two different ways:
RDB snapshot: a point-in-time snapshot of the entire dataset, stored in a file on disk and taken at specified intervals. The dataset can then be restored from this file on startup.
AOF log: an Append Only File log of all the write commands performed on the Redis server. This file is also stored on disk, so the dataset can be restored on startup by replaying all the commands in order.
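The AOF idea can be sketched with an in-memory log standing in for the file on disk (a simplification; Redis logs its actual command protocol and fsyncs on a configurable schedule):

```python
# A minimal sketch of AOF-style persistence: every write command is
# appended to a log before being applied, and replaying the log in
# order rebuilds the dataset from scratch, as Redis does on startup.
# A Python list stands in for the append-only file on disk.
log = []

def execute(store, command):
    op, key, value = command
    if op == "SET":
        store[key] = value
    elif op == "DEL":
        store.pop(key, None)

def write(store, command):
    log.append(command)   # append to the AOF first
    execute(store, command)

store = {}
write(store, ("SET", "a", "1"))
write(store, ("SET", "b", "2"))
write(store, ("DEL", "a", None))

# Simulated restart: replaying the log into an empty store
# reproduces the dataset.
restored = {}
for command in log:
    execute(restored, command)
```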
These files are handled by a child process and this is a key factor in deciding which kind of persistence to use.
If the dataset stored in Redis is large, the RDB file takes some time to create, which has an impact on response time. On the other hand, it is faster to load on boot than the AOF log.
The AOF log is the better choice if data loss is not acceptable at all, as it can be updated on every command. It is also less prone to corruption, since it is append-only. However, it can grow much larger than an RDB snapshot.
Use Cases and Performance
Redis Use Cases:
1. Session Caching in Web Applications: Redis can store user session data for web applications, which is particularly useful for platforms with a large number of concurrent users. For example, an e-commerce site can use Redis to quickly retrieve user sessions without hitting the database, thus speeding up the user experience during login and checkout processes.
2. Real-time Analytics: Redis's data structures like sorted sets and hashes can be used to implement real-time analytics dashboards. For instance, a social media platform might use Redis to track and display the number of active users or posts trending in real-time.
3. Message Queuing and Chat Applications: Redis's pub/sub capabilities enable real-time message queuing systems. It can be used to build chat applications where messages need to be delivered instantly to various subscribers.
4. Leaderboards and Counting: Games and social platforms often use Redis to manage leaderboards due to its ability to handle high write and read rates. For example, an online gaming platform can use Redis to update and display player rankings in real-time.
5. Full-Page Cache (FPC): Redis can be used as an FPC, storing fully rendered pages to reduce the load on the database. For instance, a content management system (CMS) can use Redis to cache pages and serve them quickly without regenerating them on each request.
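The leaderboard use case above is where the Sorted Set shines: the commands being imitated in this sketch are ZINCRBY (adjust a member's score) and a descending ZRANGE (read the top N). A dict stands in for the Redis server:

```python
# A toy leaderboard mimicking a Redis Sorted Set. Members keep a
# numeric score, and rankings are read back ordered by score.
# This imitates ZINCRBY and a descending ZRANGE; it is not a client.
scores = {}

def zincrby(member, amount):
    # Like Redis's ZINCRBY: create the member at 0 if absent,
    # then add to its score.
    scores[member] = scores.get(member, 0) + amount

def top(n):
    # Like a descending ZRANGE: members ordered by score, best first.
    return sorted(scores, key=scores.get, reverse=True)[:n]

zincrby("alice", 300)
zincrby("bob", 500)
zincrby("alice", 400)  # alice now has 700 and overtakes bob
```

In real Redis, the server keeps the set ordered at all times, so reading the top N is cheap even under a heavy stream of score updates.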
Memcached Use Cases:
1. Simple String Caching: Memcached is ideal for small and medium-sized websites that need a simple caching layer for strings. For example, a blog site can use it to cache the results of database queries for blog posts and serve them faster to visitors.
2. Database Query Result Caching: Memcached can be used to cache the results of commonly accessed database queries, reducing the database load. An online catalog system could use it to cache product listings and details.
3. Caching HTML Fragments: Websites with static HTML fragments that are expensive to generate can use Memcached to store these fragments. For example, a news website might cache article snippets on the homepage to improve load times.
4. API Rate Limiting: Memcached's atomic increment and decrement operations can be used to implement API rate limiting. A RESTful API could use Memcached to track the number of requests from a user within a given timeframe, preventing abuse.
5. Session Store: Similar to Redis, Memcached can be used to store user sessions, although without persistence. This is suitable for applications where session data is transient and can be lost without significant impact, such as in stateless microservices.
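The rate-limiting use case (number 4 above) leans on Memcached's atomic increment. A fixed-window limiter built on that primitive can be sketched as follows; the limit and window size are made-up values, and a dict stands in for the Memcached server:

```python
import time

# Sketch of fixed-window API rate limiting built on an atomic
# increment, the way Memcached's incr command is typically used.
# The counter key encodes the current time window, so when the
# window rolls over, a fresh key (and a fresh count) takes over.
counters = {}  # stands in for the Memcached server

def incr(key):
    # Memcached's incr is atomic on the server; a plain dict
    # update stands in for it in this single-threaded sketch.
    counters[key] = counters.get(key, 0) + 1
    return counters[key]

def allow_request(user, limit=5, window_seconds=60, now=None):
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"rate:{user}:{window}"
    return incr(key) <= limit

# Six requests inside the same 60-second window: the sixth is rejected.
results = [allow_request("u1", now=1000.0) for _ in range(6)]
```

A production limiter would also set a TTL on each counter key so old windows expire on their own.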
Which is better: Redis or Memcached
It certainly depends on the requirements.
Redis is certainly more flexible and powerful, but Memcached serves some purposes very well and in some cases achieves better performance. Being multi-threaded gives it an advantage, especially when working with big data.
Redis supports data type operations, which can speed up certain scenarios by reducing the number of network I/O round trips and the amount of data transferred. These operations cost roughly as much as a plain get or set.
At Imaginary Cloud we have used both in many different clients' projects. In one that I was involved in, we had to choose between the two options. At first we went with Memcached for its simplicity, ease of use and easy setup; we simply needed a cache, so persistence wasn't a requirement. However, after some testing, we decided to switch to Redis because of the advantages of its data types.
In this project, the data type operations were an advantage for the kind of data that was going to be stored. Also, Redis provides a command to search for keys matching a pattern, along with many other useful commands for dealing with keys. This was really useful to us and a key point in the decision to migrate to Redis.
As for the migration itself, it was very easy to perform, as Redis supports most of the commands that Memcached does. Had we gone the other way and migrated from Redis to Memcached, it would have been much harder, since Memcached has no data types: each Redis data type command would have had to be translated into several commands, with some data processing in between, to achieve the same result.
When it comes to making a decision, we can't really say that one is better than the other; it all depends on the project requirements. However, based on our experience, it's important to weigh the pros and cons of each right from the start, to avoid changes and migrations mid-project.