Distributed locks are primitives that help you control access to the resource that is shared between different processes. In modern cloud-based applications processes are usually distributed between multiple physical machines.
Redis provides a great opportunities for implementing distributed locking. It’s fast (in cloud you can archive sub-millisecond response time), and it provides atomic execution guarantees for LUA scripts.
The most popular Java Redis client library that implements distributed locking algorithms is Redisson (lock docs).
Mainly Redisson lock implementations rely on Redis’s pubsub mechanism. Basically, their algorithm can be described as following:
- Try aquire a lock by executing LUA script
- If lock is aquired — return
- Else subscribe to the changes to lock object using pubsub mechanism
- When the thread gets notified that lock is released, return to step 1 if acquisition process didn’t timeout
Redis pubsub implementation
This approach has one major disadvantage — current Redis pubsub implementation in cluster mode is inefficient. According to docs (link) all pubsub messages are distributed to all cluster nodes. It means that if you have a cluster of 20 shards, and each shard consists of master and slave, message is distributed to 40 different places. If you aquire and release many locks in a short period of time, for example, 1.000.000 locks per minute, inefficient message distribution can saturate your node’s network. One of symptoms of this problem is unexpected raise of cpu to 100%.
Naive pubsub implementation also makes your solution’s Redis not scalable horizontally. If you add more nodes to the cluster, each node will get less hash slots and thus less objects. But still each node will receive all pubsub messages and distribute it’s own messages to all other nodes.
If we have
m messages published in a time period, and a cluster has
n nodes, then each node will:
m/nmessages, and distribute them to all other nodes. Messages sent during a time period —
m(n-1)/n = m — m/n. This number grows when
ngrows, so each node’s output traffic will grow if we add new nodes to the cluster
- Reseive message from other nodes. Their amout is total messages minus messages published by this node, or,
m — m/n. So, the input traffic also grows when we add more nodes to the cluster
RedissonSpinLock is a reentrant implementation of
RLock interface. It eliminates the
RedissonLock’s major disadvantage. It doesn’t rely on pubsub mechanism. It’s algorithm can be described as following:
- Try aquire the lock
- If lock is aquired — return
- Else wait for some time
- Return to step 1 if lock aquisition didn’t timeout
- As mentioned earlier, it doesn’t depend on pubsub mechanism, so the solution that uses spinlock is scalable horizontally
- In large clusters (20+ nodes) it can make each node’s network input and output 90% lower
- Thread that tries to aquire the lock sleeps for full backoff period despite the fact that lock could already be released. Aquisition time during contention is expected to be higher
- Under contention thread frequently wakes up, makes failing aquire attempt and falls asleep again. The amount of Redis requests will be higher.
Backoff controlls the amount of time, that lock waits between two
tryAquire calls. It’s required to prevent massive spam of failing requests to the Redis.
There are two backoffs provided by default:
ExponentialBackOff— back off period is exponentially increased every time up to
maxDelay. To prevent massive amount of simultanious requests from different threads some random value is added to the backoff period.
ConstantBackOff— returns the same back off period each time. In production environment should be used carefully.
Other back off algorithms can be easily implemented by extending appropriate classes.
RedissonSpinLock is a new distributed locking algorithm that eliminates some problems of other
RLock implementations and can be used in huge Redis clusters. Maybe it doesn’t fit your needs, but you can still test it.