How Does a Famous Fantasy App Use the Circuit Breaker Pattern?
Learn about the circuit breaker pattern in distributed systems, plus how Dream11 (a fantasy sports app) uses circuit breakers in its services
Hi, this is Venkat, here with a full issue of the ZenMode Engineer Newsletter. In every issue, I cover one topic, explained in simpler terms, in areas related to computer systems, tech, and beyond.
Imagine this
The control room buzzed with the frantic rhythm of flashing red lights and alarms blaring. Max, a grizzled veteran engineer at SkyNet, the world's largest drone delivery service, slammed his fist on the console.
"Package delivery network's down again!" he roared, his voice barely audible over the din. "Millions of deliveries stalled mid-air!"
Across the room, Sarah, a brilliant but still green programmer, chewed nervously on her fingernail. "It seems the central server can't handle the surge in holiday orders. The whole network grinds to a halt every time it gets overloaded."
Max sighed, the lines on his face deepening. "We can't keep throwing more servers at the problem. It's like trying to quench a fire with a bucket; it just buys us a few minutes."
Just then, a young intern, Alex, piped up, his voice barely a whisper. "What if... we could build an automatic safety switch? Like a... circuit breaker?"
Max and Sarah exchanged surprised glances.
The idea was simple yet ingenious. Imagine the network as a complex electrical grid. A circuit breaker, placed strategically, could automatically cut off power to overloaded sections, preventing a complete blackout.
"Interesting," Max mused, stroking his beard. "This circuit breaker could monitor traffic flow. If a specific server gets overloaded, the breaker would shut it off temporarily, diverting deliveries to unaffected areas."
"And once the load subsides?" Sarah chimed in, her eyes sparkling with newfound hope.
"The circuit breaker would slowly re-establish connections, testing the server with minimal traffic. If it holds, the network opens up again. If not, it waits a bit longer before trying again."
The room fell silent, everyone processing the implications. A circuit breaker wouldn't fix the underlying problem, but it could buy them precious time to upgrade their infrastructure and prevent cascading failures that could cripple the entire delivery network.
A slow smile spread across Max's face. "Looks like we have a greenhorn with the winning idea, Sarah. Get him some coffee. Strong coffee. We've got a network to save!"
In today's edition, we're taking a look at the Circuit Breaker pattern.
Here is the agenda:
What is a Circuit Breaker? Understand more about the circuit breaker pattern.
How Does a Circuit Breaker Work? Looking into how the circuit breaker works from the inside.
Examples of Circuit Breakers: how to implement them using a widely used programming language (Java/Python).
How Dream11 (fantasy app) uses Circuit Breakers: lessons learned from the Dream11 circuit breaker technique.
What is a Circuit Breaker?
A circuit breaker is a protective mechanism that automatically disconnects electrical power when it senses a fault or abnormal condition.
Similarly, in software, a circuit breaker is a design pattern that prevents a client from continuously sending requests to a failing service.
The circuit breaker acts as a proxy between the client and server and monitors the health of the service.
If the service starts failing, the circuit breaker trips and stops all incoming requests, allowing the service time to recover.
Once the service appears to have recovered, the circuit breaker lets a limited number of requests through as a test before fully restoring normal traffic.
This process helps keep the rest of the system stable during periods of high failure rates.
But first... do we need one?
In a distributed environment, calls to remote resources and services can fail due to transient faults, such as slow network connections, timeouts, or the resources being overcommitted or temporarily unavailable.
These faults typically correct themselves after a short period, and a robust cloud application should be prepared to handle them by using a strategy such as the Retry pattern.
However, there can also be situations where faults are due to unanticipated events that might take much longer to fix.
These faults can range in severity from a partial loss of connectivity to the complete failure of a service.
In these situations, it might be pointless for an application to retry an operation that is unlikely to succeed, and instead, the application should quickly accept that the operation has failed and handle this failure accordingly.
Additionally, if a service is busy, failure in one part of the system might lead to cascading failures.
Consider an operation that invokes a service. It could be configured to implement a timeout and reply with a failure message if the service fails to respond within this period. However, this strategy could cause many concurrent requests to the same operation to be blocked until the timeout period expires.
These blocked requests might hold critical system resources such as memory, threads, database connections, and so on. Consequently, these resources could become exhausted, causing failure of other possibly unrelated parts of the system that need to use the same resources. In these situations, it would be preferable for the operation to fail immediately, and only attempt to invoke the service if it's likely to succeed. Note that setting a shorter timeout might help to resolve this problem, but the timeout shouldn't be so short that the operation fails most of the time, even if the request to the service would eventually succeed.
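A rough back-of-envelope calculation shows why blocked requests exhaust resources so quickly. The sketch below applies Little's law (requests in flight = arrival rate x time each request is held). The class name and every number in it (200 requests per second, a 30-second timeout, 1 ms to reject a call) are made-up assumptions for illustration, not measurements from any real system.

```java
/** Back-of-envelope estimate; all inputs are hypothetical. */
final class BlockedThreadEstimate {

    /** Little's law: requests in flight = arrival rate x time each request is held. */
    static long threadsHeld(long requestsPerSecond, double secondsHeldPerRequest) {
        return Math.round(requestsPerSecond * secondsHeldPerRequest);
    }

    public static void main(String[] args) {
        long rate = 200;          // assumed incoming requests per second
        double timeout = 30.0;    // assumed timeout while the dependency hangs
        double failFast = 0.001;  // assumed cost of rejecting a call immediately

        System.out.println("Waiting on timeout: " + threadsHeld(rate, timeout) + " threads held");
        System.out.println("Failing fast:       " + threadsHeld(rate, failFast) + " threads held");
    }
}
```

Under these assumptions, waiting on the timeout ties up thousands of threads at once, while failing fast ties up almost none. That gap is the resource exhaustion a circuit breaker is meant to prevent.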
How Does the Circuit Breaker Pattern Work?
Circuit breakers typically have three states: closed, open, and half-open.
These states determine whether the circuit breaker should allow traffic to flow or block it.
Let's take a closer look at each state:
Closed State:
In the closed state, the circuit breaker allows all requests to pass through to the service. During this time, the circuit breaker keeps track of the success rate of those requests.
For instance, let's say we set a threshold of 90% - meaning that if fewer than 90% of the requests are successful over a given period, then the circuit breaker will trip.
For example, suppose our application makes 100 requests per minute to a particular service, and 85 of those requests fail. The success rate is only 15%, well below the 90% threshold, so the circuit breaker would trip and move from the closed state to the open state.
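The trip decision in the closed state can be sketched with a small count-based window over recent call outcomes. This is a minimal illustration, not a production metric store: the class SuccessRateWindow, its method names, and the choice to evaluate only once the window is full are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Count-based window over the most recent call outcomes (hypothetical helper). */
final class SuccessRateWindow {
    private final int windowSize;
    private final double minSuccessRate;   // e.g. 0.90 for the 90% threshold
    private final Deque<Boolean> outcomes = new ArrayDeque<>();
    private int successes = 0;

    SuccessRateWindow(int windowSize, double minSuccessRate) {
        this.windowSize = windowSize;
        this.minSuccessRate = minSuccessRate;
    }

    /** Records one outcome, evicting the oldest one once the window is full. */
    void record(boolean success) {
        if (outcomes.size() == windowSize) {
            boolean evicted = outcomes.removeFirst();
            if (evicted) successes--;
        }
        outcomes.addLast(success);
        if (success) successes++;
    }

    double successRate() {
        return outcomes.isEmpty() ? 1.0 : (double) successes / outcomes.size();
    }

    /** Trips only on a full window, so one early failure cannot open the breaker. */
    boolean shouldTrip() {
        return outcomes.size() == windowSize && successRate() < minSuccessRate;
    }
}

final class SuccessRateWindowDemo {
    public static void main(String[] args) {
        SuccessRateWindow window = new SuccessRateWindow(100, 0.90);
        for (int i = 0; i < 100; i++) {
            window.record(i >= 85);   // 85 failures followed by 15 successes
        }
        System.out.println(window.successRate());  // 0.15
        System.out.println(window.shouldTrip());   // true
    }
}
```

The demo reproduces the numbers from the example: 15 successes out of 100 calls is below the 90% threshold, so the breaker should trip.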
Half-Open State:
When the circuit breaker enters the half-open state, it only allows a small number of requests to pass through to the service, perhaps just 10 requests per minute. This gives the service a chance to recover without being overwhelmed by too much traffic.
If all of these requests succeed, then the circuit breaker assumes that the service has recovered and moves back to the closed state. However, if even one request fails, then the circuit breaker immediately goes back to the open state. This ensures that the service doesn't receive unnecessary traffic while it's still unstable.
Some implementations are less strict and judge the trial requests against a success threshold instead of requiring every one of them to succeed. Continuing with the previous example, suppose the half-open threshold is 50% and only 4 of the next 10 requests fail. Since more than 50% of the requests succeeded, the circuit breaker would assume that the service had recovered and move back to the closed state.
However, if all 10 requests had failed, then the circuit breaker would go straight back to the open state without waiting for additional requests.
Open State:
In the open state, the circuit breaker blocks all requests to the service. This effectively isolates the service from its clients, giving it time to recover without being bombarded with new requests.
After a configurable period, the circuit breaker transitions to the half-open state, allowing several requests to pass through. As mentioned earlier, if these requests succeed, then the circuit breaker moves back to the closed state. But if they fail, the circuit breaker stays in the open state for longer.
Going back to the initial example, if 85 out of 100 requests had failed, the circuit breaker would enter the open state. Then, after a configured period of time, it might transition to the half-open state, allowing 10 requests to pass through. If all of these requests succeed, the circuit breaker would move back to the closed state. But if even one request had failed, the circuit breaker would stay in the open state for longer, continuing to block all requests to the service.
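The three states and the moves between them can be summarized as a small transition function. This is only a sketch of the state machine described above. The enum and event names are invented for this example, and TRIAL_SUCCEEDED simply means "the trial requests passed whatever rule the breaker applies in the half-open state".

```java
enum BreakerState { CLOSED, OPEN, HALF_OPEN }

/** Hypothetical event names for the things that can move the breaker. */
enum BreakerEvent { THRESHOLD_BREACHED, WAIT_ELAPSED, TRIAL_SUCCEEDED, TRIAL_FAILED }

final class BreakerTransitions {

    /** Returns the next state; events that do not apply leave the state unchanged. */
    static BreakerState next(BreakerState state, BreakerEvent event) {
        switch (state) {
            case CLOSED:
                // Too many failures in the monitoring window: start blocking calls.
                return event == BreakerEvent.THRESHOLD_BREACHED ? BreakerState.OPEN : state;
            case OPEN:
                // The configured wait is over: allow a few trial requests.
                return event == BreakerEvent.WAIT_ELAPSED ? BreakerState.HALF_OPEN : state;
            case HALF_OPEN:
                if (event == BreakerEvent.TRIAL_SUCCEEDED) return BreakerState.CLOSED;
                if (event == BreakerEvent.TRIAL_FAILED) return BreakerState.OPEN;
                return state;
            default:
                throw new IllegalStateException("unknown state " + state);
        }
    }

    public static void main(String[] args) {
        BreakerState s = BreakerState.CLOSED;
        s = next(s, BreakerEvent.THRESHOLD_BREACHED);  // 85 of 100 requests failed
        s = next(s, BreakerEvent.WAIT_ELAPSED);        // configured period has passed
        s = next(s, BreakerEvent.TRIAL_FAILED);        // a trial request failed
        System.out.println(s);                         // OPEN
    }
}
```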
Circuit Breaker from scratch
The following walks through a Java example that shows what a basic circuit breaker class might look like and how its states could be implemented.
1. Closed State:
This is the initial state where the circuit breaker allows calls to the protected operation to proceed normally.
The call method checks the state. If it's CLOSED, the executeWithRetries method is called.
executeWithRetries handles retries and timeouts for the operation.
Inside the loop, successful execution or a timeout exception triggers a return.
For any other exception, handleCallFailure is called.
In handleCallFailure, if the state is CLOSED, a helper method (incrementFailureCount) is called (a placeholder for your specific failure-tracking logic).
If incrementFailureCount returns true (indicating the failure threshold is reached), the state transitions to OPEN using transitionToOpen.
2. Open State:
This state signifies a period where the circuit breaker completely blocks calls to the protected operation.
The call method checks the state. If it's OPEN, a CircuitBreakerOpenException is thrown, rejecting the call.
No attempt to execute the operation is made in this state.
The circuit breaker remains open for a pre-defined duration (openDuration).
After the open duration elapses, the circuit breaker might transition to HALF_OPEN using isReadyForHalfOpen.
3. Half-Open State:
This is a temporary state where the circuit breaker allows a single attempt to execute the protected operation. It's used to test if the issue that caused failures has been resolved.
The call method checks the state. If it's HALF_OPEN, executeWithRetries is called, but only for one attempt (attempt == 0).
If the single attempt succeeds, the circuit breaker transitions back to CLOSED after a pre-defined duration (halfOpenDuration) using transitionToClosedAfterDelay.
If the single attempt fails, the circuit breaker remains in HALF_OPEN and the exception is thrown. This allows another attempt during the next call (within retry limits).
The circuit breaker also transitions back to OPEN if the failure threshold is reached again during HALF_OPEN (implement your logic in incrementFailureCount).
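To make the walkthrough concrete, here is a compact sketch of such a class. It is a reconstruction, not the original listing. It reuses the method names mentioned above (call, handleCallFailure, incrementFailureCount, transitionToOpen, isReadyForHalfOpen), but the bodies are assumptions. It also simplifies in three ways: it has no retry loop, it closes immediately after a successful trial instead of waiting for halfOpenDuration, and a failed trial call reopens the breaker right away. The injectable clock and handleCallSuccess are additions for this example.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Thrown instead of invoking the protected operation while the breaker is OPEN. */
class CircuitBreakerOpenException extends RuntimeException {
    CircuitBreakerOpenException(String message) { super(message); }
}

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;     // assumption: failures since the last success
    private final long openDurationMillis;  // how long to stay OPEN before a trial call
    private final LongSupplier clock;       // injectable time source in milliseconds
    private State state = State.CLOSED;
    private int failureCount = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDuration.toMillis();
        this.clock = clock;
    }

    synchronized State state() { return state; }

    // synchronized keeps the sketch short, but it also serializes every call.
    synchronized <T> T call(Supplier<T> operation) {
        if (state == State.OPEN) {
            if (!isReadyForHalfOpen()) {
                throw new CircuitBreakerOpenException("circuit is open; call rejected");
            }
            state = State.HALF_OPEN;        // open duration elapsed: let one trial through
        }
        try {
            T result = operation.get();
            handleCallSuccess();
            return result;
        } catch (RuntimeException e) {
            handleCallFailure();
            throw e;
        }
    }

    private boolean isReadyForHalfOpen() {
        return clock.getAsLong() - openedAt >= openDurationMillis;
    }

    private void handleCallSuccess() {
        failureCount = 0;
        state = State.CLOSED;
    }

    private void handleCallFailure() {
        // A failed trial reopens at once; in CLOSED we wait for the threshold.
        if (state == State.HALF_OPEN || incrementFailureCount()) {
            transitionToOpen();
        }
    }

    private boolean incrementFailureCount() {
        return ++failureCount >= failureThreshold;
    }

    private void transitionToOpen() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
    }
}

final class SimpleCircuitBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0L};  // fake clock so the demo does not need to sleep
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(3, Duration.ofSeconds(30), () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("service down"); };
        for (int i = 0; i < 3; i++) {
            try { breaker.call(failing); } catch (IllegalStateException expected) { }
        }
        System.out.println(breaker.state());           // OPEN
        now[0] += 30_000;                              // the open duration passes
        System.out.println(breaker.call(() -> "ok"));  // ok
        System.out.println(breaker.state());           // CLOSED
    }
}
```

Passing the clock in as a LongSupplier is a design choice for testability: in real code it could be System::currentTimeMillis, while tests and demos can advance time by hand.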
When to use this pattern?
Use this pattern:
To prevent an application from invoking a remote service or accessing a shared resource if this operation is highly likely to fail.
This pattern isn't recommended:
For handling access to local private resources in an application, such as an in-memory data structure. Using a circuit breaker would add overhead to your system in this environment.
As a substitute for handling exceptions in the business logic of your applications.
Checklist
Before liberally sprinkling circuit breakers over all API calls, consider the following questions:
Will a circuit breaker improve service performance?
-> Does the service frequently encounter slow responses from dependent downstream services?
-> Will reducing the requests sent downstream help the service recover?
-> Do we see highly correlated errors?
-> Can we build some tools to diagnose when the circuit breaker should open and close?
-> Should the circuit breaker trip based on the number of requests that fail, or should it be a function of time?
Can we deploy a circuit breaker?
-> Do we have any resources to accurately measure the maximum load the downstream dependency service can or cannot handle?
-> What monitoring mechanism do we need to verify that the circuit breaker is working as expected?
-> Can we override the circuit breaker in an emergency if it stops working as expected?
Can we bear the cost of circuit breaker maintenance?
-> Do we have the capacity to re-evaluate the dependencies and re-configure all circuit breaker parameters frequently?
Inefficient circuit breakers could create problems
If the circuit breaker is not tuned and monitored frequently, it can result in the following problems:
-> Possibility of having less throughput than normal.
-> Prolonged downtime when the circuit breaker stays in the open state, even if the downstream service has recovered.
-> Incomprehensible request patterns for downstream services.
Steps to implement a Circuit Breaker Pattern
To implement a circuit breaker, you typically need to follow these steps:
Define the conditions for triggering the circuit breaker: This could be based on the number of failed requests, the response time of a service, or other metrics that indicate a service is not healthy.
Monitor the health of the service: The circuit breaker should continuously monitor the health of the service by tracking its response time, error rate, and other relevant metrics.
Trip the circuit breaker: If the predefined conditions are met, the circuit breaker should trip and stop all traffic to the service. The circuit breaker should also notify the system so that other services can take appropriate action.
Allow the service to recover: Once the circuit breaker has tripped, it should allow the service some time to recover before attempting to send traffic to it again.
Test the service: After the recovery period, the circuit breaker should send a test request to the service to see if it responds normally. If the service responds successfully, the circuit breaker can allow traffic to resume.
Reset the circuit breaker: If the service continues to respond successfully, the circuit breaker should reset and allow normal traffic to flow.
Let's understand how it gets implemented in Java using the Resilience4J library. Resilience4J is a fault tolerance library designed for functional programming that offers higher-order functions (decorators) to enhance any functional interface, lambda expression, or method reference with a Circuit Breaker.
How does Dream11 implement the circuit breaker pattern?
Dream11 is a popular fantasy sports platform in India that allows users to create virtual teams and compete with each other in various sports events. Dream11 uses circuit breakers to improve its services' resiliency and fault tolerance as a large-scale distributed system.