System Design: Feature Flag
Design a system to enable and disable features dynamically without deploying code.
👋 Hi, I’m Venkat, and welcome to the latest issue of the ZenMode Engineer Newsletter!
Did you know? 🤔
“Do you know that millions of users are streaming through our new system right now, and they don’t even know it?”
Netflix was rolling out a new video compression codec, an innovation that promised sharper visuals and smoother streaming on slower networks.
But this wasn’t a risky, all-at-once launch. They used feature flags, a tool that allowed them to release the codec to only 10% of users. Their control panel showed real-time data: playback errors, streaming quality, user feedback. Everything looked stable.
“If anything breaks, we toggle it off in seconds.”
No app redeploy, no late-night firefights.
Netflix’s ability to test, monitor, and adapt made innovation seamless, and its users never noticed the change.
Netflix uses feature flags extensively to roll out features, conduct A/B testing, and ensure high system reliability.
Imagine managing a website or an app where you want to turn certain features on or off (like dark mode or a new recommendation system) without redeploying the code.
To do this, we need to build a Feature Flag System.
Feature Flags Concepts
What is a Feature Flag?
A feature flag is a switch that lets you turn a feature on or off in your application without deploying new code.
Think of a feature flag system as a light switch panel in your house:
Each switch (feature flag) controls a light (a feature in your app).
You can turn lights on or off as needed without needing to rewire your house (redeploy your code).
Some lights (features) might be dimmable or only available in certain rooms (feature rollout to specific users).
Example: Dark mode can be hidden from users by keeping a "flag" for dark mode off, even if the code is already deployed.
Martin Fowler categorizes feature toggles into four types:
Release Toggles: Allow incomplete features to be shipped to production in a dormant state, enabling trunk-based development and continuous delivery.
Experiment Toggles: Facilitate A/B testing by exposing different user segments to various feature implementations to gather data-driven insights.
Ops Toggles: Provide operational control to enable or disable features in response to system performance or reliability issues.
Permissioning Toggles: Manage feature access for different user groups, such as granting premium features to paying customers.
Key Components of the System
To build a dynamic feature management system, consider the following components:
Feature Flag Control Service: Acts as the control plane, managing all flag configurations. This service should be robust and scalable to handle organizational needs.
Example (Admin Dashboard):
A user interface for developers or admins to turn flags on or off.
Example: A web app where you see a list of features and toggle them.
Database or Data Store: Stores feature flag configurations (the on/off settings) reliably. This can be a database or, for small setups, a config file. Options include SQL databases, NoSQL databases, or key-value stores, depending on scalability and performance requirements.
Examples:
Key-Value Store (e.g., Redis, DynamoDB): Store each flag as a key, such as feature_dark_mode: true.
SQL or NoSQL Databases: Store flags as rows or documents.
API Layer: Exposes endpoints for your application to interact with the Feature Flag Control Service, allowing retrieval and management of flag configurations.
On the application side, your code reads the flag’s value (on/off) through this API and adjusts its behavior.
Example: If dark_mode = true, show the app in dark mode.
Feature Flag SDK: Provides an interface for fetching and evaluating feature flags at runtime within your application.
The SDK should handle caching and background updates to minimize latency.
Continuous Update Mechanism: Ensures that feature flag configurations are updated dynamically without requiring application restarts or redeployments.
This can be achieved through mechanisms like long polling, WebSockets, or server-sent events.
Design Process
Here’s how to put it all together:
1. Define the Flags
Each feature gets a unique name.
Decide:
Type of Flag (e.g., toggle, percentage rollout).
Default State (on/off).
Who can see it (all users, specific group, etc.).
Example:
dark_mode:
  type: boolean
  default: false
  audience: premium_users
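To make the flag definition concrete, here is a minimal Python sketch of how a flag with a default state, an audience, and a percentage rollout might be evaluated. The names FlagDefinition, bucket, is_enabled, and the rollout_percent field are assumptions made for illustration; they are not part of any particular feature flag product.

```python
import hashlib
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class FlagDefinition:
    """Hypothetical flag record mirroring the fields in the example above."""
    name: str
    default: bool = False        # state for users outside the audience
    audience: str = "all"        # "all" or the name of a user group
    rollout_percent: int = 100   # share of the audience that gets the feature


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: FlagDefinition, user_id: str, user_groups: Set[str]) -> bool:
    """Return True if the feature should be on for this user."""
    if flag.audience != "all" and flag.audience not in user_groups:
        return flag.default
    # Hashing keeps each user in the same bucket across requests and restarts.
    return bucket(flag.name, user_id) < flag.rollout_percent
```

Hashing the flag name together with the user ID means a user stays in or out of a rollout consistently, and different flags split the user base differently.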
2. Build a Feature Flag Store
Store flag configurations in a reliable database.
Example:
Use Redis for fast access if you need frequent flag checks.
dark_mode = true
new_homepage = false
Use PostgreSQL for storing complex rules.
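As a rough illustration of storing more complex rules in a relational store, the sketch below keeps each flag in a table with a JSON-encoded rules column. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL so it runs anywhere; the table name, columns, and the save_flag and load_flag helpers are assumptions, not a prescribed schema.

```python
import json
import sqlite3
from typing import Optional

# In-memory SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feature_flags (
        name    TEXT PRIMARY KEY,
        enabled INTEGER NOT NULL DEFAULT 0,
        rules   TEXT NOT NULL DEFAULT '{}'
    )
    """
)


def save_flag(name: str, enabled: bool, rules: dict) -> None:
    """Insert a flag, or overwrite it if the name already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO feature_flags (name, enabled, rules) VALUES (?, ?, ?)",
        (name, int(enabled), json.dumps(rules)),
    )
    conn.commit()


def load_flag(name: str) -> Optional[dict]:
    """Return the stored flag as a dict, or None if it does not exist."""
    row = conn.execute(
        "SELECT enabled, rules FROM feature_flags WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    return {"name": name, "enabled": bool(row[0]), "rules": json.loads(row[1])}
```

A common arrangement is to treat a table like this as the source of truth and copy the current values into Redis for fast reads.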
3. Write an API to Access Flags
Create APIs to fetch and update flag values in your app:
GET /api/flags: Returns all flag configurations.
PUT /api/flags/<flag_name>: Updates a flag’s status.
Basic API Structure
The API will have endpoints for:
Fetching All Flags: Retrieve all feature flags and their current states.
Fetching a Specific Flag: Retrieve the status of a single feature flag by its name.
Updating a Flag: Modify the status or properties of a feature flag.
Adding a New Flag: Create a new feature flag (optional).
Deleting a Flag: Remove a flag from the system (optional).
Below is a minimal Flask implementation of these endpoints.
from flask import Flask, jsonify, request
import redis

app = Flask(__name__)

# Connect to Redis (or use an in-memory dictionary for simplicity).
# decode_responses=True makes the client return str instead of bytes.
r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Example: prepopulate some flags. nx=True only sets keys that do not exist yet,
# so restarting the app does not overwrite flags that were changed at runtime.
default_flags = {
    "dark_mode": "false",
    "recommendation_engine": "true"
}
for key, value in default_flags.items():
    r.set(key, value, nx=True)


# Helper function to get all flags.
# Assumes this Redis database holds nothing but feature flags.
def get_all_flags():
    # scan_iter walks the keyspace incrementally instead of blocking like KEYS.
    return {key: r.get(key) for key in r.scan_iter(match="*")}


# API Endpoints

# 1. Fetch All Flags
@app.route('/api/flags', methods=['GET'])
def get_flags():
    return jsonify(get_all_flags()), 200


# 2. Fetch a Specific Flag
@app.route('/api/flags/<flag_name>', methods=['GET'])
def get_flag(flag_name):
    value = r.get(flag_name)
    if value is None:
        return jsonify({"error": "Flag not found"}), 404
    return jsonify({flag_name: value}), 200


# 3. Update a Flag
@app.route('/api/flags/<flag_name>', methods=['PUT'])
def update_flag(flag_name):
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    data = request.get_json(silent=True) or {}
    if 'enabled' not in data:
        return jsonify({"error": "Missing 'enabled' field"}), 400
    r.set(flag_name, str(data['enabled']).lower())  # Store 'true'/'false' strings
    return jsonify({"message": f"Flag '{flag_name}' updated successfully"}), 200


# 4. Add a New Flag
@app.route('/api/flags', methods=['POST'])
def create_flag():
    data = request.get_json(silent=True) or {}
    if 'name' not in data or 'enabled' not in data:
        return jsonify({"error": "Missing 'name' or 'enabled' field"}), 400
    flag_name = data['name']
    if r.exists(flag_name):
        # 409 Conflict signals that the flag already exists.
        return jsonify({"error": f"Flag '{flag_name}' already exists"}), 409
    r.set(flag_name, str(data['enabled']).lower())
    return jsonify({"message": f"Flag '{flag_name}' created successfully"}), 201


# 5. Delete a Flag
@app.route('/api/flags/<flag_name>', methods=['DELETE'])
def delete_flag(flag_name):
    if not r.exists(flag_name):
        return jsonify({"error": "Flag not found"}), 404
    r.delete(flag_name)
    return jsonify({"message": f"Flag '{flag_name}' deleted successfully"}), 200


if __name__ == '__main__':
    app.run(debug=True)
4. Integrate the SDK
The SDK is a small piece of code that acts as the messenger between your app and the Feature Flag Store.
Write a lightweight SDK (library) for your app to:
Fetch Flags: Load all flags at startup or in real time.
Cache Flags: Store them locally to avoid frequent database hits.
Evaluate Flags: Check if a feature is on or off before executing code.
Example Code:
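# sdk is a placeholder for your SDK client object.
feature_flags = sdk.get_flags()
# Fall back to "off" if the flag is missing from the response.
if feature_flags.get('dark_mode', False):
    enable_dark_mode()
else:
    enable_light_mode()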
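The fetch, cache, and evaluate responsibilities listed above could be sketched as a small client class like the one below. FlagClient, its ttl_seconds parameter, and the injected fetch callable are hypothetical names chosen for this example; in a real SDK, fetch would typically call the flag API over HTTP.

```python
import time
from typing import Callable, Dict, Optional


class FlagClient:
    """Hypothetical SDK client: fetches flags through `fetch` and caches them."""

    def __init__(self, fetch: Callable[[], Dict[str, bool]], ttl_seconds: float = 30.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._cache: Dict[str, bool] = {}
        self._loaded_at: Optional[float] = None  # time of the last successful fetch

    def get_flags(self) -> Dict[str, bool]:
        """Return cached flags, refetching once the cache is older than the TTL."""
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            try:
                self._cache = dict(self._fetch())
                self._loaded_at = now
            except Exception:
                # Keep serving the last known values if the flag service is down.
                pass
        return self._cache

    def is_enabled(self, name: str, default: bool = False) -> bool:
        """Evaluate one flag, falling back to `default` when it is unknown."""
        return self.get_flags().get(name, default)
```

Passing the fetch function in from outside keeps the client easy to test and independent of any particular transport. Serving stale values when a fetch fails is a design choice: the app keeps working during an outage, at the cost of reacting more slowly to flag changes.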
5. Enable Real-Time Updates
Choose a mechanism to deliver flag changes to running apps:
Polling: The app checks for changes every few seconds.
Push: The server sends updates to connected clients as they happen (e.g., over WebSockets or server-sent events).
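The polling option can be sketched with a background thread that refreshes a local snapshot of the flags on a fixed interval. PollingUpdater and its method names are assumptions made for this example, and the fetch callable again stands in for a call to the flag API.

```python
import threading
from typing import Callable, Dict


class PollingUpdater:
    """Hypothetical poller: refreshes a local flag snapshot on a fixed interval."""

    def __init__(self, fetch: Callable[[], Dict[str, bool]], interval_seconds: float = 5.0):
        self._fetch = fetch
        self._interval = interval_seconds
        self._flags: Dict[str, bool] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh(self) -> None:
        """Fetch once and swap in the new snapshot; keep the old one on failure."""
        try:
            latest = dict(self._fetch())
        except Exception:
            return
        with self._lock:
            self._flags = latest

    def _run(self) -> None:
        # Event.wait doubles as an interruptible sleep between polls.
        while not self._stop.wait(self._interval):
            self.refresh()

    def start(self) -> None:
        self.refresh()  # load an initial snapshot before serving reads
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def is_enabled(self, name: str, default: bool = False) -> bool:
        with self._lock:
            return self._flags.get(name, default)
```

With polling, a flag change takes up to one interval to reach each app instance. A push channel removes that delay but requires the server to hold open connections to its clients.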