ZenMode

System Design: Feature Flag

Design a system to enable and disable features dynamically without deploying code.

venkat
Jan 24, 2025

👋 Hi, I’m Venkat, and welcome to the latest issue of the ZenMode Engineer Newsletter!
In each issue, I break down a system design topic into simple, easy-to-understand terms. If you’re passionate about learning how to design and build scalable, robust systems, this newsletter is for you. Don’t miss out—subscribe today and start leveling up your architecture skills!


Did you know? 🤔

“Do you know that millions of users are streaming through our new system right now, and they don’t even know it?”

At Netflix, they were rolling out a new video compression codec—an innovation that promised sharper visuals and smoother streaming on slower networks.

But this wasn’t a risky, all-at-once launch. They used feature flags, a tool that allowed them to release the codec to only 10% of users. Their control panel showed real-time data: playback errors, streaming quality, user feedback. Everything looked stable.

“If anything breaks, we toggle it off in seconds.”

No app redeploy, no late-night firefights.

Netflix’s ability to test, monitor, and adapt made the rollout seamless, and users never noticed the change.

Netflix uses feature flags extensively to roll out features, run A/B tests, and keep its systems reliable.


Imagine managing a website or an app where you want to turn certain features on or off (like dark mode or a new recommendation system) without redeploying the code.

To do this, we need to build a Feature Flag System.

Feature Flags Concepts

What is a Feature Flag?

A feature flag is a switch that lets you turn a feature on or off in your application without deploying new code.

Think of a feature flag system as a light switch panel in your house:

  • Each switch (feature flag) controls a light (a feature in your app).

  • You can turn lights on or off as needed without needing to rewire your house (redeploy your code).

  • Some lights (features) might be dimmable or only available in certain rooms (feature rollout to specific users).

Example: Dark mode can be hidden from users by keeping a "flag" for dark mode off, even if the code is already deployed.

Feature Flags (Flow)

The Feature Toggles article by Pete Hodgson, published on martinfowler.com, categorizes feature toggles into four types:

  • Release Toggles: Allow incomplete features to be shipped to production in a dormant state, enabling trunk-based development and continuous delivery.

  • Experiment Toggles: Facilitate A/B testing by exposing different user segments to various feature implementations to gather data-driven insights.

  • Ops Toggles: Provide operational control to enable or disable features in response to system performance or reliability issues.

  • Permissioning Toggles: Manage feature access for different user groups, such as granting premium features to paying customers.

    https://martinfowler.com/articles/feature-toggles.html

Key Components of the System

To build a dynamic feature management system, consider the following components:

Feature Flag Control Service: Acts as the control plane, managing all flag configurations. This service should be robust and scalable to handle organizational needs.

  • Example (Admin Dashboard):

    • A user interface for developers or admins to turn flags on or off.

    • Example: A web app where you see a list of features and toggle them.

Database or Data Store: Stores feature flag configurations (the on/off settings and any targeting rules) reliably. Options include SQL databases, NoSQL databases, key-value stores, or even a config file, depending on scalability and performance requirements.

Examples:

Key-Value Store (e.g., Redis, DynamoDB): Store a simple entry such as feature_dark_mode: true.

SQL or NoSQL Databases: Store richer configurations, such as rollout rules and audiences.

API Layer: Exposes endpoints for your application to interact with the Feature Flag Control Service, allowing retrieval and management of flag configurations.

Your application code reads the flag’s value (on/off) through this layer and adjusts its behavior.

Example: If dark_mode = true, show the app in dark mode.

Feature Flag SDK: Provides an interface for fetching and evaluating feature flags at runtime within your application.

The SDK should handle caching and background updates to minimize latency.

Continuous Update Mechanism: Ensures that feature flag configurations are updated dynamically without requiring application restarts or redeployments.

This can be achieved through mechanisms like long polling, WebSockets, or server-sent events.


Design Process

Here’s how to put it all together:

1. Define the Flags

  • Each feature gets a unique name.

  • Decide:

    • Type of Flag (e.g., toggle, percentage rollout).

    • Default State (on/off).

    • Who can see it (all users, specific group, etc.).

Example:

dark_mode:
  type: boolean
  default: false
  audience: premium_users
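
The fields above (type, default state, audience) can be sketched as a small data structure with an evaluation function. This is a minimal illustration under stated assumptions, not a definitive schema: the names `FlagDefinition`, `bucket`, `is_enabled`, and the `rollout_percent` field are hypothetical, and hashing the user ID is just one common way to keep a percentage rollout stable for each user.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class FlagDefinition:
    """Hypothetical flag schema mirroring the fields listed above."""
    name: str
    flag_type: str = "boolean"       # "boolean" or "percentage"
    default: bool = False            # state used when nothing is stored yet
    audience: Optional[str] = None   # group allowed to see it; None means everyone
    rollout_percent: int = 0         # only used when flag_type == "percentage"


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: FlagDefinition, user_id: str, user_groups: Set[str],
               stored_state: Optional[bool] = None) -> bool:
    """Evaluate a flag for one user; stored_state is the on/off value from the store."""
    state = flag.default if stored_state is None else stored_state
    if not state:
        return False
    if flag.audience is not None and flag.audience not in user_groups:
        return False
    if flag.flag_type == "percentage":
        # The same user always lands in the same bucket, so the rollout is sticky.
        return bucket(flag.name, user_id) < flag.rollout_percent
    return True


dark_mode = FlagDefinition(name="dark_mode", audience="premium_users")
print(is_enabled(dark_mode, "user-42", {"premium_users"}, stored_state=True))
```

Including the flag name in the hash means a user who falls into the first 10% for one flag is not automatically in the first 10% for every other flag.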

2. Build a Feature Flag Store

  • Store flag configurations in a reliable database.

  • Example:

    • Use Redis for fast access if you need frequent flag checks.

      dark_mode = true
      new_homepage = false
    • Use PostgreSQL for storing complex rules.
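
To make "complex rules" more concrete, here is a rough sketch of a relational flag table that keeps targeting rules as JSON next to the on/off state. It uses Python's built-in `sqlite3` purely as a stand-in so it runs without a server; a PostgreSQL version would differ in driver, placeholder style, and upsert syntax. The table name, column names, and the helpers `save_flag` and `load_flag` are assumptions made for illustration.

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feature_flags (
        name    TEXT PRIMARY KEY,
        enabled INTEGER NOT NULL DEFAULT 0,
        rules   TEXT NOT NULL DEFAULT '{}'
    )
    """
)


def save_flag(name, enabled, rules):
    """Insert a flag, or overwrite it if the name already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO feature_flags (name, enabled, rules) VALUES (?, ?, ?)",
        (name, int(enabled), json.dumps(rules)),
    )
    conn.commit()


def load_flag(name):
    """Return the flag as a dict, or None if it is not defined."""
    row = conn.execute(
        "SELECT enabled, rules FROM feature_flags WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    return {"name": name, "enabled": bool(row[0]), "rules": json.loads(row[1])}


save_flag("dark_mode", True, {"audience": "premium_users"})
print(load_flag("dark_mode"))
```

A common split is to keep the full rules in the relational store and mirror only the values needed for fast checks into a cache such as Redis.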

3. Write an API to Access Flags

  • Create APIs to fetch flag values in your app:

    • GET /api/flags: Returns all flag configurations.

    • PUT /api/flags/<flag_name>: Updates a flag’s status.

    Basic API Structure

    The API will have endpoints for:

    1. Fetching All Flags: Retrieve all feature flags and their current states.

    2. Fetching a Specific Flag: Retrieve the status of a single feature flag by its name.

    3. Updating a Flag: Modify the status or properties of a feature flag.

    4. Adding a New Flag: Create a new feature flag (optional).

    5. Deleting a Flag: Remove a flag from the system (optional).

    Below is a minimal Flask implementation to illustrate it.

    from flask import Flask, jsonify, request
    import redis
    
    app = Flask(__name__)
    
    # Connect to Redis (or use an in-memory dictionary for simplicity)
    r = redis.Redis(host='localhost', port=6379, decode_responses=True)
    
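    # Example: Prepopulate some flags (nx=True leaves existing values untouched on restart)
    default_flags = {
        "dark_mode": "false",
        "recommendation_engine": "true"
    }
    for key, value in default_flags.items():
        r.set(key, value, nx=True)
    
    # Helper function to get all flags.
    # decode_responses=True makes redis-py return str, so no .decode() is needed.
    # KEYS is fine for a demo; it assumes this Redis database holds only flags.
    def get_all_flags():
        return {key: r.get(key) for key in r.keys("*")}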
    
    # API Endpoints
    
    # 1. Fetch All Flags
    @app.route('/api/flags', methods=['GET'])
    def get_flags():
        flags = get_all_flags()
        return jsonify(flags), 200
    
    # 2. Fetch a Specific Flag
    @app.route('/api/flags/<flag_name>', methods=['GET'])
    def get_flag(flag_name):
        value = r.get(flag_name)
        if value is None:
            return jsonify({"error": "Flag not found"}), 404
        return jsonify({flag_name: value}), 200
    
    # 3. Update a Flag
    @app.route('/api/flags/<flag_name>', methods=['PUT'])
    def update_flag(flag_name):
        data = request.get_json(silent=True) or {}  # tolerate a missing or non-JSON body
        if 'enabled' not in data:
            return jsonify({"error": "Missing 'enabled' field"}), 400
        
        r.set(flag_name, str(data['enabled']).lower())  # Ensure 'true/false' strings
        return jsonify({"message": f"Flag '{flag_name}' updated successfully"}), 200
    
    # 4. Add a New Flag
    @app.route('/api/flags', methods=['POST'])
    def create_flag():
        data = request.get_json(silent=True) or {}  # tolerate a missing or non-JSON body
        if 'name' not in data or 'enabled' not in data:
            return jsonify({"error": "Missing 'name' or 'enabled' field"}), 400
    
        flag_name = data['name']
        if r.exists(flag_name):
            # 409 Conflict signals a duplicate more precisely than 400
            return jsonify({"error": f"Flag '{flag_name}' already exists"}), 409
    
        r.set(flag_name, str(data['enabled']).lower())
        return jsonify({"message": f"Flag '{flag_name}' created successfully"}), 201
    
    # 5. Delete a Flag
    @app.route('/api/flags/<flag_name>', methods=['DELETE'])
    def delete_flag(flag_name):
        if not r.exists(flag_name):
            return jsonify({"error": "Flag not found"}), 404
    
        r.delete(flag_name)
        return jsonify({"message": f"Flag '{flag_name}' deleted successfully"}), 200
    
    if __name__ == '__main__':
        app.run(debug=True)
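
The Redis connection comment above mentions an in-memory dictionary as a simpler alternative. A minimal sketch of that idea follows: a hypothetical `InMemoryFlagStore` class (not part of redis-py or Flask) that mimics only the handful of client methods the endpoints call and returns plain strings, as a client created with `decode_responses=True` does. Assigning `r = InMemoryFlagStore()` in place of the `redis.Redis(...)` line would be one way to try the API without a Redis server.

```python
class InMemoryFlagStore:
    """Dict-backed stand-in for the few Redis client calls the API uses."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False):
        # With nx=True, only set the key if it does not already exist.
        if nx and key in self._data:
            return None
        self._data[key] = str(value)
        return True

    def get(self, key):
        return self._data.get(key)

    def exists(self, key):
        return 1 if key in self._data else 0

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0

    def keys(self, pattern="*"):
        # The pattern is ignored in this sketch; every key is returned.
        return list(self._data)


store = InMemoryFlagStore()
store.set("dark_mode", "false")
print(store.get("dark_mode"), store.keys())
```

Flags kept this way disappear when the process stops and are not shared between processes, so this is only suitable for local experiments and tests.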

4. Integrate the SDK

The SDK is a small piece of code that acts as the messenger between your app and the Feature Flag Store.

  • Write a lightweight SDK (library) for your app to:

    • Fetch Flags: Load all flags at startup or in real-time.

    • Cache Flags: Store them locally to avoid frequent database hits.

    • Evaluate Flags: Check if a feature is on or off before executing code.

Example Code:

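# `sdk` is a placeholder for your SDK client; flag values are assumed to be booleans here
feature_flags = sdk.get_flags()

# .get() with a default keeps the app working if the flag is missing
if feature_flags.get('dark_mode', False):
    enable_dark_mode()
else:
    enable_light_mode()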
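
The snippet above relies on an `sdk` object without showing what sits behind it. As a rough sketch of the fetch, cache, and evaluate responsibilities, here is a hypothetical `FlagClient` class. The `fetch` callable is an assumption standing in for whatever call retrieves flags from your flag service, and the cache is a simple time-to-live cache. Values are compared against the string "true" because the Flask example stores flags as strings.

```python
import time
from typing import Callable, Dict, Optional


class FlagClient:
    """Hypothetical SDK client: fetches flags via `fetch` and caches them for `ttl_seconds`."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return  # cache is still fresh
        try:
            self._cache = dict(self._fetch())
            self._fetched_at = now
        except Exception:
            # Keep serving the last known values if the flag service is unreachable.
            pass

    def is_enabled(self, name: str, default: bool = False) -> bool:
        self._refresh_if_stale()
        value = self._cache.get(name)
        if value is None:
            return default
        return str(value).lower() == "true"


client = FlagClient(fetch=lambda: {"dark_mode": "true"}, ttl_seconds=30)
print(client.is_enabled("dark_mode"))
```

Passing a per-call `default` means the application still behaves sensibly when a flag has never been defined or the first fetch fails.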

5. Enable Real-Time Updates

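  • Use one of these methods to update flags dynamically:

    • Polling: The app checks for changes at a fixed interval (for example, every few seconds). This is simple, but a change can take up to one interval to arrive.

    • Push: The server sends updates over a persistent connection (e.g., WebSockets or server-sent events), so changes arrive almost immediately.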
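
The polling option can be sketched with a background thread that refreshes a local snapshot of the flags. This is an illustrative outline only: `FlagPoller`, its `fetch` callable, and the `on_change` callback are hypothetical names, and a push-based design would replace the loop with a listener on a persistent connection.

```python
import threading


class FlagPoller:
    """Hypothetical background poller that keeps a local snapshot of flags fresh."""

    def __init__(self, fetch, interval_seconds=5.0, on_change=None):
        self._fetch = fetch
        self._interval = interval_seconds
        self._on_change = on_change
        self._flags = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def poll_once(self):
        """Fetch the latest flags and report whether anything changed."""
        try:
            latest = dict(self._fetch())
        except Exception:
            return False  # keep the previous snapshot on failure
        with self._lock:
            changed = latest != self._flags
            self._flags = latest
        if changed and self._on_change:
            self._on_change(latest)
        return changed

    def _run(self):
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            self.poll_once()

    def start(self):
        self.poll_once()  # load flags once before the background loop begins
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def snapshot(self):
        with self._lock:
            return dict(self._flags)


poller = FlagPoller(fetch=lambda: {"dark_mode": "false"}, interval_seconds=5)
poller.start()
print(poller.snapshot())
poller.stop()
```

Application code reads from `snapshot()` and never waits on the network, which keeps flag checks fast even when the flag service is slow.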


USECASE: Facebook’s "Dark Mode" Rollout
