System Design Interview: A Step-by-Step Guide

Learn the framework and process to answer any system design question.

Understanding System Design Interviews

The system design interview is often the most important evaluation when you're applying for senior engineering roles. In my opinion, it is also the most relevant one, because it assesses the candidate's past experience against what the job actually requires. There is no single right or wrong answer in a system design interview. It is mostly about comparing options, weighing their trade-offs, and making an informed decision that has the highest impact on the business.

System design interviews generally last 45 minutes. The first 5 minutes are for introductions and the last 5 minutes give the interviewee a chance to ask questions, so you effectively have about 35 minutes to show the interviewer that you can understand ambiguous requirements, break them down into actionable items, prioritize them, and apply the technical knowledge you have gained over the last several years to build and scale services.

A system design interview starts with the interviewer asking an open-ended question, often a one-liner such as "Design an e-commerce website."

Step 1: Functional Requirements

The first thing you want to do as an interviewee is ask clarifying questions. Avoid jumping straight into solutions; first make sure you fully understand the problem, even if you have worked on similar systems before.

For example, when you're asked to design an e-commerce website, you may want to ask questions like:

Make sure to state any assumptions you are making and any risks you see, for example a dependency outside your control, such as another team delivering an API that your system consumes.

Regardless of company size, time and resources are limited, so prioritizing business requirements and clearly enumerating the features to deliver in each MVP release demonstrates senior-level judgment.

In the same e-commerce example, product browsing, cart management, and secure checkout could be the core functional requirements.

Non-Functional Requirements

Next, discuss the non-functional requirements (NFRs). The most important NFRs are typically scalability, availability, and consistency.

To understand them, ask for metrics such as the number of daily active users. You may have to make some assumptions, especially if the interviewer does not give you data points, for example assuming that only half of the users visiting the website buy a product.

Once you have those data points, you can quickly do a back-of-the-envelope calculation to estimate throughput in queries per second (QPS) and break it down further into read and write traffic.
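A back-of-the-envelope estimate is just arithmetic, so it can be written down as a tiny calculator. The sketch assumes hypothetical numbers: 10 million daily active users, 20 read requests (page views, searches) per user per day, the half-of-visitors-buy assumption with one order write per buyer, and a peak-to-average factor of 3. None of these figures come from a real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_qps(daily_active_users: int,
                 reads_per_user: int,
                 buyer_ratio: float,
                 writes_per_buyer: int,
                 peak_factor: float) -> dict:
    """Back-of-the-envelope throughput estimate, split into read and write traffic."""
    daily_reads = daily_active_users * reads_per_user
    daily_writes = daily_active_users * buyer_ratio * writes_per_buyer
    avg_read_qps = daily_reads / SECONDS_PER_DAY
    avg_write_qps = daily_writes / SECONDS_PER_DAY
    return {
        "avg_read_qps": avg_read_qps,
        "avg_write_qps": avg_write_qps,
        "peak_read_qps": avg_read_qps * peak_factor,
        "peak_write_qps": avg_write_qps * peak_factor,
        "read_write_ratio": daily_reads / daily_writes,
    }


# Hypothetical inputs: 10M DAU, 20 reads each, half of users place 1 order, 3x peak.
estimate = estimate_qps(10_000_000, 20, 0.5, 1, 3.0)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

With these inputs the estimate comes out to roughly 2,300 read QPS and 60 write QPS on average, a 40:1 read-to-write ratio. A number like that is what later justifies adding a cache to the read path.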

These numbers are critical to ensuring the system can handle real-world user demand. For an e-commerce platform, you'll also want to consider high availability: aim for at least 99.9% uptime to avoid losing customers.
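To make an uptime target concrete, it helps to translate it into a downtime budget. This small helper assumes a 365-day year and a 30-day month.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Maximum downtime, in minutes, that an availability target permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    per_year = allowed_downtime_minutes(target, 365)
    per_month = allowed_downtime_minutes(target, 30)
    print(f"{target}%: {per_year / 60:.1f} h/year, {per_month:.1f} min/month")
```

At 99.9%, the budget is about 8.8 hours of downtime per year, or about 43 minutes per month; each extra nine cuts it by a factor of ten.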

Consistency is another important NFR for an e-commerce website. Maintaining data consistency, especially for inventory counts, is crucial so that you don't sell items that are already out of stock.
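One common way to keep inventory counts consistent is to make the stock check and the decrement a single atomic statement, so two concurrent checkouts cannot both claim the last unit. The sketch uses Python's built-in `sqlite3` module as a stand-in for a production relational database; the `inventory` table and the `reserve_stock` function are hypothetical.

```python
import sqlite3


def reserve_stock(conn: sqlite3.Connection, product_id: int, quantity: int) -> bool:
    """Decrement stock only if enough units remain; return True if the reservation succeeded."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            # Check and decrement in one statement, so there is no read-then-write race.
            "UPDATE inventory SET stock = stock - ? WHERE product_id = ? AND stock >= ?",
            (quantity, product_id, quantity),
        )
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)")
conn.execute("INSERT INTO inventory (product_id, stock) VALUES (1, 3)")
conn.commit()

print(reserve_stock(conn, 1, 2))  # True: stock goes from 3 to 1
print(reserve_stock(conn, 1, 2))  # False: only 1 unit left, stock is unchanged
```

The same conditional `UPDATE ... WHERE stock >= ?` pattern carries over to MySQL or PostgreSQL; reading the stock first and writing it back in a separate step is exactly what allows overselling.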

Addressing these non-functional requirements shows interviewers that you're thinking beyond basic functionality. Scale matters especially at companies like Google, Amazon, Microsoft, and Uber, where you need to demonstrate that you have the skills to build systems at large scale.

These NFRs form the basis of the discussion during the deep dive, so for now we will move on to the next step.

Step 2: APIs and Data Modeling

By the time you reach this step, the interviewee and interviewer are expected to have agreed on both the functional and non-functional requirements.

APIs

Next, list the interfaces or APIs that the system needs to support. These APIs usually map one-to-one to the features you identified under functional requirements.

For example, an e-commerce website needs a search API that lets users look up products and a shopping cart API that lets them buy products.

Although you can use any protocol to define the API, RESTful conventions are widely accepted, so it is common to follow them. This means using the proper HTTP methods (GET, POST, PUT, and DELETE) and meaningful HTTP status codes. Also pay attention to your request and response structures.
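To make the REST conventions concrete, here is a framework-free sketch of a shopping cart API. The routes (`/carts/{cart_id}/items`), the `handle` function, and the in-memory `CARTS` store are hypothetical; in a real service a web framework would do the routing and the cart would live in a database. The point is the mapping of HTTP methods to actions and of outcomes to status codes, taken from Python's standard `http.HTTPStatus` enum.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical in-memory store: cart_id -> {product_id: quantity}
CARTS: dict = {}


def handle(method: str, path: str, body: Optional[dict] = None) -> tuple:
    """Toy router for hypothetical /carts/{cart_id}/items[/{product_id}] endpoints."""
    parts = path.strip("/").split("/")
    if len(parts) not in (3, 4) or parts[0] != "carts" or parts[2] != "items":
        return HTTPStatus.NOT_FOUND, {"error": "unknown route"}
    cart_id = parts[1]

    if method == "GET" and len(parts) == 3:
        # GET is a safe read: it never modifies the cart.
        return HTTPStatus.OK, {"items": dict(CARTS.get(cart_id, {}))}

    if method == "POST" and len(parts) == 3:
        quantity = (body or {}).get("quantity")
        if not body or "product_id" not in body or not isinstance(quantity, int) or quantity <= 0:
            return HTTPStatus.BAD_REQUEST, {"error": "product_id and a positive quantity are required"}
        cart = CARTS.setdefault(cart_id, {})
        cart[body["product_id"]] = cart.get(body["product_id"], 0) + quantity
        return HTTPStatus.CREATED, {"items": dict(cart)}

    if method == "DELETE" and len(parts) == 4:
        if CARTS.get(cart_id, {}).pop(parts[3], None) is None:
            return HTTPStatus.NOT_FOUND, {"error": "item not in cart"}
        return HTTPStatus.NO_CONTENT, {}

    return HTTPStatus.METHOD_NOT_ALLOWED, {"error": f"{method} is not supported on {path}"}


status, payload = handle("POST", "/carts/c1/items", {"product_id": "p1", "quantity": 2})
print(int(status), payload)  # 201 {'items': {'p1': 2}}
print(int(handle("DELETE", "/carts/c1/items/p1")[0]))  # 204
print(int(handle("DELETE", "/carts/c1/items/p1")[0]))  # 404: already removed
```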

You may want to briefly talk about authorization, rate limiting, idempotency, etc. to signal to the interviewer that you have hands-on experience implementing APIs.
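Idempotency matters most for checkout: if a client times out and retries, the customer should not be charged twice. A common approach is for the client to send a unique key with each logical request (often in an `Idempotency-Key` header, a widely used convention) and for the server to replay the stored response when it sees the same key again. The `CheckoutService` class is a hypothetical, in-memory sketch; a real implementation would persist keys with an expiry and guard against two concurrent requests carrying the same key.

```python
import uuid


class CheckoutService:
    """Hypothetical checkout handler that deduplicates retries by idempotency key."""

    def __init__(self) -> None:
        self._responses: dict = {}  # idempotency key -> response returned the first time
        self.orders_created = 0

    def checkout(self, idempotency_key: str, cart_id: str) -> dict:
        # A retry with a key we've already seen replays the original response
        # instead of creating a second order (and a second charge).
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.orders_created += 1
        response = {"order_id": str(uuid.uuid4()), "cart_id": cart_id, "status": "CREATED"}
        self._responses[idempotency_key] = response
        return response


service = CheckoutService()
first = service.checkout("key-123", "c1")
retry = service.checkout("key-123", "c1")  # e.g. the client timed out and retried
print(first == retry, service.orders_created)  # True 1
```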

Data Modeling

Data modeling is crucial for designing how your system will handle and store data. Let's break it down into three levels:

  1. Conceptual Model: This is a high-level overview; think of it as a map. Identify key entities such as Users, Products, Orders, Carts, and Payments for an e-commerce website.
  2. Logical Model: Here, you define the relationships between those entities. For example, a User can have multiple Orders, and each Order can have multiple Products.
  3. Physical Model: Finally, this is where you choose specific databases and technologies. Should you use SQL or NoSQL? Maybe a mix of both? For product catalogs, you might use a NoSQL database like MongoDB for flexibility. For transactions, a relational database like MySQL provides ACID guarantees.
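The logical model can be sketched as plain Python dataclasses before committing to any database. The entity names follow the conceptual model; the fields and the `OrderItem` class, which carries the quantity of each product in an order, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Product:
    product_id: str
    name: str
    price: Decimal  # Decimal avoids binary floating-point rounding errors on money


@dataclass
class OrderItem:
    product: Product
    quantity: int


@dataclass
class Order:
    order_id: str
    items: list[OrderItem] = field(default_factory=list)  # one Order -> many Products

    def total(self) -> Decimal:
        return sum((item.product.price * item.quantity for item in self.items), Decimal("0"))


@dataclass
class User:
    user_id: str
    orders: list[Order] = field(default_factory=list)  # one User -> many Orders


mug = Product("p1", "Mug", Decimal("8.50"))
pen = Product("p2", "Pen", Decimal("1.25"))
order = Order("o1", [OrderItem(mug, 2), OrderItem(pen, 4)])
user = User("u1", [order])
print(user.orders[0].total())  # 22.00
```

`OrderItem` is the classic way to model the many-to-many relationship between Orders and Products; in a relational physical model it becomes its own table.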

The choice of database technology depends on the non-functional requirements and the nature of the product. If you need to scale very high, such as an e-commerce website serving large geographic regions, NoSQL is often a better option, but it comes with its own challenges. SQL databases provide ACID (atomicity, consistency, isolation, and durability) guarantees by design but have their own limitations, such as being harder to scale horizontally.

Schema design often depends on the storage technology. With SQL (relational) databases your data will be more normalized, while NoSQL favors denormalization.
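To see the difference, here is the same order in both shapes, using hypothetical in-memory data in place of real tables and documents. The normalized version needs lookups (joins) to assemble an order; the denormalized document answers "show me this order" in a single read at the cost of duplicated data.

```python
# Normalized (relational style): each fact is stored once and rows reference each other by id.
users = {"u1": {"name": "Ada"}}
products = {"p1": {"name": "Mug", "price_cents": 850}, "p2": {"name": "Pen", "price_cents": 125}}
orders = [{"order_id": "o1", "user_id": "u1"}]
order_items = [
    {"order_id": "o1", "product_id": "p1", "quantity": 2},
    {"order_id": "o1", "product_id": "p2", "quantity": 4},
]


def to_order_document(order: dict) -> dict:
    """Denormalize one order into a single self-contained document (NoSQL style)."""
    items = [
        # Product name and price are copied into the order: one read serves the whole
        # order page, but the same data now lives in many places.
        {"product_id": i["product_id"], "quantity": i["quantity"], **products[i["product_id"]]}
        for i in order_items
        if i["order_id"] == order["order_id"]
    ]
    return {
        "order_id": order["order_id"],
        "user": {"user_id": order["user_id"], **users[order["user_id"]]},
        "items": items,
    }


print(to_order_document(orders[0]))
```

The trade-off shows up on writes: if a product's name changes, every order document that copied it must be updated, unless the copy is intentional, as with the price a customer actually paid.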

In short, data modeling is about making sure your system can handle data efficiently and at scale.

Step 3: High Level Design

High-level design is the core of the system design interview. By this stage you have clearly listed the functional and non-functional requirements, estimated the storage and compute required, listed the APIs to build, and worked out the data model and storage.

At this step, start with a simple service-oriented architecture. Usually the first version has an API proxy at the front, a couple of services based on the APIs discussed, and a database.
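In this first version, the API proxy mainly maps incoming paths to the service that owns them. In practice you would use an off-the-shelf API gateway or reverse proxy rather than write this yourself; the sketch only shows the routing decision, and the service names and ports are hypothetical.

```python
from typing import Optional

# Hypothetical upstream services behind the API proxy, keyed by path prefix.
ROUTES = {
    "/search": "http://search-service:8080",
    "/carts": "http://cart-service:8080",
    "/orders": "http://order-service:8080",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Pick the upstream service for a request path by longest matching prefix."""
    route_path = path.split("?", 1)[0]  # ignore the query string when matching
    matches = [p for p in ROUTES if route_path == p or route_path.startswith(p + "/")]
    if not matches:
        return None  # the proxy would answer 404 without calling any service
    return ROUTES[max(matches, key=len)] + path


print(resolve_upstream("/carts/c1/items"))  # http://cart-service:8080/carts/c1/items
print(resolve_upstream("/search?q=mug"))    # http://search-service:8080/search?q=mug
```

Keeping routing in one place means a service can later be split, scaled, or moved without clients noticing.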

As you discuss the first version of the design with the interviewer, you will identify bottlenecks. That is when it is time to switch gears and scale the system.

Cache

A common way to scale a read-heavy system is to introduce a cache. If you are hitting the database for every user request, you can put a cache such as Redis in front of it and serve the data from there instead. When using a cache, it is important to discuss the eviction policy, such as Least Recently Used (LRU).
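The eviction policy is easy to illustrate in a few lines. This sketch implements LRU with Python's `collections.OrderedDict` and uses it in a cache-aside read path: check the cache, fall back to the database on a miss, then populate the cache. It is an in-process stand-in for Redis; in Redis itself you would configure eviction (for example with the `allkeys-lru` value of `maxmemory-policy`) rather than implement it. `get_product` and the database loader are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def get_product(cache: LRUCache, product_id: str, load_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    product = cache.get(product_id)
    if product is None:
        product = load_from_db(product_id)
        cache.put(product_id, product)
    return product


db_reads = []


def load_product_from_db(product_id: str) -> dict:
    db_reads.append(product_id)  # record each (slow) database hit
    return {"product_id": product_id}


cache = LRUCache(capacity=2)
for pid in ["p1", "p1", "p2", "p3"]:
    get_product(cache, pid, load_product_from_db)
print(db_reads)  # ['p1', 'p2', 'p3']: the second 'p1' was a cache hit
print(cache.get("p1"))  # None: 'p1' was least recently used when 'p3' arrived
```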

CDN

If a system delivers static content such as images and videos, you can suggest using a CDN such as Akamai to serve it with low latency from locations closer to the user's geographic region.

A CDN also provides redundancy across multiple locations, handles failover automatically, and can continue serving cached content even if the origin is down. Some CDNs also provide security features such as DDoS protection and a Web Application Firewall (WAF).

Vertical and Horizontal Scaling

Vertical and horizontal scaling are two very common techniques for handling increasing load.

Vertical scaling, also known as scaling up, means adding more memory or CPU to an existing server or database so that it can handle more load.

Vertical scaling is simpler to implement and requires no architectural changes, but it is limited by the largest hardware available and becomes increasingly expensive. It also leaves you with a single point of failure and can mean more downtime during upgrades.

Horizontal scaling, also known as scaling out, means adding more nodes or machines to the system and distributing the load across them.

Horizontal scaling provides better fault tolerance and allows upgrades without downtime, for example by upgrading nodes one at a time. At large scale it is also often more cost-effective than vertical scaling. However, it brings the challenges of managing a distributed system, such as data consistency, load balancing, and session management.
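Spreading load across horizontally scaled nodes is the load balancer's job. This sketch shows round-robin selection that skips nodes marked unhealthy, which is where the fault tolerance comes from. The node names are hypothetical, and real load balancers add active health checks and other algorithms such as least connections.

```python
import itertools


class RoundRobinBalancer:
    """Spread requests evenly across healthy nodes, skipping any marked down."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        self.healthy = set(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node: str) -> None:
        self.healthy.discard(node)

    def mark_up(self, node: str) -> None:
        self.healthy.add(node)

    def next_node(self) -> str:
        # Try at most one full rotation so an unhealthy node is skipped, not retried forever.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.next_node() for _ in range(3)])  # ['app-1', 'app-2', 'app-3']
balancer.mark_down("app-2")
print([balancer.next_node() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Because consecutive requests from the same user can land on different nodes, session data should live in a shared store (or travel in a token) rather than in one node's memory; that is the session management challenge of scaling out.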

Database sharding

Database sharding is a horizontal partitioning strategy for scaling databases. It is a common practice when the data size exceeds a single server's capacity or when you need to distribute load geographically.

Range-based, hash-based, and geographic sharding are common strategies.
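Each strategy boils down to a function that maps a key to a shard. The shard count, range bounds, and region mapping are hypothetical; the sketch only shows how the lookup differs between the three strategies.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based: a stable hash of the key spreads data evenly across shards."""
    # hashlib (not the built-in hash()) because str hashes are randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: shard i holds ids below RANGE_UPPER_BOUNDS[i]; the last shard takes the rest.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(order_id: int) -> int:
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, order_id)


# Geographic: each region's data lives on a shard hosted in or near that region.
REGION_SHARDS = {"us": 0, "eu": 1, "apac": 2}


def geo_shard(region: str) -> int:
    return REGION_SHARDS[region]


print(range_shard(999_999), range_shard(1_000_000), range_shard(3_500_000))  # 0 1 3
print(geo_shard("eu"))  # 1
```

One caveat with the simple modulo in `hash_shard`: changing the number of shards remaps most keys, which is why consistent hashing is often used when shards are added or removed.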

However, sharding comes with its own challenges, such as increased infrastructure complexity, harder data management (for example, rebalancing data between shards), higher operational cost, and more difficult aggregation of data across shards.

Wrap up

After you have discussed different strategies to scale the system, it is time to wrap up the design.

Summarize your design by reiterating key points such as scalability, reliability, and performance. Mention any features or improvements you would add if you had more time, for example a machine learning model for better product search and recommendations.

This is the most common template or framework to follow in a system design interview. Depending on the role, such as networking, infrastructure, security, or machine learning, you will have to adjust your response, but the framework stays the same so that you can focus on the problem and approach the question systematically.