Design Nearby Friends Service
Designing a scalable backend system for a `Nearby Friends` mobile app
The Nearby Friends service identifies and lists active friends within a defined radius of your location. It is similar to the proximity service we discussed in the last article, with one key difference: in the Nearby Friends service, all entities (i.e., your friends) have dynamic locations that change over time, whereas the proximity service dealt with businesses at fixed locations, allowing users to see nearby shops. Also, only the user's friends within the specified radius will be visible.
Let’s understand the problem and establish design scope.
Problem
We are trying to design a service that achieves the following:
- Whenever a user opens the app, a list of nearby active friends should be visible (`active` means the friend also has the app open).
- The service should be scalable and return results with low latency.
- The nearby friends list should also show each friend's distance from the user.
Assumptions
The following assumptions were made while designing the service:
- The app has a total of 1 billion registered users, but only 10% of them use it actively.
- Each user can have a maximum of 1,000 friends.
- Each user has 400 friends on average.
- We will be storing the user location history data for analytics.
- A location update is received from each user's mobile device every 30 seconds.
- If the system does not receive a location update from a user for 10 minutes, that user is marked as inactive.
- Friends within a 5 km radius are considered nearby, and distance is calculated as a straight line.
- Each page displays 20 results and loads more nearby friends on request.
- Each location update request is 20 bytes of data.
- The average online time for a user is 1 hour per day.
Functional Requirements
- Users should see a list of nearby active friends whenever they open the app, along with each friend's distance and the timestamp of their last location update.
- Friends' locations should be updated every 30 seconds.
Non-Functional Requirements
- A user only sees friends whose friend requests they have accepted.
- Users should have the option to disable location sharing while running the app.
- Location updates from friends should be received with low latency.
- The system should be reliable and eventually consistent; a delay of a few seconds before location data appears in different replicas is acceptable.
Scale Estimation
There are 1 billion users registered on the system and only 10% of them use the app daily. We can further assume 10% of DAU are online at the same time.
Daily active users(DAU) = 10% of 1 billion = 100 million
Concurrent users = 10% of 100 million = 10 million
// User locations are updated every 30 secs
Location update queries per sec = 10 million/30 = 334k approx.
Our system will receive roughly 334k location-update (write) RPS. Assuming each user also queries the system for nearby friends at the same rate, the total RPS doubles.
So, the required throughput (read + write) is ~668k RPS.
Assuming each location update takes up to 20 bytes in memory and each user is active for 1 hour on average:
Memory required per user for 1 hour = 20 bytes * 120 (since 1 hour = 120 updates per user)
= 2400 bytes
= 2.4 KB per user.
Total memory for 10 million users = 10 million * 2.4 KB
= 2.4 * 10^3 * 10^7 bytes
= 24 GB
Total memory for 1 year of data (extrapolating this hourly volume across a full day) = 24 GB * 24 * 365
= 210 TB
So, to store a year of location data, we will need roughly 210 TB of persistent storage.
Please note that initial RPS and storage calculation are based on assumptions, and the real requirements would depend on the detailed design and actual usage patterns.
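If you would like to sanity-check these numbers, here is a small Python sketch that reproduces the back-of-envelope math above; the constants simply mirror the assumptions listed earlier.

```python
# Back-of-envelope estimation for the Nearby Friends service.
TOTAL_USERS = 1_000_000_000          # registered users
DAU = int(TOTAL_USERS * 0.10)        # 10% use the app daily -> 100 million
CONCURRENT = int(DAU * 0.10)         # 10% of DAU online at once -> 10 million

UPDATE_INTERVAL_SEC = 30             # one location update every 30 seconds
LOCATION_SIZE_BYTES = 20             # size of a single location update

# Write RPS: every concurrent user sends one update per 30 seconds.
write_rps = CONCURRENT / UPDATE_INTERVAL_SEC            # ~334k
total_rps = 2 * write_rps                                # reads at the same rate -> ~668k

# Storage: 120 updates per active hour, 20 bytes each.
updates_per_hour = 3600 // UPDATE_INTERVAL_SEC           # 120
bytes_per_user_hour = LOCATION_SIZE_BYTES * updates_per_hour      # 2400 B = 2.4 KB
hourly_storage_gb = CONCURRENT * bytes_per_user_hour / 1e9        # ~24 GB
yearly_storage_tb = hourly_storage_gb * 24 * 365 / 1e3            # ~210 TB

print(f"write RPS ~{write_rps:,.0f}, total RPS ~{total_rps:,.0f}")
print(f"hourly storage ~{hourly_storage_gb:.0f} GB, yearly ~{yearly_storage_tb:.0f} TB")
```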
High Level Design
Let's start with a simple design and discuss the basic required components.
As we discussed in the scale estimation, querying the server for nearby friends every 30 seconds can result in roughly 334k read RPS. Querying the database at that rate to check for nearby friends can introduce significant consistency issues and strain the system.
With millions of users simultaneously sending requests, the database could become overwhelmed, leading to delayed responses and potentially outdated or inconsistent data being served to users.
A better approach would be to push updated data to users as soon as the system receives it. Yes, you are thinking correctly: I am suggesting a dedicated connection via WebSockets.
With this approach, our system's query throughput also drops back to roughly 334k RPS.
Initial Design
A simple design would be to have a shared backend. The responsibilities of the backend are:
- Receive location update from all active users.
- For each location update, find all active friends who should receive it and forward it to their devices.
- If the distance between the user and a friend is more than the threshold, do not forward the location update.
This looks simple. Let’s think over the issues we might face.
Scaling is the issue in this design. Think it over: there are 334k update queries per second from 10 million users, each user has an average of 400 friends, and we assume only 10% of those friends are active at any moment (334k x 400 x 10% ≈ 14 million). That adds up to roughly 14 million location-update forwards per second, which is a lot of requests, and the number will only grow over time.
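As a quick sanity check on that fan-out figure (the exact product is about 13.4 million, rounded to ~14 million above):

```python
# Fan-out estimate for the naive shared-backend design.
update_rps = 334_000          # location updates per second
avg_friends = 400
active_ratio = 0.10           # ~10% of a user's friends are online at any moment

forwarded_rps = update_rps * avg_friends * active_ratio
print(f"~{forwarded_rps:,.0f} forwarded updates per second")   # ~13.4 million
```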
Final Design
In this design, we will use both the HTTP and WebSocket protocols for communication.
Client Initialisation
- The mobile client sends a request to establish a WebSocket connection, along with the user's location.
- The WebSocket server interacts with the user database to fetch the user's info and their friends list.
- The WebSocket server then fetches the friends' location data from the location cache server. Only friends who have been active in the last 10 minutes are returned, since the other friends' location entries will have expired.
- It then computes the distance to each friend's location and also stores the client's latest location in the WebSocket connection handler for future use.
- The computed data is returned to the client in the response (a minimal sketch of this flow follows below).
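Here is a minimal Python sketch of this initialization flow. It assumes a Redis location cache keyed as `location:<user_id>` with JSON values, plus a hypothetical `get_friend_ids` helper for the user-database lookup; none of these names are fixed by the design, they are just illustrative.

```python
import json
import math

import redis  # assumed dependency: the redis-py client

r = redis.Redis(host="localhost", port=6379)


def haversine_km(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))


def on_client_init(user_id, user_lat, user_lon, get_friend_ids, radius_km=5):
    """Build the initial nearby-friends list when a client connects."""
    friend_ids = get_friend_ids(user_id)              # lookup against the user database
    cached = r.mget([f"location:{fid}" for fid in friend_ids])

    nearby = []
    for fid, raw in zip(friend_ids, cached):
        if raw is None:                               # cache entry expired -> friend is inactive
            continue
        loc = json.loads(raw)                         # e.g. {"lat": ..., "lon": ..., "ts": ...}
        dist = haversine_km(user_lat, user_lon, loc["lat"], loc["lon"])
        if dist <= radius_km:
            nearby.append({"friend_id": fid,
                           "distance_km": round(dist, 2),
                           "updated_at": loc["ts"]})
    return sorted(nearby, key=lambda f: f["distance_km"])
```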
Client Sending Location Update
- The mobile client sends a location update to the load balancer.
- The load balancer forwards the request to a WebSocket server to update that user's location data.
- The WebSocket server saves the data to the location history table.
- The WebSocket server then updates the location in the cache server, with the user_id as the key and the location plus timestamp as the value. It also sets a TTL of 10 minutes on the entry: if no location update arrives for that user within 10 minutes, the entry expires, is removed from the cache, and the user is marked as inactive.
- The WebSocket server publishes the new location to the user's channel on the Redis pub/sub server. Each user has a dedicated channel, and their friends are subscribed to it. (Steps 3 to 5 can be executed in parallel.)
- When a location update arrives on a pub/sub channel, it is broadcast to all of that channel's subscribers, i.e., all online friends of the user who sent the update. Each subscriber's WebSocket connection handler receives the location update.
- Upon receiving the message, the WebSocket connection handler computes the distance between the new location and the subscriber's own location (which is stored in the subscriber's connection handler).
- After the distance computation, if the distance is more than the defined radius, the location update is dropped; otherwise, the new location and the last-updated timestamp are sent to the subscriber's client (a sketch of this path follows below).
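A rough sketch of the update path and the subscriber-side filter is below. It again assumes redis-py, the same `location:<user_id>` key and `channel:<user_id>` channel naming, and hypothetical `save_history` and `push_to_client` helpers; `haversine_km` is the distance helper from the earlier sketch. This is illustrative only, not a production implementation.

```python
import json
import time

import redis  # assumed dependency: redis-py, used here for both the cache and pub/sub

r = redis.Redis(host="localhost", port=6379)
LOCATION_TTL_SEC = 10 * 60      # entries expire after 10 minutes of silence
RADIUS_KM = 5


def on_location_update(user_id, lat, lon, save_history):
    """Handle a single location update received over the WebSocket."""
    payload = json.dumps({"user_id": user_id, "lat": lat, "lon": lon,
                          "ts": int(time.time())})
    save_history(user_id, lat, lon)                              # step 3: append to the history table
    r.set(f"location:{user_id}", payload, ex=LOCATION_TTL_SEC)   # step 4: cache with a 10-minute TTL
    r.publish(f"channel:{user_id}", payload)                     # step 5: fan out via the user's channel


def on_friend_location(message, subscriber_lat, subscriber_lon, push_to_client):
    """Run on a subscriber's connection handler when a friend's update is broadcast."""
    loc = json.loads(message["data"])                  # message shaped like a redis-py pub/sub event
    dist = haversine_km(subscriber_lat, subscriber_lon, loc["lat"], loc["lon"])
    if dist <= RADIUS_KM:                              # drop updates outside the 5 km radius
        push_to_client({"friend_id": loc["user_id"],
                        "distance_km": round(dist, 2),
                        "updated_at": loc["ts"]})
```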
With this design, there are only about 40 forwards on average for each user location update, since the message is published once to pub/sub and then broadcast to the active subscribers (10% of 400 friends). The distance check is then repeated independently on each subscriber's connection handler.
API Designing
WebSocket
Users send and receive location updates via the WebSocket protocol. At a minimum, we need the following APIs (illustrative message shapes follow the list):
- WebSocket initialisation: Set up the initial connection between the client and the WebSocket server.
- Location update: The client periodically sends its location to the server.
- Client receives location updates: The server periodically pushes friends' location updates to the client.
- Subscribe user: Subscribe to a new friend's channel.
- Unsubscribe user: Unsubscribe from a removed friend's channel.
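For illustration, the WebSocket messages could look something like the following; the field names and shapes are my assumptions, not part of the design.

```python
# Illustrative WebSocket message shapes (field names are assumptions, not a fixed spec).

# Client -> server: periodic location update.
location_update = {"type": "location_update",
                   "lat": 37.7749, "lon": -122.4194, "ts": 1735689600}

# Server -> client: a nearby friend's location pushed to the subscriber.
friend_location = {"type": "friend_location", "friend_id": 42,
                   "lat": 37.7805, "lon": -122.4120,
                   "distance_km": 0.9, "updated_at": 1735689590}

# Client -> server: subscribe to / unsubscribe from a friend's channel.
subscribe = {"type": "subscribe", "friend_id": 42}
unsubscribe = {"type": "unsubscribe", "friend_id": 42}
```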
HTTP
User management APIs will use the HTTP protocol and perform basic CRUD operations:
- Add user —
POST: /api/user
- Fetch user details —
GET: /api/user/<user_id>
- Update user details —
PUT: /api/user/<user_id>
- Delete user —
DELETE: /api/user/<user_id>
Data Modelling
As per our final design, we need to store location data for two different use cases:
Location Cache
The location cache stores the latest location of each active user with a TTL of 10 minutes. We can use Redis for this, as it gives us the cache server, key-value storage, and key-expiry features we need.
Location history Database
This database should be able to handle heavy write traffic and scale out via sharding. Cassandra is a good candidate for these requirements.
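For illustration, here is one possible Cassandra table for the location history, sketched via the DataStax Python driver. The keyspace, table, and column names are assumptions; the point is that partitioning by `user_id` spreads writes across nodes while clustering by timestamp keeps each user's history ordered.

```python
from cassandra.cluster import Cluster  # assumed dependency: the DataStax cassandra-driver

# "nearby_friends" is a hypothetical keyspace name.
session = Cluster(["127.0.0.1"]).connect("nearby_friends")

# Partition by user_id so one user's history stays together and writes spread across nodes;
# cluster by ts (descending) so the latest locations are cheapest to read.
session.execute("""
    CREATE TABLE IF NOT EXISTS location_history (
        user_id bigint,
        ts      timestamp,
        lat     double,
        lon     double,
        PRIMARY KEY (user_id, ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

session.execute(
    "INSERT INTO location_history (user_id, ts, lat, lon) "
    "VALUES (%s, toTimestamp(now()), %s, %s)",
    (42, 37.7749, -122.4194),
)
```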
One might question, why not use a relational database?
You can, but a single database server might not be able to handle all the user history data, so we would need to shard the data by user id. That way it is easier to maintain and the load is distributed evenly.
User Database
This database will be responsible for handling the users' data and their friend relationships. Since it will see more reads than writes and has a well-defined schema, we can go for a relational database like MySQL.
It will have two tables (a possible schema is sketched below):
- User — Contains metadata of users.
- Friends — Contains the user friendship relationships.
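A possible shape for these two tables, expressed as MySQL DDL (the columns are illustrative assumptions; the `status` column covers the requirement that only accepted friend requests are visible):

```python
# Possible MySQL schema for the user database (table and column names are illustrative).
USERS_DDL = """
CREATE TABLE users (
    user_id    BIGINT PRIMARY KEY AUTO_INCREMENT,
    name       VARCHAR(255) NOT NULL,
    email      VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

# One row per (user, friend) pair; only 'accepted' rows are shown in the nearby list.
FRIENDS_DDL = """
CREATE TABLE friends (
    user_id    BIGINT NOT NULL,
    friend_id  BIGINT NOT NULL,
    status     ENUM('pending', 'accepted') NOT NULL DEFAULT 'pending',
    PRIMARY KEY (user_id, friend_id),
    FOREIGN KEY (user_id)   REFERENCES users (user_id),
    FOREIGN KEY (friend_id) REFERENCES users (user_id)
)
"""
```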
Bottleneck & Proposal
Let’s discuss the problems we might face in our design
Redis pub/sub server
The bottleneck of the Redis pub/sub server is CPU usage, not memory usage. At scale, a single Redis server would struggle to manage a channel for every user and process a large number of messages per second.
We need a distributed Redis pub/sub cluster. The good news is that each channel is independent of the others, so we can distribute channels across multiple pub/sub servers by sharding on the publisher's user id.
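A simple way to picture this sharding is below; the shard list is hypothetical, and a production setup would likely use consistent hashing instead of a plain modulo so that resharding moves fewer channels.

```python
import hashlib

# Hypothetical pub/sub endpoints; in practice these would come from service discovery.
PUBSUB_SHARDS = ["pubsub-0:6379", "pubsub-1:6379", "pubsub-2:6379", "pubsub-3:6379"]


def shard_for_channel(publisher_id: int) -> str:
    """Pick the pub/sub server that owns a given publisher's channel."""
    digest = hashlib.md5(str(publisher_id).encode()).hexdigest()
    return PUBSUB_SHARDS[int(digest, 16) % len(PUBSUB_SHARDS)]


# Publisher-side and subscriber-side WebSocket servers run the same function,
# so they always agree on which server hosts channel:<publisher_id>.
print(shard_for_channel(42))
```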
Adding/Removing friends
Adding and removing friends involves subscribing to and unsubscribing from the related channels, but how?
We can set up a callback on the client that gets triggered whenever a new friend is added to the friends list. This callback sends a request to the WebSocket handler to subscribe to the new friend's channel, and the friend's latest location is returned in the response.
Similarly, a callback can be set up for removing a friend from the friends list, which asks the WebSocket handler to unsubscribe from the removed friend's channel (see the sketch below).
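A minimal sketch of these two callbacks on the WebSocket handler, assuming redis-py, the same cache/channel key names as before, and a hypothetical `push_to_client` helper:

```python
import redis  # assumed dependency: redis-py

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()        # the pub/sub connection owned by this user's WebSocket handler


def on_friend_added(friend_id, push_to_client):
    """Triggered when the client reports a newly accepted friend."""
    pubsub.subscribe(f"channel:{friend_id}")        # start receiving that friend's updates
    latest = r.get(f"location:{friend_id}")         # None if the friend has been inactive > 10 min
    push_to_client({"friend_id": friend_id,
                    "latest_location": latest.decode() if latest else None})


def on_friend_removed(friend_id):
    """Triggered when a friend is removed from the friends list."""
    pubsub.unsubscribe(f"channel:{friend_id}")
```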
Beyond these bottlenecks, there may be others that we’ll explore when we dive deeper into each component. However, for this article, I’ve limited the discussion to just these two.
Conclusion
By understanding the challenges associated with location updates, data consistency, and system load, we can make informed decisions about the architecture and technology stack. Leveraging techniques such as sharding, choosing a distributed database like Cassandra, and adopting WebSockets for real-time updates ensures that the service remains responsive and efficient, even as the user base grows.
For this article, I’ve drawn inspiration from Alex Xu & Sahn Lam’s book System Design Interview Vol.2.
♥️ If you found this Medium story helpful, please share your thoughts in the comments. I would love to hear your feedback. Also, feel free to check out my other articles on Medium.