  • Hello, everyone!! Myself Sandeep from codeKarle.

  • And in this video, let's look at how do we design a cab booking system, something like an Uber or OLA or Lyft or something of that sort.

  • Let's start with the functional and non functional requirements that we want this system to support.

  • So, very first thing is - as a customer, when you open your app, you should be able to see what cabs are around you.

  • So, that is the See Cab feature in your vicinity.

  • The next thing is - if you want to book a cab from one place, point A to point B let's say, you need to know how much time it will take to travel

  • from point A to point B. And, you should be able to get an approximate price of how much would it cost you to book a cab via this platform.

  • The next thing is - you should obviously be able to book a cab.

  • We'll not get into the varieties of.. types of cab like go, premium and all of that. We'll just assume that there is one type of cab.

  • You can add more features, simply! That's not a big trouble.

  • The next thing is - there should be a very good location tracking of what driver was at what place at what point in time, for various reasons.

  • From a non-functional standpoint, this platform should be global and it should be accessible to people of all countries.

  • At least, we'll design it in that way.

  • What that means is - you need to have servers in a lot of geographies so as to make sure that people in a certain geography

  • are accessing the servers near to them and thereby reducing the latency.

  • Next obvious thing - it needs to work at a fairly low latency.

  • Though this is not very, very mission critical, but it still needs to be, you know, reasonably fast.

  • Availability should be very high.

  • This system should not go down. It will cause a lot of problems to people who are stuck somewhere, if the system is down.

  • And at the same time, it should have high consistency.

  • High Availability and High Consistency together might seem to violate the CAP theorem, which basically says that -

  • since all the systems in the world are distributed nowadays and network partitions will happen, out of availability and consistency, you can guarantee just one.

  • The idea is - certain components of this need to be very highly available and certain other components need to be very highly consistent,

  • not both at the same time.

  • From a scale standpoint, this system should scale to a very good number.

  • If I look at just some of the statistics of Uber, there are roughly 100 million active users that use Uber on a monthly basis. These are unique users.

  • And Uber, in general, does roughly 14 million rides per day.

  • So with that thought in mind, let's try to look at a system that can scale up to these numbers, with these criteria.
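
To get a feel for that scale, here is a rough back-of-envelope sketch. It uses only the 14 million rides per day figure quoted above and assumes rides are spread evenly over the day, which real traffic will not be, so treat the result as an average and not a peak.

```python
RIDES_PER_DAY = 14_000_000      # figure quoted in the talk
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rides_per_second(rides_per_day: int) -> float:
    """Average booking rate, assuming an even spread over the day."""
    return rides_per_day / SECONDS_PER_DAY


# Roughly 162 ride bookings per second on average.
print(round(average_rides_per_second(RIDES_PER_DAY)))
```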

  • Now, the main problem that companies like Uber are trying to solve is - when you have a customer's location, who wants to book a cab,

  • you try to find out some few drivers who are very near to this location and then using some logic,

  • try to come up with the best driver who is suited to do this trip for this customer.

  • So the problem then becomes, how do you find these 2 - 3 closest drivers to this customer?

  • There are multiple ways to do that. We'll go over one of the ways which uses a concept called segment and mapping segments, basically.

  • Now, this is a term that I just made up. It's not an industry standard term. So, just keep that in mind.

  • Now let's just say, you have a city, something like this.

  • The idea is you basically divide it into rectangular segments.

  • So, you kind of divide this city into multiple pieces.

  • And you say that - this is probably your segment id 1.

  • This is your segment id 2, something of that sort.

  • Now, the idea is - you are dividing a city.

  • It'll normally be divided into a lot more segments than what you can see here.

  • Now, the idea is - given certain coordinates of the segment boundary and given certain coordinates of a cab,

  • you should be able to figure out which segment does a cab belong in.

  • Now, the problem looks trivial, and it is not difficult also.

  • So, think of it like a standard coordinate system.

  • This is point (0,0). This is (0,1). This is (1,0). This is (1,1).

  • Somewhere here.

  • Now, if a point lies somewhere in between here, let's say (0.4, 0.5),

  • you should be able to mathematically say that this point lies within this boundary.

  • A very similar logic we'll try to use when we try to assign a particular segment to a cab.

  • Also, keep in mind that cabs are continuously moving and their locations are continuously changing.

  • So, we'll try to make sure that we get continuous pings from all the cabs and then keep a track of which segment do they belong in.

  • A cab could be here right now. Could be in this particular segment. And the cab driver is going via a road, over here. Now, this changes the segment.

  • And this information would be calculated at runtime, as and when we are getting pings from the cab.
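
The point-in-rectangle check described above can be sketched roughly like this. The `Segment` class, the `find_segment` helper and the linear scan are assumptions made for illustration; a real Maps Service would likely use some spatial index instead of scanning every segment.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical rectangular segment described by its corner coordinates."""
    segment_id: int
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        # Upper edges are excluded so that a point on a shared border
        # belongs to exactly one segment.
        return self.min_x <= x < self.max_x and self.min_y <= y < self.max_y


def find_segment(segments: Iterable[Segment], x: float, y: float) -> Optional[int]:
    """Return the id of the segment containing (x, y), or None if there is none."""
    for segment in segments:
        if segment.contains(x, y):
            return segment.segment_id
    return None


# Two neighbouring unit squares, matching the (0,0)-(1,1) example in the talk.
segments = [Segment(1, 0.0, 0.0, 1.0, 1.0), Segment(2, 1.0, 0.0, 2.0, 1.0)]
print(find_segment(segments, 0.4, 0.5))  # the point (0.4, 0.5) lies in segment 1
```

Each ping from a cab would go through a lookup like this, so the segment of a cab is always recomputed from its latest coordinates.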

  • So, we'll have something called a Maps Service.

  • This Maps Service will do a couple of things.

  • The very first thing is that - it will be responsible for dividing the city into these segments and taking care of the segments.

  • The other thing it will also do is - given a lat long of a cab and given a lat long of a customer,

  • tell which segment do these users belong to at this point in time.

  • This service will also be used to calculate ETA from point A to point B and the route from point A to point B and thus even the distance.

  • But, we'll abstract out. We'll not go into much detail on how that ETA and distance piece is implemented.

  • Think of it for now as if we'll be using a Google Maps Service and we'll go over the details of implementation of that

  • when we do another system design video on implementing Google Maps.

  • With that being said, there is also one more thing that this service does.

  • So let's just say, there is a huge amount of traffic or huge amount of cab drivers in this particular segment. And it is getting unmanageable.

  • So the idea of segment is - it should be a small set of drivers that are in the segment.

  • So this service will take care of dividing this segment into multiple parts. It could do it into 4 parts. It could divide it into 6 parts.

  • That logic resides within Maps Service.

  • Let's just say, there is very little traffic in some other locality.

  • So, it could also decide to merge a couple of segments into one segment and say, this whole thing is now one segment.

  • So all the segment management remains within this service.
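
As a minimal sketch of the splitting idea, assuming a rectangular segment and a simple driver-count threshold: the `Box` type, `max_drivers` and the fixed four-way split are hypothetical choices, since the talk only says the segment could be divided into 4 or 6 parts.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical rectangular segment boundary."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def split_into_four(box: Box) -> List[Box]:
    """Cut a box into four equal quadrants around its midpoint."""
    mid_x = (box.min_x + box.max_x) / 2
    mid_y = (box.min_y + box.max_y) / 2
    return [
        Box(box.min_x, box.min_y, mid_x, mid_y),  # bottom-left
        Box(mid_x, box.min_y, box.max_x, mid_y),  # bottom-right
        Box(box.min_x, mid_y, mid_x, box.max_y),  # top-left
        Box(mid_x, mid_y, box.max_x, box.max_y),  # top-right
    ]


def maybe_split(box: Box, driver_count: int, max_drivers: int) -> List[Box]:
    """Split only when the segment holds more drivers than the threshold."""
    if driver_count > max_drivers:
        return split_into_four(box)
    return [box]
```

Merging quiet segments would be the reverse operation: replace several neighbouring boxes with one box that covers them all.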

  • Now, let's look at the overall architecture and how individual users and drivers get connected to the system.

  • All the users get connected through this User App, which talks to a Load Balancer, which talks to something called as a User Service.

  • This User Service is your repository of all the user information.

  • Plus, it is also a proxy, that will connect to other services to get any information that a user wants.

  • So, for example, if a user wants to see their profile, update their profile, all the APIs to do that are powered by User Service.

  • If somebody wants to fetch user information, any other service for example, then all the APIs for that are powered by this service.

  • Let's say, if a user wants to see their trips, then this User Service will talk to Trip Service, fetch all the trips for that user and send it back to the user.

  • So that's the responsibility of User Service.

  • From a database standpoint, it sits on top of a MySQL cluster, which stores all the user information within that.

  • And it also uses a Redis for caching the same information.

  • So let's say, a GET API to get a user's information is called, it first queries the Redis. If it has information, it returns from there. If it doesn't have

  • information, then it queries a MySQL slave, fetches that information, stores it in Redis and then returns back to whoever was calling.
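
The read path just described is the usual cache-aside pattern. Here is a small sketch of it, with plain dictionaries standing in for Redis and the MySQL read replica; the `UserStore` class and its field names are made up for illustration.

```python
from typing import Dict, Optional


class UserStore:
    """Cache-aside read path with in-memory stand-ins for Redis and MySQL."""

    def __init__(self, db_rows: Dict[str, dict]) -> None:
        self._db = dict(db_rows)           # stand-in for the MySQL read replica
        self._cache: Dict[str, dict] = {}  # stand-in for Redis
        self.db_reads = 0                  # counts how often we fall through to the DB

    def get_user(self, user_id: str) -> Optional[dict]:
        cached = self._cache.get(user_id)
        if cached is not None:
            return cached                  # cache hit: no database query
        self.db_reads += 1
        row = self._db.get(user_id)
        if row is not None:
            self._cache[user_id] = row     # populate the cache for the next caller
        return row


store = UserStore({"u1": {"name": "example user"}})
store.get_user("u1")   # first call reads the database and fills the cache
store.get_user("u1")   # second call is served from the cache
print(store.db_reads)
```

A real setup would also need to expire or invalidate the cached entry when the profile is updated, which this sketch leaves out.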

  • The next flow that the user calls is basically when they try to book a cab.

  • The whole screen that the customer, kind of, goes through when they are trying to book a cab is powered by this Cab Request Service.

  • Essentially what it does is -

  • it basically makes a WebSocket connection with the User's App, which displays them a few cabs onto their UI which are around them.

  • Also, it places a request with something called a Cab Finder. We'll go over this in the next section. Whenever Cab Finder responds back with a cab,

  • this Cab Request Service talks to the User App, basically sends them a response, through this WebSocket connection,

  • saying that the cab is booked and these are the details and whatever is required.

  • This is approximately the major user flows.

  • Now quickly, let's look at the driver flows.

  • A driver basically talks via the Driver App.

  • Again, there is a very similar Driver Service, exactly similar to the User Service.

  • But for drivers, again, all the APIs for getting, updating all the driver information is powered via this.

  • If a driver wants to see their payment information for example, their payment history, this Driver Service will expose an API

  • which the Driver App will call and Driver Service will internally call a Payment Service to get that information and respond back.

  • It could call Trip Service to get the trips of all the drivers. And, all the UI data gets powered by this service.

  • This service sits on top of another MySQL which has all the driver information.

  • And it uses Redis for caching in exactly the same way the User Service does.

  • This Driver App also talks to something called as a Location Service through a series of servers,

  • again maintaining a WebSocket connection with these servers.

  • And, as and when a driver is moving through the city, every 5 seconds or 10 seconds, their location is being sent out to this Location Service,

  • which then queries the Map Service that we talked about earlier, to find out which segment does the driver belong to.

  • And when the customer places this cab request, customer's segment is calculated, driver segments are calculated

  • and they are mixed and matched by the Cab Finder and a couple of other components to give the best suited driver for a particular trip.

  • Now, let's look at how does a customer and driver come together.

  • All the active drivers in the system, who are online right now, ready for trips and all, are mentioned as D1, D2, D3.

  • There'll be a lot more such drivers.

  • All those drivers.. each of them is connected to one of the servers through a WebSocket connection.

  • And those servers are called out as WebSocket Handler 1, WebSocket Handler 2, WebSocket Handler 3.

  • Now, why do we need WebSocket here?

  • So, we always need a connection between the driver and a service.. for a lot of reasons.

  • One of the very first things is that - a driver continuously sends location pings to the backend, telling about the location.

  • Now, if each time they start creating a new connection, that's a kind of a heavy operation. So, we'll have this connection live.

  • Also, at times, the servers might want to talk to driver. So, let's say, if a trip is assigned to a driver, we need to inform the driver.

  • So, we can reuse the same connection to talk to a driver and tell them that this is the trip information that you have to do right now.

  • For all of that, there are these WebSocket Handler servers.

  • In the real world, there'll be hundreds of such servers, who are interfacing with all the drivers, which are throughout the world, geographically split.

  • Now, let's say, somebody in the system identifies that a trip is being given to a driver and to reach out to the driver,

  • they first need to know that which out of these hundreds of WebSocket Handler servers, do I need to talk to.

  • For that, there is something called as a WebSocket Manager.

  • Now, this WebSocket Manager is another distributed service which manages the fact that which server is connected to what all drivers.

  • So, let's say, D3 is a new driver that has come online right now and through the load balancer, it got connected to WebSocket Handler 3.

  • So, this Handler will inform the Manager, saying I have now got connected to D3 also. So, if there's anything for D3, inform me.

  • And this Manager will store this in its database.

  • Now, let's say, this connection got broken and D3 is offline right now. Again this Handler will inform the Manager that D3 is now offline.

  • Do not reach out to me for any communication of D3.

  • This manager sits on top of a Redis cluster.

  • This Redis would not just be storing data in-memory, it will also be storing it in a persistent store on disk.

  • And it will basically store 2 kinds of mapping.

  • One is saying that.. the most frequently used one, is that - which driver is connected to which host..

  • which is saying something like D1 is connected to H1.

  • Similarly, there'll be an entry for each of the drivers in the system, saying which driver_id is connected to what host_id.

  • It'll also have a reverse mapping, saying which host_id is connected to what all driver_id.

  • So, it could have a mapping saying H1 is connected to D1, D2, D3, so on and so forth. Because that mapping might be used for something.
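
The two mappings kept by the WebSocket Manager could look roughly like this. It is an in-memory sketch only: the `ConnectionRegistry` name and its methods are assumptions, and in the design above the data would live in the Redis cluster.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class ConnectionRegistry:
    """Forward (driver -> host) and reverse (host -> drivers) mappings."""

    def __init__(self) -> None:
        self.driver_to_host: Dict[str, str] = {}
        self.host_to_drivers: Dict[str, Set[str]] = defaultdict(set)

    def connect(self, driver_id: str, host_id: str) -> None:
        # Drop any stale entry first, in case the driver reconnected
        # through a different handler.
        self.disconnect(driver_id)
        self.driver_to_host[driver_id] = host_id
        self.host_to_drivers[host_id].add(driver_id)

    def disconnect(self, driver_id: str) -> None:
        host_id = self.driver_to_host.pop(driver_id, None)
        if host_id is not None:
            self.host_to_drivers[host_id].discard(driver_id)

    def host_for(self, driver_id: str) -> Optional[str]:
        """Which handler should be called to reach this driver, if any."""
        return self.driver_to_host.get(driver_id)


registry = ConnectionRegistry()
registry.connect("D3", "H3")     # Handler 3 reports that D3 came online
print(registry.host_for("D3"))
registry.disconnect("D3")        # Handler 3 reports that D3 went offline
print(registry.host_for("D3"))
```

Keeping both directions in sync in one place means a lookup in either direction is a single key read.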

  • Coming to other things that this WebSocket is used for.

  • So these drivers/ devices send location pings to our backend, let's say, every 5 seconds.

  • So, every 5 seconds, we get a hit about the location information.

  • All the location related information is managed by something called Location Service. It does a lot of things.

  • One of the things that happens here is - it stores the information about the driver's location into its Cassandra.

  • Why Cassandra?

  • Because, again, there are like thousands probably or maybe even millions of drivers across the globe

  • who are sending their location updates every 5 seconds.

  • So, there are a lot of updates happening. So, a Cassandra should be able to easily scale up to that number. That's the main reason.

  • There are 2 kinds of information that get stored here.

  • One is - the live location of the driver, which is the last known location.

  • The other thing that is stored is - while a driver is doing a trip with a customer, we need to know exactly what was the route followed,

  • for any auditing purpose or billing purpose.

  • Very common use case is - Once we know the points that the driver followed, we would be able to trace that out and then come up

  • with the real distance that the driver actually travelled and then use that to come up with the pricing.
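
One way to turn the stored pings into a travelled distance is to add up the distance between consecutive points. The sketch below uses the haversine great-circle formula for each hop; that choice is an assumption, since the talk does not say how the distance is computed, and with a ping every few seconds it only approximates the road actually driven.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLong = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLong, b: LatLong) -> float:
    """Great-circle distance between two lat/long points, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def route_distance_km(pings: Sequence[LatLong]) -> float:
    """Sum of the hop distances along the ordered list of pings."""
    return sum(haversine_km(p, q) for p, q in zip(pings, pings[1:]))
```

For example, `route_distance_km([(0.0, 0.0), (1.0, 0.0)])` covers one degree of latitude, which is roughly 111 km.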

  • So all of those things are basically responsibilities of Location Service.

  • Location Service also talks to Map Service.

  • Remember, Map Service from the previous section. Map Service is a service that maintains the segments that we have created

  • throughout the city and throughout the globe. So, Map Service maintains not just the segments, it also gives us the ETA, which will be the

  • time taken from point A to point B and the distance that will be followed and also the route that should be followed from point A to point B.

  • Think of Map Service as an abstraction that we have. We'll not go into the details of implementation of Map Service right now.

  • I have made another video which is on the design of Google Maps, that goes into details of how Map Service is implemented.

  • But, that being said, Location Service, as soon as it gets a ping from a driver,

  • it basically queries Map Service and tries to figure out that this lat long belongs to which segment.

  • It then stores it into its Redis, saying, this segment has these drivers.

  • This update happens only when a driver's segment changes. If he's in the same segment, then no change happens.

  • This basically is used for a lot of purposes. So, let's say, we want to find out drivers in a vicinity,

  • we'll query this service saying who all are the drivers that are basically in S1.
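
A rough sketch of that segment-to-drivers index, updated only when a driver actually crosses into a new segment. The `SegmentIndex` class is hypothetical, and the extra driver-to-segment map is an assumption added here so the previous segment can be looked up; dictionaries stand in for Redis.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class SegmentIndex:
    """Tracks which drivers are currently in which segment."""

    def __init__(self) -> None:
        self.drivers_by_segment: Dict[str, Set[str]] = defaultdict(set)
        self.segment_by_driver: Dict[str, str] = {}  # assumed helper mapping

    def on_ping(self, driver_id: str, segment_id: str) -> bool:
        """Record a ping; return True only if the index had to change."""
        previous: Optional[str] = self.segment_by_driver.get(driver_id)
        if previous == segment_id:
            return False  # same segment as before: nothing to write
        if previous is not None:
            self.drivers_by_segment[previous].discard(driver_id)
        self.drivers_by_segment[segment_id].add(driver_id)
        self.segment_by_driver[driver_id] = segment_id
        return True

    def drivers_in(self, segment_id: str) -> Set[str]:
        """Answer 'who all are the drivers in this segment'."""
        return set(self.drivers_by_segment.get(segment_id, ()))
```

Since most pings keep the driver inside the same segment, most calls to `on_ping` return early without touching the index.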

  • There is one more thing that Map Service does. It basically keeps a mapping of which all are the segments surrounding a particular segment.

  • Which, we'll come to in a while on how it is being used.

  • There's something called as Trip Service. Trip Service is basically the source of truth for all the trip information.

  • It sits on top of a MySQL database and a Cassandra database.

  • It uses MySQL for all the live information, basically information of all the trips that are either about to happen in some near future or are in progress.

  • Once the trip is completed, then it basically can move to Cassandra.

  • Now, why don't we store all the information in MySQL? Because over time, this will become a very massive volume of data.

  • Plus, if it is just for read queries, Cassandra is also good enough. So, we don't really need to store it in MySQL.

  • The main reason for storing it in MySQL is that a trip would have a lot of information.

  • It would have information about the customers, about the drivers, about potential start times and end times, about the potential distances,

  • about the real values and maybe some events information that have come in between, maybe some payment information and a lot of other things.

  • Now, if you look at it in tables terms, these will be a lot of tables.

  • And, for each event that comes in against a trip, we might need to update a lot of such tables.

  • And there, it is very good to have transactional properties.

  • So, that's the reason we'll be using a MySQL for all the trips that will be updated.

  • And once the trip is completed, then we can move it to Cassandra.

  • Now, this movement from MySQL to Cassandra is taken up by this Trip Archiver Service, which basically is a cron,

  • which spawns once in every probably 12 hours and pulls the data from MySQL and puts it into Cassandra.

  • Coming back to Trip Service.. Trip Service will expose all the APIs around trips. So, if you want to get a trip by id or if you want to get all the trips

  • of a particular driver or all the trips of a user, all of those APIs would be powered by Trip Service.

  • And let's say, if it's a search by a driver_id, it will query MySQL, it will query Cassandra, it will get the results from both of them, merge them

  • and then return it back to whoever was calling. So, that's how its flow would be.
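
The merge step for a search by driver_id might look something like this. Plain lists of dictionaries stand in for the MySQL and Cassandra query results, and the rule that the live copy wins when a trip shows up in both stores is an assumption (for example, while the archiver is halfway through moving it).

```python
from typing import Dict, Iterable, List


def trips_for_driver(
    driver_id: str,
    live_trips: Iterable[dict],      # stand-in for rows returned by MySQL
    archived_trips: Iterable[dict],  # stand-in for rows returned by Cassandra
) -> List[dict]:
    """Merge both result sets into one list, without duplicate trip ids."""
    merged: Dict[int, dict] = {}
    for trip in archived_trips:
        if trip["driver_id"] == driver_id:
            merged[trip["trip_id"]] = trip
    for trip in live_trips:
        if trip["driver_id"] == driver_id:
            merged[trip["trip_id"]] = trip  # live copy overrides the archived one
    return sorted(merged.values(), key=lambda t: t["trip_id"])
```

The caller gets a single ordered list and does not need to know which store each trip came from.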

  • Now, let's get to the main flow on what happens when a customer actually requests a cab.

  • The customer flow begins at this point where they basically make a request to Cab Request Service through again

  • an open connection between both the parties, the Customer and this service.

  • And basically, what the customer says is - this is my source lat long. I need to go to a destination which is identified by certain lat long. Get me a cab.

  • I'm assuming there is just one type of cab and no varieties and types of cab. If you want to implement, that is a straightforward thing to implement.

  • But given this request, Cab Request Service then queries something called as Cab Finder,

  • which is responsible to come up with that one driver who will do the trip.

  • At the end of all of it, Cab Finder will respond back to Cab Request Service saying, this is your trip_id, this is your driver_id,

  • go send it back to the customer.. in a nice form, with all the details about the driver and all of that. And Cab Request Service would send that.

  • Cab Finder will also put a notification into a Kafka, whether or not it was able to find a driver.

  • Let's say, if it's not able to find a driver, may be because there is a scarcity of drivers, all of that would go into Kafka,

  • which can be used for further Analytics like for example, telling drivers that this is a location where there are more customers and less drivers

  • so why don't you go into that location to get more trips or something of that sort.

  • Coming to what Cab Finder does.

  • The very first thing Cab Finder would do is basically - it has a source lat long, which is basically identifier of a particular location of a customer.

  • It will first of all query Location Service saying, get me the segment in which this customer currently is.

  • Along with that, also give me a list of drivers that are near this customer.

  • What it does is - it first of all queries Map Service with the lat long of the customer, to get the segment which the customer is in.

  • It then queries surrounding segments. So, I'll try to explain why do we need surrounding segments.

  • So, let's say, there is a rectangle like this and customer was sitting here.

  • There were a couple of drivers in various locations in this segment. There could possibly be a driver just here.

  • But, if we query all the drivers within this segment, then we'll get a driver who is far away.

  • But, a driver is just next to customer, maybe in another segment.

  • So, for doing that, we basically.. let's say if it is segment S1, we need to find all the segments that are

  • surrounding S1 and get the closest 10 drivers, let's say, in all of those segments.

  • So, let's say, there could be a couple of segments over here and maybe you can have something like this. This is S2, S3, S4, something of that sort.

  • So, we'll query all of these segments. Basically, we'll not query, we'll just off load that job to Map Service saying, get me all the segments

  • that are surrounding this segment S1, that Location Service gets. Then, Location Service basically figures out all the drivers

  • and then tells Map Service to get the closest 10 drivers to this customer, out of a list of those drivers.

  • Distance between the customer and the driver is also something Map Service is good at.

  • Remember, it is able to find and identify the distance between two points.

  • So, it can identify the distance between customers and drivers, taking into account the road distance, not the aerial distance.

  • Let's say, it got some 5 - 10 drivers which are close by to the customer. Location Service will return it to Cab Finder.
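
Here is a small sketch of the surrounding-segments lookup. It assumes segments are cells of a regular grid addressed as (row, col), and it ranks drivers by straight-line distance; both are simplifications, because in the design above Map Service keeps the neighbour mapping and ranks by road distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]       # hypothetical (row, col) address of a segment
Point = Tuple[float, float]  # (x, y) position


def surrounding_cells(cell: Cell) -> List[Cell]:
    """The cell itself plus its eight neighbours."""
    row, col = cell
    return [(row + dr, col + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def closest_drivers(
    customer_pos: Point,
    customer_cell: Cell,
    drivers_by_cell: Dict[Cell, Sequence[str]],
    driver_pos: Dict[str, Point],
    k: int = 10,
) -> List[str]:
    """Collect drivers from the surrounding cells and keep the k nearest."""
    candidates: List[str] = []
    for cell in surrounding_cells(customer_cell):
        candidates.extend(drivers_by_cell.get(cell, ()))
    # Straight-line distance stands in for the road distance from Map Service.
    candidates.sort(key=lambda d: math.dist(customer_pos, driver_pos[d]))
    return candidates[:k]
```

With a customer sitting near the right edge of their cell, a driver just across the border in the next cell will rank ahead of a driver at the far side of the customer's own cell, which is exactly the case the surrounding segments are meant to cover.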

  • Now basically, we need to identify 1 driver out of these 10 who will do the trip.

  • Now, there comes something called modes. There could be multiple modes in which this cab request could be served.

  • We could say that for certain kind of customers, just pick the best driver.

  • Let's say, if it's a premium customer, then we just pick the best driver out of the lot and assign that.

  • Or, if it's an average customer, we might want to do some different thing.

  • We might want to broadcast to all the drivers and whoever accepts first, we can assign that driver to the trip.

  • So, all of these modes are basically something that Cab Finder decides that which mode I want to run it in.

  • Given whatever mode it is, it might need some additional inputs.

  • So, if it's the best driver mode, then it might need to stack rank all the drivers.

  • So, all of these are basically something that is handled by Driver Priority Engine. We'll get to the logic, what it follows, later on.

  • Cab Finder then queries Driver Priority Engine saying, I have these drivers for this kind of a customer, you just arrange them and give me back.

  • Then it gets a list of those drivers. And given the mode and given the list, it then tries to identify one of the drivers who will actually do that trip.
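
The two modes mentioned above could be sketched as follows. The mode names and the `accept_delay_s` input (seconds until each driver accepted, or None if they did not) are hypothetical; in a real system the broadcast would go out over the WebSocket connections and the first acceptance to arrive would win.

```python
from typing import Dict, List, Optional


def assign_driver(
    ranked_drivers: List[str],
    mode: str,
    accept_delay_s: Optional[Dict[str, Optional[float]]] = None,
) -> Optional[str]:
    """Pick one driver from a list already ranked best-first."""
    if not ranked_drivers:
        return None
    if mode == "best_driver":
        # Premium-style flow: take the top-ranked driver directly.
        return ranked_drivers[0]
    if mode == "broadcast":
        # Everyone gets the offer; whoever accepts first gets the trip.
        delays = accept_delay_s or {}
        accepted = [d for d in ranked_drivers if delays.get(d) is not None]
        if not accepted:
            return None
        return min(accepted, key=lambda d: delays[d])
    raise ValueError(f"unknown mode: {mode}")
```

For instance, `assign_driver(["D2", "D1", "D3"], "broadcast", {"D1": 4.0, "D2": None, "D3": 2.5})` picks D3, because D3 accepted soonest and D2 never accepted.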

  • It then queries WebSocket Manager and asks WebSocket Manager saying,

  • which was the host that was actually, you know, integrating with this particular driver. It will then call WebSocket Handler.

  • Let's say it chose D1, it will then call this Handler 1 saying, D1 you have a trip, go do it.

  • The same notification would be sent via Cab Request Service to Customer saying, Customer, you have got a new driver, which is driver D1.

  • And then the regular flow follows wherein the driver starts moving to the customer's location.

  • So, this is how we'll figure out, how to assign a driver to a customer.

  • Once a driver is assigned, basically, what it means is, a trip has to be created and updated.

  • So, this will then basically update the Trip Service saying, I've created a trip with this customer, with this particular start point, this end point,

  • this driver and whatever information it needs to. And, that gets persisted into this MySQL via this Trip Service.

  • So, this is basically the booking flow.

  • As part of the booking flow, we had inserted a lot of events.

  • Whether Cab Finder was able to find a ride for the customer or not. And even the Location Service was putting in a lot of events into Kafka.

  • Now, let's try to look at how do we use those events.

  • This Kafka is getting a lot of events like location update events, trip update events, no driver found events, a lot of things.

  • Let's look at some of the use cases where we can utilize those events to our benefit.

  • One of the very common things is - whenever a trip is completed, we need to initiate a payment to the driver.

  • That could be aggregated over a few hours or a few days or something of that sort. But, we still need to store information about a potential payment.

  • There would be a Payment Service, which sits on top of this Kafka, which would have a Kafka consumer, which listens to all the

  • trip completion events. And as soon as a trip is completed, it would insert a record in its Payment MySQL database, which says that -

  • this particular driver, did this particular trip_id, for a user with user_id, and with lots of attributes like distance travelled, time taken and all of that.

  • And finally an amount of money that needs to be paid to this driver.

  • If that needs to be an instantaneous payment, this Payment Service could talk to a Payment Gateway to deliver a payment.. like transact the money.

  • Or, if it needs to be aggregated, then it can basically run a cron which does the payment every once in a few days or something of that sort.
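
A minimal sketch of what the consumer side of Payment Service might do per event. A plain list stands in for the Payment MySQL table, the `pay_now` callable stands in for the Payment Gateway, and the event field names are assumptions; the Kafka consumer loop itself is left out.

```python
from typing import Callable, List, Optional


def handle_event(
    event: dict,
    payments: List[dict],                             # stand-in for the payments table
    pay_now: Optional[Callable[[dict], None]] = None, # stand-in for the gateway call
) -> Optional[dict]:
    """Store a payment record for a completed trip; ignore every other event."""
    if event.get("type") != "trip_completed":
        return None
    record = {
        "trip_id": event["trip_id"],
        "driver_id": event["driver_id"],
        "user_id": event["user_id"],
        "amount": event["amount"],
        "status": "PENDING",  # left pending for a later batch payout
    }
    payments.append(record)
    if pay_now is not None:
        pay_now(record)       # instantaneous payment path
        record["status"] = "PAID"
    return record
```

When `pay_now` is omitted, the records stay PENDING and a periodic job can pick them up, which matches the aggregated payout option.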

  • If, let's say, a driver wants to see their payment history, there would be APIs that would be running out of this service,

  • which will give all the payment transaction information against a particular driver,

  • which we talked about in the very first section that, could be powered through Driver service, talking to Payment Service.

  • Now, let's look at some other use cases.

  • On top of this Kafka cluster, there would be a Spark Streaming Cluster in which some Spark streaming jobs would be running.

  • One of the very common things is to basically create a heat map.

  • If, let's say, from a particular geography, we are getting a lot of events saying, there are no drivers found,

  • that means that, there is a surge of customers in that area and there are very few drivers.

  • So, we can create a heat map within Driver App, powered by this Streaming, which kind of shows a particular segment or a few areas,

  • which are having this kind of a scarcity so drivers can move to those locations to get more trips.
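
The core of such a heat map is just a count of "no driver found" events per segment. The sketch below does that count over a plain list of events; the event shape is an assumption, and a real streaming job would keep the same counts over a moving time window.

```python
from collections import Counter
from typing import Iterable, List


def scarcity_by_segment(events: Iterable[dict]) -> Counter:
    """Count 'no driver found' events per segment."""
    return Counter(
        e["segment_id"] for e in events if e["type"] == "no_driver_found"
    )


def hot_segments(events: Iterable[dict], top_n: int = 3) -> List[str]:
    """Segments with the most unmet requests, busiest first."""
    return [seg for seg, _ in scarcity_by_segment(events).most_common(top_n)]
```

The Driver App could then highlight whatever `hot_segments` returns so that drivers know where demand is going unserved.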

  • This is a classic example of a Streaming kind of an application.

  • What it will also do is - it will basically put all the events into a Hadoop cluster, which can be used for further Analytics.

  • On top of this Hadoop cluster, we could run a lot of ML jobs or regular Spark jobs, which will do a lot of things.

  • So, the very first thing is basically to do a customer classification into various categories.

  • So, if a customer takes a trip with us every day, we'll classify them as a premium customer.

  • If it's a once in a while coming customer, then it would be just another customer for us.

  • Same classification could be done for drivers.

  • So, there'll be these User Profiling and Driver Profiling jobs, which would ideally be an ML classification model running,

  • which will classify those users as premium or regular, or drivers as premium drivers or regular drivers.

  • The same information would also be used to create an ML model, which can do the Driver Priority that we talked about in the previous section.

  • So, based on certain attributes of drivers, for example, their ratings or the customer feedback or their ETA..

  • basically the accuracy of how much ETA was supposed to happen and how much ETA did actually the driver take.

  • There could be a lot of attributes on which we stack rank the drivers.

  • All of those could be aggregated in these jobs and drivers could be given a score.

  • And using those scores, drivers could be ranked by Driver Priority Engine. So, something of that sort could also be built.
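
To make the scoring idea concrete, here is a deliberately simple stand-in: a hand-weighted score instead of the ML model the talk has in mind. The weights, the 0 to 5 rating scale and the `eta_accuracy` value between 0 and 1 are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical weights; a trained model would learn these instead.
RATING_WEIGHT = 0.6
ETA_WEIGHT = 0.4


def driver_score(rating: float, eta_accuracy: float) -> float:
    """Combine a 0-5 rating and a 0-1 ETA accuracy into a single score."""
    return RATING_WEIGHT * (rating / 5.0) + ETA_WEIGHT * eta_accuracy


def rank_drivers(attributes: Dict[str, dict]) -> List[str]:
    """Stack rank driver ids, highest score first."""
    return sorted(
        attributes,
        key=lambda d: driver_score(attributes[d]["rating"], attributes[d]["eta_accuracy"]),
        reverse=True,
    )
```

Driver Priority Engine would then only need the precomputed scores at request time, which keeps the ranking step cheap.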

  • Now, we could also power the same... I forgot to make this link.

  • This same information could be used to generate a fraud score.

  • Let's say, if there is a very high correlation between a driver and a customer. Let's say, if all the location pings say that

  • a customer and driver move together or if all the trips done by a driver are against a particular customer.

  • For most of them, we can safely say that this customer and driver are either friends or that's the same person

  • using two mobile phones to book a trip by this customer and do a trip and thereby, you know, using the..

  • there's something called as incentive programme run by these ubers and all the other companies.

  • So, it's basically a way to exploit that kind of a programme. So, all those kind of frauds could be captured by these kind of models again.
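
One of the signals mentioned above, "all the trips done by a driver are against a particular customer", can be measured as the share of a driver's trips taken by their most frequent customer. This is only one input to a fraud score and not a full model; the function name and record fields are assumptions.

```python
from collections import Counter
from typing import Iterable


def top_customer_share(trips: Iterable[dict], driver_id: str) -> float:
    """Fraction of a driver's trips that were with their most frequent customer."""
    customers = Counter(t["user_id"] for t in trips if t["driver_id"] == driver_id)
    total = sum(customers.values())
    if total == 0:
        return 0.0  # no trips for this driver, so no signal
    return customers.most_common(1)[0][1] / total
```

A share close to 1.0 over many trips is the kind of pattern that would be worth flagging for a closer look.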

  • The same data could be used as an input into Map Service.

  • Let's say, if we don't have traffic information into maps. All these lat longs could basically power the traffic information of Map Service.

  • We can safely assume that - we'll not know the exact number of people traveling on the road,

  • but, if some road has more of our cars, then we can safely assume that probably there is more traffic also there.

  • If we are at Uber scale, we can safely assume that.

  • So, that could be an input into Map Service or at least the traffic data and some of the road condition data also.

  • So, if the average speed of a driver on certain roads is very high, we can safely assume that that's a freeway or a highway or a good road.

  • If the average speed is very low, we can assume that there is either high traffic or the road is not in good condition.

  • Or there is something wrong with the road.

  • And also the same information could then be used to come up with a better, enhanced ETA Calculation Engine.

  • Exact details of it, we'll go over in the Google Maps video. So, I would recommend that you look at that.

  • But this could be one of the use cases where we use all of this information.

  • So, yeah, I think this is mainly about an Uber kind of an application.

Uber System Design | Ola System Design | System Design Interview Question - Grab, Lyft

Published by meowu on June 14, 2023