#nerdweekend: visualizing transport data from Grab

Manila, 4 January: Hello 2019! How’s the first weekend of 2019 shaping up so far? More importantly, how did the first work days of 2019 go for you? Admittedly, I’m bad at going on vacations, because I get anxious about going back to work. Working in media for the bulk of my life meant always being on call in some form, and never really tuning out. I’m still trying to figure out how to get better at vacationing.

Anyway, speaking of the holidays, here’s something I thought about doing while in the middle of festivities: A small dataviz project for the dataset that Grab has on its passengers, which recently made the rounds on social media.

The immediate takeaway from this dataset, of course, is how much any one person spent over a specific period on Grab rides alone. For instance, some spent as much as P100,000 in 12 months, and some commented that this was already equivalent to a down payment for a car. Which led me to think about researching payment terms for cars, average cost of maintenance, historic prices of fuel, parking rates around Makati, etc. It’s a vortex of its own, I assure you, but not really one that was relevant for me, seeing that I don’t have a car to begin with.

Anyway, I also looked into my own data and decided it was par for the course, considering my life movements over the past year: I moved closer to work, so transportation wasn’t a huge chunk of my expenses. Still, just looking at the mix of outrage, disbelief and resignation among my friends told me a thing or two about how crucial Grab has been as a service amid the terrible mass transport situation in the metro.

Expensive or cheap?

Crucial, yes, but at what cost? If a person spends P100,000 on 300 Grab rides, that comes up to P333 per ride. Is this figure cheap or expensive? This can be answered in a variety of ways. One way would be to quantify it as a percentage of wages earned; perhaps a P300 Grab ride is unwise for anyone earning less than that in a day.

However, if I were to remain within the confines of the Grab dataset, what other information did I have that can help further contextualize these figures?

Here are the available data:

  • Date and time of ride

  • Pick-up and drop-off points

  • Ride distance in kilometers

  • Total fare paid

I wish the dataset also included time spent per ride, because it’s the only data point that can actually quantify how terrible traffic has been in the metro. Grab did, however, aggregate that information: In its personalized Your #2018withGrab infographic, it informed me that I spent more than 6,400 minutes in more than 100 rides, which meant I spent close to an hour in every Grab ride I took.

Anyway, I thought the Total Fare Paid figure was incomplete on its own. Some rides are expensive due to distance, while some are expensive due to surge and demand. Certainly, paying a certain amount for a within-Makati ride and paying the same amount for a Makati-BGC ride are two different stories, and I wanted to capture both stories accurately.

That’s how I ended up with the Cost per Kilometer metric (Fare over Ride distance), which is the anchor data point for the entire dataset. With this info now computed, the full range of what I paid for this service came into clearer view: I spent between P13 and P90 per kilometer on Grab rides from April to December 2018. I chose April as the starting point because that was when Uber ceased operations and Grab became the sole provider of this type of service.
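For anyone who wants to compute the same metric from their own ride history, here is a minimal Python sketch. The `Ride` class, its field names, and the two sample rides are made up for illustration; they are not the actual layout of the Grab data or figures from my own rides.

```python
from dataclasses import dataclass


@dataclass
class Ride:
    # Hypothetical field names; the real export may label these differently.
    fare_php: float     # total fare paid, in pesos
    distance_km: float  # ride distance, in kilometers


def cost_per_km(ride: Ride) -> float:
    """Cost per Kilometer: total fare divided by ride distance."""
    if ride.distance_km <= 0:
        raise ValueError("ride distance must be positive")
    return ride.fare_php / ride.distance_km


# Made-up rides: a short pricey hop and a long cheap haul.
sample_rides = [Ride(fare_php=120.0, distance_km=1.5),
                Ride(fare_php=260.0, distance_km=13.0)]
rates = [cost_per_km(r) for r in sample_rides]
print(rates)
```

The two sample rides show why the metric is useful: the long ride has the bigger fare, yet it is the far cheaper one per kilometer.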

Trial and error

My initial instinct was to make a two-axis graph and plot the cost per km on the Y-Axis and the distance in km on the X-Axis. I realized that with 133 data points, this was going to be unwieldy. I tried plotting the first 5 rides before I gave up.

I regrouped and conceded that I needed to work with ranges. Studying the data further, I realized that they naturally occurred in clusters. I ended up grouping together rides that cost under P20/km, between P20-P39, between P40-P59, and P60 and above. Distance-wise, the rides were under 2 km, between 2-5 km, between 5-10 km, or over 10 km.
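The grouping can be sketched in a few lines of Python. The ranges are the ones listed above; how to treat values that land exactly on a boundary (say, a ride of exactly 5 km) is my assumption here, with each boundary value going to the higher group.

```python
from bisect import bisect_right

# Boundaries between groups, taken from the ranges described above.
COST_EDGES = [20, 40, 60]   # pesos per km
COST_LABELS = ["under P20", "P20-P39", "P40-P59", "P60 and above"]
DIST_EDGES = [2, 5, 10]     # km
DIST_LABELS = ["under 2 km", "2-5 km", "5-10 km", "over 10 km"]


def bucket(value, edges, labels):
    """Return the label of the range that contains value.

    Assumption: a value exactly on an edge falls into the higher range,
    so P20.00/km counts as "P20-P39" and exactly 10 km as "over 10 km".
    """
    return labels[bisect_right(edges, value)]


print(bucket(80.0, COST_EDGES, COST_LABELS))
print(bucket(1.5, DIST_EDGES, DIST_LABELS))
```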

To get this on paper, I used some shapes and colors: Green circles for the under P20, Blue triangles for the P20-39 segment, Orange squares for the P40-59 segment, and Red pentagon houses for the P60 and above segment.

Distance traveled was represented by small arcs beside the shapes: one for the under 2 km segment, two for the between 2-5 km segment, three for the between 5-10 km segment, and four for the over 10 km segment.

In terms of location traveled, since the bulk of my trips either started or ended in Makati, I just took note of that by shading the appropriate portion of the shape: Shaded left halves for trips that originated in Makati, and shaded right halves for those that terminated in Makati. This meant that fully shaded shapes were just trips around Makati, and empty shapes were trips around other cities.
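I drew all of this by hand, but the encoding is simple enough to write down as a lookup. The Python sketch below is only an illustration of the rules above: the `glyph` function, its parameter names, and the boundary handling are assumptions, not part of the Grab data.

```python
from bisect import bisect_right

COST_EDGES = [20, 40, 60]  # pesos per km
SHAPES = [("circle", "green"), ("triangle", "blue"),
          ("square", "orange"), ("pentagon house", "red")]
DIST_EDGES = [2, 5, 10]    # km


def glyph(cost_per_km, distance_km, starts_in_makati, ends_in_makati):
    """Describe the mark for one ride: shape, color, arcs, and shading."""
    # Assumption: a value exactly on a boundary goes to the higher group.
    shape, color = SHAPES[bisect_right(COST_EDGES, cost_per_km)]
    arcs = bisect_right(DIST_EDGES, distance_km) + 1  # one to four arcs
    if starts_in_makati and ends_in_makati:
        shading = "full"        # trip entirely around Makati
    elif starts_in_makati:
        shading = "left half"   # originated in Makati
    elif ends_in_makati:
        shading = "right half"  # terminated in Makati
    else:
        shading = "empty"       # trip around other cities
    return {"shape": shape, "color": color, "arcs": arcs, "shading": shading}


# A made-up ride: P80/km, 1.5 km, entirely within Makati.
print(glyph(80.0, 1.5, True, True))
```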

And since I had run out of elegant ideas, I decided to lump them together by month.

Here’s what it all looked like put together:

Anyway, some takeaways from going through my data:

  • The cheapest rides (green circles) are the longest ones. Check out the arcs beside those.

  • Conversely, the most expensive rides (red houses) are all under 2 km, and are mostly within-Makati rides.

  • Most of my trips either originated from or ended in Makati, in one way or another. I very rarely took Grab rides around other areas in the Metro.

  • The bulk of my rides cost between P20-P39 per km, which is definitely cheaper than the roughly half of the other rides that cost at least P40 per km.

Using these stats, my personal transport targets for this year are now clearer:

  • Even as I already do a lot of walking around Makati, there is still a need to intensify walking efforts and cut the number of within-Makati, under-2 km trips by half.

  • As for total Grab expenses, the target is to reduce them by at least 5% by the end of this year.

These may not be implementable for all types of Grab users, but I encourage everyone to go through their data and look for insights. They may surprise you.