We go in-depth into how we use the world’s most popular database to make our app speedier.
tl;dr: we use SQLite to quickly and easily distribute massive databases to dozens of servers.
Whiz operates worldwide, and we want to make sure that the app responds quickly wherever you are. That can be a bit tricky — if our servers are based in New York, and you’re in Sydney, it can take up to a second for a message to travel halfway across the world.
Why? Getting information to and from a smartphone means routing it across the Internet. A message is relayed hundreds of times through various networks before it reaches our server, and the same happens on the way back.
Plus there’s the speed of light. It’s fast, but sending a message through cellular connections, Wi-Fi, underwater fibre-optic cable and much more takes its toll, and even if by magic the message could be sent directly from Sydney to New York, it would still take around 200 milliseconds.
That’s why we use servers located around the world, including Sydney, San Francisco, Toronto, New York, London and Singapore. Each location has more than one server as well, to take on extra load and for redundancy. To make sure you get the fastest connection, we use Cloudflare’s geographical load balancer — a great tool that chooses the server closest to you automatically, wherever you are.
And that’s where the story begins. In the beginning, our command server in London would spend its days downloading schedule feeds from hundreds of transit agencies and packaging them up into searchable databases (one for each continent). We used MySQL to store the databases, and while those were small, it was a simple matter of using mysqlbackup to back up, compress and store them on S3.
This would happen every six hours, and each server, distributed around the world, would download those SQL backups and restore the database, ready to be used.
Here’s the problem: restoring a MySQL database is slow. As the years went by and more and more cities were added, our schedule databases grew to over 200 GB. That’s a lot of data to decompress and restore, and each server was spending up to five hours a day just keeping up. While that’s happening, the server takes itself out of public rotation, reducing the resources we could provide to customers.
We experimented with different ways to speed this up: we tried hot restores using Percona XtraBackup, but that came with its own set of problems (it’s a great tool BTW, just not suitable for our use case). Then we hit upon an idea: why restore the database at all? Why not just compress and store it on S3 as usual, then have each server download that database and overwrite the existing one?
Unfortunately, doing that with MySQL or Postgres could result in nasty consequences, including database corruption. However, everyone’s favourite embedded database, SQLite, uses single-file databases by default. This could be the answer.
Now, there’s a common misconception that SQLite isn’t suitable for production web servers, and that can be true when you are expecting a lot of concurrent writes, but these schedule databases are effectively read-only: they are built with everything the app needs, and the API doesn’t need to write anything to the database.
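To make the read-only idea concrete, here’s a minimal sketch in Python. The schema, file names and values are illustrative assumptions, not our real layout; the point is that SQLite can open a database file in read-only mode via a URI, so the API physically cannot write to it.

```python
import os
import sqlite3
import tempfile

# Illustrative stand-in for one of the schedule databases (assumed
# schema and file name, not the real one).
db_path = os.path.join(tempfile.mkdtemp(), "schedules.sqlite")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE departures (station TEXT, departs_at TEXT)")
conn.execute("INSERT INTO departures VALUES ('Times Sq-42 St', '08:15')")
conn.commit()
conn.close()

# The API only ever reads, so open the file read-only via a URI.
ro = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
rows = ro.execute(
    "SELECT departs_at FROM departures WHERE station = ?",
    ("Times Sq-42 St",),
).fetchall()
print(rows)  # → [('08:15',)]

# Any accidental write is rejected by SQLite itself.
try:
    ro.execute("INSERT INTO departures VALUES ('x', 'y')")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
ro.close()
```

Opening read-only also means there’s no write lock contention at all, so any number of API processes can query the same file concurrently.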
So we tried it out: the command server writes everything to MySQL as usual, then a custom Bash script takes each database, passes it through mysqlbackup and converts that backup into a single SQLite database file. We store that on S3; the web servers download and decompress it, then overwrite the existing SQLite file. Done!
Total restore time: 0.03 seconds. Essentially, each server refreshes 200 GB of schedule data every six hours near-instantly, with no interruption to service.
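The reason the swap is safe is worth spelling out: if you write the new snapshot to a temporary file on the same filesystem and then rename it over the live file, the rename is atomic on POSIX systems, so a reader never sees a half-written database. Here’s a small Python sketch of that pattern (the `build_snapshot` helper is a hypothetical stand-in for “download the latest file from S3”):

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "schedules.sqlite")

def build_snapshot(path, version):
    """Hypothetical stand-in for downloading a snapshot from S3."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE meta (version INTEGER)")
    conn.execute("INSERT INTO meta VALUES (?)", (version,))
    conn.commit()
    conn.close()

# The database currently in service.
build_snapshot(live, 1)

# Six hours later: write the new snapshot NEXT TO the live file (same
# filesystem), then swap it in. os.replace is an atomic rename on POSIX,
# so readers see either the old file or the new one, never a mix.
incoming = live + ".incoming"
build_snapshot(incoming, 2)
os.replace(incoming, live)

conn = sqlite3.connect(f"file:{live}?mode=ro", uri=True)
version = conn.execute("SELECT version FROM meta").fetchone()[0]
conn.close()
print(version)  # → 2
```

One caveat: a process that already has the old file open keeps reading the old data until it reopens the database, so the API needs to reconnect (or restart its connection pool) after a swap.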
The best part: SQLite is fast. Insanely fast at read queries. Properly indexed, the database queries from our API come back so much faster than the same queries performed in MySQL. Here’s a quick comparison:
Get all departures today from Times Sq-42 St station in New York:
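A query along those lines is easy to sketch. The schema and data below are made up for illustration, but they show the shape of the lookup and why a composite index on `(station, departs_at)` matters: SQLite can answer the station-plus-date-range query with a single index scan instead of reading the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE departures (
        station    TEXT,
        route      TEXT,
        departs_at TEXT   -- ISO date-time, e.g. '2024-05-01T08:15'
    )
""")
conn.executemany(
    "INSERT INTO departures VALUES (?, ?, ?)",
    [
        ("Times Sq-42 St", "N", "2024-05-01T08:15"),
        ("Times Sq-42 St", "Q", "2024-05-01T08:20"),
        ("Grand Central",  "4", "2024-05-01T08:17"),
    ],
)

# Composite index: equality on station, range scan on departs_at.
conn.execute("CREATE INDEX idx_dep ON departures (station, departs_at)")

rows = conn.execute(
    """SELECT route, departs_at
       FROM departures
       WHERE station = ? AND departs_at BETWEEN ? AND ?
       ORDER BY departs_at""",
    ("Times Sq-42 St", "2024-05-01T00:00", "2024-05-01T23:59"),
).fetchall()
print(rows)  # → [('N', '2024-05-01T08:15'), ('Q', '2024-05-01T08:20')]
```

Because the index is ordered by `departs_at` within each station, the `ORDER BY` also comes for free from the index scan.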
Now, we send back over 95% of API queries within 40 milliseconds of receiving them, making the app feel snappier and more responsive.
If you’ve got this far, and you’ve always ignored SQLite’s potential on a production server, we suggest you give it a try. Of course, our situation is a little unusual: dozens of servers that need exactly the same data at the same time. But after six months in production, we haven’t had a single blip using SQLite, and we couldn’t be happier.