I have been suffering from back pain for some time now and, after several visits to the doctor, one x-ray and a scary MRI session, his diagnosis was clear: I must work out to strengthen my back muscles and avoid the pain. Indeed, several months without regular exercise along with a routine of being seated most of the day are not good. Although I walk regularly in the park nearby, that’s not enough to prevent this problem with my back. Thanks, COVID-19 global pandemic, I guess.
Some of the gyms and fitness centers here in Brno, Czech Republic, publish the utilization of their buildings, showing how many people are present and what the capacity is.
This information is very useful because you can check how full the gym is and avoid the trouble of going there if there’s no capacity for you.
Unfortunately, this only shows the capacity at that very moment and, for a gym as small as the one I go to, it can be very volatile. The free capacity can change quickly in the 20 minutes it takes to walk from home to the gym.
Therefore, I wanted to maximize the chances of meeting as few people as possible at the gym by looking at its utilization trends over time. I know there are times of the day when fewer people go there.
Trying to solve this problem was a good opportunity to write some code in Go, a language I would like to know more about, and to experiment with some technologies that have been on my radar. My idea was to save that information into a time series database to make it easy to play with the data.
I created an application to collect the information from the fitness centers I was interested in and save it in a time series database. There are a few database options suitable for this purpose, some of which I have worked with in the past, but I wanted to try something new. I was specifically looking for two things in this tool. First, I wanted what I call “elegant simplicity”: the ability to get the basic things done simply and quickly, while also being suitable for larger projects if I ever need that. Second, it had to be open source.
After some research, I ended up with InfluxDB. If you haven’t heard about it, it is an open source time series platform. There are a few components in this platform; in my case, I was particularly interested in the ability to store time-based data, and that is what I’ll be covering here.
Unfortunately, none of the webpages have APIs, so I had to write a small web crawler to parse the pages of the gyms and fitness centers and extract the information. For this crawler, I ended up with goquery for its ability to select nodes using a syntax similar to jQuery. Even though I am not a jQuery or front-end expert, it wasn’t difficult to find the appropriate selector to extract the information I wanted by looking at the HTML code.
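For illustration, here is a minimal sketch of how such a crawler can look with goquery. The URL and the CSS selector are made up; the real ones depend on the page you are crawling:

package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// Hypothetical gym page; in my application the real URL is
	// provided as configuration.
	resp, err := http.Get("https://gym-website.cz")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// ".occupancy" is a made-up selector; find the right one by
	// inspecting the HTML of the page in question.
	occupancy := doc.Find(".occupancy").First().Text()
	fmt.Println("people currently inside:", occupancy)
}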
With that information at hand, I can then use the InfluxDB client API to save the record into the database. This is something I liked about my experience with InfluxDB: the Go client API was easy to get started with, even for a Go beginner like myself. In fact, my code is not significantly different from the example in the Go client documentation.
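A rough sketch of that write path, assuming made-up names for the server, token, organization and bucket:

package main

import (
	"context"
	"log"
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	// Server URL, token, org and bucket are placeholders; they come
	// from the one-time setup described below.
	client := influxdb2.NewClient("http://myserver:8086", "my-token")
	defer client.Close()

	writeAPI := client.WriteAPIBlocking("my-org", "utilization")

	// One data point: how many people are at a venue right now.
	p := influxdb2.NewPoint(
		"utilization",
		map[string]string{"venue": "gym"},    // tags
		map[string]interface{}{"people": 12}, // fields
		time.Now(),
	)
	if err := writeAPI.WritePoint(context.Background(), p); err != nil {
		log.Fatal(err)
	}
}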
With the important code in place, it’s then a matter of handling the infrastructure bits and making sure everything integrates well and runs as automatically and independently as possible.
I decided to run InfluxDB as a container, since that makes it simpler to run and manage. Unfortunately, this is one area where the InfluxDB documentation left me confused, as I couldn’t quickly figure out which volumes I should mount to ensure that the data is preserved between container restarts (the Docker documentation in the getting started guide only lists the ports). After some investigation, I found out that you have to create a volume and mount /root/.influxdbv2 to it. Like this:
docker run --rm --name influxdb -v influxdb-data:/root/.influxdbv2 -p 8086:8086 quay.io/influxdb/influxdb:v2.0.2
During the first execution, it is important to access the InfluxDB console to finish the configuration and create the organization, the bucket and the token required for recording the data. This is a very simple, one-time operation.
Both the InfluxDB container and the application with its runtime configuration are managed by systemd, which keeps their management simple and consistent. For example, reading the service logs is a breeze. Using journalctl, I can check the logs with something like:
journalctl -ru is-it-free@$USER.service
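For reference, the templated service unit could look roughly like the sketch below. This is my guess at a minimal layout; the binary path, environment file and dependencies are assumptions, not the actual unit file:

[Unit]
Description=is-it-free crawler (%i)
After=network-online.target

[Service]
Type=simple
User=%i
# Assumed location for the runtime configuration of instance %i.
EnvironmentFile=/etc/is-it-free/%i.env
ExecStart=/usr/local/bin/is-it-free
Restart=on-failure

[Install]
WantedBy=multi-user.target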
Building and installing the bits is as boring as it could be, with a Makefile doing the bulk of the work:
make build token="" influxdb=http://myserver:8086 gym=https://gym-website.cz pool=https://pool-website.cz
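I won’t reproduce the whole Makefile, but given those parameters, the build target presumably injects them into the binary at build time. One way to do that in Go is via -ldflags, as in this sketch (the variable names are invented):

build:
	go build -ldflags "\
		-X main.token=$(token) \
		-X main.influxdbURL=$(influxdb) \
		-X main.gymURL=$(gym) \
		-X main.poolURL=$(pool)" \
		-o is-it-free .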
With all of this out of the way, it’s then a matter of leaving the system running for a few days to collect the trends and help me choose the best time of the day to go. The crawler updates the data every 10 minutes, which in my opinion offers a resolution good enough for spotting trends while not abusing the websites.
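The polling itself can be as simple as a ticker loop. A minimal sketch, with collectAndStore standing in for the crawl-and-write logic shown earlier:

package main

import (
	"log"
	"time"
)

// collectAndStore is a placeholder for fetching the pages, parsing
// the occupancy and writing the points to InfluxDB.
func collectAndStore() error {
	log.Println("collecting...")
	return nil
}

func main() {
	ticker := time.NewTicker(10 * time.Minute)
	defer ticker.Stop()

	// Run once immediately, then on every tick.
	for ; ; <-ticker.C {
		if err := collectAndStore(); err != nil {
			log.Println("collection failed:", err)
		}
	}
}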
It has been running for only a few days and, with the gyms in the Czech Republic closing again today, there is not much data available just yet. Hopefully, once they are open again and with a few more days of data gathering, I will be able to develop a routine that lets me go to the gym at the least crowded time possible. My back certainly needs that.