It’s easy to forget, amid all the cleaning, sorting, and filtering, that data journalism is all about storytelling. A wise man once told me “It’s called data journalism, not data journalism.” This is undoubtedly true, but before you get to embedding insightful graphs and creating beautiful visualisations, you first need some data to work with.
Sourcing data isn’t always easy. This is especially true of government data, where administrators will seemingly take every measure to make data unusable. And while private data is typically of a better standard, it also typically pricey. Whether public or private, getting your hands on the data you need can be a quite an effort.
I am currently working on a story about gentrification in Cape Town. A large part of my project involves identifying areas that are currently undergoing gentrification, using indicators such as change in property values, median income levels, and the density of neighbourhood coffee shops.
I started with creating a heatmap of all the coffee shops in and around the city of Cape Town. If I could get my hands on a definitive list of all the coffee shops in the city, I could map their coordinates and create a coffee shop heatmap, I thought. That sounded simple enough, so I set about obtaining a coffee shop dataset.
My first thought was contacting the Department of Trade and Industry to point me to the relevant administrator, or section of their website I might have overlooked. I was redirected to the Companies and Intellectual Property Commission (CIPC), whom I was told had a colossal database of every single registered company in the country. What a goldmine.
Alas, like gold, data on commercial enterprises comes at a cost. I was charged 4 cents for every record (row) of data I received, with a base payment of 10 rand. My total cost came to 600 rand.
What I received was not what I had anticipated. There were two problems with the data. Firstly, it only gave me the businesses that had registered in the year I specified, not every single registered business as of that year. Secondly, the sic codes (a unique code that indicates business type, i.e. 3 for “restaurant”), which would have allowed me to filter out the coffee shops (pun intended), which were missing, due to new legislation that did not require newly registered businesses to supply them. This data was unusable.
I had to look elsewhere. But where? I had contacted nearly every government department I could think of, and googled myself to exhaustion.
Then, I had an idea. Foursquare, the search and discovery app for nearby services, has a list of nearly every business, depending on category. My plan was to search for every coffee shop around Cape Town, and work my way from neighbourhood to neighbourhood. I could then use Google Maps to identify the location of the coffee shop, and use the coordinates to fit into my own Google Sheet. This could work, but it would require some manual labour.
Step 1: Extracting coordinates.
This is fairly simple. Get a name of a coffee shop and search for it on Google Maps. Right-click on the map, then click on ‘What’s here?’ This brings up a window with a picture of the place, the street address, and the latitude and longitudes.
Step 2: Populate your spreadsheet
Most mapping tools will recognise the column headings ‘Latitude’ and ‘Longitude’ as location data, and plot them accordingly, so your spreadsheet will need to have those headings. The rest is a process of repetition that could span a few hours, or even days, depending how many items you have to look up. Add the coordinates for all the locations you want to plot, and you should have a spreadsheet ready to map.
I used Cartodb for creating the heatmap. The process is as simple as importing the data, and choosing the visualisation filter that creates heatmaps. And voila!
Remember, data is everywhere, and you don’t always have to rely on scraping websites, petitioning government, or paying hefty fees to get what you want, with obvious exceptions. Sometimes, all you need is some creativity, hard work, and lumbar support, and you could create your very own spreadsheet.