The other day I was asked to extract data from the IEC website for an article on the 2016 Local Government Elections. Unfortunately, at that time there was no full report available. The only obvious route (which could not have been less attractive) was to download the documents one by one, for each municipality or for each voting district.
Of course, that was not really an option. The page takes several seconds to reload every time you select an option from the menu, so the whole process can waste hours just to get access to the data, only to find out that you still need to clean it. I decided to write this post because this is not the first time I have needed to do this, and you might find it valuable. So, how can we do this quickly?
When you click on “download”, a little screen pops up. In the screenshot you can see the url, which is made up of several parts. If you download two different files you will begin to see the pattern.
Take a closer look: the url is a source of information and you should always read it. In this case, it tells us which parts we need in order to construct the urls for downloading the complete data.
Once we have seen what the composition of the url is, we can reproduce it and download the data in an easy way.
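If you prefer to check the pattern with code rather than by eye, here is a minimal Python sketch that compares two download urls and reports which path segments change. The two urls below are hypothetical placeholders, not the real IEC addresses; swap in the ones you see in the download pop-up.

```python
from urllib.parse import urlsplit


def differing_segments(url_a, url_b):
    """Return (position, segment_a, segment_b) for every path segment that differs."""
    parts_a = urlsplit(url_a).path.strip("/").split("/")
    parts_b = urlsplit(url_b).path.strip("/").split("/")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(parts_a, parts_b))
        if a != b
    ]


# Hypothetical urls for illustration only (assumed layout: base/province/municipality/file).
url_1 = "https://example.org/downloads/EC/EC101/results.xls"
url_2 = "https://example.org/downloads/WC/WC011/results.xls"

for position, first, second in differing_segments(url_1, url_2):
    print(position, first, second)
```

The segments that differ are the ones you will need to vary (here, the province and municipality codes); everything else stays fixed.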
- Build the pattern:
Using a Google spreadsheet, you will need four columns: the first part of the url, the province code, the municipality code and the end of the url.
(NB: Don’t forget the “/” at the end)
- Concatenate the information.
Using the “concatenate” function, you will be able to merge the columns into one url.
- Paste the urls in a text doc.
Now that the urls are ready, you just need to copy and paste them into a text file using a text editor. Make sure you save it with a “.txt” extension.
- Use the Firefox plugin “DownThemAll”.
You will need to install an add-on called DownThemAll in Firefox. Once you have downloaded and installed it, go to “tools + DownThemAll + manager”.
Right click on the empty space, click “advanced”, open the text file and click “start”.
- Download your data.
Ready! All your data is there in less than half the time.
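The spreadsheet steps (build the pattern, concatenate, save as a text file) can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base url, the file ending and the code pairs below are hypothetical placeholders, and the function names are my own. Use the real parts you found in the download url.

```python
from pathlib import Path


def build_urls(base, codes, suffix):
    """Concatenate base + province + "/" + municipality + suffix for each code pair."""
    if not base.endswith("/"):
        base += "/"  # same reminder as in the spreadsheet: don't forget the "/"
    return [
        f"{base}{province}/{municipality}{suffix}"
        for province, municipality in codes
    ]


def write_url_list(urls, path):
    """Save one url per line in a plain text file."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


# Hypothetical values for illustration only.
BASE = "https://example.org/downloads"
SUFFIX = "/results.xls"
CODES = [("EC", "EC101"), ("WC", "WC011")]

urls = build_urls(BASE, CODES, SUFFIX)
print("\n".join(urls))
```

Calling `write_url_list(urls, "urls.txt")` then gives you the same text file you would otherwise build by hand, ready to load into the download manager.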
(You can find the IEC data on our open data portal here.)
Do you know of another way to do it? Please share it! And don’t forget, our third Data-Driven Journalism Academy is about to start. If you want to learn a bit more about data journalism, don’t hesitate to take a look here to find out how you can apply.