Short Courses - Sourcing Data for Storytelling - Code for South Africa Data Journalism Academy

Course Description

More and more information is now published as data. A key skill for a data journalist is finding relevant data, extracting it and transforming it into a machine-readable format. Sourcing and scraping data is the first step of any data process, or Data Pipeline. Once relevant data has been sourced and extracted into a machine-readable format, it can be cleaned and then analysed.

Learning Outcomes

Identify Data Sourcing as a distinct stage in the Data Pipeline
Understand the legal resources that are available to journalists or citizens in order to acquire public interest information
Source data through a deep search online
Use open data portals and recognise common information formats
Use Google advanced search and Google Scholar
Find data using archived (Wayback) web pages, cache functions, files hidden from the browser, add-ins and URL patterns
Build your own datasets when no other information is available
Scrape information from PDFs or scanned images
Scrape information from online sources into structured format
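
As an illustration of the last outcome, scraping an online source into a structured format can be sketched in a few lines of Python. This is a minimal sketch and not course material: it uses only Python's standard html.parser module, the names TableParser, scrape_table and SAMPLE_HTML are hypothetical, and the table contents are made-up placeholder values. It assumes a simple table with one header row and no nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Town</th><th>Clinics</th></tr>
  <tr><td>Example Town A</td><td>3</td></tr>
  <tr><td>Example Town B</td><td>5</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
```

Each entry in records is one table row as a dictionary, which can then be written to a spreadsheet or CSV file for cleaning and analysis.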

Intended Audience

Journalists, storytellers, and data wranglers

Course Level

Introductory

Course Length

2 days

Additional Comments

Participants must have access to Excel or OpenOffice / LibreOffice, an activated Google account and the Google Chrome browser.
