Course Description

In today’s world, information is being transformed into data. A key skill for a data journalist is the ability to find, extract and transform relevant data into a machine-readable format. Sourcing and scraping data is the very first step of any data process, or Data Pipeline. Once relevant data has been sourced and extracted into a machine-readable format, it can be cleaned and then analysed.

Learning Outcomes

  • Identify Data Sourcing as a distinct stage in the Data Pipeline
  • Understand the legal resources that are available to journalists or citizens in order to acquire public interest information
  • Source data through a deep search online
  • Use open data portals and understand information formats
  • Use Google advanced search and Google Scholar
  • Find data using Wayback Machine archived pages, cache functions, files hidden from the browser, add-ins and URL patterns
  • Build your own datasets when no other information is available
  • Scrape information from PDFs or scanned images
  • Scrape information from online sources into a structured format

Intended Audience

Journalists, storytellers, and data wranglers

Course Level

Introductory

Course Length

2 days

Additional Comments

Participants must have access to Excel or OpenOffice / LibreOffice, an activated Google account, and the Google Chrome browser.
