Chapter 2. Working with Data

Real-world data analytics depends on accurate data. In this chapter we discuss how to obtain, clean, normalize, and transform raw data into a standard format such as Comma-Separated Values (CSV) or JavaScript Object Notation (JSON) using OpenRefine.

We will cover the following topics:

  • Data sources
    • Open data
    • Text files
    • Excel files
    • SQL databases
    • NoSQL databases
    • Multimedia
    • Web scraping
  • Data scrubbing
    • Statistical methods
    • Text parsing
    • Data transformation
  • Data formats
    • CSV
    • JSON
    • XML
    • YAML
  • Getting started with OpenRefine