Chapter 2.  Access, Speed, and Storage with Hadoop

This chapter addresses the challenge of storing and accessing large volumes and varieties of data, whether structured or unstructured, and offers working examples that demonstrate solutions to these problems.

Since you are expected to be somewhat familiar with Hadoop, this chapter starts with only a brief overview of the technology. It does not cover every detail, because the goal is to demonstrate how Hadoop can be used to address the challenge of storing and accessing big data.

In addition, for completeness, we'll touch on possible alternatives to Hadoop, such as Apache Spark and even a simple scripting solution.

By the end of this chapter, you should have an idea of what Hadoop is and how it works, appreciate the reasons for using Hadoop to store and access big data, and have worked through example solutions using Hadoop.

We'll break this chapter down into the following topics:

  • About Hadoop
  • Log files and Excel
  • Hadoop and big data
  • Example 1
  • Example 2