Book: Learning Data Mining with Python (Second Edition)
Author: Robert Layton
Last updated: 2021-07-02 23:40:08

Collecting the data

We will use the NBA match history for the 2015-2016 season. The website http://basketball-reference.com hosts a large collection of resources and statistics from the NBA and other leagues. To download the dataset, perform the following steps:

1. Navigate to http://www.basketball-reference.com/leagues/NBA_2016_games.html in your web browser.
2. Click Share & more.
3. Click Get table as CSV (for Excel).
4. Copy the data, including the heading, into a text file named basketball.csv.
5. Repeat this process for the other months, appending each month's rows to the same file without copying the heading again.

This will give you a CSV file containing the result of every game in this NBA season. Your file should contain 1316 games and 1317 lines in total, including the header line.
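A quick sanity check on the assembled file can be sketched as follows. The helper name count_games is hypothetical, and the sample rows are made-up stand-ins rather than real results; the sketch assumes the first row is the header and ignores blank lines.

```python
import csv
import io


def count_games(lines):
    """Return (header, number_of_game_rows) for an iterable of CSV lines.

    Assumes the first row is the header; blank lines are ignored.
    """
    rows = [row for row in csv.reader(lines) if row]
    if not rows:
        return [], 0
    return rows[0], len(rows) - 1


# Tiny in-memory stand-in for basketball.csv; these rows are made up.
sample = io.StringIO(
    "Date,Visitor,VisitorPts,Home,HomePts\n"
    "2015-10-27,Team A,94,Team B,106\n"
    "\n"
    "2015-10-27,Team C,95,Team D,97\n"
)
header, n_games = count_games(sample)
print(n_games)
```

To check the real file, open basketball.csv with open("basketball.csv", newline="") and pass the file object to count_games; given the figures above, the game count should come out as 1316. A different number usually means a month was skipped or a heading was copied more than once.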

CSV files are text files where each line contains a new row and each value is separated by a comma (hence the name). CSV files can be created manually by typing into a text editor and saving with a .csv extension. They can be opened in any program that can read text files but can also be opened in Excel as a spreadsheet. Excel (and other spreadsheet programs) can usually convert a spreadsheet to CSV as well.
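To make the format concrete, here is a minimal sketch that writes two rows as CSV text and parses them back using Python's standard csv module. The team name and score are invented for illustration. The second row shows one detail the description above glosses over: a value that itself contains a comma is wrapped in double quotes so that it still reads back as a single field.

```python
import csv
import io

# Hypothetical rows; the second value in row two contains a comma.
rows = [
    ["Team", "Points"],
    ["Example City, Hawks", "101"],
]

# Write the rows as CSV text into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Parse the text back into lists of strings.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed == rows)
```

Note that every value comes back as a string; converting scores to numbers is left to the code that consumes the rows.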

We will load the file with pandas, a widely used library for manipulating data. Python also ships with a built-in module called csv that supports reading and writing CSV files, but pandas provides more powerful functions, which we will use later in the chapter to create new features.
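Loading CSV data with pandas might look like the sketch below. It reads from an in-memory sample so that it runs without the downloaded file; the column names and rows are assumptions made for illustration and may not match the headings in the real download.

```python
import io

import pandas as pd

# Made-up stand-in for the contents of basketball.csv.
sample = io.StringIO(
    "Date,Visitor,VisitorPts,Home,HomePts\n"
    "2015-10-27,Team A,94,Team B,106\n"
    "2015-10-27,Team C,95,Team D,97\n"
)

# read_csv accepts a file path or any file-like object.
dataset = pd.read_csv(sample)
print(dataset.shape)
print(dataset.head())
```

For the real data, pass the file name instead, as in pd.read_csv("basketball.csv"). Unlike the csv module, pandas infers column types, so the score columns in this sample load as integers rather than strings.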

For this chapter, you will need to install pandas. The easiest way is to use Anaconda's conda installer, as you did to install scikit-learn in Chapter 1, Getting Started with Data Mining:
$ conda install pandas

If you have difficulty installing pandas, head to the project's website at http://pandas.pydata.org/getpandas.html and read the installation instructions for your system.
