Title: Learning Data Mining with Python (Second Edition)
Author: Robert Layton
Last updated: 2021-07-02 23:40:12

Obtaining the dataset

Since the inception of the Netflix Prize, GroupLens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. Among them are several versions of a movie rating dataset in different sizes: one with 100,000 reviews, one with 1 million reviews, and one with 10 million reviews.

The datasets are available from http://grouplens.org/datasets/movielens/. In this chapter we will use the MovieLens 100K dataset (with 100,000 reviews). Download this dataset and unzip it in your data folder. Start a new Jupyter Notebook and type the following code:

import os
import pandas as pd
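# Adjust this path if you unzipped the dataset somewhere other than ~/Data
data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
# u.data holds the ratings
ratings_filename = os.path.join(data_folder, "u.data")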

Ensure that ratings_filename points to the u.data file in the unzipped folder.
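
One way to confirm that the path is correct is to check that the file exists and then try loading it. The following is a minimal sketch, not necessarily the loading code used later in the chapter. The helper name load_ratings and the column names are assumptions chosen for illustration; the tab-separated, headerless layout (user, item, rating, timestamp) follows the README shipped with the MovieLens 100K dataset.

```python
import os
import pandas as pd


def load_ratings(path):
    """Load a MovieLens 100K style ratings file into a DataFrame.

    Hypothetical helper for illustration. Assumes a tab-separated file
    with no header row and four columns per line.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Ratings file not found: {path}")
    return pd.read_csv(
        path,
        sep="\t",
        header=None,
        # Column names are our own labels, not part of the file
        names=["UserID", "MovieID", "Rating", "Timestamp"],
    )


data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
ratings_filename = os.path.join(data_folder, "u.data")

if os.path.isfile(ratings_filename):
    ratings = load_ratings(ratings_filename)
    # Show the number of rows and columns that were read
    print(ratings.shape)
else:
    print("u.data not found at", ratings_filename)
```

If the "not found" message appears, adjust data_folder to match the location where you unzipped the dataset.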
