书名：Learning Data Mining with Python（Second Edition）
作者名：Robert Layton
本章字数：225字
更新时间：2021-07-02 23:40:04

What is classification?

Classification is one of the largest uses of data mining, both in practical use and in research. As before, we have a set of samples that represents objects or things we are interested in classifying. We also have a new array, the class values. These class values give us a categorization of the samples. Some examples are as follows:

Determining the species of a plant by looking at its measurements. The class value here would be: Which species is this?
Determining if an image contains a dog. The class would be: Is there a dog in this image?
Determining if a patient has cancer, based on the results of a specific test. The class would be: Does this patient have cancer?

While many of the examples previous are binary (yes/no) questions, they do not have to be, as in the case of plant species classification in this section.

The goal of classification applications is to train a model on a set of samples with known classes and then apply that model to new unseen samples with unknown classes. For example, we want to train a spam classifier on my past e-mails, which I have labeled as spam or not spam. I then want to use that classifier to determine whether my next email is spam, without me needing to classify it myself.