- Learning Data Mining with Python (Second Edition)
- Robert Layton
- 225 words
- 2021-07-02 23:40:04
What is classification?
Classification is one of the most common applications of data mining, both in practice and in research. As before, we have a set of samples that represent the objects or things we are interested in classifying. We also have a new array, the class values, which assigns a category to each sample. Some examples are as follows:
- Determining the species of a plant by looking at its measurements. The class value here would be: Which species is this?
- Determining if an image contains a dog. The class would be: Is there a dog in this image?
- Determining if a patient has cancer, based on the results of a specific test. The class would be: Does this patient have cancer?
While many of the previous examples are binary (yes/no) questions, they do not have to be, as in the case of plant species classification in this section.
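The samples and class values described above can be pictured as two parallel collections: one row of measurements per sample and one class value per row. The following is a minimal sketch in plain Python. The measurements, the species names, the `contains_dog` labels, and the `describe_task` helper are all made up for illustration and are not taken from the book's dataset.

```python
# Hypothetical data: each row is one sample (two plant measurements in cm).
samples = [
    [5.1, 3.5],
    [4.9, 3.0],
    [6.4, 3.2],
    [6.9, 3.1],
    [6.3, 3.3],
    [5.8, 2.7],
]

# One class value per sample, in the same order as the rows above.
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

# A binary task uses the same layout with only two distinct class values.
contains_dog = [True, False, False, True, True, False]


def describe_task(class_values):
    """Return 'binary' for two distinct classes, 'multiclass' for more."""
    n_classes = len(set(class_values))
    if n_classes < 2:
        raise ValueError("classification needs at least two classes")
    return "binary" if n_classes == 2 else "multiclass"


# Every sample must have exactly one class value.
assert len(samples) == len(species) == len(contains_dog)

print(describe_task(species))       # multiclass
print(describe_task(contains_dog))  # binary
```

The only structural difference between a binary and a multiclass problem in this layout is the number of distinct values in the class array.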
The goal of a classification application is to train a model on a set of samples with known classes and then apply that model to new, unseen samples whose classes are unknown. For example, I might train a spam classifier on my past emails, which I have labeled as spam or not spam. I could then use that classifier to determine whether my next email is spam, without needing to classify it myself.
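The train-then-apply workflow can be sketched with a very simple classifier. The example below uses a nearest-centroid rule, chosen here only because it fits in a few lines; it is not necessarily the method the book goes on to use. The two email features (number of links, number of exclamation marks), the data, and the function names `train` and `predict` are assumptions made for illustration.

```python
from math import dist


def train(samples, classes):
    """Compute the mean point (centroid) of the samples in each class."""
    grouped = {}
    for sample, label in zip(samples, classes):
        grouped.setdefault(label, []).append(sample)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(model, sample):
    """Return the class whose centroid is closest to the sample."""
    return min(model, key=lambda label: dist(model[label], sample))


# Hypothetical features per email: [number of links, number of exclamation marks]
past_emails = [[8, 5], [6, 7], [9, 4], [0, 1], [1, 0], [2, 1]]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

# Training uses only samples whose classes are already known.
model = train(past_emails, labels)

# The trained model is then applied to new emails with unknown classes.
print(predict(model, [7, 6]))  # spam
print(predict(model, [1, 1]))  # not spam
```

The labeled emails are needed only during training. Once the model exists, a new email can be classified from its features alone.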