Cover
Copyright Information
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Getting Started with Data Mining
Introducing data mining
Using Python and the Jupyter Notebook
Installing Python
Installing Jupyter Notebook
Installing scikit-learn
A simple affinity analysis example
What is affinity analysis?
Product recommendations
Loading the dataset with NumPy
Downloading the example code
Implementing a simple ranking of rules
Ranking to find the best rules
A simple classification example
What is classification?
Loading and preparing the dataset
Implementing the OneR algorithm
Testing the algorithm
Summary
Classifying with scikit-learn Estimators
scikit-learn estimators
Nearest neighbors
Distance metrics
Loading the dataset
Moving towards a standard workflow
Running the algorithm
Setting parameters
Preprocessing
Standard pre-processing
Putting it all together
Pipelines
Summary
Predicting Sports Winners with Decision Trees
Loading the dataset
Collecting the data
Using pandas to load the dataset
Cleaning up the dataset
Extracting new features
Decision trees
Parameters in decision trees
Using decision trees
Sports outcome prediction
Putting it all together
Random forests
How do ensembles work?
Setting parameters in Random Forests
Applying random forests
Engineering new features
Summary
Recommending Movies Using Affinity Analysis
Affinity analysis
Algorithms for affinity analysis
Overall methodology
Dealing with the movie recommendation problem
Obtaining the dataset
Loading with pandas
Sparse data formats
Understanding the Apriori algorithm and its implementation
Looking into the basics of the Apriori algorithm
Implementing the Apriori algorithm
Extracting association rules
Evaluating the association rules
Summary
Features and scikit-learn Transformers
Feature extraction
Representing reality in models
Common feature patterns
Creating good features
Feature selection
Selecting the best individual features
Feature creation
Principal Component Analysis
Creating your own transformer
The transformer API
Implementing a Transformer
Unit testing
Putting it all together
Summary
Social Media Insight using Naive Bayes
Disambiguation
Downloading data from a social network
Loading and classifying the dataset
Creating a replicable dataset from Twitter
Text transformers
Bag-of-words models
n-gram features
Other text features
Naive Bayes
Understanding Bayes' theorem
Naive Bayes algorithm
How it works
Applying Naive Bayes
Extracting word counts
Converting dictionaries to a matrix
Putting it all together
Evaluation using the F1-score
Getting useful features from models
Summary
Follow Recommendations Using Graph Mining
Loading the dataset
Classifying with an existing model
Getting follower information from Twitter
Building the network
Creating a graph
Creating a similarity graph
Finding subgraphs
Connected components
Optimizing criteria
Summary
Beating CAPTCHAs with Neural Networks
Artificial neural networks
An introduction to neural networks
Creating the dataset
Drawing basic CAPTCHAs
Splitting the image into individual letters
Creating a training dataset
Training and classifying
Back-propagation
Predicting words
Improving accuracy using a dictionary
Ranking mechanisms for word similarity
Putting it all together
Summary
Authorship Attribution
Attributing documents to authors
Applications and use cases
Authorship attribution
Getting the data
Using function words
Counting function words
Classifying with function words
Support Vector Machines
Classifying with SVMs
Kernels
Character n-grams
Extracting character n-grams
The Enron dataset
Accessing the Enron dataset
Creating a dataset loader
Putting it all together
Evaluation
Summary
Clustering News Articles
Trending topic discovery
Using a web API to get data
Reddit as a data source
Getting the data
Extracting text from arbitrary websites
Finding the stories in arbitrary websites
Extracting the content
Grouping news articles
The k-means algorithm
Evaluating the results
Extracting topic information from clusters
Using clustering algorithms as transformers
Clustering ensembles
Evidence accumulation
How it works
Implementation
Online learning
Implementation
Summary
Object Detection in Images using Deep Neural Networks
Object classification
Use cases
Application scenario
Deep neural networks
Intuition
Implementing deep neural networks
An Introduction to TensorFlow
Using Keras
Convolutional Neural Networks
GPU optimization
When to use GPUs for computation
Running our code on a GPU
Setting up the environment
Application
Getting the data
Creating the neural network
Putting it all together
Summary
Working with Big Data
Big data
Applications of big data
MapReduce
The intuition behind MapReduce
A word count example
Hadoop MapReduce
Applying MapReduce
Getting the data
Naive Bayes prediction
The mrjob package
Extracting the blog posts
Training Naive Bayes
Putting it all together
Training on Amazon's EMR infrastructure
Summary
Next Steps...
Getting Started with Data Mining
Scikit-learn tutorials
Extending the Jupyter Notebook
More datasets
Other Evaluation Metrics
More application ideas
Classifying with scikit-learn Estimators
Scalability with the nearest neighbor
More complex pipelines
Comparing classifiers
Automated Learning
Predicting Sports Winners with Decision Trees
More complex features
Dask
Research
Recommending Movies Using Affinity Analysis
New datasets
The Eclat algorithm
Collaborative Filtering
Extracting Features with Transformers
Adding noise
Vowpal Wabbit
word2vec
Social Media Insight Using Naive Bayes
Spam detection
Natural language processing and part-of-speech tagging
Discovering Accounts to Follow Using Graph Mining
More complex algorithms
NetworkX
Beating CAPTCHAs with Neural Networks
Better (worse?) CAPTCHAs
Deeper networks
Reinforcement learning
Authorship Attribution
Increasing the sample size
Blogs dataset
Local n-grams
Clustering News Articles
Clustering Evaluation
Temporal analysis
Real-time clusterings
Classifying Objects in Images Using Deep Learning
Mahotas
Magenta
Working with Big Data
Courses on Hadoop
Pydoop
Recommendation engine
W.I.L.L
More resources
Kaggle competitions
Coursera
Last updated: 2021-07-02 23:40:49