Cover
Copyright Information
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Data Science Using Java
Data science
Machine learning
Supervised learning
Unsupervised learning
Clustering
Dimensionality reduction
Natural Language Processing
Data science process models
CRISP-DM
A running example
Data science in Java
Data science libraries
Data processing libraries
Math and stats libraries
Machine learning and data mining libraries
Text processing
Summary
Data Processing Toolbox
Standard Java library
Collections
Input/Output
Reading input data
Writing output data
Streaming API
Extensions to the standard library
Apache Commons
Commons Lang
Commons IO
Commons Collections
Other commons modules
Google Guava
AOL Cyclops React
Accessing data
Text data and CSV
Web and HTML
JSON
Databases
DataFrames
Search engine - preparing data
Summary
Exploratory Data Analysis
Exploratory data analysis in Java
Search engine datasets
Apache Commons Math
Joinery
Interactive Exploratory Data Analysis in Java
JVM languages
Interactive Java
Joinery shell
Summary
Supervised Learning - Classification and Regression
Classification
Binary classification models
Smile
JSAT
LIBSVM and LIBLINEAR
Encog
Evaluation
Accuracy
Precision, recall, and F1
ROC and AU ROC (AUC)
Result validation
K-fold cross-validation
Training, validation, and testing
Case study - page prediction
Regression
Machine learning libraries for regression
Smile
JSAT
Other libraries
Evaluation
MSE
MAE
Case study - hardware performance
Summary
Unsupervised Learning - Clustering and Dimensionality Reduction
Dimensionality reduction
Unsupervised dimensionality reduction
Principal Component Analysis
Truncated SVD
Truncated SVD for categorical and sparse data
Random projection
Cluster analysis
Hierarchical methods
K-means
Choosing K in K-Means
DBSCAN
Clustering for supervised learning
Clusters as features
Clustering as dimensionality reduction
Supervised learning via clustering
Evaluation
Manual evaluation
Supervised evaluation
Unsupervised evaluation
Summary
Working with Text - Natural Language Processing and Information Retrieval
Natural Language Processing and information retrieval
Vector Space Model - Bag of Words and TF-IDF
Vector space model implementation
Indexing and Apache Lucene
Natural Language Processing tools
Stanford CoreNLP
Customizing Apache Lucene
Machine learning for texts
Unsupervised learning for texts
Latent Semantic Analysis
Text clustering
Word embeddings
Supervised learning for texts
Text classification
Learning to rank for information retrieval
Reranking with Lucene
Summary
Extreme Gradient Boosting
Gradient Boosting Machines and XGBoost
Installing XGBoost
XGBoost in practice
XGBoost for classification
Parameter tuning
Text features
Feature importance
XGBoost for regression
XGBoost for learning to rank
Summary
Deep Learning with DeepLearning4J
Neural Networks and DeepLearning4J
ND4J - N-dimensional arrays for Java
Neural networks in DeepLearning4J
Convolutional Neural Networks
Deep learning for cats versus dogs
Reading the data
Creating the model
Monitoring the performance
Data augmentation
Running DeepLearning4J on GPU
Summary
Scaling Data Science
Apache Hadoop
Hadoop MapReduce
Common Crawl
Apache Spark
Link prediction
Reading the DBLP graph
Extracting features from the graph
Node features
Negative sampling
Edge features
Link Prediction with MLlib and XGBoost
Link suggestion
Summary
Deploying Data Science Models
Microservices
Spring Boot
Search engine service
Online evaluation
A/B testing
Multi-armed bandits
Summary