Python Reinforcement Learning
Sudharsan Ravichandiran, Sean Saito, Rajalingappaa Shanmugamani, Yang Wenzhuo
Updated: 2021-06-24 15:18:32
Title Page
Copyright and Credits
Python Reinforcement Learning
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
Introduction to Reinforcement Learning
What is RL?
RL algorithm
How RL differs from other ML paradigms
Elements of RL
Agent
Policy function
Value function
Model
Agent environment interface
Types of RL environment
Deterministic environment
Stochastic environment
Fully observable environment
Partially observable environment
Discrete environment
Continuous environment
Episodic and non-episodic environment
Single and multi-agent environment
RL platforms
OpenAI Gym and Universe
DeepMind Lab
RL-Glue
Project Malmo
ViZDoom
Applications of RL
Education
Medicine and healthcare
Manufacturing
Inventory management
Finance
Natural Language Processing and Computer Vision
Summary
Questions
Further reading
Getting Started with OpenAI and TensorFlow
Setting up your machine
Installing Anaconda
Installing Docker
Installing OpenAI Gym and Universe
Common error fixes
OpenAI Gym
Basic simulations
Training a robot to walk
OpenAI Universe
Building a video game bot
TensorFlow
Variables, constants, and placeholders
Variables
Constants
Placeholders
Computation graph
Sessions
TensorBoard
Adding scope
Summary
Questions
Further reading
The Markov Decision Process and Dynamic Programming
The Markov chain and Markov process
Markov Decision Process
Rewards and returns
Episodic and continuous tasks
Discount factor
The policy function
State value function
State-action value function (Q function)
The Bellman equation and optimality
Deriving the Bellman equation for value and Q functions
Solving the Bellman equation
Dynamic programming
Value iteration
Policy iteration
Solving the frozen lake problem
Value iteration
Policy iteration
Summary
Questions
Further reading
Gaming with Monte Carlo Methods
Monte Carlo methods
Estimating the value of pi using Monte Carlo
Monte Carlo prediction
First visit Monte Carlo
Every visit Monte Carlo
Let's play Blackjack with Monte Carlo
Monte Carlo control
Monte Carlo exploration starts
On-policy Monte Carlo control
Off-policy Monte Carlo control
Summary
Questions
Further reading
Temporal Difference Learning
TD learning
TD prediction
TD control
Q learning
Solving the taxi problem using Q learning
SARSA
Solving the taxi problem using SARSA
The difference between Q learning and SARSA
Summary
Questions
Further reading
Multi-Armed Bandit Problem
The MAB problem
The epsilon-greedy policy
The softmax exploration algorithm
The upper confidence bound algorithm
The Thompson sampling algorithm
Applications of MAB
Identifying the right advertisement banner using MAB
Contextual bandits
Summary
Questions
Further reading
Playing Atari Games
Introduction to Atari games
Building an Atari emulator
Getting started
Implementation of the Atari emulator
Atari simulator using Gym
Data preparation
Deep Q-learning
Basic elements of reinforcement learning
Demonstrating the basic Q-learning algorithm
Implementation of DQN
Experiments
Summary
Atari Games with Deep Q Network
What is a Deep Q Network?
Architecture of DQN
Convolutional network
Experience replay
Target network
Clipping rewards
Understanding the algorithm
Building an agent to play Atari games
Double DQN
Prioritized experience replay
Dueling network architecture
Summary
Questions
Further reading
Playing Doom with a Deep Recurrent Q Network
DRQN
Architecture of DRQN
Training an agent to play Doom
Basic Doom game
Doom with DRQN
DARQN
Architecture of DARQN
Summary
Questions
Further reading
The Asynchronous Advantage Actor Critic Network
The Asynchronous Advantage Actor Critic
The three As
The architecture of A3C
How A3C works
Driving up a mountain with A3C
Visualization in TensorBoard
Summary
Questions
Further reading
Policy Gradients and Optimization
Policy gradient
Lunar Lander using policy gradients
Deep deterministic policy gradient
Swinging a pendulum
Trust Region Policy Optimization
Proximal Policy Optimization
Summary
Questions
Further reading
Balancing CartPole
OpenAI Gym
Gym
Installation
Running an environment
Atari
Algorithmic tasks
MuJoCo
Robotics
Markov models
CartPole
Summary
Simulating Control Tasks
Introduction to control tasks
Getting started
The classic control tasks
Deterministic policy gradient
The theory behind policy gradient
DPG algorithm
Implementation of DDPG
Experiments
Trust region policy optimization
Theory behind TRPO
TRPO algorithm
Experiments on MuJoCo tasks
Summary
Building Virtual Worlds in Minecraft
Introduction to the Minecraft environment
Data preparation
Asynchronous advantage actor-critic algorithm
Implementation of A3C
Experiments
Summary
Learning to Play Go
A brief introduction to Go
Go and other board games
Go and AI research
Monte Carlo tree search
Selection
Expansion
Simulation
Update
AlphaGo
Supervised learning policy networks
Reinforcement learning policy networks
Value network
Combining neural networks and MCTS
AlphaGo Zero
Training AlphaGo Zero
Comparison with AlphaGo
Implementing AlphaGo Zero
Policy and value networks
preprocessing.py
features.py
network.py
Monte Carlo tree search
mcts.py
Combining PolicyValueNetwork and MCTS
alphagozero_agent.py
Putting everything together
controller.py
train.py
Summary
References
Creating a Chatbot
The background problem
Dataset
Step-by-step guide
Data parser
Data reader
Helper methods
Chatbot model
Training the data
Testing and results
Summary
Generating a Deep Learning Image Classifier
Neural Architecture Search
Generating and training child networks
Training the Controller
Training algorithm
Implementing NAS
child_network.py
cifar10_processor.py
controller.py
Method for generating the Controller
Generating a child network using the Controller
train_controller method
Testing ChildCNN
config.py
train.py
Additional exercises
Advantages of NAS
Summary
Predicting Future Stock Prices
Background problem
Data used
Step-by-step guide
Actor script
Critic script
Agent script
Helper script
Training the data
Final result
Summary
Capstone Project - Car Racing Using DQN
Environment wrapper functions
Dueling network
Replay memory
Training the network
Car racing
Summary
Questions
Further reading
Looking Ahead
The shortcomings of reinforcement learning
Resource efficiency
Reproducibility
Explainability/accountability
Susceptibility to attacks
Upcoming developments in reinforcement learning
Addressing the limitations
Transfer learning
Multi-agent reinforcement learning
Summary
References
Assessments
Chapter 1: Introduction to Reinforcement Learning
Chapter 2: Getting Started with OpenAI and TensorFlow
Chapter 3: The Markov Decision Process and Dynamic Programming
Chapter 4: Gaming with Monte Carlo Methods
Chapter 5: Temporal Difference Learning
Chapter 6: Multi-Armed Bandit Problem
Chapter 8: Atari Games with Deep Q Network
Chapter 9: Playing Doom with a Deep Recurrent Q Network
Chapter 10: The Asynchronous Advantage Actor Critic Network
Chapter 11: Policy Gradients and Optimization
Chapter 19: Capstone Project – Car Racing Using DQN
Other Books You May Enjoy
Leave a review - let other readers know what you think