7 real-world applications of reinforcement learning
February 17, 2022 • Joy Zhang • Resources • 4 minutes
Reinforcement learning is a subdomain of machine learning in which agents learn to make decisions by interacting with their environment. It recently gained popularity through its ability to achieve superhuman levels of play in games like Go, Chess, Dota, and StarCraft II.
In this article, I’ve put together a list of 7 examples of reinforcement learning being applied to real-world problems.
Approaches to self-driving cars have historically involved hand-written logic rules. These are difficult to scale to the countless situations that autonomous vehicles might encounter on public roads. This is where deep reinforcement learning looks promising.
Wayve is a UK-based company that has been testing autonomous vehicles on public roads since 2018. In their paper, 'Learning to Drive in a Day', they describe how they used deep reinforcement learning to train a model using a monocular image as input. The reward was the distance travelled by the vehicle without the safety driver taking control. The model was trained in a driving simulation and then deployed in the real world on a 250-meter section of road.
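To make that reward concrete, here is a minimal sketch of how distance travelled before an intervention could be turned into an episode reward. The `Step` record and its field names are my own assumptions for illustration and are not taken from Wayve's paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Step:
    """One time step of a drive (hypothetical structure)."""
    distance_m: float        # metres travelled during this time step
    driver_took_over: bool   # True if the safety driver intervened


def episode_reward(steps: Iterable[Step]) -> float:
    """Sum the distance travelled until the first safety-driver intervention."""
    total = 0.0
    for step in steps:
        if step.driver_took_over:
            break  # the episode ends as soon as the driver takes control
        total += step.distance_m
    return total
```

A longer stretch of driving without intervention yields a larger reward, which is the signal the agent tries to maximize.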
While their autonomous vehicle technology continues to evolve, Wayve says that reinforcement learning still plays a part in motion planning (ensuring that a feasible path exists between the start and destination points).
Netflix has 200 million users in over 190 countries. For each of these users, Netflix aims to present the most entertaining and relevant videos. In the presentation 'Netflix Explains Recommendations and Personalization', Justin Basilico (Director of Machine Learning and Recommender Systems at Netflix) describes how the team achieves this by combining four key approaches: deep learning, causality, bandits & reinforcement learning, and objectives.
The challenge is to train a model that optimizes for a user’s long-term satisfaction rather than immediate gratification. Reinforcement learning can help by introducing exploration, which lets the model learn about new interests over time.
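As a rough illustration of what exploration means here, below is a minimal epsilon-greedy bandit sketch. It is a textbook technique, not Netflix's actual system, and the class name, the "arms" (think of them as candidate titles or rows) and the reward signal are all assumptions made for the example.

```python
import random


class EpsilonGreedyBandit:
    """Textbook epsilon-greedy bandit over a fixed set of arms."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon          # probability of exploring
        self.counts = [0] * n_arms      # times each arm was shown
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        """Explore a random arm with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        """Fold an observed reward into the arm's running mean."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With `epsilon=0` the bandit only ever recommends what already looks best. A small positive `epsilon` occasionally tries something else, which is how new interests can be discovered.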
Basilico notes that reinforcement learning is challenging to apply in this setting due to the high dimensionality and large problem space. To help with this, the team developed Accordion, a simulator for long-term training.
Walmart is the world's largest retailer and grocer with over 4,650 stores. Walmart must constantly move unsold inventory to make space for new and better-selling items. The usual strategy for moving unwanted stock is to reduce prices. This is a time-consuming and laborious undertaking that requires re-labelling discounted merchandise multiple times on a store-by-store basis.
To reduce operating costs, Walmart created an algorithm to optimize price reductions. The algorithm ingests sales data, operating costs, the number and type of merchandise, and the dynamic time frame within which the merchandise must be sold.
The approach applies data analytics, reinforcement learning, and dynamic optimization to make automated decisions for each individual product, and is tailored to each store. The result is lowered operating costs and increased sales, with some stores experiencing up to 15% higher sales of the stock to be moved.
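To give a feel for the optimization problem, here is a toy sketch that picks a weekly discount schedule by dynamic programming. It is not Walmart's algorithm and it is not reinforcement learning: the function name, the deterministic demand table and the revenue-only objective are all assumptions I have made to keep the example small.

```python
from functools import lru_cache


def best_markdown_plan(units, weeks, full_price, demand_by_discount):
    """Return (revenue, plan) maximizing revenue from clearing stock.

    demand_by_discount maps a discount fraction (e.g. 0.5 for 50% off)
    to the assumed number of units sold per week at that discount.
    """
    discounts = sorted(demand_by_discount)

    @lru_cache(maxsize=None)
    def solve(units_left, weeks_left):
        # Nothing left to sell, or no time left: no further revenue.
        if units_left == 0 or weeks_left == 0:
            return 0.0, ()
        best = (-1.0, ())
        for d in discounts:
            sold = min(units_left, demand_by_discount[d])
            revenue_now = sold * full_price * (1 - d)
            future, plan = solve(units_left - sold, weeks_left - 1)
            if revenue_now + future > best[0]:
                best = (revenue_now + future, (d,) + plan)
        return best

    return solve(units, weeks)
```

For example, `best_markdown_plan(10, 2, 10.0, {0.0: 3, 0.5: 8})` keeps the full price in the first week and applies the 50% discount in the second. A learning-based approach would have to estimate the demand table from sales data rather than assume it.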
Search.io is an AI search engine for on-site search queries. They use both 'learn-to-rank' and reinforcement learning techniques to improve their search ranking algorithm.
Learn-to-rank involves using a machine learning model trained on a dataset of query-result pairs scored based on their relevance. One disadvantage of this technique is that the inputs (query-result pair scores) remain static.
Reinforcement learning helps to improve the search algorithm over time using feedback in the form of clicks, sales, signups, etc. The challenge with applying reinforcement learning in this setting is that the search result quality typically starts out low, and needs time and data before it starts to meet customer expectations.
GPT-3 is a language model used to generate human-like text. A downside of these language models is the tendency to 'hallucinate' information when performing tasks that require obscure real-world knowledge. To improve this, OpenAI taught GPT-3 to use a text-based web browser. The model is able to search and collect information from web pages, and use these to compose answers to open-ended questions.
The model is initially trained using human demonstrations. From there, the helpfulness and accuracy of the model are improved by training a reward model to predict human preferences. The system is then optimized against this reward model using either reinforcement learning or rejection sampling. The resulting system was found to be more 'truthful' than GPT-3.
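The two ideas in this paragraph can be sketched in a few lines. Rejection sampling (often called best-of-n) keeps the candidate answer that the reward model scores highest. The pairwise loss shown is a common way to fit a reward model to human comparisons, but treating it as the exact loss OpenAI used is an assumption on my part, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise loss: small when the preferred answer scores higher.

    Equals -log(sigmoid(reward_preferred - reward_rejected)).
    """
    return math.log1p(math.exp(-(reward_preferred - reward_rejected)))


def best_of_n(candidates: Sequence[str],
              reward_model: Callable[[str], float]) -> str:
    """Rejection sampling: return the candidate the reward model likes most."""
    if not candidates:
        raise ValueError("need at least one candidate answer")
    return max(candidates, key=reward_model)
```

In practice the candidates would be several answers sampled from the language model for the same question, and `reward_model` would be the trained preference predictor.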
The financial industry has been reluctant to apply machine learning because of the high monetary risks. Even so, IBM has described a trading system trained with reinforcement learning.
The advantage of reinforcement learning in this setting is the ability to learn to make predictions that account for whatever effects the algorithm’s actions have had on the state of the market. This feedback loop allows the algorithm to auto-tune over time, continually making it more powerful and adaptable. The reward function is based on the profit or loss made in each trade.
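A reward based on per-trade profit or loss could look something like the sketch below. The signature, the `side` argument and the flat `fee` are assumptions for illustration, not details of IBM's system.

```python
def trade_reward(entry_price, exit_price, quantity, side="long", fee=0.0):
    """Profit (positive) or loss (negative) realized on one closed trade."""
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    # A long position gains when the price rises, a short one when it falls.
    direction = 1 if side == "long" else -1
    return direction * (exit_price - entry_price) * quantity - fee
```

For example, `trade_reward(100.0, 105.0, 10)` is a profit of 50.0, while the same price move on a short position is a loss.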
The model was assessed against a Buy-and-Hold strategy and ARIMA-GARCH (a forecasting model). IBM found that the model was able to capture head-and-shoulders patterns, which is a non-trivial feat.
Developing controllers for robotics is a challenging task. Typical methods rely on careful modelling, but they can be prone to failure when exposed to unexpected situations and environments.
A team at the University of California, Berkeley tried to address this by training a real bipedal robot using reinforcement learning. The team developed a model that produced more diverse and robust walking control for a robot named Cassie.
The deployed model was able to perform various behaviours such as changing walking heights, fast walking, walking sideways and turning in the real world. It was also robust to changes in the robot itself (e.g. partially damaged motors) and the environment (e.g. changes in ground friction and being pushed from different directions). You can watch Cassie in action in this video.
While reinforcement learning applications in the real world are still in their early days, I hope this list highlights the potential of the technology and the exciting progress that has already taken place. Who knows what else we might see in the next few years with ongoing developments in data collection, simulations, processing power, and research?
Thanks for reading!