Notes on Reinforcement Learning 11: Why pay a psychologist when you have ChatGPT
Topic recommendations for psychologists, code smells and the loss of plasticity in Neural Networks
Good Morning everyone!
I hope you had a great weekend. I myself had a great week, climbing a lot… even at night! (Not many people know this, but Madrid is THE place in Europe for granite climbing).
However, you are probably expecting more Reinforcement Learning and less climbing, so here I bring you very short summaries of three papers that I found interesting.
Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics
In this paper, Baihan Lin, Guillermo Cecchi, and Djallel Bouneffouf propose a Reinforcement Learning Psychotherapy AI Companion that generates topic recommendations for therapists by analyzing patient responses. The algorithm is designed to address four distinct psychiatric conditions: anxiety, depression, schizophrenia, and suicidal tendencies.
The authors evaluate the effectiveness of the algorithm by measuring its ability to accomplish the task, establish a bond with the patient, and achieve the desired outcome. The results indicate that the algorithm can capture data relatively accurately, but there is clearly much progress to be made before it is useful for practitioners.
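As a very rough illustration of the general idea (this is my own sketch, not the authors' method), a topic recommender of this kind can be framed as a bandit problem: each session topic is an arm, and a score inferred from the patient's responses is the reward. A minimal epsilon-greedy version, with invented topic names and a simulated feedback signal:

```python
import random

# Hypothetical sketch only: an epsilon-greedy bandit over session topics.
# Topic names and the simulated feedback are invented, not taken from the paper.
TOPICS = ["sleep", "work stress", "relationships", "medication", "coping skills"]

counts = {t: 0 for t in TOPICS}
values = {t: 0.0 for t in TOPICS}   # running average reward per topic
epsilon = 0.1                        # exploration rate

def simulated_feedback(topic):
    # Stand-in for a score extracted from the patient's responses.
    base = {"sleep": 0.4, "work stress": 0.6, "relationships": 0.7,
            "medication": 0.3, "coping skills": 0.5}[topic]
    return base + random.uniform(-0.2, 0.2)

for step in range(1_000):
    if random.random() < epsilon:
        topic = random.choice(TOPICS)        # explore
    else:
        topic = max(values, key=values.get)  # exploit the current best estimate
    reward = simulated_feedback(topic)
    counts[topic] += 1
    values[topic] += (reward - values[topic]) / counts[topic]

print(max(values, key=values.get))  # topic the recommender has converged on
```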
Prevalence of Code Smells in Reinforcement Learning Projects
This paper was written by Nicolás Cardozo, Ivana Dusparic and Christian Cabrera.
The authors hypothesize that the quality of RL code is often suboptimal because it is not exclusively developed by RL engineers, which can result in program quality issues, bugs, and maintainability and evolution problems.
To investigate this, the authors studied 24 commonly used RL-based Python projects, analyzing code quality with standard software engineering metrics. They found that, on average, 3.95% of the code base contained code smells, indicating suboptimal code quality.
The study measured eight different code smells:
Long Method
Large Class
Long Parameter List
Long Method Chain
Long Scope Chaining: multiply-nested functions
Long Ternary Conditional Expression
Multiply-Nested Container: containers inside containers
Long Lambda Function
The most frequent code smells identified were "long method" and "long method chain," which highlight issues with the definition and interaction of agents. Additionally, "responsibility separation" and the appropriateness of current abstractions for the definitions of RL algorithms were also identified as common code problems.
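To make a couple of these smells concrete, here is a hypothetical Python sketch (all names are invented for illustration, not taken from the study) showing a long parameter list and a long method chain in an RL-flavoured context:

```python
# Hypothetical illustration of two of the smells above; all names are invented.

# "Long Parameter List": every hyperparameter exposed as a separate argument.
def train_agent(env_name, seed, gamma, lr, batch_size, buffer_size,
                epsilon_start, epsilon_end, epsilon_decay,
                target_update_freq, max_steps):
    # A real trainer would build the environment, agent and loop here;
    # returning the settings keeps the sketch runnable.
    return locals()

# "Long Method Chain": a fluent builder where configuration becomes one long chain.
class ExperimentConfig:
    def __init__(self):
        self.settings = {}

    def set(self, key, value):
        self.settings[key] = value
        return self  # returning self is what enables (and encourages) chaining

config = (
    ExperimentConfig()
    .set("env", "CartPole-v1")
    .set("seed", 42)
    .set("gamma", 0.99)
    .set("lr", 3e-4)
    .set("batch_size", 64)
    .set("buffer_size", 100_000)
)

print(train_agent("CartPole-v1", 0, 0.99, 3e-4, 64, 100_000,
                  1.0, 0.05, 0.999, 1_000, 50_000))
print(config.settings)
```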
The authors conclude with some potential improvements:
Developing specialized software quality metrics that are specifically tailored to RL projects.
Creating more expressive abstractions that can accommodate the complexity of RL problems and algorithms in a simpler way. This could help to make RL code more accessible and easier to understand, which in turn could lead to better program quality and performance.
My opinion
There is an ongoing debate on the data side of the software engineering community about whether some of these code smells should be categorized as such. Long parameter lists are very common in statistics, and rightfully so, since every mathematical model exposes many kinds of tuning through different parameters. Also, one of the best ways to program in pandas involves long method chains, which some people despise but which are the bread and butter of the data specialist (see here for more info on that one), and far more practical and efficient than the alternatives.
I therefore disagree with the code smells proposed, but I agree with the authors' conclusion that there is a lot of work to be done on the software engineering side of things to improve the way code is written in our realm. For starters, I think the SOLID principles should be the status quo of code analysis, and they hold true for RL software engineering as for any object-oriented programming framework.
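As a minimal sketch of what I mean (the data and column names are invented), this is the kind of method chain that is idiomatic pandas rather than a smell:

```python
import pandas as pd

# Invented example data: per-episode results from two hypothetical runs.
df = pd.DataFrame({
    "run": ["a", "a", "b", "b", "b"],
    "reward": [1.0, 2.0, 0.5, 1.5, 2.5],
    "steps": [100, 120, 90, 110, 130],
})

# One readable chain: derive a column, aggregate per run, and sort the result.
summary = (
    df
    .assign(reward_per_step=lambda d: d["reward"] / d["steps"])
    .groupby("run", as_index=False)
    .agg(mean_reward=("reward", "mean"),
         mean_reward_per_step=("reward_per_step", "mean"))
    .sort_values("mean_reward", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

Each step reads top to bottom and avoids the throwaway intermediate variables the non-chained version would need.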
Loss of Plasticity in Continual Deep Reinforcement Learning
In this paper, authors Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White and Marlos C. Machado demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games.
They analyze this phenomenon (loss of plasticity) across a broad set of experiments, concluding that the culprit is that, with training, the activation footprint of the network becomes sparser, contributing to diminishing gradients.
They propose a mitigation strategy, the Concatenated ReLU (CReLU) activation function, and demonstrate its effectiveness.
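For reference, CReLU concatenates the positive and negative rectifications of the pre-activations, so for any nonzero input one of the two halves is active and passes a gradient. A minimal PyTorch sketch (my own, not the authors' code):

```python
import torch
import torch.nn as nn

class CReLU(nn.Module):
    """Concatenated ReLU: returns [relu(x), relu(-x)] along the feature
    dimension, doubling the width but ensuring at least one half of every
    unit stays active for any nonzero input."""
    def __init__(self, dim: int = -1):
        super().__init__()
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat((torch.relu(x), torch.relu(-x)), dim=self.dim)

# Usage: the doubled feature size must be accounted for in the next layer.
net = nn.Sequential(nn.Linear(8, 16), CReLU(), nn.Linear(32, 4))
print(net(torch.randn(5, 8)).shape)  # torch.Size([5, 4])
```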
Classification of papers (13th-19th March)
These are the papers on Reinforcement Learning announced from the 13th through the 19th of March on arxiv.org, classified by area of research.
Engineering Applications
Energy
Image
Industrial Control Systems
Navigation
Networks
Programming
Robotics
Reinforcement Learning Theory
Actor-Critic
Continual RL
Exploration Methods
Explainable/Interpretable RL
Graph RL
Markov Decision Processes / Deep Theory
Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring
Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators
Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP
Multi-Agent
Multi-objective
Model-based
Needs-driven
Offline RL
Offline-to-Online
Planning
Policy optimization
Q-Learning
Reinforcement Learning from Human Preferences/Feedback
Representation Learning
Robust RL
Reward optimization
World Model Learning
Legal Applications
Financial Applications
Human-agent interaction
Game Theory
Psychology
Healthcare Applications
Natural Language Processing
That is all for this week! Have a great one, everyone!