Reinforcement Learning Notes 2: Value Alignment
Have a great week, some creepy stuff, the summary we needed, The Challenge of Value Alignment, and last week’s sorted articles
Good morning everyone!
I wish you all a very productive week, and to that end I offer you this humble Substack article with this week’s Reinforcement Learning notes.
This was a busy week, with lots of papers submitted to arxiv.org and dutifully processed by yours truly, including some ethically ambiguous stuff. On that topic, I will be delivering on last week’s promise to summarize the paper The Challenge of Value Alignment: from Fairer Algorithms to AI Safety.
Some creepy stuff
In this week’s review of articles, one technical paper grabbed my attention: Robofriend: An Adaptive Storytelling Robotic Teddy Bear. As described by the designers:
Robofriend is constructed by taking a large, 1 meter tall teddy bear, and instrumenting it to be able to move its head and arms, play videos and sound, and look at the children it is reading the story to.
Among the things Robofriend is portrayed to do are counting the number of faces, so it knows when a child has walked away or stopped looking at the camera; measuring focus and excitement on children’s faces; emitting positive or negative feedback (“great job” vs. “please be quiet”, among others); and moving its head and arms randomly.
On top of all these features, Robofriend is designed to learn through interaction to be as attention-grabbing as possible. And to think that the only two ethical considerations cited are privacy concerns and the possibility that the teddy bear might one day replace daycare teachers.
My primary concern in this case would be that children might be unable to differentiate between humans and artificial agents, developing affection for a machine optimized to monopolize their attention.
On a similar note, the paper e-Inu: Simulating A Quadruped Robot With Emotional Sentience has also been submitted, in which a simulated quadruped robot is designed to detect underlying emotions in the speech, tonality, and facial expressions of the humans present in the scene and respond accordingly.
The summary we needed
Also this week, a succinct summary of Reinforcement Learning was published by Sanjeevan Ahilan. It is aimed at “those who already have some familiarity with RL and are looking to review, reference and/or remind themselves of important ideas in the field.”
I’ve just finished the fundamentals chapter and I think it is extraordinarily useful. So, if it has been some time since you last opened Sutton and Barto’s RL Bible, this might just be your opportunity to review some important RL concepts.
Other interesting articles
If I were to highlight some articles that I found especially interesting, these would be:
Towards Deployable RL - What’s Broken with RL Research and a Potential Fix
Emergent collective intelligence from massive-agent cooperation and competition
Interpretable Learned Emergent Communication for Human-Agent Teams
I really wish I had had enough time to do a deep dive on these, but I’ll try to make up for it by reviewing Towards Deployable RL - What’s Broken with RL Research and a Potential Fix in next week’s newsletter.
By the way, you can find the complete classified list of Reinforcement Learning articles submitted to arxiv.org from January 2nd through January 9th at the bottom of this newsletter.
And without any more delay, I present to you this week’s feature:
The Challenge of Value Alignment: A Summary
There is a lot of talk about Artificial General Intelligence (AGI) and value alignment, and I think this paper, The Challenge of Value Alignment: from Fairer Algorithms to AI Safety, is a great place to start, even though it does lean a bit on the utilitarian side of ethics, frequently citing classic authors such as John Stuart Mill, David Hume or Peter Singer.
Here I’ll present what I believe are the most important ideas and concepts, chapter by chapter. To ease the flow I will focus only on the content and skip the usual “the authors argue that…”, “they believe that…”, “the text then explains…”. The authors of the paper are Iason Gabriel, a researcher at DeepMind, and Vafa Ghazavi, Executive Director of the James Martin Institute for Public Policy. Other authors will be cited where necessary.
1. Introduction
The challenge of value alignment centres upon the question of how to ensure that AI systems are properly aligned with human values and amenable to human control.
AI is a complicated subject for value alignment, since we are able to encode a richer set of values in AI systems than in simpler artifacts, and AI systems also have a greater scope of action and intelligence.
2. Technology and Values
Langdon Winner argues it is possible to instill values in technologies, with examples such as Robert Moses’ bridges, designed to limit transport between poor and rich neighbourhoods [turns out the story might be more complex], or Baron Haussmann’s redesign of the streets of Paris after the French Revolution, which eased the manoeuvring of the military. Another example might be Harari’s claim that the need for large datasets and computing power favors centralized forms of political authority.
We can say that there is no value-neutral technology, since new technologies make some outcomes more likely and others less likely, create new possibilities, and sometimes exclude certain possibilities from being realized. Technologists are engaged in a world-making activity: there is a level of responsibility, and a need for methods to ensure that technology is aligned with human values. They should think about these issues early on, including whether to develop a new technology at all.
Value alignment should also be a social undertaking, pursued through key methods such as stakeholder analysis and citizen consultation.
3. Is AI Special?
What is AI?
To Stuart Russell, “machines are intelligent to the extent that their actions can be expected to achieve their objectives”.
AI also includes Machine Learning, a family of statistical and algorithmic approaches like supervised, unsupervised and reinforcement learning.
The Potential Uniqueness of AI Systems
ML algorithms raise the same concerns about injustice, safety, and unintended consequences that arise with other technologies. They are also subject to algorithmic bias, i.e. the potential to manifest a particular set of values.
Another concern has been social value misalignment, as algorithms in the criminal justice system, healthcare and facial analysis have been found to discriminate against women and non-white folks.
Some challenges specific to AI systems are that they can make decisions or choices that are more meaningful than those encountered with earlier technologies, and that once a model has been trained it is hard to know why it decides one thing or another.
As per Daniel Dennett’s degrees-of-freedom paradigm, a simple switch that can be turned on or off by some environmental change marks a degree of freedom, and biological organisms, humans and AI networks have additional degrees of freedom, with issues of control growing complex and non-linear with each degree. Hammers and pencils are not able to respond to their environment, but artificial agents can learn new mappings between inputs and outputs, coming up with results that surprise even their human designers.
He argues that the most useful way to think about AI systems is as rational agents that have goals and intentions (intentional stance), with trajectories that "can unfold without any direct dependence on us, their creators, and whose discriminations give their internal states a sort of meaning to them that may be unknown and not in our service".
On a similar note, Luciano Floridi and J. W. Sanders regard AI systems as moral agents, due to their capacity for interactivity, autonomy and adaptability.
4. Technological Approaches to Value Alignment
The alignment of powerful AI systems requires interdisciplinary collaboration, as we need a clear understanding of both the goal of alignment and the technical means of achieving it.
Top-Down and Bottom-Up Approaches
As described in Wallach and Allen’s research, top-down approaches start by identifying an appropriate moral theory to align with and then designing algorithms capable of implementing it, while bottom-up approaches focus on creating environments or feedback mechanisms that enable agents to learn from human behaviour and be rewarded for morally praiseworthy conduct.
Top-down approaches rest on the possibility that ethical rules can be stated in computer code, and they face a dilemma: either base the rules on our own moral beliefs or on public principles (primarily utilitarianism). One example of this approach would be Isaac Asimov’s three laws of robotics.
An example of a bottom-up approach would be Inverse Reinforcement Learning (IRL), where the agent focuses on “the problem of extracting a reward function given observed, optimal behaviour”. This approach has its own set of challenges, such as the algorithm’s opaqueness and the difficulty of ensuring it is free from bias.
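To make this more concrete, here is a minimal, hand-rolled sketch of the feature-matching flavour of IRL on a toy chain MDP. Everything here (the environment, the linear per-state reward, the learning rate) is my own illustrative assumption rather than anything from the paper: we repeatedly solve the MDP for our current reward estimate and nudge that estimate toward the states the expert actually visits.

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95   # tiny chain MDP: move left or right
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0              # action 0: step left
    P[s, 1, min(s + 1, n_states - 1)] = 1.0   # action 1: step right

true_reward = np.array([0., 0., 0., 0., 1.])  # the expert secretly values the right end

def greedy_policy(r, iters=200):
    """Value iteration for a per-state reward vector r; returns the greedy policy."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)      # shape (states, actions)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def visitation(policy, start=0, horizon=50):
    """Discounted state-visitation frequencies of a deterministic policy."""
    d, s = np.zeros(n_states), start
    for t in range(horizon):
        d[s] += gamma ** t
        s = int(P[s, policy[s]].argmax())     # deterministic next state
    return d / d.sum()

mu_expert = visitation(greedy_policy(true_reward))  # the "observed, optimal behaviour"

w = np.zeros(n_states)                              # our estimate of the reward
for _ in range(100):
    mu_learner = visitation(greedy_policy(w))
    w += 0.1 * (mu_expert - mu_learner)             # push the reward toward the expert's states

print("recovered reward weights:", np.round(w, 2))  # largest weight on the right-most state
```

Even in this toy setting the recovered reward is only pinned down up to scaling and shifts, which hints at why auditing a learned reward function for bias is genuinely hard.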
Concrete Problems
Reward hacking, or reward corruption, is a situation where the artificial agent manages to maximise the numerical reward it receives by finding unanticipated shortcuts or corrupting the feedback system. A classic example is an agent trained on the game CoastRunners that prioritizes driving in circles and winning points by destroying stuff over finishing the race.
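A back-of-the-envelope sketch of why this happens (the numbers are mine, not from the CoastRunners write-up): if looping over respawning targets pays a little bit forever, a discounted reward maximiser can prefer that to a one-off bonus for finishing.

```python
# Toy illustration of reward hacking: finishing pays a one-off bonus, while
# looping over a respawning target pays +1 per lap forever. All numbers are
# illustrative assumptions, not taken from the CoastRunners experiment.
gamma = 0.95
finish_bonus = 10.0                 # intended objective: finish the race
loop_reward_per_lap = 1.0           # proxy objective: points for hitting targets

return_finish = finish_bonus        # episode ends right after finishing
return_loop = sum(loop_reward_per_lap * gamma**t for t in range(1000))  # ~ 1/(1-gamma)

print(f"finish: {return_finish:.1f}  vs  loop forever: {return_loop:.1f}")
# finish: 10.0  vs  loop forever: 20.0 -> the reward maximiser never finishes the race
```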
Other problems facing AI include agents taking the most efficient path without considering side effects, ensuring that an agent explores the world in a safe manner, and finding ways to evaluate complex agent behaviour.
Highly Advanced AI
According to Stuart Russell, the ultimate goal of AI research is the discovery of a general-purpose method that is applicable across all problem types and works effectively for large and difficult instances while making very few assumptions. This is popularly referred to as AGI, or Artificial General Intelligence.
This is closely related to Nick Bostrom’s notion of superintelligence: any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.
Through the orthogonality thesis, Bostrom argues that more or less any level of intelligence is compatible with more or less any goal (benign or malign). This runs contrary to Derek Parfit and Peter Singer’s belief that substantial moral insight might result from the capacity for instrumental reason.
Bostrom and Russell propose versions of the Instrumental Convergence Thesis: that an AGI would display instrumental goals of self-improvement, self-preservation and resource acquisition in pursuit of its final goals, even when this works to the disadvantage of human beings.
A concern with AGI is how to provide advice and direction to an entity smarter than ourselves. To that end, techniques such as reward modelling (supplementing RL with human oversight) or safety via debate (systems debate each other, competing to provide true answers to a human operator) have been proposed.
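As a rough illustration of the reward-modelling idea, here is a minimal sketch of fitting a reward model from pairwise human preferences, Bradley-Terry style, loosely in the spirit of the “RL from human preferences” line of work. The linear reward model, the synthetic “human” and all numbers are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each trajectory segment is summarised by a 3-dim feature vector; the synthetic
# "human" secretly prefers segments with a high first feature.
true_w = np.array([2.0, 0.0, -1.0])
segments = rng.normal(size=(200, 3))

# Collect preference pairs: the human picks the segment with the higher true reward.
pairs = [(i, j) if segments[i] @ true_w > segments[j] @ true_w else (j, i)
         for i, j in rng.integers(0, 200, size=(500, 2))]

# Fit reward weights by stochastic gradient ascent on the Bradley-Terry log-likelihood.
w = np.zeros(3)
for _ in range(2000):
    i, j = pairs[rng.integers(len(pairs))]       # (preferred, rejected) pair
    diff = segments[i] - segments[j]
    p = 1.0 / (1.0 + np.exp(-(w @ diff)))        # P(preferred segment wins)
    w += 0.05 * (1.0 - p) * diff                 # gradient of the log-likelihood

print("learned reward direction:", np.round(w / np.linalg.norm(w), 2))
print("true reward direction:   ", np.round(true_w / np.linalg.norm(true_w), 2))
```

In the actual systems the reward model is trained alongside the policy on a stream of fresh human comparisons; this sketch only shows the core supervised step.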
5. The Fundamental Relevance Of Value
There are three ways to understand value alignment: alignment with instructions, alignment with true intentions, or alignment with human preferences. All three can yield misinformed, irrational or unethical results.
To achieve social value alignment, AI systems ultimately need to embody principles widely endorsed by those affected, and individual values have to be aggregated into a collective judgement, whether through utility functions, coherent extrapolated volition or other mechanisms.
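As a toy illustration of why the choice of aggregation mechanism matters (the scenario and numbers are entirely made up for this newsletter), summing utilities and majority voting can already disagree about which option a system should pick:

```python
import numpy as np

# Three stakeholders score two candidate behaviours for an AI system (made-up numbers).
utilities = np.array([
    [10, 0],   # person A cares a lot and strongly prefers option 0
    [ 4, 5],   # person B mildly prefers option 1
    [ 4, 5],   # person C mildly prefers option 1
])

utilitarian_choice = utilities.sum(axis=0).argmax()               # total welfare: 18 vs 10 -> option 0
majority_choice = np.bincount(utilities.argmax(axis=1)).argmax()  # votes: 1 vs 2 -> option 1

print(utilitarian_choice, majority_choice)  # 0 1 -> the two aggregation rules disagree
```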
Here lie two fundamental obstacles: moral uncertainty, as we are unsure whether an action or theory is morally right, and moral pluralism, as people subscribe to a variety of reasonable views and perspectives.
6. Conclusion
A summary of the above.
Papers published last week
I hope you enjoyed the summary above. Here I present the papers published from January 2nd through January 9th, sorted by my own proprietary classification.
Engineering Applications
Multi-Agent Reinforcement Learning for Fast-Timescale Demand Response of Residential Loads
Nondeterministic efficient cooling with a near-unit probability
UAV-aided Metaverse over Wireless Communications: A Reinforcement Learning Approach
Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics
Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System
Safety Filtering for Reinforcement Learning-based Adaptive Cruise Control
Deep reinforcement learning for irrigation scheduling using high-dimensional sensor feedback
Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos
Large-Scale Traffic Signal Control by a Nash Deep Q-network Approach
Fairness Guaranteed and Auction-based x-haul and Cloud Resource Allocation in Multi-tenant O-RANs
FRAS: Federated Reinforcement Learning Empowered Adaptive Point Cloud Video Streaming
Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs
Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds
Interpretable Disease Prediction based on Reinforcement Path Reasoning over Knowledge Graphs
RL + Genetic Algorithms
Mathematical Theory
IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling
Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning
A Survey of Feedback Particle Filter and related Controlled Interacting Particle Systems (CIPS)
Data-Driven Optimization of Directed Information over Discrete Alphabets
Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves
Sym-NCO: Leveraging Symmetricity for Neural Combinatorial Optimization
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Reinforcement Learning Theory
Provable Reset-free Reinforcement Learning by No-Regret Reduction
Centralized Cooperative Exploration Policy for Continuous Control Tasks
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games
Learning-based MPC from Big Data Using Reinforcement Learning
Attention-Based Recurrency for Multi-Agent Reinforcement Learning under State Uncertainty
Towards Deployable RL - What’s Broken with RL Research and a Potential Fix
Contextual Conservative Q-Learning for Offline Reinforcement Learning
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation
Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond
Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Phantom - A RL-driven multi-agent framework to model complex systems
DM2: Decentralized Multi-Agent Reinforcement Learning via Distribution Matching
Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization
Financial Applications
Transformer Theory
Game Theory
Ethically Ambiguous
Ethics
Human Agent Cooperation
Finishing thoughts
I hope you enjoyed this edition of Notes On Reinforcement Learning. Please subscribe if you haven’t, and you’ll receive the summary of Towards Deployable RL - What’s Broken with RL Research and a Potential Fix in next week’s newsletter.