Departmental Colloquium: Loopholes: A window into goal communication and value alignment

Sophie Bridgers
Wednesday, March 29, 2023, 4:00 pm – 5:30 pm

Finding and exploiting a loophole, a possible but unintended interpretation of a rule or request, is a familiar facet of fable, law, and everyday life. A child whose parent says “It’s time to put the tablet down” may respond by physically putting the tablet down on the floor and continuing to play with it. Engaging with loopholes requires a nuanced understanding of goals, social ambiguity, and value alignment, and offers a new lens through which to examine human communication and cooperation. In this talk, I will first present a proposal for a formal framework of goal communication that supports intentional misunderstandings. Second, I will share a series of experiments with children and adults that (1) reveal that loophole behavior emerges around age five and is prevalent and diverse in parent-child and adult-adult interactions, and (2) indicate that adults and children evaluate loopholes more leniently than outright non-compliance and predict that others will exploit loopholes when there is pressure to cooperate but goals are in conflict. I will conclude with a discussion of the development of loophole behavior and its implications for improving communication among humans, as well as for safer human-AI interactions.