CSE 599J: Social Reinforcement Learning

Instructor: Natasha Jaques (nj at cs)

TAs:
Sriyash Poddar (sriyash at cs)

Lecture: MW 1:30 PM - 2:50 PM

Location: LOW 105

Remote Meeting Link: meet.google.com/ovx-ykoh-rdi

Office Hours:

Natasha (Gates 234): Monday 4-5pm
Sriyash (Gates 223): Friday 12:30-1:30pm

Important Links

  • Ed discussion board (link)
  • Presentation sign up sheet (link)
  • Paper presentation schedule (TBD)
  • Submit anonymous feedback here

Overview

How can we accelerate AI when learning in an environment with other intelligent agents? This course focuses on Social Reinforcement Learning in multi-agent and human-AI interactions. From studying the natural world, we know that social learning is an incredibly powerful mechanism that helps both humans and animals rapidly adapt to new circumstances and coordinate with others, and that drives the emergence of complex learned behaviors. From recent advances in AI, we know that reinforcement learning from human feedback (RLHF) is an incredibly powerful mechanism for improving the capabilities and alignment of large models. This course will link these two perspectives, examining the complexities of modeling, learning from, and coordinating with other agents, whether those agents are humans or other RL agents in a simulation. We will study how social learning can address fundamental issues in AI like learning and generalization, as well as improve the ability of AI to coordinate and interact with people.

Although we will cover a brief introduction to reinforcement learning (RL), familiarity with RL and deep learning is encouraged. The course is a project course; in addition to reading and discussing relevant research papers, students will submit a team-based final project in the form of a research paper.

Schedule Overview

  • Introduction to RL and multi-agent RL
  • Discussion of interesting multi-agent RL papers
  • Coordination with humans and SOTA MARL
  • Emergent complexity
  • Social learning
  • Learning from humans (including inverse RL, language-conditioned RL)
  • RLHF history
  • RLHF latest developments
  • RLAIF / Multiagent LLMs
  • Project presentations

Class Format

Classes will be split so that on approximately half the days Natasha will present a ~50 minute lecture followed by ~30 minutes of questions and discussion, and on the remaining days we will have a student-led ‘Discussion’ class.

For student Discussion days, we will have students prepare presentations about 3 papers. Each presentation will be 10 minutes, followed by 5 minutes of clarifying questions about that paper. After the papers are presented, we will break into groups to discuss each of the papers, the themes that connect them, and interesting and impactful research directions that relate to them. In summary, Discussion days will have the following format:

  • 45 minutes of paper presentations
    • Each is 10 minutes of presentation, 5 minutes of questions
  • 25 minutes of group discussions

Grading

  • Class participation (15%)

    Because this is primarily a discussion course, to make it work we need students to attend class in person, ask questions, and participate in paper discussions. Therefore, a portion of your grade depends on doing this. To get the full 15%, you need to read the relevant papers, show up on time, and make comments or ask questions in at least 14/17 classes (you are allowed to miss 3; see the Course Policies section). However, we hope you will go beyond this minimum requirement and make the most of the class by actively participating and sharing your thoughts and questions on the research we are learning about.

  • Paper reflections (10%)

    We will have 9 Discussion classes, for each of which you will prepare a 300-word paper reflection to be uploaded to EdSTEM (due at 11:59pm two days before the class, i.e., Saturday or Monday). Reflections will be graded on a pass/fail basis. As per the Course Policies, you can miss submitting 1 reflection with no penalty. Presenters still need to submit reflections for the papers they are presenting.

  • Lead discussion (10%)

    During the quarter, every student is expected to present at least once. Use the class schedule spreadsheet to sign up for a particular presentation time, and choose the paper you would like to present. Add a link to your presentation slides to the spreadsheet no later than 11:59pm two days before class. Because we currently have more students than presentation slots, you are allowed to present in pairs. However, to encourage students to present alone, we will give a 5% bonus to those who do a solo presentation. For more information about the criteria we will use to grade the presentations, please see Paper Presentations.

  • Peer review (15%)

    Projects in the course will be evaluated in part using a peer review system, such as OpenReview. You will be responsible for submitting 3 reviews of other students’ papers in the system, which will be due 1 week after the project is due, by 11:59pm on May 31. Each review will be worth 5% of your grade and will be graded based on whether it is thorough, complete, fair, and whether it gives the authors useful feedback for improving their paper. Examples of how to give good reviews of machine learning papers can be found online, for example here.

  • Course project (50%)

    Proposal (5%): due 11:59pm April 17, 2024
    Project presentation (10%): in class on May 22 and May 29.
    Writeup (35%): due 11:59pm May 22, 2024

Students are encouraged to form groups of up to 4 students. Projects with more students are expected to be more impactful, such that a project with 3 students should be roughly 3x as impressive as a project with 1 student. Please include a statement of contributions in the appendix of the paper describing what each person did. However, don’t be discouraged from teaming up into groups! In the ideal case, you can aim to write a paper that could potentially be submitted to a top-tier ML conference (note that the NeurIPS deadline is 5/22), and you may need more students to pull this off.

Course Policies

Late submissions and absences

To reduce the burden on instructors, create flexibility for students, and maintain consistent treatment across students, we will allow you to drop 1/9 paper reflections with no penalty. Your participation grade will also be based on participating in 14/17 class discussions, so you can miss 3 classes with no penalty. We understand that sometimes you may need to miss class due to travel obligations, or various other circumstances. Our goal is that for most of these issues you do not need to contact us about it. It will not affect your grade unless you go beyond 3 classes.
Late assignments will not be graded by default. The due dates have been set to be as late as possible while still allowing for feedback on the presentation slides and prompt grading of the course project.

DEI

This course welcomes all students of all backgrounds. The computer science and computer engineering industries have a significant lack of diversity. This is due to insufficient past efforts by the field toward greater diversity, equity, and inclusion. The Allen School seeks to create a more diverse, inclusive, and equitable environment for our community and our field. You should expect and demand to be treated with respect by your classmates and by me.

  • If any incident occurs that challenges this commitment to a supportive, diverse, inclusive, and equitable environment, please let me know so the issue can be addressed. I have created an anonymous feedback form to make this easier. If you have any feedback or suggestions, or experience any issues related to diversity, equity, and inclusion, and would like to report them anonymously, please use the form. Supporting DEI is a process that requires continual learning and growth, and I value your feedback on how we can improve the course on this dimension.
  • You can also submit feedback through the Allen School’s anonymous feedback form.

Generative AI

Do not use generative AI tools to write your paper reflections or class paper. This will inhibit your ability to gain useful skills from taking this course, which is the whole point.

DRS accommodations

Embedded in the core values of the University of Washington is a commitment to ensuring access to a quality higher education experience for a diverse student population. Disability Resources for Students (DRS) recognizes disability as an aspect of diversity that is integral to society and to our campus community. DRS serves as a partner in fostering an inclusive and equitable environment for all University of Washington students. The DRS office is in 011 Mary Gates Hall. Please see the UW resources at the DRS website. If you have DRS accommodations that the course staff should know about, please contact us at the beginning of the course.

Religious accommodations

Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW’s policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy. Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form.

Sexual harassment

University policy prohibits all forms of sexual harassment. If you feel you have been a victim of sexual harassment or if you feel you have been discriminated against, you may speak with your instructor, teaching assistant, or the chair of the department, or you can file a complaint with the UW Ombudsman’s Office for Sexual Harassment. Their office is located at 339 HUB, (206) 543-6028. There is a second office, the University Complaint Investigation and Resolution Office (UCIRO), which also investigates complaints. The UCIRO is located at 22 Gerberding Hall. Please see additional resources at the UW office of Ombud.

Land acknowledgement

The University of Washington acknowledges the Coast Salish peoples of this land, the land which touches the shared waters of all tribes and bands within the Suquamish, Tulalip and Muckleshoot nations.

Resources: For additional resources, see CSE Students and Student Resources.

Collaboration

Programming projects are designed for groups of up to 4 students. Each group should produce its own writeup and code.

We encourage you to discuss all course activities with your friends and classmates as you work through them. Feel free to talk through struggles with your peers as long as you follow the academic misconduct warnings that have been relayed in every course you’ve taken thus far. It’s okay to look at online resources as long as sources are cited and code isn’t copied.

Here’s a reference in case you need a refresher.

Disclaimer

I reserve the right to modify any of these plans as need be during the course of the class; however, I won’t do anything capriciously, anything I do change won’t be too drastic, and you’ll be informed as far in advance as possible.

Acknowledgements

We thank Brian Hou, Abhishek Gupta, and Zoey Chen for providing us with the template for the website.