General information

Course Logistics

Theme

This is a capstone course. This quarter, we will work on projects that apply large language models (e.g., ChatGPT and BioGPT) to biomedicine.

Prerequisites and Grading

Prerequisites: Students entering the class should be comfortable with programming and should have a working knowledge of linear algebra (MATH 308), vector calculus (MATH 126), probability and statistics (CSE 312/STAT 390), and algorithms. For a brief refresher, we recommend consulting the linear algebra and statistics/probability reference materials on the Textbooks page.

Grading: Your grade will be based on the course project (80%), participation in class (10%), and participation in the Research Showcase (10%).

Course project: Every student works on an individual project, not in a team, although we will have plenty of discussion in class. Students may be asked to give a brief progress update in each class. Milestones: mid-term presentation, final presentation, and final project report.

Research Showcase: Each session is a 45-minute invited presentation on ongoing computational biology research by Allen School PhD students and the instructor. The instructor will then lead a discussion of the work's limitations, potential improvements, and future directions.

Tentative Schedule

Date | Content | Reading | Due
3/28 | Welcome/overview. Introduction to CSE428. (Sheng) | Molecular Biology for Computer Scientists; Central dogma (10 mins); Transcription/translation |
3/30 | How to formulate a computational biology problem as a natural language processing problem? (Sheng) | Multilingual translation for zero-shot biomedical classification using BioTranslator; ProTranslator: zero-shot protein function prediction using textual description; Slides |
4/4 | Project idea presentation (students) | A Review of Biomedical Datasets Relating to Drug Discovery: A Knowledge Graph Perspective | Project idea presentation (10 minutes per student). Put slides here.
4/6 | Project idea presentation (students) | | Project idea presentation (10 minutes per student)
4/11 | Literature review (students) | |
4/13 | Literature review (students) | |
4/18 | Benchmark for success, potential pitfalls, alternative approaches (students) | |
4/20 | Benchmark for success, potential pitfalls, alternative approaches (students) | |
4/25 | Work time (coding, questions, experiments) | |
4/27 | Work time (coding, questions, experiments) | |
5/2 | Mid-term project presentation (part 1, students) | |
5/4 | Mid-term project presentation (part 2, students) | |
5/9 | Research Showcase (Addie Woicik, NLP for biomedical data augmentation) | |
5/11 | Research Showcase (Hanwen Xu, NLP for retrosynthesis) | |
5/16 | Project updates, work time | |
5/18 | Research Showcase (Tong Chen, NLP for COVID protein interaction prediction) | |
5/23 | Project updates, work time | |
5/25 | Research Showcase (Zixuan Liu, NLP for biomedical noisy label learning) | |
5/30 | Final project presentation (part 1, students) | |
6/1 | Final project presentation (part 2, students) | |