Biomedical Informatics 214 (also listed as Computer Science 274 and Genetics 214)
Representations and Algorithms for Computational Molecular Biology 

Spring 2009


Lectures: Tuesdays and Thursdays 3:15pm-4:45pm in Thornton 102 (Live on E3).
Sections: TBA. Sections will not be held every week. Watch the schedule below and the class emails for information on section dates and topics.
Internet: BMI 214 is also offered as streamed Internet video through the Stanford Center for Professional Development.

Table of Contents

Announcements
Homeworks
Class Schedule
Sections
General Course Information

Course Wiki - post questions here


Announcements 

June 17:

Grade Histogram is available.

June 10:

Project 4 grades are now available.

June 08:

Assignment 3 grades are now available.

June 07:

Project 3 Grades are now available.
Mean/Stdev is 42.33/5.34.

June 1:

The final will be held in Alway M-112 on Friday, June 5th, from 12:15 to 3:15 PM.

June 1:

Please fill out your anonymous class surveys on Axess, and email the staff list once you have done so to receive participation credit.

May 25:

Assignment 2 Grades are now available.
Please email the staff list with questions about your grade.

May 20:

Project 2 Grades are now available.
Please email florians@stanford.edu with questions about your grade.

May 12:

Midterm Grades are now available. Grade distribution

May 7:

Project 1 Grades are now available.

April 28:

Sample questions for the midterm are available here.

April 23:

The final exam will be held during the regularly scheduled time on Friday, June 5th, 12:15-3:15 p.m.
You can find the full schedule of final exams here.

April 21:

Professor Altman will hold makeup lectures on May 1st and May 15th from 11:00 to 11:50, during section, in Gates B03.

April 21:

The midterm will be on May 4th from 7-8:30 pm in Jordan Hall (Building 420, aka the Psychology building) in room 40. It will be multiple choice, fill-in-the-blank, and true-false. It will cover material from the first day of class up to and including material from class on April 30th.

April 13:

Class is cancelled this Thursday, 4/16. Professor Altman will hold a makeup lecture during a Friday discussion section on a date that is TBA.

March 31:

Just to let everyone know, there is an industry seminar series this quarter held Fridays 12:15-1:05 pm that might be of interest to students in this class:

Spring Seminar Series: Biomedin 206 - Informatics in Industry
Effective management, modeling, and acquisition and mining of biomedical information has become a crucial and strategic arm of today's healthcare and biotechnology companies. This seminar series explores the approaches to information management adopted by industry leaders.

BMI206 Website

Homeworks 

Post all questions about projects and assignments to the wiki

You can currently pick up graded assignments (if you missed getting them in class) at Clark S260.
Topic | Out | Due
Assignment 0 Questionnaire Instructions | Thurs., April 2, 2009 | 5pm, Sat., April 4, 2009
Assignment 1 Exploring information on the Internet; Grades | Fri., April 3, 2009 | 5pm, Fri., April 10, 2009
Project 1 Sequence Alignment Quiz | Thurs., April 2, 2009 | 5pm, Sat., April 18, 2009
Project 2 Supervised and Unsupervised Learning on Microarray Data | Thurs., April 16, 2009 | 5pm, Sat., May 2, 2009
Assignment 2 Machine learning for expression data and genotype-phenotype association, using an existing software package; Assignment 2; Turn in as PDF by email to biomedin214-spr0809-submit@lists.stanford.edu. | Tues., April 28, 2009 | 5pm, Wed., May 6, 2009
Project 3 3D Structure and Function; Instructions | Thurs., April 30, 2009 | 5pm, Sat., May 16, 2009
Project 4 Molecular Dynamics; Instructions | Thurs., May 14, 2009 | 5pm, Sat., May 30, 2009
Assignment 3 Sequence Analysis; Assignment 3 | Tues., May 26, 2009 | 5pm, Thurs., June 4, 2009

Class Schedule

Topic | Lecturer | Lecture Notes | Recommended Readings and Other Info
Introduction to Bioinformatics and Computational Genomics | Altman
Dynamic Programming Sequence Alignment | Altman
Intro to Microarrays | Altman
Microarrays, Clustering and Classification | Altman
Basic 3D computation, and structural alignment | Altman
Multiple sequence alignment | Altman
1D & 3D motifs | Altman
Hidden Markov Models | Altman
Gibbs Sampling | Altman
Molecular energetics and dynamics | Altman
Protein structure prediction: homology modeling and ab initio | Altman
Fold recognition & Intro to RNA | Altman
RNA Folding | Altman
Phylogenetics | Altman
Comparative genomics | Altman
Natural Language Processing in Biology & Wrap up | Altman

Section 

Sections are held on Fridays, 11:00-11:50 AM in Gates B03. Sections are scheduled as needed; they will not be held every week. The schedule below will be updated with dates and topics throughout the quarter.
Date | Topic | TA | Notes
April 3 | Introduction to biology | Tiffany Chen | Intro to Biology
April 10 | Python tutorial | Jesse Rodriguez | Python notes
    Python notes II - Matrices in Python
    Python tutorial: http://diveintopython.org/ (a free but excellent book)
    Python documentation: http://www.python.org/doc/ (The tutorial is a good introduction, and the library reference tells you all about the standard library, which is a big part of what makes Python so useful.)
May 1 | Makeup Lecture I | Prof. Altman |
May 15 | Makeup Lecture II | Prof. Altman |
May 21 | Population Haplotype Based Positive Selection Detection Methods | Erik Corona | Papers Discussed:
    Original Publication Introducing EHH and REHH (Nature)
    Original Publication Introducing iHS (PLOS Biology)
    Original Publication Introducing XP-EHH (Nature)
    Recent Paper Relying on Methods Discussed (Genome Research)
    Slides for Positive Selection Talk
May 22 | From Genome-Wide Association Studies to Medicine | Florian Schmitzberger | Lecture slides
    Optional reading:
    Genomewide Association Studies and Human Disease (NEJM)
    Genomewide Association Studies-Illuminating Biologic Pathways (NEJM)
    Common Genetic Variation and Human Traits (NEJM)
    Genetic Risk Prediction-Are We There Yet? (NEJM)
    Genes Show Limited Value in Predicting Diseases (NY Times)
    Articles discussed in section (not required reading, the relevant parts will be presented in section):
    Genetic Mapping in Human Disease (Science)
    Genetically Elevated C-Reactive Protein and Ischemic Vascular Disease (NEJM)
    Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers (PLOS)


General Course Information

|| Staff || Discussion Groups || FAQ || Description || Units || Grading || Exams || Late Policy || Partner Policy || Auditors || Prerequisites || Computer Resources || Code Policy || Textbook || Note on courses ||

Instructor:

Russ B. Altman
Professor of Bioengineering, Genetics, & Medicine (and Computer Science by courtesy)
russ.altman at stanford.edu
Course Coordinator:
Tiffany Murray (tiffany.murray at stanford.edu)
Department of Bioengineering
Clark S170, MC: 5444
650-725-0659
Teaching Assistants:

Erik Corona
Office Hours: Fridays 1:00-2:00pm, MSOB. Take every right after climbing the stairs to the 2nd floor and this will lead you to my carrel (in front of Joel Dudley's office).

Florian Schmitzberger
Office Hours: Thursdays 9am-10am, starting April 16th.
Location: Clark Center South Building, third floor. Room S-347. Enter using the doors next to Peet's Coffee.

Jesse Rodriguez
Office Hours: Mondays 10am-11am. Clark Center South Building, second floor, room 260.

Tiffany Chen
Office Hours: Tuesdays 2pm-3pm. Clark Center South Building, Second floor, S260.

Discussion Groups:
Wiki
The course wiki is the preferred method of asking questions. It is also a great way to communicate with your peers, set up study groups, etc. We suggest that you use it to discuss projects and assignments with one another, since that will often be your fastest way of receiving a response, especially when deadlines are coming up.

Mailing Lists
All questions relating to assignments, projects, and exams should be posted to the wiki and NOT sent to the staff mailing list. The staff will not respond to these kinds of questions sent to the mailing list.

Description:

This course will introduce the basic computational issues and methods used in molecular biology, combining core lectures and programming assignments with a midterm and a final. The course will introduce and use biological data sources available on the World Wide Web. Topics will include basic algorithms for alignment of biological sequences and structures, as well as more advanced representational and algorithmic issues in structure and sequence computation. These include, for example, dynamic programming algorithms for alignment, structural superposition algorithms, computing with distance information, 3D motif definition and computation, hidden Markov models, phylogenetic trees, statistical feature detection, genetic algorithms, design of data resources, automated analysis of biological literature, database integration, and collaborative environments for supporting biology.
Units:
Grading:
The course will be graded by performance on short homework assignments (approximately 25%), long projects (approximately 50%), midterm (approximately 10%), final (approximately 10%), and participation (approximately 5%).
Exams:
Midterm: Mon., May 4, 7-8:30pm, Jordan Hall (Building 420), room 40

Final: Fri., June 5, 12:15-3:15 p.m., Alway M-112

Late Policy:
Each student is granted 7 "free" late days that can be used as extensions for any project or assignment. (Note that this is a total of 7 days for the entire quarter, not per assignment). Late days will be measured in 24-hour/day calendar days with no distinction for weekends or holidays, and will be rounded UP to the nearest integer (thus, 10 minutes late = 23 hours late = 1 day late). After you use up all your free days, your grade on late projects/assignments/exams will be reduced 10% for each late day. Extensions beyond the 7 free days may be granted at the discretion of the instructor (not the TAs) and must be requested prior to the due date.
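
The late-day arithmetic above can be sketched in Python. This is only an illustration, not the staff's grading script: the function names are hypothetical, and it assumes that the 10% reduction is taken from the earned score for each late day not covered by a free day, and that free days are spent automatically.

```python
import math
from datetime import datetime, timedelta

FREE_LATE_DAYS = 7       # total for the whole quarter, not per assignment
PENALTY_PER_DAY = 0.10   # reduction per late day once free days run out
SECONDS_PER_DAY = 24 * 60 * 60


def late_days(due: datetime, submitted: datetime) -> int:
    """Whole calendar days late, rounded UP (10 minutes late = 1 day)."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / SECONDS_PER_DAY)


def apply_late_policy(score: float, days_late: int, free_days_left: int):
    """Return (adjusted_score, free_days_left_after_this_submission).

    Assumption (not stated in the policy): free days are used first, and
    each remaining late day removes 10% of the earned score, floored at 0.
    """
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    factor = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return score * factor, free_days_left - free_used


if __name__ == "__main__":
    due = datetime(2009, 4, 18, 17, 0)  # 5pm, Sat., April 18, 2009
    print(late_days(due, due + timedelta(minutes=10)))
    print(late_days(due, due + timedelta(hours=23)))
    print(apply_late_policy(50.0, days_late=3, free_days_left=FREE_LATE_DAYS))
```

Both example submissions count as one late day, matching the "10 minutes late = 23 hours late = 1 day late" rule.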
Partner Policy:
For assignments:
Students may discuss and work on problems in groups but must write up their own solutions. When writing up the solutions, students must write the names of people with whom they discussed the assignment.

For programming projects:
Students may discuss ideas with others. However, programs are to be completed independently and should be original work. Code may not be shared. Names of students with whom programming ideas were discussed should be included with the assignment.

Auditors:
Auditors for the course should take it for one unit as BMI 216. This course requires attendance and sign-in at each lecture (approval is needed to miss a lecture), but does not require completion of homeworks or tests. The one unit is received for attending all lectures.

Auditors who want to sit in on the course without being officially signed up for 1 unit of credit should get approval from Dr. Altman. They will also be asked to attend all lectures and sign in, and are not expected to do the homeworks or tests.

Prerequisites:
  1. Programming skills are required at the level of CS106A/CS106B or CS106X. This course has a significant programming component, so students should enter it with the ability to create moderately complex data structures and to implement algorithms using these data structures. Acceptable languages are outlined in the code policy.
  2. Biology 40 or equivalent is recommended, since we will quickly move through many biology topics. It may be useful to have a textbook of molecular biology for reference during the course, for those who do not think about biology very much. We will have TA sessions devoted to biology brush-up.
Computer Resources:
You will need to have access to email (be sure you're registered on Axess so that you get email announcements sent to the course list), the course website, and the Stanford cardinal machines. All of these resources are available to Stanford students at Sweet Hall and elsewhere as well as through remote (ssh) access. To log in, you will need to use your SUNet ID. If you don't have a SUNet ID, see http://sunetid.stanford.edu ASAP.

To log in to the "cardinal" cluster machines, use a secure shell (ssh).

Windows: You will have to download a terminal emulator that supports ssh. Stanford offers a few free ones here; a popular one is PuTTY. Directions for using PuTTY to connect to cardinal:

Under "Host Name", enter cardinal.stanford.edu
Under "Protocol", choose SSH
Press the "Open" button.
A terminal window should appear, connected to cardinal. PuTTY will tell you if there was an error.

OS X, Unix, Linux:
Open a terminal window
Type "ssh sunetid@cardinal.stanford.edu"
(See http://www.stanford.edu/services/cluster/ and http://www.stanford.edu/services/cluster/which.html for more information on various campus machines.)

Most course material will be placed on the course website in *.pdf (Adobe Acrobat) format, which allows the documents to be read on multiple platforms. Readers are available for free for Windows, Macintosh and many Unix platforms at the Adobe website.
Code / Language Policy:
Familiarize yourself with the Code/Language Policy before choosing a language and starting the first programming assignment.
Optional Course Textbook:

Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G., Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids. 1999, Cambridge Univ Pr. ISBN: 0521629713

Other Recommended books:

Beazley, David M., Python Essential Reference, 3rd ed., SAMS Publishers, 2006. Chapter 1 is an excellent tutorial and introduction to Python, and overall, this can be a valuable reference when coding.

Kohane, I.S., Kho, A., Butte, A.J., Microarrays for an Integrative Genomics (Computational Molecular Biology). 2002, MIT Press. ISBN: 026211271X.

Mount, D.W., Bioinformatics : sequence and genome analysis. 2nd edition (July, 2004), Cold Spring Harbor Laboratory Press. ISBN: 0879696877.

Bourne, P.E., Weissig, H. (editors), Structural Bioinformatics. 2004, John Wiley & Sons. ISBN: 0471201995. This book is also available from the Wiley Interscience website at http://www3.interscience.wiley.com/cgi-bin/homepage/?isbn=0471721204 via the campus network.


Note on courses in computational biology:

BMI 214 (also listed as CS 274) is this course. It has been taught since 1996 and is an introduction to representations and algorithms for analysis of sequence, structure and function. It requires programming skills and aims to give an understanding of the biological problems that arise, and how algorithms are developed to address them. It does not train students to be expert users of tools, but gives them an in-depth knowledge of some tools and a broad introduction to the technical issues in analysis of biological data. It is taught live on Tuesdays/Thursdays and is also on Stanford Online. Section is taught on Friday mornings.

Biochem 218 (also listed as BMI 231) is Doug Brutlag's course introducing computational molecular biology, also a number of years old. It is more geared towards gaining an expert understanding of existing tools and databases, and as such complements BMI 214 very nicely. There is no programming required. Most students take both eventually and learn a lot--even the areas where there is overlap are presented differently enough to round out one's understanding. For logistical reasons this course is also being taught on Tuesday/Thursday, and is on Stanford Online.

CS 262 (Computational Genomics) is Serafim Batzoglou's course. It focuses principally on algorithms for sequence assembly, analysis and comparison. It will have a strong CS algorithms and data structures component, probably with an element of software engineering as well. It is likely to complement both courses, although in the future, about 1/3 of BMI 214 may overlap sufficiently to require coordination--the part about sequence and string analysis. The coordination has not been done as of now, however. It does not contain much on 3D structure computation and functional computing, judging from the syllabus. The course will be taught live. You should ask Prof. Batzoglou about his plans to offer it via Stanford Online.


Questions? Contact: biomedin214-spr0809-staff@lists.stanford.edu