Computational Molecular Biology

CSE 5400/4510
Florida Institute of Technology
Instructor: Debasis Mitra

Department: Computer Sciences


Flatten The Curve


Ever since DNA was found to be composed of four letters (a, t, c, and g), "life" has been showing its digital face. Life seems to be partly biology and partly computing, and for this reason the computer is becoming an increasingly important instrument for doing biology, particularly molecular biology.

In this course on computational biology we will introduce some important problems and algorithms for computational molecular biology. Some introduction to data structures and algorithms may be provided, but students are expected to have a background in the fundamentals of programming. Similarly, the necessary introduction to biology will also be provided. A sample of past activities can be seen below.

In Summer 2020, I am teaching this course as Computational Virology, with a focus on viruses and the antibodies that fight infection or foreign antigens in the body. This is the first offering of the course, so you are the lab rats! I hope we learn together, as this fascinating field becomes deadly important in human history.

The LEARNING OBJECTIVE is spelled out here

I will try to use mostly web-resources, or rather, ask you to find web-resources and study from there.
No text is necessary, but a good book to have is An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner, MIT Press 2004, ISBN: 0-262-10106-8
Text web page

Slides from the Pevzner textbook:
Ch 3: Background
Ch 4: Motif finding
Ch 4: DNA mapping / Partial Digest

Ch 6: Sequence distance measurement with dynamic programming, slide 52 onwards.
Pairwise Alignment with DP
Alignment algorithms: Needleman-Wunsch algorithm, Smith-Waterman algorithm
Also, check Wiki
Protein Data Bank uses the MMseqs2 algorithm: search -> sequence search -> Advanced search
Multiple Sequence Alignment

Phylogenetic Tree

Bi-clustering for Gene Expression Analysis with Microarray

Ch 10: Clustering of Microarray data
Bioconductor tutorial on clustering: presentation
Bi-clustering presentation
Bi-clustering review paper by Tanay-Shamir 2004

Density-based non-partitioning clustering Gupta et al., ACM Tr. on Comp. Bio (2010).

Ch 10: Molecular Evolution
Ch 8: DNA sequencing: graph theory
Ch 8: Mass Spec analyses
Ch 5: Genome rearrangements
Ch 11: Hidden Markov Models

From the past:
Fragment Assembly
Structure Prediction
A tutorial on HMM

A decent introduction to molecular biology by Hunter.

Some information on the Human Genome Project is here.
Wiki on DNA sequencing

A collection of important web databases / tools: the Jones-Pevzner book's assignment 1 page.

A short but good tutorial on Hidden Markov Models (from the horse's mouth!)
Rabiner (Problems 1, 2 & 3) in Proc. of IEEE, Feb 1989

Pfam protein family DB

BLAST original: Altschul et al's paper
Altschul et al's Gapped BLAST & PSI-BLAST, Comment on Gapped BLAST complexity

A lecture note on protein structure: Resources/ProtStr.ppt (acknowledgement:
A good short description of domain, motif, fold, etc. of proteins from wiki:
SCOP (Structural Classification of Proteins):
CATH (Class-Architecture-Topology-Homologous Superfamily) protein structure classification wiki:
FSSP (Families of Structurally Similar Proteins) automated database, uses DALI algorithm, wiki:
Aaron's talk on Folds-Motifs
Steve Johnson's talk on Gene Ontology (GO)

Protein structure alignments:
Ye et al's paper
(1) STRAP site,
(2) TM-align paper
(3) LOCK alignment paper
(4) DALI paper


Intelligent Systems for Molecular Biology ISMB.

International Conference on Research in Computational Molecular Biology RECOMB.

IEEE Computational Systems Bioinformatics Conference CSB.

---------------- Summer 2020: Computational Virology ---------
Official Syllabi: CSE-4510-SpecialTopic-CompVirology, CSE-5400-SpecialTopic-CompVirology

A weekly course plan/overview


A codon translator logic-code, due to Dr. Frazan
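For readers who want to see what a codon translator does, here is a minimal sketch in Python. It is not the linked code: the names `CODON_TABLE` and `translate` and the `to_stop` option are illustrative assumptions. It uses the standard genetic code, treats U as T so that RNA input works, and maps any unrecognized codon to "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (the third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, to_stop=True):
    """Translate a DNA or RNA string, reading frame 0, into one-letter amino acids.

    Hypothetical helper: with to_stop=True translation ends at the first stop
    codon; otherwise stop codons are written as "*". Trailing bases that do
    not fill a codon are ignored, and unknown codons become "X".
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```

Building the table from a 64-letter string keeps the code short; a hand-written dictionary of 64 entries would work equally well and may be easier to audit.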

================ Spring 2011 ================
Class: W 6:30-9:15pm Crawford 402 (we will see!)
Office Hours: 2-4pm TR

A dated course plan for Fall 2011.

A news item on gene expression analysis leading to regulatory pathway discovery.

-------------------- Spring 2010 --------------
Office Hours: 2-4pm TW
Home Work 1 on Biology primer

Assignment 1 on Sequence search

Assignment 2 on UniProt, Heart-2DE, and PHYLIP

Assignment 3 on Protein 3D structure alignment

Parallel suffix tree paper from IBM.

Exam time meeting: Wednesday 5/5/10, 8:20pm, on the final project results
Target PRIB 2010 conference: deadline May 20, 2010

============== PREVIOUS SEMESTERS =======

Spring 2009
plan /journal
Algorithms basics syllabus

Projects assignment (developing)

Key to the Quiz 1 on Biology primer.

Programming Assignment 1

Programming Assignment 2
Updated & due Wednesday. Penalty after the due date.
Sorry, the two input sequences are the same. Use any example pair(s) of your choice.

Quiz 2

Programming Assignment 3 on HMM.

On 4/15/09 Wednesday: Project-discussion with each group for 15 minutes


Final Project Presentations:
Protein Structure similarity measurement
Clustering gene Expression data
EST to homologous sequences
GO Ontology

DUE 4/29/09: ANONYMOUS class feedback. THANKS.

Final (5/6, 8:30 pm): Closed book, 1 hour (not 2). Some short questions, some from bio-basics, some on each project, plus questions on writing algorithms, dry-running algorithms, and basic understanding of algorithms.

On Quiz 2, the BreakPtReversal & DP answers are regraded.
Formula for aggregate is up on the spreadsheet!

Spring 2006
Assignment 1 (due 1/19/06)

Assignment 2 (points 30): (1) Answer the questions on the GenBank and Swiss-Prot databases (print the questions too).
(2) Questions 4.15, 4.16 and 4.17 from the text (p122-3).
(3) Analyze the complexities of the algorithms "BruteForceMotifSearch" (p 109) and "SimpleMedianSearch" (p113). Do not use the book's analyses even if you arrive at the same results. (due 2/10/06)
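As a starting point for item (3), the following is a rough Python sketch of exhaustive motif search in the spirit of "BruteForceMotifSearch". It is not the book's pseudocode: the function names and the use of a consensus score (sum over columns of the most frequent base count) are assumptions made for illustration. Reading the nested loops is one way to begin counting operations yourself.

```python
from itertools import product


def consensus_score(dna, starts, l):
    """Sum, over the l alignment columns, of the count of the most common base."""
    score = 0
    for col in range(l):
        counts = {}
        for seq, s in zip(dna, starts):
            base = seq[s + col]
            counts[base] = counts.get(base, 0) + 1
        score += max(counts.values())
    return score


def brute_force_motif_search(dna, l):
    """Try every tuple of start positions (one per sequence) and keep the best.

    Hypothetical helper: returns (best_starts, best_score) using 0-based
    positions. Ties are resolved in favour of the first tuple enumerated.
    """
    best_score, best_starts = -1, None
    ranges = [range(len(seq) - l + 1) for seq in dna]
    for starts in product(*ranges):
        score = consensus_score(dna, starts, l)
        if score > best_score:
            best_score, best_starts = score, starts
    return best_starts, best_score


# Toy input: the 3-mer "ACG" occurs in every sequence.
print(brute_force_motif_search(["AAACGT", "TTACGA", "CACGTT"], 3))  # ((2, 2, 1), 9)
```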

Assignment 3 (points 50, FINALLY Due: May 4, 06)

There will be a guest lecture by Dr. Leonard on Tuesday, February 28.

--(Due: Presentation on May 2, '06, 7:30-10:30 pm) SEE ANNOUNCEMENT -- Presentation schedule (Room Olin EC 239-240):
System biology: 7:30-8:30 pm. (Gary Hrezo and Weijung Huang)
Protein Docking: 8:30-9:30 pm. (Johannes Nangolo and Christpher Roach)
Correlogram method in protein classification: 9:30-10:30 pm. (Kyle Cacciatore and Stephen Jonsson)

Spring 2005
Assignment 1 (due 2/8/05): Text Exc. 1, 2, 3 on page 30

Biology Presentation schedule:
Robert Asfar - 2/8/05
Florent Launay - 2/8/05
Park Sung Hoon - 2/10/05
Ram, Anjali - 2/10/05

Programming assignment:
Implement the global alignment dynamic programming algorithm (Due: 2/12/05)
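For reference, a minimal sketch of global alignment by dynamic programming (Needleman-Wunsch style) might look like the following. The scoring values (match +1, mismatch -1, linear gap -1) and the name `global_align` are assumptions for illustration, not the assignment's required parameters.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    Hypothetical helper with an assumed linear gap penalty. When several
    optimal alignments exist, the traceback prefers a diagonal move, then a
    gap in b, then a gap in a.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table uses O(nm) time and space; only the score can be computed in linear space, but the full table is kept here so the traceback stays simple.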

Project proposal due 3/17/05, Thursday.

Presentations: BLAST: Rob, PAM: Anjali,
Suffix tree: Park: 3/15/05 Tuesday (15 min)

Quiz on Fragment assembly: 3/24/05 Thursday
I will let you complete it in the next class Thursday 3/31/05, for about 20 minutes at the end of the class

Programming assignment 2:(Due: 4/20/05 Thursday) Implement the dynamic programming algorithm for RNA base pairing-prediction with the simplest assumption. Use alpha values as follows: alpha(ri,rj)=-2, if (ri,rj)=(A,U) or (U,A) or (G,C) or (C,G), =0, otherwise. Program should work on any string of length up to 100.
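One possible reading of this assignment is a Nussinov-style energy minimization in which base pairs contribute independently. The sketch below follows that reading; the recurrence, the absence of a minimum hairpin-loop length, and the names `alpha` and `min_energy` are assumptions, so check them against the formulation given in class.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def alpha(x, y):
    """Energy of pairing bases x and y: -2 for Watson-Crick pairs, 0 otherwise."""
    return -2 if (x, y) in PAIRS else 0


def min_energy(rna):
    """Minimum total energy over nested (non-crossing) base pairings of rna.

    Assumed recurrence for the substring i..j:
      E[i][j] = min( E[i+1][j-1] + alpha(r_i, r_j),
                     min over i <= k < j of E[i][k] + E[k+1][j] )
    with E = 0 for substrings of length 0 or 1. No minimum loop length is
    enforced, so adjacent bases are allowed to pair.
    """
    n = len(rna)
    if n == 0:
        return 0
    E = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: r_i pairs with r_j around the (possibly empty) interior.
            inner = E[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = inner + alpha(rna[i], rna[j])
            # Case 2: split into two independent substructures.
            for k in range(i, j):
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E[0][n - 1]


print(min_energy("GGGAAACCC"))  # -6
```

The triple loop runs in O(n^3) time, which is small for strings of length up to 100.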

Presentation schedule:
Protein structure prediction: Rob Asfar: 4/14-19/05
Anjali Ram: 4/19-21/05
System biology: Park: 4/21-26/05


-----------------------------
A tutorial on BLAST.

Spring 2005:

Class Time: Tuesday Thursday 6:30-7:45 pm
Room: E250
-----------------------------
Spring 2003:

(The notes below are primarily from the submissions from the students in Spring 2003, particularly those from Michael Smith.)

Class schedule: Monday-Wednesday 11:00 am - 12:15 pm
Meets at: Room 132EC
Chapter 1: Introduction to Biology lecture notes.

Some database search procedures: here.

String comparison algorithms: from Cormen et al's Algorithms text book, embedded in my lecture notes on the Algorithms class notes.

Chapter 3: Sequence comparison lecture slides.

Chapter 4: Fragment Assembly lecture slides.

Chapter 6: Phylogenetic Trees lecture notes.

Chapter 8: Molecular Structure Prediction was not covered this time.

Chapter 9: DNA computing lecture slides.

Project description.

A self study done on Sickle Cell Anemia, some notes.

Spring 2004:


Expectation: (1) a literature survey on the current status of the field, evidenced by bibliography development and presentation(s), and (2) a software implementation. Both (1) and (2) for graduate students, and only (2) for undergraduate students.
A typed report of approximately 5 pages, data from the experiments, and (outside the 5 pages) source code will be due. E-mail, CD, or floppy in any format is acceptable.

Due date for report submission: April 15 or next class to that date.

System Biology of the E. coli cell division process. (Data: Prof. Leonard) Michel Lacle

Implementation of the BLAST alignment algorithm and sequence distance measurement between protein chains (subsequently to be expanded toward usage of the Correlogram method of Huang et al., as an MS Thesis). (Data: Protein Data Bank) Gandhali Samant
Microarray data clustering algorithm implementation. (Data: ??) Sunjit Bir

Instance-based learning implementation for clustering sequences. (Data: Prof. Leonard / PDB) Manav Rattan

Fragment assembly implementation.(Data: ??) Aditi Gupta

Phylogeny reconstruction implementation. (Data: ??) Lalit Samant

Helix, sheet (secondary structure) prediction, and solvent accessibility prediction (1D structure) using DSP and homology modeling techniques. (Data: PDB) Seema Gandhi

Deploying matrix method, and dynamic programming method to detect motifs in some nucleotide sequences and then representing the sequences based on existing motifs (Ref: Gaurv Tandon's work on computer security). (Data: Prof. Leonard) Carl Harroch


Materials are copyrighted to me (year 2003), or shared with the acknowledged students, as the case may be. E-mail: