Machine Learning Algorithms for Bioinformatics

Table of Contents


I. Introduction

Biological data in digital symbol sequences
Genomes--diversity, size, & structure
Proteins & proteomes
On the information content of biological sequences
Prediction of molecular function & structure


II. Machine Learning Foundations: The Probabilistic Framework

Introduction: Bayesian modeling
The Cox-Jaynes axioms
Bayesian inference & induction
Model structures: graphical models & other tricks
Summary


III. Probabilistic Modeling and Inference: Examples

The simplest sequence models
Statistical mechanics


IV. Machine Learning Algorithms

Introduction
Dynamic programming
Gradient descent
EM/GEM algorithms
Markov chain Monte Carlo methods
Simulated annealing
Evolutionary & genetic algorithms
Learning algorithms: miscellaneous aspects


V. Neural Networks: The Theory

Introduction
Universal approximation properties
Priors & likelihoods
Learning algorithms: backpropagation

VI. Neural Networks: Applications

Sequence encoding & output interpretation
Sequence correlations & neural networks
Prediction of protein secondary structure
Prediction of signal peptides & their cleavage sites
Applications for DNA & RNA nucleotide sequences
Prediction performance evaluation
Different performance measures

VII. Hidden Markov Models: The Theory

Introduction
Prior information & initialization
Likelihood & basic algorithms
Learning algorithms
Applications of HMMs: general aspects


VIII. Hidden Markov Models: Applications

Protein applications
DNA & RNA applications
Advantages & limitations of HMMs


IX. Hybrid Systems: Hidden Markov Models and Neural Networks

The zoo of graphical models in bioinformatics
Markov models & DNA symmetries
Markov models & gene finders
Hybrid models & neural network parameterization of graphical models
The single-model case
Bidirectional recurrent neural networks for protein secondary structure prediction


X. Probabilistic Models of Evolution: Phylogenetic Trees

Introduction to probabilistic models of evolution
Substitution probabilities & evolutionary rates
Rates of evolution
Data likelihood
Optimal trees & learning
Parsimony
Extensions


XI. Stochastic Grammars and Linguistics

Introduction to formal grammars
Formal grammars & the Chomsky hierarchy
Applications of grammars to biological sequences
Prior information & initialization
Likelihood
Learning algorithms
Applications of SCFGs
Experiments
Future directions


XII. Microarrays & Gene Expression

Introduction to microarray data
Probabilistic modeling of array data
Clustering
Gene regulation


XIII. Internet Resources & Public Databases


A rapidly changing set of resources
Databases over databases and tools
Databases over databases in molecular biology
Sequence & structure databases
Sequence similarity searches
Alignment
Selected prediction servers
Molecular biology software links
Ph.D. courses over the Internet
Bioinformatics societies
HMM/NN simulator

Textbook Appendices

A. Statistics
Decision theory & loss functions
Quadratic loss functions
The bias/variance trade-off
Combining estimators
Error bars
Sufficient statistics
Exponential family
Additional useful distributions
Variational methods

B. Information Theory, Entropy, & Relative Entropy
Entropy
Relative entropy
Mutual information
Jensen's inequality
Maximum entropy
Minimum relative entropy

C. Probabilistic Graphical Models
Notation & preliminaries
The undirected case: Markov random fields
The directed case: Bayesian networks

D. HMM Technicalities: Scaling, Periodic Architectures, State Functions, and Dirichlet Mixtures
Scaling
Periodic architectures
State functions: bendability
Dirichlet mixtures

E. Gaussian Processes, Kernel Methods, and Support Vector Machines
Gaussian process models
Kernel methods & support vector machines
Theorems for Gaussian processes & SVMs

 

 
