Bioinformatics
Table of Contents
1. BIOINFORMATICS AND THE INTERNET
Internet Basics
Connecting to the Internet
Electronic Mail
File Transfer Protocol
The World Wide Web
2. THE NCBI DATA MODEL
Introduction
PUBs: Publications or Perish
SEQ-Ids: What's in a Name?
BIOSEQs: Sequences
BIOSEQ-SETs: Collections of Sequences
SEQ-ANNOT: Annotating the Sequence
SEQ-DESCR: Describing the Sequence
Using the Model
Conclusions
3. THE GENBANK SEQUENCE DATABASE
Introduction
Primary and Secondary Databases
Format vs. Content: Computers vs. Humans
The Database
The GenBank Flatfile: A Dissection
Concluding Remarks
4. SUBMITTING DNA SEQUENCES TO THE DATABASES
Introduction
Why, Where, and What to Submit?
DNA/RNA
Population, Phylogenetic, and Mutation Studies
Protein-Only Submissions
How to Submit on the World Wide Web
How to Submit with Sequin
Updates
Consequences of the Data Model
EST/STS/GSS/HTG/SNP and Genome Centers
Concluding Remarks
Contact Points for Submission of Sequence Data to DDBJ/EMBL/GenBank
5. STRUCTURE DATABASES
Introduction to Structures
PDB: Protein Data Bank at the Research Collaboratory for Structural Bioinformatics (RCSB)
MMDB: Molecular Modeling Database at NCBI
Structure File Formats
Visualizing Structural Information
Database Structure Viewers
Advanced Structure Modeling
Structure Similarity Searching
6. GENOMIC MAPPING AND MAPPING DATABASES
Interplay of Mapping and Sequencing
Genomic Map Elements
Types of Maps
Complexities and Pitfalls of Mapping
Data Repositories
Mapping Projects and Associated Resources
Practical Uses of Mapping Resources
7. INFORMATION RETRIEVAL FROM BIOLOGICAL DATABASES
Integrated Information Retrieval: The Entrez System
LocusLink
Sequence Databases Beyond NCBI
Medical Databases
8. SEQUENCE ALIGNMENT AND DATABASE SEARCHING
Introduction
The Evolutionary Basis of Sequence Alignment
The Modular Nature of Proteins
Optimal Alignment Methods
Substitution Scores and Gap Penalties
Statistical Significance of Alignments
Database Similarity Searching
FASTA
BLAST
Database Searching Artifacts
Position-Specific Scoring Matrices
Spliced Alignments
Conclusions
9. CREATION AND ANALYSIS OF PROTEIN MULTIPLE SEQUENCE ALIGNMENTS
Introduction
What is a Multiple Alignment, and Why Do It?
Structural Alignment or Evolutionary Alignment?
How to Multiply Align Sequences
Tools to Assist the Analysis of Multiple Alignments
Collections of Multiple Alignments
10. PREDICTIVE METHODS USING DNA SEQUENCES
GRAIL
FGENEH/FGENES
MZEF
GENSCAN
PROCRUSTES
How Well Do the Methods Work?
Strategies and Considerations
11. PREDICTIVE METHODS USING PROTEIN SEQUENCES
Protein Identity Based on Composition
Physical Properties Based on Sequence
Motifs and Patterns
Secondary Structure and Folding Classes
Specialized Structures or Features
Tertiary Structure
12. EXPRESSED SEQUENCE TAGS (ESTs)
What is an EST?
EST Clustering
TIGR Gene Indices
STACK
ESTs and Gene Discovery
The Human Gene Map
Gene Prediction in Genomic DNA
ESTs and Sequence Polymorphisms
Assessing Levels of Gene Expression Using ESTs
13. SEQUENCE ASSEMBLY AND FINISHING METHODS
The Use of Base Call Accuracy Estimates or Confidence Values
The Requirements for Assembly Software
Global Assembly
File Formats
Preparing Readings for Assembly
Introduction to Gap4
The Contig Selector
The Contig Comparator
The Template Display
The Consistency Display
The Contig Editor
The Contig Joining Editor
Disassembling Readings
Experiment Suggestion and Automation
Concluding Remarks
14. PHYLOGENETIC ANALYSIS
Fundamental Elements of Phylogenetic Models
Tree Interpretation: The Importance of Identifying Paralogs and Orthologs
Phylogenetic Data Analysis: The Four Steps
Alignment: Building the Data Model
Alignment: Extraction of a Phylogenetic Data Set
Determining the Substitution Model
Tree-Building Methods
Distance, Parsimony, and Maximum Likelihood: What's the Difference?
Tree Evaluation
Phylogenetics Software
Internet-Accessible Phylogenetic Analysis Software
Some Simple Practical Considerations
15. COMPARATIVE GENOME ANALYSIS
Progress in Genome Sequencing
Genome Analysis and Annotation
Application of Comparative Genomics: Reconstruction of Metabolic Pathways
Avoiding Common Problems in Genome Annotation
Conclusions
Problems for Additional Study
16. LARGE-SCALE GENOME ANALYSIS
Introduction
Technologies for Large-Scale Gene Expression
Computational Tools for Expression Analysis
Hierarchical Clustering
Prospects for the Future
17. USING PERL TO FACILITATE BIOLOGICAL ANALYSIS
Getting Started
How Scripts Work
Strings, Numbers, and Variables
Arithmetic
Variable Interpolation
Basic Input and Output
Filehandles
Making Decisions
Conditional Blocks
What is Truth?
Loops
Combining Loops with Input
Standard Input and Output
Finding the Length of a Sequence File
Pattern Matching
Extracting Patterns
Arrays
Arrays and Lists
Split and Join
Hashes
A Real-World Example
Where to Go From Here
18. Glossaries of Bioinformatics