Bioinformatics

Table of Contents


 

 

1. BIOINFORMATICS AND THE INTERNET

Internet Basics
Connecting to the Internet
Electronic Mail
File Transfer Protocol
The World Wide Web

2. THE NCBI DATA MODEL

Introduction
PUBs: Publications or Perish
SEQ-Ids: What's in a Name?
BIOSEQs: Sequences
BIOSEQ-SETs: Collections of Sequences
SEQ-ANNOT: Annotating the Sequence
SEQ-DESCR: Describing the Sequence
Using the Model
Conclusions

3. THE GENBANK SEQUENCE DATABASE

Introduction
Primary and Secondary Databases
Format vs. Content: Computers vs. Humans
The Database
The GenBank Flatfile: A Dissection
Concluding Remarks

4. SUBMITTING DNA SEQUENCES TO THE DATABASES

Introduction
Why, Where, and What to Submit?
DNA/RNA
Population, Phylogenetic, and Mutation Studies
Protein-Only Submissions
How to Submit on the World Wide Web
How to Submit with Sequin
Updates
Consequences of the Data Model
EST/STS/GSS/HTG/SNP and Genome Centers
Concluding Remarks
Contact Points for Submission of Sequence Data to DDBJ/EMBL/GenBank

5. STRUCTURE DATABASES

Introduction to Structures
PDB: Protein Data Bank at the Research Collaboratory for Structural Bioinformatics (RCSB)
MMDB: Molecular Modeling Database at NCBI
Structure File Formats
Visualizing Structural Information
Database Structure Viewers
Advanced Structure Modeling
Structure Similarity Searching

6. GENOMIC MAPPING AND MAPPING DATABASES

Interplay of Mapping and Sequencing
Genomic Map Elements
Types of Maps
Complexities and Pitfalls of Mapping
Data Repositories
Mapping Projects and Associated Resources
Practical Uses of Mapping Resources

7. INFORMATION RETRIEVAL FROM BIOLOGICAL DATABASES

Integrated Information Retrieval: The Entrez System
LocusLink
Sequence Databases Beyond NCBI
Medical Databases

8. SEQUENCE ALIGNMENT AND DATABASE SEARCHING

Introduction
The Evolutionary Basis of Sequence Alignment
The Modular Nature of Proteins
Optimal Alignment Methods
Substitution Scores and Gap Penalties
Statistical Significance of Alignments
Database Similarity Searching
FASTA
BLAST
Database Searching Artifacts
Position-Specific Scoring Matrices
Spliced Alignments
Conclusions

9. CREATION AND ANALYSIS OF PROTEIN MULTIPLE SEQUENCE ALIGNMENTS

Introduction
What is a Multiple Alignment, and Why Do It?
Structural Alignment or Evolutionary Alignment?
How to Multiply Align Sequences
Tools to Assist the Analysis of Multiple Alignments
Collections of Multiple Alignments

10. PREDICTIVE METHODS USING DNA SEQUENCES

GRAIL
FGENEH/FGENES
MZEF
GENSCAN
PROCRUSTES
How Well Do the Methods Work?
Strategies and Considerations

11. PREDICTIVE METHODS USING PROTEIN SEQUENCES

Protein Identity Based on Composition
Physical Properties Based on Sequence
Motifs and Patterns
Secondary Structure and Folding Classes
Specialized Structures or Features
Tertiary Structure

12. EXPRESSED SEQUENCE TAGS (ESTs)

What is an EST?
EST Clustering
TIGR Gene Indices
STACK
ESTs and Gene Discovery
The Human Gene Map
Gene Prediction in Genomic DNA
ESTs and Sequence Polymorphisms
Assessing Levels of Gene Expression Using ESTs

13. SEQUENCE ASSEMBLY AND FINISHING METHODS

The Use of Base Call Accuracy Estimates or Confidence Values
The Requirements for Assembly Software
Global Assembly
File Formats
Preparing Readings for Assembly
Introduction to Gap4
The Contig Selector
The Contig Comparator
The Template Display
The Consistency Display
The Contig Editor
The Contig Joining Editor
Disassembling Readings
Experiment Suggestion and Automation
Concluding Remarks

14. PHYLOGENETIC ANALYSIS

Fundamental Elements of Phylogenetic Models
Tree Interpretation—The Importance of Identifying Paralogs and Orthologs
Phylogenetic Data Analysis: The Four Steps
Alignment: Building the Data Model
Alignment: Extraction of a Phylogenetic Data Set
Determining the Substitution Model
Tree-Building Methods
Distance, Parsimony, and Maximum Likelihood: What’s the Difference?
Tree Evaluation
Phylogenetics Software
Internet-Accessible Phylogenetic Analysis Software
Some Simple Practical Considerations

15. COMPARATIVE GENOME ANALYSIS

Progress in Genome Sequencing
Genome Analysis and Annotation
Application of Comparative Genomics—Reconstruction of Metabolic Pathways
Avoiding Common Problems in Genome Annotation
Conclusions
Problems for Additional Study

16. LARGE-SCALE GENOME ANALYSIS

Introduction
Technologies for Large-Scale Gene Expression
Computational Tools for Expression Analysis
Hierarchical Clustering
Prospects for the Future

17. USING PERL TO FACILITATE BIOLOGICAL ANALYSIS

Getting Started
How Scripts Work
Strings, Numbers, and Variables
Arithmetic
Variable Interpolation
Basic Input and Output
Filehandles
Making Decisions
Conditional Blocks
What is Truth?
Loops
Combining Loops with Input
Standard Input and Output
Finding the Length of a Sequence File
Pattern Matching
Extracting Patterns
Arrays
Arrays and Lists
Split and Join
Hashes
A Real-World Example
Where to Go From Here

18. Glossaries of Bioinformatics

 


 
