Course BIO110
Genomic Perl and BLAST  
An Advanced Perl Bioinformatics Course
Duration: 5 Days
Intended Audience
Attendees are expected to be experienced Perl programmers with a sound knowledge of databases and molecular biology.
Course Overview
The course covers, in depth, the Perl programming techniques used to compare DNA sequences, statistical techniques for species prediction, substitution matrices for amino acids, reading and processing sequence files, understanding BLAST, and using BLAST from within Perl programs.
The second half of the course deals with more advanced topics such as multiple sequence alignment, phylogeny reconstruction, protein motifs, coding sequence prediction, satellite identification and restriction mapping.
The course is split roughly 50/50 between teaching and labs.
Course Benefits
This course is aimed at Perl Bioinformatics application developers who already have a sound knowledge of Perl programming and of the molecular biology concepts underpinning Bioinformatics, and who need to develop advanced Bioinformatics applications, for example in the medical, pharmaceutical, agricultural or environmental fields. It covers the development of genomic Perl modules as well as the use of existing ones.
Course Contents
Overview of the Central Dogma of Molecular Biology
- Perl programs modeling transcription and translation
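
A minimal sketch of the kind of program this topic refers to, not taken from the course material. It assumes the input is the coding (sense) strand, and it uses a deliberately partial codon table; the names `transcribe`, `translate` and `%codon_table` are illustrative only.

```perl
use strict;
use warnings;

# Transcribe the coding (sense) strand of DNA into mRNA by replacing T with U.
sub transcribe {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;
    return $rna;
}

# Partial codon table, for illustration only; a real script needs all 64 codons.
my %codon_table = (
    AUG => 'M', UUU => 'F', GGC => 'G', UAA => '*',
);

# Translate an mRNA string codon by codon, stopping at the first stop codon.
sub translate {
    my ($rna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $rna; $i += 3) {
        # 'X' marks a codon that is missing from the partial table above.
        my $aa = $codon_table{ substr($rna, $i, 3) } // 'X';
        last if $aa eq '*';
        $protein .= $aa;
    }
    return $protein;
}

print translate(transcribe('ATGTTTGGCTAA')), "\n";
```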
RNA secondary structure
- overview of RNA molecular biology
- secondary structure of RNA
- Perl scripts for identifying RNA secondary structures
Perl scripts for basic DNA sequence comparison
Perl scripts and statistical models for species prediction
Substitution matrices for Amino Acids
- PAM matrices
- Perl scripts for working with PAM matrices
Perl scripts for accessing sequence databases
- FASTA format
- GenBank format
- Perl scripts for reading sequence files
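
As an illustration of reading sequence files, the sketch below parses FASTA records from a filehandle. It is an assumption-laden example rather than the course's own script: `read_fasta` is a hypothetical name, the sample data is invented, and an in-memory filehandle stands in for a real file so that the example is self-contained.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle.
# Returns a list of [header, sequence] pairs in file order.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip Unix or Windows line endings
        next if $line eq '';
        if ($line =~ /^>(.*)/) {
            push @records, [ $1, '' ]; # header line starts a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;  # sequence lines are concatenated
        }
    }
    return @records;
}

# Invented sample data, read through an in-memory filehandle.
my $fasta = ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my @records = read_fasta($fh);
close $fh;

printf "%s\t%d\n", $_->[0], length $_->[1] for @records;
```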
BLAST (Basic Local Alignment Search Tool)
- NCBI BLAST, WU BLAST
- Submitting searches and viewing results over the Web
- Output formats and alternate alignment views
Sequence alignment theory and algorithms
- Needleman-Wunsch algorithm for global alignment
- Smith-Waterman algorithm for local alignment
- Sequence similarity
- Amino acid similarity
- Scoring matrices
- Target frequencies
- Sequence similarity metrics
- Karlin-Altschul statistics
- Sum statistics and sum scores
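
The Needleman-Wunsch topic above can be illustrated with a short sketch that computes only the optimal global alignment score, keeping two rows of the dynamic-programming table. The linear gap penalty and the match/mismatch values used here are assumptions chosen for the example, and `nw_score` is a hypothetical name; a substitution matrix such as PAM would replace the simple match/mismatch test for proteins.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Global alignment score of $s and $t with a linear gap penalty.
# Only the previous and current rows of the table are stored.
sub nw_score {
    my ($s, $t, $match, $mismatch, $gap) = @_;

    # Row 0: aligning the empty prefix of $s against prefixes of $t.
    my @prev = map { $_ * $gap } (0 .. length $t);

    for my $i (1 .. length $s) {
        my @curr = ($i * $gap);    # column 0: prefix of $s against nothing
        for my $j (1 .. length $t) {
            my $sub = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1)
                    ? $match
                    : $mismatch;
            $curr[$j] = max(
                $prev[$j - 1] + $sub,   # align the two characters
                $prev[$j]     + $gap,   # gap in $t
                $curr[$j - 1] + $gap,   # gap in $s
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print nw_score('ACGT', 'AGT', 1, -1, -1), "\n";
```

Smith-Waterman local alignment differs mainly in clamping each cell at zero and taking the maximum over the whole table rather than the final cell.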
BLAST programs, algorithms and structure
- Design of BLAST
- Overview of the BLAST algorithm
- Structure of BLAST reports
- BLAST statistics
- BLAST search heuristics
- BLAST Protocols
- BLAST databases
Perl and BLAST
- Implementing BLAST in Perl
- BLAST statistics in Perl
Multiple Sequence Alignment
- Heuristics for alignment merging
- Implementing Perl scripts for merging alignments
- Tunnel alignments and the branch and bound method implemented in Perl
Perl scripts for Phylogeny Reconstruction
- Data structures and algorithms for Trees
- Parsimonious Phylogenies and tree pruning
- Perl scripts for reconstructing Phylogenies
- Perl scripts for building and pruning trees
Protein Motifs and PROSITE
- PROSITE database format
- Patterns in PROSITE and in Perl
- Suffix trees and suffix links
- Application of suffix trees in searching PROSITE databases
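
To illustrate how PROSITE patterns relate to Perl patterns, the sketch below converts a PROSITE pattern into a Perl regular expression. It is a simplified example, not the course's implementation: `prosite_to_regex` is a hypothetical name, and rarer syntax such as anchors inside square brackets is not handled.

```perl
use strict;
use warnings;

# Convert a PROSITE pattern into a compiled Perl regular expression.
sub prosite_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\.$//;                             # optional trailing period
    $pattern =~ s/-//g;                              # element separators
    $pattern =~ tr/x/./;                             # x = any amino acid
    $pattern =~ s/\{([A-Z]+)\}/[^$1]/g;              # {P}   -> [^P]
    $pattern =~ s/\((\d+(?:,\d+)?)\)/\{$1\}/g;       # (2,4) -> {2,4}
    $pattern =~ s/^</^/;                             # N-terminal anchor
    $pattern =~ s/>$/\$/;                            # C-terminal anchor
    return qr/$pattern/;
}

# PROSITE PS00001, the N-glycosylation site pattern.
my $regex = prosite_to_regex('N-{P}-[ST]-{P}.');

for my $peptide ('AANGSA', 'AANPSA') {
    printf "%s\t%s\n", $peptide, ($peptide =~ $regex ? 'match' : 'no match');
}
```

The exclusion braces are rewritten before the repetition counts so that the braces produced for `{n,m}` quantifiers are not mistaken for PROSITE exclusions.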
Fragment Assembly
- Shortest common superstrings
- PHRAP algorithm
- Aligning reads
- Adjusting qualities
- Assigning Reads to Contigs
- Consensus sequences
Coding sequence prediction
- Trigram model
- Hexagram model
- Gene prediction
- Perl scripts for finding genes
Satellite Identification and scripts for locating satellites
Restriction mapping
- Backtracking algorithm for partial digests
- Perl scripts for processing partial digest data
- Uncertain measurement and interval arithmetic
- Perl scripts for interval arithmetic
- Handling Partial digest data and dealing with uncertainty in Perl
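
A minimal sketch of interval arithmetic for uncertain fragment lengths, assuming each measurement is represented as a closed interval `[low, high]`. The function names and the example measurements are hypothetical and serve only to show the idea.

```perl
use strict;
use warnings;

# Build an interval from a measured value and its absolute error.
sub iv_new {
    my ($value, $error) = @_;
    return [ $value - $error, $value + $error ];
}

# Sum of two intervals: the bounds add.
sub iv_add {
    my ($x, $y) = @_;
    return [ $x->[0] + $y->[0], $x->[1] + $y->[1] ];
}

# Difference of two intervals: lowest minus highest, highest minus lowest.
sub iv_sub {
    my ($x, $y) = @_;
    return [ $x->[0] - $y->[1], $x->[1] - $y->[0] ];
}

# Two intervals are compatible if they share at least one point.
sub iv_overlaps {
    my ($x, $y) = @_;
    return $x->[0] <= $y->[1] && $y->[0] <= $x->[1];
}

# Invented example: do two fragments of 200 and 350 (each +/- 5)
# plausibly add up to a fragment measured as 550 +/- 5?
my $sum      = iv_add(iv_new(200, 5), iv_new(350, 5));
my $measured = iv_new(550, 5);
printf "sum = [%d, %d], compatible: %s\n",
    @$sum, (iv_overlaps($sum, $measured) ? 'yes' : 'no');
```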
Studying Genome rearrangements: theory and associated Perl scripts
- Reversals
- Sorting by reversals
- Signed and unsigned permutations
- Happy cliques
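
As a small illustration of reversals on unsigned permutations, the sketch below applies a reversal and counts breakpoints, a quantity commonly used to reason about sorting by reversals. It does not implement a sorting algorithm, and the names `apply_reversal` and `count_breakpoints` are hypothetical.

```perl
use strict;
use warnings;

# Reverse the segment from index $i to $j (0-based, inclusive).
# Returns a new array reference; the input is left unchanged.
sub apply_reversal {
    my ($perm, $i, $j) = @_;
    my @result = @$perm;
    @result[$i .. $j] = reverse @result[$i .. $j];
    return \@result;
}

# Count breakpoints of an unsigned permutation of 1..n: adjacent pairs
# in the extended permutation (0, p1, ..., pn, n+1) that are not consecutive.
sub count_breakpoints {
    my ($perm) = @_;
    my @ext = (0, @$perm, scalar(@$perm) + 1);
    return scalar grep { abs($ext[$_] - $ext[$_ + 1]) != 1 } 0 .. $#ext - 1;
}

my $perm = [3, 1, 2, 4];
print count_breakpoints($perm), "\n";

my $step1 = apply_reversal($perm, 0, 2);    # 2 1 3 4
my $step2 = apply_reversal($step1, 0, 1);   # 1 2 3 4
print "@$step2\n";
```

The identity permutation is the only one with zero breakpoints, which is why the count is a natural measure of progress.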
