BST 520 - Current Topics in Bioinformatics - Spring 2007
Lecturers:
Anthony Almudevar
Dept of Biostatistics and Computational Biology
Office: MRBX G-11311F
Phone: (585) 275-6992
Fax: (585) 273-1031
E-mail: Anthony_Almudevar@urmc.rochester.edu
Galina Glazko
Dept of Biostatistics and Computational Biology
Office: MRBX G-11318D
Phone: (585) 273-4319
Fax: (585) 273-1031
E-mail: Galina_Glazko@urmc.rochester.edu
Schedule: Mondays and Wednesdays, 11:00 – 12:15, room TBA
Syllabus: The following lists the scheduled topics by date. Assigned readings will be given in advance and discussed during the lecture.
I. Introduction
01.17. Introduction to Molecular Biology and Bioinformatics.
II. Sequence-Oriented Bioinformatics
01.22. 1. Alignment of pairs of sequences. Scoring matrices. Dynamic programming algorithms.
01.24. 2. Sequence database searching for similar sequences.
01.29. 3. DNA level: Coding and non-coding DNA, repeats. Computational methods for gene prediction.
01.31. 4. RNA level: Algorithms for RNA folding problem.
02.05. 5. Protein level: Protein classification and structure prediction.
02.07. 6. Phylogenetic inference.
02.12. 7. Student presentations based on the assigned reading.
02.14. 8. Student presentations based on the assigned reading.
III. Gene Expression Studies
02.19. 1. Methodological aspects. Differences among microarray platforms. Data pre-processing: background correction and normalization. Models to summarize probe sets.
02.21. 2. Statistical aspects. Identification of differentially expressed genes (parametric and non-parametric statistics, drawbacks and benefits).
02.26. 3. Statistical aspects. Multiple hypothesis testing: Control of error rates (PCER, FWER, FDR). Empirical Bayes Method.
02.28. 4. Statistical classification of gene expression data.
03.05. 5. Prognostic and diagnostic molecular signatures.
03.07. 6. Searching for differentially expressed gene combinations.
Spring Break: March 12 – March 16, 2007
03.19. 7. Stochastic dependencies in gene expression data.
03.21. (Auxiliary Lecture) Pedigree Analysis.
03.26. Student presentations based on the assigned reading.
03.28. Student presentations based on the assigned reading.
IV. Networks
04.02. 1. Functional genomics and gene regulatory networks.
04.04. 2. Network models: Relevance networks, Boolean networks, Bayesian networks, Petri nets.
04.09. 3. Network models (continued).
04.11. 4. Graphical modeling: Markov chain Monte Carlo methods. Learning methods. Computational Bayesian approaches.
04.16. 5. Graphical modeling (continued).
04.18. 6. Gene perturbation experiments and experimental design.
04.23. Student presentations based on the assigned reading.
04.25. Student presentations based on the assigned reading.
V. Final Presentations
04.30. Session 1.
05.02. Session 2.
05.07. Session 3.
Evaluation: Students will be expected to give several presentations during the course. Presentations on 02.12, 02.14, 03.26, 03.28, 04.23 and 04.25 will be based on the assigned readings; the number each student gives will depend on enrollment. In addition, each student will give one final presentation, based on a paper or topic of the student's choosing, during the final presentation sessions (04.30, 05.02 and 05.07); this period will be extended if needed. Each presentation will be followed by a question-and-answer period. Grades will be based on the presentations: 1/3 for the assigned-reading presentations and 2/3 for the final presentation.
Suggested Reading: There will be no single text for this course. Assigned readings (usually journal articles) will be posted on the course website.
Useful supplemental reading includes:
DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling, Baldi P and Hatfield GW. Cambridge University Press, 2002.
Genomic Signal Processing and Statistics. EURASIP Book Series on Signal Processing and Communications, editors Dougherty ER, Shmulevich I, Chen J and Wang ZJ. Hindawi Publishing Corporation, 2005.
The Analysis of Gene Expression Data, edited by Parmigiani G, Garrett ES, Irizarry RA and Zeger SL. Springer-Verlag, 2003.
Computational Genome Analysis: An Introduction, Deonier RC, Tavaré S and Waterman MS. Springer, 2005.
Assigned reading for Lecture 1, Section II (Alignment of pairs of sequences. Scoring matrices. Dynamic programming algorithms)
Altschul SF (1991) Amino acid substitution matrices from an information theoretic perspective. J Mol Biol 219:555-565.
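Dayhoff MO, Schwartz RM and Orcutt BC (1978) A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, pp. 345-352. National Biomedical Research Foundation, Washington DC.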
Karlin S and Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. PNAS 87:2264-2268.
Brudno M et al (2003) LAGAN and multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Research 13:721-731.
Altschul SF et al (2001) The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Research 29:351-361.
Notredame C et al (2000) T-Coffee: a novel method for fast accurate multiple sequence alignment. J Mol Biol 302:205-217.
Phuong TM et al (2006) Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Research 34:5932-5942.
Bray N et al (2003) AVID: A Global Alignment Program. Genome Research 13:97-102. http://www.genome.org/cgi/content/full/13/1/97
Pei J and Grishin NV (2006) MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic Acids Research 34:4364-4374. http://nar.oxfordjournals.org/cgi/content/full/34/16/4364
Assigned reading for Lecture 2, Section II (Sequence database searching for similar sequences)
Kent WJ (2002) BLAT – the BLAST-like alignment tool. Genome Research 12:656-664.
Brenner SE et al (1998) Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. PNAS 95:6073-6078.
Ma B et al (2002) PatternHunter: faster and more sensitive homology search. Bioinformatics 18:440-445.
Yu Y-K and Altschul SF (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 21:902-911.
Yu Y-K et al (2006) Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Research 34:5966-5973.
Assigned reading for Lecture 3, Section II (Gene regulation and gene prediction)
Borodovsky M and McIninch J (1993) GENMARK: Parallel gene recognition for both DNA strands. Computers Chem. 17:123-133.
Burge C and Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78-94.
Batzoglou S et al (2000) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Research. 10:950-958.
Kaplan T et al (2005) Ab initio prediction of transcription factor targets using structural knowledge. PLoS Computational Biology. 1:e1.
Keich U and Pevzner PA (2002) Finding motifs in the twilight zone. Bioinformatics. 18:1374-1381.
Siddharthan R et al (2005) PhyloGibbs: a Gibbs sampler motif finder that incorporates phylogeny. PLoS Computational Biology. 1:e67.
Ng P et al (2006) Apples to apples: improving the performance of motif finders and their significance analysis in the Twilight Zone. Bioinformatics. 22:e393-e401.
Korf I et al (2001) Integrating genomic homology into gene structure prediction. Bioinformatics. 17:S140-S148.
Assigned reading for Lecture 4, Section II (RNA folding problem)
Mathews DH et al (1999) Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J. Mol. Biol. 288:911-940.
Mathews DH (2004) Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 10:1178-1190.
Mathews DH (2006) Revolutions in RNA Secondary Structure Prediction. J. Mol. Biol. 359:526-532.
Xia T et al (1998) Thermodynamic Parameters for an Expanded Nearest-Neighbor Model for Formation of RNA Duplexes with Watson-Crick Base Pairs. Biochemistry 37: 14719-14735.
Mathews DH, Schroeder SJ, Turner DH and Zuker M (2005) In: The RNA World, third edition (Gesteland RF, Cech TR and Atkins JF, eds.), pp. 631-657. Cold Spring Harbor Laboratory Press, Cold Spring Harbor. (Available online at: http://rna.cshl.edu/content/free/contents/rnaworld3e_toc.html)
Assigned reading for Lecture 5, Section II (Protein level: Protein classification and structure prediction)
Lathrop RH and Smith TF (1996) Global Optimum Protein Threading with Gapped Alignment and Empirical Pair Score Functions. J. Mol. Biol. 255:641-665.
Rost B. Protein Structure Prediction in 1D, 2D, and 3D.
Simons KT et al (1997) Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions. J. Mol. Biol. 268:209-225.
Sali A and Blundell TL (1993) Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol. 234:779-815.
Schueler-Furman O et al (2005) Progress in Modeling of Protein Structures and Interactions. Science. 310:638-642.
Salom D et al (2006) Crystal structure of a photoactivated deprotonated intermediate of rhodopsin. PNAS. 103:16123-16128.
Software and websites (for the same lecture):
MODELLER, for homology modeling: http://salilab.org/modeller/
AMBER, for molecular dynamics simulations: http://amber.scripps.edu/
Folding@home: http://folding.stanford.edu/
ROSETTA, ab initio protein structure prediction software: http://depts.washington.edu/ventures/UW_Technology/Express_Licenses/Rosetta/
DOCK, free software for protein-ligand docking: http://dock.compbio.ucsf.edu/
Assigned reading for Lecture 6, Section II (Phylogenetic inference)
Wang L-S et al (2006) Distance-Based Genome Rearrangement Phylogeny. J Mol Evol 63:473–483.
Ané C et al (2006) Bayesian Estimation of Concordance among Gene Trees. Mol. Biol. Evol. 24:412-426.
Zakon HH et al (2006) Sodium channel genes and the evolution of diversity in communication signals of electric fishes: Convergent molecular evolution. PNAS. 103:3675-3680.
Durand D et al (2005) A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction. In: Miyano S et al (Eds.) RECOMB 2005, LNBI 3500, pp. 250-264. Springer-Verlag, Berlin Heidelberg.
Moret BME and Warnow T (2004) Advances in Phylogeny Reconstruction from Gene Order and Content Data. Preprint.
Assigned reading for Lecture 2, Section III (Identification of differentially expressed genes)
Qiu X, Brooks AI, Klebanov L and Yakovlev A (2005) The effects of normalization on the correlation structure of microarray data. BMC Bioinformatics, 6:120.
Klebanov L, Gordon A, Xiao Y, Land H and Yakovlev A (2006) A permutation test motivated by microarray data analysis. Computational Statistics & Data Analysis, 50, 3619 – 3628.
Xiao Y, Gordon A and Yakovlev A (2006) A C++ Program for the Cramer-von Mises Two-Sample Test. Journal of Statistical Software. 17:8.
Klebanov L, Qiu X, Welle S and Yakovlev A (2007) Statistical methods and microarray data. Nature Biotechnology 25, 25 – 26.
MAQC Consortium (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology 24, 1151-1161.
Assigned reading for Lecture 3, Section III (Multiple hypothesis testing)
Benjamini Y and Hochberg Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57, 289-300.
Dudoit S, Yang YH, Callow MJ and Speed TP (2002) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12, 111-139.
Qiu X, Klebanov L and Yakovlev A (2005) Correlation Between Gene Expression Levels and Limitations of the Empirical Bayes Methodology for Finding Differentially Expressed Genes. Statistical Applications in Genetics and Molecular Biology. 4, Article 34.
Qiu X, Xiao Y, Gordon A and Yakovlev A (2006) Assessing Stability of Gene Selection in Microarray Data Analysis. BMC Bioinformatics, 7:50.
Efron B (2003) Robbins, empirical Bayes and microarrays. Ann. Statist. 31, 366–378.
Assigned reading for Lecture 4, Section III (Statistical classification of gene expression data).
Ambroise C and McLachlan GJ (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. PNAS, 99, 6562-6566.
Bryan J (2004) Problems in gene clustering based on gene expression data. Journal of Multivariate Analysis. 90, 44-66.
Datta S and Datta S (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics. 19, 459-466.
Dobra A, Wang Q and West M (2004) Graphical model-based gene clustering and metagene expression analysis. Technical Report, Institute of Statistics & Decision Sciences, Duke University.
Garge NR, Page GP, Sprague AP, Gorman BS and Allison DB (2005) Reproducible Clusters from Microarray Research: Whither? BMC Bioinformatics. 6(Suppl 2): S10.
Kerr MK and Churchill GA (2001) Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. PNAS, 98, 8961-8965.
Yeung KY, Haynor DR and Ruzzo WL (2001). Validating clustering for gene expression data. Bioinformatics. 17, 309-318.
Zhou X, Kao M-CJ and Wong WH (2002) Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, 99, 12783-12788.
Assigned reading for Lecture 5, Section III (Prognostic and diagnostic molecular signatures).
Hastie T, Tibshirani R, and Friedman J (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.
Li H and Gui J (2004). Partial Cox regression analysis for high-dimensional microarray gene expression data. ISMB04/Bioinformatics (in press).
Nguyen DV and Rocke DM (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18:39-50.
Osborne MR, Presnell B and Turlach BA (2000). On the LASSO and its Dual. Journal of Computational and Graphical Statistics 9:319-337.
Park PJ, Tian L, and Kohane IS (2002). Linking expression data with patient survival times using partial least squares. Bioinformatics 18:1625–1632.
Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:267–288.
Wang S, Nan B, Zhu J and Beer DG (2006). Doubly Penalized Buckley-James Method for Survival Data with High-Dimensional Covariates. Working Paper, U Michigan Biostats (in press).
Assigned reading for Lectures 6-7, Section III ((6) Searching for differentially expressed gene combinations; (7) Stochastic dependencies in gene expression data; prognostic and diagnostic molecular signatures).
Qiu X and Yakovlev A (2006) Some comments on instability of false discovery rate estimation. Journal of Bioinformatics and Computational Biology. 4, 1057–1068.
Klebanov L and Yakovlev A (2006) How High is the Level of Technical Noise in Microarray Data? Technical Report.
Chen L, Klebanov L and Yakovlev A (2007) Normality of gene expression revisited. Journal of Biological Systems. 15, 39-48.
Klebanov L and Yakovlev A (2006) Treating expression levels of different genes as a sample in microarray data analysis: is it worth a risk? Statistical Applications in Genetics and Molecular Biology. 5, Article 9.
Klebanov L, Jordan C and Yakovlev A (2006) A New Type of Stochastic Dependence Revealed in Gene Expression Data. Statistical Applications in Genetics and Molecular Biology. 5, Article 7
Assigned reading for Auxiliary Lecture, Section III (Pedigree Analysis).
Almudevar A (2003) A simulated annealing algorithm for maximum likelihood pedigree reconstruction. Theoretical Population Biology. 63, 63-75.
Guo S-W and Thompson E (1992) Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics, 48, 361-372.
Marshall TC, Slate J, Kruuk LEB and Pemberton JM (1998). Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology, 7, 639-655.
Thompson E and Meagher TR (1987) Parental and sib likelihoods in genealogy reconstruction. Biometrics, 43, 585-600.
Thompson E (1994) Monte Carlo likelihood in genetic mapping. Statistical Science. 9, 355-366.
Assigned reading for Section IV (Gene Networks).
Lectures A, B1, B2:
- Butte AJ, Kohane IS (2000) Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pacific Symposium on Biocomputing, 418-429.
- Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS (2000) Discovering functional relationships between RNA expression and chemotherapeutic susceptibility. PNAS, 97, 12182-12186.
- Chu T, Glymour C, Scheines R and Spirtes P (2003) A statistical problem for inference to regulatory structure from associations of gene expression measurements with microarrays. Bioinformatics, 19, 1147-1152.
- de la Fuente A, Bing N, Hoeschele I and Mendes P (2004) Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics, 20(18), 3565-3574.
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M and Friend SH (2000) Functional Discovery via a Compendium of Expression Profiles. Cell, 102, 109-126.
- Schäfer J and Strimmer K (2005) An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, 21(6), 754-764.
- Wille A and Bühlmann P (2006) Low-order conditional independence graphs for inferring genetic networks. Statistical Applications in Genetics and Molecular Biology, 5(1), Article 1. Available at: http://www.bepress.com/sagmb/vol5/iss1/art1.
Lectures C, D, E, F:
- Akutsu T, Maruyama O, Kuhara S and Miyano S (1998) A system for identifying genetic networks from gene expression patterns produced by gene disruptions and overexpression. The Ninth Workshop on Genome Informatics, 151-160.
- Akutsu T, Miyano S and Kuhara S (1999) Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pacific Symposium on Biocomputing, 4, 17-28.
- Akutsu T, Miyano S and Kuhara S (2000) Algorithms for inferring qualitative models of biological networks, Pacific Symposium on Biocomputing, 5, 290-301.
- Almudevar A (2007) A graphical approach to relatedness inference. Theoretical Population Biology, 71, 213-229
- Friedman N, Murphy K and Russell S (1998) Learning the structure of dynamic probabilistic networks. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 139-147.
- Friedman N and Goldszmidt M (1998) Learning Bayesian networks with local structure. In Learning in Graphical Models (ed. Jordan MI), 412-459, MIT Press, Cambridge, Massachusetts.
- Friedman N, Linial M, Nachman I and Pe'er D (2000) Using Bayesian Networks to Analyze Expression Data. J. Computational Biology, 7, 601-620.
- Friedman N and Koller D (2003) Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning. 50, 92-125
- Friedman N (2004) Inferring cellular networks using probabilistic graphical models. Science, 303, 799-805.
- Husmeier D (2003) Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics, 19, 2271-2282.
- Ideker TE, Thorsson V and Karp R, (2000) Discovery of regulatory interactions through perturbation: inference and experimental design, in Pacific Symposium on Biocomputing, 5, 302-313.
- Kyoda K, Baba K, Onami S and Kitano H (2004) DBRF-MEGN method: an algorithm for deducing minimum equivalent gene networks from large-scale gene expression profiles of gene deletion mutants. Bioinformatics, 20, 2662-2675.
- Nariai N, Tamada Y, Imoto S and Miyano S (2005) Estimating gene regulatory networks and protein–protein interactions of Saccharomyces cerevisiae from multiple genome-wide data. Bioinformatics, 21, 206-212.
- Pe'er D, Regev A, Elidan G and Friedman N (2001) Inferring subnetworks from perturbed expression profiles. Bioinformatics, 1, 1-9.
- Sebastiani P, Abad M and Ramoni MF (2005), Bayesian Networks for Genomic Analysis. In Genomic Signal Processing and Statistics. EURASIP Book Series on Signal Processing and Communications.
- Segal E, Wang H and Koller D (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264-271.
- Shmulevich I, Dougherty ER and Zhang W (2002) From Boolean to probabilistic Boolean networks as models of genetic regulatory networks. Proceedings of the IEEE, 90, 1778-1792.
- Shmulevich I, Dougherty ER and Zhang W (2002) Gene perturbation and intervention in probabilistic Boolean networks. Bioinformatics, 18, 1319-1331.
- Tegner J, Yeung MKS, Hasty J and Collins JJ (2003) Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proceedings of the National Academy of Sciences, 100, 5944-5949.
- Wagner A (2001) How to reconstruct a large genetic network from n gene perturbations in fewer than n² easy steps. Bioinformatics, 17, 1183-1197.
- Wagner A (2004) Reconstructing pathways in large genetic networks from genetic perturbations. Journal of Computational Biology, 11, 53-60.
- Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED (2004) Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics, 20, 3594-3603.
- Zhou X, Wang X, Pal R, Ivanov I, Bittner M and Dougherty E (2004) A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks, Bioinformatics, 20, 2918-2927.
Course Notes:
Section II - Lecture 0
Section II - Lecture 1
Section II - Lecture 2
Section II - Lecture 3
Section II - Lecture 6
Section III - Lecture 1
Section III - Lecture 2
Section III - Lecture 3
Section III - Lecture 4
Section III - Lecture 5
Section III - Lecture 6
Section III - Lecture 7
Section III - Auxiliary Lecture
Section IV - Lecture A
Section IV - Lecture B1
Section IV - Lecture B2
Section IV - Lecture C
Section IV - Lecture D
Section IV - Lecture E
Section IV - Lecture F