B. Tech Biosciences and Bioengineering

BT 307        Biological Data Analysis        2-0-2-6

Syllabus: Data, descriptive statistics, and visualization: Introduction to different types of data in biology; Descriptive statistics such as mean, median, mode, quartiles, standard deviation, and standard error; Different types of plots such as scatter plot, bar graph, line graph, pie chart, box plot, and frequency histogram; Understanding error bars. Probability and probability distributions: Basic concepts of probability, conditional probability, Bayes' theorem; Binomial, multinomial, Poisson, exponential, and Gaussian distributions; Sampling distribution and central limit theorem. Hypothesis testing: Student's t-test, Z-test, Chi-squared test, ANOVA. Correlation, regression, and estimation: Pearson correlation; Regression: linear, non-linear, single and multivariate; Concept of likelihood and method of maximum likelihood. Tools for data from high-throughput experiments: Principal component analysis; Clustering of data: K-means algorithm, hierarchical clustering; Visualization tools: heat map, volcano plot. Laboratory component: R- and MS Excel-based exercises on graphical visualization of data, different tests of hypothesis, estimation of correlation, regression, PCA, and clustering.

Texts:

1.   S. Ross, A First Course in Probability, 9th Edition, Pearson Education India, 2014.

2.   R. C. Elston and W. D. Johnson, Basic Biostatistics for Geneticists and Epidemiologists: A Practical Approach, 1st Edition, Wiley, 2008.

3.   G. Hartvigsen, A Primer in Biological Data Analysis and Visualization Using R, 1st Edition, Columbia University Press, 2014.

References:

1.   M. C. Whitlock and D. Schluter, The Analysis of Biological Data, 2nd Edition, W. H. Freeman & Company, 2014.

2.   G. P. Quinn and M. J. Keough, Experimental Design and Data Analysis for Biologists, 1st Edition, Cambridge University Press, 2002.

3.   M. D. Ugarte, A. F. Militino, and A. T. Arnholt, Probability and Statistics with R, 2nd Edition, CRC Press, 2016.