Preparing Students for the Data-Driven Challenges of Today's World
Our MS in Biostatistics and Data Science program provides top-class training in the biostatistics and data science techniques essential for collecting, managing, and analyzing biomedical and health data.

Our coursework offers students a foundation for data science careers in health-related fields and beyond.
Real-World Skills
We provide comprehensive hands-on training in statistical concepts and programming. During the MS in Biostatistics and Data Science program, students will:
- Use state-of-the-art statistical and data science approaches to address modern data challenges.
- Gain invaluable real-world exposure under the guidance of experienced biostatisticians and data scientists.
- Build experience in the field through a faculty-mentored research project.
- Take advantage of the program’s New York City location, close to leading educational institutions and some of the largest pharmaceutical hubs in the country.
- Develop close professional relationships with a diverse faculty through a low student-to-faculty ratio.
- Gain exposure to specializations such as health services research, cost-effectiveness analysis, and comparative-effectiveness research.
Unique Expertise
Our MS in Biostatistics and Data Science program is unique in that it focuses on data mining and machine learning techniques while retaining the rigor of a traditional biostatistics program.
Students from all over the world join the program with backgrounds in science (e.g., statistics, mathematics, and biology), engineering, health, and medicine.
Graduates are prepared for data science careers in the public and private biomedical, healthcare, insurance, and pharmaceutical sectors, in both academia and industry.
The MS in Biostatistics and Data Science program has close ties to other programs within the Weill Cornell Medical College and Cornell University, the Department of Statistics and Data Science at Cornell University, the Cornell Tech campus in New York City, and NewYork-Presbyterian. Students can complete the MS in Biostatistics and Data Science program in 16 months starting in Fall 2023. Students must complete at least 36 credits to graduate.
BDS Full-Time Students - Recommended Curriculum Progression
We recommend that students follow the schedule below to ensure eligibility for graduation. The Education Team will monitor progression, but it is ultimately each student’s responsibility to track their own progress and ensure they meet graduation requirements. Course offerings and availability are subject to change. BDS students must complete 27 credits of required courses and 9 credits of electives (optional courses), for a total of 36 credits.
Fall Term 1
Students take 12 required credits, with the option of adding an elective
Biostatistics I with R Lab (HBDS 5005) - Required
Course Director: Xi Kathy Zhou, PhD
4 credits
This course provides an introduction to core biostatistical concepts and reasoning. Specific topics include tools for describing central tendency and variability in data, probability distributions, sampling distributions, estimation, and hypothesis testing. Assignments will involve computation using the R programming language.
Study Design (HBDS 5015) - Required
Course Director: Linda Gerber, PhD
1.5 credits
The course describes and applies measures of disease incidence and prevalence and measures of effect; explains the basic principles underlying different study designs, including descriptive, ecological, cross-sectional, cohort, case-control, and intervention studies; assesses the strengths and limitations of each design; identifies problems in interpreting epidemiological data (chance, bias, confounding, and effect modification); and addresses validity, intra-rater reliability, and inter-rater reliability.
Categorical and Censored Data Analysis (HBDS 5016) - Required
Data Science I with R (HBDS 5018) - Required
Course Director: Wenna Xi, PhD
3 credits
This course provides an introduction to data science using both the R and Python programming languages. In this course, students will gain experience working directly with data to pose and answer questions. The course is divided into two parts: the first part is taught in R and the second in Python. Topics covered include reproducible research, exploratory data analysis, data manipulation, data visualization techniques, simulation design, and unsupervised learning methods.
Master’s Project 1 (HCPR 6010) - Required
Course Director: Faculty
2 credits
This is the culminating capstone course of all master's-level graduate education programs. It has two aims: (1) to help students discover and develop new and effective ways of managing and working with all stakeholders in the healthcare field, and (2) to accelerate students' development of the context awareness, integrative management, and industry skills needed to lead in a rapidly changing healthcare sector. The capstone places students in a new organization, one they don’t already know well, and gives them the chance to practice hitting the ground running, providing deeper preparation for the next stages of their careers. The capstone project lasts the entire year: the first term involves matching students with the right project, the second term has students working with their client, and the third term consists of a detailed report and a final presentation to the client, faculty, and fellow classmates.
Statistical Programming with SAS (HBDS 5011) - Elective
Course Director: Zhengming Chen, PhD, MPH, M.S.
3 credits
This course provides an introduction to the statistical software SAS. Students will receive hands-on exposure to data management and report generation with one of the most popular statistical software packages.
AI Fundamentals with Python (HBDS 5022) - Elective
3 credits
Spring Term
Students take 6 required credits, with the option of 1 or 2 electives
Biostatistics II - Regression Analysis (HBDS 5008) - Required
Course Director: Samprit Banerjee, PhD, MStat
3 credits
This course focuses on the theory and application of different types of regression analysis. Topics will include linear regression, logistic regression, and Cox proportional hazards regression. Additional topics will include coding of explanatory variables, residual diagnostics, model selection techniques, random effects and mixed models, and maximum likelihood estimation. Homework assignments will involve computation using the R statistical package.
Master’s Project 2 (HCPR 6021) - Required
Course Director: Faculty
3 credits
This is the culminating capstone course of all master's-level graduate education programs. It has two aims: (1) to help students discover and develop new and effective ways of managing and working with all stakeholders in the healthcare field, and (2) to accelerate students' development of the context awareness, integrative management, and industry skills needed to lead in a rapidly changing healthcare sector. The capstone places students in a new organization, one they don’t already know well, and gives them the chance to practice hitting the ground running, providing deeper preparation for the next stages of their careers. The capstone project lasts the entire year: the first term involves matching students with the right project, the second term has students working with their client, and the third term consists of a detailed report and a final presentation.
Data Management (SQL) (HBDS 5021) - Elective
Course Director: Debra D’Angelo
3 credits
This course covers the tools students need to create and manage big databases and to maximize the value drawn from them. The emphasis is on the design and implementation of relational databases and the use of Structured Query Language (SQL). By the end of this course, students will be able to explain the requirements for handling large and complex datasets; design, build, and query a relational database; and understand how relational databases and tools targeted at big data complement one another.
Big Data in Medicine (HBDS 5020) - Elective
Course Director: Samprit Banerjee, PhD, MStat
3 credits
There has been an explosion of big data in medicine and healthcare. There are four main sources of such data: (1) administrative healthcare databases, such as electronic health records and health insurance claims; (2) biomedical imaging (e.g., MRI, CT scans, and X-rays); (3) sensors in smartphones and in wearable and implantable devices; and (4) genetics and genomics. Navigating and critically assessing the statistical methods and analytic tools needed to conduct analytics and research with such big biomedical data is difficult. This course introduces these four sources of big data in medical studies, discusses the nuances and intricacies of how such data are generated, and introduces tools to navigate, visualize, and describe these databases.
Summer Term
Students take 3 required credits
Master’s Project 3 (HCPR 6030) - Required
Course Director: Faculty
3 credits
This is the culminating capstone course of all master's-level graduate education programs. It has two aims: (1) to help students discover and develop new and effective ways of managing and working with all stakeholders in the healthcare field, and (2) to accelerate students' development of the context awareness, integrative management, and industry skills needed to lead in a rapidly changing healthcare sector. The capstone places students in a new organization, one they don’t already know well, and gives them the chance to practice hitting the ground running, providing deeper preparation for the next stages of their careers. The capstone project lasts the entire year: the first term involves matching students with the right project, the second term has students working with their client, and the third term consists of a detailed report and a final presentation to the client, faculty, and fellow classmates.
Fall Term 2
Students take 6 required credits, with the option of 1 or 2 electives
Data Science II – Statistical Learning (HBDS 5014) - Required
Course Director: Samprit Banerjee, PhD, MStat
3 credits
The course begins with logistic regression and discriminant analysis, with an emphasis on classification and prediction. It then covers more advanced topics, such as regularized regression, resampling methods, tree-based methods, and support vector machines.
Hierarchical Modeling & Longitudinal Data Analysis (HBDS 5010) - Required
Course Director: Arindam RoyChoudhury, PhD
3 credits
An independent biostatistician often encounters data collected on patients over time, or data that are otherwise clustered. This course gives students the tools needed to analyze such data, building on the core biostatistics material they have learned in other courses. Specifically, students will learn to use mixed-effects models, mixed-effects ANOVA, generalized linear mixed models (GLMMs), mixed-effects Cox regression, and Bayesian hierarchical models, and to analyze repeated-measures and longitudinal data with appropriate covariance structures.
Statistical Programming with SAS (HBDS 5011) - Elective
Course Director: Zhengming Chen, PhD, MPH, M.S.
3 credits
This course provides an introduction to the statistical software SAS. Students will receive hands-on exposure to data management and report generation with one of the most popular statistical software packages.
AI Fundamentals with Python (HBDS 5022) - Elective
3 credits
Modern Methods for Causal Inference (HBDS 5017) - Elective
Course Director: Himel Mallick, PhD
3 credits
The goal of this course is to introduce students to a core set of modern statistical concepts and techniques and to demonstrate how to use them to answer complex research questions in healthcare. Students will gain knowledge of causal inference methods using machine learning, including directed acyclic graphs, non-parametric structural equation models, inverse probability weighting, g-computation, survival analysis, marginal structural models, longitudinal data, mediation analyses, effect modification, and precision medicine. All statistical analyses in this course are performed in the free software R.
Pharmaceutical Statistics (HBDS 5019) - Elective
Course Director: Faculty
3 credits
Pharmaceutical studies use many statistical methods that are not routinely taught in conventional biostatistics courses. In this course, students will learn the statistical methods used specifically in pharmaceutical studies. The course is divided into three modules:
- “Statistical Aspects of Phase I Clinical Trial” covers the 3+3 design, accelerated titration, up-and-down designs, the continual reassessment method (CRM), modified CRM, TITE-CRM, the Bayesian logistic regression model (BLRM), escalation with overdose control (EWOC), the toxicity probability interval (TPI), and modified TPI (mTPI).
- “Statistical Aspects of Phase II Clinical Trial” covers the design and analysis of one-stage and Simon’s two-stage designs, as well as multi-arm Phase II designs.
- “Statistical Aspects of Phase III Clinical Trial” covers randomization and the design and analysis of parallel, crossover, factorial, seamless Phase II/III, adaptive, and SMART designs.