The statement that accompanies your application is an important element that helps us learn more about you. There is no required length or topic, but at a minimum we would like to learn about your background and your expected future in data science.
We regularly accept applicants who have had work experience after their undergraduate degree, both in fields related to data science and in unrelated ones. Not all applicants have work experience, but for those who do, a good description helps us better evaluate the application.
Research experience is not required for master's applicants, and many of our applicants do not have any. If you do have research experience, you can use it to demonstrate your ability to handle graduate-level data science material.
GRE scores must be submitted.
DATA 1030. Hands-on Data Science.
Develops all aspects of the machine learning pipeline: data acquisition and cleaning, handling missing data, exploratory data analysis, visualization, feature engineering, modeling, interpretation, and presentation, all in the context of real-world datasets. Fundamental considerations for data analysis are emphasized (the bias-variance tradeoff, training, validation, testing). Classical models and techniques for classification and regression are included (linear and logistic regression with regularization, support vector machines, decision trees, random forests, XGBoost). Uses the Python data science ecosystem (e.g., sklearn, pandas, matplotlib).
DATA 1050. Data Engineering.
This course covers the storage, retrieval, and management of various types of data. It addresses the computing infrastructure (such as various types of databases and data structures), algorithmic techniques (such as searching and sorting algorithms), and query languages (such as SQL) for interacting with data, in the context of both transaction processing (OLTP) and analytical processing (OLAP). Students will be introduced to measures for evaluating the efficacy of different techniques for interacting with data (such as the "Big-O" measure of complexity and the number of I/O operations) and to various types of indexes for the efficient retrieval of data. The course will also cover several components of the Hadoop ecosystem for the processing of "big data." Additional topics include cloud computing, NoSQL databases, and modern data architectures. The course also introduces some of the computer science concepts and techniques essential for data science.
APMA 1690. Computational Probability and Statistics.
Examination of probability theory and mathematical statistics from the perspective of computing. Topics selected from random number generation, Monte Carlo methods, limit theorems, stochastic dependence, Bayesian networks, dimensionality reduction. Prerequisites: APMA 1650 or equivalent; programming experience is recommended.
DATA 2020. Statistical Learning.
A modern introduction to inferential methods for regression analysis and statistical learning, with an emphasis on application in practical settings in the context of learning relationships from observed data. Topics will include basics of linear regression, variable selection and dimension reduction, and approaches to nonlinear regression. Extensions to other data structures such as longitudinal data and the fundamentals of causal inference will also be introduced.
DATA 2080. Data and Society.
A course on the social, political, and philosophical issues raised by the theory and practice of data science. Explores how data science is transforming not only our sense of science and scientific knowledge, but our sense of ourselves and our communities and our commitments concerning human affairs and institutions generally. Students will examine the field of data science in light of perspectives provided by the philosophy of science and technology, the sociology of knowledge, and science studies, and explore the consequences of data science for life in the first half of the 21st century.
CSCI 2470, or equivalent. Deep Learning.
Deep Learning belongs to a broader family of machine learning methods. It is a particular version of artificial neural networks that emphasizes learning representations with multiple layers of networks. Deep Learning, together with the specialized techniques that it has inspired (e.g., convolutional neural networks, recurrent neural networks, and transformers), has led to rapid improvements in many applications, such as computer vision, machine learning, sound understanding, and robotics. This course gives students an overview of the prominent techniques of Deep Learning and its applications in computer vision, language understanding, and other areas. It also provides hands-on practice in implementing deep learning algorithms in Python. A final project will implement an advanced piece of work in one of these areas.
Machine Learning Theory:
New course coming spring 2023. We will introduce the mathematical methods of data science through a combination of theory, computational methods, and visualization. We formally define the statistical learning framework, common assumptions about the data generation process, and learning models. The mathematical models behind common supervised and unsupervised techniques are discussed. Students will implement some of the algorithms from scratch using standard Python and NumPy. The course includes a final project: students will read a peer-reviewed publication on a machine learning topic of their choice, write a blog post or article about it, and give a presentation explaining the methods and results of the publication to a non-expert audience.
DATA 2050. Data Practicum.
The practicum experience is a hands-on thesis project that entails an in-depth study of a current problem in data science. Students will synthesize their knowledge of probability and statistics, machine learning, and data and computational science. Students may use an internship in industry for the practicum, or work with a faculty member at Brown or elsewhere. The project must be approved beforehand by the DATA 2050 instructor and students must provide regular interim reports and a final presentation.