Data Science Institute

Brown University's Data Science Institute serves as a campus hub for research and education in data science. Engaging partners across campus and beyond, the DSI's mission is to facilitate and conduct both domain-driven and fundamental research in data science, increase data fluency and educate the next generation of data scientists, and ultimately explore the impact of the data revolution on culture, society, and social justice. We envision our role in the university and beyond as something to build over time, with the flexibility to meet the changing needs of Brown’s students and research community.

The Master’s Program in Data Science (Master of Science, ScM) prepares students from a wide range of disciplinary backgrounds for distinctive careers in data science. Rooted in a research collaboration among four strong academic departments (Applied Mathematics, Biostatistics, Computer Science, and Mathematics), the Master's Program offers a unique and rigorous education for people building careers in data science and/or big data management.

For additional information, please visit the institute's website: http://dsi.brown.edu/

DATA 0080. Data, Ethics and Society.

A course on the social, political, and philosophical issues raised by the theory and practice of data science. Explores how data science is transforming not only our sense of science and scientific knowledge, but our sense of ourselves and our communities and our commitments concerning human affairs and institutions generally. Students will examine the field of data science in light of perspectives provided by the philosophy of science and technology, the sociology of knowledge, and science studies, and explore the consequences of data science for life in the first half of the 21st century. Fulfills a requirement for the Certificate in Data Fluency.

DATA 0200. Data Science Fluency.

As data science becomes more visible, are you curious about its unique amalgamation of computer programming, statistics, and visualizing or storytelling? Are you wondering how these areas fit together and what a data scientist does? This course offers all students regardless of background the opportunity for hands-on data science experience, following a data science process from an initial research question, through data analysis, to the storytelling of the data. Along the way, you will learn about the ethical considerations of working with data, and become more aware of societal impacts of data science. Course does not count toward CS concentration requirements.

DATA 0250. Applied Statistics in Python.

As more students engage in data science, there is a need to provide guidance on conducting basic statistical analysis in Python. This course will provide a non-specialist approach to applied statistics, specifically linear models in Python. Students will learn how to fit linear models using the Statsmodels package in Python. Students should have a good working knowledge of descriptive statistics (equivalent to a high school AP level). Python coding experience is helpful but not required. Student learning will be assessed through hands-on Python coding activities and written interpretation of statistical reports.

DATA 1010. Probability, Statistics, and Machine Learning.

An introduction to the mathematical methods of data science through a combination of computational exploration, visualization, and theory. Students will learn scientific computing basics, topics in numerical linear algebra, mathematical probability (probability spaces, expectation, conditioning, common distributions, law of large numbers and the central limit theorem), statistics (point estimation, confidence intervals, hypothesis testing, maximum likelihood estimation, density estimation, bootstrapping, and cross-validation), and machine learning (regression, classification, and dimensionality reduction, including neural networks, principal component analysis, and unsupervised learning).

DATA 1030. Hands-on Data Science.

Develops all aspects of the machine learning pipeline: data acquisition and cleaning, handling missing data, exploratory data analysis, visualization, feature engineering, modeling, interpretation, and presentation, in the context of real-world datasets. Fundamental considerations for data analysis are emphasized (the bias-variance tradeoff, training, validation, testing). Classical models and techniques for classification and regression are included (linear and logistic regression with regularization, support vector machines, decision trees, random forests, XGBoost). Uses the Python data science ecosystem (e.g., sklearn, pandas, matplotlib). Prerequisites: A course equivalent to CSCI 0050, CSCI 0150, or CSCI 0170 is strongly recommended.

DATA 1050. Data Engineering.

The course will cover the storage, retrieval, and management of various types of data, along with the computing infrastructure (such as various types of databases and data structures), algorithmic techniques (such as searching and sorting algorithms), and query languages (such as SQL) for interacting with data, in the context of both transaction processing (OLTP) and analytical processing (OLAP). Students will be introduced to measures for evaluating the efficacy of different techniques for interacting with data (such as the ‘Big-Oh’ measure of complexity and the number of I/O operations) and to various types of indexes for the efficient retrieval of data. The course will also cover several components of the Hadoop ecosystem for the processing of ‘big data.’ Additional topics include cloud computing and NoSQL databases. Concepts and techniques of computer science essential for data science will also be introduced.

DATA 1150. Data Science Fellows.

This course is for junior and senior students with data science skills who seek to apply these skills and teach others how to implement and interpret data science. Working in conjunction with a faculty partner, students learn communication skills, how to determine the needs (requirements) for a project, and how to teach data science to peers. Qualified students will have a combination of programming experience (intermediate level or above in R or Python), some statistical knowledge (intermediate level or above), and knowledge of how data and computing can be used in applied fields. Students in the data fluency certificate must complete DATA 0200 prior to DATA 1150. Students need to complete the application to express interest. Qualified students must participate in an interview with the instructor, and override requests will be granted only with instructor approval.

DATA 1200. Reality Remix - Experimental VR.

This course pursues collaborative experimentation with virtual and augmented reality (VR and AR). The class will work as a team to pursue research (survey of VR/AR experiences, scientific and critical literature review), reconnaissance (identifying VR/AR resources on campus, in Providence and the region), and design (VR/AR prototyping). Research findings are documented in a class wiki. The course makes use of Brown Arts Initiative facilities in the Granoff Center, where an existing VR laboratory will be expanded through the course of the semester based on student needs. The class culminates in the release of the class wiki as a resource for the Brown community.

DATA 1450. Text Analytics.

This course will first cover techniques for compiling textual corpora from web pages, PDFs, scanned PDFs, images, audio clips, etc. Secondly, it will look at processes for extracting some common types of information from these corpora. In particular, we will cover extracting named entities (persons, locations, organizations, etc.), relations between entities, events, transactions, topics, document summaries, abstracts, legal clauses, etc. This course differs from standard courses in Natural Language Processing and Computational Linguistics in that we will spend a significant amount of course time on compiling textual corpora from documents in a variety of formats, and our emphasis will be on extracting information that can be fed to analytics pipelines.

DATA 1720. Tackling Climate Change with Machine Learning (EEPS 1720).

Interested students must register for EEPS 1720.

DATA 2020. Statistical Learning.

A modern introduction to inferential methods for regression analysis and statistical learning, with an emphasis on application in practical settings in the context of learning relationships from observed data. Topics will include basics of linear regression, variable selection and dimension reduction, and approaches to nonlinear regression. Extensions to other data structures such as longitudinal data and the fundamentals of causal inference will also be introduced.

DATA 2040. Deep Learning and Special Topics in Data Science.

A hands-on introduction to neural networks, reinforcement learning, and related topics. Students will learn the theory of neural networks, including common optimization methods, activation and loss functions, regularization methods, and architectures. Topics include model interpretability, connections to other machine learning models, and computational considerations. Students will analyze a variety of real-world problems and data types, including image and natural language data.

DATA 2050. Data Science Practicum.

The capstone experience is a hands-on thesis project that entails an in-depth study of a current problem in data science. Students will synthesize their knowledge of probability and statistics, machine learning, and data and computational science. A faculty member from one of the four core DSI departments (Applied Mathematics, Biostatistics, Computer Science, Mathematics) will oversee the capstone course. Students may collaborate with an additional faculty member, postdoc, or industry partner on projects. DATA 1010 and DATA 1030 are recommended prerequisites.

DATA 2080. Data and Society.

A course on the social, political, and philosophical issues raised by the theory and practice of data science. Explores how data science is transforming not only our sense of science and scientific knowledge, but our sense of ourselves and our communities and our commitments concerning human affairs and institutions generally. Students will examine the field of data science in light of perspectives provided by the philosophy of science and technology, the sociology of knowledge, and science studies, and explore the consequences of data science for life in the first half of the 21st century.

DATA 2110. Topics in Econometrics.

This course will begin with a survey of the literature on identification using instrumental variables, including identification bounds, conditional moment restrictions, and control function approaches. The next part of the class will cover some of the theoretical foundations of machine learning, including regularization and data-driven choice of tuning parameters. We will discuss in some detail the canonical normal means model, Gaussian process priors, (empirical) Bayes estimation, and reproducing kernel Hilbert space norms. Finally, we will cover selected additional topics in machine learning, including (deep) neural nets, text as data (topic models), multi-armed bandits, and data visualization.

DATA 2980. Research in Data Science.

Section numbers vary by instructor. Please check Banner for the correct section number and CRN to use when registering for this course.

Data Science

Master of Science in Data Science

The Data Science Initiative at Brown offers a new master's program (ScM) that will prepare students from a wide range of disciplinary backgrounds for distinctive careers in Data Science. Rooted in a research collaboration among four very strong academic departments (Applied Mathematics, Biostatistics, Computer Science, and Mathematics), the master's program will offer a rigorous, distinctive, and attractive education for people building careers in Data Science and/or in Big Data Management. The program's main goal is to provide a fundamental understanding of the methods and algorithms of Data Science. Such an understanding will be achieved through a study of relevant topics in mathematics, statistics and computer science, including machine learning, data mining, security and privacy, visualization, and data management. The program will also provide experience in important, frontline data-science problems in a variety of fields, and introduce students to ethical and societal considerations surrounding data science and its applications.

The program's course structure, including the capstone experience, will ensure that students meet the goals of acquiring and integrating foundational knowledge for data science, applying this understanding to specific problems, and appreciating the broader ramifications of data-driven approaches to human activity. Moreover, our strong industry partnerships will help you learn about industry's needs and directions and will expose you to novel and unique opportunities. In addition, several professors across the participating departments work closely with industry (regional and beyond) and the government, so you will be able to sharpen your skills by bringing research ideas and methods to bear on problems of practical value.

The program will be conducted over one academic year plus one summer, with the option for an additional pre-program summer for students who lack one or more of the basic prerequisites. The regular program includes two semesters of coursework and a one-summer (5-10 week) capstone project focused on data analysis in a particular application area.

Nine credit units are required to complete the program: four in each of the academic year semesters, and one (the capstone experience) in the summer. The nine credit units divide as follows:

3 credits in mathematical and statistical foundations,
3 credits in data and computational science,
1 credit in societal implications and opportunities,
1 elective credit to be drawn from a wide range of focused applications or deeper theoretical exploration, and
1 credit capstone experience.

We also offer a fifth-year master's option if you are an undergraduate at Brown. This allows you to substitute up to 2 credits with courses you have already taken.

Master of Science in Data Science

Course       Title                                              Credits

Semester I
DATA 1010    Probability, Statistics, and Machine Learning      2
DATA 1030    Hands-on Data Science                              1
DATA 1050    Data Engineering                                   1

Semester II
DATA 2020    Statistical Learning                               1
DATA 2040    Deep Learning and Special Topics in Data Science   1
DATA 2080    Data and Society                                   1
Elective     An appropriate 1000-level or 2000-level course to be determined by the student and approved by the program advisor. Possible courses could range from advanced mathematical methods to very specific applications of data science.   1

Summer
DATA 2050    Data Science Practicum                             1

Total Credits                                                   9

For more information on admission and program requirements, please visit the following website:

https://www.brown.edu/academics/gradschool/programs/data-science