Degree Requirements – M.S. Data Science

The M.S. in Data Science gives students the training they need in data collection, exploration, manipulation and storage, analysis, and presentation to navigate data-rich workplace environments. The degree requires 30 total units, and full-time students can typically complete it in 1.5 years.

Plan of Study

You should work with your faculty advisor to develop a Master’s Plan of Study during your first few months in the program. The Plan of Study should be submitted to the Graduate College no later than your second semester in the program.

The Master’s Plan of Study identifies 1) courses you intend to transfer from other institutions; 2) courses already completed at the University of Arizona that you intend to apply toward the graduate degree; and 3) additional coursework to be completed to fulfill degree requirements. The Plan of Study must be approved by the Director of Graduate Studies before it can be submitted to the Graduate College.


Suggested Completion Plan:

Year 1

Year 2

  • Fall: 6 units of electives
  • Spring: INFO 698 (Capstone), 6 units of electives

Core Courses

  • 9 units total

This course presents an overview of pressing and often intractable ethical issues in the information fields, along with their related policies. Emerging technological developments in relation to public interests and individual well-being are highlighted throughout the course. Special emphasis is placed on case studies and outcomes, as well as on frameworks for ethical decision-making.

This course introduces students to the concepts and techniques of data mining for knowledge discovery. It covers methods developed in the fields of statistics, large-scale data analytics, machine learning, pattern recognition, database technology, and artificial intelligence for the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns. Topics include understanding varieties of data, data preprocessing, classification, association and correlation rule analysis, cluster analysis, outlier detection, and data mining trends and research frontiers. We will use software packages for data mining and explain the underlying algorithms, their uses, and their limitations. The course includes laboratory exercises and data mining case studies using data from many different sources, such as social networks, linguistics, geospatial applications, marketing, and/or psychology.

Capstone Project

Complete 3 units:

  • Register for INFO 698: Capstone Project
  • The course may be repeated once if you do not obtain a satisfactory score the first time
  • Project must be supervised by at least one faculty member in the School of Information
  • This project will evaluate all competencies required for the M.S. degree

You must submit your application in Handshake. More information can be found on the individual studies page.

Upon completing the capstone project, you will submit a report of 5,000 to 6,000 words in the form of an academic paper, documenting what has been accomplished and explaining how the competencies have been demonstrated. The Graduate Committee (or its subcommittee), together with the supervisors, will evaluate the project and competencies and assign a pass/fail grade.

Elective Courses

  • 18 units total
  • Students may complete the M.S. alone or add graduate certificates with specific emphases. Visit our Graduate Certificates page for more information.
  • Any non-core course with the INFO prefix is considered an elective (see below for options within emphasis areas)
  • The following out-of-department courses are also pre-approved for electives:


Elective Courses Organized as Optional Certificates

Certificate 1: Natural Language Processing (Linguistics)

This course introduces the key concepts underlying statistical natural language processing. Students will learn a variety of techniques for the computational modeling of natural language, including n-gram models, smoothing, hidden Markov models, Bayesian inference, expectation maximization, the Viterbi algorithm, the inside-outside algorithm for probabilistic context-free grammars, and higher-order language models. Graduate-level requirements include assignments of greater scope than undergraduate assignments: graduate assignments are more in-depth, typically longer, and require additional readings.

Plus one more course chosen from:

Neural networks are a branch of machine learning that combines a large number of simple computational units to allow computers to learn from and generalize over complex patterns in data. Students in this course will learn how to train and optimize feedforward, convolutional, and recurrent neural networks for tasks such as text classification, image recognition, and game playing.

Elective Courses Not Organized as Certificates

(If you are not pursuing a set of certificates, electives can also be chosen from this set.)

This course will guide students through advanced applications of computational methods for social science research. Students will be encouraged to consider social problems from across sectors, including health science, environmental policy, education, and business. Particular attention will be given to the collection and analysis of data to study social networks, online communities, electronic commerce, and digital marketing. Students will consider the many research designs used in contemporary social research, including “Big” data, online surveys, and virtual experimental labs, and will think critically about claims of causality, mechanisms, and generalization.

Machine learning describes the development of algorithms that can modify their internal parameters (i.e., "learn") to recognize patterns and make decisions based on example data. These examples can be provided by a human, or they can be gathered automatically as part of the learning algorithm itself. This course will introduce the fundamentals of machine learning, describe how to implement several practical methods for pattern recognition, feature selection, clustering, and decision making for reward maximization, and provide a foundation for the development of new machine learning algorithms.

Data Warehousing and Analytics in the Cloud covers concepts, frameworks, and best practices for designing a cloud-based data warehousing solution and explores how to use analytical tools to analyze your data. The first half of the course provides an overview of the field of cloud computing and its main concepts, and students will get hands-on experience through projects that use cloud computing platforms. In the second half of the course, we will examine the construction of a cloud-based data warehouse system and explore how the cloud opens up data analytics to huge volumes of data.

Most web data today consists of unstructured text. This course will cover the fundamental knowledge necessary to organize such texts, search them in a meaningful way, and extract relevant information from them. It teaches natural language processing through the design and development of end-to-end natural language understanding applications, including sentiment analysis (e.g., is this review positive or negative?), information extraction (e.g., extracting named entities and their relations from text), and question answering (retrieving exact answers to natural language questions such as "What is the capital of France?" from large document collections). We will use several natural language processing toolkits, such as NLTK and Stanford's CoreNLP. The main programming language used in the course will be Python, but code written in Java or Scala will be accepted as well. Graduate-level requirements include implementing more complex, state-of-the-art algorithms for the three course projects, which will require additional reading of conference papers and journal articles.

Most web data today consists of unstructured text. This data is of little use, however, unless it is made available so that users can quickly find information relevant to their needs. This course will cover the fundamental knowledge necessary to build such systems, including web crawling; index construction and compression; Boolean, vector-based, and probabilistic retrieval models; text classification and clustering; link analysis algorithms such as PageRank; and computational advertising. Students will also complete a programming project in which they construct a complex application that combines multiple algorithms into a system that solves real-world problems. Graduate-level requirements include implementing more complex, state-of-the-art algorithms for the programming project, which may require additional reading of research articles. Written assignments will include additional questions for graduate students.

This course covers the theory, methods, and techniques widely used to design and develop relational database systems, and students will develop a broad understanding of modern database management systems. Applications of fundamental database principles in a stand-alone database environment using MS Access and Windows are emphasized. Applications in an Internet environment will be discussed using MySQL on the Linux platform. Graduate-level requirements include a group project consisting of seven sections: Database Design; Implementation (Tables); Forms; Data Retrieval (Queries/Reports); Project Presentation; Project Report; and Peer Evaluation.

In today's digital society, people have access to a wide variety of information sources and scientific data. In this course, students will learn about the role of science and scientific data in society, and they will consider ways to make science information findable and understandable for a wide variety of audiences. The course provides students with an interdisciplinary perspective on science data and how that information is shared across contexts.

You are encouraged to select one or two emphasis areas or develop your own in collaboration with your faculty advisor. The emphasis areas on this page reflect anticipated areas of specialization based on student interest and faculty expertise; they are not a comprehensive list of courses.