MSDS Curriculum & Courses

Image: MSDS faculty teaching

The UArizona MS in Data Science, ranked the #9 program in the nation by Fortune, empowers students with the in-demand skills they need to transform data into actionable insights through data collection, exploration, analysis and manipulation.

The STEM-designated degree, which is offered on campus and online, requires 30 total units. Full-time students can typically complete it in 18 months.


MSDS Student Competencies

Students who graduate from the UArizona MS in Data Science will have the following competencies:

Competency 1

Students will establish the ability to exercise the four key techniques of computational thinking: decomposition, pattern recognition, abstraction and algorithms.

Competency 2

Students will obtain the skills of collecting, manipulating and analyzing different types of data at different scales, and interpreting the results properly.

Competency 3

Students will acquire the skills to communicate the results of their work to interdisciplinary teams, using appropriate visualizations, multimedia or artistic performance.

Competency 4

Students will demonstrate an understanding of information and data ethics, including ethical and legal requirements of data privacy and security, and the values of the information fields to serve diverse user groups.


Master's Plan of Study

As an MSDS student, you will work with your faculty advisor to develop a Master’s Plan of Study during your first few months in the program. The Plan of Study, which must be submitted to the Graduate College no later than your second semester in the program, identifies:

  1. Courses you intend to transfer from other institutions (if any)
  2. Courses already completed at the University of Arizona which you intend to apply toward the graduate degree (if any)
  3. Additional coursework to be completed to fulfill degree requirements

The Plan of Study must have the approval of the director of graduate studies before it can be submitted to the Graduate College.


Curriculum

MSDS students will take the following four required core data science courses:

This course explores ethical challenges stemming from data-driven decision making in society. Students will focus on important topics like bias, fairness, privacy, surveillance, and discrimination, as well as data collection, storage, and management. Exploring dilemmas tied to data science, artificial intelligence, robotics, etc. will allow students to consider their own data behaviors as well as trends and problems across contexts like organizations, social media, health, and education. Special attention in the class will be given to matters of policy and governing protocols around the world. Related challenges tied to Internet governance, misinformation, fake video, automation, etc., will also be explored.

This course presents fundamental aspects of data science, including Python programming (e.g., data collection, cleaning, visualization), statistics, and mathematics (e.g., linear algebra and calculus). The course establishes the foundation for advanced data-intensive classes, providing both theoretical understanding and practical knowledge essential for comprehending Data Science and its applications.

This course will introduce students to the concepts and techniques of data mining for knowledge discovery. It includes methods developed in the fields of statistics, large-scale data analytics, machine learning, pattern recognition, database technology and artificial intelligence for automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns. Topics include understanding varieties of data, data preprocessing, classification, association and correlation rule analysis, cluster analysis, outlier detection, and data mining trends and research frontiers. We will use software packages for data mining, explaining the underlying algorithms and their use and limitations. The course includes laboratory exercises, with data mining case studies using data from many different resources such as social networks, linguistics, geo-spatial applications, marketing and/or psychology.

This course provides an overview of the various concepts and skills required for effective data visualization. It presents principles of graphic design, programming skills, and statistical knowledge required to build compelling visualizations that communicate effectively to target audiences. Visualization skills addressed in this course include choosing appropriate colors, shapes, variable mappings, and interactivity based on principles of color perception, pre-attentive processing, and accessibility.

Students have the choice of completing the MSDS alone or using sets of courses in order to attain one or more graduate certificates at the same time. Please see corresponding units' web pages for more information about their graduate certificates (e.g., Linguistics). Learn more about the Graduate Certificate in Foundations of Data Science or visit our Graduate Certificates page for more information about all iSchool graduate certificates.

Any non-core courses with the INFO prefix or out-of-department courses are also considered electives.

Elective courses include:

The objective of the course is to provide a sound understanding of the fundamental statistical theory underlying econometric techniques used in the quantitative analysis of problems in economics, business, finance, public health and other social sciences.

Econometric model-building, estimation, forecasting and simulation for problems in agricultural and resource economics. Applications with actual data and models emphasized.

Emphasis in the course is on econometric model specification, estimation, inference, forecasting, and simulation.  Applications with actual data and modeling techniques are emphasized.

Most of the web data today consists of unstructured text. Of course, the fact that this data exists is irrelevant, unless it is made available such that users can quickly find information that is relevant for their needs. This course will cover the fundamental knowledge necessary to build these systems, such as web crawling, index construction and compression, Boolean, vector-based, and probabilistic retrieval models, text classification and clustering, link analysis algorithms such as PageRank, and computational advertising. The students will also complete one programming project, in which they will construct one complex application that combines multiple algorithms into a system that solves real-world problems.

This course covers important algorithms useful for natural language processing (NLP), including distributional similarity algorithms such as word embeddings, recurrent and recursive neural networks (NN), probabilistic graphical models useful for sequence prediction, and parsing algorithms such as shift-reduce. This course will focus on the algorithms that underlie NLP, rather than the application of NLP to various problem domains.

This course provides an introduction to technical aspects of cyber security. It describes threats and types of attacks against computers and networks to enable students to understand and analyze security requirements and define security policies. Security mechanisms and enforcement issues will be introduced. Students will be immersed in the cyber-security discipline through a combination of intense coursework, open-ended and real-world problems, and hands on experiments.

Machine learning deals with the automated classification, identification, and/or characterization of an unknown system and its parameters. There are an overwhelming number of application-driven fields that can benefit from machine learning techniques. This course will introduce you to machine learning and develop core principles that allow you to determine which algorithm to use, or to design a novel approach to solving the engineering task at hand. This course will also use software technology to supplement the theory learned in the class with applications using real-world data.

Cloud computing is an emerging paradigm that aims at delivering computing, information services, and data storage as a utility service over a network (e.g., the Internet). There is strong interest in cloud computing due to its performance and cost, but its rapid deployment will exacerbate the security problem. In cloud computing, organizations relinquish direct control of many security aspects to the service providers, such as trust, privacy preservation, identity management, data and software isolation, and service availability. The adoption and proliferation of cloud computing and services will be severely impacted if cloud security is not adequately addressed. The main goal of this course is to discuss the limitations of current cybersecurity approaches to clouds and then focus on the fundamental issues in cloud security and privacy, such as the confidentiality, integrity and availability of data and computations in clouds. In this course we will examine cloud computing models, look into the threat model and security issues related to outsourcing data and computations, and explore practical applications that make cloud resources secure and resilient to cyber attacks.

Introduction to computer networks and protocols. Study of the ISO open systems interconnection model, with emphasis on the physical, data link, network, and transport layers. Discussion of IEEE 802, OSI, and Internet protocols. Graduate-level requirements include additional homework and assignments.

Provides an introduction to problems and techniques of artificial intelligence (AI). Automated problem solving, methods and techniques; search and game strategies, knowledge representation using predicate logic; structured representations of knowledge; automatic theorem proving, system entity structures, frames and scripts; robotic planning; expert systems; implementing AI systems.  Graduate-level requirements include additional assignments.

The goal of this course is to gain an introductory understanding of geographic programming and data automation techniques using ModelBuilder and the Python language.

The focus of this class is to examine and apply GIS open source programming. We will examine commonly used languages such as Python, Java, and HTML5, as well as APIs, JSON, and SQL, to automate workflows, extend the tools, and create interactive web and mobile GIS platforms. Topics include preparing data as strings, lists, tuples, and dictionaries prior to use, using Python to run SQL queries, working with rasters in Python, automating mapping tasks, and developing custom scripting tools. In addition to weekly assignments and readings, assessment will be oriented around a single, student-directed project that will take the second half of the semester to complete. It will require students to write a simple script to accomplish a specified task in ArcGIS and present the results of their work to peers.

This course will guide students through advanced applications of computational methods for social science research.  Students will be encouraged to consider social problems from across sectors, including health science, environmental policy, education, and business. Particular attention will be given to the collection and analysis of data to study social networks, online communities, electronic commerce, and digital marketing.  Students will consider the many research designs used in contemporary social research, including “Big” data, online surveys, and virtual experimental labs, and will think critically about claims of causality, mechanisms, and generalization.

Machine learning describes the development of algorithms that can modify their internal parameters (i.e., "learn") to recognize patterns and make decisions based on example data. These examples can be provided by a human, or they can be gathered automatically as part of the learning algorithm itself. This course will introduce the fundamentals of machine learning, will describe how to implement several practical methods for pattern recognition, feature selection, clustering, and decision making for reward maximization, and will provide a foundation for the development of new machine learning algorithms.

Students will learn from experts on projects that have developed widely adopted foundational cyberinfrastructure resources, followed by hands-on laboratory exercises focused on those resources. Students will use these resources and gain practical experience from laboratory exercises for a final project using a data set and meeting requirements provided by domain scientists. Students will be provided access to computer resources at UA campus clusters, iPlant Collaborative and NSF XSEDE. Students will also learn to write a proposal for obtaining future allocations on large-scale national resources through XSEDE. Graduate-level requirements include reading a paper related to cyberinfrastructure, presenting it to the class, and leading a discussion on the paper.

Data Warehousing and Analytics in the Cloud covers concepts, frameworks, and best practices for designing a cloud-based data warehousing solution and explores how to use analytical tools to perform analysis on your data. The first half of the course provides an overview of the field of cloud computing and its main concepts, and students will get hands-on experience through projects that use cloud computing platforms. In the second half of the course, we will examine the construction of a cloud-based data warehouse system and explore how the cloud opens up data analytics to huge volumes of data.

This course focuses on the use of modern data science methods to help learners make socially responsible decisions and mitigate harm that arises from issues like bias, discrimination, and threats to one's personal privacy. More and more individuals need to make data-driven decisions in a wide variety of contexts including non-governmental organizations, not-for-profit industries, human services, environmental organizations, refugee camps, and more. Students in this class will thus learn about data science and how it can be utilized in contexts where socially good decisions are desired and emphasized. This active learning class is designed for students who have an interest in the topic but who may have little to no previous experience with data science or programming.

Most web data today consists of unstructured text. This course will cover the fundamental knowledge necessary to organize such texts, search them in a meaningful way, and extract relevant information from them. This course will teach natural language processing through the design and development of end-to-end natural language understanding applications, including sentiment analysis (e.g., is this review positive or negative?), information extraction (e.g., extracting named entities and their relations from text), and question answering (retrieving exact answers to natural language questions such as "What is the capital of France?" from large document collections). We will use several natural language processing toolkits, such as NLTK and Stanford's CoreNLP. The main programming language used in the course will be Python, but code written in Java or Scala will be accepted as well. Graduate-level requirements include implementing more complex, state-of-the-art algorithms for the three proposed projects. This will require additional reading of conference papers and journal articles.

Most of the web data today consists of unstructured text. Of course, the fact that this data exists is irrelevant, unless it is made available such that users can quickly find information that is relevant for their needs. This course will cover the fundamental knowledge necessary to build such systems, such as web crawling, index construction and compression, boolean, vector-based, and probabilistic retrieval models, text classification and clustering, link analysis algorithms such as PageRank, and computational advertising. The students will also complete one programming project, in which they will construct one complex application that combines multiple algorithms into a system that solves real-world problems.  Graduate level requirements include implementing more complex, state-of-the-art algorithms for the programming project, which might require additional reading of research articles. Written assignments will have additional questions for graduate students.

Neural networks are a branch of machine learning that combines a large number of simple computational units to allow computers to learn from and generalize over complex patterns in data. Students in this course will learn how to train and optimize feed forward, convolutional, and recurrent neural networks for tasks such as text classification, image recognition, and game playing.

This course covers theory, methods, and techniques widely used to design and develop a relational database system. Students will develop a broad understanding of modern database management systems. Applications of fundamental database principles in a stand-alone database environment using MS Access and Windows are emphasized. Applications in an Internet environment will be discussed using MySQL on the Linux platform. Graduate-level requirements include a group project consisting of seven sections: Database Design; Implementation (Tables); Forms; Data Retrieval (Queries/Reports); Project Presentation; Project Report; and Peer Evaluation.

In today's digital society, people have access to a wide variety of information sources and scientific data. In this course, students will learn about the role of science and scientific data in society, and they will consider means for making science information findable and understandable for a wide variety of audiences. This course will provide students an interdisciplinary experience for considering science data and how that information gets shared across contexts.

This course provides an overview of modern database systems. Both relational databases (SQL) and a few non-relational databases (NoSQL) are covered, including topics on data warehouses. The focus of the course is on the practical skills of designing and implementing data storage and access for data and information sciences. Topics covered include ER diagrams, database normalization, data modeling in NoSQL databases, SQL and other query languages, and data warehousing. The course will selectively cover one or two types of NoSQL databases, for example, document-oriented, key-value pair, column-oriented, or graph databases. Database platforms used in this course may change over time; examples include MySQL, PostgreSQL, Apache HBase, Apache Cassandra, MongoDB, and Neo4j.

Organizing information in electronic formats requires standard machine-readable languages. This course covers recent standards including XML (eXtensible Markup Language) and related technologies (XPath and XSLT), which are used widely in current information organization systems. Building on a sound understanding of XML technologies, the course also introduces students to newer standards that support the development of the Semantic Web. These standards include RDF (Resource Description Framework), RDFS (RDF Schema), and OWL (Web Ontology Language) and their application under the Linked Data paradigm. While the application of many specific XML schemas used in libraries and other information settings such as science and business will be used to provide the context for various topics, the main focus of the course is on understanding the concepts of XML and Semantic Web technologies and on applying practical skills in various settings, including but not limited to libraries. The course is heavy with hands-on assignments and requires students to complete a final group project.

This course introduces the key concepts underlying statistical natural language processing. Students will learn a variety of techniques for the computational modeling of natural language, including: n-gram models, smoothing, Hidden Markov models, Bayesian Inference, Expectation Maximization, Viterbi, Inside-Outside Algorithm for Probabilistic Context-Free Grammars, and higher-order language models.  Graduate-level requirements include assignments of greater scope than undergraduate assignments. In addition to being more in-depth, graduate assignments are typically longer and additional readings are required.

Topics include speech synthesis, speech recognition, and other speech technologies.  This course gives students background for a career in the speech technology industry.  Graduate students will do extra readings, extra assignments, and have an extra presentation. Their final project must constitute original work in a speech technology.

This course provides a hands-on project-based approach to particular problems and issues in computational linguistics.

This course focuses on statistical approaches to pattern classification and applications of natural language processing to real-world problems.

Statistical methodology of estimation, testing hypotheses, goodness-of-fit, nonparametric methods and decision theory as it relates to engineering practice. Significant emphasis on the underlying statistical modeling and assumptions. Graduate-level requirements include additional, more difficult homework assignments.

This course will provide senior undergraduate and graduate students from diverse engineering disciplines with fundamental concepts, principles and tools to extract and generalize knowledge from data. Students will acquire an integrated set of skills spanning data processing, statistics and machine learning, along with a good understanding of the synthesis of these skills and their applications to solving problems. The course is composed of a systematic introduction to the fundamental topics of data science study, including: (1) principles of data processing and representation, (2) theoretical basis and advances in data science, (3) modeling and algorithms, and (4) evaluation mechanisms. In treating these topics, the emphasis will be on breadth rather than depth. Real-world engineering problems and data will be used as examples to illustrate the advantages and disadvantages of different algorithms, compare their effectiveness and efficiency, and help students understand and identify the circumstances under which each algorithm is most appropriate.

Unconstrained and constrained optimization problems from a numerical standpoint. Topics include variable metric methods, optimality conditions, quadratic programming, penalty and barrier function methods, interior point methods, successive quadratic programming methods.

Decomposition-coordination algorithms for large-scale mathematical programming. Methods include generalized Benders decomposition, resource and price directive methods, subgradient optimization, and descent methods of nondifferentiable optimization. Application of these methods to stochastic programming will be emphasized.

This course is devoted to structure and properties of practical algorithms for unconstrained and constrained nonlinear optimization.

Complete a total of 3 units for the required internship or capstone project:

The internship is intended to provide an opportunity for students to build on what they have mastered in the program and practice the knowledge and skills in the real world. The internship should be relevant to the student's degree competencies and contribute to the development and reinforcement of the student's knowledge and skill sets in the field of Information Science. The student should propose an internship plan and then identify an internship site supervisor, who typically is external. The site supervisor and the graduate advisor of the school need to approve the plan prior to course registration. The plan should include goals for the internship, degree competencies addressed by the internship, expected tasks to be completed, work schedule, and the assessment plan. The amount of the work should be appropriate for the units registered (3 units = 135 hours). The internship may be paid or unpaid. A student may take an internship in the same organization where the student is employed, but work planned for the internship needs to have a clear separation from the work expected by the employment. At the conclusion of the internship, the site supervisor is expected to submit a written assessment of the student's work.

The capstone project is intended to provide an opportunity for students to show what they have mastered in the program. The project should be relevant to MS degree competencies and contribute to the development and reinforcement of the student's knowledge and skill sets in the field of Information Science. The student should propose a project plan and the faculty advisor should approve it before registration. The project plan should include goals for the project, MS competencies addressed by the project, system design, an implementation schedule, and the assessment plan. The project plan should also include reasonable milestones and checkpoints. The amount of the work should be appropriate for a 3-unit course. The primary faculty advisor must be an SI faculty member, but faculty members from other units may participate in advising the student.


Internship or Capstone Project

Either an internship or a capstone project of 1 to 3 units is required as part of the MSDS.

Internship

The internship is intended to provide an opportunity for students to build on what they have mastered in the program and practice the knowledge and skills in the real world, whether corporate, institutional, nonprofit or otherwise. The internship should be relevant to the student's degree competencies and contribute to the development and reinforcement of the student's knowledge and skill sets in the fields of data science and information science.

iSchool master's students have interned at a wide range of organizations, including:

  • Amazon
  • Avirtek
  • CyVerse
  • Freeport-McMoRan
  • Genentech
  • iDE Global
  • Intel
  • Labcorp Drug Development
  • Lightsense Technology
  • Lum.ai
  • Lunewave
  • Mayo Clinic
  • NuvOx Pharma
  • Onebridge
  • Pima County Public Library
  • Pitney Bowes
  • Roche
  • RNC Mobile Services
  • Tesla
  • The University of Arizona
  • Tucson Police Department
  • U.S. Food and Drug Administration (FDA)
  • Viasat
  • Vue Data

The student should propose an internship plan and then identify an internship site supervisor, who typically is external. The site supervisor and the graduate advisor of the school need to approve the plan prior to course registration. The plan should include:

  • Goals for the internship
  • Degree competencies addressed by the internship
  • Expected tasks to be completed
  • Work schedule
  • Assessment plan

The amount of the work should be appropriate for the units registered (3 units = 135 hours). The internship may be paid or unpaid. A student may take an internship in the same organization where the student is employed, but work planned for the internship needs to have a clear separation from the work expected by the employment.

At the conclusion of the internship, the site supervisor is expected to submit a written assessment of the student's work.

For additional information about internships, including resources for finding an internship and select internship postings, view the iSchool Internships & Mentorships page:

iSchool Internship Information & Resources


Capstone Project

The 1- to 3-unit MSDS capstone project is an opportunity for students to showcase what they have mastered in the program. It is based on a project plan that includes project goals, master's competencies addressed by the project, system design, implementation schedule, assessment plan and milestones. The project contributes to the development and reinforcement of the student's knowledge and skill sets in the field of information science.

The capstone project must exercise all competencies required for the MSDS and must also have a software development component. Students will deposit capstone project code in GitHub or another source code repository.

To declare capstone projects, students follow these steps:

  1. Identify your iSchool faculty supervisor.
  2. Request an experience via Handshake (mandatory).
  3. Upon completing the capstone project, submit a report (5,000-6,000 words in length) in the form of an academic paper, documenting what has been accomplished and explaining how the competencies have been demonstrated.
  4. Your supervisor(s) will complete a competencies evaluation form, evaluate the project and assign a pass/fail grade.


Ready to transform your future in data science?

Learn more about the Master of Science in Data Science by contacting us at si_admissions@arizona.edu, or review the admissions process and begin your application now:

Start Your Application