Data Science and Visualization Certificate

About the Program

The purpose of this certificate is to appeal to a wide variety of learners from across the campus. This certificate will signal to employers that students have dedicated the time and energy necessary to develop the skills and confidence for tackling messy, real-world data problems using modern programming languages.

The UArizona iSchool certificate will serve a diverse student population, training both 1) technically minded students in the nuances of successfully developing and communicating data methods and results for non-experts and the general public, and 2) less technically minded students in the basic skills necessary for gathering insights from data.

The Data Science and Visualization Certificate is distinct in its accessibility for students from across domains, fields, and disciplines at the University. Students in the arts and humanities, or those less inclined to embrace a degree program in Information, Computer Science, etc., will find this certificate an important way to build the skills that today’s employers seek.

Up to 6 units may be shared with a degree requirement (major, minor, General Education) or second certificate.


Learning Outcomes

The Data Science and Visualization Certificate will provide undergraduate students with the confidence and training they need in data collection, exploration, manipulation and storage, analysis, and presentation in order to navigate data-rich workplace environments.

In completing the Certificate, students will obtain practical experience using a variety of data science techniques and software applications; gain hands-on experience working with real-world data sets drawn from science, social media, and business; and build on basic statistical and programming knowledge to become familiar with the tools used for advanced work in today’s data-rich landscape.

Required Courses 

  • 12 units are required for the certificate 

  • All students, including Information Science students, may 'double use' no more than 6 units towards another program of study (major, minor, General Education, or another certificate).

Choose either ESOC 214 or ISTA 116, then take ISTA 320 Data Visualization and ISTA 321 Data Mining. One elective is also required.

As data continue to grow in volume and penetrate everything we do in contemporary work across many professions, employers are seeking data scientists to extract meanings and patterns from large quantities of data. This user-friendly course will provide an introduction to a variety of skills required for data analytics in organizations, education, health contexts, and the sciences. Specifically, this course examines information management in the context of massive sets of data, gives students proficiency with a variety of data analysis tools, and exposes learners to varied data platforms as well as skills and concepts related to data mining and statistical analysis. Particular attention will be given to toolkits embedded in R and other platforms.

Understanding uncertainty and variation in modern data: data summarization and description, rules of counting and basic probability, data visualization, graphical data summaries, working with large data sets, prediction of stochastic outputs from quantitative inputs.  Operations with statistical computer packages such as R.


This course will introduce students to the fundamental concepts and tools used to convey the information contained within large, complex data sets through a variety of visualization techniques. Students will learn the fundamentals of data exploration via visualizations, how to manipulate and reshape data to make it suitable for visualization, and how to prepare everything from simple single-variable visualizations to large multi-tiered and interactive visualizations. Visualization theory will be presented alongside the technical aspects of the course to develop a holistic understanding of the topic.

This course introduces students to the theory and practice of data mining for knowledge discovery. This includes methods developed in the fields of statistics, large-scale data analytics, machine learning, and artificial intelligence for automatic or semi-automatic analysis of large quantities of data to extract previously unknown and interesting patterns. Topics include understanding varieties of data, classification, association rule analysis, cluster analysis, and anomaly detection. We will use software packages for data mining, explaining the underlying algorithms and their use and limitations. The course will include laboratory exercises, with data mining case studies using data from biological sequences and networks, social networks, linguistics, ecology, geo-spatial applications, marketing and psychology.

Elective Courses 

  • Complete 1 additional course (3 units)

This course will explore broad research paradigms and theoretical approaches that inform contemporary social research, varying study designs, as well as the systematic methods utilized in differing types of data analyses. Though this course will introduce research processes across the academic spectrum, quantitative analysis of both small and large data sets will be emphasized. Therefore, students will learn about basic statistical analyses and will be introduced to the emerging worlds of data science and social media analytics. Students will also consider related topics such as data visualization or research presentations.

This course welcomes a wide variety of students from across disciplines, who will learn how to use industry-standard tools and practices to make large data sets usable for scientists and other decision makers. From data collection and preparation to the creation of big data stores, databases, or systems to make data flow, this course will focus on the practical work needed to prepare big data for analyses across contexts. Students will be introduced to a variety of technical tools for data management, storage, use, and manipulation.

This course surveys the techniques central to the modern practice of extracting useful patterns and models from large bodies of data and the theory behind these techniques. Students will learn the purpose, power, and limitations of models, with concrete examples from business and science. Course subject matter may include classification and regression, logistic regression, supervised segmentation and decision trees, similarity/distance metrics and recommender systems, clustering and nearest neighbors, support vector machines, understanding and avoiding overfitting, natural language processing and sentiment analysis, and machine learning, neural networks, and AI.

Natural language processing (NLP) is the study of how we can teach computers to use language by extracting knowledge from text, and then use that knowledge in some meaningful way.  In this introductory course, we will examine the fundamental components on which natural language processing systems are built, including frequency distributions, part of speech tagging, syntactic parsing, semantics and analyzing meaning, search, introductory information and relation extraction, and structured knowledge resources.  We will also examine pragmatic concerns in processing raw text from real-world sources.

Machine learning describes algorithms which can modify their internal parameters (i.e., "learn") to recognize patterns and make decisions based on examples or through interaction with the environment.  This course will introduce the fundamentals of machine learning, will describe how to implement several practical methods for pattern recognition, feature selection, clustering, and decision making for reward maximization, and will provide a foundation for the development of new machine learning algorithms.

Students will learn from experts from projects that have developed widely adopted foundational cyberinfrastructure resources, followed by hands-on laboratory exercises focused on those resources. Students will use these resources and gain practical experience from laboratory exercises for a final project using a data set and meeting requirements provided by domain scientists. Students will be provided access to computing resources on UA campus clusters, at the iPlant Collaborative, and at NSF XSEDE. Students will also learn to write a proposal for obtaining future allocations on large-scale national resources through XSEDE.

Neural networks are a branch of machine learning that combines a large number of simple computational units to allow computers to learn from and generalize over complex patterns in data. Students in this course will learn how to train and optimize feedforward, convolutional, and recurrent neural networks for tasks such as text classification, image recognition, and game playing.

Do you want to live permanently on Antarctica? Now is your chance: apply for Mission Antarctica! The ice is melting, the penguins are marching; it seems like a perfect time to settle, but many challenges await. Data can help you live and thrive in this changing environment and not be eaten by a leopard seal. However, most of us do not know how to organize, analyze, and translate real-life data into decisions. In this class, we work through a series of scenarios that teach you how to use data to design interventions and evaluate whether we are making a difference in our new society. These scenarios include case studies related to disease, food security, conservation, sustainability, and nutrition. Through a combination of lectures, hands-on problem solving, and collaboration, this course teaches introductory data literacy skills such as data management, analytics, and visualization that are useful for decision making and your careers. No programming experience is required, and students are encouraged to bring laptops to class for in-class activities and assignments. All readings and supplemental material are open source or free to students. Most importantly, no penguins will be harmed in this adventure, we promise.