As data continue to grow in volume and penetrate everything we do in contemporary work across many professions, employers are seeking data scientists to extract meanings and patterns from large quantities of data. This user-friendly course will provide an introduction to a variety of skills required for data analytics in organizations, education, health contexts, and the sciences. Specifically, this course examines information management in the context of massive sets of data, gives students proficiency with a variety of data analysis tools, and exposes learners to varied data platforms as well as skills and concepts related to data mining and statistical analysis. Particular attention will be given to toolkits embedded in R and other platforms.
Data Science and Visualization Certificate
About the Program
The purpose of this certificate is to appeal to a wide variety of learners from across the campus, including those who may not find other UArizona data science programs accessible because of their own hesitancies, course prerequisites, or level of familiarity with big data. This certificate will signal to employers that students have dedicated the time and energy necessary to develop the skills and confidence for tackling messy, real-world data problems using modern programming languages.
The UArizona iSchool certificate will serve a diverse student population, training both 1) technically minded students in the nuances of successfully developing and communicating data methods and results for non-experts and the general public, and 2) less technically minded students in the basic skills necessary for gathering insights from data.
The Data Science and Visualization Certificate is distinct in its accessibility for students from across domains, fields, and disciplines at the University – students in the arts, humanities, or those less inclined to embrace a degree program in Information, Computer Science, etc. will find this certificate an important skill-building option for today’s employers. It serves students who may or may not bring the experience or prerequisites required by many data-oriented courses and programs on campus.
9 units must be taken via the University of Arizona (not transferred).
Up to 6 units may be shared with a degree requirement (major, minor, General Education) or second certificate.
The Data Science and Visualization Certificate will provide undergraduate students the confidence and training they need in data collection, exploration, manipulation and storage, analysis, and presentation in order to navigate data-rich workplace environments.
In completing the Certificate, students will obtain practical experience using a variety of data science techniques and software applications; gain hands-on experience working with real-world data sets drawn from science, social media, and business; and build on basic statistical and programming knowledge to become familiar with the tools utilized for advanced work in today’s data-rich landscape.
- Complete 3 courses (9 units)
Up to 6 units may be shared with a degree requirement (major, minor, General Education) or second certificate. Information Science students may only 'double use' 6 units towards an Information Science major or minor.
Choose either ESOC 214 or ISTA 116, then take ISTA 320 Data Visualization and ISTA 321 Data Mining.
Understanding uncertainty and variation in modern data: data summarization and description, rules of counting and basic probability, data visualization, graphical data summaries, working with large data sets, prediction of stochastic outputs from quantitative inputs. Operations with statistical computer packages such as R.
This course will introduce students to the fundamental concepts and tools used to convey the information contained within large, complex data sets through a variety of visualization techniques. Students will learn the fundamentals of data exploration via visualizations, how to manipulate and reshape data to make it suitable for visualization, and how to prepare everything from simple single-variable visualizations to large multi-tiered and interactive visualizations. Visualization theory will be presented alongside the technical aspects of the course to develop a holistic understanding of the topic.
This course introduces students to the theory and practice of data mining for knowledge discovery. This includes methods developed in the fields of statistics, large-scale data analytics, machine learning, and artificial intelligence for automatic or semi-automatic analysis of large quantities of data to extract previously unknown and interesting patterns. Topics include understanding varieties of data, classification, association rule analysis, cluster analysis, and anomaly detection. We will use software packages for data mining, explaining the underlying algorithms and their use and limitations. The course will include laboratory exercises, with data mining case studies using data from biological sequences and networks, social networks, linguistics, ecology, geo-spatial applications, marketing and psychology.
- Complete 1 additional course (3 units)
This course introduces biostatistical methods and applications, covering descriptive statistics, probability, and inferential techniques necessary for appropriate analysis and interpretation of data relevant to health sciences. Students will use a statistical software package.
Students learn how to identify and acquire medical and health data, assess quality, and integrate data from multiple sources. Students gain knowledge of how data collection procedures influence data quality and techniques for combining health datasets. Students gain skills by completing applied projects to collect, access and work with existing health data.
This course teaches students basic programming approaches for mapping large disparate health data to analyzable formats. Students will also gain data processing skills including version control and assessment for missing data, errors, and outliers. Students will develop hands-on skills including batch processing and data aggregation, and learn how to create and manage a database using REDCap. Students will also learn what data visualization is and how they can use it to better present and understand health data.
This course will explore broad research paradigms and theoretical approaches that inform contemporary social research, varying study designs, as well as the systematic methods utilized in differing types of data analyses. Though this course will introduce research processes across the academic spectrum, quantitative analysis of both small and large data sets will be emphasized. Therefore, students will learn about basic statistical analyses and will be introduced to the emerging worlds of data science and social media analytics. Students will also consider related topics such as data visualization or research presentations.
This course will guide students through advanced applications of computational methods for social science research. Students will be encouraged to consider social problems from across sectors, like health science, education, environmental policy and business. Particular attention will be given to the collection and use of data to study social networks, online communities, electronic commerce and digital marketing. Students will consider the many research designs used in contemporary social research and will learn to think critically about claims of causality, mechanisms, and generalization in big data studies.
This course will be inviting for a wide variety of students from across disciplines, and they will learn how to use industry standard tools and practices to make large data sets usable for scientists and other decision makers. From data collection and preparation, to the creation of big data stores, databases, or systems to make data flow, this course will focus on the practical work needed to prepare big data for analyses across contexts. Students will be introduced to a variety of technical tools for data management, storage, use, and manipulation.
This course surveys the techniques central to the modern practice of extracting useful patterns and models from large bodies of data and the theory behind these techniques. Students will learn the purpose, power, and limitations of models, with concrete examples from business and science. Course subject matter may include classification and regression, logistic regression, supervised segmentation and decision trees, similarity/distance metrics and recommender systems, clustering and nearest neighbors, support vector machines, understanding and avoiding overfitting, natural language processing and sentiment analysis, and machine learning, neural networks, and AI.
Natural language processing (NLP) is the study of how we can teach computers to use language by extracting knowledge from text, and then use that knowledge in some meaningful way. In this introductory course, we will examine the fundamental components on which natural language processing systems are built, including frequency distributions, part of speech tagging, syntactic parsing, semantics and analyzing meaning, search, introductory information and relation extraction, and structured knowledge resources. We will also examine pragmatic concerns in processing raw text from real-world sources.
Machine learning describes algorithms which can modify their internal parameters (i.e., "learn") to recognize patterns and make decisions based on examples or through interaction with the environment. This course will introduce the fundamentals of machine learning, will describe how to implement several practical methods for pattern recognition, feature selection, clustering, and decision making for reward maximization, and will provide a foundation for the development of new machine learning algorithms.
Students will learn from experts on projects that have developed widely adopted foundational cyberinfrastructure resources, followed by hands-on laboratory exercises focused on those resources. Students will use these resources and gain practical experience from laboratory exercises for a final project using a data set and meeting requirements provided by domain scientists. Students will be provided access to computer resources at UA campus clusters, the iPlant Collaborative, and NSF XSEDE. Students will also learn to write a proposal for obtaining a future allocation on large-scale national resources through XSEDE.
Neural networks are a branch of machine learning that combines a large number of simple computational units to allow computers to learn from and generalize over complex patterns in data. Students in this course will learn how to train and optimize feedforward, convolutional, and recurrent neural networks for tasks such as text classification, image recognition, and game playing.
Do you want to live permanently on Antarctica? Now is your chance: apply for Mission Antarctica! The ice is melting, the penguins are marching; it seems like a perfect time to settle, but many challenges await. Data can help you live and thrive in this changing environment and not be eaten by a leopard seal. However, most of us do not know how to organize, analyze, and translate real-life data into decisions. In this class, we work through a series of scenarios that teach you how to use data to design solutions and evaluate whether we are making a difference in our new society. These scenarios include case studies related to disease, food security, conservation, sustainability, and nutrition. Through a combination of lectures, hands-on problem solving, and collaboration, this course teaches introductory data literacy skills such as data management, analytics, and visualization that are useful for decision making and your careers. No programming experience is required, and students are encouraged to bring laptops to class for in-class activities and assignments. All readings and supplemental material are open source or free to students. Most importantly, no penguins will be harmed in this adventure, we promise.