Natural Language Processing Undergraduate Certificate


The 12-credit-hour Natural Language Processing (NLP) Undergraduate Certificate gives students the training and confidence they need in natural language processing: teaching computers to use language by extracting knowledge from text and then using that knowledge in meaningful ways.

The certificate will signal to employers that students have dedicated the time and energy necessary to develop the skills and confidence for working with text and language data. It is designed to train both technically minded and less technically minded students in the basic skills necessary for gathering insights from such data.


Learning Outcomes

  • Students will be able to critically analyze a data science problem to determine how natural language processing techniques might be applied
  • Students will be able to code a variety of natural language processing algorithms and techniques and apply them to specific data science problems

Declaration or Application Process

A certificate can be completed as a stand-alone program or alongside an undergraduate degree. There are no additional application requirements for the Natural Language Processing certificate.

If you are a current UArizona student, you may declare your certificate here:

Declare my Certificate 

If you are not a current UArizona student, you can apply for admission as a certificate-seeking student.

The Natural Language Processing Certificate is available via main campus and Arizona Online.

Students must meet the same general UArizona admissions criteria as degree-seeking students. The requirements and expectations match those for a first-year, transfer, or readmit student, depending on the student's admit type. Students must complete the application in full and submit all required transcripts and requested materials. Certificate-only students (those not pursuing the certificate as part of a degree) are not eligible for merit aid or financial aid, and if they later apply as degree-seeking students, they are considered "readmits."


Courses

Required Courses

  • 12 units are required for the certificate.
  • Up to 6 units may be shared with a degree requirement (major, minor, General Education) or a second certificate.
  • All students, including Information Science, Computer Science, and Linguistics majors, may 'double use' at most 6 units towards another program of study (major, minor, General Education, or another certificate).

View or Download Certificate Fillable Checklist (PDF)

Students will choose one of ISTA 130 (4 units, description below), CSC 110 (4 units), LING 201 (3 units), or LING 408 (3 units):

An introduction to computational techniques and using a modern programming language to solve current problems drawn from science, technology, and the arts. Topics include control structures, elementary data structures, and effective program design and implementation techniques. Weekly laboratory.

** Programming-intensive course; College Algebra recommended.

AND students will choose either ISTA 355 (3 units) or LING 388 (3 units):

Natural language processing (NLP) is the study of how we can teach computers to use language by extracting knowledge from text, and then use that knowledge in some meaningful way.  In this introductory course, we will examine the fundamental components on which natural language processing systems are built, including frequency distributions, part of speech tagging, syntactic parsing, semantics and analyzing meaning, search, introductory information and relation extraction, and structured knowledge resources.  We will also examine pragmatic concerns in processing raw text from real-world sources.

Fundamentals of processing of natural language and computational linguistics.

AND students will take LING/ISTA/CSC 439 (3 units).


Elective Courses

Complete at least 3 units from the following courses (ISTA course descriptions below):

  • LING 408 (3 units)  
  • LING 438 (3 units) 
  • LING 478 (3 units) 
  • ISTA 131 (3 units) 
  • ISTA 455 (4 units)  
  • ISTA 456 (3 units) 
  • CSC 483 (3 units) 

At the core of Information Science lies the digital data that is the object of study. This course aims to introduce the tools, techniques, and issues involved with the handling of this data: where it comes from, how to store and retrieve it, how to extract knowledge from the data via analysis, and the social, ethical, and legal issues involved in its use. Throughout the course, students will be given hands-on experience with actual datasets from a variety of sources including social media and citizen science projects, as well as experience with common tools for analysis and visualization. Students will also examine topical case studies involving legal and ethical issues surrounding data.

Most of the data available on the web or managed by institutions and businesses consists of unstructured text. Natural language processing tools help to organize such texts, extract relevant information from them, and automate time-consuming processes. This course will teach the fundamental knowledge necessary to design and develop end-to-end natural language understanding applications, with examples drawn from question answering, sentiment analysis, information extraction, automated inference, machine translation, and chatbots. We will use several natural language processing toolkits and libraries.

Most of the web data today consists of unstructured text. Of course, the fact that this data exists is irrelevant, unless it is made available such that users can quickly find information that is relevant for their needs. This course will cover the fundamental knowledge necessary to build such systems, such as web crawling, index construction and compression, boolean, vector-based, and probabilistic retrieval models, text classification and clustering, link analysis algorithms such as PageRank, and computational advertising. The students will also complete one programming project, in which they will construct one complex application that combines multiple algorithms into a system that solves real-world problems.