Biosemantic Research Group

The Biosemantic Research Group focuses on converting factual information in biodiversity literature into computable data. Its research covers information extraction, controlled vocabulary/ontology construction, and knowledge modeling.

Current Projects:

  • ABI innovation: Authors in the driver's seat: fast, consistent, computable phenotype data and ontology production. NSF DBI-1661485. July 2017 - June 2020. Link (under construction)

  • Collaborative Research: AVATOL - Next Generation Phenomics for the Tree of Life. NSF DEB-1208567. May 2012 - May 2017. Link.

  • Collaborative Research: ABI Development: Exploring Taxon Concepts (ETC) through analyzing fine-grained semantic markup of descriptive literature. NSF DBI-1147266. July 2012 - June 2016. Link.

  • Collaborative Research: Building a Comprehensive Evolutionary History of Flagellate Plants. NSF DEB-1541509. January 2016 - December 2019.

Biosemantic software tools online:

A web-based application that includes the following tools:

  1. Text Capture (CharaParser), which extracts trait/phenotype characters from taxonomic descriptions of different taxon groups,
  2. Ontology Building, which facilitates the creation of a phenotype ontology using terms from taxonomic descriptions,
  3. Matrix Generation, which builds a taxon-by-character matrix from the extracted character data,
  4. Key Generation, which builds interactive keys using the characters, and
  5. Taxonomy Comparison, which compares taxon concepts using EULER tools and the extracted characters.
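The idea behind a taxon-by-character matrix (tool 3 above) can be illustrated with a minimal Python sketch. It is not the group's implementation: the function name `build_matrix`, the (taxon, character, value) record format, the " | " separator for multiple values, and the "?" marker for missing cells are all assumptions made for illustration.

```python
from collections import defaultdict


def build_matrix(records):
    """Pivot (taxon, character, value) records into a taxon-by-character matrix.

    Hypothetical helper: the record format and the "?" marker for
    missing data are illustrative assumptions, not the tool's format.
    """
    cells = defaultdict(lambda: defaultdict(list))
    characters = []  # column order follows first appearance
    for taxon, character, value in records:
        if character not in characters:
            characters.append(character)
        if value not in cells[taxon][character]:
            cells[taxon][character].append(value)

    taxa = list(cells)  # row order follows first appearance
    rows = [
        [" | ".join(cells[t][c]) if c in cells[t] else "?" for c in characters]
        for t in taxa
    ]
    return taxa, characters, rows


# Made-up example records, as an extraction step might produce them.
records = [
    ("Taxon A", "leaf shape", "ovate"),
    ("Taxon A", "petal color", "white"),
    ("Taxon B", "petal color", "red"),
    ("Taxon A", "leaf shape", "elliptic"),
]
taxa, characters, rows = build_matrix(records)
for taxon, row in zip(taxa, rows):
    print(taxon, row)
```

Each row corresponds to a taxon and each column to a character. A cell holds every distinct value reported for that pair, so polymorphic characters stay visible instead of being overwritten.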

A simple web application that allows multiple users to categorize a set of terms by dragging and dropping them. This tool is meant to gather consensus from a group of users in order to support the development of a formal ontology. The supported relationships are is_a, part_of, and order (follows/precedes).
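One simple way to derive consensus from several users' categorizations is a majority vote, sketched below in Python. The page does not say how the tool computes consensus, so the voting rule, the function name `consensus_categories`, the `min_agreement` threshold, and the (user, term, category) input format are all assumptions; the sketch also assumes each user categorizes a given term at most once.

```python
from collections import Counter, defaultdict


def consensus_categories(assignments, min_agreement=0.5):
    """Return the majority category per term, or None when agreement is too low.

    Hypothetical helper: majority voting is an illustrative assumption,
    not the documented behavior of the tool.
    """
    votes = defaultdict(Counter)
    for _user, term, category in assignments:
        votes[term][category] += 1

    result = {}
    for term, counter in votes.items():
        category, count = counter.most_common(1)[0]
        share = count / sum(counter.values())
        # Keep the category only if strictly more than min_agreement of voters chose it.
        result[term] = category if share > min_agreement else None
    return result


# Made-up example: three users sort two terms into categories.
assignments = [
    ("user1", "ovate", "shape"),
    ("user2", "ovate", "shape"),
    ("user3", "ovate", "texture"),
    ("user1", "red", "color"),
    ("user2", "red", "shape"),
]
result = consensus_categories(assignments)
print(result)
```

Terms that come back as `None` are the ones a group would still need to discuss before the term enters a formal ontology.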

A tool that extracts microbial physiological characters from descriptions and returns a taxon-by-character matrix.

  • CharaParser+EQ: not maintained at this time.

Biosemantic software code repository:

Biosemantics Group Members:

  • Hong Cui, Associate Professor/PI
  • Thomas Rodenhausen, Lead Software Developer
  • Vikas Yadav, Ph.D. Student
  • Erman Gurses, M.S. Student



Group Weekly Presentations

Date | Topic | Presenter
Dec 1, 2016 | Entity-Level Modeling for Coreference Resolution | Dongfang Xu
Nov 17, 2016 | Anaphora Resolution | Thomas Rodenhausen
Nov 3, 2016 | Restricted Boltzmann Machines for Classification | Jin Mao
Oct 13, 2016 | Deep Learning for Bacteria Event Identification | Jin Mao
Oct 6, 2016 | Medical Semantic Similarity Comparison With NLM | Dongfang Xu
Sep 15, 2016 | Bidirectional CRF for NER | Jin Mao
Sep 8, 2016 | CRF & SVM in Medication Extraction | Dongfang Xu
Aug 23, 2016 | Using CRF to label Bacteria and Habitat entities | Jin Mao
Aug 16, 2016 | Medication Information Extraction | Dongfang Xu
Aug 2, 2016 | Indented Tree or Graph | Thomas Rodenhausen
Jul 26, 2016 | A Review of Ontology Editor Evaluation | Yujie Cao
Apr 19, 2016 | Conditional Random Field & Table Extraction | Dongfang Xu
Mar 8, 2016 | A Brief Introduction to Distant Supervision | Jin Mao
Feb 9, 2016 | Docker | Thomas Rodenhausen
Feb 2, 2016 | Factor Graph in DeepDive | Jin Mao
Jan 29, 2016 | DeepDive case study | Dongfang Xu
Jan 22, 2016 | Graphic Models | Jin Mao
Dec 18, 2015 | Hidden Markov Model | Dongfang Xu
Dec 4, 2015 | Logic Lecture | Professor Martin Frické
Nov 20, 2015 | Habitat-Lite & EnvO | Jin Mao
Nov 13, 2015 | The model of DeepDive | Dongfang Xu
Oct 30, 2015 | Profiling users with tag networks | Jin Mao
Oct 23, 2015 | DeepDive Introduction | Dongfang Xu
Oct 16, 2015 | Ranking In Folksonomies | Thomas Rodenhausen
Oct 9, 2015 | Ontology Based Information Extraction | Jin Mao
Oct 2, 2015 | Interactive Data Analysis | Aarthy Sankari Bhaska
Sep 25, 2015 | OWL & Protege | Dongfang Xu
Sep 18, 2015 | Issues on Classification Problem | Jin Mao
Sep 11, 2015 | RDF & SPARQL | Dongfang Xu
Sep 4, 2015 | Topical Scientific Community | Jin Mao


Selected Publications since 2010

  • Cui, H*. (2010). Semantic annotation of morphological descriptions: An overall strategy. BMC Bioinformatics, 11, 1-11. DOI:10.1186/1471-2105-11-278.
  • Cui, H*., Boufford, D., & Selden, P. (2010). Semantic annotation of biosystematics literature without training examples. Journal of the American Society for Information Science and Technology, 61(3), 522-542.
  • Cui, H*. (2010). Competency evaluation of plant character ontologies against domain literature.  Journal of the American Society for Information Science and Technology, 61(6), 1144-1165.
  • Cui, H*., Duan, Y. & Li, F. (2011). Machine learning based semantic markup of biodiversity literature in English. Document, Information, & Knowledge (in Chinese), 2, 73-77.
  • Cui, H*. (2012). CharaParser for fine-grained semantic annotation of organism morphological descriptions. Journal of the American Society for Information Science and Technology, 63(4), 738-754.
  • Thessen, A., Cui, H., & Mozzherin, D. (2012). Applications of natural language processing in biodiversity science. Advances in Bioinformatics.
  • Duan, Y, Hei, Z, Ju, F., Cui, H. (2012).  Study on Semantic Markup of Species Description Text in Chinese Based on Auto-Learned Rules. New Technology of Library and Information Services (Chinese). 2012 (5). 
  • Duan, Y, Hei, Z, Ju, F., Cui, H. (2012). Semantic Annotation of Species Description Text in Chinese Literature by Naive Bayes Classifier. Journal of the China Society for Scientific and Technical Information (Chinese). 31, (8), 805-812.
  • Arighi, C.N., Carterette, B., Cohen K.B. et al. (2013). An Overview of the BioCreative 2012 Workshop Track III: Interactive Text Mining Task. Database. doi: 10.1093/database/bas056
  • Burleigh, G., et al. (2013). Next generation phenomics for the Tree of Life. PLoS Currents.
  • Duan, YF., Hei ZZ., Jiu, F., & Cui, H.(2013)  Heuristics based semantic annotation of biodiversity documents in Chinese. Chinese Journal of Library and Information Science (English). 2013,6(2):33-46.
  • Dahdul, W.M., Cui, H., Mabee, P. et al. (2014) The Biological Spatial Ontology: anatomical descriptors for spatial and topological aspects of biological structures. Journal of Biomedical Semantics. 5:34. doi:10.1186/2041-1480-5-34
  • Zhang, Y., Cui, H., Burkell, J., & Mercer, R.E. (2014). A machine learning approach for rating the quality of depression treatment web pages. iConference, 2014. [full paper]
  • Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, et al. (2015) Finding Our Way through Phenotypes. PLoS Biology 13(1): e1002033. doi:10.1371/journal.pbio.1002033 [perspective paper].
  • Huang, F, Macklin, J.A., Cui, H.*, Cole, H.A., & Endara, L. (2015). OTO: Ontology Term Organizer. BMC Bioinformatics. 16:47  doi:10.1186/s12859-015-0488-1
  • Cui, H., Dahdul, W., Dececchi, A., Ibrahim, N., Mabee, P., Balhoff, J., & Gopalakrishnan, H. (2015). CharaParser+EQ: Performance Evaluation Without Gold Standard. Annual Meeting of American Society for Information Science and Technology, Nov 6-10, St Louis, Missouri, 2015. (Full paper, acceptance rate: 36.%)
  • Blank, C.E., Cui, H., Moore, L.R., & Walls, R.L. (2016). MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions. Journal of Biomedical Semantics, 7:18. DOI: 10.1186/s13326-016-0060-6.
  • Cui, H.*, Xu, D., Chong, S.S., Ramirez, M.J., Rodenhausen, T., Macklin, J.A., Ludascher, B., Morris, R.A., Soto, E. M., & Koch, N.M.  (2016). Introducing Explorer of Taxon Concepts with a Case Study on Spider Measurement Matrix Building. BMC Bioinformatics.
  • Mao, J., Moore, L., Blank, C., Wu, E.H-H., Ackerman, M., Ranade, S., & Cui, H*. (2016). Microbial Phenomics Information Extractor (MicroPIE): A Natural Language Processing Tool for the Automated Acquisition of Prokaryotic Phenotypic Characters from Text Sources. BMC Bioinformatics.
  • Endara, L., Cole, H.A., Burleigh, J.G., Nagalingum, N., Macklin, J.A., Liu, J., & Cui, H*. (Under review). Using taxonomic descriptions to build a standardized Plant Glossary. Taxon.
  • T. Dang, H. Cui, and A. G. Forbes (2016). MultiLayerMatrix: Visualizing large taxonomic datasets. In Proceedings of the EuroVis Workshop on Visual Analytics (EuroVA), Groningen, Netherlands, June 2016. 


College of Social and Behavioral Sciences