Biosemantic Research Group

The Biosemantic Research Group is led by Dr. Hong Cui. It focuses on (1) converting factual information from biodiversity literature to computable data, covering research in information extraction, controlled vocabulary/ontology construction, and knowledge modeling; (2) enabling authors to write and record semantically clear phenotypic descriptions and data so the data can be harvested at the time of publication; and (3) contributing to data integration efforts under the FAIR principles.

Research Projects:

  • Collaborative Research: Frameworks: Internet of Samples: Toward an Interdisciplinary Cyberinfrastructure for Material Samples. NSF-2004562. Aug 2020 - July 2024. In collaboration with Columbia University (System for Earth Sample Registration, SESAR), UC-Berkeley, University of Kansas, Open Context, and Smithsonian National Museum of Natural History. PI role transferred to Dr. Thomer in 2022.

  • ABI innovation: Authors in the driver's seat: fast, consistent, computable phenotype data and ontology production. NSF DBI-1661485. July 2017-Jun 2020. 

  • Collaborative Research: AVATOL - Next Generation Phenomics for the Tree of Life. NSF DEB-1208567. May 2012 - May 2017.

  • Collaborative Research: ABI Development: Exploring Taxon Concepts (ETC) through analyzing fine-grained semantic markup of descriptive literature. NSF DBI-1147266. 7/2012-6/2016.

  • Collaborative Research: Building a Comprehensive Evolutionary History of Flagellate Plants. NSF DEB-1541509. Jan 2016 - Dec 2019.

Biosemantic Software Tools Online:

  • Measurement Recorder
  • Allows authors to define and reuse characters that involve measurements. For example, length of perigynium beak may be measured, among other ways, from the summit of the achene to the summit of the perigynium beak, including the perigynium teeth. The software adds the landmark terms the user needs to define this character, and the defined character itself, to a shared ontology so others can use these terms or the character in their own characters. The goal of the software is to clearly define measurement methods and to encourage reuse and convergence among community users. Demo:
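
The define-and-reuse idea above can be pictured with a small sketch. This is not the Measurement Recorder's code or data model; the class names `MeasurementCharacter` and `SharedOntology`, their fields, and the `define` method are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementCharacter:
    """A measured character defined by two landmark terms (hypothetical model)."""
    name: str
    start_landmark: str
    end_landmark: str
    note: str = ""


class SharedOntology:
    """Toy stand-in for a shared ontology that stores landmarks and characters."""

    def __init__(self):
        self.landmarks = set()
        self.characters = {}

    def define(self, character):
        """Register a character and its landmarks, or return the existing
        definition when another author already defined that name."""
        if character.name in self.characters:
            return self.characters[character.name]
        self.landmarks.update({character.start_landmark, character.end_landmark})
        self.characters[character.name] = character
        return character


ontology = SharedOntology()
beak_length = MeasurementCharacter(
    name="length of perigynium beak",
    start_landmark="summit of achene",
    end_landmark="summit of perigynium beak",
    note="including the perigynium teeth",
)
first = ontology.define(beak_length)
# A second author asking for the same character name gets the shared definition.
second = ontology.define(
    MeasurementCharacter("length of perigynium beak", "base of beak", "tip of teeth")
)
```

Returning the existing definition instead of overwriting it is one simple way to model the convergence goal: later authors see how the character was already measured before proposing a variant.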


  • Description Editor
  • Shares the same goal as the other tools for authors, but focuses on writing taxonomic/non-measurement character descriptions. It uses our CharaParser Web API to convert users' character descriptions into a matrix format and to add or link the entity and quality terms, or characters, to a given ontology. DE features description templates that are accessible to all users, with the goal of reducing redundant work and promoting parallelism in taxonomic descriptions. Demo:


  • Character Recorder 
  • When authors create taxonomic descriptions, they first examine a set of specimens and document their characters in a spreadsheet. The Character Recorder is a novel spreadsheet with ontology support. The user populates a spreadsheet by selecting ontology terms but also has the freedom to use free-text constraints. Demo:


  • Add a term to an ontology experiment site 
  • To evaluate different ways of adding terms to ontologies, this experiment site includes a set of four methods, including the wizard we designed and implemented. Demo:


  • Explorer of Taxon Concept Toolkit (ETC): 
  • A Web-based application that includes the following tools: (1) Text Capture (CharaParser), which extracts trait/phenotype characters from taxonomic descriptions of different taxon groups; (2) Ontology Building, which facilitates the creation of a phenotype ontology using terms from taxonomic descriptions; (3) Matrix Generation, which builds a taxon-by-character matrix from the extracted character data; (4) Key Generation, which builds interactive keys using the characters; and (5) Taxonomy Comparison, which compares taxon concepts using EULER tools and the extracted characters.
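
To make the Matrix Generation step concrete, here is a minimal sketch of how a taxon-by-character matrix could be assembled from extracted statements. It is not ETC's implementation: the `(taxon, character, value)` tuple layout, the function name `build_matrix`, the `?` placeholder for missing cells, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def build_matrix(statements):
    """Build a taxon-by-character matrix from (taxon, character, value) tuples.

    Returns (taxa, characters, rows). rows[i][j] holds the values recorded for
    taxa[i] and characters[j], sorted and joined by '|', or '?' when no value
    was extracted for that cell.
    """
    cells = defaultdict(set)
    for taxon, character, value in statements:
        cells[(taxon, character)].add(value)

    taxa = sorted({taxon for taxon, _ in cells})
    characters = sorted({character for _, character in cells})
    rows = [
        [
            "|".join(sorted(cells[(taxon, character)]))
            if (taxon, character) in cells
            else "?"
            for character in characters
        ]
        for taxon in taxa
    ]
    return taxa, characters, rows


# Made-up statements standing in for extracted character data.
statements = [
    ("Taxon A", "leaf shape", "ovate"),
    ("Taxon A", "leaf length (cm)", "3-5"),
    ("Taxon B", "leaf shape", "ovate"),
    ("Taxon B", "leaf shape", "lanceolate"),
]
taxa, characters, rows = build_matrix(statements)
for taxon, row in zip(taxa, rows):
    print(taxon, row)
```

Keeping multiple values per cell (here joined with `|`) reflects the fact that a description may report several states for one character; a real tool would need a richer cell type.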


  • Ontology Term Organizer (OTO): [temporarily offline]
  • A simple web application that allows multiple users to categorize a set of terms by dragging and dropping them. This tool is meant to gather consensus from a group of users in order to support the development of a formal ontology. The relationships supported are is_a, part_of, and order (follows/precedes).
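
One way to picture consensus gathering is a simple majority vote over users' categorization decisions. The sketch below is an assumption, not OTO's actual algorithm: the `(user, term, category)` record layout, the function name `consensus_categories`, and the `min_share` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_categories(decisions, min_share=0.5):
    """Return {term: category} for terms where one category wins more than
    min_share of the users' votes. Each user's latest decision per term counts."""
    latest = {}
    for user, term, category in decisions:
        latest[(user, term)] = category

    votes = defaultdict(Counter)
    for (_, term), category in latest.items():
        votes[term][category] += 1

    agreed = {}
    for term, counter in votes.items():
        category, count = counter.most_common(1)[0]
        if count / sum(counter.values()) > min_share:
            agreed[term] = category
    return agreed


# Made-up decisions from three users.
decisions = [
    ("u1", "ovate", "shape"),
    ("u2", "ovate", "shape"),
    ("u3", "ovate", "structure"),
    ("u1", "red", "color"),
    ("u2", "red", "coloration"),
]
agreed = consensus_categories(decisions)
```

Terms without a clear majority (such as "red" above, split one vote each) are left out of the result, which mirrors the idea that unresolved terms need further discussion before entering a formal ontology.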



  • CharaParser+EQ: not maintained at this time.

Biosemantic Software Code Repository:

Biosemantics Group Members:

Hong Cui

Thomas Rodenhausen 
Former Lead Software Developer

Vikas Yadav
Ph.D. Student

Erman Gurses


Autumn Fun 2016

Rubber Duck Race 2016

Selected Publications Since 2010

  • Cui, H*. (2010). Semantic annotation of morphological descriptions: An overall strategy. BMC Bioinformatics, 11, 1-11. DOI: 10.1186/1471-2105-11-278.
  • Cui, H*., Boufford, D., & Selden, P. (2010). Semantic annotation of biosystematics literature without training examples. Journal of the American Society for Information Science and Technology, 61(3), 522-542.
  • Cui, H*. (2010). Competency evaluation of plant character ontologies against domain literature.  Journal of the American Society for Information Science and Technology, 61(6), 1144-1165.
  • Cui, H*., Duan, Y. & Li, F. (2011). Machine learning based semantic markup of biodiversity literature in English. Document, Information, & Knowledge (in Chinese), 2, 73-77.
  • Cui, H*. (2012). CharaParser for fine-grained semantic annotation of organism morphological descriptions. Journal of the American Society for Information Science and Technology, 63(4), 738-754.
  • Thessen, A., Cui, H., & Mozzherin, D. (2012). Applications of natural language processing in biodiversity science. Advances in Bioinformatics.
  • Duan, Y, Hei, Z, Ju, F., Cui, H. (2012).  Study on Semantic Markup of Species Description Text in Chinese Based on Auto-Learned Rules. New Technology of Library and Information Services (Chinese). 2012 (5). 
  • Duan, Y, Hei, Z, Ju, F., Cui, H. (2012). Semantic Annotation of Species Description Text in Chinese Literature by Naive Bayes Classifier. Journal of the China Society for Scientific and Technical Information (Chinese). 31, (8), 805-812.
  • Arighi, C.N., Carterette, B., Cohen K.B. et al. (2013). An Overview of the BioCreative 2012 Workshop Track III: Interactive Text Mining Task. Database. doi: 10.1093/database/bas056
  • Burleigh, G., et al. (2013). Next generation phenomics for the Tree of Life. PLoS Currents.
  • Duan, YF., Hei ZZ., Jiu, F., & Cui, H. (2013). Heuristics based semantic annotation of biodiversity documents in Chinese. Chinese Journal of Library and Information Science (English). 2013, 6(2): 33-46.
  • Dahdul, W.M., Cui, H., Mabee, P. et al. (2014) The Biological Spatial Ontology: anatomical descriptors for spatial and topological aspects of biological structures. Journal of Biomedical Semantics. 5:34. doi:10.1186/2041-1480-5-34
  • Zhang, Y., Cui, H., Burkell, J., & Mercer, R.E. (2014) A machine learning approach for rating the quality of depression treatment web pages. iConference, 2014. [full paper]
  • Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, et al. (2015) Finding Our Way through Phenotypes. PLoS Biology 13(1): e1002033. doi:10.1371/journal.pbio.1002033 [perspective paper].
  • Huang, F, Macklin, J.A., Cui, H.*, Cole, H.A., & Endara, L. (2015). OTO: Ontology Term Organizer. BMC Bioinformatics. 16:47  doi:10.1186/s12859-015-0488-1
  • Cui, H., Dahdul, W., Dececchi, A., Ibrahim, N., Mabee, P., Balhoff, J., Gopalakrishnan, H. (2015) CharaParser+EQ: Performance Evaluation Without Gold Standard. Annual Meeting of American Society for Information Science and Technology, Nov 6-10, St Louis, Missouri, 2015. (Full paper, acceptance rate: 36.%)
  • Endara, L., Cole, H.A., Burleigh, J.G., Nagalingum, N., Macklin, J.A., Liu, J., Cui, H*. Using taxonomic descriptions to build a standardized Plant Glossary. Taxon.
  • T. Dang, H. Cui, and A. G. Forbes (2016). MultiLayerMatrix: Visualizing large taxonomic datasets. In Proceedings of the EuroVis Workshop on Visual Analytics (EuroVA), Groningen, Netherlands, June 2016.
  • Blank, C., Cui, H., Moore, L., Ramona, W. (2016). MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions. Journal of Biomedical Semantics. 7(1), 1-10. DOI: 10.1186/s13326-016-0060-6
  • Cui, H.**, Xu, D., Chong, S.S., Ramirez, M.J., Rodenhausen, T., Macklin, J.A., Ludascher, B., Morris, R.A., Soto, E. M., & Koch, N.M. (2016). Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building. BMC Bioinformatics. 17(1), 471-492. DOI: 10.1186/s12859-016-1352-7.
  • Mao, J. Moore, L., Blank, C. Wu, E.H-H, Ackerman, M., Ranade, S., & Cui, H**. (2016). Microbial Phenomics Information Extractor (MicroPIE): A natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources. BMC Bioinformatics. 17(1), 528-543. DOI 10.1186/s12859-016-1396-8
  • Endara, L., Cole, H.A., Burleigh, J.G., Nagalingum, N., Macklin, J.A., Liu, J., & Cui, H**. (2017) Building the “Plant Glossary” — A controlled botanical vocabulary using terms extracted from the Floras of North America and China. TAXON. 66(4), 953-966. DOI: 10.12705/664.9
  • Endara, L., Cui, H. & Burleigh, J. G. (2018) Semiautomatic extraction of phenotypic traits from taxonomic descriptions using a Natural Language Processing approach. Applications in Plant Sciences. DOI: 10.1002/aps3.1035
  • Mao, J. and Cui, H.** (2018) Identifying bacterial biotope entities using sequence labeling: performance and feature analysis. Journal of Association for Information Science and Technology.
  • Xu, D**., Chong S., Rodenhausen, T., & Cui, H**. (2018) Resolving “orphaned” parts using machine learning and natural language processing methods. Biodiversity Data Journal.
  • Dahdul, W., Manda, P., Cui, H., Balhoff, J., Dececchi, A., Ibrahim, N., Lapp, H., Mabee, P., & Vision, T. (2018). Annotating phenotypes using ontological concepts: Inter-curator consistency as a baseline for evaluating the performance of a natural language processing system. Database (Oxford). The corpus in the different formats, as well as the ontologies and annotations generated in its production, have been archived at Zenodo. The source code for the analysis of inter-curator and SCP consistency based on semantic similarity metrics, as well as the data and ontologies used as input, have been archived separately, also at Zenodo. Semantic CharaParser is available in source code from GitHub (phenoscape/phenoscape-nlp/) under the MIT license. The version used for this paper is the 0.1.0-goldstandard release (releases/tag/v0.1.0-goldstandard), which is also archived at Zenodo (https://doi.org/10.5281/zenodo.1246698).
  • Pender, J., Sachs, J.L., Macklin, J.A., Cui, H., Vallance, A., Lujan-Toro, B., Rodenhausen, T., Belisle-Leclerc, M., Levin, G. (2018). Bringing a Semantic MediaWiki Flora to Life. Biodiversity Information Science and Standards 2: e25885.
  • Endara, L., Thessen, A.E., Cole, H.A., Walls, R., Gkoutos, G., Cao, Y., Chong, S.S., Cui, H*. (2018). Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifiers. Biodiversity Data Journal 6: e29232.
  • Cui, H., Macklin, J.A.,  Sachs, J., Reznicek, A., Starr, J., Ford, B., Penev, L., Chen, H.L., (2018) Incentivizing use of structured language in biological descriptions: Author-driven phenotype data and ontology production. Biodiversity Data Journal 6: e29616.

  • Cui, H., Zhang, L., Ford, B., Cheng, H-L., Macklin, J., Reznicek, A., Starr, J. (2020) Measurement Recorder: developing a useful tool for making species descriptions that produces computable phenotypes. Database (Oxford).

  • Zhang, L., Yang, X., Cota, Z., Cui, H., Ford, B., Cheng, H-L., Macklin, J. Reznicek, A., Starr, J., (2021) Which methods are the most effective to enable novice users to participate in FAIR ontology creation? A usability study. Database (Oxford): The Journal of Biological Databases and Curation. DOI:
  • Cui, H., Ford, B., Starr, J., Zhang, L., Reznicek, A., & Macklin, J. (submitted 2021) A survey of biologists’ attitude towards using ontologies to make the phenotypic data computable at the time of publication. Database (Oxford): The Journal of Biological Databases and Curation.