Recognizing and normalizing protein name mentions in biomedical literature is a challenging task that is important for text mining applications such as the extraction of protein-protein interactions, pathway reconstruction and many more. In this paper, we present ProNormz, an integrated approach for human protein (HP) tagging and normalization. In Homo sapiens, a large number of biological processes are regulated through post-translational phosphorylation by a large human gene family called protein kinases. Recognition and normalization of human protein kinases (HPK) is therefore important for extracting the underlying information on their regulatory mechanisms from biomedical literature. In addition to tagging and normalization, ProNormz distinguishes HPK from other HP. To our knowledge, ProNormz is the first normalization system that distinguishes HPK from other HP in addition to performing the gene normalization task. ProNormz incorporates a specialized synonym dictionary for human proteins and protein kinases, a set of 15 string matching rules and a disambiguation module to achieve the normalization. Experimental results on the benchmark BioCreative II training and test datasets show that our integrated approach achieves fairly good performance and outperforms more sophisticated semantic similarity and disambiguation systems presented in the BioCreative II GN task. For protein/gene name tagging, ProNormz incorporates our own named entity tagger, NAGGNER, and the popular tagger BANNER. As a freely available web tool, ProNormz is useful to developers as an extensible gene normalization implementation, to researchers as a standard for comparing their innovative techniques, and to biologists for normalization and categorization of HP and HPK mentions in biomedical literature. URL:

The input for ProNormz can be one or more biomedical abstracts (plain text, MEDLINE or XML format) or a list of protein names. The corresponding output is the normalized protein/gene name, highlighted in the text. Sample inputs are available on the home page. Special characters such as Greek letters are not supported by the server.
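
As a rough illustration of how a list of protein names could be normalized with a synonym dictionary and simple string matching rules, the following Python sketch may help. It is not the ProNormz implementation: the dictionary entries, the placeholder identifiers (HP-0001 and so on), the function names and the single matching rule shown here are assumptions made for the example, and the actual 15 rules and the disambiguation module are not reproduced.

```python
import re

# Toy synonym dictionary (hypothetical structure; the IDs are placeholders,
# not real database identifiers).
SYNONYMS = {
    "MAPK1": {"id": "HP-0001", "is_kinase": True,
              "names": ["MAPK1", "ERK2", "p42 MAPK"]},
    "AKT1": {"id": "HP-0002", "is_kinase": True,
             "names": ["AKT1", "PKB", "protein kinase B"]},
    "TP53": {"id": "HP-0003", "is_kinase": False,
             "names": ["TP53", "p53", "tumor protein p53"]},
}


def canonical_key(name):
    """One illustrative matching rule: lowercase, then drop every
    character that is not an ASCII letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_index(synonyms):
    """Map each canonicalized synonym to the set of symbols it may denote."""
    index = {}
    for symbol, entry in synonyms.items():
        for name in entry["names"]:
            index.setdefault(canonical_key(name), set()).add(symbol)
    return index


INDEX = build_index(SYNONYMS)


def normalize(mention):
    """Return a normalized record for a mention, or None when the mention
    is unknown or ambiguous (a full system would disambiguate by context)."""
    candidates = INDEX.get(canonical_key(mention), set())
    if len(candidates) != 1:
        return None
    symbol = next(iter(candidates))
    entry = SYNONYMS[symbol]
    return {
        "mention": mention,
        "symbol": symbol,
        "id": entry["id"],
        "category": "HPK" if entry["is_kinase"] else "HP",
    }


if __name__ == "__main__":
    for mention in ["Erk-2", "P53", "Protein Kinase B", "unknown protein"]:
        print(mention, "->", normalize(mention))
```

Because the key keeps only ASCII letters and digits, a mention containing a Greek letter would lose that character in this sketch, which is one reason such input would need separate handling.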

Dr. Suresh Subramani, PhD Student
Dr. Kalpana Raja, PhD Student
Dr. Jeyakumar Natarajan, Professor

Data Mining and Text Mining Laboratory,
Department of Bioinformatics,
School of Life Sciences,
Bharathiar University,
Coimbatore - 641 046,
Tamilnadu, INDIA

Suresh Subramani, Kalpana Raja, Jeyakumar Natarajan. ProNormz - An Integrated Approach for Human Proteins and Protein Kinases Normalization. Journal of Biomedical Informatics. 2014;47:131-8


This work has been carried out as a part of the Department of Information Technology (DIT) project entitled "Text Mining and Data Warehousing of Protein Kinases Relationships and Pathways" at Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, India.