ProNormz
Recognizing and normalizing protein name mentions in biomedical literature is a
challenging task that is important for text mining applications such as protein-protein interaction extraction,
pathway reconstruction and many more. In this paper, we present ProNormz, an integrated
approach for human protein (HP) tagging and normalization. In Homo sapiens, a large
number of biological processes are regulated through post-translational phosphorylation
by a large human gene family called protein kinases. Recognition and normalization of human protein kinases
(HPK) is considered important for extracting the underlying information on their
regulatory mechanisms from biomedical literature. In addition to tagging and normalization,
ProNormz distinguishes HPKs from other HPs. To our knowledge, ProNormz is the first normalization
system that distinguishes HPKs from other HPs in addition to performing the gene normalization task.
ProNormz incorporates a specialized synonyms dictionary for human proteins and protein
kinases, a set of 15 string matching rules and a disambiguation module to achieve the
normalization. Experimental results on the benchmark BioCreative II training and test datasets show
that our integrated approach achieves fairly good performance and outperforms more
sophisticated semantic similarity and disambiguation systems presented in the BioCreative II GN
task. For protein/gene name tagging, ProNormz incorporates our own named entity tagger, NAGGNER,
and another popular tagger, BANNER. As a freely available web tool, ProNormz is useful to developers as an extensible gene
normalization implementation, to researchers as a standard for comparing their innovative
techniques, and to biologists for normalization and categorization of HP and HPK mentions in
biomedical literature. URL: http://www.biominingbu.org/pronormz
The input for ProNormz can be one or more biomedical abstracts (plain text/MEDLINE/XML format) or a list of protein names. The corresponding output is the normalized protein/gene name, highlighted. Sample inputs are available on the home page. Special characters such as Greek letters are not supported by the server.
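To make the dictionary-plus-rules idea concrete, here is a minimal sketch of how a synonym dictionary, a couple of string matching rules and an HP/HPK flag could fit together. It is not the ProNormz implementation: the dictionary entries, the `EX:` identifiers, the two rules in `canonical` and the function names are all hypothetical stand-ins, and the actual 15 rules and the disambiguation module are not reproduced here.

```python
import re

# Toy synonym dictionary. Entries and "EX:" identifiers are hypothetical
# placeholders for illustration, not taken from the ProNormz dictionary.
SYNONYMS = {
    "EX:0001": {"names": ["MAPK1", "ERK2", "ERK-2", "p42 MAPK"], "is_kinase": True},
    "EX:0002": {"names": ["TP53", "p53", "tumor protein p53"], "is_kinase": False},
}


def canonical(name):
    """Apply two simple matching rules (stand-ins for the tool's 15 rules)."""
    s = name.lower()                  # rule 1: ignore case
    s = re.sub(r"[-_/\s]+", "", s)    # rule 2: ignore hyphens, slashes, spaces
    return s


# Index every synonym by its canonical form.
INDEX = {}
for entry_id, entry in SYNONYMS.items():
    for synonym in entry["names"]:
        INDEX.setdefault(canonical(synonym), set()).add(entry_id)


def normalize(mention):
    """Return (identifier, "HPK" or "HP") for a mention, or None."""
    ids = INDEX.get(canonical(mention), set())
    if len(ids) != 1:
        # Unknown or ambiguous mention; a disambiguation step would go here.
        return None
    entry_id = next(iter(ids))
    category = "HPK" if SYNONYMS[entry_id]["is_kinase"] else "HP"
    return entry_id, category


if __name__ == "__main__":
    for mention in ["Erk 2", "P53", "unknown protein"]:
        print(mention, "->", normalize(mention))
```

In this sketch a mention that maps to more than one identifier is simply rejected; a real system would pass such cases to a disambiguation module that uses the surrounding text.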
Team
Dr. Suresh Subramani, PhD Student
Dr. Kalpana Raja, PhD Student
Dr. Jeyakumar Natarajan, Professor
Data Mining and Text Mining Laboratory,
Department of Bioinformatics,
School of Life Sciences,
Bharathiar University,
Coimbatore - 641 046,
Tamilnadu, INDIA
Publication
Suresh Subramani, Kalpana Raja, Jeyakumar Natarajan. ProNormz - An Integrated Approach for Human Proteins and Protein Kinases Normalization. Journal of Biomedical Informatics. 2014;47:131-8
Acknowledgement
This work has been carried out as a part of the Department of Information Technology (DIT) project entitled "Text Mining and Data Warehousing of Protein Kinases Relationships and Pathways" at Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, India.