
NAGGNER is a hybrid biomedical named entity tagger for tagging human genes/proteins in biomedical text. The software is a free web tool that processes biomedical text in plain text, MEDLINE and XML formats. It was developed using Java, JSP and Servlets and tested on common web browsers such as Mozilla Firefox, Internet Explorer and Google Chrome. NAGGNER uses JLex for tokenization and MALLET for implementing its CRF models.

Tagging proceeds in two phases. In the first phase, human gene/protein names are recognized with a machine learning model, Conditional Random Fields (CRF). In the second phase, an enhanced rule-based algorithm and a hybrid abbreviation identification algorithm are applied to improve performance.

On its own corpus of human proteins/protein kinases, NAGGNER achieves 94.96% precision, 96.96% recall and 95.95% F-score, outperforming available taggers such as GENIA tagger, GENETAG, NLProt and ABNER. Evaluation on two gold-standard corpora, NLPBA and BioCreAtivE, also gave promising results: 85.87% precision, 88.18% recall and 87.01% F-score on the NLPBA corpus, and 81.84% precision, 75.49% recall and 78.54% F-score on the BioCreAtivE corpus. These values are considerably higher than those of the available taggers.
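The precision, recall and F-score figures above can be cross-checked with a few lines of Java. This is a minimal sketch, not part of NAGGNER: it assumes the reported F-score is the balanced F-score (F1, the harmonic mean of precision and recall), which the page does not state explicitly, and the class name `FScoreCheck` and method `fScore` are hypothetical names chosen for illustration.

```java
import java.util.Locale;

// Illustrative helper, not part of NAGGNER.
// Assumption: the reported F-score is the balanced F1 measure.
class FScoreCheck {

    /** Harmonic mean of precision and recall; inputs and result are percentages. */
    static double fScore(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid division by zero when both values are zero
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Precision/recall pairs exactly as reported on this page.
        String[] corpus = {"Own corpus", "NLPBA", "BioCreAtivE"};
        double[][] reported = {
            {94.96, 96.96},
            {85.87, 88.18},
            {81.84, 75.49}
        };
        for (int i = 0; i < corpus.length; i++) {
            double f = fScore(reported[i][0], reported[i][1]);
            System.out.printf(Locale.ROOT, "%s: F-score = %.2f%%%n", corpus[i], f);
        }
    }
}
```

Under this assumption, the computed values agree with the reported F-scores to two decimal places for all three corpora.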

Ms. Kalpana Raja, Mr. Suresh Subramani, Dr. Jeyakumar Natarajan

Data Mining and Text Mining Laboratory,
Department of Bioinformatics,
School of Life Sciences,
Bharathiar University,
Coimbatore - 641 046,
Tamilnadu, INDIA

This work was carried out as part of the Department of Information Technology (DIT) project entitled "Text Mining and Data Warehousing of Protein Kinases Relationships and Pathways" at the Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, India.

Copyright © 2013 by Data Mining and Text Mining Laboratory.
Department of Bioinformatics, Bharathiar University, Coimbatore 641 046, India.