RNA family classification using the conditional random fields model

2012 | Chiang Mai Journal of Science

Authors:
Chinae Thammarongtham
Jeerayut Chaijaruwanich

Abstract

RNA family classification is a necessary task in characterizing sequenced genomes. An RNA family is defined by member sequences that perform the same function in different species. Such functions are strongly related to RNA secondary structure rather than to the primary sequence, so RNA sequences alone are not sufficient to classify RNA families. Here, we focus on computational RNA family classification, using primary sequences together with RNA secondary structures as the selected features and conditional random fields (CRFs) as the classification method. This model treats RNA classification as a sequence labeling problem. Our CRF models classify the RNA families of the test RNA data sets with optimal F-scores ranging from 98.77% to 99.32% across the different RNA families.
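The sequence-labeling framing can be illustrated with a minimal sketch. The abstract does not give the feature templates, the structure encoding, or the labeling scheme, so everything below is an assumption made for illustration: secondary structure is taken to be in dot-bracket notation, every position of a sequence is assigned its family name as the label, and the function names (`position_features`, `sequence_to_instance`, `f_score`) and the toy sequence are hypothetical. The per-position feature dictionaries are in the general shape that common CRF toolkits accept; no CRF is trained here.

```python
from typing import Dict, List, Tuple


def position_features(seq: str, struct: str, i: int) -> Dict[str, str]:
    """Features for position i: the base, its structure symbol, and both neighbours.

    Assumption: `struct` is a dot-bracket string aligned with `seq`.
    """
    feats = {"base": seq[i], "struct": struct[i]}
    if i > 0:
        feats["prev_base"] = seq[i - 1]
        feats["prev_struct"] = struct[i - 1]
    else:
        feats["BOS"] = "1"  # marks the first position of the sequence
    if i < len(seq) - 1:
        feats["next_base"] = seq[i + 1]
        feats["next_struct"] = struct[i + 1]
    else:
        feats["EOS"] = "1"  # marks the last position of the sequence
    return feats


def sequence_to_instance(
    seq: str, struct: str, family: str
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Turn one RNA into (features per position, label per position).

    Assumption: every position carries the family name as its label.
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have the same length")
    features = [position_features(seq, struct, i) for i in range(len(seq))]
    labels = [family] * len(seq)
    return features, labels


def f_score(true_labels: List[str], pred_labels: List[str], positive: str) -> float:
    """F-score (harmonic mean of precision and recall) for one family."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical hairpin, not taken from the paper's data sets).
X, y = sequence_to_instance("GGCAUCC", "((...))", "tRNA")
```

Here `X` holds one feature dictionary per nucleotide and `y` the matching labels; a per-family F-score like the one reported above could then be computed with `f_score` on the true and predicted family labels.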