Class
Article
College
College of Agriculture and Applied Sciences
Department
Plants, Soils, and Climate Department
Faculty Mentor
Rakesh Kaundal
Presentation Type
Poster Presentation
Abstract
Nitrification is an important two-step microbial transformation in the global nitrogen cycle, as it is the only natural process that produces nitrate within a system. The functional annotation of nitrification-related enzymes has a broad range of applications in metagenomics, agriculture, and industrial biotechnology, among other fields. Determining enzyme function experimentally is prohibitively costly in time and resources, so accurate genome-scale computational prediction of nitrification-related enzymes has become increasingly important. In this study, we developed an alignment-free computational approach that identifies nitrification-related enzymes from the protein sequence alone. We propose deepNEC, a novel end-to-end approach to feature selection and classification model training for nitrification-related enzyme prediction. The algorithm was developed using deep learning, a class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from raw input data. Encoded raw protein sequences serve as the input, from which the model extracts sequential and convolutional features for classification, rather than relying on traditional alignment-based methods. Two large datasets of protein sequences (enzymes and non-enzymes) were used to train the models with protein sequence features such as amino acid composition, dipeptide composition (DPC), composition, transition, and distribution (CTD), NMBroto, conjoint, and quasi order. K-fold cross-validation and independent testing were performed to validate model training. The models were further implemented as a publicly available web server (deepNEC).
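To illustrate what alignment-free sequence features look like, the following is a minimal Python sketch of two of the descriptors named above, amino acid composition and dipeptide composition. It is not the deepNEC implementation: the function names, the 20-letter alphabet constant, and the choice to skip non-standard residues are assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return {aa: (residues.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values).

    Pairs containing a non-standard residue are skipped (hypothetical choice).
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {pair: (c / total if total else 0.0) for pair, c in counts.items()}


if __name__ == "__main__":
    dpc = dipeptide_composition("ACAC")
    print(len(dpc), dpc["AC"], dpc["CA"])
```

Each sequence, whatever its length, maps to a fixed-size vector (20 values for amino acid composition, 400 for dipeptide composition), which is what allows a classifier to be trained without aligning sequences to one another.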
deepNEC uses a two-tier approach for prediction: in the first phase, it predicts whether a query sequence is an enzyme or a non-enzyme; in the second phase, it classifies predicted enzymes into nitrification-related enzyme classes. In phase I, the DPC+NMBroto hybrid feature gave the best prediction performance, with an accuracy of 96.15% in k-fold training and 93.43% in independent testing, and a Matthews correlation coefficient (MCC) of 0.92 in training and 0.87 in independent testing. In phase II, the DPC feature gave the best prediction performance across 13 nitrification-related enzyme classes. We have also implemented a homology-based method to reduce false negatives. The tool can be accessed freely at http://bioinfo.usu.edu/deepNEC/.
Presentation Time: Thursday, 12-1 p.m.
Zoom link: https://usu-edu.zoom.us/j/83738417563?pwd=SHlRcGdaaTdmVzVUOENqTnVHQ3UzZz09
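The two-tier flow and the MCC metric reported above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the deepNEC code: `is_enzyme` and `classify_enzyme` are hypothetical stand-ins for the trained phase I and phase II models, and the class label in the demo is a placeholder. The MCC function uses the standard binary confusion-matrix formula.

```python
import math


def two_tier_predict(features, is_enzyme, classify_enzyme):
    """Phase I: enzyme vs. non-enzyme. Phase II: enzyme class.

    is_enzyme and classify_enzyme are hypothetical callables standing in
    for trained models; they are not part of any published API.
    """
    if not is_enzyme(features):
        return "non-enzyme"
    return classify_enzyme(features)


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy stand-ins: threshold a single score, then return a placeholder class.
    label = two_tier_predict(
        {"score": 0.9},
        is_enzyme=lambda f: f["score"] > 0.5,
        classify_enzyme=lambda f: "class-1",  # placeholder label
    )
    print(label)
    print(matthews_corrcoef_binary(tp=50, tn=50, fp=0, fn=0))
```

MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, which makes it more informative than accuracy alone when the enzyme and non-enzyme sets differ in size.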
Location
Logan, UT
Start Date
4-11-2021 12:00 AM
Included in
Climate Commons, Plant Sciences Commons, Soil Science Commons
deepNEC: A Novel Alignment-Free Tool for the Characterization of Nitrification-Related Enzymes Using Deep Learning, A Step Towards Comprehensive Understanding of the Nitrogen Cycle