Document Type

Article

Journal/Book Title/Conference

The Plant Genome

Author ORCID Identifier

Naveen Duhan https://orcid.org/0000-0003-3014-4921

Rakesh Kaundal https://orcid.org/0000-0001-8683-1240

Volume

18

Issue

1

Publisher

John Wiley & Sons Ltd.

Publication Date

2-9-2025

Journal Article Version

Version of Record

First Page

1

Last Page

17

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License

Abstract

The organization of subcellular components in a cell is critical for its function, and knowing where proteins localize is important for studying cellular processes and protein–protein interactions, identifying potential drug targets, network analysis, and other systems biology applications. Determining protein localization experimentally is time-consuming and expensive because it requires meticulous experimentation, validation, and data analysis; computational methods provide a quick and accurate alternative. Arabidopsis thaliana is a valuable model organism in plant biology: it facilitates experimentation, and findings from it apply to other plants. Predicting the subcellular localization of its proteins can improve our understanding of cellular processes and has applications in crop improvement and biotechnology. We propose AtSubP-2.0, an extension of our previously developed and widely used AtSubP v1.0 tool for annotating the Arabidopsis proteome. For precise protein subcellular localization prediction, AtSubP-2.0 employs a four-phase strategy. The first phase differentiates between single and dual localization with high accuracy (97.66% in fivefold training/testing, 98.10% on independent data) and a high Matthews correlation coefficient (0.88 training, 0.90 independent). In the second phase, singly localized proteins are classified into 12 locations with high accuracy (98.37% in fivefold training/testing, 97.43% on independent data) and a high Matthews correlation coefficient (0.94 training, 0.91 independent). The third phase categorizes dual-localized proteins into nine classes with high accuracy (99.65% in fivefold training/testing, 98.16% on independent data) and a high Matthews correlation coefficient (0.92 training, 0.87 independent). A fourth phase classifies the membrane-type proteins predicted in phase I into single-pass and multi-pass membrane proteins with high accuracy (98% in fivefold training/testing, 98.55% on independent data) and a high Matthews correlation coefficient (0.95 training, 0.97 independent).
A web-based prediction server has been implemented for community use and is freely available, together with a standalone version, at https://kaabil.net/AtSubP2/. AtSubP-2.0 will help researchers better understand organelle-specific functions, cellular processes, and regulatory mechanisms important for plant growth, development, and response to environmental stimuli.
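
The abstract reports a Matthews correlation coefficient (MCC) next to each accuracy figure. As a minimal illustration of why the two are usually read together, the sketch below computes both from a binary confusion matrix. It is not the evaluation code used for AtSubP-2.0: the function names and the example counts are hypothetical, and the multi-class phases would need a multi-class generalization of MCC rather than this binary form.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when the denominator is zero (a common convention),
    which happens when an entire row or column of the confusion
    matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced example: 95 positives, 5 negatives, and a
# classifier that labels every sample as positive.
always_positive = dict(tp=95, tn=0, fp=5, fn=0)
print(accuracy(**always_positive))  # high accuracy despite no real skill
print(mcc(**always_positive))       # MCC exposes the lack of skill

# Hypothetical perfect classifier on a balanced set.
perfect = dict(tp=50, tn=50, fp=0, fn=0)
print(accuracy(**perfect), mcc(**perfect))
```

On imbalanced data a trivial classifier can score a high accuracy while its MCC stays near zero, so a high value on both metrics is stronger evidence of predictive skill than accuracy alone.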
