Computer Science Student Research

ProFeatX: A Parallelized Protein Feature Extraction Suite for Machine Learning

Document Type

Article

Author ORCID Identifier

David Guevara-Barrientos https://orcid.org/0000-0003-3117-0777

Rakesh Kaundal https://orcid.org/0000-0001-8683-1240

Journal/Book Title/Conference

Computational and Structural Biotechnology Journal

Volume

21

Publisher

Research Networks AS

Publication Date

1-10-2023

Journal Article Version

Version of Record

First Page

796

Last Page

801

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Abstract

Machine learning algorithms have been successfully applied in proteomics, genomics and transcriptomics, and have helped the biological community answer complex questions. However, most machine learning methods require large amounts of data, with every data point represented as a vector of the same size. Biological sequence data such as proteins are amino acid sequences of variable length, so a fixed number of features must be extracted from every protein before it can be used as input to a machine learning model. There are numerous methods to achieve this, but only a few tools let researchers encode their proteins using multiple schemes without having to use different programs or, in many cases, code these algorithms themselves or even devise new ones. In this work, we created ProFeatX, a tool that contains 50 encodings to extract protein features quickly and efficiently, supporting desktop as well as high-performance computing environments. It can also encode concatenated features for protein-protein interactions. The tool has an easy-to-use web interface, allowing non-experts to use feature extraction techniques, as well as a stand-alone version for advanced users. ProFeatX is implemented in C++ and available on GitHub at https://github.com/usubioinfo/profeatx. The web server is available at http://bioinfo.usu.edu/profeatx/.

Recommended Citation

Guevara-Barrientos, D., Kaundal, R. (2023) ProFeatX: A Parallelized Protein Feature Extraction Suite for Machine Learning. Computational and Structural Biotechnology Journal, 21, 796-801. https://doi.org/10.1016/j.csbj.2022.12.044

Included in

Computer Sciences Commons

DOI

https://doi.org/10.1016/j.csbj.2022.12.044
