Location

Weber State University

Start Date

5-8-2017 9:46 AM

End Date

5-8-2017 12:00 AM

Description

Codon bias, the pattern in which synonymous codons are used to encode a protein sequence as nucleotides, is a biological phenomenon that is not well understood. Existing methods for measuring and modeling an organism's codon bias are used for codon optimization. In synthetic biology, codon optimization is the task of selecting appropriate codons when reverse translating a protein sequence into a nucleotide sequence so as to maximize expression in a vector. The features modeled by these methods include codon adaptation index (CAI) [1], individual codon usage (ICU), hidden stop codons (HSC) [2], and codon context (CC) [3]. While explicitly modeling these features has helped us engineer proteins with high synthesis yield, it is unclear what other biological features should be taken into account during codon selection to maximize protein synthesis. In this article, we present a method for modeling global codon bias with deep language models. It is more robust than current methods because it allows more contextual information and long-range dependencies to be considered during codon selection.
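To make one of the explicitly modeled features concrete, the following is a minimal sketch of the codon adaptation index as it is commonly defined: each codon is weighted by its frequency relative to the most frequent synonymous codon in a reference set, and the CAI of a gene is the geometric mean of those weights. It illustrates the kind of hand-built feature described above, not the language-model method presented here. The names `CODON_TABLE`, `split_codons`, `relative_adaptiveness`, and `cai` are hypothetical, the codon table is a toy subset of the standard genetic code, and codons absent from the reference set are simply skipped, which is a simplification.

```python
import math
from collections import Counter

# Toy subset of the standard genetic code (assumption: illustration only).
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
    "GAA": "E", "GAG": "E",                          # glutamate
}


def split_codons(seq):
    """Split a nucleotide string into consecutive triplets, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs):
    """Weight each observed codon by its count over the count of the
    most frequent synonymous codon in the reference sequences."""
    counts = Counter(
        codon
        for seq in reference_seqs
        for codon in split_codons(seq)
        if codon in CODON_TABLE
    )
    best = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best.get(aa, 0), n)
    return {codon: n / best[CODON_TABLE[codon]] for codon, n in counts.items()}


def cai(seq, weights):
    """Geometric mean of the weights of the codons in seq.
    Codons without a weight are skipped (a simplification)."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set standing in for highly expressed genes.
weights = relative_adaptiveness(["GCTGCTGCCAAAAAGAAG"])
score_preferred = cai("GCTAAG", weights)  # uses the most frequent codons
score_rare = cai("GCCAAA", weights)       # uses the less frequent codons
```

Because each codon is scored independently of its neighbors, a measure of this kind cannot capture the contextual information or long-range dependencies that the abstract identifies as missing from current methods.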


Learning the Language of Genes: Representing Global Codon Bias with Deep Language Models
