ProteInfer

From OmicsWiki
Revision as of 06:55, Monday, 4 December 2023, by Yugan

Abstract

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to directly predict a variety of protein functions – Enzyme Commission (EC) numbers and Gene Ontology (GO) terms – directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user’s personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://google-research.github.io/proteinfer/.


Paper

The static version of the paper is available at https://elifesciences.org/articles/80942, and the interactive version at https://google-research.github.io/proteinfer/.

Deployment (untested)

Requires CUDA and a Python environment.
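
Before installing, it may help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of the ProteInfer instructions: `check_tool` is a hypothetical helper name, and treating the presence of `nvidia-smi` as a sign of a usable CUDA setup is an assumption (it only shows that an NVIDIA driver utility is installed).

```shell
# check_tool is a hypothetical helper: report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools used by the steps on this page. nvidia-smi is only a rough proxy
# for CUDA availability (assumption, see the note above).
for tool in git python3 pip3 nvidia-smi; do
    check_tool "$tool" || true
done
```

Any line reported as "missing" points to a tool that should be installed before continuing.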

Get the code from GitHub and install the Python dependencies (e.g. numpy)

# Clone into ~/proteinfer explicitly so the cd below works from any directory.
git clone https://github.com/google-research/proteinfer ~/proteinfer
cd ~/proteinfer
pip3 install -r requirements.txt

Run the code on the test sequences

cd ~/proteinfer
python3 install_models.py
python3 proteinfer.py -i testdata/test_hemoglobin.fasta -o ~/hemoglobin_predictions.tsv

# View your predictions.
cat ~/hemoglobin_predictions.tsv
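
For larger inputs, `cat` can be unwieldy. As a hedged sketch, the snippet below prints only the first line and a row count. It assumes the first line of the TSV is a header row, which this page does not confirm, and it makes no assumption about the column names. `summarize_tsv` is a hypothetical helper name.

```shell
# Output path taken from the proteinfer.py command above.
predictions="$HOME/hemoglobin_predictions.tsv"

# summarize_tsv is a hypothetical helper: print the first line (assumed to
# be a header) followed by the number of remaining rows.
summarize_tsv() {
    head -n 1 "$1"
    awk 'END { print "rows: " (NR - 1) }' "$1"
}

if [ -s "$predictions" ]; then
    summarize_tsv "$predictions"
else
    echo "no predictions found at $predictions"
fi
```

If the file turns out to have no header row, the reported count will be one lower than the actual number of predictions.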