Binding Activity Prediction for Low-data G-protein Coupled Receptor Targets by Deep Learning of Knowledge-based Molecular Representations

    • Abstract: OBJECTIVE To construct deep learning (DL) models with MolMapNet that predict the binding activity of compounds against each of 23 low-data G-protein coupled receptors (GPCRs; fewer than 250 known activity data points per target), in support of novel GPCR drug discovery. METHODS Activity datasets for the low-data GPCRs were collected from multiple databases and preprocessed, and DL models were built with MolMapNet; the resulting models were compared with published DL and machine learning (ML) models, and were further evaluated on patented compounds targeting the neuropeptide S receptor. RESULTS Single-target regression models were built for the 23 low-data GPCRs. Under 10-fold cross-validation, the models achieved on the test sets a root mean square error (RMSE) of 0.3736-1.1998 (<1 for 20 targets), a mean absolute error (MAE) of 0.2994-1.0083 (<1 for 21 targets), and an R² of 0.1369-0.8107 (>0.5 for 15 targets, >0.6 for 9 targets). Performance was comparable to that of published DL models trained on higher-data GPCRs (>250 known activity data points) and of published ML models for low-data targets. The models also performed well when predicting the activity of the patented compounds. CONCLUSION The 23 regression models can predict the biological activity of a compound against a specific target and have the potential to screen for structurally novel drugs; MolMapNet is suitable for activity prediction on low-data GPCR targets.
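The three metrics reported under 10-fold cross-validation can be illustrated with a minimal Python sketch. It assumes that R² denotes the coefficient of determination (the abstract does not state which definition was used), and the function names, the fold-assignment scheme, and the toy activity values are hypothetical, not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted activities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted activities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once.

    Samples are assigned to folds round-robin, a simplification; a real
    pipeline would normally shuffle the data first.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Toy values for illustration only (not data from the study).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

In a cross-validation run, a model would be fitted on each `train` index set and the metrics computed on the matching held-out set; how the per-fold scores were aggregated in the study is not stated in the abstract.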

       
