OBJECTIVE To develop a bacterial mutagenicity prediction model for assessing the genotoxic potential of drugs and their impurities.
METHODS Bacterial mutagenicity data were collected from the literature and randomly split into training and test sets at a ratio of 4:1. Extended-connectivity fingerprints were used as compound features. Fingerprint and algorithm parameters were tuned on the training set, and the best-performing parameters were used to build the QSAR model. The model's predictive ability was then validated by comparing its predictions for the test set with the true values.
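The split-then-tune workflow described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `split_train_test` and `grid_search`, the seed, the rounding rule for the test-set size, and the `score_fn` callback are all hypothetical. Fingerprint generation and model fitting are left to the callback, which should score a parameter combination using the training data only (for example by cross-validation).

```python
import itertools
import random


def split_train_test(records, test_fraction=0.2, seed=0):
    """Randomly split records into (train, test); 0.2 gives a 4:1 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def grid_search(train, grid, score_fn):
    """Try every parameter combination in grid and return (best_params, best_score).

    score_fn(train, params) is a hypothetical callback: it should build the
    fingerprints and the model with `params` and return a validation score
    computed from the training data only, so the test set stays untouched.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(train, params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A grid such as `{"radius": [1, 2, 3], "gamma": [0.01, 0.001], "C": [1.0, 2.15]}` would cover both fingerprint and algorithm parameters in one search; the specific values here are placeholders, not the grid used in the study.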
RESULTS A total of 8329 bacterial mutagenicity records were collected from the literature. The extended-connectivity fingerprint parameters were optimized, and the best-performing set was used to generate features for a support vector machine model. The model performed best with gamma = 0.001 and C = 2.15. On the test set, the optimal model achieved an accuracy of 0.788, a precision of 0.783, a recall of 0.846, and an area under the receiver operating characteristic curve of 0.855.
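For readers less familiar with the four reported metrics, the sketch below shows how each can be computed from test-set labels and model outputs. It is an illustration only: the function names are hypothetical, the convention that 1 means mutagenic and 0 means non-mutagenic is an assumption, and the area under the ROC curve is computed with the pairwise-ranking definition (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Accuracy, precision, and recall depend on the decision threshold applied to the model output, whereas the area under the curve is computed from the continuous scores and summarizes ranking quality across all thresholds.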
CONCLUSION A bacterial mutagenicity prediction model was established by optimizing the fingerprint and algorithm parameters. The model holds promise for the rapid screening of potential genotoxic impurities.