LSTM-driven drug design using SELFIES for target-focused de novo generation of HIV-1 protease inhibitor candidates for AIDS treatment

Creative Commons License

Albrijawi M. T., ALHAJJ R.

PLoS ONE, vol.19, no.6 June, 2024 (SCI-Expanded)

  • Publication Type: Article / Article
  • Volume: 19 Issue: 6 June
  • Publication Date: 2024
  • Doi Number: 10.1371/journal.pone.0303597
  • Journal Name: PLoS ONE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Agricultural & Environmental Science Database, Animal Behavior Abstracts, Aquatic Science & Fisheries Abstracts (ASFA), BIOSIS, Biotechnology Research Abstracts, Chemical Abstracts Core, Food Science & Technology Abstracts, Index Islamicus, Linguistic Bibliography, MEDLINE, Pollution Abstracts, Psycinfo, zbMATH, Directory of Open Access Journals
  • Istanbul Medipol University Affiliated: Yes


The battle against viral drug resistance highlights the need for innovative approaches to replace time-consuming and costly traditional methods. Deep generative models offer automation potential, especially in the fight against human immunodeficiency virus (HIV), as they can synthesize diverse molecules effectively. In this paper, an LSTM-based deep generative model named "LSTM-ProGen" is proposed, tailored explicitly for the de novo design of drug candidate molecules that interact with a specific target protein (HIV-1 protease). LSTM-ProGen employs a long short-term memory (LSTM) architecture to generate novel molecules with target specificity against the HIV-1 protease. Training involves fine-tuning LSTM-ProGen on a diverse range of compounds sourced from the ChEMBL database. The model was optimized to meet specific requirements, with multiple iterations to enhance its predictive capabilities and ensure it generates molecules that exhibit favorable target interactions. The training process is assessed with an array of performance evaluation metrics, such as drug-likeness properties. Our evaluation includes extensive in silico analysis using molecular docking and PCA-based visualization to explore the chemical space that the new molecules cover compared to those in the training set. These evaluations reveal that a subset of 12 de novo molecules generated by LSTM-ProGen exhibits a striking ability to interact with the target protein, rivaling or even surpassing the efficacy of native ligands. Extended versions of LSTM-ProGen, with further refinement, hold promise as versatile tools for designing efficacious and customized drug candidates tailored to specific targets, thus accelerating drug development and facilitating the discovery of new therapies for various diseases.
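
The title names SELFIES as the molecular representation, but this record does not describe how the strings are prepared for the LSTM. As a minimal sketch of one common approach, and not the authors' actual pipeline, the Python below splits SELFIES strings into their bracketed symbols, builds a vocabulary, and pads each molecule to a fixed-length integer sequence. The special tokens `[nop]`, `[start]`, and `[end]`, the helper names, and the two example strings are assumptions chosen for illustration.

```python
import re

# Hypothetical special tokens; the record does not say which ones the paper uses.
PAD, START, END = "[nop]", "[start]", "[end]"
SPECIALS = [PAD, START, END]


def split_selfies(selfies_string):
    """Split a SELFIES string into its bracketed symbols, e.g. '[C][O]' -> ['[C]', '[O]']."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def build_vocab(strings):
    """Map every symbol seen in the corpus to an integer id; special tokens come first."""
    symbols = {sym for s in strings for sym in split_selfies(s)}
    ordered = SPECIALS + sorted(symbols - set(SPECIALS))
    return {token: index for index, token in enumerate(ordered)}


def encode(selfies_string, vocab, max_len):
    """Wrap the symbols in start/end tokens and right-pad with PAD up to max_len."""
    tokens = [START] + split_selfies(selfies_string) + [END]
    if len(tokens) > max_len:
        raise ValueError("sequence is longer than max_len")
    ids = [vocab[token] for token in tokens]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Illustrative corpus only: ethanol and a benzene ring written as SELFIES.
corpus = ["[C][C][O]", "[C][=C][C][=C][C][=C][Ring1][=Branch1]"]
vocab = build_vocab(corpus)
max_len = max(len(split_selfies(s)) for s in corpus) + 2  # +2 for start/end
encoded = [encode(s, vocab, max_len) for s in corpus]

for original, ids in zip(corpus, encoded):
    print(original, "->", ids)
```

Integer sequences of this kind would typically feed an embedding layer ahead of the LSTM, with the network trained to predict the next symbol. Because every sequence of SELFIES symbols decodes to a valid molecule, sampling symbol by symbol from such a model does not produce syntactically invalid structures.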