Federated Learning for Collaboratively Screening Drugs

Nautiyal, Rishabh

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/9561

Full metadata record

DC Field	Value	Language
dc.contributor.author	Nautiyal, Rishabh	en_US
dc.date.accessioned	2023-03-15T10:12:07Z	-
dc.date.available	2023-03-15T10:12:07Z	-
dc.date.issued	2022	en_US
dc.identifier.other	2022csnr640	en_US
dc.identifier.uri	http://dspace.cityu.edu.hk/handle/2031/9561	-
dc.description.abstract	Drugs have played a vital part in human development. Many fields depend heavily on them, including but not limited to the medical, cosmetic and wellness industries. However, although drugs provide important assistance in day-to-day life, developing them is expensive. Machine learning has therefore recently been used to speed up the process of drug discovery. Yet this field has its own share of problems, the biggest of which is the need for huge amounts of data. Machine learning models generally require large amounts of diverse data to train accurately, and this is especially daunting for the drug discovery industry. The problem stems from the drug development pipeline and the research associated with it. Pharmaceutical research, including the collection of data that can help develop new drugs, is expensive, and pharmaceutical institutions invest a huge amount of time and money in it. This leads to an environment of competition and confidentiality in the industry. As a result, pharmaceutical companies that have the research and development capacity to make drugs maintain policies that keep their research, data and progress confidential. This hampers collaborative efforts among pharmaceutical companies and in turn slows down drug discovery. Further, strategic collaboration is difficult because pharmaceutical data is often patented and held as intellectual property; given the huge financial benefits of launching new drugs, even otherwise willing collaborators are reluctant to share data publicly or through partnerships. Thus, in my project, I aim to build on the body of literature that can help create collaborative practices while keeping data confidential, in order to hasten drug discovery. I do this by creating a system for convenient collaboration through the use of Federated Learning.
The use of Machine Learning in Drug Discovery is a well-researched topic with proven success. Ligand-Based Drug Discovery (LBDD) has been particularly successful in the virtual screening of active compounds against a targeted protein or gene. However, because the data required for model building is collected individually by drug companies and is highly confidential, further research on the topic has been held back. Federated Learning can be used here to train models on distributed data through secure communication, while preserving the confidentiality of that data. In this project I investigate and improve existing models for the use of Federated Learning in LBDD. The project explores three Federated Learning methods on the given dataset, namely Federated Stochastic Gradient Descent (FedSGD), Federated Averaging (FedAvg) and Federated Prox (FedProx). All three are well-known federated learning methods that address different aspects of the problem. The models are developed in a virtual environment that simulates multiple clients on the same machine. I then built FedAvg (the best-performing model) on an open-source framework called Flower, which supports developing federated learning models that communicate over a network and collaborate with physically separate clients. I also developed a web application using ReactJS and Material UI (MUI) to facilitate collaboration among the clients. The application informs collaborators about the nature of the model and how privacy is preserved. It connects to a Flask backend that hosts the pre-trained models, allowing users to access their results and see how the models perform. Moreover, the application allows users to quickly download the required code and set up their local environments to start collaborating with their own datasets.
The project demonstrates that the Federated Learning models can perform as well as the base models while adding benefits for the pharmaceutical industry: preserving data privacy and confidentiality, and eliminating the need for data sharing among collaborators. Finally, the project acknowledges certain limitations and suggests further research areas to address them.	en_US
dc.rights	This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner.	en_US
dc.rights	Access is restricted to CityU users.	en_US
dc.title	Federated Learning for Collaboratively Screening Drugs	en_US
dc.contributor.department	Department of Computer Science	en_US
dc.description.supervisor	Supervisor: Dr. Wei, Ying; First Reader: Dr. Wong, Ka Chun; Second Reader: Prof. Liang, Weifa	en_US
Appears in Collections:	Computer Science - Undergraduate Final Year Projects
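
As a rough illustration of the Federated Averaging (FedAvg) method named in the abstract, the sketch below shows only the server-side aggregation step: each client sends its locally trained weight vector together with its local sample count, and the server averages the vectors weighted by those counts, so no raw data leaves a client. This is not the project's code. The function name `fedavg`, the flat-list weight representation, and the three example clients are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fedavg(client_updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighted by each client's sample count.

    Each element of client_updates is (weights, num_local_samples).
    Hypothetical helper for illustration only.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        # A client's contribution is proportional to its share of the data.
        for i, w in enumerate(weights):
            averaged[i] += w * n / total_samples
    return averaged


# Three hypothetical clients holding 10, 30 and 60 local samples.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30), ([5.0, 6.0], 60)]
global_weights = fedavg(updates)
```

In a full training loop the server would send `global_weights` back to the clients for the next round of local training. Only model parameters and sample counts are exchanged here, never the underlying compound data.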

Files in This Item:

File	Size	Format
fulltext.html	147 B	HTML
