A machine learning model to categorize proteins based on sequence - Use case for prediction of ubiquitination enzymes E2 and E3

Patil, Rajat and Raghavendra, N K (2019) A machine learning model to categorize proteins based on sequence - Use case for prediction of ubiquitination enzymes E2 and E3. Masters thesis, Indian Institute of Technology Hyderabad.

Restricted to Repository staff only until 10 July 2024.



Advancements in genomics and proteomics have led to high-throughput methods that generate data on a large scale. Categorizing these data can help us better understand their nature, make generalizations, and track changes during perturbations. Existing methods for protein characterization have input limitations and run complex algorithms to find known patterns in sequences, and therefore require substantial computing power and time. The accumulation of datasets from methods such as genome sequencing and expression profiling of different cancers has made it necessary to decipher patterns and make predictions. The work described here uses a deep neural network to create a model that learns to categorize a given set of proteins and to predict whether a new protein belongs to either category. Manually annotated data were gathered for the desired categorization: E2 and E3 enzymes of the ubiquitination pathway. The processed data were then used to train the neural network, which achieved a validation accuracy of 97.68% with a validation loss of 0.0771. The trained model was used to predict 25 new E2s and >3000 E3s from a sample of 5000 proteins interacting with E3s. The predicted E2s contain the UBCc domain, indicating that the model also learnt to identify the UBCc domain from the sequences of 193 E2s. By adjusting the model parameters, the model can be used to categorize any number of proteins or to find new domains. An advantage of a learning model is that its performance is expected to improve as more data are added.
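
As an illustration of the "processed data" step, the sketch below shows one common way to turn a protein sequence into a fixed-size numeric input for a neural network: one-hot encoding over the 20 standard amino acids, with truncation or zero-padding to a fixed length. This is an assumption made for illustration only; the abstract does not state which encoding, sequence length, or label scheme the thesis used, and the names `one_hot_encode`, `max_len`, and `LABELS` are hypothetical.

```python
# Illustrative sketch only: the encoding used in the thesis is not stated in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Hypothetical binary label scheme for the two categories named in the abstract.
LABELS = {"E2": 0, "E3": 1}


def one_hot_encode(sequence, max_len):
    """Encode a protein sequence as max_len rows of 20 zeros/ones.

    Sequences longer than max_len are truncated; shorter ones are
    zero-padded. Residues outside the 20 standard codes (e.g. 'X')
    are left as all-zero rows.
    """
    rows = []
    for residue in sequence.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad with all-zero rows so every encoded sequence has the same shape.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows


# Usage: a made-up five-residue fragment padded to length 8.
encoded = one_hot_encode("MALKR", max_len=8)
```

Each encoded sequence can then be paired with its category label and passed to a classifier; a fixed `max_len` keeps all inputs the same shape, at the cost of discarding residues beyond that length.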

IITH Creators: Raghavendra, N K (ORCiD: http://orcid.org/0000-0003-2220-1148)
Item Type: Thesis (Masters)
Uncontrolled Keywords: Ubiquitination, Machine learning, Neural network
Subjects: Others > Biotechnology
Divisions: Department of Biotechnology
Depositing User: Team Library
Date Deposited: 11 Jul 2019 04:51
Last Modified: 11 Jul 2019 04:51
URI: http://raiith.iith.ac.in/id/eprint/5691
