Washington [US]: A team of researchers from the Agency for Science, Technology and Research (A*STAR) and the National University of Singapore (NUS) has created software that accurately predicts chemical modifications of RNA molecules from genomic data. The method, called m6Anet, was published in Nature Methods.
Chemical modifications added to an RNA molecule help determine how that molecule functions. However, these modifications are often invisible to the standard approaches scientists use to read RNA. More than 160 RNA modifications have been discovered so far, and the most frequent of them, N6-methyladenosine (m6A), is associated with human diseases such as cancer.
In the past, the identification of RNA modifications required lengthy and laborious bench experiments that were not accessible to most laboratories. Furthermore, previous methods were unable to detect m6A at single-molecule resolution, which is critical for understanding the biological mechanisms involving m6A.
The team overcame these limitations by taking advantage of nanopore direct RNA sequencing, an emerging technology that sequences raw RNA molecules together with their RNA modifications. In this study, they developed m6Anet, software that trains deep neural networks on the rich data from nanopore direct RNA sequencing, using a multiple instance learning (MIL) approach, to accurately detect the presence of m6A.
“In traditional machine learning, we often have a label for each example we want to classify. For example, each image is either a cat or not a cat, and the algorithm learns to differentiate cat images from other images based on those labels. The problem with m6A detection is that we have an overwhelming amount of data with unclear labels. Imagine having a large photo album with a photo of a cat hidden among millions of other photos, and trying to identify that particular photo without having any labels to base your search on. Fortunately, this has been studied before in the machine learning literature and is known as the MIL problem,” explained Christopher Hendra, a PhD student at the Genome Institute of Singapore (GIS) of A*STAR and the NUS Institute of Data Science, and the first author of the study.
In this study, the team demonstrated that m6Anet can predict the presence of m6A with high precision at single-molecule resolution from a single sample, and across species.
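The MIL setting described above can be illustrated with a small toy example. The sketch below is not m6Anet's implementation; it is a minimal illustration under stated assumptions: each "bag" is a group of unlabeled instances (one can think of a bag as a candidate site and of each instance as a single sequenced molecule), only the bag carries a label, each instance has a single made-up numeric feature, the instance scorer is a plain logistic function, and the bag score is pooled with a noisy-OR rule. All names and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def bag_probability(instance_probs):
    # Noisy-OR pooling: a bag is positive if at least one instance is positive.
    none_positive = 1.0
    for p in instance_probs:
        none_positive *= 1.0 - p
    return 1.0 - none_positive


def make_bag(rng, positive, size=20, n_signal=3):
    # Hypothetical 1-D feature: background ~ N(0, 1), hidden signal ~ N(4, 1).
    bag = [rng.gauss(0.0, 1.0) for _ in range(size)]
    if positive:
        for i in range(n_signal):
            bag[i] = rng.gauss(4.0, 1.0)
    rng.shuffle(bag)  # the learner is never told which instances carry signal
    return bag


def train(bags, labels, epochs=100, lr=0.1):
    # Fit a logistic instance scorer using only bag-level labels.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for xs, y in zip(bags, labels):
            ps = [sigmoid(w * x + b) for x in xs]
            bag_p = max(bag_probability(ps), 1e-9)
            # Gradient of the bag-level cross-entropy w.r.t. each instance logit.
            grads = [p * (bag_p - y) / bag_p for p in ps]
            w -= lr * sum(g * x for g, x in zip(grads, xs))
            b -= lr * sum(grads)
    return w, b


rng = random.Random(0)
labels = [i % 2 for i in range(40)]  # alternate negative and positive bags
bags = [make_bag(rng, positive=bool(y)) for y in labels]
w, b = train(bags, labels)


def instance_score(x):
    # Per-instance probability, learned without any per-instance labels.
    return sigmoid(w * x + b)


print("score for a signal-like instance:", instance_score(4.0))
print("score for a background-like instance:", instance_score(0.0))
```

Although training only ever sees one label per bag, the fitted scorer can be applied to each instance on its own, which is the sense in which an MIL model can give per-instance (here, per-molecule) predictions.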
“Our AI model has only seen data from one human sample, but is able to accurately identify RNA modifications even in samples from species the model hasn’t seen before,” said Dr. Jonathan Goke, group leader of the A*STAR GIS Computational Transcriptomics Laboratory and lead author of the study. “The ability to identify RNA modifications in different biological samples can be used to understand their role in many different applications, such as in cancer research or plant genomics.”
“It is very satisfying to see how well-studied and theoretically grounded machine learning techniques such as MIL can be harnessed to offer an elegant solution to this challenging problem. Witnessing such rapid adoption of the software by the scientific community is a reward for our efforts!” said Associate Professor Alexandre Thiery, Department of Statistics and Data Science, NUS Faculty of Science, who co-led the study.
Professor Patrick Tan, Executive Director of A*STAR GIS, said: “The accurate and efficient identification of RNA modifications has been a long-standing challenge, and m6Anet helps to address these limitations. To benefit the scientific community in general, this AI method, along with the results of the study, has been made public so that other scientists can speed up their research.”