New AI tool can decode DNA sequences

Findings on a new tool, “GROVER”, which can extract important information from DNA sequences, were recently published in the journal Nature Machine Intelligence.

About DNA

  • DNA, or deoxyribonucleic acid, is the central information storage system of animals, plants, and other living organisms, and even of some viruses.
    • DNA is organised structurally into chromosomes and then wound around nucleosomes as part of those chromosomes. 
  • Classification: The name comes from its structure: a backbone of sugar (deoxyribose) and phosphate with bases sticking out from it.
    • It is a polymer of nucleotides carrying four bases: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T).
  • Double Helix model: In 1953 James Watson and Francis Crick, based on the X-ray diffraction data produced by Maurice Wilkins and Rosalind Franklin, proposed a very simple but famous Double Helix model for the structure of DNA. 
    • A DNA molecule consists of two strands wound around each other, with the two strands held together by hydrogen bonds between the bases. Adenine pairs with thymine, and cytosine pairs with guanine.
    • Gene: The sequence of bases in a portion of a DNA molecule, called a gene, carries the instructions needed to assemble a protein.


  • Hallmarks: Base pairing between the two strands of polynucleotide chains.
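The pairing rule above (A with T, C with G) means that one strand fully determines the other. A minimal Python sketch of that idea follows; the function names are illustrative and not taken from any library. It also assumes the standard fact that the two strands run in opposite directions, which is why the partner strand is usually written reversed.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    The two strands of the double helix are antiparallel, so the
    partner strand is the complement written back to front.
    """
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying `complement` twice returns the original strand, which is the property that lets a cell copy DNA from either strand.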


About GROVER

  • GROVER is a new large language model trained on human DNA that can extract important information from DNA sequences, such as identifying gene promoters or protein binding sites.
  • Significance: The researchers believe tools like GROVER could help transform genomics and personalized medicine. 
  • To train GROVER, the team at the Biotechnology Center (BIOTEC) of Dresden University of Technology in Germany first created a ‘DNA dictionary’.
  • The DNA Dictionary: DNA resembles language. It has four letters that build sequences, and the sequences carry a meaning.
    • DNA is written in four letters (A, T, G, and C) and contains genes, but unlike a language it has no predefined words: there are no predefined sequences of different lengths that combine to build genes or other meaningful sequences.
    • Information hidden in the DNA is multilayered. Only 1-2% of the genome consists of genes, the sequences that code for proteins.
  • GROVER Role: GROVER learns the grammar of DNA.
    • In terms of the DNA code, this means learning the rules of the sequences, i.e. the order of the nucleotides and their meaning.
    • For example: Similar to how GPT models learn human languages, GROVER has in effect learned to “speak” DNA.
  • GROVER Functioning: GROVER can not only predict DNA sequence, but also derive biologically relevant information from the context, such as the start of genes or protein binding sites on the DNA.
    • GROVER also learns processes that are considered “epigenetic”.
      • Epigenetics: It is the study of how cells control gene activity without changing the DNA sequence. 


GROVER Training

  • DNA dictionary using byte pair encoding (BPE): To train GROVER, the team first created a DNA dictionary using byte pair encoding (BPE), a tokenization strategy that originated in data compression and is used by transformer models such as GPT-3. They examined the entire genome for the most common letter combinations.
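To make the idea of a "DNA dictionary" concrete, here is a toy Python sketch of byte pair encoding on a short DNA string. It is an illustration only and not the GROVER team's code: the function name `learn_bpe`, the example sequence, and the stopping rule (stop when no pair occurs at least twice) are all assumptions made for this example. The principle is the one described above: repeatedly find the most common pair of adjacent tokens and fuse it into a new "word".

```python
from collections import Counter


def learn_bpe(sequence: str, num_merges: int):
    """Learn up to `num_merges` merged tokens from one DNA string.

    Returns the list of new tokens (the "dictionary") and the
    sequence re-written with those tokens.
    """
    tokens = list(sequence)  # start from single letters A, C, G, T
    merges = []
    for _ in range(num_merges):
        # Count every pair of neighbouring tokens.
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        if count < 2:  # assumption: a pair seen once is not worth a word
            break
        merges.append(left + right)
        # Rewrite the sequence, fusing each occurrence of the pair.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens


merges, tokens = learn_bpe("ATATATGCGC", num_merges=1)
print(merges)  # ['AT']
print(tokens)  # ['AT', 'AT', 'AT', 'G', 'C', 'G', 'C']
```

Running more merges on a longer sequence yields tokens of different lengths, which is how a vocabulary of DNA "words" can be built without predefined word boundaries. A real genome-scale run would use an optimized tokenizer library instead of this loop.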

 
