At the AI Impact Summit 2026, the Bengaluru-based startup Sarvam AI released two Large Language Models (LLMs).
- The two models have 35 billion and 105 billion parameters, respectively, and are less power- and compute-intensive than comparable models.
About Large Language Models (LLMs)
- Large language models (LLMs) are advanced AI systems designed to understand and generate human-like text.
- They learn from vast amounts of written data to predict what comes next in a sentence or to create coherent responses to questions.
- Architecture and Training: LLMs use deep learning with transformer architectures, the design behind Generative Pre-trained Transformer (GPT) models, which are built to process sequential text data.
- They stack many neural network layers and use an attention mechanism that lets each token weigh the relevance of the other tokens in its context.
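The attention mechanism can be sketched in a few lines. This is a minimal illustration assuming the simplest form, scaled dot-product attention, with no learned projection matrices, multiple heads or masking; the function names are hypothetical and not taken from any specific model.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over one short sequence.

    Each argument is a list of vectors, one per token. For every query,
    the result is a weighted average of the value vectors, where the
    weights are softmax(q . k / sqrt(d)) over all keys.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs


# Toy 3-token sequence with 2-dimensional vectors. In self-attention,
# queries, keys and values all come from the same tokens; here the raw
# vectors are reused instead of learned projections (a simplification).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens):
    print([round(x, 3) for x in row])
```

Each output row is a context-aware mix of all three token vectors; in a real transformer this step is repeated across many layers and heads with trained weights.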
Training of Large Language Models (LLMs)
- Training Process: LLMs are trained on massive clusters of Graphics Processing Units (GPUs), which provide the computational power required to process vast amounts of data.
- The model learns to predict the next token (a word or part of a word) based on the context provided by the preceding tokens.
- Tokenization and Embeddings: Text is broken into tokens (words or sub-word pieces), each token is mapped to an integer ID, and each ID is converted into a numerical vector (embedding) that the model refines using the surrounding context.
- Massive Text Corpora: LLMs are trained on extensive text data, allowing them to learn grammar, semantics, and conceptual relationships.
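- Learning Techniques: They are pre-trained with self-supervised learning, in which the text itself supplies the labels (each next token is the target), and this allows them to generalise, including to zero-shot tasks.
- Zero-shot learning refers to a model’s ability to perform tasks it was not explicitly trained for, without being shown task-specific examples.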
- Enhancing Accuracy: Performance is improved through prompt engineering, fine-tuning, and reinforcement learning with human feedback (RLHF) to address biases and inaccuracies.
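The tokenise-then-predict-the-next-token idea described above can be illustrated with a toy model. This is a minimal sketch under heavy simplifying assumptions: a whitespace tokenizer instead of a learned sub-word vocabulary, integer IDs without embeddings, and bigram counts instead of a neural network trained by gradient descent. All names are hypothetical. It is still self-supervised in the same sense as LLM pre-training, since the "labels" are simply the next tokens in the text.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Toy tokenizer: lowercase and split on whitespace.
    # Real LLM tokenizers use learned sub-word vocabularies (e.g. BPE).
    return text.lower().split()


def build_vocab(tokens):
    # Map each distinct token to an integer ID, in order of first appearance.
    vocab = {}
    for t in tokens:
        if t not in vocab:
            vocab[t] = len(vocab)
    return vocab


def train_bigram(tokens):
    # Count how often each token follows each other token.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    # Return the most frequent follower of `word`, or None if it was never seen
    # with a follower (dict.get does not trigger the defaultdict factory).
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the model reads the text and the model predicts the next word"
tokens = tokenize(corpus)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
model = train_bigram(tokens)
print(ids)
print(predict_next(model, "the"))  # model
```

A real LLM replaces the count table with a transformer that outputs a probability for every token in its vocabulary, which is what lets it use long contexts rather than only the previous word.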
Challenges in Training LLMs
- Limited Capital: With risk capital scarce, Indian firms building LLMs for Indian users find it hard to justify the training costs, especially when there is no immediate business use case.
- For example, training a 70-billion-parameter LLM can cost around $6 million, a prohibitive amount for early-stage Indian startups without assured near-term returns.
- High Capital Intensity: Training and operating LLMs require expensive GPU clusters and large amounts of electricity, with costs running into millions of dollars.
- For example, training GPT-3 is estimated to have cost $4–5 million in compute, while GPT-4 reportedly required tens of millions of dollars and thousands of GPUs running for months.
- Scarcity of Indian Language Data: Internet content is dominated by English, other European languages, Korean, and Japanese, leaving Indian languages underrepresented.
- For example, English makes up over 50% of web content, while most Indian languages each account for less than 1%, so they are barely represented in datasets such as Common Crawl.
- Performance Gap in Indian Languages: Due to limited native datasets, LLMs often perform poorly in Indian languages compared to English.
- Higher Token Consumption: Many models translate Indian-language inputs into English for processing and then translate the outputs back, which increases token usage and inference costs.
- For example, a 10-word English sentence may use around 12–15 tokens, whereas the same sentence in Hindi (Devanagari script) can consume 20–25 tokens, largely because tokenisers built mainly from English text split Devanagari into more, smaller pieces.
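Two pieces of arithmetic help explain the token gap. First, byte-level tokenisers (such as the byte-level BPE used by GPT-2 and later models) start from UTF-8 bytes: an ASCII letter takes 1 byte, while a Devanagari character takes 3, so if a tokeniser has learned few merges for Hindi, its text falls back to many more pieces. Second, when inference is billed per token, cost grows in direct proportion to token count. The sketch below assumes a hypothetical price per million tokens and uses the upper ends of the token ranges quoted above; it is an illustration, not a real tokeniser or price list.

```python
def utf8_bytes(text):
    # Number of bytes the text occupies in UTF-8 encoding.
    return len(text.encode("utf-8"))


def inference_cost(tokens, price_per_million_tokens):
    # Cost of processing `tokens` at a per-million-token price.
    return tokens * price_per_million_tokens / 1_000_000


english = "hello"
hindi = "नमस्ते"  # 6 Unicode code points in the Devanagari block
print(utf8_bytes(english), utf8_bytes(hindi))  # 5 18

# Illustrative assumption: not an actual provider's rate.
PRICE = 2.0  # USD per million tokens (hypothetical)
en_cost = inference_cost(15, PRICE)  # ~15 tokens for the English sentence
hi_cost = inference_cost(25, PRICE)  # ~25 tokens for the Hindi sentence
print(round(hi_cost / en_cost, 2))
```

Whatever the actual price, the ratio depends only on the token counts, so a sentence needing about 1.7 times as many tokens costs about 1.7 times as much to process, before any extra translation steps are added.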
Government Support and Institutional Push
- IndiaAI Mission Subsidy: The IndiaAI Mission has commissioned over 36,000 GPUs in Indian data centres (e.g., Yotta) to provide affordable compute access to researchers and startups.
- Direct Support to Sarvam: The government allocated 4,096 GPUs from its common compute cluster to Sarvam, with subsidies estimated at nearly ₹100 crore.
- Ministry of Electronics and Information Technology (MeitY): MeitY promotes domestic LLMs to build skilled talent in model training and to strengthen an Indian AI ecosystem grounded in Indian languages and socio-cultural contexts.
About Mixture of Experts (MoE)
- Mixture of Experts (MoE) is a way of designing AI models so that only the parts of the model needed for a given input (the "experts") are used, instead of running the whole model every time.
- For example:
- Imagine a school with many teachers (experts).
- If a student asks a maths question, only the maths teacher answers, not the history or science teachers.
- Similarly:
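- In a conventional (dense) AI model, every part works on every question, which consumes a lot of power and money.
- In an MoE model, a router activates only a few specialised experts for each input, so each query needs less computation and is cheaper to run, although all the experts must still be held in memory.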
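The school analogy above corresponds to top-k routing. Below is a minimal sketch assuming a plain dot-product router and toy "experts" that just scale their input; in real MoE layers the router is a trained layer, the experts are feed-forward networks, and routing happens per token inside each layer. All names are hypothetical, and this is not a description of any specific model.

```python
import math


def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=2):
    """Route input vector x to the top-k experts and mix their outputs.

    gate_weights: one weight vector per expert; the router's score for
    expert i is the dot product of x with gate_weights[i].
    experts: list of callables, each mapping a vector to a vector.
    Returns (output vector, indices of the experts that were used).
    """
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the router scores over the chosen experts only.
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * yj for o, yj in zip(out, y)]
    return out, top


# Four toy experts, each scaling its input by a different factor.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
output, used = moe_forward([1.0, 0.2], gate, experts, k=2)
print(used)  # [0, 3]
```

Only two of the four experts run for this input, which is where the compute saving comes from; the other two still exist and occupy memory.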
Way Forward
- Expand Indian Language Datasets: Create large, high-quality, annotated corpora in Hindi, Tamil, Bengali, Marathi and other Indian languages through public–private partnerships and initiatives like Bhashini.
- Focused Sectoral Models: Develop smaller, domain-specific LLMs for governance, education, healthcare, agriculture, and law instead of only competing with frontier global models.
- Industry–Academia Collaboration: Strengthen partnerships between IITs, IIITs, startups, and MeitY to build skilled AI talent and research depth.
- Energy-Efficient Architectures: Adopt approaches like Mixture of Experts (MoE) and model compression to reduce training and inference costs.
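Quantization is one common form of model compression. The sketch below assumes simple symmetric per-tensor int8 quantization (one scale factor for all weights), which is only one of several schemes, and uses hypothetical function names. The memory figures apply the 105-billion-parameter size mentioned at the start and count the weights only, ignoring activations and other runtime memory.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    Maps floats to integers in [-127, 127] using a single scale factor,
    so each weight needs 1 byte instead of 2 (fp16) or 4 (fp32).
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    # Recover approximate float weights from the stored integers.
    return [qi * scale for qi in q]


def memory_gb(num_params, bytes_per_param):
    # Weight storage in gigabytes (1 GB = 1e9 bytes).
    return num_params * bytes_per_param / 1e9


weights = [0.5, -1.27, 0.01, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)
print(memory_gb(105e9, 2), memory_gb(105e9, 1))  # fp16 vs int8 weights
```

Halving the bytes per weight roughly halves the memory and bandwidth needed to serve the model, at the cost of a small rounding error in each weight (at most half the scale factor here).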
Indigenous LLM Efforts
- BharatGen (IIT Bombay-incubated): Trained a multilingual 17-billion-parameter model aimed at sectors like education and healthcare.
- Gnani.ai: Launched a smaller text-to-speech model, focusing on speech-based AI applications.