The journey to a Telugu LLM

Our goal at Viswam has always been to create a Telugu LLM from scratch using open source data and models: collect data through crowdsourcing efforts and open source all of it. To that end, we started off with the 1 lakh internship program this summer. The program has been mildly successful: so far we have trained around 50,000 students and collected thousands of hours of Telugu data. But that is raw data, and it still needs cleaning and a lot of validation. While that process is ongoing, we thought we would start the effort on creating a base model.

When Karpathy released Nanochat, we knew it had come at the right time. So my team at Ozonetel (Biswajit and Sudhamay) and I, with the help of Viswam, decided to start building a Telugu LLM from scratch and to document the effort.

This series of blog posts is the result. I will be posting about our journey of creating the Telugu LLM here. Feel free to subscribe, follow along, join the journey, and share your ideas and effort. After all, this is a volunteer effort, and everything will be open sourced.