1. Worked with a client’s internal Risk team to explain rule-based actions the platform took against users, such as account bans.
2. Used a cron job to populate a Hive table with the contributing features that triggered each action. Performed a progressive backward search over these (action, feature) pairs, with the help of an API call, to identify root causes.
3. Leveraged AWS S3 for scalable data storage and Apache Airflow to automate and schedule data-collection workflows, improving data coverage of financial reports scraped from the SEC and the Bloomberg Terminal by 35%.
4. Developed a semantic retrieval system using OpenAI embeddings and Elasticsearch for scalable indexing and retrieval, reducing query latency and speeding up search extraction by 90%.
5. Containerised the Retrieval-Augmented Generation (RAG) model with Docker and Amazon ECR and deployed it as a serverless function on AWS Lambda.
1. Selected for a fully funded, on-site research internship through the prestigious MITACS Globalink Research Internship program.
2. Developed a knowledge-grounded counter-narrative generation pipeline for online hate speech, leveraging the large pre-trained language models GPT-2 and XNLG to generate contextually relevant, informative responses.
3. Engineered a multi-stage knowledge retrieval and generation system utilizing transformer-based keyphrase extraction and BM25-based retrieval, integrating structured and unstructured external knowledge into counter-narratives.
1. Developed a multi-user MERN-stack healthcare application serving children with special needs, their parents, and therapists, featuring a synchronized scheduling and email system.
2. Used Docker to integrate the web app and deployed it on a local NGINX server.
1. Integrated a transformer-based equivariance learning framework into a table-to-text generation CI/CD pipeline for ingesting tabular data.
2. Used RoBERTa and T5 to improve text retrieval and generation accuracy, raising the BLEU score by 26% on proprietary user data. New functionality included Q&A over large Excel sheets and embedded PDF tables.
3. Led the deployment of the RAG tool at State Bank of India, which handles 10,000+ document uploads every month.
1. Formalised and developed a novel noise-reduction framework using sentence-transformer models, and built a rumour-detection model on top of it that ensembles LSTM and CNN models to improve rumour detection on Twitter. Paper under review at the journal Expert Systems with Applications.
2. Developed an MLP regressor on m-BERT embeddings to evaluate the quality of synthetic, machine-generated code-mixed sentences. Paper accepted at INLG 2022.
3. Scraped and processed 3.3M hashtags from Twitter and built NER models for hashtag segmentation using a loosely supervised approach. Paper accepted at LREC 2022.
1. Developed an LSTM-based sequence-to-sequence activity forecasting system for NetEase Music users with PyTorch and scikit-learn, processing over 56 million records from more than a million users and achieving 79% accuracy.
2. Used Apache Spark for distributed, scalable data preprocessing, and applied sliding-window and seasonal-decomposition techniques to capture temporal patterns more accurately.
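The sliding-window step for a sequence-to-sequence forecaster can be sketched roughly as follows. This is a minimal illustration, not the original system: it assumes each user's activity is a 1-D series of daily counts, and the function name `sliding_windows`, the window lengths, and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[float], input_len: int, output_len: int,
                    stride: int = 1) -> List[Tuple[List[float], List[float]]]:
    """Split one activity series into (input, target) pairs for seq-to-seq training.

    Each pair uses `input_len` consecutive past steps as the encoder input
    and the following `output_len` steps as the decoder target.
    """
    if input_len <= 0 or output_len <= 0 or stride <= 0:
        raise ValueError("input_len, output_len and stride must be positive")
    pairs = []
    # Last start index that still leaves room for a full target window
    last_start = len(series) - input_len - output_len
    for start in range(0, last_start + 1, stride):
        past = list(series[start:start + input_len])
        future = list(series[start + input_len:start + input_len + output_len])
        pairs.append((past, future))
    return pairs


# Hypothetical daily play counts for a single user
daily_plays = [3, 0, 5, 2, 7, 1, 4]
for past, future in sliding_windows(daily_plays, input_len=3, output_len=2):
    print(past, "->", future)
```

Series shorter than `input_len + output_len` simply yield no pairs, so very sparse users drop out of training rather than producing padded windows; padding would be an alternative design choice.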
Working on automating financial research analytics with LLMs to evaluate a company's pricing position. Exploring conversation chains and LLM agents in LangChain, with ChromaDB for storage and indexing.
Multifaceted job application portal built with an ExpressJS backend, MongoDB Atlas, and a ReactJS frontend, allowing recruiters to post openings and job seekers to track their application status in real time.
A lightweight, multi-threaded indexer and searcher for a Wikimedia dump. The system retrieves the top-5 search results, ranked by BM25 score, in under 0.1 seconds. Web app coming soon!
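A minimal, single-threaded sketch of BM25 ranking over an inverted index, to show how top-k results are scored. It assumes a tiny in-memory corpus of pre-tokenized documents and the commonly used defaults k1 = 1.5 and b = 0.75; the class name `BM25Index` and the sample documents are hypothetical, and the multi-threading and on-disk indexing needed for a full Wikimedia dump are not attempted here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


class BM25Index:
    """In-memory inverted index scored with Okapi BM25 (illustrative sketch)."""

    def __init__(self, docs: List[List[str]], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        self.doc_lens = [len(d) for d in docs]
        self.avgdl = sum(self.doc_lens) / self.n_docs
        # Inverted index: term -> {doc_id: term frequency in that doc}
        self.postings: Dict[str, Dict[int, int]] = {}
        for doc_id, doc in enumerate(docs):
            for term, tf in Counter(doc).items():
                self.postings.setdefault(term, {})[doc_id] = tf

    def idf(self, term: str) -> float:
        # BM25 IDF with +1 inside the log so it never goes negative
        n = len(self.postings.get(term, {}))
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def search(self, query: List[str], top_k: int = 5) -> List[Tuple[int, float]]:
        scores: Dict[int, float] = {}
        for term in set(query):
            idf = self.idf(term)
            # Only documents containing the term are touched (posting-list traversal)
            for doc_id, tf in self.postings.get(term, {}).items():
                # Length normalisation: longer-than-average docs are penalised
                norm = 1 - self.b + self.b * self.doc_lens[doc_id] / self.avgdl
                scores[doc_id] = scores.get(doc_id, 0.0) + idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


docs = [
    "the quick brown fox".split(),
    "the lazy dog sleeps".split(),
    "a quick dog runs past the fox".split(),
]
index = BM25Index(docs)
print(index.search(["quick", "fox"], top_k=5))
```

Here documents 0 and 2 both contain "quick" and "fox", but document 0 ranks higher because it is shorter than average, which is exactly the effect of the `b` length-normalisation term.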
Feature-rich, terminal-based game written in Python, applying core OOP principles (encapsulation, inheritance, abstraction, and polymorphism) along with OS-level multi-threading and asynchronous I/O.
Modular Bash-like shell in C, built on core operating-systems concepts: efficient piping, signal handling, foreground/background process management, and I/O error handling through system calls.
Supports sophisticated search operations, multi-field data updates, and record maintenance, with efficient handling of complex queries and database management for a smoother user experience.
Graduate CS student at Arizona State University
I'm Naman Ahuja, a Computer Science graduate from IIIT Hyderabad, now pursuing a master's degree at Arizona State University. As a machine learning enthusiast with research experience in applied NLP for social good, I am passionate about solving complex real-world problems and thrive on exploring new fields and domains within technology. My curiosity drives me to keep learning and adapting, and I take pride in seeing my projects through to completion. I am currently looking for new collaborations, so if you have a cool idea that needs building or a complex problem that needs solving, feel free to reach out!