Rahul Goswami Last update: October 15, 2024
Contact Information
Research Interests
Large Language Models (LLMs) for Health, Disease Prognosis, Temporal Phenotyping for Disease Outcomes, Statistical Machine Learning, Ensemble Techniques, Bayesian Inference, Recommender Systems, Anomaly Detection, Imbalanced Datasets
Education
Indian Institute of Technology, Guwahati. 2021--Current
- Senior Research Fellow, Department of Mathematics. Coursework GPA: 8.76/10 over 42 credits.
- My coursework comprised six courses: Probability Theory, Real Analysis, Linear Algebra, Advanced Topics in ML, Advanced Statistical Algorithm, and Statistical Foundation for Data Science.
- Course Repo for Advanced Statistical Algorithm: https://github.com/yuvrajiro/MA691.
Banaras Hindu University, Varanasi. 2018--2020
- M.Sc. Statistics and Computing from the Centre for Interdisciplinary Mathematical Sciences, with a GPA of 8.4/10. I completed 84 credits during my Master's, including courses in Bayesian statistics, computational statistics, regression, probability, and survival analysis.
- My interest in Bayesian inference was sparked during my Master's, when I took my first course in Bayesian statistics. This technique has since become a key part of my approach to data analysis.
- I have consolidated some of my R programs on my website: https://yuvrajiro.github.io/Mini-R-Programs/. These programs demonstrate my skills in R programming and statistics.
- One of my notable projects is a credit card fraud detection project, which I built using a random forest. You can view the project here.
Deen Dayal Upadhyaya Gorakhpur University, Gorakhpur. 2015--2018
- B.Sc. in Statistics and Computer Science (with two years of Mathematics), Department of Mathematics and Statistics
- For my Bachelor's, I majored in Statistics and Computer Science, with two years of Mathematics courses. I graduated with First Division honors.
- During my Bachelor's, I also completed the CS1 Certification Course by the Institute of Actuaries of India, which is recognized by the Institute and Faculty of Actuaries (IFoA) in London. This certification demonstrates my knowledge and skills in actuarial science.
Technical Skills
- Programming Languages:
  - Python: Expert – Advanced capabilities in ML, data analysis, and software development.
  - R: Expert – Proficient in statistical analysis and visualization.
  - Julia: Intermediate – Experienced in high-performance numerical computing.
  - C: Basic – Familiar with low-level programming and algorithms.
- Machine Learning Frameworks and Libraries:
  - PyTorch, TensorFlow: Expert in deep learning model development.
  - scikit-learn, scikit-survival: Proficient in traditional ML and survival analysis.
  - Hugging Face Transformers: Skilled in NLP and state-of-the-art language models.
  - OpenAI GPT, Meta’s LLaMA: Familiar with cutting-edge large language models and their applications.
- Data Analysis and Visualization:
  - ggplot2, dplyr: Expert in data visualization and manipulation.
- Tools and Technologies: Experienced with Docker, Git, and cloud platforms like AWS and Azure.
Work Experience, Freelancing, and Internship
Media and Data Science Research Intern @ Adobe May 2024 -- August 2024
- Conducted research on the memorization, reasoning, and counting capabilities of large language models (LLMs).
- Developed quantitative metrics to evaluate and measure the performance of LLMs in these specific areas.
- Utilized these metrics to guide the redesign and optimization of in-house LLMs at Adobe, enhancing their performance and efficiency.
Freelance NLP Expert @ AI2 May 2024 -- June 2024
- Collaborated with the Allen Institute for AI on a project centered around natural language processing (NLP), focusing on leveraging advanced techniques to handle complex text data.
- Applied sophisticated NLP methods to extract and analyze critical information from a large corpus of historical research papers, involving tasks such as entity recognition, relationship extraction, and summarization.
- Contributed to the development of systems aimed at making historical research more accessible and useful by utilizing cutting-edge NLP technologies to improve information retrieval, context understanding, and knowledge synthesis.
Research Collaborator @ Huntington's Disease Project Dec 2022 -- Present
- Collaborated with researchers from Iowa University on a project focused on studying Huntington's Disease.
- Applied linear mixed-effects models to analyze the data and identify covariates affecting the years to onset of the disease.
- Contributed to data analysis, interpretation of results, and discussions with the research team, furthering our understanding of the disease and its underlying factors.
Machine Learning Engineer @ Ready Tensor Mar 2021 -- Dec 2021
- Contributed to the early-phase startup, Ready Tensor, by spearheading the implementation of advanced Machine Learning (ML) models on their platform. Collaborated closely with the team to design and develop ready-to-dockerize versions of ML models, ensuring ease of deployment and scalability.
- Demonstrated expertise in improving model performance, leading to enhanced accuracy and efficiency compared to previous iterations. Played a pivotal role in advancing the platform's capabilities by integrating cutting-edge ML algorithms.
- Specific projects included:
Subject Matter Expert @ Chegg Inc. June 2018 -- Feb 2021
- Helped more than 1000 students with their work on Statistics and Probability.
- Provided personalized assistance, explanations, and guidance on complex concepts, resulting in improved understanding and academic performance for students.
Packages
cobsurv
- Description: cobsurv is a product of a broad research project, PBSA, which stands for Proximity-Based Survival Analysis. It currently contains the models I proposed, which are based on a combined regression strategy.
- PyPI Link: https://pypi.org/project/cobsurv/
- Documentation Link: https://cobsurv.readthedocs.io/en/latest/
- Project Overview and Future Plan: The aim of the project is to provide production-ready models for survival analysis. Future plans include dockerized versions of the survival models with a FastAPI service built into each container, so that doctors can analyze survival data without deep knowledge of survival models. The ultimate goal is to create a platform for doctors.
fastkme
Research Papers
- Survival: A Different Approach (Under Review): Developed and implemented novel data-dependent techniques to aggregate survival trees, enhancing the accuracy of survival estimates for patients. Explored innovative approaches for survival analysis, contributing to advancements in the field of statistical machine learning. Currently under review for publication.
- Concordance based Survival Cobra with Regression Type Weak Learners (Under Review): Developed and proposed a novel survival analysis method, utilizing concordance-based techniques in combination with regression-type weak learners. Explored innovative approaches for improving survival analysis accuracy and predictive performance. Paper available at: https://arxiv.org/abs/2209.11919
- Integrated Brier Score based Survival Cobra - A Regression Based Approach (Under Review): Proposed and developed an innovative survival analysis method using an integrated Brier score-based approach combined with regression techniques. Investigated the potential of this method to enhance survival prediction accuracy and evaluated its performance through empirical analysis. Paper available at: https://arxiv.org/abs/2210.12006
Teaching Assistance
- Coursera DA112, Introduction to R Programming: Worked as a Teaching Assistant for the global course "DA112: Introduction to R Programming" on Coursera. Assisted in delivering course content, addressing student inquiries, and providing support to learners worldwide, enhancing their understanding of R programming.
- Financial Engineering: Collaborated with Dr. Arabin Kumar Dey in instructing the Financial Risk Analysis course at the Indian Institute of Technology Guwahati. Provided guidance and support to Bachelor's Degree students, helping them comprehend and apply concepts related to financial risk analysis. Contributed to fostering a comprehensive understanding of financial engineering principles among the students.
- Advanced Statistical Algorithm: Assisted Dr. Arabin Kumar Dey in teaching the Advanced Statistical Algorithm course at the Indian Institute of Technology Guwahati. Interacted with students, addressed their queries, and provided guidance in understanding complex statistical algorithms. Conducted sessions on introductory Python with a focus on statistics to enhance students' programming skills. Contributed to the effective delivery of the course and provided valuable support to students pursuing their Bachelor's Degree.
- R Programming: Assisted Dr. Arabin Kumar Dey in teaching R programming to Undergraduate students at IIT Guwahati. Provided guidance and support to students, helping them understand and apply fundamental concepts of R programming. Contributed to creating a positive learning environment and fostering students' programming skills.
- Applied Statistics: Assisted Dr. Palash Ghosh in conducting a bachelor's-level course at IIT Guwahati.
Certificate Courses and Achievements
- Qualified Graduate Aptitude Test in Engineering (GATE) (Statistics): Achieved All India Rank 63 out of thousands of candidates in the GATE exam, a highly competitive national-level examination conducted by premier institutes of India for admission to graduate programs in various subjects. Demonstrated proficiency in statistics and strong academic performance, which contributed to my admission to the Indian Institute of Technology, Guwahati.
- Qualified IIT-JAM: Secured All India Rank 306 in Mathematical Sciences in the Joint Admission Test for Masters (IIT-JAM), a prestigious national-level exam for admission into Master's programs at the Indian Institutes of Technology (IITs). Demonstrated strong mathematical and analytical skills, contributing to my academic success and admission to a competitive Master's program.
- Fuzzy Logic and Neural Networks, NPTEL: Completed the Fuzzy Logic and Neural Networks course as part of my Master's program. Achieved successful qualification in the proctored exam conducted by the Indian Institute of Technology. The course deepened my understanding of fuzzy logic systems and neural network architectures, enabling me to apply these concepts to complex problem domains. Certificate available upon request.
- Operations Research, NPTEL: Successfully completed the Operations Research course offered by NPTEL during my Master's program. Ranked in the top 1% in the proctored exam conducted by the Indian Institute of Technology. The course enhanced my understanding of optimization techniques and their applications in real-world problem-solving. Certificate available at: Operations Research NPTEL Certificate Link.
- Andrew Ng's Machine Learning Course: Completed the renowned Machine Learning course offered by Andrew Ng on Coursera. The course covered fundamental concepts and techniques in machine learning, providing a strong foundation in the field and enhancing my ability to develop and implement machine learning algorithms effectively. Certificate available at: Coursera Certificate Link.
- Practical Machine Learning With TensorFlow – NPTEL: Successfully completed the Massive Open Online Course (MOOC) on Practical Machine Learning With TensorFlow organized by NPTEL. The course provided comprehensive insights into practical applications of neural networks and deep learning techniques, contributing to my proficiency in machine learning and enhancing my practical skills in developing advanced models.
- CS1 from the Institute of Actuaries of India: Successfully completed Core Statistics 1, a rigorous and comprehensive course on par with the Institute and Faculty of Actuaries (IFoA) standards. Acquired in-depth knowledge and skills in fundamental statistical concepts, enhancing my analytical abilities and contributing to a solid foundation in statistics.