About Me

Background

Ph.D. candidate in Health Data Science specializing in machine learning, large-scale model development, and multimodal data integration. I am experienced in designing and deploying deep learning architectures, especially transformer-based and self-supervised models, in distributed and cloud environments. My work focuses on building scalable, production-grade ML systems and translating research insights into high-performance software, with a strong track record of independently driving complex projects from research to deployment.

Education

  • Ph.D. in Health Data Science, The George Washington University
    Washington, DC (Expected Dec 2025)

Professional Experience

Graduate Research Assistant, George Washington University
Washington, DC (Jan 2022 – Present)

  • Engineered and deployed genomic foundation models using PyTorch, enabling large-scale biological sequence understanding through transformer-based architectures
  • Designed scalable end-to-end ML systems and data processing pipelines for high-throughput sequencing data, optimizing distributed GPU workloads across HPC and cloud environments (a minimal DDP sketch follows this list)
  • Developed a database-free taxonomic profiler powered by genomic language models, translating research prototypes into production-level software tools
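For readers unfamiliar with distributed GPU training, the sketch below shows what a minimal single-node, multi-GPU loop with PyTorch DistributedDataParallel (DDP) looks like. It is an illustration only, with a placeholder model and random data standing in for the genomic models and sequencing pipelines described above, and is typically launched with torchrun --nproc_per_node=<num_gpus> train.py.

```python
# Minimal DDP sketch (illustration only): placeholder model and random data,
# not the genomic models described above.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and dataset.
    model = torch.nn.Linear(128, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)      # shards the data across ranks
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)               # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                    # gradients are all-reduced across GPUs
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```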

Data Science and Machine Learning Intern, Curve Biosciences
San Mateo, CA & Washington, DC (May 2025 – Aug 2025)

  • Researched and developed multimodal genomic language models integrating sequence and methylation data for disease detection using NGS technologies
  • Applied advanced machine learning strategies, including multiple instance learning (MIL) and curriculum learning, to genomic model development

Data Scientist, Snapp
Tehran, Iran (Feb 2021 – Dec 2021)

  • Built and deployed predictive models and data pipelines for real-time analysis
  • Developed machine learning models for customer segmentation, enabling data-driven and targeted marketing strategies

Technical Skills

Programming Languages

Python, R, Bash

Machine Learning & AI

Deep learning, transformer architectures, multimodal representation learning, self-supervised learning, PyTorch, distributed model training, LLM fine-tuning

Cloud & Infrastructure

AWS, GCP, HPC clusters, distributed GPU training (PyTorch DDP), Docker, Git, CI/CD for ML pipelines

Data Science Tools

Reproducible ML workflows, scalable data preprocessing and pipeline optimization

Research Interests

Multimodal machine learning, genomic foundation models, self-supervised learning, large-scale model development, application of deep learning to complex biological and scientific data

Awards & Recognition

  • Winner, GW OSPO Open Source Project Award (2024)
  • Winner, Individual Contributor Award (2024)
  • First Prize, Doctoral Presenter in Public Health, George Washington University (2023)

Open-Source Contributions

  • deepBreaks: Developed a Python package for scalable ML pipelines in genotype-phenotype analysis (GitHub & PyPI)
  • Hugging Face Transformers: Enhanced DataCollatorForLanguageModeling with configurable token replacement probabilities to support domain-specific LLM training (a brief sketch of the idea follows)
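For context on what "configurable token replacement probabilities" means: standard masked-language-model collation replaces each selected token with [MASK], a random token, or leaves it unchanged in a fixed 80/10/10 split. The sketch below is a simplified, standalone illustration of making that split configurable; it is not the upstream implementation, and the parameter names mask_replace_prob and random_replace_prob are chosen here for illustration.

```python
# Simplified sketch of BERT-style masking with a configurable mask/random/keep split.
# Omits special-token handling for brevity; parameter names are illustrative.
import torch

def mask_tokens(input_ids: torch.Tensor,
                mask_token_id: int,
                vocab_size: int,
                mlm_probability: float = 0.15,
                mask_replace_prob: float = 0.8,
                random_replace_prob: float = 0.1):
    """Select tokens for MLM and replace them with [MASK], a random token, or keep them."""
    labels = input_ids.clone()

    # Choose which positions participate in the MLM objective.
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # ignore non-selected positions in the loss

    # Of the selected positions, replace a fraction with the [MASK] token...
    replace_with_mask = torch.bernoulli(
        torch.full(labels.shape, mask_replace_prob)).bool() & masked_indices
    input_ids[replace_with_mask] = mask_token_id

    # ...a fraction with a random token (probability rescaled to the remainder)...
    rand_prob = min(random_replace_prob / max(1.0 - mask_replace_prob, 1e-8), 1.0)
    replace_with_random = torch.bernoulli(
        torch.full(labels.shape, rand_prob)).bool() & masked_indices & ~replace_with_mask
    random_tokens = torch.randint(vocab_size, labels.shape, dtype=torch.long)
    input_ids[replace_with_random] = random_tokens[replace_with_random]

    # ...and keep the remainder unchanged (the model still predicts them).
    # Example: ids, labels = mask_tokens(batch_ids, tokenizer.mask_token_id,
    #                                    tokenizer.vocab_size, mask_replace_prob=0.5)
    return input_ids, labels
```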

Resume

You can download my resume here: Download CV

A full list of publications is available on Google Scholar.