About Me
Background
Ph.D. candidate in Health Data Science specializing in machine learning, large-scale model development, and multimodal data integration. I am experienced in designing and deploying deep learning architectures, especially transformer-based and self-supervised models, in distributed and cloud environments. My work focuses on building scalable, production-grade ML systems and translating research insights into high-performance software, with a strong track record of independently driving complex projects from research to deployment.
Education
- Ph.D. in Health Data Science, The George Washington University
Washington, DC (Expected Dec 2025)
Professional Experience
Graduate Research Assistant, George Washington University
Washington, DC (Jan 2022 – Present)
- Engineered and deployed genomic foundation models using PyTorch, enabling large-scale biological sequence understanding through transformer-based architectures
- Designed scalable end-to-end ML systems and data processing pipelines for high-throughput sequencing data, optimizing distributed GPU workloads across HPC and cloud environments
- Developed a database-free taxonomic profiler powered by genomic language models, translating research prototypes into production-level software tools
Data Science and Machine Learning Intern, Curve Biosciences
San Mateo, CA & Washington, DC (May 2025 – Aug 2025)
- Researched and developed multimodal genomic language models integrating sequence and methylation data for disease detection using NGS technologies
- Applied advanced training strategies, including multiple instance learning (MIL) and curriculum learning, to genomic model development
Data Scientist, Snapp
Tehran, Iran (Feb 2021 – Dec 2021)
- Built and deployed predictive models and data pipelines for real-time analysis
- Developed machine learning models for customer segmentation, enabling data-driven and targeted marketing strategies
Technical Skills
Programming Languages
Python, R, Bash
Machine Learning & AI
Deep learning, transformer architectures, multimodal representation learning, self-supervised learning, PyTorch, distributed model training, LLM fine-tuning
Cloud & Infrastructure
AWS, GCP, HPC clusters, distributed GPU training (PyTorch DDP), Docker, Git, CI/CD for ML pipelines
Data Science Tools
Reproducible ML workflows, scalable data preprocessing and pipeline optimization
Research Interests
Multimodal machine learning, genomic foundation models, self-supervised learning, large-scale model development, application of deep learning to complex biological and scientific data
Awards & Recognition
- Winner, GW OSPO Open Source Project Award (2024)
- Winner, Individual Contributor Award (2024)
- First Prize, Doctoral Presenter in Public Health, George Washington University (2023)
Open-Source Contributions
- deepBreaks: Developed a Python package for scalable ML pipelines in genotype-phenotype analysis (GitHub & PyPI)
- Hugging Face Transformers: Enhanced DataCollatorForLanguageModeling with configurable token replacement probabilities to support specialized-domain LLM training
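The idea behind configurable token replacement can be sketched in plain Python. This is an illustrative reimplementation of BERT-style masking, not the actual Transformers code; the function name, the `MASK_ID` value, and the parameter names (`mlm_probability`, `mask_replace_prob`, `random_replace_prob`) are assumptions made for this sketch.

```python
import random

MASK_ID = 103  # illustrative [MASK] token id (103 happens to be BERT's)

def mask_tokens(input_ids, vocab_size,
                mlm_probability=0.15,
                mask_replace_prob=0.8,
                random_replace_prob=0.1,
                rng=None):
    """BERT-style masking with configurable replacement probabilities.

    Each token is selected for prediction with probability `mlm_probability`.
    A selected token is replaced by [MASK] with `mask_replace_prob`,
    by a random vocabulary token with `random_replace_prob`, and kept
    unchanged otherwise. Labels are -100 (ignored by the loss) at
    unselected positions, and the original token id where selected.
    """
    rng = rng or random.Random()
    masked = list(input_ids)
    labels = [-100] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if rng.random() >= mlm_probability:
            continue  # position not selected for prediction
        labels[i] = tok
        r = rng.random()
        if r < mask_replace_prob:
            masked[i] = MASK_ID
        elif r < mask_replace_prob + random_replace_prob:
            masked[i] = rng.randrange(vocab_size)
        # else: keep the original token (the remaining probability mass)
    return masked, labels
```

Making these probabilities configurable matters in specialized domains: for example, setting `mask_replace_prob=1.0` masks every selected token outright, avoiding random-token replacement that can inject biologically implausible sequences during genomic pretraining.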
Resume
You can download my resume here: Download CV
A full list of publications is available on Google Scholar.