About Me

Background

Ph.D. candidate in Health Data Science specializing in machine learning, large-scale model development, and multimodal data integration. I am experienced in designing and deploying deep learning architectures, especially transformer-based and self-supervised models, in distributed and cloud environments. My work focuses on building scalable, production-grade ML systems and translating research insights into high-performance software, with a strong track record of independently driving complex projects from research to deployment.

Education

  • Ph.D. in Health Data Science, The George Washington University
    Washington, DC (Expected Dec 2025)

Professional Experience

Graduate Research Assistant, George Washington University
Washington, DC (Jan 2022 – Present)

  • Engineered and deployed genomic foundation models using PyTorch, enabling large-scale biological sequence understanding through transformer-based architectures
  • Designed scalable end-to-end ML systems and data processing pipelines for high-throughput sequencing data, optimizing distributed GPU workloads across HPC and cloud environments (a minimal DDP sketch follows this list)
  • Developed a database-free taxonomic profiler powered by genomic language models, translating research prototypes into production-level software tools
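For readers unfamiliar with distributed GPU training, the sketch below shows what a minimal single-node, multi-GPU loop with PyTorch DistributedDataParallel (DDP) looks like. It is an illustration only, with a placeholder model and random data standing in for the genomic models and sequencing pipelines described above, and is typically launched with torchrun --nproc_per_node=<num_gpus> train.py.

```python
# Minimal DDP sketch (illustration only): placeholder model and random data,
# not the genomic models described above.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and dataset.
    model = torch.nn.Linear(128, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)      # shards the data across ranks
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)               # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                    # gradients are all-reduced across GPUs
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```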

Data Science and Machine Learning Intern, Curve Biosciences
San Mateo, CA & Washington, DC (May 2025 – Aug 2025)

  • Researched and developed multimodal genomic language models integrating sequence and methylation data for disease detection using NGS technologies
  • Applied advanced machine learning strategies, including multiple instance learning (MIL) and curriculum learning, to genomic model development

Data Scientist, Snapp
Tehran, Iran (Feb 2021 – Dec 2021)

  • Built and deployed predictive models and data pipelines for real-time analysis
  • Developed machine learning models for customer segmentation, enabling data-driven and targeted marketing strategies

Technical Skills

Programming Languages

Python, R, Bash

Machine Learning & AI

Deep learning, transformer architectures, multimodal representation learning, self-supervised learning, PyTorch, distributed model training, LLM fine-tuning

Cloud & Infrastructure

AWS, GCP, HPC clusters, distributed GPU training (PyTorch DDP), Docker, Git, CI/CD for ML pipelines

Data Science Tools

Reproducible ML workflows, scalable data preprocessing and pipeline optimization

Research Interests

Multimodal machine learning, genomic foundation models, self-supervised learning, large-scale model development, application of deep learning to complex biological and scientific data

Awards & Recognition

  • Winner, GW OSPO Open Source Project Award (2024)
  • Winner, Individual Contributor Award (2024)
  • First Prize, Doctoral Presenter in Public Health, George Washington University (2023)

Open-Source Contributions

  • deepBreaks: Developed a Python package for scalable ML pipelines in genotype-phenotype analysis (GitHub & PyPI)
  • Hugging Face Transformers: Enhanced DataCollatorForLanguageModeling with configurable token replacement probabilities to support domain-specific LLM training (a brief sketch of the idea follows)
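For context on what "configurable token replacement probabilities" means: standard masked-language-model collation replaces each selected token with [MASK], a random token, or leaves it unchanged in a fixed 80/10/10 split. The sketch below is a simplified, standalone illustration of making that split configurable; it is not the upstream implementation, and the parameter names mask_replace_prob and random_replace_prob are chosen here for illustration.

```python
# Simplified sketch of BERT-style masking with a configurable mask/random/keep split.
# Omits special-token handling for brevity; parameter names are illustrative.
import torch

def mask_tokens(input_ids: torch.Tensor,
                mask_token_id: int,
                vocab_size: int,
                mlm_probability: float = 0.15,
                mask_replace_prob: float = 0.8,
                random_replace_prob: float = 0.1):
    """Select tokens for MLM and replace them with [MASK], a random token, or keep them."""
    labels = input_ids.clone()

    # Choose which positions participate in the MLM objective.
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # ignore non-selected positions in the loss

    # Of the selected positions, replace a fraction with the [MASK] token...
    replace_with_mask = torch.bernoulli(
        torch.full(labels.shape, mask_replace_prob)).bool() & masked_indices
    input_ids[replace_with_mask] = mask_token_id

    # ...a fraction with a random token (probability rescaled to the remainder)...
    rand_prob = min(random_replace_prob / max(1.0 - mask_replace_prob, 1e-8), 1.0)
    replace_with_random = torch.bernoulli(
        torch.full(labels.shape, rand_prob)).bool() & masked_indices & ~replace_with_mask
    random_tokens = torch.randint(vocab_size, labels.shape, dtype=torch.long)
    input_ids[replace_with_random] = random_tokens[replace_with_random]

    # ...and keep the remainder unchanged (the model still predicts them).
    # Example: ids, labels = mask_tokens(batch_ids, tokenizer.mask_token_id,
    #                                    tokenizer.vocab_size, mask_replace_prob=0.5)
    return input_ids, labels
```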

Resume

You can download my resume here: Download CV

A full list of publications is available on Google Scholar.