Publications & Research

New Submission!

A Comparison of Tokenization Impact in Attention Based and State Space Genomic Language Models

Phoenix: A Prophage Signal Detection Framework for Genomic Language Models

In Progress

Bioinformatics, Machine Learning

A Comparison of Tokenization Impact in Attention Based and State Space Genomic Language Models

Submitted

Pre-Print

Bioinformatics, NLP, Machine Learning

I am a credited co-author on this paper. I contributed a large portion of the benchmark results, as well as data collection and primary research.

Read Paper

Exploring the Embedding Methods in Genomic Language Models

Published

Journal

Bioinformatics, NLP, Machine Learning

My thesis abstract, published in my university-affiliated undergraduate journal. I also presented this work at the related symposium.

Read Paper

Exploring the Embedding Methods in Genomic Language Models

Published

Tech Report

Bioinformatics, NLP, Machine Learning

A published version of my full undergraduate thesis, entry UUCS-24-005. I trained an LLM on bacterial DNA and benchmarked it against state-of-the-art models. I completed all of the work myself, with guidance from my mentors.

Read Paper View Code

An Epidemiological Study Quantifying Differences in Thyroid Cancer Risk Across Birth Cohorts and I-131 Exposure Levels

Completed

Competition

Epidemiology, Statistics

A manuscript that I worked on in high school as the final product of our independent research project, in which we prepared and analyzed I-131 exposure and thyroid cancer data from communities exposed to nuclear fallout. I taught myself R and conducted statistical analyses using models such as Poisson regression.
Qualified for and presented at the ISEF 2019 competition in Phoenix, Arizona.

View Poster

Coursework

  • Final

    Analysis of SQuAD Dataset Artifacts

    Natural Language Processing

    This project aimed to analyze and mitigate dataset artifacts in the context of the ELECTRA model and the SQuAD dataset.

    Read
  • Final

    GitHub Copilot: AI Audit

    Responsible AI

    This project attempts to audit GitHub Copilot, a cloud-based artificial intelligence tool developed by Microsoft subsidiary GitHub and OpenAI that aims to assist users by autocompleting code.

    Read
  • Final

    Phylogeny of the Green Fluorescent Protein

    Phylogenetics

    Explored the phylogeny of GFP in Cnidaria to investigate possible ancestor character states and determine how different fluorescent proteins behave from a phylogenetic perspective.
    Performed extensive statistical and phylogenetic analyses using maximum likelihood and Bayesian methods.

    Read
  • Final

    Statistical Analysis of Food Deserts

    Data Science, Statistics

    Collected and analyzed data on population, income, grocery item prices, and store locations to determine the extent to which food deserts exist in Utah.

    Read

Awards

  • RANGE Undergraduate Research Scholar Designation

    2024

  • International Science and Engineering Fair Finalist

    2019

  • 1st Place University of Utah Science and Engineering Fair

    2019

  • 2nd Place District Science Fair

    2019

  • ASU Walton Sustainability Solutions Initiatives Special Award

    2018

  • 3rd Place University of Utah Science and Engineering Fair

    2018

  • 1st Place District Science Fair

    2018