Software Engineer & Data Engineer passionate about building scalable systems, optimizing data pipelines, and contributing to open-source projects. Currently pursuing Computer Science and Data Science at NYU.

>Experience

Incoming Software Engineer Intern at @TikTok

Bellevue, WA

May 2026 — Aug 2026

  • Design and implement real-time and offline data architecture for large-scale recommendation systems
  • Build scalable and high-performance streaming Lakehouse systems that power feature pipelines, model training, and real-time inference
  • Collaborate with ML platform teams to support PyTorch-based model training workflows and design efficient data formats and access patterns for large-scale samples and features

Data Engineer Intern at @Trepp

New York, NY

May 2025 — Aug 2025

  • Implemented a Python-based system, deployed as containers on ECS, that consumes AWS SQS messages and processes 100K+ address records daily
  • Optimized Kinesis stream ingestion by integrating Hudi with Apache Spark to write to S3, reducing batch sizes by 40%
  • Decommissioned the dependency on a third-party ESRI resolution service by prioritizing the in-house Property Search API, achieving a ~70% match rate and $120K/year in cost savings on a 15M-record backlog
  • Set up 20+ AWS Step Functions workflows orchestrating Glue Crawlers and table creation, enabling Athena queries and QuickSight dashboards on new S3 datasets
Python
AWS
SQS
ECS
Kinesis
Apache Spark
Hudi
S3
Step Functions
Glue
Athena
QuickSight

Open Source Software Engineer at @Google Summer of Code - SQLancer

Remote

June 2025 — Sept 2025

  • Improved enterprise database reliability across 5,000+ systems by upgrading the testing framework to cover PostgreSQL v12-v18
  • Contributed 20+ JSON features in Java and Scala, improving test coverage for common PostgreSQL JSON operations
  • Architected CI/CD pipelines with GitHub Actions to automate multi-database test workflows (PostgreSQL, ClickHouse, etc.)
  • Collaborated with 15+ global open-source contributors via GitHub code reviews, discussions, and documentation updates
Java
Scala
PostgreSQL
GitHub Actions
CI/CD
ClickHouse

Software Engineer Intern at @Flowlytics

New York, NY

Dec 2024 — May 2025

  • Built an assessment platform using Python, NGINX, and Docker that auto-scales (1–5 nodes) to serve 1,000+ concurrent users
  • Developed an OAuth2 integration in Python covering JWT validation, token refresh, and custom claims mapping across providers
  • Delivered a set of RESTful APIs with Python Flask providing search over assessment and audit data, integrated with the internal search engine and backed by PostgreSQL as relational storage; improved efficiency by 25% using database indexing
Python
NGINX
Docker
OAuth2
JWT
Flask
REST API
PostgreSQL

Data Engineer Intern at @NYU Berkley Center for Entrepreneurship

New York, NY

Sept 2024 — May 2025

  • Migrated legacy systems to a modern AWS S3 data lake with automated testing, achieving 15K in cost savings
  • Built a real-time PySpark and Python ETL pipeline handling 1,000+ events/sec using batch intervals and watermarking
  • Optimized 2 TB+ of relational and document data using Parquet with Snappy compression, reducing query latency by 30 s
  • Connected the resulting datasets to Tableau dashboards tracking business metrics such as monthly revenue statistics and investor contributions
Python
PySpark
AWS
S3
ETL
Parquet
Tableau
