Software Engineer & Data Engineer passionate about building scalable systems, optimizing data pipelines, and contributing to open-source projects. Currently pursuing Computer Science and Data Science at NYU.
Experience
Incoming Software Engineer Intern
at TikTok
Bellevue, WA
May 2026 — Aug 2026
- Design and implement real-time and offline data architecture for large-scale recommendation systems
- Build scalable and high-performance streaming Lakehouse systems that power feature pipelines, model training, and real-time inference
- Collaborate with ML platform teams to support PyTorch-based model training workflows and design efficient data formats and access patterns for large-scale samples and features
Data Engineer Intern
at Trepp
New York, NY
May 2025 — Aug 2025
- Implemented a Python-based system, containerized on AWS ECS, that consumes AWS SQS messages and processes 100K+ address records daily
- Optimized Kinesis stream ingestion by integrating Apache Hudi with Apache Spark to write to S3, reducing batch sizes by 40%
- Removed the dependency on a third-party ESRI resolution service by prioritizing the in-house Property Search API, achieving a ~70% match rate and $120K/year in cost savings on a 15M-record backlog
- Set up 20+ AWS Step Functions workflows orchestrating Glue Crawlers and table creation, enabling Athena queries and QuickSight dashboards on new S3 datasets
Open Source Software Engineer
at Google Summer of Code - SQLancer
Remote
June 2025 — Sept 2025
- Improved enterprise database reliability across 5,000+ systems by upgrading the testing framework to support PostgreSQL v12 through v18
- Contributed 20+ JSON features in Java and Scala, improving test coverage of common PostgreSQL JSON operations
- Architected CI/CD pipelines with GitHub Actions to automate multi-database test workflows (PostgreSQL, ClickHouse, etc.)
- Collaborated with 15+ global open-source contributors via GitHub code reviews, discussions, and documentation updates
Software Engineer Intern
at Flowlytics
New York, NY
Dec 2024 — May 2025
- Built an assessment platform using Python, NGINX, and Docker that auto-scales from 1 to 5 nodes to serve 1,000+ concurrent users
- Implemented OAuth2 support in Python, covering JWT validation, token refresh, and custom claims mapping across providers
- Delivered a set of RESTful APIs in Python Flask that provide search over assessment and audit data, integrated with the internal search engine and backed by PostgreSQL; database indexing improved efficiency by 25%
Data Engineer Intern
at NYU Berkley Center for Entrepreneurship
New York, NY
Sept 2024 — May 2025
- Migrated legacy systems to a modern AWS S3 data lake with automated testing, achieving $15K in cost savings
- Built a real-time ETL pipeline in Python and PySpark handling 1,000+ events/sec using batch intervals and watermarking
- Optimized 2 TB+ of relational and document data using Parquet with Snappy compression, reducing query latency by 30s
- Connected the data to Tableau dashboards tracking business metrics such as monthly revenue statistics and investor contributions
Recent Posts
Tuesday, February 17, 2026
Getting Started with Open Source Contributions Through GSoC
A guide to participating in Google Summer of Code, including timeline insights, community engagement strategies, and tips for writing a strong proposal that gets accepted.
Tuesday, February 17, 2026
Viewstamped Replication Revisited: A Deep Dive into Distributed Consensus
An exploration of the Viewstamped Replication protocol, covering crash fault tolerance, view changes, recovery mechanisms, and practical optimizations for building reliable distributed systems.
Tuesday, February 17, 2026
Ionia: High-Performance Distributed Write-Optimized Key-Value Stores
Exploring Ionia, a protocol that achieves high throughput and low latency in distributed write-optimized key-value (WO-KV) stores by decoupling scalability from locality, enabling parallel execution and scalable reads.