Software Engineer & Data Engineer passionate about building scalable systems, optimizing data pipelines, and contributing to open-source projects. Currently pursuing Computer Science and Data Science at NYU.
>Experience
Incoming Software Engineer Intern
at @TikTok
Bellevue, WA
May 2026 — Aug 2026
- Design and implement real-time and offline data architecture for large-scale recommendation systems
- Build scalable and high-performance streaming Lakehouse systems that power feature pipelines, model training, and real-time inference
- Collaborate with ML platform teams to support PyTorch-based model training workflows and design efficient data formats and access patterns for large-scale samples and features
Data Engineer Intern
at @Trepp
New York, NY
May 2025 — Aug 2025
- Implemented a Python-based system that consumes AWS SQS messages and processes 100K+ address records daily, running in containers on ECS
- Optimized Kinesis stream ingestion by integrating Hudi with Apache Spark to write to S3, reducing batch sizes by 40%
- Decommissioned dependency on third-party ESRI resolution service by prioritizing in-house Property Search API, resulting in ~70% match rate and $120K/year cost savings on a 15M record backlog
- Set up 20+ AWS Step Functions orchestrating Glue Crawlers and table creation, enabling Athena queries and QuickSight dashboards on new S3 datasets
Open Source Software Engineer
at @Google Summer of Code - SQLancer
Remote
June 2025 — Sept 2025
- Improved enterprise database reliability across 5,000+ systems through PostgreSQL v12-v18 testing framework upgrade
- Contributed 20+ JSON features in Java and Scala, improving test coverage for common PostgreSQL database JSON operations
- Architected CI/CD pipelines with GitHub Actions to automate multi-database test workflows (PostgreSQL, ClickHouse, etc.)
- Collaborated with 15+ global open-source contributors via GitHub code reviews, discussions, and documentation updates
Software Engineer Intern
at @Flowlytics
New York, NY
Dec 2024 — May 2025
- Built an assessment platform using Python, NGINX, and Docker to auto-scale (1–5 nodes) for 1,000+ concurrent users
- Developed OAuth2 in Python for JWT validation, token refresh, and custom claims mapping across providers
- Delivered a set of RESTful APIs with Python Flask that provide search over assessment and audit data, integrated with the internal search engine and backed by PostgreSQL as relational storage; improved efficiency by 25% using database indexing
Data Engineer Intern
at @NYU Berkley Center For Entrepreneurship
New York, NY
Sept 2024 — May 2025
- Migrated legacy systems to a modern AWS S3 data lake, achieving 15K cost savings with automated testing
- Built a real-time PySpark & Python ETL pipeline handling 1,000+ events/sec using batch intervals and watermarking
- Optimized 2 TB+ of relational and document data using Parquet with Snappy compression, reducing query latency by 30s
- Connected to Tableau dashboards reflecting business metrics such as monthly revenue statistics and investor contributions
>Recent Posts
Monday, January 19, 2026
Getting Started with Open Source Contributions Through GSoC
A guide to participating in Google Summer of Code, including timeline insights, community engagement strategies, and tips for writing a strong proposal that gets accepted.
Wednesday, January 15, 2025
Building Enterprise Data Pipelines: The Medallion Architecture in Fintech
Exploring how Trepp implements the Medallion Architecture (Bronze, Silver, Gold) using AWS Step Functions, Lambda, Apache Spark, and Apache Hudi to process commercial real estate data.