Data Engineer with 6+ years of experience designing scalable ETL pipelines and lakehouse architectures for diverse data types, including BIM, geospatial, imagery, and IoT data, supporting both analytics and operational use cases.
Strong background in Python, SQL, Apache Spark, Databricks, Airflow, Docker, GitHub, and cloud platforms including Azure and AWS. Experienced with structured, semi-structured, and unstructured data; collaborates closely with data scientists and cross-functional teams and mentors junior engineers, improving data quality, performance, and reliability across production systems.
Design and maintain scalable Databricks and Airflow ETL pipelines that ingest BIM, geospatial, and imagery data from 2,000+ projects into a cloud lakehouse. Build Neo4j graph applications on IFC/BIM geometry for space intelligence and furniture-placement analysis across 100+ active building models.
Supported data-quality checks and monitoring for Kafka and Spark Streaming pipelines, tuned Spark job configurations for high-volume batch workloads, and refined Snowflake reporting models in collaboration with analytics stakeholders.
Led the migration of an on-premises data warehouse to Snowflake and engineered a near-real-time processing system with Kafka and Spark Streaming, cutting data latency from 6 hours to 15 minutes. Reduced infrastructure costs by 35% while handling 50% more data volume.
Built a centralized customer data platform integrating 10+ sources, established CCPA/GDPR governance policies, and automated Airflow pipelines processing 1 TB+ of data daily. Partnered with data scientists on an ML pipeline for predictive maintenance.
Developed ETL processes integrating sales data from multiple CRM systems and built interactive Tableau dashboards, cutting their load times by 50%.
Built Python/SQL ingestion and ETL scripts consolidating internal operational data into SQL Server, added validation and monitoring checks, and shipped SQL views supporting ad-hoc reporting for business stakeholders.