Job Description
The DaaS team’s responsibility is to design, implement, and operate this platform using cutting-edge technologies such as Spark, Hudi, Delta Lake, Scala, and the AWS suite of data tools. We are looking for talented Data Engineers to join our team and help us scale our platform across the organization.
Main Responsibilities
- Design, develop, and maintain scalable data ingestion pipelines using AWS Glue, Step Functions, Lambda, and Terraform.
- Optimize and manage large-scale data pipelines to ensure high performance, reliability, and efficiency.
- Implement data processing workflows using Hudi, Delta Lake, Spark, and Scala.
- Maintain and enhance Lake Formation and the Glue Data Catalog for effective data management and discovery.
- Collaborate with cross-functional teams to ensure seamless data flow and integration across the organization.
- Implement best practices for observability, data governance, security, and compliance.
Qualifications
- 6+ years of experience as a Data Engineer or in a similar role.
- Hands-on experience with Apache Hudi, Delta Lake, Spark, and Scala.
- Experience designing, building, and operating a data lake or data warehouse.
- Knowledge of data orchestration tools such as Airflow, Dagster, or Prefect.
- Strong expertise in AWS services, including Glue, Step Functions, Lambda, and EMR.
- Familiarity with change data capture tools such as Canal, Debezium, and Maxwell.
- Experience with data warehousing tools such as AWS Athena, BigQuery, or Databricks.
- Experience in at least one primary language (e.g., Scala, Python, Java) and SQL (any variant).
- Experience with data cataloging and metadata management using AWS Glue Data Catalog, Lake Formation, or Unity Catalog.
- Proficiency in Terraform for infrastructure as code (IaC).
- Strong problem-solving skills and the ability to troubleshoot complex data issues.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment and manage multiple tasks simultaneously.