Data Engineer (Data Pipelines & RAG)

Hyred•Vietnam•🌍 Remote

Full-time • Mid Level

Posted 6/17/2026 • Expires 7/18/2026

Job Description

Our client is a fast-growing Property Tech AI company.

About the role

They are seeking a versatile Data & AI Engineer to build, deploy, and maintain end-to-end data pipelines for downstream Gen AI applications. You'll design data models and transformations and build scalable ETL/ELT workflows, while learning fast and working in the AI agent space.

Key Responsibilities

Data Modeling & Pipeline Development
• Automate data ingestion from diverse sources (databases, APIs, files, SharePoint/document management tools, URLs). Most files are expected to be unstructured documents in different file formats, containing tables, charts, process flows, schedules, construction layouts/drawings, etc.
• Own the chunking strategy, embedding, and indexing of all unstructured and structured data for efficient retrieval by downstream RAG/agent systems.
• Build, test, and maintain robust ETL/ELT workflows using Spark (batch and streaming).
• Define and implement logical/physical data models and schemas.
• Develop schema mapping and data dictionary artifacts for cross-system consistency.

Gen AI Integration
• Instrument data pipelines to surface real-time context into LLM prompts.
• Implement prompt engineering and RAG for varied workflows within the real estate (RE) and construction industry vertical.

Observability & Governance
• Implement monitoring, alerting, and logging (data quality, latency, errors).
• Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM).

CI/CD & Automation
• Develop automated testing, versioning, and deployment (Azure DevOps, GitHub Actions, Prefect/Airflow).
• Maintain reproducible environments with infrastructure as code (Terraform, ARM templates).

Required Skills & Experience
• 5 years in Data Engineering or a similar role, with at least 12-24 months of exposure to building pipelines for unstructured data extraction, including document processing with OCR, cloud-native solutions, and chunking, indexing, etc. for downstream consumption by RAG/Gen AI applications.
• Proficiency in Python, dlt for ETL/ELT pipelines, DuckDB or eq