AWS Data Pipeline Development needs AI Software Development
Contact person: AWS Data Pipeline Development
Location: India
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"I need a production-ready data pipeline that can ingest roughly 5–7 GB of new transactions every day, land them in Amazon S3, transform them with AWS Glue, and store the refined output as partitioned Parquet files ready for analytics.
The pipeline must pull data from three independent sources—multiple MySQL databases, external APIs, and an SFTP drop—then orchestrate every step with Apache Airflow. Resilient scheduling, back-fill capability, and on-failure alerting are essential so that business dashboards in Power BI always have reliable, up-to-date information.
Key expectations
• End-to-end Airflow DAGs (Python) that manage extraction, loading to S3, Glue transformations, and catalog updates
• Glue ETL code that applies basic cleansing, schema validation, and Parquet conversion while keeping data quality top of mind
• Secure, scalable AWS setup (IAM roles, encryption at rest/in transit, sensible partitioning) that others on my team can extend
• Clear, step-by-step documentation covering deployment, scheduling, and recovery procedures
Acceptance criteria
— Data from all three sources is landed daily in S3 and appears in the Glue Data Catalog as Parquet partitions without manual intervention.
— DAG success rate stays above 99 % over a seven-day test window, with automatic retries and notifications.
— A sample Athena query on the final Parquet set returns record counts matching source systems within agreed tolerances.
If you have a proven track record with S3, Glue, Airflow, and Parquet optimisation, let’s talk details and timelines." (client-provided description)
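To give prospective vendors a concrete starting point, the S3 layout and cleansing steps the client lists can be sketched in plain Python. Everything here is illustrative, not the client's actual spec: the `transactions` table name, the Hive-style `year=/month=/day=` partition scheme, and the `REQUIRED_FIELDS` schema are assumed placeholders.

```python
from datetime import date

def partition_prefix(txn_date: date, table: str = "transactions") -> str:
    """Build a Hive-style S3 key prefix (year=/month=/day=) so Glue and
    Athena can prune partitions by date. Table name is a placeholder."""
    return (
        f"{table}/year={txn_date.year}/"
        f"month={txn_date.month:02d}/day={txn_date.day:02d}/"
    )

# Assumed example schema; the real one comes from the client's sources.
REQUIRED_FIELDS = {"txn_id": str, "amount": float, "txn_date": str}

def validate_record(rec: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the record
    passes the basic cleansing/validation gate before Parquet conversion."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in rec:
            problems.append(f"missing {field}")
        elif not isinstance(rec[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems
```

Date-based prefixes like these keep Athena scans cheap and make Airflow back-fills straightforward, since each DAG run writes to exactly one partition.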
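The last acceptance criterion, record counts matching source systems "within agreed tolerances", can be checked mechanically once the Athena query result is in hand. A minimal sketch, assuming a hypothetical 0.1 % tolerance (the real figure is to be agreed with the client):

```python
def counts_reconcile(source_count: int, parquet_count: int,
                     tolerance_pct: float = 0.1) -> bool:
    """True when the Parquet-side row count is within tolerance_pct
    percent of the source-system count. tolerance_pct is a placeholder."""
    if source_count == 0:
        return parquet_count == 0
    drift_pct = abs(source_count - parquet_count) / source_count * 100
    return drift_pct <= tolerance_pct
```

A check like this can run as the final Airflow task of each daily DAG, failing the run (and firing the on-failure alert) when the counts drift beyond tolerance.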
Matched companies (7)
• El Codamics
• Junkies Coder
• HJP Media
• April Innovations
• Haven Futures
• JanakiBhuvi Tech Labs Private Limited