Platform Modernization: Migrate ETL Process to Akka Streams for Enhanced Scalability and Efficiency

scalac.io

Advertising & marketing

Inefficient ETL Process with Apache Spark

Tapad's existing ETL process, built on Apache Spark, faces challenges with resource management, cost, scalability, and flexibility. The Spark cluster cannot keep up with growing data volumes and the diverse integration requirements of third-party services, which raises operational costs and creates potential bottlenecks. Spark's fixed cluster size and memory consumption limits make it hard to process varying data loads efficiently and to adapt to different third-party ingestion requirements.

About the Client

Tapad is an adtech organization that provides data and technology solutions for the digital advertising ecosystem.

Project Objectives

Migrate the data retrieval, processing, and distribution (ETL) process from Apache Spark to Akka Streams.
Reduce overall resource consumption and associated costs.
Improve the efficiency and scalability of data integration with third-party services.
Enhance process resilience and error handling capabilities.
Gain greater control over data processing and error management.

Functional Requirements

Real-time data ingestion from the data platform.
Data transformation and processing capabilities.
Parallel processing of data streams.
Flexible data output options (single file or multiple parallel HTTP calls).
Robust error handling and fault tolerance.
Monitoring and logging of the data processing pipeline.
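
The requirements above (transformation, parallel processing, parallel HTTP-style output, and fault tolerance) could be wired together roughly as follows. This is a minimal sketch, not Tapad's or Scalac's actual pipeline: it assumes Akka 2.6 or later with the akka-stream module on the classpath, and the names Record, transform, deliver, DeliveryFailed, the in-memory source, and the parallelism of 8 are all hypothetical stand-ins chosen for illustration.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorAttributes, Supervision}
import akka.stream.scaladsl.Source
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object EtlSketch {
  // Hypothetical record shape; the real data platform schema is not given in the brief.
  final case class Record(id: Int, payload: String)

  // Hypothetical failure type standing in for a rejected third-party call.
  final class DeliveryFailed(id: Int) extends RuntimeException(s"delivery failed for record $id")

  // Transformation step: a pure function applied to every element.
  def transform(r: Record): Record = r.copy(payload = r.payload.toUpperCase)

  // Stand-in for an HTTP call to a third-party platform.
  // Every 25th record fails so the supervision strategy has something to handle.
  def deliver(r: Record)(implicit ec: ExecutionContext): Future[Int] =
    Future {
      if (r.id % 25 == 0) throw new DeliveryFailed(r.id)
      r.id
    }

  // Drop records whose delivery failed and keep going; stop on anything unexpected.
  val decider: Supervision.Decider = {
    case _: DeliveryFailed => Supervision.Resume
    case _                 => Supervision.Stop
  }

  // Runs the pipeline and returns the number of successfully delivered records.
  def run()(implicit system: ActorSystem): Future[Int] = {
    import system.dispatcher
    Source(1 to 100)                                   // stand-in for data platform ingestion
      .map(i => Record(i, s"user-$i"))
      .map(transform)
      .mapAsyncUnordered(parallelism = 8)(r => deliver(r)) // at most 8 deliveries in flight
      .withAttributes(ActorAttributes.supervisionStrategy(decider))
      .runFold(0)((count, _) => count + 1)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("etl-sketch")
    try println(s"delivered: ${Await.result(run(), 10.seconds)}")
    finally system.terminate()
  }
}
```

The parallelism argument caps the number of concurrent deliveries, so a slow third-party endpoint slows the stream down rather than piling up requests. Writing to a single file instead would mean swapping the delivery stage for a file sink, which is left out here to keep the sketch short.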

Preferred Technologies

Akka Streams
Scala (or another language suitable for Akka Streams)
Data Platform (existing data source)

Integrations Required

Various third-party advertising platforms (via HTTP or other APIs)
Data Platform (for data ingestion)

Key Non-Functional Requirements

Scalability: Ability to handle unbounded (virtually infinite) data streams.
Performance: Efficient data processing with minimal latency.
Resilience: System should remain responsive even after component failures.
Resource Efficiency: Minimize resource consumption (CPU, memory).
Security: Secure data handling and transmission.
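
The scalability and resource-efficiency requirements above are the kind of thing Akka Streams typically addresses through backpressure: a stage only pulls new elements when downstream has capacity, so an endless source does not have to be held in memory. The following is a minimal sketch under stated assumptions (Akka 2.6 or later with akka-stream on the classpath); slowWrite, the 1 ms delay, the parallelism of 4, and the cut-off of 200 elements are hypothetical values chosen only so the example terminates.

```scala
import akka.actor.ActorSystem
import akka.pattern.after
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object BackpressureSketch {
  // Runs a conceptually endless stream through a slow stage and
  // returns the last element seen before the cut-off.
  def run()(implicit system: ActorSystem): Future[Int] = {
    import system.dispatcher

    // Hypothetical slow consumer: completes 1 ms later without blocking a thread.
    def slowWrite(n: Int): Future[Int] =
      after(1.milli, system.scheduler)(Future.successful(n))

    Source
      .fromIterator(() => Iterator.from(1)) // unbounded source: 1, 2, 3, ...
      .mapAsync(parallelism = 4)(slowWrite) // pulls upstream only while fewer than 4 writes are pending
      .take(200)                            // cut-off so the sketch finishes
      .runWith(Sink.last)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("backpressure-sketch")
    try println(s"last element: ${Await.result(run(), 30.seconds)}")
    finally system.terminate()
  }
}
```

Because the iterator is only advanced on demand, memory use depends on the number of elements in flight rather than on the length of the stream.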

Expected Business Impact

Successful implementation of this project is expected to deliver significant cost savings through reduced infrastructure needs, along with improved integration efficiency, faster processing times, enhanced system resilience, and greater control over data processing. This will allow Tapad to focus resources on core business activities and adapt more quickly to evolving market demands.