Enabling the big data pipeline lifecycle on the computing continuum.
Project Name and Acronym
DATACLOUD - Enabling The Big Data Pipeline Lifecycle on the Computing Continuum
Bosch | Ceramica Catalano | iExec | JOT | Tell.u | UBITECH | Universitat Klagenfurt | Sapienza Università di Roma | KTH Royal Institute of Technology
Total Eligible Costs (€)
EU Grant Amount (€)
To develop a software toolbox comprising new languages, methods, infrastructures, and software prototypes for discovering, simulating, deploying, and adapting Big Data pipelines on heterogeneous and untrusted resources, in a manner that makes the execution of Big Data pipelines traceable, trustable, manageable, analyzable, and optimizable. DataCloud separates the design of Big Data pipelines from the run-time aspects of their deployment, thus empowering domain experts to take an active part in defining them. The toolbox aims to lower the technological entry barriers to incorporating Big Data pipelines into organizations' business processes and to make them accessible to a wider set of stakeholders (such as start-ups and SMEs), regardless of the underlying hardware infrastructure.
Objective 1: Big Data pipelines discovery: To develop techniques for discovering Big Data pipelines from various data sources, applying AI-based and process-mining algorithms in a data-driven discovery approach to learn their structure.
Objective 2: Big Data pipelines definition: To develop a domain-specific language (DSL) for Big Data pipelines featuring an abstraction level suitable for pure data processing, which realizes pipeline specifications using instances of a predefined set of scalable and composable container templates (corresponding to step types in pipelines).
Objective 3: Big Data pipelines simulation: To develop a novel Big Data pipeline simulation framework for determining the “best” deployment scenario by evaluating the performance of individual steps in a sandboxed environment and varying different aspects of input data and step parameters.
Objective 4: Blockchain-based resource provisioning for Big Data pipelines: To develop a blockchain-based resource marketplace for securely provisioning, for any given Big Data pipeline, a set of (trusted and untrusted) resources (Cloud, Edge, Fog), ensuring the privacy and security of data and pipeline executions.
Objective 5: Flexible and automated deployment of Big Data pipelines: To develop a deployment framework for data pipeline specifications, featuring secure, adaptable, elastic, scalable, and resilient resource deployment, taking into account Quality of Service (QoS) requirements and Key Performance Indicators (KPIs) for pipelines and resources.
Objective 6: Adaptive, interoperable Fog/Cloud/Edge resource provisioning for execution of Big Data pipelines: To develop algorithms for optimized runtime provisioning of resources made available on the marketplace on the Computing Continuum (Cloud, Edge, Fog), facilitating omnidirectional data drifts among the data pipelines.
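To make the pipeline model behind Objectives 2 and 3 concrete, the following is a minimal, hypothetical sketch of a pipeline assembled from composable steps, each conceptually backed by a container template, with a toy per-step trace of the kind a simulation framework could evaluate. All names, the container image strings, and the metric are illustrative assumptions, not the actual DataCloud DSL or toolbox API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Step:
    """One pipeline step; 'template' stands in for a container template (hypothetical)."""
    name: str
    template: str                # e.g. a container image reference (illustrative only)
    run: Callable[[Any], Any]    # the step's data transformation
    params: dict = field(default_factory=dict)

@dataclass
class Pipeline:
    name: str
    steps: list

    def execute(self, data: Any):
        """Run steps in order, recording each step's output size as a toy metric."""
        trace = []
        for step in self.steps:
            data = step.run(data)
            size = len(data) if hasattr(data, "__len__") else None
            trace.append((step.name, size))
        return data, trace

# Toy three-step pipeline: ingest -> filter -> aggregate
pipeline = Pipeline(
    name="demo",
    steps=[
        Step("ingest", "registry/ingest:latest", lambda _: list(range(10))),
        Step("filter", "registry/filter:latest", lambda xs: [x for x in xs if x % 2 == 0]),
        Step("aggregate", "registry/agg:latest", lambda xs: [sum(xs)]),
    ],
)
result, trace = pipeline.execute(None)
print(result)  # [20]
print(trace)   # [('ingest', 10), ('filter', 5), ('aggregate', 1)]
```

The separation the project describes (design vs. run-time) shows up here as the split between the pipeline specification (the list of named, templated steps) and the engine that executes or simulates it; a simulator could vary input data and step parameters and compare the resulting traces.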
The work in DataCloud is organized into eight main work packages:
Work Package 1: Requirements analysis and architecture design.
Work Package 2: Big Data pipeline discovery.
Work Package 3: Big Data pipelines definition and simulation.
Work Package 4: Blockchain-based decentralized resource marketplace.
Work Package 5: Adaptive resource provisioning and orchestration.
Work Package 6: Deployment, testing, integration, and validation.
Work Package 7: Exploitation, dissemination, and communication.
Work Package 8: Project management.