Data Integration

GigaSpaces provides advanced, enterprise-grade data integration tools that simplify integration, shorten time to value, and scale on demand to support growing workloads. The platform integrates data into the host module, where it is consumed by the data services created and maintained on the platform.

GigaSpaces provides out-of-the-box capabilities for:

  • Event-based data ingestion from source data stores, including creation and management of data pipelines
  • Data cleansing and validation policies, with built-in rules that determine whether data is rejected or cleansed
  • Built-in reconciliation mechanisms to support various recovery and schema-change scenarios
  • Monitoring, control and error handling

The data integration tools reduce development overhead by automatically scanning the source schema and metadata and mapping them to the GigaSpaces data model. Data sources may be relational databases, NoSQL databases, object stores, file systems, or message brokers. Data may be structured or semi-structured, and it may be integrated as a stream or in batches.
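
As a rough illustration of schema-to-model mapping, the sketch below shows what a declarative mapping of a scanned relational table to a target data-model type could look like. The table, column, and type names are hypothetical, and the format is not GigaSpaces' actual configuration syntax.

```python
# Hypothetical, illustrative mapping of a scanned source table to a target data-model type.
# Table, column, and type names are invented for this sketch; the product derives such a
# mapping automatically from the source schema and metadata.
SOURCE_TO_TARGET_MAPPING = {
    "source_table": "public.user_transactions",   # assumed relational source table
    "target_type": "UserTransaction",              # assumed target data-model type
    "columns": {
        "txn_id":       {"target_field": "id",        "type": "string", "id_field": True},
        "user_id":      {"target_field": "userId",    "type": "string"},
        "amount_cents": {"target_field": "amount",    "type": "decimal"},
        "created_at":   {"target_field": "createdAt", "type": "timestamp"},
    },
}

def map_row(source_row: dict) -> dict:
    """Convert one source row into the target representation using the declarative mapping."""
    mapped = {}
    for source_col, rule in SOURCE_TO_TARGET_MAPPING["columns"].items():
        if source_col in source_row:
            mapped[rule["target_field"]] = source_row[source_col]
    return mapped
```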

GigaSpaces’ data integration is built on a pluggable connector framework that provides seamless integration with third-party and proprietary connectors and allows continuous expansion of GigaSpaces’ built-in integration portfolio.
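
To make the idea of a pluggable connector concrete, here is a minimal sketch of what such an interface could look like. The class and method names are hypothetical and do not reflect GigaSpaces' actual connector SPI.

```python
from abc import ABC, abstractmethod
from typing import Iterable

class SourceConnector(ABC):
    """Hypothetical connector interface: each concrete connector knows how to discover
    the source schema and emit change records for the pipeline to consume."""

    @abstractmethod
    def discover_schema(self) -> dict:
        """Return the source tables/collections and their column or field metadata."""

    @abstractmethod
    def read_changes(self) -> Iterable[dict]:
        """Yield change records (inserts/updates/deletes) from the source."""

class PostgresConnector(SourceConnector):
    """Illustrative relational connector; connection details omitted for brevity."""

    def discover_schema(self) -> dict:
        # A real connector would query information_schema here.
        return {"public.user_transactions": ["txn_id", "user_id", "amount_cents", "created_at"]}

    def read_changes(self):
        # A real connector would tail the WAL or poll an incremental query.
        yield {"op": "insert", "table": "public.user_transactions",
               "row": {"txn_id": "t-1", "user_id": "u-7", "amount_cents": 499}}
```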


GigaSpaces Accepts Input From Any Data Set, Including:


Change Data Capture (CDC): primarily for core data that is frequently updated, such as user transactions

Full collections/table updates: built-in change management support for data pipeline definition, including adding a new table without stopping the stream


Streams: append data in real time via a message queue or a bus such as Kafka (see the sketch after this list)


Batch updates: data is extracted from the source using an ETL process; online updates are executed in incremental batches
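
As a rough sketch of stream-based ingestion, the snippet below consumes CDC-style change events from a Kafka topic with the kafka-python client and applies them to an in-memory store. The topic name, event shape, and store are assumptions for illustration, not GigaSpaces' actual pipeline code.

```python
import json
from kafka import KafkaConsumer  # kafka-python client

# Assumed topic name and event shape for this sketch; a real CDC pipeline would use
# the schema and connector configured for the source database.
consumer = KafkaConsumer(
    "user_transactions.cdc",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

store = {}  # stand-in for the target data grid

for message in consumer:
    event = message.value  # e.g. {"op": "insert", "key": "t-1", "row": {...}}
    if event["op"] in ("insert", "update"):
        store[event["key"]] = event["row"]
    elif event["op"] == "delete":
        store.pop(event["key"], None)
```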

[Data integration diagram]

GigaSpaces’ Data Integration Capabilities Include:

  • No-code construction of data pipelines

  • Declarative data type mapping, with built-in translators for common sources

  • Declarative definition of data validation policies (see the sketch after this list)

  • Initial load for full or selective data

  • Smart and fast recovery mechanisms that reduce potential application downtime to a minimum

  • Built-in change management support for data pipeline definition, including adding a new table without stopping the stream

  • Built-in data and process monitoring

  • Data freshness indicator that shows the lag between updates in the System of Record and the visibility of the corresponding data in the Space

  • Continuous event-based data ingestion
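
To illustrate what a declarative validation policy could express, here is a minimal sketch. The rule names, fields, and cleanse/reject semantics are assumptions for illustration, not GigaSpaces' actual policy syntax.

```python
# Hypothetical validation policy: each rule declares a field check and whether a
# failing record is rejected outright or cleansed to a default value.
VALIDATION_POLICY = [
    {"field": "amount",    "check": lambda v: v is not None and v >= 0, "on_fail": "reject"},
    {"field": "userId",    "check": lambda v: bool(v),                  "on_fail": "reject"},
    {"field": "createdAt", "check": lambda v: v is not None,            "on_fail": "cleanse",
     "default": "1970-01-01T00:00:00Z"},
]

def apply_policy(record: dict):
    """Return (record, None) if accepted (possibly cleansed), or (None, reason) if rejected."""
    cleaned = dict(record)
    for rule in VALIDATION_POLICY:
        value = cleaned.get(rule["field"])
        if not rule["check"](value):
            if rule["on_fail"] == "reject":
                return None, f"invalid {rule['field']}: {value!r}"
            cleaned[rule["field"]] = rule["default"]
    return cleaned, None
```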

Advanced Data Integration Tools:


Native Kubernetes Support: support for Docker-based processing unit deployments via Kubernetes microservices design patterns

Incremental batch: recommended for integrating slowly changing data (see the sketch below)
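
As a hedged illustration of incremental batch extraction (the query, column names, and watermark handling are assumptions, not product behavior), each run could pull only the rows changed since the last high-water mark:

```python
from datetime import datetime, timezone

# Hypothetical in-memory watermark; a real pipeline would persist this between runs.
last_watermark = datetime(1970, 1, 1, tzinfo=timezone.utc)

def extract_incremental_batch(cursor):
    """Fetch only rows updated since the previous run, then advance the watermark."""
    global last_watermark
    cursor.execute(
        "SELECT txn_id, user_id, amount_cents, updated_at "
        "FROM user_transactions WHERE updated_at > %s ORDER BY updated_at",
        (last_watermark,),
    )
    rows = cursor.fetchall()
    if rows:
        last_watermark = rows[-1][-1]  # newest updated_at becomes the new high-water mark
    return rows
```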


Full integration with OpenShift 4.4+: supported topologies for on-prem, cloud, hybrid and multi-cloud deployments


Check Data Freshness: Smart DIH defines a threshold for data freshness per source table. This is the foundation for services’ awareness of stale data, allowing the application to provide an informed user experience (see the sketch below)
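
As a rough illustration of such a freshness check (under assumed names; this is not Smart DIH's actual API), the sketch below compares the latest update timestamp seen at the source with the latest timestamp visible in the target and flags the table as stale when the lag crosses a per-table threshold.

```python
from datetime import datetime, timedelta

# Hypothetical per-table freshness thresholds; real values would come from pipeline config.
FRESHNESS_THRESHOLDS = {
    "public.user_transactions": timedelta(seconds=5),
    "public.accounts": timedelta(minutes=1),
}

def freshness_status(table: str, source_updated_at: datetime, target_visible_at: datetime) -> dict:
    """Report the lag between the System of Record and the target, and whether it is stale."""
    lag = source_updated_at - target_visible_at
    threshold = FRESHNESS_THRESHOLDS.get(table, timedelta(seconds=30))
    return {
        "table": table,
        "lag_seconds": max(lag.total_seconds(), 0.0),
        "stale": lag > threshold,
    }
```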


Redesigned SQL Engine: an Apache Calcite SQL engine with the PostgreSQL wire protocol offers low-latency distributed query execution and query optimization


Data gateway: client-less PostgreSQL wire protocol guarantees seamless integration and scales on demand to support increasing workloads (see the connection sketch after this list), integrating with:

  • Business intelligence (BI) tools such as Power BI and Tableau
  • Developer tools such as DBeaver
  • Data integration tools such as Talend
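
Because the gateway speaks the PostgreSQL wire protocol, any PostgreSQL-compatible client should be able to connect. As a hedged sketch (host, port, credentials, and table name are placeholders, not actual defaults), a Python client using psycopg2 could look like this:

```python
import psycopg2  # standard PostgreSQL client library

# Placeholder connection details; substitute the data gateway's actual host, port,
# database name, and credentials for your deployment.
conn = psycopg2.connect(
    host="data-gateway.example.com",
    port=5432,
    dbname="demo",
    user="analyst",
    password="secret",
)

with conn, conn.cursor() as cur:
    # Hypothetical table name used only for illustration.
    cur.execute("SELECT user_id, SUM(amount) FROM user_transactions GROUP BY user_id LIMIT 10")
    for user_id, total in cur.fetchall():
        print(user_id, total)

conn.close()
```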
