Big Data Architecture for the Business Warehouse

A cloud-native data architecture based on open-source technology gets SAP BW going.

SparrowBI is a storage engine for SAP Business Warehouse built for modern business intelligence and big data. It uses best-of-breed open-source technology such as Apache Hadoop and Apache Spark. We take responsibility for administration and operation.

SparrowBI provides in-memory performance at the price of a data archival solution. Deep integration into SAP BW through standard interfaces avoids re-development or the introduction of new technology. The execution layer leverages modern, massively parallel in-memory technology based on Apache Spark. Because it is operated as a hosted service, SparrowBI scales on demand. For storage, SparrowBI relies on file-based columnar databases persisted in a performant, redundant I/O architecture. This gives SparrowBI enterprise-ready data security and governance for safely storing and reporting company data.
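As a rough illustration of this kind of execution layer, the following Scala sketch sets up a Spark session with dynamic allocation and reads archived data from a Parquet store on distributed storage. Application name, configuration flags, and paths are illustrative assumptions, not SparrowBI's actual setup.

import org.apache.spark.sql.SparkSession

// Sketch: a Spark session that scales executors on demand and reads
// the file-based columnar store from redundant distributed storage.
val spark = SparkSession.builder()
  .appName("sparrowbi-style-query")                     // illustrative name
  .config("spark.dynamicAllocation.enabled", "true")    // scale on demand
  .config("spark.sql.parquet.filterPushdown", "true")   // push filters into the columnar scan
  .getOrCreate()

// Hypothetical location of archived BW data persisted as Parquet.
val archived = spark.read.parquet("hdfs:///sparrowbi/archive/sales_cube")
archived.printSchema()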

Overview of SparrowBI components

Integration into SAP Business Warehouse

SparrowBI brings big data into your company without the need to rearchitect or rebuild your systems. This is possible thanks to seamless integration into the existing SAP Business Warehouse. We rely on standard interfaces which are well known to and widely used by BW experts.

NLS Interface

Use the nearline storage (NLS) interface to archive cubes. This saves storage costs in on-premise systems while significantly increasing query performance.

Learn more

Virtual Cubes

Normally, when archived cubes are used, SAP BW has to handle navigation attributes, which has a negative impact on performance. The virtual cubes provided by SparrowBI mitigate this performance impact and achieve performance comparable to HANA systems.

Learn more

Thank you, Apache Spark

Spark is a powerful open-source processing engine which emphasises speed, ease of use, and the ability to perform complex analyses.

Thanks to Apache Spark, SparrowBI is blazingly fast and offers endless possibilities.

Spark is the biggest and fastest-growing open-source project in data processing. It is backed by companies such as Amazon, eBay, IBM, Netflix, Baidu, and many others.

Massively Parallel In-Memory Processing

Parallel processing of complex analyses with Apache Spark

Spark parallelizes the processing of a query. It does this by creating an execution plan for each query. The steps of this plan are then distributed across the worker nodes so that communication is minimized and resource utilization is maximized.
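The following Scala sketch shows this in miniature: a filter-and-aggregate query over a Parquet dataset, with explain() printing the physical plan that Spark distributes to the workers. The dataset path and the column names (year, region, revenue) are assumptions for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("plan-demo").getOrCreate()

// Hypothetical archived sales data.
val sales = spark.read.parquet("hdfs:///sparrowbi/archive/sales_cube")

val byRegion = sales
  .filter(col("year") === 2023)                 // predicate applied close to the scan
  .groupBy("region")
  .agg(sum("revenue").as("total_revenue"))

// Prints the physical plan: scan, per-worker partial aggregation,
// a shuffle (exchange), and the final aggregation.
byRegion.explain()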

At the beginning of a query, the nodes load only the required data from the file-based columnar files, so I/O throughput scales with the number of nodes. Intermediate results are then kept in memory to speed up filtering and aggregation. This means you get comparable performance for a fraction of the cost of a full-blown in-memory database.
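A minimal sketch of both effects, again with hypothetical column names: only the referenced columns are read from the Parquet files, and the pruned intermediate result is cached in memory so that subsequent filters and aggregations avoid re-reading the files.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("pruning-demo").getOrCreate()

val facts = spark.read
  .parquet("hdfs:///sparrowbi/archive/sales_cube")   // hypothetical location
  .select("region", "product", "revenue")            // column pruning: other columns are never read
  .cache()                                           // keep the pruned data in memory

// Both queries reuse the in-memory copy instead of touching the files again.
facts.filter(col("region") === "EMEA").count()
facts.groupBy("product").agg(avg("revenue").as("avg_revenue")).show()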

File-Based Columnar Database

Row vs. column-oriented storage

For analytical workloads it is beneficial to store data in a layout that is optimized for filtering and aggregation. At the same time, the store should keep costs low even as data volumes grow. On top of that, it should be reliable and easy to manage.

Column-oriented databases provide these benefits. They increase I/O throughput because only the accessed columns are loaded. Data organized in columns compresses better, reducing the storage footprint. Furthermore, columnar formats make better use of the vector units and memory hierarchies of modern CPUs.

SparrowBI uses the open Parquet format, developed and maintained under the Apache Software Foundation. Parquet stores rows in blocks; within each block, the data is organized by column and compressed. A similar approach is used by Google BigQuery. Parquet provides the data-access performance of a database while retaining the benefits of file-based storage.
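As a sketch of that layout in practice, the following Scala snippet writes a dataset to Parquet with per-column compression and a target row-group (block) size, then reads back a single column, which only touches that column's chunks in each block. The paths, the 128 MB block size, and the revenue column are illustrative assumptions.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-demo").getOrCreate()

// Target row-group ("block") size for the Parquet writer, here 128 MB.
spark.sparkContext.hadoopConfiguration.setInt("parquet.block.size", 128 * 1024 * 1024)

spark.read.parquet("hdfs:///sparrowbi/staging/sales")
  .write
  .option("compression", "snappy")                   // column chunks are stored compressed
  .parquet("hdfs:///sparrowbi/archive/sales_cube")

// Reading a single column only touches that column's chunks in each block.
spark.read.parquet("hdfs:///sparrowbi/archive/sales_cube").select("revenue").show(5)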