Connect Big Data Architecture with SAP

By Elaina Shekhter, CMO, EPAM Systems

One of the top technology trends impacting information infrastructures is Big Data. Even before the rise of connected devices, the volume of data collected has been doubling every two years and is expected to reach 44 zettabytes, or 44 trillion gigabytes, by 2020. Given the pace of evolution in Big Data platforms, the challenge for CIOs, beyond the initial investment, is to optimize the utility of these new tools by combining them with traditional data capture and analysis tools.

PROVIDING DEEPER BUSINESS INSIGHTS
SAP Big Data solutions enable connections between existing platforms and emerging Big Data tools, but connectivity and access continue to present challenges for real-time use. SAP’s in-memory platform, HANA, addresses one very important aspect of Big Data – fast access to predictive analytics and insights at real-time speeds. However, in-memory storage comes with a higher price tag, which cannot always be justified.

A hybrid approach utilizing the native competencies of SAP and its superior integration capabilities presents an attractive alternative that delivers a balance of speed, performance and cost. Best practices in establishing a successful Connected Reference Architecture include starting with a zero-based cost model and building modularization and extension capabilities up front. In addition, the need to manage and pay for multiple servers should be built into the cost model, and a proper decision tree should be established to map the optimal tool to each key business challenge. This cooperative approach allows the architecture to take advantage of the typical Big Data framework and tools, while at the same time integrating the SAP landscape already in place within many larger enterprises.

Most cost models will include the following:

  1. A classic Big Data solution (based on Hadoop or Sybase IQ) – for Big Data information gathering and storage. Configurations will vary and depend on the flavor of Hadoop, but will include Core File System, Data Access, Algorithm support and Data Import functions.
  2. A streaming processor (SAP HANA Smart Data Streaming) – for data collection, alerts monitoring, and information aggregation.
  3. An ETL solution (SAP BusinessObjects Data Services) – for data transformation.
  4. An In-Memory solution (SAP HANA) – for data slice collection and aggregation.
  5. A Predictive Analytics solution (SAP Predictive Analysis, using the SAP HANA Column Store Database and Predictive Analysis Library).
  6. Visualization Tools (SAP BO WEBI and SAP Lumira) – for exploration and display.
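
The decision tree that maps a business challenge to one of the components listed above can be sketched in a few lines of Python. This is a minimal illustration only: the Workload fields, the order of the questions and the returned labels are hypothetical assumptions made for the example, not part of any SAP product or of the reference architecture itself.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a business challenge (illustrative fields)."""
    name: str
    is_continuous_feed: bool  # does data arrive as an unbounded stream?
    needs_prediction: bool    # is forecasting or modelling required?
    needs_realtime: bool      # must answers arrive within seconds?
    needs_reshaping: bool     # must data be cleansed or restructured first?


def choose_component(w: Workload) -> str:
    """Walk an assumed decision tree and return one component category."""
    if w.is_continuous_feed:
        return "Streaming processor"
    if w.needs_prediction:
        return "Predictive analytics"
    if w.needs_realtime:
        return "In-memory store"
    if w.needs_reshaping:
        return "ETL"
    # Default: the cheapest tier, suited to bulk gathering and storage.
    return "Big Data store"


sensor_feed = Workload("machine sensor feed", True, False, False, False)
archive = Workload("ten-year log archive", False, False, False, False)
print(choose_component(sensor_feed))
print(choose_component(archive))
```

Placing the cheapest tier last as the default reflects the cost argument made earlier: a workload moves to the more expensive in-memory tier only when it answers yes to a question that justifies the price.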

CONCEPT IN ACTION
One of EPAM’s customers, a manufacturer of machine tools, recently faced a Big Data challenge. Embedded machine sensors generate billions of data sets that must be captured, curated, analyzed, searched and stored in order to be actionable. Separately, the manufacturing plants run SAP ERP PM to manage all maintenance activities, and the collection of device data is independent of operational applications, creating an opportunity to address a ‘blind spot’ for the business.

A two-phased approach was used to plan and implement a Connected Architecture for the manufacturing plant. In Phase One, data replication (or transformation with ETL) was enabled from SAP ERP to SAP HANA and, in Phase Two, SAP HANA Smart Data Streaming was set up. In the latest version of SAP, data replication to HANA is already enabled, so Phase One could be condensed into a short verification work stream. With SAP HANA Smart Data Streaming, data from sensors is collected, aggregated and uploaded into SAP HANA. Once implemented, the new integrated platform enabled a range of data exploration, reporting and real-time, high-performance analysis. Additional benefits were achieved with predictive analytics and SAP PA, allowing the operations managers to predict, model and plan maintenance needs – optimizing maintenance operations, performance and costs.
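
To make the "collected, aggregated and uploaded" step concrete, the sketch below shows the general idea of window-based aggregation in plain Python. It is a generic illustration and not SAP HANA Smart Data Streaming code: the function name, the 60-second window, the sensor names and the sample readings are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_window(readings, window_seconds=60):
    """Group (timestamp, sensor_id, value) readings into fixed time windows.

    Returns sorted rows of (window_start, sensor_id, count, mean, max),
    the kind of compact summary that would be loaded into an in-memory store
    in place of every raw reading.
    """
    buckets = defaultdict(list)
    for ts, sensor_id, value in readings:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[(window_start, sensor_id)].append(value)
    return [
        (start, sensor, len(vals), mean(vals), max(vals))
        for (start, sensor), vals in sorted(buckets.items())
    ]


# Hypothetical sample data: timestamps in seconds, temperatures in degrees.
readings = [
    (0, "spindle-1", 70.0),
    (30, "spindle-1", 74.0),
    (65, "spindle-1", 90.0),
    (10, "spindle-2", 55.0),
]
rows = aggregate_by_window(readings)
for row in rows:
    print(row)
```

Only the aggregated rows, rather than the raw readings, would then need to reach the more expensive in-memory tier, which is the cost lever behind the hybrid approach.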

Truly integrated Big Data is still in the planning stages in most organizations, but, as businesses continue the path forward into fully digital operational models, the need for handling complex streams of device data and for connecting the architectures of existing platforms will only increase. Combining established platforms like SAP with new best-practice architectures can help companies measure new things in new ways to mitigate some of the risks of adoption and establish scalable and upgradable architectures in the process.