WidasConcepts presents "Big Data Science Architecture for Continuous Technology Transfer from Research to Industry Operations" at LWA 2015 in Trier
We would be happy to welcome you at LWA 2015 in Trier, from 7 to 9 October, where our colleague Richard Leibrandt will present the following:
Abstract. Big Data without analysis is hardly anything but dead weight. But how should it be analysed? Finding algorithms to do so is one of the Data Scientist's jobs. However, we would like not only to explore our data, but also to automate the process by building systems that analyse our data for us. A solution should enable research, meet industry demands and enable continuous delivery of technology transfer. For this we need a Big Data Science Architecture. Why?
Because in projects, Big Data and Data Science cannot be handled separately, since they influence each other. Their complexities do not simply add up but multiply; so to speak, Big Data Science ≠ Big Data + Data Science, but Big Data Science = Big Data × Data Science. (Luckily, the gain multiplies as well.)
This complexity boost arises from the clash of two different worlds: scientific research programming (Data Science) and enterprise software engineering (Big Data). The former thrives on exploratory experiments, which are often messy, ad hoc and uncertain in their findings. The latter, however, requires code quality and fail-safe operation. In industrial settings these are achieved with well-defined processes that emphasize access control and automated testing and deployment.
We present a blueprint for a Big Data Science Architecture, which we implement in two industry products. It includes data cleaning, feature derivation and machine learning, using batch and real-time engines. It spans the entire lifecycle with three environments: experiments, close-to-live tests and live operations, enabling creativity while ensuring fail-safe operation. It takes the needs of all three groups into account: data scientists, software engineers and operations administrators.
Data can be creatively explored in the experimental environment. Thanks to strict read governance, even if things get messy, no critical systems are endangered. Once algorithms have been developed, a technology transfer to the test environment takes place; this environment is built identically to the live-operations environment. There the algorithm is adapted to run in automated operations and tested thoroughly. On acceptance, the algorithms are deployed to live operations.
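The staged transfer described above can be illustrated with a small sketch. It is not taken from the paper: the names `Env`, `Algorithm` and `promote` are hypothetical, and the only rules modelled are the ones stated in the abstract, namely that an algorithm moves from experiments to the test environment and reaches live operations only on acceptance.

```python
from dataclasses import dataclass, field
from enum import Enum


class Env(Enum):
    """The three environments named in the abstract (labels are illustrative)."""
    EXPERIMENT = "experiment"
    TEST = "test"
    LIVE = "live"


# Allowed transfers: experiment -> test -> live, never skipping a stage.
NEXT_ENV = {Env.EXPERIMENT: Env.TEST, Env.TEST: Env.LIVE}


@dataclass
class Algorithm:
    """Hypothetical record tracking where an algorithm currently lives."""
    name: str
    env: Env = Env.EXPERIMENT
    tests_passed: bool = False  # assumed acceptance flag set by the test stage
    history: list = field(default_factory=list)


def promote(algo: Algorithm) -> Algorithm:
    """Move an algorithm one stage forward, enforcing the acceptance gate."""
    if algo.env not in NEXT_ENV:
        raise ValueError(f"{algo.name} is already in live operations")
    if algo.env is Env.TEST and not algo.tests_passed:
        raise PermissionError("acceptance tests must pass before going live")
    algo.history.append(algo.env)
    algo.env = NEXT_ENV[algo.env]
    return algo


if __name__ == "__main__":
    model = Algorithm("example-model")
    promote(model)             # experiment -> test
    model.tests_passed = True  # acceptance granted in the test environment
    promote(model)             # test -> live
    print(model.env.value, [e.value for e in model.history])
```

Keeping the allowed transitions in a single table makes it impossible to jump from experiments straight to live operations, which is the property the three-environment design relies on.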
Published: 02 September 2015
Keywords: Big Data, Data Science, Architecture, Industrial Challenges, Technology Transfer, Continuous Delivery, Batch-Processing, Real-Time-Processing, Hadoop, Storm, R, Cassandra DB, ElasticSearch, Kafka
© 2015 by the paper's authors. Copying permitted only for private and academic purposes. In: R. Bergmann, S. Görg, G. Müller (Eds.): Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB