How Munich Re built a data lake fit for all its employees

The German reinsurance giant has built an internal portal giving all employees access to a data lake, with the hope that enterprising employees will create new business models

German reinsurance company Munich Re has built a self-serve portal for employees to access a data lake in the hope that they will unearth innovative new business models.

Speaking at the Dataworks Summit in Berlin this week, Andreas Kohlmaier, head of data engineering at Munich Re, said: "The game has changed in the last few years, it is no longer about who has the best experts and knowledge in the company and is more and more about who has access to the right data sources and who has the right technology in place to analyse and crunch that data."

The company set out to enable those experts to make use of data and technology in a more self-serve way, "to do what we have done for more than one hundred years, but better, to find answers to those new risks that come up and maybe to find some great and innovative business models," he added.

Munich Re has more than 40,000 employees and it effectively insures insurance companies against complex or major risks, such as natural catastrophes, large-scale infrastructure projects or emerging risks like cybersecurity incidents.

This means that it has very specialised underwriters who are increasingly reliant on data to help inform decision making.

"Last year's hurricane season was one of the most devastating and expensive ever, so we have experts in the group that really understand weather and know the effects of climate change," Kohlmaier said.

Proof of concept

The first proof of concept Kohlmaier ran, in 2015, was with the reserving team, which looks after pots of money earmarked for eventual claim payments. The aim was to free up the cash that was least likely to be needed so that it could be redeployed elsewhere in the business.

As Kohlmaier explained: "Whenever they wanted to tune and train their model it took them forever because they had to take out some sample data, run an iteration of the model on that sample data, then deploy the model to production and finally run it on the complete data set, so one iteration took five to seven days."

By loading that data into an in-memory engine, the reserving team was able to train and run models directly on the full data set. The resulting work freed up millions of euros for the business.

That's when other people in the organisation sat up and took notice and began to ask: "What happens when we give everyone in the company access to data and technology?"

The solution was a shared data lake that would be accessible to any Munich Re employee who wanted it.

"So we invested in infrastructure and selected the right tools for the data platform, the catalogue and ingested some first data sources and prepared authorisations," Kohlmaier said.

The resulting platform - which is built on open source big data technologies from the vendor Hortonworks - was launched with a short announcement on the company intranet in January 2017.

Kohlmaier anticipated that around 50 people across the organisation would request access to the data lake in total. However, after the first day he had 200 people registered and logged in to the platform.

This caused some confusion within the organisation, which started to ask how it had so many employees interested in a data lake in the first place. After just one week the platform had picked up 500 users, at which point the team had "a whole new set of challenges," as Kohlmaier put it.

Now the team had to cater for a whole new set of users who expected a point-and-click analytics platform. So over the next year it worked on making the platform more accessible and user-friendly for them, incorporating technology from self-serve analytics specialist SAS. Then, in October 2017, Data Lake 2.0 was launched.

Data hunters

To help support these new users, Munich Re also created a dedicated data intake team to help identify which data sources to bring to the data lake and to what quality level.

It also developed a team of what it calls 'data hunters', tasked with "searching for interesting data sources for your use case both inside and outside the company," as Kohlmaier explained.

"So if you have a good idea and know what you want to do but are missing a piece of data those data hunters go out and help you find and acquire that data, clean it, prepare it and bring it to the data lake so you can use it," he added.

Finally, all of this work has led to a comprehensive training programme at Munich Re.

"You will definitely need a dedicated education programme for both data engineering and data science," Kohlmaier advised. The company is aiming to upskill around 2,000 people this year with data engineering, data modelling and analytics skills.

"I have personally no idea what will happen next year when we have 10,000 people using this, but I am pretty sure it will be interesting," he concluded.
