From data lakes to machine learning platforms

The new challenges of cloud-based data

The data market is evolving very quickly.

Until recently, data engineering platforms were focused on processing information and routing data. Today, more than half of our customers are looking for a global platform that integrates data science layers with data lakes and data pipelines.

The industrialization of tools that help data scientists and data engineers create models and put them into production is now a clear market trend.

Data lakes are now an asset whose best practices are well known and mastered. The situation is different in data science, where what counts as the state of the art is not yet settled. On these platforms, our work moves between R&D and industrialization. Our goal is to identify the best practices, apply them within such platforms and support our customers in implementing them. This is a transitional phase before further industrialization.

In five years, platforms for operating machine learning (ML Ops) will be deployed automatically on AWS; the procedures will be standardised and there will be no more confusion as to how to deal with them.

In this e-book, we give you an overview of the challenges in the field of data processing in 2021:

  • Best Practices for Data Lakes
  • Origin of ML Ops
  • Feedback from our customer experience – Olympique de Marseille case study
  • Use case: anomaly detection
  • Focus on the data scientist profession