Microsoft is on a mission to provide the most comprehensive “Intelligent Cloud” on the planet. Significant investments have been made in the Business Applications domain, with tighter integration with Azure Data Platform services such as Data Lake and Synapse Analytics as a main theme. But what happens once the data lands in the lake? Which patterns are commonly used by our customers, and why? This article aims to answer these questions based on our experience and reference patterns provided by Microsoft.
Essentially, once the data is exported from Dynamics 365 and lands in the lake, it is stored as .csv files following the Common Data Model conventions. At this stage, the data provides very limited value, since the files lack headers, relationships and so on. Instead, all metadata is stored in accompanying .json files. See an example below showcasing Customer table data exported as a .csv file to an Azure Data Lake container.
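To make this concrete, the sketch below shows how column names from the metadata file can be attached to the headerless rows. The file contents are simplified stand-ins (real CDM manifests are considerably richer, and the Customer attributes shown are illustrative):

```python
import csv
import io
import json

# Simplified stand-ins for what Dynamics 365 drops in the lake:
# a headerless .csv with the rows, and a .json file carrying the metadata.
customer_csv = "1,Contoso,Seattle\n2,Fabrikam,Oslo\n"
customer_metadata = json.dumps({
    "entityName": "Customer",
    "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "Name", "dataType": "string"},
        {"name": "City", "dataType": "string"},
    ],
})

def read_cdm_entity(csv_text: str, metadata_json: str) -> list[dict]:
    """Attach column names from the CDM-style metadata to headerless rows."""
    meta = json.loads(metadata_json)
    columns = [attr["name"] for attr in meta["attributes"]]
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(columns, row)) for row in reader]

rows = read_cdm_entity(customer_csv, customer_metadata)
print(rows[0])  # {'CustomerId': '1', 'Name': 'Contoso', 'City': 'Seattle'}
```

In a real scenario, services such as Synapse Analytics perform this join between data and metadata for you; the point here is simply that the .csv files alone carry no schema.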
So the question at this stage becomes: how do we create insights from this pile of data?
BI and Analytics Architecture Patterns
Microsoft outlines three common BI and Analytics architecture patterns in a recent “Scenarios and Architecture Patterns” Tech Talk, intended to illustrate how insights can be created from data:
# 1 – Logical Data Warehouse
# 2 – (Cloud) Data Warehouse
# 3 – Data Lakehouse
While, in our experience, the Data Warehouse pattern (#2) has been the main approach for most Business Applications clients throughout the last decade, patterns #1 and #3 have seen a surge in interest in recent years, driven by megatrends such as cloud adoption, Big Data and Artificial Intelligence.
From a tooling perspective, Microsoft caters to all three patterns through its comprehensive stack of Azure Data Platform services. However, given the market interest in the Data Lakehouse pattern, we will take a closer look at the rationale for adopting it in a Dynamics 365 context.
First of all, let’s clarify what a Data Lakehouse architecture actually is. In the article “Why a Delta Lakehouse?”, Robert Thompson and Geoff Freeman describe it as:
“A Data Lakehouse is simply a data warehousing principle applied over data in a Data Lake. Data is exposed through delta tables, an open-source protocol for interacting with the data in a lake in an ACID-compliant way”.
This means that instead of loading the data into a central repository such as a SQL database, the data stays in the lake in a flat file structure. Another key difference from the Data Warehouse is that a data lake stores data without arranging it into predefined logical relationships, which is a key enabler for advanced analytics and AI. See the Microsoft illustration below highlighting the main components and corresponding Azure Data Platform services:
Key concepts of this pattern are described below:
Ingest – Extract data from on-prem and cloud data sources and load into the Data Lake
Enrich, Transform – The key component of the Data Lakehouse architecture. Here, data is divided into zones depending on how far it has been shaped to cater for end-user needs. Typically, three zones are used:
- Bronze – Landing zone for all data ingested into the Data Lake. Either stored as-is for batch patterns or as aggregated datasets for streaming workloads
- Silver – Filtered and enriched data for exploration purposes according to business needs. Stores the latest state of the data.
- Gold – Curated, well-structured data compatible with the organization's BI tools and AI algorithms. This zone can also feed a traditional Data Warehouse during a transition period.
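The zone progression above can be sketched with plain Python over in-memory records. The Customer change feed is hypothetical, and real pipelines would typically use Spark and Delta tables rather than Python lists; the sketch only illustrates how each zone refines the previous one:

```python
from collections import defaultdict

# Bronze: raw, append-only landing data, possibly containing
# several versions of the same record.
bronze = [
    {"customer_id": 1, "city": "Seattle", "version": 1},
    {"customer_id": 2, "city": "Oslo",    "version": 1},
    {"customer_id": 1, "city": "Redmond", "version": 2},  # later update
]

# Silver: filtered to the latest state of each record.
latest = {}
for row in bronze:
    key = row["customer_id"]
    if key not in latest or row["version"] > latest[key]["version"]:
        latest[key] = row
silver = list(latest.values())

# Gold: a curated aggregate ready for BI, e.g. customers per city.
gold = defaultdict(int)
for row in silver:
    gold[row["city"]] += 1

print(dict(gold))  # {'Redmond': 1, 'Oslo': 1}
```

Note how Silver keeps only the current state (customer 1 ends up in Redmond, not Seattle), while Gold reshapes the data for a specific reporting need.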
Visualize – Exposes the data in a way that makes sense to end-users through reports and dashboards
Features and Tools – Microsoft's investments in providing a seamless analytics experience for Dynamics 365 customers. One example is the native service for automatically exporting Dynamics 365 data to the Data Lake in near-real time. Another is the CDMUtil tool for materializing views over Dynamics 365 data in Synapse Analytics according to the Common Data Model schema.
The Data Lakehouse pattern is a relatively new concept on the data analytics scene. The term first appeared in 2017 in relation to the Snowflake platform, and became widely known in 2020 when Databricks adopted it for its Delta Lake platform.
Benefits and Limitations of the Data Lakehouse Pattern
As always, there are tradeoffs to be made when choosing between BI and Analytics patterns. Below are examples based on our experience.

Benefits:
- Cost-efficient cloud storage for Big Data scenarios
- Flexible, since structured, semi-structured and unstructured data can all be managed
- Strong support for data science and machine learning experimentation

Limitations:
- Risk of creating a “data swamp” rather than a Data Lake if proper governance is not established
- Good fit for cloud environments but less so for on-premises, since the pattern relies on separating compute and storage
- A relatively new (less mature) technology with lower adoption, which may require additional training and technology assessment activities
Success factors during implementation
Whether you want to modernize legacy systems to support advanced analytics scenarios or build a new BI- and Analytics solution from scratch, there are key considerations to be evaluated.
First, ensure the right mix of competence: Dynamics 365 specialists for business process and data model understanding, including how to configure the native export of data to the Data Lake, and BI and Analytics specialists to realize your target architecture. If using partners, this typically involves onboarding separate niche players for Dynamics 365 and Analytics competence.
Secondly, familiarize yourself with the core concepts and the available standard capabilities, preferably by executing a Proof-of-Concept in a Sandbox environment where a simple hypothesis is validated. For example: export Dynamics 365 data to the lake, apply a compute service such as Apache Spark in a notebook to curate the data, and finally visualize the insights in a Power BI report.
Finally, ask yourself why you need a data storage solution. Is it for regulatory reporting, business intelligence, real-time analytics and/or data science? Ensure that your analytics needs are well defined so that you target the correct personas across your organization. Dynamics 365 F&SCM, for instance, ships with a vast number of reporting and BI capabilities catering for various needs; exporting the data to a data lake for further processing is only one available option. Understanding which capabilities are available out-of-the-box will ensure you maximize the value of your existing Dynamics 365 investments.
Would you like to know more? Contact us at firstname.lastname@example.org