Integrating SAP Data into Snowflake using Azure Data Factory | A Fast & Easy Way to Transform Data
“Data isn’t the new oil — it’s the new nuclear power.”
Hello Members,
This blog aims to demonstrate SAP data replication to the Snowflake platform to build an independent data warehousing scenario.
As this strange pandemic year draws to a close, I am sitting down to write my last blog of 2020. It walks through the resources you need to play with your SAP data using Azure services and build a robust data analytics platform on Snowflake.
As we all know, Santa Claus runs on SAP, and his North Pole supply chain data is growing every year. So the Elves' IT team started exploring Industry 4.0, machine learning, and serverless services, including fully automated, zero-administration, high-performance data warehouses like Snowflake and Azure Synapse combined with a data lake, so that they could focus on growing the Lines of Business (LoB) on SAP S/4HANA without any trouble.
Sounds very interesting…Let’s keep scrolling to see what’s next!
1. Preamble
According to Gartner (my favorite one!), the public cloud services market continues to grow, largely due to the data demands of modern applications and workloads. And data is one of the leading factors in this transition. In recent years, organizations have struggled with processing big data: sets of data large enough to overwhelm commercially available computing systems. (Source: Gartner news)
We know that data engineers and data scientists love to use SQL to solve all kinds of data problems, and it gives them a firm grip on manipulating data views. To get insights from this data, you extract and load it from various sources into a data warehouse or data lake. To build what can sometimes be fairly complex ETL/ELT data pipelines, enterprises prefer modern tools like Azure Data Factory, Talend, Stitch, Qlik, and many more. Depending on your architecture and data requirements, you might choose one or multiple ETL/ELT tools for your use case. I'm going to leverage my favorite Azure service, Azure Data Factory (ADF), which is Microsoft's fully managed, serverless data integration tool.
It allows developers to build ETL/ELT data processes called pipelines, with drag-and-drop functionality and numerous pre-built activities. There are dozens of native connectors for your data sources and destinations: from on-prem file systems and databases such as Oracle, DB2, and SQL Server, to applications such as Dynamics 365 and Salesforce, to cloud solutions such as AWS S3, Azure Data Lake Storage, and Azure Synapse. ADF delivers the most value when your data pipelines provide the fuel to power analytics efforts; without data, you have nothing to load into your system and nothing to analyze. Azure Data Factory provides 90+ built-in connectors, allowing you to easily integrate with various data stores regardless of variety or volume, whether they are on-premises or in the cloud. The Snowflake connector is the latest addition and is available as well. I'm still exploring it!
The aim is to load our SAP data into Snowflake, in batch or near real-time mode, using Azure Data Factory in a plug-and-play fashion.
For newbies: Snowflake is a cloud-based data warehouse solution offered on all the big hyperscalers, Microsoft Azure, AWS, and GCP. Once you've configured your account and created some tables, you most likely have to get data into your data warehouse. But sometimes you also have to export data from Snowflake to another destination, for example to provide data to third parties. You always receive a 30-day free trial with $400 of credit from Snowflake when opening a new trial account to explore and test your data, which is a huge advantage for anyone looking to explore it for the first time, just like me!
To learn about ADF's overall support for SAP data integration scenarios, see the SAP data integration using Azure Data Factory whitepaper, with a detailed introduction to each SAP connector, a comparison, and guidance.
2. High-Level Architecture scenarios on Azure
- SAP data to Snowflake Cloud Data warehouse on Azure
(the SAP-specific scenario we are going to explore in this blog)
SAP Data > Azure Blob Storage > Snowflake, using the SAP Table connector in Azure Data Factory (ADF)
- Example of ERP/CRM order processing and social data landing in Snowflake on Azure, with integrated dashboards to leverage
Source: Snowflake
- All structured/unstructured data flowing to the Snowflake Cloud Data Warehouse on Azure
(end-to-end scenario with 3rd-party options to explore!)
Source: Snowflake
3. Getting Started
For this SAP-to-Snowflake integration scenario, the following Azure services are used: Azure Blob Storage, Azure Data Factory (linked services, datasets, and data flows), and a Snowflake account on Azure.
Let’s list the prerequisites in order:
- Access to SAP application tables (ECC, S/4HANA, NetWeaver; minimum SAP_BASIS 701 or above)
- Azure subscription with a running Blob Storage account and Azure Data Factory access
- Snowflake Subscription with AccountAdmin ($400 credit for a new account w/o credit card!)
- Windows Remote desktop with admin access (It’s optional, I used my laptop!)
- 64-bit SAP Connector for Microsoft .NET 3.0 from SAP’s website
4. Steps
The SAP Table connector is leveraged to fetch the SAP table in this demonstration.
Source: Microsoft
The SAP table should be available, per the prerequisites, to run this scenario. Let's do a quick check of the SAP table count and details on SAP S/4HANA before running the main steps.
Table name: ZSAPSALESDATA, with 100K records created manually using a dummy sales dataset.
(I leveraged a custom ABAP FM/Z-program to upload the large dummy sales data; a bit of ABAP skill helped! Let me know in the comments section and I can provide inputs if required.)
1. Log in to your Azure Portal, open Azure Data Factory, and create a new pipeline.
2. Right-click under the "Pipelines" section to create a new pipeline and name it Copy_SAPTable2Blob, as shown in the picture below.
3. Create a new folder, SAP2csv2Snowflake, under the Datasets section, right-click on it, and create a new dataset of type Azure Blob Storage, as shown in the screenshots below.
4. It will appear on the right panel of the portal for selection. Select Azure Blob Storage and then pick the CSV format type for the dataset.
Select DelimitedText (CSV) as shown below and click the "Continue" button.
Snowflake loads delimited files such as CSV, along with semi-structured formats like JSON and Parquet, so CSV is a safe choice here.
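If you ever load the exported file into Snowflake yourself rather than through ADF, a file format object tells Snowflake how to parse the CSV. Below is a minimal sketch: the database and schema match the table created later in this blog, but the format name SAP_CSV_FORMAT and its options are my own placeholders, so adjust them to your actual export settings.

-- Hypothetical CSV file format for the file ADF writes to Blob Storage.
CREATE OR REPLACE FILE FORMAT SAPDATABASE.PUBLIC.SAP_CSV_FORMAT
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1                       -- skip the column header row
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  EMPTY_FIELD_AS_NULL = TRUE;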
5. Set the dataset name to "DelimitedText2sap" and create the linked service by clicking the +New option.
6. Select "Azure" under the integration runtime setup and click Create. This step requires installing the .NET Connector for SAP on your local machine or remote desktop (from the Azure Portal).
Link to download with full instructions — https://support.sap.com/en/product/connectors/msnet.html
7. Provide Integration runtime name and other options as shown below
8. Connect Blob Storage using the SAS URI authentication method (this is required for Snowflake); see the sketch below for what SAS-based access looks like on the Snowflake side.
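For context, SAS-based access is also how Snowflake itself reads from Azure Blob Storage. The statement below is purely illustrative and not part of the ADF setup: it shows what an external stage with SAS authentication would look like on the Snowflake side, with the storage account, container, and token left as placeholders.

-- Illustrative only: an Azure Blob external stage secured with a SAS token.
-- SAP_BLOB_STAGE and SAP_CSV_FORMAT are placeholder names used in this blog's sketches.
CREATE OR REPLACE STAGE SAPDATABASE.PUBLIC.SAP_BLOB_STAGE
  URL = 'azure://<storage-account>.blob.core.windows.net/<container>'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
  FILE_FORMAT = (FORMAT_NAME = 'SAPDATABASE.PUBLIC.SAP_CSV_FORMAT');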
9. The Blob Storage connection for the CSV file is now established.
10. Create a new SAP dataset for the SAP Table connection.
11. Click "+New" under linked service, fill in the SAP login details, and run "Test Connection"; the connection with the SAP application is now established.
12. Now run "Test connection" and "Preview data" on the table to ensure the SAP connection works and the data is visible.
SAP table ZSAPSALESDATA contains 100K records for the initial run; we'll then trigger 1 million records as a second round of the demo.
(Important: always enter the SAP table name in upper case, or ADF might not be able to fetch data from the SAP application.)
13. Log in to your Snowflake trial account and create the SAP table structure with the SQL script below.
CREATE TABLE "SAPDATABASE"."PUBLIC"."SAPSALESDATA" ("ORDERID" STRING NOT NULL, "REGION" STRING NOT NULL, "COUNTRY" STRING NOT NULL, "ITEMTYPE" STRING NOT NULL, "SALESCHANNEL" STRING NOT NULL, "ORDERPRIORITY" STRING NOT NULL, "ORDERDATE" STRING NOT NULL, "SHIPDATE" STRING NOT NULL, "UNITSSOLD" STRING NOT NULL, "UNITPRICE" STRING NOT NULL, "UNITCOST" STRING NOT NULL, "TOTALREVENUE" STRING NOT NULL, "TOTALCOST" STRING NOT NULL, "TOTALPROFIT" STRING NOT NULL) COMMENT = 'SAP sales data in Snowflake';
Check that the table structure was created successfully.
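To confirm it, you can run a couple of quick checks in a Snowflake worksheet; these are standard statements, shown here only as a convenience.

-- Confirm the table exists with the expected columns before loading any data.
DESCRIBE TABLE "SAPDATABASE"."PUBLIC"."SAPSALESDATA";
-- The table should still be empty at this point.
SELECT COUNT(*) FROM "SAPDATABASE"."PUBLIC"."SAPSALESDATA";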
14. Click on the Snowflake dataset and verify with Test connection and Preview data; you can see the table structure in the data preview, as shown below.
15. All three datasets are created successfully; time to publish them all.
Click the "Publish All" button in the header, and you will receive a notification when activation completes.
16. Click "Trigger now" once publishing is complete, and monitor the load from the SAP table to Azure Blob Storage.
17. Go to Monitor on the left panel and select the Details link to follow the load in real time.
18. Monitor the SAP table load in progress!
19. Once the SAP table is loaded into Blob Storage, verify in Blob Storage that the CSV/txt file is available; just a quick verification checkpoint!
20. Once Blob Storage is connected to Snowflake, you can trigger a pipeline using Snowflake as the sink (created in step #14 earlier!); see the sketch below for a rough idea of the equivalent load statement.
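Behind the scenes, loading staged CSV files from Blob Storage into a Snowflake table comes down to a COPY INTO statement. ADF's Snowflake sink handles this for you, but a rough manual equivalent, using the placeholder stage and file format sketched earlier, would look like this.

-- Rough manual equivalent of the pipeline's load step, assuming the placeholder
-- stage SAP_BLOB_STAGE and file format SAP_CSV_FORMAT defined above.
COPY INTO "SAPDATABASE"."PUBLIC"."SAPSALESDATA"
  FROM @SAPDATABASE.PUBLIC.SAP_BLOB_STAGE
  FILE_FORMAT = (FORMAT_NAME = 'SAPDATABASE.PUBLIC.SAP_CSV_FORMAT')
  ON_ERROR = 'ABORT_STATEMENT';   -- fail fast if any row cannot be parsed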
21. Verify the data in the Snowflake account; the table is created and the data is loaded.
22. The final data preview in the Snowflake account shows the same row count.
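A simple way to double-check the transfer is to compare the row count in Snowflake with the source extract (100K in the first run, 1 million in the second).

-- Row count should match the number of records extracted from ZSAPSALESDATA.
SELECT COUNT(*) AS LOADED_ROWS
FROM "SAPDATABASE"."PUBLIC"."SAPSALESDATA";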
SAP Sales table data transfer accomplished!!
Now we can take the data to the next level with limitless analytics on the Snowflake data warehouse and Power BI dashboards, based on business requirements.
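As a simple example of the kind of aggregation a sales dashboard could be built on, the query below sums revenue and profit per region. It is my own illustration rather than part of the original demo; since the columns were created as STRING in step 13, the numeric fields are cast before aggregating.

-- Example analysis: total revenue and profit per region.
-- TRY_TO_NUMBER is used because the demo table stores numeric values as STRING.
SELECT
    "REGION",
    SUM(TRY_TO_NUMBER("TOTALREVENUE", 18, 2)) AS TOTAL_REVENUE,
    SUM(TRY_TO_NUMBER("TOTALPROFIT", 18, 2))  AS TOTAL_PROFIT
FROM "SAPDATABASE"."PUBLIC"."SAPSALESDATA"
GROUP BY "REGION"
ORDER BY TOTAL_REVENUE DESC;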
22.Sales & Region Dashboard created using Power BI Platform. It is a powerful dashboarding tool to exact the data from various sources like Snowflake data warehousing, Salesforce, SAP, etc., including social data to create a beautiful sales trend analysis report for Business stakeholders.
With Power BI Desktop, you can connect to data from many different sources.
For a full list of available data sources, see Power BI data sources.
Thank you for reading!
A Very Happy New Year 2021
Reference and Sources
- Azure Synapse vs. Snowflake — Good Comparison blog by Jagjeet Makhija, Snowflake Expert
- ADF now supports data integration with Snowflake — Blog by Linda Wang, Microsoft
- Microsoft documentation articles
- What is Azure Data Factory? — Microsoft documentation
- Azure Data Factory official YouTube channel
- Integrating your SAP data by using Azure Data Factory (ADF) by Ross Loforte, Microsoft
Thanks to Jagjeet Makhija for his valuable Snowflake & Power BI technical inputs.
Disclaimer
These are my personal opinions and thoughts. They do not represent any formal opinions, POVs, inputs, product road-maps, etc., from my current or past employers, partners, and/or any clients.