According to supply chain and logistics organizations, the lack of clean, well-structured data is one of the biggest barriers to efficiency and cost-reduction programs. And while the process is still far from perfect, cleaning data is getting easier thanks to machine learning. Tools like Adiona's Diagnostics APIs improve the traditional Extract-Transform-Load (ETL) process by automatically detecting and sorting data from complex systems like ERPs and TMSs. Traditionally, ETL is a highly manual process dominated by tools like Excel. It can take dozens or even hundreds of hours for a skilled employee to work out how to port a dataset from one tool to another just to test it. And if the testing is unsuccessful, it can feel like wasted time. Even if testing is successful, there is still much more work required for a full integration, such as connecting an ERP with a solution API like ours. Let's look at an example. This spreadsheet shows a sample of data taken directly from an ERP system. Can you tell what all of the fields mean?
Some of them are fairly obvious to a human, but they may not be evident to an ETL script. You would need to manually select each field and map it to the corresponding field in the tool you want to use. For example, "Delv_Qty" in this context would correspond to "Quantity" in Adiona's system. Ensuring consistency in quantity units is also crucial: the source field might have two decimal places, but the target field might require an integer.
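A minimal sketch of what that manual mapping looks like in code. The field names and the dropped-field behavior are illustrative assumptions, not Adiona's actual API:

```python
# Hypothetical field map from ERP column names to target fields.
# "Delv_Qty" -> "Quantity" comes from the example above; the rest are invented.
FIELD_MAP = {
    "Delv_Qty": "Quantity",
    "Delv_Lat": "Latitude",
    "Delv_Lng": "Longitude",
}

def map_record(erp_row: dict) -> dict:
    """Rename ERP fields and coerce quantities from decimals to integers."""
    out = {}
    for src, value in erp_row.items():
        target = FIELD_MAP.get(src)
        if target is None:
            continue  # unmapped fields are simply dropped in this sketch
        if target == "Quantity":
            value = int(round(float(value)))  # "12.00" -> 12
        out[target] = value
    return out

print(map_record({"Delv_Qty": "12.00", "Delv_Lat": "-33.87", "Other": "x"}))
# prints {'Quantity': 12, 'Latitude': '-33.87'}
```

Every line of this dictionary is something a person had to figure out by hand, which is exactly the effort the rest of this article is about reducing.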
Moreover, some fields are NOT obvious even to a human! For instance, what is the difference between "Tkt_Location_Lat" and "Delv_Lat"? Which latitude value should be used to specify the delivery location in Adiona, and how can it be correctly mapped back to the ERP system? A human would need to investigate the context of the data to decide, and there are numerous fields requiring mapping.
This is where machine learning (ML) can help. In this context, ML uses existing data mappings as training examples to mimic a human's ability to make decisions from context. If you give a well-designed ML model enough examples (millions and millions), it can learn to judge the context of data in a similar way to a human. And of course, much faster!
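To make the idea concrete, here is a deliberately crude, non-ML stand-in: matching an unknown field name against names seen in previous human-approved mappings. A real model would learn from millions of examples and also inspect the values, not just the names; all field names here are invented for the sketch:

```python
import difflib

# Field names from past, human-approved mappings (invented examples),
# each paired with the target field a person chose for it.
KNOWN_FIELDS = {
    "quantity": "Quantity",
    "delivery_qty": "Quantity",
    "delv_lat": "Latitude",
    "dest_latitude": "Latitude",
}

def guess_field(name):
    """Return the most likely target field, or None if nothing is close."""
    match = difflib.get_close_matches(name.lower(), KNOWN_FIELDS, n=1, cutoff=0.6)
    return KNOWN_FIELDS[match[0]] if match else None

print(guess_field("Delv_Qty"))   # close to "delivery_qty", so "Quantity"
```

Even this toy version captures the key shift: instead of a person writing every mapping rule, past decisions become the guide for new ones.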
How does it accomplish this? The diagram below outlines three fundamental steps:
1) Analyze - the model parses the data.
2) Label - the model uses a variety of algorithms to 'label' each field with the correct data type.
3) Normalize - once the data is labeled correctly, a different set of algorithms transforms each field into the format required by the solution system.
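The three steps above can be sketched as follows. Hand-written regex rules stand in for the learned models here, and the patterns and target formats are assumptions made for illustration:

```python
import re
from datetime import datetime

def label(value: str) -> str:
    """Step 2: assign a coarse type label to a raw field value."""
    if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{4}", value):
        return "date"
    if re.fullmatch(r"-?\d{1,3}\.\d{4,}", value):   # high-precision decimal
        return "latitude"
    if re.fullmatch(r"\d+(\.\d+)?", value):
        return "quantity"
    return "text"

def normalize(value: str, kind: str) -> str:
    """Step 3: transform the value into a uniform target format."""
    if kind == "date":  # assume source dates are day/month/year
        return datetime.strptime(value, "%d/%m/%Y").date().isoformat()
    if kind == "quantity":
        return str(int(round(float(value))))
    return value

raw = "31/12/2024"          # Step 1 (analysis) would extract raw values like this
kind = label(raw)
print(kind, normalize(raw, kind))   # prints: date 2024-12-31
```

The difference in a real system is that the labeling rules are not written by hand; they are inferred from training data, which is what lets the pipeline cope with field formats nobody anticipated.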
Look closely at the diagram. You can see how values like addresses and dates are parsed, labeled, and then transformed into a uniform format so they can be ingested by our system. Notice how the unnecessary data in the address fields is also removed wherever possible.
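The address clean-up in particular can be illustrated with a small sketch. The noise patterns below (care-of lines, internal reference codes) are invented for this example, not an actual rule set:

```python
import re

# Illustrative patterns for "unnecessary data" a model might flag in
# an address field. Both patterns are assumptions made for this sketch.
NOISE = [
    r"\bC/O .*?(?=,|$)",          # care-of recipient lines
    r"\s*\(REF[:#]?\s*\w+\)",     # internal reference codes
]

def clean_address(addr: str) -> str:
    """Strip flagged noise, then tidy up leftover spaces and commas."""
    for pattern in NOISE:
        addr = re.sub(pattern, "", addr, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", addr).strip(" ,")

print(clean_address("C/O J. Smith, 12 Main St, Springfield (REF: X99)"))
# prints: 12 Main St, Springfield
```

Stripping this kind of noise matters because downstream steps, such as geocoding a delivery address, tend to fail or mis-locate when the field contains anything beyond the address itself.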
This technique not only speeds up the demonstration and ROI-calculation process for Adiona's system, but also partially automates the integration itself. We intend to release it as a standalone product suite in the future, usable for other solution integrations or data science investigations.