Ask an Expert Season 2 Episode 1
Xilin Cheng and Kathleen Taggart from DataXstream discuss building Intelligent Automation (IA) and the advantages of using SAP Data Intelligence for MLOps. The team goes over the problem wholesale distributors face in trying to fulfill large quotes and orders accurately and efficiently. Traditionally, this required Customer Service Representatives (CSRs) who understood SAP GUI and had years of industry experience to spend hours inputting data. With Intelligent Automation, optical character recognition (OCR) is used to read text into the machine learning models built by Xilin. He discusses the natural language processing (NLP) techniques used, in addition to the advantages of developing machine learning pipelines in SAP Data Intelligence. Kathleen and Xilin then wrap up with the future of Intelligent Automation and the potential of using machine learning to win more deals and improve the customer buying experience.
Hi and welcome! Thanks for joining us today! I’m your host, Kathleen Taggart from DataXstream, and I’m here with Xilin Cheng. We will be going through our current project, Intelligent Order Creation, how we were roped into the SAP Data Intelligence Content Sprint, and how we’ve been able to collaborate with those at SAP and the other partners working with SAP Data Intelligence to come up with some really interesting, innovative solutions that use the DI platform to better serve our customers. So Xilin, how about you introduce yourself and give us a little bit of your background?
Hi everyone, my name is Xilin Cheng. I’m a software developer at DataXstream, and I’m also working as a machine learning engineer. I build the infrastructure for DataXstream’s testing framework too, but today we’re going to focus on the work we did around SAP Data Intelligence and the project we are working on. Kat is going to give you more details about that.
Awesome, thanks Xilin. As he mentioned, he’s a software developer at DataXstream. We’ve known each other since college. RAH! Got your ring on? We both came over to DataXstream right after graduation to join the machine learning development team. Since then I’ve transitioned into marketing, and I’m now focusing on the technical marketing around data intelligence and robotic process automation. So I guess we’ll just start with why we’re here today. Can you give us a quick, high-level overview of what Intelligent Order Creation is in the first place and what it does for our customers? How did we even start this project?
Yeah, absolutely. This entire project started with a very basic business question. Suppose the client sends a purchase order as PDF files that contain 100 different materials, with a different quantity for each material. It usually takes people hours to search for those materials in the SAP database one by one, input them, and then form an SAP order from there. This is a long process, very labor intensive, with lots of repetitive work, and human error during this process is usually a big factor in whether the deal gets done or not. That is why we’re building this OMS+ Intelligent Order Creation project. It basically aims to have AI do all the labor-intensive, repetitive work, but much faster and much more accurately, using OCR technology. We have also successfully integrated it with OMS+ and the SAP system, which makes the project very practical for any organization using SAP. That’s basically the initial intent for starting this project: to help our clients speed up this entire order creation process, especially when there’s a large number of orders involving a large number of materials.
Absolutely. It improves the customer experience when they submit a quote and get it back in a couple of hours, or even within a couple of minutes, versus having to wait weeks to project how their project is going to go and what the budget forecast is going to be. I guess we originally saw this project come up in 2020, when one of our clients was trying to build this for themselves in-house and they asked, “You know… could you do this?” and we were like, “Yeah! Absolutely! Are you guys using a machine learning platform or using data intelligence?” They had started with a classification system, and our senior machine learning engineer was able to come in and really build out the algorithms, using cosine similarity and OCR, to improve the potential of the product that we had going on. They’ve been very happy with the results, and we’re continuing to demo that to them, and it’s also led to other projects too. So, it’s been really exciting to see how that’s evolved.
Definitely. I think this project has a lot of potential to extend into further areas. There’s so much more technology we will be able to integrate into the prototype, the basis of the project we have already built, and I think it should be very flexible in satisfying any client needs when it comes to OCR, order creation, or even the search engines we’re going to be talking about later.
I think one of the key terms here is robotic process automation: people are realizing that machine learning can do a lot of business functions that traditionally were done by someone behind the counter. Let those people go out and use their time more wisely to build customer relationships, and let the machine learning and the robotic process automation handle the manual data entry tasks that are so error prone. And, yeah, we could rave on and on about the potential of the product, but let’s segue into how we use SAP Data Intelligence.
I know that we had been talking with some of our contacts at SAP, and they had been helping us build the product and then said, “You know, why don’t you guys join our Content Sprint? Your project is so innovative, and it’s a great example of the potential of SAP DI.” So why did we use SAP Data Intelligence?
First of all, DataXstream is a partner of SAP, so when we had this project requirement we immediately reached out to them, like, “Hey, do you guys have any machine learning platform we could use?” and they said, “Yes, we have SAP Data Intelligence.” They gave us a trial version, we tried it, and it’s an excellent machine learning development platform. It’s basically a machine learning IDE with all the tools that you need to develop machine learning pipelines, but with a very straightforward interface. There are no fancy functions or complexity that stop development and leave the developer confused. It’s very simple to use, and it makes machine learning development and the pipeline building process as easy as possible for machine learning engineers and data scientists like me and my colleagues. Plus, it is an SAP product, so it is very, very simple to integrate with the other SAP products that are essential to the machine learning process, like SAP S/4HANA or SAP Analytics Cloud. It’s basically just plugging in an operator and using everything that has already been built in, and it’s very, very easy. That’s why we decided to go with SAP DI: not just because we are partners, but also because it’s an excellent platform. And while we’ve been doing this project, the SAP DI support team has constantly been giving us help, and during this process they’ve been really interested in what we’re doing, so they offered us this Content Sprint opportunity. It’s basically marketing for our product, and through our product they can also market their SAP Data Intelligence platform to more users, so it’s a win-win situation. We are both happy about it, and I think the Content Sprint has been a great event for getting our work out and exposed to a larger audience and to clients.
And it’s been really excellent to see what other partners are doing with SAP Data Intelligence as well. The other SAP Content Sprint attendees included Camelot, and they’re using it for supply chain automation and optimization, so it’s really interesting to see how people are getting creative and innovative with the technology that SAP Data Intelligence offers, using the platform to build these fantastic tools that address very specific customer pain points. And that’s really just the beginning, because our products will continue to grow. Again, we were talking about the potential of it, but hopefully as it takes off there’ll be this whole community around SAP Data Intelligence, where we can refer to each other, collaborate on how we’re using it, and explore the potential of some of the pipelines, and even just machine learning concepts and how we’ve developed them into our products to improve them. So the potential is really exciting, and that was key for your role on the artificial intelligence team here, because you were working on the pipelines and you were in SAP DI every day, programming in Python, setting up APIs between our systems, and working with the data you were getting from the SAP backend. Can you tell us a little bit more about the work that you did on the project?
Sure, yeah, I am the machine learning engineer on the project. I basically work with my colleagues to design the machine learning workflow, build the pipeline based on that workflow in DI, and do the data manipulation and data mining to form a set of clean tables, so the output is useful and better integrated into our backend and frontend systems. I also collect and prepare the training datasets, with you, Kat, and use those datasets to train highly customized machine learning models in order to give a better result for each specific customer, because they’re all unique. There won’t be a general model that could be applied to every client, and we also want each client to have the best accuracy and a unique experience. That means the model must be highly customized, so that is also part of my job. I also manage the integration of the OCR engines into our machine learning pipelines. So, yeah, I basically build the entire pipeline, integrate the OCR pipelines, and have it ready to expand in the future.
I know you did a ton, specifically on the optical character recognition capabilities that we’ve built into our solution. I know that you did a ton of work on the analysis of how we are going to handle a purchase order that comes in handwritten, or one whose format is different from another’s, or one that may be laid out horizontally or vertically. These OCR AIs are truly doing pattern recognition, so when you throw a unique purchase order at them, they can get confused and go off the rails. So we did a lot of customization on training them to look for very specific types of text and very specific labels. Do you want to speak to any of that?
Yeah, so basically the OCR engine we are using is outsourced, so we can’t control that engine’s results. We can only send the input in, have the OCR run, and get the result returned to us. Because it’s a general engine, a lot of the data, like the table data, won’t come back 100% successfully, because it depends on the structure of the purchase order, which can vary. So it’s up to us to figure out a way to automate it and have our machine learning models look into this data to decide, “Oh, what is this data?” and whether this data is in the right place or not. Say there’s a material number being labeled as irrelevant data, which would be discarded. Our machine learning model will prevent that, and have the material number actually labeled as a material number and put in the correct place. So when the client forms an inquiry on the frontend, they will have the ability to manually check it, instead of this data being completely discarded. After the data has been passed to the frontend, the client also has the ability to change the data if it was labeled wrong. The machine learning model is not going to be 100% accurate, but we do have the feedback system to learn from the mistakes with reinforcement learning. So the model will make a mistake and then will label that specific kind of data better over time, and the model will become more and more customized to this customer.
Sure, yeah, and I guess that’s one of the advantages of our solution: we are able to choose which OCR model we like. So as the market changes, even as new competitors emerge in that market, we’re still able to pick and choose which engines we like and which engines perform the best, and that can change and fluctuate as new developments come into the OCR market. And then, as you were talking about, we also have the feedback system. So not only do we train models to specifically understand those tables that are coming in from the OCR pipeline and filter them for relevance, but we’ve also built the feedback system so that as the clients are using the interface, no matter their skill level, if they’re interacting with our Intelligent Order Creation and building orders, the models are already learning from their usage and from their performance. What are the patterns? What’s going well? What things are being predicted slightly incorrectly and corrected? How can the predictions improve? And it’s just collecting that log of data so that we can go back and truly understand how it is performing and how we optimize it for the client.
Right, and that’s actually coming back to the point I was mentioning earlier, that this project has so much potential to expand in the future. Besides this project, we have also done other intelligent order creation work. We have built the material and customer search engines. Sometimes the material description on the purchase order is not going to be exactly the same as the SAP description in the customer’s database, so both of those engines are able to parse the descriptions and return the materials or the customers that have the highest similarity. Those two engines could also be integrated into the Intelligent Order Creation project we’re working on. The feedback system that we’ve been talking about will also be integrated into the order creation system. Basically, it’s so easy to use on the frontend that someone without any technology background can simply upload the file and wait for it to return the result. It will automatically form an inquiry for you, which is like an editable version of the SAP order, where the user just manually examines it to see if the order is correct or not. That’s huge in reducing the processing time for creating an order, and the machine learning model will get most of the order correct. Even if, for some orders, the material search engine finds the wrong materials in your SAP database, the user will be able to manually correct them. After the user corrects everything, our automated feedback system captures all this data, processes it, and forms a table and a matrix of many features of this data for us to analyze further, and those features will be further analyzed in SAP Analytics Cloud.
So after the analysis in SAP Analytics Cloud, we will figure out a way to make the model better, because every customer’s needs are different and it’s our job to satisfy every customer’s individual needs for the characteristics of the models. The way of training the model isn’t always the same, but with all this data we will be able to figure out what the customer is looking for and improve the models’ performance, and eventually we will automate the entire system. The feedback would go directly into training the model, and the new model would automatically be applied for the customer, so the next time the model won’t make the same mistake it made the last time.
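A minimal sketch of the kind of similarity search described above, assuming a simple bag-of-words representation scored with cosine similarity (the technique mentioned earlier in the conversation). This is not DataXstream’s implementation; the catalog, material IDs, and function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description matches nothing
    return dot / (norm_a * norm_b)


def search_materials(query, catalog, top_k=3):
    """Return the top_k (material_id, score) pairs most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (material_id, cosine_similarity(query_vec, Counter(tokenize(description))))
        for material_id, description in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical catalog standing in for the descriptions in a client's SAP database.
CATALOG = {
    "M-1001": "Copper pipe 1/2 inch type L 10 ft",
    "M-1002": "PVC pipe 1/2 inch schedule 40 10 ft",
    "M-1003": "Copper elbow 90 degree 1/2 inch",
}

# A purchase-order description that does not match any catalog entry exactly.
results = search_materials("1/2 in copper pipe 10ft type L", CATALOG)
for material_id, score in results:
    print(material_id, round(score, 3))
```

With this toy catalog the copper pipe (M-1001) ranks first even though the wording differs. A production search engine would likely add term weighting and handling for units and abbreviations, which this sketch leaves out.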
Right, and it’ll just be this continuous cycle. It will be non-disruptive: the updates will push, the machine learning models will be improved and optimized, but the client on the frontend will just continue to work and, all of a sudden, the results are just getting better and better in the predictions, and at some point we’ll have to do very minimal corrections. And even from the start, it’s like 90% correct already. When you look at a purchase order that has thousands of line items, it would have taken even a very, very experienced sales representative days to go through, or maybe hours, but still a tremendous amount of human time running through those purchase orders and cross-referencing all the numbers in the internal SAP system. It would take only a few minutes for our intelligent material search to return those results, and then it’ll be this continuous cycle of pattern recognition and correction as the models continue to develop and improve. So, yeah, there’s just tremendous potential in using SAP DI as your platform, and in all the fantastic machine learning capabilities that you can build on top of that.
Yeah, absolutely. And because a client’s database is usually limited, there’s only a certain number of materials the client is selling, so whenever an order comes in, the machine learning models won’t have millions of materials to learn or to add into the search index. So the reinforcement learning process won’t take very long. It won’t be long until the model gets pretty accurate, though it also depends on how the customer forms their descriptions. A description might be too ambiguous, and that’s where the human would step in. Say the machine learning models find three materials with high similarity; then it’s up to the human to select which one it is. That selection goes back to our feedback system, and then we’ll know, “OK, this is what the customer input and searched for, and this is the correct answer.” So this is the question and this is the answer, and we want the model to learn to pair them together the next time. Basically, that’s how it works.
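The question-and-answer pairing described above can be illustrated with a very small sketch. It assumes a plain lookup of past user selections used to re-rank candidates, which is a much simpler stand-in for the reinforcement learning the speakers mention and not their actual feedback system. The class name, method names, and material IDs are hypothetical.

```python
from collections import Counter, defaultdict


class FeedbackStore:
    """Remembers which material users selected for each normalized query."""

    def __init__(self):
        # normalized query -> Counter of material_id -> times selected
        self._choices = defaultdict(Counter)

    @staticmethod
    def _normalize(query):
        """Lowercase and collapse whitespace so near-identical queries match."""
        return " ".join(query.lower().split())

    def record(self, query, chosen_material):
        """Store one (question, answer) pair from a user's selection."""
        self._choices[self._normalize(query)][chosen_material] += 1

    def rerank(self, query, candidates):
        """Move previously selected materials to the front of the list.

        sorted() is stable, so candidates with equal counts keep the
        order the search engine gave them.
        """
        counts = self._choices.get(self._normalize(query), Counter())
        return sorted(candidates, key=lambda material_id: -counts[material_id])


store = FeedbackStore()
candidates = ["M-1002", "M-1001", "M-1003"]  # hypothetical search output

# The user picks the second suggestion; the pair is recorded.
store.record("1/2 copper pipe", "M-1001")

# The next time the same description comes in, the confirmed answer is first.
reranked = store.rerank("1/2  Copper Pipe", candidates)
print(reranked)
```

Because a client’s catalog is limited, even a simple store like this covers repeated descriptions quickly; the learned models discussed in the conversation would generalize beyond exact repeats.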
Right, yeah, and it’ll be almost a browser-like interface. You just go on and begin to search for the material or the data (it could be customer data, it could be material data), and either way the machine learning algorithms will start to pick up on those patterns and provide recommendations, so that you don’t have to spend time manually going through numbers and all the specs to try to find the right material. It’s just going to present that to you, and you pick what you need. So, yes, the potential is incredible, and it’s going to be super exciting to see what continues to happen with our Intelligent Automation suite and what our partners continue to develop with SAP Data Intelligence. We’re looking forward to seeing all the fantastic solutions that are built on this platform in the future. If you want to learn more about SAP Data Intelligence, please check out the SAP Data Intelligence Content Sprint; links are below. You can also come visit us at DataXstream.com, where you can follow this podcast as we begin our Ask an Expert podcast series, to learn more about our products and development process and gain insights from the machine learning developers, the data scientists, the product leads, the UI devs, everybody on the whole team. And thank you so much for joining us today! We hope you learned something interesting, and we look forward to collaborating with you in the future. Bye for now!
Bye, thank you!
For more information check out SAP’s Data Intelligence Content Sprint.
Or click here to see Intelligent Automation in action!