Recently the SharePoint Syntex team released two pre-built, pre-trained models for processing the invoices and receipts in your organisation. Invoices and receipts are common documents that almost every organisation deals with, and these form processing models are designed to cater for them. The pre-built Syntex models are super-charged with AI, having been pre-trained using Azure Form Recognizer technology in the background to extract common invoice and receipt fields.
The idea is that you install these pre-built models directly from a Syntex content centre (see image below). You then add your own invoices or receipts for the model to analyse, and it picks out common fields and sets up extractors for them. You can then modify the model to select which fields to extract, rather than starting from scratch.
I imagine the Syntex/Azure team will keep creating pre-built models for other common organisational forms. Personally, as a consultant, I would love to see a statement of work (SOW) pre-built model!
Behind the scenes, these pre-built Invoice and Receipt Syntex models use Azure Form Recognizer, a cloud-based Azure Applied AI Service that uses machine-learning models to extract and analyse form fields, text, and tables from your documents. The Azure Form Recognizer team has created dedicated models for invoices and receipts.
The invoice and receipt models in Azure Form Recognizer combine powerful Optical Character Recognition (OCR) capabilities with deep learning models to analyse and extract key fields from invoices and receipts. Both can come in various formats and quality, including phone-captured images, scanned documents, handwritten documents/receipts and digital PDFs. The API analyses the text in an invoice or receipt, extracts key information such as customer name, merchant name, billing address, due date and amount due, and returns these fields ready for extraction with SharePoint Syntex. The pre-built models currently identify a set number of fields, which are listed in the Azure Form Recognizer documentation (Invoices Fields & Receipts Fields). Syntex pre-built models do not currently use all of these fields, and unfortunately I don’t think extracting table items (such as line items) from an invoice or receipt is supported just yet.
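To make the "returns many fields, Syntex keeps a subset" idea concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical result shape (a plain dict of field name to extracted value, with `None` for fields not found) rather than the real Azure Form Recognizer SDK objects, and the `SELECTED_FIELDS` subset is purely illustrative.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical subset of invoice fields a Syntex model might keep.
# The real list is whatever you confirm when configuring the model.
SELECTED_FIELDS = ("CustomerName", "InvoiceDate", "DueDate", "AmountDue")


def select_fields(analysis: Dict[str, Optional[Any]],
                  selected: Iterable[str]) -> Dict[str, Any]:
    """Keep only the selected fields that the analysis actually found.

    `analysis` is a simplified, hypothetical stand-in for a pre-built
    model's output: field name -> extracted value (None if not found).
    """
    wanted = set(selected)
    return {
        name: value
        for name, value in analysis.items()
        if name in wanted and value is not None
    }


if __name__ == "__main__":
    sample = {
        "CustomerName": "Contoso Ltd",
        "InvoiceDate": "2021-10-01",
        "DueDate": None,                   # not found on this invoice
        "AmountDue": 610.0,
        "VendorAddress": "1 High Street",  # found, but not selected
    }
    print(select_fields(sample, SELECTED_FIELDS))
```

Running it keeps only `CustomerName`, `InvoiceDate` and `AmountDue`: `DueDate` is dropped because it was not found, and `VendorAddress` because it was not selected.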
There are other Syntex document understanding models, created by Microsoft and the community, that you can install in your tenant today. They show different usage patterns for document understanding models and include training files you can view and edit to see how others are using Syntex; these can be found in the PnP Syntex Samples repository. The difference between the Invoice/Receipt pre-built models and the PnP models is that the pre-built models have been highly trained with Azure Form Recognizer, are form processing models rather than document understanding models, and currently only extract a set list of fields automatically. The PnP models include training files and extractors/classifiers, which you can adapt further by adding extractors, or fine-tune by adding and refining explanations, e.g. phrase, proximity and regular expression explanations. The PnP models also have to be installed with PnP PowerShell, as there is no UI for installing them.
Installing Pre-Built Model: Invoices
I will now test installing and configuring an Invoice pre-built model.
To create a pre-built model, I need to go to a Syntex content centre and open the Models page. In this example I will select the Invoice processing prebuilt model.
The model is then created, and the first step is to upload an invoice for the pre-built invoice model to analyse.
A new page that I have not seen before is then displayed for adding files for the model to analyse; both image and PDF files can be added here. I add an invoice file, click Add and then click Next.
The pre-built invoice model then automatically scans the document text and matches the text with the pre-built extractors (see below).
You can then confirm each extractor it has identified by clicking on the highlighted text and then clicking Yes/No. It would be great to add extra extractors that are unique to the type of invoice from a particular company, but this is not currently possible.
You can also rename any of the extractors it has identified:
We can then select a Site and Library to add the new model to.
I can then go to the library where the model has been added and upload some invoice files. Once they are processed, I can see all the invoice metadata that has been extracted, and it looks like good, valid data.
Then, if I go to List Settings and view the Invoice content type (the content type used by the model I just created), I can see the columns have been set up as per my model, using different column types, e.g. Date and Time, and Number. Previously, in other Syntex models, extractors created manually were set up as Single line of text columns by default. This made list formatting tricky, e.g. sorting numbers from high to low is not possible when the column type is Single line of text rather than Number.
I can then view the model in the Syntex content centre. This page looks similar to the document understanding model pages, except I am unable to add my own extractors/explanations. This would be a great feature, as I could then adapt the model to pick up extra fields.
I’m now going to run through the same process for invoices, but this time I will set up a pre-built invoice model with an invoice I recently received from Ikea for some furniture, to see how it handles other invoices. Below is the invoice uploaded and analysed for possible extractors by the pre-built model. You can see the fields it has identified, but also the ones it has missed or guessed incorrectly, e.g. the CustomerName it identified for me is “Leon Armston Apartment”.
There are also some UK-specific fields on the invoice, e.g. the Value Added Tax (VAT) amount, that the model has not identified.
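The sorting problem is easy to reproduce outside SharePoint. This short Python analogy (not SharePoint’s own sorting code, and with made-up amounts) sorts the same values high to low, first as text and then as numbers:

```python
# Made-up amounts, stored as text the way a Single line of text column holds them.
amounts_as_text = ["9.99", "120.00", "15.50", "1000.00"]

# Sorting as text compares character by character, so "9..." beats "1...".
text_high_to_low = sorted(amounts_as_text, reverse=True)

# Sorting as numbers compares the actual values.
number_high_to_low = sorted(amounts_as_text, key=float, reverse=True)

print(text_high_to_low)    # ['9.99', '15.50', '120.00', '1000.00']
print(number_high_to_low)  # ['1000.00', '120.00', '15.50', '9.99']
```

The exact collation SharePoint uses may differ in detail, but comparing digit strings character by character produces the same kind of misordering, with the largest amount ending up last.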
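Until the model picks up VAT itself, one possible workaround, assuming the model does extract both a net subtotal and a gross total for the invoice, is to derive the VAT amount from them. A minimal Python sketch follows; the function names are hypothetical, and the 20% figure is the UK standard VAT rate, used here only as a sanity check because reduced and zero rates also exist.

```python
from decimal import Decimal

UK_STANDARD_VAT_RATE = Decimal("0.20")  # UK standard rate; reduced/zero rates also exist
PENNY = Decimal("0.01")


def derive_vat(subtotal: Decimal, total: Decimal) -> Decimal:
    """VAT as the gap between the gross total and the net subtotal."""
    if total < subtotal:
        raise ValueError("total should not be less than the subtotal")
    return total - subtotal


def looks_like_standard_rate(subtotal: Decimal, vat: Decimal) -> bool:
    """Sanity check: does the VAT match 20% of the net amount, to the penny?"""
    expected = (subtotal * UK_STANDARD_VAT_RATE).quantize(PENNY)
    return vat.quantize(PENNY) == expected


if __name__ == "__main__":
    vat = derive_vat(Decimal("500.00"), Decimal("600.00"))
    print(vat, looks_like_standard_rate(Decimal("500.00"), vat))  # 100.00 True
```

`Decimal` is used instead of `float` so that money values are subtracted exactly.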
It would be great to have some fine-grained control to further adjust this model to add additional extractors and train the pre-built model even further for specific invoice formats.
Installing Pre-Built Model: Receipts
I am now going to install and configure the pre-built receipt model and test it with some receipts I’ve collected. The idea with this pre-built model is that, once configured, you could apply it to a library and have all your staff members drop their receipts into it; the model would then extract the key metadata from the receipts, saving staff from filling in the information.
I won’t go fully through the setup as I did previously with invoices, as it is the same process, but I will highlight what is different and notable with receipts. I created a model based on the Receipt processing prebuilt model type. I then uploaded some sample receipts and chose the pre-built extractors I wanted to use to extract metadata from the receipts.
After configuring the model, I applied it to a library and uploaded eight receipts in various formats, and the results were very impressive. See the image below with all the key receipt metadata fields extracted. I’m dreaming of the automation and reporting options that open up now that we have the key metadata. Wouldn’t it be great to trigger a flow in Power Automate when a model is applied to a file, with the flow then connecting to an expenses system API and automatically uploading expense entries? I would love this!!
My only downside with the receipts model is that the Total and Subtotal fields are created as Single line of text columns rather than the Number columns you would expect from the model. This makes it trickier to sort by value in the SharePoint list.
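As a rough illustration of that idea, one step such a flow would need is mapping the extracted receipt columns to whatever an expenses API expects. The Python sketch below assumes hypothetical column names (`MerchantName`, `TransactionDate`, `Total`) and a hypothetical payload shape; a real expenses system would define its own, and in Power Automate the mapping would live in the flow itself rather than in Python.

```python
import json
from decimal import Decimal
from typing import Dict


def to_expense_entry(receipt: Dict[str, str], employee: str) -> Dict[str, str]:
    """Map extracted receipt metadata to a hypothetical expense-entry payload.

    Column names and payload keys are illustrative assumptions.
    """
    return {
        "employee": employee,
        "merchant": receipt["MerchantName"],
        "date": receipt["TransactionDate"],
        # Parse via Decimal to reject non-numeric totals and avoid float rounding.
        "amount": str(Decimal(receipt["Total"])),
    }


if __name__ == "__main__":
    receipt = {
        "MerchantName": "Contoso Cafe",
        "TransactionDate": "2021-10-05",
        "Total": "12.50",
    }
    print(json.dumps(to_expense_entry(receipt, "employee@contoso.com")))
```

The printed JSON is the kind of body a flow could then send to the (hypothetical) expenses API.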
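If the totals stay as text, one workaround outside the list is to parse them back into numbers before sorting or reporting. Here is a minimal Python sketch, assuming amounts are written with `.` as the decimal separator and `,` for thousands (as on UK and US receipts); the helper names are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Turn text such as '£1,200.00' into Decimal('1200.00').

    Strips everything except digits, '.' and '-', so currency symbols,
    spaces and ',' thousands separators are removed. Returns None if
    what remains is not a valid number.
    """
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def sort_by_value_desc(values: List[str]) -> List[str]:
    """Sort text amounts high to low by value, dropping unparseable ones."""
    parsed = [(parse_amount(v), v) for v in values]
    valid = [(amount, v) for amount, v in parsed if amount is not None]
    valid.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in valid]


if __name__ == "__main__":
    totals = ["£9.99", "£1,200.00", "£15.50", "n/a"]
    print(sort_by_value_desc(totals))  # ['£1,200.00', '£15.50', '£9.99']
```

Entries that cannot be parsed, such as "n/a", are dropped rather than guessed at, so they may need reviewing by hand.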
I’m excited about this update and looking forward to more models being added. The pre-built models using Azure Form Recognizer seem very accurate thanks to the advanced OCR, require hardly any configuration, and are a very helpful starting point for common invoices and receipts.
With invoices, there is a lot of variability between different companies’ invoices: fields are often named differently or formatted differently. There are also some differences between UK and US invoices, and I think the pre-built models may initially target the US as it’s the bigger market. That said, the model did a good job when I threw lots of different UK invoices at it.
Receipts seem very useful: most receipts are very similar, and the pre-built model seems to cope very well with the wide range of receipts I’ve thrown at it. At the moment this seems to be the better of the two pre-built models, and I will definitely be looking to build an expenses solution using it. Staff members could upload images or scans of their receipts into a library, where they would be processed automatically by the pre-built model, saving a huge amount of data entry.
My main downside with the pre-built models in their current form is that you can only select which of the included pre-defined extractors you wish to use and then rename them; there is unfortunately not yet a way to adjust existing extractors with explanations or to add new extractors.
I’m looking forward to using these models more with a larger dataset of many different types of invoices and receipts, as they seem to work well and could save staff lots of time.