The Microsoft Syntex AI model type Structured Document Processing (previously known as Forms Processing) has had some significant updates, which were launched as part of Microsoft Syntex. In my opinion the biggest update is that Structured Document Processing models can now be reused across libraries and sites. Previously a Structured Document Processing model had to be set up individually on one library, and there was no way to deploy it to other libraries and sites. So, if you wanted to deploy Structured Document Processing models to many libraries/sites, you had to recreate and retrain the model from scratch in every library.
Structured Document Processing models use Microsoft Power Apps AI Builder document processing. This uses machine learning to identify and extract key-value pairs and table data from structured or semi-structured documents, such as forms and invoices. The Syntex and AI Builder teams have done some awesome work to make this integration more native and powerful, along with a new model type that has just arrived (freeform) and more coming soon.
There have also been some fantastic improvements to the Syntex model creation experience and associated UI. There is a new Syntex model creation UI (see image below) that allows all of the Syntex models to be created from one menu screen. This menu is accessed either centrally from a Syntex Content Centre or locally from a SharePoint document library. To create a Structured Document Processing model, select the Layout method button.
The whole experience of creating and modifying models is much slicker for all model types. It no longer involves being redirected to Power Apps AI Builder screens, waiting for environments to load, or points of failure such as failing flows.
Previously, when a Structured Document Processing model was created and trained in AI Builder, it created a Flow in Power Automate to read and update the documents. This Flow was triggered when a document was added to the library. The file content was then sent to AI Builder to be read, and the extracted values were written back to the file's properties in SharePoint. I’m delighted to say that this Power Automate dependency is no more!! Processing is now handled by an internal workflow service, with no need for Power Automate.
There is a small downside: we lose the ability to change the SharePoint document columns in the library to different column types and then, in the Flow, repoint the Update File Properties action at the new or modified columns. Structured Document Processing offers a limited number of field types for extracted data (field, i.e. text, table and checkbox), and there are no types for number, date, multiple lines of text, currency, etc.
When a field (text) is extracted, it is added as a Single line of text field for the document in SharePoint. Table extracts all table rows in your document as Single line of text fields in a separate SharePoint list (see my blog on this), which is then linked to the file via a lookup column. Checkbox scans your document for a checkbox and then adds the value to a Yes/No (Boolean) field in SharePoint for the file. What was previously useful was that when I was extracting dates, for example, I could change the field type in SharePoint from Single line of text to Date, then go to the Flow and modify the update action to format the date so that the Date field in SharePoint would accept it. It was also possible to do further post-processing in the Flow, for example to remove a prefix from an extracted value, send a value to multiple fields, or add an additional action to the workflow, e.g. email a user.
Apart from changing the field types used when updating the SharePoint file properties, this is all still easily possible, but you now have to set up your own flow in Power Automate using the trigger When a file is classified by a Microsoft Syntex model. In my opinion this gives you greater flexibility and means you only need to set up a workflow when one is required.
Creating a model with the layout method
I will now walk you through creating a Structured Document Processing model – mainly to demonstrate the new integrated UI for creating models.
- Specify the name of the model and add a description.
- Specify the fields that you wish the AI model to extract and the type they will be in – currently text (field), table & checkbox are supported.
- Upload at least five sample documents to the model for the model to study. Distinct documents with different layouts can be grouped together in collections.
- Map the fields you previously specified to sections on the documents. This is done by clicking on a piece of text, e.g., Bill Oddy, and then assigning it to the Employee Name field.
- A model summary page will then be displayed to review the created model – if all looks ok then you can send the model to be trained.
- The model then starts training. It can take a while for the model to finish training, depending on the size of your sample data.
- You will then be taken to the model screen for your model in the Syntex content centre where you will see your model is still training.
- When the model has been created and training has finished, the following screen will be displayed, asking you to review your model.
- Here you can review your created model and view stats about the information extracted and the accuracy. You can test the model here or Publish the model.
- Once Published you can apply the model to a library. The good thing now is you can repeat the step as many times as you like to add the model to additional libraries.
- You can now see where the model has been applied. You can see that I have also applied the model to another library in Project Jupiter! This is awesome: now we can add Structured Document Processing models to multiple sites/libraries. My clients are going to be so happy!!!!!
- If I visit the sites, I can see the model has been applied to the library and if I add some leave request documents to the library the Structured Document Processing model is triggered, and metadata is extracted for the files.
Bonus: Structured Document Processing Models can now be applied in bulk with PnP PowerShell
As an extra bonus, you can now use PnP PowerShell to publish a Structured Document Processing model to a library using the cmdlet Publish-PnPSyntexModel. Previously the PnP PowerShell cmdlets only worked with Unstructured Document Processing models (previously Document Understanding). This is awesome and opens up lots of automation opportunities. Imagine creating a solution to provision a site and apply several Syntex models of different types to different libraries in the site.
To do this, you must connect to your Syntex Content Centre with Connect-PnPOnline and then use Publish-PnPSyntexModel.
Below are my commands and then a screenshot of the commands being run.
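A minimal sketch of what such commands might look like. The tenant URLs and the library name "Documents" are placeholders rather than values from this post, and newer PnP PowerShell releases may also need a -ClientId on Connect-PnPOnline. The loop simply repeats Publish-PnPSyntexModel once per target library, which is one way to apply a model in bulk.

```powershell
# Assumption: placeholder Content Centre URL, not a real tenant.
$contentCentreUrl = "https://contoso.sharepoint.com/sites/ContentCentre"

# Connect to the Syntex Content Centre that holds the published model.
Connect-PnPOnline -Url $contentCentreUrl -Interactive

# Assumption: hypothetical target site URLs and library names.
$targets = @(
    @{ WebUrl = "https://contoso.sharepoint.com/sites/ProjectMars";    List = "Documents" },
    @{ WebUrl = "https://contoso.sharepoint.com/sites/ProjectJupiter"; List = "Documents" }
)

foreach ($target in $targets) {
    # Apply the "Leave Request" model to each target library in turn.
    Publish-PnPSyntexModel -Model "Leave Request" `
        -ListWebUrl $target.WebUrl `
        -List $target.List
}
```

Keeping the targets in a simple list means the same script could be driven from a CSV file or a site provisioning process later on.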
I can now go to the library in Project Mars and see that the Leave Request model has been applied to the library using PnP PowerShell.
Unfortunately, we cannot use the PnP PowerShell cmdlets to create a template of a structured or freeform document processing model (DPM) and move it to another tenant or another Syntex Content Centre. You can, however, use the PnP cmdlets to export unstructured DPMs to a template and deploy them to another tenant, for example. The reason is that an unstructured DPM stores all its configuration in SharePoint columns, so these can be exported with your model. Structured and freeform DPMs, on the other hand, store the model configuration in Dataverse tables because these models are powered by AI Builder. So currently these cannot be exported. Maybe in the future we will be able to!
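For the unstructured case, the export and import might look roughly like the sketch below. It assumes the SyntexModels handler of Get-PnPSiteTemplate is available in your installed PnP PowerShell version. The URLs and the file name are placeholders, so check the cmdlet help before relying on it.

```powershell
# Assumption: placeholder source Content Centre URL.
Connect-PnPOnline -Url "https://contoso.sharepoint.com/sites/ContentCentre" -Interactive

# Export the unstructured models in this Content Centre to a PnP template file.
Get-PnPSiteTemplate -Out "SyntexModels.pnp" -Handlers SyntexModels

# Assumption: placeholder destination Content Centre in another tenant.
Connect-PnPOnline -Url "https://fabrikam.sharepoint.com/sites/ContentCentre" -Interactive

# Import the exported models into the destination Content Centre.
Invoke-PnPSiteTemplate -Path "SyntexModels.pnp"
```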
Some very welcome changes to Structured Document Processing Models especially model re-use and Syntex model creation UI in one single pane of glass. Here is a summary of the updates:
- New Structured Document Processing Models can now be deployed to multiple sites/libraries – before a model could only be created in one library and could not be pushed out to other libraries
- Bonus: models can be administered and deployed to many libraries using PnP PowerShell and the Publish-PnPSyntexModel cmdlet.
- Rename of Syntex model from Forms Processing to Structured Document Processing Model
- New integrated Syntex model creation screen: creating a Structured Document Processing Model no longer redirects to Power Apps AI Builder. The Syntex model creation menu, with all three custom models and the pre-built models, is available in a Syntex Content Centre or in a document library enabled for Structured Document Processing Models. The experience is nicely integrated and much faster.
- Creating a Structured Document Processing Model no longer creates a Flow in Power Automate with SharePoint & AI Builder actions to process and tag the documents. An internal workflow, which cannot be customised, now processes and updates the documents.
Some very exciting updates which I am looking forward to using with customers to classify and extract values from their structured or semi-structured documents, such as forms and invoices. Please contact me if you have any Syntex questions, scenarios or issues and I’ll gladly help out.