Microsoft recently released a new Syntex AI document processing model called freeform, whose purpose is to automatically extract information from unstructured and freeform documents such as letters, contracts or correspondence. Initially I was a little confused by the new freeform model type and which scenarios you would use it for, so I needed to test it out and learn. My key confusion was: what exactly are “freeform” documents, and which documents work best with the model? I will attempt to answer this question in this blog, along with introducing the freeform model and other Syntex model renaming changes.
Microsoft Syntex uses the document processing component of Microsoft Power Apps AI Builder (previously known as form processing) to create freeform document processing models that are applied to SharePoint document libraries. Machine learning is used to identify and extract key-value pairs and table data from unstructured or freeform documents; these are then added to the document as metadata.
In other significant news, two of the founding Syntex AI models (Document Understanding and Forms Processing) have been renamed (see the table below).
|Old Syntex Model Name|New Syntex Model Name|
|Document Understanding|Unstructured Document Processing Model|
|Forms Processing|Structured Document Processing Model|
|N/A|Freeform Document Processing Model|
Microsoft have produced a great table comparing the three different custom model types (link).
There is also a new Syntex model creation UI for creating Syntex models locally from a library or centrally in a Syntex content centre.
This new Syntex model creation UI (image above) is triggered by selecting New Model in a Syntex Content Centre or by creating a local Syntex model on a document library. This allows a user to create any of the different types of Syntex models from one screen. All model types can now be created centrally from the Syntex Content Centre which is a welcome update – previously Structured Document Processing models could only be created and used on one specific SharePoint document library.
You will notice that the naming of the models on the Syntex model creation menu is different from their new names; the labels refer to the scenario rather than the model’s name. I’ve listed the UI labels and their corresponding Syntex models in the table below.
|New Syntex Model Creation UI Naming|Syntex Model Name|
|Teaching method|Unstructured Document Processing Model|
|Layout method|Structured Document Processing Model|
|Freeform selection method|Freeform Document Processing Model|
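If you need to keep the three sets of names straight, for example in your own notes or scripts, the two naming tables above can be combined into one simple lookup. This is only a sketch: `SYNTEX_MODELS` and `find_model` are names I have made up for illustration and are not part of any Syntex or AI Builder API.

```python
from typing import Optional

# Each entry combines the two naming tables above.
# The structure and the helper are illustrative only, not a Syntex API.
SYNTEX_MODELS = [
    {
        "ui_label": "Teaching method",
        "new_name": "Unstructured Document Processing Model",
        "old_name": "Document Understanding",
    },
    {
        "ui_label": "Layout method",
        "new_name": "Structured Document Processing Model",
        "old_name": "Forms Processing",
    },
    {
        "ui_label": "Freeform selection method",
        "new_name": "Freeform Document Processing Model",
        "old_name": None,  # freeform is new, so it has no old name
    },
]


def find_model(name: str) -> Optional[dict]:
    """Return the entry whose UI label, new name or old name matches (case-insensitive)."""
    wanted = name.strip().lower()
    for model in SYNTEX_MODELS:
        for key in ("ui_label", "new_name", "old_name"):
            value = model[key]
            if value is not None and value.lower() == wanted:
                return model
    return None
```

For example, `find_model("Forms Processing")["ui_label"]` gives the label to look for in the new creation UI, and an unknown name returns `None`.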
Although all the different names are a little confusing, I think the new Syntex model creation UI, with its descriptive labels and images, helps guide a user to the best Syntex model type for the scenario. For example, I’ve had users trying to use a document understanding model for every document type, even when the document contains mainly tabular/form data and another model would be better suited.
Testing out the new Freeform Syntex Model
I will now try to get to grips with the freeform model and test it out. Behind the scenes, Syntex freeform models use AI Builder, which is part of the Power Platform, to provide a no-code, integrated way to build and train a model to process documents. I suppose you could, if you had all the time in the world and fancied doing lots of tweaking, set this all up manually and separately using Power Automate and AI Builder attached to a SharePoint document library.
I generally learn by testing and configuring functionality and trying to create real-world scenarios, so that I can then talk about the new functionality and answer questions from the community and my customers. So, I needed to figure out which type of documents the freeform model works best with and find some sample documents. I know AI Builder is used behind the scenes, and I remembered that AI Builder has good sample documents for its scenarios available to download. So, this was my first port of call, and I found some example documents for document processing to download here.
The zip file you can download contains sample files for Invoices and Rental Agreements. I looked at the files in the Rental Receipts folder and, as they come in a variety of formats (freeform), I believe they are the best fit for the freeform model. See the image below, where I have displayed three rental agreements side by side. You can clearly see the variety of formats: some are in table format and some in paragraph format, but they are all different.
In the image below I have placed two of the documents side by side to show the fields I want to extract from the rental agreements, e.g., Landlord, Security Deposit etc. You will notice the fields are all in different places and text styles, and one agreement is in table format while the other is in paragraph format, so it’s all very FREEFORM!
- To create a Freeform model, I went to my Syntex content centre and then selected “Freeform selection method” from the new Syntex model creation UI.
- I was then shown a new screen with further information about the freeform model: details of what the model can do, examples, training details and supported file types. It also notes that the freeform model is still in preview, so it could change, and that the model currently only supports text in English.
- I can then give my model a name, e.g., Rental Receipts (freeform).
- On the next screen I need to specify the names of the information I wish to extract from the documents. Fields (text), checkboxes or table data can be extracted (it cannot extract multiple line items from a table).
- I then uploaded five rental receipt documents on the next screen. A minimum of five documents is required, but add more if your sample documents come in a wide variety of formats. You are training the model to identify the text strings to extract, along with handling variances in layout and formatting.
- Next, you mark on each of the five documents where the previously created fields are. See the image below, where I show the tagging process for two documents; note the different formats.
- Once I have tagged all of the fields on all uploaded sample documents, I am then presented with a model summary page.
- The model is now training. Because of the different layouts and different text locations, training takes longer than for Structured document processing models (formerly forms processing). Have a tea, coffee, beer, wine etc. whilst waiting; it took about 30 minutes for me!
- When the model has finished training, visit the model in the models library in the content centre. Here you can review the model, modify the model settings (description, sites where the model is available to be installed from a library, and retention label) or even edit the model, e.g., add different documents or change fields.
- Here is the review screen for the model, where you can view its details. On this screen you can do a quick test by uploading a sample document to see how the model works with it, i.e., which fields it extracts. I then press Publish to publish the model and make it available in SharePoint.
- Once the model is published, it can be added to any library in any site through the UI. This can be done multiple times to apply the model to multiple libraries in multiple sites. If you like, you can later go into the model settings and restrict the model so it can only be applied in specific sites.
- Here the model is applied to a library, and I have added the sample rental agreements. Remember all of the agreements were in various layouts but all contained similar information. You will see Landlord, Tenant, Lease Start Date, Lease End Date, Premises and Monthly rent have been extracted for every document. Security deposit amount has been extracted for 4 out of 5 which is correct as security deposit is not listed on every agreement – they must have a very trusting landlord!
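A quick way to sanity-check results like these is to count how many documents each field was extracted for. The sketch below assumes you have already got the library's column values into a list of Python dictionaries (for example from an export of the library). The `field_coverage` helper is my own invention, the column names are assumed to match the fields created earlier, and the rows are made-up placeholders rather than real output from the model.

```python
# Column names assumed to match the fields defined when building the model.
FIELDS = [
    "Landlord",
    "Tenant",
    "Lease Start Date",
    "Lease End Date",
    "Premises",
    "Monthly rent",
    "Security Deposit",
]


def field_coverage(documents, fields=FIELDS):
    """Count, per field, how many documents have a non-empty extracted value."""
    counts = {field: 0 for field in fields}
    for doc in documents:
        for field in fields:
            value = doc.get(field)
            if value is not None and str(value).strip():
                counts[field] += 1
    return counts


# Placeholder rows standing in for five processed agreements (not real data).
documents = [{field: "example value" for field in FIELDS} for _ in range(5)]
documents[4]["Security Deposit"] = ""  # one agreement with no deposit listed

coverage = field_coverage(documents)
for field in FIELDS:
    print(f"{field}: {coverage[field]} of {len(documents)}")
```

A field that comes back lower than expected is worth checking against the source documents before retraining, as the value may simply not be present.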
It’s been great to road test this new model and see where it would be best used. I have a new nickname for it though, “text extractor”, to reflect that you are really training the AI to look for a particular string, table or checkbox that could be in any location on the document. Through the training you are getting the AI used to the different formats of the document, the approximate location and the example text string format of the text to be extracted. When a document is uploaded to a library where the model is applied, the AI magically processes the document and works out the information to extract.
Freeform is different from the other custom document processing models; it is a kind of hybrid between the structured and unstructured document processing models. It is powered by AI Builder and trained to extract fields, tables or checkboxes anywhere in your document. This is unlike structured document processing models (which also use AI Builder), where the model focuses in on a particular section of a page for the field, table or checkbox to extract. Freeform is similar to unstructured document processing models in that the information to extract can be anywhere, but unstructured document processing models extract text using rules/patterns to identify the location. Freeform and structured document processing models do not have a classifier component (i.e., one that identifies a particular document type and only triggers the model on that type), whereas unstructured document processing models do.
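To make that comparison a bit more concrete, here is a rough sketch that records the differences described above as data and filters on them. The `ModelTraits` class, the `candidates` helper and the two yes/no questions are my own simplification for illustration; they are not an official decision tool, and Microsoft's comparison table remains the reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTraits:
    name: str
    finds_values_anywhere: bool  # can the value sit anywhere in the document?
    has_classifier: bool         # can it identify the document type first?
    how_it_extracts: str


# Traits as described in this post (a simplification, not official guidance).
MODEL_TRAITS = [
    ModelTraits("Freeform", True, False,
                "AI Builder, trained on tagged sample documents"),
    ModelTraits("Structured", False, False,
                "AI Builder, focused on a particular section of a page"),
    ModelTraits("Unstructured", True, True,
                "rules/patterns that identify the location"),
]


def candidates(needs_classifier: bool, values_move_around: bool):
    """Return the names of model types that satisfy both requirements."""
    return [
        m.name
        for m in MODEL_TRAITS
        if (m.has_classifier or not needs_classifier)
        and (m.finds_values_anywhere or not values_move_around)
    ]


# Rental agreements: no classification needed, but the fields move around.
print(candidates(needs_classifier=False, values_move_around=True))
```

For the rental agreements this leaves Freeform and Unstructured as candidates, which matches the "hybrid" feel described above; asking for a classifier as well narrows it to Unstructured only.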
I’m a fan of the new naming conventions, but they will take a little time to get used to. The names draw the focus to the type of content the model works best for, encouraging people to use the best model type for their content. The new AI Builder integration is very slick and works nicely with the new UIs.
Coming back to Freeform, I’m very keen to test this out in the real world. I think it will work well with technical drawings, contracts and other correspondence that uses many different layouts. I hope this post helps you out and helps you to learn more about this model type. Let me know if you have any questions or comments below…