Microsoft recently released a new Syntex AI document processing model called freeform, whose purpose is to automatically extract information from unstructured and freeform documents such as letters, contracts or correspondence. Initially I was a little confused by the new freeform model type and which scenarios you would use it for, so I needed to test it out & learn. My key confusion was what exactly “freeform” documents are, and which documents would work best with the model. I will attempt to answer these questions in this blog, along with introducing the freeform model and other Syntex model renaming changes.

Freeform definition: not conforming to a regular or formal structure or shape. – Oxford Languages Definition

Microsoft Syntex uses Microsoft Power Apps AI Builder’s document processing component (previously known as form processing) to create freeform document processing models that are applied to SharePoint document libraries. Machine learning technology is used to identify and extract key-value pairs and table data from unstructured or freeform documents; these are then added to the document as metadata.

SIDE NOTE: I’m still getting over the excitement from the recent Microsoft Ignite event, where SharePoint Syntex became Microsoft Syntex to reflect that it’s now getting the complete Microsoft spotlight treatment. There were so many new features, including the freeform document processing model, & I can tell you this blog is going to be busy as new features come out and I get to grips with them. Syntex uses content intelligence with AI to do the work in organisations that employees don’t want to do, or where the scale is too big. Most organisations have large repositories of content and may be getting very little value from it. Employees will often complain that they cannot find the content they need to complete the task they are working on, or that there are too many manual steps. Microsoft Syntex looks to be great for increasing discoverability of content, increasing productivity & saving costs through efficiencies.

In other significant news two of the founding Syntex AI models (Document Understanding & Forms Processing) were renamed (see table below).

| Old Syntex Model Name | New Syntex Model Name |
| --- | --- |
| Document Understanding | Unstructured Document Processing Model |
| Forms Processing | Structured Document Processing Model |
| N/A | Freeform Document Processing Model |

Table outlining previous and new Syntex model names

Microsoft have produced a great table comparing the three different custom model types (link).

There is also a new Syntex model creation UI, used for creating Syntex models locally from a library or centrally in a Syntex content centre.

New Syntex Model Creation UI

This new Syntex model creation UI (image above) is triggered by selecting New Model in a Syntex Content Centre or by creating a local Syntex model on a document library. This allows a user to create any of the different types of Syntex models from one screen. All model types can now be created centrally from the Syntex Content Centre which is a welcome update – previously Structured Document Processing models could only be created and used on one specific SharePoint document library.

You will notice that the naming of the models on the Syntex model creation menu is different from their new names: the labels refer to the scenario rather than the model’s name. I’ve listed the UI labels and their corresponding Syntex models in the table below.

| New Syntex Model Creation UI Naming | Syntex Model Name |
| --- | --- |
| Teaching method | Unstructured Document Processing Model |
| Layout method | Structured Document Processing Model |
| Freeform selection method | Freeform Document Processing Model |
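If you find yourself juggling all three sets of names, a small lookup can help. This is only a rough sketch for my own notes: the names are copied from the two tables in this post, while the `SyntexModelName` class and `find_model` helper are hypothetical names I made up for illustration and are not part of any Syntex API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyntexModelName:
    old_name: Optional[str]  # None where there was no model before the rename
    new_name: str
    ui_label: str            # label shown in the new model creation UI


# Names copied from the two tables in this post.
MODEL_NAMES = [
    SyntexModelName("Document Understanding",
                    "Unstructured Document Processing Model",
                    "Teaching method"),
    SyntexModelName("Forms Processing",
                    "Structured Document Processing Model",
                    "Layout method"),
    SyntexModelName(None,
                    "Freeform Document Processing Model",
                    "Freeform selection method"),
]


def find_model(name: str) -> Optional[SyntexModelName]:
    """Return the entry whose old name, new name or UI label matches (case-insensitive)."""
    wanted = name.strip().lower()
    for entry in MODEL_NAMES:
        candidates = [entry.new_name, entry.ui_label]
        if entry.old_name is not None:
            candidates.append(entry.old_name)
        if wanted in (c.lower() for c in candidates):
            return entry
    return None


match = find_model("forms processing")
if match is not None:
    print(f"{match.old_name} -> {match.new_name} (UI label: {match.ui_label})")
```

Looking up any one of the three names returns the other two, which is handy when older documentation still uses the previous names.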

Although all the different names are a little confusing, I think the new Syntex model creation UI, with its descriptive labels & images, helps guide a user to the best Syntex model type for the scenario. For example, I’ve had users trying to use a document understanding model for every document type, even when the document contains mainly tabular/form data and other models would be better suited.

Testing out the new Freeform Syntex Model

I will now try to get to grips with the freeform model and test it out. Behind the scenes, Syntex freeform models use AI Builder, which is part of the Power Platform, to provide a no-code & integrated way to build and train a model to process documents. I suppose that if you had all the time in the world & fancied doing lots of tweaking, you could set this all up manually & separately using Power Automate & AI Builder attached to a SharePoint document library.

I generally learn by testing & configuring functionality and trying to create real world scenarios, so I can then talk about the new functionality and answer questions from the community and my customers. So, I needed to figure out which type of documents the freeform model works best with and find some sample documents. I know AI Builder is used behind the scenes, and I remembered that AI Builder previously had good sample documents for scenarios available to download. So, this was my first port of call, and I found some example documents for document processing to download here.

The zip file contains sample files for Invoices and Rental Agreements. I looked at the files in the Rental Receipts folder and, as they are in a variety of formats (freeform), I believe these would work best with the freeform model. See the image below, where I have displayed three rental agreements side by side. You can clearly see they are in a variety of formats: some are in table format and some in paragraph format, but they are all different.

3 x Rental Receipts documents – in a variety of formats

In the image below I have placed two of the documents side by side to demonstrate the fields I want to extract from the rental agreements, i.e., Landlord, Security Deposit etc. You will notice the fields are all in different places and text styles, and that one agreement is in table format and one is in paragraph format, so it’s all very FREEFORM!

  1. To create a Freeform model, I went to my Syntex content centre and then selected “Freeform selection method” from the new Syntex model creation UI.
  2. I was then shown a new screen with further information about the freeform model: what the model can do, examples, training details & supported file types. It also notes that the freeform model is still in preview, so it could change, and that the model currently only supports text in English.
  3. I can then give my model a name, i.e., Rental Receipts (freeform).
  4. On the next screen I need to specify the names of the information I wish to extract from the documents. Fields (text), checkboxes or table data can be extracted (although multiple line items cannot be extracted from a table).
  5. I then uploaded five rental receipt documents on the next screen. A minimum of five documents is required, but add more than five if your sample documents have a wide variety of different formats. You are training the model to identify the text strings to extract, along with handling variances in layout/formatting.
  6. You now mark on each of the five documents where the previously created fields are. See the image below, where I show the tagging process for two documents (note the different formats).
  7. Once I have tagged all of the fields on all of the uploaded sample documents, I am presented with a model summary page.
  8. The model is now training and, due to the different layouts and different text locations, the training takes longer than for Structured document processing models (formerly forms processing). Have a tea, coffee, beer, wine etc. whilst waiting; it took about 30 minutes for me!
  9. When the model has finished training, visit the model in the models library in the content centre. Here you can review the model, modify the model settings (description, sites where the model is available to be installed from a library, and retention label) or even edit the model, i.e., add different documents or change fields.
  10. Here is the review screen for the model, where you can view details of the model. On this screen you can do a quick test by uploading a sample document to see how the model works with it, i.e., which fields it extracts. I then press Publish to publish the model and make it available in SharePoint.
  11. Once the model is published it can be added to any library in any site through the UI. This can be done multiple times to apply the model to multiple libraries in multiple sites. You can also later go into the model settings to restrict the model so it can only be applied in specific sites.
  12. Here the model is applied to a library, and I have added the sample rental agreements. Remember, all of the agreements were in various layouts but all contained similar information. You will see Landlord, Tenant, Lease Start Date, Lease End Date, Premises and Monthly rent have been extracted for every document. Security deposit amount has been extracted for 4 out of 5, which is correct as a security deposit is not listed on every agreement (they must have a very trusting landlord!).
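To sanity-check results like these across a bigger library, I would count how many documents have a value for each field. Below is a minimal sketch of that check, assuming the library rows have already been exported as simple dictionaries. The field names come from the walkthrough above, but the rows are hypothetical placeholders (not the real extracted values), and which document lacks a security deposit is picked arbitrarily.

```python
from typing import Dict, List, Optional

# Field names from the walkthrough above.
FIELDS = [
    "Landlord", "Tenant", "Lease Start Date", "Lease End Date",
    "Premises", "Monthly Rent", "Security Deposit",
]


def coverage(rows: List[Dict[str, Optional[str]]]) -> Dict[str, int]:
    """Count, per field, how many documents have a non-empty extracted value."""
    return {field: sum(1 for row in rows if row.get(field)) for field in FIELDS}


# Hypothetical library rows: "x" is a placeholder for an extracted value.
rows: List[Dict[str, Optional[str]]] = [
    {field: "x" for field in FIELDS} for _ in range(5)
]
# Assumption: one agreement has no security deposit (index chosen arbitrarily).
rows[2]["Security Deposit"] = None

for field, count in coverage(rows).items():
    print(f"{field}: {count}/{len(rows)}")
```

With these placeholder rows, every field reports 5/5 except Security Deposit at 4/5, matching what I saw in the library. A field with unexpectedly low coverage would be a hint to add more sample documents and retrain.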

Summary

It’s been great to road test this new model and see where it would be best used. I have a new nickname for it though, “text extractor”, because you are really training the AI to look for a particular string, table or checkbox that could be in any location on the document. Through the training you are getting the AI used to the different formats of the document, the approximate location and an example text string format for the text to be extracted. When a document is uploaded to a library where the model is applied, the AI magically processes the document and works out the information to extract.

Freeform is different from the other custom document processing models: it is kind of a hybrid between the structured and unstructured document processing models. It is powered by AI Builder and trained to extract fields, tables or checkboxes anywhere in your document. This is unlike structured document processing models (which also use AI Builder), where the model focuses in on a particular section of a page for a field/table/checkbox to extract. Freeform is similar to unstructured document processing models in that the information to extract can be anywhere, but unstructured document processing models extract text using rules/patterns to identify the location. Freeform and structured document processing models do not have a classifier component (i.e., one that identifies a particular document type and only triggers the model on that type), whereas unstructured document processing models do.
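The comparison above can be boiled down to a rough rule of thumb. The sketch below is my own simplification of the points in this post, not Microsoft's official guidance, and the `ModelTraits` class and `candidates` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelTraits:
    name: str
    engine: str           # what performs the extraction, as described in this post
    fixed_location: bool  # True if values are expected in a set area of the page
    has_classifier: bool  # True if the model first identifies the document type


# Traits as summarised in this post (a simplification).
MODELS = [
    ModelTraits("Structured document processing", "AI Builder", True, False),
    ModelTraits("Freeform document processing", "AI Builder", False, False),
    ModelTraits("Unstructured document processing", "rules/patterns", False, True),
]


def candidates(layout_varies: bool, needs_classifier: bool) -> List[str]:
    """List the model types that fit the scenario under the rules of thumb above."""
    result = []
    for model in MODELS:
        if layout_varies and model.fixed_location:
            continue  # a fixed-area model cannot follow values that move around
        if needs_classifier and not model.has_classifier:
            continue  # only a model with a classifier can filter by document type
        result.append(model.name)
    return result


# Rental agreements in many layouts, one document type per library:
print(candidates(layout_varies=True, needs_classifier=False))
```

For my rental agreements this leaves freeform and unstructured as candidates. If the library mixed several document types and needed a classifier, only unstructured would remain.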

I’m a fan of the new naming conventions, but they will take a little bit of time to get used to. The names draw the focus to the type of content the model works best for, encouraging people to use the best model type for their content. The new AI Builder integration is very slick and nicely integrated, along with the new UIs.

Coming back to Freeform: I’m very keen to test this out in the real world, and I think it will work well with technical drawings, contracts and other correspondence that uses many different layouts. I hope this post helps you out and helps you to learn more about this model type. Let me know if you have any questions or comments below…

This Post Has 3 Comments

  1. Bradley

    Hi Leon
    Thanks for taking the time to write this some really good information for the community. Couple of questions if I may:

    1. In the older forms processing model (Structured document processing) which seems a very similar setup, you paid per page via credits i.e. if you had info spread across 4 pages you would need to account for this when working out how many credits / documents you could consume, is this the same in the freeform document processing model i.e. one rental agreement may have the data on page 1, but another could be on page 3 so you still need to account for 3 pages. I guess you could in theory create different collections if you know the type?

    2. In your example you have a doc library where you know the document data type before uploading, e.g. Rental agreements, so I assume that if I uploaded a non rental agreement into your library, this would still generate a charge as it looks for that info across the key pages, even though it doesn’t match your models / collections? It would be nice to have some pre-scan facility which, at a very low builder credit cost, identifies the rental agreements from a series of other documents, and then only attempts to extract the info from rental agreements, i.e. where the data is not as well organised.

    1. Leon Armston

      Hi Bradley

      Thanks for your comment and kind words about this blog.

      1) Freeform and Structured both use the same AI Builder billing method, which is calculated by pages. So you would incur the same costs with the same document in either freeform or structured. If you knew the data was always going to be on the third page, you could do some pre-processing, i.e. split the document into separate files so only the third page is processed?
      2) I hear you with the pre-scan. Structured/freeform are designed so that only one model is applied per library, so basically everything that is added to that library gets run against the model. It would be nice if there was a classification process, like in unstructured document processing.

      Thanks

  2. Elisabeth

    Hi Leon,
    Both Freeform and structured document model use collections. Now, I had understood that in structured model this was to separate the different layouts from each other. So how should I understand these collections? When would you create another collection?
