There has been a whole raft of exciting announcements and new functionality released for SharePoint Syntex this year. One of the most exciting new features I have used recently is the ability to train Syntex forms processing to extract data from a table on a form where the row count is variable. Before you could not extract table data from a form where the number of rows was variable and the table size varied.

SharePoint Syntex uses advanced AI and machine teaching to amplify human expertise, automate content processing, and transform content into knowledge.

It has three main pillars:

  • Content Understanding
    • Create AI models that capture expertise to classify and extract information to automatically apply metadata.
  • Content Processing
    • Automate capture, ingestion and categorisation of content and streamline content-centric processes using Power Automate.
  • Content Compliance
    • Connect and manage content to improve security and compliance.

The easiest way to get started with SharePoint Syntex if you are not already licenced is to sign up for a free 30 day trial for 300 users

Here is a Microsoft page that explains all

In this blog I am going to focus on Content Processing aka Forms Processing where from a SharePoint document library an AI builder model can be built and trained to select the precise location of fields on a current form & now also allowing variable table data.

Once a model has been built, AI Builder will then automatically read and extract metadata from the defined field locations for future files that are uploaded.

Implement Forms Processing for Extractable Table Item Data

I will now show you how to setup forms processing with extractable table item data and I will use the sample files provided by Microsoft for forms processing (link). You may already have some forms with tabular data that you want to extract from in your organisation to work with AI Builder. There are requirements for the input documents – the main one is that documents must be in JPG, PNG, or PDF format (text or scanned). See here for the full requirements.

  1. Create a new SharePoint document library. Then in the library click Automate from the document library menu bar – then AI Builder then click Create a model to process forms.
  1. Name the model and then from the new “Extract info from tables” section select Yes. Select A new list from the “where should we save table info” section as this will then create a new list for your table data. You could if you like pre-create a list and use this if you have any special field types i.e. currency, number. As currently all the fields in the auto created list are single line of text type. Once happy with the other options click Create to continue creating the model in AI Builder.
  1. The model will then load in AI Builder & here is where you will choose the information to select. Here you will specify on the fields from the form which you wish to select values from. Select Add and then select Field. Enter the name of the field and click Done.
  1. Next repeat the same process and add the following as type Field Invoice No, Customer ID & Total
  1. Now we are going to add a Single page table to extract the data from the table in our form. Name the table i.e. Invoice Line Items” and then add the headers from the table. I added four columns: Quantity, Description, Unit Price & Line Total then clicked Done.
My completed information to extract screen
  1. Next create a “New Collection” of documents and add at least five sample documents to the collection so we can now train the model. Once five documents are uploaded in a collection click Next.
  1. Now we have to Tag the fields and table on the documents to train the model. The first document in the collection will load – I like to click the “Show detected words” button which identifies text from the form which you can then use to map to fields.
  1. Draw a rectangle around the date on the form and then select the Date field we created earlier from the dropdown

Do the same for the other fields: Invoice No, Customer ID and Total

  1. Next for the table draw a rectangle around the whole table then select Invoice Line Items from the dropdown.
  1. Next left click in the table to mark the rows of the table. I have specified four including the table headers.

I then used Ctrl & left click to specify the columns (Quantity, Description, Unit Price & Line Total).

I then clicked on the headers of the table area denoted A, B, C & D and set the matching table header from the dropdown.

  1. Once all the headers are setup – click Ignore first row so the headers are not included as rows in the table. Then preview the data selected and then if you are happy with the data selected select Done.
  1. Repeat the process for the other five documents in the collection and then click Next.
  1. The model summary page will then be displayed summarising the model that was created. Click Train.
  1. Wait for the status of the model to change from Training to Trained. Once Trained click on the model to be taken to the model page.
  1. The model now can be reviewed and now needs to be published. Click Use model.
  1. Now to use the model in your SharePoint document library it needs to be assigned to a flow in Power Automate. Click Use model and then click Create flow. Once successfully created click Go to SharePoint.
  1. The AI builder forms processing model is now assigned to the document library and we can see the additional fields that have been created i.e. Invoice No, Customer ID etc.
  1. We will now add some test files to the library to test the model. We can now see blow they are being analysed.

Behind the scenes a flow in Power Automate is being kicked off when the document is added to the library and then an action in the flow calls AI builder which calls our model. Which then decorates all the fields in SharePoint with data from the files uploaded.

What is great now with linked tables is a new column is created in the library where we can click on a link and be taken to the data taken from the table extraction.

The table data is then displayed and each row is a new line item in the linked SharePoint list.

This view shows all of the table data extracted from the five documents in a SharePoint list with each row as a separate list item. The items are grouped by file ID.

Summary

There we have it extracting all rows from tables using SharePoint Syntex forms processing. This was a highly requested feature as many organisations have lots of tabular data in forms they may already use. When we have this metadata in SharePoint we could then use it to be represented visually either through SharePoint list formatting or even Power BI.

It seems to work very well and I’m excited to see further developments in the SharePoint Syntex roadmap. There is lots coming in the next few months and I’m excited to work with a few organisations to use AI and machine learning to make work better!

Leave a Reply