Automate Content Extraction

Last update: April 11, 2024

Topics:
PDF Extract API

CREATED FOR:

Beginner
Developer

Learn how to automate the extraction of content from a PDF document using the PDF Extract API. Extracting PDF content helps unlock critical business data, which can then be used for a variety of downstream processes.


Transcript

Learn how to automate the extraction of content from a PDF document using the PDF Extract API. Extracting PDF content helps unlock critical business data, which can then be used for a variety of downstream processes. The PDF Extract API can help extract PDF content along with the content structure and reading order using Adobe Sensei AI. There are many options for how the Extract API can be invoked, such as using a programming language with the REST API. In our example here, we're going to use Power Automate, Microsoft's low-code automation solution.

The first step to get started is to generate the required credentials to invoke Acrobat Services. To do so, go to developer.adobe.com and select Create new project. Then add an API, choosing Document Cloud and PDF Services API, and then select Next. There are two options for authentication. The connectors for Power Automate have recently been updated to include both. The OAuth Server-to-Server option is the preferred method, as JWT authentication is being deprecated. Select the Enterprise PDF Services Developer profile and then save the configured API. Next, you'll need to generate an access token. Once you've generated the access token, copy the required information into the Adobe Services Power Automate connector configuration.

Now let's go ahead and create a new flow. Select Automated cloud flow and create a name. Our flow can be triggered by many different events, but this flow is triggered by a new document being added to a SharePoint folder. This is our source PDF to extract content from. The parameters highlighted in red are the parameters that need to be customized when extracting content from a PDF. For this SharePoint action, we need to input a SharePoint site address and a folder ID. The second action in our flow is to call the PDF Extract API using the Power Automate Acrobat Services connector.
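For the programming-language route the transcript mentions, the following is a minimal Python sketch of the two requests involved: trading the client ID and client secret for an access token, and submitting an extract job. The base URL, the `/token` and `/operation/extractpdf` paths, the header names, and the `assetID` and `elementsToExtract` fields are assumptions based on the public PDF Services REST API and should be checked against the current API reference. The helper names `build_token_request` and `build_extract_request` are hypothetical. The functions only build requests; nothing is sent.

```python
import json
from urllib import parse, request

# Assumed base URL for the PDF Services REST API; verify before use.
BASE_URL = "https://pdf-services.adobe.io"


def build_token_request(client_id: str, client_secret: str) -> request.Request:
    """Build (without sending) the form-encoded request that exchanges
    the client ID and client secret for an access token."""
    body = parse.urlencode(
        {"client_id": client_id, "client_secret": client_secret}
    ).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/token",  # assumed token endpoint
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_extract_request(
    access_token: str,
    client_id: str,
    asset_id: str,
    elements: tuple = ("text", "tables"),
) -> request.Request:
    """Build (without sending) an extract job request for a PDF that has
    already been uploaded and is identified by asset_id."""
    # The payload plays the role of the "instruction" that defines what
    # to extract; the field names here are assumptions.
    payload = {"assetID": asset_id, "elementsToExtract": list(elements)}
    return request.Request(
        f"{BASE_URL}/operation/extractpdf",  # assumed job endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-api-key": client_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Each request could then be sent with `urllib.request.urlopen`. Uploading the source PDF to obtain an asset ID and polling the job for its result are separate steps that this sketch leaves out.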
The Extract action requires two inputs: the document from which you want to extract the content, which in this case is the document uploaded to the SharePoint folder, and an instruction that defines what to extract, such as images or text. So we'll go ahead and add a new connection to the Extract API. After adding a connection name, we'll copy the values from the Acrobat Services credentials that we created into the fields required for the Acrobat Services Power Automate connector. Here are the client ID and client secret values that we copied, and now we can create our connection. The last section in this flow is to create a file in SharePoint, and the highlighted parameters show what the file should be named, what the content of the file is, and where it should be saved.

This is a standard PDF document. It contains several different kinds of content: text, headers, fonts, various text positions, images, and tables. The API allows you to select what you would like to extract. Uploading the PDF document into the SharePoint folder triggers the Power Automate flow. Then the next section calls the PDF Extract API via the Power Automate Acrobat Services connector, and then the last section in the flow puts the results into another SharePoint folder.

Let's go ahead and see the final generated JSON output. Here you can see the associated PDF and the JSON, and the corresponding content that has been extracted into the JSON output document. And that's how the PDF Extract API can help automate the extraction of content from long-form documents, such as contracts and reports, that can be used downstream in qualitative analysis processes.
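To illustrate how the JSON output might feed a downstream process, here is a minimal Python sketch that groups the extracted text by page and picks out headings. It assumes the output has a top-level `elements` list whose entries carry `Path`, `Text`, and `Page` fields. The sample document is hypothetical and heavily trimmed, and the helper names `text_by_page` and `headings` are illustrative only.

```python
import json
import re

# Hypothetical, trimmed sample of an extraction result. Real output
# carries more fields per element (fonts, bounds, and so on).
SAMPLE_OUTPUT = """
{
  "elements": [
    {"Path": "//Document/H1", "Text": "Service Agreement", "Page": 0},
    {"Path": "//Document/P", "Text": "This agreement covers document processing.", "Page": 0},
    {"Path": "//Document/Table", "Page": 1},
    {"Path": "//Document/P[2]", "Text": "Payment is due in 30 days.", "Page": 1}
  ]
}
"""


def text_by_page(structured: dict) -> dict:
    """Group the text of every text-bearing element by page number."""
    pages = {}
    for element in structured.get("elements", []):
        text = element.get("Text")
        if text:  # elements such as tables may carry no Text field
            pages.setdefault(element.get("Page", 0), []).append(text.strip())
    return pages


def headings(structured: dict) -> list:
    """Return the text of elements whose last path segment looks like a
    heading tag (H1, H2, ...)."""
    found = []
    for element in structured.get("elements", []):
        last_segment = element.get("Path", "").rsplit("/", 1)[-1]
        if "Text" in element and re.match(r"H\d", last_segment):
            found.append(element["Text"].strip())
    return found


if __name__ == "__main__":
    data = json.loads(SAMPLE_OUTPUT)
    print(headings(data))
    print(text_by_page(data))
```

In the Power Automate flow described above, the same JSON file is what the last action saves to the second SharePoint folder, so a script like this could read it from there.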
