Extract text from PDF document result not usable


#1

Hi,

I would like to use the Extract Text from PDF document action in Plumsail Documents to extract text from PDF invoices we receive, and then capture relevant info from the text to save it to a SharePoint list.

However, when testing with an ultra-simple flow (get the PDF from a SharePoint document library, get the content, process it), I’m not getting clean HTML as shown in the documentation. Instead, I get a “nonsensical” string.

Here’s my flow:

And this is the extracted text:

[image: screenshot of the extracted text]

What am I doing wrong?

Thanks,
Filip


#2

Hello @Filip,

The Extract Text from PDF document action returns a Base64-encoded result. You can use the native base64ToString function to get the text from the string. The expression is: base64ToString(body('Extract_text_from_PDF_document')?['fileContent'])
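
To illustrate what that decoding step does, here is a minimal sketch in Python, outside of Power Automate. The function name decode_file_content and the sample payload are hypothetical and only for illustration. It also assumes the decoded bytes are UTF-8 text, which may not hold for every document.

```python
import base64


def decode_file_content(file_content_b64: str) -> str:
    """Decode a Base64 string into text (assumes UTF-8 content)."""
    raw_bytes = base64.b64decode(file_content_b64)
    return raw_bytes.decode("utf-8")


# Hypothetical sample: build a Base64 string the way an encoded result
# might look, then decode it back into readable text.
sample = base64.b64encode("<p>Invoice 001</p>".encode("utf-8")).decode("ascii")
print(decode_file_content(sample))
```

The "nonsensical" string in the flow output is the Base64 form of the text. Decoding it, whether with base64ToString in the flow or with a routine like the one above, gives back the readable content.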

Also, please check out the article.

Best regards,
Petr Bushuev
Plumsail team


#3

Brilliant! That did the trick!
Thank you for the quick response :+1:

Filip