Extract text from PDF document result not usable

Filip · July 25, 2019, 7:21am

Hi,

I would like to use the Extract Text from PDF document action in Plumsail Documents to extract text from PDF invoices we receive, and then capture relevant info from the text to save it to a SharePoint list.

However, when testing with a very simple flow (get the PDF from a SharePoint document library, get its content, process it), I don't get clean HTML as shown in the documentation. Instead I get a "nonsensical" string.

Here's my flow:

And this is the extracted text:

What am I doing wrong?

Thanks,
Filip

Petr · July 25, 2019, 8:52am

Hello @Filip,

The Extract Text from PDF document action returns its result as a Base64-encoded string. You can use the native base64ToString function to decode that string into readable text. The expression:

base64ToString(body('Extract_text_from_PDF_document')?['fileContent'])
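To illustrate what the decoding step does outside of a flow, here is a minimal Python sketch. It assumes the decoded bytes are UTF-8 text; the function name base64_to_string and the sample value are hypothetical stand-ins, not part of Plumsail Documents or Power Automate.

```python
import base64


def base64_to_string(encoded: str) -> str:
    """Decode a Base64 string into text, assuming the bytes are UTF-8."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical sample standing in for the action's fileContent value.
sample = base64.b64encode("Invoice No. 12345".encode("utf-8")).decode("ascii")

# The encoded form looks like a "nonsensical" string until it is decoded.
print(sample)
print(base64_to_string(sample))
```

The first print shows the kind of opaque string the question describes, and the second shows the readable text recovered from it.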

Also please check out the article

Best regards,
Petr Bushuev
Plumsail team

Filip · July 25, 2019, 9:12am

Brilliant! That did the trick!
Thank you for the quick response.

Filip