Description
Reported on LinkedIn; to be verified.
It seems that the function Convert-PDFToText is not working quite correctly. I have to test further, but for the moment (in my environment) it behaves like this:
Assuming that the PDF has multiple pages with PageText1, PageText2, ..., PageTextN, after running the function I get a result where the text for each page also contains all the text from the previous pages, something like "PageText1PageText1PageText2PageText1PageText2PageText3" for a PDF of 3 pages.
It seems that (in my environment) I could fix it by explicitly creating a new text extraction strategy for every call of GetTextFromPage.
So, on line 1754, I changed
[iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, $iTextExtractionStrategy)
to
[iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, [iText.Kernel.Pdf.Canvas.Parser.Listener.LocationTextExtractionStrategy]::new())
After this fix, extraction worked as expected.
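To illustrate why reusing one strategy object could produce this output, here is a minimal sketch that does not use iText at all. `FakeStrategy` and `Get-FakePageText` are hypothetical stand-ins, written on the assumption that the strategy object keeps the text it has received across calls. This only models the symptom; it is not the module's code or iText's actual implementation.

```powershell
# Hypothetical stand-in for a stateful extraction strategy (NOT an iText class).
# Assumption: it keeps all text it has received so far and returns the whole buffer.
class FakeStrategy {
    [string] $Buffer = ''

    [string] Render([string] $PageText) {
        $this.Buffer += $PageText
        return $this.Buffer
    }
}

# Hypothetical stand-in for GetTextFromPage: passes the page text to the
# strategy and returns whatever the strategy has accumulated.
function Get-FakePageText {
    param(
        [string] $PageText,
        [FakeStrategy] $Strategy
    )
    return $Strategy.Render($PageText)
}

$pages = 'PageText1', 'PageText2', 'PageText3'

# One shared strategy for all pages: every page result repeats the earlier pages.
$shared = [FakeStrategy]::new()
$reused = -join ($pages | ForEach-Object {
    Get-FakePageText -PageText $_ -Strategy $shared
})

# A new strategy per page: every page result contains only that page.
$fresh = -join ($pages | ForEach-Object {
    Get-FakePageText -PageText $_ -Strategy ([FakeStrategy]::new())
})

$reused   # PageText1PageText1PageText2PageText1PageText2PageText3
$fresh    # PageText1PageText2PageText3
```

The first result matches the duplicated output described above, and the second matches the result after creating a new strategy for each page.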