With our new 2015 Q3 release, we introduced the ability to export the content of a RadFixedDocument to plain text. You can read more about the TextFormatProvider in the attached document. The related help article is expected to be live later this week.
I hope this is helpful.
If you have further questions, please get back to us again.
Regards,
Todor
Telerik
Hi Todor,
Is there any way to retrieve specific values from a PDF and validate them?
Thanks and Regards,
Bhavya.

I still can't extract the plain text. Here is my example:
public void createPdf()
{
    RadFixedDocument document = new RadFixedDocument();
    RadFixedPage page = document.Pages.AddPage();
    FixedContentEditor editor = new FixedContentEditor(page);
    editor.DrawText("Hello RadPdfProcessing!");
    PdfFormatProvider provider = new PdfFormatProvider();
    using (Stream output = File.OpenWrite(@"C:\Temp\Hello.pdf"))
    {
        provider.Export(document, output);
    }
}
public void import()
{
    TxtFormatProvider provider = new TxtFormatProvider();
    using (Stream input = File.OpenRead(@"C:\Temp\Hello.pdf"))
    {
        RadFlowDocument document = provider.Import(input);
        RadFlowDocumentEditor editor = new RadFlowDocumentEditor(document);
        string documentContent = provider.Export(document);
    }
}
In documentContent I expected "Hello RadPdfProcessing!", but got:
This document was generated by a trial version of Telerik Document Processing.
%PDF-1.7
%����
2 0 obj
<</Type /Catalog /Pages 3 0 R /Metadata 4 0 R /Names 5 0 R >>
endobj
3 0 obj
<</Type /Pages /Kids [6 0 R] /Count 1 >>
endobj
4 0 obj
Why?
Hello Joachim,
Following the provided example, there are two options. The first one is to import the already exported PDF file using the PdfFormatProvider class so it can parse the content of the document:
public RadFixedDocument ImportFromPdf()
{
    // Use PdfProcessing's PdfFormatProvider so the PDF structure is parsed
    // instead of being read as raw text.
    PdfFormatProvider provider = new PdfFormatProvider();
    using (Stream input = File.OpenRead("Hello.pdf"))
    {
        return provider.Import(input);
    }
}
and after that, to export the already parsed content to plain text using TextFormatProvider:
public void ExportPdfAsTxt(RadFixedDocument document)
{
    TextFormatProvider provider = new TextFormatProvider();
    string documentContent = provider.Export(document);
    File.WriteAllText("Sample.txt", documentContent);
}
The other option is to skip the PDF round trip: instead of exporting the RadFixedDocument to a PDF file and importing it back, use the RadFixedDocument instance (the one you create in the createPdf method) and export it directly to plain text.
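The second option can be sketched roughly as follows. This is a minimal sketch, not an official sample: the class and method names (DirectTextExport, BuildAndExport) are hypothetical, and the using directives assume the standard PdfProcessing namespaces, which you should check against the assemblies you have installed.

```csharp
using Telerik.Windows.Documents.Fixed.FormatProviders.Text;
using Telerik.Windows.Documents.Fixed.Model;
using Telerik.Windows.Documents.Fixed.Model.Editing;

// Hypothetical helper class, named for illustration only.
public static class DirectTextExport
{
    public static string BuildAndExport()
    {
        // Build the document in memory, exactly as in createPdf above.
        RadFixedDocument document = new RadFixedDocument();
        RadFixedPage page = document.Pages.AddPage();
        FixedContentEditor editor = new FixedContentEditor(page);
        editor.DrawText("Hello RadPdfProcessing!");

        // Export the in-memory document straight to plain text,
        // without writing a PDF file and importing it back.
        TextFormatProvider provider = new TextFormatProvider();
        return provider.Export(document);
    }
}
```

With a trial license the returned string may also contain the trial notice, as in the output quoted earlier in this thread.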
I hope this information is helpful.
Regards,
Martin
Progress Telerik

Hi,
Do you have a VB.NET sample showing how to convert a PDF to a text file?
I tried to use your Code Converter, but I might be missing something.
Best regards, Mikko
Hello Mikko,
I updated the code snippet in order to use VB instead of C#:
Dim document As RadFixedDocument
Using stream As Stream = File.OpenRead("SampleDocument.pdf")
    Dim pdfFormatProvider As PdfFormatProvider = New PdfFormatProvider()
    document = pdfFormatProvider.Import(stream)
End Using
Dim textFormatProvider As TextFormatProvider = New TextFormatProvider()
Dim documentContent = textFormatProvider.Export(document)
File.WriteAllText("TextFile.txt", documentContent)
Regards,
Martin
Progress Telerik

Hello Martin,
Thank you for your quick reply.
I get the error "Value of type 'Telerik.Windows.Documents.Flow.Model.RadFlowDocument' cannot be converted to 'Telerik.Windows.Documents.Fixed.Model.RadFixedDocument'".
I might be missing something?
Best regards, Mikko
Hello Mikko,
It seems your project references WordsProcessing's PdfFormatProvider instead of PdfProcessing's PdfFormatProvider. In this case, you need to use PdfProcessing's PdfFormatProvider, which lives in the Telerik.Windows.Documents.Fixed.FormatProviders.Pdf namespace:
Dim pdfFormatProvider As Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.PdfFormatProvider =
New Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.PdfFormatProvider()
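For C# projects that reference both libraries, the same disambiguation could be done with a using alias. The sketch below is illustrative only: the alias FixedPdfProvider and the PdfToText class are hypothetical names, the TextFormatProvider namespace is an assumption to verify against your installed version, and the Import(Stream) overload is the one used earlier in this thread (newer releases may offer other overloads).

```csharp
using System.IO;
using Telerik.Windows.Documents.Fixed.FormatProviders.Text;
using Telerik.Windows.Documents.Fixed.Model;
// Hypothetical alias that pins the PdfProcessing provider, so the
// WordsProcessing class with the same name cannot be picked by mistake.
using FixedPdfProvider = Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.PdfFormatProvider;

// Hypothetical helper class, named for illustration only.
public static class PdfToText
{
    public static string Convert(string pdfPath)
    {
        RadFixedDocument document;
        using (Stream input = File.OpenRead(pdfPath))
        {
            // Parse the PDF into the fixed-document model.
            document = new FixedPdfProvider().Import(input);
        }

        // Export the parsed content as plain text.
        return new TextFormatProvider().Export(document);
    }
}
```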
Regards,
Martin
Progress Telerik

Hi Martin,
That was it! Thanks for super fast and excellent support!
Best regards, Mikko

We created some simple code for conversion:
RadFixedDocument pdf_document = new RadFixedDocument();
PdfFormatProvider pdf_provider = new PdfFormatProvider();
using (Stream input = File.OpenRead("c:\\temp\\aa.pdf"))
{
    pdf_document = pdf_provider.Import(input);
    TextFormatProvider text_provider = new TextFormatProvider();
    string file_content = text_provider.Export(pdf_document);
    File.WriteAllText("c:\\temp\\Sample.txt", file_content);
}
But there are differences:
Our PDF file has been through an OCR process, so it has a searchable text layer. When I press CTRL+A and CTRL+C in the PDF document and then CTRL+V in Notepad, the result is something like this:
ponieważ zobowiązany nie figuruje w naszej bazie danych.
And the result from the code above is :
poniewa ż zobowiązan y nie figuruje w naszej bazie danych.
As you can see, some words are broken up: extra spaces appear before some letters. This happens with the .NET Standard version.
I've tested the same PDF file with the Telerik library for WPF, version 2016.2.606.45, and there the result of CTRL+A, CTRL+C was identical to the Telerik export to text.
What could be the problem? I've tested Telerik on .NET 5 and .NET Framework, with the same result.
Karol Dobek
Hello Karol,
The code snippet looks fine.
This behavior could be related to PdfProcessing's internal text recognition logic, but to investigate the case in more depth I would like to ask you to open a support ticket and share the document with us. I assure you that we treat all client files as strictly confidential and use them for testing purposes only.