How to render unicode checkbox u2612 and u2610 in PDF? Works ok in Word (Processing 2024.3.802)

PdfProcessing
Jason asked on 19 Aug 2024, 10:33 PM | edited on 20 Aug 2024, 06:05 AM

I am having trouble getting some HTML converted to PDF to render Unicode checkboxes.

\u2612 ☒ and \u2610 ☐

They appear correctly in the Word conversion but are missing in the PDF.

Here is the sample code:


using System.IO;
using Telerik.Documents.ImageUtils;
using Telerik.Windows.Documents.Flow.FormatProviders.Html;
using Telerik.Windows.Documents.Flow.Model;
using Xunit;

namespace Web.Tests.Pdf;

public class PdfRenderTests
{
    private readonly IHtmlToWordConverter _htmlToWordConverter = new HtmlToWordConverter();
    string path = "c:\\temp\\pdf\\";

    [Fact]
    public void CanRenderCheckbox()
    {
        var html = "<span class=\"checkbox\">\u2612 Yes \u2610 No</span>";
        var htmlDocument = _htmlToWordConverter.ConvertHtmlToWord(html);
        var docxProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider();
        using var memoryStream = new MemoryStream();
        docxProvider.Export(htmlDocument, memoryStream);
        var documentBytes = memoryStream.ToArray();

        File.WriteAllBytes($"{path}Test.docx", documentBytes);

        var pdfBytes = PdfConverter.ConvertDocxToPdf(documentBytes);
        File.WriteAllBytes($"{path}Test.pdf", pdfBytes);
    }
}

public interface IHtmlToWordConverter
{
    RadFlowDocument ConvertHtmlToWord(string html);
}

public class HtmlToWordConverter : IHtmlToWordConverter
{
    public RadFlowDocument ConvertHtmlToWord(string html)
    {
        var htmlFormatProvider = new HtmlFormatProvider();
        return htmlFormatProvider.Import(html);
    }
}

public static class PdfConverter
{
    static PdfConverter()
    {
        var defaultImagePropertiesResolver = new ImagePropertiesResolver();
        Telerik.Windows.Documents.Extensibility.FixedExtensibilityManager.ImagePropertiesResolver =
            defaultImagePropertiesResolver;
    }

    public static byte[] ConvertDocxToPdf(byte[] docxBytes)
    {
        var docxProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider();
        var pdfProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Pdf.PdfFormatProvider();

        var document = docxProvider.Import(docxBytes);

        using var memoryStream = new MemoryStream();
        pdfProvider.Export(document, memoryStream);
        return memoryStream.ToArray();
    }
}

 

PDF output: (screenshot not preserved)

Word output: (screenshot not preserved)


2 Answers, 1 is accepted

Jason
answered on 20 Aug 2024, 06:05 AM | edited on 20 Aug 2024, 06:06 AM

After further testing, it appears that the Telerik conversion falls back to Segoe UI or Arial Unicode MS for the font.

I'm not sure whether that means the Unicode characters are missing from the font set.

To resolve this, create a font provider that returns either of these two fonts; the ballot box characters then render. In my case, these fonts were not being resolved automatically and embedded in the PDF.

Yoan
Telerik team
answered on 20 Aug 2024, 11:39 AM

Hello Jason,

The behavior you are experiencing reproduces only in .NET Standard, due to limitations of the PdfProcessing library in that environment. The Cross-Platform, Fonts, and Images articles describe these limitations in detail, how they manifest, and how to handle them.

This case is caused specifically by the Fonts limitation. In short, the PdfProcessing library needs access to the font data used in the document so that it can read and embed it in the PDF file and render the content with the specified fonts. The .NET Standard version of the library, however, does not offer a default mechanism to read fonts, so the fonts used in the document must be provided manually. This can be achieved through a FontsProvider implementation.

I took the liberty of creating a sample project that implements the FontsProvider together with the code snippet you provided. The fonts that need to be provided are the ones used in the document: Times New Roman (for the text) and Segoe UI Symbol (for the checkboxes). With them, the project exports the DOCX to PDF with the checkboxes intact. I am attaching the project so you can test it and use it for your own purposes.
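For readers without the attachment, a FontsProvider along these lines might look roughly like the sketch below. It is not the attached project. It assumes the `FontsProviderBase` and `FixedExtensibilityManager.FontsProvider` extensibility points from the Fonts article, and the helper names (`FontFileMap`, `FontSetup`), the font file names (`times.ttf`, `seguisym.ttf`), and the use of the Windows fonts folder are illustrative assumptions that may need adjusting for your environment.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Telerik.Windows.Documents.Core.Fonts;
using Telerik.Windows.Documents.Extensibility;

// Hypothetical helper: maps a font family name to a font file name.
// The file names are the usual Windows ones and are assumptions.
public static class FontFileMap
{
    private static readonly Dictionary<string, string> Files =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["Times New Roman"] = "times.ttf",     // body text
            ["Segoe UI Symbol"] = "seguisym.ttf",  // ballot box glyphs U+2610 / U+2612
        };

    // Returns the mapped file name, or null when the family is not mapped.
    public static string Resolve(string familyName) =>
        Files.TryGetValue(familyName ?? string.Empty, out var file) ? file : null;
}

// Supplies raw font bytes to PdfProcessing, which cannot read fonts
// by itself in the .NET Standard build.
public class FontsProvider : FontsProviderBase
{
    private readonly string _fontFolder;

    public FontsProvider(string fontFolder) => _fontFolder = fontFolder;

    public override byte[] GetFontData(FontProperties fontProperties)
    {
        // This sketch ignores bold/italic; a fuller version would also map styles.
        var fileName = FontFileMap.Resolve(fontProperties.FontFamilyName);
        if (fileName == null)
        {
            return null; // unknown family: no font data supplied
        }

        var fullPath = Path.Combine(_fontFolder, fileName);
        return File.Exists(fullPath) ? File.ReadAllBytes(fullPath) : null;
    }
}

// Hypothetical registration helper: call once before exporting to PDF.
public static class FontSetup
{
    public static void Register()
    {
        // Assumption: fonts live in the OS fonts folder (Windows).
        var folder = Environment.GetFolderPath(Environment.SpecialFolder.Fonts);
        FixedExtensibilityManager.FontsProvider = new FontsProvider(folder);
    }
}
```

In the code from the question, `FontSetup.Register()` could be called from the static constructor of `PdfConverter`, next to the `ImagePropertiesResolver` assignment. On non-Windows hosts the fonts folder lookup may not point at usable files, so shipping the font files with the application and passing that folder instead is probably the safer choice.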

I hope this helps.

Regards,
Yoan
Progress Telerik
