-
ATTENTION: This thread is no longer relevant since PyMuPDF added support for font subsetting.
In all of these cases, just make sure to use
I'm trying to use a font file, say 'Dengxian-light.ttf', to insert a Unicode string containing both Chinese and English characters. My code follows:
fontfile = "dengxian-light.ttf"
page.insertFont(fontname="EXT_0", fontfile=fontfile)
text = "姓名 name"
page.insertText((20, 20), text, fontname="EXT_0")
doc.save(output_pdf_path)
The saved PDF file looks fine in a viewer, but the file size is hugely inflated. I guess the entire font file has been embedded into the PDF instead of only the font data for the characters in use.
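With PyMuPDF's current font-subsetting support, the snippet above essentially needs one extra call before saving. The sketch below assumes a recent PyMuPDF, where the snake_case methods insert_font and insert_text replace the camelCase names used above and Document.subset_fonts() is available. The function name insert_text_subset is hypothetical.

```python
import os


def insert_text_subset(fontfile, text, output_pdf_path):
    """Write `text` with an external font file, then subset the embedded font.

    Assumes a recent PyMuPDF that provides Document.subset_fonts().
    Returns the size of the saved file in bytes.
    """
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    doc = fitz.open()      # new, empty PDF
    page = doc.new_page()  # default page size
    # Register the font file under a reference name, then write text with it.
    page.insert_font(fontname="EXT_0", fontfile=fontfile)
    page.insert_text((20, 20), text, fontname="EXT_0")
    # Reduce each embedded font to the glyphs actually used in the document.
    doc.subset_fonts()
    # garbage=4 removes unused objects; deflate compresses the streams.
    doc.save(output_pdf_path, garbage=4, deflate=True)
    doc.close()
    return os.path.getsize(output_pdf_path)
```

For example, insert_text_subset("dengxian-light.ttf", "姓名 name", "out.pdf") should produce a file whose embedded font only holds the glyphs for those few characters.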
-
Sorry, I missed a few words. I have also tried the following code:
page.insertText((20, 20), text, fontfile="dengxian-light.ttf")
This ends up with the PDF showing correct English characters, while the Chinese characters are shown as dots. If I change it to:
page.insertText((20, 20), text, fontfile="dengxian-light.ttf", fontname="EXT_1")
it behaves the same as in the original post: the file is hugely inflated. Thanks.
-
An interesting question! If you use office software like LibreOffice or Word, it does font subsetting internally when you export a document to PDF. So the resulting file will be relatively small and depend on the total set of characters you ever used in the Word document. I have been experimenting:
But maybe you want to consider an alternative:
The result should be a much smaller PDF, which looks exactly like the original.
-
Thanks for your wonderful comments, @JorjMcKie. I will definitely take a look at the font replacing scripts.
-
Here is a Python script that produces a PDF with one page of text with a mix of Latin (German) and Chinese characters.
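The script itself is not preserved in this thread. As a stand-in (not the poster's code), the following sketch builds a comparable one-page PDF with a German and a Chinese line and returns its size, so you can compare the output with and without subsetting. It assumes a recent PyMuPDF with TextWriter, Document.subset_fonts() and Document.tobytes(), and uses PyMuPDF's built-in "helv" and "cjk" (Droid Sans Fallback) fonts; make_mixed_text_pdf is a hypothetical name.

```python
def make_mixed_text_pdf(path, subset=True):
    """Create a one-page PDF mixing German and Chinese text; return its size in bytes.

    Assumes a recent PyMuPDF (TextWriter, Document.subset_fonts, Document.tobytes).
    """
    import fitz  # PyMuPDF; imported lazily

    doc = fitz.open()
    page = doc.new_page()
    latin = fitz.Font("helv")  # built-in Latin font
    cjk = fitz.Font("cjk")     # built-in CJK font (Droid Sans Fallback)
    writer = fitz.TextWriter(page.rect)
    writer.append((72, 72), "Grüße aus Köln: Größe, Maß und Übung.", font=latin, fontsize=12)
    writer.append((72, 96), "你好，世界！这是一段中文文本。", font=cjk, fontsize=12)
    writer.write_text(page)
    if subset:
        doc.subset_fonts()  # keep only the glyphs used on the page
    data = doc.tobytes(garbage=4, deflate=True)
    doc.close()
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```

Calling it once with subset=False and once with subset=True gives two file sizes to compare; the exact numbers depend on the PyMuPDF version and its built-in fonts.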
-
Weird. Please insert after the statement
-
you have installed
-
No, take mine please. I need to update the other one.
-
So the most practical thing to do is to create your PDF as you did before.
-
Contemplating this idea a bit more: we could make an importable version of the font replacement script and use it like this:
import fitz
import font_replace

doc = fitz.open(...)  # new or existing PDF
# create your text pages,
# make changes to existing text pages,
# etc.
# when everything is done:
font_replace.replace(
    doc,        # the document
    font_list,  # a list of all fonts used to write text
)
doc.save(...)
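A replace(doc, font_list) function like the one proposed needs to know, for every font, which characters were actually written. One way to gather that (a sketch, not the font_replace implementation; all function names here are hypothetical) is to walk the text spans returned by PyMuPDF's page.get_text("rawdict") and collect a character set per font. Note that the extracted names are the fonts' own names as stored in the PDF, not reference names such as EXT_0, so mapping them back to font files is left to the caller.

```python
from collections import defaultdict


def record_usage(usage, fontname, text):
    """Add every character of `text` to the set kept for `fontname`."""
    usage[fontname].update(text)
    return usage


def usage_from_spans(spans):
    """Build {fontname: set_of_characters} from (fontname, text) pairs."""
    usage = defaultdict(set)
    for fontname, text in spans:
        record_usage(usage, fontname, text)
    return dict(usage)


def spans_from_document(doc):
    """Yield (fontname, text) for every text span of a PyMuPDF document.

    `doc` is assumed to be a fitz.Document (any iterable of pages works).
    """
    for page in doc:
        for block in page.get_text("rawdict")["blocks"]:
            if block["type"] != 0:  # 0 = text block; skip image blocks
                continue
            for line in block["lines"]:
                for span in line["spans"]:
                    yield span["font"], "".join(ch["c"] for ch in span["chars"])
```

With an open document, usage_from_spans(spans_from_document(doc)) gives the per-font character sets that a subsetting step would keep.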
-
I can't wait for the importable version, so I made one, named
-
I'm not at my computer right now, so I have to postpone my feedback. But I love your initiative!!! A big thank you in advance!
-
@cuteufo, excellent start!
I am looking forward to testing your next version. Once we are done to our mutual satisfaction 😉, we may want to include it in the official PyMuPDF package as an optional feature. "Optional" means we would check whether fontTools is installed:
import fitz

try:
    import fontTools
    fitz.Document.subset_fonts = fitz.utils.subset_fonts  # the function will reside in utils.py
    del fontTools
except ImportError:
    fitz.Document.subset_fonts = lambda doc: print("fontTools not installed")
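The try/except above imports fontTools only to test for it. A variant, sketched below with a stand-in Document class rather than the real fitz.Document, uses importlib.util.find_spec to check availability without importing the package, and makes the fallback raise instead of print. Apart from importlib, every name here is hypothetical.

```python
import importlib.util


def has_module(name):
    """Return True if top-level module `name` can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


class Document:
    """Stand-in for fitz.Document, used only to illustrate the pattern."""


def _subset_fonts(doc):
    # Placeholder for the real implementation that would live in utils.py.
    return "subsetting fonts"


def _subset_fonts_unavailable(doc):
    raise RuntimeError("fontTools not installed")


def attach_subset_fonts(cls, available):
    """Bind the real method only when the optional dependency is present."""
    cls.subset_fonts = _subset_fonts if available else _subset_fonts_unavailable


attach_subset_fonts(Document, has_module("fontTools"))
```

Raising an exception rather than printing lets calling code detect the missing dependency and react, for instance by saving the PDF without subsetting.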
-
Thanks for your great comments. I look forward to the new feature in your official package. I updated the code and, to make it easier to review, uploaded both the old and the updated code on GitHub. In the updated version, I have tried to fix the problems in your comments 2, 3, and 6. Honestly, I didn't understand comments 4 and 5 because of my limited knowledge of the PDF specification. Would you please check the code again? I am doing this because a project at work requires writing text in particular fonts into an existing PDF. The code is working for the project, and I will have to go on with other tasks. But I will try my best to make time for future updates of subset_fonts.
-
Your observation is correct: when you insert text, the complete font file is included in the PDF, which can be big. The reason is that PyMuPDF does not know which characters you intend to use.
There are ways to build font subsets, and there are also Python packages that let you do this.
If you use office software like LibreOffice or Word, it does font subsetting internally when you export a document to PDF. So the resulting file will be relatively small and depend on the total set of characters you ever used in the document.
And here comes the difference to using PyMuPDF: it does not and cannot know this!
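fontTools is one such package. A minimal sketch, assuming fontTools is installed (pip install fonttools) and using the hypothetical function name subset_font_file, writes a copy of a font that keeps only the glyphs needed for a given text. Because PyMuPDF cannot know which characters you will use, the text you pass in must cover everything you intend to write with that font.

```python
def subset_font_file(fontfile, text, outfile):
    """Save to `outfile` a copy of `fontfile` reduced to the glyphs `text` needs.

    Assumes the third-party package fontTools is installed.
    """
    from fontTools import subset  # optional dependency, imported lazily

    options = subset.Options()
    options.notdef_outline = True  # keep an outline for the .notdef glyph
    font = subset.load_font(fontfile, options)
    subsetter = subset.Subsetter(options=options)
    # Keep the code points of `text`; the subsetter also retains glyphs
    # they depend on (composite components, layout substitutions).
    subsetter.populate(text=text)
    subsetter.subset(font)  # modifies `font` in place
    subset.save_font(font, outfile, options)
    font.close()
```

For example, subset_font_file("dengxian-light.ttf", "姓名 name", "dengxian-subset.ttf") produces a much smaller font file that could then be passed to insertFont instead of the original.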