PDF Font Autosize Not Working On Firefox #2551

Closed
kyl293 opened this issue Mar 28, 2024 · 7 comments

@kyl293

kyl293 commented Mar 28, 2024

What happened? What were you trying to achieve?
The generated PDFs do not auto-size the font for form inputs in Firefox, but they look normal in Microsoft Edge and Google Chrome. If I then save the file using Acrobat, the font size issue is fixed. Is there a way (code-wise) to fix the scaling issue?

Environment

Which environment were you using when you encountered the problem?
I am running the software on AWS Lambda (Put pypdf in a layer so that Lambda can access it)

$ python -m platform
{
  "statusCode": 200,
  "body": {
    "file_name": "1182_insert_id_card.pdf"
  }
}

Code + PDF

This is a minimal, complete example that shows the issue:

import os
import json
import boto3
from pypdf import PdfReader, PdfWriter
from io import BytesIO

def generate_pdf(event, context):
    # Extract input data from the Lambda event
    #print(json.dumps(event))
    # Retrieve the raw JSON payload from the request body
    raw_body = event.get("body")
    input_data = raw_body
    #print("A")
    # Check if the body is present
    """
    if raw_body is not None:
        
        # Parse the JSON from the request body
        try:
           input_data = json.loads(raw_body)
        except json.JSONDecodeError as e:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Invalid JSON payload'}),
            }
    else:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Request body is missing'}),
        }
"""
    # Now input_data should contain the parsed JSON payload
    #print(f"Received Input Data: {input_data}")

    # Path to the template PDF file (assuming it's included in the Lambda deployment package)
    #print("B")
    template_path = getTemplate(input_data)
    #print(f"{template_path=}")
    #print("C")
    # Fill in the form fields in the template PDF
    #print("d")
    filled_pdf_buffer = fill_pdf_template(template_path, input_data)
    #print("e")
    # Store the filled PDF in the /tmp directory

    # Upload the filled PDF to the destination S3 bucket
    #print('f')
    destination_bucket = input_data.get('dest_bucket')
    output_key = get_output_key(input_data)
    #print('g')
    upload_to_s3(filled_pdf_buffer, destination_bucket, output_key)
    #print('h')
    #print("b4 return")

    return {
            'statusCode': 200,
            'body': {'file_name': f'{filled_pdf_buffer.split("/")[-1]}'}
        }
        
        
def fill_pdf_template(template_path, input_data):

    reader = PdfReader(template_path,strict=True)
    metadata=reader.metadata
    writer = PdfWriter()
    writer.clone_reader_document_root(reader)
    writer.add_metadata(metadata)    
    pagecount=0
    writer.set_need_appearances_writer(False)
   
    
    writer.append(reader)
    
    for page in range(len(reader.pages)):
        fields = reader.get_form_text_fields()
        print(fields)
        if fields is not None:
            writer.update_page_form_field_values(
                writer.pages[page],
                input_data.get('input_data'),
                flags=1,
                auto_regenerate=True,
           )

    #print("run" ,page)
    # write "output" to pypdf-output.pdf
    tmp_filename=f'/tmp/{input_data.get("uniqueID", "Unknown")}_{input_data.get("file_name", "Unknown")}'
    with open(tmp_filename, "wb") as output_stream:
        writer.write(output_stream)
 
    return tmp_filename
    
    
def get_output_key(input_data):
    # Customize how you want to generate the output key based on input data
    # This is just a simple example, adjust it as needed
    return f'{input_data.get("uniqueID", "Unknown")}_{input_data.get("file_name", "Unknown")}'

def upload_to_s3(tmp_filename,dest_bucket,output_key):
    s3 = boto3.client('s3')
    s3.upload_file(tmp_filename,dest_bucket,output_key)


def getTemplate(input_data):
    s3= boto3.client('s3')
    bucket=input_data.get("bucket_name")
    print(f"{bucket=}")
    path=input_data.get("file_path")
    print(f"{path=}")
    name='/tmp/'+input_data.get("file_name")
    print(f"{name=}")
    try:
        print("yes")
        s3.download_file(bucket,path,name)
        return name
    except  Exception as error:
        print("error during getTemplate")
        print(str(error))
        return {
            'statusCode': 400,
            'body': {'error': error},
        }

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
Card with filled dummy data
1182_insert_id_card.pdf
Card with unfilled data
insert_id_card (1).pdf

@kyl293 kyl293 changed the title PDF Font Autosize Not Working. PDF Font Autosize Not Working On Firefox Mar 28, 2024
@stefan6419846 stefan6419846 added the workflow-forms From a users perspective, forms is the affected feature/workflow label Mar 28, 2024
@pubpub-zz
Collaborator

I suspect the problem is in the way you create the writer.
Can you try replacing your fill_pdf_template function with this:

def fill_pdf_template(template_path, input_data):

    reader = PdfReader(template_path,strict=True)
    writer = PdfWriter(clone_from=reader)
    pagecount=0
    for page in range(len(reader.pages)):
        fields = reader.get_form_text_fields()
        print(fields)
        if fields is not None:
            writer.update_page_form_field_values(
                writer.pages[page],
                input_data.get('input_data'),
                flags=1,
                auto_regenerate=True,
           )
    tmp_filename=f'/tmp/{input_data.get("uniqueID", "Unknown")}_{input_data.get("file_name", "Unknown")}'
    writer.write(tmp_filename)
 
    return tmp_filename

@kyl293
Author

kyl293 commented Mar 28, 2024

I just tried your code and, unfortunately, it did not fix the issue. The policy number is still cut off in Firefox.

@pubpub-zz
Collaborator

This is my test code:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("insert_id_card.1.pdf",strict=True)
writer = PdfWriter(clone_from=reader)

f = reader.get_form_text_fields()
for k in f:
    f[k]="!!!"+k+"!!!"

writer.update_page_form_field_values(writer.pages[0],f,1,True)

writer.write("tt.pdf")

and the result:
image

@kyl293
Author

kyl293 commented Mar 28, 2024

Try making the Policy Number absurdly long. When you do that, the font size stays the same in Firefox. This happens with every PDF I generate, so on other forms with tighter boxes the text gets cut off much sooner.

Here is what it looks like in Chrome using your code snippet:
image

And here is what the PDF looks like when I open it in Firefox:
image

The Policy Number gets cut off.

@pubpub-zz
Collaborator

The issue is related to the font size.
In the DA entry (the default appearance string) you can see that the size (the number before Tf) is set to 0: this means the font size is dynamic (auto-sized). If you look at page 435 of the PDF 1.7 reference, you will read:
image

From my interpretation Firefox is in accordance with the standard.

Furthermore, pypdf does not yet have the capability to change the font size (see #2064, still in progress).
In the meantime, you should be able to compute the proper font size manually and substitute it for the number before Tf in the annotation's DA entry.
Once that is done, you should not ask for the appearance to be regenerated: pass auto_regenerate=False.
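
The manual workaround described above could be sketched roughly as follows. This is not pypdf API: `estimate_font_size` and `set_da_font_size` are hypothetical helper names, and the 0.5 average glyph width, the 12 pt cap, and the 2 pt padding are rough assumptions for a single-line Helvetica-like field. Real font metrics would give a better fit.

```python
import re

# Matches the "<font name> <size> Tf" operator inside a /DA string,
# e.g. "/Helv 0 Tf 0 g" (size 0 means "auto-size").
_TF_PATTERN = re.compile(r"(/\S+)\s+([0-9]*\.?[0-9]+)\s+Tf")


def estimate_font_size(text, rect, avg_char_width=0.5, max_size=12.0, padding=2.0):
    """Guess a font size (in points) so `text` fits on one line inside `rect`.

    `rect` is (x0, y0, x1, y1) in PDF points, as found in a field's /Rect.
    `avg_char_width` is an ASSUMED average glyph width as a fraction of the
    font size; it is an approximation, not a real font metric.
    """
    x0, y0, x1, y1 = (float(v) for v in rect)
    width = abs(x1 - x0) - 2 * padding
    height = abs(y1 - y0) - 2 * padding
    # Never exceed the cap or the usable box height.
    size = min(max_size, height)
    if text:
        # Shrink further if the text would be wider than the box.
        size = min(size, width / (len(text) * avg_char_width))
    return round(max(1.0, size), 2)


def set_da_font_size(da, size):
    """Return the /DA string with the number before Tf replaced by `size`."""
    new_da, count = _TF_PATTERN.subn(
        lambda m: f"{m.group(1)} {size:g} Tf", da, count=1
    )
    if count == 0:
        raise ValueError(f"no Tf operator found in DA string: {da!r}")
    return new_da


# A 40-character value in a 104 x 24 pt box.
size = estimate_font_size("A" * 40, (0, 0, 104, 24))
print(set_da_font_size("/Helv 0 Tf 0 g", size))  # /Helv 5 Tf 0 g
```

To apply this, the returned string would have to be written back to the DA entry of each field annotation before filling the form with auto_regenerate=False. Note that DA may be inherited from a parent field or from the AcroForm dictionary, so it is not necessarily present on the annotation itself.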

@pubpub-zz
Collaborator

@kyl293
Any update?

@pubpub-zz
Collaborator

I am closing this issue as solved. Feel free to post an update to reopen it.
