PDF accessibility check #1260

Open
nippoolg opened this issue Jun 7, 2021 · 27 comments · May be fixed by #1449

Comments

@nippoolg

nippoolg commented Jun 7, 2021

Question

Description

I'm checking PDF accessibility with Acrobat Pro, using accessible.pdf from the demo: https://github.com/foliojs/pdfkit/blob/master/demo/accessible.pdf
The accessibility report says that the Tagged PDF check failed. If I open the file properties, Tagged PDF is "No". It looks like this setting is not applied properly when creating a PDF with the tagged: true option. Or what could be the issue?

Code sample

const doc = new PDFDocument({
  pdfVersion: '1.5',
  lang: 'en-US',
  tagged: true,
  displayTitle: true
});

Your environment

  • pdfkit version:
  • Node version:
  • Browser version (if applicable):
  • Operating System:
@blikblum
Member

blikblum commented Jun 7, 2021

@insightfuls Can you look at it?

@paulwaitehomeoffice

Just for the record, I'm also seeing this.

Code sample

const doc = new PDFDocument({
  autoFirstPage: false,
  displayTitle: true,
  info: {
    Title: 'My title',
  },
  pdfVersion: '1.7',
  tagged: true,
  lang: 'en',
});

Your environment

  • pdfkit version: 0.12.1
  • Node version: 12.19.0
  • Operating System: Alpine 3.11.6

@adrift2000
Contributor

I've found 2 issues:

  1. The StructTreeRoot dictionary must be present in the PDF even if there are no tags.
  2. PDFKit uses the wrong dictionary name to set up the Tagged PDF:

Pass the option tagged: true when creating your PDFDocument (technically, this sets the Marked property in the Markings dictionary to true in the PDF)

In the PDF Reference there is no mention of the Markings dictionary, it should be renamed to MarkInfo.
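
To make the two points above concrete, here is a minimal sketch of the catalog entries a tagged PDF is expected to carry. It uses plain JavaScript objects instead of PDFKit's internal reference objects, and buildTaggedCatalogEntries is a hypothetical name chosen for illustration, not part of PDFKit.

```javascript
// Hypothetical helper, not PDFKit's actual code: models the document catalog
// entries for a tagged PDF with plain objects instead of indirect references.
function buildTaggedCatalogEntries(options = {}) {
  const entries = {};
  if (options.tagged) {
    // The catalog key is MarkInfo; Marked: true flags the file as Tagged PDF.
    entries.MarkInfo = { Marked: true };
    // The structure tree root is emitted even when no tags have been added.
    entries.StructTreeRoot = { Type: 'StructTreeRoot' };
  }
  return entries;
}

console.log(JSON.stringify(buildTaggedCatalogEntries({ tagged: true })));
```

An untagged document gets neither entry in this sketch, which matches how the tagged option is described in this thread.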

@dariux

dariux commented Jul 8, 2021

Would it be possible to regenerate https://github.com/foliojs/pdfkit/blob/master/demo/accessible.pdf using the fixed accessibility code?

@nippoolg
Author

The Tagged PDF option is now fixed in the master branch! Thank you! But there are still some failed checks...
If I now take https://github.com/foliojs/pdfkit/blob/master/examples/kitchen-sink-accessible.pdf for the accessibility check...

  1. Tagged Annotation check failed. It seems the Link element causes the issue...
    Here is a link with an explanation: https://helpx.adobe.com/acrobat/using/create-verify-pdf-accessibility.html?trackingid=KACNN#TaggedAnnots

  2. Tab order check failed.
    Explanation: https://helpx.adobe.com/acrobat/using/create-verify-pdf-accessibility.html?trackingid=KACNN#TabOrder
    It looks like the tab order setting in the page properties is 'Unspecified'. It should be set to 'Use document structure'.

Is there any way to fix it? Thank you!

[Screenshots: pdfkit_accessibility_report, pdfkit_tagged_annotation, pdfkit_tab_order]

@insightfuls
Contributor

In the PDF Reference there is no mention of the Markings dictionary, it should be renamed to MarkInfo.

@adrift2000 Actually the PDF 1.7 reference mentions both, and it isn't clear which is correct. I can't remember now what led me to believe Markings was correct, and MarkInfo was not, but I guess if you have some validator that accepts MarkInfo but not Markings then that's at least some evidence that it is correct....

@insightfuls
Contributor

@nippoolg

  1. Regarding the untagged link annotation, that's a known and documented limitation of the current implementation. I have some ideas regarding how to fix it, but have not had time recently to do so.
  2. This should be a pretty easy fix; we just need to add a Tabs entry to each page object with value S (name) to mean structure. I guess it makes sense to set it this way if/when there is structure present. (PDF 1.7 reference section 7.7.3.3 p79.)

@adrift2000
Contributor

adrift2000 commented Jul 16, 2021

@insightfuls

In the PDF Reference there is no mention of the Markings dictionary, it should be renamed to MarkInfo.

@adrift2000 Actually the PDF 1.7 reference mentions both, and it isn't clear which is correct. I can't remember now what led me to believe Markings was correct, and MarkInfo was not, but I guess if you have some validator that accepts MarkInfo but not Markings then that's at least some evidence that it is correct....

I searched for the word “Markings” in the PDF Reference 1.7 and no entries were found.

@YairTavizon

Hi @insightfuls, sorry to bother you, but are there any updates on your solution? :(

@nippoolg
Author

I'm looking forward to the Tab order fix as well! That sounds pretty easy, right?

@jg-mms

jg-mms commented Oct 4, 2021

I am also looking forward to it!

I also have two more questions/suggestions regarding accessibility of the PDFs:

  1. I was wondering why the BBox attribute is only allowed on Artifacts. As far as I know, it should be possible to define BBox for images/figures that are not artifacts and are visible in the document's accessible structure, such as logos of official institutions.
  2. I think it should be possible to define the "Scope" attribute on TH tags when creating a table (e.g. as "column").

Would it be possible to add these functionalities in the next release? Thanks in advance!

@Dress13

Dress13 commented Dec 17, 2021

I am also looking forward to it!

I also have two more questions/suggestions regarding accessibility of the PDFs:

  1. I was wondering why the BBox attribute is only allowed on Artifacts. As far as I know, it should be possible to define BBox for images/figures that are not artifacts and are visible in the document's accessible structure, such as logos of official institutions.
  2. I think it should be possible to define the "Scope" attribute on TH tags when creating a table (e.g. as "column").

Would it be possible to add these functionalities in the next release? Thanks in advance!

That would be very important for creating accessible PDFs. PAC3 always reports errors for tables because scope = "col" cannot be set.

Accessible PDFs are becoming more and more important, and PDFKit is a wonderful tool for creating them automatically.

Many Thanks :).

@dhysa

dhysa commented Apr 14, 2022

@nippoolg

  1. Regarding the untagged link annotation, that's a known and documented limitation of the current implementation. I have some ideas regarding how to fix it, but have not had time recently to do so.
  2. This should be a pretty easy fix; we just need to add a Tabs entry to each page object with value S (name) to mean structure. I guess it makes sense to set it this way if/when there is structure present. (PDF 1.7 reference section 7.7.3.3 p79.)

Hi @nippoolg, could you explain point number 2? Is it a change in the code? I've been struggling with the Tab Order issue in the Adobe Acrobat accessibility checker. I couldn't find it in the PDFKit documentation, or am I missing something?

Thanks in advance

@insightfuls
Contributor

Sorry I've been a bit silent. I guess I've been a bit buried under email for a few months. Let me see what I can do about some of these improvements. Please stay tuned, and don't hesitate to message me in a couple of weeks to remind me if you don't hear anything!

@dhysa

dhysa commented Apr 19, 2022

Sorry I've been a bit silent. I guess I've been a bit buried under email for a few months. Let me see what I can do about some of these improvements. Please stay tuned, and don't hesitate to message me in a couple of weeks to remind me if you don't hear anything!

Thank you very much for the response. Hopefully the Tab Order issue can be the next improvement; it would be really helpful.

@oscarbotteri

oscarbotteri commented Apr 24, 2022

@nippoolg

  1. Regarding the untagged link annotation, that's a known and documented limitation of the current implementation. I have some ideas regarding how to fix it, but have not had time recently to do so.
  2. This should be a pretty easy fix; we just need to add a Tabs entry to each page object with value S (name) to mean structure. I guess it makes sense to set it this way if/when there is structure present. (PDF 1.7 reference section 7.7.3.3 p79.)

About point number 2, I think you are right @nippoolg. I have added the Tabs entry you mentioned in page.js:

this.dictionary = this.document.ref({
  Type: 'Page',
  Tabs: 'S',
  [...]
});

and now the Tab order check passes in Adobe Acrobat Pro. If you want, @insightfuls, I can create a PR.

@insightfuls
Contributor

PR never hurts @oscarbotteri . My main concern is...do we want to unconditionally do this? Perhaps only if structure has actually been created and/or PDF is tagged?

@oscarbotteri

No, no, I did it just as an example, because a user was asking where to apply this change. Based on this:

Use Document Structure :
For tagged documents, moves in the tag order specified by the authoring application.

Note: This is usually the correct reading order and will be selected by default for tagged documents.

I agree with you, @insightfuls: this should be set to S only when tagged is set to true.

Also, I think we can add this as an optional page option.
So by default, if it is not set, we will set Tabs: 'S' if tagged is true. But if needed for any other reason, the user can customize this by passing a new option like:

doc.addPage({tabs: 'whatever valid value'}).

I am not sure what the valid values for this new key are. Let me take a look and I will create a PR.
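
The behaviour proposed above can be sketched as follows. As far as I can tell from the PDF 1.7 reference, the page-level Tabs entry accepts R (row order), C (column order) and S (structure order). The helper names and the tabs page option are assumptions for illustration; they are not existing PDFKit API.

```javascript
// Tab-order names defined for the page-level Tabs entry in PDF 1.7:
// R = row order, C = column order, S = structure order.
const VALID_TABS = ['R', 'C', 'S'];

// Hypothetical helper: `pageOptions.tabs` is the option proposed in this
// thread, not an option PDFKit is known to support.
function resolveTabs(docOptions = {}, pageOptions = {}) {
  if (pageOptions.tabs !== undefined) {
    if (!VALID_TABS.includes(pageOptions.tabs)) {
      throw new Error(`Invalid tabs value: ${pageOptions.tabs}`);
    }
    return pageOptions.tabs; // an explicit per-page choice wins
  }
  // Default: structure order for tagged documents, otherwise no entry.
  return docOptions.tagged ? 'S' : undefined;
}

// Builds a simplified page dictionary; Tabs is omitted when unresolved.
function buildPageDictionary(docOptions, pageOptions) {
  const dictionary = { Type: 'Page' };
  const tabs = resolveTabs(docOptions, pageOptions);
  if (tabs !== undefined) dictionary.Tabs = tabs;
  return dictionary;
}

console.log(buildPageDictionary({ tagged: true }, {}));
```

Leaving Tabs out for untagged documents keeps their output unchanged, which addresses the concern about setting it unconditionally.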

@insightfuls
Contributor

Liking the sound of it @oscarbotteri .

@acrollet acrollet linked a pull request May 23, 2023 that will close this issue
4 tasks
@acrollet

I added a PR in #1449 per @oscarbotteri's solution because we need this functionality for PDFs to pass accessibility review.

@insightfuls
Contributor

I also have two more questions/suggestions regarding accessibility of the PDFs:

1. I was wondering why the BBox attribute is only allowed on Artifacts. As far as I know, it should be possible to define BBox for images/figures that are not artifacts and are visible in the document's accessible structure, such as logos of official institutions.

It's because for Artifacts it's part of a marked content property list, which is fairly well supported (there are only a few of them, and only for Artifact and Span tags). Structure element attributes, on the other hand, are more complicated, much more numerous, and not supported at all (yet). It's hard to know whether to cherry-pick important attributes, supply a generic mechanism which can be used to define any/all attributes, or both. I think probably both would work best, with common cases (especially BBox, which matches the Artifact property) treated specially, but the calling code ultimately able to add any necessary attributes.

2. I think it should be possible to define the "Scope" attribute on TH tags when creating a table (e.g. as "column").

This is another structure element property. Perhaps another candidate for special treatment.

Would it be possible to add these functionalities in the next release? Thanks in advance!

I'll see what I can do.
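
One possible shape for the "both" approach described above, as a rough sketch only. In the PDF 1.7 reference, as I read it, structure attributes are stored in attribute objects that name an owner in their O entry: BBox belongs to the Layout owner and Scope to the Table owner, with Scope taking Row, Column or Both. The function name and the option names (bbox, scope, attributes) are hypothetical and not part of PDFKit.

```javascript
// Scope values defined for table header cells in the PDF 1.7 reference.
const SCOPE_VALUES = ['Row', 'Column', 'Both'];

// Hypothetical sketch: turns convenience options into attribute objects
// that a structure element could reference from its A entry.
function buildStructureAttributes(options = {}) {
  const attributes = [];
  if (Array.isArray(options.bbox)) {
    // Special-cased common attribute: BBox lives under the Layout owner.
    attributes.push({ O: 'Layout', BBox: options.bbox.slice() });
  }
  if (options.scope !== undefined) {
    if (!SCOPE_VALUES.includes(options.scope)) {
      throw new Error(`Invalid Scope value: ${options.scope}`);
    }
    // Special-cased common attribute: Scope lives under the Table owner.
    attributes.push({ O: 'Table', Scope: options.scope });
  }
  // Generic escape hatch: callers can pass fully formed attribute objects.
  for (const extra of options.attributes || []) {
    attributes.push({ ...extra });
  }
  return attributes;
}

console.log(buildStructureAttributes({ scope: 'Column' }));
```

Note that the PDF value is Column rather than the HTML-style "col", so a real implementation might want to accept or map both spellings.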

@insightfuls
Contributor

I added a PR in #1449 per @oscarbotteri's solution because we need this functionality for PDFs to pass accessibility review.

Thanks a lot, @acrollet . I've reviewed it. If you have a bit of time to address the review comment, that'd be great, or let me know and I can have a go. I have some personal stuff that needs procrastination at the moment, so the email from GitHub about your PR was welcome. ;-)

@acrollet

I added a PR in #1449 per @oscarbotteri's solution because we need this functionality for PDFs to pass accessibility review.

Thanks a lot, @acrollet . I've reviewed it. If you have a bit of time to address the review comment, that'd be great, or let me know and I can have a go. I have some personal stuff that needs procrastination at the moment, so the email from GitHub about your PR was welcome. ;-)

I've been there 😄

@rnewhook586

rnewhook586 commented Oct 3, 2023

I would like to bump this up - As per 508 requirements, WCAG 2.1 AA compliance is required for PDFs.

Both the browser demo and the example on the site fail the Acrobat accessibility checker: https://pdfkit.org/

  • Tagged PDF - Failed (Is this possible?)
  • Primary Language - Failed (Can we take the page html lang="XX" for this?)
  • Title - Failed (Maybe just take the title tag or the first h1 if a title is not present?)

Regarding the image issue described above, I have a suggestion:
Automatically take the "alt" attribute of HTML img elements and place it in the /Alt property. If none is present, automatically mark the image as an Artifact. HTML alt attribute: https://www.w3schools.com/TAGS/att_img_alt.asp
I believe this is a better approach as this avoids doubling up on work.

Example:
<img alt="Some nice description of a thing">

/Figure <</Alt (Some nice description of a thing) >>

Note: An artifact is explicitly distinguished from real content by enclosing it in a marked-content sequence with the /Artifact tag (or /Artifact propertyList)
Ideally, we would also be able to mark entire classes or IDs as artifacts, for example by passing an array.

If possible, maybe log a console warning when the renderer is going to mark everything as Artifact?
I would also treat every element with aria-hidden="true" as an Artifact.
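
The rule suggested above could look roughly like this. It is a sketch of the decision only: classifyImage and the shape of its input (an object with alt and ariaHidden fields) are hypothetical, and nothing here implies that PDFKit reads HTML itself.

```javascript
// Hypothetical helper: decides how an HTML <img> description would be tagged.
// Input shape is an assumption: { alt?: string, ariaHidden?: string | boolean }.
function classifyImage(img) {
  const hidden = String(img.ariaHidden).toLowerCase() === 'true';
  const alt = typeof img.alt === 'string' ? img.alt.trim() : '';
  // No usable description, or explicitly hidden: treat as decoration.
  if (hidden || alt === '') {
    return { tag: 'Artifact' };
  }
  // Otherwise it is real content and carries its alternate text.
  return { tag: 'Figure', alt };
}

console.log(classifyImage({ alt: 'Some nice description of a thing' }));
console.log(classifyImage({ alt: '' }));
```

A Figure result would then map to something like doc.struct('Figure', { alt }, ...), as used elsewhere in this thread.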

Alt: https://www.w3.org/WAI/WCAG21/Techniques/pdf/PDF1
Artifact: https://www.w3.org/WAI/WCAG21/Techniques/pdf/PDF4

More General Info:
WCAG for PDFs Specifically: https://www.w3.org/TR/WCAG20-TECHS/pdf
508 Compliance: https://www.access-board.gov/ict/#about-the-ict-accessibility-standards
WCAG 2.1: https://www.w3.org/TR/WCAG21/

@andreiaugustin
Contributor

I'm currently looking at implementing support for PDF/UA in our application and we make use of PDFKit. We are using PAC2021 to check for compliance.

@rnewhook586 not sure how much it helps as I'm mainly doing my stuff in node, but in its current state, PDFKit can cover a lot of the PDF/UA requirements for us. For example, when building the PDFDocument you can set tagged to true to enable the tagging on the PDF generated. Primary language as well can be handled via lang when constructing the PDFDocument. For example, this is what I'm currently using in my test PDF:

var pdfOptions = {
    autoFirstPage: false,
    modifying: false,
    copying: false,
    annotating: false,
    pdfVersion: '1.7',
    subset: 'PDF/UA',
    tagged: true,
    displayTitle: true,
    lang: 'en-GB'
};

The failing issue I have at the moment is a BBox on a Figure which I'm struggling to get working. Based off the work on Artifact, I've added the following:

// PDFStructureElement constructor

if (typeof options.bbox !== 'undefined') {
  data.BBox = options.bbox;
}

// PDFStructureElement _contentForClosure

const content = this.document.markStructureContent(this.dictionary.data.S, this.dictionary.data); // pass all the data, containing BBox
// const content = this.document.markStructureContent(this.dictionary.data.S);

// MarkingsMixin markContent

...
const dictionary = {};
if (typeof options.mcid !== 'undefined') {
  dictionary.MCID = options.mcid;
}

// changes start
if (tag === 'Figure') {
  if (Array.isArray(options.BBox)) {
    dictionary.BBox = [options.BBox[0], this.page.height - options.BBox[3], options.BBox[2], this.page.height - options.BBox[1]];
  }
}
// changes end

if (tag === 'Artifact') {
...

This does result in a BBox for the Figure:

23 0 obj
<< 
/S /Figure
/Alt (OmnisLogo )
/BBox [28.369759 28.369759 555.273975 174.377073]
/P 8 0 R
/K [8]
/Pg 11 0 R
>>
endobj

However, PAC2021 still fails on Logical Structure: "Figure" element on a single page with no bounding box :(
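
For reference, the coordinate conversion in the markContent change above can be isolated into a small function. This is a sketch that assumes the incoming box is [left, top, right, bottom] measured from the top-left corner of the page, as the index order in that snippet suggests; flipBBox is a hypothetical name.

```javascript
// Converts a [left, top, right, bottom] box measured from the top-left corner
// into PDF user space, whose origin is the bottom-left corner of the page.
function flipBBox(bbox, pageHeight) {
  const [left, top, right, bottom] = bbox;
  // The vertical edges swap roles: the old bottom becomes the lower y value.
  return [left, pageHeight - bottom, right, pageHeight - top];
}

console.log(flipBBox([10, 20, 110, 70], 800)); // [ 10, 730, 110, 780 ]
```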

I add the image like so:

// Initialise document logical structure
var struct = doc.struct('Document');
doc.addStructure(struct);

struct.add( doc.struct('Figure', { alt: 'OmnisLogo ', bbox: [28.369759, 28.369759, 555.273975, 174.377073] }, () => {
  // picture object ident 1024
  om.clipRectangle(doc, 28.369759, 28.369759, 555.273975, 174.377073)
  doc.image('C:\\OmnisBuild\\11.0\\Debug\\output\\Omnis_x64\\omnispdf\\omnis_studio_13468_2_2.png', 473.124132, 40.366244, {width:88.491085, height:87.002863});
  doc.restore();
}));

I wonder if somebody has some ideas to point me in the right direction, perhaps @insightfuls when you have some time?

@Dress13

Dress13 commented Jan 10, 2024

Hi, here is an example of how it works for me:

var imageSection = doc.struct('Document');
doc.addStructure(imageSection);

imageSection.add(
  doc.struct('H1', () => {
    doc
      .font('xxx')
      .fontSize(9)
      .text('xxx ', 68.65, 65.29);
  })
);

imageSection.add(
  doc.struct(
    'Figure',
    {
      title: '',
      lang: 'de-DE',
      alt: 'Bild',
      expanded: 'Bild',
      actual: 'Render PNG'
    },
    () => {
      doc.image(__dirname + '/../assets/images/test.png', 238, 42.466, {
        width: 278.1
      });
    }
  )
);

@andreiaugustin
Contributor

@Dress13 thank you very much! It turns out, all I needed to pass (PAC2021 at least) was the actual option for the doc.struct('Figure', {alt: '', actual:'' }). This also works right away with my previous approach (without a section dedicated to the Figure).

To be honest, I don't understand how it passes since the bounding box it complained about is still not there technically. But hey, it passes and we have both the tag & the logical structure, so I think screen readers will be happy! Thanks again!
