Color-Of-Code/SafeRapidPdf

SafeRapidPdf

Introduction

There is already a very good PDF parser and generator: iTextSharp. However, it does not focus on parsing, and its licensing model makes it inappropriate for some purposes. This library, designed and developed from scratch, is provided under the liberal MIT license (see the License section for details).

The focus of the library is on reading and parsing, not on writing.

The goals are:

  • parsing and analyzing PDF contents (virus check for example)
  • integrity of parsing (document scans from start to end gathering all objects)
  • no quirks: invalid PDFs are not parsed
  • allow extraction of text and images at a very low level
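To illustrate the "no quirks" goal, here is a minimal sketch in Python (not the library's C# code, and not its API) of a strict header check. The function name `read_pdf_version` is hypothetical, and the rule that the header must sit at byte 0 is an assumption about what a strict reader might enforce.

```python
import re

# Assumption: a strict reader only accepts a "%PDF-x.y" header at byte 0.
# Lenient readers often tolerate leading junk; this sketch does not.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def read_pdf_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from the header, or raise ValueError."""
    match = _HEADER.match(data)  # match() anchors at the start of data
    if match is None:
        raise ValueError("missing %PDF-x.y header at start of file")
    return int(match.group(1)), int(match.group(2))
```

A file that starts with stray bytes before the header is rejected instead of being repaired, which is the behaviour the goal describes.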

This library is not intended for the following purposes:

  • rendering a PDF
  • modifying a PDF
  • generating a PDF

File structure

This library attempts to provide a fast yet reliable parser for PDF files. It focuses on completely parsing the whole PDF into its primitive objects:

  • Strings
  • Numeric values
  • Booleans
  • Streams
  • Arrays
  • Dictionaries
  • Indirect Objects
  • Indirect References
  • Cross Reference sections
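As a rough illustration of a few of these primitives, the following Python sketch parses a flat array containing numbers, booleans and indirect references. It is not the library's implementation: `IndirectReference` and `parse_array` are hypothetical names, nested arrays and the other object types are deliberately left out, and whitespace handling is simplified to Python's `isspace`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IndirectReference:
    object_number: int
    generation: int


# Tiny subset: "N G R" references, numbers, booleans. Each token must be
# followed by whitespace or the end of the array body.
_TOKEN = re.compile(
    rb"(?:(\d+)\s+(\d+)\s+R|([+-]?(?:\d+\.?\d*|\.\d+))|(true|false))(?=\s|$)"
)


def parse_array(data: bytes) -> list:
    data = data.strip()
    if not (data.startswith(b"[") and data.endswith(b"]")):
        raise ValueError("not an array")
    body, pos, items = data[1:-1], 0, []
    while pos < len(body):
        if body[pos:pos + 1].isspace():  # skip separators
            pos += 1
            continue
        m = _TOKEN.match(body, pos)
        if m is None:  # strict: unknown tokens are an error
            raise ValueError(f"unexpected data at offset {pos}")
        if m.group(1) is not None:
            items.append(IndirectReference(int(m.group(1)), int(m.group(2))))
        elif m.group(3) is not None:
            text = m.group(3).decode("ascii")
            items.append(float(text) if "." in text else int(text))
        else:
            items.append(m.group(4) == b"true")
        pos = m.end()
    return items
```

For example, `parse_array(b"[1 0 R true 3.5 42]")` yields an indirect reference to object 1, generation 0, followed by `True`, `3.5` and `42`.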

Document structure

The interpretation layer then allows the document to be decomposed into pages and images, among other high-level objects.

  • Cross reference table
  • Root
  • Pages
  • Graphics
  • Text
  • Fonts

The library is not concerned with rendering the PDF. Only the informative parts are extracted, such as the position and size of text and graphics.
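For the cross reference table listed above, a strict parse of a single entry could look like the Python sketch below. It relies on the fixed 20-byte entry layout of the classic (non-stream) cross-reference table in PDF 1.7: a 10-digit number, a 5-digit generation, `n` or `f`, and a two-byte end-of-line. `XrefEntry` and `parse_xref_entry` are hypothetical names and do not reflect the library's actual types.

```python
import re
from typing import NamedTuple


class XrefEntry(NamedTuple):
    # For in-use entries this is the byte offset of the object; for free
    # entries it is the object number of the next free object.
    offset: int
    generation: int
    in_use: bool


# Exactly 20 bytes: 10 digits, space, 5 digits, space, n|f, 2-byte EOL.
_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])(?: \r| \n|\r\n)")


def parse_xref_entry(raw: bytes) -> XrefEntry:
    m = _ENTRY.fullmatch(raw)
    if m is None:  # strict: no attempt to repair malformed entries
        raise ValueError("malformed cross-reference entry")
    return XrefEntry(int(m.group(1)), int(m.group(2)), m.group(3) == b"n")
```

An entry with a missing digit or a one-byte line ending is rejected outright, in line with the goal of not parsing invalid PDFs.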

Online resources

For deeper insight, reading the PDF 1.7 specification is recommended.

Testing

Unit tests are written with xUnit, and code coverage is collected with Coverlet:

# for vscode integrated report
dotnet test --collect:"XPlat Code Coverage"

# msbuild report
dotnet test /p:CollectCoverage=true

Authors

The SafeRapidPdf contributors:

  • Jaap de Haan (initiator)

License

The MIT license (refer to the LICENSE.md file).
