Hello

Hello,

A new PDF Reader based on PDFium.DLL and inspired by PdfiumLib :)

https://github.com/tothpaul/PDFiumReader

Comments

  1. Attila Kovacs not sure where the 6 times come from ... but I've spent more time on multi-page support than on drawing optimisation ;)

    Each time the size or scale changes, LoadVisiblePages checks which pages are visible on the screen, and only those pages are drawn (regardless of the DC clipping, indeed).
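
A minimal sketch of the kind of visibility check described above, written in Python rather than Delphi. It is not the project's actual LoadVisiblePages code: the `Page` type, the `visible_pages` function, the `gap` parameter, and the assumption that pages are stacked vertically are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Page:
    width: float   # page width in points (unused here, kept for realism)
    height: float  # page height in points


def visible_pages(pages, scale, scroll_y, view_height, gap=10):
    """Return the indices of pages that intersect the viewport vertically.

    Assumes pages are stacked top to bottom with a fixed gap (in pixels)
    between them, and that scroll_y is the top of the viewport in pixels.
    """
    visible = []
    top = 0.0
    for index, page in enumerate(pages):
        bottom = top + page.height * scale
        # A page is visible when its vertical extent overlaps the viewport.
        if bottom > scroll_y and top < scroll_y + view_height:
            visible.append(index)
        top = bottom + gap
    return visible


pages = [Page(612, 792)] * 5
print(visible_pages(pages, scale=1.0, scroll_y=700, view_height=600))  # prints [0, 1]
```

Only the returned pages would then need to be rendered, and the list has to be recomputed whenever the scroll position, window size, or scale changes.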

  2. Attila Kovacs ok, now only 1 Paint :) it was the scrollbar Smooth property and an extra invalidate.

  3. Nice job Paul TOTH

    I found the text rendering to be significantly inaccurate. As much as 22% off.

    I have seen plenty of that from PDFium blobs (DLLs) in the wild.

    This PDFium blob: is it about a year old?

    An exploit affecting the most recent commercial version of PDFium went wild last week. Foxit quickly patched their commercial library.

    This is from some version of the free google source, and not really maintained for ongoing security issues?

    PDFs are a heavily used attack vector, and there are quite a few exploits.

    I am curious how well this blob might hold up against such stuff.

    And the source of the blob? I see DLLs, but no source for the DLLs.

    A custom build of open source?

    Who vetted it?

    Not trying to be a pain, but these are reasonable questions to ask about a seemingly closed source blob that is exposed to a very high threat level.

    Joe

  4. Paul TOTH I know where the dll is from. That is obvious enough.

    Does that lend anything to the issue? What does that mean (or imply)?

    A closed source binary blob build of an open source project, seemingly modified, out of date, with very possible exploits in the wild, no way to patch?

    It's very cool, but is it reasonably safe to use?

    I have played with it (and others).

    I have coded PDF engines for a living for most of my coding life.

    I have doubts.

    Not that I think anything evil is going on, but simply because it breaks all the rules of reasonable and minimal security practices for a file format that is pretty much at the top of the exploit list.

    It is certainly an issue worthy of consideration.

    I'm just asking your thoughts, not trying to rain on your parade.

    I am right there with you in making use of the free version of the PDFium code, but I think it requires patchable source, someone to patch it, and a front-end pre-scanner to reasonably live up to today's threat model (and perhaps to get some of the rendering correct).

    It's a nice little library. Super easy to call, and super easy to get some high-level work done. Google did well in acquiring the code (and Foxit did well by keeping a commercial version to sell that deals with those issues).

    The code floating around in the middle of all that? I'm not so sure about it.

    Joe

  5. Joe C. Hecht, interesting. Is "PDFium" what Foxit uses in its "Quick PDF Library", and is it the same thing you meant when you wrote "FoxIt quickly patches their commercial library"?

    TIA!

    debenu.com - Quick PDF Library | Powerful PDF SDK

  6. Dany Marmur You are welcome!

    Thanks for asking. I am heavily invested in this, and love the discussion.

    I think this is the best answer I can give:

    PDFium libs are not all equal. For starters, you have two source bases:

    (a) The Foxit "Commercial Version".

    (b) The "free" source that Google acquired and opened.

    Any given PDFium binary would be made from one of the two, plus any number of changes, updates, patches, and custom modifications.

    So far, Foxit has seemed quick to post indications that they have patched the source (the one used in their reader? more?) for the given exploits I have seen, and has done a good job of minimizing any perceived effect of a given exploit on their products.

    This would, of course, not affect the Google free code version (now separately "community" maintained).

    Highly worth noting: it is extremely likely that any PDFium code used in Google products is separately maintained, patched, and modified, and perhaps based on only some of the free source they acquired.

    I am very familiar with the "Quick PDF" library source (going way back, pre version 6 I believe), and even participated in it (to a very, very small degree). It is a very fine library in regard to its extensive high-level functionality.

    Foxit acquired "Quick PDF" somewhat recently, and, very much worth noting, "Quick PDF" changed its back-end render system (more than once).

    I have yet to get my copy of the recently released version 14 source (it is nagging at me on my to-do list). I have not had much time to play with recent versions of QuickPDF (I have my own PDF library to deal with), but I have almost all of them (except version 14 at the moment).

    Sight unseen, my best guess based on previous versions is that, "as a source code customer", you will not get much in the area of dependencies (PDFium?). I hope I am wrong on that. Previous versions have gotten better in regard to shipping 3rd-party source. I am pretty sure I will not have to stand corrected in saying that buying the "100% source code" applies to the QuickPDF source, and not always to the 3rd-party libs it uses (reasonably fair, I think).

    That said, if you find something looking like a PDFium.dll in the distro (noting that you may[?] have to look at runtime temp files), the only way to tell if the DLL actually matches some other PDFium.dll (that may or may not be patched) would be to do a binary compare.
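
A binary compare of two DLLs can be sketched in a few lines of Python. This is only an illustration: the function names and the example file paths are hypothetical. `filecmp.cmp` with `shallow=False` does the byte-for-byte comparison, and the SHA-256 digest is an added convenience for recording which build was checked.

```python
import filecmp
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """Return True when both files have identical contents.

    shallow=False forces filecmp to compare file contents rather than
    trusting size and modification time alone.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)


# Hypothetical usage; the paths below are placeholders, not real locations:
#   same_binary("distro/pdfium.dll", "reference/pdfium.dll")
#   sha256_of("distro/pdfium.dll")
```

A match only shows that the two files are the same build. It says nothing about whether that build is patched.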

    I find it unlikely that a recent "Quick PDF" version (13 or 14?) would be using one of the freely found Google-based binaries floating around on the net. Further, it seems likely they may not even use a "100% equal" commercial version (i.e., perhaps internal builds?).

    Hope that helps, and that it all sounds "reasonably reasonable".

    Joe

  7. Joe C. Hecht - I feel we need to get in touch. I'll check out how tomorrow.

  8. Dany Marmur please feel free to contact me, either through Google or one of my email gateways. Try this lame little link:

    uberpdf.org - The ÜberPDF.org Email Gateway V-1.2

  9. Joe C. Hecht I don't want to hijack this post, but I need to ask whether you know of free source code or a lib to use in Delphi that sanitizes a PDF (that is, ignores the dangerous stuff: links, scripts) and only displays it. If it can print, even better.

    I want to integrate a little PDF viewer to display invoices; we receive a lot of them as PDFs. I'm very concerned about the malware that can be inside a PDF file. I'm not a security expert. I just want to play it safe for my local users and forget about the outdated external PDF viewers.

    As you said, a .dll without patches is bad. Maybe it is possible to disallow/sanitize the usage of links and scripts via parameters for the .dll shown by Paul TOTH?

    Regards

  10. Mr. E ! Outside of my own code, I am unaware of any source that does that (some anti virus packages do).

    But yes, that is how it would be done (pre-scan and fix-up).

    You can either pre-process the whole file or, alternatively, work a bit at a time and hand extracted pages and resources to your view/edit code (a bit more involved and resource intensive, but it may provide other advantages and finer levels of control).
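
To make the whole-file pre-scan idea concrete, here is a deliberately naive Python sketch. It is not the code referred to in this comment: the `RISKY_NAMES` list and the function names are assumptions chosen for illustration. It only counts PDF name tokens that are commonly associated with scripts, automatic actions, links, and embedded files. It does not look inside compressed object streams or encrypted files, and it performs no fix-up, so it can flag a file for closer inspection but cannot declare one safe.

```python
import re

# Name tokens often associated with active or external content.
# This list is illustrative, not exhaustive.
RISKY_NAMES = (
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
    b"/Launch", b"/URI", b"/EmbeddedFile", b"/RichMedia",
)


def decode_name_escapes(data):
    """Undo #xx hex escapes, which PDF names may use (e.g. /J#61vaScript)."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda match: bytes([int(match.group(1), 16)]),
        data,
    )


def scan_pdf_bytes(data):
    """Return a dict mapping each risky name found to its occurrence count."""
    normalized = decode_name_escapes(data)
    counts = {}
    for name in RISKY_NAMES:
        # Require that the name is not just the prefix of a longer name.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts


sample = (
    b"1 0 obj << /OpenAction 2 0 R >> endobj "
    b"2 0 obj << /S /J#61vaScript /JS (app.alert(1)) >> endobj"
)
print(scan_pdf_bytes(sample))
```

A viewer front end could refuse to open, or open only in a restricted mode, any file for which the returned dict is not empty.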

    One thing to consider is not all PDF libs will be susceptible to the same exploits.

    The format itself is highly exploitable, but exploits are likely to be targeted at a given lib or product.

    Joe

  11. Joe C. Hecht I'm reading some of your posts in the ÜberPDF group. It seems you're working on a commercial product, right? (Or is it open source?)

    I don't want to compare it to the mORMot project; from there I use the code to create some "simple" PDF reports (without fancy scripts and links) and to convert invoice XML to PDF for easy reading/printing.

    Now I need to cover the display of PDF files, so I'm searching for a little viewer to integrate for our local needs.
    Right now I use the Sumatra PDF reader (I don't know if it is possible to use it with scripts and links disabled; I need to ask). I can try to shell/open the PDF with that program, and I'm confident there is no malware in my own PDFs, but the received ones scare me... a lot. Almost daily we receive scams with PDF and DOC attachments, and I'm pretty sure they have links to download malware, ransomware, or trojans. The antivirus seems to stop them... but when it fails, BOOM!

  12. Mr. E ! I plan to have both "community" and "commercial" versions of the product(s), with no source that will be "closed".

    I have issues with the term "open source", since it means different things to different people.

    To me, "open source" means you get (all) of the source (no BLOBS), and that has nothing to do with a possible associated cost, where "free" might imply "free beer", and not rights.

    You are wise to consider the possibility of malware in PDF files. Indeed, PDF is a very exploitable file format. That is especially true under Windows, where display involves bitmaps (and it is easy to get kernel privileges). Fonts and scripts are another area of concern, and buffer overflows are very easy pickings.

  13. Mr. E ! it seems that PDFiumViewer provides a light version of the DLL without V8 (the JavaScript engine, as far as I know).
    Another nice PDF lib is MuPDF (https://mupdf.com/); see github.com - blestan/lazmupdf

  14. MuPDF is pretty good. Depending on the project, licensing is usually the hard part of using a given PDF engine.
    http://artifex.com/licensing/

  15. Thanks to both Joe C. Hecht & Paul TOTH; I need to check the GitHub project. Judging by the examples on the licensing page, I feel that it suits us:

    * Use Ghostscript to translate old PostScript files into PDF files for use in generating internal reports or documents (within organization).
    * Use Ghostscript within an IT department for internal uses only.

    All is for internal use, not for sale.

  16. Take a look at this example, "Draw PDF without external library": d.hatena.ne.jp - 全力わはー (Japanese).

  17. Seva Minkovich I managed to translate it using G.T.

    I wonder how safe it is, being an API from MS... Now, if the API has a switch to disable scripts and the like, it will be perfect.

    OK, now I remember why I use external and "simple" viewers. =:O

    _ MS16-012 (Critical)
    ...in the Microsoft Windows PDF library where a remote code execution situation could arise if certain Microsoft PDF API calls are not properly addressed...

    MS-ISAC ADVISORY NUMBER: 2017-023
    DATE(S) ISSUED: 03/14/2017
    OVERVIEW: A vulnerability has been discovered in Microsoft Windows PDF Library, which could allow for remote code execution..._

    But, by all means: Thank you! I'll give it a try on the units provided.

  18. Two small bits of wisdom from my many years of doing PDF viewers and creators, for the benefit of those following the post (I am getting a lot of sideline questions).

    (a) Licensing

    I think Artifex has done a very good job of posting an interpretation of the licensing terms.

    Not all do.

    Often, the posted "interpretation" of the "intent" of a license may vary substantially from the actual license.

    Arguably, the license is what really counts.

    The legal advice I have solicited found some of the commercial libraries to be unusable in regard to distribution.

    The terms sure looked OK to me, but I am not a lawyer (which is why I pay them).

    If you are going to distribute a viewer, and the license is not standard, it is wise to get legal advice.

    PDF view engine distribution is not very easy (as demonstrated by the very high priced custom licenses that sometimes involve royalties).

    (b) OS View Engine choices (and more legal issues).

    Engines provided by the operating system can be a good option (for some uses).

    Inaccuracies in those engines may be key, particularly in regard to (for example) legal or other documents where the accurate display of content may be paramount.

    One example would be the Apple engine, where I have found it very easy to add valid content that does not display (and some of these are known, long-standing issues).

    The MS "Reader" is, in my opinion, absolutely horrible in almost every regard. It is an option, but usually not a very good one.

    The point I am making is (as I understand from the advice I have paid for) that if your app is displaying "content of consequence", you may be on the hook for accuracy.

    "Content of consequence" can be pretty wide. Certainly, legal documents qualify, as do things like medical reports. But where does all that end?

    It is easy to see where (for example) a real estate contract that did not view correctly could come back to bite, and also, difficult to imagine getting bit for some missing data from a grid that causes some sort of miscalculation for someone.

    Most is mitigated in your own license clauses ("do not use this software for running a nuclear power plant").

    Still, these points may be worthy of consideration (and a lot of testing).

    Just my 2 cents.

    Joe

  19. Joe C. Hecht Mr. E ! It is worth noting that scripting in PDF is only a very small portion of the total PDF attack surface.

    Certainly, the disabling of scripting is very prudent (and something I encourage).

    In my experience, so far, the Windows platform probably has the widest attack opportunities, but I do see others posted on occasion.

    In the meantime, I am spending a lot of time dealing with the associated privacy leakage concerns of viewers (mostly affecting business, military, and government concerns).

  20. Joe C. Hecht I appreciate your comments and tips. I'll take into account everything given here.
    Regards.
    (G+ needs a feature for favorite notes, to keep interesting posts within reach.)
