The XtremeDocumentStudio Delphi webinar replay is now available...

https://youtu.be/kAxH0KMgovo

Comments

  1. Thanks, Joe C. Hecht, for attending the webinar. The answer...
    (a) Tesseract
    (b) Currently, PDFtoolkit lets you work with existing PDF documents, and this is what you can do:
    - With regard to Form Fields: retrieve by name or index, fill, edit values and properties, delete, flatten, and insert new fields.
    - With regard to Annotations: retrieve and insert. The engine supports other CRUD operations, but not all of them are exposed.
    - With regard to Text on page: retrieve all text elements with granular detail such as widths, position, font, and internal text-out breaks, and add text elements to an existing page. So, technically, you can retrieve elements, change their text, and so on, and write the result back to a new document or page.
    In XtremeDocumentStudio we have given it a fresh look with a brand-new set of APIs. These APIs are not exposed yet, but internally it is now possible to retrieve PDF elements, both processed and as raw data, according to the physical structure of a PDF: the XRef table, the Catalog, content streams, and so on. We will expose this in the future, along with editing and writing capabilities.
    (c) Editing as such is supported only to the extent I explained above. With XtremeDocumentStudio, you can currently convert a PDF to a flow-based RTF document, edit the RTF, and write it back to PDF. We will offer a more seamless way in the future.
    (d) Are you referring to processing embedded PostScript content, such as color space information? This is being worked on now.

    Thanks for asking the questions ;)

  2. Good answers (to good questions). Thank you for your answers, Girish Patil!

    The PDF engine developer community is very, very small, so it is a rare delight to find others qualified in this area to speak with.

    My question about the PostScript® interpreter's compliance concerned language levels (Level 1, Level 2, Level 3...).

    If one is to write a PDF viewer, there are two ways to go about it:

    PDF was designed as a container for PostScript® pages. Given the requirements for "a conforming viewer" (as set forth in the ISO specification), one would need a level-compliant PostScript® interpreter to attempt that (lofty and likely unachievable) goal.

    Not all viewers / parsers / tool-kits have a level-compliant PostScript® interpreter.

    I believe that is the first place to start in making determinations about a given engine, its design, and its capabilities.

    It is a double-edged sword (these days):

    (a) Not having one guarantees compliance issues (by design).

    (b) Having one guarantees the engine can be exploited, and likely turns applications that use it into a security risk.

    The number of "one line" code execution exploits published for PDF and PostScript® engines is amazing.

    Sadly, this can cause serious issues on UNIX-based operating systems (such as Linux, OS X, and other AirPrint systems), since they use CUPS (Common UNIX Printing System), which speaks PDF / PostScript® at its core (using either the Apple or the Ghostscript engine, both of which are level compliant).

    Funny: by design, a PDF engine is either a bad hack or it can be hacked badly (or both, I suppose).

    Know that I mean no harm by that statement. It is just a funny fact that came up the other day when discussing PDF exploits with my lovely wife Lynda.

    Somewhere, I have a very long "write up" detailing Apple's venture into creating a conforming engine, with all the juicy details of the problems they encountered along the way and the design decisions they made. It is a very interesting (and valuable) read. I would be happy to send it to you (if I can find it).

    As a side note, I really enjoyed seeing your document layout engine (with "inertia") demonstrated.

    I wondered whether the code originated from an SDK I published in the early 1990s under the trade name "OptiScript" (please take no offence at my pondering; I originally authored a fair amount of the graphics code and concepts in use).

    Your document layout engine sure looks like it has "the right stuff" (not many do).

    I could tell a lot about its design from looking at how it zoomed and panned the document pages.

    A very good job indeed!

    It really made me smile to see that demonstrated :P

    My lovely wife Lynda took notice of your layout engine as well (and smiled too).

    Joe

    I am legally bound to say: PostScript® is a registered trademark of Adobe Systems Incorporated.

    Adobe Lawyers.... grrrr.

  3. I'm glad that I can speak a bit about file formats and am also deeply involved in the development of engines for these formats. I first researched and read the specs for PDF and the Office formats about 20 years ago, when I was involved in writing our first creation engines for them. What I really have is a good big-picture understanding of them and a fairly decent low-level understanding. So, to get updates on specific areas, I know what questions to ask our engineers, who regularly research new areas of the spec.
    I will take a look regarding level compliance.

    I would be happy to read the "write up" you mention. Please do send it when you find it.

    The page layout options, and the specs for them, came partly from wanting to fully support the page layout options in PDF (as you know, the PDF format can encode how the viewer should lay out and show pages, at least for the initial view), and partly from our own imagination and needs we experienced ourselves. Finally, out of our desire to make it flexible and clean at the same time, we built it into the full set we have today. We have more ideas, and they really do come from within ;) The design and implementation are also developed entirely in-house. If you would like to look inside it and critique it, please email me.
