Hi,
I'm looking for a library for creating a new HTML file that shows the differences between two given HTML files (only differences in visible content).

It would be good if I could influence how the differences are highlighted.
Are there any free or commercial libraries that can be used in Delphi XE2 (no DLLs or external tools)?

It would also help if somebody could direct me to related examples in other programming languages. I just need a good starting point.

Comments

  1. Just to be sure: Raw HTML or the HTML rendered?

  2. Raw HTML, and only the visible content. I don't need any information about changed metadata or changed formatting.

  3. Oh, I didn't see this among all the Google result spam.
    This could already be the starting point I'm looking for.

    But any hints for existing Delphi libraries are still appreciated.

  4. I think you should reconsider what "raw" HTML means: it can contain CSS and JS. Maybe load HTML doc A and HTML doc B into an HTML renderer and then compare the nodes? Just my two cents.

  5. Could you provide a more detailed description of the expected comparison result?
    Using my library, I can easily obtain two lists of visible (text) elements and compare them.
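
The "two lists of visible text elements" idea can be tried out without any particular library. The following is only a rough sketch in Python using the standard `html.parser` and `difflib` modules; it is not the library mentioned in this comment. The names `VisibleText`, `visible_chunks` and `changed_chunks`, as well as the `HIDDEN` and `INLINE` tag sets, are assumptions made for illustration. Inline formatting tags do not split a chunk, so a bold/italic change alone is not reported. Images are not handled here.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Assumption: elements whose text never shows up on the rendered page.
HIDDEN = {"script", "style", "head", "title", "template"}
# Assumption: formatting-only tags that must not split a text chunk.
INLINE = {"a", "b", "i", "u", "em", "strong", "span", "font", "small", "sub", "sup"}


class VisibleText(HTMLParser):
    """Collects one whitespace-normalised text chunk per block-level element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._buffer = []
        self._hidden_depth = 0

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:
            self.chunks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN:
            self._hidden_depth += 1
        if tag not in INLINE:
            self._flush()

    def handle_endtag(self, tag):
        if tag in HIDDEN and self._hidden_depth:
            self._hidden_depth -= 1
        if tag not in INLINE:
            self._flush()

    def handle_data(self, data):
        # Text inside hidden elements is dropped; everything else is buffered.
        if not self._hidden_depth:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def visible_chunks(html):
    """Return the list of visible text chunks of an HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


def changed_chunks(old_html, new_html):
    """Return (operation, old_chunks, new_chunks) for every non-equal region."""
    old, new = visible_chunks(old_html), visible_chunks(new_html)
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(tag, old[i1:i2], new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]
```

Each reported operation is one of `replace`, `insert` or `delete`, which roughly corresponds to changed, added and removed content.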

  6. A workaround to make sure you get "only visible content" just occurred to me -- at least if you're on Windows: Use the clipboard to copy-paste the content of an HTML renderer into a TMemo or similar component. Well, if you want only the textual content that works.

  7. Some more info:
    Dorin Duminica: the HTML to be compared does not load any other content. CSS is used for styling only; scripts may exist but can be ignored.
    Alexander Sviridenkov: the HTML pages to be compared have only a very simple layout with some styles applied to the basic tags via CSS, and the pages contain normal paragraphs of text (maybe enclosed in divs), bullet lists, tables and images.
    The changes to be made visible are the addition, modification and removal of text, lists, tables and images. Changes to text formatting (bold, italic, fonts in general) should be ignored. The final result should show the newer version of an HTML document plus optional highlighting of the changes compared to an older version of the same document.
    The ideal solution would be an HTML diff function that encloses the changed content in an element, so that the style used for highlighting the changes can be added to a style sheet.
    Christian Conrad: we did use this workaround in earlier versions of our software, for a different purpose. But the terminal server got "confused" when handling the amount of data in the clipboard between server and client. Unfortunately we couldn't find a way to prevent the clipboard content from being sent to the client for selected items.
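
As a possible starting point in another language, the requirement described above (show the newer document, wrap changed text so a style sheet can highlight it, ignore formatting-only changes) might be sketched in Python as follows. This is only an illustration under several assumptions: the input is reasonably well-formed HTML, the wrapper is a `span` with the made-up class name `diff-added`, and the names `TOKEN`, `tokenize`, `is_word` and `highlight_changes` are hypothetical. Only text is compared, so removed content, tables as a whole and images are not marked.

```python
import re
from difflib import SequenceMatcher

# Splits HTML into opaque non-visible blocks, comments, tags, whitespace and words.
TOKEN = re.compile(
    r"<(head|script|style)\b.*?</\1\s*>"  # non-visible blocks, kept as one token
    r"|<!--.*?-->"                         # comments
    r"|<[^>]*>"                            # any other tag
    r"|\s+"                                # whitespace runs
    r"|[^<\s]+"                            # a visible word
    r"|<",                                 # stray '<' so no character is lost
    re.DOTALL | re.IGNORECASE,
)


def tokenize(html):
    return [m.group(0) for m in TOKEN.finditer(html)]


def is_word(token):
    # Tags, comments and opaque blocks start with '<'; whitespace is not a word.
    return not token.startswith("<") and not token.isspace()


def highlight_changes(old_html, new_html, css_class="diff-added"):
    """Return new_html with added or changed words wrapped in a span."""
    old_words = [t for t in tokenize(old_html) if is_word(t)]
    new_tokens = tokenize(new_html)
    new_words = [t for t in new_tokens if is_word(t)]

    # Compare words only, so differences in tags (formatting) are ignored.
    changed = set()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            changed.update(range(j1, j2))

    out, word_index = [], 0
    for token in new_tokens:
        if is_word(token):
            if word_index in changed:
                token = f'<span class="{css_class}">{token}</span>'
            word_index += 1
        out.append(token)
    return "".join(out)
```

The highlighting itself then lives in the style sheet, for example a rule for `.diff-added` that sets a background colour, so the layout of the markers can be changed without touching the diff code.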

  8. Aha, Fred Ahrens, I see. I didn't realise there were servers and clients involved at all -- you never said that, so I thought you were comparing two local files.

  9. As a preliminary solution we have now embedded the html-diff lib from https://code.google.com/p/html-diff/, as recommended by Lars Fosdal.
    But I would still prefer a solution that we can compile into our program, as I usually refuse to call and rely on any external programs: you never know who might replace them with some other program.
