We have a D5 database application that we recently moved over to D2007. The D5 version had a number of very strange issues in production, AVs of all kinds, which never happen during development and testing. We hoped that moving to D2007 would eliminate at least some of them, but it turns out it has only added new ones. The application runs in a Windows Server 2008 or 2012 terminal session, or on Windows 8 or Windows 10.

Is it possible that these issues are caused by using a 32-bit compiler while the app actually runs on x64 operating systems?

Any suggestions on how to make the app more stable?

Thank you.

Comments

  1. Looks like you've chosen the wrong way to solve the problem. Seems like you hoped that the problem was down to the compiler and its libraries. Far more likely that the problem is in your code.

    If you want to really tackle this you need to face up to that possibility. Go back to your D5 code. Who knows what new bugs you introduced in your port to D2007.

    Then debug your program. Introduce madExcept and gather debug information. Then solve problems.

  2. I think more detail would be required to give some specific advice, but are you using MadExcept? If not, use it. It will certainly help point you in the right direction. Also, do you use FastMM? It has a number of options that can help you identify the source of AVs.

  3. MadExcept or EurekaLog are an absolute must.

  4. madExcept is there, as well as FastMM. Here is one of the reports...

    exception message : Access violation at address 40035031 in module 'Vcl50.bpl'. Read of address 00000000.

    main thread ($2700):
    40035031 +005 Vcl50.bpl Graphics TCanvas.GetHandle
    03d7989a +0f6 ip4000v5.bpl Wwframe wwDrawEdge
    03ccea65 +07d ip4000v5.bpl Wwdbdatetimepicker TwwDBCustomDateTimePicker.DoMouseEnter
    03cce976 +00e ip4000v5.bpl Wwdbdatetimepicker TwwDBCustomDateTimePicker.CMMouseEnter

  5. Reading from a nil pointer or uninitialised object? Have you tried running this app using the remote debugger and putting a conditional breakpoint that checks for a null pointer dereference at the line where the error occurs? That should allow you to trace back to where the actual problem is occurring.

  6. @Martyn Spencer: The problem is that none of these issues ever happen during development and testing...

  7. OK, if you already have madExcept then you need to debug your program using those bug reports.

    Much as you'd like somebody to give you a magic solution here, that's not a realistic expectation. You need to solve this. We can't.

    Don't be tempted by anybody making speculative guesses and suggestions here.

  8. I accept that. However, there is clearly a difference between your development and testing environments and the production environment. This is where the remote debugger can help isolate the issue.

  9. The errors look like they're in the Woll2Woll components (the "ww" prefix on the component class names). Make sure you're using the component versions that match your IDE (versions for D2007, not D5). Make sure you don't have a mix and match of DCUs from that component set (D5 DCUs pulled into your D2007 project). Also, I can't remember the Windows DLL (ComCtrls?) off the top of my head, but some components have dependencies on specific DLLs. Hope that helps.

  10. Here is a typical report. Any ideas what could be the cause?

    exception message : Access violation at address 00010A2F. Read of address 00010A2F.

    main thread ($3bb8):
    00010a2f +000 ???
    77ab0cb1 +021 ntdll.dll KiUserExceptionDispatcher
    77aaea0a +00a ntdll.dll NtFreeVirtualMemory
    74e1647a +02a KERNELBASE.dll VirtualFree
    40003e8a +002 Vcl50.bpl System @ClassDestroy
    40008a20 +010 Vcl50.bpl System @IntfClear
    40005a21 +0c9 Vcl50.bpl System @FinalizeArray
    40008a20 +010 Vcl50.bpl System @IntfClear
    40003e93 +003 Vcl50.bpl System @AfterConstruction
    00b9cfe1 +3dd Idls.exe Salescheck 655 +70 TfrmSalescheck.DoEdit
    00b9cbd5 +0ad Idls.exe Salescheck 577 +12 TfrmSalescheck.Edit
    00ba325d +041 Idls.exe Salescheck 2379 +2 TfrmSalescheck.dbgRelatedDblClick

  11. Yeah, defect in your code due to invalid memory access, heap corruption, access after free, or something along those lines. Debug it. You can see your source code. The bug report leads you to the invalid memory access. Try to hypothesise how that could occur.

    At some point you will realise that guessing (you or us) isn't going to be productive, and you are going to have to do this yourself. I recognise that it is hard, and that you probably don't have the experience and techniques to do this yet. You will have to learn. There are no shortcuts.

    For sure ask us for advice on how to debug, but you are barking up the wrong tree when you ask us to diagnose the problem.

  12. As far as I remember, Vcl50.bpl is a Delphi 5 runtime library.
    In your Project > Options > Packages > Runtime Packages, is "Link with runtime packages" enabled? If it is, turn it off and try again.

  13. @Hichem BOUKSANI: Why? Runtime packages work fine. All this guesswork is pointless. Is that really how you go about programming?

  14. These kinds of errors are why I frequently use FreeAndNil in UI code. Hence, my first step would be to check that any variable which is created/destroyed, or set from a source that has a temporary life span, is always set to nil when no longer in use.

    It ensures that those "WTF, The Variable is NIL Eureka(log) Moments" are not confused with other reasons for access violations.

    The reason you then end up with a nil AV is that you have code that does not initialize in the right order, or events that wrongly assume there is something present that they can work with.
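
    As a rough illustration of that habit, here is a minimal console sketch. The class TEditorHost, its field FLookup and its methods are hypothetical names invented for this example and have nothing to do with the application discussed above. It only shows the general pattern: release a short-lived object with FreeAndNil, and have later code test the reference with Assigned instead of assuming the object is still there.

```pascal
program FreeAndNilSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for a form that owns a short-lived helper object.
  TEditorHost = class
  private
    FLookup: TStringList; // temporary object, created and released on demand
  public
    destructor Destroy; override;
    procedure OpenLookup;
    procedure CloseLookup;
    function LookupCount: Integer;
  end;

destructor TEditorHost.Destroy;
begin
  // Safe even if the lookup was never created or was already closed.
  FreeAndNil(FLookup);
  inherited;
end;

procedure TEditorHost.OpenLookup;
begin
  if not Assigned(FLookup) then
    FLookup := TStringList.Create;
  FLookup.Add('item');
end;

procedure TEditorHost.CloseLookup;
begin
  // Releases the object and leaves the field nil, so later code
  // cannot silently use a dangling reference.
  FreeAndNil(FLookup);
end;

function TEditorHost.LookupCount: Integer;
begin
  // Event-style code should not assume the object exists.
  if Assigned(FLookup) then
    Result := FLookup.Count
  else
    Result := -1; // sentinel chosen for this sketch: "no lookup open"
end;

var
  Host: TEditorHost;
begin
  Host := TEditorHost.Create;
  try
    Host.OpenLookup;
    Writeln(Host.LookupCount);  // lookup is open and holds one item
    Host.CloseLookup;
    Writeln(Host.LookupCount);  // lookup is closed, sentinel is returned
  finally
    Host.Free;
  end;
end.
```

    With a plain Free the field would keep pointing at released memory, and a later access might appear to work or might fail somewhere unrelated. With the reference set to nil, a forgotten Assigned check fails immediately as a read of address 00000000, which is much easier to recognise in a bug report.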
