Does this make sense?
Does this make sense?
delete a
from CLIENT_HISTORY.DBO.BLOB_PROVIDER_ADDRESS_OUTPUT_HISTORY a
inner join CLIENT_HISTORY.DBO.BLOB_PROVIDER_XREF_PREP x on x.INDY_ID = a.INDY_ID and a.AMS_BATCH = 283 and a.SRC_CD = 1059
where x.AMS_BATCH = 283 and x.INDY_ID_END_DATE is not null
Delete on an inner join? What the heck does that even mean?
delete a
from CLIENT_HISTORY.DBO.BLOB_PROVIDER_ADDRESS_OUTPUT_HISTORY a
inner join CLIENT_HISTORY.DBO.BLOB_PROVIDER_XREF_PREP x on x.INDY_ID = a.INDY_ID and a.AMS_BATCH = 283 and a.SRC_CD = 1059
where x.AMS_BATCH = 283 and x.INDY_ID_END_DATE is not null
Delete on an inner join? What the heck does that even mean?
if you need to go to a linked table to specify the criteria for selecting a record to delete, it makes sense.
ReplyDeletePseudo:
ReplyDeletea and x are linked through INDY_ID
if (a.AMS_BATCH = 283) and (a.SRC_CD = 1059) and (x.AMS_BATCH = 283) and (x_INDY_ID_END_DATE <> NULL)
then delete(a)
Thanks -- I ask because it is taking FOREVER to do it. I mean, there are about 10,000,000 records, but it's been two hours......
ReplyDeleteProbably a missing index on one or both of the tables. What DB?
ReplyDeleteA second thought.. if you just write it as a select, does it take as long?
ReplyDeleteif not - could it be that there is a cascade delete from records in table a?
Lars Fosdal Yeah, RI such as a cascading delete absolutely kills these types of updates.
ReplyDeleteThis is sql server syntax, I believe. the "a" after the delete indicates which table to delete from and in this case table "a". It will delete all records in "a" that match the criteria in the join.
ReplyDeleteMultiple things spring to mind:
ReplyDelete- the statement looks like an ansi join so this is just a glorified where clause.
- the filter on non null values probably throws the optimiser off balance and forces sequential scans.
- doing such opperations if more than 10% of the data is deleted , will kill your performance since your db will be rebuilding indexes for 90% of the time.
For a bulk cleanup it is best to split the operation into:
- flagging the records for deletion. There you will have your join.
- suspend indexing and integrity for the table.
- delete flagged records.
- enable indexing and constraints.
Not all dbms give executionplans for dml, just select statements. Ask your dba to get the session statistics .
ReplyDeleteOn what db engine are you running.
ReplyDeleteIt's a typical Ms SQL Server syntax , the problem is massive deletes can last forever. You can use Management Studio from SQL to optimize your select statement.
ReplyDeleteCheck when running your statements what kind of low level db operations take up most of the time. Just like instrumenting code. Act on the bigest badest timeconsumer, which I bet to be either sequential scans or index rebuild.
ReplyDeleteDon't assume the behaviour of the database , get your facts and act on them.
Working in batches assumes you have an issue with your roll forward/ rollbacks . If so you can temporarily change the logging mode for your bulk operation.
ReplyDeleteDeleting partitions is an option when your partitioning key is a static information, something like a creation date or another static element.