General Technical Discussions for Service

    Safely triggering async CPM retry (Answered)
    Topic posted October 17, 2017 by Chris Rogers

    Hello



    I have an async CPM that sends some data to an external system. However, this external system isn't 100% reliable, so I would like to make use of the built-in Oracle async CPM retry system (e.g. retry after 90 seconds, then 450 seconds, etc., as per the documentation).



    This topic says that I can do this by throwing a CPMException, easy enough.



    My question is: can I work out how many times it has retried, and if so, how?



    If the external system is down for a little while, I'd rather not throw the exception so many times that the CPM errors out and stops running. Oracle sends no alert when this happens, so it may not be noticed for a couple of days.

    Best Comment

    Chris Rogers

    An update: after many iterations, this is what we now do for Incidents (and the same principle applies to Contacts and other objects), in case anyone else has this problem!

    We now have a new custom object (QueuedIncident) that holds the IncidentID, Operation, Failures, FailureMessage, LastRequeued (datetime) and LastFailed (datetime).

    The (synchronous) Incident CPM then becomes very simple: it just creates a new QueuedIncident with the IncidentID and Operation (create, update).

    We then have an asynchronous CPM on the QueuedIncident custom object. When it succeeds, it deletes the QueuedIncident. If it fails, it increments Failures by 1, logs the time in LastFailed, and records the error in FailureMessage.

    We then have a quick custom script that queries for items that have failed and haven't been requeued since they last failed. It updates the LastRequeued date, and this update triggers the async CPM above to try again. We asked Oracle to run this script hourly. If something fails more than 5 times, it stops getting requeued and is flagged in a report for manual intervention.

    We now have a very reliable system that handles failures gracefully, retries them automatically, and flags repeated failures for manual intervention!
    It has the added benefit that we can run it on a specific incident by hand, just by creating a new QueuedIncident manually, or even by using the import wizard!
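
The bookkeeping described above can be sketched roughly as follows. This is a minimal in-memory model in Python, not Connect PHP API code: the `process` and `requeue_sweep` functions, the `send` callback, and the dictionary standing in for the custom object table are all hypothetical names chosen for illustration. Only the field names and the "more than 5 failures" rule come from the post.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

MAX_FAILURES = 5  # from the post: more than 5 failures means no more requeueing


@dataclass
class QueuedIncident:
    incident_id: int
    operation: str                          # "create" or "update"
    failures: int = 0
    failure_message: str = ""
    last_requeued: Optional[datetime] = None
    last_failed: Optional[datetime] = None


def process(queue: Dict[int, QueuedIncident], key: int,
            send: Callable[[QueuedIncident], None], now: datetime) -> None:
    """Stand-in for the async CPM: delete on success, record details on failure."""
    item = queue[key]
    try:
        send(item)                          # hypothetical call to the external system
    except Exception as exc:
        item.failures += 1
        item.last_failed = now
        item.failure_message = str(exc)
    else:
        del queue[key]


def requeue_sweep(queue: Dict[int, QueuedIncident],
                  now: datetime) -> Tuple[List[int], List[int]]:
    """Stand-in for the hourly script.

    Returns (requeued, flagged): keys whose LastRequeued was touched, and keys
    that exceeded MAX_FAILURES and need manual intervention.
    """
    requeued: List[int] = []
    flagged: List[int] = []
    for key, item in queue.items():
        if item.last_failed is None:
            continue                        # never failed; first attempt still pending
        if item.failures > MAX_FAILURES:
            flagged.append(key)
        elif item.last_requeued is None or item.last_requeued < item.last_failed:
            # In the real system this field update is what re-triggers the async CPM.
            item.last_requeued = now
            requeued.append(key)
    return requeued, flagged
```

In this model the caller would run `process` for each key returned in `requeued`; comparing `last_requeued` against `last_failed` is what prevents the same failure from being requeued twice before the CPM has run again.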

    Comment

     

    • Johnny Meehan

      Hey Chris, 

      Here are some options to consider: 

      1. Track exceptions persistently in a custom field or custom object. For example, you could have an integer field on your object that you increment each time you hit the exception. This would give you visibility into the historical number of failures for that specific object, which makes it possible to decide whether to keep throwing the exception or to take some other action for future occurrences. Bear in mind that you may need to call an explicit commit if you're setting a field on an object in a script that throws an exception, since throwing the exception may leave object changes uncommitted.

      2. If this is an incident object, you can put it in an escalation loop using business rules that call your CPM every X minutes/hours/whatever. Your CPM could then take some action (e.g. setting a field) to remove the incident from the escalation loop once the data push succeeds. Note that setting up an escalation loop can be a little tricky, and it only works for objects that support escalation rules.

      3. Use the Public Mail API to send yourself an email alert when the exception occurs. Similar to option 1, you might have to use an explicit commit to get the mail message to actually send if your CPM is throwing an exception. You'll have to experiment with this.

      Hope this helps.
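
As a rough illustration of option 1 combined with the alert from option 3, the decision logic might look like the Python sketch below. It is a language-neutral model under stated assumptions, not Connect PHP API code: `RetryableError` stands in for the exception that triggers the platform retry, and `MAX_RETRIES`, `failure_count`, `push`, `commit`, and `alert` are hypothetical names.

```python
from typing import Any, Callable, Dict

MAX_RETRIES = 3  # assumed threshold; pick whatever suits the retry schedule


class RetryableError(Exception):
    """Hypothetical stand-in for the exception that triggers the built-in retry."""


def handle_push(record: Dict[str, Any],
                push: Callable[[Dict[str, Any]], None],
                commit: Callable[[Dict[str, Any]], None],
                alert: Callable[[Dict[str, Any], Exception], None]) -> str:
    """Push a record; on failure, count it, then either ask for a retry or give up."""
    try:
        push(record)
    except Exception as exc:
        record["failure_count"] = record.get("failure_count", 0) + 1
        # Persist the counter before raising, since raising may discard
        # uncommitted changes (the caveat from option 1).
        commit(record)
        if record["failure_count"] <= MAX_RETRIES:
            raise RetryableError(str(exc)) from exc
        # Too many failures: stop asking for retries and notify someone instead.
        alert(record, exc)
        return "gave_up"
    return "sent"
```

The counter is committed before the exception is raised so that the next attempt sees the updated value; whether an explicit commit is actually required on the platform is something to verify by experiment, as noted above.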

    • Suresh Thirukoti

      Nice info from both of you, Chris & John.