Automatically restarting a batch job w/o human intervention

Post by **ironponygrl** » Sat Jul 11, 2015 2:04 am

I have 600+ IBM z/OS batch jobs. The restart instructions for about 10% of them are simply 'restart in abending step up to 5 times regardless of the abending step'. 'Abending' usually means anything that is not a zero return code, not necessarily a true abend.

I got another annoying job this week that needs this 'generic' restart because it crashes with contention (-911) once or twice a week. Operations calls the programmer (me); all I do is tell them to restart in the step it went down in. The operator documents it all in an incident ticket, and then I have to respond and close the ticket. That is unnecessary human intervention if we could figure out a way to exploit some system software to get around it.

I'd rather not have to modify all the JCL or set up anything that is 'per job' if I did not have to.

The jobs are scheduled through CA7 and restarted via CA11. Is there a way to tell one of these two products that I want to define some generic 'property' to do my bidding that can be assigned to any number of jobs without being a custom version for each job? I'd like to get the humans out of the middle of it unless the job has already been restarted 4 times.
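To pin down the behaviour being asked for, here is a minimal Python sketch of the rule itself. It is only a model of the policy, not CA7, CA11, or ARF configuration, and `run_with_restarts`, `make_flaky_step`, and `max_restarts` are hypothetical names made up for illustration. It assumes each step reports a return code and that zero is the only success.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: a step is modelled as a callable that returns its return code.
Step = Callable[[], int]


def run_with_restarts(
    steps: List[Step], max_restarts: int = 5
) -> Tuple[bool, int, Optional[int]]:
    """Run steps in order, restarting in the failing step.

    Returns (completed, restarts_used, index_of_failing_step_or_None).
    Any non-zero return code counts as a failure, whichever step it is.
    """
    restarts = 0
    current = 0
    while current < len(steps):
        rc = steps[current]()
        if rc == 0:
            current += 1  # step succeeded, move on to the next one
            continue
        if restarts >= max_restarts:
            # Restart budget exhausted: this is where a human gets called.
            return False, restarts, current
        restarts += 1  # rerun the same step; earlier steps are not repeated
    return True, restarts, None


def make_flaky_step(failures: int, rc: int = 911) -> Step:
    """Hypothetical test step: fails `failures` times, then returns 0."""
    remaining = [failures]

    def step() -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            return rc
        return 0

    return step
```

The restart budget is counted per job run rather than per step, which is one reading of "up to 5 times regardless of the abending step". The one policy object can be attached to any number of jobs, which is the 'generic property' idea.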

I had another population that were just 'force complete', so I set up a generic ARFSET for that action, but I'd like to exploit workload automation to handle the auto restarts too. I don't know that ARFSET is the right feature for what I want to do.

Any suggestions from anyone?

Post by **nicc** » Sat Jul 11, 2015 2:55 pm

Any suggestions from anyone?

Yes, do not post in multiple forums at the same time. We would rather not read the same topic in the multiple places that we hang out.
(And I am not reposting my response from the first forum where I read this.)

Post by **Anuj Dhawan** » Sat Jul 11, 2015 4:44 pm

I have 600+ IBM z/OS batch jobs. The restart instructions for about 10% of them are simply 'restart in abending step up to 5 times regardless of the abending step'. 'Abending' usually means anything that is not a zero return code, not necessarily a true abend.

You need to take this to the business after some investigation by the application team. You can certainly investigate whether the job should really abend on a non-zero return code, or whether there is another acceptable approach. For example, a group of business users could be sent an e-mail detailing the non-zero return code and its reason.

CA11 is an excellent tool for restarts, but the way you've described its use at your shop sounds very awkward. There should be a reason behind the 5-time restart; perhaps those instructions were put in by some rookie.

I got another annoying job this week that needs this 'generic' restart because it crashes with contention (-911) once or twice a week. Operations calls the programmer (me); all I do is tell them to restart in the step it went down in. The operator documents it all in an incident ticket, and then I have to respond and close the ticket. That is unnecessary human intervention if we could figure out a way to exploit some system software to get around it.

SQLCODE -911 is a difficult animal to deal with, and you really have to roll your sleeves up to check and tune the application. SQLCODE -911 means "THE CURRENT UNIT OF WORK HAS BEEN ROLLED BACK DUE TO DEADLOCK OR TIMEOUT". It also gives you a message of the form REASON reason-code, TYPE OF RESOURCE resource-type, AND RESOURCE NAME resource-name in the SYSOUT of the failed job. You have to start the investigation with the resource names involved. Usually, proper scheduling solves this problem.
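If you have saved the SYSOUT from several failures, something like the following Python sketch could pull the reason code, resource type, and resource name out of the message text so you can see which resources keep turning up. It assumes the message follows the REASON / TYPE OF RESOURCE / AND RESOURCE NAME wording quoted above and that the resource name contains no blanks. `parse_911` is a hypothetical helper name, and the sample values in the usage note are made up for illustration.

```python
import re
from typing import Dict, Optional

# Assumption: the message text matches the wording quoted above.
_PATTERN = re.compile(
    r"REASON\s+(?P<reason>\S+?),\s*"
    r"TYPE OF RESOURCE\s+(?P<resource_type>\S+?),\s*"
    r"AND RESOURCE NAME\s+(?P<resource_name>\S+)"
)


def parse_911(sysout_text: str) -> Optional[Dict[str, str]]:
    """Return reason, resource type and resource name, or None if not found."""
    # SYSOUT wraps long messages, so collapse all whitespace runs first.
    flattened = " ".join(sysout_text.split())
    match = _PATTERN.search(flattened)
    return match.groupdict() if match else None
```

For example, `parse_911("... REASON 00C90088, TYPE OF RESOURCE 00000302, AND RESOURCE NAME DB1.TS1.X'000002'")` returns a dictionary with the three fields. Counting the resource names over a few weeks of failures is one way to find out which jobs are colliding.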
