zPerformance Give Me All Your Logging

Anuj Dhawan
Founder
Posts: 2802
Joined: Sun Apr 21, 2013 7:40 pm
Location: Mumbai, India

Post by Anuj Dhawan »

Long ago I added reporting on DB2 log writing to our code. At the time it was just to understand if a particular job or transaction was “log heavy”. That is, I was interested in the job’s perspective, and whether it was dependent on a high-bandwidth DB2 logging subsystem.

A recent incident, however, gave me a different reason to look at this data: we wanted to know what was driving the logging subsystem so heavily in a given timeframe,[1] because there were knock-on effects on other jobs.

It’s as good an opportunity as any to alert you to two useful fields in DB2 Accounting Trace:
  • QWACLRN - the number of records written to the log (a 4-byte integer)
  • QWACLRAB - the number of bytes logged (an 8-byte integer)
In this case I wasn’t really interested in the number of records. In other contexts I might well calculate the average number of bytes per record - because that can be tunable.

I was interested in logging volumes - in gigabytes.
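
To make the arithmetic concrete, here is a minimal sketch in Python (not the author's code - in practice he used a DFSORT E15 exit), assuming QWACLRN and QWACLRAB have already been extracted from one accounting record. The sample values are invented.

```python
# Minimal sketch, not the author's exit code: given QWACLRN and QWACLRAB
# values already extracted from one DB2 accounting record, derive the
# average bytes per log record and the gigabytes logged.

def log_metrics(qwaclrn: int, qwaclrab: int) -> tuple[float, float]:
    """Return (average bytes per log record, gigabytes logged)."""
    avg_bytes_per_record = qwaclrab / qwaclrn if qwaclrn else 0.0
    gigabytes = qwaclrab / (1024 ** 3)
    return avg_bytes_per_record, gigabytes

# Example: a job step that wrote 250,000 log records totalling about 1.2 GB
avg, gb = log_metrics(250_000, 1_288_490_189)
print(f"{avg:.0f} bytes/record, {gb:.2f} GB logged")
```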

Each SMF 101 (IFCID 3) record has these fields, so it's quite easy to determine who is doing the logging. What is more difficult is establishing when the logging happened:
  • Yes, the SMF record has a time stamp, marking the end of the “transaction”.
  • No, the records aren’t interval records.
For short-running work this is fine. For long-running work units, such as batch job steps, it can be a problem. To mitigate this I did two things:
  • Asked the customer to send data from the beginning of the incident to at least an hour after the incident ended.
  • Rather than reporting at the minute level, I summarized at the hour level.
The latter took away the “lumpiness” of long-running batch jobs. The former was enough to ensure all the relevant batch jobs were captured.[2]
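
For what it's worth, here is a rough sketch of that hour-level roll-up in Python. It is not the author's tooling (that was a DFSORT E15 exit plus queries); it assumes the end-of-transaction timestamp, a job name and QWACLRAB have already been pulled out of each 101 record, and the sample rows are made up.

```python
# Sketch of the hour-level summarization described above, assuming the
# end timestamp, job name and QWACLRAB (bytes logged) have already been
# extracted from each SMF 101 record. Input layout and sample rows are
# assumptions for illustration only.

from collections import defaultdict
from datetime import datetime

records = [
    # (end-of-transaction timestamp, job name, bytes logged)
    (datetime(2018, 6, 1, 2, 17), "MASSDEL1", 9_500_000_000),
    (datetime(2018, 6, 1, 2, 43), "ONLINE01", 120_000_000),
    (datetime(2018, 6, 1, 3, 5),  "MASSDEL2", 11_000_000_000),
]

bytes_by_hour = defaultdict(int)
for end_ts, job, logged in records:
    hour = end_ts.replace(minute=0, second=0, microsecond=0)  # hourly bucket
    bytes_by_hour[(hour, job)] += logged

for (hour, job), logged in sorted(bytes_by_hour.items()):
    print(f"{hour:%Y-%m-%d %H:00}  {job:<8}  {logged / 1024 ** 3:6.2f} GB")
```

Note that summarizing on the record's end time still attributes a long job step's whole logging volume to the hour it ended in - which is exactly why asking for data well past the end of the incident matters.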

What we found was that a small number of “mass delete” jobs did indeed account for well over 90% of the logging (by bytes logged) - and they started and stopped “right on cue” within the incident timeframe.

In this case I modified a DFSORT E15 exit of mine to process the 101s, adding these two fields. I then ran queries at various levels of time stamp granularity.
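
If you want a feel for the kind of query that surfaces the “well over 90%” result, here is a toy version in Python rather than the author's DFSORT-based reporting: rank jobs by their share of total bytes logged (QWACLRAB) over the window. The job names and byte counts below are invented.

```python
# Toy version of the "who is doing the logging?" question: rank jobs by
# their share of total bytes logged over the incident window. Names and
# byte counts are invented; the real numbers came from the 101 records.

bytes_by_job = {
    "MASSDEL1": 9_500_000_000,
    "MASSDEL2": 11_000_000_000,
    "ONLINE01": 120_000_000,
    "BATCHA":   450_000_000,
}

total = sum(bytes_by_job.values())
for job, logged in sorted(bytes_by_job.items(), key=lambda kv: -kv[1]):
    share = 100 * logged / total
    print(f"{job:<8} {logged / 1024 ** 3:7.2f} GB  {share:5.1f}%")
```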

These two fields might “save your life” one day. So now you know. And it’s another vindication of my approach of getting to know the data really well, rather than having it hidden behind some tool I didn’t write. And I hope this post helps you in some small way, if you agree with that proposition.


[1] This is from an actual customer incident, which I’m not going to describe.
[2] Fairly obviously, even an hour might not have been enough, so you might argue I got slightly lucky this time. I’d have asked for another hour’s data if I hadn’t, so no real risk.

Source: https://www.ibm.com/developerworks/comm ... ng?lang=en
Thanks,
Anuj

Disclaimer: My comments on this website are my own and do not represent the opinions or suggestions of any other person or business entity, in any way.