A recent incident, however, gave me a different reason to look at this data: We were concerned with what was driving the logging subsystem so heavily in a given timeframe.[1] This is because there were knock-on effects on other jobs.
It’s as good an opportunity as any to alert you to two useful fields in DB2 Accounting Trace:
- QWACLRN - the number of records written to the log (4-byte integer)
- QWACLRAB - the number of bytes logged. (8-byte integer)
I was interested in logging volumes - in gigabytes.
Each 101 (IFCID 3) record has these fields so it’s quite easy to determine who is doing the logging. What is more difficult is establishing when the logging happened:
- Yes, the SMF record has a time stamp, marking the end of the “transaction”.
- No, the records aren’t interval records.
- Asked the customer to send data from the beginning of the incident to at least an hour after the incident ended.
- Rather than reporting at the minute level, I summarized at the hour level.
What we found was that a small number of “mass delete” jobs indeed did well over 90% of the logging (by bytes logged) - and they started and stopped “right on cue” in the incident timeframe.
In this case I modified a DFSORT E15 exit of mine to process the 101s, adding these two fields. I then ran queries at various levels of time stamp granularity.
These two fields might “save your life” one day. So now you know. And it’s another vindication of my approach of getting to know the data really well, rather than having it hidden behind some tool I didn’t write. And I hope this post helps you in some small way, if you agree with that proposition.
- This is from an actual customer incident, which I’m not going to describe. ↩
- Fairly obviously even an hour might not have been enough. So you might argue I got slightly lucky this time. I’d’ve asked for another hour’s data if I hadn’t, so no real risk. ↩
Source: https://www.ibm.com/developerworks/comm ... ng?lang=en