What is code coverage and how to measure it?

Post by **Dino** » Wed Jan 23, 2019 3:01 pm

Hi,

I was asked, "What is code coverage and how to measure it?" I could not answer, but I searched for it later and found this:

> Code coverage is a measurement of how many lines/blocks/arcs of your code are executed while the automated tests are running.
>
> Code coverage is collected by using a specialized tool to instrument the binaries to add tracing calls and run a full set of automated tests against the instrumented product. A good tool will give you not only the percentage of the code that is executed, but also will allow you to drill into the data and see exactly which lines of code were executed during a particular test.

Even after reading it, I did not really get it. And do we use it on mainframes? I have not seen it used. Can someone please help?

Post by **Robert Sample** » Wed Jan 23, 2019 7:47 pm

A code path is how a program can be executed for a given set of data. Each IF statement, for example, generates two code paths -- one for when the IF is true, and one for when the IF is false. A nested IF statement therefore generates three code paths (one for first IF true and nested IF true, one for first IF true and nested IF false, and one for first IF false). Typical testing may only run through 20% of the code paths in the program due to poor selection of test data. (By the way, one of the WORST ways to get test data is to copy production data -- because production data has already been validated and hence is not likely to exercise any of the code paths that handle invalid data.) Code coverage is measuring how many code paths exist in the program, and how many of them are tested by the test data. Most programmers are really lousy at generating test data, by the way.
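To make the nested-IF example concrete, here is a minimal sketch in Python (chosen only for brevity; the thread itself is about COBOL and PL/I). The routine `check_record`, its path labels, and the sample data are all hypothetical. It shows how data that has already been validated can leave most of the three code paths untouched.

```python
# Hypothetical validation routine with a nested IF: three code paths.
def check_record(amount, currency):
    """Return a label naming the code path taken (labels are illustrative)."""
    if amount >= 0:
        if currency == "USD":
            return "outer-true/inner-true"
        return "outer-true/inner-false"
    return "outer-false"


ALL_PATHS = {"outer-true/inner-true", "outer-true/inner-false", "outer-false"}


def path_coverage(test_data):
    """Fraction of the three code paths exercised by test_data."""
    taken = {check_record(amount, currency) for amount, currency in test_data}
    return len(taken & ALL_PATHS) / len(ALL_PATHS)


# Data copied from production: already validated, so every record looks alike.
production_like = [(10, "USD"), (250, "USD"), (99, "USD")]
# Hand-built data that deliberately includes the invalid cases.
designed = [(10, "USD"), (10, "EUR"), (-5, "USD")]

print(f"production-like data: {path_coverage(production_like):.0%} of paths")
print(f"designed data:        {path_coverage(designed):.0%} of paths")
```

With the production-like records only one of the three paths runs; the designed records reach all three.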

Code coverage can be measured by manual inspection (a slow, error-prone process) or by using an automated code coverage tool (which allows you to run the program and produces a report of how many code paths are exercised by the test data). There are tools available on the mainframe to do this (IIRC, STROBE from Compuware has a code coverage component to it).
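As a rough illustration of what an automated tool reports, the sketch below uses Python's `sys.settrace` hook to record which lines of one function run for a given set of test inputs. This is an assumption-laden toy, not how STROBE or any mainframe product works: the function `classify` and the helper names are hypothetical, and it measures line coverage rather than code paths.

```python
import dis
import sys


def classify(amount):
    # Hypothetical routine under test.
    if amount < 0:
        return "invalid"
    if amount > 1000:
        return "large"
    return "normal"


def executable_lines(func):
    """Line numbers that carry bytecode, excluding the def line itself."""
    code = func.__code__
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    lines.discard(code.co_firstlineno)
    return lines


def measure_coverage(func, test_inputs):
    """Run func on each input and return (executed, missed) line numbers."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore every other function
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for value in test_inputs:
            func(value)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook

    all_lines = executable_lines(func)
    return executed & all_lines, all_lines - executed


for label, inputs in [("valid data only", [10, 20]),
                      ("with edge cases", [10, 5000, -1])]:
    hit, missed = measure_coverage(classify, inputs)
    pct = len(hit) / (len(hit) + len(missed))
    print(f"{label}: {pct:.0%} of lines executed, missed lines: {sorted(missed)}")
```

The list of missed line numbers is the part worth reading: it points at exactly the statements the test data never reached.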

Post by **prino** » Thu Jan 24, 2019 3:23 pm

I'm still using the old PL/I Optimizing compiler, which allows compilation with the "COUNT,FLOW" options. In the end this produces a report of the number of times every statement is executed, which is also somewhat useful if you want to optimize code.

Strobe uses, as far as I remember, a statistical analysis; it does not require any compiler options.

Post by **Robert Sample** » Thu Jan 24, 2019 6:22 pm

> Strobe uses, as far as I remember, a statistical analysis; it does not require any compiler options.

It's been a number of years since I used Strobe, but as I recall it requires the program be recompiled to insert hooks into the object module that allow recording of the execution counts for each paragraph / statement. There is a run-time cost, of course, so this wouldn't be done to programs moving into production.

Post by **Dino** » Tue Jan 29, 2019 10:12 am

> Robert Sample wrote (Wed Jan 23, 2019 7:47 pm):
>
> A code path is how a program can be executed for a given set of data. Each IF statement, for example, generates two code paths -- one for when the IF is true, and one for when the IF is false. A nested IF statement therefore generates three code paths (one for first IF true and nested IF true, one for first IF true and nested IF false, and one for first IF false). Typical testing may only run through 20% of the code paths in the program due to poor selection of test data. (By the way, one of the WORST ways to get test data is to copy production data -- because production data has already been validated and hence is not likely to exercise any of the code paths that handle invalid data.) Code coverage is measuring how many code paths exist in the program, and how many of them are tested by the test data. Most programmers are really lousy at generating test data, by the way.
>
> Code coverage can be measured by manual inspection (a slow, error-prone process) or by using an automated code coverage tool (which allows you to run the program and produces a report of how many code paths are exercised by the test data). There are tools available on the mainframe to do this (IIRC, STROBE from Compuware has a code coverage component to it).

Thank you Robert.

I understand your reply, but what is the use of such an analysis? I mean, if, say, 10% of the code is very seldom executed, will that really make a difference in a production environment? Tools like STROBE are very expensive, I heard. So would this be a good investment for such an objective?

Post by **Anuj Dhawan** » Tue Jan 29, 2019 10:22 am

This is an interesting thread. At the shops I've been to, I've not seen anyone use "code coverage" as a measure of the efficiency of a given piece of code. At some shops there were exercises in which they tried to find dead code in programs by using REXX, but that's a far cry from what "code coverage" is.

With STROBE, we'd request that it be run against production code that had been identified as a CPU hog. It ran in production only for a very short time and was expensive because it measured at run time, as Robert has said.

Post by **Robert Sample** » Tue Jan 29, 2019 6:30 pm

> I understand your reply, but what is the use of such an analysis? I mean, if, say, 10% of the code is very seldom executed, will that really make a difference in a production environment?

Code coverage analysis is quite useful when testing. If you develop a set of test data for a program, and you run your tests, do you think your boss will be very happy when you tell him / her that your test data exercised 20% of the code paths? In other words, your testing did NOT test 80% of the program statements -- so there is a good chance that promoting the program to production will cause problems.

Post by **prino** » Tue Jan 29, 2019 10:20 pm

> Anuj Dhawan wrote (Tue Jan 29, 2019 10:22 am):
>
> With STROBE, we'd request that it be run against production code that had been identified as a CPU hog. It ran in production only for a very short time and was expensive because it measured at run time, as Robert has said.

Yes, it is expensive, but given the potential savings using Strobe can give, it's worth using it on production data, as test data may not be sufficient in volume to find hotspots.

Using it at one client, I reduced the CPU usage of two routines by 99.5% and 99.7%. At my last employer, I found out that the idiots who ran the shop were using the "STORAGE(0,0,0)" LE option "because the third party who helped us move from OS PL/I to Enterprise PL/I [prino: some eight years earlier!] told us to do so". Go figure...

And at another client I found that PL/I code using the "BASED REFER" option was eating CPU like there was no tomorrow. A slight restructure of the code eliminated that hotspot, and the method I used is still recommended in the current Enterprise PL/I manual. If you open the Enterprise PL/I V5.1 Programming Guide at (PDF) page 331, you can read:

> When your code refers to a member of a BASED structure with REFER, the compiler often has to generate one or more calls to a library routine to map the structure at run time. These calls can be expensive, and so when the compiler makes these calls, it will issue a message so that you can locate these potential hot-spots in your code.
>
> If you do have code that uses BASED structures with REFER, which the compiler flags with this message, you might get better performance by passing the structure to a subroutine that declares a corresponding structure with * extents. This will cause the structure to be mapped once at the CALL statement, but there will be no further remappings when it is accessed in the called subroutine.

And that's nearly a literal copy of what I sent to IBM in the late 1990s.

Post by **Robert Sample** » Tue Jan 29, 2019 10:35 pm

At a previous employer, we had STROBE available. The development manager kept telling me we needed to buy faster disk drives because the program was I/O-bound as it was running 90 minutes to process X amount of records. When I put the program through STROBE, we discovered that the program was CPU-bound, not I/O-bound and that the program was spending about 98-99% of the CPU time on a single COBOL statement. The statement was an INITIALIZE of an array. By adding code to only initialize the elements we needed (instead of the entire array), we reduced the CPU time by 95% for the entire program and the 90-minute run time went down to 15 minutes. We did not buy any disk drives.

When a single COBOL statement (in a 3,000+ line program) can have such an impact, you can begin to see why code coverage analysis can be so important while testing.
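The INITIALIZE fix described above can be sketched roughly as follows. This is Python rather than COBOL, and the table size, record layout, and function names are all assumptions made for illustration, since the post gives none of those details. Instead of timing anything, the sketch counts how many element resets each approach performs.

```python
TABLE_SIZE = 10_000  # assumed size; the post does not say how big the array was


def totals_with_full_reset(records):
    """Clear the whole table before every record (the slow original idea)."""
    table = [0] * TABLE_SIZE
    totals, reset_writes = [], 0
    for record in records:
        for i in range(TABLE_SIZE):  # touches every element each time
            table[i] = 0
        reset_writes += TABLE_SIZE
        for slot, value in record:
            table[slot] += value
        totals.append(sum(table[slot] for slot in {s for s, _ in record}))
    return totals, reset_writes


def totals_with_targeted_reset(records):
    """Clear only the elements the current record actually used."""
    table = [0] * TABLE_SIZE
    totals, reset_writes = [], 0
    for record in records:
        used = set()
        for slot, value in record:
            table[slot] += value
            used.add(slot)
        totals.append(sum(table[slot] for slot in used))
        for slot in used:  # touches only what was used
            table[slot] = 0
        reset_writes += len(used)
    return totals, reset_writes


# Each hypothetical record is a list of (table slot, value) pairs.
records = [[(1, 10), (2, 5)], [(2, 7)], [(9_000, 1), (1, 1), (1, 2)]]
print(totals_with_full_reset(records))
print(totals_with_targeted_reset(records))
```

Both functions produce the same totals; the difference is only in how many elements get reset per record, which is where the CPU time went in the story above.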

Post by **Dino** » Wed Jan 30, 2019 2:30 pm

> Robert Sample wrote (Tue Jan 29, 2019 6:30 pm):
>
> Code coverage analysis is quite useful when testing. If you develop a set of test data for a program, and you run your tests, do you think your boss will be very happy when you tell him / her that your test data exercised 20% of the code paths? In other words, your testing did NOT test 80% of the program statements -- so there is a good chance that promoting the program to production will cause problems.

But what if I just don't tell them? How many programmers actually do? As you said, test data is usually copied from production, and that's not really good, but that is what happens, right?

Post by **Robert Sample** » Wed Jan 30, 2019 6:17 pm

Many -- perhaps most -- sites do not have code coverage tools and hence have no idea how good (or bad) their testing is. A very rough rule of thumb that can be used is to find out how many batch jobs have to be rerun due to ABEND, or unaccounted for data, or untested conditions (etc). If there is a lot of this, then that's a sign the site does not have very good testing procedures.

I have been involved in a couple of coding efforts where the testing was thorough; one of those applications went 5 months (running daily) without any problems; the other went 13 months of daily runs before having its first rerun.

Post by **Dino** » Sun Dec 13, 2020 11:57 am

It's been a while, and this topic was left hanging. Never mind; when you say:

> Many -- perhaps most -- sites do not have code coverage tools and hence have no idea how good (or bad) their testing is. A very rough rule of thumb that can be used is to find out how many batch jobs have to be rerun due to ABEND, or unaccounted for data, or untested conditions (etc). If there is a lot of this, then that's a sign the site does not have very good testing procedures.

How do you do it? I mean, is there a tool you used, or a housekeeping exercise to measure against that rule of thumb?

Post by **Robert Sample** » Sun Dec 13, 2020 6:46 pm

> How do you do it? I mean, is there a tool you used, or a housekeeping exercise to measure against that rule of thumb?

There are tools on the market to use for code coverage analysis (BMC / Compuware's STROBE product is one I'm aware of that will do this; I'm sure there are others). As far as the rule of thumb, it is mostly common sense. If your site runs 10,000 batch jobs a day and on any given day 3 of them have abends that have to be fixed, that is a very different site than one running 10,000 batch jobs a day and every day 500 of them have abends. You can also look at how many different programs have abends over a week / month / quarter / year -- if the same programs keep having abends in production, that is a definite sign of the quality of testing and code coverage for those programs.
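The rule of thumb above is simple arithmetic, and a small sketch may help anyone who wants to track it. Everything here is hypothetical: the function names, the program names in the sample log, and the threshold of three abends are assumptions for illustration, and the log would in practice come from whatever scheduler or job-history data a site keeps.

```python
from collections import Counter


def abend_rate(jobs_run, jobs_abended):
    """Share of batch jobs that abended."""
    return jobs_abended / jobs_run


def repeat_offenders(abend_log, threshold=3):
    """Programs that abended at least `threshold` times in the log period."""
    counts = Counter(program for program, _day in abend_log)
    return {program: n for program, n in counts.items() if n >= threshold}


# The two sites from the post: 10,000 jobs a day, 3 abends versus 500 abends.
print(f"site A: {abend_rate(10_000, 3):.2%} of jobs abend")
print(f"site B: {abend_rate(10_000, 500):.2%} of jobs abend")

# Hypothetical abend log: (program name, day of month).
abend_log = [
    ("PAYROLL1", 1), ("PAYROLL1", 3), ("PAYROLL1", 9),
    ("INVLOAD", 2),
    ("GLPOST", 4), ("GLPOST", 5), ("GLPOST", 6), ("GLPOST", 20),
]
print(repeat_offenders(abend_log))
```

Programs that keep showing up in the second report are the ones whose testing deserves a closer look first.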

Post by **Dino** » Sun Dec 20, 2020 11:10 am

We have STROBE, and I would like to experiment with code coverage. Where do I start? Could you please guide me on that, Robert?
