The "benefit" row gives the number of days between when an issue is
found by the ctests and when the same issue is reported by someone
not using the ctests. In the example of the ps2pdf flag, the issue
was reported just a couple of weeks after we found it with the
ctests, indicating that the ctests were not that helpful in this
case.
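
As a minimal sketch of how a "benefit" value could be computed, the
snippet below subtracts the two dates. The function name and the dates
are hypothetical and chosen only for illustration; they are not taken
from any logged issue.

```python
from datetime import date


def benefit_days(found_by_ctests: date, reported_by_others: date) -> int:
    """Days between the ctests finding an issue and an outside report of it."""
    return (reported_by_others - found_by_ctests).days


# Hypothetical dates: found by the ctests on 1 March, reported two weeks later.
print(benefit_days(date(2020, 3, 1), date(2020, 3, 15)))  # prints 14
```

A small value, as in the ps2pdf case, suggests the ctests bought little
lead time for that issue.
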
A false positive can be either "not a bug" or a bug that is not
worth the time or complexity to fix. For the purpose of evaluating
the tests, the distinction is not important.
The ctests are not convenient tests. In this file we attempt to log
the benefits and costs of using them so we can periodically evaluate
which tests we should keep and which we should get rid of.