/home/nand



Cooperative bug isolation: this is concrete and good stuff!

28 November, 2007 (10:50) | FOSS

Remember a few posts ago when I mentioned the cooperative bug isolation research project?

I was thrilled because it sounded like it had a lot of potential, but I was still bracing for the disappointment of discovering an empty box with buzzwords stuck all over it. So I gave it a try and read some of their papers: [1] [2] [3].

And whoa! It looks nice and has concrete, useful output! To sum up the whole thing for a developer:

  • Binaries need to be modified, which takes a simple change in the toolchain. The downsides are an average package size growth of 50% and a performance decrease, although the latter is said to be negligible.
  • At each execution, some of the program’s decisions (ifs, return values, …) are occasionally sampled (e.g. 1 time out of 100) and saved. Crashes are recorded too. All this data is then regularly sent to a server (a few kilobytes at a time).
  • On the server side, we get an aggregated picture of how the software behaves and misbehaves. Here is the interesting part: to put it simply, the server knows, for each decision in your code, the probability of a crash depending on the outcome of that decision. But wait, there’s more! These probabilities alone won’t help much in finding the origin of the bug, which can be hundreds of lines earlier. So the server computes the increase in crash probability between consecutive lines. If the crash probability jumps after a given decision, that decision is likely at fault. Using this new information plus a duplicate-removal algorithm, we end up with a small number of relevant pointers to buggy decisions. And this works pretty damn well!
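To make the three steps above a bit more tangible, here is a minimal Python sketch of the idea as I understand it. It is not the project's actual code: the function names (`sample`, `aggregate`, `increase`) and the site labels are made up for illustration, and the scoring (crash probability when a given outcome was seen, minus crash probability when the decision was merely reached) is my reading of the "jump" described above. The duplicate-removal step is left out.

```python
import random
from collections import defaultdict

SAMPLING_RATE = 0.01  # "1 time out of 100", as in the post


def sample(report, site, outcome, rng, rate=SAMPLING_RATE):
    """Client side: occasionally record the outcome of one decision."""
    if rng.random() < rate:
        report.add((site, outcome))


def aggregate(runs):
    """Server side: count successful and crashed runs per decision.

    `runs` is a list of (report, crashed) pairs, where `report` is the
    set of (site, outcome) pairs sampled during one execution.
    """
    reached = defaultdict(lambda: [0, 0])  # site -> [ok runs, crashed runs]
    seen = defaultdict(lambda: [0, 0])     # (site, outcome) -> same layout
    for report, crashed in runs:
        idx = int(crashed)
        for site in {s for s, _ in report}:
            reached[site][idx] += 1
        for key in report:
            seen[key][idx] += 1
    return reached, seen


def increase(reached, seen, site, outcome):
    """Jump in crash probability attributable to this outcome (assumed scoring)."""
    ok, bad = seen.get((site, outcome), (0, 0))
    ctx_ok, ctx_bad = reached.get(site, (0, 0))
    if ok + bad == 0 or ctx_ok + ctx_bad == 0:
        return 0.0  # never sampled: no evidence either way
    failure = bad / (ok + bad)              # P(crash | outcome seen)
    context = ctx_bad / (ctx_ok + ctx_bad)  # P(crash | decision reached)
    return failure - context
```

A decision whose score is clearly positive is one where the crash probability jumps right after a particular outcome, which is exactly the kind of pointer the server hands back to the developer.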

OK, now the example. Look at this exif analysis (XSLT’ed XML). The size of each thermometer corresponds to the logarithm of the number of samples the server got: a big thermometer means *much* more samples than a smaller one. The white area corresponds to the number of successful executions (no crash) with a given decision. The dark area plus the red area corresponds to the number of unsuccessful executions (a crash is coming) with a given decision. And the important part: the red area is the difference in crash probability before and after this decision.
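If the thermometer is hard to picture, here is a rough text-mode sketch in Python built only from the description above. The function name `thermometer`, the `scale` parameter and the characters used for each area are my own choices, not anything taken from the project's reports.

```python
import math


def thermometer(ok, bad, increase, scale=10):
    """Draw one decision as a text bar (illustrative layout only).

    ok / bad : successful / crashed runs in which this outcome was sampled
    increase : jump in crash probability attributed to this decision
    """
    total = ok + bad
    if total == 0:
        return "[]"
    # Length grows with the logarithm of the sample count.
    length = max(1, round(scale * math.log10(total + 1)))
    failure = bad / total
    # Red is the part of the crashing share explained by the jump.
    red = round(length * max(0.0, min(increase, failure)))
    dark = round(length * failure) - red
    white = length - red - dark
    return "[" + "#" * dark + "!" * red + "." * white + "]"


if __name__ == "__main__":
    print(thermometer(ok=50, bad=49, increase=0.25))
    print(thermometer(ok=5000, bad=4999, increase=0.25))
```

Here `#` stands for the dark area, `!` for the red area and `.` for the white area. The second call has roughly a hundred times more samples, yet its bar is only about twice as long, which is why a big thermometer means so many more samples.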

Now look at section 4.2.3 of paper [3] for the analysis of the exif results (and you should definitely read the whole paper, by the way). This is really good! More software samples here (end of page).

As a conclusion, paraphrasing one of the papers: before, users could only grumble at crashes. Now, developers are overwhelmed by bug reports, many of them duplicates or lacking debug info. Will CBI mean the end of bug handling as we know it?
