I Get the Feeling that the Debugging Discipline has a Bright Future Ahead
Posted in Reviews on January 31st, 2010 by Bryce Thomas – Be the first to commentWhen I debug, I use what I’d consider to be a set of pretty stock standard tools and strategies. We’re talking breakpoints, conditional breakpoints, stepping into code, stepping out of code, stepping over code, variable watches, stack traces, you know the likes…
Just recently though I finished reading Why Programs Fail, Second Edition: A Guide to Systematic Debugging by Andreas Zeller.
This is no “Learn to Debug in 24 Hours Using Visual Studio”; it’s an entirely different treatment of debugging. Just when you thought you knew everything there was to know about debugging, Andreas Zeller came along and unleashed this little gem. The debugging concepts and techniques presented in this book are way ahead of the curve compared to what is commonplace in industry right now. I wanted to dedicate this post to going over just a few of the concepts presented in this book, to give you a taste of all that is still left to come for mainstream debugging. So here we go.
1. Debugging Backwards In Time
The traditional way to conduct a debugging session is to set some breakpoints, fire up your debugger, wait until a breakpoint is hit then start inspecting the program’s variables and execution, stepping through the code line at a time. If you’re anything like me, you rarely find the problem cause on the first run through. By the time you reach a line of code where you notice your variables aren’t as they should be, you’ve gone too far. Time to set some more breakpoints, fire up a new debugging session and see how many more times you can overstep the problem before you actually get it right.
Enter Backwards Debugging. Wouldn’t it be great if you could run your debugging session once, realise when you’ve stuffed up and stepped too far ahead in your code and be able to press rewind? Even better, being able to replay that mangled piece of code back again as many times as you like without ever having to start a new debugging session? It’d be like getting to listen to the 6 minutes of The Verve’s Bitter Sweet Symphony on track repeat without ever having to subject yourself to the other 69 minutes of torture that is the remainder of the album.
Well, guess what? Backwards debugging tools already exist (for some languages at least). In fact, the first experimental backwards Java debugger, or “Omniscient Debugger” (as coined by Bill Lewis) seems to have been released all the way back in 2002 (Yes, 8 years is a long time in the world of technology). Here is a more recent page about the Omniscient Java Debugger.
For the Microsofties among us (myself partially included), you’ll be pleased to know that Microsoft Visual Studio 2010 is going to come equipped with a “Historical Debugger” that lets you step backwards in time through your program. Hopefully this will give backwards in time debugging the exposure it needs to get people excited enough to demand it from other non-.NET programming languages.
2. Program Slices
A program slice is a subset of a program that influences a particular program statement or the subset of a program that is influenced by a particular program statement. Now you’re confused. It’s really very simple. Let’s give an example. Imagine for a moment that you have the following function (this one happens to be written in Python):
def summul(a, b):
sum = 0
mul = 1
while (a <= b):
sum = sum + a
a = a + 1
mul = mul * a
print(sum)
print(mul)
Don't worry too much about what this function does. It's really not important. Now though, say that the value we see printed for mul when calling this function isn't what we expected it to be. For a small sample of code like this, it's easy enough to look back over the whole piece of code and figure out what's wrong. For a larger piece of code though, it'd be nice to be able to look at only those statements which influence the value being printed out for mul without all of the noise from the other statements that are completely irrelevant. So how would we go about isolating only those statements that affect the value printed for mul? Well, we can manually do it by looking back through the code and finding which lines directly and also indirectly influence the value of mul. Let's give that a go.
- Obviously line #9 influences what is printed for mul because it's what does the printing.
- Line #8 in no way influences the value of mul and so in no way influences the value printed for mul.
- Looking back we see that the value of mul is influenced by line #7, as it multiplies mul by a. So line #7 influences what is printed for mul.
- Now we have an indirect or transitive dependency between line #6 and the value of mul, because line #6 affects the value of a, and a affects the value of mul in line #7. So line #6 influences the value printed for mul.
- Line #5 reads the value of a, but doesn't modify it, so mul is not dependent on line #5 and therefore the value printed for mul is not dependent on line #5.
- Line #4 affects mul as the while statement controls the execution of other statements affecting mul. Therefore line #4 influences what is printed for mul.
- Line #3 is where the value of mul is initially set. So line #3 influences what is printed for mul.
- Line #2 has no influence over mul as sum is neither directly nor indirectly connected to mul. Therefore line #2 in no way influences the value printed for mul.
- Line #1 influences mul, as the arguments for both a and b are passed in here; both of which affect the value of mul. Therefore line #1 influences what is printed for mul.
So if we are debugging a misprinted value of mul, we can trace the problem back to somewhere in lines 1, 3, 4, 6, 7 and 9. So we only need be looking at:
def summul(a, b):
mul = 1
while (a <= b):
a = a + 1
mul = mul * a
print(mul)
That was kind of tedious you say, and it only eliminated 3 lines of code. Yep, but 3 lines is 33% of the code and you've been able to completely rule out those 3 lines as the cause. Really though, the important thing to note here is that creating program slices like these is a great candidate for automation. For those of us (un)?lucky enough to be writing code in C/C++, there's a tool called CodeSurfer that does just this, with the ability to automatically generate program slices and highlight the lines of code that influence any given statement. I'd love to see this kind of functionality built into Visual Studio or Eclipse. I can envisage rolling my mouse over a variable during a paused debug session and having it show not only the variable's value, but fade out all of the lines of code which had no bearing on how that value came to be.
I haven't gone into too much detail on all of the program slice variants; they're covered in much more depth in the text. Particularly interesting is the difference between static slices and dynamic slices though. The idea behind a static slice is that you cover all of your bases (all possible execution paths), regardless of what variable values your program actually executes with. Dynamic slices however are a further refinement that narrow down the number of dependent lines of code based on known variable values in a given run of a program. Take this basic (and pointless) piece of code for example:
def pointless(c,d):
if c > 5:
d = d + 3
d = d + 1
print(d)
Assume we execute this function passing in the value 4 for c and 2 for d. If we wanted to know which lines could influence the value printed for d, a static program slice would say lines 5, 4, 3, 2 and 1 (all of them). A static slice has no knowledge of the values being passed in for c and d and so covers its bases saying that it's quite possible that lines #2 and #3 may affect the value printed for d if the if statement evaluates to True.
A dynamic program slice on the other hand looks at a specific execution of the program. The dynamic slice would realise that the value of c is not greater than 5, and so lines #2 and #3 would have no bearing on the value printed for d, at least not under this specific execution. Therefore the dynamic slice would only include lines 1, 4 and 5 as those capable of influencing what is printed for d.
3. Delta Debugging
Delta Debugging is a fancy (and in retrospect more concise) way of saying "find the difference between an old working version of my program and a new broken version of my program and narrow down the change(s) within that difference that is causing my new version to be broken". Much like program slices, delta debugging is all about automation. It's about saying to the computer "here's my old working code, here's my new broken code, figure out what part of the difference between the two is giving me a headache".
It's worth noting here that we're not just talking about doing a diff between the new and old version of the program code and having a programmer manually read through all of the changed lines looking for potential problem causes. No, delta debugging is more advanced than that. Delta debugging says "I'm going to use a binary search to automatically narrow down what part of the diff is giving you problems". By executing the program multiple times with only some of the new changes applied, delta debugging aims to find out which of those new changes is causing the problem.
That all sounds nice you say, but how the hell does this all work? Well, delta debugging does assume a few things. The first assumption is that you have your code under version control. It's pretty hard to compare an old version against a new version if the old version no longer exists. The other assumption is that you have a test suite of some sort so that the computer has some way of determining which executions qualify as "passing" and which executions qualify as "failing". It can be hard to convince people of the merits of unit testing, especially when so much of the supporting evidence is empirical and given the unprecedented levels of fanboyism surrounding it. To anyone who preaches unit testing, here's my advice: start your sermon with "unit testing lets you automate debugging". From here on it should be mostly smooth sailing.
Back to delta debugging though. Time for an example. You've got your version control and your test suite in place. Your old version of code is committed after passing the test suite and you've just now added 100 new lines of "enhancements" for your program. You run the test suite again and it now reports EPIC FAIL (by the way, I don't know of any testing frameworks that report this, but if you work on one, please consider it, at least for say > 80% test failure rate). So now all you know is that something in that 100 lines of code is causing your test suite to fail.
You throw delta debugging at it. Your delta debugger splits those 100 lines in half, takes 50 of them, mixes them in with your old working code and runs it through the test suite again. Does the test suite now pass? If it does, then those 50 lines of code aren't causing the problem, so trying running the program through the test suite again with the other 50 lines of code. If it doesn't pass, then split those 50 lines of code in half again and try running the program through the test suite with only 25 lines of code changes applied. You probably get the idea from here. It's basically a binary search for the problem in your code changes.
You can see how this process can be automated to isolate which line(s) of code amongst all those changed lines of code cause the test suite to fail. My example is of course highly contrived. In fact, there is only a very small class of problems where you can simply split a piece of code in half and expect it not to break. There's dependencies between variables on different lines and many of these splits would result in compile errors. There's further discussion in the book about more advanced methods that build on the same principles, including automated techniques that are syntax aware and not quite so brutish about where splits are made. The example I presented here was only designed to demonstrate how automation can be used to try a large number of program variants to establish what code changes cause the program to fail.
A delta debugging plugin by the name of DDCHANGE exists for Eclipse which helps automate debugging using some of the concepts described here. There's also a nice little Flash video demonstrating it. I haven't tried DDCHANGE myself, but it certainly looks promising.
One of the things which is nice about backwards in time debugging and program slices is that they don't require any special supporting infrastructure. If you want your program to do anything, you have to write the code for it, so everything a backwards debugger/program slicer needs is kind of already there for it. With delta debugging, you need some kind of versioning system as well as a test suite in order for it to be of any use. If you develop with these anyway then it's really not a problem. It does however present a bit more of a barrier to simple experimentation.
And a bit of a wrap up... I'm tempted to say this book is up there in the same league as Code Complete. Sure, it doesn't cover the same breadth of topics, but the writing style is straight forward and methodical and you can tell that the author knows the subject matter extremely well. It's just got that same feel to it. For some languages/IDEs the tools are still quite immature/non-existent, but it wouldn't surprise me if most of these debugging techniques become commonplace over the next 5 years or so. Either way, it's a book that's definitely worth checking out and the best I've read in some time.
