Assignment 1: Software development difficulties

Reflect on your software development experience. Oftentimes it is lots of fun, but at times it can be frustrating.

1. Think of a time when some problem took you a lot of effort. Be specific: choose a concrete incident that you remember. It should not be trivially solvable by an existing technique or tool, nor merely due to your ignorance or inexperience. (But, some tools can help you more rapidly learn facts or overcome inexperience.)

If you cannot remember a specific incident, then put the assignment aside, think about it in the background, and come back to it after a few hours. You also might want to look through the history of commits in projects you have worked on, or issues opened and closed, or code reviews, or TA comments.

Write 1/2 to 2/3 page discussing the problem. Give it a descriptive title. Write about the same amount of text on each of the following (number the paragraphs in your writeup):

  1. Describe the problem in sufficient technical detail for others to understand why it was difficult. The topic sentence (that is, the first sentence) should be an informative summary or description of the problem.
  2. Describe what methodologies and/or tools you used. Be specific about ways in which they were helpful and ways in which they fell short.
  3. Search for a tool or methodology that would solve your problem.
    • If one exists that completely solves your problem, you have learned something valuable, but don't write up that problem; start over at the beginning and choose a different problem to write up. (Note that sometimes a tool may claim to solve your problem, but it might only help part of it.)
    • If no tool exists that solves your problem, then describe the tool or tools that come closest to solving your problem, and explain how they fall short.
  4. Propose a tool that would solve your problem. It might be similar to, or completely different from, the one that you described above. Describe how it would work (this should not be magical, but you don't have to know how to implement it in detail). Think about the challenges of building and deploying such a tool. Why do you think no one has built it already?

2. Do the same thing, but with a different problem and solutions. Choose two problems that are different from one another; for example, they shouldn't both be about any one activity/topic, such as requirements, coding, testing, debugging, performance, documentation, etc.

3. At the end of the document:

Submit a two-page PDF file. Use a font size of 10 or 11 points, and standard margins (1 inch).

This assignment will reward careful thought about interesting problems and issues. Please introspect deeply and thoughtfully. Doing so will help you in this class, and beyond.

Tips

Common problems with homework:
  • Not thinking of a specific event, but a general type of problem with a vague description.
  • Lack of imagination: looking at one small part of the problem rather than creatively thinking about the real underlying problem.

The most common reason for low grades on this assignment is lack of specificity. Don't let this happen to you!

Another common problem is unsupported claims. Support every claim that you make. For example, don't just say “Current techniques are not mature enough to do it,” without explaining why. List and briefly describe current techniques, and explain why they fall short.

Examples from previous years

Here are two example answers. Your answer should not use these particular problems. (Since your problem should be based on an anecdote from your own experience, it should have been different anyway!)

Example 1: Un-chaining method calls for debugging

It is a hassle to debug a long line with many function calls ("fluent style"), at least one of which fails or produces a bad result. You have to step in and out of functions on a single line of code trying to determine what is going wrong, and generally you can't see the intermediate results. The problem itself is not difficult to get around, but it is annoying and wastes valuable time.

Generally when I come across this problem, I split up the single line of code into multiple lines, with each function call result stored in its own temporary variable. An example of this is shown below:

Before:
    return obj1.toString().substring(obj2.getList().get(idx).getIntValue());
After:
    // Evaluate the receiver first to preserve Java's left-to-right evaluation order.
    String string1 = obj1.toString();
    List<Val> list1 = obj2.getList();
    Val val1 = list1.get(idx);
    int int1 = val1.getIntValue();
    String string2 = string1.substring(int1);
    return string2;

This strategy works, and I can clearly tell which function call is causing the improper behavior, but it is tedious, especially when you must do it often across a codebase.

Eclipse has a feature for Java that allows you to see the previous function's returned value while debugging, and Visual Studio has a similar feature for C# and C++. These features are helpful and I was not aware they existed, but they do not give you a way to handle many function calls at once and store all of their results temporarily for debugging.

The tool I would create would simply automate the procedure I used above. The tool would be able to expand a single line of code into multiple lines, storing function return values into variables so they can easily be viewed while debugging. Often, it is cleaner and more readable to split up a line with many function calls anyway, but my tool would also support collapsing lines of code back into a single line. Essentially the tool would parse the line of code looking for a function call, determine its return type by finding the function header, create a new line with a temporary variable, and replace the original function call with the temporary variable. For collapsing, you would select lines of code to collapse, and the process would be reversed. I think nobody has built this tool already because it is relatively low effort to do this process by hand, and often lines of code aren't nested as deeply as the example I provided.
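The expansion step described above can be sketched roughly as follows. This is a minimal illustration, not the proposed tool: `ChainExpander`, `splitTopLevel`, `expand`, and the `tmpN` names are all hypothetical. It assumes the input is a single method chain, splits it only at dots outside parentheses, and uses `var` (Java 10 or later) instead of looking up each return type in the function header. It does not un-nest calls inside argument lists, and it would mishandle dots inside string or decimal literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expands one fluent method chain into temporaries.
final class ChainExpander {

    // Splits "a.b().c(x.y())" into ["a", "b()", "c(x.y())"],
    // ignoring dots that appear inside parentheses.
    static List<String> splitTopLevel(String expr) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : expr.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == '.' && depth == 0) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    // Emits one statement per call; the last call becomes the return statement.
    static List<String> expand(String chain) {
        List<String> parts = splitTopLevel(chain);
        List<String> lines = new ArrayList<>();
        String receiver = parts.get(0);
        if (parts.size() == 1) {
            lines.add("return " + receiver + ";");
            return lines;
        }
        for (int i = 1; i < parts.size(); i++) {
            String call = receiver + "." + parts.get(i);
            if (i == parts.size() - 1) {
                lines.add("return " + call + ";");
            } else {
                String tmp = "tmp" + i; // assumed naming scheme
                lines.add("var " + tmp + " = " + call + ";");
                receiver = tmp;
            }
        }
        return lines;
    }
}
```

For instance, `ChainExpander.expand("obj2.getList().get(idx).getIntValue()")` yields three statements, one per call, each of which can carry its own breakpoint. Resolving real types and expanding nested arguments would require a proper parser and type information, which is where most of the effort in the proposed tool would go.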

Example 2: Smarter println debugging

During my course in operating systems, we had to implement swapping pages in and out between physical memory and disk. One problem that occurred was concurrency and the way my partner and I had to handle multiple processes trying to access a specific page that may or may not already have been swapped out to disk. With a single process, the idea was simple. If physical memory was full, take a page and write it to disk, allowing a new page to be allocated in physical memory. If a page being accessed was on disk, then we would have to swap out a page to disk and swap in the one being accessed. However, a huge problem arose as multiple processes were trying to swap in and out when memory got full. The test suite had multiple forks and concurrent operations that made our implementation buggy. Concurrency made the whole process a nightmare to debug.

At first, my partner and I tried to debug via gdb. We would get to a point where the program would swap in correctly, and then the operating system would switch to user mode and call fork; our program would then enter a state where it would panic and error out. We began to look at kalloc, the function where a new page would be allocated or swapped out to disk when physical memory was full. However, after looking through our code several times, we could not find the problem and gdb would reach a dead end. We then resorted to printfs to figure out what was happening. We filled our own code base with printfs wherever we would swap out, swap in, or enter a function. This showed us the sequence in which the functions were called, and we looked for patterns or unexpected behavior. We eventually figured out that our program was swapping the same page in and out, which we fixed by evicting the least recently used page instead of the first available one. However, by the time we fixed our bug, our code base had more printlns than actual code. We basically had to grep for all the printlns and delete them one by one. Our test suite output was still flooded with printlns, and sometimes tracking or remembering where a println came from became a nightmare. Both gdb and printlns helped us solve our problem and were useful ways of debugging, but when it came to concurrency, gdb fell short since we didn't know all the features. Also, printlns became a nightmare when trying to clean up our code and push it to git.

After searching online for general concurrency debuggers, I found that gdb has a watch command that monitors an address and notifies us when the value at that address changes. Another useful command I found was backtrace: after a program crashes, gdb can show the chain of function calls that led up to the crash. People also recommended adding asserts to ensure that variables did not overflow and stayed within their expected ranges. There was also a mention of logging to figure out where the program may have gone wrong. I think the tool that comes closest to solving our problem was logging, which was basically what the printfs were. They helped us understand which locks were where, which functions were being called, and what was changing. Logging is basically a timeline of what executed, and that was exactly what we needed to solve our problem.

One tool that I think would be useful after my experience with concurrency is a debugging tool that enables and disables printlns or other lines of code written specifically for debugging. For instance, when the debugging mode is on, we could add printlns to the code and they would appear in red in whatever text editor we use. When the debugging mode is off, the printlns go away and the code is in a state where it can be pushed. This would be useful for my specific experience with concurrency since it would allow me to continue debugging without worrying about ruining the already implemented code. When a printf, assert, or any other debugging line of code is written, it can be marked as a debug line that executes only when debugging mode is on. One problem with this would be making it compatible with various text editors and IDEs. Another problem would be how to mark a specific line of code, since we can't just mark the line number. Such a tool probably has been implemented, but in the form of logging. However, I think having on-the-spot on and off debugging would be a pretty cool feature to implement.
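The runtime half of this idea (debug lines that do nothing unless a debugging mode is on) might look something like the sketch below. It is only an approximation under stated assumptions: `DebugLog` and its methods are hypothetical names, not an existing library, and it is written in Java to match Example 1 even though the original project was C kernel code. The editor integration (red highlighting, hiding lines before a push) is not covered. The sketch uses the standard `StackWalker` API (Java 9 or later) to tag each message with the place it was called from, which addresses the difficulty of remembering where a println came from.

```java
import java.io.PrintStream;
import java.util.function.Supplier;

// Hypothetical helper: debug output that can be switched on and off in one place.
final class DebugLog {
    private static volatile boolean enabled = false;      // debugging mode, off by default
    private static volatile PrintStream out = System.err; // where debug lines are written

    static void setEnabled(boolean on) { enabled = on; }

    static void setOutput(PrintStream stream) { out = stream; }

    // The message is built lazily, so a disabled debug line costs almost nothing.
    static void log(Supplier<String> message) {
        if (!enabled) {
            return;
        }
        // Frame 0 is log() itself; frame 1 is the code that called it.
        StackWalker.StackFrame caller = StackWalker.getInstance()
                .walk(frames -> frames.skip(1).findFirst())
                .orElse(null);
        String where = (caller == null)
                ? "unknown"
                : caller.getClassName() + "." + caller.getMethodName()
                        + ":" + caller.getLineNumber();
        out.println("[debug " + where + "] " + message.get());
    }
}
```

A call such as `DebugLog.log(() -> "swapping out page " + pageNumber)` (where `pageNumber` is an illustrative variable) stays in the code permanently; calling `DebugLog.setEnabled(true)` turns all such lines on at once. This is essentially what logging frameworks already provide, which is consistent with the guess above that the tool exists in the form of logging.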