Tales from Programmatic Oceans

08 August 2012

Obvious things worth saying, Part II

Joe Smith on How to use a paper towel:

20 July 2012

Obvious things worth saying, Part I

Terry Moore on How to tie your shoes:

15 May 2012

Erlang Quine

I'm just getting started with Erlang. Naturally, after writing "Hello, world!" the next program I wrote was a Quine:

-module(q).
-export([main/0]).

main() ->
A="-module(q).\n-export([main/0]).\n\nmain() ->\n",
B="io:format(\"~sA=~p,~nB=~p,~n~s\",[A, A, B, B]).",
io:format("~sA=~p,~nB=~p,~n~s",[A, A, B, B]).

Erlang's "pretty print" formatting code makes this super easy. In particular, no raw ASCII codes necessary here!

Update: OK, after thinking more about it, it can be a little simpler:

-module(r).
-export([main/0]).

main() ->
 Fmt = "-module(r).\n-export([main/0]).\n\nmain() ->\n Fmt = ~p,\n io:format(Fmt, [Fmt]).\n",
 io:format(Fmt, [Fmt]).

09 January 2012

What It's Like Being an Assistant Professor At A Research School

You will cry. At least, that's what I did. In the two and a half years I've been an assistant professor I've cried more than in all of graduate school, which itself was a rather grueling six years filled with setbacks. [But high school and college, overall, were quite pleasant for me.] Rather than give you a list of pros and cons of being in the academy, I'm going to try to talk you out of it as much as I can. If you've come to read something like this to see if you should be a professor or not, let me make it simple for you: If you have that much doubt, don't be one. And if you don't, then one blog post isn't going to stand in your way anyway.

Being a professor puts you in an awkward position of power. After years of irrelevance as a graduate student, you have a voice that matters. You get to decide your own syllabus, you get to decide if you use a curve and, yes, you get to decide if that student working at less than their potential deserves a C or an F. I was uncomfortable with that sort of power. The first F I gave tore me up. I had to consult with two other faculty members who told me, "yes, definitely this person should get an F." We like to plot students on histograms. Whoever ends up hanging out on the left margin too much? Fails. Those histograms form blobs; when there are gaps between blobs, we separate them out into "A", "B", and "C"... It's a random, arbitrary process sometimes. But so is life in the many other ways we evaluate people.

But giving an F is nothing like the worst thing I've done. Having to fire a student isn't something I would wish upon anyone looking to motivate growing minds. Another part of the power of being a professor is having money, and money can make all of the difference to a spouse in a refugee family. A research assistantship with stipend and tuition is one of the things you are expected to give out. If you're not a good manager, or you just hired the wrong person, you'll have to terminate the assistantship that doesn't work. You can tell yourself that you are giving them a valuable lesson, or that it's really out of your hands. But at the end of the day, you were the one who flipped the switch.

You can make a difference in your department. Even if you come from a superstar school with a superstar advisor, the odds are that your academic position is going to be at a "growing" or an "up-and-coming" department. That means your department will have plenty of flaws, many of which are fixable if you just put in the effort. It sounds great that you can grow and leave your mark on the department, but when you come up for tenure, all of your "service" will amount to a single sentence of additional contributions. You also will be unable to avoid political aspects in your service, as reasonable colleagues will disagree with your approach, meaning that your "making a difference" will piss some people off. Don't keep your head down completely, but never try to get emotionally involved in the outcome of anything, no matter how clear it is to you that it's "better."

Academic freedom is for the few. It depends upon your department and the courses you're teaching [introductory, core curricula, electives...], but if you're not at a department that hands you a deck of slides and says "this is what you'll be teaching" you'll have a fair degree of freedom in the approach you take with your course. Making the syllabus match your ideal course is a great feeling, until you find out that, well, your ideal plan has some holes in it. The more experience you have with it, the more you end up conforming to the "tried and true" formulas. Depending on your personality, this might affect your enthusiasm for teaching. I'm a software engineer, so I believe working on a software "project" is essential to really learn how to program. But I know another software engineer at a different department who was told "you can't do a project in that course, because they already do a project in this other course..."

But what about research? Sure, you have academic freedom there too. The academy studies what is important, and academic freedom means the academy itself gets to decide what is worthy of study. You can have a new take on a sub-sub-discipline. People might say "wow, that's an interesting idea!" But unless you convince enough people that your idea is worthy of study, it's not going to get published, and it's not going to get funded.

So, the next grant you write? It's not what you really want to work on, but rather a mix of what is currently hot (i.e., funded, like green projects, or security) and what is safe (i.e., incremental improvements to the state-of-the-art).

You will be told that you need to be an independent researcher. But that does not mean now is the time for you to save the world. All that it means to be an independent researcher is that you publish and get funded only with other junior researchers (and your students). Stay away from working with people more senior than you, and in particular stop working with your advisor. If there are things you still want to publish with someone more senior, be sure it accounts for only 20% of your CV or less. Even if it was all your idea, the more senior person will get most of the credit.

When you start, the only thing you'll be qualified to do research on is incremental improvements to your dissertation. This is a nice route to take, because you can publish with only a few months of work [instead of spending six months learning a new area] and the more you continue your dissertation work, the clearer it will be to others that you've taken ownership of it. I got really interested in end-user programming when I started my appointment. I really wish I could get those six months back in exchange for a few minimal deltas of publishing units.

[At this point, the dear reader may think that I'm actually giving you advice, or that perhaps I'm doing a Swiftian spoof. No, but I'll concede that sometimes the truth sure sounds like a joke.]

Do you get the academic freedom after tenure? Maybe. You'll still need to fund students, and that means you'll still need to get past the guards at the funding agencies. I do basic research in software engineering, and the funding opportunities for me are quite limited. So much concentration into a small pool can lead to groupthink.

Your peers include the anti-social, and actual sociopaths. Forget about discouraging replies from reviewers who "don't get" your submissions. Imagine creating policy and voting on issues with these people. There are sociopaths in the academy. Some of them act quite charming. They'll kindly agree to write letters of recommendation to eager students, only to throw them under the bus with a damning "recommendation." Instead of insisting on "no," they gladly take it upon themselves to let the world know how much so-and-so really sucks. They will repeat this pattern when you are coming up for tenure. When your case is discussed, there will be a pro side and a con side. Be sure you do enough good work for the "pro" side to have strong material to support your case. Don't bother trying to win over the "con" side. They will smile sweetly to you while burning your case as much as they can, even if their argument relies upon making a damning case against another faculty member coming up for tenure at the same time. [That's known as collateral damage in war.]

Students will manipulate you and disappoint you. Finally, let's talk a little about students. Yes, you are very clever, with that sob story that you constructed six weeks in advance. You really pulled a fast one over on us, didn't you? Well, no, not really. We are almost 97% sure that you're bullshitting us, but that nagging 3% and just the hassle of it all means that you'll get away with it anyway.

As for the less clever, thanks for making it easier on us giving you a bad grade. If you do poorly the whole semester and only find religion at the end of it, no amount of earnestness is going to make doing "an extra credit project" [i.e., more work for us to grade] attractive.

Now, what about the students who don't fabricate stories of illness and dead relatives? Well, some students you will really like, and really root for. They'll impress you so much early on, that you'll start to talk to them about considering graduate school, or, if they are already in graduate school, working on a research project with you next semester. But working with students can kind of be like starting a relationship: Sometimes it's your fantasy of the perfect student that is blinding you to the reality of the actual student. Once that reality comes crashing in, you'll find you've invested a lot of time in someone who won't give back even a quarter of what you've put in.

So, given all of these, does the academy need changing? No. You just need to accept that sometimes your dream job is still a job, and hope that the great moments outweigh the bad ones. Until you understand what a really good day feels like, you won't be able to put all of the rest of this baggage in perspective.

14 September 2011

Interview on Keyvan.TV

I was recently interviewed on Keyvan.TV, where I talked about some of my feelings about software engineering:

18 May 2011

Why I Will Randomly Assign Students in Group Projects

There's a disturbing pattern when it comes to group projects in my software engineering classes. When it comes to the average group-- I'm not talking about the exceptional groups, I'm talking about the ones right in the middle-- I've noticed that groups generally only have five types of members.

This is surprising because, at first thought, when you let a random group of intelligent and creative people self organize, the result should be as interesting and varied as the people in the group. But instead of bringing out everyone's best qualities, it amplifies only a few, specific qualities:

1. The Visionary. Giving students the freedom to pick their own projects is a huge burden! What would you expect when you come up to a student and say: "Quick, come up with a great idea right now, because your grade depends on it!" Out of five students, just one would love the burden. That student is The Visionary.

The Visionary never has problems coming up with great ideas. They tend to think big thoughts often, and kick around various ideas for years. When they see a course project as an opportunity to pursue this idea, they jump at the chance. To get there, they'll enlist the help of...

2. The Code Monkey. The Visionary is already good friends with a Code Monkey, and respects how many languages the Code Monkey knows, and how many different graphics and networking libraries they've used. The Visionary doesn't want to do all of this work on their own, so they pick a competent peer they can trust. The Code Monkey always gets an A on programming assignments and the two quickly work out a deal: I'll do the write-ups and the presentation, and you do some coding spikes to be sure if this idea is feasible.

Wait, but that's just two people, the group still needs more. The next person to join is...

3. The Leech. The Leech is actually a great person. They respect people and the course, and they want to get something out of it. Specifically? They want to get an A. The Leech seeks out groups as they are forming and finds the group that they think the professor is most interested in. The Leech doesn't exactly want to gain at another's loss; they want to coast on another's gain.

The Leech typically knows The Visionary or The Code Monkey and is the "second pick" to join the group. The Visionary didn't ask The Leech first, because The Code Monkey was in higher demand. But the group needs to grow, so The Leech is accepted. The Leech's acceptance solidifies the group's mission, and already their roles are set in stone. Based on this solidifying service alone, The Visionary might be the one to approve of The Leech.

In a nice group, The Leech isn't even much of a Leech, and is more just an Understudy Code Monkey.

4. The Slacker always comes late. Groups by this point have already started to form into twos and threes, and time is running out before those left become "that group." You know, the group of people who are randomly assigned, because they just don't know enough people and so the only ones left are assigned to a group by default? Who wants to risk their grade with that!

The Slacker might've been a Visionary-in-Waiting, unable to convince anyone to follow their lead. Being too much of a leader to be a follower, The Slacker only reluctantly follows. The Slacker joins the group based not on what their different ideas are, but based on the path of least resistance. The Slacker is different from The Leech, because they aren't as engaged. Even though The Leech wants to coast on the work of others, The Leech knows how key it is for the group to be strong. The Leech is engaged by giving The Visionary all of the social support he or she needs. The Slacker can pull the group the other way. The Slacker might suggest ideas and changes only because it would be easier for their particular circumstances, not because it would lead to the best project.

Thus, groups sometimes end up with a project with a key component dedicated to some technology The Slacker is comfortable with. However, given that they are The Slacker, that key component won't be ready until "next week." As the deadline approaches, that key component is only half done-- if that-- and everyone needs to save face explaining why they just didn't get there. The Slacker is a drag on the group not because they don't do things, but because they've actively pulled focus away from where the group could have gone.

5. The Watertreader rounds out the group. The Watertreader could join the group at any stage, before or after either The Leech, The Slacker, or even The Code Monkey. A Watertreader might have even been brought on by The Visionary as The Code Monkey. Yet, when it comes to either the coding or the write-ups, The Watertreader is simply in over their head. This could be due to personal issues or inexperience. The Watertreader works hard, but no matter what only seems to be getting by.

Yet, the Watertreader might be a key part of the group, too. They don't make promises like The Slacker, so they don't drag down morale. If they were part of the project early on, they might have recruited the best people of all categories. There's an opportunity for The Watertreader to help gel the team, and even use their organizational skills to keep the group on track: "Come on guys, we need to set up a meeting for next time right now!"

And that's how it unfolds. Most of the time. Obviously, if the project is only 2-3 people, or 5-6, some people will be playing simultaneous roles, or change roles over time. The labeling isn't as important as the group dynamic that emerges.

As a result, I will no longer let students self assign groups. Even though in some cases it works out perfectly, in the average case it does not.

Random selection for group projects is worth the risk to me, even though it means I'm picking a policy that is less popular.

And remember being in "that group"? The one that ends up being randomly assigned by default? They always end up being the more interesting groups. Why? Because it brings out the visionary in everyone, so everyone is engaged, and no one has a choice but to be their best.

This post was inspired by @mattmight's post on "Classroom Fortress: The Nine Kinds of Students".

24 February 2011

If fonts were programming languages

If fonts were programming languages, this is what they'd be...

Helvetica - The C Programming Language. It's old, completely overused, and yet also the best solution to many problems.

Times - C++. This is also grossly overused, and is the second best choice for any large tasks.

Courier - Fortran. This font is old and reliable and we're going to be stuck with it for a long time.

Chicago - Lisp. It's quirky, old, and used in many surprising places.

Computer Modern - Fortress. A mathematical and beautiful font, but pedantic.

Garamond - Java. This font sure seems a lot like Times. Not used quite as much, and has some new flaws and quirks of its own.

Palatino - C#. This font is like Garamond, but some things like that uppercase-P just aren't connected. This makes it attractive for some users, and appalling to others.

New Century Schoolbook - Smalltalk. Initially, this font looks like it's for kids, but it's both serious and playful.

Comic Sans - Python. You wouldn't think that this font was serious, but it's used in a surprisingly large number of contexts. And it's a safer choice than it would appear at first.

Zapf Dingbats - Perl. This font is useful for patching things together. If you need that special symbol to make your sub-sub-bulleted list, this is your ad hoc solution.

LED Marquee - JavaScript. This font is the unsung hero. Many times it's used improperly, making things overly flashy and distracting, but it's also sometimes the only venue for transferring very important information.

08 January 2011

How do you know if your software tests are any good?

Project management by check boxes gives you a nice, but false, sense of security that everything is going smoothly. Although three decades have passed since Glenford Myers wrote the classic The Art of Software Testing, many practitioners’ approach to testing is to simply bang out some buzzwords and be done with it.

You can say that you've passed 100% of your unit tests, but that isn't meaningful if most of the tests are trivial or repetitive with each other. You might’ve achieved 95% code coverage, but that won’t matter if important edge cases haven’t been covered. So, how do you know if your tests are any good? If the purpose of testing is to find bugs, then your tests aren’t good unless they’ve found bugs. If a test does not find a bug, it fails as a test.

While that’s simple to state, it can still be daunting if you’re not familiar with testing. There are three main techniques you can use to improve your test design: (1) whitebox techniques; (2) blackbox techniques; and (3) mutation testing.

Whitebox techniques are used with specific source code in mind. One important aspect of whitebox testing is code coverage. For example:

Is every function called? [Function coverage]
Is every statement executed? [Statement coverage-- Both function coverage and statement coverage are very basic, but better than nothing]
For every decision (e.g., if, while, ...), do you have a test that forces it to be true, and another that forces it to be false? [Decision coverage]
For every condition that is a conjunction (uses &&) or disjunction (uses ||), does each subexpression have a test where it is true/false? [Condition coverage]
Loop coverage: Do you have a test that forces 0 iterations, 1 iteration, 2 iterations?
Is each break from a loop covered?
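
Here is a minimal sketch of what those coverage questions look like as actual tests. It is written in Python only for brevity; discount_label and first_negative are hypothetical functions made up for this illustration, not code from any real project.

```python
def discount_label(cart_total, has_coupon):
    """Hypothetical function under test: one decision, two conditions."""
    if cart_total > 0 and has_coupon:
        return "discounted"
    return "full price"


def first_negative(values):
    """Hypothetical function under test: a loop with an early exit."""
    for value in values:
        if value < 0:
            return value  # early exit from the loop
    return None


# Decision coverage: the `if` as a whole is forced true once and false once.
assert discount_label(10, True) == "discounted"
assert discount_label(0, True) == "full price"

# Condition coverage: each subexpression is seen both true and false.
# `cart_total > 0` was already true and false above; `has_coupon` was only
# ever true, so this test adds the case where it is evaluated as false.
assert discount_label(10, False) == "full price"

# Loop coverage: zero, one, and two iterations.
assert first_negative([]) is None
assert first_negative([5]) is None
assert first_negative([5, 7]) is None

# Early-exit coverage: the loop is left before the input is exhausted.
assert first_negative([5, -3, 7]) == -3
```

Note that the two decision-coverage tests alone would never evaluate has_coupon as false, which is why condition coverage asks for more than decision coverage does.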

Blackbox techniques are used with specific requirements in mind. Blackbox testing follows the principle that a test should not test a single program, but the full class of possible programs. The following blackbox techniques can lead to high-quality tests:

Do your blackbox tests cover multiple testing goals? You want your tests to be “fat”: Not only do they test feature X, but they also test Y and Z. The interaction of different features is a great way to find bugs.
But you don't want fat tests when you are testing an error condition, such as invalid user input. If you tried to achieve multiple invalid input testing goals (for example, a test to cover an invalid zip code and an invalid street address) it’s likely that one would just mask the other.
Consider the input types and form an equivalence class for the types of inputs. For example, if your code tests to see if a triangle is equilateral, the test that uses a triangle with sides (1, 1, 1) will probably find the same kinds of errors that the test data (2, 2, 2) and (3, 3, 3) will find. It’s better to spend your time thinking of other classes of input. For example, if your program handles taxes, you'll want a test for each tax bracket. [This is called equivalence partitioning.]
Special cases are often associated with defects. Your test data should also have boundary values, such as those on, above, or below the edges of an equivalence class. For example, in testing a sorting algorithm, you’ll want to test with an empty array, a single element array, an array with two elements, and then a very large array. You should consider boundary cases not just for input, but for output as well. [This is called boundary-value analysis.]
Another technique is error guessing. Do you have a feeling that if you try some special combination you can get your program to break? Then just try it! Remember: Your goal is to find bugs, not to “confirm” that the program is valid. Some people have the knack for error guessing.
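
As a rough sketch of equivalence partitioning and boundary-value analysis, consider the tax example. The tax_rate function and its bracket limits below are invented purely for illustration (they are not real tax rules), and Python is used only because it keeps the example short.

```python
def tax_rate(income):
    """Hypothetical brackets, invented for this example."""
    if income < 0:
        raise ValueError("income must be non-negative")
    if income <= 10_000:
        return 0.0
    if income <= 50_000:
        return 0.10
    return 0.20


# Equivalence partitioning: one representative value per bracket.
assert tax_rate(5_000) == 0.0
assert tax_rate(30_000) == 0.10
assert tax_rate(80_000) == 0.20

# Boundary-value analysis: values on and just past each edge.
assert tax_rate(0) == 0.0
assert tax_rate(10_000) == 0.0
assert tax_rate(10_001) == 0.10
assert tax_rate(50_000) == 0.10
assert tax_rate(50_001) == 0.20

# The error condition gets its own "thin" test, so that no other
# invalid input can mask it.
try:
    tax_rate(-1)
except ValueError:
    pass
else:
    raise AssertionError("negative income should be rejected")
```

Testing 6,000 and 7,000 as well would add little, since they fall in the same class as 5,000; the boundary tests are where an off-by-one in a comparison would show up.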

Finally, suppose you already have lots of nice tests for whitebox coverage, and applied blackbox techniques. What else can you do? It’s time to test your tests. One technique you can use is mutation testing. Under mutation testing, you make a modification to (a copy of) your program, in the hopes of creating a bug. A mutation might be:

Change a reference of one variable to another variable; Insert the abs() function; Change less-than to greater-than; Delete a statement; Replace a variable with a constant; Delete an overriding method; Delete a reference to a super method; Change argument order.

Create several dozen mutants, in various places in your program [the program will still need to compile in order to test]. If your tests do not find these bugs, then you know you need to write a test that can find the bug in the mutated version of your program. Once a test finds the bug, you have killed the mutant and can try another.
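
To make that concrete, here is a small hand-made sketch of one mutant, using the "change less-than to greater-than" style of mutation. The functions largest, largest_mutant, weak_suite, and strong_suite are hypothetical names made up for this example; real mutation testing tools generate the mutants automatically.

```python
def largest(values):
    """Original: return the largest element of a non-empty list."""
    best = values[0]
    for value in values[1:]:
        if value > best:
            best = value
    return best


def largest_mutant(values):
    """Mutant: identical, except the comparison has been flipped."""
    best = values[0]
    for value in values[1:]:
        if value < best:  # mutation: ">" changed to "<"
            best = value
    return best


def weak_suite(fn):
    """Returns True when every test passes. These tests are too trivial."""
    return fn([4]) == 4 and fn([3, 3, 3]) == 3


def strong_suite(fn):
    """Adds a test whose elements actually differ."""
    return weak_suite(fn) and fn([1, 9, 5]) == 9


# Both suites pass on the original program.
assert weak_suite(largest)
assert strong_suite(largest)

# The weak suite also passes on the mutant: the mutant survives,
# which tells us the suite is missing a test.
assert weak_suite(largest_mutant)

# The strong suite fails on the mutant: the mutant is killed.
assert not strong_suite(largest_mutant)
```

The surviving mutant is the useful signal here: it points at exactly the kind of input (a list with distinct values) the weak suite never tried.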

Testing is complete when you have stopped finding bugs. Or, more practically, when the rate at which you find new bugs slows down and you see diminishing returns.

Bugs tend to “cluster” in certain modules and features: The moment you find a bug in one, you know that you should look in it further for more bugs. (For example, why does Apple keep on having troubles with the iPhone alarm? It’s a perfect candidate for increased testing efforts.) To find bugs, you can use the techniques of blackbox testing, whitebox testing, and mutation testing. As long as you are finding bugs, you know that your testing process is working!

This post is a revision to two of my answers on the Programmers StackExchange.

31 December 2010

A New Year's Resolution for New PhD Students

Resolve to move from being a grad student who follows directions into one that leads with direction.

07 November 2010

What is Good Code?

I was lurking around the "Programmers" Question & Answer site on StackExchange when I stumbled upon this question: "What does it mean to write 'good code'?" I gave the following meditation:

A good coder is like a good pool player.

When you see a professional pool player, you at first might not be impressed: "Sure, they got all of the balls in, but they had only easy shots!" This is because, when a pool player is making her shot, she doesn't just think about which ball will go into which pocket; she's also thinking about where the cue ball will end up. Setting up for the next shot takes tremendous skill and practice, but it also means that it looks easy.

Now, bringing this metaphor to code, a good coder writes code that looks like it was easy and straightforward to do. Many of the examples by Brian Kernighan in his books follow this pattern. Part of the "trick" is coming up with a proper conceptualization of the problem and its solution. When we don't understand a problem well enough, we're more likely to over-complicate our solutions, and we will fail to see unifying ideas.

With a proper conceptualization of the problem, you get everything else: readability, maintainability, efficiency, and correctness. Because the solution seems so straightforward, there will likely be fewer comments, because extra explanation is unnecessary. A good coder can also see the long term vision of the product, and form their conceptualizations accordingly.

Proper conceptualization is important; it helps you avoid false abstractions.