Tales from Programmatic Oceans: 2010

31 December 2010

A New Year's Resolution for New PhD Students

Resolve to move from being a grad student who follows directions to one who leads with direction.

07 November 2010

What is Good Code?

I was lurking around the "Programmers" Question & Answer site on StackExchange when I stumbled upon this question: "What does it mean to write 'good code'?" I gave the following meditation:

A good coder is like a good pool player.

When you see a professional pool player, you at first might not be impressed: "Sure, they got all of the balls in, but they had only easy shots!" This is because, when a pool player is making her shot, she doesn't just think about which ball will go into which pocket; she's also thinking about where the cue ball will end up. Setting up for the next shot takes tremendous skill and practice, but it also means that every shot looks easy.

Now, bringing this metaphor to code, a good coder writes code that looks like it was easy and straightforward to do. Many of the examples by Brian Kernighan in his books follow this pattern. Part of the "trick" is coming up with a proper conceptualization of the problem and its solution. When we don't understand a problem well enough, we're more likely to over-complicate our solutions, and we will fail to see unifying ideas.

With a proper conceptualization of the problem, you get everything else: readability, maintainability, efficiency, and correctness. Because the solution seems so straightforward, there will likely be fewer comments, because extra explanation is unnecessary. A good coder can also see the long-term vision of the product, and form their conceptualizations accordingly.

Proper conceptualization is important; it helps you avoid false abstractions.

03 October 2010

The PhD Student Startup Kit: Materials and Books

Here are some office materials to get so that your cubicle is the most organized:

Stackable Letter Trays
Brother PT-1290 Label Maker, with AC adaptor and tape refill
Swingline Stapler, with staples; (Red, of course.)
Box of File Folders, assorted colors
Classic Ballpoint Pen, with refills
Foam Keyboard Wrist Rest
16 GB USB 2.0 Flash Drive
Wire Step File
Scissors

Lots of little things add up. If you can believe it, this comes to around $300! (It is a nice pen!) But they also add up in value.

And David Allen has written a really good instruction manual for using all of these materials: Getting Things Done, which is one of the books I get for all of my students:

Getting Things Done: The Art of Stress-Free Productivity
How to Talk to Anyone
Outliers: The Story of Success
The Academic Job Search Handbook
The Sense of Structure: Writing from the Reader's Perspective

Getting Things Done
In addition to being a manual on managing a paper-based system, this book also includes strategies for electronic organization. The core of Getting Things Done is to get things out of your head and inbox and to put them into a reliable system. For example, instead of keeping important things to do in your email inbox, you make a list of actions. So, don't write "Printer," write what you are actually supposed to do: "Look up printer representative's number," "Call printer representative for quote," "Place an order for more toner."

The beauty of the next-action system is that you don't have to think about what "Printer" means each time you glance at your todo list.

The paper organization is particularly relevant for PhD students: You have so many notes for projects, related works, and exam study materials, keeping large, scattered piles just isn't effective.

How to Talk to Anyone
This book is a rather easy read, and I found about 50% of the material to be quite helpful. I tell my students this is a great manual for schmoozing with people at conferences. Like it or not, our field is not a pure meritocracy, and the more people you know, the more likely it is you will have good opportunities. I hope it goes without saying that schmoozing is a necessary, but not sufficient condition: You need strong research no matter what.

Outliers: The Story of Success
Speaking of success, this is another easy-to-read book, which challenges our conceptions of what it means to be successful. Whether you will have strong research is not a matter of talent; it's a matter of mindset. Giving yourself a growth mindset is necessary for you to get the most out of graduate school and beyond.

The Academic Job Search Handbook
For my students interested in academia, I give them this book early on to help them with their decisions. I have two philosophies for being "ready": (1) The only way to prepare for X is to be an X; and (2) The best way to be good at step X is to already start thinking about step X+1. More concretely: The only way to be ready to be a PhD student is to be a PhD student. The best way to be a good PhD student is to be thinking about yourself as an assistant professor.

[I suppose for a non-academic bound student, I would need to find a good book on the industry. Perhaps The Art of the Start.]

The Sense of Structure: Writing from the Reader's Perspective
Finally, this is the best book on writing I've seen. I've read many of the great writing books, but only Gopen's approach leads you to think logically about what you are writing. Word choice is actually not as important as the structure of your writing, which is a reflection of the structure of your thoughts.

27 August 2010

How to Live On the Grid

Suppose you are dissatisfied with your life; let’s say the feeling of consumerism is getting you down. Is completely “disconnecting from the grid” the most rational response? Swinging from one extreme to another is a bad way to live your life. To simplify your life, you don’t have to oversimplify the issues at play.

Instead of following the latest fads, or running from the latest fears, here are 10 tips to help you live on the grid.

1. Understand credit, don't fear it - Credit can smooth the bumps on the road of life, but it’s a tool that must be handled carefully. Don’t let your card balances get over 50% of your credit limit. To build credit, open credit card accounts and then keep them. When you are getting started, don’t worry about paying it off every month, but always pay more than the minimum. Never be late with payments. Make a plan to pay off balances in the short term.

A car loan will pave the way to you securing a home loan. The credit issue is hard, because there is a lot of hogwash out there, some of which may even sound convincing. No, our credit system is not a “house of cards” ready to collapse. Such conspiracy theories are the result of fear, enabled by ignorance.

2. Save and invest - Index funds are the way to do it. Make your index fund portfolio diverse: Get index funds in real estate (REIT funds), bonds, and mostly equities (i.e., stocks-- small cap, mid cap, large cap; domestic, international, developing markets). The best company for it is Vanguard.

Try a Vanguard IRA (if your work doesn’t have 401k or 403b) and max out your contributions each year. To hedge your bets, have both Roth and Traditional accounts. The easiest fund to get is Vanguard’s Target Retirement Funds, which diversifies for you.

3. Think Win-Win - Generally, no all-powerful group is conspiring to get you. Participating in economic activity is usually a win-win scenario. Just because some banks are getting money doesn’t mean that you are losing. That said, there are some pretty bad products (financial products, services, or goods) out there that should be avoided.

4. Buy only the soft-bristled toothbrushes - A hard bristled brush will wear away your gum line. There is absolutely no medical reason for 66% of the brushes for sale (medium and hard) to exist. This is actually quite a general lesson: As a consumer, you need to know that some products just shouldn’t be purchased.

5. Think global - Think win-win on a global scale. Jobs are not zero sum (e.g. “moving overseas”) and you should expand your circle of interest to the whole globe. The earth is your community! Welcome home!

6. Eat global - There is some misinformation about the environmental harm of shipping food long distances: growing food locally can put more stress on the environment when that food can be grown more easily elsewhere. In terms of carbon, the most is emitted when you drive to the store, not when the food is shipped rather efficiently by boat and train. There are so many great fruits, vegetables and beans (coffee and chocolate, anyone?) that to reject them is to reject some of the greatest pleasures of life.

7. Enjoy food without arbitrary rules - Such misinformation has been spread by the “locavore” and “slow food” movements, which seem to be nothing more than a PR-friendly version of anti-globalization. Michael Pollan introduced this absurd rule to not eat “anything with more than five ingredients.” Darn, there goes my Rudi’s organic seven grain bread, with 4 grams of protein per slice, 8% RDA of iron, and zero cholesterol. Why? Because seven is bigger than five! Instead of arbitrary rules, choose logic.

8. Drive the speed limit - Not only will you increase your gas mileage (noticeably!), you’ll also avoid those tickets. Whenever I drive by a speed trap I don’t panic, I smile.

9. Start a happiness project.

10. Take charge of your health and find second opinions - Toothbrush issues aside, there is so much we don’t know right now about health. For prevention, it helps to practice common sense.

But what about when you get sick? Seek medical advice, and if it’s something acute, go to the emergency room or call 911 without delay. But what about non-acute problems? It’s not up to any doctor to solve it for you. A doctor cannot make you healthy, only you can. If your chronic rhinitis can’t be resolved by a general practitioner, go see a specialist (like an allergist, or an ear-nose-throat doctor). Unfortunately, such doctors might not have seen a case just like yours before. They might have seen the symptom, but from a different cause. They might recommend prescriptions, but drugs, even though massively manufactured, are still mysterious. MDs are just stabbing in the dark sometimes, but they can sure sound authoritative while doing so. Some doctors are quacks, some doctors are lazy, and when that is the case, you need to find other doctors instead.

So, what about that case the MDs can’t solve? Try non-MDs. Acupuncture and chiropractic adjustments have been shown to be beneficial in a variety of non-acute medical situations. When you see an acupuncturist or a chiropractor, you might actually get to talk to someone who is concerned about your health. You can go deep into your medical history, and you might see some patterns emerging. They might have even seen your case before, cause and symptom, and know how to effectively treat the cause. But, again, they aren’t treating it, you are. Read up about all of your conditions, and try any of the low-risk solutions first. For some, the road to health involves surgery and powerful prescriptions. For others, it just means finding a bottle to squirt saline water.

08 August 2010

Ten Things Every Computer Science Major Should Learn

Meeting the graduation requirements is not necessarily sufficient for being the best computer scientist you can be. For a typical college curriculum, here are the top ten things you should be sure to learn:

1. The basics of economics - An introductory course covering topics like complements and substitutes is vital for working in the greater economy, or just simply understanding it. While the concept of a Giffen Good won’t necessarily help you, knowing about externalities will. It might also help you appreciate that more situations are win-win than you might have realized.

2. How to write a proof - All computer science majors should know how to write a proof. And discrete math, while a part of a well balanced breakfast, doesn’t count. [Induction is just one proof technique, and you can get by without actually knowing much about proofs.] A course in algebra or real analysis is necessary to really write proofs. And by algebra I mean group theory or abstract algebra, not the course you took in high school. For the full benefit, take algebra and real analysis in the same term.

Why is proof writing essential? Because it’s programming! Think about when you first learned how to program: if a task required an “if” and a loop, you might not have had any intuition on where to put them in relation to each other. But now the same task would feel completely natural. Writing a proof is very similar. There is a set of tricks that you learn, and once you learn them things look quite different.

3. How to write - Written communication skills are essential, whether you’ll work in industry or academia. It’s best if you can find a mechanics course, and not a writing course that is effectively about a different topic. That is, many schools will try to make their writing courses more relevant or interesting by building them around a special topic. Try to go for the “boring” version of the course.

4. Probability and statistics - There are some things that you’ll only pick up properly by taking a course. Together with the CS major requirements (which should give you discrete math, single variable and multiple variable calculus, and linear algebra) and algebra and/or real analysis, picking up statistics will probably give you a minor in math. Learning statistics can help you work with other scientists on their projects.

5. The current hot topic - In previous decades, it might have been databases, or object-oriented programming. Today it might be web programming or service-oriented architecture. Whatever the current fad is, be sure to take a course in it. If only to see what the fad is about.

6. The halting problem - Most problems cannot be solved by machines. This is a fairly deep idea that our culture has absorbed so well that it no longer sounds shocking. The same goes for radio, Gödel, and the atomic bomb; it wasn’t until postmodern art and the cold war that we could once again cope with these concepts. However, taking a course in computability theory can re-sensitize you to this pretty amazing proof.

7. Pure functional programming - You most likely won’t go into pure functional programming, unless you do research in it or work for a select few companies, but knowing it will help you be a better programmer. The reason is that you will learn many new forms of abstraction, and concepts like Church numerals and continuations and monads and, yes, recursion, and these tools can be applied to your next Java program too.

8. P and NP - OK, this one is already on your critical path, but pay attention anyway. You want to be sure you can correct someone when they incorrectly call NP “non-polynomial.” As if!

9. The topics from the course you’re sure to hate - This could be a CS course you find too-low-level, too-theoretical, or a non-CS course you find too-objectionable, too-hard, or too-boring. If a course like this seems to be an issue for you, and you find yourself explaining to others why you’re so glad you don’t have to take so-and-so, it should tell you that you’ll learn a lot by taking the course! Perhaps you won’t learn the materials of the course, but you’ll learn about your own limits and perhaps more about the justifications you make to yourself. [Hint: They are usually weak.]

10. The non-CS course you’re sure to love - In the end, you should have some fun. This is the course you’ll probably get the least out of, but take it anyway. Do it once. If you happen to love many courses, then good for you, but be sure it doesn’t get in the way of covering the rest of the courses on this list.

***
My approach here has been practical, based on courses you can actually take. I’ll save a rant on what courses should be available for a different day. I omitted some obvious choices, like a course on logic, even though logic is essential for a computer scientist. Why not recommend it, then? Because taking a course in logic won’t make you more logical! We can’t conflate the two concepts. And I believe this conflation is the reason why many lists about “what colleges should be teaching” are so often off the mark. Instead, I’ve focused on learning objectives that are likely to be learned.

04 August 2010

Why Wave Failed

There are many narratives describing why Google Wave failed: "It was too confusing," "It wasn't different enough," "It was a solution looking for a problem."

But what really killed it isn't being talked about much: Network effects.

A network effect, or network externality, is an economic term referring to when the value of a product or service increases the more users it has. Imagine being the first person to own a fax machine: if there isn't anyone to send faxes to, it's not very valuable. It's the reason why websites like eBay and Facebook seem to dominate instead of existing among many competitors. (In Japan, Yahoo! Auctions was an early mover, and is the dominant player; and Google's Orkut service is the preferred social network in Brazil.)

Google's reason for killing Wave is that "Wave has not seen the user adoption we would have liked." And all of that can be traced to how Wave was introduced. After a tremendous presentation at Google I/O, there was real buzz and anticipation for Wave. It was a prize to be able to get a Google Wave invite. Eventually, I got a Wave invite too. I signed up as fast as I could and then... I did nothing.

I signed into Wave and sent a Wave to the person who invited me. Only, they weren't on Wave much, so it took them a while to get back to me. I wasn't on Wave much after that, because there was no one to talk to. I think you get the idea. In some cases, causing an artificial shortage of supply can be beneficial. But that is the exact opposite of what you want if your product is subject to network externalities.

Ironically, Google seemed to learn this lesson too late, and then had a badly misjudged launch for Buzz. Buzz is Google's... well, I don't think Google is really sure what Buzz is supposed to be. It's kind of like Twitter, but more like Tumblr. Tumblr is a light-weight blog, without the Twitter character limits, and primarily works as a link dump. Tumblr doesn't allow comments, while Buzz does. Anyway... Buzz's launch failed because suddenly everyone with a Gmail account had a Buzz account, which just so happened to reveal information about who you email the most. Whoops.

But Buzz isn't catching on and partly that's because Buzz has to worry about the network effect in a different way: Users have finite energies to dedicate to social networking. Facebook fills one aspect, and Twitter picks up the slack by being different in some key regards. Getting a Facebook account is valuable, because so many people are on Facebook. The same is true with Twitter. But these are active users, which provide you with a reason to go on and stay on. Knowing many people with a Buzz account isn't the same if they aren't active on Buzz. And many people disabled their new Buzz accounts anyway, due to its tendency to over-inform you about comments made by friends-of-friends to a post you didn't even comment on.

Google should have been much more open about letting people use Wave. It should have allowed anyone with any-email-account-at-all to automatically have an account. Here's how it should have worked from Day One: If you wanted to send someone a Wave, you would use their regular email address within Wave. That would have then sent them an email with the text of the Wave, plus a link to view the Wave itself. At that point, you could opt-in to Wave and choose to have further updates sent to your email address or to set up reminders to check it only when major updates to the Wave have been made. Seeing the advantages of Wave, users would stick around and start sending out their own Waves.

Instead, not even Gmail users got a Wave account. Hardly anybody got a Wave account. And those who did found it to be just-another-website-to-check. Had Google even done the cursory Gmail-integration that Buzz has, and made Wave part of Gmail, it might have seen more success.

***

Although, actually, it wouldn't. There were too many other problems:

1) Users didn't want character-by-character typing; it was a flaw, not a feature. IM programs (remember when they were standalone?) could have done this a decade ago, and there's a reason they haven't supported it. Some users don't even like the "is currently typing" messages in some IM systems.

2) As mentioned above, there was poor integration with Gmail. Having to check yet another website just isn't productive. [Message to Google: Please integrate Voice with Gmail, for the same reason.]

3) There were major bugs. I tried Wave for a project with three people. I wanted to use it as a wiki and discussion system. It sounded like the perfect application for it. But it couldn't even scale. It had numerous server and client side bugs and poor conversation threading support. It was also far less wiki-like than that Google I/O demo. Really, it was like a different product entirely.

4) None of the extensions seemed to work. Making a new poll, for example, wasn't intuitive or possible. And then "Add gadget by URL"? Really? That's how you make productive users? Instead of showing them a page of possible extensions, and populating the quick access list with a dozen actually useful gadgets, you wanted users to enter a URL?

5) Over half of the screen was by default non-Wave content. I could imagine the default working only for those few folks with very large monitors and the habit of having fully expanded browser windows. Each time I tried Wave, I would have to click to minimize my "inbox," just so I could see content. I bet there is something about that in the usability literature, because any single extra click a user has to make ("it's just one click!") seriously affects their overall experience.

I'm glad that Google is an engineering company, in the sense that they gave a bold idea a fair chance-- there wasn't a real business case, it just seemed like an interesting artifact to make. I just wish it were an engineering company that knew more about economics 101.

Update: Steven Levy's book In The Plex covers some really interesting stories about Hal Varian, Google's chief economist. They do have tons of good econ skill when it comes to auctions!

19 July 2010

Long lists in LaTeX

The ACM transactions style file contains a great environment called "longitem" that can turn your too-long bulleted list into something just right. Here's what you can put into your own LaTeX file to get the same effect, under the name "longlist":

\newenvironment{longlist}
 {\begin{list}{---}{
  \setlength{\rightmargin}{0in}
  \setlength{\leftmargin}{0in}
  \settowidth{\labelwidth}{---}
  \setlength{\itemindent}{\parindent}
  \addtolength{\itemindent}{\labelwidth}
  \addtolength{\itemindent}{\labelsep}
 }}
 {\end{list}}

Then, just use "longlist" where you would otherwise use an "itemize."

03 June 2010

Painless Code Listings, CSS edition

We've seen Painless Code Listings for LaTeX. Now, here's what I have for HTML:

/* Adds scroll bars when code snippet is too large */
pre {
  padding: 3px;
  background: #FFFFFF;
  width: 95%;
  max-height: 5in;
  overflow: auto;
  border: silver 1px dotted;
}

I used this for my Quine page, and it seems to have worked out pretty well. It adds both vertical and horizontal scroll bars, which seems more humane when you want people to be able to read your content and copy and paste your snippets. You can see what it looks like, above.

31 May 2010

How to Surprise Your Advisor

The PhD you get is the PhD you work for. Just because your advisor has certain expectations for you in mind doesn't mean that you can't exceed them! Here are some ways you can surprise your advisor:

Be involved in other projects. For example, even though it's not your research area, do you like parallel programming? Then go to the weekly meetings. Then, if something related to parallel programming comes up, you can make your advisor's jaw drop when you are fluent in it. Obviously, you can overdo this. Don't let it get in the way of your primary duties.

Know the conferences. You should know the top conferences in your area. Check out their pages and see what the upcoming deadlines are, and what the co-located workshops are. Mark deadlines in your calendar. Study the members of the PC, and find out more about their work.

Understand your group project. Are you part of a five person group? Don't stick to just knowing and studying "your part" of it-- look at what your teammates are doing and seek to understand their part too. This could help your own work because you can understand the bigger context. For example, you might think of a way the overall system can be improved. Or, you might help your teammate by showing them a related work you found on your own.

Generate ideas. Science is an art. There are no fixed ways for you to consistently generate brilliant ideas that have impact. But here are some things I've found helpful:

Aim Big - Go for the real problems that bug you. Go for the Holy Grails. Do so with concrete ideas. If you've aimed high enough, there's probably a dissertation in there.
Read - The best way I get ideas is reading other people's work. Always make notes when reading, and then later process these notes as a source for papers to work on. A dozen bad ideas that lead to a workable idea makes it all worth it.
Combine - Bring a separate interest into your research area, and see what you get. I came up with the idea for my dissertation when I asked my wife, who has a background in fine art, what "deconstructive programming" would look like. Her answer helped shape my big idea into something I could build.

But, remember, your PhD is not about your advisor. Your PhD is about you. You aren't on a quest to impress your advisor, but they are your best guide for your research career. Surprising them just shows them how well you are on your way.

24 May 2010

The Clipart Syndrome

Michael Ernst has some super advice for giving a talk. One point is subtle and deserves extra attention:

"When giving a presentation, never point at your laptop screen, which the audience cannot see."

You should interpret this broadly and metaphorically, not just literally. When you point to a part of a slide, you are pointing to it because it reminds you of something.

Take a simple example with harmless clipart: You might have thought deeply about some concept when making your slides, and picked out a drawing to fit what you had in mind. Now, when you see that clipart, you get an immediate association with that concept you had in mind. It's a mistake to think that pointing to that same piece of clipart will resonate with your audience the same way it does with you.

It follows that this doesn't have to be clipart, either. It could have been a word you chose, a slogan you conjured, or a diagram you constructed. You must be very careful that it means the same thing to the audience that it does to you.

One of the worst ways this clipart syndrome manifests itself is when it applies to the whole slide. When you see a slide, you immediately think about what was on your mind when you made it. I've seen some speakers, who weren't even short on time, bring up a slide filled with text, talk about it using different terminology, and then skip to the next slide before I could even skim the first half of it.

The clipart syndrome applied to a whole slide can also make an audience motion sick. An unprepared speaker might jump forward five slides, because while on their feet they thought of a great connection to other material, and then later jump backward ten slides, because they just thought of another cool connection. When doing such slide jumps, the audience might not even know if the speaker is going forward or backward. [And the speaker isn't even going to show the slide long enough for it to sink in even if it could!] If you must jump to a slide, use the menu to navigate to it directly. This will spare the audience from seeing that fly-in animation happen five times.

***

The reference problem exists in the first place because of a lack of context.

Your audience has little or no context for your work, while you have been deep in the trenches so long, the context is the only way you see the world. That's why motivation is so important. When you show that something is a problem, keep in mind that your audience might not even see how the problem is a problem. You will need to be explicit: Not only do you say what the problem is, you should say what the implications of that problem are, and what opportunities are missed.

The goal is for the audience to think in the first minute "this might not be my area, but thank god someone is working on this and that this solution exists."

***

You should also banish the temptation of making your talk or publication a mystery novel. One trap I've noticed students fall into is making their talk mirror the structure of a mathematical proof. That means the talk begins with some definitions, some more definitions, a discussion of the actual material, and then, only at the very end, do you see where all of it was going. I can't stress enough that audiences-- even students in a lecture-- won't appreciate this approach.

Perhaps you've even come up with a great definition that generalizes your contributions and it became a real ah-ha moment and breakthrough in your work. Your ah-ha will just look like clipart to someone else if you don't provide it with the context it needs.

23 May 2010

Approximations and the tools you have

If you are interested in finance then you might be aware of The Rule of 72. The basic idea is if your investment gets interest at some percentage r each year (e.g., a ten percent return implies r=10), then you'll have to wait 72/r years before your investment doubles. That means stocks that have an annual return of about 10 percent will double in 7.2 years. (Note: If you account for inflation, stocks have an annual return of 7%.)

I thought the rule was slick, but when trying to do the math things just didn't add up. You can try constructing the recurrence relation yourself, or you can read moneychimp's explanation. The formula I got was to calculate the log of 2 (two for doubling), base (1 + r/100). It turns out that the rule of 72 is just an inverse-linear approximation (a constant divided by r) to that logarithmic function. But it is mostly right for the ranges people would care about. (For a rate of 5%, the log gives you 14.2 years, while the rule of 72 gives you 14.4; for a rate of 50%, the log gives you 1.7, while the rule of 72 gives you 1.44.)
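For readers who want to see where a constant near 72 comes from, here is one way to sketch the derivation. It assumes yearly compounding at a fixed rate and uses a second-order expansion of the logarithm, so treat it as an approximation for small rates, not an exact result.

```latex
% Doubling time t at r percent per year, compounded yearly
\[
  \left(1 + \frac{r}{100}\right)^{t} = 2
  \quad\Longrightarrow\quad
  t = \frac{\ln 2}{\ln\left(1 + \frac{r}{100}\right)}
\]
% Expand ln(1+x) \approx x - x^2/2 with x = r/100 (valid for small r)
\[
  t \approx \frac{100 \ln 2}{r}\left(1 + \frac{r}{200}\right)
    \approx \frac{69.3}{r} + 0.35
\]
% Setting 72/r equal to this estimate gives r of roughly 8 percent
\[
  \frac{72}{r} = \frac{69.3}{r} + 0.35
  \quad\Longrightarrow\quad
  r \approx 8
\]
```

Under these assumptions, 72/r agrees best with the logarithm at rates around 8% and drifts as the rate moves away from that, which is consistent with the 5% and 50% figures above.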

f(x)=72/x is a really simple function, and when comparing the plots of the two functions it is pretty impressive how close they are.
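The same comparison can be made numerically. The following is a minimal sketch, assuming yearly compounding at a fixed rate; the function names are made up for illustration.

```python
import math


def exact_doubling_years(rate_percent: float) -> float:
    """Years to double at a fixed yearly rate: log of 2, base (1 + r/100)."""
    return math.log(2) / math.log(1 + rate_percent / 100)


def rule_of_72_years(rate_percent: float) -> float:
    """The Rule of 72 estimate: 72 divided by the percentage rate."""
    return 72 / rate_percent


# Print both values side by side for a handful of rates.
for rate in (2, 5, 7, 10, 20, 50):
    exact = exact_doubling_years(rate)
    approx = rule_of_72_years(rate)
    print(f"{rate:>3}%  log: {exact:6.2f}  rule of 72: {approx:6.2f}  "
          f"difference: {approx - exact:+.2f}")
```

Adding more rates to the tuple gives a rough text version of the plot.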

I like the times when all of the math I've learned, even the basic concepts, becomes useful in analyzing other things. A computer scientist isn't trained in finance, but knowing that what you've learned can be applied widely can empower you.

I had a physics professor who once quipped "men see parabolic trajectories more often than women do." [So as not to leave it too cryptic, he was referencing stand up urinals.] It's an interesting way of thinking about gravity, and it once made me realize something: A friend in Worcester was showing me his "movie" gun (i.e., it wasn't real) and I considered the scope. I knew that scopes were meant for different target ranges, and it was then that I realized that a scope on a gun is a linear approximation to a parabola. Each scope setting is meant for different ranges, which approximates different parts of the parabola. (Also, they only work with Earth's gravity.) I mention guns in my blog only because, after explaining this to my friend he said something funny: "Man, I should totally take you out shooting; you'd be such a great target... I mean shot."

***

Side note: If investment interests you, check out this piece on index funds by Scott Adams, author of Dilbert. The company referenced, Vanguard, is an excellent one due to its unique corporate structure: Customers are also shareholders in Vanguard, so the company always has the customer's best interests in mind. In our culture it's sometimes considered boorish to talk about money, but somebody's got to tell you about it! Particularly now that pensions are going away and that defined contribution plans are your responsibility, you should empower yourself by knowing as much as you can.

20 May 2010

But what can you do about risk?

We've discussed the risk of graduate school, which has left Keyvan a little "distracted."

So, what can we do about that risk? To answer this question, it's easier to answer a much broader question. I found this one passage, quoted in a speech by Tom Malone (author of The Future of Work), to be helpful. Here is Tom, and then the quote:

"You probably have more choices than you realize. To make choices wisely, you need to think about what really matters to you."

"What can I actually do? The answer is as simple as it is disconcerting: we can, each of us, work to put our own inner house in order. The guidance we need for this work cannot be found in science or technology... but it can still be found in the traditional wisdom of mankind."
-- E.F. Schumacher, Small is Beautiful, 1973.

19 May 2010

The Risks of Graduate School

Going to graduate school for your PhD is a risky endeavor. Here’s what’s on my mind:

Opportunity cost. By taking 5-6 years for a job that is very low pay (effectively at subsistence levels), you are forgoing a higher income and the advancement and promotions that 5-6 years of work in industry would give you. Assume a portion of that difference is invested at a return of 3-7%, and it’s a pretty large sum of money. Also, in industry, a master’s degree takes you almost as far as a PhD would. Going for the master’s only has a lower opportunity cost. Do the math and decide for yourself if a PhD is really worth it.

Getting scooped. You can put three years of work into a result, only to see a much larger and more substantial result be published or released by someone else, just as you are getting your LaTeX macros in order for your first publication. I knew a brilliant researcher at a top school working on a project for two years, only to see a major company release something for free that effectively enveloped the whole project. Often, when you are “scooped” by some related work, there is something different about your approach that makes your work still publishable; but sometimes that just doesn’t happen.

Market forces can’t be controlled. Say that you found grad school worth doing, and did all you could to avoid being scooped. That’s all well and good, but when you graduate you might find that your dream research job is no longer available. Both 2009 and 2010 were bad years to be on the academic job market, where only some areas were in high demand. Many people are waiting it out in postdocs, but some have left the academy permanently. Who knows what the market will look like in 5 years?

If you knew it couldn’t fail, it wouldn’t be research. We don’t know if research projects will work. Alan Kay once said that if you aren’t failing 90% of the time, you aren’t aiming high enough. It could be that you have lots of great ideas that just don’t go anywhere, through no fault of your own. Such an “unlucky” person might still be able to form a thesis and graduate, but it might not be enough to launch a research career.

Relationships suck. Being a student is a human activity. You form a bond with your advisor the way an apprentice would with a master. If something harms that relationship, you might need to pick a new advisor and start all over again. The saddest possible way I’ve seen this happen was when a student had specialized heavily in his advisor’s area, and had to leave the program when his advisor passed away.

There are doors. Let’s be frank. Every time you walk out of your front door, you are taking a risk. Life is not about risks, per se; but living does not happen without risk. Staying inside is a risk, going outside is a risk, and so is standing in the doorway.

17 May 2010

Painless Code Listings

Even though LaTeX is a tool of, by, and for computer scientists, it sure doesn't make it easy for you to include code samples in your papers. Sure, there is the verbatim package, but what if your code listing is too big, and the next font size down is too small? Also, how the heck do you draw a frame around the text?

Let's talk about it. Here's my solution:

You should put line numbers in your code listings. This makes it easier to refer to particular parts of the code; easier for you in your exposition, or easier for your reviewers. It also seems to give the listing a clean, scientific look.
Putting a box around the figure helps set it apart from the rest of the text, and avoids "bumping." However, LaTeX is surprisingly bad at letting you compose different commands; you can't just put an "fbox" around your verbatim and leave it at that. Instead, you need to construct it in an lrbox first, and then put the fbox around the lrbox.
Actually, this box trick works well for the other problem: Finding the right font size. If you use a minipage, you can make it exactly as many columns wide as you need.

Here's it all put together:

% \resizebox (used below) is provided by the graphicx package
\usepackage{graphicx}
\newsavebox{\savelisting}
\newenvironment{listing}
{\begin{lrbox}{\savelisting}
\begin{minipage}{4.5in}
\begin{flushleft}}
{\end{flushleft}
\end{minipage}
\end{lrbox}
\begin{center}
\resizebox{\columnwidth}{!}{\setlength\fboxsep{6pt}\fbox{\usebox{\savelisting}}}
\end{center}}

Then, use it in a figure like this:

\begin{figure}
\begin{listing}
\begin{verbatim}
01 void sort(int[] arr, int start, int end) { ... }
\end{verbatim}
\end{listing}
\caption{A code listing}
\end{figure}

Using the "times" package for the fonts, and the 4.5in minipage width, gives you 60 columns of code. (You could have different widths for different code listings, making each exactly what you need and no more, but then that would cause an inconsistency in size across your listings.)

Here's the result:

04 April 2010

More Matters of Interface

In The Interface Matters, I gave a first approximation of some fashion tips for the academic interview. To follow up, I'll first give you some thoughts on accessorizing:

Get yourself a nice pen. One of those $30 range kinds. Many people like the Waterman pens, but I like the David Allen Folio pen better. It's particularly nice to bring to meetings.
We don't tend to need watches much any more, but a nice one can help complete your look and give you confidence.
Look into pocket squares and tie clips. If you are wearing a tie clip, a ring, and a watch, consider following the "match metals" Design Pattern. (Include your belt buckle in the equation.) The metals only need to match in look, not in element; e.g., you can have a stainless steel watch to go with your white gold ring.

Other thoughts:

In addition to the slim cut for your suit, be sure to ask your tailor to show some cuff.
If you really cannot afford a suit, then go with dress pants and a shirt and tie. Interviews are during the winter, so you'll have your winter coat on too, and indoors it might just look like you took your jacket off.
Work on slimming down your wallet, keeping only the essentials. To help prevent pocket bulge, you can get a card holder for your license and ATM/credit cards and a money clip for your cash.
Work on your posture. When sitting down with committee members, don't swivel your chair.
I probably should have mentioned to get a haircut, too.

Hmm, come to think of it, this list can function as a great gift guide if you know a graduating academic.

For more information, particularly since I haven't addressed fashion for female computer scientists, check out Academichic.

24 March 2010

The Interface Matters

This post builds on previous work by Fortnow 07. When you are interviewing for a faculty position, you must be mindful of how you are dressed, which is your interface to the world. Here, I'll talk about the basic rules.

The Very Basics
Let's start with some things we should talk about. All of these may be obvious to some of you, but out of an abundance of caution, I think they are worth repeating. These rules always hold in professional settings, even when not interviewing:

Whenever your shirt is tucked in, wear a belt. It literally and visually "ties" your top half and bottom half together. (Without the belt, something will look "off" that you can't explain. Think back to 90's sitcoms.)
Don't wear a belt with suspenders. (The engineer in you should appreciate that this would be redundant.)
Make sure your belt matches your shoes. You can pick either brown or black. You can never go wrong with black. (With brown, be sure you match the shade of brown, too.)
Make sure your socks match your pants. For example: navy socks? navy pants. If you are really desperate, you can be safe wearing darker socks. But...
Never, ever wear white socks. Not unless you are doing athletics.
Your socks must be pulled up all the way. The rule is that you should be able to cross one leg over the other while seated without exposing your skin.
A tie with patterns on it will conceal accidental food stains better than a solid color tie. Also, solid color ties will make you look too uniform.
Find a tie that makes you feel "on." (Avoid cute ties for the interview.)
Trim your finger nails.

Interview Rules
There are more rules you must follow when doing the academic interview:

Your shirt must be tucked in. (As a result, wear a belt, and follow all belt rules.)
Wear a button-up, long sleeved shirt. You can never go wrong with a plain white shirt. (For one day of the interview, you can wear a light blue shirt, and a white one the next.)
Wear a suit. Because you will be traveling and having a very long day (which can involve all three meals with members of the committee), wear dark colors. Dark colors conceal stains and hide creases. It's always safe to go with dark navy.
Don't button your suit jacket's bottom button, even when getting measured for alterations. The bottom button is for decoration only.
Speaking of which, get your suit altered. It will look odd if the sleeves or the pants are too long (or too short, if you buy the shorter one thinking the longer is "too long"!). When you get your suit altered, bring your dress shirt too, and also get that altered. In the end, you will be very, very comfortable.
Wear nice shoes. As a good starting rule, find ones with thin laces, which are dressier. A Google image search for "mens dress shoes" turns up what I mean, but pick conservatively: A single color (and not in crocodile design) with laces (not a buckle) is what you need.

The above rules are non-negotiable for your interview. The question to ask yourself is: Am I helping my case? If you show up looking like a graduate student, you might only reinforce concerns faculty may have already had. Also, you need to show that you care enough to follow the rules.

Now, some of you may be thinking about counter-signals right now. If you are a real hotshot, for example, can you get away with looking like a hobo? Maybe you can get away with it. But could it help? You'll be compared to other candidates, who will be just as outstanding as you. What if they happen to be wearing a suit, have a well-practiced talk with no errors on the slides, and have done their research about the school and its courses? People will say they were polished. It is very, very hard to make that a negative.

Dressing Down
If you do feel over dressed you can always remove your tie and unbutton your top two buttons. It's as simple as that.

Recommendations
I recommend Target's Merona line of suits, because you have a grad student budget. They are sold as separates, so be sure you read the labels of the pants and jacket to be sure they match exactly (you don't want to be caught with charcoal pants and a black jacket).

For undershirts, Banana Republic has some great cottons that are breathable and can move with you. They can also help wick away sweat.

For dress socks, try a department store like Macy's. You want some that can go long enough. (Banana Republic oddly only sells novelty socks, which don't go high enough.)

Extra Credit
Since we are talking about fashion, here are some bonus rules:

Unless you are over 50 years old, don't have cuffs on non-pleated pants.
Generally, the button-down collar isn't as fashionable, particularly after 5pm... But academia is the exception! Go nuts with the button down, because the dean you meet will probably be wearing one too.
Also, as an academic, you can wear loafers too.
Go for a two button suit. You can try three, but one might be too radical. But please have your jacket be single breasted, not double. Otherwise, people will ask you where the dock is or will want to give you their drink orders.
When seated, don't have your jacket buttoned.
When you are tying your tie, give it that little dimple. It will keep the tie from looking flat. No one will likely notice, but you've come this far, so you might as well know.
When you get into the hotel, run a hot shower and then hang your suit in the steam. The steam will help get out the travel wrinkles (from either your suitcase or from wearing it).
If you have problems with perspiration, get an antiperspirant and put it on the night before your big day. When you shower in the morning, then you can put on a regular deodorant or a light antiperspirant. Antiperspirants are weird, powerful things and it is the process of your skin absorbing them overnight that makes them effective. This holds even if you wash your armpits with soap the next morning. As such, you probably don't want to use that stuff every day!

The single best way to make a suit work? Get the skinny cut. Based on the suit you start with (and your body type), you might need to ask for the sleeves to be narrowed. The slim fit prevents you from feeling like you are swimming in your jacket (or your pants). The slim fit is narrow, but it is not tight. Be sure you get a tailor who knows style. Avoid the alterations done at dry cleaners... those are basic alterations that won't make you look as good as you can. Instead, look for alterations at places that do wedding dresses and other things. If they don't know what you mean by a skinny fit, run.

If you followed my Merona recommendation, you may find yourself spending as much for the alterations and dry cleaning as you did on the suit itself. It may not last you for years, but it will still look great and you'll be better served than what they try to sell you at certain overpriced men's suit stores.

After following these rules for a while, you'll start to notice how others dress and you can get more ideas. You can also watch TV and see what the non-eccentric, good guy characters are wearing. (Don't go by what late night talk show hosts wear; their outfits are often funny looking because their job is comedy!) Also, you can never go wrong seeing what The President is wearing. And, yes, his ties have a dimple.

21 March 2010

Beware Approximate Models

Joel Spolsky has closed his blog Joel on Software, but the discussion he started continues. One persistent article is on The Law of Leaky Abstractions. While there are other critiques of the claims in the article, none have quite settled on what I found unsettling about it. We must properly use abstraction and be aware of both its advantages and limitations. In particular, it is dangerous to use an abstraction with an oversimplification of how it works.

What Abstraction Is
Abstraction is a mechanism to help take what is common among a set of related program fragments, remove their differences, and enable programmers to work directly with a construct representing that abstract concept. This new construct virtually always has parameterizations: a means to customize the use of the construct to fit your specific needs. For example, a List class can abstract away the details of a linked-list implementation-- where instead of thinking in terms of manipulating 'next' and 'previous' pointers, you can think on the level of adding values to or removing values from a sequence. Abstraction is an essential tool for creating useful, rich, and sometimes complex features out of a much smaller set of more primitive concepts.

Abstraction is related to encapsulation and modularity, and these concepts are often misunderstood.

In the List example, encapsulation can be used to hide the implementation details of a linked-list; in an object-oriented language, for instance, you can make the 'next' and 'prev' pointers private, where only the List implementation is allowed access to these fields.

Encapsulation is not enough for abstraction, because it does not necessarily imply you have a new or different conception of the constructs. If all a List class did was give you 'getNext'/'setNext' style accessor methods, it would encapsulate the implementation details from you (e.g., did you name the field 'prev' or 'previous'? what was its static type?), but it would have a very low degree of abstraction.
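As a rough illustration of the difference, here is a small Java sketch. The names LinkNode and IntSequence are hypothetical and not from any library: the first class is encapsulated but still makes callers think in links, while the second lets callers think in terms of a sequence of values.

```java
/** Encapsulated, but barely abstract: callers still wire links by hand. */
class LinkNode {
    private final int value;
    private LinkNode next;

    LinkNode(int value) { this.value = value; }
    int getValue() { return value; }
    LinkNode getNext() { return next; }
    void setNext(LinkNode next) { this.next = next; }
}

/** Abstract: callers add values and ask questions; links never appear. */
class IntSequence {
    private LinkNode head, tail;
    private int size;

    void add(int value) {
        LinkNode n = new LinkNode(value);
        if (head == null) head = n; else tail.setNext(n);
        tail = n;
        size++;
    }

    boolean contains(int value) {
        for (LinkNode n = head; n != null; n = n.getNext()) {
            if (n.getValue() == value) return true;
        }
        return false;
    }

    int size() { return size; }
}

class AbstractionDemo {
    public static void main(String[] args) {
        // Accessor style: the caller manipulates the structure itself.
        LinkNode first = new LinkNode(1);
        first.setNext(new LinkNode(2));

        // Sequence style: the caller works with the concept, not the links.
        IntSequence seq = new IntSequence();
        seq.add(1);
        seq.add(2);
        System.out.println(seq.contains(2));
    }
}
```

Both classes hide their fields; only IntSequence gives you a different way to think about the data.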

Modularity is concerned with information hiding: Stable properties are specified in an interface, and a module implements that interface, keeping all implementation details within the module. Modularity helps programmers cope with change, because other modules depend only on the stable interface.

Information hiding is aided by encapsulation (so that your code does not depend on unstable implementation details), but encapsulation is not necessary for modularity. For example, you can implement a List structure in C, exposing the 'next' and 'prev' pointers to the world, but also provide an interface, containing initList(), addToList(), and removeFromList() functions. Provided that the rules of the interface are followed, you can make guarantees that certain properties will always hold, such as ensuring the data-structure is always in a valid state.
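The same idea can be sketched in Java by mimicking the C style: every field is public, as in a C struct, and the module's interface is a set of static functions. The type names (ListNode, IntList, ListOps) and the isValid check are assumptions added for illustration; only initList, addToList, and removeFromList come from the description above.

```java
/** Fields are deliberately public, as they would be in a C struct. */
class ListNode {
    public int value;
    public ListNode prev, next;
}

class IntList {
    public ListNode head, tail;
    public int size;
}

/** The module's interface: the functions callers are supposed to use. */
class ListOps {
    static IntList initList() { return new IntList(); }

    static void addToList(IntList list, int value) {
        ListNode n = new ListNode();
        n.value = value;
        n.prev = list.tail;
        if (list.tail == null) list.head = n; else list.tail.next = n;
        list.tail = n;
        list.size++;
    }

    static boolean removeFromList(IntList list, int value) {
        for (ListNode n = list.head; n != null; n = n.next) {
            if (n.value != value) continue;
            if (n.prev == null) list.head = n.next; else n.prev.next = n.next;
            if (n.next == null) list.tail = n.prev; else n.next.prev = n.prev;
            list.size--;
            return true;
        }
        return false;
    }

    /** The promised invariant: links agree in both directions and size matches. */
    static boolean isValid(IntList list) {
        int count = 0;
        ListNode last = null;
        for (ListNode n = list.head; n != null; n = n.next) {
            if (n.prev != last) return false;
            last = n;
            count++;
        }
        return last == list.tail && count == list.size;
    }
}
```

As long as callers go through ListOps, isValid keeps holding. Nothing stops a caller from assigning to list.head.next directly, which is exactly the encapsulation that this style gives up.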

Although terms like abstract, modular, and encapsulated are used as positive design descriptions, it's important to realize that the presence of any of these qualities does not automatically give you good design:

If an n^3 algorithm is "nicely encapsulated" it will still perform worse than an improved n log n algorithm.
If an interface commits to a specific operating system, none of the benefits of a modular design will be realized when, say, a video game needs to be ported from Windows to the iPad.
If the abstraction created exposes too many inessential details, it will fail to create a new construct with its own operations: It will simply be another name for the same thing.

Joel's So-Called Law of Leaky Abstractions
Joel's law of leaky abstraction states: "All non-trivial abstractions, to some degree, are leaky."

We should first pause to examine the law itself-- like similar "laws," it is not falsifiable. Based on how you define what is trivial or not, or what is leaky or not, any claim of leakability can be made about any abstraction.

So, we should look at what Joel identifies as leaks. Joel's leaks fit the following patterns:

Accidental Limitations - E.g., there is no perfect String class in C++, because the language lacks certain abstraction features. Poorly designed APIs are also limitations that are purely accidental.
Layering Limitations - E.g., TCP is layered over IP, and so it shares the fundamental limitations of IP, even though it provides new functionality built on top of it. NFS is an example of a layering that creates a new interface to a service, but cannot provide the same availability characteristics expected of other file systems.
Performance Limitations - E.g., SQL queries are supposed to be fully declarative (as if writing in relational algebra directly), but when performance matters, you must have intimate knowledge of the database's execution.

Accidental Limitations are a problem when it comes to using real code, particularly when that code is part of an API you need to use. Claiming the entire concept of abstraction is "leaky" because people can write poor code isn't quite right: We could try to claim the same thing about any other useful technique or tool that may be misapplied.

As for Layering Limitations, these aren't really "leaks": We cannot expect abstraction to do what is physically or computationally impossible (wires in a network can get cut, and hence your music over TCP/IP won't play).

That leaves Performance Limitations, which is the strongest argument Joel makes. When the term separation of concerns was coined, one concern described was efficiency. To get the best performance, we need to understand things like memory hierarchies and database engines. You can reason about efficiency separately from other concerns: For example, one day I can look at my database queries and reason about their correctness; and on another day, I can look at the same queries and reason about their efficiency. The query abstractions help me reason about correctness-- and even efficiency, because many optimization cases I would otherwise need to worry about are handled for me.

Why can't the efficiency concern be properly isolated? Because efficiency is inherently a crosscutting concern. Any single weak-link in a chain of unbelievably fast code will drag the entire performance of the system down with it. Research in aspect-oriented programming, which is focused on isolating crosscutting concerns, has had limited or only domain-specific successes in isolating efficiency concerns. It is a hard problem, the philosophy of which would lead us to talk of artificial intelligence and computational complexity.

So, in examining Joel's argument, we find he wants us to believe the problems of efficiency are related to the problems of poorly written APIs and also the problems of fundamental limitations. No wonder he is left to conclude that abstraction is "dragging us down." But we don't have to be so gloomy.

Leaks that Aren't
Because abstraction cannot do the impossible, and because isolating efficiency concerns is a long shot possibility, we have to be careful when using abstractions. In particular, we must be sure with the abstractions we use that we are modeling them correctly.

Donald Knuth has said that a computer scientist must have "the ability to work with multiple levels of abstraction simultaneously," from high-level to low-level.¹ We cannot work with simpler models in our minds just because we are working with something that looks simpler.

Let's look at garbage collection as an example. Today, unless a language is meant for writing operating systems, virtual machines, embedded systems, or any other exceptions, it will have garbage collection. The basic idea is that you can allocate memory, but do not have to worry about deallocating it. However, that is the basic idea, and not the true abstraction! In reality, we don't have infinite memory. Here is the actual idea behind garbage collection:

Allocated memory that is no longer reachable from any of the program's threads can be safely deallocated by the garbage collector.

This idea is radically different from the infinite memory idea. For instance, we now have to know about the concept of reachability. How does this change the way we use garbage collection? Under the first, approximate model of GC, we will be led to memory leaks under certain circumstances. In the second, correct model of GC, we will prevent memory leaks because now we know about WeakReferences.²

By thinking that the GC "is magic," we will be led down the wrong path. For example, there are times when we know that an object "will no longer be used," but that is not the same as "is no longer reachable." Thus, if we have the right conception of the GC in mind, we find ourselves using WeakHashMaps more frequently.³ My rough estimate is that 20% of the HashMaps I use need to be WeakHashMaps; without the proper conception of GC, a significant portion of my code would be wrong!⁴
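Here is a small Java sketch of the "no longer used" versus "no longer reachable" distinction. LabelRegistry is a hypothetical class invented for this example; HashMap, WeakHashMap, and System.gc() are the standard library ones. Note that System.gc() is only a hint, so what the weak registry reports after it is not guaranteed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Attaches a label to arbitrary objects, with strong or weak keys. */
class LabelRegistry {
    private final Map<Object, String> labels;

    LabelRegistry(boolean weakKeys) {
        // With weak keys, an entry does not by itself keep its key reachable.
        if (weakKeys) {
            labels = new WeakHashMap<Object, String>();
        } else {
            labels = new HashMap<Object, String>();
        }
    }

    void label(Object target, String text) { labels.put(target, text); }
    String labelOf(Object target) { return labels.get(target); }
    int size() { return labels.size(); }

    public static void main(String[] args) {
        LabelRegistry strong = new LabelRegistry(false);
        LabelRegistry weak = new LabelRegistry(true);

        // Neither key is used again, but only one of them stays reachable.
        strong.label(new Object(), "still reachable through the HashMap");
        weak.label(new Object(), "reachable only through a weak key");

        System.gc(); // a request, not a command

        // The strong registry still holds its entry. The weak registry is
        // allowed to have dropped its entry, though the JVM makes no promise.
        System.out.println("strong entries: " + strong.size());
        System.out.println("weak entries:   " + weak.size());
    }
}
```

Under the "infinite memory" model the two registries look interchangeable. Under the reachability model, the strong one is a leak waiting to happen if its keys are objects you are otherwise done with.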

Abstraction is about Design
Abstraction is not primarily about overcoming fundamental limitations of the machine or its environment. Abstraction is about organizing your code to match your conceptions of your design.

Although abstraction provides many benefits that simplify the task of programming, these simplifications are not strictly necessary. We can write our programs without so many abstractions. For example, whenever we need a Node structure that holds a value and pointers to two other Nodes, we can use a simple Node class, and use it for both a doubly linked-list and a binary tree. If we programmed like this, we could even save a few lines of code: The routine to get the right-most node of a binary tree could also be used to iterate through a doubly linked-list.
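A minimal Java sketch of that abstraction-free style follows. The field names first and second and the method followSecond are hypothetical, chosen so that the same Node can play either role.

```java
/** One structure for everything: a value and two pointers. */
class Node {
    int value;
    Node first;   // "prev" in a doubly linked list, "left" in a binary tree
    Node second;  // "next" in a doubly linked list, "right" in a binary tree

    Node(int value) { this.value = value; }

    /**
     * Follows the second pointer until it runs out. On a tree this finds
     * the right-most node; on a list it walks to the last element.
     */
    static Node followSecond(Node start) {
        Node n = start;
        while (n.second != null) {
            n = n.second;
        }
        return n;
    }
}
```

The routine really is shared, but the type no longer says whether a given Node belongs to a list or a tree, so that part of the design lives only in the programmer's head.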

We should stop thinking of abstractions necessarily as higher or lower level than each other. Abstractions are relative to your design, and what looks like "the same" abstraction in one case can be too high level (leaving out too much), too low level (over specified and lacking flexibility), or just right.

Instead, think about abstractions as providing support for your design conceptions. If, in the process of creating your abstractions, you are led to "leaks" then the problem is in your design.

***

1. Further quoted: "When you're working at one level, you try and ignore the details of what's happening at the lower levels. But when you're debugging a computer program and you get some mysterious error message, it could be a failure in any of the levels below you, so you can't afford to be too compartmentalized."

2. And SoftReferences, for when performance concerns are high.

3. Also see Google's MapMaker factory: new MapMaker().weakKeys().makeMap().

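4. When thinking about memory, we also need to remember that Accidental Limitations do come up. For example, when using Java's substring method, you may need to wrap the call in new String(), because the substring can share the original string's backing character array and keep all of it reachable.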


@macshonle@c.cim - Mastodon