The Biggest Pond

What I learned taking one of MIT’s hardest graduate computer science classes (besides lots of math)

typical Tuesday morning

This past fall, I had the rare opportunity to take a graduate engineering course at MIT without being enrolled in a degree program. I knew any class there would be hard, but being a student at the best technical school in the world, even for just a couple of months, was something I had to experience. I talked to some students and recent grads, picked a class, and registered.

It’s hard to explain what the class was about. It was called “Algorithms for Inference”, and it focused on probabilistic graphical models, a framework for working with complicated probability distributions. I can’t think of an analogy that would make sense to someone who hasn’t taken a graduate-level probability and statistics course. Essentially, the course covered many of the mathematical techniques that underlie cutting-edge artificial intelligence algorithms. Despite being an esoteric topic, it has become pretty trendy in certain computer science communities. Students also know it as one of the hardest graduate subjects offered by MIT’s elite Electrical Engineering & Computer Science department.
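
For a small taste of what “dealing with complicated probability distributions” means, here is a minimal Python sketch of the simplest graphical model: a chain of binary variables in which each variable depends only on the one before it. This is a hypothetical illustration, not material from the course, and the prior and transition numbers are made up. Summing the joint distribution over every possible assignment takes time exponential in the chain length, while passing a “message” down the chain (a special case of the sum-product idea) gets the same marginal in linear time.

```python
from itertools import product

# Hypothetical chain x1 -> x2 -> ... -> xn of binary variables (made-up numbers).
PRIOR = [0.6, 0.4]            # P(x1 = 0), P(x1 = 1)
TRANSITION = [[0.7, 0.3],     # P(x_{k+1} = 0 | x_k = 0), P(x_{k+1} = 1 | x_k = 0)
              [0.2, 0.8]]     # P(x_{k+1} = 0 | x_k = 1), P(x_{k+1} = 1 | x_k = 1)


def joint(assignment):
    """Probability of one full assignment under the chain factorization."""
    p = PRIOR[assignment[0]]
    for a, b in zip(assignment, assignment[1:]):
        p *= TRANSITION[a][b]
    return p


def marginal_last_brute_force(n):
    """P(x_n) by summing the joint over all 2**n assignments (exponential time)."""
    totals = [0.0, 0.0]
    for assignment in product((0, 1), repeat=n):
        totals[assignment[-1]] += joint(assignment)
    return totals


def marginal_last_message_passing(n):
    """P(x_n) by pushing a message down the chain one variable at a time (linear time)."""
    message = list(PRIOR)
    for _ in range(n - 1):
        message = [sum(message[a] * TRANSITION[a][b] for a in (0, 1))
                   for b in (0, 1)]
    return message
```

For a chain of 20 variables, the brute-force version loops over about a million assignments, while the message-passing version does 19 small updates. Exploiting structure like this is the core trick behind inference in graphical models.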

The first lecture was fun. The professor was great and rather brilliantly outlined the course and its significance in the larger field of artificial intelligence and machine learning. It wasn’t until I sat down with the first problem set that I realized just how much trouble I was in. I remember spending an entire Saturday on the first question and barely making progress. My old college textbooks and Google searches were all but useless. After hitting the library and getting help from the TAs and my PhD co-workers, I finally started grasping the fundamentals. But things were going painfully slowly. If you’ve ever wondered how smart MIT students are, they’re crazy smart. Scary smart. One-in-a-thousand smart. I wouldn’t be surprised at all if the average IQ in that classroom was 140. Being so thoroughly outmatched was a new experience for me.

A few problem sets into the semester, I started to get the hang of things. Still, I had to spend about 20 hours per week outside of the classroom reading and working on problems. I came to appreciate that struggling with the workload is a fundamental learning tool. The expectations on an MIT graduate student are intense. This course packed far more content into a semester than my bachelor’s and master’s courses had. You can’t learn material at this depth by simply attending lectures and flipping through slides. Only by working through the problems, which always seem impossible at first, can you develop a profound understanding of the mathematical nuances that make these algorithms and models work. My classmates knew this and rarely complained. I found that the students weren’t just intellectually gifted but also ridiculously hard-working. I struggled at times to motivate myself, but I put in the work, managed roughly average performance, and made it through the class.

So what did I learn (besides probabilistic graphical models)?

When it comes to intellectual pursuits, striving to be the best is an impossible goal. This might sound like blasphemy to the over-achiever culture, but let’s be real. There will always be someone better than you in some way: smarter, faster, or harder working. Rather than worrying about being the biggest fish in your pond, focus instead on finding the right pond. Lots of people dream about getting a PhD at MIT because we’ve all heard stories about amazing people conducting impactful, innovative research. No one aspires to “just get by” at an elite institution. Find a place in the world where you’re challenged and need to put forth your best, but also where you’re good enough to make a real difference. And if you keep working hard and learning, who knows where you’ll end up.

Photos from the 2014 Boston Marathon

We had another great year watching the marathon in our usual spot near the Newton firehouse. We arrived just in time to see the elite women.

Although it wasn’t as obvious on TV, the security presence was noticeably larger.

Twenty minutes after the women, Meb Keflezighi runs by, way out in front of the pack.

…and looks back.

The elite men’s pack isn’t too far behind.

This year, the crowd on Commonwealth Avenue is the largest I’ve ever seen.

Despite the increased security, fans still cheer on the runners in creative ways.

And the crowd absolutely erupts for Team Hoyt, possibly running their last marathon.

Over-the-Air HDTV Looks Good

My latest move in the ongoing battle to keep my cable bill down was to ditch the set-top box on my second TV. Now I’m using an inexpensive antenna and an over-the-air DVR. The picture quality is actually amazing. It turns out the over-the-air HDTV standard (ATSC) allows up to 19.39 Mbps per channel, while cable companies allegedly compress video down to the 8-13 Mbps range.
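
Taking those numbers at face value, the gap works out as follows. Keep in mind that 19.39 Mbps is the capacity of the whole broadcast channel; a station that also carries subchannels splits that budget, so its main HD feed gets less than the full amount.

```latex
\frac{19.39\ \text{Mbps}}{13\ \text{Mbps}} \approx 1.49,
\qquad
\frac{19.39\ \text{Mbps}}{8\ \text{Mbps}} \approx 2.42
```

So a full over-the-air channel has roughly 1.5 to 2.4 times the bitrate of a cable stream in that range.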

NBC screen shot

I’d post a full-resolution screen shot, but I don’t want to get sued. Take my word for it: it looks better than cable.

If you’re in range of broadcast HDTV (an online reception map will show which channels you can get), I highly recommend grabbing an HD antenna and trying it out. I’ve also been using an over-the-air DVR with a USB hard drive. Its interface is clunky, but it works just fine.

The new equipment will pay for itself in a couple of months. Plus, the video I record is completely DRM-free, which means I can watch it on any device without a subscription.

Expected Payout for Football Squares

Ever wonder which squares you want to get in your Football Squares pool? I did a quick analysis using NFL data, looking at the last digit of each team’s score after each quarter of every game since 2002.

Assume a $100 pot ($1 per square) with these payouts:

  • 1st quarter: $12.50
  • Halftime: $25
  • 3rd quarter: $12.50
  • Final: $50

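Here’s the expected payout of each square. Rows and columns are the last digit of each team’s score; the table is symmetric, so it doesn’t matter which team is on which axis: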

      1     2     3     4     5     6     7     8     9     0
1 $0.90 $0.17 $0.95 $1.40 $0.25 $0.45 $1.36 $0.51 $0.35 $1.24
2 $0.17 $0.07 $0.18 $0.33 $0.14 $0.14 $0.46 $0.09 $0.12 $0.49
3 $0.95 $0.18 $2.86 $1.63 $0.19 $1.18 $2.98 $0.54 $0.47 $4.27
4 $1.40 $0.33 $1.63 $2.17 $0.25 $0.72 $2.82 $0.53 $0.45 $2.96
5 $0.25 $0.14 $0.19 $0.25 $0.15 $0.13 $0.47 $0.22 $0.10 $0.46
6 $0.45 $0.14 $1.18 $0.72 $0.13 $0.55 $1.13 $0.24 $0.24 $1.52
7 $1.36 $0.46 $2.98 $2.82 $0.47 $1.13 $4.22 $0.55 $0.56 $5.67
8 $0.51 $0.09 $0.54 $0.53 $0.22 $0.24 $0.55 $0.46 $0.13 $0.88
9 $0.35 $0.12 $0.47 $0.45 $0.10 $0.24 $0.56 $0.13 $0.16 $0.57
0 $1.24 $0.49 $4.27 $2.96 $0.46 $1.52 $5.67 $0.88 $0.57 $7.45

Any square with an expected payout greater than $1, the price of the square, is a great square!
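
The post doesn’t show how the table was built, so here is a minimal Python sketch of one way to compute it under stated assumptions. Each game is a hypothetical dict mapping a period name to that period’s cumulative (team A, team B) score, and `symmetric=True` splits each outcome evenly between both orientations of the board. That last option is an assumption inferred from the table being symmetric (the row-3/column-0 and row-0/column-3 squares both show $4.27), not a confirmed detail of the original analysis. A square’s expected payout is the sum, over periods, of that period’s prize times the fraction of games whose last digits land on the square. Because each period’s prize goes to exactly one square, the expected payouts across all 100 squares add up to the $100 pot; the table above sums to about that, give or take rounding.

```python
from collections import Counter

# Payouts from the post, for a $100 pool ($1 per square).
PAYOUTS = {"Q1": 12.50, "HALF": 25.00, "Q3": 12.50, "FINAL": 50.00}


def expected_payouts(games, payouts=PAYOUTS, symmetric=True):
    """Expected payout for each (row digit, column digit) square.

    `games` is a non-empty list of dicts (hypothetical format) mapping each
    period name in `payouts` to a (team_a_score, team_b_score) tuple of
    cumulative scores at the end of that period.
    """
    total = len(games)
    expected = Counter()
    for period, prize in payouts.items():
        # How often each pair of last digits occurs at the end of this period.
        counts = Counter((a % 10, b % 10) for a, b in (g[period] for g in games))
        for (r, c), n in counts.items():
            p = n / total
            if symmetric:
                # Assumption: the board doesn't fix which team gets the rows,
                # so split the probability between both orientations.
                expected[(r, c)] += prize * p / 2
                expected[(c, r)] += prize * p / 2
            else:
                expected[(r, c)] += prize * p
    return expected
```

A square is worth buying when its value in the returned `Counter` exceeds 1.0, the $1 it costs.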

(Disclaimer: This database is derived from the AdvancedNFLStats.com play-by-play dataset, which contains play data through Week 12 of the 2013 regular season. All subsequent processing is automated and has not been verified for accuracy. I make no guarantee that these stats are accurate — use at your own risk!)

Three Important Lessons for New Coders

I’m told there are workplaces where large, skilled software engineering teams follow textbook development processes and efficiently produce reams of high-quality code. I’ve never worked at a place like that.

The rest of us are stuck dealing with:

  • small teams lacking practical software engineering experience
  • big, ugly legacy code bases
  • having to write code in between meetings, PowerPoint, and other job responsibilities, and
  • engineering processes ranging from “loose” to “non-existent”.

This is a dangerous world for a new coder. It’s easy to develop bad habits. Ultimately, it’s up to you to keep pace with the larger software engineering community, and this extra effort pays off when you look for your next job or project. There are tons of books and blogs about software engineering, but here are the three most important things you can do to improve your value as a software engineer:

1. Write automated tests for your code.
Every language has one or more popular testing frameworks (like JUnit for Java and Nose with unittest for Python). Back in college, before I learned to write unit tests, I would write a quick script or modify the application to print some debugging information. That’s how I would verify that a new function or class was behaving properly. Unit testing formalizes this process. The biggest benefit is that you build up a collection of tests that you can run again later, which catches bugs where new code breaks older code. Even if no one else on your team is writing automated tests, you can and should write test cases for your work.
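
Here is a minimal sketch of what this looks like with Python’s built-in `unittest` module. The `slugify` function is a hypothetical example chosen only to have something to test; each `test_` method replaces one of those throwaway debugging printouts with a check you can rerun after every change.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: turn a post title into a URL slug."""
    # Lowercase, treat every non-alphanumeric character as a separator,
    # then join the remaining words with hyphens.
    words = "".join(ch if ch.isalnum() else " " for ch in title.lower()).split()
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("The Biggest Pond"), "the-biggest-pond")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("Over-the-Air HDTV Looks Good!"),
                         "over-the-air-hdtv-looks-good")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")


if __name__ == "__main__":
    unittest.main()
```

Running `python -m unittest` in the project directory discovers test files named `test*.py` by default and runs every test in them, so the whole collection is one command away. Nose and pytest can also collect `unittest.TestCase` classes like this one.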

2. Use revision control for your source code.
Source code revision control systems let you commit code changes to a repository and preserve a revision history as your project evolves. They’re pretty much a necessity when coding with a team. It doesn’t matter which one you use, but be warned that a lot of developers are seriously attached to their system of choice. Subversion is really easy to learn, so I’d suggest starting there. Git is what I use now, but anyone who tells you it’s easy to learn is a lying liar. A less obvious benefit of source control is that you can delete old code without worrying, since you can always get it back from the revision history. Uncluttering your project makes it easier to work on.

3. Learn some important design patterns.
Breaking down your project into a class hierarchy can actually be pretty fun. There are lots of design patterns you can follow to decompose your problem. I’ve found these two to be the most helpful for small projects: Composition (often a better option than the class inheritance taught in introductory programming classes) and Dependency Injection (DI). Despite its scary-sounding name, DI is a very simple design pattern: instead of creating the libraries and services it depends on, a class receives them from outside, so you can swap them out without touching the class. If you’ve ever had to change the database you’re using, for example, you’ll immediately see why this is a great technique.
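
A minimal Python sketch of both ideas, using hypothetical class names: `NoteService` holds a store as a member (composition) instead of subclassing one, and receives that store through its constructor (dependency injection). Swapping the in-memory store for the SQLite-backed one requires no change to `NoteService`, which is exactly the database-switching scenario above. The `list[str]` annotations need Python 3.9 or later.

```python
import sqlite3
from typing import Protocol


class NoteStore(Protocol):
    """Hypothetical interface: anything that can save and list notes."""
    def add(self, text: str) -> None: ...
    def all(self) -> list[str]: ...


class InMemoryNoteStore:
    def __init__(self) -> None:
        self._notes: list[str] = []

    def add(self, text: str) -> None:
        self._notes.append(text)

    def all(self) -> list[str]:
        return list(self._notes)


class SqliteNoteStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS notes (text TEXT)")

    def add(self, text: str) -> None:
        with self._conn:  # commits the insert if no exception is raised
            self._conn.execute("INSERT INTO notes (text) VALUES (?)", (text,))

    def all(self) -> list[str]:
        rows = self._conn.execute("SELECT text FROM notes ORDER BY rowid")
        return [text for (text,) in rows]


class NoteService:
    """Composes a store rather than inheriting from one; the store is injected."""
    def __init__(self, store: NoteStore) -> None:
        self._store = store

    def record(self, text: str) -> None:
        self._store.add(text.strip())

    def summary(self) -> str:
        notes = self._store.all()
        return f"{len(notes)} note(s): " + "; ".join(notes)


if __name__ == "__main__":
    for store in (InMemoryNoteStore(), SqliteNoteStore()):
        service = NoteService(store)
        service.record("  buy milk ")
        service.record("call Bob")
        print(service.summary())  # prints "2 note(s): buy milk; call Bob" twice
```

In unit tests you would typically inject the in-memory store, so the tests run fast and never touch a real database.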

And one last bonus tip: write less code! The great Edsger Dijkstra taught us to count lines of code as “lines spent” rather than “lines produced”.