In the tradition of educating the masses on memes and Internet trends, I’ve recently been in love with appropriation of Snoop Dogg. This post isn’t personally an endorsement of marijuana, nor is this disclaimer a recommendation against it.
I hate to be GrandPaul, the old curmudgeon who hates things that just ain’t the way they used to be. But I’ve decided to touch webdev for ScrabbleCheat and by jove, I feel those RoR kids have jumped the shark. Forget TDD; that shit is so passe. These kids practice BDD, with Cucumber and RSpec.
Let’s make a few things clear here: Unit Testing isn’t the end-all, be-all of automated software verification. BDD seems to be a perfectly noble attempt to address many of its shortcomings, or if I may go so far, to address the “failed promises” of TDD. Some of those promises (and how BDD helps solve them) include:
Unit testing, on its own, helps you validate your code, but does little to help you understand its purpose. Anybody who wastes enough time on Hacker News or Proggit has read the mandatory article claiming that Unit Tests are your documentation, but it doesn’t take long for this to break down. What about the 92% of TDD practitioners who, at some point, broke down and wrote code before tests just that once (and again and again)? What about all that semantically useless boilerplate (all the Mock libraries, all the fixtures, what have you)? At the very best you have verbose, very technical and implementation-obsessed documentation only your coders can read.
Sure, by being executable and by having very wide coverage, Unit tests can help the technically-savvy person see, after some scrutiny, just how the application works. But this doesn’t describe what it does. BDD rectifies this by separating the specification of the behavior (what behavior you are testing) from the implementation of how you test it, and achieves this further (in the RoR world, anyways) with a butt-ton of fancy DSLs to make your tests read like English.
TDD is a boon to the developer, but the benefits aren’t directly known to the client. RoR folks, more so in my anecdotal observation than most other types of devs, love to talk about what tools they’re using, usually much more than the projects they’re working on. So while they can circlejerk to each other about how great their methodology is and what gems they’re using, how much better would it be if they could include the person footing the bill, and give them that extra-personal experience?
I only mention this because every time Cucumber lists its advantages, it’s always mentioned that you can show your cukes (Feature Definition files) to the clients and they will be able to read and understand them, even as nontechnical people. So that’s a weakness of unit testing, I guess – you can’t show that typing to the clients and have it mean anything to them.
I bring all this up because, well, I’m trying to get on this train and I’m finding it a bit of a pain in the ass. There are two major reasons for it, one the current Ruby ecosystem in particular, and the other simply being skeptical of the whole approach in the first place.
BDD in Ruby, today
Despite my seething cynicism, I’m working through it and giving it a try. Hell, the only reason I joined Twitter was because, four years ago, I was getting angry at its very existence, feeling that “Web 2.0” had brought us to such a shallow, useless service. Then I realized I was sounding like my (awesome, wonderful) Luddite father, so I signed up immediately. 2500 tweets later, I love it. So maybe (and I still believe this) I’ll come around and preach this the way the Ruby kids do these days.
But where to start learning? I looked at RoR years ago, back when we were all still using Test::Unit, and I understand MVC, but I wanted to do things right this time. So I read about RSpec and Cucumber, go to their respective websites, and try to write my first cukes and specs. The first thing you notice is that they both use DSLs. I know me plenty of Ruby, but when all your examples look like
Feature: An online spittoon which allows users to spit at the monitor for fun and profit
  Scenario: Spittoon graphics
    Given that my website is operational
    And that the user is full of saliva
    Then that user may spit upon the screen
    But not upon the keyboard
    And a bucket will appear to receive the spit
  Scenario: Proper spittoon sounds
    Given that the user has spit upon the screen
    When the user spits on it
    Then it should make a great "ding!" sound
It’s not obvious where the code comes in. Or what the rules are for writing these files. Similarly, RSpec shows you tons of files like
describe Spittoon do
  subject { :spittoon }
  it "should take spit in the monitor" do
    visit("./")
    subject.should receive_spit
  end
end
All of this is well and cute, so you ask yourself, “where is the DSL reference so I can write my own?”
No really, where is it? Because I still haven’t found it. I’m looking all through the interwebz, but neither Cucumber nor RSpec has a definitive listing of its DSL. There are a few half-baked tutorials, some Wikis on the Github pages, and while they all describe these technologies on a very shallow level, they barely answer my questions. When you run rspec from the command-line, where is it looking? Same with cucumber? How should all the components fit into the Rails app? Where is a full specification of the DSL? What capabilities of the browser does Capybara emulate? And where’s its DSL reference?
To contrast, there’s no shortage of press pages and splash pages telling me how this will add business value, how easy they are to use, how they all play well together, how they’ll turn you into a rockstar ninja coder that makes clients happy. I’d believe them, but if I want to read how, you’ll redirect me to a dozen scattered wiki links and shallow blog posts with toy examples.
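For what it’s worth, the general idea of “where the code comes in” can at least be sketched. Each plain-text step gets matched against a pattern, and the pattern has a block of Ruby attached to it. The following is a toy matcher written for this post, not Cucumber’s implementation or its API: the names ToyStepRegistry, step and run are all made up, and the step wording is adapted from the spittoon example above.

```ruby
# Toy illustration only: a registry that maps step patterns to Ruby blocks.
# ToyStepRegistry, #step and #run are hypothetical names, not Cucumber's API.
class ToyStepRegistry
  # Leading keywords are dropped before matching, so "Given x" and
  # "Then x" would hit the same pattern.
  KEYWORDS = /\A(?:Given|When|Then|And|But)\s+/

  def initialize
    @steps = {}
  end

  # Register a block to run when a line matches the pattern.
  def step(pattern, &block)
    @steps[pattern] = block
  end

  # Find the first pattern that matches and call its block with the
  # captured groups as arguments.
  def run(line)
    text = line.strip.sub(KEYWORDS, '')
    @steps.each do |pattern, block|
      match = pattern.match(text)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "undefined step: #{line}"
  end
end

registry = ToyStepRegistry.new
registry.step(/\Athe user spits (\d+) times\z/) { |count| count.to_i }
registry.step(/\Ait should make a great "([^"]+)" sound\z/) { |sound| "played #{sound}" }

registry.run('When the user spits 3 times')               # => 3
registry.run('Then it should make a great "ding!" sound') # => "played ding!"
```

The real tools layer a lot more on top (hooks, shared state between steps, reporting), which is exactly the part a proper reference should spell out.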
Funnily enough, the best source of information is to dive into source: the kind of thing these tools are supposed to prevent.
Well, that’s not quite true. Because for $25 you can buy The RSpec Book. Or the Cucumber book, for another $25. Note that many a Rails book carries a list price of $40-$50. Yes friends, you can have a decent serial, instructive reading experience that doesn’t involve navigating disjointed wiki tabs if you pay up. All the refrains about how Java was as good for the publishing industry as it was for the software industry come back. PragProg is a house built on their Ruby + Rails books. Never mind that the reviews for the books state that they are mostly out of date (all the examples use Webrat, not Capybara) or contain only trivial, toy examples. You can’t really blame them, it’s part of the limitation of the dead tree business. But you know what can solve this? A good, freely available manual.
I can learn a mainstream language from 1995 (Ruby), I can leverage existing programming experience from several other languages and projects, but when Rails apps use a component list reminiscent of The Startup Guys (“So I’m using Cucumber and RSpec with Rails with a dash of Factory Girl to practice BDD, with Launchy and Capybara since I think TDD and Webrat are dead.”) there’s a real need to ensure that, if you rely on DSLs, the barriers to entry are low and the ability to learn is high.
To compare to another learning experience I’m having, I’ve picked up Emacs. I’ve been a Vim user for the past 3 years, but felt like I wanted to taste the forbidden fruit and so have started doing all my text editing in Emacs (this post was written in Emacs). I was off the ground, blown away by some of the capabilities in less than a day. I was using SLIME to run exercises in PAIP in less than a day, filled a few TODO lists in org-mode, and found a redundant import using Erlang-mode. And d’you wanna know how I did it?
I downloaded SLIME… and the manual. I downloaded org-mode… and the manual. I downloaded Emacs… and the manual. When I need to know something, it’s in there. But now, I’m feeling less friction learning one of the most bloated and feature-packed pieces of software in history than I am trying to get a basic site up in Rails, using the community’s idioms and best practices.
At RailsConf 2009, Robert Martin gave a great talk titled “What killed Smalltalk could kill Ruby, too”, and he highlights properties of the Smalltalk culture before it died: a bunch of very talented, smug programmers who were sure they would inherit the world because their tools were so much better than the competition. You can ask the Lisp guys how that worked for them, too. But at the end of the day, Rubyists: try not to make too much of a mess, since for every person like me who comes along to join the party, 10 won’t stick around past this frustration as I have. Do the dirty work and write a damn manual, preferably downloadable as a PDF.
On BDD, generally
All that being said, I can’t help but be skeptical about the advantages of BDD in the first place. A lot of it is well said in this blog post, but I feel like BDD doesn’t offer a whole lot of advantages over TDD, and the criticisms of TDD seek a more powerful answer.
I listed two criticisms at the top of the post. Tackling them in reverse order:
As my smarmy description made clear, nontechnical clients shouldn’t touch (or have a need to see) the executable bits of your testing unless they really want to. If I want to hand my clients a list of requirements to confirm that I understand them, I’ll write them a proper document thank you very much, not send them a bunch of braindead-sounding cukes, since I don’t think it’ll be of any consolation to them that I’m “running” that document. Write cukes for yourself if it helps you, but don’t put them in the client’s face since, frankly, I doubt they care. If they seem to care, they probably want something better suited for their needs.
The second, more compelling point is on specification vs. implementation. Here there is fertile ground to improve upon unit tests, but I think it would be by no longer having the developer hand-write the test implementation in the first place. Many languages now have tools like Haskell’s QuickCheck or Erlang’s PropEr where you declare the properties of the function you test, and the tool can generate hundreds of random test inputs, usually finding a lot more corner cases than you expected. We automate building our software, we use fuzzers to find corner cases we didn’t expect in security, why not automate the drudgery of coming up with unit tests?
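To give a feel for the property-based idea, here is a hand-rolled sketch in Ruby. It is not QuickCheck’s or PropEr’s API, and it has none of the counterexample shrinking those tools do; check_property and its parameters are names I made up, and the generator only produces short arrays of small integers.

```ruby
# Hand-rolled property check: throw many random inputs at a property block
# and return the first input that falsifies it, or nil if none does.
# check_property, trials: and seed: are hypothetical names for this sketch.
def check_property(trials: 200, seed: 42)
  rng = Random.new(seed) # fixed seed so a failure can be reproduced
  trials.times do
    # Random array of 0 to 9 integers, each between -100 and 100.
    input = Array.new(rng.rand(10)) { rng.rand(-100..100) }
    return input unless yield(input)
  end
  nil
end

# A property that holds: reversing a list twice gives back the original.
reverse_counterexample = check_property { |xs| xs.reverse.reverse == xs }

# A property that is false: "sorting never changes a list".
sort_counterexample = check_property { |xs| xs.sort == xs }

puts "reverse twice: #{reverse_counterexample.inspect}"
puts "sort is a no-op: #{sort_counterexample.inspect}"
```

You state what should always be true and let the machine hunt for the input that breaks it, instead of typing out the three cases you happened to think of.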
Oh wow. I never knew that Rubyists had a Cobol fetish. I think I liked my brain better when it didn’t contain that knowledge.
It just feels like so much typing for not much power. “Natural”-sounding executable text doesn’t strike me as a critical gain on the weaknesses of TDD.
Despite all my grousing, I’ll stick through it. Watch me write a blog post in a few months swearing by it, like my Twitter conversion. Just wanted to get this off my chest.
Here’s a long overdue post: what the hell is up with the title to your earlier post, mainly the term observational indistinguishability? I admit to mentally fapping a bit; I’ll try my best to explain the term here since it right well blew my mind when I learned it.
Observational Indistinguishability, as it sounds, is the principle of two or more entities being indistinguishable from each other (you can’t tell which one is which) by any amount of observation. It’s really just a more formal way of saying a group of things are equal in any way that matters. The magic of this is that the extra formality (that OI is not the same thing as equality) is absolutely critical. I’ll show why, using two examples in CS.
The first is pseudorandomness. This is a word everybody says colloquially, probably unaware that it means something very precise, and solves a major theoretical hurdle of cryptography.
That hurdle is this: most crypto constructs need random data in many places, but how do you reliably, consistently get a truly random stream of data? The short answer is that you can’t: every method of gathering the random data will contain patterns that ‘leak’ from whatever method you used. An example of this is pulling numbers out of your head: you may think they’re random, but if you do it for long enough, you’ll start falling into human behavioral patterns that let a smart-enough person predict your next number with better odds than they would have if it were actually random. Even if they couldn’t immediately, there’s no proof that they would never be able to, given a long enough time. And saying it’s ‘mostly random’ without qualification isn’t good enough: In the Game of Cryptography, you win or you die!
So cryptographers got smart: they just lowered the bar to something that’s just as good, in practical terms. Rather than demand actual random data, they created pseudorandom functions which, while provably not actually random, can also be proved to be such that no polynomial-time function (computer-speak for “any computer program on all the world’s computers for several lifetimes”) could ever tell the difference*.
And with that, a bunch of slacking cryptographers eagerly lost their excuse to sit on their asses, and went on to build secure cryptosystems and hash functions on top of a mathematically precise “random enough data.” Remember: even though we know it’s not random, it really doesn’t matter because we couldn’t tell the difference even if we tried.
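To make “telling the difference” concrete, here is a sketch of a distinguisher going up against a deliberately weak generator. The linear congruential generator and its constants are assumptions picked for illustration, and looks_like_lcg? is a made-up name. Nothing this simple is cryptographically secure, which is exactly why a five-line program can catch it.

```ruby
require 'securerandom'

# Deliberately weak generator: x_next = (a * x + c) mod m.
# The constants are illustrative assumptions, not a recommendation.
LCG_A = 1_664_525
LCG_C = 1_013_904_223
LCG_M = 2**32

# Produce `length` values of the weak generator starting from `seed`.
def lcg_stream(seed, length)
  stream = [seed % LCG_M]
  (length - 1).times { stream << (LCG_A * stream.last + LCG_C) % LCG_M }
  stream
end

# The distinguisher: says true when every value follows from the previous
# one by the LCG rule. It runs in time linear in the stream length.
def looks_like_lcg?(stream)
  stream.each_cons(2).all? { |prev, nxt| nxt == (LCG_A * prev + LCG_C) % LCG_M }
end

weak_stream = lcg_stream(2011, 16)
# Stand-in for "actually random" data, taken from the OS via SecureRandom.
os_stream = Array.new(16) { SecureRandom.random_number(LCG_M) }

puts looks_like_lcg?(weak_stream)
puts looks_like_lcg?(os_stream)
```

The distinguisher flags the weak stream every time, while data from the OS only passes by astronomical coincidence. A generator worth the name “pseudorandom” is one for which no efficient program like looks_like_lcg? does noticeably better than flipping a coin.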
The second example is more meaningful though, because it’s a bit more general: it comes from my Programming Languages seminar, where we frequently reasoned about the semantic meaning of programs using operational semantics. You’d frequently get a program written in one of a few ways and ask yourself questions like “what does it do,” or “how can we add X feature to the language and preserve all the previous properties?”
To do this, you’d have to understand what a program is doing in relation to another program written with the same rules. Here’s an example: are two functions equal, in terms of their semantic content? Do they do the exact same thing, from an inputs/outputs point of view? This isn’t a trick question, answer the best you can and you’re probably right:
function example_one() {
  var x = 4;
  return x + 1;
}
function example_two() {
  return 4 + 1;
}
The answer is yes, they are equal in terms of meaning, but here’s the real problem: what does ‘equal’ mean? Any attempt we made in class reverted to intuition (“come now, we all know what it means”) or synonyms (“when they are the same. And they are the same when they are… equal…”).
Observational Indistinguishability lets us come up with a suitable definition without having to resort to defining equality. In this case they are observationally indistinguishable when for all programming contexts in the language, they will both evaluate successfully or they will both fail to evaluate. In other words, for a set of evaluation rules M, two programs are ‘unequal’ if you could write a program using M such that one of your functions will run to completion, but the other will “crash” and fail to evaluate. If you can’t produce such a program, they are “equal.”
Let’s try now with two unequal functions:
function example_one() {
  return 4;
}
function example_two() {
  return 3;
}
Now these are clearly not equal, but let’s show this without the notion of equality. We’ll construct a program that works under one function, but not the other. Simple enough**:
function test() {
  return 1 / (4 - example());
}
If you’re using example_one(), the program crashes (evaluation is impossible), while example_two() hums along smoothly. Since we created a context where one example evaluated and one didn’t, we know that these provide semantically different behavior. A few things to note about this:
It makes no constraints on syntax, or even the specifics of evaluation rules: so long as a set of rules exists, this definition works for any program written with those rules.
It puts the focus of equality on the meat and bones of the language: the evaluation rules and its primitive operations. example_one() and example_two() would actually appear equal if the language, for example, didn’t support division, and instead only supported addition and subtraction between numbers. To you, as a language engineer, this makes you wonder what the point of including numbers or addition in your language is at all when the difference between 3 and 4 can’t crash any program you can construct in it.
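The whole argument can be sketched as runnable code. I’ve used Ruby here because its Integer division really does raise ZeroDivisionError, so the “crash” is an honest one. The helper names outcome and distinguishes? are my own, and the four lambdas mirror the two pairs of example functions above.

```ruby
# Run a context with a candidate plugged in and keep only what an observer
# can see: did evaluation finish, or did it crash?
# outcome and distinguishes? are hypothetical helper names.
def outcome(context, candidate)
  context.call(candidate)
  :ok
rescue ZeroDivisionError
  :crash
end

# A context distinguishes f from g when the observable outcomes differ.
def distinguishes?(context, f, g)
  outcome(context, f) != outcome(context, g)
end

# The "unequal" pair from the text.
returns_four  = -> { 4 }
returns_three = -> { 3 }

# The "equal" pair from the text.
via_variable = -> { x = 4; x + 1 }
via_literal  = -> { 4 + 1 }

# The test context: 1 / (4 - example()).
division_context = ->(example) { 1 / (4 - example.call) }

distinguishes?(division_context, returns_four, returns_three) # => true
distinguishes?(division_context, via_variable, via_literal)   # => false
```

One distinguishing context is enough to prove two programs different. The reverse doesn’t hold: a single context that fails to tell them apart proves nothing, since the definition quantifies over every context the language can express.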
So to come back full circle, I just thought the original story was cute because a very studied, full-of-ideas dramaturge got played so hard by a process that was the result not of equality of scripts, but observational indistinguishability, which makes me wonder how important dramaturges are to the process to begin with.
* = A little disclaimer: they didn’t prove that no polynomial time function could ever stop it, just that if anybody could come up with a way to do it, they’d first have to solve a Famous Unsolved Problem We’re Pretty Sure Doesn’t Have An Answer, like discrete logarithm.
My friend made a joke on cryptography proofs: “We haven’t proved they can’t be broken, just that nobody has done it yet. By this logic, I’m immortal!”
**= IIRC, Javascript implementations represent all their numbers as floats, meaning that 4 - example_one() is still exactly 0, but 1 / 0 evaluates to Infinity rather than raising an error, so the program won’t actually crash. Ignore, please.
In contrast to the other part of this double-post, let’s talk about something a little more familiar to the folk at home: failing at life. This is pertinent to me because of this:
That is my degree from Brown University, a combined BS in Computer Science/BA in Music. But didn’t I graduate last year?
Well, technically no. Despite passing higher-level neuroscience and graduate-level programming languages, I failed high school calculus, which was a prerequisite to studying computer science at Brown. Not a requirement, mind you: a prerequisite. Meaning that in theory, I needed to complete it before being allowed to study computer science.
(the resolution, if you don’t know, is that I took the course over the summer, and passed. Because of this, I didn’t graduate in the class of 2010, but technically in the class of 2011 as a “non-enrolled student.”)
What surprised me most, and (what I think) is the most valuable lesson I got from Brown, is that failing hard is one of the best things that can happen to you. Only when you’re kicked in the teeth and actually pushed to your limit do you know what it means to truly confront your demons, your weaknesses, and test your actual mettle.
Put another way: are you proud of the fact that you took a shower this morning? Do you look back fondly on the last time you tied your shoes? No, because those presented no challenge; completing those tasks presented no new knowledge of the world, and no new knowledge of yourself. If you think back to anything you’re proud of, it’s almost always been a challenge.
The fact that hard things are what make you proud isn’t really anything new (nor was it before I failed). The key realization for me was understanding that failing presents a much more heightened sense of accomplishment. To wit, I was incredibly proud of my high school production of Arcadia; but no matter how hard or challenging that was, I’ll always be more proud of the fact that I’m programming professionally, if only because I was already good at acting/directing, and had never been bad at it. Leaving your comfort zone and being truly uncomfortable is a growth experience too many bright people won’t experience.
They have a saying in software testing: a successful test case is one that finds a bug in the program. This means that when you test software, you remove all formalities of being a kind or understanding person and put on a different hat, thinking to yourself “what’s the meanest, cruelest thing I can do to this software? How can I BREAK THE SHIT OUT OF THIS.”
It’s a good thing to do this in life as well. Don’t be stupid, but for the good of you, take a risk every once in a while. You’ll never be more proud.