Love in the Time of MapReduce

This piece was written for an internal Google fiction contest, for the 100th edition of the engineering newsletter. The call to arms arrived in my inbox like so:

For this special Eng Newsletter issue, we’re running a “google eng-y” short fiction contest. You can write about anything, but the story must begin with these two words: “The MapReduce”.

Please note that some meaning may be lost on non-Googlers, notably the bits concerning company hierarchy. All the opinions expressed are my own and obviously do not constitute the workings of an actual Google plan, etc. Jeff Dean is a very nice man. This is a piece of fiction in almost every sense.

The MapReduce was a piece of technology whose existence its steward, Jeff Dean, sometimes begrudged. It was glamorous, in a way, to be the public face of the algorithm that had essentially rewritten interpersonal contact, but it was also draining and surreal.

In one of Jeff’s increasingly common attacks of perspective, he realized that his daughters, too, had been completely swept up by a thing that he himself had designed, built, and evangelized. They were, of course, perfectly happy with the product. Jeff noted this with a tinge of grim pride, remembering the long nights of trial runs. Victoria and Natalie were a bit too happy, Jeff mused, so completely satisfied with something they could never understand (indeed, that he himself no longer understood well), that he found their lack of doubt troubling. Why didn’t they care that it probably shouldn’t work, that time and computation could twist statistics in such a fundamentally disturbing way? It was probably due to both of them being so preoccupied with Natalie’s wedding, he concluded wearily.

Later, outside of his office, Irina was waiting for him.

“Jeff, you have a visitor waiting for you in your office,” she said. Something in her tone gave away the urgency of the situation, and Jeff nodded, having long grown used to trusting Irina to manage his calendar more deftly than he could tie his shoes.

His suited visitor was a trim man of about sixty, which was unusual enough for the Googleplex in terms of both age and dress. He wore his graying hair swept back and neatly cropped. With a start, Jeff realized that his visitor was none other than a senator from Iowa.

“I’m Robert Graves, and sorry about showing up so unannounced, Mr. Dean,” said the man, with a smile. Jeff paused for a moment to admire how finely countenanced the man was, and to feel a small thrill at being so delightfully underdressed, himself.

“You’re the senator pushing for patent reform. I don’t watch TV much, but I’ve seen you on when my wife watches the news.” Jeff shook his visitor’s hand and seated himself behind his desk.

“The very same. Look, I’ll spare you the pleasantries and get right to why I’ve come. I’m told that engineers prize truth and directness.” Jeff lifted an eyebrow at this, having found that lately he valued being left well enough alone better than both of those things. “As you well know, MapReduce is proving problematic, socially. FOX is filming a reality TV show at this very moment about an engaged couple who are convinced that after trying out their MapReduce partners, they’ll still want to get married.”

“Jesus. How’s it looking for the couple?”

“Not good. Even worse, they’re filming it in my hometown.” Graves massaged his temples.

Jeff was not surprised. MapReduce rarely erred. Though it had begun as a general-purpose framework for parallelizing search index updates, it eventually lent itself to analyzing the massive amounts of user-generated social data Google+ collected. In time, this would become all that MapReduce was known for (at least outside of Google), in a queer reversal of how the words escalator and aspirin came to describe all such contrivances, though they were once only brands.

“Basically the right is getting as much fuel as it wants for its eternal fire of shouting about our perpetual moral decay. On top of that, MapReduce is having a powerful economic impact, which doesn’t help. We’re having an employment problem, as you’ve doubtless inferred by now, since you must have all the numbers on how many people are using MapReduce to pair up.”

The first Jeff had heard of the phenomenon the media had dubbed the “honeymoon effect” had been from the news itself, but he nodded anyway. “My citizens are up and leaving jobs they’ve worked at for a decade to meet their dreamboat on the other side of the world. I mean, great for them, but our coffers weren’t in great shape before, and your invention is a drain we can’t possibly afford right now, never mind the bad press. As much as I am for the future, I desperately need you to stop operating in my state.”

Jeff suppressed the urge to tell the man to just contact press@google.com, and instead reluctantly launched into a narrative he had delivered many times before. “I’m sure you’ve seen and read all the press releases about this. What we do isn’t terribly new. We provide a service that users want. In a sense, we provide nothing more than what eHarmony and Match.com have been providing for years, just with much less uncertainty and a bigger candidate pool.”

Robert snorted. “I’d hardly call ‘every Google user on Earth’ a bigger pool. Your operation is different, too. You know all the things that people have searched for, and all the things they’re too ashamed to search for. You know why some actresses draw men to them, and which men women will wait hours to receive texts from. Those bankrupt dating sites had only the constructed personas of the desperate to work with. There’s a case that could be made here for unlawful invasion of privacy and monopolistic abuse of information.”

A Googler rode past Jeff’s window on a small yellow bicycle. Jeff focused on the bright colors to briefly escape his current uncomfortable tension.

If Graves was right about anything, it was that MapReduce was uncannily effective. Through what some people might call sorcery, or what Jeff’s team leads described as “massively parallel Bayesian-adapted machine learning plus deep social mining,” it was able to identify, with nearly 97% confidence, a lifetime romantic partner for any given user. The algorithm could even supply just the right amount of shared interests as conversation starters, while leaving enough unsaid for the nascent couple to discover independently, leaving them feeling as if they had come to know each other intimately of their own volition. Some people found this deeply unnerving.

Even those people commonly derided by society could find love in this way, though MapReduce might take weeks instead of seconds to produce a suitable pairing. People of every sexual deviancy and every personal vice were being matched up, to the horror of the many people alienated by the brutal efficiency of MapReduce’s perfect lack of bias.

In short, romantic fulfillment was, for most people, little more work than clicking “I’m feeling lucky” and buying a plane ticket. This is what the people wanted more than anything else. Graves knew it, and Jeff knew that Graves knew it. Furthermore, Jeff knew that Graves was powerless to do anything about it, so strongly did the public crave MapReduce’s presence in the world. Yet Jeff felt sympathy for Graves’s willingness to shoulder the impossible task of squaring the budget against falling revenues and changing social tides.

“Mr. Graves, I understand your dilemma. The last thing you need right now is the income rug pulled out from under you. But look at it this way: about half of those people who have gone and paired off will probably come back to their hometown, bride or husband in tow, so your population will probably end up about even. After these couples outgrow their honeymoon period, they’ll settle down, work, have kids, and spend with an intensity that only the truly content can bring to bear. In the coming decade, your books might even make it into the black.”

Robert was not easily placated. “Can you say for certain that this is the way it’s going to play out? The world has never seen this kind of mass social movement. What if the people become complacent instead of motivated? What if your algorithms can’t guarantee long term stability?”

Jeff had an inner conflict. As usual, the side favoring the least amount of social friction won out. “We’re the ones who managed to pair everyone up so well in the first place, aren’t we? The models say the population will eventually converge on a higher level of stable productivity. I can’t promise you it’s going to happen, of course, but here at Google we have pretty high hopes for the future.”

The two men talked in this way for some time. The elder statesman pushed and the younger (but not exactly young) engineer deflected until the senator grew weary or satisfied enough to defer discussion to a later date. Jeff had managed to end the meeting with only vague promises, a surprising talent that had earned him his relative autonomy from Larry Page’s inner circle. Later he would have to file a report, naturally, detailing the intricacies of his conversation with the senator, but for now Page trusted him to keep third parties at arm’s length on his own.

Later that evening, after a quiet supper with Heidi, Jeff lay in bed thinking. The models actually didn’t say much about the economic reality of the future. The social data that allowed his team to pair people so effectively seemed to shrug mutely at the problem of what the future might be like. He had assured Graves that everything would be fine, but by the time Jeff could be proved wrong, he would be long retired.

Sleep took him. He dreamt, which was not unusual (though he didn’t know it), but he also remembered his dreams from that night, which was. He dreamt of a young man smashing a perfect chalice in a decrepit hallway, and of women who laughed while they danced away from their homes.

When he woke, Jeff knew what he had to do.

Sanjay probably could be trusted, but Jeff couldn’t take the chance. He would split his change into pieces, and sneak them into other, tangentially related changelists. The other developers on his team would probably rubber stamp these, anyway, since Jeff was one of the most prolific programmers there was. Who would look at yet another Jeff Dean code review too closely?

What did it mean to adhere to Google’s famed “Don’t be Evil” policy, when it came to arranging marriages? The standard Google answer would be to make the user as happy as possible without violating their trust. But what trust was there to violate if users themselves didn’t know what they wanted in relationships, or what would truly make them happy? Marriages are long-lived beasts, Jeff reasoned, subject to slowly building changes in the macroeconomic climate. If marriages affect the economy, and the economy affects all marriages, what should you optimize for, and how?

Jeff’s changelists were approved, as a matter of course. Years later, he retired. The day he first noticed what might have been the fruits of his subversion beginning to ripen, he remembered a thing that his old mentor Urs loved to say, before Urs had left him in charge.

“It is better to ask forgiveness than permission,” Hölzle would often chuckle, in a particularly German way. Jeff chuckled now, too.

Wired was doing a bio piece on a recently minted tech millionaire. The man was one of the few people for whom MapReduce’s pairing hadn’t worked out in the long term. When asked what had motivated him to start the company he had just sold, the man somewhat abashedly said that he wanted to prove to his ex-wife that dumping him was a mistake.

Ambition and talent sometimes survive contact with love, Jeff mused, but are more often dulled by it. MapReduce could identify those individuals who are defined by intelligence, drive, and pride. In other words, the archetypal entrepreneur. A few modified terms in a complex linear algebra equation could yield surprising results, Jeff had discovered, like optimizing for romantic partners that would net the largest increase in a person’s ambition, rather than happiness. A lot of the unfortunate people of talent singled out by Jeff’s modification would probably yield little value, but one, he hoped, would build the next Google. Jeff longed to see that day.

Arguing for Immortality

I want to live forever. I’ve always thought that not dying was a pretty obvious thing to want. To my surprise, I’ve found that a lot of people whom I usually agree with on most topics strongly disagree with me on this one. Rather than write yet another piece extolling the virtues of a far-future post-scarcity post-singularity world, I thought I’d just document some of the objections to immortality I get and my counterarguments.

Note that for the purposes of giving my conversational partners opportunities to disagree, I typically posit a form of immortality where you, and you alone, are presented with the option of eternal youth, with no possibility of suicide. You constantly regenerate to perfect health at the prime of your life. There are a lot of potential ways we might go about not dying, but people tend to find objections to this particular flavor more readily than the others. Please assume this working definition for the below.

Watching your loved ones die

I get this one the most. The argument goes like

I can’t stand the thought of having to watch everyone I love die. Can you imagine living forever with everyone you used to know gone? How could you deal with the pain of building lifelong relationships and then seeing them disappear, forever?

My response to this is essentially, “Newsflash, you already are going to have to deal with that.” If you are above average in caution, intelligence, or just plain old fitness, odds are you are going to have to watch a fair share of your friends die and still continue living for a significant period of time afterwards. Statistics don’t lie. Accidents and cancer, while tragic, are inevitable.

As for having to carry the burden forever, we can again look to reality to answer this objection. “Time heals all wounds,” is an adage for a reason. If you can deal with a friend being dead for twenty years (as people already do), it seems pretty likely you’ll be able to deal with it for an indefinite future. As you live forever, you’ll make new friends and get over the loss of old ones. It seems unlikely that it will be year 2769 of missing Fred that finally pushes you over the edge, forever. What will probably actually happen is you just get really good at coping.

I’ll get bored and sick of life

This one is simple enough. It goes

I feel awful enough when I’m bored as it is. If I live forever, won’t I eventually just get bored after having done literally everything? I can’t even imagine my life past [X] years.

The way I deal with this is essentially Moore’s Law. Each day, literally a year of video is uploaded to YouTube. Every year, more books are written than you could possibly read in a regular lifetime. As society increases exponentially in complexity, the generation of interesting content explodes. It will be impossible for you to do everything when society invents new experiences faster than you can experience them. Sit back and enjoy the ride, you’re immortal.

As for the people who can’t even imagine their long term prospects right now, I say that this makes them perfectly suited for immortality. If you lack a long term vision and focus on the short term, you’ll never be dragged down by your long past. The best mindset for happiness over a long time period is always to enjoy the present.

Death’s inevitability gives life meaning

Now we start waxing philosophic with stuff like,

There is no good without evil. Having death exist gives my life meaning just like scarcity lends gold-backed currencies their value. I need to know that there is an end because it gives me a sense of urgency to my life. Without it I would just stagnate.

To me this is like saying “shitting gives eating meaning.” Does it? Evolution would beg to disagree if it were anything but a blind selection process. When was the last time you sat down for a great meal and said to yourself, “Thank god I’m going to die one day or this meal would be devoid of meaning”? What you do with your life is what lends it value or meaning, not the eventuality of death.

Besides, if you believe in death being necessary to lend your life value, why go with the arbitrary time limit of your natural lifespan? No one who has fielded this argument against me has agreed to just kill themselves when they feel like they’ve basically achieved what they’ve wanted to in life. The fact of the matter is that everyone chooses life over death on a day-to-day basis, and actions speak louder than words.

What about the afterlife?

This one can be touchy, for obvious reasons. Let’s go with the reasonably neutral objection of,

Suppose there is an afterlife. By taking immortality, I would be missing out on a greater truth and the greatest party in or outside of the universe.

Sure, you might be. But most religions (weighted by subscribers) don’t fault you for not dying. If your purpose in living is to live according to your religion, immortality doesn’t get in the way of that. Furthermore, at the end of eternity, which is a construct that should be available to gods, it seems reasonable that you will receive your just reward in any case. Religions also commonly posit Judgment Day scenarios in which all living humans will be placed into various afterlives. It seems reasonable here to assume that if a god so wishes it, your immortality will be revoked.

I’ll be too different from everyone else

There are a couple variations on this theme. In its simplest form,

If I live forever and no one else does, I’ll be endlessly meeting new people with the foreknowledge that I will outlive them. I’ll be so fundamentally different from the average mortal person that I won’t be able to form meaningful relationships. Nobody will understand me!

This one is more of a deal breaker on a case-by-case basis. If a person really feels that they will be well and truly alienated by being immortal, well, it’s hard to tell them that they won’t be. It’s kind of self-fulfilling. That said, there are a few things you can say.

  • People actually will understand you

    If there’s anything humans are good at, it’s getting used to stuff. In short order you and your future peers will be cracking wise about your absurd inability to die over whatever the future equivalent of beer is. Maybe mechano-beer. I’m pretty sure that after an eternity of explaining yourself to mortals, you’ll be extremely proficient at helping other people understand who you are.

  • Science will eventually catch up

    Lifespan-lengthening technologies only need to reach the point where they can increase a person’s lifespan by one year for every year of research before the masses become practically immortal. Wait long enough and you and your magical wish-granted immortality will be in good company.

  • Being different is an asset

    You can honestly just play your immortality for international celebrity. Make a bid for president. Be rich. Honestly, being too much like other people basically sucks. Being exceptional will make you popular, not the opposite.

Disaster scenarios

This one is kind of morbid. I thought about simply forbidding it by adding a clause to the immortality definition, but it’s interesting enough to keep in. In a few words,

What if everyone but me gets wiped out? I don’t think I could live as the only human being alive. Also, what if the government kidnaps me and uses me as a lab rat? What if I piss off some people enough to get me tortured forever? There might be times where I might actually need to kill myself, even if I would normally find living forever appealing.

Coming up with a bulletproof answer to this one is hard. It’s very tempting to say that no actual implementation of immortality will actually prevent you from dying in literally every case or rob you of the option of killing yourself. But really, that would be cheating.

A real counterargument has to take the form of, “You already run similar risks today, as a mortal.” You can be used as a medical experiment or tortured this very day, and they can keep you alive for a very long time. As an immortal, it’s likely that you will eventually outlive your captors.

Also consider that it’s far more likely that any government or entity with power will want your cooperation instead of your incarceration. You could be the world’s greatest astronaut or the guy who gets sent into nuclear reactor cores as they melt down.

However, the extinction of the human race scenario actually does throw a wrench into this argument. To this, all I can really say is that the minuscule risk you run of a mass extinction is just a risk you’ll have to swallow in order to reap the vast rewards of immortality. Maybe there’s other life out there that you can find. Maybe in your spare time you’ll engineer a way to rebuild the world.

Wrap-up

This is a fun topic for me, and I hope I’ve been interesting without being too offensive to people on either side of the coin. This topic is important in the sense that once you accept that immortality is a pretty good idea, you open the door for a lot of other interesting arguments to be made about rationality, ethics, and just general decision making. If you feel I’ve made some sort of egregious error or just want to weigh in for any reason, please feel free to comment below.

EDIT: The Hacker News discussion of this page is fairly interesting. Give it a try, too!

Facebook Puzzle: sophie

This week: Facebook’s sophie puzzle. This one is “buffet” difficulty, which translates roughly to “the underlying problem is NP-complete,” which explains why I have such a hard time choosing food at sushi buffets. In any case, the problem is to find your cat in your apartment, given how likely the cat is to be in each location and the transit times between the various locations in your home.

I’ll document here the various bad solutions I came up with on my way to a decent one, and as a bonus: an optimized-ish version in C++!

About the math

The problem asks you to minimize the expected time to find sophie. What does this mean? Take a look at the example input (comments mine).

4
#node name    #probability sophie is there
front_door    .2
in_cabinet    .3
under_bed     .4
behind_blinds .1
5
#node x    #node y       #time between x and y
front_door under_bed     5
under_bed  behind_blinds 9
front_door behind_blinds 5
front_door in_cabinet    2
in_cabinet behind_blinds 6

This says there are four nodes and five edges between those nodes, and that 40% of the time, sophie is going to be under the bed. If sophie was under the bed 100% of the time, the optimal path to minimize the expected time to find her would be just the path that takes you to the bed in the shortest amount of time. But since some nodes are unlikely to hide sophie, you can afford to take your sweet time getting to them.

For this sample input, the optimal path is front_door, in_cabinet, under_bed, behind_blinds. Note that to go from in_cabinet to under_bed, you should pass through the front_door node. The expectation for this path is 6.00 seconds, as shown in this snippet from David Eisenstat’s site:

Pr(front_door) * 0
+ Pr(in_cabinet) * Distance(front_door, in_cabinet)
+ Pr(under_bed) * (Distance(front_door, in_cabinet)
                   + Distance(in_cabinet, under_bed))
+ Pr(behind_blinds) * (Distance(front_door, in_cabinet)
                       + Distance(in_cabinet, under_bed)
                       + Distance(under_bed, behind_blinds))
    = .2 * 0 + .3 * 2 + .4 * (2 + 7) + .1 * (2 + 7 + 9) = 6.00
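
The same computation, as a quick Ruby sketch: the expected time of a visiting order is just the probability-weighted sum of the arrival times at each node. The dist and probs names here are hypothetical stand-ins for the all-pairs shortest distances (built in the next section) and the per-node probabilities.

# Sketch: expected time to find sophie when visiting nodes in `path` order.
# dist[x][y] is assumed to already hold the shortest time between x and y,
# and probs[x] the probability sophie is at x. The starting node contributes
# nothing, since sophie is found there at time 0.
def expected_time(path, dist, probs)
    time = 0.0
    expected = 0.0
    path.each_cons(2) do |from, to|
        time += dist[from][to]
        expected += probs[to] * time
    end
    expected
end

# For the sample input:
#   expected_time([:front_door, :in_cabinet, :under_bed, :behind_blinds],
#                 dist, probs)  #=> 6.0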

Building the graph

The input only includes edges between particular nodes. In order to know that, say, the distance between the cabinet and the bed is 7 (through the front door), you have to build up the shortest distances between every pair of nodes in the graph. This is known as the “all-pairs shortest path” problem. There exists a quite famous dynamic programming algorithm to solve this, the Floyd-Warshall algorithm. Trivially implemented in Ruby:

# note that $weights[x][y] is initialized to either
# Float::MAX if there is no edge between x and y, or
# to whatever the length of the edge is if there is.
def floyd_warshall
    for k in 0..$num-1
    for i in 0..$num-1
    for j in 0..$num-1
        if $weights[i][k] + $weights[k][j] < $weights[i][j]
            $weights[i][j] = $weights[i][k] + $weights[k][j]
            $next[i][j] = k
        end
    end
    end
    end
end

# links memoizes the set of intermediate nodes you need
# to traverse between nodes i and j
def links(i, j)
    k = $next[i][j]
    return k if k.class == Set
    $next[j][i] = $next[i][j] = (k.nil? or i == j) ?
        Set.new([]) :
        (links(i, k) + Set.new([k]) + links(k, j))
end

This gives a good starting point for actually trying to start solving the problem.

BFS solution

In my hubris, I figured a breadth-first search where you expand the path with the lowest current expected time would work. Here’s what it looks like:

def solve
    queue = MinHeap.new
    queue.push 0.0, [[0], Set.new((1..$num-1).select {|n| $probs[n] > 0}), 0.0, 0.0]
    while not queue.empty?
        node, remain, time, expected = queue.pop
        if remain.empty?
            p node
            return expected
        end
        # only iterate remaining nodes that you don't need to
        # go through other remaining nodes to reach
        remain.select {|n|
            (links(node.last, n) & remain).empty?
        }.each do |n|
            new_time = time + $weights[node.last][n]
            new_expected = expected + $probs[n] * new_time
            queue.push new_expected, [node + [n], remain - [n], new_time, new_expected]
        end
    end
end

In my defense, I hadn’t yet realized that the sophie problem is a variant of the traveling salesman problem, and that a BFS would take forever on large graphs. This doesn’t work because you incrementally build all the bad paths on your way to the first path that completes. Complexity: proportional to the number of paths, or O(n!).

DP Solution

Hitting upon the realization that the problem is a variant of the traveling salesman problem, I decided to try the canonical dynamic programming solution to TSP.

The DP solution requires that you build a structure like

C[subset][j]

where subset is some subset of all nodes, and j is a node in that subset. The value of this entry should be the minimum expected time for a path that starts at node 0, ends at node j, and passes through every node in the subset. Computing C[subset][j] then reduces to taking the minimum of:

C[subset - [j]][i] # extended by the step from i to j, for all i in subset - [j]

There are some problems here, but first, some code:

def solve
    relevant = (1..$num-1).select {|n| $probs[n] > 0}
    hash = {}
    hash[[0]] = {0 => [0, 0]}
    for size in 1..relevant.size
        relevant.combination(size).each do |subset|
            subset.insert 0, 0
            hash[subset] = {0 => [Float::MAX, Float::MAX]}
            for j in subset
                next if j == 0
                reduced = subset - [j]
                hash[subset][j] = reduced.collect {|i|
                    e, t = hash[reduced][i]
                    t += $weights[i][j]
                    [e + t * $probs[j], t]
                }.min {|a,b| a.first <=> b.first}
            end
        end
    end
    hash[[0] + relevant].values.collect {|x| x.first}.min
end

This works, but isn’t fast. A 17 node graph took me about 10 minutes to finish. The problem is that since subsets aren’t ordered, there is no convenient way to represent them as array indices and you must therefore hash entire subsets. Since there are 2^n subsets, and you must compute the path that ends in each node in each subset, which itself requires examining all other previous paths of the subset minus one of its elements (breathe), the complexity here is O(2^n * n^2).
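
One common way around the hashing (not what I did here, just a sketch of the same recurrence) is to encode each subset of relevant nodes as an integer bitmask, so the table becomes a plain array indexed by mask and end node. Same globals ($num, $probs, $weights) as above, and the same [expected, time] bookkeeping.

# Sketch only: the DP above, with subsets encoded as bitmasks over the
# `relevant` nodes so entries live in arrays instead of a hash.
def solve_bitmask
    relevant = (1..$num-1).select { |n| $probs[n] > 0 }
    m = relevant.size
    return 0.0 if m == 0
    full = (1 << m) - 1
    # dp[mask][i] = [expected, elapsed] for the best path found that starts
    # at node 0, visits exactly the relevant nodes in mask, and ends at
    # relevant[i]
    dp = Array.new(full + 1) { Array.new(m) }
    m.times do |i|
        t = $weights[0][relevant[i]]
        dp[1 << i][i] = [$probs[relevant[i]] * t, t]
    end
    1.upto(full) do |mask|
        m.times do |j|
            next if mask[j] == 0 || mask == (1 << j)
            prev = mask ^ (1 << j)
            best = nil
            m.times do |i|
                next if prev[i] == 0
                e, t = dp[prev][i]
                t += $weights[relevant[i]][relevant[j]]
                cand = [e + t * $probs[relevant[j]], t]
                best = cand if best.nil? || cand.first < best.first
            end
            dp[mask][j] = best
        end
    end
    dp[full].compact.collect { |x| x.first }.min
end

The complexity is still O(2^n * n^2), but array indexing replaces hashing whole subsets, which is a decent constant-factor win.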

Recursive Backtracking (Pruning) Solution

Backtracking can be thought of as essentially a DFS search with fast failure. For example, say you have found a complete path that you know will give you an expected time of 30 seconds. Now you are attempting to build another path, and halfway through you know your partial expected time is already 31 seconds. You can abandon building this path, saving yourself the hassle of expanding all of that partial path’s children.

Skipping entire subtrees is known as pruning, and you can achieve some pretty massive savings depending on how well you implement it. My solution was to very conservatively estimate the remaining expectation of a partial path. For instance, if you are halfway through a path with a current expected time of 10 seconds, a path length of 20 seconds, and you have covered 60% of the places where sophie can be, then even in the perfect case where the next node was 0 seconds away and had a 40% probability of hiding sophie, you would still incur an additional 20 * .4 = 8 seconds of expected time. If you have already found a complete path with an expected time of, say, 10.5, you can prune this subtree where you could not have before.

And without further ado, here’s the code.

$min = Float::MAX
# node:   the node you are currently at
# remain: the nodes (with nonzero probability) you still have to visit
# unseen: the probability mass not yet covered by the path so far
# expect: the expected time accrued so far
# time:   the length of the path walked so far
def solve(node, remain, unseen, expect = 0, time = 0)
    return if expect + unseen * time >= $min
    return ($min = expect) if remain.empty?
    remain.each do |n|
        next_time  = time + $weights[node][n]
        solve n,
              remain - [n],
              unseen - $probs[n],
              expect + next_time * $probs[n],
              next_time
    end
    $min
end
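
For reference, kicking it off looks roughly like this (a sketch, assuming node 0 is the starting node and that the probabilities sum to 1): the nodes left to visit are the ones with nonzero probability, and the uncovered probability mass is everything except node 0’s.

# Sketch of the initial call. Nodes with zero probability never need to be
# visited, and sophie being at the start node costs nothing.
remain = (1..$num-1).select { |n| $probs[n] > 0 }
printf "%.2f\n", solve(0, remain, 1.0 - $probs[0])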

This works fairly well. We can do that 17 node graph in 30 seconds now. I don’t have a good estimate of the complexity improvement here, since you can generate corner cases that can prevent any pruning from happening.

Optimizations!

Ruby is slow. At least, Cygwin’s default Ruby 1.8.7 interpreter is slow. I decided to reimplement the whole deal in C++ and see what kind of speedups I could achieve. Here is my initial implementation in C++:

double solve(int node, set<int> &remain, double unseen,
        double expect = 0.0, double time = 0.0) {
    static double min = numeric_limits<double>::max();
    if (expect + unseen * time >= min)
        return -1;
    if (remain.size() == 0) {
        min = expect;
        return -1;
    }
    for (set<int>::iterator n = remain.begin(); n != remain.end(); n++) {
        int next = *n;
        double next_time = time + weights[node][next];
        set<int> next_remain = remain;
        next_remain.erase(next);
        solve(next, next_remain, unseen - probs[next],
            expect + next_time * probs[next], next_time);
    }
    return min;
}

However, the speedup here was only a factor of two or so. Where are the bottlenecks? Turns out, a big one in both Ruby and C++ is the constant recreation of the remainder set at

set<int> next_remain = remain;

It’s much faster to just do:

set<int> next_remain = remain;
for (set<int>::iterator n = remain.begin(); n != remain.end(); n++) {
    int next = *n;
    double next_time = time + weights[node][next];
    next_remain.erase(next);
    solve(next, next_remain, unseen - probs[next],
        expect + next_time * probs[next], next_time);
    next_remain.insert(next);
}

This way, you only do one set copy per recursion, and then just pass it around to all of your children. “But Vincent,” you say, “why even bother recreating the set at each recursion? Can’t you just pass along one set of remaining nodes and add and remove from it?” Sure, but it’s very annoying to avoid invalidating iterators to a set that is constantly shrinking and growing through iteration and recursion. Here’s what I came up with:

// node_entry (definition not shown) just pairs an int index with a bool active flag
double solve(int node, vector<node_entry> &remain, double unseen,
        double expect = 0.0, double time = 0.0) {
    static double min = numeric_limits<double>::max();
    if (expect + unseen * time >= min)
        return -1;

    bool empty = true;
    for (vector<node_entry>::iterator n = remain.begin(); n != remain.end(); n++) {
        if (!n->active)
            continue;
        empty = false;
        int next = n->index;
        double next_time = time + weights[node][next];
        n->active = false;
        solve(next, remain, unseen - probs[next],
            expect + next_time * probs[next], next_time);
        n->active = true;
    }
    if (empty) min = expect;
    return min;
}

This way we just store whether an element is active or not in the data model itself, instead of representing that information with presence in a set. This is nice because adding/removing from a set, while O(log(n)) fast, ain’t no O(1). This gets us (for our 17 node graph):

$ make && time ./sophie in_sophie5.txt
make: `sophie' is up to date.
38.20

real    0m0.220s
user    0m0.156s
sys     0m0.030s

Hooray!

Final Thoughts

I didn’t talk about any of the edge cases, but you need to check whether it’s actually possible to be sure of finding sophie. In particular, if there is a node sophie has a chance of being in that isn’t reachable from the starting node, you need to fail.
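
A sketch of that check, reusing the Floyd-Warshall table from earlier (unreachable pairs still hold Float::MAX). I believe the expected output in the impossible case is -1.00, but treat that exact convention as an assumption.

# Fail early if some possible hiding spot can't be reached from node 0.
def sophie_findable?
    (1..$num-1).all? { |n| $probs[n] == 0 || $weights[0][n] < Float::MAX }
end

unless sophie_findable?
    puts "-1.00"
    exit
end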

Dear submitter,

Thank you for your submission of a puzzle solution to Facebook! After running your solution to sophie (received on March 5, 2011, 8:14 pm), I have determined it to be correct. Your solution ran for 25.118 ms on its longest test case.

Also, full source and some plagiarized test cases are all on GitHub.