Friday, January 26, 2007

The enigma that is Google

Yes, it's the end of an era: Googling for "miserable failure" will no longer take you to George Bush's official bio. For that matter, "waffles" will no longer take you to John Kerry. Apparently Google has updated its search algorithm to fight Googlebomb attacks by bloggers.

I've always wondered how much of Google's accuracy is dependent on specific, hand-coded tweaks, which a researcher like me tends to regard as a cheat as compared to automated algorithmic techniques (albeit a necessary one in the real world). Personally, I was surprised that Googlebombing worked as well as it did to begin with; it seemed to me that Google's accuracy relied a lot more on the ad hoc stuff they've applied on top of algorithms like PageRank than they normally let on. However, the new rankings apparently were done entirely through automation. I'm curious what they did. Some very informal observations of mine about Google techniques:

  • They seem to consciously favor known "reference sources" - Wikipedia links seem to come up high for a great many searches, for instance.
  • They also seem to put an automatic penalty on any page that might be classified as pornography. Do a search for "boobies", and you will see pages about the bird dominate the top results; I strongly suspect that would never be a natural result of PageRank alone.*
  • They're disproportionately friendly to academics, and particularly folks in technical disciplines. For example, at a recent machine learning reading group, we were amused that the UC Berkeley faculty member Michael Jordan actually manages to come up fourth in a search for "michael jordan"** - as opposed to, you know, another page about that other Michael Jordan. Perhaps this is the bias resulting from folks like us running Google.


* No, I don't actually sit around searching for things like "boobies" all day. Honest.
** I may have just helped increase his PageRank, actually.
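
For the curious, here is a rough sketch of why a naive link-based ranking is so easy to bomb. It is a textbook power-iteration PageRank over a made-up toy web, not whatever Google actually runs; the page names, the damping factor of 0.85, and the iteration count are all assumptions for illustration. Real Googlebombs also leaned on anchor text, which this sketch does not model.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share up front.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "news" links to both "bio" and "encyclopedia".
web = {
    "news": ["bio", "encyclopedia"],
    "encyclopedia": ["news"],
    "bio": ["news"],
}
before = pagerank(web)

# The "bomb": twenty hypothetical blogs that all link to "bio".
bombed = dict(web)
for i in range(20):
    bombed[f"blog{i}"] = ["bio"]
after = pagerank(bombed)

print("before:", round(before["bio"], 3), round(before["encyclopedia"], 3))
print("after: ", round(after["bio"], 3), round(after["encyclopedia"], 3))
```

Before the bomb, "bio" and "encyclopedia" are tied, since each gets exactly half of the links from "news". Afterwards "bio" pulls ahead purely because a crowd of otherwise unimportant pages points at it, which is the kind of effect a ranking tweak would have to discount.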


Tuesday, January 23, 2007

The ultimate mix tape generation service

Last night, at a dinner party, my dear friend Leah handed me something I realized I hadn't been a recipient of in a while: a mix tape. More accurately, a mix CD. The last mix-recording-delivery-instrument I recall getting was a tape from a girl in high school back in 1995, which, if I recall correctly, featured James Brown and Seven Year Bitch back to back. I had nearly forgotten the art of the mixtape.

At any rate, all of this reminded me of Tiny Mix Tapes' automatic mix tape generator, which has provided some entertaining mixes, such as "Songs for the guy who has the worst gas in the world, but you love him anyway" and "Songs for sleeping in a hammock".

And thanks, Leah, for the tunes; it's a lovely, laid back mix. Were I the Tiny Mix Tapes generator, I might produce it in response to "Songs to soothe me whilst on a long-distance car trip at 6 AM, driving bleary-eyed and perhaps a bit hungover and waiting for the caffeine to kick in." But I had no titles for the tracks; who was responsible for that cover of "Good Morning, Good Morning"?


Friday, January 19, 2007

Software bugs and misprinted cakes

So apparently the bakeries at Wegmans - which, as any upstate New Yorker knows, is the finest purveyor of groceries the land over - employ software to handle automatic cake decoration. Customers can email requests for cakes and provide the desired text to decorate it with. Unfortunately, the software cannot handle Microsoft's nonstandard HTML extensions, so that when someone tried decorating with an Italian greeting featuring nonstandard characters, this was the result.


Wednesday, January 17, 2007

The world's most boring publication

A classic from 1955, courtesy of the RAND Corporation: A Million Random Digits with 100,000 Normal Deviates. It's hard to believe that, at one point in time, a book containing nothing more than an obscenely long string of random digits was a genuinely significant contribution to humankind, but be not deceived: effective random number generation is hard. It's hard enough to require a 131-page specification courtesy of the National Institute of Standards and Technology. And in an era when one couldn't count on having a cheap, fast PC to run well-established algorithms on, a book like this was indeed a blessing. I'm tempted to buy this to read on airplanes, just to creep out the person sitting next to me.
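
To give a feel for how easily a generator goes wrong, here is a minimal sketch of a linear congruential generator, one of the classic textbook methods. It has nothing to do with how RAND produced their digits or with what the NIST document specifies; the tiny parameters (modulus 16 and the two multiplier/increment pairs) are assumptions picked so the behavior is easy to see.

```python
def lcg(seed, a, c, m):
    """Linear congruential generator: state -> (a * state + c) mod m."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state


def period(seed, a, c, m):
    """Count how many outputs appear before the sequence starts repeating."""
    seen = {}
    for i, value in enumerate(lcg(seed, a, c, m)):
        if value in seen:
            return i - seen[value]
        seen[value] = i


# Same modulus, same seed, two different (hypothetical) parameter choices.
careless = period(seed=1, a=3, c=0, m=16)  # cycles through only a few values
careful = period(seed=1, a=5, c=3, m=16)   # visits every value 0..15 once

print("careless parameters repeat after", careless, "numbers")
print("careful parameters repeat after", careful, "numbers")
```

Both generators look equally plausible on paper, yet one of them gets stuck in a short loop. Even the better one is nowhere near good enough for serious use, since its low-order bit simply alternates, which hints at why people once preferred a carefully vetted table of digits.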


Tuesday, January 16, 2007

In other news, it's snowing here again.

Ah, the joys of working at home. It's freeeeeezing out. This is very un-Seattle.

The view from my window:

The view of Tashkent Park from the front door:


Pop producers are now sampling C64 MOD files

for their hits. Example: Timbaland blatantly ripped off a Finnish MOD writer for a Nelly Furtado song, and failed to give credit where it was due. (Link contains MP3s for comparison.) It wasn't just a minor sampling; pretty much the entire melody line was borrowed. How easy would it be for an average hobbyist like this guy to defend his copyright? Since copyright owners have gotten vigilant about sampling royalties, and court decisions have changed things to the point where famous albums couldn't have been produced today, perhaps they'll be opting to borrow from artists who aren't in nearly as good a position to defend their copyright claims. (Via Waxy.)


Sunday, January 07, 2007

Don't buy Pop Tarts.

Happy belated New Year. Yes, I've taken a break from blogging. I intend to post more, and in more depth. However, with a couple of paper deadlines approaching, and me having to begin my search for postdoc/research positions and write a thesis, I may have less time for plunging into the random nonsense I've oft posted in the past.

Having said that, here's some random nonsense: Jenni was watching SpongeBob SquarePants today, and at the commercial break I caught an ad for Kellogg's Pop-Tarts that surprised me: it was in the style of famed animator Don Hertzfeldt, of "Rejected" fame. Had Hertzfeldt, who has refused to do commercials for years, finally sold out, I wondered? Turns out no - they shamelessly ripped him off. For shame, Kellogg's! You can find the ads on the plagiarists' own website; notice the site is designed in Hertzfeldt style.
