Monday, August 29, 2005

Housing Bubble

Posted by Phil Aaronson at 5:01 PM

New York Times OP-ED, Greenspan and the Bubble by Paul Krugman via RSS
... these days Americans make a living by selling each other houses, paid for with money borrowed from China.

Sunday, August 28, 2005

My iTunes Rating Blunder

Posted by Phil Aaronson at 10:01 AM

Found a great article on Slashdot this morning dealing with the math behind iTunes, specifically how the random selection works, and I'm very glad I read it. Unconsciously, I've been using the rating feature completely wrong. Who knew? From the article:
Most people follow a bell shaped curve for their ratings, with the 3-star rating being the most common.
I'm not sure this is true of most people. I certainly don't follow this pattern, and I checked my wife's collection and it's the same. The default in iTunes is unrated, which is how I thought of it, but in reality it's more accurate to call it zero stars. The vast majority of my collection has zero stars. And here's where my problem lay: I've been assigning four and five star ratings occasionally to songs I liked and one and two star ratings to songs I didn't, and rarely at that. Little did I know that I was actually increasing the chances of the songs I didn't like getting played. Check out their figure 1: a song with one star is three times more likely to be played than a zero rated song.

The correct use, for the way I've been using this feature, would be to set everything to three stars, and assign four and five stars to the songs I like and zero, one, and two to the songs I don't. There really should be a preference setting so that any new songs I import get the three star rating automatically. Or at least an easy way to set three stars on all the currently unrated tracks wholesale.
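To see why a stray one-star rating backfires, here is a minimal sketch of rating-weighted random selection. The weight table is hypothetical: only the "one star is three times a zero star" ratio comes from the article's figure 1, the other numbers are made up for illustration, and this is not a claim about how iTunes actually implements its shuffle.

```python
import random
from collections import Counter

# Hypothetical relative play weights per star rating. Only the 1-star vs
# 0-star ratio (3x) comes from the article; the other values are invented.
WEIGHTS = {0: 1, 1: 3, 2: 4, 3: 5, 4: 6, 5: 7}

def play_share(library, plays=100_000, seed=0):
    """Simulate weighted random picks and return each title's share of plays."""
    rng = random.Random(seed)
    weights = [WEIGHTS[stars] for _, stars in library]
    picks = rng.choices(library, weights=weights, k=plays)
    counts = Counter(title for title, _ in picks)
    return {title: counts[title] / plays for title, _ in library}

# A tiny made-up library of (title, stars) pairs.
library = [("unrated filler", 0), ("song I dislike", 1), ("song I love", 5)]
shares = play_share(library)
```

Under these assumed weights the disliked one-star song gets played roughly three times as often as an unrated one, which is exactly the blunder described above.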

Links:
  1. How Much Does iTunes Like My Five-Star Songs? by Brian Hansen.
  2. Slashdot via RSS.

Saturday, August 27, 2005

Gummi Bear Vitamins

Posted by Phil Aaronson at 11:34 PM

Who comes up with this stuff?

Friday, August 26, 2005

So Bad It's Good

Posted by Phil Aaronson at 4:58 PM

In the gym this morning they had rap playing. I belong to the wrong generation, most rap equals noise to my ears. But this song that was playing, it was so bad, it made it all the way to good. For a long stretch they were rhyming against 'eese' on what seemed like every phrase. After a couple I started chiming in with my own rhyme ending in: "... help me please?" Bzzz, it was "... nasty disease". Me: "... sweet lookin' trees", them: "... the police all my computers they seize." And on it went, roughly anyway. Fortunately no one could hear me. As embarrassing as it is to admit, I'm not a teenager, I'm too optimistic, and life's been too good for me to grok rap. Just the fact that I used the word grok probably means I'll never grok rap. I wish I'd caught the artist though.

Which reminds me. I've been off work because of paternity leave, so last weekend we ran up to Pine Mountain Lake with the new baby and the kids, just to get away. Some friends of ours have a vacation place up there and were nice enough to let us stay.

One evening the kids found The New Adventures of Pippi Longstocking on a forgotten shelf behind the TV and wanted to put it on. Tami Erin who plays Pippi looks positively stoned throughout a good chunk of this movie. That combined with some bad singing and the pitcher of margaritas we were drinking made for an absolutely hilarious evening. Dick Van Patten certainly helped in the so-so-bad department. I'm cracking up just thinking about this movie again. Bad. Painful. Hinkty bad. And wouldn't ya just know it, there's a porn version called The Nude Adventures of Pippi Longstocking. It just gets worse! I mean better.

Thursday, August 25, 2005

How Big Is Your Index?

Posted by Phil Aaronson at 12:15 PM

I guess it started with this, a claim on the Yahoo! Search blog that their new index sported 19.2 billion web pages. And Google cried, na-ha. And Yahoo! cried, ya-ha.

And then some academics thought it would be clever to estimate the relative sizes, and in some way lend credence to either Google or Yahoo's na/ya-ha. Their methodology: send random two word queries to both engines, dropping anything with more than 1000 results. Compare peni... errr result sizes in the hopes that these corner case queries would correlate to the overall index size. Their conclusion:
It is the opinion of this study that Yahoo!'s claim to have a web index of over twice as many documents as Google's index is suspicious. Unless a large number of the documents Yahoo! has indexed are not yet available to its search engine or if the Yahoo! search engine is not returning all the documents that match our specific search queries, we find it puzzling that Yahoo!'s search engine consistently returned fewer results than Google.
And it was subsequently picked up by Slashdot and made the rounds all over the internet. It's how I found it.

I'll admit, I'm seriously biased. I work for Yahoo!, but this paper has some serious biases of its own. Just for fun I ran through their results and made a quick histogram of the difference between Yahoo! and Google result counts (duplicates removed). The main mode was a difference of 3. So I dumped every query that resulted in three results from Google and zero from Yahoo!. I put the list of terms here. There are 755 of them. I didn't check every one, but it's pretty clear a big chunk of the list are all links to these three results:
www.cs.uwyo.edu/~wspears/courses/CS3020/Spring05/dict.random
www.learnignorance.com/programming/code/spell_checker/dictionary.txt
smaug.fh-regensburg.de/~feyrer/ftp/pub/NetBSD/distfiles/GutenMark-words-20030107/english.words.gz
They're all dictionaries. Big lists of words that Yahoo! search decided to exclude from their results. And when a significant chunk of the results are pointing to the same three dictionaries, I think it's safe to say that they're not going to tell you much about the overall index size. And I didn't look at the terms that are 4 results different. Neat idea, but flawed.
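The histogram-and-dump step is easy to reproduce. What follows is a rough sketch, not the script I actually ran: the rows are made up, and the `(query, google_count, yahoo_count)` layout and function names are assumptions for illustration. Differences are taken as Google minus Yahoo!.

```python
from collections import Counter

def difference_histogram(results):
    """Map (google_count - yahoo_count) to the number of queries with that difference."""
    return Counter(google - yahoo for _, google, yahoo in results)

def queries_at(results, google_count, yahoo_count):
    """Return the queries with exactly these result counts on each engine."""
    return [q for q, g, y in results if g == google_count and y == yahoo_count]

# Made-up rows: (query, google_count, yahoo_count), duplicates already removed.
results = [
    ("alpha beta", 3, 0),
    ("gamma delta", 3, 0),
    ("epsilon zeta", 10, 12),
    ("eta theta", 5, 2),
    ("iota kappa", 7, 7),
]

hist = difference_histogram(results)
mode_diff, mode_count = hist.most_common(1)[0]  # most common difference
suspects = queries_at(results, 3, 0)            # three on Google, zero on Yahoo!
```

From there it is just a matter of eyeballing what the suspect queries actually return.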

[Update 8/22/2005] Interestingly, the study has been updated to only include queries that returned more than 25 results, effectively cutting out the 755 or so three result queries I questioned. The old version of the paper is available here. The new version is here.

[Update: 8/25/2005] The new methodology being employed tries to filter out dictionaries two ways. The first is by eliminating any query that returns under 25 results from either Yahoo! or Google. This directly addresses those queries that are almost all dictionary results on Google that I wrote about earlier. The second is by adding a third random exclusion term.

The two measures were only successful in a limited way. Here, I've plotted a histogram of the difference in results between Yahoo! and Google for the original version, and the newly revised version. More results from Yahoo! run negative, more results from Google run positive. I've limited the plot to 500 queries on the y-axis to better view the shape of the distribution.

Original version:

New Verified version:

The peak is at a difference of 22 results with 120 different queries in the new "verified" version. Going through a few of these, it's pretty easy to find a dozen or more dictionary-like results for these terms. The exclusion term eliminated the larger, more complete dictionaries, but that's not all there are. There are plenty more "less complete" dictionaries to be found ... and Google finds them. The chunk carved out of the distributions above? I believe we're still looking at dictionary and dictionary-like filtering, and it clearly remains.

For fun I pulled out the two queries that were at the local minimum, at a difference of -93 results. They were:
unwell escapade -relatively
horribly hardihood -outpull
The first, unwell escapade -relatively, is filled with blog results on Yahoo!. Google appears a little more timid about adding blogs to their index? This certainly jibes well with my own experience. The second query, horribly hardihood -outpull? I have no idea why. It was skipped by the analysis, but I kept it in because the duplicates-removed results are under 1000. They're a little more strict.

A great way to compare Google vs. Yahoo! that Christian Langreiter put together. Pretty fun.

[Update: 8/29/2005] I ran across this study that was conducted in 2000 on 25 hand picked single term search queries. At the time they were comparing the Fast, Northern Light and AltaVista search engines. But aside from the number of queries and the fact that they are single words, the methodology is very similar. The first few queries return over 1000 results today.

Wednesday, August 24, 2005

First Day Of School

Posted by Phil Aaronson at 4:10 PM

My oldest had her first day of kindergarten this morning. Man, that was a fast five years. We had originally hoped to Homeschool, and had even gone so far as to pull her out of preschool for the past year as a trial. But between watching her interact with us, and watching her in swimming class, with camp counselors and so on, there's just a different dynamic going on. At least for her, having teachers that are not her parents turned out to be a good thing. So here we are. Then again, is it a cop out? Maybe, but she was sure excited to start class this morning. And practically bubbled over when she got home, telling us about "circle time" and having to raise her hand. And that some of the boys didn't follow the rules. It was cute.

Monday, August 22, 2005

Yahoo! Acquires Konfabulator

Posted by Phil Aaronson at 5:57 PM

[7/25/2005] Congratulations to Arlo, Ed and Perry! I played around with Konfabulator back in the 1.0 days. It was obvious, even back then, that it was just bursting at the seams with potential. My only problem: it wasn't designed for me. I just couldn't make a nice looking widget to save my life. I think, at its heart, it is a tool that's designed to give graphic artists the ability to write widget applications with some programmatic stretching. It's sort of the inverse of say, Cocoa, which lets programmers write applications that look halfway decent with just a little graphic stretching. Cocoa just fit me better. But if you're graphically inclined, Konfabulator is more than worth a spin.

[Update: 8/22/2005] In an interview with John Gruber of Daring Fireball fame, John makes an interesting comment about Dashboard (Apple's Konfabulator). I'll quote the second of two points that he felt were controversial about Dashboard:
The fact that widgets need to be created with the help of a graphic designer or Photoshop artist. An utterly unartistic programmer can design beautiful regular Mac software using the standard OS controls; a Dashboard widget that isn’t custom-designed by a good artist is going to stick out like a sore thumb.
Exactly right.

Sunday, August 14, 2005

Blogger Hiccup

Posted by Phil Aaronson at 10:54 PM

[July 12] For some reason Blogger got it in its head that I wanted a "short" length atom feed, meaning, only the first 255 characters. I didn't. I never changed the setting, Blogger hiccupped. I re-saved my existing settings, republished the feed and that seemed to fix it. I hope! But I apologize for any articles appearing as unread that you could swear you've already read.

I feel like I have learned what I wanted to learn about Blogger. It may be time to move on, once again and try something new.

[Update July 19] Blogger's at it again, shortening the feed size on me. Same problem, same fix, same apologies. Sorry folks.
[Update July 23] And again. Enough already.
[Update July 26] And again. *sigh*
[Update Aug 14] And again. Grrrr.
[Update Sep 02] And again. I gotta make some time to roll my own.

Slurpee Tunes

Posted by Phil Aaronson at 10:46 PM

Free Songs
Can you tell my wife and I have a serious weakness for Slurpees? On a hot day, we'll be heading back from a ride and the bike sort of steers itself on over to get one. And hey, cool, we got a free $0.99 iTunes song with a $1.29 Slurpee. The latest song I picked up: Neon by John Mayer. I'm not really a fan of his, but the acoustic guitar style in this song worked for me. We've got to pick up a bunch more songs, the Slurpee tunes expire at the end of the month.

Friday, August 12, 2005

Most Combative Search Engine

Posted by Phil Aaronson at 8:54 AM

Just for fun I've been running an informal "Most Aggressive Search Engine" competition. The Tour has a most aggressive (most combative) competition where the leader gets to wear a red version of their number on the road ... I'll have to create a red badge for the winner, or maybe they'll get their picture taken with the red podium babes. We'll think of something.

For a long time I had set this site up so that search engines would not index it. It was more family blog than anything else. In June I opened up part of the site to the search engines (basically everything under /blogger/). The first part of the competition, then: who would be first to start indexing that part of the site?

The results of stage one:
msnbot: 26/Jun/2005:02:39:26 -0400
Yahoo! Slurp: 29/Jun/2005:18:16:15 -0400
Googlebot: 28/Jul/2005:14:11:16 -0400
MSN is off to a strong three-day lead. Interestingly, you can find hinkty results at Yahoo! and MSN, but they have not appeared in Google yet (try typing "site:hinkty.com ant crash course" for example into the big three).
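For anyone who wants to score their own stage, the timestamps above are in the usual Apache access log format, and ranking them takes only a few lines. This is just a sketch: the three values are the ones listed above, while the dictionary layout and variable names are my own assumptions, and the month parsing assumes an English locale.

```python
from datetime import datetime

# Apache-style log timestamp, e.g. "26/Jun/2005:02:39:26 -0400".
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# First hit under /blogger/ for each crawler, as listed in the stage one results.
first_hits = {
    "msnbot": "26/Jun/2005:02:39:26 -0400",
    "Yahoo! Slurp": "29/Jun/2005:18:16:15 -0400",
    "Googlebot": "28/Jul/2005:14:11:16 -0400",
}

parsed = {bot: datetime.strptime(ts, LOG_TIME_FORMAT) for bot, ts in first_hits.items()}
ranking = sorted(parsed, key=parsed.get)  # earliest crawler first
winner = ranking[0]
# How far behind the winner each crawler arrived, as a timedelta.
gaps = {bot: parsed[bot] - parsed[winner] for bot in ranking}
```

Here `gaps["Yahoo! Slurp"]` comes out at a bit over three and a half days, which is where the three-day lead comes from.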

So yesterday I opened up everything under /weblog/ to the search engine indexers. That content is linked off of the main page of this blog, it's any content from 2004. The second part of the most combative competition: I wonder which search engine indexes that content first?

[Update 8/12/2005] Here's another interesting little combative factoid. I went through my logs and looked at how often each search engine re-checked my robots.txt and index.html files back when I disallowed everything. This was from March 8, 2004 to June 1, 2005. On average the big three re-checked every:
msnbot: 10.5 days
Yahoo Slurp: 13.3 days
Googlebot: 15.1 days
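The averages are simply the mean gap between consecutive fetches by the same crawler. A minimal sketch of that arithmetic, assuming the fetch times have already been pulled out of the logs (the function name and the sample dates are made up for illustration, they are not my real log entries):

```python
from datetime import datetime, timedelta

def average_recheck_days(fetch_times):
    """Mean gap in days between consecutive fetches; needs at least two fetches."""
    ordered = sorted(fetch_times)
    if len(ordered) < 2:
        raise ValueError("need at least two fetches to measure a gap")
    # The mean of the consecutive gaps equals the total span over the gap count.
    span = ordered[-1] - ordered[0]
    return span / timedelta(days=1) / (len(ordered) - 1)

# Made-up robots.txt fetch times for one hypothetical crawler.
fetches = [
    datetime(2004, 3, 8),
    datetime(2004, 3, 18),
    datetime(2004, 3, 29),
]
avg_days = average_recheck_days(fetches)
```

With gaps of 10 and 11 days the sample works out to an average of 10.5 days between checks.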

[Standard disclaimer: I work for Yahoo!, but not in Search]

Tuesday, August 09, 2005

Home Birth

Posted by Phil Aaronson at 7:52 PM

It's amazing watching your wife, or any woman for that matter, give birth to a child. Watching your wife not only give birth, but be in so much control while doing it that she's able to reach back and catch the baby herself, it took the experience to a whole new level. Especially for her, but also for me. I don't know how to explain it. This birth was not just an experience I watched her survive, but one she met on her terms and really handled. I'm searching for the right words. It was one of the most impressive things I've ever seen.

A lot of my friends and family were worried about the home birth choice. But my wife, who is a NICU (Neonatal Intensive Care Unit) nurse at Stanford, had done the research. The dirty little secret she discovered: for a baby free of risk factors it's safer to give birth at home than it is in the hospital. For a lot of reasons, including everything from increased risk of infection in the hospital to overeager, c-section-loving OBs.

All that aside, the care itself is better with a midwife. She only takes on a few clients a month, so you're much more likely to see her deliver the baby. My wife would go for her prenatal appointments and chat with the midwife for an hour. An hour? What OB does that? And my goodness, it's an order of magnitude less expensive.

Our midwife was Donna at Town & Country Midwifery Service, in Palo Alto, CA. I can't recommend her highly enough. I still can't believe how well it worked out for us.

Monday, August 08, 2005

Third Daughter

Posted by Phil Aaronson at 7:24 AM



Chloe, 6lbs 12oz, 21". Born at home, 3:37am! Mom and baby are doing great. Now if we only could get some sleep!

Wednesday, August 03, 2005

OPML Fun

Posted by Phil Aaronson at 11:16 PM

I've been having fun playing a bit with Dave Winer's new OPML outliner/weblog tool. My little mini-site is here. Go there just to check out the headline picture if nothing else.

There's lots of really neat ideas, and I love the concept of weblog as outliner. I guess I should, I'm writing this post in my trusty Hog Bay Notebook, so outlining isn't exactly new ground. But I am cutting and pasting from the notebook to Blogger manually. And while the OPML outliner saves me from cutting and pasting, there is still a workflow issue. The intent seems more along the lines of generating (and publishing) small paragraphs at a time throughout the day. There doesn't seem to be a 'save draft' option that would let me work on a longer piece locally before publishing it to the web.

In short, it's a neat tool so we can all build mini versions of the Scripting News site. I'm not sure that's really how I want my site exactly. But I've got to keep in mind, this is barely beta software and is still evolving. It'll be fun to watch it grow.

Shuttle Replacement

Posted by Phil Aaronson at 11:00 PM

Caught this great quote by Michael Griffin, NASA Administrator on the NY Times ...
"As long as we put the crew and the valuable cargo up above wherever the tanks are, we don't care what they shed," he said. "They can have dandruff all day long."
What they're talking about is a replacement plan for the space shuttle. The plan calls for splitting the shuttle in two, one rocket for lifting people, and another for lifting cargo. The two could potentially synch up at the space station. The people lifter is a capsule sitting on top of the shuttle's existing solid rocket booster. Which shouldn't be all that surprising since the proposal is coming from ATK, the company that makes them.

Solid rocket motors are funny things though. You can shape the fuel so it has certain burn characteristics, but there's no throttle, and there's no shutting it down once it's lit. It'll be a heck of a ride. I guess the days of the shuttle's relatively modest 3 or 4g liftoff will be a thing of the past.

[Update] Just noticed, STS-114 wake up calls. Flight Day 9, they played "Where My Heart Will Take Me", the theme song from the now cancelled series Star Trek Enterprise. Seems fitting on a number of levels.