Crunching the Math On iTunes 276
markmcb writes "OmniNerd has posted an interesting article about the statistical math behind iTunes. The author makes some interesting observations concerning the same song playing twice in a row during party shuffle play, the impact that star ratings have on playback, and comparisons with plain old random play (star ratings not considered)." From the article: "To test the option's preference for 5-stars, I created a short playlist of six songs: one from each different star rating and a song left un-rated. The songs were from the same genre and artist and were changed to be only one second in duration. After resetting the play count to zero, I hit play and left my desk for the weekend. To satisfy a little more curiosity, I ran the same songs once more on a different weekend without selecting the option to play higher rated songs more often. Monday morning the play counts were as shown in Table 1."
I am not sure I see what he sees (Score:3, Interesting)
So after analyzing all that data, how does Brian Hansen come to the conclusion that "it's simply the mind's tendency to find a pattern that makes you think iTunes has a preference". Uh, no. It's the software learning that you have a certain type of genre or style that you strongly favor and will selectively pick songs that are related, thus giving you a better-selected playlist.
And it seems that the program has a bug in that it will play a song twice in a row. That's a real bug (if you don't like that type of thing).
Interesting (Score:5, Interesting)
Why? Because I haven't got the time to go around rating my entire music library. Judging from that article, it is dangerous to only do a few because of the weighting algorithm used - surely it would be more sensible to assume that 'not rated' meant 3 stars rather than 0 stars? That way you could rate down shitty songs, and rate up excellent songs, but ignore rating the vast majority of songs.
Re:Interesting (Score:4, Interesting)
As for Gracenote: perhaps sales on the ITMS could act as a gauge of this. e.g. "This is this artist's most downloaded song and this artist compared to similar ones is bought 5x as much, so our algorithms suggest it should be rated 5" Then once you have downloaded it you can change it if you get the time.
Re:I am not sure I see what he sees (Score:2, Interesting)
If you do a static shuffle, i.e., a shuffle at the beginning of playback, and then trudge through the playlist that was generated, then you will certainly get each song played the same number of times, and you won't get repeats. The only chance of getting a repeated song is if the last song of a shuffled playlist is the same as the first song of the next shuffled list, which happens with probability 1/n at each reshuffle, or about 1/n^2 per song played.
You can combine the two however. Have 6 queues, one for *****, another for ****, and so on. Each queue would have its own last-played pointer. Each queue would be randomly shuffled once, until all songs in that queue have been played. Then have your weighting algorithm merely choose which queue to play from, and then play the next song in that queue.
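A minimal Python sketch of the queue-per-rating idea described above. The class name, the weights, and the song names are assumptions made up for illustration; nothing here is claimed to be what iTunes actually does.

```python
import random


class RatingQueues:
    """One shuffled queue per star rating; a weighted draw picks the queue."""

    def __init__(self, songs_by_rating, weights, rng=None):
        # songs_by_rating: {rating: [song, ...]}, weights: {rating: weight}.
        # Both are hypothetical inputs; empty ratings are dropped.
        self.rng = rng if rng is not None else random.Random()
        self.songs = {r: list(s) for r, s in songs_by_rating.items() if s}
        self.weights = weights
        self.queues = {r: [] for r in self.songs}

    def _refill(self, rating):
        # Reshuffle a rating's queue only once every song in it has played.
        queue = list(self.songs[rating])
        self.rng.shuffle(queue)
        self.queues[rating] = queue

    def next_song(self):
        # The weighting only decides WHICH queue plays next.
        ratings = list(self.songs)
        rating = self.rng.choices(
            ratings, weights=[self.weights[r] for r in ratings]
        )[0]
        if not self.queues[rating]:
            self._refill(rating)
        return self.queues[rating].pop()


if __name__ == "__main__":
    library = {5: ["a", "b", "c"], 3: ["d", "e"], 0: ["f"]}
    player = RatingQueues(library, {5: 7, 3: 5, 0: 1})
    print([player.next_song() for _ in range(10)])
```

Within one rating, no song repeats until the whole queue has played. A back-to-back repeat is still possible right at a reshuffle, when the new queue happens to start with the song the old one ended on.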
Re:Reminds me of... (Score:4, Interesting)
Rating For Morning Listening (* for Aphex Twin, Slayer, etc)
Rating For Afternoon Listening (**)
Rating For Evening Listening (****)
Rating For Party Listening (**)
Rating For ${mood} Listening
Then instead of getting work done we can spend our entire lives rating music.
Re: Try last.fm (Score:5, Interesting)
Add on top of that the ability to play a custom-built radio station, set it to play only new music or listen only to music from a particular user profile.
Linux and BSD supported! Open source plugins and radio station player! Could it get better?
Re:iTunes is a monopoly (Score:1, Interesting)
Sorry to hear that your business is stalling. Clearly I don't live in your neighborhood (or even your country), but my experience of downloading music has been different: I hear it on the radio (community radio), if I want to hear it again I download it, if I like it I go and buy it. If I don't like it enough I don't buy it - sort of like podcasting music. Almost every CD I have bought in the past 3 years has been bought this way (that's 1 or 2 a week). I'm buying more music now than I did before I started downloading music.
Perhaps your 'family demographic' is the wrong business strategy for you these days as these 'family music' buyers are downloading but not buying.
Re:Underlying formula (Score:5, Interesting)
5 star -
4 star -
3 star -
2 star -
1 star -
Re:You wonder why the music industry is mad (Score:3, Interesting)
Would I pay to have my music rated by an external algorithm? No. Would I pay to have my music peer rated? No - I'd also be contributing back to it like I contribute back to Gracenote and FreeDB.
I suppose it is easiest to just rate everything *** and apply ****/***** and **/* to the tracks I really notice as standing out.
Re:Interesting (Score:5, Interesting)
Rate others up and down as necessary. OK, this ignores the point that the default should be doing this for you, but it's a quick fix if you want it to work that way.
If you already have songs rated then create a 0 star smart playlist and repeat.
Stuart
I found it an odd statement too (Score:3, Interesting)
I mean, where is this statistic coming from?
In my case the majority of rated songs are 5's, almost the same number of 4's, then some 3's, and hardly any 2's or 1's.. with perhaps 50% left unrated. I use iTunes at least several hours a day. Those of my friends who use iTunes seem to have a similar distribution.
Re:Reminds me of... (Score:4, Interesting)
1: Never play unless I explicitly say so.
2: Don't include in shuffle.
The first one I'd use to flag interviews etc. that are sometimes included on albums. It's not necessarily bad content, just something that you don't generally need to hear multiple times.
The second one is for flagging things like Beethoven's 9th. It's really good music, but you don't want 67 minute long pieces in a random playlist.
I currently just use the 1 and 2 star ratings for this, but it's not really ideal. It's too bad (but understandable) that iTunes has no option for looking at TXX frames [id3.org] or I could implement it in a better way.
Re:From the article...trick of the mind (Score:3, Interesting)
For 4000 songs, that's around 64 songs. So if your player chooses tracks completely at random, then roughly 50% of the time that you listen to 64 songs, you'll hear the same song twice among those 64.
Even if your player doesn't play the same song twice, if you have 8000 songs from 4000 artists, 2 songs per artist, then you get a similar calculation.
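For anyone who wants to check the numbers, here is a small Python sketch of the birthday-problem calculation behind this. It assumes every pick is independent and uniform over the library, which is my assumption and not necessarily how any real player shuffles. The function names are made up.

```python
def repeat_probability(library_size, plays):
    """Chance that `plays` independent uniform picks contain a repeat."""
    p_all_distinct = 1.0
    for k in range(plays):
        # The (k+1)-th pick must avoid the k songs already heard.
        p_all_distinct *= (library_size - k) / library_size
    return 1.0 - p_all_distinct


def plays_for_even_odds(library_size):
    """Smallest number of plays with at least a 50% chance of a repeat."""
    plays = 1
    while repeat_probability(library_size, plays) < 0.5:
        plays += 1
    return plays


if __name__ == "__main__":
    print(repeat_probability(4000, 64))
    print(plays_for_even_odds(4000))
```

Under those assumptions the square root is only a rule of thumb: 64 plays from 4000 songs gives a repeat about 40% of the time, and the even-odds point is closer to 75 plays.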
Re:Reminds me of... (Score:1, Interesting)
I like it... but then most of the rest of my music is just as weird.
Modal Music (Score:5, Interesting)
She said that research had shown that listeners would rate the same song higher if it followed other songs of a similar genre. If they play songs of different genres randomly, the listener does not enjoy the music as much.
So their tendency is to play "blocks" of music.
For example....
4 Classic Rock songs
3 Blues Songs
3 Folk songs
4 Female Rockers
3 Grunge
etc.
This is common knowledge in the radio world. I wonder if Apple has incorporated this type of logic into its iTunes algorithms?
The radio station in question is WXPN and can be found under iTunes > Radio > Public > WXPN
Re:Reminds me of... (Score:4, Interesting)
Strange math crunching (Score:2, Interesting)
After reading the article, I still do not understand the iPod's shuffling algorithm.
The first half of the article is devoted to describing how the writer got the probabilities of rated songs and the properties of these probabilities. Although these probabilities give some insight into the shuffling algorithm, they are pretty useless, since they are observed from an unrealistic list of songs, i.e. 6 songs with different ratings.
Then comes the formula in Figure 2. The article does not explain how it was calculated or where the author got it, and the formula is not backed up by empirical observation. The rest of the article is devoted to analyzing the effects of this formula, which are interesting, yet may have no importance if the actual formula is different. So this article does not really explore the iPod shuffling algorithm; it explores how the algorithm would work if the probability of the next song were calculated according to the author's formula. That is pretty useless, since we can all provide our own formulas and write articles.
Now, concerning this formula: to me it seems a little strange. Consider a hypothetical song list with 1000 unrated songs and one song with a 5-star rating. The probability (according to the formula provided by the author) that the 5-star song would come up is
0.27/(1000*0.039+0.27) = 0.006875477
which is pretty miserable odds. If I rated it so highly, that means I want to hear it a lot; with such a shuffling algorithm I would hear it slightly more, yet not a lot. Of course, I could create a playlist with only this song, but then why does one need a rating system if it does not perform?
So it would be really interesting to know the iPod's shuffling algorithm, to see if it saves the hassle of creating your own playlists (or even offers the possibility of providing your own algorithm), yet this article does not provide any insights.
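The arithmetic above can be sketched in Python for other library mixes. This assumes, as the calculation above does, that each song's chance is its rating weight divided by the summed weights of all songs in the list. The per-rating weights are the article's observed figures as quoted in this thread; the function name is made up.

```python
# Per-rating weights: the article's observed play fractions, as quoted
# in this thread. Treating them as per-song weights is an assumption.
OBSERVED_WEIGHT = {5: 0.270, 4: 0.230, 3: 0.189, 2: 0.154, 1: 0.118, 0: 0.039}


def pick_probability(target_rating, counts):
    """Chance that the next pick is one specific song of target_rating.

    counts maps rating -> number of songs with that rating, and must
    include the target song itself.
    """
    total = sum(OBSERVED_WEIGHT[r] * n for r, n in counts.items())
    return OBSERVED_WEIGHT[target_rating] / total


if __name__ == "__main__":
    # 1000 unrated songs plus a single 5-star song.
    weighted = pick_probability(5, {0: 1000, 5: 1})
    uniform = 1 / 1001
    print(weighted, uniform, weighted / uniform)
```

Under these assumptions the lone 5-star song comes up roughly seven times as often as it would under a plain random pick, which is still well under 1% of plays.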
Re:Reminds me of... (Score:5, Interesting)
* - Never play. It's only in the list for the sake of completeness (I hate having partial albums)
** - Play very rarely. If I'm in the mood, I might listen to it.
*** - I'll listen to it at least once a week. If it comes up randomly on the shuffle, I won't take it out of the list.
**** - I can listen to this several times in a day.
***** - I'll listen to this song anytime, anywhere. If it comes up twice in a row, no problem. If my playlist only has this song on it, I can cope with that for at least a few hours.
This means that I have to periodically re-rate the songs. That seems only reasonable, though. Why would songs stay at the same rating forever? As the novelty wears off, I can relegate a song to 4 or 3 stars.
I also keep extensive smart playlists that make sure that songs that are 3 stars or less only get played once every few days.
the REAL underlying formula (Score:5, Interesting)
points(0 stars)=1
points(1 stars)=3
points(2 stars)=4
points(3 stars)=5
points(4 stars)=6
points(5 stars)=7
probability(X stars) = points(X stars) / 26
This yields the following probabilities, listed alongside the observed values from the article with their 95% confidence intervals.
p(5 star)=.2692 [.270 +- .0038]
p(4 star)=.2308 [.230 +- .0036]
p(3 star)=.1923 [.189 +- .0033]
p(2 star)=.1538 [.154 +- .0031]
p(1 star)=.1154 [.118 +- .0027]
p(0 star)=.0385 [.039 +- .0016]
As you can see, each computed probability falls within the 95% confidence interval, so there's a good chance this is the correct formula.
Boy do I have too much time on my hands today.
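The points table above is easy to reproduce. A short Python sketch, where the function and variable names are made up and the observed figures are simply copied from this comment:

```python
# Points per star rating, as proposed in the comment above.
POINTS = {5: 7, 4: 6, 3: 5, 2: 4, 1: 3, 0: 1}

# Observed value and 95% interval half-width, copied from the comment.
OBSERVED = {
    5: (0.270, 0.0038),
    4: (0.230, 0.0036),
    3: (0.189, 0.0033),
    2: (0.154, 0.0031),
    1: (0.118, 0.0027),
    0: (0.039, 0.0016),
}


def predicted_probabilities(points=POINTS):
    """probability(X stars) = points(X stars) / sum of all points."""
    total = sum(points.values())  # 26 for the table above
    return {stars: pts / total for stars, pts in points.items()}


if __name__ == "__main__":
    for stars, p in predicted_probabilities().items():
        obs, half_width = OBSERVED[stars]
        print(f"p({stars} star)={p:.4f} [{obs:.3f} +- {half_width:.4f}]")
```

This only applies to the six-song test playlist with one song per rating. It says nothing by itself about how the weights behave in a library with many songs per rating.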
Re:Ok... (Score:1, Interesting)
Dear Apple (Score:3, Interesting)
Why no-stars? Because that way the majority of the collection is unrated. Starred songs really stand out in a playlist. 1 and 2 star songs play less often than no-stars, while 3, 4, and 5 play more often. But I want my favorites to play much more often than your arbitrary algorithm.
least often of each rating (Score:2, Interesting)
1: rated 2, limited to 25 selected by least recently played
2: rated 3, limited to 73 selected by least recently played
3: rated 4, limited to 70 selected by least recently played
4: rated 5, limited to 32 selected by least recently played
playlist is any of playlists 1, 2, 3, or 4 with live updating
The numbers (25, 73, 70, and 32) come from multiplying the number of songs in each category by the rating-1, so it is essentially the same as the "play higher rated songs" in PartyShuffle. I leave 1 rated songs for ones that I don't listen to very often. This way, I get a random selection of my music that does not repeat a song until I have more-or-less gone through the rest of them in that rating. And, it generally plays the 5 rated songs about 4 times more than the 1 rated songs.
I found that I do not like the random feature since it often will play one song significantly more than another song. Eventually, it would even out, but in the range of 20 times playing a song, there can be a large discrepancy and I haven't heard some songs in longer than I'd like.
Re:Reminds me of... (Score:2, Interesting)
'Unrated [infinity]' restricts the selection to Songs with a Rating of -----.
'Unrated 50' also restricts it to the 50 least recently played songs.
Finally, '***+' selects songs with a rating of ***--, ****-, or *****, and then only the 100 least recently played songs.
All these playlists are Live Updating.
When I need to rate songs, I play 'Unrated [infinity]' or 'Unrated 50', and rate songs as I play, using the same rules as the Parent. When I just want to listen to good music, I listen to '***+'.
Also remember that unlike Slashdot, iTunes and the iPod can display Unicode characters, so you can use Real stars and Infinity instead of Asterisks and whatever.
Crunching the math on Slashdot (Score:5, Interesting)
But all this talk of 0, 1, 2, 3, 4, 5 has me thinking of another rating system. Would anybody care to do an analysis of the ratings in Slashdot comments? What are the relative populations (I expect a ton of 2's but how about the rest)? Do comments made in the first hour after a story is posted stand a better chance of reaching +5 than comments made later in the day?
One of my gripes about the Slashdot comment system is that it discourages contemplation and discussion. Comments made more than 24 hours after a story is posted are rarely read and almost never moderated. This is in contrast with comment systems like Usenet or other bulletin boards, where threads can remain lively for weeks.
AlpineR
Re:Car has a "random" bug (Score:2, Interesting)
First, certain burned CDs (I have yet to see this on a commercial CD) when played on random will only play the first track. However, if it is not played on random, it will play all the tracks. Interestingly enough, though, if I spin the CD in the holder, that will sometimes allow it to work correctly.
The other interesting thing is that the CD player will not repeat tracks until all tracks of the CD have been played (Duh), at which point all bets are off. While I have never seen it play the same track back-to-back, I have seen it play a track, another track, and then the first track again. Note that this only happens once the CD has randomly played all tracks.
I told the Audi dealer about this. They pretty much said, "Yeah, so?" I sort of agree with them. Of course, as soon as I have the cash, I'm getting an ice>Link [densionusa.com] and an iPod and I'll toss the CD player.