Last March, I sat in a review meeting where our team's newest model achieved 94% accuracy on a benchmark of ten million listening sessions. Everyone applauded. I felt nothing. Three years of building collaborative filtering systems had taught me an uncomfortable truth: accuracy on paper and satisfaction in headphones are entirely different metrics. The model could predict what you would not skip. It could not predict what would make you stop everything and listen.
The Recommendation Paradox
The core problem is deceptively simple. When someone skips a track after fifteen seconds, the algorithm reads rejection. But maybe they were in a meeting, or the song hit the wrong memory at the wrong hour. The skip button carries emotional data that no feature vector can capture, yet we built entire systems around treating it as a binary signal.
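A minimal sketch of that binary treatment, for illustration only. The 30-second cutoff, the `ListeningEvent` fields, and the `context` notes are all hypothetical; real listening logs would almost never record why someone skipped, which is exactly the point.

```python
from dataclasses import dataclass

# Hypothetical cutoff: anything played for less than this counts as a skip.
EARLY_SKIP_SECONDS = 30.0


@dataclass
class ListeningEvent:
    track_id: str
    seconds_played: float
    context: str  # hypothetical: the "why" that a real log does not contain


def binary_skip_label(event: ListeningEvent) -> int:
    """Return 0 (read as rejection) for an early skip, 1 otherwise.

    The context field is deliberately ignored, mirroring a system that
    only sees how long the track played.
    """
    return 0 if event.seconds_played < EARLY_SKIP_SECONDS else 1


# Three very different reasons for skipping the same track after fifteen seconds.
events = [
    ListeningEvent("track_a", 15.0, "disliked the song"),
    ListeningEvent("track_a", 15.0, "a meeting started"),
    ListeningEvent("track_a", 15.0, "wrong memory at the wrong hour"),
]

labels = [binary_skip_label(e) for e in events]
print(labels)
```

All three events collapse to the same label, so whatever distinguished them is gone before any model sees the data.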