How we built a music recommendation engine in 2007 – before Spotify existed in Portugal

Palco Principal did not start in 2006. It started in 1999, with Homestudio, one of the first portals for independent bands and music projects in Portugal. For six years Homestudio was a meeting point for musicians who wanted to publish their work online, at a time when the alternatives barely existed. In 2006 that base became Palco Principal, a music social network built from scratch with proprietary technology.

The early years moved fast. A NEOTEC grant supported the launch. At the end of 2007 a partnership with the Clix portal was established. In 2009 Palco Principal joined the SAPO network, the largest Portuguese web portal at the time. The platform grew to cover Portugal, Brazil, Angola, Mozambique, and Cape Verde, and from 2006 to 2018 it was the largest artist social network of Portuguese origin. Partnerships with EMI, Universal Music Portugal, Valentim de Carvalho, and Farol Musica, among others, brought mainstream catalogue alongside the independent artists.

By November 2011 the numbers were: more than 350,000 visitors per month, more than 1.9 million pageviews, more than 500,000 total monthly visitors counting widgets, OpenSocial apps, and mobile, more than 100,000 registered listeners, more than 20,000 artists, more than 70,000 tracks available for listening and download, and more than 250,000 tracks in listener playlists.

One technical detail that defined the product identity: Palco Principal was one of the few sites that did not reduce the bitrate of uploaded tracks. The original quality was preserved intact. At a time when most platforms compressed files to cut storage and bandwidth costs, this decision carried a real infrastructure cost. It also meant that a musician uploading high-quality audio saw that quality reflected in the listener experience. It was a signal of respect for the artists’ work, and a real technical differentiator that most coverage of the platform never discussed.

The recommendation problem

With 70,000 tracks and 20,000 artists, music discovery was the central product problem. Palco Principal was the first national music site to offer applications for Hi5, Myspace, and Orkut, which extended reach but did not solve the question of how to connect listeners to music they had not yet encountered. Spotify would not launch in Portugal until 2013. The team had to build the recommendation system from scratch.

In August 2007, Exame Informatica magazine published a feature on Palco Principal in issue 146. Journalist Isabel Infante compared the recommendation system to Amazon, iTunes, and eBay. The original article is available here. At that point the platform had just over 10,000 registered users, gained in eight months.

The system architecture

The recommendation system was built in two modules. The first, model generation, ran periodically and built the similarity matrix across all tracks. The second, the recommender, used that matrix to compute suggestions in real time, with response times below one second for any user.

The similarity between two tracks was computed using cosine similarity:

M(i,j) = I(i,j) / sqrt(Di x Dj)

Where I(i,j) is the number of playlists containing both tracks i and j, and Di and Dj are the total number of playlists containing each one. The implementation ran as SQL stored procedures on a PHP and MySQL stack. A single PC. 38,000 tracks. Model rebuild time: under 30 minutes.
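To make the formula concrete, here is a minimal Python sketch of the model-generation step. It is not the original implementation, which ran as SQL stored procedures; the function name build_similarity and the toy playlists are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_similarity(playlists):
    """Return {(i, j): M(i, j)} for every pair of tracks that co-occur."""
    track_counts = Counter()  # D_i: number of playlists containing track i
    pair_counts = Counter()   # I(i, j): number of playlists containing both
    for playlist in playlists:
        tracks = sorted(set(playlist))  # ignore duplicates inside a playlist
        track_counts.update(tracks)
        pair_counts.update(combinations(tracks, 2))  # keys are ordered pairs
    return {
        (i, j): both / sqrt(track_counts[i] * track_counts[j])
        for (i, j), both in pair_counts.items()
    }


# Hypothetical toy data: four playlists over three tracks.
playlists = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a"]]
model = build_similarity(playlists)
print(model[("a", "b")])  # I = 2, D_a = 3, D_b = 3, so 2 / sqrt(9)
```

Only pairs that actually co-occur are stored, which keeps the matrix sparse. That is one plausible reason a full rebuild could fit on a single machine, though the source does not say how the stored procedures handled it.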

The recommendation score for each candidate track was calculated as:

Ri = Sum(I(Ni,m)) / Sum(N(i))

Where Ni are the nearest neighbours of track i and m are the tracks already in the user’s playlist. The number of neighbours considered was set to 4, a value the paper identifies as needing experimental fine-tuning.
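The notation above is terse, so the following Python sketch shows only one plausible reading of it: for each candidate track, sum the similarity to those of its nearest neighbours that are already in the user's playlist, and divide by the summed similarity to all of its nearest neighbours. That reading, the function name recommend_scores, and the nested-dictionary layout are assumptions, not details confirmed by the source.

```python
def recommend_scores(similarity, user_tracks, k=4):
    """Score every track the user does not already have.

    similarity: {track: {other_track: similarity}} (hypothetical layout)
    k: number of nearest neighbours considered (4 in the text)
    """
    owned = set(user_tracks)
    scores = {}
    for candidate, row in similarity.items():
        if candidate in owned:
            continue
        # N_i: the k tracks most similar to the candidate
        neighbours = sorted(row.items(), key=lambda kv: kv[1], reverse=True)[:k]
        total = sum(sim for _, sim in neighbours)
        if total == 0:
            continue  # no usable neighbours, so no score
        matched = sum(sim for track, sim in neighbours if track in owned)
        scores[candidate] = matched / total
    return scores


# Hypothetical similarity values for four tracks.
similarity = {
    "a": {"b": 0.8, "c": 0.2},
    "b": {"a": 0.8, "c": 0.5},
    "c": {"a": 0.2, "b": 0.5, "d": 0.3},
    "d": {"c": 0.3},
}
scores = recommend_scores(similarity, user_tracks=["a", "b"])
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Under this reading a score near 1 means almost all of a candidate's closest neighbours are already in the playlist. A small k keeps the lookup cheap, which fits the sub-second response requirement.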

The Rejection Index

The system included a two-level blacklist. The personal blacklist let each user flag tracks they did not want recommended. The global blacklist accumulated how many times each track had been rejected across the entire community. To combine these signals, a Rejection Index was developed:

RI = 1 - B / (B + P + 1)

Where B is the number of blacklistings and P is the number of playlist occurrences. As written, the index is 1 for a track that has never been blacklisted and falls toward 0 as blacklistings outweigh playlist occurrences, so a heavily rejected track drops out of recommendations regardless of its popularity in other contexts. During the 9 days of the A/B test, 64 users blacklisted 279 tracks, with an average of 4.36 per user and a median of 2.
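A few sample values help show how the index behaves. The Python sketch below evaluates the formula exactly as printed; the function name rejection_index and the sample counts are illustrative assumptions.

```python
def rejection_index(blacklistings, playlist_occurrences):
    """RI = 1 - B / (B + P + 1), taken literally from the formula above."""
    b, p = blacklistings, playlist_occurrences
    return 1 - b / (b + p + 1)


# Hypothetical counts: (B, P)
for b, p in [(0, 10), (5, 4), (10, 0)]:
    print(f"B={b:2d} P={p:2d} RI={rejection_index(b, p):.3f}")
```

The +1 in the denominator keeps the expression defined for a track with no blacklistings and no playlist occurrences, and it prevents the index from ever reaching exactly 0.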

The A/B test

The system went live on April 9, 2010. The controlled experiment ran from March 29 to April 6, with users split by HTTP cookie. The test group, exposed to algorithmic recommendations, added 310 tracks to playlists. The control group added 36. The test period averaged 225 track additions to playlists per day; the period before the test averaged 147.5 per day.
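The text says only that users were split by HTTP cookie. One common way to do that, sketched here in Python as an assumption rather than a description of the original PHP code, is to store a random visitor identifier in the cookie and derive the group from a hash of it, so the same visitor always lands in the same group. The function name assign_group and the 50/50 share are hypothetical.

```python
import hashlib


def assign_group(visitor_id, test_share=0.5):
    """Map a cookie-stored visitor id to 'test' or 'control' deterministically."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # First 8 bytes of the hash as a fraction in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_share else "control"


# Hypothetical cookie value; repeated calls always give the same group.
print(assign_group("visitor-123"))
```

Hashing avoids keeping a server-side table of assignments, at the cost of treating one person on two browsers as two visitors.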

The 310 records from the test group correspond to 309 playlists from 308 distinct users, with 1,491 unique tracks added and an average of 6.56 tracks per playlist. 330 users interacted with playlists or blacklists during the period; 42 used both simultaneously. The full methodology is in the WTI 2010 paper.
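The headline figures can be cross-checked with simple arithmetic. The Python sketch below uses only numbers quoted in the text; treating the 308 playlist users and the 64 blacklisting users as the two sets behind the 330 total is an assumption, although the figures are consistent with it.

```python
# Figures quoted in the text
test_additions, control_additions = 310, 36
daily_during_test, daily_before_test = 225, 147.5
blacklisted_tracks, blacklisting_users = 279, 64
playlist_users, users_of_both = 308, 42

ratio = test_additions / control_additions
daily_lift = daily_during_test / daily_before_test - 1
mean_blacklisted = blacklisted_tracks / blacklisting_users
# Inclusion-exclusion: users of either feature
users_of_either = playlist_users + blacklisting_users - users_of_both

print(f"test vs control additions: {ratio:.1f}x")               # 8.6x
print(f"daily additions lift: {daily_lift:.1%}")                # 52.5%
print(f"blacklisted tracks per user: {mean_blacklisted:.2f}")   # 4.36
print(f"users of playlists or blacklists: {users_of_either}")   # 330
```

The 8.6x ratio compares raw totals. The text does not give the size of each group, so it should not be read as a per-user effect.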

Palco 3.0

In 2008 the research work was formalised as Palco 3.0, co-funded by QREN and FEDER. The consortium included INESC Porto (LIAAD and CRACS), the Faculty of Sciences of the University of Porto (FCUP), the Faculty of Engineering (FEUP), and the Faculty of Economics Porto (FEP). The Palco Principal team was Joao Carvalho, Pedro Trindade, and Daniel Botelho. The project ran until October 2011.

The four deliverables were: Palco Principal 3.0 (the evolved production portal), PTECH (a generic, reusable technology platform), Palco Loja (an e-commerce prototype), and Palco Movel (a mobile platform prototype). Mobile solutions were subcontracted to Shortcut.

The academic partners produced published work on incremental collaborative filtering with forgetting mechanisms, by Joao Vinagre and Alipio Mario Jorge (FCUP and LIAAD-INESC Porto). Four algorithms were evaluated: UBSW, IBSW, UBFF, and IBFF, tested across four datasets including the real Palco Principal data, published as the MUSIC dataset: 785 users, 3,121 items, 9,128 transactions. Results showed UBFF processing updates in under 0.1 seconds for the MUSIC dataset, while IBFF could take up to 10 seconds due to the item count (3,121). The papers are here and here.
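To give a feel for what a forgetting mechanism does, here is a small Python sketch of an item co-occurrence model that decays old evidence before each update. It is an illustration only and not the published UBSW, IBSW, UBFF, or IBFF algorithms; reading the FF suffix as fading factors is an assumption here, and the class name FadingItemModel and the alpha value are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


class FadingItemModel:
    """Incremental cosine similarity where older sessions count for less."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha  # decay applied to all counts before each update
        self.item_counts = defaultdict(float)
        self.pair_counts = defaultdict(float)

    def update(self, session_items):
        # Fade existing evidence, then add the new session at full weight.
        for key in self.item_counts:
            self.item_counts[key] *= self.alpha
        for key in self.pair_counts:
            self.pair_counts[key] *= self.alpha
        items = sorted(set(session_items))
        for item in items:
            self.item_counts[item] += 1.0
        for pair in combinations(items, 2):
            self.pair_counts[pair] += 1.0

    def similarity(self, i, j):
        both = self.pair_counts.get(tuple(sorted((i, j))), 0.0)
        if both == 0.0:
            return 0.0
        return both / sqrt(self.item_counts[i] * self.item_counts[j])


fading = FadingItemModel(alpha=0.5)
fading.update(["a", "b"])  # older session
fading.update(["a", "c"])  # newer session
print(fading.similarity("a", "b"), fading.similarity("a", "c"))
```

Each update in this sketch touches every stored count, so its cost grows with the number of items and pairs. That is at least in the spirit of the item-count sensitivity reported for IBFF, although the published algorithms differ in their details.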

How it ended

The 2008 financial crisis collapsed the online advertising market in Portugal. Advertising was the primary source of revenue for Palco Principal. The team continued, grew the user base, and finished the Palco 3.0 research. The platform eventually closed. The accumulated knowledge, including recommendation infrastructure, behavioural analysis, and content management at scale, fed directly into what would become GFoundry.