Author: João Carvalho

  • Your engagement model is not broken. It’s just not yours.

    Europe has the lowest employee engagement scores in the world.

    Gallup’s State of the Global Workplace puts it at 13%. The UK sits at 10%. Portugal and Southern Europe trend lower still. Year after year – the same report, the same number, no improvement since at least 2016.

    The standard response from the HR industry goes like this: European companies are bad at engagement. They need to invest more in culture. More listening sessions. More recognition programmes. More of whatever it is that American companies are apparently doing right.

    There is a problem with this interpretation. The same Gallup datasets that produce the 13% number also show that European employees report lower stress than their American counterparts, higher wellbeing, less loneliness at work, and higher satisfaction with work-life balance. And European business performance – measured by productivity per hour worked, or by retention – is not consistently worse.

    So either Europe has found a way to run healthy, productive companies with almost no engaged employees – or the model is measuring the wrong thing, in the wrong way, for this context.

    I think it is the second.

    The model doesn’t know where you are

    The Gallup Q12 is the most widely used engagement instrument globally. Twelve questions, applied across 160 countries, producing the benchmark that most HR teams use to judge themselves.

    It was developed and validated primarily in the United States.

    One of the twelve items asks: "Do you have a best friend at work?" In the US, "best friend" is a casual phrase – you can have several. In the UK, it implies the single closest person in your entire life. In Portugal, melhor amigo carries a similar weight. The question means different things in different places. This is not a nuance. It is a structural problem with cross-cultural measurement.

    More broadly: individualist cultures – the US, Australia, parts of Northern Europe – are more likely to self-report positively on items about personal performance and team relationships. They are also more willing to give high scores on Likert scales. More reserved or collectivist cultures – Portugal, France, Japan – systematically score lower, not because the underlying experience is worse, but because the response style is different.

    This is well-documented in survey methodology research. It goes by several names: acquiescence bias, extreme response style, cultural response bias. It is not new. And HR analytics has largely ignored it.
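
    What a correction can look like is easier to see in a small example. A minimal sketch, assuming a flat survey export with country, team, and a 1-to-5 Likert score; the column names, the numbers, and the within-country standardisation are illustrative, one crude approach from the survey-methodology literature, not how Gallup or any vendor computes its benchmarks:

        import pandas as pd

        # Hypothetical survey export: one row per respondent, a 1-5 Likert item,
        # with country and team columns. Names and values are illustrative only.
        df = pd.DataFrame({
            "country": ["US", "US", "US", "PT", "PT", "PT"],
            "team":    ["sales", "eng", "eng", "sales", "eng", "eng"],
            "score":   [5, 4, 5, 3, 2, 3],
        })

        # Raw team averages mix the team's experience with the country's response style.
        print(df.groupby(["country", "team"])["score"].mean())

        # One crude correction: standardise within each country first, so a team is
        # judged against its own country's response distribution rather than a
        # global scale calibrated somewhere else.
        df["score_z"] = df.groupby("country")["score"].transform(
            lambda s: (s - s.mean()) / s.std(ddof=0)
        )
        print(df.groupby(["country", "team"])["score_z"].mean())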

    The practical consequence: an HR team in Lisbon looking at their 11% engagement score against a 79% global benchmark is not seeing their organisation. They are seeing the artefact of a model calibrated elsewhere, applied here without adjustment.

    How different can the results be? Perceptyx – using a different methodology on the same European workforce – reports engagement at 75.6%. Same population. Different instrument. The gap between 13% and 75.6% is not explained by reality. It is explained by how you ask the question.

    The size problem compounds this

    The measurement problem is not only cultural. It is structural.

    Most people analytics benchmarks are built from large enterprise datasets. Insight222 reported in 2025 that the typical ratio is one people analytics practitioner per 2,500 employees. The companies that produce the data that feeds the benchmarks have 5,000, 10,000, 50,000 employees. Dedicated HR analytics functions. Mature HRIS systems. Years of clean data.

    When a 300-person company in Porto tries to interpret their engagement data against those benchmarks, they are comparing structurally incomparable things. A company where the CEO knows everyone’s name operates by different organisational dynamics than a division of a multinational. A metric built for one cannot be read straightforwardly in the other.

    This is not a complaint about access. It is a statement about statistical validity. Benchmarks built from a specific population do not generalise to a different population without recalibration. This is basic methodology. And yet, most platforms present these benchmarks as universal truth.

    What this looks like when you’re building the systems

    At GFoundry, we have spent ten years building and deploying engagement, performance, and talent systems for several companies. We have seen this problem from the inside – not as a research question, but as a deployment reality.

    The research literature is full of examples where models built in one context produced misleading results in another.

    Hanus and Fox (2015) ran a longitudinal study comparing gamified and non-gamified environments using leaderboards and badges. The gamified group showed decreased intrinsic motivation, lower satisfaction, and worse performance over time. The mechanism that was supposed to drive engagement actively undermined it – because the psychological relationship to competition and public recognition is not universal. It depends on context, culture, and how autonomy is perceived.

    The cross-cultural survey research makes this even clearer. Kemmelmeier (2016) documents how response styles – acquiescence bias, extreme response style – vary systematically across cultures. North American and individualist cultures score higher on self-report scales not because the underlying experience is better, but because the response pattern is different. English-speaking countries consistently provide higher ratings compared to non-English-speaking Western European countries. This is not opinion. It is a well-documented measurement artefact.

    And the consequences are real. Harvard Business Review has questioned the validity of the Q12 “best friend at work” item, noting that without a standard definition of what “best friend” indicates, answers are open to interpretation and give no quantifiable way to measure the health of company culture. When that single item contributes to a composite score used to benchmark 160 countries against each other, the measurement problem is not academic. It is operational.

    In each case, the model was not broken. It was just not built for every context where it was being used.

    Three questions before you trust the output

    Here is the framework we use internally – and the one I would recommend to any European company deploying people analytics.

    1. Where was this model trained?

    Every engagement benchmark, attrition predictor, or skills framework carries the imprint of the data it was built on. If the training data is predominantly large US enterprise, the model’s internal logic reflects that context. Ask the vendor: what was the sample composition? What is the average company size, geography, and sector of the benchmark? If they cannot answer this – or if the answer is "it’s proprietary" – be cautious about any comparison they ask you to make.

    2. What is the model actually measuring – and is that the same thing in your context?

    A score is not a fact. It is a measurement of something, using a particular instrument, in a particular language, interpreted through a particular cultural lens. Before drawing conclusions, ask: does this instrument behave consistently across the cultural and organisational contexts present in our company? Has anyone validated that a high score means the same thing here as it does in the benchmark population?

    3. What decision would change if this number were different?

    This is the most important question, and the one most people analytics implementations skip. If the answer is "nothing" – if the score exists for reporting purposes but does not change what managers do, who gets developed, how headcount is planned – then the cost of a miscalibrated model is low. If real decisions flow from this number, the cost of ignoring its context-dependency is high.

    What to do about it

    For a company deploying people analytics in Europe – especially an SME:

    Understand the benchmark before you benchmark against it. Ask where the comparison data comes from. A benchmark built from 500 Fortune 500 US companies is not a valid reference for a 250-person company in Lisbon. It is not even a valid reference for a 250-person company in Chicago. Size matters as much as geography.

    Build internal benchmarks first. For most European SMEs, the most meaningful comparison is to themselves over time. Trend data within the company – are we improving quarter over quarter on the dimensions we care about? – is more actionable than a cross-company benchmark built from a population that does not resemble you.

    Disaggregate before comparing. If you have teams in different countries, cities, or with meaningfully different working arrangements, do not compare them on the same scale without adjusting for context. A remote team in rural Portugal and an office team in Lisbon are not the same population for engagement purposes. A sketch of how this and the internal-benchmark point above look in practice follows these recommendations.

    Validate locally before scaling. A recognition programme that worked in a UK subsidiary is not guaranteed to work in a Spanish one. A feedback cadence that feels right in a fast-moving tech team may feel intrusive in a more traditional industrial context. Pilot before assuming.
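
    Two of the recommendations above, internal trend benchmarks and disaggregation, are concrete enough to sketch. A minimal example, assuming a flat export of survey scores with quarter and location columns; the column names and numbers are illustrative:

        import pandas as pd

        # Hypothetical internal survey history: one row per location per quarter.
        df = pd.DataFrame({
            "quarter":  ["2024Q1", "2024Q1", "2024Q2", "2024Q2", "2024Q3", "2024Q3"],
            "location": ["Lisbon", "Remote", "Lisbon", "Remote", "Lisbon", "Remote"],
            "score":    [3.4, 3.9, 3.5, 3.8, 3.7, 3.6],
        })

        # Internal benchmark: the comparison is the company against itself over time.
        trend = df.groupby("quarter")["score"].mean()
        print(trend.diff())  # quarter-over-quarter change, overall

        # Disaggregate before comparing: each location is tracked against its own
        # history, not against the other location on a single shared scale.
        by_location = df.pivot_table(index="quarter", columns="location", values="score")
        print(by_location.diff())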

    What we don’t know

    This matters – and I want to be explicit about it.

    We do not have a clean answer to the central problem. The honest position is: we do not yet know how to build truly context-aware people analytics models at scale. The tooling does not exist in a mature form. Most platforms offer regional filters, not regional model recalibration. Filtering data by country is not the same as building a model that accounts for how response patterns, cultural norms, and labour market structures differ across countries.

    We also do not know how much of the European low-engagement pattern is measurement artefact and how much is genuine structural difference. Some of it is surely real – European labour markets, regulatory protections, and cultural attitudes toward work are genuinely different from American ones, and that will show up in how people relate to their organisations. Academic work on welfare state structures and work engagement shows that the institutional framework around work, not just company practices, explains variation across Europe.

    Separating signal from noise here is a real, unsolved research problem. Anyone who tells you they have solved it is selling something.

    The first step

    The first step is not better models. The first step is honesty about the models we have.

    A metric that produces systematically implausible results across an entire continent is not revealing a problem. It is the problem. Reading a 13% engagement score as organisational failure – when the same employees report higher wellbeing, lower stress, and comparable productivity – is not data-driven decision making. It is model-driven confusion.

    The benchmark was built elsewhere. It does not know where you are. Start there.

  • Ten years of multi-tenant SaaS: the decisions we made, and the ones we’d unmake

    Gamefoundry was the starting point — a formal R&D programme with Fraunhofer Portugal, co-funded by QREN and FEDER, that gave the team a structured way to solve hard problems before they became production problems. It was an accelerator in the literal sense: it compressed learning that would otherwise take years into a focused project. The research paper from that period documents the architecture and what the data showed.

    GFoundry is not Gamefoundry. The platform that runs today was rebuilt, extended, and rethought many times over a decade. What remained constant was the direction: an engagement platform for people at work, with gamification and data at the core.

    What the platform actually does

    GFoundry is an HR and people management platform built around gamification, AI, and personalised employee journeys. The premise is that engagement is not a metric you measure — it is something you engineer into the daily experience of work. Most HR platforms treat engagement as an output. GFoundry treats it as an input: a set of behavioural mechanics that, if designed correctly, change how people interact with their work, their team, and their development.

    The platform covers learning and development, talent management, performance, onboarding, and internal communication. Each module connects to the same underlying data layer. That shared data layer is what makes personalisation possible — not personalisation as a marketing word, but as a concrete technical capability: different content, different challenges, different progression paths for different employees based on their role, history, and behaviour.

    The virtual assistant — Gi — is the interface for that personalisation. It uses machine learning to surface what each employee needs, when they need it, and in a format that fits how they actually use the platform. The ambition is that the platform adapts to the employee, not the other way around.

    Gamification as a mechanism, not a feature

    The distinction matters. Most platforms that call themselves gamified have added points, badges, and leaderboards on top of an existing product. That is gamification as decoration. It rarely changes behaviour because it does not change the underlying structure of how people interact with the system.

    GFoundry was designed from the start with gamification as the structural layer — the mechanism through which learning, performance, and development workflows are delivered. Challenges are not optional extras. Progression systems are not cosmetic. The data from the Gamefoundry R&D programme showed that behavioural mechanics, when properly integrated into the experience, produced measurable differences in completion rates and return engagement. That evidence is what justified building the product this way rather than adding mechanics after the fact.

    Multi-container: what it means in practice

    Enterprise clients are rarely monolithic. A large company typically operates multiple subsidiaries, business units, or brands — each with its own culture, its own HR processes, and in many cases its own identity. A platform that forces all of them into a single shared environment, with a single brand and a single configuration, does not work for that structure.

    Multi-container architecture means each sub-company gets its own container: its own branding, its own user base, its own configuration, its own data scope. From the outside, each subsidiary has its own experience. From the inside, the platform is one system. The client’s IT team manages one integration. The HR team manages one contract. The data is isolated where it needs to be isolated and shared where sharing creates value.

    This is technically harder to build than a single-tenant product, and harder to maintain than a flat multi-tenant one. It was a deliberate early decision. Building for it later, after the data model and permission system are already in production, is far more expensive than building for it first.
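
    At the data layer, the core of the idea is that no query runs without a container scope. A minimal sketch, using sqlite3 purely for illustration and assuming a schema where every tenant-owned row carries a container_id; the table and function names are hypothetical, not GFoundry's actual schema:

        import sqlite3

        # Illustrative schema: every tenant-owned row carries a container_id.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE challenges (id INTEGER, container_id TEXT, title TEXT)")
        conn.executemany(
            "INSERT INTO challenges VALUES (?, ?, ?)",
            [(1, "brand-a", "Onboarding quiz"), (2, "brand-b", "Sales sprint")],
        )

        def challenges_for(container_id: str):
            # Every read goes through a helper that applies the container filter,
            # so isolation is enforced in one place rather than at each call site.
            rows = conn.execute(
                "SELECT id, title FROM challenges WHERE container_id = ?",
                (container_id,),
            )
            return rows.fetchall()

        print(challenges_for("brand-a"))  # only brand-a's data is visible here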

    26 languages

    Localisation in enterprise SaaS is usually an afterthought — something you add when a client in a new market asks for it. 26 languages is not an afterthought. It reflects a product decision made early: that engagement cannot be fully separated from language, and that language cannot be fully separated from culture.

    An employee using a platform in their native language has a different experience from one navigating it in a second language — not just in comfort, but in how they process challenges, respond to feedback, and engage with content. At the scale of a large enterprise, that difference is not small.

    Integration as connective tissue

    GFoundry does not replace the systems of record that large enterprises already run. It connects to them. Employee data flows in from the HRIS. Performance data flows out to management reporting. Learning completions are recorded against the talent management system. In some deployments, GFoundry connects to CRM platforms to align sales team engagement with pipeline data.

    That model — platform as connective tissue rather than platform as replacement — requires an API layer built to enterprise standards. It also requires a different kind of implementation relationship with clients: one that involves their IT teams, their data governance processes, and their existing vendor landscape. The integration work is invisible in the product. It is not invisible in the implementation.

    The competitive context

    SAP SuccessFactors and Workday are built around process efficiency and operational control — comprehensive platforms that cover HR, finance, and planning for large enterprises already inside their ecosystems. Cornerstone OnDemand has built its position around LMS and structured learning. Talentia focuses on HR administration. Factorial targets SMEs with simple workflows.

    None of them put gamification, AI-driven personalisation, and multi-container architecture at the centre of the product. GFoundry competes in the same enterprise segment as platforms built by companies with teams many times larger than ours. The differentiation is not just feature-level — it is architectural. A full comparison is at gfoundry.com.

    The decisions we’d unmake

    Ten years is long enough to accumulate real opinions about what you got wrong. Not every decision compounds well. Some architectural choices that seemed pragmatic created drag. Some features built because clients asked for them added surface area without adding value. Some things deferred because they were hard stayed deferred too long.

    We’ll write those up properly — with specific examples, not generalities. That is the kind of post this site exists for.

  • Stream mining in real time: what we built with Fraunhofer and what we learned

    In 2012, the social gaming market in the United States had 80 million active users, according to eMarketer data. A Nielsen study published in April 2012 showed that 14% of people bought a product after seeing it promoted on a social network, 26% engaged with the ad, and 15% shared it. For brands looking for engagement mechanisms beyond passive advertising, social games were a real and underexplored lever.

    It was in this context that Ubbin Labs started the Gamefoundry project, in co-promotion with Fraunhofer Portugal AICOS, co-funded by QREN and FEDER with an estimated investment of 378,000 euros. The objective was to build infrastructure for digital marketing through social games, including collective behaviour analysis and user profiling based on real interaction data.

    The architecture

    The system had three main modules.

    GameCore was the central hub: it managed data for clients (publishers), players, and games, and provided two web portals. The publisher portal allowed creating, customising, publishing, and managing games, and accessing statistics and data mining results. The player portal let users explore available games, play, access their profile, and view leaderboards. Games were built in HTML5, CSS3, and JavaScript, structured in three segments: Option Selection (optional), Target Questions (optional but critical for data mining), and the game itself. The GameCore API was REST-based, with JSON responses and JSONP support for cross-domain use.

    The Social Game Containers were the distribution layer, available for web, Facebook, and mobile (Android and iOS). Each container handled player authentication via Facebook or Google OAuth, loaded games, and sent interaction data to the analytics module. Data collected included player location (country, city, GPS coordinates), device information (brand, model, operating system), session duration, events with timestamps, and score. Web and Facebook containers were publicly released at the time of the paper’s publication; mobile containers were in development.

    The GDSS (Game Data and Support Service), called SIAJ in the early stages of the project, was the processing and analytics module. It had three sub-modules: communication with GameCore and Containers, statistical analysis and data mining, and database. Communication between the messaging and analytics components ran through a message queue implemented with Apache ActiveMQ. Processing was triggered when matches waiting for analysis exceeded 5% of the total, avoiding on-demand processing that would have degraded performance. Data was stored in MongoDB, chosen for document model flexibility and the expected scale of the data stream. The analytics component used stream mining techniques to update results iteratively rather than recalculating from scratch each cycle.
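
    The trigger policy is simple enough to sketch. An illustration in Python, with an in-memory queue standing in for ActiveMQ; the 5% threshold comes from the description above, everything else (names, numbers) is assumed:

        from queue import Queue

        PENDING_THRESHOLD = 0.05   # process when pending matches exceed 5% of the total

        pending = Queue()          # stands in for the ActiveMQ queue of new matches
        total_matches = 1000       # matches already stored and analysed

        def on_match_finished(match_id: int) -> None:
            # Each finished match is queued instead of analysed on demand, so the
            # analytics module never competes with live game traffic.
            pending.put(match_id)
            if pending.qsize() > total_matches * PENDING_THRESHOLD:
                run_incremental_analysis()

        def run_incremental_analysis() -> None:
            global total_matches
            batch = [pending.get() for _ in range(pending.qsize())]
            # Stream-mining style update: fold the new batch into existing results
            # instead of recalculating everything from scratch.
            total_matches += len(batch)
            print(f"updated models with {len(batch)} new matches")

        for i in range(60):
            on_match_finished(i)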

    Security was handled by AuditMark, internationally recognised for the JScrambler JavaScript protection technology, used at the time by companies including RSA Security and Rovio. AuditMark contributed source code protection for the JavaScript modules, security control design for collected data, and fraud detection techniques for ad traffic auditing.

    Stream mining in practice

    The analytics pipeline ran three types of algorithms over player interaction data.

    Clustering with K-Means: automatically grouped users by similarity across demographic, geographic, and game behaviour data. Each cluster produced a typical user profile available to the publisher.

    Classification with Decision Trees (C&RT): identified the distinguishing features between user profiles for each answer in the Target Questions. On a small sample, the classifier did not produce robust results, a fact documented in the paper as a consequence of sample size.

    Association Rules with FP-Growth: identified associative patterns between answers and behaviours. Results were validated with Rapid Miner.
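
    A rough sketch of that three-step pipeline on made-up interaction data, using scikit-learn for the clustering and classification steps and plain counting in place of FP-Growth; the library choices, feature names, and values are illustrative, not the project's:

        import pandas as pd
        from sklearn.cluster import KMeans
        from sklearn.tree import DecisionTreeClassifier

        # Made-up per-player records: demographics, behaviour, one Target Question.
        players = pd.DataFrame({
            "age":          [18, 22, 35, 41, 19, 30],
            "sessions":     [12, 9, 3, 2, 15, 5],
            "avg_score":    [820, 760, 300, 250, 900, 400],
            "uses_spotify": [1, 1, 0, 0, 1, 0],   # Target Question answer
        })
        features = players[["age", "sessions", "avg_score"]]

        # 1. Clustering: group players into typical profiles.
        players["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

        # 2. Classification: which features separate the Target Question answers.
        tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(features, players["uses_spotify"])
        print(dict(zip(features.columns, tree.feature_importances_)))

        # 3. Association rule: support and confidence for "likes festival -> uses Spotify".
        likes_festival = pd.Series([1, 1, 0, 1, 1, 0])
        both = ((likes_festival == 1) & (players["uses_spotify"] == 1)).mean()
        print("support:", both, "confidence:", both / (likes_festival == 1).mean())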

    What the data showed

    The 18 contests published on Palco Principal during the tests, all in quiz format with event tickets as prizes, involved more than 2,300 distinct users. In one specific contest with 524 participants and two Target Questions, the association rules results were as follows:

    Premise                Conclusion    Support    Confidence
    Optimus Alive          Spotify       0.195      0.538
    Coldplay               Spotify       0.170      0.495
    Muse                   Spotify       0.183      0.469
    The Simpsons           Spotify       0.178      0.455
    5 para a meia noite    Spotify       0.198      0.454

    The most cited rule: 54% of players who had liked Optimus Alive on Facebook used Spotify to listen to music online. For a music festival’s marketing campaign, knowing that more than half of the Facebook audience uses Spotify is information with direct value in media buying decisions. The paper published in Procedia in 2014, at the 2nd International Conference on Strategic Innovative Marketing, documents the methodology and full results.
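
    One way to read those figures, assuming the standard definitions (support is the share of participants matching premise and conclusion together; confidence is the support of the pair divided by the support of the premise alone): with 524 participants, a support of 0.195 corresponds to roughly 102 players who both liked Optimus Alive and used Spotify, and a confidence of 0.538 then implies roughly 190 players, about 36% of participants, liked Optimus Alive in the first place, since 102 / 190 ≈ 0.538. Those derived counts are approximations implied by the published figures, not numbers reported in the paper.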

    What it became

    Gamefoundry was R&D. What followed was product. GameCore, the gamification mechanisms, the behavioural analytics infrastructure, the multi-platform container distribution model: these elements moved from the research project into the foundation of GFoundry. The multi-container architecture that today allows enterprise clients to manage multiple subsidiaries with independent branding on a single system has direct origin in the Game Containers model developed in Gamefoundry.

  • How we built a music recommendation engine in 2007 – before Spotify existed in Portugal

    Palco Principal did not start in 2006. It started in 1999, with Homestudio, one of the first portals for independent bands and music projects in Portugal. For six years Homestudio was a meeting point for musicians who wanted to publish their work online, at a time when the alternatives barely existed. In 2006 that base became Palco Principal, a music social network built from scratch with proprietary technology.

    The early years moved fast. A NEOTEC grant supported the launch. At the end of 2007 a partnership with the Clix portal was established. In 2009 Palco Principal joined the SAPO network, the largest Portuguese web portal at the time. The platform expanded to Portugal, Brazil, Angola, Mozambique, and Cape Verde, and between 2006 and 2018 was the largest artist social network of Portuguese origin. Partnerships with EMI, Universal Music Portugal, Valentim de Carvalho, and Farol Música, among others, brought mainstream catalogue alongside the independent artists.

    By November 2011 the numbers were: more than 350,000 visitors per month; more than 1.9 million pageviews; more than 500,000 total monthly visitors counting widgets, OpenSocial apps, and mobile; more than 100,000 registered listeners; more than 20,000 artists; more than 70,000 tracks available for listening and download; and more than 250,000 tracks in listener playlists.

    One technical detail that defined the product identity: Palco Principal was one of the only sites that did not reduce the bitrate of uploaded tracks. The original quality was preserved intact. At a time when most platforms compressed files to cut storage and bandwidth costs, this decision had real infrastructure cost. It also meant that a musician uploading high-quality audio saw that quality reflected in the listener experience. It was a signal of respect for the artists’ work, and a real technical differentiator that most coverage of the platform never discussed.

    The recommendation problem

    With 70,000 tracks and 20,000 artists, music discovery was the central product problem. Palco Principal was the first national music site to offer applications for Hi5, Myspace, and Orkut, which extended reach but did not solve the question of how to connect listeners to music they had not yet encountered. Spotify would not launch in Portugal until 2013. The team had to build the recommendation system from scratch.

    In August 2007, Exame Informática magazine published a feature on Palco Principal in issue 146. Journalist Isabel Infante compared the recommendation system to Amazon, iTunes, and eBay. The original article is available here. At that point the platform had just over 10,000 registered users, gained in eight months.

    The system architecture

    The recommendation system was built in two modules. The first, model generation, ran periodically and built the similarity matrix across all tracks. The second, the recommender, used that matrix to determine suggestions in real time, with a response below one second for any user.

    The similarity between two tracks was computed using cosine similarity:

    M(i,j) = I(i,j) / sqrt(Di x Dj)

    Where I(i,j) is the number of playlists containing both tracks i and j, and Di and Dj are the total number of playlists containing each one. The implementation ran as SQL stored procedures on a PHP and MySQL stack. A single PC. 38,000 tracks. Model rebuild time: under 30 minutes.

    The recommendation score for each candidate track was calculated as:

    Ri = Sum(I(Ni,m)) / Sum(N(i))

    Where Ni are the nearest neighbours of track i and m are the tracks already in the user’s playlist. The number of neighbours considered was set to 4, a value the paper identifies as needing experimental fine-tuning.
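
    The production implementation was SQL stored procedures on MySQL; purely as an illustration, the two formulas translate into a few lines of Python, reading the scoring formula as the share of a candidate track's nearest neighbours already present in the user's playlist (my reading of the notation above, not wording from the paper):

        import math

        # Made-up playlists: each playlist is a set of track ids.
        playlists = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}, {"a", "d"}]

        # D_i: number of playlists containing track i.
        track_count = {}
        for pl in playlists:
            for t in pl:
                track_count[t] = track_count.get(t, 0) + 1

        # M(i, j) = I(i, j) / sqrt(D_i * D_j), where I(i, j) counts playlists
        # containing both tracks.
        def similarity(i, j):
            together = sum(1 for pl in playlists if i in pl and j in pl)
            return together / math.sqrt(track_count[i] * track_count[j])

        # Score for a candidate track: share of its k nearest neighbours already in
        # the user's playlist (k was set to 4 in production; 2 here for the toy data).
        def score(candidate, user_playlist, k=2):
            others = [t for t in track_count if t != candidate]
            neighbours = sorted(others, key=lambda t: similarity(candidate, t), reverse=True)[:k]
            return sum(1 for n in neighbours if n in user_playlist) / len(neighbours)

        print(similarity("a", "b"))    # how often a and b co-occur, normalised
        print(score("d", {"a", "b"}))  # candidate d scored against a user's playlist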

    The Rejection Index

    The system included a two-level blacklist. The personal blacklist let each user flag tracks they did not want recommended. The global blacklist accumulated how many times each track had been rejected across the entire community. To combine these signals, a Rejection Index was developed:

    RI = 1 - B / (B + P + 1)

    Where B is the number of blacklistings and P is the number of playlist occurrences. A track with a high rejection index drops out of recommendations regardless of its popularity in other contexts. During the 9 days of the A/B test, 64 users blacklisted 279 tracks, with an average of 4.36 per user and a median of 2.
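
    Plugging illustrative numbers into the formula: a track blacklisted 3 times but present in 20 playlists gets RI = 1 - 3 / (3 + 20 + 1) = 0.875 and stays in circulation, while a track blacklisted 10 times with only 2 playlist appearances gets RI = 1 - 10 / 13 ≈ 0.23 and effectively disappears from recommendations. The +1 in the denominator keeps the index defined for tracks with no history at all.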

    The A/B test

    The system went live on April 9, 2010. The controlled experiment ran from March 29 to April 6, with user split by HTTP cookie. The test group, exposed to algorithmic recommendations, added 310 tracks to playlists. The control group added 36. The test period averaged 225 track additions to playlists per day; the period before the test averaged 147.5 per day.

    The 310 records from the test group correspond to 309 playlists from 308 distinct users, with 1,491 unique tracks added and an average of 6.56 tracks per playlist. 330 users interacted with playlists or blacklists during the period; 42 used both simultaneously. The full methodology is in the WTI 2010 paper.
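
    The split mechanism is described only as an HTTP cookie; one plausible shape for that kind of assignment, sketched purely for illustration, is to hash a persistent cookie value so the same visitor always lands in the same group:

        import hashlib

        def bucket(cookie_value: str) -> str:
            # Hash the persistent cookie so the same visitor is always assigned to
            # the same group, without storing any assignment server-side.
            digest = hashlib.sha1(cookie_value.encode()).hexdigest()
            return "test" if int(digest, 16) % 2 == 0 else "control"

        print(bucket("palco_session_8f3a"))  # stable for this visitor across visits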

    Palco 3.0

    In 2008 the research work was formalised as Palco 3.0, co-funded by QREN and FEDER. The consortium included INESC Porto (LIAAD and CRACS) and, from the University of Porto, the Faculty of Sciences (FCUP), the Faculty of Engineering (FEUP), and the Faculty of Economics (FEP). The Palco Principal team was João Carvalho, Pedro Trindade, and Daniel Botelho. The project ran until October 2011.

    The four deliverables were: Palco Principal 3.0 (the evolved production portal), PTECH (a generic, reusable technology platform), Palco Loja (an e-commerce prototype), and Palco Movel (a mobile platform prototype). Mobile solutions were subcontracted to Shortcut.

    The academic partners produced published work on incremental collaborative filtering with forgetting mechanisms, by João Vinagre and Alípio Mário Jorge (FCUP and LIAAD-INESC Porto). Four algorithms were evaluated: UBSW, IBSW, UBFF, and IBFF, tested across four datasets including the real Palco Principal data, published as the MUSIC dataset: 785 users, 3,121 items, 9,128 transactions. Results showed UBFF processing updates in under 0.1 seconds for the MUSIC dataset, while IBFF could take up to 10 seconds due to the item count (3,121). The papers are here and here.

    How it ended

    The 2008 financial crisis collapsed the online advertising market in Portugal. Advertising was the primary revenue of Palco Principal. The team continued, grew the user base, and finished the Palco 3.0 research. The platform eventually closed. The accumulated knowledge, including recommendation infrastructure, behavioural analysis, and content management at scale, fed directly into what would become GFoundry.

  • Three platforms, one team: shipping for Heathrow Airport

    In January 2012, the App Store was less than four years old. Most software companies were still working out what it meant to build for mobile devices with the hardware constraints of the time: slow processors, limited memory, unstable networks, and SDKs that changed with every OS release. Few teams had experience shipping native applications with the availability, data volume, and security requirements comparable to critical infrastructure.

    Ubbin Labs took on the project of developing native applications for Heathrow Airport in London, one of the busiest airports in the world. The requirement was clear: three platforms simultaneously. iPhone. BlackBerry. Nokia Qt. Not three simplified versions, but three complete native implementations built from scratch.

    What the application did

    The application gave users real-time access to information on all flights in the airport, with a focus on their specific flight. A personalised alert system notified users of gate changes, delays, and status updates. Each trip could be planned in advance, with information about the destination city and country and the available transport options.

    The airport map was available in detail, with the location and information for every commercial establishment inside the terminals. The parking module let users photograph their vehicle on arrival at the car park and recover its geo-location on return. For frequent travellers through Heathrow, losing a car in a park of tens of thousands of spaces is a real problem. The module solved it.

    Three codebases, one product

    There was no cross-platform development framework with sufficient maturity for this type of application in 2012. React Native did not exist. The only path was three separate native implementations: Objective-C for iPhone, Java for BlackBerry, C++ with Qt for Nokia. Three SDKs, three development environments, three release pipelines, one backend feeding all of them.

    Any change to the data model propagated through all three implementations. A bug could be platform-specific or shared across all three. Testing meant covering three failure surfaces with different behaviour. Synchronising authentication state, managing sessions, and ensuring that alerts arrived with the same reliability on BlackBerry OS as on iOS were each a different problem on each platform.

    The nature of the data added pressure: real-time flight information implies high volumes, high update frequency, and zero tolerance for display errors. A passenger who sees the wrong gate on their phone because a sync failed is a passenger who misses their flight. Reliability was not a quality requirement. It was the basic premise of the product.

    What it cost

    There were nights that did not end. Not for lack of organisation, but because the problems were genuinely hard and there were no shortcuts available. Debugging a synchronisation issue that only appeared on BlackBerry meant switching runtime, debugger, and mental model, multiple times a day. The team learned to work in parallel across three distinct technical contexts without losing coherence in the final product.

    The applications shipped in January 2012. They worked. Heathrow Airport used them.

    What the team came away with was not just three apps in production. It was direct experience with cross-platform mobile architecture at a moment when most engineers were still figuring out how to build for one platform. That experience compounded. The same team went on to build Gamefoundry and, later, GFoundry.

  • What breaks when 20,000 artists depend on your ranking algorithm

    Palco Principal did not start in 2006. It started in 1999, with Homestudio, one of the first portals for independent bands and music projects in Portugal. For six years Homestudio was a meeting point for musicians who wanted to publish their work online, at a time when the alternatives barely existed. In 2006 that base became Palco Principal, a music social network built from scratch with proprietary technology.

    The platform grew quickly. A NEOTEC grant supported the launch. A partnership with the Clix portal was established in 2007. In 2009 Palco Principal joined the SAPO network. Partnerships with EMI, Universal Music Portugal, Valentim de Carvalho, and Farol Música brought mainstream catalogue alongside 20,000 independent artists. Presence in Portugal, Brazil, Angola, Mozambique, and Cape Verde. The first national music site to have applications for Hi5, Myspace, and Orkut. By November 2011: more than 350,000 visitors per month, more than 1.9 million pageviews, more than 100,000 registered listeners, more than 70,000 tracks available.

    For those 20,000 artists, most without a label and without a marketing budget, Palco Principal was the primary distribution mechanism. The ranking algorithm decided who appeared in featured positions, who appeared in suggestions, who appeared in search results. It was a proprietary system combining play counts, listener interaction, playlist data, and activity signals. It was not explained publicly. It ran, and its results shaped how artists were discovered.

    A technical differentiator that few noticed

    Palco Principal was one of the only sites that did not reduce the bitrate of uploaded tracks. Original quality was preserved intact. At a time when almost every platform compressed files to cut storage and bandwidth costs, this decision had real infrastructure cost. But it meant that a musician uploading high-quality audio saw that quality reflected in the listener experience. It was a signal of respect for the artists’ work, and a real technical differentiator that most press coverage at the time never discussed.

    The recommendation system, built on item-based collaborative filtering with SQL stored procedures, was evaluated in an A/B test between March 29 and April 6, 2010: the group exposed to recommendations added 310 tracks to playlists; the control group added 36. The full methodology is in the WTI 2010 paper. The platform processed 38,000 tracks on a single PC with sub-second response time.

    The research that continued

    In 2008 the technical work was formalised as Palco 3.0, co-funded by QREN and FEDER, with INESC Porto, FCUP, FEUP, and FEP as academic partners. The project produced published research on incremental collaborative filtering with forgetting mechanisms, on association rule mining for ranking, and on intelligent systems for managing music social networks. The project summary and papers on forgetting mechanisms and association rules for label ranking are available here.

    The anonymised Palco Principal data was published by the academic community as the MUSIC dataset: 785 users, 3,121 items, 9,128 transactions. The research ran until October 2011.

    What broke the platform

    Not the algorithm. Not the technology. The 2008 financial crisis collapsed the online advertising market in Portugal. Digital marketing investment fell sharply. Advertising was the primary revenue source for Palco Principal. When that revenue disappeared, the platform had no viable business model.

    The team continued. They grew the user numbers, finished the Palco 3.0 research, kept the platform operational for years after the crisis. But without sufficient revenue to sustain operations, closure became inevitable.

    The most direct lesson: the technical work was real and the results were real, but the business infrastructure was fragile. A platform with 350,000 monthly visitors, 20,000 artists, and published academic research closed because it depended on a single revenue source that the macroeconomic context eliminated. The technology and the business model are separate problems, and the platform fails just the same when only one of them does.

    What 20,000 artists lose

    When a platform closes, artists do not get their data back. They do not recover their play history, the geographic distribution of their listeners, their performance in recommendations, their playlist data. They lose the evidence the platform generated about their work over years, and start over elsewhere with nothing.

    The recommendation engine worked. The A/B test showed 310 additions against 36. The research was published. The platform closed anyway, and 20,000 artists lost their primary distribution channel.