Picture this: you’re leisurely scrolling through Instagram and stumble upon a string of ads that feel like they’ve been tailor-made just for you. They’re not only appealing, but they’re also surprisingly spot-on with your current interests. Or consider the endless stream of videos on YouTube’s recommended feed — even though they aren’t necessarily related to your search history, they captivate your attention and ignite a curiosity you never knew existed. And who could forget how TikTok’s never-ending stream of short videos can keep you watching for hours, with each video as interesting as the one before?
Welcome to the world of your favourite apps, where technology anticipates your desires and interests with uncanny precision. But have you ever wondered what makes this possible? How do these apps seem to ‘know’ you so well? Behind the scenes, we all know that Machine Learning is at play. But what does that mean? And how does it work?
A subset of AI (artificial intelligence) known as “machine learning” (ML) allows computers to learn from data and enhance performance without having to be explicitly programmed. The heart of this technology lies in building learning algorithms that effectively identify patterns and make decisions.
Now, while each app might appear to cater to a unique set of needs on the surface, a closer look reveals an intriguing commonality. After years of experimentation, top tech companies have independently developed a shared blueprint for building these recommendation systems. In addition, this architecture is domain-agnostic and may support any application you can think of, including e-commerce, feeds, search, notifications, email marketing, etc.
Modern recommendation systems comprise five layers: Retrieval, Filtering, Feature Extraction, Model Scoring, and Ranking. Let us briefly go through them one by one.
Products like YouTube have millions or billions of things to show in each suggestion unit. There are so many that no ML model can score all of them while the user waits for their feed to load. So, instead of giving each thing in the inventory a score, a process called “Retrieval” is used to get a subset of the inventory that is easier to work with.
The retrieval layer selects a small subset of items from a vast catalogue based on the user’s context. The context could be user behaviour, user profile, or environmental factors like the time of the day or current trending items. For instance, when you open TikTok, the app’s algorithms retrieve a collection of videos that align with your past behaviour and current trends.
In the past, the retrieval layer leaned heavily on heuristics and algorithms such as collaborative filtering. However, with the evolution of ML, more advanced methods like vector search are gaining traction.
It’s key to understand that ML-based methods aren’t entirely replacing the older rule-based techniques; instead, the retrieval stage often uses a blend of both. Each technique contributes its own set of potential choices to the mix. This broad range of candidate items is combined and moved forward to the next step in the process, Filtering.
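To make the blending idea concrete, here is a minimal sketch of a retrieval step that merges candidates from a vector search with candidates from a simple "trending" heuristic. Everything in it is an assumption made for illustration: the item names, the three-dimensional embeddings, the trending list, and the function names are all hypothetical, and the brute-force cosine search stands in for the approximate nearest-neighbour index a production system would more likely use.

```python
import math

# Hypothetical toy catalogue: item id -> embedding vector (values are made up).
ITEM_EMBEDDINGS = {
    "cooking_101": [0.9, 0.1, 0.0],
    "pasta_hacks": [0.8, 0.2, 0.1],
    "guitar_intro": [0.1, 0.9, 0.2],
    "marathon_tips": [0.0, 0.2, 0.9],
}

# Hypothetical heuristic source: items that are currently trending.
TRENDING = ["marathon_tips", "pasta_hacks"]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_candidates(user_vec, k):
    """Brute-force nearest neighbours by cosine similarity."""
    ranked = sorted(
        ITEM_EMBEDDINGS,
        key=lambda item: cosine(user_vec, ITEM_EMBEDDINGS[item]),
        reverse=True,
    )
    return ranked[:k]


def retrieve(user_vec, k=2):
    """Union the candidates from each source, keeping first-seen order."""
    merged = []
    for item in vector_candidates(user_vec, k) + TRENDING:
        if item not in merged:
            merged.append(item)
    return merged


# A user whose (made-up) embedding points towards cooking content.
candidates = retrieve([1.0, 0.0, 0.0])
print(candidates)
```

Each source contributes its own candidates, and duplicates are dropped so that an item suggested by both sources moves forward only once.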
Once the system retrieves a broader set of items, the filtering layer comes into play. The goal here is to exclude items the user wouldn’t be interested in or shouldn’t see due to regulatory or policy restrictions. For example, YouTube’s system may filter out videos with age limits based on the user’s profile, or if you are building an OTT video platform, you may need to do some geo-licensing-based filtering. This would ensure content isn’t inadvertently shown to audiences in regions where it’s not legally permitted or licensed.
In simpler terms, filtering is like a gatekeeper. It weeds out items that users won’t be interested in or shouldn’t see and leaves a selection of appealing and appropriate recommendations behind. So, the filtering stage is all about balancing what users would like and what they’re allowed to see to create a personalised and legal viewing experience.
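The gatekeeper idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `Video` and `User` classes, their fields, and the two rules (a minimum age and a set of licensed regions) are hypothetical and not taken from any real platform.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    video_id: str
    min_age: int = 0  # hypothetical age restriction
    # Hypothetical licensing rule: an empty set means "no regional restriction".
    licensed_regions: set = field(default_factory=set)


@dataclass
class User:
    age: int
    region: str


def is_allowed(video, user):
    """Return True only if every policy rule passes for this user."""
    if user.age < video.min_age:
        return False
    if video.licensed_regions and user.region not in video.licensed_regions:
        return False
    return True


def filter_candidates(videos, user):
    """Keep the candidates the user is permitted to see, preserving order."""
    return [video for video in videos if is_allowed(video, user)]


videos = [
    Video("family_film"),
    Video("horror_film", min_age=18),
    Video("regional_drama", licensed_regions={"IN", "SG"}),
]
allowed_ids = [v.video_id for v in filter_candidates(videos, User(age=16, region="IN"))]
print(allowed_ids)
```

Here the 16-year-old viewer in India keeps the family film and the regionally licensed drama, while the age-restricted film is dropped before any scoring takes place.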
After filtering, the list of possibilities is shorter but still a couple of hundred candidates long. We must choose the top ten items to show the user.
But before scoring can begin, we must obtain many signals about each candidate. These signals are also known as “features”.
The features try to cover a large variety of signals from basic user information, like age, location, and device used, to more complex and derived indicators such as the user’s browsing patterns, content engagement history, preferences inferred from past interactions, time spent on certain types of content, frequency and recency of app usage, to even more sophisticated behavioural patterns. These can include patterns deduced from the user’s interaction with the system, like their response to specific content categories when they’re most active or their propensity to engage with personalised recommendations. The features also encompass information about the candidates themselves — their relevance, popularity, recentness, or any other characteristics that might influence the likelihood of them being chosen by the user.
These features, drawn from wide-ranging data sources, provide a multi-dimensional perspective of the user’s interaction with the application. This comprehensive view helps create more accurate, personalised, and effective recommendations, improving user experience and engagement.
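As a rough sketch of what this step produces, the function below turns one user and one candidate into a flat dictionary of features. The field names, the chosen features, and the sample values are all hypothetical; a real system would typically read many more signals from dedicated feature stores.

```python
from datetime import datetime, timezone


def extract_features(user, item, now):
    """Build one feature row for a (user, candidate item) pair."""
    views = user["views_by_category"]
    total_views = sum(views.values())
    return {
        # Basic user information.
        "user_age": user["age"],
        # Recency of app usage, in hours.
        "hours_since_last_visit": (now - user["last_visit"]).total_seconds() / 3600,
        # Derived preference: share of past views in this item's category.
        "category_affinity": views.get(item["category"], 0) / total_views if total_views else 0.0,
        # Information about the candidate itself.
        "item_age_days": (now - item["published"]).days,
        "item_popularity": item["view_count"],
    }


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
user = {
    "age": 30,
    "last_visit": datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc),
    "views_by_category": {"cooking": 30, "sports": 10},
}
item = {
    "category": "cooking",
    "published": datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc),
    "view_count": 5000,
}
features = extract_features(user, item, now)
print(features)
```

The output mixes user-level, item-level, and user-item signals in a single row, which is the shape a scoring model usually expects.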
In the model scoring layer, ML models turn the extracted features into a prediction of how likely each candidate is to elicit a given user response, such as a click or a watch.
Traditionally, these were simpler models, such as linear regression or logistic regression. These models work by assigning a weight to each feature, multiplying each feature value by its weight, and summing the results to give a score. The higher the score, the more likely the user is considered to engage with the item. They are effective for relatively straightforward relationships between features and the output but limited in their ability to capture complex, non-linear relationships and interactions between features.
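A logistic-regression scorer of this kind fits in a few lines. The sketch below is illustrative only: the feature names, weights, and bias are made-up values, whereas in practice they would be learned from logged user interactions.

```python
import math

# Hypothetical weights; a trained model would learn these from data.
WEIGHTS = {
    "category_affinity": 2.0,
    "item_popularity_log": 0.5,
    "item_age_days": -0.1,  # older items score lower
}
BIAS = -1.0


def logistic_score(features):
    """Weighted sum of feature values, squashed into a probability in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


fresh = logistic_score(
    {"category_affinity": 0.75, "item_popularity_log": 3.0, "item_age_days": 1}
)
stale = logistic_score(
    {"category_affinity": 0.75, "item_popularity_log": 3.0, "item_age_days": 30}
)
print(fresh, stale)
```

The two candidates differ only in age, so the negative weight on `item_age_days` is what pushes the older item's score down. Because each feature contributes independently, a model like this cannot express, for example, "popularity matters only for new items" unless someone hand-crafts that interaction as an extra feature.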
However, advanced ML techniques have revolutionised the model scoring layer. More sophisticated models such as decision trees, random forests, gradient boosting machines, deep learning neural networks, and reinforcement learning models are being utilised. These models can handle various data types, capture complex patterns, model non-linear relationships between the features and the output, and handle many features.
While these advanced models require more computational power and technical expertise to implement, the improvements in recommendation quality are often well worth it.
Once every candidate has been scored, the system moves on to the last step, ranking. In the most straightforward systems, this step is as easy as sorting all the candidates by their scores and taking the top K. In more complex systems, non-ML business rules also influence the final order. Many products, for instance, need to diversify the results slightly so as not to always show content from the same publisher or author.
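One simple way to combine a top-K sort with such a rule is to walk the sorted list and skip items once a publisher has reached its quota. The sketch below assumes a hypothetical cap of one item per publisher; the function name, the tuple layout, and the sample scores are made up for illustration.

```python
def rank_with_publisher_cap(scored, k, max_per_publisher=1):
    """Pick the top-k items by score, allowing at most
    max_per_publisher items from any single publisher.

    scored: list of (item_id, publisher, score) tuples.
    """
    ranked = sorted(scored, key=lambda candidate: candidate[2], reverse=True)
    picked = []
    per_publisher = {}
    for item_id, publisher, _score in ranked:
        if len(picked) == k:
            break
        if per_publisher.get(publisher, 0) >= max_per_publisher:
            continue  # business rule: this publisher already has enough slots
        picked.append(item_id)
        per_publisher[publisher] = per_publisher.get(publisher, 0) + 1
    return picked


scored = [
    ("a1", "alice", 0.9),
    ("a2", "alice", 0.8),
    ("b1", "bob", 0.7),
    ("c1", "carol", 0.6),
]
top = rank_with_publisher_cap(scored, k=3)
print(top)
```

A plain sort would return two items from the same publisher in the first two slots; with the cap, the second one is skipped in favour of lower-scored items from other publishers.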
In other cases, the system may combine scores from multiple models into a single final score. In social networking apps, for instance, multiple models are trained: say, one for predicting clicks, one for predicting comments, another for predicting whether the user will report the content, and so on. A candidate’s final score is then a weighted combination of these models’ predictions, with undesirable outcomes such as reports pulling the score down. This is also known as value function modelling.
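Here is a minimal sketch of such a value function, written as a weighted sum of per-model predictions. The weights, the prediction names, and the two example candidates are all hypothetical, and the negative weight on reports is an assumption about how a product team might penalise unwanted outcomes.

```python
# Hypothetical business weights for each predicted outcome.
VALUE_WEIGHTS = {
    "p_click": 1.0,
    "p_comment": 3.0,
    "p_report": -10.0,  # assumed penalty: reports count heavily against an item
}


def value_score(predictions):
    """Combine per-model probabilities into a single ranking score."""
    return sum(VALUE_WEIGHTS[name] * predictions[name] for name in VALUE_WEIGHTS)


# Made-up predictions from three separate models for two candidates.
candidates = {
    "clickbait": {"p_click": 0.30, "p_comment": 0.01, "p_report": 0.05},
    "discussion": {"p_click": 0.15, "p_comment": 0.08, "p_report": 0.001},
}
final = {item: value_score(preds) for item, preds in candidates.items()}
best = max(final, key=final.get)
print(final, best)
```

Even though the first candidate is twice as likely to be clicked, its higher report probability drags its combined score below that of the second one. Tuning these weights is how a product can trade off different kinds of engagement against each other.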
The top ten or so results are then sent to the user. All of this happens in real time, within milliseconds, every time you use the app, thanks to the complex architecture and infrastructure supporting these recommendation systems.
The next time you open your favourite app and marvel at how it seems to “know” you, remember that it’s not magic but rather a carefully designed process harnessing the power of ML. It is a multi-layered process of retrieval, filtering, feature extraction, model scoring, and ranking, all working together in harmony. Each layer, though distinct, feeds the next, providing a seamless, efficient and personalised user experience.