Yesterday we covered the case of making recommendations when you have customer ratings for a set of products. In most cases you will have much more information! The price, type, and quantity of the product are important in predicting customer interest, as are the location, interests, and type of the customer. Each of those characteristics is called a feature, and you may have dozens to consider.
The matrix factorization method is not ideal when you have dozens or hundreds of features to consider, because they become hard to represent in a matrix. A better method is logistic regression, which can predict whether a given user will like or dislike a given product. The process is a bit involved, so I'll lay it out in a few steps.
- Data. First, assemble a list of all the users who either liked or disliked a product, along with the features of both the users and the product. It is important to gather as many positive (like) and negative (dislike) examples as possible.
- Training. Second, you train your logistic regression model on the data you assembled. You should set aside some of your data to test the model after you are done building it, and not use that data in training.
- Testing. Finally, you want to check that your model is working by seeing how accurately it predicts customer preferences on the test data you set aside. This validation step (repeating it across several different splits of the data is called cross-validation) is critical to ensure that your model can be trusted in the future.
- Usage. With your model in hand, you can feed into it any customer’s data and get a prediction of whether they will like the product.
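The four steps above can be sketched in plain Python. This is a minimal, from-scratch sketch on made-up data: the two features (age in decades, urban-or-not), the toy examples, and the gradient-descent settings are all illustrative assumptions, not anything from a real catalog.

```python
import math
import random

def sigmoid(z):
    # Squash any number into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=2000):
    # Learn one weight per feature plus a bias via simple gradient descent.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # prediction error drives the weight update
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    # Output between 0 ("dislike") and 1 ("like").
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Step 1 (Data): hypothetical features [age_in_decades, is_urban]; label 1 = liked.
random.seed(0)
data = [([2.5, 1], 1), ([3.0, 1], 1), ([2.0, 1], 1), ([3.5, 1], 1),
        ([5.5, 0], 0), ([6.0, 0], 0), ([4.5, 0], 0), ([5.0, 0], 0)]
random.shuffle(data)
train_set, test_set = data[:6], data[2:]  # hold some examples out of training

# Step 2 (Training): fit only on the training split.
w, b = train([x for x, _ in train_set], [y for _, y in train_set])

# Step 3 (Testing): accuracy on the held-out examples.
accuracy = sum((predict(w, b, x) >= 0.5) == bool(y)
               for x, y in test_set) / len(test_set)

# Step 4 (Usage): score a new customer for this product.
score = predict(w, b, [2.8, 1])
```

In practice you would use a library such as scikit-learn rather than hand-rolled gradient descent, but the four steps map onto the same pieces: a labeled dataset, a fit on the training split, an accuracy check on held-out data, and a per-customer prediction.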
Unlike the matrix factorization model, which gave us an estimated rating for each product, the output of your logistic regression will be a value between 0 and 1. The closer the output is to either extreme, the more confident that estimate is. For example, if you assign 0 to “dislike” and 1 to “like” and the model outputs 0.87, you can be pretty confident that customer will like the product. However, if the model outputs 0.51, then you cannot be certain either way.
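That interpretation rule is easy to encode. A small sketch, where the 0.75/0.25 cutoffs are illustrative choices of mine rather than standard values:

```python
def confidence_label(score):
    # Map a model output in [0, 1] to a hedged like/dislike call.
    # Thresholds of 0.75 and 0.25 are illustrative, not standard.
    if score >= 0.75:
        return "probably likes"
    if score <= 0.25:
        return "probably dislikes"
    return "uncertain"

print(confidence_label(0.87))  # → probably likes
print(confidence_label(0.51))  # → uncertain
```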
An important difference between logistic regression and matrix factorization is that with logistic regression you will have a different model for every product. This matters because the features that influence a customer to buy one product might be different for another.
Now that you (hopefully) understand how recommendation systems work, tomorrow we will cover how to put them to work for you in making business decisions and improving your products.
Quote of the Day: “When Amazon recommends a product on its site, it is clearly not a coincidence.” – JP Mangalindan, Fortune writer, on Amazon