A model validation procedure is applied to the Predictive Model every time it is retrained to ensure the model remains consistently reliable and accurate over time. The results of this procedure are displayed in the Validation Summary of the Pipeline with Predictive Model Dashboard. This article explains the procedure and how to interpret the results displayed in the dashboard.
The procedure starts by removing the most recent month’s worth of data from the dataset used to train the model. We call this month of data the validation dataset; the remaining data is the training dataset. As an example, suppose it is December 1, 2021 and we have been collecting data since the beginning of 2020. In this case, the validation dataset contains 30 data points, each representing a day in November 2021. The training dataset contains the rest of the data: January 2020 through the end of October 2021.
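As a rough sketch of this split (the helper function and the record layout below are illustrative assumptions, not the pipeline's actual implementation):

```python
from datetime import date, timedelta

def split_last_month(records):
    """Hold out the most recent calendar month as the validation dataset.

    `records` is assumed to be a list of (date, spend, target) tuples;
    this layout is an illustrative assumption, not the real data model.
    """
    last_day = max(r[0] for r in records)
    cutoff = last_day.replace(day=1)  # first day of the held-out month
    training = [r for r in records if r[0] < cutoff]
    validation = [r for r in records if r[0] >= cutoff]
    return training, validation

# Example: one record per day from January 1, 2020 through November 30, 2021.
records = []
d = date(2020, 1, 1)
while d <= date(2021, 11, 30):
    records.append((d, 100.0, 10))
    d += timedelta(days=1)

training, validation = split_last_month(records)
print(len(validation))              # 30 — one data point per day of November 2021
print(max(r[0] for r in training))  # 2021-10-31 — training data ends in October
```

Splitting on the first day of the most recent month keeps the validation dataset as one whole calendar month, matching the monthly aggregation shown in the dashboard.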
Next, the model is trained using only the training dataset. After this training, from the model’s point of view, making predictions for dates in November is equivalent to predicting the future. This is exactly what the model is meant to do, making it a great situation in which to put the model to the test.
Because we already know how much was spent per channel per day during November, and we have already measured the target variable for each day in November, we can use this information to test the model. So, the next step in the procedure is to ask the model to make predictions using the actual spend we recorded for November.
The model can then be validated by comparing the actual target variable values against the predicted values. Because we are supplying the actual spend to the model, a good model will produce predictions close to what actually happened.
The actual target values, predicted target values, and a percent difference between the two are shown in the dashboard. Initially, they are shown aggregated over the entire month of the validation dataset, but you can drill down further to see the results at daily granularity.
The percent difference between actual and predicted values provides an at-a-glance measurement of the model's performance, but we include the absolute values as well to provide more context. While a 50% difference may seem problematic at first glance, 50% of 2 conversions (an error of 1 conversion) is much less concerning than 50% of 10,000 conversions (an error of 5,000).
There are no universal rules for deciding what level of disagreement is acceptable. Each business must decide for itself what it is and is not comfortable with.