Abstract #T172

# T172
Prediction of dairy cow retention pay-offs with k-nearest neighbors methods.
A. Beyi*¹ and A. De Vries¹, ¹University of Florida, Gainesville, Florida.

Dairy cow retention pay-offs (RPO) are typically calculated with a dynamic programming (DP) model. Alternatively, a large data set of pre-calculated RPO might be useful if it can predict the RPO of cows in new herds with sufficient accuracy. The objective was to investigate k-nearest neighbor (KNN) methods to predict RPO for new herds. Given a set of herd input variables, a DP model calculated 2,304 RPO for non-pregnant cows that varied by parity, month in milk, and relative level of milk yield. We calculated the RPO for 500 sets of input variables, which varied by heifer price, calf price, and body weight price. The mean RPO in the 500 sets was $71 (min -$492, max $4,017). The data were divided into a training collection (450 sets) and a test collection (50 sets). The KNN method measures similarity by the (weighted) Euclidean distance between the inputs of a test set and those of each training set, and selects the k = 5 training sets with the smallest distance. The RPO for each test set were predicted by 3 variants of KNN: simple average of the 5 RPO (KNNs), average of the 5 RPO weighted by simple Euclidean distances (KNNw), and simple average of the 5 RPO using weights from a linear regression of the 3 predictors (KNNf). Performance was assessed by comparing the 2,304 RPO in each test set with the predicted RPO: root mean square error (RMSE), relative absolute error (RAE), and minimum and maximum prediction errors. Results are in Table 1. Although average prediction errors were sufficiently small, some large prediction errors remained. In conclusion, KNN methods and a large RPO data set may produce sufficiently accurate RPO without the need for a DP model.

Table 1. Performance results of 50 test sets with the 3 k-nearest neighbors (KNN) methods
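
| Criterion | KNNs Min | KNNs Mean | KNNs Max | KNNw Min | KNNw Mean | KNNw Max | KNNf Min | KNNf Mean | KNNf Max |
|---|---|---|---|---|---|---|---|---|---|
| RMSE ($) | 2.14 | 15.22 | 42.29 | 1.37 | 12.88 | 40.24 | 0.86 | 12.03 | 36.90 |
| RAE | 0.36% | 2.96% | 11.49% | 0.27% | 2.50% | 9.84% | 0.13% | 2.35% | 7.75% |
| Min error ($) | −213 | −37 | 4 | −180 | −31 | 3 | −131 | −25 | 7 |
| Max error ($) | −8 | 29 | 132 | −7 | 26 | 129 | −5 | 27 | 157 |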

Key Words: data mining, k-nearest neighbors
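
The KNN prediction step and the RMSE and RAE criteria described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the function names (`knn_predict`, `rmse`, `rae`), the use of inverse-distance weights for the distance-weighted variant, the RAE definition (total absolute error divided by the total absolute deviation of the actual values from their mean), and the toy data. The `feature_weights` argument stands in for weights such as those from a linear regression of the predictors; how the abstract's weights were derived and applied is not specified there.

```python
import numpy as np

def knn_predict(train_X, train_Y, x_new, k=5,
                feature_weights=None, distance_weighted=False):
    """Predict a vector of RPO for one new input set from its k nearest
    training sets (hypothetical helper, not the authors' code).

    train_X: (n_sets, n_inputs) herd input variables, e.g. prices
    train_Y: (n_sets, n_rpo) pre-calculated RPO for each training set
    """
    train_X = np.asarray(train_X, dtype=float)
    train_Y = np.asarray(train_Y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    if feature_weights is None:
        w = np.ones(train_X.shape[1])  # plain Euclidean distance
    else:
        w = np.asarray(feature_weights, dtype=float)
    # (Weighted) Euclidean distance from the new set to every training set
    dist = np.sqrt((w * (train_X - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    if distance_weighted:
        # Assumption: closer neighbors get larger (inverse-distance) weights
        nw = 1.0 / (dist[nearest] + 1e-12)
        return (nw[:, None] * train_Y[nearest]).sum(axis=0) / nw.sum()
    return train_Y[nearest].mean(axis=0)  # simple average of the k neighbors

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - a) ** 2)))

def rae(actual, predicted):
    """Relative absolute error (assumed definition): total absolute error
    relative to the total absolute deviation of actual values from their mean."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.abs(p - a).sum() / np.abs(a - a.mean()).sum())

# Toy data (invented): 5 training sets, 2 inputs, 2 RPO values per set
toy_X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]]
toy_Y = [[10, 20], [12, 22], [14, 24], [100, 200], [110, 210]]

simple = knn_predict(toy_X, toy_Y, [0, 0], k=3)
weighted = knn_predict(toy_X, toy_Y, [0.5, 0], k=2, distance_weighted=True)
print(simple, weighted, rmse([1, 2, 3], [1, 2, 5]), rae([1, 2, 3], [1, 2, 5]))
```

In practice the inputs would likely need to be put on comparable scales (or weighted, as in the KNNf variant) before computing distances, since prices with larger ranges would otherwise dominate the distance.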