Data shift monitoring in machine learning models

Dmitry Namiot

Eugene Ilyushin

Abstract

A fundamental aspect of machine learning systems is that their models are trained on a selected training data set. The generalizations obtained during training therefore reflect the characteristics of a subset of the general population. If the characteristics of the data change while the system is in operation, the model's generalizations may no longer hold. Such a change should be considered the rule rather than the exception, and it is called data shift. It follows that any machine learning system intended for industrial use must monitor for possible data shift. The presence of a shift reduces confidence in the system's results and can even make the system unsuitable for further operation. Compensating for a data shift is a separate task; simple retraining, for example, can be a serious problem for critical applications. In any case, the first task is to detect that a data shift has occurred. Data shift is divided into several types, the most serious of which is a change in the relationship between the dependent and independent variables. Detecting data shift in data streams is of particular interest, since it is directly related to critical applications.
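As an illustration of the first task named in the abstract, detecting that a shift has occurred, the following is a minimal sketch that compares one numeric feature in a reference (training) sample against a recent production sample. It uses the two-sample Kolmogorov-Smirnov statistic with the standard large-sample critical value; this test is chosen here only for illustration and is not stated to be the method used in the paper. The function names `ks_statistic` and `shift_detected` and the parameter `alpha` are hypothetical. The sketch looks only at the distribution of an input feature, so it would not by itself reveal a change in the relationship between dependent and independent variables.

```python
import math


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs
    of the two samples.
    """
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        # Next point at which either empirical CDF jumps.
        x = min(ref[i], cur[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def shift_detected(reference, current, alpha=0.05):
    """Flag a shift when the KS statistic exceeds the asymptotic
    critical value c(alpha) * sqrt((n + m) / (n * m)).

    This is a large-sample approximation (an assumption of this sketch);
    it is conservative for very small samples.
    """
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold
```

In a streaming setting, one plausible use is to keep the training sample as `reference` and call `shift_detected` on each new window of observations, raising an alert when it returns `True`. Window size and `alpha` trade detection delay against false alarms and would need tuning for the application.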

Pages

pp. 84 - 93

Issue

Volume: 10 Issue: 12 Part: 0 (2022)

Subjects

Engineering and civil construction
Technology