Moving Z-Score

Introduction

The moving Z-score for a point $x_t$ is the value of $x_t$ standardized by subtracting the moving mean and dividing by the moving standard deviation, both computed over the observations just prior to time $t$. Let $w$ denote window_size, the number of observations in the window. Then the moving Z-score is:

$z\left({x}_{t}\right)=\frac{{x}_{t}-{\overline{x}}_{t}}{{s}_{t}}$

where the moving average is:

${\overline{x}}_{t}=\left(1/w\right)\sum _{i=t-w}^{t-1}{x}_{i}$

and the moving standard deviation is:

${s}_{t}=\sqrt{\left(1/w\right)\sum _{i=t-w}^{t-1}\left({x}_{i}-{\overline{x}}_{t}{\right)}^{2}}.$

Since there are not enough points to calculate the moving average and moving standard deviation at the beginning of a series, the moving Z-score is undefined for the first window_size observations. The scores of these observations are represented by missing values.
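
The definitions above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name moving_zscore, the use of None for missing scores, and the choice to return NaN or a signed infinity when the moving standard deviation is zero are all assumptions made for this example.

```python
import math


def moving_zscore(values, window_size):
    """Moving Z-score of each value against the window_size values before it.

    Hypothetical helper: the first window_size scores are None because
    there are not enough preceding observations to form a window.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    scores = []
    for t, x_t in enumerate(values):
        if t < window_size:
            scores.append(None)  # missing value at the start of the series
            continue
        window = values[t - window_size:t]  # x_{t-w}, ..., x_{t-1}
        mean = sum(window) / window_size
        # Population form (divide by w), matching the formula for s_t.
        std = math.sqrt(sum((x - mean) ** 2 for x in window) / window_size)
        diff = x_t - mean
        if std > 0:
            scores.append(diff / std)
        elif diff == 0:
            scores.append(math.nan)  # 0/0: undefined
        else:
            scores.append(math.copysign(math.inf, diff))  # nonzero/0: infinite
    return scores
```

For instance, with the series 2, 4, 6, 8 and a window of 3, the only defined score is the last one: the window 2, 4, 6 has mean 4 and standard deviation $\sqrt{8/3}$, so the score of 8 is $4/\sqrt{8/3}=\sqrt{6}\approx 2.45$.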

Whenever there is no variation in the values preceding a given observation (i.e. the window consists of constant values), the moving standard deviation is zero, so the moving Z-score is infinite or undefined.
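
As a small worked illustration with assumed values, take $w=3$ and suppose the three observations preceding $x_t$ all equal 5. Then:

${\overline{x}}_{t}=\left(1/3\right)\left(5+5+5\right)=5,\quad {s}_{t}=\sqrt{\left(1/3\right)\left(0+0+0\right)}=0.$

If $x_t=5$, the score is $0/0$, which is undefined; if $x_t=7$, the score is $2/0$, which is infinite.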

Notice:

The moving Z-score is calculated under a set of assumptions that can limit its applicability to real-world problems. These assumptions are as follows:

• within a given window, the data points follow a distribution with finite variance
• and the mean of that distribution is defined.

For example, the Cauchy and Lévy distributions have neither a finite mean nor a finite variance, so the Z-score is not very helpful for finding abnormal data points in data drawn from them.

The main advantages of this method are

• its simplicity
• and its suitability for finding anomalous data points in a stream of data (online anomaly detection).
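
To illustrate the streaming use case, here is a minimal sketch that scores each new observation against the previous window_size observations as it arrives. The class name StreamingZScore, its update method, and the choice to return None when the window is not yet full or has zero standard deviation are assumptions made for this example, not part of any particular library.

```python
import math
from collections import deque


class StreamingZScore:
    """Hypothetical online scorer keeping only the last window_size values."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window = deque(maxlen=window_size)

    def update(self, x):
        """Score x against the current window, then add x to the window."""
        score = None  # returned while the window is still filling up
        if len(self.window) == self.window.maxlen:
            w = len(self.window)
            mean = sum(self.window) / w
            std = math.sqrt(sum((v - mean) ** 2 for v in self.window) / w)
            if std > 0:
                score = (x - mean) / std
            # A zero standard deviation leaves the score as None in this sketch.
        self.window.append(x)  # the oldest value is dropped automatically
        return score


# Usage: flag observations whose absolute score exceeds a chosen threshold.
# The threshold of 3 is an arbitrary assumption for illustration.
detector = StreamingZScore(window_size=3)
flags = []
for value in [2, 4, 6, 8, 100]:
    z = detector.update(value)
    flags.append(z is not None and abs(z) > 3)
```

Each update recomputes the mean and standard deviation over the window, which costs time proportional to window_size; running sums could reduce this, at the price of extra care with floating-point error.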

