Fake volumes

This set of indicators aims to detect trading anomalies and suspicious behaviour by analysing the delays between consecutive trades.

DISCLAIMER: The term 'Fake volumes' is used as a collective name for trading anomalies and suspicious trading behaviour that are flagged when the indicators described below cross their critical values. We neither claim nor intend to claim that a certain exchange is involved in, or that a certain symbol shows, fake (i.e., artificially inflated) volumes or wash trading. Nor do we intend to give any kind of advice on an exchange or an asset. The only aim of these indicators is to highlight anomalies and unexpected trading behaviour. The term 'Fake volumes' is used solely as a convenience.

Indicator #1: Periods with Artificial Trade Frequency

Idea

One widespread anomaly is a regular interval between consecutive trades, i.e., trades happen every, say, 250 milliseconds or every 1 second. Our hypothesis is that such a pattern indicates artificial trading activity. The proposed indicator tries to detect such periods for a particular symbol.

LSK/[email protected] shows a suspicious pattern: a large number of trades occurs every 1 second

Terms

  • period - the number of days in the sample

  • bar - the value of a delay between consecutive trades (in ms); the terms 'bar' and 'delay' are used interchangeably

  • precision - specific precision for rounding delays

  • low limit - the lower bound of the sample; shorter delays are discarded

  • high limit - the upper bound of the sample; longer delays are discarded

  • peak scale - the parameter is used to filter insignificant local peaks

  • min peak count - the minimum number of times an interval between peaks must be observed; rarer intervals are filtered out

  • cov offset - the parameter is used to filter out peak intervals with a large coefficient of variation

Algorithm

  1. Given precision, round the delays between consecutive trades for the latest period

  2. Bound the sample, i.e., discard delays below low limit or above high limit

  3. For each bar, calculate the ratio of this bar to the mean of the previous and next bars

  4. Consider only bars with a ratio greater than peak scale (these bars are peaks)

  5. Calculate the intervals between peaks

  6. Consider only popular intervals (popular intervals are those observed at least min peak count times, i.e., 10 times or more)

  7. Test whether the most popular interval is statistically significant or random.

  8. If the popular interval is not random, the indicator's value for a symbol is TRUE.

Critical values

  • period = 7

  • precision = 10

  • low limit = 1,000

  • high limit = 10,000

  • peak scale = 1.5

  • min peak count = 10

  • cov offset = 0.25
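
Steps 1 to 6 of the algorithm above can be sketched in Python as follows. This is only an illustrative sketch, not the reference implementation: the function name and the NumPy-based layout are hypothetical, and it assumes that the "ratio of a bar" in step 3 means the ratio of the number of trades observed at that rounded delay to the mean number of trades at the two neighbouring delays. The significance test of step 7 (and the use of cov offset) is not specified in the text, so it is left out.

```python
from collections import Counter

import numpy as np


def popular_peak_intervals(delays_ms, precision=10, low_limit=1_000,
                           high_limit=10_000, peak_scale=1.5,
                           min_peak_count=10):
    """Return {interval_ms: occurrences} for popular intervals between peaks."""
    delays = np.asarray(delays_ms, dtype=float)

    # Step 1: round delays to the given precision (in ms).
    rounded = (np.round(delays / precision) * precision).astype(int)

    # Step 2: keep only delays inside [low_limit, high_limit].
    rounded = rounded[(rounded >= low_limit) & (rounded <= high_limit)]

    # Count trades per bar on a regular grid (assumed meaning of a bar's size).
    grid = np.arange(low_limit, high_limit + precision, precision)
    counts = np.zeros(len(grid))
    np.add.at(counts, (rounded - low_limit) // precision, 1)

    # Step 3: ratio of each interior bar to the mean of its two neighbours.
    neighbours = (counts[:-2] + counts[2:]) / 2
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = counts[1:-1] / neighbours  # 0/0 gives nan, never a peak

    # Step 4: bars whose ratio exceeds peak_scale are peaks.
    peaks = grid[1:-1][ratio > peak_scale]

    # Step 5: intervals between consecutive peaks.
    intervals = np.diff(peaks)

    # Step 6: keep only intervals observed at least min_peak_count times.
    freq = Counter(intervals.tolist())
    return {iv: n for iv, n in freq.items() if n >= min_peak_count}
```

A non-empty result only means that a candidate interval exists; whether it is statistically significant (step 7) still has to be tested separately.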

Indicator #2: Low Trade Frequency Variation

Idea

This indicator analyses the average daily delay between consecutive trades. If the daily delays tend to be alike from day to day, this raises the suspicion that the trading pattern is artificial.

Terms

  • period - the number of days in the sample

  • threshold - a critical value

Algorithm

  1. Calculate mean daily delays for the last period

  2. Exclude outliers: all values outside the range {mean delay ± 2 * standard deviation}

  3. Recalculate the standard deviation and the mean for the filtered dataset

  4. Calculate ratio of the standard deviation to the mean

  5. The resulting value below the threshold indicates artificial trading activity

  6. The indicator has 2 outcomes: True (anomaly detected) and False

Critical values

  • period = 30

  • threshold = 0.15
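
As an illustration, the algorithm above could be sketched like this. The function name is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption, since the text does not say which estimator is used.

```python
import numpy as np


def low_frequency_variation(mean_daily_delays, threshold=0.15):
    """Return (anomaly_detected, ratio) for a series of mean daily delays."""
    x = np.asarray(mean_daily_delays, dtype=float)

    # Step 2: drop values outside mean ± 2 * standard deviation.
    mean, std = x.mean(), x.std(ddof=1)  # ddof=1 is an assumption
    kept = x[np.abs(x - mean) <= 2 * std]

    # Steps 3-4: recompute both statistics on the filtered data and take
    # the ratio of the standard deviation to the mean.
    ratio = kept.std(ddof=1) / kept.mean()

    # Steps 5-6: a ratio below the threshold flags an anomaly.
    return bool(ratio < threshold), float(ratio)
```

For example, 30 daily means of exactly 100 ms with a single 1,000 ms day would be flagged, because the outlier is removed in step 2 and the remaining values do not vary at all.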

Indicator #3: Regime Change of Trade Frequency

Idea

As the market evolves, wash trading bots are becoming more sophisticated. If the bots change their trading patterns within a month, the Low Trade Frequency Variation indicator fails to detect an anomaly. Thus, we have implemented a new indicator that detects regime changes. The idea is to divide a dataset into subperiods with similar delays between consecutive trades. The outcome is the number of subperiods, if any, with anomalies.

Terms

  • period - the length of the observed dataset (in days)

  • lag - rolling window size for calculating rolling standard deviation (in days)

  • threshold - a critical value

Algorithm

  1. Calculate mean daily delays for the last period

  2. Exclude outliers: all values outside the range {mean delay ± 2 * standard deviation}

  3. Calculate normalized delays: $X = \frac{x - \bar{x}}{\sigma}$, where $x$ is a daily delay, $\bar{x}$ is the mean of $x$, and $\sigma$ is the standard deviation of $x$

  4. Calculate rolling standard deviation of X with window size = lag (e.g., if period = 30 and lag = 7, then we get 23 windows)

  5. Calculate the number of periods where the rolling standard deviation of X is less than the threshold

Critical values

  • period = 30

  • lag = 7

  • threshold = 0.05
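
The steps above might be sketched as follows. The function name is hypothetical, and two details are assumptions not stated in the text: the sample standard deviation (ddof=1) is used throughout, and the normalization in step 3 uses the mean and standard deviation of the data that remain after outlier removal.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def count_low_variation_windows(mean_daily_delays, lag=7, threshold=0.05):
    """Count rolling windows whose normalized delays barely vary."""
    x = np.asarray(mean_daily_delays, dtype=float)

    # Step 2: drop values outside mean ± 2 * standard deviation.
    x = x[np.abs(x - x.mean()) <= 2 * x.std(ddof=1)]
    if len(x) < lag:
        return 0

    # Step 3: normalize the remaining delays, X = (x - mean) / sigma.
    sigma = x.std(ddof=1)
    if sigma == 0:
        # All delays identical: every window has zero variation.
        return len(x) - lag + 1
    z = (x - x.mean()) / sigma

    # Step 4: rolling standard deviation over windows of `lag` days.
    rolling_std = sliding_window_view(z, lag).std(axis=1, ddof=1)

    # Step 5: number of windows below the threshold.
    return int((rolling_std < threshold).sum())
```

With 15 days at a constant 100 ms followed by 15 days at a constant 200 ms, the ratio used by Low Trade Frequency Variation is large, yet every window that lies entirely inside one regime is counted here.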

Indicator #4: Repeating Trade Frequency

Idea

Although the distribution of delays between consecutive trades may have different shapes, our analysis shows that it usually takes a form similar to a log-normal distribution. If the distribution of delays is multimodal (e.g., the most frequent values are not only around 0 but also around 250 ms or 1 sec), this raises a suspicion of artificially generated trading activity. This indicator is designed to detect such modes.

BNB/[email protected]; the first 10 ms are excluded; 1 bar is 1 ms; data from 25-Mar to 23-Jul-2018

Terms

  • window map - a mapping of a window size to a precision

  • window size - the number of bars in the window around a peak (half of the window to the left of the peak and half to the right)

  • precisions - an array of specific precisions for rounding delays

  • low limit - the lower bound of the sample; shorter delays are discarded

  • high limit - the upper bound of the sample; longer delays are discarded

  • threshold - a critical value

Algorithm

  1. Consider only delays within [low limit, high limit] range

  2. Given precision, round delays (in ms) and count them

  3. Consider only those counts that are greater than the previous count; these are called peaks

  4. Analyse each peak by applying a linear regression to two datasets (the first dataset includes all delay values within the window; the second dataset includes all delay values within the window except the peak's immediate neighbours); calculate the t-statistic for the intercept

  5. Choose the linear regression with the smaller standard error

  6. Consider only those counts where the t-statistic is greater than a threshold

  7. Filter out peaks with a number of trades less than a threshold

  8. If at least one peak is not filtered out, an anomaly is detected

Critical values

  • window map:

    Precision, ms    Window size, bars
    20               25
    50               21
    100              17
    200              15
    500              13
    1000             11
    2000             9

  • low limit = 500

  • high limit = 70,000

  • threshold = 0.0012
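
The first three steps of the algorithm, together with the window lookup, can be illustrated with the sketch below. All function names are hypothetical. It assumes that the "previous count" in step 3 is the count of the bar one precision step to the left (zero if no trades fell there) and that an odd window size means the peak plus an equal number of bars on each side. The regression and filtering of steps 4 to 8 are not shown, because the text does not specify the regression variables.

```python
from collections import Counter

# Window map from the critical values: precision (ms) -> window size (bars).
WINDOW_MAP = {20: 25, 50: 21, 100: 17, 200: 15, 500: 13, 1000: 11, 2000: 9}


def count_bars(delays_ms, precision, low_limit=500, high_limit=70_000):
    """Steps 1-2: bound the sample, round delays, count trades per bar."""
    kept = (d for d in delays_ms if low_limit <= d <= high_limit)
    # Round half up to the nearest multiple of `precision`.
    return Counter(int(d / precision + 0.5) * precision for d in kept)


def find_peaks(counts, precision):
    """Step 3: bars whose count exceeds the count of the previous bar."""
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(bar, counts[bar])
            for bar in range(first + precision, last + precision, precision)
            if counts[bar] > counts[bar - precision]]


def window_counts(counts, peak_bar, precision):
    """Counts in the window centred on a peak (input for steps 4-5)."""
    half = WINDOW_MAP[precision] // 2
    return [counts[peak_bar + k * precision] for k in range(-half, half + 1)]
```

For each precision in the window map, the candidate peaks returned by find_peaks would then go through the regression test of steps 4 to 6 using the counts returned by window_counts.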