The concatenation trick

The concatenation trick in data mining and analytics is an operation that allows the joint analysis of categorical and continuous (numerical) attributes. The trick requires the continuous variables to be discretized first, after which they can be combined (concatenated) with the categorical variables into a single string and subsequently analyzed. Recursive discretization can minimize discretization error and, depending on the goal, result in precise numerical analysis. The trick also deals with missing values in a natural way. The concatenation trick is described and evaluated in Foorthuis (2017) and Anomaly Detection with SECODA.
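As a rough illustration of the idea described above, the following Python sketch discretizes one numerical attribute, concatenates it with one categorical attribute, and counts how often each resulting string occurs. It is not taken from the paper or the R implementation: the function names (discretize, concatenate), the equal-width binning, the "." separator, and the choice to give missing values their own "NA" label are all assumptions made for the example.

```python
from collections import Counter

def discretize(values, bins):
    """Equal-width binning (an assumed scheme); None gets the label 'NA'."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    labels = []
    for v in values:
        if v is None:
            labels.append("NA")
        else:
            # Clamp so the maximum value lands in the last bin.
            labels.append(f"b{min(int((v - lo) / width), bins - 1)}")
    return labels

def concatenate(columns, sep="."):
    """Join the values of each case (row) into a single string."""
    return [sep.join(row) for row in zip(*columns)]

# Hypothetical toy data: one categorical and one numerical attribute.
colour = ["red", "red", "blue", "red", "blue", None]
weight = [1.0, 1.2, 9.8, 1.1, None, 5.0]

colour_labels = ["NA" if c is None else c for c in colour]
cases = concatenate([colour_labels, discretize(weight, bins=3)])
counts = Counter(cases)  # frequency of each concatenated string
```

After concatenation every case is an ordinary string, so any technique that works on a single categorical variable (here, a frequency count) can be applied to the mixed data.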

The SECODA algorithm detects various types of anomalies. It uses the concatenation trick in an iterative manner to estimate the joint density distribution of datasets with numerical and categorical variables. An implementation for R and various example datasets can be downloaded from this page.
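To give a feel for what using the trick "in an iterative manner" might look like, the sketch below repeats the concatenation with progressively finer bins and averages each case's frequency over the iterations. This is a loose illustration only and should not be read as the SECODA procedure, which is specified in the paper and the R implementation. The function names, the bin counts (2, 4, 8), the equal-width binning, and the plain averaging are assumptions for the example.

```python
from collections import Counter

def bin_label(v, lo, hi, bins):
    """Equal-width bin index as a string; None is labelled 'NA' (assumed)."""
    if v is None:
        return "NA"
    width = (hi - lo) / bins or 1.0
    return str(min(int((v - lo) / width), bins - 1))

def average_frequency(categorical, numeric, bin_counts=(2, 4, 8)):
    """Average, over several bin resolutions, how often each case's
    concatenated string occurs. Low values suggest low joint density."""
    present = [v for v in numeric if v is not None]
    lo, hi = min(present), max(present)
    totals = [0.0] * len(numeric)
    for bins in bin_counts:
        keys = [f"{c}.{bin_label(v, lo, hi, bins)}"
                for c, v in zip(categorical, numeric)]
        freq = Counter(keys)
        for i, key in enumerate(keys):
            totals[i] += freq[key]
    return [t / len(bin_counts) for t in totals]

# Hypothetical toy data: case 4 has a rare category, case 5 an extreme value.
shape = ["a", "a", "a", "a", "b", "a"]
size = [1.0, 1.5, 2.0, 2.5, 2.0, 9.0]
scores = average_frequency(shape, size)
```

In this toy data the two lowest scores belong to the case with the rare category and the case with the extreme numerical value, which shows how a single frequency-based score can cover both kinds of attribute.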

Typology of anomalies: Web page and paper.

Updated: December 1st 2018
Ralph Foorthuis