The concatenation trick



The concatenation trick is an operation in data mining and analytics that allows categorical and continuous (numerical) attributes to be analyzed jointly. The continuous variables are first discretized, after which their bin labels are combined (concatenated) with the categorical values into a single string per case, and these strings are subsequently analyzed. Recursive discretization can minimize discretization error and, depending on the goal, yield a precise numerical analysis. The trick also deals with missing values in a natural way. It is described and evaluated in Foorthuis (2017) and in the poster Anomaly Detection with SECODA.
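The basic operation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the cited papers: it assumes equal-width bins, a "|" separator, and a separate "NA" label for missing values, and the helper names (is_missing, bin_label, concatenate) are hypothetical.

```python
from collections import Counter

def is_missing(value):
    # Treat None and NaN as missing (NaN is the only value not equal to itself).
    return value is None or value != value

def bin_label(value, lo, hi, n_bins):
    # Assumption: equal-width bins over [lo, hi]; missing values get their own label.
    if is_missing(value):
        return "NA"
    if hi == lo:
        return "0"
    idx = int((value - lo) / (hi - lo) * n_bins)
    return str(min(idx, n_bins - 1))  # the maximum falls into the last bin

def concatenate(rows, numeric_cols, n_bins=3, sep="|"):
    # Discretize the numeric columns and join all attributes of a row into one string.
    ranges = {}
    for col in numeric_cols:
        present = [r[col] for r in rows if not is_missing(r[col])]
        ranges[col] = (min(present), max(present))
    keys = []
    for r in rows:
        parts = []
        for col, value in r.items():
            if col in ranges:
                parts.append(bin_label(value, *ranges[col], n_bins))
            else:
                parts.append("NA" if is_missing(value) else str(value))
        keys.append(sep.join(parts))
    return keys

# Made-up example data: one categorical and one continuous attribute.
rows = [
    {"colour": "red", "weight": 1.0},
    {"colour": "red", "weight": 1.2},
    {"colour": "blue", "weight": 5.0},
    {"colour": "red", "weight": None},
    {"colour": "blue", "weight": 9.0},
]
keys = concatenate(rows, numeric_cols=["weight"])
counts = Counter(keys)  # frequency of each concatenated combination
```

In this made-up data the two light red cases share the key red|0, while the case with the missing weight simply receives its own key, red|NA. Counting the concatenated strings then gives the frequency of each combination of categorical value and numerical interval.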

The SECODA algorithm detects various types of anomalies. It uses the concatenation trick in an iterative manner to estimate the joint density distribution of datasets with numerical and categorical variables. An implementation for R and various example datasets can be downloaded from this page.
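To give a rough feel for iterative use of the trick, the sketch below repeats the concatenation with an increasing number of bins and averages the frequency of each case's concatenated key. This is not the SECODA algorithm as published: the bin schedule, the averaging rule, and the names bin_index and density_scores are assumptions made purely for illustration. Refer to the R implementation and the papers listed under Sources for the actual method.

```python
from collections import Counter

def bin_index(value, lo, hi, n_bins):
    # Assumption: equal-width bins; missing values get their own "NA" label.
    if value is None:
        return "NA"
    if hi == lo:
        return "0"
    return str(min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1))

def density_scores(records, bin_schedule=(2, 3, 4)):
    # records: list of (category, number) pairs.
    # For each bin count, concatenate category and bin label, count how often
    # each concatenated key occurs, and average those counts per record.
    numbers = [x for _, x in records if x is not None]
    lo, hi = min(numbers), max(numbers)
    totals = [0] * len(records)
    for n_bins in bin_schedule:
        keys = [f"{cat}|{bin_index(x, lo, hi, n_bins)}" for cat, x in records]
        counts = Counter(keys)
        for i, key in enumerate(keys):
            totals[i] += counts[key]
    return [t / len(bin_schedule) for t in totals]

# Made-up data: four similar cases, one rare category, one extreme number.
records = [("a", 1.0), ("a", 1.1), ("a", 1.2), ("a", 1.3), ("b", 1.1), ("a", 9.0)]
scores = density_scores(records)
# Lower scores indicate sparser regions, i.e. candidate anomalies.
candidates = [i for i, s in enumerate(scores) if s == min(scores)]
```

With this made-up data, the case with the rare category (index 4) and the case with the extreme number (index 5) end up with the lowest scores, which illustrates how one frequency count over concatenated strings can flag unusual categorical values and unusual numerical values alike.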




Sources:
Foorthuis, R.M. (2017). SECODA: Segmentation- and Combination-Based Detection of Anomalies. In: Proceedings of the 4th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2017), Tokyo, Japan.
Foorthuis, R.M. (2017). Anomaly Detection with SECODA. IEEE DSAA 2017 Poster Presentation (DSAA 2017), Tokyo, Japan.
Foorthuis, R.M. (2018). The Impact of Discretization Method on the Detection of Six Types of Anomalies in Datasets. In: Proceedings of the 30th Benelux Conference on Artificial Intelligence (BNAIC 2018), November 8-9 2018, Den Bosch, the Netherlands.
Typology of anomalies: Web page and paper.




Updated: December 1st 2018
Ralph Foorthuis