Expert Systems with Applications, 87, pp. 15–29 (2017). DOI: 10.1016/j.eswa.2017.06.005

Multidimensional surrogate stability to detect data stream concept drift

F. G. D. Costa, F. S. L. G. Duarte, R. M. M. Vallim, R. F. D. Mello

Concept drift detection plays a central role in data stream analysis. It points out changes in data behavior over time, which are intrinsically associated with the phenomena that produce such sequences of observations. By detecting these changes, one can better understand those phenomena and make better decisions in different application domains, e.g. stock markets, climate change, and population growth. Despite several proposals, most studies lack formal guarantees for concept drift detection. More recently, Vallim and Mello proposed 1DFT (Unidimensional Fourier Transform), an algorithm that detects drifts on unidimensional streams while holding a stability property based on surrogate series. Motivated by their work, we propose the multidimensional surrogate stability concept, which extends their approach to multidimensional data streams. In addition, our approach, named MDFT (Multidimensional Fourier Transform), employs a different and more robust measurement to analyze drifts, based on the Shannon and von Neumann entropies, to quantify variations in data spaces. As a final contribution, MDFT allows unidimensional streams to be reconstructed in phase spaces so that their data dependencies can also be analyzed to draw conclusions about concept drifts over time. Experiments considered seven synthetic data streams of 120,000 observations each. Synthetic data were used because they allow us to define, using the largest Lyapunov exponent, the exact points of change at which our approach should trigger concept drift events. Experiments compared MDFT against the main concept drift detection algorithms from Machine Learning (Page-Hinkley Test – PHT, Adaptive Windowing – ADWIN, and Cumulative Sum Control Chart – CUSUM) and Dynamical Systems (Recurrence Quantification Analysis using different measurements – RQA, and Permutation Entropy – PE). Results confirm that MDFT outperforms the other algorithms in terms of an average measurement (using the Euclidean distance) based on the Missed Detection Rate (MDR), the Mean Time for Detection (MTD), and the Mean Time between False Alarms (MTFA).
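The abstract mentions reconstructing a unidimensional stream in phase space and using entropy to quantify variations in the data space. The sketch below is only an illustration of those two generic ideas, not the MDFT algorithm: it uses a standard time-delay embedding and a Shannon entropy computed over a simple grid binning. The function names, the toy stream, and the parameters dim, delay, and bin_width are all assumptions made for this example, and the von Neumann entropy and the surrogate-based stability test are not covered.

```python
import math
from collections import Counter


def delay_embed(series, dim, delay):
    """Time-delay embedding: map x_t to (x_t, x_{t+delay}, ..., x_{t+(dim-1)*delay})."""
    n_points = len(series) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dim/delay")
    return [tuple(series[t + k * delay] for k in range(dim)) for t in range(n_points)]


def shannon_entropy(points, bin_width):
    """Shannon entropy (in bits) of the points after binning each coordinate on a grid."""
    cells = Counter(tuple(math.floor(v / bin_width) for v in p) for p in points)
    total = sum(cells.values())
    return sum((c / total) * math.log2(total / c) for c in cells.values())


# Hypothetical toy stream: a constant regime followed by an alternating regime.
stream = [0.0] * 8 + [0.0, 1.0] * 4

# Embed each regime separately (dim, delay, and bin_width are illustrative choices).
before = delay_embed(stream[:8], dim=2, delay=1)
after = delay_embed(stream[8:], dim=2, delay=1)

h_before = shannon_entropy(before, bin_width=0.5)
h_after = shannon_entropy(after, bin_width=0.5)
print(h_before, h_after)
```

In this toy case the constant regime occupies a single grid cell, while the alternating regime spreads over two, so the entropy of the embedded points rises after the change. A real detector would compare such measurements over sliding windows and would need a principled decision rule, which this sketch does not attempt.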



