  • CIS
    Length: 01:02:52
20 Jul 2020

This talk concerns models and algorithms that are generally described as “streaming clustering”. Some of the semantics and methods used in this field are co-opted from static clustering, but they often do not serve their purposes well for streaming data. A review of “state of the art” methods such as sequential k-means, BIRCH, CluStream, DenStream, etc. shows that methods borrowed from classical batch techniques don’t transfer well to the streaming case. Most of these models fail to acknowledge that in real streaming analysis (e.g., intrusion detection) the data are seen only once. When the data are not saved, batch clustering ideas such as pre-clustering assessment, partitioning, and cluster validity are not relevant. I do not argue that current approaches to streaming clustering are wrong; rather, they are transitional methods that will eventually lead to a new and useful paradigm for this type of computation.
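To make the “seen only once” constraint concrete, here is a minimal sketch of one common form of sequential k-means (a MacQueen-style online update). It is an illustration under stated assumptions, not the speaker’s method: the class name `SequentialKMeans`, the method `update`, and the choice to seed the model with k caller-supplied centroids (each counted as one observation) are all hypothetical. Each incoming point is assigned to its nearest centroid, that centroid moves toward the point with step size 1/count, and the point is then discarded.

```python
from typing import List, Sequence

Point = Sequence[float]


class SequentialKMeans:
    """Hypothetical single-pass sequential k-means (MacQueen-style update)."""

    def __init__(self, initial_centroids: List[Point]) -> None:
        # Assumption: the caller supplies k seed centroids, each treated
        # as one prior observation (so every count starts at 1).
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        self.counts: List[int] = [1] * len(self.centroids)

    def _sq_dist(self, i: int, x: Point) -> float:
        # Squared Euclidean distance from centroid i to point x.
        return sum((a - b) ** 2 for a, b in zip(self.centroids[i], x))

    def update(self, x: Point) -> int:
        """Consume one point, update the winning centroid, return its index."""
        j = min(range(len(self.centroids)), key=lambda i: self._sq_dist(i, x))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size 1/n keeps centroid = running mean
        self.centroids[j] = [c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)]
        return j  # x itself is not stored anywhere


if __name__ == "__main__":
    model = SequentialKMeans([[0.0, 0.0], [10.0, 10.0]])
    stream = [[1.0, 1.0], [9.0, 9.0], [2.0, 0.0], [11.0, 10.0]]
    labels = [model.update(x) for x in stream]
    print(labels)  # [0, 1, 0, 1]
```

The sketch keeps no copy of the points, so it respects the single-pass constraint; it does, however, fix k in advance, a partitioning assumption inherited from batch clustering.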
I will characterize what seem to be the important new problems presented by streaming data analysis. New terminology and approaches to mining information from data streams are suggested. Several new models are briefly reviewed and illustrated (albeit poorly, with small labeled data sets!). Then I will discuss four new incremental Stream Monitoring Functions and a new approach for visual assessment of streaming data. The conclusions? Useful analysis of real streaming data is in its infancy. We need to carefully define the objectives of streaming analysis, and then choose terminology and methods that suit this evolving paradigm.