Ensembles of classifier chains for multi-label classification based on Spark
-
Abstract
With the wide application of data mining technology, multi-label learning has become an active research topic in data mining. Although the ensembles of classifier chains (ECC) algorithm is an effective and accurate multi-label learning method, its time and space complexity is too high for large-scale multi-label classification tasks. This paper proposes a new algorithm, Spark ensembles of classifier chains (S-ECC), which parallelizes each step of the sequential ECC algorithm on the Spark platform. Test results in stand-alone and cluster environments show that S-ECC adapts well to large-scale data and achieves a high speedup, and that it performs no worse than the traditional sequential program.
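To make the structure of ECC concrete, the following is a minimal sketch in plain Python, not the S-ECC implementation. It assumes a toy nearest-centroid base learner, bootstrap sampling, random label orders, and majority voting; all function names are hypothetical. A thread pool stands in for Spark only to show that the ensemble members are trained independently of one another, which is the property a parallel implementation can exploit.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fit_centroid(X, y):
    """Toy binary base learner (an assumption): nearest class centroid."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        # Only one class present: always predict that class.
        return ("const", 1 if pos else 0)
    mean = lambda rows: [sum(col) / len(rows) for col in zip(*rows)]
    return ("centroid", mean(pos), mean(neg))

def predict_centroid(model, x):
    if model[0] == "const":
        return model[1]
    _, cp, cn = model
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return 1 if dist(x, cp) <= dist(x, cn) else 0

def fit_chain(X, Y, order):
    """One classifier chain: the j-th classifier sees the features
    plus the true values of the labels that precede it in `order`."""
    models = []
    for j, label in enumerate(order):
        Xj = [x + [row[k] for k in order[:j]] for x, row in zip(X, Y)]
        models.append(fit_centroid(Xj, [row[label] for row in Y]))
    return order, models

def predict_chain(chain, x):
    """Predict labels in chain order, feeding each prediction forward."""
    order, models = chain
    out = [0] * len(order)
    ext = list(x)
    for label, model in zip(order, models):
        out[label] = predict_centroid(model, ext)
        ext.append(out[label])
    return out

def fit_member(task):
    """Train one ensemble member on a bootstrap sample with a random label order."""
    X, Y, seed = task
    rng = random.Random(seed)
    idx = [rng.randrange(len(X)) for _ in X]
    order = list(range(len(Y[0])))
    rng.shuffle(order)
    return fit_chain([X[i] for i in idx], [Y[i] for i in idx], order)

def fit_ecc(X, Y, n_chains=5, seed=0):
    """Members do not depend on each other, so they can be trained concurrently.
    The thread pool is only a stand-in for a cluster framework."""
    tasks = [(X, Y, seed + m) for m in range(n_chains)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_member, tasks))

def predict_ecc(chains, x, threshold=0.5):
    """Majority vote per label over all chains."""
    votes = [predict_chain(c, x) for c in chains]
    return [1 if sum(col) / len(chains) >= threshold else 0 for col in zip(*votes)]
```

With feature vectors `X` given as lists of numbers and `Y` as lists of 0/1 labels, `fit_ecc(X, Y)` returns the trained chains and `predict_ecc(chains, x)` returns one 0/1 value per label. In this sketch every member receives the full data set, so it does not address the memory cost that motivates a distributed design.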