An Improved Apriori Algorithm Research in Massive Data Environment

The smart grid computing environment is an information platform that holds large volumes of production data, data management information, and both real-time and non-real-time data. In such a massive data environment, the classic Apriori algorithm for mining association rules suffers from a significant performance bottleneck. After analyzing the Apriori algorithm, the MapReduce programming model is used to implement a parallel Apriori algorithm. To further improve mining efficiency, auxiliary tables and attribute columns are added, and the parallel strategy is improved in the candidate itemset generation process. Simulation experiments show that the improved Apriori algorithm effectively reduces execution time and improves data mining efficiency in the massive data environment.
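As a rough illustration of the baseline idea only, the following minimal sketch simulates a MapReduce-style Apriori pass in plain Python. It is not the paper's implementation: the auxiliary tables, attribute columns, and improved parallel strategy are not described in the abstract and are not modeled here. The function names (`map_phase`, `reduce_phase`, `generate_candidates`, `apriori`), the in-memory data splits, and the absolute `min_count` threshold are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def map_phase(split, candidates):
    """Mapper (hypothetical): emit (candidate, 1) for each candidate
    itemset fully contained in a transaction of this data split."""
    for transaction in split:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                yield cand, 1


def reduce_phase(pairs, min_count):
    """Reducer (hypothetical): sum the emitted counts per candidate and
    keep only candidates whose support count reaches min_count."""
    counts = Counter()
    for cand, one in pairs:
        counts[cand] += one
    return {c: n for c, n in counts.items() if n >= min_count}


def generate_candidates(frequent, k):
    """Build k-itemsets from the items of the frequent (k-1)-itemsets,
    pruning any candidate that has an infrequent (k-1)-subset."""
    items = sorted({i for s in frequent for i in s})
    return {
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    }


def apriori(splits, min_count):
    """Run one simulated map/reduce round per itemset size k.
    `splits` stands in for the input blocks handed to separate mappers."""
    candidates = {frozenset([i]) for split in splits for t in split for i in t}
    result, k = {}, 1
    while candidates:
        pairs = (p for split in splits for p in map_phase(split, candidates))
        frequent = reduce_phase(pairs, min_count)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result
```

In this sketch each round rescans every split for every candidate, which is the kind of repeated scanning cost that the classic algorithm incurs on large data sets; in a real MapReduce job the mappers would run on separate nodes and the reducer input would be shuffled by key.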
