Implementation of time series data clustering based on SVD for stock data analysis on hadoop platform
As the amount of data grows, a viable solution is to use a cluster consisting of a large number of computers for parallel processing, and the Hadoop parallel computing platform is a typical representative of this approach. Cluster analysis is one of the main methods for mining time series data; however, general clustering algorithms cannot cluster time series data directly, because such data has a special structure.
The time series clustering algorithm presented here combines the Canopy and K-means algorithms on top of singular value decomposition (SVD). SVD is first used to extract features from the time series data, and the Canopy and K-means algorithms are then used to cluster the extracted features. The algorithm is implemented on the Hadoop platform with Mahout, yielding a new clustering method that can handle massive time series data. Finally, the method is applied to real stock time series data with satisfactory results.
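The pipeline described above (SVD feature extraction, then Canopy, then K-means) can be illustrated with a minimal single-machine sketch in NumPy. This is not the Mahout/Hadoop implementation from the paper. The function names, the z-normalization step, the thresholds `t1` and `t2`, and the synthetic input series are all assumptions made for illustration, and using the Canopy centres to seed K-means is one common way to combine the two algorithms rather than a detail confirmed by the abstract.

```python
import numpy as np

def svd_features(series, k):
    """Reduce each series (one row per series) to k SVD-based features."""
    X = np.asarray(series, dtype=float)
    # Assumption: z-normalize each series so clustering compares shape, not level.
    X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-12)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Coordinates of each row in the basis of the top-k right singular vectors.
    return U[:, :k] * s[:k]

def canopy_centers(points, t1, t2):
    """Canopy pass with loose threshold t1 and tight threshold t2 (t1 > t2)."""
    assert t1 > t2
    remaining = list(range(len(points)))
    centers = []
    while remaining:
        candidates = points[remaining]
        # Distances from the first remaining point to all remaining points.
        d = np.linalg.norm(candidates - candidates[0], axis=1)
        # Canopy centre: mean of the points within the loose threshold.
        centers.append(candidates[d < t1].mean(axis=0))
        # Points within the tight threshold can no longer start a canopy.
        remaining = [i for i, di in zip(remaining, d) if di >= t2]
    return np.array(centers)

def kmeans(points, init_centers, iters=20):
    """Plain Lloyd iterations, seeded with the given centres."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    # Final assignment against the updated centres.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

# Synthetic stand-in for stock series (hypothetical data, not from the paper):
# ten noisy upward trends followed by ten noisy oscillating series.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
trends = t + 0.05 * rng.standard_normal((10, 50))
waves = np.sin(2 * np.pi * 3 * t) + 0.05 * rng.standard_normal((10, 50))
series = np.vstack([trends, waves])

features = svd_features(series, k=2)
seeds = canopy_centers(features, t1=6.0, t2=3.0)
labels, centers = kmeans(features, seeds)
print(len(centers), labels)
```

In this sketch the Canopy pass determines how many clusters K-means starts with, so the number of clusters does not have to be fixed in advance. The thresholds would need tuning on real data.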
Similar IEEE Project Titles
- Spatial computations over terabyte-sized images on hadoop platforms
- A new solution of data security accessing for Hadoop based on CP-ABE
- A survey on security of Hadoop
- An Architecture for Orchestrating Hadoop Applications in Hybrid Cloud
- Astro: A predictive model for anomaly detection and feedback-based scheduling on Hadoop