Earth data science is experiencing a massive increase in data volume (e.g., higher-resolution measurements) and dimensionality (e.g., hyperspectral bands). Traditional data mining tools (Matlab, R, etc.) are becoming inadequate for the analysis of these datasets, as they are unable to process or even load the data. Parallel and scalable techniques, though, have the potential to overcome these limitations. In this contribution we therefore evaluate such techniques in a High Performance Computing (HPC) environment on the basis of two earth science case studies: (a) Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for automated outlier detection and noise reduction in a 3D point cloud and (b) land cover type classification using multi-class Support Vector Machines (SVMs) in multispectral satellite images. The paper compares implementations of the algorithms in traditional data mining tools with HPC realizations and 'big data' technology stacks. Our analysis reveals that many of these implementations are not yet suited to the coming challenges of data mining tasks in the earth sciences.
|Number of pages||10|
|Journal||Procedia Computer Science|
|Publication status||Published - 2015|
|Event||International Conference on Computational Science, ICCS 2002 - Amsterdam, Netherlands|
Duration: 21 Apr 2002 → 24 Apr 2002
Bibliographical note
Publisher Copyright:
© The Authors. Published by Elsevier B.V.
- Data mining
- Machine learning