On scalable data mining techniques for earth science

Markus Götz, Matthias Richerzhagen, Christian Bodenstein, Gabriele Cavallaro, Philipp Glock, Morris Riedel, Jón Atli Benediktsson

Research output: Contribution to journal, Conference article, peer-review

7 Citations (Scopus)

Abstract

One of the observations made in earth data science is the massive increase of data volume (e.g., higher resolution measurements) and dimensionality (e.g., hyper-spectral bands). Traditional data mining tools (Matlab, R, etc.) are becoming redundant in the analysis of these datasets, as they are unable to process or even load the data. Parallel and scalable techniques, though, bear the potential to overcome these limitations. In this contribution we therefore evaluate said techniques in a High Performance Computing (HPC) environment on the basis of two earth science case studies: (a) Density-based Spatial Clustering of Applications with Noise (DBSCAN) for automated outlier detection and noise reduction in a 3D point cloud and (b) land cover type classification using multi-class Support Vector Machines (SVMs) in multispectral satellite images. The paper compares implementations of the algorithms in traditional data mining tools with HPC realizations and 'big data' technology stacks. Our analysis reveals that a wide variety of them are not yet suited to deal with the coming challenges of data mining tasks in earth sciences.
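
To make case study (a) more concrete, the following is a minimal, serial sketch of DBSCAN used as a noise filter on a tiny 3D point set. It is not the authors' implementation and none of the parallel HPC realizations the paper evaluates: the function names, the sample points, and the `eps` and `min_pts` values are assumptions chosen only for illustration, and the brute-force neighbor search is quadratic in the number of points.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical sample: two dense groups plus one isolated outlier.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.1),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0), (5.1, 5.1, 5.1),
    (20.0, 20.0, 20.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)

# Noise reduction: keep only the points that were assigned to a cluster.
denoised = [p for p, label in zip(points, labels) if label != NOISE]
```

Here the isolated point at (20, 20, 20) has no neighbors within `eps`, so it is labeled as noise and dropped from `denoised`. Datasets of the size discussed in the abstract would need a spatial index and a distributed implementation in place of this brute-force search.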

Original language: English
Pages (from-to): 2188-2197
Number of pages: 10
Journal: Procedia Computer Science
Volume: 51
Issue number: 1
Publication status: Published - 2015
Event: International Conference on Computational Science, ICCS 2002 - Amsterdam, Netherlands
Duration: 21 Apr 2002 to 24 Apr 2002

Bibliographical note

Publisher Copyright:
© The Authors. Published by Elsevier B.V.

Other keywords

  • Data mining
  • DBSCAN
  • HPC
  • Machine learning
  • MPI
  • SVM
