Information Processing Society of Japan, 73rd National Convention: Lecture Abstracts

2P-7

Parallel Distance Based Outlier Detection in Very Large Datasets on Hadoop environment

○邱 倩如, 川島英之, 北川博之 (University of Tsukuba)

The volume of publicly available data, such as weather datasets, increases every year. Finding outliers in such data is an important and useful data-mining task. To detect outliers in large datasets, we use a cell-based outlier detection algorithm, which has been shown to be far superior to other algorithms for datasets with fewer than four dimensions. To process these datasets, we use MapReduce, a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers, collectively referred to as a cluster. Apache Hadoop is an open-source software project that implements this computational paradigm. As a preliminary approach, we implement the cell-based outlier detection algorithm in a Hadoop environment. By combining the two, we aim to confirm that outlier detection can be applied effectively in a Hadoop environment.
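The following is a minimal single-process sketch of how a cell-based, distance-based outlier test can be organized into a map step (key each point by its grid cell) and a reduce step (group points per cell). It is not the authors' Hadoop implementation and does not use the Hadoop API. It assumes that the cell-based algorithm is the one by Knorr and Ng, that the data are two-dimensional, and that a point is an outlier when at least a fraction p of all points lie farther than D from it. All function names and parameters are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, D):
    """Map step (assumed): key a 2-D point by its grid cell.

    The cell side is D / (2 * sqrt(2)), so two points in the same cell or in
    adjacent cells are never more than D apart.
    """
    side = D / (2 * math.sqrt(2))
    return (math.floor(point[0] / side), math.floor(point[1] / side))


def group_by_cell(points, D):
    """Shuffle/reduce step (assumed): collect the points that share a cell key."""
    cells = defaultdict(list)
    for pt in points:
        cells[cell_of(pt, D)].append(pt)
    return cells


def layer(cell, lo, hi):
    """Cells whose Chebyshev distance from `cell` lies in [lo, hi]."""
    cx, cy = cell
    return [(cx + dx, cy + dy)
            for dx, dy in product(range(-hi, hi + 1), repeat=2)
            if lo <= max(abs(dx), abs(dy)) <= hi]


def find_outliers(points, D, p):
    """Return the points having at most M = N * (1 - p) points within distance D.

    Counts include the point itself.
    """
    M = len(points) * (1 - p)
    cells = group_by_cell(points, D)

    def count(keys):
        return sum(len(cells.get(k, ())) for k in keys)

    outliers = []
    for cell, members in cells.items():
        # Points in the cell and in its 8 adjacent cells are all within D.
        near = len(members) + count(layer(cell, 1, 1))
        if near > M:
            continue  # no point in this cell can be an outlier
        # Points beyond the next two layers are all farther than D.
        second = layer(cell, 2, 3)
        if near + count(second) <= M:
            outliers.extend(members)  # every point in this cell is an outlier
            continue
        # Undecided cell: check each point against the second-layer points.
        candidates = [q for k in second for q in cells.get(k, ())]
        for o in members:
            within = near + sum(1 for q in candidates if math.dist(o, q) <= D)
            if within <= M:
                outliers.append(o)
    return outliers
```

Most cells are decided from per-cell counts alone, which is what makes the grid step a natural fit for a counting-style MapReduce job; only the undecided cells need pairwise distance checks.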
