FIT2019: The 18th Forum on Information Technology

IEICE Information and Systems Society
IEICE Human Communication Group
Information Processing Society of Japan (IPSJ)

Abstract

CD-003
Query by Dataset Based on Instance Similarities Generated by Sentence Embeddings

◎姜　逸越, 坂巻慶行, 野間　唯 (Fujitsu Laboratories)

Discovering and preparing relevant datasets from a heterogeneous database often takes three to four times longer than the data analysis itself. Large data volumes, a lack of domain knowledge about the datasets, and access restrictions make it difficult to reduce the manual workload involved. We propose an automated method that uses sentence embeddings to extract features from instances of dataset attributes and their values. The similarities between these sentence embeddings are then used to derive dataset relevance scores, from which relevant dataset pairs or clusters can be found automatically. We demonstrate results on heterogeneous open data and provide a quantitative evaluation.
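
The pipeline described above (instances of attributes and values, embedding similarities, dataset relevance scores) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the embedding model nor the scoring formula. Here `embed` is a hypothetical stand-in that counts character trigrams instead of calling a real sentence-embedding model, `relevance` uses an assumed aggregation (the symmetric mean of best-match instance similarities), and the three tiny datasets are invented for illustration.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Stand-in for a sentence-embedding model (assumption): character n-gram counts."""
    s = f" {text.lower()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def instances(dataset):
    """Turn every (attribute, value) pair of every row into one short 'sentence'."""
    return [f"{attr}: {val}" for row in dataset for attr, val in row.items()]


def relevance(ds_a, ds_b):
    """Assumed aggregation: symmetric mean of best-match instance similarities."""
    ea = [embed(s) for s in instances(ds_a)]
    eb = [embed(s) for s in instances(ds_b)]
    if not ea or not eb:
        return 0.0
    a_to_b = sum(max(cosine(x, y) for y in eb) for x in ea) / len(ea)
    b_to_a = sum(max(cosine(y, x) for x in ea) for y in eb) / len(eb)
    return (a_to_b + b_to_a) / 2


# Invented toy datasets: two about railway stations, one unrelated.
stations = [
    {"station name": "Tokyo", "prefecture": "Tokyo"},
    {"station name": "Osaka", "prefecture": "Osaka"},
]
rail = [
    {"station": "Tokyo Station", "city": "Tokyo"},
    {"station": "Shin-Osaka", "city": "Osaka"},
]
recipes = [
    {"dish": "miso soup", "ingredient": "tofu"},
    {"dish": "curry rice", "ingredient": "carrot"},
]

print("stations vs rail:   ", round(relevance(stations, rail), 3))
print("stations vs recipes:", round(relevance(stations, recipes), 3))
```

In this sketch the two station datasets score higher against each other than against the recipe dataset, so thresholding or clustering the pairwise scores would group them. Replacing `embed` with a trained sentence-embedding model would let the score reflect semantic rather than surface similarity.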

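A	Models, Algorithms, and Programming
B	Software
C	Hardware and Architecture
D	Databases
E	Natural Language, Speech, and Music
F	Artificial Intelligence and Games
G	Bioinformatics
H	Image Recognition and Media Understanding
I	Graphics and Images
J	Human Communication & Interaction
K	Educational Technology, Assistive Technology, and Multimedia Applications
L	Networks and Security
M	Ubiquitous and Mobile Computing
N	Education and Humanities
O	Information Systems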