Information Processing Society of Japan (IPSJ), 86th National Convention, March 15-17, 2024

7C-06
Self-Supervised Pre-training of Vision Transformers Using Stable Diffusion-Generated Images
○Luiz Mormille, Masayasu Atsumi (Soka University)
Traditional dataset construction involves time-consuming tasks such as web scraping, cleaning, and labeling. Our proposed method uses a fast Stable Diffusion technique to efficiently generate synthetic images from text prompts, eliminating the need for manual data collection while mitigating biases and mislabeling. We conduct experiments with a Vision Transformer, comparing models trained on real datasets, datasets augmented with synthetic images, and fully synthetic datasets. The results demonstrate the efficacy of Stable Diffusion-synthesized images in improving model generalization and accuracy, highlighting the potential of this approach for computer vision.
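As a minimal sketch of the synthetic-data generation step described above, the snippet below uses SDXL-Turbo (one fast Stable Diffusion variant) through the Hugging Face diffusers library to synthesize class-labeled images from text prompts; the specific model, prompt templates, class names, and image counts are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: generate a class-labeled synthetic dataset with a fast
# Stable Diffusion variant (SDXL-Turbo assumed; not specified by the paper).
import torch
from pathlib import Path
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

class_names = ["cat", "dog", "airplane"]   # placeholder label set
images_per_class = 100                     # illustrative count only
out_root = Path("synthetic_dataset")

for label in class_names:
    out_dir = out_root / label
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(images_per_class):
        # Single-step sampling without classifier-free guidance is the
        # recommended fast setting for SDXL-Turbo.
        image = pipe(
            prompt=f"a photo of a {label}",
            num_inference_steps=1,
            guidance_scale=0.0,
        ).images[0]
        image.save(out_dir / f"{label}_{i:04d}.png")
```

The resulting folder-per-class layout can then be loaded with a standard image-folder dataset and mixed with real data in varying proportions, allowing the comparison of real, augmented, and fully synthetic training sets mentioned in the abstract.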