Investigating Domain Gaps for Indoor 3D Object Detection

Accepted by ACM MM 2025

[Arxiv] [Baseline code] [Dataset code] [Dataset download]

Zijing Zhao1, Zhu Xu1, Qingchao Chen2, Yuxin Peng1, Yang Liu1,†

Wangxuan Institute of Computer Technology, Peking University1

National Institute of Health Data Science, Peking University2

Abstract

As a fundamental task for indoor scene understanding, 3D object detection has been extensively studied, and accuracy on indoor point cloud data has improved substantially. However, existing research has been conducted on limited datasets, where the training and testing sets share the same distribution. In this paper, we consider the task of adapting indoor 3D object detectors from one dataset to another, presenting a comprehensive benchmark built on the ScanNet, SUN RGB-D, and 3D Front datasets, as well as our newly proposed large-scale datasets ProcTHOR-OD and ProcFront, generated with a 3D simulator. Since indoor point cloud datasets are collected and constructed in different ways, object detectors are likely to overfit to factors specific to each dataset, such as point cloud quality, bounding box layout, and instance features. We conduct cross-dataset experiments on different adaptation scenarios, including synthetic-to-real adaptation, point cloud quality adaptation, layout adaptation, and instance feature adaptation, analyzing the impact of different domain gaps on 3D object detectors. We also introduce several approaches that improve adaptation performance, providing baselines for domain adaptive indoor 3D object detection, in the hope that future work will propose detectors with stronger generalization ability across domains.

Unified format of indoor 3D object detection datasets

Our proposed ProcTHOR-OD and ProcFront datasets

To address the data scale issue, prior work has used simulators to create large-scale datasets with precise annotations at low cost. However, human-designed indoor scenes still incur a high construction cost and can hardly be scaled up. To address the scarcity of synthetic indoor 3D object detection datasets and to enable controlled analysis of domain gap factors, we construct fully synthetic datasets using the AI2-THOR simulation platform.

ProcTHOR-OD

Our proposed ProcTHOR-OD dataset is a large-scale synthetic dataset for 3D object detection. It uses the ProcTHOR generation pipeline to automatically generate single-room 3D layouts, with accurate annotations of objects and their poses for the object detection task.

ProcFront

ProcFront shares the same room layouts as ProcTHOR-OD but integrates object instances from the 3D Front dataset, isolating the layout and instance domain gaps for domain adaptation investigations.

Data generation

We provide the generation pipeline for ProcTHOR-OD and ProcFront datasets in: [ProcTHOR-OD dataset code].

Our code uses AI2-THOR to generate 3D layouts for ProcTHOR-OD. Given the generated layouts, we provide code to export the 3D scenes in mesh format. With our code, you can generate 3D indoor scenes at any scale, demonstrating flexibility and extensibility.
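In simplified form, exporting a generated layout to mesh format amounts to writing vertices and triangle faces to a standard mesh file such as Wavefront OBJ. The sketch below is illustrative only; the actual exporter in the repository handles full scenes with multiple objects and materials:

```python
def export_obj(path, vertices, faces):
    """Write a triangle mesh to a Wavefront OBJ file.

    vertices: list of (x, y, z) coordinates; faces: list of (i, j, k)
    zero-based vertex indices (OBJ face indices are 1-based).
    """
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for i, j, k in faces:
            f.write(f"f {i + 1} {j + 1} {k + 1}\n")

# A unit square floor tile made of two triangles.
export_obj("floor.obj",
           [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
           [(0, 1, 2), (0, 2, 3)])
```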

We integrate 3D Front instances into the ProcTHOR-OD dataset to generate ProcFront, isolating the layout and instance domain gaps for domain adaptation investigations. We also provide the integration code used to generate ProcFront.

Dataset download

We generate 10,000 indoor scenes for each of the ProcTHOR-OD and ProcFront datasets. The number of scenes in each dataset is more than 5 times that of ScanNet, and the number of bounding box annotations is an order of magnitude larger than in ScanNet and SUN RGB-D.

Our generated ProcTHOR-OD and ProcFront datasets are available for download: [ProcTHOR-OD download] & [ProcFront download].

Existing datasets

We provide the unified format of indoor 3D object detection datasets:
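To make the idea of a unified format concrete, a per-scene sample can be represented as a point array plus oriented box annotations. The field names and box parameterization below (center, size, heading) are assumptions for illustration; the released processing code may use a different layout:

```python
import numpy as np

def make_sample(points, boxes, labels):
    """Hypothetical unified sample: points (N, 6) as xyz + rgb,
    boxes (M, 7) as center (3) + size (3) + heading (1)."""
    points = np.asarray(points, dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32)
    assert points.ndim == 2 and points.shape[1] == 6
    assert boxes.ndim == 2 and boxes.shape[1] == 7
    return {
        "points": points,                              # per-point xyz + color
        "gt_boxes": boxes,                             # oriented 3D boxes
        "gt_labels": np.asarray(labels, dtype=np.int64),  # class indices
    }

sample = make_sample(np.zeros((100, 6)), np.zeros((2, 7)), [0, 3])
```

Keeping every dataset in one such schema is what allows a detector trained on one domain to be evaluated on another without per-dataset code paths.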

ScanNet

The ScanNet dataset is a high-quality real-world dataset collected with 3D scanners.

SUN RGB-D

The SUN RGB-D dataset is a large-scale dataset collected from single real-world RGB-D images, and thus exhibits low-quality point clouds with obvious point omissions.
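The omissions arise because each scene is reconstructed from a single viewpoint: only surfaces visible to the camera receive points. A minimal sketch of this kind of reconstruction, back-projecting a depth map through a pinhole camera model (the actual SUN RGB-D processing additionally handles sensor tilt and color registration), looks like:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W), in metres, into an (N, 3)
    point cloud using the pinhole model; zero-depth pixels are dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.ravel()
    valid = z > 0                                   # missing depth -> no point
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    return np.stack([x, y, z], axis=1)[valid]

# A flat wall 2 m away, seen by a toy 4x4 camera.
pts = depth_to_points(np.full((4, 4), 2.0), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```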

3D Front

The 3D Front dataset is a synthetic dataset constructed by human expert designers placing synthetic 3D models into rooms. It contains uniformly sampled high-quality 3D point clouds, but still lacks extensibility and realism.

Dataset processing code

We provide the processing code for all the above datasets in: [Data processing code].

We convert 3D scenes in mesh or RGB-D format into point clouds with instance bounding boxes for domain adaptation investigations.
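For the mesh case, the conversion can be sketched as area-weighted uniform sampling of points on the triangle surfaces, followed by computing a bounding box per instance. This is a simplified illustration with an axis-aligned box; the released processing code may use a different sampler and oriented boxes:

```python
import numpy as np

def sample_mesh(vertices, faces, n, rng=np.random.default_rng(0)):
    """Uniformly sample n surface points from a triangle mesh via
    area-weighted triangle choice and random barycentric coordinates."""
    tri = vertices[faces]                                   # (F, 3, 3)
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    r1, r2 = rng.random(n), rng.random(n)
    u = 1 - np.sqrt(r1)                                     # barycentric
    v = np.sqrt(r1) * (1 - r2)
    w = np.sqrt(r1) * r2
    t = tri[idx]
    return u[:, None] * t[:, 0] + v[:, None] * t[:, 1] + w[:, None] * t[:, 2]

def aabb(points):
    """Axis-aligned bounding box as (center, size)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    return (lo + hi) / 2, hi - lo

# A unit square (two triangles) sampled into a point cloud, then boxed.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float)
pts = sample_mesh(verts, np.array([[0, 1, 2], [0, 2, 3]]), 500)
center, size = aabb(pts)
```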

Domain adaptation benchmarks for indoor 3D object detection

We construct 6 benchmarks for domain adaptation investigations, including:

| Benchmark name | Source domain dataset | Target domain dataset | Adaptation scenario |
| --- | --- | --- | --- |
| proc2scan | ProcTHOR-OD | ScanNet | synthetic-to-real |
| front2scan | 3D Front | ScanNet | synthetic-to-real |
| front2sun | 3D Front | SUN RGB-D | synthetic-to-real |
| scan2sun | ScanNet | SUN RGB-D | point cloud quality |
| pf2front | ProcFront | 3D Front | layout adaptation |
| proc2pf | ProcTHOR-OD | ProcFront | instance adaptation |
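The six benchmarks above can also be expressed programmatically, for example as a small registry mapping each benchmark name to its source dataset, target dataset, and scenario (the names come from the table; the actual configuration format in the code may differ):

```python
# Benchmark registry: name -> (source dataset, target dataset, scenario).
BENCHMARKS = {
    "proc2scan": ("ProcTHOR-OD", "ScanNet", "synthetic-to-real"),
    "front2scan": ("3D Front", "ScanNet", "synthetic-to-real"),
    "front2sun": ("3D Front", "SUN RGB-D", "synthetic-to-real"),
    "scan2sun": ("ScanNet", "SUN RGB-D", "point cloud quality"),
    "pf2front": ("ProcFront", "3D Front", "layout adaptation"),
    "proc2pf": ("ProcTHOR-OD", "ProcFront", "instance adaptation"),
}

def describe(name):
    """Human-readable summary of one benchmark."""
    src, tgt, scenario = BENCHMARKS[name]
    return f"{src} -> {tgt} ({scenario})"
```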

Domain adaptation baselines

We provide 7 training scripts for each adaptation benchmark: [Domain adaptation baselines].

The detailed results of the baseline methods are shown in: [Baseline code]. We hope that our work can inspire the community to further study domain adaptation for point cloud data.

Citation

The BibTeX entry for this paper will be available soon.