Abstract
As a fundamental task for indoor scene understanding, 3D object detection has been extensively studied, and accuracy on indoor point cloud data has improved substantially. However, existing research has been conducted on limited datasets, where the training and testing sets share the same distribution. In this paper, we consider the task of adapting indoor 3D object detectors from one dataset to another, presenting a comprehensive benchmark built on the ScanNet, SUN RGB-D and 3D Front datasets, as well as our newly proposed large-scale datasets ProcTHOR-OD and ProcFront, generated with a 3D simulator. Since indoor point cloud datasets are collected and constructed in different ways, object detectors are likely to overfit to dataset-specific factors such as point cloud quality, bounding box layout and instance features. We conduct cross-dataset experiments on different adaptation scenarios, including synthetic-to-real adaptation, point cloud quality adaptation, layout adaptation and instance feature adaptation, analyzing the impact of each domain gap on 3D object detectors. We also introduce several approaches that improve adaptation performance, providing baselines for domain adaptive indoor 3D object detection, in the hope that future work will propose detectors with stronger generalization across domains.
Unified format of indoor 3D object detection datasets
Our proposed ProcTHOR-OD and ProcFront datasets
To address the data scale issue, researchers have tried to use simulators to create large-scale datasets with precise annotations at low cost. However, human-designed indoor scenes still face high construction costs and can hardly be scaled up. To address the scarcity of synthetic indoor 3D object detection datasets and to enable controlled analysis of domain gap factors, we construct fully synthetic datasets using the AI2-THOR simulation platform.
ProcTHOR-OD
Our proposed ProcTHOR-OD dataset is a large-scale synthetic dataset for 3D object detection. It uses the ProcTHOR generation pipeline to automatically generate single-room 3D layouts, with accurate annotations of objects and their poses for the object detection task.
ProcFront
ProcFront shares its room layouts with ProcTHOR-OD but integrates instances from the 3D Front dataset, isolating the layout and instance domain gaps for domain adaptation investigations.
Data generation
We provide the generation pipeline for ProcTHOR-OD and ProcFront datasets in: [ProcTHOR-OD dataset code].
Our code uses AI2-THOR to generate 3D layouts for ProcTHOR-OD. With the generated layouts, we provide code to export the 3D scenes in mesh format. With our code, you can generate indoor scenes at any scale, demonstrating the pipeline's flexibility and extensibility.
We integrate 3D Front instances into the ProcTHOR-OD dataset to generate ProcFront, isolating the layout and instance domain gaps for domain adaptation investigations. We also provide the integration code used to generate ProcFront.
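The core of the integration step is placing a 3D Front model of the matching category into each annotated ProcTHOR-OD box, so that the scene layout is preserved while the instance geometry changes. A minimal sketch of that placement, with hypothetical function and variable names (the released integration code operates on full meshes rather than raw point lists):

```python
# Hypothetical sketch: fit a 3D Front model into an existing
# ProcTHOR-OD bounding box by uniform scaling, then translate it
# to the box center so the scene layout is unchanged.
def place_model(model_points, model_extent, box_center, box_size):
    """model_points: list of (x, y, z) centered at the model origin.
    model_extent: (w, l, h) of the model; box_size: (w, l, h) of the
    target box. Returns the transformed points."""
    # largest uniform scale that keeps the model inside the box
    scale = min(b / m for m, b in zip(model_extent, box_size))
    # scale about the model origin, then translate to the box center
    return [tuple(scale * c + t for c, t in zip(p, box_center))
            for p in model_points]
```

Uniform scaling (rather than per-axis scaling) keeps the replacement model's proportions intact, which avoids introducing distorted instance shapes as an extra, unintended domain gap.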
Dataset download
We generate 10,000 indoor scenes for each of ProcTHOR-OD and ProcFront. The scene count of each dataset is more than 5 times that of ScanNet, and the number of bounding box annotations is an order of magnitude larger than in ScanNet and SUN RGB-D.
Our generated ProcTHOR-OD and ProcFront datasets are available for download: [ProcTHOR-OD download] & [ProcFront download].
Existing datasets
We provide the unified format of indoor 3D object detection datasets:
ScanNet
The ScanNet dataset is a high-quality real-world dataset collected with 3D scanners.
SUN RGB-D
The SUN RGB-D dataset is a large-scale real-world dataset collected from single RGB-D images; its point clouds are therefore low quality, with obvious point omissions.
3D Front
The 3D Front dataset is a synthetic dataset constructed by expert human designers placing synthetic 3D models into rooms. It contains uniformly sampled, high-quality 3D point clouds, but still lacks extensibility and realism.
Dataset processing code
We provide the processing code for all the above datasets in: [Data processing code].
We convert 3D scenes of mesh or RGB-D formats into point clouds with instance bounding boxes for domain adaptation investigations.
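The last step of this conversion, deriving an axis-aligned bounding box for each instance from per-point instance labels, can be sketched as follows. The function name and plain-list representation are illustrative; the released processing code works on the full dataset formats:

```python
# Hypothetical sketch: given per-point instance labels, derive an
# axis-aligned bounding box (center + size) for each instance.
from collections import defaultdict

def boxes_from_labeled_points(points, instance_ids):
    """points: list of (x, y, z); instance_ids: parallel list of ints.
    Returns {instance_id: (center, size)}."""
    groups = defaultdict(list)
    for p, inst in zip(points, instance_ids):
        groups[inst].append(p)
    boxes = {}
    for inst, pts in groups.items():
        # per-axis min/max over all points of this instance
        mins = [min(c) for c in zip(*pts)]
        maxs = [max(c) for c in zip(*pts)]
        center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
        size = [hi - lo for lo, hi in zip(mins, maxs)]
        boxes[inst] = (center, size)
    return boxes
```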
Domain adaptation benchmarks for indoor 3D object detection
We construct 6 benchmarks for domain adaptation investigations, including:
Benchmark name | Source domain dataset | Target domain dataset | Adaptation scenario
---|---|---|---
proc2scan | ProcTHOR-OD | ScanNet | synthetic-to-real
front2scan | 3D Front | ScanNet | synthetic-to-real
front2sun | 3D Front | SUN RGB-D | synthetic-to-real
scan2sun | ScanNet | SUN RGB-D | point cloud quality
pf2front | ProcFront | 3D Front | layout adaptation
proc2pf | ProcTHOR-OD | ProcFront | instance adaptation
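In code, the six benchmarks amount to a small registry mapping each name to its source/target pair and scenario. A minimal sketch (the registry and lookup function are illustrative, not the exact interface of our released scripts):

```python
# Minimal registry of the six domain adaptation benchmarks.
BENCHMARKS = {
    "proc2scan":  ("ProcTHOR-OD", "ScanNet",   "synthetic-to-real"),
    "front2scan": ("3D Front",    "ScanNet",   "synthetic-to-real"),
    "front2sun":  ("3D Front",    "SUN RGB-D", "synthetic-to-real"),
    "scan2sun":   ("ScanNet",     "SUN RGB-D", "point cloud quality"),
    "pf2front":   ("ProcFront",   "3D Front",  "layout adaptation"),
    "proc2pf":    ("ProcTHOR-OD", "ProcFront", "instance adaptation"),
}

def get_benchmark(name):
    """Look up a benchmark by name; hypothetical helper."""
    source, target, scenario = BENCHMARKS[name]
    return {"source": source, "target": target, "scenario": scenario}
```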
Domain adaptation baselines
We provide 7 training scripts for each adaptation benchmark: [Domain adaptation baselines], including:
- Cross-domain basic training:
- source_only: train the detector with source data and evaluate it on target data.
- target_only: train the detector with target data and evaluate it on target data.
- Using target domain prior:
- few_shot: fine-tune the detector with 10-shot and 100-shot target domain annotated data.
- size_prior: train the detector with source data, but with the target domain prior of mean size.
- Unsupervised domain adaptation:
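The size_prior baseline can be sketched as a post-hoc adjustment: keep the detector's predicted box centers but pull each predicted size toward the target domain's per-class mean size. The function name and the blending weight `alpha` are illustrative assumptions, not our exact implementation:

```python
# Hypothetical sketch of the size_prior idea: blend predicted box
# sizes with the target domain's per-class mean sizes.
def apply_size_prior(pred_boxes, target_mean_size, alpha=0.5):
    """pred_boxes: list of (cls, center, size) with size = (w, l, h).
    target_mean_size: {cls: mean (w, l, h) on the target domain}.
    alpha: weight of the prior (0 = keep prediction, 1 = use prior)."""
    adjusted = []
    for cls, center, size in pred_boxes:
        # fall back to the prediction when the class has no prior
        prior = target_mean_size.get(cls, size)
        new_size = tuple((1 - alpha) * s + alpha * p
                         for s, p in zip(size, prior))
        adjusted.append((cls, center, new_size))
    return adjusted
```

In our actual baseline the prior enters training on source data rather than as a test-time correction, but the blending above conveys the underlying idea.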
The detailed results of the baseline methods are shown in: [Baseline code]. We hope that our work can inspire the community to further study domain adaptation for point cloud data.
Citation
The BibTeX entry for this paper is coming soon.