We explore the task of zero-shot semantic segmentation of 3D shapes by using
large-scale off-the-shelf 2D image recognition models. Surprisingly, we find
that modern zero-shot 2D object detectors are better suited for this task than
contemporary text/image similarity predictors or even zero-shot 2D segmentation
networks. Our key finding is that it is possible to extract accurate 3D segmentation
maps from multi-view bounding box predictions by using the topological properties
of the underlying surface. For this, we develop the Segmentation Assignment with
Topological Reweighting (SATR) algorithm and evaluate it on ShapeNetPart and our
proposed FAUST benchmarks. SATR achieves state-of-the-art performance and outperforms
a baseline algorithm by 1.3% and 4% average mIoU on the FAUST coarse and fine-grained
benchmarks, respectively, and by 5.2% average mIoU on the ShapeNetPart benchmark.
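
To make the idea of lifting multi-view bounding box predictions to a 3D segmentation concrete, the following is a minimal sketch of naive per-face voting. It is not the SATR algorithm: it omits the topological reweighting entirely and is closer in spirit to a simple baseline. The function name `vote_face_labels`, the input layout (projected face centers, a visibility mask, and per-view labeled boxes), and the use of -1 for unlabeled faces are all assumptions made for illustration.

```python
import numpy as np

def vote_face_labels(face_px, visible, boxes, num_labels):
    """Assign a part label to each mesh face by voting over views.

    Hypothetical input layout (assumed for this sketch):
      face_px : (V, F, 2) array, pixel coords of each face center in each view
      visible : (V, F) bool array, True if the face is visible in that view
      boxes   : list of length V; each entry is a list of
                (x0, y0, x1, y1, label) tuples from a 2D detector
    Returns an (F,) int array of labels, with -1 for faces that got no votes.
    """
    num_views, num_faces, _ = face_px.shape
    scores = np.zeros((num_faces, num_labels))
    for v in range(num_views):
        x, y = face_px[v, :, 0], face_px[v, :, 1]
        for x0, y0, x1, y1, label in boxes[v]:
            # A visible face votes for a label if its center falls in the box.
            inside = visible[v] & (x >= x0) & (x <= x1) & (y >= y0) & (y <= y1)
            scores[inside, label] += 1.0
    labels = scores.argmax(axis=1)
    labels[scores.sum(axis=1) == 0] = -1  # no box ever covered this face
    return labels
```

Uniform voting of this kind treats every face inside a box equally, which is exactly where a reweighting based on the surface's properties would be expected to help.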