55 - Deep Learning - Segmentation and Object Detection Part 5 [ID:18732]

Welcome back to deep learning! Today we want to talk about the last part of object detection and segmentation, and we want to look into the concept of instance segmentation. So let's have a look at our slides. You see, this is already the last part, part five, and now we want to talk about instance segmentation. We don't just want to detect where the pixels of cubes or the pixels of cups are; we really want to figure out which pixels belong to which cube. So this is essentially a combination of object detection and semantic segmentation.

Potential applications include information about occlusion, counting the number of elements belonging to the same class, and detecting object boundaries, for example for gripping objects in robotics, where this is very important. Examples in the literature are simultaneous detection and segmentation [9], DeepMask [18], SharpMask [19], and Mask R-CNN [10].

So let's look at Mask R-CNN [10] in a little more detail. Essentially, we go back to the start and combine object detection and segmentation. We use an R-CNN-type detector, namely Faster R-CNN [5], for the object detection. The object detection essentially solves the instance separation, and the segmentation then refines the bounding boxes per instance. The workflow is a two-stage procedure: first, a region proposal stage proposes the object bounding boxes; then classification, bounding box regression, and segmentation are performed in parallel. Training uses a multitask loss that combines three terms: the pixel-wise classification loss, i.e., the segmentation loss, the box loss, and the class loss that is responsible for producing the right class per bounding box.

Let's look in some more detail into the two-stage procedure. For two-stage networks, you have two different options here. Either you have a joint branch that works on the regions of interest (RoIs) and splits only at a later stage into the mask segmentation on the one side and the class and bounding box prediction on the other, or you split early and run the RoI features through separate networks. In both versions, you have this multitask loss that combines the pixel-wise segmentation loss, the box loss, and the class loss.

Let's have a look at some examples. These are again results from Mask R-CNN, and to be honest, they are quite impressive. There are really difficult cases: the network identifies where the persons are, and it also shows that the different persons are, of course, different instances.

So let's summarize what we've seen so far. Segmentation is commonly solved by architectures that analyze the image and subsequently refine the coarse results. Fully convolutional networks preserve the spatial layout and enable arbitrary input sizes with pooling. We can implement object detectors as a sequence of region proposals and classification, which essentially leads to the family of R-CNN-type networks. Alternatively, you can go for single-shot detectors: we looked at YOLO [26], a very common and very fast technique, and at YOLO9000 [27], and we looked into RetinaNet [7] for cases with a strong scale dependency where you want to detect on many different scales, as in the example of histological slice processing. So object detection and segmentation are closely related, and combinations are common, as you have seen here for the purpose of instance segmentation.

So let's look at what we still have to talk about in this lecture. Coming up very soon are methods to relieve the burden of labeling. We will talk about weak annotation and how we can generate labels, which then also leads to the concept of self-supervision. This is a very popular topic right now, and it has been used very heavily in order to generate better networks in order to …
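To make the difference between semantic and instance segmentation from the beginning of this part concrete, here is a minimal sketch in Python with NumPy. It assumes that the instances are given as a list of boolean masks plus one class id per mask; this representation and the function names `instances_to_semantic` and `count_instances` are hypothetical choices for illustration, not the output format of Mask R-CNN or of any particular library. Collapsing the instances into a semantic label map discards which pixels belong to which cube, while the instance list still supports counting.

```python
from collections import Counter

import numpy as np


def instances_to_semantic(instance_masks, instance_classes, shape, background=0):
    """Collapse per-instance boolean masks into a single semantic label map.

    instance_masks:   list of boolean arrays of shape `shape`, one per object
    instance_classes: list of integer class ids, one per instance
    Where masks overlap, later instances overwrite earlier ones (an assumption).
    """
    semantic = np.full(shape, background, dtype=np.int64)
    for mask, cls in zip(instance_masks, instance_classes):
        semantic[mask] = cls  # instance identity is lost, only the class survives
    return semantic


def count_instances(instance_classes):
    """Count how many object instances were found per class."""
    return dict(Counter(instance_classes))


# Hypothetical toy scene: two touching cubes (class 1) and one cup (class 2).
shape = (4, 6)
cube_a = np.zeros(shape, dtype=bool)
cube_a[0:2, 0:2] = True
cube_b = np.zeros(shape, dtype=bool)
cube_b[0:2, 2:4] = True
cup = np.zeros(shape, dtype=bool)
cup[2:4, 4:6] = True

masks = [cube_a, cube_b, cup]
classes = [1, 1, 2]

semantic = instances_to_semantic(masks, classes, shape)
print(semantic)                  # the two cubes merge into one 2x4 region of 1s
print(count_instances(classes))  # {1: 2, 2: 1}
```

Because the two cubes touch, the semantic map shows a single connected region of class 1; only the instance representation still tells us that there are two cubes, which is exactly the information needed for counting or for reasoning about occlusion.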

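The multitask loss of Mask R-CNN described in the lecture, L = L_cls + L_box + L_mask, can be sketched for a single positive RoI as follows. Following our reading of the Mask R-CNN paper [10], we assume a softmax log loss for the class, a smooth L1 loss for the box regression, and an average per-pixel binary cross-entropy computed only on the mask predicted for the ground-truth class. The function names, the single class-agnostic box, the background class at index 0, and one mask channel per class index are simplifying assumptions of this sketch; the actual method averages over many sampled RoIs and only applies the box and mask terms to positive ones.

```python
import numpy as np


def class_loss(cls_logits, gt_class):
    """L_cls: softmax log loss over all classes (index 0 = background, an assumption)."""
    shifted = cls_logits - cls_logits.max()  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[gt_class]


def box_loss(box_pred, gt_box, beta=1.0):
    """L_box: smooth L1 loss summed over the 4 box regression targets."""
    diff = np.abs(box_pred - gt_box)
    per_coord = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return per_coord.sum()


def mask_loss(mask_logits, gt_class, gt_mask):
    """L_mask: mean per-pixel binary cross-entropy with a sigmoid.

    Only the mask channel of the ground-truth class is used (one channel per
    class index is an assumption), so classes do not compete per pixel.
    """
    x = mask_logits[gt_class]
    # numerically stable form of BCE(sigmoid(x), gt_mask)
    bce = np.maximum(x, 0) - x * gt_mask + np.log1p(np.exp(-np.abs(x)))
    return bce.mean()


def multitask_loss(cls_logits, box_pred, mask_logits, gt_class, gt_box, gt_mask):
    """L = L_cls + L_box + L_mask for a single positive RoI (simplified)."""
    l_cls = class_loss(cls_logits, gt_class)
    l_box = box_loss(box_pred, gt_box)
    l_mask = mask_loss(mask_logits, gt_class, gt_mask)
    return l_cls + l_box + l_mask, (l_cls, l_box, l_mask)


# Toy RoI: 3 classes (background + 2 objects) and a 4x4 mask head.
cls_logits = np.zeros(3)
box_pred = np.zeros(4)
gt_box = np.array([0.5, 0.0, 0.0, 2.0])
mask_logits = np.zeros((3, 4, 4))
gt_mask = np.zeros((4, 4))
gt_mask[1:3, 1:3] = 1.0

total, (l_cls, l_box, l_mask) = multitask_loss(
    cls_logits, box_pred, mask_logits, 1, gt_box, gt_mask
)
print(l_cls, l_box, l_mask, total)
```

With all-zero logits, L_cls reduces to log 3 and L_mask to log 2 regardless of the target mask, while the box term is 0.5 · 0.5² + (2.0 − 0.5) = 1.625. Selecting only the ground-truth class channel for the mask term mirrors the decoupling of mask and class prediction that the Mask R-CNN paper emphasizes.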
Part of a video series: Deep Learning - Plain Version
Presenter: Prof. Dr. Andreas Maier
Accessible via: Open access
Duration: 00:07:13 min
Recording date: 2020-06-28
Uploaded on: 2020-06-28 22:56:29
Language: en-US

In this video, we look at instance segmentation and introduce the concept of Mask R-CNN.

Additional References
nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation
X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery
RetinaNet figure by Marc Aubreville
DarkNet Library
Joseph Redmon CV

Further Reading:
A gentle Introduction to Deep Learning

References
[1] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. “SegNet: A deep convolutional encoder-decoder architecture for image segmentation”. In: arXiv preprint arXiv:1511.00561 (2015).
[2] Xiao Bian, Ser Nam Lim, and Ning Zhou. “Multiscale fully convolutional network with application to industrial inspection”. In: Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on. IEEE. 2016, pp. 1–8.
[3] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, et al. “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs”. In: CoRR abs/1412.7062 (2014). arXiv: 1412.7062.
[4] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, et al. “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs”. In: arXiv preprint arXiv:1606.00915 (2016).
[5] S. Ren, K. He, R. Girshick, et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 39.6 (June 2017), pp. 1137–1149.
[6] R. Girshick. “Fast R-CNN”. In: 2015 IEEE International Conference on Computer Vision (ICCV). Dec. 2015, pp. 1440–1448.
[7] Tsung-Yi Lin, Priya Goyal, Ross Girshick, et al. “Focal loss for dense object detection”. In: arXiv preprint arXiv:1708.02002 (2017).
[8] Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, et al. “A Review on Deep Learning Techniques Applied to Semantic Segmentation”. In: arXiv preprint arXiv:1704.06857 (2017).
[9] Bharath Hariharan, Pablo Arbeláez, Ross Girshick, et al. “Simultaneous detection and segmentation”. In: European Conference on Computer Vision. Springer. 2014, pp. 297–312.
[10] Kaiming He, Georgia Gkioxari, Piotr Dollár, et al. “Mask R-CNN”. In: CoRR abs/1703.06870 (2017). arXiv: 1703.06870.
[11] N. Dalal and B. Triggs. “Histograms of oriented gradients for human detection”. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 1. June 2005, pp. 886–893.
[12] Jonathan Huang, Vivek Rathod, Chen Sun, et al. “Speed/accuracy trade-offs for modern convolutional object detectors”. In: CoRR abs/1611.10012 (2016). arXiv: 1611.10012.
[13] Jonathan Long, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 3431–3440.
[14] Pauline Luc, Camille Couprie, Soumith Chintala, et al. “Semantic segmentation using adversarial networks”. In: arXiv preprint arXiv:1611.08408 (2016).
[15] Christian Szegedy, Scott E. Reed, Dumitru Erhan, et al. “Scalable, High-Quality Object Detection”. In: CoRR abs/1412.1441 (2014). arXiv: 1412.1441.
[16] Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. “Learning deconvolution network for semantic segmentation”. In: Proceedings of the IEEE International Conference on Computer Vision. 2015, pp. 1520–1528.
[17] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, et al. “Enet: A deep neural network architecture for real-time semantic segmentation”. In: arXiv preprint arXiv:1606.02147 (2016).
[18] Pedro O Pinheiro, Ronan Collobert, and Piotr Dollár. “Learning to segment object candidates”. In: Advances in Neural Information Processing Systems. 2015, pp. 1990–1998.
[19] Pedro O Pinheiro, Tsung-Yi Lin, Ronan Collobert, et al. “Learning to refine object segments”. In: European Conference on Computer Vision. Springer. 2016, pp. 75–91.
[20] Ross B. Girshick, Jeff Donahue, Trevor Darrell, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation”. In: CoRR abs/1311.2524 (2013). arXiv: 1311.2524.
[21] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation”. In: MICCAI. Springer. 2015, pp. 234–241.
[22] Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”. In: Computer Vision – ECCV 2014. Cham: Springer International Publishing, 2014, pp. 346–361.
[23] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, et al. “Selective Search for Object Recognition”. In: International Journal of Computer Vision 104.2 (Sept. 2013), pp. 154–171.
[24] Wei Liu, Dragomir Anguelov, Dumitru Erhan, et al. “SSD: Single Shot MultiBox Detector”. In: Computer Vision – ECCV 2016. Cham: Springer International Publishing, 2016, pp. 21–37.
[25] P. Viola and M. Jones. “Rapid object detection using a boosted cascade of simple features”. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision Vol. 1. 2001, pp. 511–518.
[26] J. Redmon, S. Divvala, R. Girshick, et al. “You Only Look Once: Unified, Real-Time Object Detection”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016, pp. 779–788.
[27] Joseph Redmon and Ali Farhadi. “YOLO9000: Better, Faster, Stronger”. In: CoRR abs/1612.08242 (2016). arXiv: 1612.08242.
[28] Fisher Yu and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions”. In: arXiv preprint arXiv:1511.07122 (2015).
[29] Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, et al. “Conditional Random Fields as Recurrent Neural Networks”. In: CoRR abs/1502.03240 (2015). arXiv: 1502.03240.
[30] Alejandro Newell, Kaiyu Yang, and Jia Deng. “Stacked hourglass networks for human pose estimation”. In: European conference on computer vision. Springer. 2016, pp. 483–499.
