Adaptive Multi-modal Fusion Instance Segmentation for CAEVs in Complex Conditions: Dataset, Framework and Verifications

Pai Peng; Keke Geng; Guodong Yin; Yanbo Lu; Weichao Zhuang; Shuaipeng Liu

doi:10.1186/s10033-021-00602-2

Chinese Journal of Mechanical Engineering, 2021, Vol. 34, Issue 5: 81-81

Original Article

School of Mechanical Engineering, Southeast University, Nanjing, China

Received: 2020-07-17

Revised: 2021-07-25

Published online: 2022-03-22

Funding

Supported by the National Natural Science Foundation of China (Grant Nos. 51975118, 52025121, 51975103, 51905095) and the National Natural Science Foundation of Jiangsu Province (Grant No. BK20180401).

Abstract

Current work on environmental perception for connected autonomous electrified vehicles (CAEVs) focuses mainly on object detection in good weather and illumination conditions; such methods often perform poorly in adverse scenarios and have only a vague scene parsing ability. This paper develops an end-to-end sharpening mixture of experts (SMoE) fusion framework to improve the robustness and accuracy of perception systems for CAEVs in complex illumination and weather conditions. Three original contributions distinguish this work from the existing literature. First, the Complex KITTI dataset is introduced, which consists of 7481 pairs of modified KITTI RGB images and generated LiDAR dense depth maps; the dataset is finely annotated at the instance level with the proposed semi-automatic annotation method. Second, the SMoE fusion approach is devised to adaptively learn robust kernels from complementary modalities. Third, comprehensive comparative experiments are carried out, and the results show that the proposed SMoE framework yields significant improvements over other fusion techniques in adverse environmental conditions. Overall, this research proposes an SMoE fusion framework that improves the scene parsing ability of perception systems for CAEVs in adverse conditions.

Keywords: Connected autonomous electrified vehicles; Multi-modal fusion; Semi-automatic annotation; Sharpening mixture of experts; Comparative experiments
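The abstract names a mixture-of-experts fusion of complementary modalities but does not describe its internals. As a rough illustration of the general idea only, the sketch below fuses two per-modality feature vectors with gate weights, assuming that "sharpening" can be approximated by a low-temperature softmax. That assumption, the function names `sharpened_softmax` and `fuse`, the temperature value, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def sharpened_softmax(scores, temperature=0.5):
    """Softmax with a temperature; values below 1 make the weights more peaked.

    Assumption: this stands in for the paper's "sharpening", which the
    abstract does not define.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(expert_features, gate_scores, temperature=0.5):
    """Return the gate-weighted sum of per-modality feature vectors."""
    if len(expert_features) != len(gate_scores):
        raise ValueError("need exactly one gate score per expert")
    length = len(expert_features[0])
    if any(len(f) != length for f in expert_features):
        raise ValueError("all feature vectors must have the same length")
    weights = sharpened_softmax(gate_scores, temperature)
    fused = [
        sum(w * f[i] for w, f in zip(weights, expert_features))
        for i in range(length)
    ]
    return fused, weights


# Hypothetical inputs: an RGB expert and a LiDAR depth expert.
rgb_features = [0.2, 0.9, 0.4]
depth_features = [0.8, 0.1, 0.6]
gate_scores = [0.1, 1.3]  # hypothetical gate output that favours depth

fused, weights = fuse([rgb_features, depth_features], gate_scores)
print("gate weights:", weights)
print("fused features:", fused)
```

In a scene where the camera image is degraded (for example at night), a gate of this kind would be expected to shift weight toward the depth expert; lowering the temperature makes that shift more decisive.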

Cite this article

Pai Peng, Keke Geng, Guodong Yin, Yanbo Lu, Weichao Zhuang, Shuaipeng Liu. Adaptive Multi-modal Fusion Instance Segmentation for CAEVs in Complex Conditions: Dataset, Framework and Verifications[J]. Chinese Journal of Mechanical Engineering, 2021, 34(5): 81-81. DOI: 10.1186/s10033-021-00602-2
