[1] Y Jiang, X Zhao, J Gong, et al. System design of self-driving in simplified urban environments. Journal of Mechanical Engineering, 2012, 48(20):103-112. (in Chinese)
[2] J G Ibanez, S Zeadally, J Contreras-Castillo. Integration challenges of intelligent transportation systems with connected vehicle, cloud computing and internet of things technologies. IEEE Wireless Communications, 2015, 22(6):122-128.
[3] X Tang, T Jia, X Hu, et al. Naturalistic data-driven predictive energy management for plug-in hybrid electric vehicles. IEEE Transactions on Transportation Electrification, 2021, 7(2):497-508.
[4] F Rosique, P J Navarro, C Fernández, et al. A systematic review of perception system and simulators for autonomous vehicles research. Sensors, 2019, 19(3):648.
[5] F Lin, Y Zhang, Y Zhao, et al. Trajectory tracking of autonomous vehicle with the fusion of DYC and longitudinal-lateral control. Chinese Journal of Mechanical Engineering, 2019, 32:1-16.
[6] R Girshick. Fast R-CNN. Proceedings of IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 7-13, 2015:1440-1448.
[7] Y F Cai, H Wang, X Chen, et al. Vehicle detection based on visual saliency and deep sparse convolution hierarchical model. Chinese Journal of Mechanical Engineering, 2016, 29(4):765-772.
[8] S Q Ren, K M He, R Girshick, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6):1137-1149.
[9] K M He, G Gkioxari, P Dollár, et al. Mask R-CNN. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2020, 42(2):386-397.
[10] L Hu, J Ou, J Huang, et al. A review of research on traffic conflicts based on intelligent vehicles. IEEE Access, 2020, 8:24471-24483.
[11] K K Geng, W Zou, G D Yin, et al. Low-observable targets detection for autonomous vehicles based on dual-modal sensor fusion with deep learning approach. Proceedings of the Institution of Mechanical Engineers Part D: Journal of Automobile Engineering, 2019, 233(9):2270-2283.
[12] O Mees, A Eitel, W Burgard. Choosing smartly: adaptive multimodal fusion for object detection in changing environments. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, South Korea, October 9-14, 2016:151-156.
[13] X Chen, H Ma, J Wan, et al. Multi-view 3D object detection network for autonomous driving. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21-26, 2017:6526-6534.
[14] A Geiger, P Lenz, R Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, USA, June 16-21, 2012:3354-3361.
[15] P Wang, X Huang, X Cheng, et al. The ApolloScape open dataset for autonomous driving and its application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 42(10):2702-2719.
[16] J Xue, J Fang, T Li, et al. BLVD: building a large-scale 5D semantics benchmark for autonomous driving. International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019:6685-6691.
[17] H Caesar, V Bankiti, A H Lang, et al. nuScenes: a multimodal dataset for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 16-18, 2020:11621-11631.
[18] A Patil, S Malla, H Gang, et al. The H3D dataset for full-surround 3D multi-object detection and tracking in crowded urban scenes. International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019:9552-9557.
[19] J R Tong, L Mao, J Sun. Multimodal pedestrian detection algorithm based on fusion feature pyramids. Computer Engineering and Applications, 2019, 55(19):214-222.
[20] D Bolya, C Zhou, F Xiao, et al. YOLACT: real-time instance segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, October 27-November 2, 2019:9157-9166.
[21] T Y Lin, M Maire, S Belongie, et al. Microsoft COCO: common objects in context. European Conference on Computer Vision (ECCV), Zürich, Switzerland, September 6-12, 2014:740-755.
[22] J Xu, A G Schwing, R Urtasun. Learning to segment under various forms of weak supervision. Proceedings of IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 7-13, 2015:3781-3790.
[23] D Lin, J Dai, J Jia, et al. ScribbleSup: scribble-supervised convolutional networks for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016:3159-3167.
[24] N Xu, B Price, S Cohen, et al. Deep interactive object selection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016:373-381.
[25] A Bearman, O Russakovsky, V Ferrari, et al. What's the point: semantic segmentation with point supervision. European Conference on Computer Vision (ECCV), Amsterdam, Netherlands, October 8-16, 2016:549-565.
[26] L C Chen, S Fidler, A L Yuille, et al. Beat the MTurkers: automatic image labeling from weak 3D supervision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, USA, June 24-27, 2014:3198-3205.
[27] Z Zhang, A G Schwing, S Fidler, et al. Monocular object instance segmentation and depth ordering with CNNs. IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 7-13, 2015:2614-2622.
[28] Z Zhang, S Fidler, R Urtasun. Instance-level segmentation for autonomous driving with deep densely connected MRFs. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016:669-677.
[29] L Castrejon, K Kundu, R Urtasun, et al. Annotating object instances with a Polygon-RNN. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21-26, 2017:4485-4493.
[30] D Acuna, H Ling, A Kar, et al. Efficient interactive annotation of segmentation datasets with Polygon-RNN++. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-23, 2018:859-868.
[31] M Andriluka, J R R Uijlings, V Ferrari. Fluid annotation: a human-machine collaboration interface for full image annotation. Proceedings of the 26th ACM International Conference on Multimedia, Seoul, South Korea, October 22-26, 2018:1957-1966.
[32] P Voigtlaender, M Krause, A Osep, et al. MOTS: multi-object tracking and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June 16-20, 2019:7942-7951.
[33] Z Zhou, M Dong, X Xie, et al. Fusion of infrared and visible images for night-vision context enhancement. Applied Optics, 2016, 55(23):6480-6490.
[34] Q Ha, K Watanabe, T Karasawa, et al. MFNet: towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, September 24-28, 2017:5108-5115.
[35] A Valada, R Mohan, W Burgard. Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision, 2020, 128(5):1239-1285.
[36] D Xu, D Anguelov, A Jain. PointFusion: deep sensor fusion for 3D bounding box estimation. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-23, 2018:244-253.
[37] K Shin, Y P Kwon, M Tomizuka. RoarNet: a robust 3D object detection based on region approximation refinement. IEEE Intelligent Vehicles Symposium (IV), Paris, France, June 9-12, 2019:2510-2515.
[38] C R Qi, H Su, K Mo, et al. PointNet: deep learning on point sets for 3D classification and segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21-26, 2017:77-85.
[39] J Ku, M Mozifian, J Lee, et al. Joint 3D proposal generation and object detection from view aggregation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, October 1-5, 2018:1-8.
[40] Z Wang, W Zhan, M Tomizuka. Fusing bird's eye view lidar point cloud and front view camera image for 3D object detection. IEEE Intelligent Vehicles Symposium (IV), Changshu, China, June 26-30, 2018:1-6.
[41] A Asvadi, L Garrote, C Premebida, et al. Multimodal vehicle detection: fusing 3D-LIDAR and color camera data. Pattern Recognition Letters, 2017, 115:20-29.
[42] A Asvadi, L Garrote, C Premebida, et al. DepthCN: vehicle detection using 3D-LIDAR and ConvNet. Proceedings of IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, October 16-19, 2017:1-6.
[43] C Couprie, C Farabet, L Najman, et al. Indoor semantic segmentation using depth information. International Conference on Learning Representations (ICLR), Scottsdale, USA, May 2-4, 2013:1-8.
[44] J Long, E Shelhamer, T Darrell. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39(4):640-651.
[45] D Guan, Y Cao, J Yang, et al. Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Information Fusion, 2019, 50:148-157.
[46] A Valada, J Vertens, A Dhall, et al. AdapNet: adaptive semantic segmentation in adverse environmental conditions. IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 29-June 3, 2017:4644-4651.
[47] Y Cheng, R Cai, Z Li, et al. Locality-sensitive deconvolution networks with gated fusion for RGB-D indoor semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21-26, 2017:1475-1483.
[48] M Cordts, M Omran, S Ramos, et al. The Cityscapes dataset for semantic urban scene understanding. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016:3213-3223.
[49] A Asvadi, L Garrote, C Premebida, et al. Real-time deep ConvNet-based vehicle detection using 3D-LIDAR reflection intensity data. Robot 2017: Third Iberian Robotics Conference, Seville, Spain, November 22-24, 2017:475-486.
[50] W Maddern, G Pascoe, C Linegar, et al. 1 year, 1000 km: the Oxford RobotCar dataset. The International Journal of Robotics Research, 2017, 36(1):3-15.
[51] M Braun, S Krebs, F Flohr, et al. EuroCity Persons: a novel benchmark for person detection in traffic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8):1844-1861.
[52] K M He, X Zhang, S Q Ren, et al. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016:770-778.
[53] T Y Lin, P Dollár, R Girshick, et al. Feature pyramid networks for object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21-26, 2017:936-944.
[54] X Glorot, Y Bengio. Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research, 2010, 9:249-256.
[55] W Liu, D Anguelov, D Erhan, et al. SSD: single shot multibox detector. European Conference on Computer Vision (ECCV), Amsterdam, Netherlands, October 8-16, 2016:21-37.
[56] A Shrivastava, A Gupta, R Girshick. Training region-based object detectors with online hard example mining. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016:761-769.
[57] O Prakash, A Kumar, A Khare. Pixel-level image fusion scheme based on steerable pyramid wavelet transform using absolute maximum selection fusion rule. International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), Kochi, India, December 3-5, 2014:765-770.