BIBLIOGRAPHY

In the document Human Perception and Pose Estimation (Pages 190-198)

[13] M. Rosheim, Leonardo’s lost robots. Springer Science & Business Media, 2006 (cit. on p. 2).

[14] Y. N. Harari, Homo deus: A brief history of tomorrow. Random House, 2016 (cit. on p. 3).

[15] J. K. Mitchell, “The last of a veteran chess player”, Chess Monthly, vol. 1, pp. 3–4, 1857 (cit. on p. 2).

[16] C. Bailly, Automata: The golden age: 1848-1914. Robert Hale Limited, 2003 (cit. on p. 3).

[17] M. Krzyzaniak, “Prehistory of musical robots”, Journal of Human-Robot Interaction, vol. 1, no. 1, pp. 78–95, 2012 (cit. on p. 3).

[18] 1865 - Le petit journal, https://gallica.bnf.fr/ark:/12148/bpt6k589123j.image.r=Manzetti.f3.langEN, Accessed: 2019-12-4 (cit. on p. 3).

[19] 1849 flute-playing automaton Innocenzo Manzetti (Italian), http://cyberneticzoo.com/robots/1849-flute-playing-automaton-innocenzo-manzetti-italian, Accessed: 2019-12-4 (cit. on p. 3).

[20] K. Capek, R.U.R. (Rossum’s universal robots). Penguin, 2004 (cit. on p. 3).

[21] Invention and meaning of the word "robot", https://web.archive.org/web/20120204135259/http://capek.misto.cz/english/robot.html, Accessed: 2019-12-4 (cit. on p. 3).

[22] I. Asimov and L. McKeever, The complete robot. Doubleday, New York, 1982 (cit. on p. 3).

[23] I. Asimov, Robot dreams. Wiley Online Library, 2001 (cit. on p. 3).

[24] ——, I, robot. Spectra, 2004, vol. 1 (cit. on p. 3).

[25] ——, Robot visions. iBooks, 2013 (cit. on p. 3).

[26] A. Deed, The mechanical man, https://www.imdb.com/title/tt0337377/, 1921 (cit. on p. 3).

[27] S. Kubrick, 2001: A space odyssey, https://www.imdb.com/title/tt0062622/, 1968 (cit. on p. 3).

[28] J. Cameron, The terminator, https://www.imdb.com/title/tt0088247/, 1984 (cit. on p. 3).

[29] A. Proyas, I, robot,

[30] International Organization for Standardization, “ISO 8373:2012(en): Robots and robotic devices—Vocabulary”, 2012 (cit. on p. 4).

[31] C. Breazeal, K. Dautenhahn, and T. Kanda, “Social robotics”, in Springer handbook of robotics, Springer, 2016, pp. 1935–1972 (cit. on pp. 4, 5).

[32] E. B. Goldstein, Sensation and perception. Cengage Learning, 2009 (cit. on p. 4).

[33] L. Thaler and M. A. Goodale, “Echolocation in humans: An overview”, Wiley Interdisciplinary Reviews: Cognitive Science, vol. 7, no. 6, pp. 382–393, 2016 (cit. on p. 4).

[34] K. Weir, “The dawn of social robots”, Monitor on Psychology, vol. 49, no. 1, p. 50, 2018 (cit. on p. 5).

[35] R. Held et al., Perception: Mechanisms and models: Readings from Scientific American. WH Freeman, 1972 (cit. on p. 5).

[36] J. Bruner, A study of thinking. Routledge, 2017 (cit. on p. 5).

[37] D. C. Dennett, Consciousness explained. Penguin UK, 1993 (cit. on p. 6).

[38] G. Mackie and P. Burighel, “The nervous system in adult tunicates: Current research directions”, Canadian Journal of Zoology, vol. 83, no. 1, pp. 151–183, 2005 (cit. on p. 6).

[39] Z. Zou, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey”, ArXiv preprint arXiv:1905.05055, 2019 (cit. on p. 6).

[40] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features”, in European conference on computer vision, Springer, 2006, pp. 404–417 (cit. on p. 7).

[41] D. G. Lowe et al., “Object recognition from local scale-invariant features”, in ICCV, vol. 99, 1999, pp. 1150–1157 (cit. on p. 7).

[42] P. Viola, M. Jones, et al., “Rapid object detection using a boosted cascade of simple features” (cit. on p. 7).

[43] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection”, 2005 (cit. on p. 7).

[44] P. Dollár, S. Belongie, and P. Perona, “The fastest pedestrian detector in the west”, 2010 (cit. on p. 7).

[45] P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast feature pyramids for object detection”, IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 8, pp. 1532–1545, 2014 (cit. on p. 7).

[46] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models”, IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627–1645, 2009 (cit. on pp. 7, 8).

[47] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, in Advances in neural information processing systems, 2012, pp. 1097–1105 (cit. on pp. 7, 13).

[48] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database”, in 2009 IEEE conference on computer vision and pattern recognition, IEEE, 2009, pp. 248–255 (cit. on pp. 7, 13).

[49] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context”, in Computer Vision–ECCV 2014, Springer, 2014, pp. 740–755 (cit. on pp. 7, 8, 10).

[50] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results, http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html (cit. on p. 7).

[51] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation”, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587 (cit. on p. 7).

[52] R. Girshick, “Fast R-CNN”, in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448 (cit. on p. 7).

[53] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks”, in Advances in neural information processing systems, 2015, pp. 91–99 (cit. on p. 7).

[54] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN”, in ICCV, 2017 (cit. on p. 7).

[55] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection”, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788 (cit. on p. 7).

[56] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector”, in European Conference on Computer Vision, Springer, 2016, pp. 21–37 (cit. on p. 7).

[57] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection”, in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988 (cit. on p. 7).

[58] G. Johansson, “Visual perception of biological motion and a model for its analysis”, Perception & Psychophysics, vol. 14, no. 2, pp. 201–211, 1973 (cit. on p. 8).

[59] M. A. Fischler and R. A. Elschlager, “The representation and matching of pictorial structures”, IEEE Transactions on Computers, no. 1, pp. 67–92, 1973 (cit. on p. 8).

[60] P. F. Felzenszwalb and D. P. Huttenlocher, “Pictorial structures for object recognition”, International journal of computer vision, vol. 61, no. 1, pp. 55–79, 2005 (cit. on p. 8).

[61] Y. Yang and D. Ramanan, “Articulated pose estimation with flexible mixtures-of-parts”, in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, IEEE, 2011, pp. 1385–1392 (cit. on p. 8).

[62] S. Zuffi, O. Freifeld, and M. J. Black, “From pictorial structures to deformable structures”, in 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2012, pp. 3546–3553 (cit. on p. 8).

[63] X. Chen and A. L. Yuille, “Articulated pose estimation by a graphical model with image dependent pairwise relations”, in Advances in Neural Information Processing Systems, 2014, pp. 1736–1744 (cit. on p. 8).

[64] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, “2d human pose estimation: New benchmark and state of the art analysis”, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2014 (cit. on p. 8).

[65] S. Johnson and M. Everingham, “Clustered pose and nonlinear appearance models for human pose estimation”, in Proceedings of the British Machine Vision Conference, doi:10.5244/C.24.12, 2010 (cit. on p. 8).

[66] A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via deep neural networks”, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1653–1660 (cit. on p. 8).

[67] G. Papandreou, T. Zhu, N. Kanazawa, A. Toshev, J. Tompson, C. Bregler, and K. Murphy, “Towards accurate multi-person pose estimation in the wild”, ArXiv preprint arXiv:1701.01779, 2017 (cit. on p. 8).

[68] S.-E. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh, “Convolutional pose machines”, in CVPR, 2016 (cit. on p. 8).

[69] A. Newell, K. Yang, and J. Deng, “Stacked hourglass networks for human pose estimation”, ArXiv preprint arXiv:1603.06937, 2016 (cit. on p. 8).

[70] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields”, ArXiv preprint arXiv:1611.08050, 2016 (cit. on p. 8).

[71] S. Kreiss, L. Bertoni, and A. Alahi, “PifPaf: Composite fields for human pose estimation”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11977–11986 (cit. on p. 8).

[72] C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu, “Human3.6M: Large scale datasets and predictive methods for 3d human sensing in natural environments”, PAMI, 2014 (cit. on p. 8).

[73] G. Varol, J. Romero, X. Martin, N. Mahmood, M. J. Black, I. Laptev, and C. Schmid, “Learning from synthetic humans”, in CVPR, 2017 (cit. on p. 8).

[74] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “SMPL: A skinned multi-person linear model”, ACM transactions on graphics (TOG), 2015 (cit. on p. 8).

[75] S. Carey, The origin of concepts. Oxford University Press, 2009 (cit. on p. 8).

[76] R. Jain, R. Kasturi, and B. G. Schunck, Machine vision, vol. 5 (cit. on p. 8).

[77] P. W. Battaglia, J. B. Hamrick, and J. B. Tenenbaum, “Simulation as an engine of physical scene understanding”, Proceedings of the National Academy of Sciences, vol. 110, no. 45, pp. 18327–18332, 2013 (cit. on p. 9).

[78] A. Lerer, S. Gross, and R. Fergus, “Learning physical intuition of block towers by example”, ArXiv preprint arXiv:1603.01312, 2016 (cit. on p. 9).

[79] J. He, A. Lehrmann, J. Marino, G. Mori, and L. Sigal, “Probabilistic video generation using holistic attribute control”,

[80] C. Vondrick, H. Pirsiavash, and A. Torralba, “Generating videos with scene dynamics”, in Advances In Neural Information Processing Systems, 2016, pp. 613–621 (cit. on p. 9).

[81] T. Xue, J. Wu, K. Bouman, and B. Freeman, “Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks”, in Advances in neural information processing systems, 2016, pp. 91–99 (cit. on p. 9).

[82] P. Agrawal, A. V. Nair, P. Abbeel, J. Malik, and S. Levine, “Learning to poke by poking: Experiential learning of intuitive physics”, in Advances in Neural Information Processing Systems, 2016, pp. 5074–5082 (cit. on pp. 9, 173).

[83] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel, “Deep spatial autoencoders for visuomotor learning”, in 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2016, pp. 512–519 (cit. on pp. 9, 173).

[84] C. Finn, I. Goodfellow, and S. Levine, “Unsupervised learning for physical interaction through video prediction”, in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds., Curran Associates, Inc., 2016, pp. 64–72. [Online]. Available: http://papers.nips.cc/paper/6161-unsupervised-learning-for-physical-interaction-through-video-prediction.pdf (cit. on pp. 9, 173).

[85] J. Martinez, M. J. Black, and J. Romero, “On human motion prediction using recurrent neural networks”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2891–2900 (cit. on p. 9).

[86] D. Pavllo, D. Grangier, and M. Auli, “QuaterNet: A quaternion-based recurrent model for human motion”, ArXiv preprint arXiv:1805.06485, 2018 (cit. on p. 9).

[87] S. Gidaris and N. Komodakis, “Object detection via a multi-region and semantic segmentation-aware CNN model”, in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1134–1142 (cit. on p. 10).

[88] X. Zeng, W. Ouyang, J. Yan, H. Li, T. Xiao, K. Wang, Y. Liu, Y. Zhou, B. Yang, Z. Wang, et al., “Crafting GBD-Net for object detection”, IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 9, pp. 2109–2123, 2017 (cit. on p. 10).

[89] V. Ramakrishna, D. Munoz, M. Hebert, J. A. Bagnell, and Y. Sheikh, “Pose machines: Articulated pose estimation via inference machines”, in European Conference on Computer Vision, Springer, 2014, pp. 33–47 (cit. on p. 10).

[90] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, “Efficient object localization using convolutional networks”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 648–656 (cit. on p. 10).

[91] Z. Tu and X. Bai, “Auto-context and its application to high-level vision tasks and 3d brain image segmentation”, IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 10, pp. 1744–1757, 2009 (cit. on p. 10).

[92] X. Chu, W. Yang, W. Ouyang, C. Ma, A. L. Yuille, and X. Wang, “Multi-context attention for human pose estimation”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1831–1840 (cit. on p. 10).

[93] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks”, in Proceedings of the IEEE international conference on computer vision, 2015, pp. 4489–4497 (cit. on p. 10).

[94] C.-Y. Wu, C. Feichtenhofer, H. Fan, K. He, P. Krahenbuhl, and R. Girshick, “Long-term feature banks for detailed video understanding”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 284–293 (cit. on p. 10).

[95] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories”, in 2004 conference on computer vision and pattern recognition workshop, IEEE, 2004, pp. 178–178 (cit. on p. 13).

[96] G. Griffin, A. Holub, and P. Perona, “Caltech-256 object category dataset”, 2007 (cit. on p. 13).

[97] Y. LeCun, Y. Bengio, et al., “Convolutional networks for images, speech, and time series”, The handbook of brain theory and neural networks, vol. 3361, no. 10, p. 1995, 1995 (cit. on p. 13).

[98] X. Xiong and A. D. Ames, “Coupling reduced order models via feedback control for 3d underactuated bipedal robotic walking”, in 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), IEEE, 2018, pp. 1–9 (cit. on pp. 173, 174).

[99] The robot Cassie,
