This book focuses on the new paradigm of artificial intelligence and systematically introduces the key technologies, foundational models, and typical applications of multimodal large models. To make the technical content more accessible for lower-year undergraduate students and newcomers to the AI field, the book presents each key technical point in an easy-to-understand manner and provides numerous intuitive examples. It deeply analyses the structure and technology of several classic multimodal large models. The aim is to offer readers a clear guide to the technical methods, open-source platforms, and application scenarios of multimodal large models, as well as to provide insights into achieving general artificial intelligence, including cutting-edge technologies such as causal reasoning, world models, embodied intelligence, and multi-agent systems. The book aspires to provide a clear perspective for both academia and industry, helping AI researchers gain a more comprehensive understanding of multimodal large model technologies and the development directions of the next generation of artificial intelligence.
The book is divided into five chapters. Chapter 1 explores the most representative large model structures in depth. Chapter 2 provides a thorough analysis of the core technologies of multimodal large models. Chapter 3 introduces several representative multimodal large models. Chapter 4 delves into three typical applications: visual question answering, AI-generated content (AIGC), and embodied intelligence. Chapter 5 discusses feasible approaches to achieving general artificial intelligence.
This book is suitable not only as a textbook for senior undergraduate and graduate students in relevant university programs but also as an essential reference for IT professionals. The Chinese version of this book has been selected for the undergraduate textbook series at Sun Yat-sen University.
This book was translated with the help of artificial intelligence. A subsequent human revision focused primarily on the content.
Prof. Liang Lin is a world-renowned scholar in the field of artificial intelligence and a Fellow of IEEE, IAPR, and IET. He currently serves as the Director of the Institute of Multi-Agent and Embodied Intelligence at Peng Cheng Laboratory and as a Distinguished Professor at Sun Yat-sen University. He previously held the position of Executive Dean at the SenseTime Research Institute. He received the National Science Fund for Distinguished Young Scholars and was the Chief Scientist of China’s National Major Project on Artificial Intelligence. His research has led to a series of pioneering contributions in multimodal representation learning, causal inference, and embodied intelligence. As of October 2024, he has published more than 400 papers, which have been cited over 45,000 times according to Google Scholar. He has received five Best Paper or Outstanding Paper Awards at leading international conferences and journals, including ACL, ICCV, ICME, and Pattern Recognition. As the first contributor, he received the CCF-ACM Award for Artificial Intelligence in 2025, the First Prize of the Guangdong Provincial Science and Technology Progress Award in 2024, the First Prize of the Science and Technology Award of the China Society of Image and Graphics in 2019, and the Wu Wenjun Artificial Intelligence Award in 2018. He has supervised and mentored a number of outstanding PhD students who received prestigious honors such as the CCF Outstanding Doctoral Dissertation Award, the ACM China Doctoral Dissertation Award, and the CAAI Outstanding Doctoral Dissertation Award.
Yang Liu is an associate professor at the School of Computer Science, Sun Yat-sen University, and a key member of the university's Human-Cyber-Physical Intelligence Integration Laboratory (HCP-Lab). His primary research interests include embodied intelligence, multimodal spatial perception and reasoning, and causal inference. He has published over 40 papers in prestigious journals and conferences such as TPAMI, TIP, TMECH, TKDE, CVPR, ICCV, ACM MM, and NeurIPS. Among these, four conference papers were selected as Oral/Highlight presentations, and four journal papers have been recognized as ESI Highly Cited Papers. He has led more than 10 research projects, including projects funded by the National Natural Science Foundation of China (General Program, Youth Program, and Key Program as Project Lead) and the Peng Cheng Laboratory "Open Challenge" program. He served as Co-Chair for the AIGC and Multi-Agent Parallel Computing Track at ICPADS 2025 and the Multimodal Mathematical Reasoning Workshop at ICDAR 2025. He won the Excellence Award at the 2023 China Software Conference for the Robotic Large Model and Embodied Intelligence Challenge, and the First Prize at the 2023 Guangdong Province Third Youth Academic Showcase in Computer Science.
Table of Contents
1 The Large Model Family
2 Core Technology of Multimodal Large Models
3 Multimodal Foundation Models
4 Applications of Multimodal Large Models
5 Multimodal Large Models Towards AGI