This monograph introduces the field of bisociative literature-based discovery (LBD) by first explaining the underlying LBD principles and techniques, followed by the presentation of bisociative LBD techniques and applications developed by the authors. LBD is a process of uncovering new knowledge by analyzing and connecting disparate pieces of information from different sources of literature.
Selected techniques include conventional natural language processing (NLP) approaches, as well as outlier-based, concept-based, network-based, and embeddings-based LBD approaches. Reproducibility aspects of bisociative LBD research are also covered, addressing all steps of the bisociative LBD process: data acquisition, text preprocessing, hypothesis discovery, and evaluation.
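The following minimal Python sketch makes "connecting disparate pieces of information from different sources of literature" concrete. It shows the basic intuition behind closed discovery: two literatures, A and C, are assumed not to cite each other, and terms that occur in both are candidate bridging (B) terms.

Several parts are hypothetical illustrations, not code from the monograph:

- the toy documents,
- the stopword list,
- the helper names `terms` and `bridging_terms`.

The approaches the book covers rank and filter such candidates with text mining, outlier, network, and embedding techniques. A raw vocabulary intersection is only a starting point.

```python
import re

# Hypothetical, deliberately tiny stopword list (real pipelines use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "with", "by"}


def terms(documents):
    """Return the set of lower-cased, non-stopword word tokens in a list of documents."""
    vocab = set()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token not in STOPWORDS:
                vocab.add(token)
    return vocab


def bridging_terms(domain_a, domain_c, exclude=()):
    """Candidate bridging (B) terms: terms shared by literature A and literature C.

    `exclude` lets the caller drop terms known to be uninformative.
    """
    return sorted((terms(domain_a) & terms(domain_c)) - set(exclude))


# Hypothetical toy literatures standing in for two separate domains A and C.
domain_a = ["Topic alpha is affected by factor beta.", "Alpha correlates with gamma levels."]
domain_c = ["Agent delta reduces beta.", "Delta also modulates gamma and epsilon."]

print(bridging_terms(domain_a, domain_c))  # ['beta', 'gamma']
```

In a real setting, the shared vocabulary of two large literatures is huge. That is why the later steps of the process, hypothesis discovery and evaluation, concentrate on ranking candidate terms and judging them with domain experts.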
The monograph is targeted at researchers, students, and domain experts interested in knowledge exploration, information retrieval, text mining, data science, or semantic technologies. By covering texts, relations, networks, and ontologies, this work empowers domain experts to transcend their knowledge silos when confronted with varied data formats in their research practice. The monograph's open science approach, with tutorials in Python, allows for code reuse and experiment replicability.
Nada Lavrač is a Research Councillor and a former Head of the Department of Knowledge Technologies at the Jožef Stefan Institute, Ljubljana, Slovenia. She is a Full Professor at the University of Nova Gorica and was Head of the ICT Programme and Vice-Dean of the International Postgraduate School Jožef Stefan. Her research interests include machine learning, semantic data mining, text mining, computational creativity, and applications of machine learning in medicine and bioinformatics. She has been a keynote speaker at the KI, ADBIS, ISWC, LPNMR, JSMI, and AIME conferences, and has chaired several conferences, including ILP, ICCC, IDA, DS, and AIME. She served on the editorial boards of Artificial Intelligence in Medicine, AI Communications, New Generation Computing, Applied AI, Machine Learning, and Data Mining and Knowledge Discovery. She is an ECCAI/EurAI Fellow (and was ECCAI Vice-President from 1996 to 1998), an ELLIS Fellow (and was an ELLIS Board Member from 2022 to 2025), and was a member of the International Machine Learning Society and the Artificial Intelligence in Medicine boards. She has also received several national awards for her outstanding contributions to machine learning.

Bojan Cestnik is the founder and CEO of the high-tech software company Temida, a Senior Researcher at the Jožef Stefan Institute, and a Professor of Computer Science at the University of Nova Gorica and the Jožef Stefan International Postgraduate School, all in Slovenia. His work combines scientific research with real-world applications in the field of artificial intelligence. He specializes in machine learning, predictive analytics, and decision making. He has developed innovative methods that improve the interpretability and reliability of machine learning models and knowledge-based systems, significantly contributing to decision-support applications that demand robustness and transparency.
Andrej Kastrin is an Associate Professor of Biostatistics and Biomedical Informatics at the University of Ljubljana, Slovenia. His research focuses on the foundations of artificial intelligence, large-scale statistical learning, text mining, and complex network analysis, with particular emphasis on computational scientific discovery. He has authored over 100 peer-reviewed publications, including journal articles and conference papers. He is actively involved in both national and European research initiatives and frequently serves on program committees for leading data science conferences. He is also an editor of the international journal Advances in Methodology and Statistics and the chief organizer of the international conference Applied Statistics. He teaches courses in statistics and data science and supervises PhD students.
Table of Contents
1. Introduction
2. History, Resources and Tools
3. Background Technologies
4. Benchmark Data and Reusable Python Code
5. Text Mining for Closed Discovery
6. Outlier-based Closed Discovery
7. Semantic and Outlier-based Open Discovery
8. Network-based Closed Discovery
9. Embedding-based Closed Discovery
10. Research Trends and Lessons Learned