This dissertation is for audio information retrieval
practitioners.
It is about audio content-based search.
Specifically, it explores promising paths for
bridging the semantic gap that currently prevents
wide deployment of audio content-based search
engines. Music and sound search engines rely on metadata,
mostly human generated, to manage collections of
audio assets. Although it is time-consuming and
error-prone, human labeling is common practice.
Audio content-based methods, that is, algorithms that
automatically extract descriptions from audio files,
are generally not mature enough to provide the
user-friendly representation that users demand when
interacting with audio content. This dissertation has two
parts. In the first part, we explore the strengths and
limitations of a purely low-level audio description
technique: audio fingerprinting.
In the second part, we hypothesize that one of the
problems hindering the closing of the semantic gap is
the lack of intelligence that encodes common-sense
knowledge, and that a knowledge base of this kind is a
primary step toward bridging
the semantic gap. We present a sound effects
retrieval system that leverages both low-level and
semantic technologies.