Living media for user communities
This new setting contrasts greatly with the classic multimedia IR model and motivates users to cooperate as a large community. Understanding the possibilities of the new problem setting allows scientists to work on solutions that can help users and bring more success to the area of semantic-multimedia IR.
X-Media: Large scale cross-media IE
Large-Scale Data Resources: while the traditional problem setting favours supervised methods that model labelled data, the new problem setting makes available large amounts of unlabelled data, which creates a demand for a new breed of unsupervised algorithms. The objective of these algorithms should be to derive a knowledge base describing the way users perceive and interact with semantic-multimedia information.
Cross-Media Information: the new multimedia IR scenarios combine many different types of information sources with different semantics. In this thesis we considered only text and visual information, but many other information sources are available in multimedia (e.g., authorship, location, event), including characteristics of the capturing device (e.g., lens depth of field). New algorithms must cope with the multitude of information sources and with the increased complexity and heterogeneity that they exhibit.
Semantic Multimedia IR
The extraction of semantic information from multimedia content is a research topic that tries to mimic the way human perception works, and is therefore closely related to artificial intelligence. However, human perception is still far from being understood well enough for its functions to be imitated in a computational system. Nowadays, applications that make use of the semantics of multimedia content depend on manual annotations and on other information extracted from the surrounding content (e.g., alternate text). This way of extracting multimedia semantics is flawed and costly. Performing the entire process automatically (or even semi-automatically) can greatly decrease the operational and maintenance costs of such applications.
The aim of this research is to enhance multimedia retrieval applications by combining both knowledge and statistical data in a learning framework that extracts semantic information from multimedia. We will approach the problem as a Bayesian learning problem divided into three parts:
Multimedia mining: mine the feature space for the problem's most common patterns and learn the causal relations between the occurrence of these patterns and keywords;
Multi-modal information fusion: combine multi-modal features in a statistical framework to increase the prediction accuracy of keywords in new, unseen content;
Semantic information extraction: improve the inference results obtained in the previous steps by using knowledge about keyword co-occurrences.
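The last two parts can be illustrated with a minimal sketch. It is not the framework developed in this research: it assumes that the modalities are conditionally independent given the keyword (a naive Bayes simplification) and that co-occurrence knowledge is applied as a simple linear re-weighting. The function names, the `weight` parameter and all probabilities are hypothetical.

```python
from math import exp, log


def fuse_modalities(prior, likelihoods):
    """Posterior over keywords, proportional to P(k) times the product of
    the per-modality likelihoods P(x_m | k) (independence is assumed)."""
    log_scores = {}
    for keyword, p in prior.items():
        score = log(p)
        for modality in likelihoods:
            score += log(modality[keyword])
        log_scores[keyword] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalised = {k: exp(s - top) for k, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {k: v / total for k, v in unnormalised.items()}


def rerank_with_cooccurrence(posterior, cooccurrence, weight=0.5):
    """Blend each keyword score with the support it receives from the
    other keywords; `cooccurrence[(a, b)]` is an assumed estimate of
    how often keyword a appears together with keyword b."""
    adjusted = {}
    for k, p in posterior.items():
        support = sum(
            cooccurrence.get((k, other), 0.0) * q
            for other, q in posterior.items()
            if other != k
        )
        adjusted[k] = (1 - weight) * p + weight * support
    total = sum(adjusted.values())
    return {k: v / total for k, v in adjusted.items()}


# Hypothetical numbers: the text modality favours "beach",
# while the visual modality is uninformative.
prior = {"beach": 0.5, "city": 0.5}
text_likelihood = {"beach": 0.6, "city": 0.2}
visual_likelihood = {"beach": 0.5, "city": 0.5}
posterior = fuse_modalities(prior, [text_likelihood, visual_likelihood])
reranked = rerank_with_cooccurrence(posterior, {("beach", "city"): 0.1})
```

Working in log space keeps the product of many small likelihoods from underflowing once more modalities are added.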
Universal Multimedia Access [2000-2002]
Access to multimedia information from any terminal through any network is a new concept referred to as Universal Multimedia Access (UMA). The objective of UMA technology is to make available different presentations of the same information, more or less complex (e.g., in terms of media types), to suit different terminals, networks and user preferences. This is achieved by customizing the content to the environment in which it will be consumed.
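The idea of matching a presentation to the consumption environment can be sketched as follows. This is only an illustration and not the UMA Engine itself: the `Variation` and `Environment` types, their fields and the selection rule (prefer the richest presentation that the terminal and network can handle) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variation:
    """One presentation of a piece of content (hypothetical model)."""
    name: str
    media_types: frozenset
    bitrate_kbps: int


@dataclass(frozen=True)
class Environment:
    """Terminal and network capabilities (hypothetical model)."""
    supported_media: frozenset
    bandwidth_kbps: int


def select_variation(variations, env):
    """Return the richest variation the environment can consume, or None."""
    usable = [
        v for v in variations
        if v.media_types <= env.supported_media
        and v.bitrate_kbps <= env.bandwidth_kbps
    ]
    if not usable:
        return None
    # Assumed preference: more media types first, then higher bitrate.
    return max(usable, key=lambda v: (len(v.media_types), v.bitrate_kbps))


variations = [
    Variation("full", frozenset({"video", "audio", "text"}), 1500),
    Variation("slides", frozenset({"image", "text"}), 200),
    Variation("text-only", frozenset({"text"}), 10),
]
phone = Environment(frozenset({"image", "text"}), 300)
desktop = Environment(frozenset({"video", "audio", "image", "text"}), 5000)

chosen_for_phone = select_variation(variations, phone)
chosen_for_desktop = select_variation(variations, desktop)
```

A real engine would also transform content (e.g., transcode or summarise it) rather than only choose among pre-built variations; the sketch covers the selection step only.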

Content customization can be implemented in three different places: 1) at the content server, 2) at a proxy server, and 3) at the user terminal. The developed system implements the customization engine at the content server and at a proxy server. The following figure illustrates the UMA System and its major elements.

The function of each element in the UMA System is:
- Content server: this element acts as the content source.
- MPEG-7 description tool: this element is used to analyze the multimedia content available at the content server and to generate an MPEG-7 description that is saved locally or posted to an MPEG-7 description server.
- MPEG-7 description server: this element stores the MPEG-7 descriptions received from the MPEG-7 description tool. In this database, one MPEG-7 description exists for each piece of content. This element provides the UMA Platform with the MPEG-7 description of the requested piece of content.
- UED server: this element stores the user environment description (UED) received from the UMA browser. This element provides the UMA Platform with the UED of the requesting user. It implements the same functionality as a WAP UAProf server.
- UMA Platform: this element corresponds to the application implementing the content customization engine (UMA Engine) required to provide the user with the best experience for the requested content. It acts as a content customization server.
- Browser: this element includes a Web browser used to access content and allows the user to manage his/her environment description.
The content server, the MPEG-7 description server and the UED server are plain Web servers (Apache Web server) that support the HTTP POST method, allowing other applications to store content on the server. The UMA browser, the UMA Platform and the MPEG-7 description tool were implemented for this system.
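As a rough illustration of how a client such as the description tool might store a description on one of these servers, the sketch below builds an HTTP POST request with Python's standard library. The server URL, the path layout and the `build_store_request` helper are hypothetical; the actual request format used by the system is not documented here.

```python
from urllib.request import Request


def build_store_request(server_url, content_id, description_xml):
    """Build (but do not send) a POST request carrying an XML description.

    The "<server>/<content id>.xml" path layout is an assumption made
    for this sketch only.
    """
    body = description_xml.encode("utf-8")
    return Request(
        url=f"{server_url.rstrip('/')}/{content_id}.xml",
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Hypothetical server address and placeholder payload.
request = build_store_request(
    "http://descriptions.example/mpeg7/",
    "clip42",
    "<Mpeg7><!-- description goes here --></Mpeg7>",
)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live server.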