Multimedia Analytics for Scene Content Understanding / HASNAIN ALI

By: Ali, Hasnain
Contributor(s): Supervisor: Dr Syed Omer Gilani
Material type: Text
Publisher: Islamabad: SMME-NUST; 2025
Description: 133 p. (soft copy); 30 cm
Subject(s): PhD Robotics and Intelligent Machine Engineering
DDC classification: 629.8
Item type: Thesis
Current location / Home library: School of Mechanical & Manufacturing Engineering (SMME)
Shelving location: E-Books
Call number: 629.8
Status: Available
Barcode: SMME-phd-43
Total holds: 0

With the rapid expansion of video content, understanding how humans retain and recall visual data has become crucial. Memorability, a key neurocognitive process, plays a significant role in retaining and retrieving video content. While past research has explored image memorability, video memorability has received less attention, leaving a gap in robust computational models for predicting memorable video events. This thesis addresses this gap through a multi-phase study focused on video memorability prediction, scalable feature extraction, and behavior training for robotic systems. The first study introduces a novel framework that predicts episodic video memorability by fusing deep features, including text, color, and motion. Episodic sequences are generated using a Fuzzy FastText model and color histogram analysis, while scene objects are identified using a Faster Region-based Convolutional Neural Network (Faster R-CNN). Fusing these features improves short- and long-term memorability prediction, yielding Spearman's rank correlations of 0.6428 and 0.4285, respectively. The second study presents a robust Stacked Bin-Convolutional Neural Network (SB-CNN) with a Sparse Low-Rank Regressor (SLRR). This model improves video event classification by employing a low-rank representation technique that reduces noise in video frames, leading to more accurate predictions. A Multi-Attribute Decision Making (MADM) technique is applied to enhance decision-making, achieving a recall time of 49.9247 on public datasets. In the final study, a Trimmed Q-learning algorithm is introduced to optimize memorability-driven scene prediction in mobile robots. Training is conducted through online, short-term, and long-term learning modules, with significant improvements in memorability scores: 72.84% for short-term and online learning, and 68.63% for long-term learning. By linking these phases, the thesis presents an integrated framework that addresses video memorability prediction, robust feature scaling, and robotic decision-making, offering practical insights for both academic research and real-world applications.
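The first study reports Spearman's rank correlations (0.6428 short-term, 0.4285 long-term) between predicted and ground-truth memorability scores. As background, a minimal sketch of how this metric is computed from two score lists — this is the standard definition, not code from the thesis; with ties, ranks are averaged:

```python
def rank(values):
    # assign average ranks (1-based), averaging over ties
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation applied to the rank vectors
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

For identically ordered predictions the coefficient is 1.0; for fully reversed orderings it is -1.0, so the reported 0.6428 indicates a strong but imperfect rank agreement.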
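The second study's SLRR relies on a low-rank representation to suppress noise in video frames. The thesis's exact formulation is not given here; a common low-rank denoising sketch uses a truncated SVD of the frame-feature matrix (the function name and the `rank` parameter are illustrative assumptions):

```python
import numpy as np

def low_rank_denoise(frames, rank=2):
    # frames: (n_frames, n_features) matrix; keep only the top-`rank`
    # singular components, discarding the rest as noise
    U, s, Vt = np.linalg.svd(frames, full_matrices=False)
    s[rank:] = 0.0
    return (U * s) @ Vt
```

Because natural video varies slowly frame to frame, the signal concentrates in a few singular components, while uncorrelated noise spreads across the rest and is removed by the truncation.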
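The final study's Trimmed Q-learning is described only at a high level in the abstract. One plausible reading, sketched below purely as an illustration, is a standard tabular Q-update whose bootstrap target discards the largest next-state action values before taking the max (a trimming step that would curb overestimation bias); the function names, `trim` parameter, and this interpretation are all assumptions, not the thesis's algorithm:

```python
def trimmed_target(q_next, trim=1):
    # hypothetical "trimmed" max: drop the `trim` largest action values,
    # then take the max of what remains (keep at least one value)
    vals = sorted(q_next)
    kept = vals[:len(vals) - trim] if trim < len(vals) else vals[:1]
    return max(kept)

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, trim=1):
    # tabular Q-learning step with the trimmed bootstrap target
    target = r + gamma * trimmed_target(Q[s_next], trim)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

Under this reading, the online, short-term, and long-term modules mentioned in the abstract would differ in when and how often such updates are applied, not in the update rule itself.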

© 2023 Central Library, National University of Sciences and Technology. All Rights Reserved.