Multimedia Analytics for Scene Content Understanding / HASNAIN ALI

By: Ali, Hasnain
Contributor(s): Supervisor: Dr Syed Omer Gilani
Material type: Text
Publisher: Islamabad: SMME-NUST; 2025
Description: 133 p. (soft copy); 30 cm
Subject(s): PhD Robotics and Intelligent Machine Engineering
DDC classification: 629.8
Item type: Thesis
Current location / Home library: School of Mechanical & Manufacturing Engineering (SMME)
Shelving location: E-Books
Call number: 629.8
Status: Available
Barcode: SMME-phd-43
Total holds: 0

With the rapid expansion of video content, understanding how humans retain and recall visual data has become crucial. Memorability, a key neurocognitive process, plays a significant role in retaining and retrieving video content. While past research has explored image memorability, video memorability has received less attention, leaving a gap in robust computational models for predicting memorable video events. This thesis addresses this gap through a multi-phase study focused on video memorability prediction, scalable feature extraction, and behavior training for robotic systems. The first study introduces a novel framework that predicts episodic video memorability by fusing deep features, including text, color, and motion. Episodic sequences are generated using a Fuzzy FastText model and color histogram analysis, while scene objects are identified using a Faster Region-based Convolutional Neural Network (Faster R-CNN). Fusing these features improves short- and long-term memorability prediction, yielding Spearman's rank correlations of 0.6428 and 0.4285, respectively. The second study presents a robust Stacked Bin-Convolutional Neural Network (SB-CNN) with a Sparse Low-Rank Regressor (SLRR). This model improves video event classification by employing a low-rank representation technique that reduces noise in video frames, leading to more accurate predictions. A Multi-Attribute Decision Making (MADM) technique is applied to enhance decision-making, achieving a recall time of 49.9247 on public datasets. In the final study, a Trimmed Q-learning algorithm is introduced to optimize memorability-driven scene prediction in mobile robots. Training is conducted through online, short-term, and long-term learning modules, with significant improvements in memorability scores: 72.84% for short-term and online learning, and 68.63% for long-term learning. By linking these phases, the thesis presents an integrated framework that addresses video memorability prediction, robust feature scaling, and robotic decision-making, offering practical insights for both academic research and real-world applications.
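The first study reports Spearman's rank correlations (0.6428 short-term, 0.4285 long-term) between predicted and ground-truth memorability scores. As background, a minimal sketch of how this metric is computed from two score lists — this is the standard definition, not code from the thesis; with ties, ranks are averaged:

```python
def rank(values):
    # assign average ranks (1-based), averaging over ties
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation applied to the rank vectors
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

For identically ordered predictions the coefficient is 1.0; for fully reversed orderings it is -1.0, so the reported 0.6428 indicates a strong but imperfect rank agreement.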
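The second study's SLRR relies on a low-rank representation to suppress noise in video frames. The thesis's exact formulation is not given here; a common low-rank denoising sketch uses a truncated SVD of the frame-feature matrix (the function name and the `rank` parameter are illustrative assumptions):

```python
import numpy as np

def low_rank_denoise(frames, rank=2):
    # frames: (n_frames, n_features) matrix; keep only the top-`rank`
    # singular components, discarding the rest as noise
    U, s, Vt = np.linalg.svd(frames, full_matrices=False)
    s[rank:] = 0.0
    return (U * s) @ Vt
```

Because natural video varies slowly frame to frame, the signal concentrates in a few singular components, while uncorrelated noise spreads across the rest and is removed by the truncation.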
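The final study's Trimmed Q-learning is described only at a high level in the abstract. One plausible reading, sketched below purely as an illustration, is a standard tabular Q-update whose bootstrap target discards the largest next-state action values before taking the max (a trimming step that would curb overestimation bias); the function names, `trim` parameter, and this interpretation are all assumptions, not the thesis's algorithm:

```python
def trimmed_target(q_next, trim=1):
    # hypothetical "trimmed" max: drop the `trim` largest action values,
    # then take the max of what remains (keep at least one value)
    vals = sorted(q_next)
    kept = vals[:len(vals) - trim] if trim < len(vals) else vals[:1]
    return max(kept)

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, trim=1):
    # tabular Q-learning step with the trimmed bootstrap target
    target = r + gamma * trimmed_target(Q[s_next], trim)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

Under this reading, the online, short-term, and long-term modules mentioned in the abstract would differ in when and how often such updates are applied, not in the update rule itself.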

© 2023 Central Library, National University of Sciences and Technology. All Rights Reserved.