Depth estimation from monocular RGB images plays an important role in autonomous driving, 3D reconstruction, robotics, and augmented/virtual reality. Self-supervised monocular depth estimation methods have recently achieved impressive results in largely static scenes, relying on the assumption that the scene remains consistent when viewed from different frames. Moving objects and occlusions violate this assumption, degrading depth accuracy in dynamic scenes and blurring object boundaries because dynamic regions are excluded from training. To mitigate these issues, we propose a self-supervised monocular depth estimation network that incorporates an external pre-trained depth estimation model (pseudo-depth) into its loss functions and adds a guided channel-attention mechanism to the decoder of the depth network. These additions enable our model to estimate the depth of dynamic objects accurately, with clear boundaries, when trained on highly dynamic video scenes. We evaluate this approach on the BONN, KITTI, and NYUv2 datasets, which contain both static and highly dynamic scenes; results indicate that our approach performs competitively with prior approaches.
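To make the two ingredients named above concrete, the sketch below shows one plausible realization: a squeeze-and-excitation style channel-attention block that could sit in a decoder stage, and a simple median-normalized L1 term against a frozen teacher's pseudo-depth. This is a minimal illustration under stated assumptions, not the paper's published implementation; the module structure, reduction ratio, and loss normalization are all hypothetical.

```python
# Illustrative sketch only: assumes SE-style channel attention and an L1
# pseudo-depth term; the paper's exact formulation may differ.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style channel attention: reweight decoder feature channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global spatial context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # rescale each channel


def pseudo_depth_loss(pred: torch.Tensor, pseudo: torch.Tensor) -> torch.Tensor:
    """L1 distance to the frozen teacher's depth, median-normalized so the
    scale-ambiguous student and teacher predictions are comparable."""
    pred_n = pred / pred.median()
    pseudo_n = pseudo / pseudo.median()
    return (pred_n - pseudo_n).abs().mean()


# Example: attend over a hypothetical 64-channel decoder feature map.
feat = torch.randn(2, 64, 48, 160)
attended = ChannelAttention(64)(feat)
print(attended.shape)  # torch.Size([2, 64, 48, 160])
```

A pseudo-depth term of this kind provides a dense training signal in the dynamic regions that the photometric reprojection loss must mask out, which is one way the boundary-blurring problem described above could be addressed.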