Convolutional neural network based encoder-decoder for efficient real-time object detection
Corresponding Author:
Nagarajan Mohankumar
Symbiosis Institute of Technology, Symbiosis International (Deemed University), Nagpur Campus
Pune, India
Email: [email protected]
1. INTRODUCTION
More than 90% of human perception is visual, and imaging equipment is widely used in fields directly related to human activity and daily life [1]. Owing to the ongoing growth of machine learning algorithms, the processing of images and other visual information has been successfully adopted across various industries. Object detection, a primary research challenge in computer vision, has drawn increasing attention from researchers. Object detection typically involves two stages: first, searching for the object in the image; second, localizing it with bounding boxes. Convolutional neural networks (CNNs) have become highly effective at object detection in recent years [2]–[5]; the region-based convolutional neural network (R-CNN) [6], YOLO [7], the spatial pyramid pooling network (SPP) [8], and Fast R-CNN [9] are representative detection techniques in this field. Constrained by computational hardware and data availability, traditional object detection algorithms have significant drawbacks [10]. Conversely, with the development of artificial intelligence (AI) and processing power in recent years, the entire process can now be automated with little to no human involvement. The primary distinction is that traditional object detection techniques rely on human experience and expert judgement to extract features, whereas AI uses a sophisticated neural network that can be trained to automatically identify powerful and discriminative features.
In particular, encoder-decoder models based on fully convolutional networks (FCNs) have significantly enhanced performance in tasks such as semantic segmentation [11], [12], edge recognition [13], object detection [14], and crowd counting [15]. Essentially, most popular object detection techniques operate within the encoder-decoder framework. For the detection task, some researchers have built structures on the encoder-decoder paradigm and attained state-of-the-art performance [16]. On benchmark datasets, CNN-based encoder-decoder models have been particularly crucial for continuously improving detection performance [17]. The encoder performs convolution, whereas the decoder performs deconvolution, un-pooling, and up-sampling to predict pixel-wise class labels. The key feature is the up-sampling decoder that corresponds to the low-resolution encoder feature maps. This architecture employs the encoder's pooling indices to up-sample to the pixel-wise classification map while also significantly reducing the number of trainable parameters.
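To make the pooling-indices mechanism concrete, here is a minimal sketch in PyTorch (the framework is our assumption for illustration; the paper does not name one). The encoder records argmax positions during max-pooling, and the decoder re-uses them to up-sample without any trainable un-pooling parameters:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)   # low-level encoder feature maps
down, indices = pool(x)          # 32x32 -> 16x16; argmax positions kept
up = unpool(down, indices)       # 16x16 -> 32x32; sparse, parameter-free
assert up.shape == x.shape
# 'up' is zero except at the remembered max locations; trainable
# convolutions afterwards densify it into full feature maps.
```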
The goal of object detection, which is typically performed on images or videos, is to find object boundaries and to indicate each object's extent and location. The next step is to classify the object's category and to provide the classification likelihood. This task is more difficult than simple image classification because the positions of many objects must be determined from the image or video. CNNs have been used successfully for the detection and classification of objects [18]. Current models include ways to classify either a full input window per scene or a bounding box for each of several objects. Semantic segmentation has had a breakthrough thanks to FCNs, which provide a potent method for boosting the effectiveness of CNNs by accepting inputs of any size [19]. The encoder-decoder-based concept was presented by [20], which proposed it for unsupervised feature learning; encoder-decoder-backed neural networks have since emerged as a promising alternative for such tasks. An intriguing pedestrian collision alert system for advanced driver assistance systems was suggested in [21]; however, it is only capable of detecting and warning about pedestrians. Facial feature localization [22] extracted information from input strings that could only be one-dimensional using the Viterbi decoding technique. Support vector machine (SVM)-based predictive modeling [23] utilised a similar concept to expand SVM outcomes using two-dimensional maps.
The pixel-wise contextual attention network (PiCANet), an attention-generating module that learns to selectively attend to significant locations for every pixel by employing a bidirectional long short-term memory (Bi-LSTM) module within the feature maps, was proposed in [24]. A coarse multi-class segmentation CNN with an FCN architecture has likewise been applied to infer C. elegans tissues. To predict pixel-level labels and refine the label map with a conditional random field (CRF), such networks achieve denser score maps using the FCN architecture. One of the current major trends in CNN architecture design is the incorporation of an encoder and a decoder to improve performance. Apart from these object detection models, several detection algorithms have been implemented on hardware platforms to improve detection performance.
The pyramid scene parsing network (PSPNet) is another effective, recently released CNN architecture intended for pixel-level prediction tasks. Its global pyramid pooling structure, which combines global and local cues to produce the final result, builds pixel-level features for effective segmentation. Due to the PSPNet architecture's extreme complexity, its training and testing processes demand a sizable amount of processing power and graphics processing unit (GPU) capability. The concept of panoptic segmentation (PS) was recently introduced in a study on pixel-wise segmentation. To complete a broad segmentation task, PS combines instance segmentation and semantic segmentation. It performs well when compared to previous visual geometry group (VGG)-based networks, although size is the design's main flaw.
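For orientation, a pyramid pooling module of the kind PSPNet popularised can be sketched as follows; the bin sizes (1, 2, 3, 6) and the channel reduction are assumptions drawn from common practice, not details given in this paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Sketch of a PSPNet-style pyramid pooling module (bins assumed)."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(s(x), size=(h, w), mode="bilinear",
                                align_corners=False) for s in self.stages]
        return torch.cat([x] + pooled, dim=1)  # global + local cues

x = torch.randn(1, 64, 32, 32)
print(PyramidPooling(64)(x).shape)  # torch.Size([1, 128, 32, 32])
```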
The Prophet algorithm, K-means clustering, and seasonal autoregressive integrated moving-average methods play a role in enhancing cloud infrastructures by grouping servers into clusters with similar utilization patterns; K-means clustering enhances resource allocation efficiency [25]. An Internet of things (IoT)-driven image recognition system utilizes CNNs to detect and quantify microplastics [26]. Data collected by sensors is forwarded to a centralized monitoring system that decides whether an alarm should be activated when conditions diverge from their ideal state [27]. K-nearest neighbor (KNN) and SVM algorithms form a precise classification model that exploits the important data to improve prediction accuracy [28]. SVMs combined with recurrent neural networks are powerful classifiers that make it feasible to stratify patients' risks and predict how they will respond to therapy [29]. Cloud computing allows a seizure prediction system to become more accessible and scalable [30], and feature selection has been examined for improving accuracy [31]. Hybrid machine learning techniques, such as SVM with a CNN algorithm, have been used to anticipate Alzheimer's disease [32].
The key benefits of our proposed decoupled architecture are its simple training under various environmental conditions and its ease of customization. For pixel-wise classification, the encoder creates low-resolution feature maps, which the decoder up-samples by convolving them with trainable filters to yield dense feature maps [33]. The fundamental component of the proposed method is the decoding procedure, which provides several useful advantages in terms of improving boundary delineation and reducing model size. Training is also much improved by lowering the number of trainable parameters. The method offers simple training, in which the encoder and the decoder are trained at the same time.
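Because the encoder and decoder are trained at the same time, a single optimizer can update both halves, as the following hedged sketch shows; the stand-in layers and the 21-class output are illustrative assumptions only:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the encoder and decoder described above;
# layer sizes and the 21-class output are assumptions, not the paper's.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64),
    nn.ReLU(inplace=True), nn.MaxPool2d(2))
decoder = nn.Sequential(
    nn.Upsample(scale_factor=2), nn.Conv2d(64, 21, 3, padding=1))
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # one optimizer
criterion = nn.CrossEntropyLoss()                         # pixel-wise labels

images = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 21, (2, 64, 64))

optimizer.zero_grad()
loss = criterion(model(images), labels)  # gradients flow end to end,
loss.backward()                          # so both halves learn together
optimizer.step()
```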
Given an input image, the network begins training and propagates activations through the network to the top layers. The encoder applies convolution with a predetermined set of filter banks to produce feature maps and then performs batch normalisation. Afterwards, activations are computed by rectified linear units (ReLUs). Max-pooling is then performed with a 2x2 window and a stride of 2, which results in a two-fold subsampling of the image. Multiple pooling layers can increase translation invariance for effective classification tasks, but they unnecessarily reduce the spatial resolution of the feature maps.
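A single encoder stage of this kind can be sketched as follows; the channel counts and input size are illustrative assumptions:

```python
import torch
import torch.nn as nn

# One encoder stage as described: convolution with a filter bank, batch
# normalisation, ReLU, then 2x2 max-pooling (channel counts assumed).
stage = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),  # halves height and width
)

x = torch.randn(1, 3, 224, 224)
print(stage(x).shape)  # torch.Size([1, 64, 112, 112]): two-fold subsampling
```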
Therefore, prior to the sub-sampling step, the boundary information needs to be captured and stored in the encoder feature maps. However, it is not practical to store the entire set of encoder feature maps because of memory limitations. The best option is to keep only the max-pooling indices in storage: for each 2x2 pooling window, two bits are used to memorise the position of the maximum in each feature map. Compared with keeping the full feature maps on hand, this is a very efficient solution. With this approach, the encoder can store data much more efficiently, and fully connected layers can be dropped.
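A quick back-of-the-envelope calculation, assuming 32-bit float activations, shows why storing 2-bit pooling indices is far cheaper than retaining the feature maps themselves:

```python
# Rough memory comparison (assumed 32-bit float activations) between
# storing a full encoder feature map and storing only its 2-bit
# max-pooling indices, one index per 2x2 window per channel.
channels, height, width = 64, 112, 112

full_map_bits = channels * height * width * 32              # activations
index_bits = channels * (height // 2) * (width // 2) * 2    # 2 bits/window

print(full_map_bits / index_bits)  # 64.0 -> indices are ~64x cheaper
```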
The outcomes are contrasted with the specified benchmark results. The dataset is divided into training and testing sets: 90% is allotted for training and 10% for testing.
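The 90/10 split can be realised in a few lines of plain Python; the file names and the fixed seed below are placeholders:

```python
import random

# A plain-Python sketch of the 90/10 split; file names are placeholders.
image_files = [f"img_{i:06d}.jpg" for i in range(1000)]
random.seed(0)           # fixed seed for reproducibility
random.shuffle(image_files)

cut = int(0.9 * len(image_files))
train_files, test_files = image_files[:cut], image_files[cut:]
print(len(train_files), len(test_files))  # 900 100
```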
Many weights are zero because training models frequently use the ReLU activation function. In this work, it was found that after creating the sparsity model, the gradient vanished during training with ReLU6. This is a result of the mask excluding 50% of the weights from the gradient update. As indicated in Table 1, the model is evaluated on the public MS-COCO dataset and contrasted with earlier techniques. In this work, 5 k and 118 k images are used for testing and training the model, respectively. To verify that the suggested method works, the outcomes of each trial were examined. Average precision (AP) is typically computed for all classes, and its mean is known as the mean average precision (mAP). AP50 and AP75 denote the AP computed at overlap thresholds of 50% and 75%, respectively, against the ground truth. Figure 2 shows the multi-object detection results obtained with the model trained on the MS-COCO dataset.
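Since AP50 and AP75 hinge on the intersection-over-union (IoU) overlap criterion, a minimal IoU computation is sketched below; this is the standard definition rather than code from the paper:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as correct for AP50 when IoU >= 0.5 (0.75 for AP75).
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```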
Figure 2. Screenshot for multi-object detection of complex scenes using the proposed model trained on the MS-COCO dataset
For complex scenes, the proposed CNN-based encoder-decoder model achieved better detection performance. The detection results include various objects such as horse, potted plant, and person, as shown in Figure 3. For this detection, the floating-point operations (FLOPs) count is about 128.46, with a model size of 134.22 MB. Figure 3 illustrates the detection of multiple objects from sample complex images in the MS-COCO dataset using the proposed model.
Figure 3. Results for object detection of complex scenes using proposed model trained on MS-COCO dataset
The proposed model achieved a mAP of 54.1% at 327 FPS, as shown in Table 1. This investigation confirmed the model's real-time performance: on the MS-COCO dataset, the FPS value is 327, the mAP is 54.1%, the AP50 is 77.2%, and the AP75 is 69.3%. Table 2 presents the comparative results of the proposed model against existing approaches. Figure 4 shows the performance analysis of the single-shot detector (SSD), YOLOv3, EfficientDet, YOLOv4 tiny, RetinaNet, and the proposed CNN-based encoder-decoder model for object detection. Compared to all other models, the proposed model provides better results.
Table 2. Comparison of proposed CNN-based object detection model with existing algorithms

Model         Architecture    AP75 (%)   AP50 (%)   mAP (%)   FPS
SSD           VGG             30.3       48.5       28.8      36
YOLOv3        Darknet-53      34.3       58         33        66
EfficientDet  EfficientNet    35.8       52.2       33.8      16
YOLOv4 tiny   CSPNet-15       20         40         22        330
RetinaNet     ResNet101       36.8       53.1       34.4      11
Proposed      VGG-19          69.3       77.2       54.1      327
Figure 4. Performance comparison (AP75, AP50, and mAP, in %) of SSD, YOLOv3, EfficientDet, YOLOv4 tiny, RetinaNet, and the proposed model
The outcomes of every experiment are examined to confirm the efficiency of the proposed method. For assessment, AP is used, which corresponds to the area under the precision-recall curve. AP is usually computed for all classes, and its average is reported as the mAP. In addition, AP50 and AP75 denote the AP at overlap thresholds of 50% and 75%, respectively, in comparison with the ground truth. This study confirmed the model's suitability for real-time applications with good recognition accuracy.
4. CONCLUSION
We have noted that recent efforts on object detection using CNN-based encoder-decoder models have addressed salient object detection (SOD) as a pixel-level classification task. Experimental findings on the open-source MS-COCO 2017 dataset demonstrated that the proposed method is capable of good detection accuracy and fast execution. The objective of future work is to significantly enhance multiple-object detection for high-quality images without sacrificing prediction speed. The model also employs the distinctive technique of pooling indices, which uses fewer processing parameters and speeds up inference. With a mAP of 54.1% and 327 FPS, the suggested network model is highly suited for multiple-object detection. To sum up, the model's ease of training and the proposed method's low computational resource requirements are its key features. As a result, the suggested approach is practical for many real-time applications and offers a more economical alternative. Overall, the suggested method yields a system for cutting-edge autonomous driving that is more affordable and more effective.
FUNDING INFORMATION
Funding information is not available.
Name of Author C M So Va Fo I R D O E Vi Su P Fu
Mothiram Rajasekaran
Chitra Sabapathy Ranganathan
Nagarajan Mohankumar
Rajeshkumar Sampathrajan
Thayalagaran Merlin Inbamalar
Nageshvaran Nandhini
Shanmugam Sujatha
DATA AVAILABILITY
The data that support the findings of this study are available on request from the corresponding
author, [NM]. The data, which contain information that could compromise the privacy of research
participants, are not publicly available due to certain restrictions.
REFERENCES
[1] Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: a survey,” Proceedings of the IEEE, vol. 111, no. 3,
pp. 257–276, 2023, doi: 10.1109/JPROC.2023.3238524.
[2] Z. Li et al., “Deep learning-based object detection techniques for remote sensing images: a survey,” Remote Sensing, vol. 14,
no. 10, 2022, doi: 10.3390/rs14102385.
[3] J. Jegan, M. R. Suguna, M. Shobana, H. Azath, S. Murugan, and M. Rajmohan, “IoT-enabled black box for driver behavior
analysis using cloud computing,” in 2024 International Conference on Advances in Data Engineering and Intelligent Computing
Systems (ADICS), 2024, pp. 1–6, doi: 10.1109/ADICS58448.2024.10533471.
[4] K. Muhammad, J. Ahmad, Z. Lv, P. Bellavista, P. Yang, and S. W. Baik, “Efficient deep CNN-based fire detection and
localization in video surveillance applications,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 7,
pp. 1419–1434, 2019, doi: 10.1109/TSMC.2018.2830099.
[5] J.-M. Guo, J.-S. Yang, S. Seshathiri, and H.-W. Wu, “A light-weight CNN for object detection with sparse model and knowledge
distillation,” Electronics, vol. 11, no. 4, Feb. 2022, doi: 10.3390/electronics11040575.
[6] S. Srinivasan, R. Raja, C. Jehan, S. Murugan, C. Srinivasan, and M. Muthulekshmi, “IoT-enabled facial recognition for smart
hospitality for contactless guest services and identity verification,” in 2024 11th International Conference on Reliability, Infocom
Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024, pp. 1–6, doi:
10.1109/ICRITO61523.2024.10522363.
[7] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv-Computer Science, pp. 1–6, 2018, doi:
10.48550/arXiv.1804.02767.
[8] Y. H. Wu, Y. Liu, X. Zhan, and M. M. Cheng, “P2T: pyramid pooling transformer for scene understanding,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12760–12771, 2023, doi: 10.1109/TPAMI.2022.3202765.
[9] H. Jiang and E. Learned-Miller, “Face detection with the faster R-CNN,” in 2017 12th IEEE International Conference on
Automatic Face & Gesture Recognition (FG 2017), May 2017, pp. 650–657, doi: 10.1109/FG.2017.82.
[10] Z. Wang, J. Zhu, S. Fu, S. Mao, and Y. Ye, “RFPNet: Reorganizing feature pyramid networks for medical image segmentation,”
Computers in Biology and Medicine, vol. 163, 2023, doi: 10.1016/j.compbiomed.2023.107108.
[11] A. Tragakis, C. Kaul, R. Murray-Smith, and D. Husmeier, “The fully convolutional transformer for medical image segmentation,”
in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3649–3658, doi:
10.1109/WACV56688.2023.00365.
[12] J. Ramasamy, E. Srividhya, V. Vaidehi, S. Vimaladevi, N. Mohankumar, and S. Murugan, “Cloud-enabled isolation forest for
anomaly detection in UAV-based power line inspection,” in 2024 2nd International Conference on Networking and
Communications (ICNWC), 2024, pp. 1–6, doi: 10.1109/ICNWC60771.2024.10537407.
[13] D. Bai, X. Zheng, T. Liu, K. Li, and J. Yang, “Finger disability recognition based on holistically-nested edge detection,” in
Intelligent Robotics and Applications, 2022, pp. 146–154, doi: 10.1007/978-3-031-13844-7_15.
[14] M. R. Sudha et al., “Predictive modeling for healthcare worker well-being with cloud computing and machine learning for stress
management,” International Journal of Electrical and Computer Engineering, vol. 15, no. 1, pp. 1218–1228, 2025, doi:
10.11591/ijece.v15i1.pp1218-1228.
[15] Y. Xie, Y. Lu, and S. Wang, “RSANet: deep recurrent scale-aware network for crowd counting,” Proceedings - International
Conference on Image Processing, ICIP, pp. 1531–1535, 2020, doi: 10.1109/ICIP40778.2020.9191086.
[16] I. Filali, M. S. Allili, and N. Benblidia, “Multi-scale salient object detection using graph ranking and global–local saliency
refinement,” Signal Processing: Image Communication, vol. 47, pp. 380–401, 2016, doi: 10.1016/j.image.2016.07.007.
[17] Z. Wu, G. Allibert, F. Meriaudeau, C. Ma, and C. Demonceaux, “HiDAnet: RGB-D salient object detection via hierarchical depth
awareness,” IEEE Transactions on Image Processing, vol. 32, pp. 2160–2173, 2023, doi: 10.1109/TIP.2023.3263111.
[18] P. Maheswari, S. Gowriswari, S. Balasubramani, A. R. Babu, N. K. Jijith, and S. Murugan, “Intelligent headlights for adapting beam
patterns with raspberry pi and convolutional neural networks,” in 2024 2nd International Conference on Device Intelligence,
Computing and Communication Technologies (DICCT), 2024, pp. 182–187, doi: 10.1109/DICCT61038.2024.10533159.
[19] J. Hai, Y. Hao, F. Zou, F. Lin, and S. Han, “Advanced RetinexNet: A fully convolutional network for low-light image
enhancement,” Signal Processing: Image Communication, vol. 112, 2023, doi: 10.1016/j.image.2022.116916.
[20] D. Stavens and S. Thrun, “Unsupervised learning of invariant features using video,” Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, pp. 1649–1656, 2010, doi: 10.1109/CVPR.2010.5539773.
[21] C. C. Sekhar, K. Vijayalakshmi, A. S. Rao, V. Vedanarayanan, M. B. Sahaai, and S. Murugan, “Cloud-based water tank
management and control system,” in 2023 2nd International Conference on Smart Technologies for Smart Nation, SmartTechCon
2023, 2023, pp. 641–646, doi: 10.1109/SmartTechCon57526.2023.10391730.
[22] S. M. Hanif, L. Prevost, R. Belaroussi, and M. Milgram, “Real-time facial feature localization by combining space displacement
neural networks,” Pattern Recognition Letters, vol. 29, no. 8, pp. 1094–1104, 2008, doi: 10.1016/j.patrec.2007.09.016.
[23] B. J. Ganesh, P. Vijayan, V. Vaidehi, S. Murugan, R. Meenakshi, and M. Rajmohan, “SVM-based predictive modeling of
drowsiness in hospital staff for occupational safety solution via IoT infrastructure,” in 2024 2nd International Conference on
Computer, Communication and Control (IC4), 2024, pp. 1–5, doi: 10.1109/IC457434.2024.10486429.
[24] N. Liu, J. Han, and M. H. Yang, “PiCANet: pixel-wise contextual attention learning for accurate saliency detection,” IEEE
Transactions on Image Processing, vol. 29, pp. 6438–6451, 2020, doi: 10.1109/TIP.2020.2988568.
[25] A. R. Rathinam, B. S. Vathani, A. Komathi, J. Lenin, B. Bharathi, and S. Murugan, “Advances and predictions in predictive
auto-scaling and maintenance algorithms for cloud computing,” 2nd International Conference on Automation, Computing and
Renewable Systems, ICACRS 2023 - Proceedings, pp. 395–400, 2023, doi: 10.1109/ICACRS58579.2023.10404186.
[26] M. D. A. Hasan, K. Balasubadra, G. Vadivel, N. Arunfred, M. V. Ishwarya, and S. Murugan, “IoT-driven image recognition for
microplastic analysis in water systems using convolutional neural networks,” in 2024 2nd International Conference on Computer,
Communication and Control (IC4), 2024, pp. 1–6, doi: 10.1109/IC457434.2024.10486490.
[27] S. Selvarasu, K. Bashkaran, K. Radhika, S. Valarmathy, and S. Murugan, “IoT-enabled medication safety: real-time temperature
and storage monitoring for enhanced medication quality in hospitals,” 2nd International Conference on Automation, Computing
and Renewable Systems, ICACRS 2023 - Proceedings, pp. 256–261, 2023, doi: 10.1109/ICACRS58579.2023.10405212.
[28] K. Padmanaban, A. M. S. Kumar, H. Azath, A. K. Velmurugan, and M. Subbiah, “Hybrid data mining technique based breast
cancer prediction,” AIP Conference Proceedings, vol. 2523, 2023, doi: 10.1063/5.0110216.
[29] N. Mohankumar et al., “Advancing chronic pain relief cloud-based remote management with machine learning in healthcare,”
Indonesian Journal of Electrical Engineering and Computer Science, vol. 37, no. 2, pp. 1042–1052, 2025, doi:
10.11591/ijeecs.v37.i2.pp1042-1052.
[30] M. Vadivel, V. B. Marin, S. Balasubramani, S. Hemalatha, S. Murugan, and S. Velmurugan, “Cloud-based passenger experience
management in bus fare ticketing systems using random forest algorithm,” in 2024 11th International Conference on Reliability,
Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024, pp. 1–6, doi:
10.1109/ICRITO61523.2024.10522226.
[31] M. P. Aarthi, C. M. Reddy, A. Anbarasi, N. Mohankumar, M. V. Ishwarya, and S. Murugan, “Cloud-based road safety for real-
time vehicle rash driving alerts with random forest algorithm,” in 2024 3rd International Conference for Innovation in
Technology (INOCON), 2024, pp. 1–6, doi: 10.1109/INOCON60754.2024.10511316.
[32] M. S. Kumar, H. Azath, A. K. Velmurugan, K. Padmanaban, and M. Subbiah, “Prediction of Alzheimer’s disease using hybrid
machine learning technique,” AIP Conference Proceedings, vol. 2523, 2023, doi: 10.1063/5.0110283.
[33] E. P. Kannan and T. V. Chithra, “Lagrange interpolation for natural colour image demosaicing,” International Journal of
Advances in Signal and Image Sciences, vol. 7, no. 2, pp. 21–30, 2021, doi: 10.29284/ijasis.7.2.2021.21-30.
BIOGRAPHIES OF AUTHORS
Dr. Nagarajan Mohankumar was born in India in 1978. He received his B.E. degree from Bharathiar University, Tamil Nadu, India in 2000 and his M.E. and Ph.D. degrees from Jadavpur University, Kolkata in 2004 and 2010, respectively. He joined the Nano Device Simulation Laboratory in 2007 and worked as a Senior Research Fellow under the CSIR direct scheme until September 2009. Later he joined SKP Engineering College as a Professor to develop research activities in the field of VLSI and nanotechnology. He is currently working as a Research Professor at Symbiosis Institute of Technology, Nagpur Campus, Symbiosis (International) Deemed University, Pune, India. He is a senior member of IEEE. He has about 85 publications in reputed international journals and about 50 international conference proceedings. He received the career award for young teachers (CAYT) from AICTE, New Delhi for 2012-2014. His research interests include modeling and simulation of HEMTs, optimization of devices for RF applications, characterization of advanced HEMT architectures, terahertz electronics, high frequency imaging, sensors, and communication. He can be contacted at email: [email protected].