Computational Analysis of Xception and ConvMixer Architecture in Classification of Skin Disease Images using Geometric Transformation

Maya Isafa Sam Saputri; Sugiyarto Surono; Aris Thobirin

doi:10.31764/jtam.v10i3.37037

Abstract

This study evaluates and compares two deep learning models, Xception and ConvMixer, for the classification of skin disease images. An experimental methodology was employed using the Massive Skin Disease dataset. The data were divided into training, validation, and test sets in a ratio of 80:10:10. Pre-processing consisted of resizing, normalization, and geometric augmentation to increase visual variation in the training data. Both models were trained with identical parameters so that the comparison would be objective. The models were assessed using loss, accuracy, precision, recall, and F1-score in a multi-class classification scheme. Xception obtained a test accuracy of 99.70%, while ConvMixer achieved 94.60%. Xception also converged faster and showed more consistent performance across classes, while ConvMixer was more efficient in computation time. The contribution of this study is a comparative analysis of two modern architectures trained with identical parameters for skin disease classification. However, the study is limited to a subset of the classes and a single dataset, so further testing is needed to confirm how well the models generalize to a wider range of scenarios.
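The 80:10:10 split and the multi-class metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `split_80_10_10` and `macro_metrics` are hypothetical, the fixed shuffle seed is an assumption, and macro-averaging across classes is assumed because the abstract does not state which averaging scheme was used.

```python
import random
from collections import Counter


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/validation/test at 80:10:10.

    Hypothetical helper; the seed and the plain (non-stratified) shuffle
    are assumptions, not details taken from the paper.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1-score."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t

    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Calling `split_80_10_10(range(100))` yields subsets of 80, 10, and 10 items. `macro_metrics` takes two equal-length label sequences, for example the true and predicted classes on the test set.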

Keywords

Deep Learning; Xception; ConvMixer.

