ISSN 2790-0886
BULLETIN
OF THE ALMATY UNIVERSITY OF POWER ENGINEERING AND TELECOMMUNICATIONS
Founded in June 2008
Subject focus: power engineering and power machine building; information, telecommunication and space technologies
1 (60) 2023
Impact factor: 0.095
Scientific and technical journal. Published 4 times a year
Almaty
Certificate of re-registration of a periodical printed publication, news agency and online publication
No. KZ14VPY00024997 issued by
the Ministry of Information and Social Development of the Republic of Kazakhstan
Subscription index: 74108
Editor-in-Chief V.V. Stoyak,
Candidate of Technical Sciences, Professor
Deputy Editor-in-Chief Algazy Zhauyt, PhD; Executive Secretary D.A. Shuyebayeva, Master's degree
Editorial Board
Editor-in-Chief V.V. Stoyak, Candidate of Technical Sciences, Professor at Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev, Kazakhstan;
Deputy Editor-in-Chief A. Zhauyt, PhD, Associate Professor at Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev, Kazakhstan;
S.S. Sagintayeva, Doctor of Economic Sciences, Candidate of Physical and Mathematical Sciences, Professor of Mathematics, Academician of MAIN;
G. Revalde, PhD, Corresponding Member of the Academy of Sciences, Director of the National Science Council, Riga, Latvia;
I.K. Iliev, Doctor of Technical Sciences, University of Ruse, Bulgaria;
K. Beloev, Doctor of Technical Sciences, Professor at the University of Ruse, Bulgaria;
A.D. Obozov, Doctor of Technical Sciences, National Academy of Sciences of the Kyrgyz Republic, Head of the Renewable Energy Sources Laboratory, Kyrgyz Republic;
A.A. Kuznetsov, Doctor of Technical Sciences, Professor at Omsk State Technical University (OmGUPS), Omsk, Russian Federation;
K.A. Alipbayev, PhD, Associate Professor at Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev, Kazakhstan;
E.R. Zvereva, Doctor of Technical Sciences, Professor at Kazan State Power Engineering University, Kazan, Russian Federation;
V.A. Lakhno, Doctor of Technical Sciences, Professor at the National University of Life and Environmental Sciences of Ukraine, Department of Computer Systems, Networks and Cybersecurity, Kyiv, Ukraine;
Ch.T. Omarov, Candidate of Physical and Mathematical Sciences, Director of the Fesenkov Astrophysical Institute, Kazakhstan;
S.V. Konshin, Candidate of Technical Sciences, Professor at Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev, Kazakhstan;
S.T. Tynymbayev, Candidate of Technical Sciences, Professor at Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev, Kazakhstan.
The authors bear responsibility for the accuracy of the materials.
When using materials from the journal, a reference to the "Bulletin of AUES" is mandatory.
INFORMATION, TELECOMMUNICATION AND SPACE TECHNOLOGIES
IRSTI 28.23.37 https://doi.org/10.51775/2790-0886_2023_60_1_99
ANALYSIS OF THE IMPACT OF VIDEO QUALITY ON FEATURE EXTRACTION FROM A VIDEO STREAM USING CONVOLUTIONAL NEURAL NETWORKS
S.B. Rakhmetulayeva1*, A.S. Marat2, T. Iliev3, A.K. Mukasheva4
1,2International Information Technology University, Almaty, Kazakhstan
3University of Ruse, Ruse, Bulgaria
4 Non-profit JSC “Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev”, Almaty, Kazakhstan
e-mail: [email protected], [email protected], [email protected], [email protected]
Abstract. This article discusses the development of a monitoring tool that ensures the recognition of illegal actions. Three different sources are used for monitoring: audio, video streams, and desktop screen recordings. The wide variation in visual appearance caused by differences in lighting, color, scale, viewing angle, and the headgear or eyewear worn by a person is a serious issue for face identification systems. The focus of this article is to determine how successful an online control system can be in preventing unfair testing by applying facial recognition to video samples. The development of a trainable artificial intelligence network makes it possible to use several variables at the same time. As part of this work, a deep learning method was developed for automatic face identification, using a convolutional neural network to calculate the difference between a sample and a face in a video, because a CNN is reliable and robust to changes in lighting, changes in facial expression, and turns of the face. To obtain more accurate results, video recordings of different quality were used, taken from different distances, at different angles, under different lighting conditions, and with accessories included. The results of the modeling work demonstrated that, with the exception of the experiment in which accessories such as a medical face mask were present, the model we trained accurately predicted the result for each sample criterion. Other examples show the effectiveness of facial recognition when the calculated difference between the sample and the face ranged from 28% to 32%.
Keywords: online proctoring, face recognition, deep learning, convolutional neural network, features.
Introduction
Currently, the relevance of knowledge changes at lightning speed. Every week, various projects are developed and implemented, and remarkable scientific discoveries are made. In this regard, various courses and training programs are being introduced, but as the number of students increases, there is a need for a method of rapid knowledge testing. To confirm mastery of the material, various methods are used; one of them is testing.
In computer testing, various proctoring systems are being introduced to combat cheating, in which real human proctors actively monitor the cameras and microphones of students taking the test. Don't you find it a little tedious? Can a person follow more than two people in real time? Would such a system be extremely effective in combating unfair testing? It is not easy to answer these questions; however, based on personal experience, it is safe to say that the human factor itself precludes stable control and reliability of test results.
Proctors are generally former or current staff or workplace administrators. They must watch rather monotonous video carefully and attentively for many hours, which not everyone is able to do. For this reason, online proctoring systems are constantly being improved. Based on this study, recommendations are offered to educational institutions on the use of proctoring systems.
Prathish et al. propose an intelligent online proctoring system based on audio and video parameters [1]. However, the paper lacks an evaluation of its own research. Chua et al. created a method that uses question bank randomization and tab locking to detect and prevent cheating [2].
E-Parakh, a method for online examination proctoring that can only be used with mobile devices, was developed by Pandey et al. [3]. Slusky [4] examines many cybersecurity concerns relating to online proctoring platforms. The article discusses multi-factor authentication and authorization tactics and methodologies, including challenge-response, biometrics (voice and face recognition), and blockchain technology. The discussion of operational controls covers lockdown browsers, webcam detection of behavioral signs of fraud, endpoint security, VPNs and VMs, screen-sharing and keyboard-listening programs, technical controls to mitigate the absence of spatial (physical) controls, compliance with regulations (GDPR), and so on. The effect of proctoring on a student's performance is examined by Alessio et al. in their study [5].
Online proctoring is a control procedure for online exams or testing in which the entire process is supervised by an administrator, a proctor [6]. The proctor monitors the activities of the subject using a webcam and observes what is happening. This technology makes it possible to verify the candidate's identity, objectively assess his knowledge, and prevent cheat sheets and other tricks during the exam. In this case, the subject and the proctor may be in different parts of the world. Proctoring also excludes any conflict of interest on the part of the training center, since the proctor is not an examiner but an independent person: he does not take part in the educational process but supervises the exam and makes sure that all the rules are followed. A proctor can be either a specially trained specialist or an independent examiner from another educational organization or a private company. Even so, there are times when the proctor may fail to notice violations.
Online proctoring, used in most universities, is a human tracking system. However, there are also computer tracking systems, as well as combined ones: a human together with a computer. Hussein et al. [7] list the following types of proctoring:
- Synchronous proctoring with a human. As a rule, it is used for tests or written assignments. Throughout the exam, the proctor, usually an outsider, supervises the students as they take the exam. Its purpose is to control the quality and fairness of testing. The proctor's duties include verifying and identifying the examinee and monitoring all of the student's actions; if necessary, the proctor has the right to intervene in the testing process (make comments, cancel the session, etc.). One proctor can supervise at most six people taking an exam at the same time. This type of proctoring is one of the most accurate, but also the most resource-intensive.
- Asynchronous proctoring with post-validation. Examinee verification, tracking and reporting are done by software. The system can automatically detect the student's prohibited actions and record them in the logs. For example:
• Conversation during the exam
• Connecting multiple monitors
• Strangers in the frame
• The absence of the subject from the camera's field of view.
When violations are detected, the proctoring system marks the video fragment on which illegal actions were recorded. However, the exam will not be interrupted. All violations will be reviewed and either confirmed or rejected by a university staff member. This type of proctoring is less resource-intensive in terms of time; it is the "computer plus human" method.
- Automatic proctoring. Tracking of all student activity, as well as the recording of violations and of all events that have occurred, happens automatically without human involvement. This type of proctoring is the least costly, but AI is still not perfect and cannot deliver a precise assessment of the subjects' actions.
The technology of facial recognition information retrieval systems, which we have selected as the research task of this article, has recently become widely used for identifying a person. Together with the advancement of data transmission technologies, numerous types of cloud services, and the intelligent analysis of massive volumes of data, this direction is already among the most relevant and promising [44]. It became vital to reconsider the issue of automatic identification of a person based on factual information stored in specialized databases in light of the September 11, 2001 terrorist attacks in the United States of America and the events that followed.
A person is most often searched for and identified using exact personal data (last name, first name, patronymic, date of birth, gender, citizenship). Although this search method is quick, it has drawbacks: if, for instance, the person to be recognized is using different credentials, only possessing the exact credentials can help find him in the database. Identifying a person by a photograph, a sketch, or an artist's line drawing of the face might be used as an alternative to the search option mentioned above.
Methods
Face recognition is a technology that makes it possible to automatically identify (find out who is in the photo) or verify (confirm that it is a particular face in the photo) a person in a photo, video or live stream. Recognition uses neural networks that can read and analyze the unique features of a human face and then compare them with a database. The first experiments in machine face recognition were presented in the 1960s by Woody Bledsoe, a professor at the University of Texas at Austin and an AI researcher. His working group created a database of 800 pictures of people from different angles. The researchers then marked the faces with 46 coordinate points using a prototype of the modern tablet.
Using a special algorithm, the system rotated faces at different angles and zoomed in and out. In the second step, the algorithm used 22 measurements, applying Bayesian decision theory to make the overall conclusion as accurate as possible. As a result, Bledsoe's system was 100 times faster than a human. In 1988, Michael Kirby and Lawrence Sirovich at Brown University applied the Eigenface approach, using linear algebra for image analysis. They used fewer than 100 values to describe faces. In 1991, Alex Pentland and Matthew Turk at MIT improved the Eigenfaces technology to account for environmental variables and managed to automate the recognition process. In the late 1990s, the US Defense Advanced Research Projects Agency (DARPA) and the National Institute of Standards and Technology released the FERET program with the broadest database of faces, more than 14 thousand images. At first it was used to search for and recognize offenders around the world, but later it was opened to the public. Since 2010, Facebook has been using facial recognition to find users in posted photographs and offer to tag them. In 2011, specialists from Panama and the United States launched the joint FaceFirst project, a facial recognition technology used to crack down on illegal activities at Tocumen Airport in Panama. That same year, US police and intelligence agencies started using facial recognition to identify corpses, including that of Osama bin Laden. Since 2014, face recognition has been used in mobile phone cameras, and since 2017 in retail [8].
I.V. Yurko and V.N. Aldobaeva investigated and gave suggestions for improving face recognition methods so as to achieve an optimal ratio of recognition efficiency to computing power under external factors such as noise, distance to the object, light level and others [9]. A.A. Sukmandhani and I. Suteja conducted a series of tests to determine the accuracy of presentation and the length of the testing process using the Emgu CV cross-platform image processing library and the Eigenface machine learning method, as well as the accuracy of a face recognition program using the eigenface method on the test samples. If the students pass the verification correctly, exam questions appear and the students can take the exam [10]. Face identification systems also have advantages, according to Syed Navaz and Mazumder [11], such as accuracy, cost-effectiveness, non-invasiveness, the use of pre-existing data, the use of biometrics that are appropriate for the task, and suitability as a backup mechanism. The face is a reflection of the personality, claims Dhavalsinh [12]. It is a complex multidimensional structure that requires powerful computational methods to recognize. Computers are easily fooled by changes in lighting, different poses, and changes in the angle of the face when using an automatic facial recognition system. The Eigenface approach is used to carry out the facial recognition algorithms in this attendance application [13]. When used for face identification in computer vision, a set of eigenvectors is referred to as an Eigenface. In plain English, an Eigenface may be described as a collection of uniform face elements obtained from a statistical analysis of many different face photos [14]. Sirovich and Kirby developed the Eigenfaces approach to face recognition, which M. Turk and A. Pentland employed for face classification [15]. M. Turk and A. Pentland not only showed how to build face recognition systems but also how to construct eigenvectors to perform decompositions of the majority of face photos. The probability distribution of the eigenvector for face image vector spaces is obtained from the covariance matrices [16].
The face dataset used needs to have the same lighting and resolution as the images on which a new face recognition is run [13].
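For readers who want to see the Eigenface idea in runnable form, the sketch below projects flattened face images onto their principal components and classifies in the reduced space. It is an illustration only, not the code used in [10] or [13]; the dataset loader, component count and nearest-neighbor classifier are our assumptions.

```python
# Minimal Eigenface sketch (assumed setup, not the authors' code):
# PCA over flattened face images yields "eigenfaces"; recognition
# happens in the projected low-dimensional space.
import numpy as np
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

faces = fetch_lfw_people(min_faces_per_person=50)   # aligned grayscale faces
X, y = faces.data, faces.target                     # X: (n_samples, h*w)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keep 100 principal components; each component, reshaped to (h, w), is an eigenface.
pca = PCA(n_components=100, whiten=True).fit(X_train)
clf = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(X_train), y_train)

print("accuracy:", clf.score(pca.transform(X_test), y_test))
```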
It is worth considering that there are now many types of systems for the recognition and identification of a person's face, but very few systems can recognize faces in poor-quality video/images under varying criteria.
Our model and our experiment can also be used in other fields, such as cardiology and pulmonology, since data on these diseases are likewise studied during screening, using video recordings of specific regions and their segmentation [17-19].
2.1. The Mutual Subspace Method
A lower-dimensional linear subspace can be used to represent the distribution of a set of face images.
Yamaguchi et al. [20] proposed the Mutual Subspace Method (MSM), which represents each set of images by a linear subspace spanned by the images' principal components. The smallest principal angles between subspaces are used to calculate the similarity of image sets. The principal angles between the vectors of two subspaces, the canonical angles [21], reflect the fundamental modes of variation between the two subspaces. As a measure of similarity, canonical correlations are often used, which are the cosines of the principal angles. The Constrained Mutual Subspace Method (CMSM) was later added to MSM in order to make it robust to variations such as pose and lighting alterations [22]. In CMSM, the test and reference subspaces are projected into the constraint subspace, where each subspace shows less variation and the two subspaces can be more clearly distinguished. In [23], a real-time system using CMSM was demonstrated. Using ensemble learning techniques (bagging and boosting), the authors of [24] also presented the Multiple Constrained Mutual Subspace Method (MCMSM), which creates multiple constraint subspaces. The similarities in each constraint subspace are combined for recognition. The results of the experiments revealed an improvement in recognition scores compared to MSM and CMSM. The experiments were conducted using databases of 50 individuals and 500 subjects.
The speed with which MSM-based approaches can compute principal angles between linear subspaces and perform incremental evaluation of linear subspaces is one of their appealing qualities [25]. Simplified modeling using a linear subspace, however, is vulnerable to data fluctuations and is unable to accurately model complex and non-linear variations in facial appearance. Additionally, it is noted in [26, 27] that MSM-based approaches cannot capture the whole probabilistic model of face variation, since they consider neither the eigenvalues of the principal components nor the sample means.
Nonlinear subspaces or manifolds have been the target of attempts to extend approaches based on the MSM [28, 25, 27]. The kernel approach was first used by Wolf and Shashua [28] to determine principal angles between non-linear manifolds. Finding the best kernel function, as with other kernel techniques, is a challenge. According to Kim et al. [25, 29], MSM-based approaches have a limited capacity to model variations in a nonlinear pattern and rely on ad hoc blending of data from various principal angles. By fusing global and local manifold variations, they extended the idea of principal angles to non-linear manifolds, where portions of a locally linear manifold are produced using a mixture of probabilistic PCA. The weighted average of the similarity between the global modes of data variation and the best local matches is used to calculate the similarity between manifolds. Additionally, they used AdaBoost to investigate the best principal-angle fusion for the application. Both methods of modeling non-linear manifolds outperform the standard MSM, as shown by database studies with 100 participants. The manifold-to-manifold distance, defined as the distance between the closest pair of subspaces from two manifolds, was introduced by Wang et al. [27], who decompose a nonlinear manifold into a collection of local linear models, each represented by a linear subspace. They argued that sample means should be considered in local models to gauge how comparable they are, since principal angles mostly reflect the general modes of variation between two subspaces while disregarding the actual data. Studies using two open databases show that their method outperforms the MSM approach in terms of performance.
Kim et al. [30, 31] presented a discriminative learning approach for categorizing image sets using canonical correlations as a measure of distance between image sets. In order to reduce canonical correlations between sets of different classes and maximize canonical correlations between sets within a class, they created a linear discriminant function. After that, canonical correlations are used to compare the sets of images transformed by the discriminant function. Their method has been tested on a variety of object identification tasks, allowing for consistently high recognition results. In [32], a regularized loss-based LDA is introduced to classify sets of face images using canonical correlations.
Without using any feature extraction, the Mutual Subspace Method (MSM) performs a nearest-neighbor search in the original feature space. The smallest canonical angle measures the dissimilarity between the bases of the subspaces of the compared pair of image sets:
d(·, ·) = θ₁   (1)
The Constrained Mutual Subspace Method (CMSM) was introduced by K. Fukui et al. to improve the performance of the original MSM through feature extraction. Specifically, CMSM projects the original features onto the eigenvectors of
G = Σ_{i=1}^{C} P_i = Σ_{i=1}^{C} Ω_i Ω_iᵀ   (2)
corresponding to the d smallest eigenvalues, where G is the sum of the projection matrices P_i = Ω_i Ω_iᵀ of all C classes. The space spanned by those eigenvectors is called the generalized difference subspace [33].
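A minimal sketch of how the quantities behind Eq. (1) can be computed: the cosines of the canonical angles between two image-set subspaces are the singular values of the product of their orthonormal bases. The random data here are stand-ins for face vectors; this is not the implementation from [20] or [33].

```python
# MSM similarity sketch for Eq. (1): principal (canonical) angles between
# two subspaces via the SVD of the product of their orthonormal bases.
import numpy as np

def subspace_basis(images, dim):
    """Orthonormal basis of the subspace spanned by a set of flattened images."""
    X = np.asarray(images, dtype=float).T           # columns are image vectors
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]                               # top `dim` principal directions

def principal_angles(U1, U2):
    """Canonical angles in radians, smallest first; their cosines are singular values."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)  # descending cosines
    return np.arccos(np.clip(s, -1.0, 1.0))

# Two sets of 20 random 256-dimensional "images" (placeholders for face vectors).
rng = np.random.default_rng(0)
A = subspace_basis(rng.normal(size=(20, 256)), dim=5)
B = subspace_basis(rng.normal(size=(20, 256)), dim=5)

theta = principal_angles(A, B)
print("d(., .) = theta_1 =", theta[0])              # Eq. (1): the smallest angle
```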
2.2. Face Recognition method with FaceNet and MTCNN
In their research, F. Schroff, D. Kalenichenko, and J. Philbin presented FaceNet, a system trained to map face images into a compact Euclidean space in which distances directly correspond to a measure of face similarity. In contrast to prior deep learning techniques, their methodology directly optimizes the embedding itself using a deep convolutional network, as opposed to using an intermediate bottleneck layer. They employed online triplet mining to create triplets of roughly aligned matching/non-matching facial regions for training. A significant advantage of their approach is that it obtains state-of-the-art facial recognition performance using as few as 128 bytes per face. Their algorithm achieves a record accuracy of 99.63% on the widely used Labeled Faces in the Wild (LFW) dataset and reaches 95.12% on YouTube Faces DB. For both datasets, their method reduces the error rate by 30% compared with the best previously reported result [34]. In order to define several face embeddings (produced by various networks) that are compatible with one another and allow for direct comparison, they introduced the concepts of harmonic embeddings and the harmonic triplet loss [35].
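To make the training objective concrete, here is a minimal sketch of the triplet loss described above, on L2-normalized 128-dimensional embeddings. The margin value and the random vectors are placeholders; this is not the published FaceNet code.

```python
# Triplet loss sketch: pull an anchor embedding closer to a positive
# (same face) than to a negative (different face) by at least a margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(0, ||a-p||^2 - ||a-n||^2 + margin)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def normalize(v):
    return v / np.linalg.norm(v)                    # FaceNet embeds on the unit sphere

rng = np.random.default_rng(1)
a, p, n = (normalize(rng.normal(size=128)) for _ in range(3))  # 128-D embeddings
print("triplet loss:", triplet_loss(a, p, n))
```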
2.3. Deep Learning and CNN method
In order to "warp" faces into a standard frontal view, Zhenyao et al. [36] first train a CNN that classifies each face as belonging to a known person. PCA is employed at the network output in conjunction with an SVM ensemble for face verification. A multi-stage procedure is proposed by Taigman et al. [37] to align faces to a shared 3D shape model. A multi-class network is trained to perform facial recognition on more than 4,000 identities. The so-called Siamese network was another technique used by the authors, who applied it to directly optimize the L1 distance between two facial features. Their best LFW score (97.35%) was achieved by an ensemble of three networks employing various color channels and equalizations. A non-linear SVM (based on the χ² kernel) is used to combine the predicted distances of these networks.
2.4. The most advanced developments in this area
The error rate of facial recognition using neural networks decreased by a factor of 50 between 2016 and 2020, to 0.8%. According to the 2019 Facial Recognition Market Research, the global facial recognition market was valued at $3.2 billion. The forecast for 2024 is $7 billion, with annual growth of 16%. The most ambitious developments in the field of face recognition come from Apple, Microsoft, Google, Facebook and Amazon (GAFAM). In 2014, Facebook launched DeepFace, a service that detects with 97.25% accuracy whether two photographs of faces belong to the same person.
In 2015, Google introduced its own development, FaceNet. Thanks to the huge amount of data that Google services collect, FaceNet has achieved a record accuracy of 99.63%. This technology is used by Google Photos to sort images and automatically tag people in them.
Since 2018, Amazon has been actively promoting its cloud-based facial recognition service Rekognition, which is used by US law enforcement agencies. The system can recognize up to 100 people in a single photo and search for them in databases containing tens of millions of faces.
According to the Center for Strategic and International Studies, as well as the Office of Science and Technology of the US Department of the Interior, FRT was recognized as the best solution of 2020: its recognition accuracy was 99.97% [8].
Who Uses Facial Recognition Software (FRS)? Companies such as Mastercard already use FRS as an identifier for payments and for security purposes. FRS can potentially be used in retail, hospitality, banks, ATMs, and airports. Companies focused on mobile commerce will greatly benefit from FRS. Marketing firms are considering using FRS for personalized customer service. For example, some eyewear e-commerce companies are working on using FRS to recommend glasses that fit your facial structure.
This eliminates the need to visit the store to try them on. However, by far the most important use cases for FRS are related to security. Here are 11 of the best facial recognition software packages available on the market today:
Amazon Rekognition, Kairos, Betaface, Cognitec, Face++, SenseTime, FaceFirst, BioID, Sky Biometry, DeepVision AI, Trueface.ai. Of these, BioID, Face++ and Trueface.ai offer their services in the field of education [38].
During Ellucian Live 2018, an exemplary face recognition technology dashboard was shown, taken from a test-version prototype demonstrated at Ellucian Innovation Labs. It conveys facial reactions measured throughout a lecture. As class participation data is collected week after week and semester after semester, education experts and administrators can work together to develop new data models that unlock insight into how students learn, the most effective teaching strategies, and what distinguishes great classes and teachers from ineffective learning environments. Additionally, aggregated data can be used to discover learning strengths and problem areas from the moment a student enrolls through graduation, offering a more individualized learning experience that can help each student achieve better results [39]. The Eigenface approach was used by A. Sukmandhani and I. Suteja to construct a prototype face-based online exam program [40].
The following tasks were included in the effort to accomplish this goal: 1. A review of the literature on this topic; 2. Collection of biometric indicators; 3. Study of the verification method based on biometric indicators; 4. Development of a client verification method based on biometric indicators; 5. Launching and testing the method.
The purpose and objectives of the research
The purpose of the study is to analyze video samples according to several criteria for checking, admitting, and letting a student pass a test or exam through an online proctoring system. The pandemic gave an excellent chance to "switch" education to a distance format, but it was the forced transfer of the process online that revealed how unprepared teachers, students, and the entire education system were for it [41]. It also focused attention on related issues, one of which is verifying the test-taker. Thanks to the state of the art in ML [42] and computer vision, we have many powerful tools to perform these kinds of tasks… You don't need to reinvent the wheel; you need to know how to use it to make your car better.
The advantage of this work is that, by using machine learning methods for student verification, it is possible to prevent the student from being replaced by another person, record his personal attendance, and allow him to take the exam. Additionally, the novelty of the work lies in the fact that, in contrast to previous deep learning-based approaches presented in the literature review, we attempted to develop a model that takes every criterion into account at once and provides good verification results.
Materials and methods
This article presents a deep learning architecture for face recognition in which the model incorporates the concepts of machine learning and image classification using a basic architecture. The CNN model was trained using 100 images and 40 video samples. Thus, the developed system provides an efficient classification model for identifying a person. The research subjects are individuals aged 18 to 35, who can be conditionally considered bachelor's, master's, and doctoral students.
4.1. Dataset
After reviewing several studies, it was decided that, to get good results, a deep learning algorithm should be used for face recognition: one that can take an input image, assign importance (trainable weights and biases) to different areas/objects in the image, and distinguish one from another. The experiment was based on images from videos of different quality, shot on different devices, and samples were taken at different distances, from different sides of the screen, in accelerated mode, with different lighting, and with accessories (Figure 1).
Figure 1: Samples, a) images; b) video with criteria (from left to right: different quality, with accessories, from different sides of the screen, at different distances, with different lighting)
4.2. Data preparation
For the experiment, video samples of different quality and from different devices were taken, such as:
iPhone 13 Pro Max: 500x620 pixels (with distortion), 1080x1920, 2316x3088, 2160x3840, 1080x1706, 1080x1270; a MacBook with a quality of 1440x960 pixels; and a Lenovo Syncpad laptop with 1280x720 pixels. Video samples were also taken with criteria such as the presence of headgear, a medical mask, near and far distances, different types of lighting, and different sides of the screen. As a result, 60% of the dataset was used for training and 40% for testing our model.
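The article does not describe how frames were extracted from these video samples; the following is one plausible pipeline, assuming OpenCV: decode the video, keep every n-th frame, and split the frames 60/40 for training and testing. The file name and sampling rate are hypothetical.

```python
# Plausible frame-extraction and split pipeline (assumed, not the authors' code).
import random

import cv2

def extract_frames(video_path, every_nth=10, size=(160, 160)):
    """Return a list of resized frames sampled from the video."""
    cap = cv2.VideoCapture(video_path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                                  # end of stream
            break
        if i % every_nth == 0:
            frames.append(cv2.resize(frame, size))
        i += 1
    cap.release()
    return frames

frames = extract_frames("sample_1080x1920.mp4")     # hypothetical file name
random.seed(0)
random.shuffle(frames)
split = int(0.6 * len(frames))                      # 60% train / 40% test, as in the text
train, test = frames[:split], frames[split:]
```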
4.3. CNN architecture
The CNN used in this study consists of an input layer, a convolution layer, a max pooling layer, a fully connected layer, and an output layer. As an activation for the output probability, the prediction uses a sigmoid, which gives an answer in the range from 0 (false) to 1 (true).
A visual diagram of the CNN architecture used for this analysis can be seen in Figure 2. The experiment used an image as input. The convolutional layer comprises 32 filters with a kernel size given as a 7x7 tuple. The filter is then moved 1 pixel to the right, reapplied to the input volume, and so on until it reaches the rightmost boundary of the volume. The next step is to apply batch normalization. To reduce the spatial dimensions of the output volume, max pooling was applied. After the result of this operation is obtained, the model applies an activation function. To reduce the size of the max pool, the input image is reduced in size. The feature maps are combined and then flattened. Flattening turns the matrix into a one-column vector, which is processed by a fully connected layer.
Figure 2: CNN architecture visualization
Thus, we have created a CNN model that is used for face image recognition. Local receptive fields, which provide local two-dimensional connectivity of neurons; shared weights, which ensure the detection of particular features anywhere in the picture; and hierarchical organization with spatial subsampling are the distinguishing characteristics of a convolutional neural network. These properties enable a CNN to be partially robust to scale, displacement, rotation, angle and other distortions [45]. You can increase or decrease the number of convolution and max pooling layers, as well as the number of neurons in them; just keep in mind that the more layers/neurons you add, the slower the model becomes. In addition, when you have a huge number of images, on the order of 50 thousand or more, a laptop processor may not be efficient enough to process them [43], and when applying the algorithm to video, performance drops sharply and powerful GPUs are required to work efficiently.
You may have to use cloud services such as AWS or Google Cloud.
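A minimal Keras sketch of the architecture named in this section (a 32-filter 7x7 convolution, batch normalization, max pooling, a fully connected layer, and a sigmoid output) is shown below. The input size, dense-layer width and training settings are assumptions, since the paper reports only the layer types and the kernel size.

```python
# Keras sketch of Section 4.3's architecture (hyperparameters assumed):
# conv (32 filters, 7x7) -> batch norm -> max pooling -> flatten -> dense,
# with a sigmoid output in [0, 1].
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(160, 160, 3)),                  # assumed input size
    tf.keras.layers.Conv2D(32, (7, 7), strides=1, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),       # shrink spatial dimensions
    tf.keras.layers.Flatten(),                            # matrix -> one-column vector
    tf.keras.layers.Dense(128, activation="relu"),        # fully connected layer
    tf.keras.layers.Dense(1, activation="sigmoid"),       # 0 (false) .. 1 (true)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```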
Results
In our study, we used 100 images and 40 video samples for training and testing. We split these images into 80% for training and 20% for testing.
500x620 pixels (with distortion), 1080x1920, 2316x3088, 2160x3840, 1080x1706, 1080x1270, 1440x960 pixels, and 1280x720 pixels were used in the model. After training the model, we achieved a maximum training accuracy of 93.38% with a test size of up to 118 frames obtained from a 1080x1920 quality video sample in a dimly lit room, as shown in Figure 3.
There is a difference between the testing samples and the actual video, ranging from 30% to 40%. With the face detection tolerance set to 60%, the model was able to identify faces in the videos with high accuracy and showed that higher-quality samples and videos improve recognition scores, but they can drastically increase recognition time and require a more powerful system to handle a real-time feed.
Figure 3: Training results
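The reported "difference between the sample and the face" (0.28-0.40) and the 60% tolerance match the conventions of the open-source face_recognition library, where the face distance lies in [0, 1] and the default match tolerance is 0.6. The paper does not name its tooling, so the sketch below is an assumed reconstruction with hypothetical file names.

```python
# Assumed reconstruction of the verification step using the face_recognition
# library: compute the distance ("difference") between a reference sample
# and each face found in a video frame, then match against a tolerance.
import face_recognition

sample = face_recognition.load_image_file("student_sample.jpg")   # hypothetical paths
frame = face_recognition.load_image_file("video_frame.jpg")

sample_enc = face_recognition.face_encodings(sample)[0]           # assumes a face was found
frame_encs = face_recognition.face_encodings(frame)

for enc in frame_encs:
    dist = face_recognition.face_distance([sample_enc], enc)[0]   # the "difference"
    match = face_recognition.compare_faces([sample_enc], enc, tolerance=0.6)[0]
    print(f"difference = {dist:.2f}, match = {match}")            # e.g. 0.28-0.32 => match
```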
It is worth noting that when testing the model on videos of various categories, the experiments were successfully completed according to all criteria, except for videos with medical masks, due to the absence of facial features. The video sample from different distances was also tested at the maximum distance: the model successfully passed face recognition up to 6 m, and the calculated difference between the sample and the face of the video's subject was 38%. With distorted video-sample quality, the calculated difference between the sample and the face in the video was 21%. When playing a video sample consisting of 79 frames with a quality of 2160x3840 in accelerated mode, the maximum accuracy was 75.11%, which can also be considered successful, since the minimum threshold for passing face recognition in our model is 75%. An experiment with distorted video quality, 360x640 pixels, showed a maximum accuracy of 34.34%.
The total run took 10 minutes and 13 seconds; the confidence parameter was set to 0.325, meaning that the algorithm has 32.5% confidence that this is the correct individual. No faces were found in three of the photos during face detection training.
Conclusion
Online proctoring is a control procedure for online exams or testing in which the entire process is controlled by a proctor [6]. The proctor monitors the subject's actions using a webcam and sees what is happening. As mentioned earlier, a proctor can be either a specially trained specialist or an independent teacher from another educational organization or a private company. For all his abilities, there are times when the proctor may fail to track violations. The benefit of this approach is that, by using machine learning techniques to confirm a student's identity, it is possible to prevent a student from being replaced by another person, record his personal attendance, and allow him to take the exam. In addition, compared with other deep learning methods described in the literature review, the novelty of the work lies in the fact that we tried to create a model that considers all the criteria at once and gives good validation results. With all these training criteria, we obtained 93% accuracy; at a long distance (maximum 6 m) the measured difference between the sample and the face in the video was 38%, and with distorted quality 21%, which is a good experimental result. In the fast playback format, the percentage of face recognition reached the minimum threshold (slightly more than 75%).
Acknowledgements
This research has been funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. AP13068032).
LIST OF REFERENCES
[1]. Prathish S.; Narayanan S.; Bijlani K. An intelligent system for online exam monitoring. In International Conference on Information Science (ICIS), 2016. DOI: 10.1109/INFOSCI.2016.7845315.
[2]. Chua S.; Bondad J.; Lumapas Z.; Garcia J. Online examination system with cheating prevention using question bank randomization and tab locking. In 4th International Conference on Information Technology (InCIT), 2019.
[3]. Pandey A.; Kumar S.; Rajendran B.; Bindhumadhava S. Unsupervised online examination system. In IEEE Region 10 Annual International Conference, Proceedings/TENCON, 2020, pp. 667–671. Available online: https://doi.org/10.1109/TENCON50793.2020.9293792.
[4]. Slusky L. Cybersecurity of online proctoring systems. Journal of International Technology and Information Management, 2020, vol. 29.
[5]. Alessio H.; Malay N.; Maurer K.; John Bailer A.; Rubin B. Examining the effect of proctoring on online test scores. Online Learning, 2017, vol. 21, no.1.
[6]. Nesterenko E. What is proctoring and what is useful in exams, testing and training. Financial Academy Aktiv, 2021. Available online: https://finacademy.net/materials/article/proktoring.
[7]. Hussein M.; Yusuf J.; Deb A.; Fong L.; Naidu S. An evaluation of online proctoring tools. Open Praxis, 2020, vol. 12, no.4, pp. 509-525. Available online: https://doi.org/10.5944/openpraxis.12.4.1113
[8]. Zuikova A. How does face recognition work and is it possible to cheat this system. RBC trends, 2021. Available online: https://trends.rbc.ru/trends/industry/6050ac809a794712e5ef39b7.
[9]. Yurko I.; Aldabaeva V. Investigation of the main problems related to the recognition and identification of faces by video recording and improvement of the work of face recognition algorithms by video recording in real time. Scientific Journal: Problems of Modern Science and Education, 2018.
[10]. Sukmandhani A.; Sutedja I. Face recognition method for online exams. In International Conference on Information Management and Technology (ICIMTech), 2019. DOI: 10.1109/ICIMTech.2019.8843831.
[11]. Syed Navaz A.; Sri T.; Mazumder P. Face recognition using principal component analysis and neural networks. International Journal of Computer Networking, Wireless and Mobile Communications (IJCNWMC), 2013, vol. 3, no.1. ISSN 2250-1568.
[12]. Dhavalsinh A.; Solanki V. A survey on face recognition techniques. Journal of Image Processing & Pattern Recognition Progress (JoIPPRP), 2013, vol. 4, no.6, pp. 11–16.
[13]. Yusuf M.; Ginardi R. V. H.; Ahmadiyah A. Rancang bangun aplikasi absensi perkuliahan mahasiswa dengan pengenalan wajah. Journal Teknik ITS, 2016, vol. 5, no.2, pp. 766–770.
[14]. Al Fatta H. Rekayasa sistem pengenalan wajah. Penerbit Andi, 2009.
[15]. Ruiz-del-Solar J.; Navarrete P. Eigenspace-based face recognition: a comparative study of different approaches. IEEE Trans. Syst. Man, Cybern. Part C, Applications Rev., 2005, vol. 35, no.3, pp. 315–325.
[16]. Turk M. A.; Pentland A. P. Face recognition using eigenfaces. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.
[17]. Syrymbet Z. S.; Rakhmetulayeva S. B. Convolutional neural network analysis of fundus for glaucoma diagnosis. In IEEE International Conference on Smart Information Systems and Technologies (SIST), 2022.
[18]. Kozhamzharova D. K.; Duisebekova K. S.; Rakhmetulayeva S. B.; Umarov F. A.; Aitimov M. Z. Development of an information-analytical system for the analysis and monitoring of climatic and ecological changes in the environment. Procedia Computer Science, 2020, 170, pp. 578–583.
[19]. Rakhmetulayeva S. B.; Duisebekova K. S.; Kozhamzharova D. K.; Aitimov M. Z. Pollutant transport modeling using gaussian approximation for the solution of the semi-empirical equation. Journal of Theoretical and Applied Information Technology, 2021, 99(8), pp. 1730–1739.
[20]. Yamaguchi O.; Fukui K.; Maeda K. Face recognition using temporal image sequences. In IEEE International Conference on Automatic Face & Gesture Recognition (FG), 1998. DOI: 10.1109/AFGR.1998.670968.
[21]. Hotelling H. Relations between two sets of variates. Biometrika, 1936, vol. 28, issue 3-4, pp. 321–377. Available online: https://doi.org/10.1093/biomet/28.3-4.321.
[22]. Fukui K.; Yamaguchi O. Face recognition using multi-viewpoint patterns for robot vision. In Conference Robotics Research, The Eleventh International Symposium, ISRR, Siena, Italy, 2003, October 19-22, pp. 192–201.
[23]. Kozakaya T.; Nakaia H. Development of a face recognition system on an image processing LSI chip. In Conference on Computer Vision and Pattern Recognition Workshop, 2004. DOI: 10.1109/CVPR.2004.322.
[24]. Nishiyama M.; Yamaguchi O.; Fukui K. Face recognition with the multiple constrained mutual subspace method. In 5th International Conference, AVBPA, Kanade, T.; Jain, A.; Ratha, N.K. (eds.), LNCS, Springer, Heidelberg, 2005, vol. 3546, pp. 71–80.
[25]. Kim T.; Arandjelović O.; Cipolla R. Learning over sets using boosted manifold principal angles (BoMPA). In British Machine Vision Conference (BMVC), 2005, vol. 2, pp. 779–788.
[26]. Shakhnarovich G.; Fisher J.; Darrel T. Face recognition from long-term observations. In 7th European Conference on Computer Vision, Copenhagen, Denmark, Proceedings, 2002, May 28-31, Part III.
[27]. Wang R.; Shan S.; Chen X.; Gao W. Manifold-manifold distance with application to face recognition based on image set. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[28]. Wolf L.; Shashua A. Learning over sets using kernel principal angles. Journal of Machine Learning Research, 2003, vol. 4, no. 10, pp. 913–931.
[29]. Kim T.; Arandjelović O.; Cipolla R. Boosted manifold principal angles for image set-based recognition. Pattern Recognition, 2007, vol. 40, no.9, pp. 2475–2484.
[30]. Kim T.; Kittler J.; Cipolla R. Learning discriminative canonical correlations for object recognition with image sets. In 9th European Conference on Computer Vision, Graz, Austria, Proceedings, 2006, May 7-13, Part III.
[31]. Kim T.; Kittler J.; Cipolla R. Discriminative learning and recognition of image set classes using canonical correlations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, vol. 29, no.6, pp. 1005–1018.
[32]. Geng Y.; Shan C.; Hao P. Square loss based regularized LDA for face recognition using image sets. In IEEE CVPR Workshop on Biometrics, 2009, pp. 99–106.
[33]. Fukui K.; Yamaguchi O. Face recognition using multi-viewpoint patterns for robot vision. In Proceedings of 11th Interna-tional Symposium of Robotics Research, 2003, p. 192–201.
[34]. Sun Y.; Wang X.; Tang X. Deeply learned face representations are sparse, selective, and robust. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015. DOI: 10.1109/CVPR.2015.7298907.
[35]. Schroff F.; Kalenichenko D.; Philbin J. FaceNet: a unified embedding for face recognition and clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015. DOI: 10.1109/CVPR.2015.7298682.
[36]. Zhu Z.; Luo P.; Wang X.; Tang X. Recover canonical view faces in the wild with deep neural networks. CoRR, 2014, abs/1404.3543.
109
[37]. Taigman Y.; Yang M.; Ranzato M.; Wolf L. Deepface: closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 2014. DOI: 10.1109/CVPR.2014.220
[38]. Mohana Krishnan R. Top 11 facial recognition software in 2021. Spiceworks, September 2, 2021.
[39]. Saravanan R. Facial recognition can give students better service (and security). Ellucian, 2018.
[40]. Sukmandhani A. A.; Sutedja I. Face recognition method for online exams. In IEEE International Conference on Information Management and Technology (ICIMTech), 2019.
[41]. Grigoriev V.; Novikova S. I will not write off: what is online proctoring and how it works. RBK Trends, 2021. Available online: https://trends.rbc.ru/trends/education/5fa01fe49a794782c65b74f9
[42]. Cerliani M. People tracking with machine learning. Towards Data Science, 2019. Available online: https://towardsdatascience.com/people-tracking-with-machine-learning-d6c54ce5bb8c.
[43]. Hashmi F. Face recognition using deep learning CNN in python. Thinking neuron, 2021.
[44]. Tekhnologiya raspoznavaniya lic: princip raboty i aktual'nost' [Face recognition technology: principle of operation and relevance]. [Electronic resource], 2020.
[45]. Upskaya O. Research and software implementation of face recognition algorithms, 2018. Available online: https://knastu.ru/media/files/page_files/page_391/magistr_referat/Avtoreferat._Upskaya_O.K._6VSm-1.pdf
Issue output
Name of the periodical printed publication Scientific and technical journal "Bulletin of the Almaty University of Power Engineering and Telecommunications"
Owner of the periodical printed publication Non-profit joint-stock company "Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev", Almaty, Kazakhstan
Chief Editor Professor, candidate of technical sciences Stoyak V.V.
Number and date of the registration certificate and the name of the issuing authority
№ KZ14VPY00024997 from 17.07.2020
Ministry of Information and Social Development of the Republic of Kazakhstan
Periodicity 4 times a year (quarterly)
Serial number and date of publication of a periodical printed publication
Number 60, edition 1, March 31, 2023
Subscription index 74108
Circulation of the issue 200 copies
Price Negotiable
The name of the printing house, its address Printing house of Non-profit joint-stock company "Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev", 126/1 Baitursynuly str., office A 120, Almaty, Republic of Kazakhstan
Editorial office address 050013, Almaty, Non-profit joint-stock company "Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev", 126/1 Baitursynuly str., office A 224, tel.: 8 (727) 292 58 48, 708 880 77 99, e-mail: [email protected]