Full Length Article
A comparison study between XR interfaces for driver assistance in take over request
Abhishek Mukhopadhyay a,*, Vinay Krishna Sharma a, Prashant Gaikwad Tatyarao a, Aumkar Kishore Shah b, Ananthram M C Rao a, P Raj Subin a, Pradipta Biswas a
a Centre for Product Design and Manufacturing, Indian Institute of Science, Bangalore, India
b AIIMS, New Delhi, India
* Corresponding author. E-mail address: [email protected] (A. Mukhopadhyay).
ARTICLE INFO
Keywords: ISO 9241 pointing task; Lane detection; Automatic lane navigation; Motion path planning; Mixed / Augmented reality; Human factors
ABSTRACT
Extended Reality (XR) is the umbrella term for Augmented, Virtual, and Mixed Reality interfaces. XR interfaces also enable natural and intuitive interaction with secondary driving tasks such as maps, music, and calls without the need to take the eyes off the road. This work evaluates the ISO 9241 pointing task in XR interfaces and analyzes the ease of interaction and the physical and mental effort required in augmented and mixed reality interfaces. With fully automated vehicles still some years of research away from becoming an everyday reality, the driver of a semi-automated vehicle must be prepared to intervene upon a Take Over Request. In such cases, human drivers may not be required to have full manual control of the vehicle throughout its driving operation, but must interfere as and when required and passively supervise at other times. In this paper, we evaluate the impact of using XR interfaces to assist drivers with take over requests and during the first second of controlling the vehicle. A prototype of a simulated semi-autonomous driving assistance system was developed with similar interfaces in AR and MR. User studies were performed for a comparative analysis of mixed reality with augmented reality displays/interfaces based on response time to take over requests. In both the ISO 9241 pointing task and the automotive task, the AR interface took significantly less time than the MR interface in terms of task performance. Participants also reported a significantly lower requirement of mental and physical effort when using the screen-based AR interface than the HoloLens-based MR interface.
1. Introduction
According to SAE International [1], 'conditionally automated driving' requires complete engagement between the vehicle and the driver. The driver always needs to reclaim control of the vehicle when it reaches its operational limits [2]. Delays in drivers taking over control are often cited as the primary cause of accidents in semi-autonomous vehicles [3]. Implementation of XR based driver assistance interfaces within such vehicles can be explored to reduce such hazards. However, such interfaces must be designed with care to maximize user performance and ensure minimal cognitive workload. An interaction typically involves pointing and selection tasks, and such tasks require visual search and cognitive processing from the user. Measuring user performance in these tasks is important for comparing the ease of use of different modalities. Semi-autonomous driving can be controlled in two ways: (I) collocated driving, where the driver is inside the vehicle and can be engaged with the driving scenario; (II) remotely piloted driving, where the driver controls the vehicle from a remote location.
In the case of collocated driving, drivers can use the cockpit view for situational awareness but still need help from an ADAS (Advanced Driver Assistance System) due to fatigue, engagement with secondary tasks, foggy weather, low light situations, and many other challenging environments. In remote cases, operators (driver or pilot) depend completely on machine vision for situational awareness. Research on ADAS has tried to improve the accuracy of machine vision algorithms as well as alert systems for drivers to reduce latency in responding to a Take Over Request (TOR).
Recent vehicles have explored traditional 2D displays as well as 3D displays (e.g., the Peugeot 308 RCZ) and augmented reality-based Head Up Displays (HUDs) as part of ADAS. AR and MR interfaces offer the possibility of improving information representation and its availability to the driver in the real vehicle environment and could be potential modalities for facilitating efficient driver interaction with ADAS and supporting its implementation.
As a first step towards exploring the suitability of AR and MR
interfaces for semi-automated driving control tasks, this paper conducted a pilot study to compare user performance within AR and MR interfaces using the ISO 9241 pointing and selection task based on Fitts' law. Users' movement times for target selection, along with hand and palm movement amplitudes, were tracked as quantitative output parameters. Cognitive workload was estimated subjectively using the NASA TLX questionnaire along with the measurement of physiological parameters such as EEG based brain activity and eye gaze patterns for both modalities. We found significantly faster movement times for target selection within AR interfaces along with reduced cognitive workload compared to MR interfaces. Next, a laboratory experiment was set up to compare operators' performance in terms of latency in TOR for augmented and mixed reality based ADAS. Initially, we introduced a state-of-the-art lane detection model for unstructured road conditions.
The lane detection model was trained and evaluated on an Indian road dataset where lane markings are often less legible than on western roads.
We integrated this lane detection algorithm with a mobile robot and developed 2D (augmented reality-based) and 3D (mixed reality-based) ADAS using the AR Foundation package and Microsoft HoloLens. In this study, it was found that users could take over control statistically significantly faster with the AR interface than with the MR interface, and that initiating take over control took longer for lane changes than in response to other traffic participants. Results from the study will be useful for developing ADAS for traditional semi-autonomous cars as well as for automatic taxiing of aircraft. In summary, this paper's main contributions are as follows:
• Proposed a detailed study to compare the ISO 9241 pointing task in augmented reality and mixed reality environments. We also compared the difficulty in terms of cognitive load using EEG data and novel algorithms based on ocular parameters.
• Reported a novel lane detection algorithm followed by a lane navigation algorithm, and evaluated them in different lighting conditions.
• Proposed a way of measuring response time to a Take Over Request by fusing real and virtual driving scenarios in both augmented reality and mixed reality.
This paper is organized as follows. We discuss related work in Section 2, followed by descriptions of the lane detection, lane navigation, and steer control algorithms in Section 3. We describe the pilot study using the ISO 9241 pointing task [4] and the automotive study in Section 4, followed by a general discussion and conclusion in Section 5.
2. Related work
2.1. Take over request in semi-autonomous vehicles
Researchers have shown that there are various challenges in handling take over requests while keeping the autonomous vehicle safe. Casner et al. [5] proposed driving as a task shared between humans and the vehicle with a transparent interface of interaction. McCall et al. [3] proposed a taxonomy of scheduled and non-scheduled driving situations. They also pointed out different challenges such as drivers' skills, situational awareness, and legal responsibilities during takeover transitions. Responding to a take over request is one of the critical parts of the complete take over process. Van der Heiden et al. [6] used different audio signals to prepare drivers to take over. They found that drivers responded quickest to takeovers when signals had an increasing pulse (used to convey urgency).
Mok et al. [7] reported how drivers performed in an emergency take over request with an impending road hazard. Kuehn et al. [8] conducted an empirical study to understand how quickly drivers respond to a take over request while performing a non-driving related task. They found that distracted drivers were delayed by 5 s in becoming situationally aware of the traffic situation.
2.2. Automatic navigation system
The literature survey also focused on automatic control of aircraft during the taxiing phase. Various controllers have been proposed for controlling aircraft on proposed taxiways. Zajdel et al. [9] suggested a fuzzy logic method combined with a PID controller considering factors such as the aircraft, the nature of its movement, available controls, economic reasons, future certification challenges, and the innovation potential. Zammit et al. [10] proposed a modification of the pure pursuit algorithm to reduce its known tendency to cut corners when maneuvering along a pre-described path involving a turn.
In this approach, the look-ahead distance of the pure pursuit algorithm was controlled using a fuzzy logic controller. The performance still depends on aircraft speed, and hence the study suggests that lower speeds can facilitate accurate tracking during demanding maneuvers. Krawczyk et al. [11] mathematically modelled the control system and the aircraft to allow control law synthesis for a remotely piloted aircraft system during its taxiing phase. The simulations were carried out in the MATLAB environment and critical speeds for different weather conditions were found. Re [12] presented a method to obtain automatic inverse models of an on-ground aircraft for use in a ground controller based on feedback linearization. As an example, a simple one-track aircraft model was written in an object-oriented modeling language, then automatically inverted and used as the inner loop of the control system, while the outer loop consisted of linear PID controllers.
2.3. Interfaces in AR and MR for smooth take-over requests
Among many factors, warning and alert systems in automotive interfaces play an important role in take-over requests (TOR). They ensure a timely and smooth transition from autonomous to manual driving.
Usually, three types of displays are used: visual, auditory, and tactile.
Visual displays, though a straightforward means of communicating information, compete with the visual resources needed for driving and can be overlooked. Auditory displays easily convey spatial information through verbal or nonverbal signals. Perceiving auditory cues is eyes-free, but they can easily be masked by background noise. Vibrotactile display signals are limited to the seat, pedal, steering wheel, and seat belt.
Studies have shown that using a combination of these unimodal displays for conveying take-over requests generally leads to quicker responses and safer driving performance [13]. The use of multimodal augmented reality (AR) interfaces can enhance the performance of visual displays in take-over requests by detecting regions of interest in drivers' views and improving driver situation awareness with superimposed spatial information [14]. AR displays have been found to improve the efficiency of driving takeover maneuvers, enhance driver comfort, and increase the anticipation accuracy of lane-changing maneuvers [15]. Augmented reality helps in simulating multiple traffic scenarios and designing futuristic pedestrian-vehicle interactions for ensuring safety and improving trust in autonomous vehicles [16]. Researchers have reinforced the importance of AR displays in communicating uncertain information in automated driving by exploring various color and animation based visual variables for conveying urgency [17]. Researchers have proposed affordable vehicle to vehicle communication by integrating AR using a smartphone camera and computer vision for connecting drivers and providing insights [18]. Researchers have also explored the use of augmented reality and gamification by blending digital components with physical components in a driving environment for improving safe driving practices and player engagement [19]. A review paper by Riegler et al. [20] shows a growing trend in the application of AR technology to driving assistance, safety aspects, and many other areas. Schroeter et al. [21] proposed a methodological tool combining VR and real world videos for rapid prototyping of automotive interfaces. Riegler et al. [22] proposed a VR based automated driving simulator, combining software and hardware solutions to understand trust, acceptance, and other parameters in automation.
To summarize, existing works focused on developing user interfaces and/or creating an extended reality environment for experimenting with safety aspects, driving assistance, or non-driving related tasks. Existing works did not focus much on modalities of interaction and their effect on sudden take over requests in the real world. Summarizing lane navigation works, we found that researchers assumed that (I) the map of the environment is readily available; (II) the lane does not split into multiple lanes;
(III) the case of unexpected objects appearing in the followed lane is not considered. In this study, we did not need to assume that the map of the environment is already known, because our control algorithm is based on lane detection using an RGB camera, which detects the lanes present in front of the camera and accordingly applies control to the vehicle.
The control algorithm also stops the vehicle if the lane splits into multiple lanes. In the user study, we considered cases of objects appearing in the followed lane as well as compared various modalities of user input. We report a detailed pilot study comparing how augmented and mixed reality differ in terms of reaction time and how the modality affects cognitive load.
3. Methodologies
This section describes the different algorithms used to make the TurtleBot behave as a semi-autonomous vehicle and to communicate with the Augmented Reality and Mixed Reality setups. We developed a lane detection algorithm which served as the basis for navigation. We then developed a lane navigation algorithm to make sure that the vehicle always follows the ego lane.
We commanded the vehicle using an Autonomous Steer Control algorithm. Finally, we developed a common interface, used in both AR and MR, to control the vehicle during take over requests. The algorithms are described in the following sections.
3.1. Lane detection algorithm
We proposed a novel hybrid Convolutional Neural Network (CNN) architecture by fusing an encoder-decoder architecture with a dilated convolution mechanism. We computed a weighted average of the outputs of these branches and introduced a new loss function to improve lane detection performance. We also proposed an Indian Lane Dataset (ILD) with 6157 labelled images from the India Driving Dataset (IDD) to match the criteria of unstructured road scenarios. We reported an IoU (Intersection over Union) of 0.26 on ILD, which was higher than two state-of-the-art lane detection models. Finally, we undertook an ablation study to understand the effectiveness of the dilated convolution branch and the proposed loss function. We have described the working principle of the algorithm in Fig. 1. The proposed model was tested and found to work in different lighting and challenging conditions on Indian roads; details of the work are reported in [23].
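For readers unfamiliar with the metric, the sketch below shows how pixel-wise IoU between a predicted and a ground-truth lane mask can be computed. It is a minimal illustration assuming binary H x W masks; it is not the exact evaluation code used for the model in [23].

```python
import numpy as np

def lane_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Pixel-wise Intersection over Union between two binary lane masks (H x W)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(intersection) / float(union)

# Toy example: predicted and ground-truth masks share 1 of 3 lane pixels -> IoU = 1/3.
pred = np.zeros((4, 4), dtype=np.uint8); pred[1, 1:3] = 1
gt = np.zeros((4, 4), dtype=np.uint8); gt[1, 2:4] = 1
print(lane_iou(pred, gt))  # 0.333...
```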
3.2. Lane navigation algorithm
The extracted points from the lane detection model were passed on to the Control Generation algorithm to generate the input for the mobile agent
to navigate autonomously. The algorithm takes in the image mask containing just the lanes (a set of points) detected by the lane detection algorithm. The image was split into two parts from the middle. OpenCV in Python was used to get the image coordinates of the lanes from the split images. The right side of the image was used to find the image coordinates of the right lane and the left side was used for the left lane image coordinates. Once the image coordinates were found, a quadratic curve was fitted through those coordinates, and hence the equations of the left and right lanes were obtained. Using these equations, the coordinates of the middle lane were calculated.
'Steer-Bias' is the distance in the x direction between the center of the camera and the middle lane. If the middle lane is towards the left of the camera center, the error is considered positive; otherwise it is negative. We have explained the lane navigation algorithm in Algorithm 1.
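A compact sketch of the steps just described (splitting the mask, fitting quadratic curves to each half, deriving the middle lane, and computing the Steer-Bias) is given below. The function names, the polynomial parameterization x = f(y), and the row at which the lanes are evaluated are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def compute_steer_bias(lane_mask: np.ndarray) -> float:
    """Estimate Steer-Bias from a binary lane mask (H x W).

    The mask is split down the middle, a quadratic x = f(y) is fitted to each
    half, the middle lane is the average of the two fits, and Steer-Bias is the
    signed x-distance between the camera centre column and the middle lane,
    evaluated near the bottom of the image.
    """
    h, w = lane_mask.shape
    mid = w // 2
    camera_centre_x = w / 2.0

    def fit_lane(half_mask, x_offset):
        ys, xs = np.nonzero(half_mask)
        if len(xs) < 3:
            return None
        return np.polyfit(ys, xs + x_offset, 2)   # x = a*y^2 + b*y + c

    left = fit_lane(lane_mask[:, :mid], 0)
    right = fit_lane(lane_mask[:, mid:], mid)
    if left is None or right is None:
        return 0.0   # no reliable lane points: apply no correction

    y_eval = h - 1
    middle_x = (np.polyval(left, y_eval) + np.polyval(right, y_eval)) / 2.0

    # Positive when the middle lane lies to the left of the camera centre.
    return camera_centre_x - middle_x
```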
3.3. Autonomous steer control algorithm
The mobile agent uses a Proportional and Derivative (PD) controller to navigate along the lane autonomously. The output of the Lane Navigation algorithm serves as input for the PD controller. The mobile agent has two control inputs: linear/forward velocity and angular/steering velocity. The forward velocity of the mobile agent is fixed throughout execution. The steering input required to move along the curved lane is generated by the PD controller based on the Steer-Bias. If the Steer-Bias is positive, the mobile agent has drifted rightward from the path and the angular velocity needs to be positive to stay on the path; the reverse holds for a negative Steer-Bias. The proportional and derivative control parameters of the PD controller, Kp and Kd, were set to 0.005 and 0.01 respectively after an empirical study. In the manual mode of control, the control inputs were 0.2 m/s and 0.4 rad/s for the forward and steering velocities, respectively. The mobile agent used in the experiment was a TurtleBot3 Burger. The pseudocode for the navigation of the mobile agent is given in Algorithm 2.
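The control law in Algorithm 2 can be summarized in a few lines of Python. The sketch below uses the reported gains (Kp = 0.005, Kd = 0.01) and the fixed 0.2 m/s forward velocity; the class structure and the returned command tuple are illustrative, not the actual TurtleBot3/ROS interface.

```python
class PDSteerController:
    """PD steering from Steer-Bias with the gains reported in Section 3.3."""

    def __init__(self, kp=0.005, kd=0.01, forward_velocity=0.2):
        self.kp, self.kd = kp, kd
        self.forward_velocity = forward_velocity   # m/s, fixed in autonomous mode
        self.previous_bias = 0.0

    def step(self, steer_bias: float):
        """Return (linear, angular) velocity commands for one control cycle."""
        angular = self.kp * steer_bias + self.kd * (steer_bias - self.previous_bias)
        self.previous_bias = steer_bias
        return self.forward_velocity, angular

# Example: a Steer-Bias of +40 px on the first cycle
controller = PDSteerController()
print(controller.step(40.0))   # (0.2, 0.6) -> positive angular velocity steers back to the lane
```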
4. User studies
We conducted two user studies as follows:
1 A pilot study to compare and analyze pointing and selection tasks in both AR and MR.
2 A study in an automotive environment to measure participants' response time to take over requests in a semi-autonomous vehicle.
4.1. Pilot study
This study aims to compare and analyze pointing and selection in three-dimensional (3D) space on augmented reality (AR) and mixed reality (MR) interfaces. This paper extends the standard ISO 9241 pointing
Fig. 1. Block diagram of the lane detection model: a hybrid network combining encoder-decoder and dilated convolution blocks.
Algorithm 1
Lane navigation algorithm.
Input: laneMask
Output: Steer-Bias
while camera_feed is TRUE, do
    leftMask = getLeftMask(laneMask)
    rightMask = getRightMask(laneMask)
    leftLaneCoordinates = getCoordinates(leftMask)
    rightLaneCoordinates = getCoordinates(rightMask)
    leftLaneEquation = fitCurve(leftLaneCoordinates)
    rightLaneEquation = fitCurve(rightLaneCoordinates)
    middleLaneCoordinates = getMiddleLane(leftLaneEquation, rightLaneEquation)
    cameraCentre = getCameraCentre()
    Steer-Bias = findDistance(cameraCentre, middleLaneCoordinates)
end
task for 3D pointing and selection for three sets of size, distance, and depth of targets from a fixed center. Pointing is done by moving the hand towards the target using the user interface and selection is done by interacting (clicking, touching) with the index finger. For both environments the hand movements (up to the elbow), eye-gaze patterns, and brain activity (EEG) of the participants were tracked using state-of-the-art tools and software. The following subsections explain the study design, procedure, and results for each of the interfaces in detail.
4.1.1. Participants
We recruited 14 participants (all males) with an average age of 25.9 years (std: 3.2) from our university for the study in both environments.
Before starting the actual task, participants were first asked to take trials in both environments. We took all necessary permissions and consent from all participants before undertaking the trials.
4.1.2. Materials
We used the Unity game engine to design the interface for the task. The interface was then presented to participants in both AR and MR environments. For the AR environment the Unity app was run on a 24-inch touchscreen ViewSonic TD 2455 sRGB monitor at 1920 × 1080 resolution, while for the MR environment we used a Microsoft HoloLens 2 and the Unity app was run inside the HoloLens 2. It has 2K resolution per eye and 47 pixels per degree of viewing angle. We used a Microsoft LifeCam USB camera to render the real background. The live-view camera was used to keep the look and feel of the task environment uniform in both environments. Over the live real-time video from the camera, the center red ball and the subsequent target balls were rendered as Unity sphere gameObjects. To accurately track and record hand movements of the participants we used the OptiTrack Motion Capture software Motive 1.10.1 along with four Flex 13 cameras [24]. This motion capture system tracks rigid bodies defined by retroreflective infra-red (IR) markers (Fig. 2). To record eye-gaze patterns of the participants while performing the ISO 9241 pointing task [4] using the AR interface, we used a Tobii Pro X3 120 eye-tracker with an advanced display setup. We used an Emotiv Epoc Flex EEG tracker [25] for estimating cognitive load. It is a high density, 32-channel saline sensor wireless brain wear device.
4.1.3. Design
The task comprised a fixed central sphere (ball) of red color and a target sphere of green color. The target ball appears on the periphery of an imaginary (invisible) sphere randomly chosen from a set of concentric spheres centered at the central red ball. Once the center sphere is interacted with, it vanishes, and a green target sphere of a specific size appears on the interface at the periphery of the randomly selected imaginary sphere and at random angles (inclination and azimuth) from the center red ball. After successful selection of the target ball, it vanishes, and the red central ball appears again, marking the start of the next iteration. There are a total of 30 iterations (center red ball to target green ball selections) to be completed for a successful session with each interface. For each iteration, the size of the target ball and its distance from the center ball are randomly selected from a predefined set of values:
• Size of target balls = [1.5, 2.0, 2.5] cm; distance of target balls from the center ball = [5, 10, 15] cm
The Index of Difficulty (ID) for the different target sizes and distances is calculated and reported in Table 1. The details of hand movement tracking, brain activity tracking, and eye-gaze tracking are given in the following paragraphs.
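The ID values in Table 1 follow the Shannon formulation of Fitts' law, ID = log2(D/W + 1). The short check below reproduces the table from the size and distance sets listed above, assuming that formulation.

```python
import math

sizes = [1.5, 2.0, 2.5]     # target width W, cm
distances = [5, 10, 15]     # distance D from the centre ball, cm

for w in sizes:
    for d in distances:
        index_of_difficulty = math.log2(d / w + 1)   # Shannon formulation of Fitts' ID
        print(f"W = {w} cm, D = {d} cm -> ID = {index_of_difficulty:.2f}")
# W = 1.5 cm, D = 5 cm -> ID = 2.12, ... matching Table 1
```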
Hand movement tracking. Pointing and selection are made by hand for the ISO 9241 pointing task in both interfaces. The task environment is kept uniform to study any differences in interaction based on modality. One aspect of measuring this difference in interaction is recording and tracking the hand movements throughout the task. The motion capture system was calibrated before every session to ensure accurate tracking.
Algorithm 2
Autonomous steer control algorithm.
Input: Steer-Bias
Output: Set Control Inputs
Initialize Error and Previous-Bias at 0
while process is not terminated, do
    if mode is Autonomous
        Angular Velocity = Kp * (Steer-Bias) + Kd * (Steer-Bias - Previous-Bias)
        Previous-Bias = Steer-Bias
        Move agent with Forward Velocity (fixed) and Angular Velocity
        Set Inputs(Forward Velocity, Angular Velocity)
    if mode is Manual
        if Forward is pressed
            Set Inputs -> Move forward with positive forward velocity
        if Backward is pressed
            Set Inputs -> Move backward with negative forward velocity
        if Left is pressed
            Set Inputs -> Rotate left with positive steering velocity
        if Right is pressed
            Set Inputs -> Rotate right with negative steering velocity
end
Fig. 2. Setup of the OptiTrack system for hand movement tracking.
Brain activity tracking (Electroencephalogram, EEG). EEG-based measures of cognitive load have been developed in many studies. It was found that informative indices concerning task engagement and cognitive workload were obtained from the ratio between the theta power (4–8 Hz) and the alpha power (8–12 Hz), the ratio between the beta power (12–30 Hz) and the alpha power, and several related combinations [26].
To study the brain or neural activity while performing the ISO 9241 pointing task in the different interfaces, we first needed to record brain signals and then map them to certain performance metrics. The thirty-two electrodes of the EEG tracker were placed in the default montage configuration. All electrodes were soaked in saline water for at least 30 min before being placed on the participant's head. Enough time was given to stabilize the contact and EEG quality. Due care was taken to make the participant feel comfortable and calm throughout the experiment.
The raw EEG data for a session of a particular setting (AR or MR) of the experiment give the raw EEG values for each electrode and the average band power (averaged over 16 samples) for each band, sampled through a Hanning window [27] before the Fast Fourier Transform. The median of each frequency band power for every electrode (EEG channel) for a single participant and a particular experiment setting is recorded separately from the raw EEG data. The same is repeated for all participants and all experiment settings to generate separate data files for each experiment setting. Each experiment setting data file comprises five sheets corresponding to the five frequency power bands. The rows of each of these frequency band sheets are the participants for that experiment setting and the columns are the EEG electrodes (channels, 32 in our case), with the cell values representing the median corresponding to that electrode and participant for that experiment setting. These experiment setting-based data files are then analyzed to find any significant differences between the values observed for each electrode and experiment setting for each frequency band. The theta/alpha ratio (or Task Load Index, TLI) is also used to assess workload, based on the assumption that an increase in mental load is associated with a decrease in alpha power and an increase in theta power [28,29], while an increased level of fatigue is related to an increase in alpha and theta powers [30,31]. The ratio of the mean frontal midline theta energy to the mean parietal alpha energy is defined as the Task Load Index (TLI). The electrodes are highlighted in the topology plot (Fig. 4) with two different colors, where dark orange circles indicate frontal theta electrodes and red circles indicate parietal alpha electrodes. TLI has been shown to increase over time during different cognitive tasks. We also visualized a topology map/plot to show the difference in power bands between AR and MR. The pseudo code of the TLI analysis algorithm is shown in Algorithm 3.
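A minimal sketch of the final TLI step is shown below, once per-channel median theta and alpha band powers have been obtained as in Algorithm 3. The electrode labels used for the frontal-midline and parietal groups are hypothetical placeholders; the actual selection follows the highlighted electrodes in Fig. 4.

```python
import numpy as np

# Hypothetical channel labels for the two electrode groups; the actual
# selection follows the highlighted electrodes in Fig. 4.
FRONTAL_MIDLINE = ["Fz", "FCz", "Cz"]
PARIETAL = ["Pz", "P3", "P4"]

def task_load_index(theta_power: dict, alpha_power: dict) -> float:
    """TLI = mean frontal-midline theta power / mean parietal alpha power.

    Each argument maps channel label -> median band power over the task
    duration, as produced from the raw EEG per Algorithm 3.
    """
    theta = np.mean([theta_power[ch] for ch in FRONTAL_MIDLINE])
    alpha = np.mean([alpha_power[ch] for ch in PARIETAL])
    return float(theta / alpha)

# Toy example with made-up band powers (arbitrary units)
theta = {"Fz": 4.1, "FCz": 3.8, "Cz": 4.4}
alpha = {"Pz": 2.0, "P3": 2.4, "P4": 2.2}
print(task_load_index(theta, alpha))   # ~1.86; higher values suggest higher mental load
```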
Eye gaze tracking. The Tobii Pro Eye Tracking Manager provides functionality for an advanced display setup for the Pro X3 120 eye tracker.
Based on multiple trials, the eye-tracker was mounted on top of the screen so that the hand could not occlude the participant's eyes and seamless tracking could be achieved. The screen was mounted at a
height of 120 cm from the ground, inclined at 23°, to provide suitable interaction for most of the participants. The eye-tracker was calibrated for every user's gaze and head position before the trial using an 8-point calibration routine in the eye-tracking manager software. Fig. 3(a) shows the complete augmented reality interface and experiment setup.
HoloLens 2 has eye tracking and hand tracking capability. The MR environment was made with the help of the Mixed Reality Toolkit (MRTK) and the Unity game engine. The MRTK hand tracking and eye tracking profiles allow the user to interact with holograms. Eye gaze position was measured using a standard script from MRTK. Eye gaze data can be recorded at approximately 30 Hz. Fig. 3(b) shows the mixed reality view of the ISO 9241 pointing task.
We calculated fixation and saccade rates by detecting fixations and saccades from the gaze direction using the velocity threshold fixation identification method (I-VT) [32]. I-VT is a velocity-based method that separates fixation and saccade points based on their point-to-point velocities. I-VT classifies each point as a fixation or saccade using a simple velocity threshold: if the point's velocity is below the threshold, it becomes a fixation point, otherwise it becomes a saccade point. We then calculated the fixation and saccade rates as the number of fixations and saccades per second [33]. We calculated velocity in terms of visual angle, i.e., degrees per second, to render gaze velocity independent of the image and screen resolutions. This calculation is based on the relationship between the eye position in 3D space relative to the stimuli plane and the gaze positions on the stimuli plane. The angle is calculated by taking the direction vectors of two consecutive sample gaze points. The angle is then divided by the time between the two samples to get the angular velocity.
The velocity threshold parameter is set to 30°/s [34].
Once the above classification is complete, a clustering algorithm is applied to the data. Successive fixations are grouped as a single fixation and successive saccades are grouped as a single saccade. This results in a set of alternating fixations and saccades, with every such fixation and saccade having an associated total duration. The total numbers of fixations and saccades are then counted. The task duration is given by the sum of the durations associated with each fixation and saccade. The fixation rate is calculated as the ratio of the total number of fixations to the task duration. The average fixation duration is obtained by dividing the sum of individual fixation durations by the total number of fixations. For every saccade, the saccade velocity is recalculated using the previously described rate of change of visual angle from consecutive direction vectors or gaze points. The average saccade velocity is obtained by dividing the sum of individual saccade velocities by the total number of saccades. The pseudo code of the I-VT algorithm is shown in Algorithm 4.
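The sketch below implements the I-VT classification and the derived metrics (fixation rate, average fixation duration, average saccade velocity) with the 30°/s threshold. The assumed input, timestamps plus the visual angle between consecutive samples, is an illustrative simplification of the gaze data format.

```python
import numpy as np

VELOCITY_THRESHOLD = 30.0   # degrees per second

def ivt_metrics(timestamps, gaze_angles_deg):
    """I-VT style summary from consecutive-sample visual angles.

    timestamps: sample times in seconds (length n);
    gaze_angles_deg[i]: visual angle between samples i and i+1 (length n-1).
    """
    timestamps = np.asarray(timestamps, dtype=float)
    velocities = np.asarray(gaze_angles_deg, dtype=float) / np.diff(timestamps)
    is_fixation = velocities < VELOCITY_THRESHOLD

    # Group consecutive samples with the same label into fixation/saccade events.
    events = []   # (is_fixation, duration, mean_velocity)
    start = 0
    for i in range(1, len(is_fixation) + 1):
        if i == len(is_fixation) or is_fixation[i] != is_fixation[start]:
            duration = timestamps[i] - timestamps[start]
            events.append((is_fixation[start], duration, velocities[start:i].mean()))
            start = i

    task_duration = sum(d for _, d, _ in events)
    fixations = [e for e in events if e[0]]
    saccades = [e for e in events if not e[0]]
    return {
        "fixation_rate": len(fixations) / task_duration,
        "avg_fixation_duration": float(np.mean([d for _, d, _ in fixations])) if fixations else 0.0,
        "avg_saccade_velocity": float(np.mean([v for _, _, v in saccades])) if saccades else 0.0,
    }
```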
4.1.4. Procedure
Participants were first given an explanation of the task and then a walkthrough of both environments. After the trials they were asked to undertake the task. The eye tracker was calibrated for each user's
gaze and head position using an 8-point calibration routine.
Table 1
Index of difficulty for different parameters of ISO 9241 pointing task.
Size W (cm)   Distance D (cm)   ID = log2(D/W + 1)
1.5           5                 2.12
1.5           10                2.94
1.5           15                3.46
2             5                 1.81
2             10                2.58
2             15                3.09
2.5           5                 1.58
2.5           10                2.32
2.5           15                2.81
Algorithm 3
Algorithm for calculating TLI from EEG Data.
Input: Raw EEG values for 32 channels
Output: TLI
for each electrode do
    for each frequency band do
        Sample raw EEG values through a Hanning window
        Average band power over 16 samples
        Measure the median of the average band power for the complete task duration
Repeat the electrode/band medians for all participants
Obtain a consolidated data table for the five power bands (participant-wise median values of 32 EEG channels)
TLI = (mean frontal midline theta energy) / (mean parietal alpha energy)
Return TLI
end
After each participant completed his/her trial, subjective feedback was collected using NASA TLX for cognitive load.
4.1.5. Results
We analyzed fixation and saccade rates from the ocular parameters collected during the task. We further plotted ID against Movement Time (MT) for all participants in the AR and MR interfaces. Fig. 5 shows the plot
of ID against MT across all participants for both environments. A pairwise t-test (p < 0.05) showed a significant difference between the two modalities. All participants took significantly less time in AR than in MR for the same index of difficulty.
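Since MT is plotted against ID, each modality can also be summarized by a least-squares fit of the Fitts' law relation MT = a + b·ID, with a throughput estimate 1/b. The data arrays below are placeholders; the real values come from the logged trials.

```python
import numpy as np

def fitts_fit(index_of_difficulty, movement_time):
    """Least-squares fit of MT = a + b * ID; returns (a, b, throughput 1/b)."""
    b, a = np.polyfit(index_of_difficulty, movement_time, 1)
    return a, b, 1.0 / b

# Placeholder per-condition mean movement times in seconds (one value per ID in Table 1).
ids = np.array([1.58, 1.81, 2.12, 2.32, 2.58, 2.81, 2.94, 3.09, 3.46])
mt_ar = np.array([0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8])
a, b, throughput = fitts_fit(ids, mt_ar)
print(f"MT = {a:.2f} + {b:.2f} * ID  (throughput ~ {throughput:.2f} bits/s)")
```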
The users interacted with both interfaces using their hands (index finger).
A pairwise t-test (p < 0.05) showed a significant difference between the gross forearm and hand movements of all participants in the two modalities.
Fig. 6 shows less forearm and hand movement in AR compared to MR. In the ocular parameter analysis, we found a significantly higher fixation rate in AR than MR (t = 7.70, p < 0.05, Cohen's d = 2.79), while the average fixation duration (t = −9.93, p < 0.05, Cohen's d = 3.18 with observed power of 0.05) and average saccade velocity (t = −12.41, p < 0.05, Cohen's d = 5.18 with observed power of 0.05) were significantly higher in MR (Fig. 7). Table 2 shows the comparison of the ocular parameters across both environments. We also found that participants took more time to select the target in the MR environment than in the AR environment for similar difficulty levels. A paired t-test for the TLI analysis (Fig. 8) showed increased cognitive workload in the MR task with respect to the AR task (t = −2.27, p < 0.05, Cohen's d = 0.51). Fig. 10 depicts topology plots of the difference in the average median band power of participants for each electrode between AR and MR. We used the median values for the analysis as no data preprocessing was done before this analysis.
The NASA TLX score showed increased cognitive workload (t = −2.78, p < 0.05, Cohen's d = 0.75) for the MR interface with respect to the AR interface. Fig. 9 shows the different parameters as well as the overall NASA TLX score for both modalities. We summarize all parameters and their relation to the two modalities in Table 3.
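The reported statistics can be reproduced with a small helper such as the one below, using scipy's paired t-test; Cohen's d is computed here on the pairwise differences, which may differ slightly from the exact formulation used in the paper. The TLX scores shown are random placeholders.

```python
import numpy as np
from scipy import stats

def paired_comparison(ar_scores, mr_scores):
    """Paired t-test plus Cohen's d computed on the pairwise differences."""
    ar = np.asarray(ar_scores, dtype=float)
    mr = np.asarray(mr_scores, dtype=float)
    t, p = stats.ttest_rel(ar, mr)
    diff = ar - mr
    d = diff.mean() / diff.std(ddof=1)
    return t, p, d

# Placeholder NASA TLX scores for 14 participants (AR vs. MR); real data come from the study logs.
rng = np.random.default_rng(0)
ar_tlx = rng.normal(21, 5, 14)
mr_tlx = rng.normal(34, 8, 14)
print(paired_comparison(ar_tlx, mr_tlx))  # negative t and d indicate lower load in AR
```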
4.1.6. Discussion
This study compared two different modalities of interfaces with respect to the ISO 9241 pointing task.
Fig. 3. Setup of AR and MR environments for ISO 9241 pointing task.
Fig. 4. A topology plot with highlighted electrodes used for calculating Task Load Index.
Algorithm 4
Pseudocode for the I-VT algorithm.
1. Calculate the angle between two consecutive gaze directions.
2. Calculate the angular velocity by dividing the angle by the time between the two sample points.
3. Label each point below the velocity threshold as a fixation, otherwise as a saccade.
4. Return fixations and saccades.
5. Group successive fixations into one fixation and successive saccades into one saccade.
6. Calculate the total duration associated with every such fixation and saccade.
7. Count the total number of such fixations and saccades respectively.
8. Calculate the task duration as the sum of durations associated with each fixation and saccade.
9. Calculate the fixation rate as the ratio of the total number of fixations to the task duration.
10. Calculate the average fixation duration as the ratio of the sum of individual fixation durations to the total number of fixations.
11. For every such saccade, calculate the saccade velocity using steps 1-2.
12. Calculate the average saccade velocity as the ratio of the sum of individual saccade velocities to the total number of saccades.
We measured ID against Movement Time (MT) for both modalities. All participants took significantly less time in AR than in MR for the same index of difficulty. We also reported forearm and hand movements during the ISO 9241 pointing task. The results showed that significantly less movement was required in AR than in MR.
The cognitive load estimation using both ocular parameters and the EEG tracker shows that completing the task in AR demands less cognitive load than in MR. Using this index as a measure of cognitive workload has been likened to
"Brainbeat" by Holm et al. [35], just as the heartbeat indicates the status of the cardiovascular system.
Fig. 5. Movement time vs. index of difficulty for AR and MR.
Fig. 6. Forearm and hand movement amplitudes for AR and MR.
Fig. 7. Fixation rate and average saccade velocity for AR and MR.
Table 2
Comparison of ocular parameters between AR and MR.
Parameter                                Augmented Reality   Mixed Reality
Fixation Rate (/sec)                     5.04                1.28
Average Fixation Duration (secs)         0.16                0.79
Average Saccade Velocity (degree/sec)    55.05               214.49
In the present study, comparing the midline frontal theta and parietal alpha via the electrodes shows the increased cognitive workload of the MR task with respect to the AR task.
Since ocular parameters are task dependent and cognitive processing is a highly complex activity, there are contrasting results in the literature on the relationship between ocular parameters and cognitive load [36–39]. When cognitive load increases, a decrease in fixation rate and an increase in average fixation duration have been observed. A higher cognitive load also leads to higher saccade velocity (higher saccade amplitudes, lower durations) [39]. We classified all eye gaze movements as fixations and saccades. We noted that in AR there were a greater number of fixations with lower durations, while in MR there were fewer fixations with higher durations. The reasoning is that during demanding cognitive processes, users concentrate more on a target, leading to a lower number of fixations per second and an increased average fixation duration.
Since this is a pointing and selection task, users made fewer, longer fixations as they focused more attention and cognitive effort on selecting targets due to the MR interaction modality. The NASA TLX score showed that workload in MR was significantly higher than in AR.
4.2. Comparing different modalities of interfaces – an automotive study
The pilot study investigated differences in interaction with AR and MR. In this section, we describe a user study conducted to validate whether AR and MR have different effects on performance when reacting to a take over request generated by the vehicle.
4.2.1. Participants
Participants (7 male and 3 female) with an average age of 27.6 (std:
4.2) years from our university volunteered to engage in this study. We chose participants randomly such that the group had a mixture of people wearing and not wearing prescription lenses. The participants wearing lenses had spherical or cylindrical power, or both. We conducted a free trial for participants to get used to both interfaces. This also helped them to drive and keep the TurtleBot within the lane. We started
the actual trial after sufficiently training participants to make sure that they were used to the interfaces, especially the mixed reality environment.
We took all necessary permissions and consent from all participants before undertaking the trials.
Materials
This research aims to simulate and evaluate a level 3 autonomy vehicle with traffic participant interaction and take-over requests on a screen-based (Samsung Tab 6 Lite) augmented reality (AR) display and a head-mounted (HoloLens 2) mixed reality (MR) display. A TurtleBot3 doubled as the automated vehicle, with the proposed lane detection algorithm running on an NVIDIA GeForce RTX 2070 GPU. We fixed a Microsoft Livecam 1080D to get the live feed from the real environment.
A Logitech DrivePro V16 steering wheel with pedals was used to maneuver the vehicle. The augmented reality environment was designed in Unity 3D. The AR Foundation package provided the necessary AR development tools for the Android platform. Microsoft provides the Mixed Reality Toolkit (MRTK) for developing interactable spatial holograms for the HoloLens 2.
4.2.2. Design
The mixed-reality driving environment replicates a virtual city with real lane-marked roads and a traffic intersection. The environment was populated with randomly moving traffic participants (cars and humans).
The user interface and the driving environment were kept the same for both AR and MR displays. The user interface comprised a start button, a stop button, a continue button, and a switch modality button. Pressing the stop button halted the car irrespective of the mode (autonomous or manual); it worked as a kill switch to prevent any impending collision or lane deviation. Pressing the continue button resumed the car's motion.
Pressing the switch modality button changed the control back and forth between automatic and manual mode based on the take over requests. The communication from the user interface to the vehicle happened through a local server which transferred control signals between the interface and the vehicle. A separate screen showing the live lane detection view was provided alongside the AR or MR display to enhance the participant's situational awareness.
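The paper does not specify the protocol of the local server, so the sketch below only illustrates one plausible way a UI button press ('stop', 'continue', 'switch mode') could be relayed to the vehicle with a timestamp for response-time logging; the host, port, and JSON message format are assumptions.

```python
import json
import socket
import time

HOST, PORT = "127.0.0.1", 9999   # hypothetical local control server

def send_command(command: str) -> None:
    """Relay one UI command ('stop', 'continue', 'switch_mode') to the vehicle.

    The timestamp lets the vehicle-side logger compute the response time
    between a take over request and the participant's button press.
    """
    message = {"command": command, "timestamp": time.time()}
    with socket.create_connection((HOST, PORT), timeout=1.0) as sock:
        sock.sendall(json.dumps(message).encode("utf-8") + b"\n")

# Example: the stop button handler in the Unity/MRTK interface would trigger
# send_command("stop")
```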
The participants were instructed to maneuver the vehicle from a start point to a destination point while avoiding traffic participants and switching back and forth between autonomous and manual control at intersections. The display popped up a visual and auditory alert (take over request) when any traffic participant was observed intersecting the road or approaching the vehicle. Correspondingly, the system required the participants to stop the vehicle by pressing the stop button. A similar alert (take over request) popped up when the vehicle approached an intersection, requiring the participant to switch to manual mode and manually maneuver the vehicle into the desired lane before switching back to autonomous mode. The reaction time between the take over request and pressing the button was measured for each participant for two tasks with random destinations after one trial. We also recorded the total task completion time. To summarize, the study was designed as a 2 × 2 factorial where the independent variables were (I) modality (AR and MR) and (II) take over request condition (obstacle and lane intersection). We measured Response Time and Task Completion Time (TCT).
After each participant completed his/her trial, subjective feedback was collected using NASA TLX for cognitive load and the SUS questionnaire for subjective preference. The experimental design is illustrated in Fig. 11.
4.2.3. Results
Lane detection algorithm. We evaluated the proposed lane detection model on ILD and compared it with two other state-of-the-art models from the TuSimple leaderboard [40], i.e., RESA [41] and the Discriminative Loss Function based model [42]. We used IoU as the metric for 5 different classes of road environment (low light, shadow, curve, highway and normal). We trained and tested the three models on the ILD dataset.
We reported a detailed comparison in Table 4. We found that, irrespective of the road scenario, the proposed model achieved a 21% improvement over RESA [41]; however, its IoU was lower than RESA's by 1% in the low-light condition.
Fig. 8. Comparing task load index for AR and MR.
Fig. 9. NASA TLX score for pointing and selection task in AR and MR.
4.2.4. Comparison between different modalities of interaction
We undertook a repeated measures ANOVA for RT. We found a significant main effect of TOR condition on
• RT (Response Time): F(1, 18) = 32.45, p < 0.05, η² = 0.64 with observed power of 1.
We also found a tendency towards a significant difference for the interaction effect of TOR × Iteration at p = 0.056. Fig. 12 shows the interaction effects of the variables on RT, while Table 5 shows the means with lower and upper bounds of the standard deviation.
Fig. 10. Topology plots for each power band showing differences between AR and MR.
Table 3
Comparing trend between AR and MR.
Parameters Trend in Parameters
Fixation Rate ↑
Average Fixation Duration ↓
Average Saccade Velocity ↓
Task Load Index ↓
Hand Movement ↓
Forearm Movement ↑
NASA TLX Score ↓
↑: Higher value for AR;↓: Lower value for AR.
However, we did not find any other significant main or interaction effects. For TCT, we undertook a mixed ANOVA (Modality (2) × Iteration as between subject). We found a significant main effect of modality on
• TCT (Task Completion Time): F(1, 18) = 6.77, p < 0.05, η² = 0.28 with observed power of 0.69.
Fig. 12 shows the comparison of TCT for the different modalities. However, we did not find any significant interaction effects in the mixed ANOVAs. The average TLX scores for AR and MR are 20.77 and 34.31, respectively. A paired t-test on the TLX scores indicates that the perceived cognitive load in AR is significantly lower than in MR (t = −2.67, p < 0.05, Cohen's d = 0.81). The average SUS score for AR is higher than for MR, and a paired t-test on the SUS scores indicates that the subjective preference between AR and MR is significantly different (t = 3.02, p < 0.05, Cohen's d = 1.18). We show a comparison graph of the modalities for TLX and SUS in Fig. 13. We also show a comparison graph of the different TLX parameters for AR and MR in Fig. 14.
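A sketch of the repeated-measures ANOVA on response time is given below using statsmodels' AnovaRM, with modality and TOR condition treated as within-subject factors purely for illustration; the long-format data frame is a placeholder, and the mixed (between-subject) analysis of TCT and the observed-power calculation are not reproduced here.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Placeholder long-format data: one RT per participant x modality x TOR condition.
rng = np.random.default_rng(1)
rows = []
for subject in range(10):
    for modality in ["AR", "MR"]:
        for tor in ["obstacle", "intersection"]:
            base = 1.3 if tor == "obstacle" else 3.5
            penalty = 0.3 if modality == "MR" else 0.0
            rows.append({"subject": subject, "modality": modality, "tor": tor,
                         "rt": base + penalty + rng.normal(0, 0.4)})
data = pd.DataFrame(rows)

# Two-way repeated-measures ANOVA on response time.
result = AnovaRM(data, depvar="rt", subject="subject", within=["modality", "tor"]).fit()
print(result.anova_table)
```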
4.2.5. Discussion
This study compared two different modalities of interfaces for an operator undertaking take over requests in a semi-autonomous vehicle. We used a traditional 2D display as well as a 3D display as part of ADAS.
Fig. 11. Extended reality display with user interaction controls and the driving environment: (a) AR interface, (b) MR interface.
Table 4
Comparing performance of the different lane detection models on the ILD dataset using IoU score.
Model Name                        Low Light   Shadow   Curve   Highway   Normal   Average IoU
Our Proposed Model                0.04        0.228    0.364   0.437     0.236    0.261
RESA [41]                         0.05        0.192    0.269   0.332     0.228    0.214
Discriminative Loss Model [42]    0.046       0.062    0.082   0.117     0.063    0.072
Fig. 12. Effect of modalities and task iterations on TOR response time.
Table 5
Mean response time of modalities and task iterations.
Iteration   Modality   Condition      Mean    Lower Bound   Upper Bound
First       AR         Obstacle       1.282   0.982         1.582
First       AR         Intersection   3.152   0.840         5.464
First       MR         Obstacle       1.641   1.196         2.087
First       MR         Intersection   6.552   4.252         8.851
Second      AR         Obstacle       1.296   0.996         1.596
Second      AR         Intersection   2.572   0.260         4.885
Second      MR         Obstacle       1.328   0.883         1.774
Second      MR         Intersection   3.254   0.954         5.553
We measured response time, task completion time, and workload for both modalities. The 3D display (MR) did not increase response time compared to the traditional 2D display. However, response time was higher for the intersection condition than for obstacle detection. We posit that the need for a rapid response to avoid an accident helped reduce response time in the obstacle condition, while at intersections participants were in a dilemma about taking control of the vehicle. We observed that the 3D display took significantly more time than the 2D display in completing the task (TCT).
We also observed that when participants performed the task a second time with the 3D display, TCT was reduced. The NASA TLX score showed that the workload with the 3D display was significantly higher than with the 2D display, while the SUS score indicated that the 2D display was more acceptable to participants than the 3D display. This may be because the HoloLens was new to most participants. Although they went through a trial, its wearable form factor, along with its weight and heat, made the participants uncomfortable.
Compared to the 3D display, the 2D display was comfortable for participants, as all of them use touch screen-based devices in their day-to-day life. Our assumptions are supported by the TLX and SUS scores.
5. General discussion
This paper presents a comparative study of users' performance in assisting the driver through an ADAS. In particular, we compared ADAS in AR and MR environments to investigate the latency in take over requests. To realize this goal, we first developed a novel lane detection algorithm using a CNN for unstructured road conditions and then integrated this algorithm with a mobile robot. Apart from the lane detection algorithm, we also developed lane navigation and autonomous steer control algorithms to control and navigate the robot automatically. The algorithms also made sure that the robot follows the ego lane. We then developed interfaces in AR and MR environments to control and assist the robot on take over requests. The whole setup was then used to compare the interaction effect across the environments. We conducted a user study to investigate if there is any difference in users' performance while assisting the robot in both
environments. To support our study of interaction in AR and MR, we also conducted a standard ISO 9241 pointing task study in both environments. The ISO 9241 pointing task study helped us examine hand movement and interaction effects in AR and MR. The results of the study can be leveraged to develop and deploy an autonomous vehicle for similar types of unstructured road environments. Besides the current study, our experimental setup could also be useful in multiple other scenarios and studies. For example, it can be used for different types of traffic participants such as humans, cars, and so on. The current setup can also be extended to include a greater number of vehicles and obstacles to imitate real-life situations. Furthermore, along with these benefits it can be tested for different types of ego vehicles such as cars and aircraft.
We did not include a driving simulator study in this paper because this is an initial stage of the long-term goal of building an autonomous vehicle for unstructured road environments, and understanding initial users'
response to such conditions is an integral part of the whole process. The AR and MR environments provide a platform for users to comprehend real life situations by allowing them to control robots and vehicles against realistic backgrounds. Furthermore, the implementation of the AR and MR environments restricted the use of a driving simulator because it was difficult to register objects or roads displayed on the screen of a desktop driving simulator from the HoloLens. The use of mobile robots and realistic lane detection algorithms has an edge over other setups because of the easy deployment to actual vehicles.
5.1. Limitation and future work
Our pilot study provides valuable insights about the various factors involved in performing the ISO 9241 pointing task in two different modalities (AR and MR). We therefore designed an automotive study to measure response time to take over requests based on interaction in the above-mentioned modalities. Although the majority of participants took more time with the MR interface, the immersive environment gives a feel of real driving scenarios. We therefore plan to integrate more complex road scenarios (adding real and virtual obstacles, increasing the frequency of appearance of vehicles, introducing more intersections). Since the data are limited to young Indian participants, research is needed to verify our findings for people from different cultures and diverse age groups.
However, if young students find something difficult, it is likely that their older counterparts will also find it difficult. We are currently developing an object detection algorithm and integrating it with the current system. A driving assistance system featuring both lane and obstacle detection will have huge potential in the Indian road environment, which is challenging for autonomous vehicles.
6. Conclusion
This paper proposes an interaction system to facilitate human machine interaction in a conditionally automated driving environment while undertaking take over requests. Before comparing the XR interfaces in the automotive context, we investigated the continuum of XR media using a standard ISO 9241 pointing task study. We compared forearm and hand movements in both environments. We compared user performance qualitatively and quantitatively within Augmented Reality (AR) and Mixed Reality (MR) interfaces. Using this knowledge, we investigated the interfaces in an automotive study. The proposed system consists of novel lane detection and navigation algorithms along with interaction modalities.
The algorithms were integrated with both the AR and MR displays to evaluate the response time to take over requests and the task completion time. Our study found that users took significantly less time in completing the task using the screen-based AR interface as compared to the HoloLens based MR interface. However, the response time was not found to be significantly different between modalities. Moreover, the qualitative results, including the system usability score, showed that interaction in MR was more frustrating and demanded more effort from users.
Fig. 13. A comparison graph to show effect of modalities on different parameters.
Fig. 14. A comparison graph to show individual parameters of TLX for AR and MR.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.treng.2022.100159.
References
[1] S.A.E. International (2018). Taxonomy and definitions for terms related to on-road motor vehicle automated driving systems (Standard No. J3016).
[2] P. Bazilinskyy, A. Eriksson, B. Petermeijer, J. de Winter, Usefulness and satisfaction of take-over requests for highly automated driving, in: Proceedings of the Road Safety & Simulation International Conference (RSS 2017), the Hague, Netherlands, 2017.
[3] R. McCall, F. McGee, A. Meschtscherjakov, N. Louveton, T. Engel, Towards a taxonomy of autonomous vehicle handover situations, in: Proceedings of the 8th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Ann Arbor, MI, USA, 2016.
[4] ISO 9241-11:2018(en) Ergonomics of human-system interaction — Part 11: usability: definitions and concepts. Available at: https://www.iso.org/obp/ui/#iso:std:iso:9241:-11:ed-2:v1:en, Accessed on: 22nd August 2022.
[5] S.M. Casner, E.L. Hutchins, D. Norman, The challenges of partially automated driving, Commun. ACM 59 (5) (2016) 70–77, https://doi.org/10.1145/2830565.
[6] R.M. van der Heiden, S.T. Iqbal, C.P. Janssen, Priming drivers before handover in semi-autonomous cars, in: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, ser. CHI ’17, New York, NY, USA, ACM, 2017, pp. 392–404.
[7] B.K.J. Mok, M. Johns, K.J. Lee, H.P. Ive, D. Miller, W. Ju, Timing of unstructured transitions of control in automated driving, in: Proceedings of the Paper Presented at the 2015 IEEE Intelligent Vehicles Symposium (IV), Seoul, Korea, 2015.
[8] M. Kuehn, T. Vogelpohl, M. Vollrath, Takeover times in highly automated driving (level 3), in: Proceedings of the Paper Presented at the 25th International Technical Conference on the Enhanced Safety of Vehicles (ESV) National Highway Traffic Safety Administration, Detroit, MI, 2017.
[9] Albert Zajdel, Cezary Szczepanski, Mariusz Krawczyk, Jerzy Graffstein, Piotr Maslowski, Selected aspects of the low level automatic taxi control system concept, Trans. Inst. Aviat. 2 (247) (2017) 69–79, Warsaw, https://doi.org/10.2478/tar-2017-0016.
[10] Christian Zammit, David Zammit-Mangion, A control technique for automatic taxi in fixed wing aircraft, in: Proceedings of the 52nd Aerospace Sciences Meeting, National Harbor, Maryland, 13-17 January 2014, https://doi.org/10.2514/6.2014-1163.
[11] Mariusz Krawczyk, Cezary Jerzy Szczepanski, Albert Zajdel, Aircraft model for the automatic taxi directional control design, Aircr. Eng. Aerosp. Technol. 91 (2) (2019) 289–295, https://doi.org/10.1108/AEAT-01-2018-0025. /© Emerald Publishing Limited [ISSN 1748-8842].
[12] Fabrizio Re, Automatic control generation for aircraft taxi systems through nonlinear dynamic inversion of object-oriented model, in: Proceedings of the EuroGNC 2013, 2nd CEAS Specialist Conference on Guidance, Navigation &
Control, Delft University of Technology, Delft, The Netherlands, April 10-12, 2013.
[13] J.H. Yang, S.C. Lee, C. Nadri, J. Kim, J. Shin, M. Jeon, Multimodal displays for takeover requests. User Experience Design in the Era of Automated Driving, Springer, Cham, 2022, pp. 397–424.
[14] I. Politis, S. Brewster, F. Pollick, Language-based multimodal displays for the handover of control in autonomous cars, in: Proceedings of the 7th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, ACM, New York, 2015, pp. 3–10.
[15] S. Langlois, B. Soualmi, Augmented reality versus classical HUD to take over from automated driving: an aid to smooth reactions and to anticipate maneuvers, in:
Proceedings of the IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), New Jersey, IEEE, 2016, pp. 1571–1578.
[16] W. Tabone, Y.M. Lee, N. Merat, R. Happee, J De Winter, Towards future pedestrian- vehicle interactions: introducing theoretically-supported AR prototypes, in:
Proceedings of the 13th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, 2021, pp. 209–218.
[17] A. Kunze, S.J. Summerskill, R. Marshall, A.J. Filtness, Augmented reality displays for communicating uncertainty information in automated driving, in: Proceedings
of the 10th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, 2018, pp. 164–175.
[18] C. Wang, Z. Lu, J. Terken, J. Hu, HUD-AR: enhancing communication between drivers by affordable technology, in: Proceedings of the 9th International Conference on Automotive User Interfaces and Interactive Vehicular Applications Adjunct, 2017, pp. 249–250.
[19] F. Steinberger, R. Schroeter, V. Lindner, Z. Fitz-Walter, J. Hall, D. Johnson, Zombies on the road: a holistic design approach to balancing gamification and safe driving, in: Proceedings of the 7th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, 2015, pp. 320–327.
[20] Andreas Riegler, Andreas Riener, Clemens Holzmann, A systematic review of augmented reality applications for automated driving: 2009–2020, Presence Teleoper. Virtual Environ. 28 (2022) 87–126.
[21] Ronald Schroeter, Michael A. Gerber, A low-cost vr-based automated driving simulator for rapid automotive ui prototyping, Adjunct, in: Proceedings of the 10th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, 2018, pp. 248–251.
[22] Andreas Riegler, Andreas Riener, Clemens Holzmann, Virtual reality driving simulator for user studies on automated driving, in: Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications: Adjunct Proceedings, 2019, pp. 502–507.
[23] Abhishek Mukhopadhyay, L.R.D. Murthy, Imon Mukherjee, Pradipta Biswas, A hybrid lane detection model for wild road conditions, IEEE Trans. Artif. Intell.
(2022), https://doi.org/10.1109/TAI.2022.3212347.
[24] Optitrack, Available at: https://optitrack.com/, Accessed on: 22nd August 2022.
[25] Emotivepro, available at: https://www.emotiv.com/epoc-flex/, Accessed on: 22nd August 2022.
[26] N. Friedman, T. Fekete, K. Gal, O. Shriki, EEG-based prediction of cognitive load in intelligence tests, Front. Hum. Neurosci. 13 (2019) 191. Available from: https://www.frontiersin.org/articles/10.3389/fnhum.2019.00191/full.
[27] Mawia A Hassan, Elwathiq A. Mahmoud, Abdalla H. Abdalla, Ahmed M. Wedaa, A comparison between windowing fir filters for extracting the eeg components, J. Biosens. Bioelectron. 6 (4) (2015) 1.
[28] I. Käthner, S.C. Wriessnegger, G.R. Müller-Putz, A. Kübler, S. Halder, Effects of mental workload and fatigue on the P300, alpha and theta band power during operation of an ERP (P300) brain-computer interface, Biol. Psychol. 102 (2014) 118–129, https://doi.org/10.1016/j.biopsycho.2014.07.014.
[29] S.H. Fairclough, L. Venables, Psychophysiological candidates for biocybernetic control of adaptive automation, Hum. Factors Des. (2004) 177–189, https://doi.
org/10.1037/e577062012-018.
[30] P. Antonenko, F. Paas, R. Grabner, T. Van Gog, Using electroencephalography to measure cognitive load, Educ. Psychol. Rev. 22 (2010) 425–438, https://doi.org/
10.1007/s10648-010-9130-y.
[31] G. Borghini, G. Vecchiato, J. Toppi, L. Astolfi, A. Maglione, R. Isabella, et al., Assessment of mental fatigue during car driving by using high resolution EEG activity and neurophysiologic indices, in: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, IEEE, 2012, pp. 6442–6445.
[32] D.D. Salvucci, J.H. Goldberg, Identifying fixations and saccades in eye-tracking protocols, in: Proceedings of the 2000 Symposium on Eye Tracking Research &
Applications, 2000, pp. 71–78.
[33] G. Prabhakar, A. Mukhopadhyay, L. Murthy, M. Modiksha, D. Sachin, P. Biswas, Cognitive load estimation using ocular parameters in automotive, Transp. Eng.
(2020).
[34] A. Olsen, R. Matos, Identifying parameter values for an I-VT fixation filter suitable for handling data sampled with various sampling frequencies, in: Proceedings of the Symposium on Eye Tracking Research and Applications, 2012, pp. 317–320.
[35] A. Holm, K. Lukander, J. Korpela, M. Sallinen, K.M.I. Müller, Estimating brain load from the EEG, Sci. World J. 9 (2009) 639–651, https://doi.org/10.1100/
tsw.2009.83.
[36] Johannes Zagermann, Ulrike Pfeil, Harald Reiterer, Measuring cognitive load using eye tracking technology in visual computing, in: Proceedings of the Sixth Workshop on Beyond Time and Errors on Novel Evaluation Methods for Visualization (BELIV ’16), New York, NY, USA, Association for Computing Machinery, 2016, pp. 78–85, https://doi.org/10.1145/2993901.2993908.
[37] Siyuan Chen, Julien Epps, Natalie Ruiz, Fang Chen, Eye activity as a measure of human mental effort in HCI, in: Proceedings of the 16th international conference on Intelligent user interfaces (IUI ’11), New York, NY, USA, Association for Computing Machinery, 2011, pp. 315–318, https://doi.org/10.1145/
1943403.1943454.
[38] Radha Nila Meghanathan, Cees van Leeuwen, Andrey R. Nikolaev, Fixation duration surpasses pupil size as a measure of memory load in free viewing, Front.
Hum. Neurosci. 8 (2015) 1063.
[39] Kerri Walter, Peter Bex, Cognitive load influences oculomotor behavior in natural scenes, Sci. Rep. 11 (1) (2021) 1–12.
[40] TuSimple benchmark dataset. [Online]. Available: https://github.com/TuSimple/tusimple-benchmark/.
[41] T. Zheng, H. Fang, Y. Zhang, W. Tang, Z. Yang, H. Liu, and D. Cai, “Resa: recurrent feature-shift aggregator for lane detection,” arXiv preprint arXiv:2008.13719, 2020.
[42] B. De Brabandere, D. Neven, and L. Van Gool, “Semantic instance segmentation with a discriminative loss function,” arXiv preprint arXiv:1708.02551, 2017.