Intuitive Robot Teleoperation through Multi-Sensor Informed Mixed Reality Visual Aids

S. Livatino1, D.C. Guastella2, G. Muscato2, V. Rinaldi3, L. Cantelli2, C.D. Melita2, A. Caniglia4, R. Mazza5, and G. Padula6
1School of Physics, Engineering and Computer Science (SPECS), University of Hertfordshire, Hatfield, AL10 9AB, United Kingdom
2Department of Electrical, Electronics and Computer Engineering (DIEEI), University of Catania, Viale A. Doria 6, 95125 Catania, Italy
3Leverhulme Research Centre for Forensic Science, University of Dundee, Dundee DD1 4HN, United Kingdom
4Microsoft Italy, Via Pasubio 21, 20154 Milan, Italy
5Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland (SUPSI), Manno, Switzerland
6Academic Laboratory of Movement and Human Physical Performance, Dynamolab, Medical University of Łódź, 90-419 Łódź, Poland

Corresponding author: S. Livatino (e-mail: s.livatino@herts.ac.uk).

The work carried out by the University of Catania is in the framework of the project "Safe and Smart Farming with Artificial Intelligence and Robotics - programma ricerca di ateneo UNICT 2020-22 linea 2".

ABSTRACT Mobile robotic systems have evolved to include sensors capable of describing robot status and the operating environment more accurately and reliably than ever before. Exploiting these sensor data effectively is challenging, however, because of the cognitive load the operator is exposed to, given the large amount of data and its time-dependency constraints. This paper addresses this challenge in remote-vehicle teleoperation by proposing an intuitive way to present sensor data to users by means of mixed reality and visual aids within the user interface. We propose a method for organizing information presentation and a set of visual aids to facilitate the visual communication of data in teleoperation control panels. The resulting sensor-information presentation appears coherent and intuitive, making it easier for an operator to catch and comprehend its meaning. This increases situational awareness and speeds up decision-making. Our method is implemented on a real mobile robotic system operating outdoors, equipped with on-board internal and external sensors, GPS, and a reconstructed 3D graphical model provided by an assistant drone. Experimentation verified feasibility, while an assessment confirmed intuitive and comprehensive visual communication, which encourages further developments.

INDEX TERMS Virtual Reality, Augmented Reality, User Interfaces, Graphical User Interfaces, Human-Robot Interaction, Telerobotics, Stereo Vision.

I. INTRODUCTION
During the last two decades, robotic vehicles have been proposed for controlled environments such as depots and automated manufacturing halls. Different systems have also been employed to help with dangerous tasks such as bomb disposal and mine discovery.
Whereas mobile robots' autonomy has constantly increased, so has the awareness of the irreplaceable value of manual teleoperation, especially for challenging tasks and unknown environments. Manually operated robotic vehicles, quite often behaving semi-autonomously, need to be commanded through their operator's interface. There has been increased awareness of the role of the interface in enhancing the operator's situational awareness and of its impact on operational performance.
The operator's situational awareness is a key aspect of operating remote vehicles effectively. This means, among other things: understanding the surrounding environment, the robot location and the contextual robot-environment movements, and predicting future robot-environment behaviors [1],[2]. Recent literature contributions have proposed different ways to increase awareness, e.g. better representations of sensor information [3], a wider range of available commands and options [4], and more intuitive dashboards including the use of three-dimensional displays [5].
Although the use of immersive displays and mixed reality (MR) representations has often been discussed [6]-[10], it is rare to find them in commercial products. There is nonetheless consensus about their potential and advantage in increasing the operator's sense of presence in remote workspaces. This in turn means greater environment comprehension, which positively affects task performance and decision-making. Teleoperation interfaces have recently included stereoscopic three-dimensional (S3D) monitors [11][12], while in some fields, such as telesurgery, S3D has become an established technology (see e.g. the spread of the da Vinci system). The involvement of other human sensory modalities such as haptics is also being researched, but it does not appear in commercial products yet.
When remotely operating in outdoor natural environments, users may be challenged by: the complexity of scenarios and objects, event dynamics, the richness of the provided live sensor data, limited display size, and also by the way available prior knowledge is communicated. Figure 1 illustrates the main actors playing a role in operational behavior and the decision process.
This work proposes an intuitive way to present multiple sensor data to a remote-vehicle operator, which is expected to increase situational awareness. This is to be achieved: (1) by providing operators with as much knowledge as possible about the remote space and its current condition; and (2) by communicating information intuitively.
We have today many information-rich sensors at a robot's disposal [13][14], e.g. 3D cameras, laser scanners, sonars, infrared range finders and GPS, which can provide knowledge to greatly improve navigation and intervention performance. The challenge is then how to communicate a fairly large amount of sensor information to users effectively, thereby avoiding cognitive overload [1]. Our answer is to present sensor information visually, and this should rely on:
• Coherent combination of different live sensors' information. A lot can be achieved by applying Information Visualization theories to improve the visual communication of data.
• MR representations and immersive displays. This combination can increase users' comprehension and sense of presence, and therefore remote-space awareness.
The operator will be observing an adaptive MR scenario consisting of three-dimensional streamed videos and graphical representations of sensor data. The latter will be designed as multi-sensor informed visual aids (VAs). The focus is on outdoor natural scenarios, where robots are able to provide positional and attitude information together with live views of the surrounding environment. Previously or concurrently acquired environment maps can be considered, e.g. maps today achievable from drone aerial views.
The next section introduces the state of the art, whereas section III describes the proposed approach and specific choices. Section IV describes the implemented system, whereas section V analyzes the results of the experimentation trials. In section VI conclusions are drawn.

II. IMMERSIVE VISUALIZATION AND MR INTERFACES

A. VIRTUAL REALITY HEADSETS
Ivan Sutherland built the first HMD prototype in 1968 [15]. VR headsets have since then received continuous interest because of their potential in providing full visual immersion into artificially generated environments. However, despite the relevant improvements of the last decades, with HMDs adopting optical tracking and OLED displays while becoming smaller and lighter [16], other issues remained. These were e.g. tunnel vision, tethering to a PC, weight, portability and high cost. They all limited HMD adoption in the consumer market, and HMD use was mainly confined to research labs. This was the case up until the last decade.
A notable step forward came in 2012 with the first Oculus Rift system [17], which, among other things, featured a wide Field of View (FOV) and low cost. There has been great development since then, with newer systems featuring wider displays and higher resolution, lower cost and wireless connection. The latter has been one of the latest focuses, which saw first the making of "smartphone-based VR headsets", which rely on smartphones and their displays to operate [18][19], and then standalone VR headsets, which rely on dedicated computing and display systems [20][21]. They opened up wide applications and a large audience [22]. Today's highest-spec VR headsets have nonetheless remained those wired to a desktop PC, the reason being that HMDs can exploit the greater PC processing and graphics power [23][24]. Additional options are in the meantime being proposed, such as embedded eye-trackers [25][26].
Compared to views on traditional 2D desktop monitors, but also on higher-spec stereoscopic 3D monitors [27][28], VR headsets feature high user isolation from the surrounding environment, while allowing for continuous omnidirectional / 360° viewing through natural head movements, rather than using a computer mouse or multi-dimensional joysticks. Figure 2 depicts our user wearing a VR headset.

FIGURE 1. Conceptual illustration of key factors and user interface role during robot teleoperation.
B. MIXED REALITY REPRESENTATIONS
Virtual reality (VR) is the simulation of a world that can be real or invented. VR is often experienced through the use of immersive technologies, therefore involving human sensory inputs and especially vision. Computer graphics (CG) is typically used to visually represent VR environments, but also to represent signs, symbols, indicators, diagrams and numbers. MR is the combination of real objects (live or recorded) with virtual objects, within consistent representations. Instances of MR are Augmented Reality (AR) and Augmented Virtuality (AV) [29], depending on whether reality or virtuality is the primary visual element.
A main challenge for MR is the alignment of real and virtual elements, which has been responsible for its slower take-up in recent years compared to VR. Nonetheless, MR has recently gained new momentum because of the significant technology improvements of VR/AR headsets. These have become lighter, wireless, comfortable, sunlight insensitive, and capable of portraying sharp and well-aligned images.
MR representations are proposed in the literature to enable a more coherent combination of different information within the same visual context, which is more intuitive [3][30]. The proposed interfaces focus on specific features and offer various elements to support operators while driving remotely located vehicles. MR representations have sometimes included the use of visual aids (VAs) to provide information on sensor data or on robot and environment status. In the case of sensor data, VAs mainly consist of graphic elements that exclusively display proprioceptive sensor data [2][10]. Only in [3] is some information about the remote terrain (namely its slope) given to teleoperators. Other interfaces propose the use of stereo cameras to increase spatial awareness, but only recent works combine stereoscopic views with head-mounted displays (HMDs) [10][12][31]-[36]. In the case of robot and environment status, VAs refer to the outcome of SLAM algorithms or, more generally, real-time 3D reconstructions, to obtain a 3D model of the region the robot is navigating in [31][35]. Others do not take into account any information about the environment geometry [2][12][33][34][37], despite status information and 3D models having recently become more available and accessible. Furthermore, most of the works exploiting MR have been designed for indoor scenarios, or for outdoor scenarios under certain conditions only [31]. A summary of related literature works, including the interfaces' visualization characteristics and VAs, is shown in Table I.

TABLE I. SUMMARY OF LITERATURE WORKS' MAIN CHARACTERISTICS WE CONSIDER AS REFERENCE. EGO/EXO STANDS FOR EGOCENTRIC/EXOCENTRIC.

III. METHOD: COMPREHENSIVE USER INTERFACE BASED ON VISUAL AIDS
This paper proposes a combination of elements within a user's visual interface for the teleoperation of ground vehicles, which integrates live and pre-acquired sensor data to represent robot and environment status within an immersive and adaptive MR view. Such a combination has never been proposed in the related literature. The interface combines:
a) Immersive Visualization.
A natural approach to visual observation and interaction, which best suits the use of VR headsets. Interaction in the latter can take place through head movements and hand controllers.
b) Three-Dimensional Mixed Reality. An Augmented Virtuality (AV) visual representation [29], where real and virtual elements are integrated three-dimensionally on the operator's display. These elements represent sensor data providing: positional information, environment maps, graphical 3D reconstruction and live stereo images.
c) Visual Aids. A graphical representation of sensor data designed to provide specific types of aid in an intuitive way. They represent environment and direction information and aim to assist users during navigation.
The interface is designed according to the following elements: single-window representation, video-synthetic images, intuitive data viewing, and regional and directional visual aids.

A. SINGLE-WINDOW REPRESENTATION
The interface view shown on the VR headset represents a single window. This is the sole control panel users can rely on during robot teleoperation. The shown information may need to be rich (because of the many available sensors), while it may also need to be quickly comprehended, e.g. when decisions are made while driving at speed. The presentation of sensor information is therefore a relevant aspect. We see in the literature control panels that only show subsets of the available sensor data, others that show more sensor outputs but through different windows [38]-[40], and others that attempt to group all data within a single window [3][7][30].
We want our interface to include all sensor data as well as any available prior knowledge, aiming at achieving a continuous and exhaustive monitoring of events. We follow recommendations from Gibson's ecological approach [41]. Therefore, we propose the use of a single window where we concurrently display a live video stream of the remote scene and graphical representations of incoming sensor data combined with prior knowledge. The graphical elements are indications such as: distance to objects, traveled trajectories, environment feature descriptors (slopes, obstacles), and a polygonal mesh of the environment shape and objects resulting from the 3D reconstruction [42]. Video stream and graphics are three-dimensionally integrated, a feature still quite uncommon in literature works (despite the use of 3D graphics). The single-window three-dimensional MR operator's view is a perfect design fit for the use of immersive displays, such as a VR headset. It also adapts well to 2D/3D desktop displays and wall screens. Figure 2 (top-right) and figure 3 show examples of single-window representations.

B. VIDEO-SYNTHETIC IMAGES
A mixed reality visualization context containing both video and synthetic image textures is proposed, to support the use of the single-window representation and its consequent need to integrate data of different types and from different sources. In particular, we concurrently display a live video stream (reality) and graphical representations of other sensor data (virtuality). The resulting MR view is an example of Augmented Virtuality [29]. It is in particular proposed to have the incoming video stream graphically mapped onto the reconstructed 3D graphical model. This allows the graphics engine to provide users with a correct and coherent viewing perspective, and to correctly manage occlusions.
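As an illustration of this video-to-model mapping, the minimal sketch below computes the world-space corners of a quad placed in front of the robot, onto which a rendering engine could texture the live frame so that perspective and occlusions are handled by the graphics pipeline. It assumes a pinhole camera with a known field of view and a known robot pose; all names and parameter values (video_quad_corners, quad_distance, fov_h, fov_v) are illustrative and are not taken from our implementation.

# Minimal sketch (illustrative, not the implemented interface code): placing a
# video "billboard" quad in front of the robot so a rendering engine can
# texture the live frame onto it with a coherent viewing perspective.
import numpy as np

def video_quad_corners(robot_xyz, yaw, quad_distance=5.0,
                       fov_h=np.radians(90), fov_v=np.radians(60)):
    """Return the 4 world-space corners of a quad that fills the camera FOV
    at quad_distance metres along the robot heading (yaw, in radians)."""
    half_w = quad_distance * np.tan(fov_h / 2)   # half width of the quad
    half_h = quad_distance * np.tan(fov_v / 2)   # half height of the quad
    forward = np.array([np.cos(yaw), np.sin(yaw), 0.0])
    right   = np.array([np.sin(yaw), -np.cos(yaw), 0.0])
    up      = np.array([0.0, 0.0, 1.0])
    center  = np.asarray(robot_xyz, dtype=float) + quad_distance * forward
    return [center - right * half_w + up * half_h,   # top-left corner
            center + right * half_w + up * half_h,   # top-right corner
            center + right * half_w - up * half_h,   # bottom-right corner
            center - right * half_w - up * half_h]   # bottom-left corner

# Example: robot at the origin heading along +x; the quad is centred 5 m ahead.
corners = video_quad_corners([0.0, 0.0, 0.5], yaw=0.0)

Because the quad lives in the same 3D scene as the reconstructed model, the rendering engine can decide per pixel whether graphics occlude video or vice versa, which is the effect described above.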
Vision is the dominant human sensory modality and the one we trust the most; we therefore propose that reality [29] is always on screen whenever available (within the 3D MR context), and that it portrays the remote environment through the streaming of video images (rich information that is fast to comprehend). As for the virtuality representation, this consists of graphical objects representing trajectories, trends, slopes, walls, etc., which are three-dimensionally integrated in the MR context. A careful design of functions and their appearance is therefore required. We apply useful guidelines for general interface design provided in the Information Visualization literature [43], and we do so within the MANTRA context.

FIGURE 2. Illustrated overview of the proposed system and approach to outdoor robot teleoperation. The figure shows the robot-navigation workspace used in our experiments, which is surveyed by a drone that, through its camera system, graphically reconstructs the workspace area below in 3D. An image of the ground robot used in our experiments is also included. On the right-hand side is an image showing a robot operator wearing a VR headset, with an example of the operator's view during navigation on top. The operator's view shows an MR scene depicted within a single-window representation. The view includes the video-stream image integrated with the reconstructed 3D model and four of the proposed visual aids (centerline, virtual pointer, top-view and guide arrow).

Furthermore, we always give users the option to adjust element views on demand. Special attention is paid to spatial alignment in the 3D viewing space. We rely on a semi-automatic calibration process done at the start, which can also be repeated on demand during navigation. The proposed MR single-window view is visualized in S3D (for both video and graphics), which greatly helps users to better disambiguate among the different visualized elements, based on the higher depth comprehension S3D provides. Figure 3 shows an example of our MR view.

FIGURE 3. Example of operator's MR view with live video-stream in the background (showing sky, grass, rock-stones and buildings) and graphical representation of sensor data (suggested travel trajectory, guide arrow, top-view and traversable area in blue).

An MR representation such as the one proposed cannot be found in the literature. Table 1 collects related methods, which all have relevant differences from ours. We had also proposed one in recent years [7], but the experiments ran indoors, on flat surfaces and with simple man-made scenarios. With this work we propose a new design framework, to address applications made more challenging by uneven natural outdoor scenarios.

C. INTUITIVE DATA VISUALIZATION
When designing the sensor data representation through the use of intuitive VAs, we propose to refer to the Gestalt laws' most relevant indications and apply them to the teleoperation interface's visual screen design. We focus on the eleven laws summarized by Chang et al. [44], which include:
• Smooth Continuation of lines and images [45]. We use it for path indicators and mapped video streaming.
• Unambiguous and simple shapes.
We use it for the directional indicator and its design.
• Appropriate use of colors [46][47]. We use it to design careful color pairing, to limit hues, and to support expectations.

COLORS
We propose the use of standard color conventions, including those associated with danger–caution–safety [47], as they match users' expectations and recall the ordinary vehicle driving experience. We rely on the simplicity principle [44], the use of a small number of hues (while tuning lightness and saturation), and color pairing [46][47]. The latter is especially relevant as we cannot control the color appearance of objects and landscapes in the incoming images, e.g. terrain and sky, yet they act as background colors against which our VAs' colors must contrast. We go for two dyad complementary colors, as in MacDonald [46].

FIGURE 4. Adaptive Transparency VA with 3D reconstruction and traversable area: [left] both opaque, [center] only traversable area opaque, [right] full transparency of live streamed video.

TRANSPARENCIES
The need to manage multiple sensors' data and their graphical representations, the arising 3D occlusions, and the blending of video and graphics make semi-transparency a key aspect. This plays a relevant role in supporting human attention and facilitating comprehension. The work of Harrison et al. [48] on transparency gives useful indications towards maintaining attention and fluency, which we apply to our MR views. We propose smooth blending and balancing between the background video and the often sharper and less varying VAs. Furthermore, VAs' color and semi-transparency can all be adjusted on demand. Figure 4 shows an example of different levels of VA transparency.

ADAPTIVE VIEWS
To minimize cognitive overload, such as could occur in dynamic situations requiring a timely response [7], we follow the recommendations provided by Shneiderman [49] and Harrison et al. [48]. These concern the observation of control panels and the psychological problem of focused and divided attention. The answer is a screen view that dynamically changes depending on robot speed and user preference. For instance, when moving, it may show a qualitative overview that presents only the relevant sensor information on the path and close obstacles, graphically hiding object details, etc. This follows Shneiderman's Visual Information Seeking Mantra (MANTRA) design approach [49]. Figure 5 shows examples of adaptive views using different color and transparency levels. It is proposed that the visibility and blending of predominant elements can change dynamically, adaptively, and on the user's demand.
We adopt the MANTRA approach for managing different data types [49]. We find the suggested use of a small number of tasks very appropriate, and we wish these to follow the order given by the MANTRA, as it fits nicely to robot teleguide actions. We always Overview first, while Zooming, Filtering and Details are applied on demand [31][35]. Automatic adaptation based on speed and distance to objects is an available option, useful for specific environments and situations. The Overview situation is empowered by the exocentric viewing option.
It can be combined with other VAs such as the Traversable Area and Centerline (introduced below). The Zooming can be enriched by measurements using virtual pointers. The Filter can be applied to colors and occlusions. We additionally consider within this approach the management of delay-driven latencies (based on timestamps), to give consistency among the visualized sensor data and to align them with the video stream. Visible seams may occur because of image noise and sensor errors.

D. REGIONAL AND DIRECTIONAL VISUAL AIDS
The graphical elements representing incoming sensor data are specifically designed for teleoperation, to visually aid the understanding of: sensor information, remote-scene dynamics, vehicle behavior and operator commands. Eight types of VAs are proposed, grouped as Regional and Directional. The Regional VAs provide information on the robot's surrounding environment, while the Directional VAs provide route-following information (in our case this is based on traversing-cost path planning [50]).

REGIONAL VISUAL AIDS
• Traversable Area. It shows the crossable areas within the current field of view. It can include expected difficulty and hide non-relevant details. The shown information is inserted and tuned either on demand or adaptively (e.g. based on current speed). It is often an essential aid, especially for outdoor unstructured scenarios. Figures 3 and 4 include representations of the traversable area VA.
• Extended Camera View. It shows live camera images surrounded by and integrated with graphics representing adjacent areas. This allows operators to perceive the vehicle's surrounding areas simultaneously with the frontal view. It is helpful in many tasks to overcome narrow passages, while providing a greater sense of presence. It marries well with the proposed use of a VR headset, which, thanks to head-tracking, adapts the visualized images to the user's head position. We propose live video images to be inserted into a wider AV context through CG video mapping. In particular, the streamed video is mapped onto a surface located in front of the robot according to the robot heading. Our approach differs from literature works, such as Li et al. [51] proposing sensor alignment through reconstructed 3D models (for outdoor applications), because we do not try to align real and virtual views. Rather, we map the live video feed onto a surface inside the graphic representation. In this way, producing an extended camera view is straightforward through graphic rendering. We use image processing to align video and graphical elements as in [7]. Our solution is feasible because we have knowledge of the robot's surrounding environment provided by the robot sensors and the reconstructed 3D model. Furthermore, our 3D model provides a detailed graphical appearance because of the high-resolution aerial cameras. This is very helpful to reduce the observed gap in quality between video and synthetic images. The main challenge for the proposed AV solution is alignment. We address it by careful horizon alignment [52], sensor data filtering and assessment, an initial and repeatable calibration [7], and graphic texture extrapolation in case of visible seams (due to lack of data). Figure 6 shows an extended camera view example.

FIGURE 5. MR images showing qualitative overviews of an environment. Each row shows views of the same environment from the same viewpoint, with a video-image combined with a coloured graphical representation following objects' shape in 3D.
The images to the right show instances of MR views occurring when the robot travels at higher speed. They show fewer object details, which gives a higher appreciation of free and occupied space. Such a qualitative view is deemed relevant when driving at higher speed, when there is less time to look at details but greater interest in avoiding collisions. The graphic mesh is coloured according to the distance to objects, with red indicating higher proximity and risk of collision. Transparency changes according to the vehicle's speed.

FIGURE 6. Extended Camera View VA: the image shows the graphically reconstructed 3D environment (from live sensor data), while the live video streamed from the onboard camera can be noted to the left in the background.

• Exocentric View. It shows the robot during navigation from a third-person view. We propose an exocentric 3D view generated from the available 3D model (updated whenever possible), GPS, IMU and odometric sensor data. This solution enables the generation of any exocentric view, which is very useful on many occasions, such as parking and area overview. Operating with an exocentric view allows for faster space-layout comprehension and effective vehicle maneuvering. Nielsen et al. [30] explain how an adjustable perspective can aid with all three levels of Endsley's situational awareness (perception, comprehension, and projection) [1]. Alternative solutions for exocentric view generation would be impractical, as they could require, for example, large camera heads on top of vehicles, or would help only with specific actions. We are aware that an exocentric view may slow down operations because it could divert the operator's attention by uncovering details (the same way 3D views do compared to corresponding 2D views [5][32]). This has nonetheless no impact on accuracy. Figure 7 shows two examples of exocentric views.

FIGURE 7. Exocentric View VA with three vehicle views captured during navigation. The left image shows the Centerline VA on exocentric views.

• Top View. It shows the environment from above. This type of view, popular in computer games to support vehicle and person navigation, is considered of great help to catch vehicle attitude and plan future moves. Our top-view size and position can be adjusted by users. Figure 3 shows an example.
• Virtual Pointer. It shows a ray cast from the robot towards a location of interest in the environment. This type of VA, already proposed in AR from its early years [29], can be beneficial for human-computer interaction in virtual/augmented environments [53]. By casting a ray, our pointer establishes a connection and a distance factor with the environment. A road-sign-like post is shown at the position hit by the ray, displaying the corresponding information in terms of altitude, slope, distance and traversability cost. Figure 4 shows an example. This VA supports the need one may have to read specific measurements [48], which can be derived from the 3D reconstructed model or sensor data. Figure 8 shows an example.

DIRECTIONAL VISUAL AIDS
• Guide Arrow. It shows a 3D arrow providing a clear indication of the direction to follow.
Figure 4 shows an example. Arrows are popular for indicating routes and very appreciated by inexperienced users [54]. They outperform alternatives such as compasses and light sources [55], and are popular in the latest video games [54]. Our guide arrow is implemented through the dynamic selection of the most suitable centerline checkpoints, and it is displayed in S3D, making the indicated direction easy to catch visually. Figure 3 shows an example.

FIGURE 8. Virtual Pointer VA with post raised on the targeted point.

• Centerline. It shows a graphic line that indicates the best vehicle position to hold while navigating. A centerline represents a clear reference for vehicle operators. Lines and segments are widely used on our roads to indicate lanes and specific behaviors. Using continuous lines to follow directions is in harmony with our brain's sensitivity [45]. Lines provide the simplicity suggested for visual observation [44]. Differently from our roads, our vehicle is expected to stand on top of the continuous line, to ensure it keeps the best position while driving. The centerline is an intuitive and simple reference, which provides direction too. Figure 3 and figure 7 (right-hand side) show examples.
• Robot Bonnet. It shows a graphical representation of the robot's front bonnet from the egocentric viewpoint, and it is therefore only provided with this type of viewpoint. It includes an imaginary bonnet shape (with a pointed end) to help operators comprehend the vehicle direction. Viewing the vehicle's front bonnet effectively increases situational awareness [2][31] and helps prevent collisions [5].
We cannot find in the literature an organized visual interface framework such as the one proposed, which is aimed at enhancing the user's comprehension through the described design elements and visual aids. The most relevant contribution is related to the intended application scenario, namely uneven unstructured terrains, for which teleoperation interfaces have been scarcely investigated and developed.

IV. EXPERIMENTATION SYSTEM AND SETUP
Figure 2 gives an illustrated overview of the proposed system, while figure 9 shows the main processing units with the related blocks and data flow.

A. HARDWARE, SENSING AND VISUAL SCENES
In our experiments we use the U-Go Robot mobile platform, which is 75 cm long and 88 cm wide, with 18 cm wide rubber tracks. It is localized outdoors through Real-Time Kinematic (RTK) differential GPS. Its 3D attitude is acquired by the high-precision Xsens MTi IMU. A ZED stereo camera connected to a Jetson TX1 board faces the environment in front of the robot, acquiring 15 fps at a 1280×720 px resolution. High-level robot functions are managed by a Raspberry Pi 3 board, which runs the Robot Operating System (ROS) communication network. ROS sets up distributed computing based on the TCP/IP protocol and manages robot navigation. ROS drivers support the ZED camera. Figure 10 shows the system architecture.
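To illustrate how such a distributed ROS setup can gather the multi-rate sensor streams that feed the interface, the minimal sketch below subscribes to camera, IMU and GPS messages and records their timestamps on the control-station side. The node name and topic names ("/zed/left/image_rect_color", "/imu/data", "/gps/fix") are assumptions made for the example and do not reproduce the actual U-Go configuration.

# Minimal sketch (illustrative, not the U-Go code) of a ROS node that listens
# to the multi-rate sensor streams and keeps per-sensor timestamps.
import rospy
from sensor_msgs.msg import Image, Imu, NavSatFix

def make_logger(name):
    # Each callback records the message timestamp, illustrating how the
    # interface side can keep per-sensor timelines despite different rates.
    def callback(msg):
        stamp = msg.header.stamp.to_sec()
        rospy.loginfo("%s sample at t=%.3f s", name, stamp)
    return callback

if __name__ == "__main__":
    rospy.init_node("teleop_sensor_monitor")
    rospy.Subscriber("/zed/left/image_rect_color", Image, make_logger("camera"))
    rospy.Subscriber("/imu/data", Imu, make_logger("imu"))
    rospy.Subscriber("/gps/fix", NavSatFix, make_logger("gps"))
    rospy.spin()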
The proposed teleoperation interface, built using the Unity3D software, contains environment textures mapped to an aerial photogrammetric reconstruction of the navigation area, obtained through the dedicated mapping software Pix4D [42]. A 2D map including traversing costs is derived from a terrain traversability analysis performed on the Digital Elevation Model (DEM), based on slope assessment and step detection [50]. This cost map is matched against the environment top view. Our map pixel size is 25 cm. The Centerline VA is a polyline generated through multiple checkpoint objects laid over the 3D environment.

B. NAVIGATION, DATA COMMUNICATION AND CALIBRATION
The driver side of the teleoperation system consists of a PC remotely connected to the U-Go ROS network, which acts as the robot control station. The remote connection is realized similarly to the work described in [56], thus letting the ROS middleware manage concurrencies and obtaining a reliable communication network. The remote operator wears an Oculus Rift HMD to get immersive MR views of the robot and environment. The vehicle is guided through the Oculus Touch controllers. Alternatively, the user observes through a 27" Acer Hn274h 2D/3D desktop monitor. Unity3D runs scripts and sends driving commands to the robot [56]. During driving we store video frames, the robot pose and the associated timestamps.

FIGURE 9. Main processing units with related blocks and data flow.

FIGURE 10. System architecture of the U-Go robot.

The operator's view is updated with the latest sensor inputs and live-streamed environment images mapped on the reconstructed planes frontal to the robot. The data-flow scheme (figure 9) also handles the operator's run-time requests such as traversable areas (relying on the estimated 2D cost map). Camera images are aligned to the environment's 3D reconstruction automatically through the proposed image-stream mapping. Nonetheless, it is possible for an operator to make adjustments through the provided calibration tool. This helps overcome video-graphics mismatches. Calibration may take place at mission start and during teleoperation through the Touch controller. This feature is relevant because outdoor navigation causes vibrations and jumps, which may lead to camera-robot misalignments.

V. RESULTS AND DISCUSSION: SYSTEM FEASIBILITY AND INTERFACE EFFECTIVENESS
We wanted to gain an insight into the overall system functionality and interface performance. We found it difficult to directly compare the proposed system with other state-of-the-art proposals, such as those listed in Table 1, the reason being the different systems' setups (such as the robot platform and the number of variables), the transmission delays and the type of visual aids, which are difficult to recreate equally. We consequently aimed in our test trials:
• To confirm the feasibility of the proposed system and interface solutions, by implementing and running our interface in a real system setting.
• To check the effectiveness of the proposed MR scenarios and visual aids, by asking users for their impressions.
Our qualitative assessment included observing sensor misalignments and visual aid usefulness.
We also compared the use of a VR headset with a more traditional desktop monitor. Twenty-four trials took place on an outdoor field that included a number of navigation challenges: uneven ground, two different slopes, two uncrossable areas and limited ground visibility due to rich vegetation. A real robotic system was used with a maximum speed of 0.8 m/s. Figure 2 shows our test area through its 3D graphical reconstruction. The test environment resembles two application scenarios: a terraced field, in the context of agricultural robotics, and a post-landslide scenario, where time-critical search and rescue operations have to be performed.
We asked 12 users to teleoperate the robot twice across the entire traversable area (after three minutes of practice), while either wearing a VR headset or observing the environment in front of a 2D/3D desktop display. All users had some experience with video games, but not with robot teleoperation and VR. They followed a pre-determined schedule to counterbalance learning and fatigue effects. The linked video shows the system, all the VAs and a demo trial.
Tests conformed to literature recommendations [57]-[59] and followed traditional approaches in terms of consent, forms and questionnaires [58]. Responses to a questionnaire were provided on a 7-point Likert scale (-3,3), with '3' being the best score and '-3' the worst one. There were 13 questions users were asked to answer with a score. The questions directly asked about the effectiveness of the 8 proposed VAs and of 4 viewing performance indicators (display suitability, used colours, image shaking and image misalignment). The meaning of each question and of the scale values was carefully and equally explained to all users. We asked users to feel free to provide comments regarding our questions. In addition, we specifically asked them to comment on sensor inconsistencies and video-graphic misalignments.

A. SENSOR DATA MISALIGNMENTS
Assessing communication challenges was outside our scope. Rather, we focused on evaluating the interface's capability of delivering multiple inputs to operators (from GPS, IMU, camera and the 3D graphic model). We had an Internet connection with delays ranging from 1 to 4 s, which is in line with several works in the literature [7][31]. We exploited the data-associated timestamps in order to serve all inputs and responses on a first-in first-served schedule. The relevant outcomes of the open-ended questions and the identified issues are discussed below:
• Sensor Inconsistencies. Differences among sensors in acquisition speed were of major interest in our test, as they would potentially affect image alignment and users' comfort. By our choice, sensor data were all transmitted raw to save processing time. The GPS operated at 500 ms, the IMU at 100 ms, while live camera images were sent at 15 fps. The different sensor rates prevented us from opting for an encapsulation-based data transmission, as such an approach would have bound us to the lowest data rate (i.e. that of the GPS). The outcome indicated that discrepancies among sensors led to asynchronous inputs, resulting in occasional abrupt movements of images or VAs on display (image shaking). These were noted by 9 of our 12 users, who judged them as minor and never critical (despite the uneven ground and the lack of image stabilization). The GPS's lower update rate was one of the causes of inconsistencies, despite being mitigated through a Kalman filter used as a sensor fusion method to combine the acquired positions and odometry.
FIGURE 11. MR view during teleoperation toward the end of a trial. The superimposed red arrows indicate examples of misalignment between video and graphical elements.

• Video-Graphic Misalignment. A consequence of sensor inconsistencies, but also of errors and transmission delays, is misalignment in the MR image between video and graphical elements (responding to different sensors and data rates). During our tests we could occasionally observe the actual vehicle attitude being different from the one communicated to the operator's interface, which caused visual gaps between the texture-mapped graphical model and the camera-streamed images. Figure 11 shows an example of image misalignment towards the end of a trial. The average user score for image misalignment was "good" (score 1), with 11 test users judging that "incoming sensor and model information were received and displayed coherently". Occasional image misalignments in attitude and position appeared (noted by all users), but they were not reported as a relevant disturbance. We think this is clear evidence that the robot hardware and the ROS network worked well and can be expected to do a good job under similar conditions and robot speed. The graphic model showed no delay in generating views; nor were delays perceived when mapping sensor data to VAs.

FIGURE 12. Top diagram: median scores and standard error of our qualitative user study comparing user performance when teleoperating through a VR headset, 2D desktop and S3D desktop. The table below shows Student's t-test p-values of the coupled comparisons between the displays; values marked with an asterisk indicate significant differences (p < 0.05).

          Display Suitability   Image Shaking   Image Misalignment   Used Colors
HMD-3D    0.0001*               0.0087*         0.2261               0.1925
HMD-2D    0.0000*               0.0001*         0.0976               0.3799
3D-2D     0.0023*               0.1070          0.0279*              0.1251

B. VIEWING PERFORMANCE
Figure 12 shows the outcome of our qualitative within-subject user study. Users teleoperated the robot while either wearing the VR headset or observing through a less immersive desktop screen. The latter included two viewing modalities: standard 2D and stereoscopic 3D (S3D). The diagrams show median values and standard error. The figures also include a table containing Student's t-test p-values for the coupled comparisons between displays.
• Display Suitability. The VR headset performed significantly better than any other screen, whereas the 3D monitor scored significantly better than its 2D version. This outcome clearly shows: (1) the contribution of having S3D viewing (HMD and 3D desktop); (2) the contribution of having S3D viewing coupled with a wide viewing angle (HMD). The higher sense of isolation is also a contributing factor.
• Image Shaking. This is generally commented on as a minor issue. The headset's significantly worse performance compared to the monitors, and the similar scores of the two monitors, confirm the negative effect of the headset's greater involvement of the user's peripheral vision.
• Image Misalignment. Visible gaps between image elements occurred mostly between streamed video and graphics, whereas the graphical representations of sensor data and the 3D model displayed coherently. The 2D monitor performed best, the reason being that the misalignments were partially mitigated by the lack of depth awareness, which made them less noticeable or not perceived at all.
The difference was significant only between the two monitors. The overall performance was lower than for the previous indicators, but still with a positive average.
• Used Colors. They were positively judged in terms of the chosen hues and their mapping to the visual aids. They were commented on as well adapted to the driving context (featuring green grass and brown terrain). Scores were among the highest, with no significant differences.

C. VISUAL AIDS
Figures 13 and 14 show the outcome of our qualitative analysis of the VAs, analogously to figure 12.
• Traversable Area. It scored high on all displays with no significant difference among them. It was deemed necessary for the specific type of environment, which featured reduced ground visibility because of the vegetation, and very helpful during planning and driving. Users stated this VA provided immediate comprehension, facilitated by the use of colors and transparencies.
• Extended Camera View. The enhancement in terms of sense of presence was felt and positively judged. It was particularly appreciated on the headset because of the wide FOV. The graphic prevalence was judged excessive, and it was suggested it be reduced. This VA scored significantly better on the VR headset than on any other screen, whereas the 3D monitor scored significantly better than its 2D version.
• Exocentric View. It was commented on as powerful and useful, but as requiring a couple of seconds to mentally adapt to the switch from egocentric to exocentric viewpoints. The graphic is deemed improvable to further increase realism and presence, e.g. by including the wheels' movement. The 3D monitor performed significantly better than both the other displays. 3D viewing was judged very helpful. It was less appreciated on the headset because of some occurring visual deformation.
• Top-View. This VA was judged as occupying an excessive area of the overall view. Despite this, many users commented that this aid was often ignored and that it lacked an indication of the travelled path. A positive note was this VA being quicker to catch mentally than the exocentric view. Scores were low, with a zero median value and no significant difference between screens.
• Virtual Pointer. It was commented on as very useful to understand the surrounding environment during planning and under static conditions, whereas during navigation the generated occlusions sometimes hindered visibility. It scored a high median value on all screens, with no significant differences.
• Guide Arrow. This aid was judged to provide substantial help during navigation because it clearly indicated the driving direction.
Users also commented that its effectiveness was subject to the hue choice, as the VA needs to stand out from the current background. It was suggested that its color should be adapted to the vehicle's speed. The VA position on screen was judged suitable, and the option of changing it on demand was appreciated. 3D viewing played a major role, as confirmed by the significantly better performance of the headset and 3D monitor when compared to the 2D monitor.
• Centerline. It was commented on as the most needed help. It is intuitive and indicated well the position to hold during navigation and the path to follow. It also gave clear visibility of the underlying and surrounding environment. This VA achieved its highest scores on the headset and 2D monitor, and slightly lower on the 3D monitor, with no significant differences between screens.
• Robot Bonnet. It was judged very helpful, but in need of some graphic improvements to its shape to get the maximum score. 3D viewing once again made the greatest difference, especially in narrow passages, as it allowed users to clearly perceive the displacement between the robot and the closest obstacles. The headset and 3D monitor scored significantly better when compared to the 2D monitor.
• VAs Overall. The results showed relevant variations among the different visual aids and their effect on the tested displays. The VAs were overall judged useful and as providing real help for outdoor ground robot navigation. The VAs were judged to play a more relevant role in supporting navigation than the display. We deem this explains the high overall scores achieved by the 2D monitor and the non-significant difference between displays.
The proposed VAs, with the only exception of the top-view, were no doubt of great help to ground robot navigation. This was the case on any display. Their usefulness varied, with the regional VAs greatly appreciated for overview and planning, typically under static or low-motion conditions, whereas the directional VAs were highly valued during motion. The VR headset confirmed its great suitability for VAs that could exploit the wide FOV and head movement, therefore enhancing presence, and under static or nearly static conditions. This was particularly the case for the extended camera view and the exocentric view. The VR headset also showed its advantage on both regional and directional VAs in terms of 3D visualization. This was particularly the case for the guide arrow and the robot bonnet. The 3D monitor was appreciated for enhancing depth perception when compared to its equivalent 2D version. It was particularly appreciated for the exocentric view, guide arrow and robot bonnet. Testing on the 2D monitor was useful to see the effectiveness of the VAs per se (regardless of the specific display). This was particularly shown by the overall high VA scores and the non-significant difference between displays for the traversable area, virtual pointer, centerline and VAs overall.

FIGURE 13. Top diagram: median scores and standard error related to the Regional Visual Aids. The table below shows Student's t-test p-values of the coupled comparisons between the displays; values marked with an asterisk indicate significant differences (p < 0.05).

          Traversable Area   Extended Camera View   Exocentric View   Top-View   Virtual Pointer
HMD-3D    0.2442             0.0262*                0.0386*           0.1976     0.3967
HMD-2D    0.5000             0.0014*                0.3874            0.0774     0.0622
3D-2D     0.2080             0.0399*                0.0050*           0.2667     0.1181

FIGURE 14. Top diagram: median scores and standard error related to the Directional Visual Aids (and overall). The table below shows Student's t-test p-values of the coupled comparisons between the displays; values marked with an asterisk indicate significant differences (p < 0.05).

          Guide Arrow   Centerline   Robot Bonnet   VAs Overall
HMD-3D    0.1699        0.1097       0.2041         0.1175
HMD-2D    0.0001*       0.3412       0.0028*        0.0918
3D-2D     0.0007*       0.2080       0.0090*        0.3639
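For completeness, the sketch below illustrates how coupled comparisons of the kind reported in figures 12-14 can be computed: a paired Student's t-test between the per-user Likert scores given to two displays for the same indicator. The score arrays are placeholders (not the data collected in our trials), and the 0.05 significance threshold is an assumption made for the example.

# Illustrative sketch of a paired ("coupled") comparison between two displays.
# Placeholder scores only; they are not the questionnaire data of this study.
import numpy as np
from scipy import stats

def compare_displays(scores_a, scores_b, alpha=0.05):
    """Paired t-test between two displays rated by the same users."""
    t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
    return p_value, p_value < alpha   # significance at the chosen threshold

# Placeholder Likert scores (-3..3) for one indicator, one value per user.
hmd_scores   = np.array([3, 2, 3, 2, 1, 3, 2, 3, 2, 2, 3, 1])
mon2d_scores = np.array([1, 1, 2, 0, 1, 2, 1, 2, 1, 0, 2, 1])
p, significant = compare_displays(hmd_scores, mon2d_scores)
print("p-value = %.4f, significant = %s" % (p, significant))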
VI. CONCLUSIONS
A new mixed reality visual context for robot teleoperation interfaces was proposed, aimed at improving performance by increasing the operator's situational awareness. With the help of information visualization theories, the related literature (and the authors' experience), a way was devised to intuitively communicate the available sensor information concurrently with the streamed video input and environment knowledge. The interface combined immersive visualization, three-dimensional mixed reality and visual aids, and its design included: single-window representation, video-synthetic images, intuitive data viewing, and regional and directional visual aids.
The use of specific visual aids was proposed, designed to best represent different sensor data within and around the live video images. Eight visual aids were proposed, classified as Regional (traversable area, extended camera view, top view, exocentric view, and virtual pointer) and Directional (guide arrow, centerline, and robot bonnet).
The proposed design was implemented on a real system that included: a mobile platform (mobile robot with various sensors, a 3D camera and a processing unit), a flying vehicle (drone with GPS and camera), and an operator's unit (mixed reality interface, graphical processor and VR headset/3D monitor). All were linked through a communication network. The system was tested by twelve users through twenty-four trials on an uneven outdoor terrain presenting a few challenges.
The outcome was very encouraging because of the positive feedback and acceptance given by all users towards the interface performance and visual aids. All visual aids but one were positively judged, and deemed helpful and effective. Performance varied for different displays and viewing modalities, with the VR headset showing superior performance when either its wide FOV and head movement, or 3D viewing, could be exploited. This was the case with the extended camera view, exocentric view, guide arrow and robot bonnet. The 3D monitor also showed good performance over its 2D version because of the enhanced depth perception. Improvements were suggested for the virtual pointer, guide arrow and robot bonnet, while the top view was criticized in its current form.
Additionally, we plan to further mitigate sensor data misalignments and image shaking by respectively acting on the network, aiming at reducing communication delays, and on the robotic platform, by introducing shock absorbers. As mentioned, communication-related issues were not addressed in the present work and are thus to be faced in future developments.
We think three-dimensional mixed reality is the future of teleoperation visual interfaces, which, combined with visual aids, has great potential in effectively conveying diverse sensor information visually. Three-dimensional mixed reality and visual aids also marry well with the use of a VR headset, which has now become a mature technology.

REFERENCES
[1] M. R. Endsley, "Design and evaluation for situation awareness enhancement," in Proc.
Human Factors Soc. 32nd Annu. Meet., Los Angeles, CA, USA, pp. 97-101, 1988.
[2] K. Krückel, F. Nolden, A. Ferrein, I. Scholl, "Intuitive Visual Teleoperation for UGVs Using Free-Look Augmented Reality Displays," IEEE Int. Conf. on Robot. Autom. (ICRA), Seattle, WA, USA, pp. 4412-4417, 2015.
[3] A. Kelly, E. Capstick, D. Huber, H. Herman, P. Rander, and R. Warner, "Real-Time Photorealistic Virtualized Reality Interface for Remote Mobile Robot Control," in Robotics Research, vol. 70, Berlin, Germany: Springer, pp. 211–226, 2011.
[4] D. Q. Huy, I. Vietcheslav, G. S. G. Lee, "See-through spatial augmented reality - a novel framework for human-robot interaction," Proc. 3rd Int. Conf. Contr., Autom. and Robot., Nagoya, pp. 719-726, 2017.
[5] S. Livatino, G. Muscato, and F. Privitera, "Stereo Viewing and Virtual Reality Technologies in Mobile Robot Teleguide," IEEE Trans. Robot., vol. 25, no. 6, pp. 1343-1355, Dec. 2009.
[6] J. A. Frank, S. P. Krishnamoorthy, V. Kapila, "Toward Mobile Mixed-Reality Interaction with Multi-Robot Systems," IEEE Robot. Autom. Letters, vol. 2, no. 4, Oct. 2017.
[7] S. Livatino, F. Bannò and G. Muscato, "3-D integration of robot vision and laser data with semiautomatic calibration in augmented reality stereoscopic visual interface," IEEE Trans. Ind. Inf., vol. 8, no. 1, pp. 69-77, Feb. 2012.
[8] J. Xiao, P. Wang, H. Lu, and H. Zhang, "A three-dimensional mapping and virtual reality-based human–robot interaction for collaborative space exploration," International Journal of Advanced Robotic Systems, 17(3), 2020.
[9] N. Zaman, A. Tavakkoli, and C. Papachristos, "Tele-robotics via An Efficient Immersive Virtual Reality Architecture," 3rd Int. Workshop on Virtual, Augmented, and Mixed Reality for HRI, Cambridge, 2020, preprint.
[10] T. Kot, P. Novák and J. Bajak, "Using HoloLens to create a virtual operator station for mobile robots," 2018 19th International Carpathian Control Conference (ICCC), Szilvasvarad, 2018, pp. 422-427.
[11] S. Livatino, L. T. De Paolis, M. D'Agostino, A. Zocco, A. Agrimi, A. De Santis, L. V. Bruno, M. Lapresa, "Stereoscopic Visualization and 3D Technologies in Medical Endoscopic Teleoperation," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 525-534, 2015.
[12] J. Jankowski and A. Grabowski, "Usability Evaluation of VR Interface for Mobile Robot Teleoperation," International Journal of Human Computer Interaction, vol. 31, no. 12, pp. 882–889, Dec. 2015.
[13] A. Nayyar, V. Puri, N. Nhu, and D. N. Le, "Smart surveillance robot for real-time monitoring and control system in environment and industrial applications," in Information Systems Design and Intelligent Applications. Singapore: Springer, vol. 2018, pp. 229–243.
[14] R. S. Batth, A. Nayyar and A. Nagpal, "Internet of Robotic Things: Driving Intelligent Robotics of Future - Concept, Architecture, Applications and Technologies," 2018 4th International Conference on Computing Sciences (ICCS), Jalandhar, 2018, pp. 151-160, doi: 10.1109/ICCS.2018.00033.
[15] I. E. Sutherland, "A head-mounted three dimensional display," Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, Part I, pp. 757-764, 1968.
[16] "Z800 3DVisor," eMagin, [Online]. Available: https://en.wikipedia.org/wiki/Z800_3DVisor. [Accessed 2 May 2020].
[17] "Oculus Rift DK1," Facebook, [Online]. Available:
Salvatore Livatino received the M.Sc. degree in Computer Science from the University of Pisa, Italy, in 1993 and the Ph.D. degree in Computer Science and Engineering from Aalborg University, Denmark, in 2003. He was a Researcher with the Scuola Superiore Sant'Anna, Pisa (1993-'97), the University of Leeds, U.K. (1995), INRIA Grenoble, France (1996), and the University of Edinburgh, U.K. (2001). He worked for 12 years at Aalborg University, first as a Research Fellow, then as an Assistant Professor, and finally as an Associate Professor. He is currently a Reader in Virtual Reality and Robotics at the University of Hertfordshire, Hatfield, U.K. His teaching experience has mostly been within problem-based learning and multidisciplinary education. He is the author of several journal and conference papers and has contributed to many European and U.K. projects. His research interests are in virtual and augmented reality user interfaces for tele-exploration and tele-operation, with a focus on stereoscopic-3D visualization and immersive technology, computer vision and graphics algorithms, and applications in telerobotics, telemedicine control panels and dashboards.

Dario Guastella received the master's degree in Automation Engineering and the Ph.D. degree in Systems and Computer Engineering from the University of Catania, Italy, in 2015 and 2019, respectively. He is currently a Postdoctoral Research Fellow in the Robotic Systems Group at the Department of Electrical, Electronics and Computer Engineering, University of Catania. His research activity focuses on cooperative mobile robots (both ground and aerial vehicles), terrain traversability analysis and artificial intelligence for autonomous navigation.

Giovanni Muscato received the Electrical Engineering degree from the University of Catania, Catania, Italy, in 1988. After graduating, he was with the Centro di Studi sui Sistemi, Turin, Italy. In 1990, he joined the DIEEI of the University of Catania, where he is currently a Full-Time Professor of robotics and automatic control and, since 2018, Director of the Department. His current research interests include service robotics and the cooperation between ground and flying robots. He was the coordinator of the EC project ROBOVOLC and is the local coordinator of several national and European projects in robotics. He is the author of more than 300 papers in scientific journals and conference proceedings and of three books in the fields of control and robotics. Prof. Muscato is a member of the Board of Trustees of the Climbing and Walking Robots (CLAWAR) Association and a Senior Member of the IEEE. Web site: www.muscato.eu

Vincenzo Rinaldi received the B.S. and M.S.
degrees in Computer Engineering from the University of Catania, Italy, in 2015 and 2018, respectively. From 2018 to 2019 he was a Research and Development Engineer at Arm23 SRL, Catania, Italy, focusing on the development of mixed reality applications. Since 2020 he has been a VR/AR Application Specialist at the Leverhulme Research Centre for Forensic Science, University of Dundee, Dundee, United Kingdom. His work supports research into the assessment of imaging techniques used for crime scene investigation, including the development of novel methods for navigating and comparing 3D reconstructions in virtual reality.

Luciano Cantelli received the M.S. degree in Computer Science Engineering and the Ph.D. degree in Electronic and Automation Engineering from the University of Catania, Italy, in 2003 and 2007, respectively. He is currently a Research Fellow in Robotics and Automatic Control with the Dipartimento di Ingegneria Elettrica Elettronica e Informatica (DIEEI), University of Catania, Italy. Since 2003 he has been involved as a researcher in several national and European projects in robotics, including the EC projects ROBOVOLC (robot for volcano exploration), RAPOLAC (rapid production), and TIRAMISU (humanitarian demining). His research interests include industrial and service robotics, navigation and localization systems for mobile robotics, mechatronics, multi-sensor data fusion, robotic assistive technologies, and bioengineering. He is a cofounder of Etnamatica S.r.l., a robotics spin-off company, where he serves as CEO and research and development manager.

Carmelo D. Melita obtained the Ph.D. degree in Electronics, Automation and Complex Systems Control Engineering from the University of Catania, with a thesis entitled "Unmanned Aerial Systems for Volcanic Sites Inspection". From February 2009 to 2011 he was a research fellow at the University of Catania working on the subject "Robots control methodologies and performances evaluation". From September 2005 to January 2006 he developed control algorithms and software for the Spiderbot robot, a climbing robot for the inspection of industrial tanks. He has been involved in several national and European projects (ROBOVOLC, MOW-BY-SAT, TIRAMISU). His recent research activities have mainly focused on the control and navigation of autonomous mobile robots (ground, flying and underwater). In 2011 he founded Etnamatica srl, where he is currently the CTO.

Alessandro Caniglia received the B.Sc. in 2008 and the M.Sc. in 2011, both in Computer Engineering, from the University of Catania, Italy. In 2010, he was a visiting researcher at the University of Hertfordshire under the supervision of Dr. Salvatore Livatino, working on the design and development of teleguide systems for mobile robots using augmented/virtual reality user interfaces. He is currently working at Microsoft as an Azure Technical Advisor and trainer.

Riccardo Mazza graduated in Computer Sciences from the University of Pisa, Italy, in 1997. He holds a Ph.D. in Communication Sciences from the University of Lugano, Switzerland, obtained in 2004, with a dissertation on the visual representation of students' data in Web-based distance education, pioneering research in learning analytics as early as 2001. From 1997 to 2020 he was a researcher at the Institute of Communication Technologies of the University of Lugano. Since 1999 he has been a lecturer and researcher in the Department of Innovative Technologies at the University of Applied Sciences and Arts of Southern Switzerland (SUPSI).
He has been involved in a number of national and European research projects. He published the book "Introduction to Information Visualization" (Springer-Verlag, 2009), one of the few reference books in the discipline. His main research interests include information and data visualization, learning analytics and distance education.

Gianluca Padula received the B.Sc. in Electrical Engineering in 1996 (University of Catania, Italy), the M.Sc. in Biomedical Engineering in 2004 (Polytechnic of Milan, Italy) and the Ph.D. in Biophysics and Molecular Biology in 2010 (Medical University of Lodz, Poland). He has since been working at the Medical University of Lodz, Poland, first as a Lecturer in Physics, Cardiology and Biophysics, and then as a Specialist in Experimental and Clinical Physiology. In 2012 he became the Director of the Academic Laboratory of Movement and Human Physical Performance, Dynamolab, at the Medical University of Lodz. Since 2020 he has been a member of the Committee of Rehabilitation, Physical Education and Social Integration of the Polish Academy of Sciences, Poland.