A Survey of Recent Machine Learning Solutions for Ship Collision Avoidance and Mission Planning

Pouria Sarhadi*, Wasif Naeem**, Nikolaos Athanasopoulos**
* School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, UK (e-mail: p.sarhadi@herts.ac.uk)
** School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, UK (e-mails: w.naeem@ee.qub.ac.uk, n.athanasopoulos@qub.ac.uk)

Abstract: Machine Learning (ML) techniques have gained significant traction as a means of improving the autonomy of marine vehicles over the last few years. This article surveys the recent ML approaches utilised for ship collision avoidance (COLAV) and mission planning. To this end, following an overview of the ever-expanding exploitation of ML for maritime vehicles, essential topics in the mission planning of ships are outlined. In addition, notable papers with direct and indirect applications to the COLAV subject are technically reviewed and compared. Critiques, challenges, and future directions are also identified. The outcome clearly demonstrates the thriving research in this field, even though commercial marine ships incorporating machine intelligence that are able to perform autonomously under all operating conditions are still a long way off.

Keywords: Machine learning, deep learning, mission planning, collision avoidance, autonomous ship, risk analysis, COLREGs.

1. INTRODUCTION

The use of Artificial Intelligence (AI) and Machine Learning (ML) has gained momentum for a variety of challenges around autonomous vehicles and related fields (Ma et al., 2020, Aradi, 2020, Kuutti et al., 2021). Autonomous and electric vehicles are taking the lead after smartphones as the main outlets to demonstrate and promote digital technology. Maritime vehicles, in light of their extensive use in world trade, are widely accepted as efficient transportation systems and are not exempt from this breakthrough. Some studies, such as (Campbell et al., 2012), have suggested the importance of AI exploitation in reducing human errors and in preventing collisions of maritime surface vehicles. Further, the significant number of recent industrial projects pitching autonomous functions in ship mission planning and control emphasises the importance of this topic. Mayflower (2022), Yara (2021), L3HARRIS (2021), Artemis (2020), Cetus (2020), and MAXCMAS (2018) are some of the most recent examples of prominent research projects aiming at the development of autonomous and high-tech vessels in a range of maritime applications. A simple literature search reveals a significant increase in research publications dealing with ML applications to autonomous ships. Classical approaches in path planning and collision avoidance of ships are continuously investigated in (Tam et al., 2009, Campbell et al., 2012, Huang et al., 2020, Zhang et al., 2021a, Vagale et al., 2021a and 2021b, Li et al., 2021, Ozturk et al., 2022). Nevertheless, this survey is concerned with a different aspect, namely, the recent advances in ML techniques for mission planning and collision avoidance of maritime vessels. The applications of ML techniques to other (ground, underwater, and aerospace) autonomous robots are reported in comprehensive research articles.
Recent studies include motion planning and control of autonomous cars (Aradi, 2020, Kuutti et al., 2021, and Kiran et al., 2021), intelligent transportation systems (Haydari et al., 2020), robotics (Kroemer et al., 2021, and Sun et al., 2021), Unmanned Aerial Vehicles (UAVs) (Fraga-Lamas et al., 2019), Autonomous Underwater Vehicles (AUVs) (Hadi et al., 2021), and spacecraft control system design (Shirobokov et al., 2021). However, despite the considerable technological and commercial relevance of autonomous ships, this topic is less explored. In this paper, recent ML solutions for ship collision avoidance and mission planning are technically surveyed. It should be noted that the focus of this paper is on mission planning and COLAV applications. In addition, the references provided in this article could be considered as a bibliography for the recent advances in autonomous ship design. There are several terms in the literature used to describe these vehicles, including Unmanned Surface Vehicles (USV), Autonomous Surface Vehicles (ASV), and Maritime Autonomous Surface Ships (MASS); nonetheless, USV is preferred in this paper. Following a brief introduction to ML and mission planning for USVs, state-of-the-art advancements and research in the field are presented. An attempt is made to categorise and compare the existing solutions, draw out their shortcomings, and envision the upcoming steps. The rest of this paper is organised as follows. Section 2 outlines the application of ML techniques to USVs. In Section 3, areas to be addressed in the mission planning of an autonomous vessel are introduced. Sections 4 and 5 review and compare the existing research with direct and indirect applications to the planning problem, respectively. In Section 6, achievements, challenges, and future directions in this research topic are identified based on the surveyed papers. Finally, Section 7 concludes the paper.

2. BACKGROUND: MACHINE LEARNING AND ITS APPLICATION TO USVs

Even though ML and AI are not new topics in engineering or data science, recent progress in Deep Learning (DL) techniques could enable AI usage for complex autonomous functions (Goodfellow et al., 2016, Li, 2017, Sutton and Barto, 2018). In particular, Deep Reinforcement Learning (DRL), due to its affinity with control theory and its ability to learn via feedback from a reward function, occupies a prominent position in intelligent mission planning and control (Kiran et al., 2021). As a result of their meaningful interpretation, the terms agent, environment, and action in RL can be substituted by the controller, controlled system (or plant), and control signal, respectively (Sutton and Barto, 2018). In this study, it was found that DRL has been dominant in the ship mission planning topic. However, other AI solutions are also proposed in the literature. In fact, the AI, ML, DL, and RL algorithms used in control systems and autonomous vehicles share interconnections. Fig. 1 depicts the intersections amongst those topics in a Venn diagram, as an extension of the one in (Goodfellow et al., 2016), to consider control systems and autonomous vehicle algorithms. The present study considers direct and indirect ML applications for the mission planning of USVs in the past five years. The direct applications consider those algorithms that have been exploited for planning and collision avoidance purposes (Section 4).
On the other hand, the second category encompasses techniques which are not directly used for collision avoidance but have the potential to be applied to other relevant topics such as risk assessment and global planning (Section 5). Those definitions and use-cases will be discussed in more detail within Sections 3-5. It is worth noting that there are fascinating papers that consider topics such as control or perception in USVs; however, they are not the focus of this survey. For instance, Martinsen et al. (2022) have implemented RL-based nonlinear model predictive control for trajectory tracking of fully actuated vessels. A Deep Q-Network (DQN) is exploited for the system identification part of the controller. The proposed technique was tested on the ReVolt USV during dynamic positioning along the sides of a square. In another example, Du et al. (2022) developed a Lyapunov boundary deep deterministic policy gradient (DDPG) for a USV for vessel tracking and interception tasks. In the proposed strategy, a combination of line of sight (LOS) proportional guidance and neuron adaptive learning control was employed in the own ship (OS) to pursue a target ship (TS) and intercept it. The technique was implemented in a Gazebo-based virtual reality simulator. Other cases are also reported for formation control of USVs in Wang et al. (2021), auto-docking in Gjærum et al. (2021a, b), boat autopilots in Cui et al. (2019) and (2021), and path following control in Gonzalez-Garcia (2020). Nevertheless, in this article, mission planning (described in Section 3) applications are investigated, and the reviewed papers are categorised into the aforementioned two groups. It is important to note that this review concentrates on key relevant papers published since 2018. Due to space limitations, detailed mathematical descriptions of the algorithms are out of the scope of this paper; however, the interested reader can avail of the references provided. Based on a thorough search of this topic, there appears to be a significant increase in published research in this area. Fig. 2 shows the number of published articles in recent years. Based on this figure, and considering the time of this paper’s preparation, an exponential increase is anticipated in the upcoming years. It should also be mentioned that popular science databases such as Google Scholar and Scopus, and the main publishers such as IEEE, ScienceDirect, etc., were used in this research. This is due to the diverse terminologies and titles used to denote these systems (USV, ASV, MASS, etc.). Fig. 3 illustrates a word cloud for the keywords utilised in the reviewed papers, which reveals the diversity of keywords used in this topic. Words in Fig. 3 will be further discussed in the following sections.

Fig. 1. A Venn diagram illustrating the relationship between AI, ML, DL, RL, algorithms of control systems and autonomous vehicle algorithms.

Fig. 2. Number of yearly publications on this survey's topic, predicting exponential growth in the upcoming years.

3. MISSION PLANNING PROBLEM OF MARITIME VESSELS

In general, the planning or mission planning problem is related to generating feasible paths or trajectories to be tracked by a vessel. The mission may be pre-defined by a human operator or modified during the journey, either remotely or by the onboard crew. To this end, several key areas, specified and grouped in Fig. 4, should be considered. Those topics are outlined in this section, and they are used as the foundation of the comparison between the reviewed ML techniques.
3.1 Global Planning

Global planning algorithms generate a feasible set of waypoints for a mission. Different aspects such as optimality (in terms of path, time, fuel consumption, etc.), adherence to maritime rules, and feasibility based on an updated map, to name a few, are considered at this level. As an instance, Fig. 5 shows how successive waypoints (WP1-20) are generated to traverse between Oslo and Trondheim (Zhang et al., 2021a).

3.2 Local Planning

Local planners are algorithms that plan the motion in terms of detailed paths or trajectories to be tracked by the vessel and define how to traverse between global waypoints. Note that the difference between a path and a trajectory lies in the inclusion of time in trajectory planning, whereas, in path planning, the time to reach a certain point is less important. In practice, a straight line is the shortest path between two waypoints, and algorithms are designed to minimise the Cross Tracking Error (CTE) to adhere to the shortest path. This task is sometimes called path or trajectory following. Nonetheless, traversing a straight-line path between any two waypoints is not always possible due to the presence of obstacles and environmental disturbances. Regardless, collision avoidance could also be needed even if LOS is maintained.

3.3 Collision Avoidance (COLAV)

The goal of COLAV is to modify the planned path or trajectory in such a way that prevents any collision with an obstacle. Some studies classify COLAV as a sub-task of local planning; however, because of its different nature in classical algorithms, it is categorised here as a separate subtask of motion planning. Those obstacles could be static, including isles, buoys, maritime infrastructure, etc., or dynamic, such as other vessels, drifting objects, animals, etc. An illustration of path following and COLAV is depicted in Fig. 6, where OS is the own ship, ψ is the desired heading angle, and WP(i) = [X_wp(i), Y_wp(i)] defines the waypoints, with i being the index. As can be seen from Fig. 6, the planner should meet the waypoint tracking objective whilst avoiding collision with the obstacles simultaneously.

Fig. 3. Keyword cloud in relevant papers of this survey.

Fig. 4. Different subjects to be considered in the mission planning of a ship.

Fig. 5. Global waypoints planned ahead for a maritime journey from Oslo to Trondheim (from Zhang et al., 2021a).

Fig. 6. Path planning and collision avoidance in local planning.

3.4 COLREGs

In maritime transportation, propelled vessels approaching each other should adhere to certain rules. These rules are defined by the International Maritime Organisation, IMO (1972), and named the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs). Rules 5-8 and 13-17 can be directly exploited in the mission planning of ships. Fig. 7 depicts a typical process to apply those COLREGs (modified from Namgung and Kim, 2021). Based on Fig. 8 and the COLREGs rules, each ship should travel at a safe speed and keep a look-out for close encounters. In case of a risk involving other ships in the vicinity, one or more of the three fundamental encounter situations, including Overtaking, Head-on, and Crossing, could be identified. Appropriate action in the form of give-way or stand-on should be undertaken to avoid any collision. However, translating these rules into algorithms is challenging, as the rules were originally written for human consumption. In addition, these rules should be learned by, or incorporated into, the ML approaches.
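To give a flavour of how such rules can be translated into an algorithm, the following minimal Python sketch labels an encounter from the target ship's relative bearing and relative course; the angle thresholds and helper functions are illustrative assumptions rather than an authoritative reading of the COLREGs.

```python
# Illustrative sketch only: labelling an encounter from the target ship's
# relative bearing and relative course. Thresholds are commonly quoted
# approximations, not an authoritative reading of the COLREGs.
import math

def relative_bearing(own_pos, own_heading_deg, target_pos):
    """Bearing of the target from the own ship's bow, in [0, 360) degrees.
    Positions are (north, east) coordinates in a local flat frame."""
    d_north = target_pos[0] - own_pos[0]
    d_east = target_pos[1] - own_pos[1]
    absolute = math.degrees(math.atan2(d_east, d_north))
    return (absolute - own_heading_deg) % 360.0

def classify_encounter(rel_bearing_deg, rel_course_deg):
    """Rough encounter labelling; rel_course_deg is the target's course
    relative to the own ship's course."""
    if 112.5 < rel_bearing_deg < 247.5:
        return "overtaking"          # one vessel approaching from the stern sector
    if (rel_bearing_deg < 6.0 or rel_bearing_deg > 354.0) \
            and abs((rel_course_deg % 360.0) - 180.0) < 10.0:
        return "head-on"             # nearly ahead, nearly reciprocal courses
    return "crossing"                # everything else with collision risk

print(classify_encounter(relative_bearing((0, 0), 0.0, (2000, 500)), 200.0))
```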
3.5 Risk Assessment

One of the crucial factors to be considered in mission planning is risk assessment. Risk is usually defined in terms of an index (i.e., the Collision Risk Index, CRI) that shows how likely a hazardous event such as a collision is to happen. Different approaches are proposed to calculate the collision risk index (Huang et al., 2020a, Pietrzykowski and Wielgosz, 2021); however, most of them are based on the Closest Point of Approach (CPA) analysis. The CPA determines how close two ships would come to each other if they both continued to move at the same speed and course. The Distance to CPA (DCPA) and Time to CPA (TCPA) form the basis of most risk assessment techniques (Huang et al., 2020a). The computation of those parameters is based on the relative velocity vector (Vrel) and its relative angle (α) between the OS and TS, as shown in Fig. 8. For further information about the calculation procedure of DCPA and TCPA, one can refer to Sarhadi et al. (2022).

3.6 Manoeuvring Constraints

Ship manoeuvring constraints such as non-holonomic behaviour, underactuated dynamics, system time response, control limitations, etc. are important issues that should be carefully weighed in local planning. Those items are considered in the ship modelling and dynamics prediction approach. In the literature, diverse models are utilised to model and predict the behaviour of the OS and TS (Huang et al., 2020a). In addition, the closed-loop control behaviour is another item to be included in planning to generate pragmatic paths. A highly precise model can result in a more practical machine learning algorithm. Hence, a proper planning approach should consider the aforementioned manoeuvring limitations. As mentioned, research in the field of risk assessment is vibrant, and the interested reader is referred to Chen et al. (2019a), Huang et al. (2020a, b), and Du et al. (2021) for further information about risk analysis in ship manoeuvres.

3.7 Environmental Disturbances

To improve the precision of vessel models, environmental disturbances should be taken into account. To this purpose, high-fidelity ship models include the effects of waves, winds, and currents (Fossen, 2011). Therefore, due to the impact of environmental disturbances on vessel motion, they should be modelled in the learning procedure or, at the very least, tested after the learning phase in ML-based mission planning and COLAV design. Other topics can be considered in the algorithm design; however, it is believed that Sections 3.1-3.7 form the touchstone for comparison between the ML approaches surveyed in this paper.

4. ML WITH DIRECT APPLICATIONS TO PATH PLANNING AND COLLISION AVOIDANCE

In this section, ML applications that could be directly utilised for (local) path planning and collision avoidance for autonomous vessels are discussed.

Fig. 7. The process for COLREGs-based decision making.

Fig. 8. The CPA illustration between OS and TS to calculate TCPA and DCPA parameters.

Zhao and Roh (2019) proposed DRL for the collision avoidance of multiple USVs. In that study, two layers of fully connected (FC) multilayer perceptron are employed with a proximal policy optimization (PPO) learning algorithm. The vessels are modelled by 3DOF equations, and disturbances were not considered. The output of the ML algorithm is the rudder angle, for which amplitude and rate constraints are also considered in the actuation system. The reward function comprises reaching the goal, heading error, cross tracking error, drift, collision with obstacles, and COLREGs.
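To illustrate the structure of such multi-term rewards, a minimal Python sketch is given below; the weights, scales, and field names are hypothetical illustrations and are not values reported by Zhao and Roh (2019) or any other surveyed paper.

```python
# A minimal sketch of a weighted multi-term reward of the kind listed above.
# The weights, scales and field names are hypothetical, not values taken
# from Zhao and Roh (2019) or any other surveyed paper.
WEIGHTS = dict(goal=10.0, heading=0.1, cte=0.05, drift=0.02,
               collision=20.0, colregs=1.0)

def reward(state):
    """state: dict with goal_reached / collided / colregs_violation flags and
    heading_err (rad), cte (m), drift (m/s) magnitudes."""
    r = 0.0
    r += WEIGHTS["goal"] * float(state["goal_reached"])          # sparse terminal bonus
    r -= WEIGHTS["heading"] * abs(state["heading_err"])          # keep the desired course
    r -= WEIGHTS["cte"] * abs(state["cte"])                      # stay close to the path
    r -= WEIGHTS["drift"] * abs(state["drift"])                  # penalise sideways drift
    r -= WEIGHTS["collision"] * float(state["collided"])         # hard safety penalty
    r -= WEIGHTS["colregs"] * float(state["colregs_violation"])  # rule compliance
    return r

print(reward(dict(goal_reached=False, heading_err=0.2, cte=15.0,
                  drift=0.3, collided=False, colregs_violation=False)))
```

Defining the relative weights of such terms is itself non-trivial, a point revisited at the end of this section.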
Simulation results in crossing, head-on, and overtaking scenarios for multiple ships are presented. Additionally, the control signal time history is shown, which reveals aggressive behaviour. This study is an expansion of the research in Zhao et al. (2019). In Cheng and Zhang (2018), a concise deep reinforcement learning obstacle avoidance algorithm is developed for underactuated unmanned marine vessels. The yaw moment and the surge propulsion force are considered as the action space for the ML algorithm. To overcome the discrete nature of DQN, the actions are selected from a space consisting of a one-unit increase, a one-unit decrease, or the previous value. The defined reward is based on the distance to the goal and the obstacle, drift from the straight path, and speed at the goal position. Some preliminary simulations are carried out in a limited space with several static obstacles. Risk and COLREGs topics are ignored in the planning, and excessive input efforts were visible in the exhibited simulations. Xu et al. (2019) developed a DDPG for the same problem. For data generation, a dynamic model was utilised, whilst thrust and turning moments form the action space. Distance to target, distance to obstacles, maximum lateral velocity to prevent drift, and speed reduction near the goal are the reward function indices. Risk and COLREGs were not taken into account in planning, and simulations were exhibited for static obstacles. In Xie et al. (2019), a Model Predictive Control (MPC) via an Improved Q-learning Beetle Swarm Antenna Search (I-Q-BSAS) and an ANN (to estimate an inverse model for the optimal policy approximation) are merged for multi-ship collision avoidance. A combination of CRI, LOS tracking error, and rudder optimisation was defined for the MPC optimisation problem, which produces the rudder angle as the output. COLREGs and CPA-based risk analysis were also incorporated in planning. The proposed approach was tested on the KVLCC2 ship model in small- as well as large-angle crossing, overtaking, and head-on scenarios. In addition, I-Q-BSAS was compared against various optimisation techniques, showing its superior performance. Moreover, in Xie et al. (2020), a model-free RL-based multi-ship collision avoidance algorithm was developed which combined an asynchronous advantage actor-critic (A3C) algorithm, a long short-term memory neural network (LSTM), and Q-learning. The LSTM part is exploited to accelerate the model-free A3C learning via adaptive Q-learning decisions. The reward function embraces a CRI, the control action, and the LOS tracking error, and the algorithm generates the required rudder angle for path following and COLAV. It should be noted that COLREGs were not considered in this research. Meyer et al. (2020a) conducted a comparative study between state-of-the-art DRL techniques, i.e., PPO, DDPG, Actor-Critic using Kronecker-Factored Trust Region (ACKTR), and Asynchronous Advantage Actor-Critic (A3C). The comparison was carried out for the collision avoidance of a 3DOF model of Cybership II in the OpenAI Gym Python toolkit. Better performance for PPO has been reported for diverse path following and collision avoidance scenarios. Nevertheless, the authors outlined the challenge of applying networks with a massive number of parameters to safety-critical systems. The same researchers presented a PPO-based COLREGs-compliant collision avoidance for the Cybership II USV model in Meyer et al. (2020b).
A reward function that considers path following (based on CTE and speed) and collision avoidance (separately for static and dynamic obstacles) was proposed. The performance of this approach was tested on map-based static obstacles and some COLREGs situations. Interesting simulation scenarios were illustrated, since some of them were carried out on real maps. However, the environmental disturbances were ignored, which is a shortcoming. In Meyer et al. (2022), the former research is improved to include a CPA-based risk analysis in COLAV decision making. In Liu and Jin (2020), a combination of deep Q-network (DQN), double DQN, and duelling DQN has been proposed for ship collision avoidance. Seven discrete speed and angular rate choices were separately considered as action spaces. A positive reward for reaching the goal and negative rewards for hitting an obstacle or stopping were considered. Nevertheless, risk and COLREGs are not taken into account in decision making. For simulation, point objects in a 2D gaming environment were deployed. In Chun et al. (2021), the research in Zhao and Roh (2019) has been comprehensively extended to incorporate risk analysis in the planning phase. The CPA and ship domain are simultaneously utilised for risk assessment, whereas PPO is the main learning algorithm. The defined reward function is based on two categories: 1) path following, including reaching the goal, CTE, and check points (waypoints); 2) COLAV, which includes risk-based collision avoidance and COLREGs. The control actions are selected from a discrete vector containing zero, minimum, and maximum rudder rates. Although the proposed technique was simulated in a large area, it was restricted to one similar azimuth. The proposed technique is compared to the conventional A* algorithm, and better performance is claimed. Sawada et al. (2021) have proposed employing an LSTM in PPO-based DRL to generate continuous COLAV actions. The inputs to the network are fed from a grid sensor (a virtual sensor to perceive the OS and TS locations), waypoints, and the own ship's information. An obstacle zone by target (OZT) is established to consider the risk of collision. A reward function based on the distance to waypoints, moving to starboard, yaw stability, COLAV, and arriving at the target point is developed. The outputs of the ML algorithm are the heading command and rudder angle. COLREGs are incorporated only as a preference for moving to starboard. A good practice in this article is the use of 22 standard encounter test scenarios called the Imazu problem. The same benchmark problem is exploited in Zhai et al. (2022) to assess the developed COLAV technique. In the future, elaborate test scenarios are required to compare the safety and performance of emerging proposed algorithms (either ML or classical).
Table 1. Comparison among selected research on ML-based path planning and COLAV for USVs

Zhao and Roh (2019). Application: COLAV for multiple USVs. ML approach: two-layer FC DNN, PPO. Learning data source: dynamic 3DOF simulation. ML output: rudder angle. Risk: no. Performance indices: reaching the goal, heading error, cross error, drift, collision, COLREGs. Testing scenarios: crossing, head-on, overtaking.

Xie et al. (2019). Application: model predictive ship collision avoidance. ML approach: MPC, I-Q-BSAS, ANN. Learning data source: dynamic 3DOF model of the KVLCC2 ship. ML output: rudder angle. Risk: yes (CPA). Performance indices: CRI, LOS tracking error, and rudder optimisation. Testing scenarios: small- and large-angle crossing, overtaking, and head-on.

Woo and Kim (2020). Application: COLAV for USVs. ML approach: DQN, CNN, SMDP, VO. Learning data source: identified model of the WAM-V USV and the real USV. ML output: course and velocity. Risk: yes (CPA). Performance indices: reward for path following (CTE), collision avoidance (TCPA and course), and COLREGs. Testing scenarios: crossing, head-on, overtaking.

Chun et al. (2021). Application: path planning and COLAV using ML considering risk and COLREGs. ML approach: DNN, PPO. Learning data source: dynamic model (3DOF) with wind and current. ML output: rudder angle. Risk: yes (ship domain + CPA). Performance indices: reaching the goal, cross error, check points (waypoints), risk-based collision avoidance, COLREGs. Testing scenarios: crossing, head-on, overtaking.

Meyer et al. (2021). Application: COLREGs-compliant COLAV. ML approach: PPO. Learning data source: nonlinear model of Cybership II. ML output: surge force and yaw torque. Risk: no. Performance indices: path following (CTE and speed) and COLAV (static and dynamic, based on COLREGs). Testing scenarios: map-based static obstacles and COLREGs.

Li et al. (2021). Application: COLREGs-based collision avoidance. ML approach: APF-DQN. Learning data source: not considered. ML output: nine discrete heading commands. Risk: no. Performance indices: reward based on the APF's attractive and repulsive forces. Testing scenarios: static and dynamic obstacles in a small dimension.

Sawada et al. (2021). Application: COLAV via a novel risk index. ML approach: DRL with LSTM. Learning data source: Nomoto model. ML output: heading command, rudder angle. Risk: yes (OZT). Performance indices: reward based on distance to waypoint, moving to starboard, yaw stability, collision, and arriving. Testing scenarios: up to three ships in COLREGs and the Imazu problem.

Xu et al. (2022a). Application: USV motion planning and COLAV. ML approach: three-layer FC DNN, DDPG. Learning data source: dynamic 3DOF model. ML output: thrust, rudder. Risk: yes (CPA). Performance indices: distance to target, heading error, collision, COLREGs, reaching the goal, speed. Testing scenarios: head-on, crossing, overtaking in a simulator.

Zhou et al. (2022). Application: USV collision avoidance. ML approach: DQN, MDQN, MDDPG. Learning data source: dynamic 3DOF model. ML output: rudder. Risk: no. Performance indices: arriving at the target and crash. Testing scenarios: head-on, crossing, overtaking in a small dimension.

Zhai et al. (2022). Application: only autonomous COLAV. ML approach: DDQN. Learning data source: Nomoto model. ML output: course alteration. Risk: yes (CPA). Performance indices: reward based on rudder activity, course change, path deviation, risk, and COLREGs. Testing scenarios: up to three ships in COLREGs and the Imazu problem.

Heiberg et al. (2022). Application: path following and COLAV. ML approach: PPO. Learning data source: dynamic 3DOF model of CyberShip II. ML output: surge force and yaw torque. Risk: yes (CPA). Performance indices: path following (CTE and speed) and COLAV (static and dynamic, based on COLREGs and risk). Testing scenarios: COLREGs scenarios and static obstacles.

Zhai et al. (2022) have adopted the discrete Double DQN (DDQN) to solve the COLAV-only problem. It is assumed that the waypoint following is functioning and only heading alterations are required to avoid any collision. Therefore, the outputs of the ML are thirteen heading commands. A reward function based on rudder activity, course change, path deviation, risk, and COLREGs is developed. The Nomoto model (Fossen, 2011) is utilised to train the ML. As mentioned, the performance of the proposed technique is verified in the Imazu problem tests. Fan et al. (2022) have also used DQN to generate COLAV commands. The discrete commands are in terms of rudder rate modifications. Rewards are allocated in two sets: i) final rewards, i.e., arriving at the destination and COLREGs; ii) sample rewards, i.e., tracking and distance to the course. The Norrbin model (Fossen, 2011) of the Lan Xin USV is employed to train the ML. Results in some COLREGs scenarios with one and multiple TSs are presented to demonstrate the performance of the proposed algorithm, and the implementation is considered as future work. In Zhou et al. (2022), an improved DQN, in terms of a Modified Deep Q-Network (MDQN) and a Modified DDPG (MDDPG), was developed for the collision avoidance of USVs. The memory pool, success pool, and target network were modified to smooth the training process. A relatively simple reward function is employed that only considers the arrival at the target and the obstacle avoidance objectives. The proposed schemes were compared for some COLREGs scenarios. Superior performance for MDDPG is reported; however, the considered test dimension is small (<100 m). Woo and Kim (2020) proposed DRL and a semi-Markov decision process (SMDP) for USVs' COLAV. A DQN based on convolutional neural networks (CNN) and fully connected layers is developed to decide between path following or COLAV modes. Then, the velocity obstacles (VO) method is utilised for generating the necessary manoeuvre. For DRL training, a reward comprising path following (inverse of CTE), collision avoidance (including TCPA and vessel course), and COLREGs was introduced. Simulation and experimental results on the WAM-V USV platform are presented. The need for hundreds of hours of training and the adverse effect of modelling uncertainty when moving from simulation to experiments are identified as the main challenges of this approach. A duelling DQN with prioritised replay (Duelling-DQNPR) is proposed in Gao et al. (2022) for the mission planning of a USV with AIS data learning. The proposed approach involves embedding a vehicle model in a real map and replaying AIS traffic data to learn mission planning tasks. Using both static and dynamic factors, a reward function is developed. The state space is a vector of the OS location (outputs of a 3DOF dynamics model) and TS information from the AIS data replay. No risk index or COLREGs are considered, owing to the smaller size and greater manoeuvrability of the USV. The action space consists of 11 discrete rudder levels.
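To illustrate the discrete action spaces recurring in these DQN-style planners, a minimal Python sketch of a quantised rudder command set with epsilon-greedy selection is given below; the 11 levels, saturation limits, and epsilon value are hypothetical, chosen only for illustration.

```python
# Minimal sketch (hypothetical values): a discrete rudder action space and
# epsilon-greedy selection over Q-values, as used by DQN-style COLAV planners.
import numpy as np

RUDDER_LEVELS = np.linspace(-35.0, 35.0, 11)       # 11 rudder commands, degrees

def select_rudder(q_values, epsilon=0.1, rng=None):
    """Pick a rudder command: explore with probability epsilon, else greedy."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        idx = int(rng.integers(len(RUDDER_LEVELS)))  # random exploration
    else:
        idx = int(np.argmax(q_values))               # greedy w.r.t. learned Q
    return float(RUDDER_LEVELS[idx])

q = np.random.default_rng(0).normal(size=len(RUDDER_LEVELS))  # stand-in network output
print(select_rudder(q))                              # chosen rudder angle in degrees
```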
As another notable study, Xu et al. (2022a) incorporated deep reinforcement learning into the COLREGs-compliant path planning and dynamic collision avoidance of ASVs. In this research, DDPG has been applied for generating the thrust and rudder inputs to control the vessel. Saturations of the control inputs were also taken into account. The reward function is based on the distance to the target, heading error, collision, COLREGs, reaching the goal, and the speed. The ML algorithm incorporates CPA-based risk in collision avoidance. Simulations are carried out in a visualised environment in different COLREGs situations with multiple target ships. A similar study to Xu et al. (2022a) is presented in Xu et al. (2022b). Because of space limitations or similarities to other papers, we are unable to discuss all noteworthy publications in this area here. However, the interested reader may refer to: Shen et al. (2019), Zhou et al. (2019), Chen et al. (2019b), Amendola et al. (2020), Guo et al. (2020), Wu et al. (2020), Luis et al. (2021), Xu et al. (2020), Wang et al. (2021), Zhang et al. (2021a), Chen et al. (2021a), and Abebe et al. (2022). Table 1 outlines a comparison between some of the notable results in this field. It is noteworthy that all approaches have incorporated some COLREGs in the planning. Based on this table, all articles deployed DRL-based ML approaches. Selecting the rudder as the main control signal and preferring a constant speed are dominant in the action space configurations. Diverse configurations for the reward function are developed that could be insightful. However, in the case of using a reward function with several weighted indices, a challenge could be defining the reward weights, which may effectively turn the DRL into supervised learning and thus require expert human knowledge.

5. ML WITH INDIRECT APPLICATIONS TO COLLISION AVOIDANCE

This section investigates ML algorithms found in the literature that could be potentially employed in miscellaneous mission planning subtopics of autonomous vessels, as outlined in Section 3. These include methods that have been proposed for global planning, collision analysis, ship trajectory prediction, etc. For instance, in Kim and Lee (2018), a deep learning framework called Ship Traffic Extraction Network (STENet) is proposed for medium- and long-term traffic prediction in a specific maritime area. For this purpose, a combination of CNN and FCNN was developed that predicts the number of ships in the caution area using the ship length, destination, channel type, Pilot Onboard (POB), and Caution Area Estimated Time of Arrival (CAETA) from the AIS data, together with ship movement data. A comparison between Dead Reckoning (DR), Support Vector Regression (SVR), and Very Deep Convolutional Network (VGGNet) models in ship traffic prediction is conducted, with the Mean Absolute Error (MAE) and the standard deviation as the basis for comparing their performances. Subsequently, better performance for STENet in both medium- and long-term predictions is asserted. Lei et al. (2019 and 2021) have modelled the human operators' behaviour in encountering near-collision scenarios via AIS data. The proposed approach is based on two parts: conflict detection (clustering) and collision avoidance behaviour learning via Generative Adversarial Networks (GAN) with a long short-term memory (LSTM) based encoder-decoder architecture.
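As a rough illustration of how an LSTM encoder-decoder can map a past AIS track to a predicted future track, a minimal PyTorch sketch is provided below; the feature set (latitude, longitude, speed, course), layer sizes, and prediction horizon are hypothetical, and this is not the architecture of Lei et al. or of any other surveyed paper.

```python
# Minimal sketch (hypothetical dimensions): an LSTM encoder-decoder mapping a
# past AIS track of (lat, lon, SOG, COG) fixes to a predicted future track.
import torch
import torch.nn as nn

class TrackSeq2Seq(nn.Module):
    def __init__(self, n_features=4, hidden=64, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, past):                      # past: (batch, T_past, 4)
        _, state = self.encoder(past)             # summarise the observed track
        step = past[:, -1:, :]                    # seed the decoder with the last fix
        outputs = []
        for _ in range(self.horizon):             # roll the prediction forward
            out, state = self.decoder(step, state)
            step = self.head(out)                 # next predicted AIS fix
            outputs.append(step)
        return torch.cat(outputs, dim=1)          # (batch, horizon, 4)

model = TrackSeq2Seq()
pred = model(torch.randn(8, 30, 4))               # 8 tracks, 30 past fixes each
print(pred.shape)                                 # torch.Size([8, 12, 4])
```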
In a series of papers, Murray and Perera (2021a, b, c) have proposed an AIS-based deep learning framework for ship behaviour prediction and proactive collision avoidance. Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) and Deep Recurrent Neural Networks (RNN) were proposed in Murray and Perera (2021a) to cluster, learn, and anticipate ship behaviour using AIS data. Predictions based on the real motion of a ship, with a 300 m mean squared error over 30 min, were exhibited. In Murray and Perera (2021c), an approach for proactive reaction considering COLREGs and CPA analysis has been proposed. Further investigation is required to verify the fidelity of the proposed scheme. Monitoring ship safety in extreme weather events and developing contextually aware ship domains via ML algorithms are respectively explored in Rawson et al. (2021) and Rawson and Brito (2021). Rawson et al. (2021) have recommended Support Vector Machines (SVM) over several other ML algorithms to quantify the relative likelihood of an incident during the US Atlantic hurricane season. For this purpose, a comparison amongst SVM, Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), SVMs optimised using Stochastic Gradient Descent (SGD), and Multi-Layer Perceptron (MLP) algorithms was carried out. The weather-related risks are considered in seven areas, including wind speed and height, vessel category, length, flag, age, distance from shore, and incident data. Finally, a case study on Hurricane Matthew (October 2016) demonstrated the ability to predict the accident via a likelihood score. In Rawson and Brito (2021), the development of risk models via contextually aware ship domains is studied. The RF machine learning algorithm is employed for data mining of big vessel traffic datasets to identify the encounter characteristics across various situations and to predict the critical passing distance between vessels. The developed ship domain depends on the ship speed, size, encounter type, weather, and waterway characteristics, and is trained on realistic ship encounter data. Potential advantages of this approach for estimating the risk of collision in a crowded maritime area are addressed, which could be used in planning. In one of the most comprehensive articles in this category, Namgung and Kim (2021a) proposed an Adaptive Neuro-Fuzzy Inference System (ANFIS) as a collision risk inference system that incorporates COLREGs and risk into the algorithm. Parameters including DCPA, TCPA, the variance of the compass bearing degree (VCD), and the relative distance between the own and target ships are utilised to compute a CRI using ANFIS. Risk inference for near-collision encounters based on real AIS data in a predefined area is presented. As an extension, the same authors have used density-based spatial clustering of applications with noise (DBSCAN), a fuzzy inference system based on near-collision (FIS-NC), and long short-term memory (LSTM) to draw out the regional collision risk and assist the vessel traffic service operator (Namgung and Kim, 2021b). The focus is on near-collision situations where two ships' domains overlap. Abebe et al. (2021) have employed the Dempster-Shafer (DS) theory to estimate a collision risk index based on AIS data. The developed risk index contains the vessel speed, relative speed, and bearing, as well as TCPA and DCPA. Gradient boosting regression (GBR) is deemed the best ML technique to enhance the DS evidence theory.
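Most of the risk indices above build on the DCPA and TCPA quantities introduced in Section 3.5; a minimal Python sketch of their computation under a straight-line motion assumption is given below (a simplification of the formulations surveyed in Huang et al., 2020a).

```python
# A minimal sketch of the CPA-based indices (DCPA, TCPA) that most of the risk
# models above build on; straight-line motion in a flat local frame is assumed.
import numpy as np

def cpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Return (DCPA, TCPA) for two ships holding course and speed."""
    p = np.asarray(tgt_pos, float) - np.asarray(own_pos, float)   # relative position
    v = np.asarray(tgt_vel, float) - np.asarray(own_vel, float)   # relative velocity
    v2 = float(np.dot(v, v))
    if v2 < 1e-9:                       # identical velocities: range never changes
        return float(np.linalg.norm(p)), 0.0
    tcpa = -float(np.dot(p, v)) / v2    # time at which the range is minimal
    dcpa = float(np.linalg.norm(p + v * tcpa))
    return dcpa, tcpa                   # a negative TCPA means the ships are opening

# Own ship heading east at 5 m/s, target 1 km to the north heading south at 4 m/s.
print(cpa((0.0, 0.0), (5.0, 0.0), (0.0, 1000.0), (0.0, -4.0)))
```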
Zhao et al. (2022) have proposed ship trajectory prediction via an ensemble ML framework that removes outliers in the raw AIS data before predicting ship trajectories. For this purpose, publicly accessible databases and merchant websites have been utilised. Empirical mode decomposition (EMD) is used to suppress the AIS data outliers (for data denoising), and an ANN is utilised for the trajectory prediction. A comparison between LSTM, Vondrak+ANN, and wavelet+ANN in terms of prediction error is presented. Trajectory predictions for three typical ship types (i.e., container ship, cargo ship, and passenger vessel) are presented in this paper. Jeong-Seok et al. (2022) have proposed a framework to generate maritime traffic routes using statistical density analysis. Hausdorff distance, Douglas-Peucker, and DBSCAN algorithms are subsequently deployed to locate the waypoints and to create the connecting routes. The outcome is in the form of planned routes for autonomous surface ships.

Table 2. Comparison between ML-based algorithms with various applications of planning

Kim and Lee (2018). Application: medium- and long-term traffic prediction. ML approach: CNN, FCNN. Risk inclusion: no. COLREGs: no.

Murray and Perera (2021a, b). Application: ship behaviour prediction and proactive avoidance. ML approach: HDBSCAN and RNN. Risk inclusion: no. COLREGs: yes.

Lei et al. (2019 and 2021). Application: prediction of maritime collision avoidance behaviour considering risk and COLREGs. ML approach: clustering, GAN, and LSTM. Risk inclusion: yes (CPA). COLREGs: yes.

Rawson et al. (2021). Application: monitoring ship safety in extreme weather events. ML approach: SVM. Risk inclusion: yes (weather-related indices). COLREGs: no.

Rawson and Brito (2021). Application: developing risk models via contextually aware ship domains. ML approach: RF. Risk inclusion: yes (ship domain). COLREGs: no.

Namgung and Kim (2021a). Application: collision risk inference system. ML approach: ANFIS. Risk inclusion: yes (ship domain + CPA). COLREGs: yes.

Namgung and Kim (2021b). Application: extraction of near-collision situations. ML approach: DBSCAN, FIS-NC, LSTM. Risk inclusion: yes (ship domain + CPA). COLREGs: no.

Abebe et al. (2021). Application: collision risk index estimation. ML approach: Dempster-Shafer theory. Risk inclusion: yes (CPA). COLREGs: yes.

Jeong-Seok et al. (2022). Application: global planning (route planning for autonomous ships). ML approach: Hausdorff distance, Douglas-Peucker, and DBSCAN. Risk inclusion: no. COLREGs: no.

Zhao et al. (2022). Application: ship trajectory prediction. ML approach: EMD, ANN. Risk inclusion: no. COLREGs: no.

Some other intriguing research can be found in Ozturk et al. (2019), Shi and Liu (2020), Gao and Shi (2020a, b), Chen et al. (2021b), Park and Jeong (2021), and Ivanov et al. (2021). In Table 2, a summary of ML algorithms not in direct relation to local planning and COLAV is presented. As can be observed from Table 2, various learning techniques are utilised and, unlike the local planning solutions discussed in Section 4, DRL is not dominant. In this application, clustering and pattern recognition techniques are more common. Nevertheless, the proposed approaches discussed in this section have potential for applications such as global route planning and risk assessment.

6. ACHIEVEMENTS, CHALLENGES AND FUTURE DIRECTIONS

According to the surveyed research, in recent years, a substantial effort has been dedicated to exploiting advanced ML techniques for ship mission planning and collision avoidance problems. Despite the progress achieved, this topic remains in its infancy, with a long voyage ahead to attain practical algorithms in safety-critical systems such as ships. Here are some of the outstanding remarks to be addressed:

1- First and foremost, ML techniques demand a large amount of reliable data and computational time to be adequately trained. A certain degree of fidelity should also be maintained in these data for the algorithm to function properly.
This demand results in some challenges, as outlined below:

1-1 As a result of the high computational cost, in most cases the algorithms are only trained on a limited set of scenarios or dimensions, and their generalisability is thereby impacted. Hence, an algorithm trained in a restricted environment may not be able to extrapolate its acquired knowledge to realistic conditions.

1-2 Due to the safety-critical nature of this application, planning cannot be easily solved based on trial-and-error learning, particularly in realistic scenarios. An extensive high-fidelity simulation analysis is therefore required prior to any experimentation to instil confidence in the algorithm. Considering the aforementioned points, the transition from simulation to practice remains a major challenge (Pina et al., 2021). In conventional algorithms, there are some tuning knobs to tweak the performance in real-world operations. However, the behaviour and tuning of the trained algorithms in real environments would be challenging, especially if the simulation environment is not perfect. Therefore, specific consideration for “simulation to real” should be foreseen (Pina et al., 2021).

2- Based on 1-1, and due to the existing learning and computational challenges, most of the developed approaches are not tested in comprehensive situations. To circumvent this, adopting proper Verification and Validation (V&V) procedures, e.g., testing in Monte Carlo runs and state-of-the-art simulators such as digital twins, is recommended. A statistical analysis based on appropriate performance indicators should be conducted to assess the functionality of the developed ML approaches from various points of view, such as feasibility and coverage of the mission planning areas presented in Section 3.

3- To address safety constraints in practice and prove the resilient behaviour of the developed algorithms, one suggestion is to use ML algorithms to develop a captain-assistive system (Du et al., 2020) advising the captain on potential feasible paths. Further training of those algorithms and examination of their performance under controlled conditions could potentially lead to full autonomy in the future.

4- Based on this survey, DRL appears to be the dominant category of approaches in the planning and collision avoidance of vessels. A possible explanation is the affinity of reinforcement learning with control. Nevertheless, reward function design remains a grand challenge in DRL. Utilising complicated rewards with several indices (as seen in some papers) may transform the DRL into supervised learning (Kendall et al., 2019). In this case, the definition of the reward and its index weights could become a non-trivial task.

5- This subject offers a variety of novel topics to explore, such as the development of realistic simulation tools like digital twins (Almeaibed et al., 2021, Vasanthan and Nguyen, 2021), defining proper testing procedures and edge cases for algorithm acceptance (Perera, 2020), automatic test scenario generation (Riedmaier et al., 2020), and leveraging ML techniques in animal behaviour modelling for marine animal obstacle avoidance (Schoeman et al., 2020).

7. CONCLUSIONS

This article surveyed recent advances in ML applications for ship collision avoidance and mission planning. An exponentially increasing trend of published research on relevant topics has been identified in this review. Based on the pivotal areas of mission planning of autonomous ships, the available research was classified into two groups.
The first group presented ML algorithms with direct application to collision avoidance and local planning. The second category included techniques that could be utilised in other mission planning applications such as global planning or risk assessment. Although various ML algorithms are adopted in the second category, it is found that DRL techniques (such as DQN, DDPG, PPO, etc.) are often used for local planning and collision avoidance. The choice may have been influenced by the analogy between reinforcement learning and the inner planning and control loops in autonomous vehicles. Last but not least, the main achievements, challenges, and future directions in this field were outlined. Within the next few years, it is likely that ML techniques will be implemented on autonomous ships. The challenge will be realising those approaches as commercial products that are reliable and safe.

ACKNOWLEDGEMENT

The authors would like to thank UK Research and Innovation for funding this project, which is part of the Belfast Maritime Consortium under the Strength in Places Funding programme.

REFERENCES

Abebe, M., Noh, Y., Seo, C., Kim, D., & Lee, I. (2021). Developing a Ship Collision Risk Index estimation model based on Dempster-Shafer theory. Applied Ocean Research, 113, 102735.
Abebe, M., Noh, Y., Kang, Y. J., Seo, C., Kim, D., & Seo, J. (2022). Ship trajectory planning for collision avoidance using hybrid ARIMA-LSTM models. Ocean Engineering, 256, 111527.
Almeaibed, S., Al-Rubaye, S., Tsourdos, A., & Avdelidis, N. P. (2021). Digital twin analysis to promote safety and security in autonomous vehicles. IEEE Communications Standards Magazine, 5(1), 40-46.
Amendola, J., Miura, L. S., Costa, A. H. R., Cozman, F. G., & Tannuri, E. A. (2020). Navigation in Restricted Channels Under Environmental Conditions: Fast-Time Simulation by Asynchronous Deep Reinforcement Learning. IEEE Access, 8, 149199-149213.
Aradi, S. (2020). Survey of deep reinforcement learning for motion planning of autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems.
Artemis (2020). “Artemis Technologies to build zero emissions ferries following £60M funding”, https://www.artemistechnologies.co.uk/artemis-technologies-to-build-zero-emissions-ferries-following-60m-funding/, visited in March 2022.
Campbell, S., Naeem, W., & Irwin, G. W. (2012). A review on improving the autonomy of unmanned surface vehicles through intelligent collision avoidance manoeuvres. Annual Reviews in Control, 36(2), 267-283.
Cetus (2022). “Uncrewed Surface Vessel (USV) Cetus for marine data gathering and systems development”, https://www.plymouth.ac.uk/research/esif-funded-projects/usv-cetus, visited in March 2022.
Chen, P., Huang, Y., Mou, J., & Van Gelder, P. H. A. J. M. (2019a). Probabilistic risk analysis for ship-ship collision: State-of-the-art. Safety science, 117, 108-122.
Chen, C., Chen, X. Q., Ma, F., Zeng, X. J., & Wang, J. (2019b). A knowledge-free path planning approach for smart ships based on reinforcement learning. Ocean Engineering, 189, 106299.
Chen, C., Ma, F., Xu, X., Chen, Y., & Wang, J. (2021a). A novel ship collision avoidance awareness approach for cooperating ships using multi-agent deep reinforcement learning. Journal of Marine Science and Engineering, 9(10), 1056.
Chen, X., Liu, Y., Achuthan, K., Zhang, X., & Chen, J. (2021b). A semi-supervised deep learning model for ship encounter situation classification. Ocean Engineering, 239, 109824.
Cheng, Y., & Zhang, W. (2018).
Concise deep reinforcement learning obstacle avoidance for underactuated unmanned marine vessels. Neurocomputing, 272, 63-73. Chun, D. H., Roh, M. I., Lee, H. W., Ha, J., & Yu, D. (2021). Deep reinforcement learning-based collision avoidance for an autonomous ship. Ocean Engineering, 234, 109216. Cui, Y., Osaki, S., & Matsubara, T. (2019). Reinforcement learning boat autopilot: a sample-efficient and model predictive control based approach. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2868-2875). IEEE. Cui, Y., Osaki, S., & Matsubara, T. (2021). Autonomous boat driving system using sample‐efficient model predictive control‐based reinforcement learning approach. Journal of Field Robotics, 38(3), 331-354. Du, L., Banda, O. A. V., Goerlandt, F., Huang, Y., & Kujala, P. (2020). A COLREG-compliant ship collision alert system for stand-on vessels. Ocean Engineering, 218, 107866. Du, L., Banda, O. A. V., Huang, Y., Goerlandt, F., Kujala, P., & Zhang, W. (2021). An empirical ship domain based on evasive maneuver and perceived collision risk. Reliability Engineering & System Safety, 213, 107752. Du, B., Lin, B., Zhang, C., Dong, B., & Zhang, W. (2022). Safe deep reinforcement learning-based adaptive control for USV interception mission. Ocean Engineering, 246, 110477. Fan, Y., Sun, Z., & Wang, G. (2022). A Novel Reinforcement Learning Collision Avoidance Algorithm for USVs Based on Maneuvering Characteristics and COLREGs. Sensors, 22(6), 2099. Fossen, T. I. (2011). Handbook of marine craft hydrodynamics and motion control. John Wiley & Sons. Fraga-Lamas, P., Ramos, L., Mondéjar-Guerra, V., & Fernández- Caramés, T. M. (2019). A review on IoT deep learning UAV systems for autonomous obstacle detection and collision avoidance. Remote Sensing, 11(18), 2144. Gao, M., & Shi, G. Y. (2020a). Ship collision avoidance anthropomorphic decision-making for structured learning based on AIS with Seq-CGAN. Ocean Engineering, 217, 107922. Gao, M., & Shi, G. Y. (2020b). Ship-Collision Avoidance Decision-Making Learning of Unmanned Surface Vehicles with Automatic Identification System Data Based on Encoder—Decoder Automatic-Response Neural Networks. Journal of Marine Science and Engineering, 8(10), 754. Gao, M., Kang, Z., Zhang, A., Liu, J., & Zhao, F. (2022). MASS autonomous navigation system based on AIS big data with dueling deep Q networks prioritized replay reinforcement learning. Ocean Engineering, 249, 110834. Gjærum, V. B., Rørvik, E. L. H., & Lekkas, A. M. (2021a). Approximating a deep reinforcement learning docking agent using linear model trees. In 2021 European Control Conference (ECC) (pp. 1465-1471). IEEE. Gjærum, V. B., Strümke, I., Alsos, O. A., & Lekkas, A. M. (2021b). Explaining a Deep Reinforcement Learning Docking Agent Using Linear Model Trees with User Adapted Visualization. Journal of Marine Science and Engineering, 9(11), 1178. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press. Gonzalez-Garcia, A., Castañeda, H., & Garrido, L. (2020). USV Path-Following Control Based On Deep Reinforcement Learning and Adaptive Control. In Global Oceans 2020: Singapore–US Gulf Coast (pp. 1-7). IEEE. Guo, S., Zhang, X., Zheng, Y., & Du, Y. (2020). An autonomous path planning model for unmanned ships based on deep reinforcement learning. Sensors, 20(2), 426. Hadi, B., Khosravi, A., & Sarhadi, P. (2021). A review of the path planning and formation control for multiple autonomous underwater vehicles. 
Journal of Intelligent & Robotic Systems, 101(4), 1-26. Haydari, A., & Yilmaz, Y. (2020). Deep reinforcement learning for intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems. Heiberg, A., Larsen, T. N., Meyer, E., Rasheed, A., San, O., & Varagnolo, D. (2022). Risk-based implementation of COLREGs for autonomous surface vehicles using deep reinforcement learning. Neural Networks, 152, 17-33. Huang, Y., Chen, L., Chen, P., Negenborn, R. R., & Van Gelder, P. H. A. J. M. (2020a). Ship collision avoidance methods: State-of-the-art. Safety science, 121, 451-473. Huang, Y., & Van Gelder, P. H. A. J. M. (2020b). Collision risk measure for triggering evasive actions of maritime autonomous surface ships. Safety science, 127, 104708. IMO (1972). Convention on the International Regulations for Preventing Collisions at Sea. Available: https://www.imo.org/en/About/Conventions/Pages/COLRE G.aspx. Ivanov, Y. S., Zhiganov, S. V., & Ivanova, T. I. (2021). Intelligent deep neuro-fuzzy system of abnormal situation recognition for transport systems. In Current Problems and Ways of Industry Development: Equipment and Technologies (pp. 224-233). Springer, Cham. Jeong-Seok, L. E. E., Hyeong-Tak, L. E. E., & Ik-Soon, C. H. O. (2022). Developing a Maritime Traffic Route Framework Based on Statistical Density Analysis from AIS Data Using a Clustering Algorithm. IEEE Access. Kendall, A., Hawke, J., Janz, D., Mazur, P., Reda, D., Allen, J. M., & Shah, A. (2019). Learning to drive in a day. In 2019 International Conference on Robotics and Automation (ICRA) (pp. 8248-8254). Kim, K. I., & Lee, K. M. (2018). Deep learning-based caution area traffic prediction with automatic identification system sensor data. Sensors, 18(9), 3172. Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A. A., Yogamani, S., & Pérez, P. (2021). Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems. Kroemer, O., Niekum, S., & Konidaris, G. D. (2021). A review of robot learning for manipulation: Challenges, representations, and algorithms. Journal of machine learning research, 22(30). Kuutti, S., Bowden, R., Jin, Y., Barber, P., & Fallah, S. (2021). A survey of deep learning applications to autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems, 22(2), 712-733. L3HARRIS (2021), “L3HARRIS technologies to design long- endurance autonomous surface ship concept for us defense advanced research projects agency”, https://www.l3harris.com/newsroom/press- release/2021/03/l3harris-technologies-design-long- endurance-autonomous-surface-ship, visited in March 2022. Lei, P. R., Yu, P. R., & Peng, W. C. (2019). A framework for maritime anti-collision pattern discovery from AIS network. In 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS) (pp. 1-4). IEEE. Lei, P. R., Yu, P. R., & Peng, W. C. (2021). Learning for Prediction of Maritime Collision Avoidance Behavior from AIS Network. In 2021 22nd Asia-Pacific Network Operations and Management Symposium (APNOMS) (pp. 222-225). IEEE. Li, Y. (2017). Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274. Li, M., Mou, J., Chen, L., Huang, Y., & Chen, P. (2021). Comparison between the collision avoidance decision- making in theoretical research and navigation practices. Ocean Engineering, 228, 108881. Li, L., Wu, D., Huang, Y., & Yuan, Z. M. (2021). 
A path planning strategy unified with a COLREGS collision avoidance function based on deep reinforcement learning and artificial potential field. Applied Ocean Research, 113, 102759. Liu, X., & Jin, Y. (2020). Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer. AI EDAM, 34(2), 207-222. Luis, S. Y., Reina, D. G., & Marín, S. L. T. (2021). A multiagent deep reinforcement learning approach for path planning in autonomous surface vehicles: The Ypacaraí lake patrolling case. IEEE Access, 9, 17084-17099. Ma, Y., Wang, Z., Yang, H., & Yang, L. (2020). Artificial intelligence applications in the development of autonomous vehicles: a survey. IEEE/CAA Journal of Automatica Sinica, 7(2), 315-329. Martinsen, A. B., Lekkas, A. M., & Gros, S. (2022). Reinforcement learning-based NMPC for tracking control of ASVs: Theory and experiments. Control Engineering Practice, 120, 105024. Mayflower (2022), “The Uncharted: Autonomous Ship Project No captain. No crew. No problem”, https://www.ibm.com/industries/federal/autonomous-ship, visited in March 2022. MAXCMAS, (2018). “MAXCMAS success suggests COLREGs remain relevant for autonomous ships”, https://www.rolls- royce.com/media/press-releases/2018/21-03-2018- maxcmas-success-suggests-colregs-remain-relevant-for- autonomous-ships.aspx, visited in March 2022. Meyer, E., Robinson, H., Rasheed, A., & San, O. (2020a). Taming an autonomous surface vehicle for path following and collision avoidance using deep reinforcement learning. IEEE Access, 8, 41466-41481. Meyer, E., Heiberg, A., Rasheed, A., & San, O. (2020b). COLREG-compliant collision avoidance for unmanned surface vehicle using deep reinforcement learning. IEEE Access, 8, 165344-165364. Murray, B., & Perera, L. P. (2021a). An AIS-based deep learning framework for regional ship behavior prediction. Reliability Engineering & System Safety, 215, 107819. Murray, B., & Perera, L. P. (2021b). Deep representation learning-based vessel trajectory clustering for situation awareness in ship navigation. In Developments in Maritime Technology and Engineering (pp. 157-165). CRC Press. Murray, B., & Perera, L. P. (2021c). Proactive Collision Avoidance for Autonomous Ships: Leveraging Machine Learning to Emulate Situation Awareness. IFAC- PapersOnLine, 54(16), 16-23. Namgung, H., & Kim, J. S. (2021a). Collision risk inference system for maritime autonomous surface ships using COLREGs rules compliant collision avoidance. IEEE Access, 9, 7823-7835. Namgung, H., & Kim, J. S. (2021b). Regional Collision Risk Prediction System at a Collision Area Considering Spatial Pattern. Journal of Marine Science and Engineering, 9(12), 1365. Ozturk, U., Birbil, S. I., & Cicek, K. (2019). Evaluating navigational risk of port approach manoeuvrings with expert assessments and machine learning. Ocean Engineering, 192, 106558. Öztürk, Ü., Akdağ, M., & Ayabakan, T. (2022). A review of path planning algorithms in maritime autonomous surface ships: Navigation safety perspective. Ocean Engineering, 251, 111010. Park, J., & Jeong, J. S. (2021). An Estimation of Ship Collision Risk Based on Relevance Vector Machine. Journal of Marine Science and Engineering, 9(5), 538. Perera, L. P. (2020). Deep learning toward autonomous ship navigation and possible COLREGs failures. Journal of Offshore Mechanics and Arctic Engineering, 142(3). Pietrzykowski, Z., & Wielgosz, M. (2021). Effective ship domain–Impact of ship size and speed. Ocean Engineering, 219, 108423. 
Pina, R., Tibebu, H., Hook, J., De Silva, V., & Kondoz, A. (2021). Overcoming Challenges of Applying Reinforcement Learning for Intelligent Vehicle Control. Sensors, 21(23), 7829.
Rawson, A., & Brito, M. (2021). Developing contextually aware ship domains using machine learning. The Journal of Navigation, 74(3), 515-532.
Rawson, A., Brito, M., Sabeur, Z., & Tran-Thanh, L. (2021). A machine learning approach for monitoring ship safety in extreme weather events. Safety Science, 141, 105336.
Riedmaier, S., Ponn, T., Ludwig, D., Schick, B., & Diermeyer, F. (2020). Survey on scenario-based safety assessment of automated vehicles. IEEE Access, 8, 87456-87477.
Sarhadi, P., Naeem, W., & Athanasopoulos, N. (2022). An Integrated Risk Assessment and Collision Avoidance Methodology for an Autonomous Catamaran with Fuzzy Weighting Functions. 13th UK Automatic Control Council (UKACC) International Conference (accepted paper).
Sawada, R., Sato, K., & Majima, T. (2021). Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuous action spaces. Journal of Marine Science and Technology, 26(2), 509-524.
Schoeman, R. P., Patterson-Abrolat, C., & Plön, S. (2020). A global review of vessel collisions with marine animals. Frontiers in Marine Science, 7, 292.
Shen, H., Hashimoto, H., Matsuda, A., Taniguchi, Y., Terada, D., & Guo, C. (2019). Automatic collision avoidance of multiple ships based on deep Q-learning. Applied Ocean Research, 86, 268-288.
Shi, J. H., & Liu, Z. J. (2020). Deep Learning in Unmanned Surface Vehicles Collision-Avoidance Pattern Based on AIS Big Data with Double GRU-RNN. Journal of Marine Science and Engineering, 8(9), 682.
Shirobokov, M., Trofimov, S., & Ovchinnikov, M. (2021). Survey of machine learning techniques in spacecraft control design. Acta Astronautica, 186, 87-97.
Sun, H., Zhang, W., Yu, R., & Zhang, Y. (2021). Motion planning for mobile robots – focusing on deep reinforcement learning: A systematic review. IEEE Access.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Tam, C., Bucknall, R., & Greig, A. (2009). Review of collision avoidance and path planning methods for ships in close range encounters. The Journal of Navigation, 62(3), 455-476.
Vagale, A., Oucheikh, R., Bye, R. T., Osen, O. L., & Fossen, T. I. (2021a). Path planning and collision avoidance for autonomous surface vehicles I: a review. Journal of Marine Science and Technology, 1-15.
Vagale, A., Oucheikh, R., Bye, R. T., Osen, O. L., & Fossen, T. I. (2021b). Path planning and collision avoidance for autonomous surface vehicles II: a comparative study of algorithms. Journal of Marine Science and Technology, 1-17.
Vasanthan, C., & Nguyen, D. T. (2021). Combining Supervised Learning and Digital Twin for Autonomous Path-planning. IFAC-PapersOnLine, 54(16), 7-15.
Wang, S., Ma, F., Yan, X., Wu, P., & Liu, Y. (2021). Adaptive and extendable control of unmanned surface vehicle formations using distributed deep reinforcement learning. Applied Ocean Research, 110, 102590.
Wang, W., Luo, X., Li, Y., & Xie, S. (2021). Unmanned surface vessel obstacle avoidance with prior knowledge-based reward shaping. Concurrency and Computation: Practice and Experience, 33(9), e6110.
Woo, J., & Kim, N. (2020). Collision avoidance for an unmanned surface vehicle using deep reinforcement learning. Ocean Engineering, 199, 107001.
Wu, X., Chen, H., Chen, C., Zhong, M., Xie, S., Guo, Y., & Fujita, H. (2020). The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method. Knowledge-Based Systems, 196, 105201.
Xie, S., Garofano, V., Chu, X., & Negenborn, R. R. (2019). Model predictive ship collision avoidance based on Q-learning beetle swarm antenna search and neural networks. Ocean Engineering, 193, 106609.
Xie, S., Chu, X., Zheng, M., & Liu, C. (2020). A composite learning method for multi-ship collision avoidance based on reinforcement learning and inverse control. Neurocomputing, 411, 375-392.
Xu, H., Wang, N., Zhao, H., & Zheng, Z. (2019). Deep reinforcement learning-based path planning of underactuated surface vessels. Cyber-Physical Systems, 5(1), 1-17.
Xu, X., Lu, Y., Liu, X., & Zhang, W. (2020). Intelligent collision avoidance algorithms for USVs via deep reinforcement learning under COLREGs. Ocean Engineering, 217, 107704.
Xu, X., Cai, P., Ahmed, Z., Yellapu, V. S., & Zhang, W. (2022a). Path planning and dynamic collision avoidance algorithm under COLREGs via deep reinforcement learning. Neurocomputing, 468, 181-197.
Xu, X., Lu, Y., Liu, G., Cai, P., & Zhang, W. (2022b). COLREGs-abiding hybrid collision avoidance algorithm based on deep reinforcement learning for USVs. Ocean Engineering, 247, 110749.
Yara (2021). "Yara to start operating the world’s first fully emission-free container ship", https://www.yara.com/corporate-releases/yara-to-start-operating-the-worlds-first-fully-emission-free-container-ship/, visited in March 2022.
Zhai, P., Zhang, Y., & Wang, S. (2022). Intelligent Ship Collision Avoidance Algorithm Based on DDQN with Prioritized Experience Replay under COLREGs. Journal of Marine Science and Engineering, 10(5), 585.
Zhang, X., Wang, C., Jiang, L., An, L., & Yang, R. (2021a). Collision-avoidance navigation systems for Maritime Autonomous Surface Ships: A state of the art survey. Ocean Engineering, 235, 109380.
Zhang, Q., Pan, W., & Reppa, V. (2021b). Model-reference reinforcement learning for collision-free tracking control of autonomous surface vehicles. IEEE Transactions on Intelligent Transportation Systems.
Zhao, J., Lu, J., Chen, X., Yan, Z., Yan, Y., & Sun, Y. (2022). High-fidelity data supported ship trajectory prediction via an ensemble machine learning framework. Physica A: Statistical Mechanics and Its Applications, 586, 126470.
Zhao, L., & Roh, M. I. (2019). COLREGs-compliant multiship collision avoidance based on deep reinforcement learning. Ocean Engineering, 191, 106436.
Zhao, L., Roh, M. I., & Lee, S. J. (2019). Control method for path following and collision avoidance of autonomous ship based on deep reinforcement learning. Journal of Marine Science and Technology, 27(4), 1.
Zhou, X., Wu, P., Zhang, H., Guo, W., & Liu, Y. (2019). Learn to navigate: cooperative path planning for unmanned surface vehicles using deep reinforcement learning. IEEE Access, 7, 165262-165278.
Zhou, C., Wang, Y., Wang, L., & He, H. (2022). Obstacle avoidance strategy for an autonomous surface vessel based on modified deep deterministic policy gradient. Ocean Engineering, 243, 110166.