0018-9545 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVT.2019.2951501, IEEE
Transactions on Vehicular Technology
Hybrid Precoding for Multi-User Millimeter Wave
Massive MIMO Systems: A Deep Learning
Approach
Ahmet M. Elbir and Anastasios Papazafeiropoulos, Senior Member, IEEE
Abstract—In multi-user millimeter wave (mmWave) multiple-
input-multiple-output (MIMO) systems, hybrid precoding is a
crucial task to lower the complexity and cost while achieving a
sufficient sum-rate. Previous works on hybrid precoding were
usually based on optimization or greedy approaches. These
methods either incur high complexity or yield sub-optimum
performance. Moreover, the performance of these methods mostly
relies on the quality of the channel data. In this work, we
propose a deep learning (DL) framework to improve the per-
formance and provide less computation time as compared to
conventional techniques. In fact, we design a convolutional neural
network for MIMO (CNN-MIMO) that accepts as input an
imperfect channel matrix and gives the analog precoder and
combiners at the output. The procedure includes two main
stages. First, we develop an exhaustive search algorithm to
select the analog precoder and combiners from a predefined
codebook maximizing the achievable sum-rate. Then, the selected
precoder and combiners are used as output labels in the training
stage of CNN-MIMO where the input-output pairs are obtained.
We evaluate the performance of the proposed method through
numerous and extensive simulations and show that the proposed
DL framework outperforms conventional techniques. Overall,
CNN-MIMO provides a robust hybrid precoding scheme in the
presence of imperfections regarding the channel matrix. On
top of this, the proposed approach requires less computation
time than the optimization-based and codebook-based
approaches.
Index Terms—Hybrid precoding, mmWave systems, multi-
user MIMO transmission, deep learning, convolutional neural
networks.
I. INTRODUCTION
Millimeter wave (mmWave) communication systems provide
a higher data rate and wider bandwidth at high frequencies
(in the range of 30–300 GHz) [1]. Consequently, mmWave
communication has become a leading candidate for the
fifth-generation (5G) wireless networks [2]. However, in mmWave
bands, the propagation loss is higher than in conventional
systems operating at lower frequencies [1], [2]. To overcome
the high propagation path loss and to provide beamforming
power gain, massive numbers of antennas are used at both the
transmitter and receiver sides, yielding a massive
multiple-input-multiple-output (MIMO) structure that enhances
the signal-to-noise ratio (SNR) of the received signal [3].
Copyright (c) 2015 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
A. M. Elbir is with the Department of Electrical and Electronics Engineering,
Duzce University, Duzce, Turkey (e-mail: ahmetelbir@duzce.edu.tr).
A. Papazafeiropoulos is with the Communications and Intelligent Systems
Research Group, University of Hertfordshire, Hatfield AL10 9AB, U.K., and
also with SnT (http://www.securityandtrust.lu), University of Luxembourg,
L-1855 Luxembourg City, Luxembourg (e-mail: tapapazaf@gmail.com).
Signal processing in conventional systems operating at
frequencies below 3 GHz is performed digitally, where both the
amplitude and the phase are processed in the baseband. For
this reason, dedicated radio-frequency (RF) hardware is required
for each antenna element [4]. Unfortunately, in mmWave MIMO
systems implemented with a large number of antennas, fully
digital processing is not cost-efficient since it incurs high
hardware cost and significant complexity. To reduce the cost
while providing sufficient performance, hybrid precoding
architectures have been proposed, in which the signal is
processed by both analog and digital precoders [5]–[8].
In particular, the analog processing part of hybrid systems
usually employs phase shifters with constant modulus. The
phase shifters introduce discrete phases to the
transmitted/received signal to steer the beam and thus
increase the gain [8].
In recent years, several techniques have been proposed to
design the hybrid precoding in mmWave MIMO systems. In
particular, initial works focused on the single-user scenario
[6]. In such a case, the user is assumed to be equipped with
multiple antennas. While the single-user case constitutes the
baseline for multi-user systems, which are of practical interest,
the interference from other users should be taken into account
when designing the precoders [7]–[10]. In [8], the performance
of low-resolution analog-to-digital converters (ADCs) is
investigated when a single RF chain is used at mobile users.
In [9], simultaneous channel estimation is considered for
multiple-user systems, while, in [10], antenna selection in
mmWave MIMO is considered together with hybrid precoding
estimation. The authors in [7] also consider the multi-user
scenario but the hybrid precoders are obtained by a greedy-like
approach as in [6] where a simultaneous orthogonal matching
pursuit (SOMP) algorithm is proposed. It is worthwhile to
mention that all of the above methods are based on the
assumption of perfect channel state information and the
availability of the array response sets, namely, $\mathcal{F}$ and
$\mathcal{W}$ for the precoder and combiner design, respectively.
These sets are composed of the transmit and receive steering
vectors with respect to the directions of arrival and departure
(DOAs/DODs) of the user locations. Since these array
responses are directly related to the singular value matrix of
the channel through a linear transformation, they become the
best candidates for the precoder design problem [5]–[7].
As a class of machine learning techniques, DL has gained
much interest recently for the solution of many challenging
problems such as speech recognition, visual object recognition,
and language processing [11], [12]. DL has several advan-
tages such as low computational complexity when solving
optimization-based or combinatorial search problems and the
ability to extrapolate new features from a limited set of features
contained in a training set [11]. Very recently, DL-based
techniques have received a great deal of attention in
radar [13] and in fundamental communication theory topics
[14]–[24] such as channel estimation [16], DOA estimation
[17], and analog beam selection [18]. In particular, in the
physical layer of wireless communications, DL has been applied to
signal detection [19], channel estimation [21], [25], [26], and
dynamic multi-channel access problems [20]. In this direction,
an end-to-end communication scenario is modeled in [21] and
[22] by using auto-encoders where single-input-single-output
(SISO) systems are considered. The authors in [23] have also
used auto-encoders for the channel state information (CSI)
feedback problem. Interestingly, [24] studies the physical layer
structures without channel models via DL.
An interesting topic concerns the investigation of the hybrid
precoding problem in the context of DL [27]–[31]. Inspired
by dense fully connected layers, deep multilayer perceptrons
(MLPs) have been proposed in [27]–[29]. Specifically, in [27]
and [28], MLP has been employed only for the precoder
design and just for the single-user scenario. In [29], an MLP
architecture is considered for coordinated beam training where
the perfect CSI is assumed to be known. Moreover, in [30], a
convolutional neural network (CNN)-based approach has been
proposed for the joint precoder and combiner design problem
but for the single-user setting again. Also, in [31], quantized
and unquantized CNNs have been used for hybrid precoding
in the case of a single-user MIMO system. The performance
of DL-based approaches such as [27]–[29] strongly relies on
the accuracy of the channel matrix, whereas in [30] and [31],
robust DL approaches are proposed against the imperfections
in the channel data but these works are developed only for the
single-user scenario.
A. Motivation
Although there are optimization-based approaches that directly
estimate the precoders, they suffer from high computational
complexity and local-minimum problems due to random
initialization [32]. Also, the design of hybrid precoders for the
common multi-user MIMO scenario, which is of high practical
importance, has not been considered in the context of DL.
Thus, driven by the advantages of DL such as its
low computational complexity, we develop a method that can
handle the hybrid precoding design in the case of multi-user
MIMO transmission in the mmWave region when corrupted
channel feedback data is available.
B. Contribution
In this paper, we propose a DL framework based on
a CNN for mmWave hybrid precoding design,
henceforth called CNN-MIMO. In our DL framework, the
channel matrix of users is selected as the input of
CNN-MIMO, and the output labels are selected as the hybrid
precoder weights. In the training stage, which is an offline
process (see Fig. 2), we generate several channel
realizations of multiple users and obtain the corresponding hybrid
precoders via an exhaustive search algorithm. This process
requires the knowledge of the feasible sets of array responses
$\mathcal{F}$ and $\mathcal{W}$, which are not used in the prediction stage. Once the
network is trained, CNN-MIMO is used to predict the hybrid
precoders by simply feeding the network with the channel
matrix of users. The proposed DL framework provides a
nonlinear mapping between the channel matrix and the hybrid
beamformers. Hence, the proposed method achieves more
robust performance than the competing algorithms since the
deep network can handle the imperfections and the corruptions
in the input channel data whereas the other algorithms do not
have such capability. The proposed approach also has superior
sum-rate performance due to the use of the “best” hybrid
beamformers which are obtained via an exhaustive search in
the training process. The main contributions of this work are
as follows.
• A DL-based approach is proposed for hybrid precoding
in multi-user massive MIMO mmWave systems.
We leverage DL to estimate the precoder and combiner
weights so that CNN-MIMO is more robust against
deviations in the channel matrix. Hence, the proposed
DL framework outperforms the conventional greedy and
codebook-based techniques [6]–[8], whose performance
strongly relies on the quality of the channel.
• In most of the previous works such as [6], [7], the
codebooks formed by the feasible sets of array responses
$\mathcal{F}$ and $\mathcal{W}$ are assumed to be known. Then, the analog
precoding design problem reduces to the selection of the
best candidates in $\mathcal{F}$ and $\mathcal{W}$ to maximize the sum-rate. In
this work, we only need $\mathcal{F}$ and $\mathcal{W}$ in the training stage to
obtain the network labels. The proposed DL technique
does not require such information in the prediction stage,
where the DL network itself obtains the analog precoder
weights by learning the features hidden in the input data.
• To train the network, a very large training dataset (almost
half a million samples) is generated. Hence, robust
performance against the imperfect channel case and
deviations in the channel data is achieved.
• The proposed approach also requires less computation
time for hybrid precoding design. While the conventional
techniques require an optimization process or greedy
searches, our CNN approach estimates the precoders by
simply feeding the network with the corrupted channel
matrix.
C. Notation
Vectors and matrices are denoted by boldface lower and
upper case symbols, respectively. In the case of a vector $\mathbf{a}$,
$[\mathbf{a}]_i$ represents its $i$th element. For a matrix $\mathbf{A}$, $[\mathbf{A}]_{:,i}$ and
$[\mathbf{A}]_{i,j}$ denote the $i$th column and the $(i,j)$th entry, respectively.
$\mathbf{I}_N$ is the identity matrix of size $N \times N$, $\mathbb{E}\{\cdot\}$ denotes the
statistical expectation, and $\|\cdot\|_F$ is the Frobenius norm. The
notation $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudo-inverse and
$\angle\{\cdot\}$ denotes the angle of a complex scalar/vector. A
convolutional layer with $N$ filters of size $D \times D$ is denoted
by $N@D \times D$. For a complex scalar
$a = e^{j\varphi}$ with continuous phase $\varphi$, $Q(a) = e^{j\varphi_B}$ denotes
the quantization operator, where $\varphi_B$ is the quantized angle in
$[0, 2\pi]$ sampled with $2^B$ points.
Fig. 1. A multi-user MIMO system with hybrid (analog and baseband)
precoding on the BS and analog-only combining at K users.
II. SYSTEM MODEL
We consider a multi-user mmWave MIMO system as shown
in Fig. 1. The base station (BS), serving $K$ users each of
which has $N_{\mathrm{R}}$ antennas, is equipped with $N_{\mathrm{T}}$ antennas and
$N_{\mathrm{T}}^{\mathrm{RF}}$ RF chains. To allow cheaper hardware and,
subsequently, lower power consumption at each user, we assume
that the BS communicates with each user
via a single stream, i.e., $N_{\mathrm{S}} = 1$ [7]. Hence, only analog
combining is applied at the receiver. Another assumption is
that $N_{\mathrm{T}}^{\mathrm{RF}} \geq K$, i.e., the number of simultaneously
served users cannot be greater than the number of BS RF
chains. In the downlink, the BS applies baseband precoding
$\mathbf{F}_{\mathrm{BB}} = [\mathbf{f}_{\mathrm{BB}_1}, \mathbf{f}_{\mathrm{BB}_2}, \ldots, \mathbf{f}_{\mathrm{BB}_K}] \in \mathbb{C}^{N_{\mathrm{T}}^{\mathrm{RF}} \times K}$
to the transmit signal $\mathbf{s} = [s_1, s_2, \ldots, s_K]^{T} \in \mathbb{C}^{K}$ obeying
$\mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\} = \frac{P}{K}\mathbf{I}_K$ by assuming equal power allocation among the users.
Note that $P$ denotes the average power. The RF precoder
$\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_{\mathrm{T}} \times N_{\mathrm{T}}^{\mathrm{RF}}}$, which is constructed from phase shifters,
is used to convey the signal to the $N_{\mathrm{T}}$ transmit antennas. Also,
given that $\mathbf{F}_{\mathrm{RF}}$ consists of analog phase shifters, we assume
that the RF precoder has constant equal-norm elements, i.e.,
$|[\mathbf{F}_{\mathrm{RF}}]_{i,j}|^2 = 1/N_{\mathrm{T}}$. In addition, we have the power constraint
$\|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\|_F^2 = K$ that is enforced by the normalization of
$\mathbf{F}_{\mathrm{BB}}$. Thus, the $N_{\mathrm{T}} \times 1$ transmitted signal is written as
$$\mathbf{x} = \mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\mathbf{s}. \tag{1}$$
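We can write the received signal of the $k$th user for a
narrowband block-fading channel as [33]
$$\tilde{\mathbf{y}}_k = \mathbf{H}_k \sum_{n=1}^{K} \mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_n} s_n + \mathbf{n}_k, \tag{2}$$
where $\mathbf{H}_k \in \mathbb{C}^{N_{\mathrm{R}} \times N_{\mathrm{T}}}$ is the channel matrix between the BS
and the $k$th user with $\|\mathbf{H}_k\|_F = N_{\mathrm{R}}N_{\mathrm{T}}$. The vector $\mathbf{n}_k \in \mathbb{C}^{N_{\mathrm{R}}}$
denotes the complex additive white Gaussian noise (AWGN)
with $\mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}_{N_{\mathrm{R}}})$.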
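Once the transmitted signal is received at the $k$th user,
it is processed by the combiner $\mathbf{w}_{\mathrm{RF}_k} \in \mathbb{C}^{N_{\mathrm{R}}}$
as $y_k = \mathbf{w}_{\mathrm{RF}_k}^{H}\tilde{\mathbf{y}}_k$, i.e.,
$$y_k = \mathbf{w}_{\mathrm{RF}_k}^{H}\mathbf{H}_k \sum_{n=1}^{K} \mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_n} s_n + \mathbf{w}_{\mathrm{RF}_k}^{H}\mathbf{n}_k, \tag{3}$$
where the RF combiners are constructed by means of phase
shifters with the normalization constraint $|[\mathbf{w}_{\mathrm{RF}_k}]_i|^2 = 1/N_{\mathrm{R}}$.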
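The signal model in (1)-(3) can be simulated in a few lines. The following NumPy sketch is only an illustration: the array sizes, the random placeholder channels and baseband precoder, and the helper names `random_phase_matrix` and `complex_gaussian` are assumptions introduced here, and $N_{\mathrm{T}}^{\mathrm{RF}} = K$ is assumed for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's simulation settings).
N_T, N_R, K = 16, 4, 3      # BS antennas, user antennas, users
P, sigma2 = 1.0, 0.1        # average power and noise variance


def random_phase_matrix(rows, cols, rng):
    """Constant-modulus matrix whose entries satisfy |x|^2 = 1/rows."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(rows, cols))
    return np.exp(1j * phases) / np.sqrt(rows)


def complex_gaussian(shape, variance, rng):
    """Circularly symmetric complex Gaussian samples CN(0, variance)."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


F_RF = random_phase_matrix(N_T, K, rng)           # analog precoder (N_T^RF = K)
W_RF = random_phase_matrix(N_R, K, rng)           # column k: combiner of user k
H = complex_gaussian((K, N_R, N_T), 1.0, rng)     # placeholder channels H_k

# Placeholder baseband precoder, with each column scaled so that
# ||F_RF f_BBk|| = 1 and therefore ||F_RF F_BB||_F^2 = K.
F_BB = complex_gaussian((K, K), 1.0, rng)
F_BB = F_BB / np.linalg.norm(F_RF @ F_BB, axis=0, keepdims=True)

s = complex_gaussian(K, P / K, rng)               # symbols, E{s s^H} = (P/K) I
x = F_RF @ F_BB @ s                               # transmitted signal, eq. (1)

y = np.empty(K, dtype=complex)
for k in range(K):
    n_k = complex_gaussian(N_R, sigma2, rng)
    y_tilde = H[k] @ x + n_k                      # received signal, eq. (2)
    y[k] = np.vdot(W_RF[:, k], y_tilde)           # combined signal, eq. (3)

power = np.linalg.norm(F_RF @ F_BB, "fro") ** 2   # equals K up to rounding
```

Scaling each column of the baseband precoder separately is one simple way to meet the power constraint; it matches the per-user normalization used later for the zero-forcing precoder.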
A. Channel Model
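In mmWave transmission, the channel can be represented by
a geometric model with limited scattering [34]–[36]. Hence,
we assume that the channel matrix $\mathbf{H}_k$ includes the
contributions of $L$ scattering paths. Considering a 2-D uniform planar
array (UPA), the channel matrix corresponding to the $k$th user
is given by
$$\mathbf{H}_k = \gamma \sum_{l=1}^{L} \alpha_{l,k}\, g_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})\, g_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l,k)})\, \mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})\, \mathbf{a}_{\mathrm{T}}^{H}(\Theta_{\mathrm{T}}^{(l,k)}),$$
where $\Theta_{\mathrm{R}}^{(l,k)} = (\phi_{\mathrm{R}}^{(l,k)}, \theta_{\mathrm{R}}^{(l,k)})$ and
$\Theta_{\mathrm{T}}^{(l,k)} = (\phi_{\mathrm{T}}^{(l,k)}, \theta_{\mathrm{T}}^{(l,k)})$
denote the angles of arrival and departure, respectively. Note
that the angular parameters $\phi$ and $\theta \in [0, 2\pi]$ correspond to
the azimuth and the elevation angles, respectively. The scalar
$\gamma = \sqrt{N_{\mathrm{T}}N_{\mathrm{R}}/L}$ is the normalization factor and $\alpha_{l,k}$ is the
complex channel gain associated with the $k$th user and the $l$th path,
$l = 1, \ldots, L$. Also, $g_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})$ and $g_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l,k)})$ are the antenna
element gains for the antennas in the arrays, while $\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})$
and $\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l,k)})$ are the $N_{\mathrm{R}} \times 1$ and $N_{\mathrm{T}} \times 1$ steering vectors
representing the array responses at the $k$th user and the BS,
respectively. The $n$th element of the steering vector $\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})$
is given as
$$[\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})]_n = \exp\left\{-j\frac{2\pi}{\lambda}\mathbf{p}_n^{T}\mathbf{r}(\Theta_{\mathrm{R}}^{(l,k)})\right\}, \tag{4}$$
where $\lambda$ is the wavelength and $\mathbf{p}_n = [x_n, y_n, z_n]^{T}$ is the
position of the $n$th antenna in the Cartesian coordinate system.
The direction vector is given by
$$\mathbf{r}(\Theta_{\mathrm{R}}^{(l,k)}) = [\sin(\phi_{\mathrm{R}}^{(l,k)})\cos(\theta_{\mathrm{R}}^{(l,k)}),\ \sin(\phi_{\mathrm{R}}^{(l,k)})\sin(\theta_{\mathrm{R}}^{(l,k)}),\ \cos(\theta_{\mathrm{R}}^{(l,k)})]^{T}. \tag{5}$$
In a similar way, the transmitter-side steering vector
$\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l,k)})$ is defined in the same form as $\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})$.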
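By assuming that Gaussian symbols are transmitted through
the mmWave channel under study, the achievable rate for the
$k$th user is written as [5], [7]
$$R_k = \log_2\left|1 + \frac{\frac{P}{K}\,|\mathbf{w}_{\mathrm{RF}_k}^{H}\mathbf{H}_k\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}|^2}{\frac{P}{K}\sum_{n \neq k}|\mathbf{w}_{\mathrm{RF}_k}^{H}\mathbf{H}_k\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_n}|^2 + \sigma^2}\right| \tag{6}$$
and the achievable sum-rate of the system is given by
$\bar{R} = \sum_{k=1}^{K} R_k$.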
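A minimal sketch of the rate computation follows. It takes the interference seen by user $k$ from the combined signal in (3), that is, the streams of the other users observed through user $k$'s own channel and combiner; the function name `user_rates` and the random test data are assumptions for illustration only.

```python
import numpy as np


def user_rates(H, F_RF, F_BB, W_RF, P, sigma2):
    """Per-user achievable rates in the form of eq. (6).

    H: (K, N_R, N_T) channels, F_RF: (N_T, K), F_BB: (K, K),
    W_RF: (N_R, K) with column k the combiner of user k.
    """
    K = H.shape[0]
    F = F_RF @ F_BB                                   # column n: F_RF f_BBn
    # G[k, n] = w_RFk^H H_k F_RF f_BBn, the gain of stream n seen by user k.
    G = np.stack([W_RF[:, k].conj() @ H[k] @ F for k in range(K)])
    gains = (P / K) * np.abs(G) ** 2
    signal = np.diag(gains)                           # desired stream, n = k
    interference = gains.sum(axis=1) - signal         # streams n != k
    return np.log2(1.0 + signal / (interference + sigma2))


rng = np.random.default_rng(0)
K, N_T, N_R = 3, 16, 4
H = rng.standard_normal((K, N_R, N_T)) + 1j * rng.standard_normal((K, N_R, N_T))
F_RF = np.exp(2j * np.pi * rng.random((N_T, K))) / np.sqrt(N_T)
W_RF = np.exp(2j * np.pi * rng.random((N_R, K))) / np.sqrt(N_R)
F_BB = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
F_BB = F_BB / np.linalg.norm(F_RF @ F_BB, axis=0, keepdims=True)

rates = user_rates(H, F_RF, F_BB, W_RF, P=1.0, sigma2=0.1)
sum_rate = rates.sum()                                # R_bar = sum_k R_k
```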
III. PROBLEM FORMULATION
The principal aim of this work is to design the hybrid
precoder and combiners $\mathbf{F}_{\mathrm{BB}}$, $\mathbf{F}_{\mathrm{RF}}$ and $\{\mathbf{w}_{\mathrm{RF}_k}\}_{k=1}^{K}$ in the
presence of imperfect channel data by maximizing the sum-rate.
Specifically, we first develop an algorithm to compute the
hybrid precoders which maximize the sum-rate, and then a
deep network is designed such that the hybrid precoders are
predicted by feeding the network with imperfect CSI.
In a nutshell, the proposed DL framework provides a
nonlinear mapping from the channel matrix $\mathbf{H}$ to the analog
beamformers $\mathbf{F}_{\mathrm{RF}}$ and $\{\mathbf{w}_{\mathrm{RF}_k}\}_{k=1}^{K}$. The label generation
process depends on the channel model, but the channel model is not
required for updating the network parameters in the training stage.
Hence, CNN-MIMO can also be used for various channel
models in mmWave systems [37]. Given that our main focus
is hybrid beamforming, in this work, we use the block-fading
channel model due to the simple structure of its channel matrix
model and rate computation [25], [26], [38]. The application of
DL to other channel models is the topic of ongoing research.
The estimation process of the channel matrix of the users is
a challenging task, especially in the case of a large number of
antennas taking place in massive MIMO systems [39], [40].
In addition, since the coherence time of the channel is very
short in the mmWave massive MIMO scenario, the parameters
related to the channel characteristics change greatly in a short
time [41]. To obtain a robust precoding performance, we feed
the deep network with several channel realizations which are
corrupted by synthetic noise in the training stage which is an
offline process. Hence, in the testing stage when the network
predicts the precoder weights, the network does not necessarily
require the perfect CSI [30]. We show, through simulations,
that the proposed approach can handle the corrupted channel
matrix case and exhibits satisfactory performance regarding
the achievable sum-rate.
The main stages of the proposed DL framework are label
generation, training, and prediction. In the following section,
we first discuss how the labels are obtained from the channel
data. Then, in Section V, we present the details of the training
and the prediction stages.
IV. HYBRID PRECODING DESIGN IN MULTI-USER MIMO
SYSTEMS
In order to design the network and training data, we first
need to solve the hybrid precoding problem and obtain the
labels of the training data samples. For this reason, we first
develop an exhaustive search algorithm that visits all precoder
and combiner combinations in the feasible sets $\mathcal{F}$ and $\mathcal{W}$
such that the sum-rate in (6) is maximized. Then, we solve
the exhaustive search problem in an offline manner to obtain
the training data inputs and labels. The advantage of using
a DL approach is that it reduces the computation time of
the hybrid precoding design problem while achieving the
near-optimum performance that can be obtained from an exhaustive search.
We start by formulating the optimization problem for hybrid
precoding in the multi-user scenario as
$$\{\hat{\mathbf{F}}_{\mathrm{BB}}, \hat{\mathbf{F}}_{\mathrm{RF}}, \hat{\mathbf{W}}_{\mathrm{RF}}\} = \arg\max_{\mathbf{F}_{\mathrm{BB}}, \mathbf{F}_{\mathrm{RF}}, \mathbf{W}_{\mathrm{RF}}} \bar{R} \quad \text{subject to: } \mathbf{F}_{\mathrm{RF}} \in \mathcal{F},\ \mathbf{W}_{\mathrm{RF}} \in \mathcal{W},\ \|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\|_F^2 = K, \tag{7}$$
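where $\mathbf{W}_{\mathrm{RF}} = [\mathbf{w}_{\mathrm{RF}_1}, \mathbf{w}_{\mathrm{RF}_2}, \ldots, \mathbf{w}_{\mathrm{RF}_K}]$ denotes the analog
combiner of all users while $\mathcal{F}$ and $\mathcal{W}$ are the feasible sets of
the precoder and combiners. In practice, $\mathcal{F}$ and $\mathcal{W}$ are
composed of the steering vectors $\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l,k)})$ and $\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l,k)})$,
$\forall l, k$, with quantized phases, respectively. Specifically, the
array response sets are selected as
$$\mathcal{F} = \{Q(\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(1,1)})), \ldots, Q(\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(L,K)}))\}, \tag{8}$$
and
$$\mathcal{W} = \{Q(\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(1,1)})), \ldots, Q(\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(L,K)}))\}, \tag{9}$$
where $Q(\cdot)$ denotes the phase quantization operator as
mentioned before.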
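The quantization operator $Q(\cdot)$ can be sketched as follows. The text only states that the quantized angle lies on a grid of $2^B$ points in $[0, 2\pi]$; rounding to the nearest grid point and preserving the magnitude of each entry are assumptions of this example, as is the function name `quantize_phase`.

```python
import numpy as np


def quantize_phase(a, bits):
    """Q(.): snap the phase of each entry of `a` to a uniform grid of 2**bits points."""
    levels = 2 ** bits
    step = 2.0 * np.pi / levels
    phase = np.mod(np.angle(a), 2.0 * np.pi)           # continuous phase in [0, 2*pi)
    # Nearest grid index; an index equal to `levels` wraps back to phase 0.
    index = np.mod(np.round(phase / step), levels)
    return np.abs(a) * np.exp(1j * index * step)


# With B = 2 bits the allowed phases are 0, pi/2, pi and 3*pi/2.
a = np.exp(1j * np.array([0.1, 1.5, 3.0, 6.2]))
q = quantize_phase(a, bits=2)
```

Applying `quantize_phase` element-wise to a steering vector gives one codeword of the sets in (8) and (9).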
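In the exhaustive search algorithm, it is desired to visit
all possible combinations of the elements in the feasible sets
$\mathcal{F}$ and $\mathcal{W}$ to achieve near-optimum performance. For this
reason, we design new feasible sets $\mathbb{F}$ and $\mathbb{W}$ which include
all precoder and combiner combinations. The search algorithm
visits all the nodes in the direction set
$$\mathcal{D} = \left[0, \frac{2\pi}{\bar{L}}, \frac{4\pi}{\bar{L}}, \ldots, \frac{(\bar{L}-1)2\pi}{\bar{L}}\right], \tag{10}$$
where $|\mathcal{D}| = \bar{L}$. By assuming that the BS receives $\bar{L}$ paths
from each user, the $k$th column of $\mathbf{F}_{\mathrm{RF}}$ can take $\bar{L}$ different
values, i.e., $\{Q(\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l,k)}))\}_{l=1}^{\bar{L}}$. Generalizing to all
users, we have $Q_F = \bar{L}^{K}$ possible candidates to design $\mathbf{F}_{\mathrm{RF}}$.
Thus, we define a new set as
$$\mathbb{F} = \{\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_{Q_F}\}, \tag{11}$$
where $\mathbf{F}_{q_F} \in \mathbb{C}^{N_{\mathrm{T}} \times K}$ is given by
$$\mathbf{F}_{q_F} = [Q(\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l_1,1)})), Q(\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l_2,2)})), \ldots, Q(\mathbf{a}_{\mathrm{T}}(\Theta_{\mathrm{T}}^{(l_K,K)}))]$$
with the indices for each user given by $l_1, l_2, \ldots, l_K = 1, \ldots, \bar{L}$.
Hence, we have $q_F = 1, \ldots, \bar{L}^{K}$, which indexes
the precoder candidates for $K$ users. In a similar
way, the set for the analog combiners is defined as
$\mathbb{W} = \{\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_{Q_W}\}$, where $\mathbf{W}_{q_W} \in \mathbb{C}^{N_{\mathrm{R}} \times K}$ is given by
$$\mathbf{W}_{q_W} = [Q(\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l_1,1)})), Q(\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l_2,2)})), \ldots, Q(\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l_K,K)}))]$$
with $\mathbf{w}_{\mathrm{RF}_k}$ selected from the $k$th column of $\mathbf{W}_{q_W}$, i.e.,
$Q(\mathbf{a}_{\mathrm{R}}(\Theta_{\mathrm{R}}^{(l_k,k)}))$.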
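The construction of the candidate sets amounts to a Cartesian product over the per-user path indices $(l_1, \ldots, l_K)$. A small sketch, assuming each user's $\bar{L}$ quantized steering vectors are already available as the rows of an array (the function name and the random placeholder vectors are hypothetical):

```python
import itertools

import numpy as np


def candidate_matrices(per_user_vectors):
    """Yield every matrix whose kth column is one of user k's candidate vectors.

    per_user_vectors[k] has shape (L_bar, N): the L_bar quantized steering
    vectors available to user k. There are L_bar**K matrices in total.
    """
    K = len(per_user_vectors)
    choices = [range(len(vectors)) for vectors in per_user_vectors]
    for index in itertools.product(*choices):          # (l_1, l_2, ..., l_K)
        yield np.stack([per_user_vectors[k][index[k]] for k in range(K)], axis=1)


rng = np.random.default_rng(0)
K, L_bar, N_T = 3, 4, 8
# Placeholder unit-modulus vectors standing in for Q(a_T(.)).
per_user = [np.exp(2j * np.pi * rng.random((L_bar, N_T))) for _ in range(K)]
F_candidates = list(candidate_matrices(per_user))      # Q_F = L_bar**K entries
```

With $\bar{L} = 4$ and $K = 3$ this already gives $4^3 = 64$ precoder candidates, which shows how quickly the search space grows with the number of users.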
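Once the analog precoder and combiners are selected from the sets $\mathbb{F}$ and
$\mathbb{W}$, the effective channel $\mathbf{H}^{\mathrm{eff}}_{q_F,q_W} \in \mathbb{C}^{K \times N_{\mathrm{T}}^{\mathrm{RF}}}$ is given by
$$\mathbf{H}^{\mathrm{eff}}_{q_F,q_W} = \begin{bmatrix} \mathbf{h}^{\mathrm{eff}}_{q_F,q_W,1} \\ \mathbf{h}^{\mathrm{eff}}_{q_F,q_W,2} \\ \vdots \\ \mathbf{h}^{\mathrm{eff}}_{q_F,q_W,K} \end{bmatrix}, \tag{12}$$
where the corresponding effective channel for each user can
be calculated as
$$\mathbf{h}^{\mathrm{eff}}_{q_F,q_W,k} = [\mathbf{W}_{q_W}]^{H}_{:,k}\mathbf{H}_k\mathbf{F}_{q_F}. \tag{13}$$
The baseband precoder can be given by
$\mathbf{F}_{\mathrm{BB},q_F,q_W} = \left(\mathbf{H}^{\mathrm{eff}}_{q_F,q_W}\right)^{\dagger}$
and it is normalized as
$\mathbf{f}^{(q_F,q_W)}_{\mathrm{BB}_k} = \mathbf{f}^{(q_F,q_W)}_{\mathrm{BB}_k} / \|\mathbf{F}_{q_F}\mathbf{f}^{(q_F,q_W)}_{\mathrm{BB}_k}\|_F$ [7]. Thus, the
achievable sum-rate can be written as
$$\bar{R}_{q_F,q_W} = \log_2\left|\mathbf{I}_K + \frac{P}{K\sigma^2}\mathbf{H}^{\mathrm{eff}}_{q_F,q_W}\mathbf{F}_{\mathrm{BB},q_F,q_W}\mathbf{F}^{H}_{\mathrm{BB},q_F,q_W}\mathbf{H}^{\mathrm{eff}\,H}_{q_F,q_W}\right|. \tag{14}$$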
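Algorithm 1 Hybrid precoding for Multi-user MIMO
Input: $\{\mathbf{H}_k\}_{k=1}^{K}$, $\mathbb{F}$, $\mathbb{W}$, $\mathcal{D}$.
Output: $\hat{\mathbf{F}}_{\mathrm{RF}}$, $\hat{\mathbf{W}}_{\mathrm{RF}}$.
1: for $1 \leq q_F \leq Q_F$ do
2:   $\mathbf{F}_{\mathrm{RF}} = \mathbf{F}_{q_F}$,
3:   for $1 \leq q_W \leq Q_W$ do
4:     $\mathbf{w}_{\mathrm{RF}_k} = [\mathbf{W}_{q_W}]_{:,k}$,
5:     $\mathbf{h}^{\mathrm{eff}}_k = \mathbf{w}^{H}_{\mathrm{RF}_k}\mathbf{H}_k\mathbf{F}_{\mathrm{RF}}$,
6:     $\mathbf{F}_{\mathrm{BB}} = \left(\mathbf{H}^{\mathrm{eff}}\right)^{\dagger}$,
7:     $\mathbf{f}_{\mathrm{BB}_k} = \mathbf{f}_{\mathrm{BB}_k} / \|\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}\|_F$,
8:     Compute $\bar{R}_{q_F,q_W}$ as in (14).
9:   end for $q_W$,
10: end for $q_F$,
11: $\{\bar{q}_F, \bar{q}_W\} = \arg\max_{q_F,q_W} \bar{R}_{q_F,q_W}$.
12: $\hat{\mathbf{F}}_{\mathrm{RF}} = \mathbf{F}_{\bar{q}_F}$ and $\hat{\mathbf{W}}_{\mathrm{RF}} = \mathbf{W}_{\bar{q}_W}$.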
Using the sets $\mathbb{F}$ and $\mathbb{W}$, the optimization problem in (7) can
be rewritten as
$$\{\bar{q}_F, \bar{q}_W\} = \arg\max_{q_F,q_W} \bar{R}_{q_F,q_W}$$
$$\text{subject to: } \mathbf{F}_{\mathrm{RF}} = \mathbf{F}_{q_F},\ \mathbf{w}_{\mathrm{RF}_k} = [\mathbf{W}_{q_W}]_{:,k},\ \mathbf{h}^{\mathrm{eff}}_k = \mathbf{w}^{H}_{\mathrm{RF}_k}\mathbf{H}_k\mathbf{F}_{\mathrm{RF}},\ \mathbf{F}_{\mathrm{BB}} = \left(\mathbf{H}^{\mathrm{eff}}\right)^{\dagger},\ \mathbf{f}_{\mathrm{BB}_k} = \mathbf{f}_{\mathrm{BB}_k} / \|\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}\|_F, \tag{15}$$
where $\bar{q}_F$ and $\bar{q}_W$ denote the indices providing the maximum
sum-rate. We summarize the algorithmic steps of the proposed
approach in Algorithm 1. Note that the proposed hybrid
precoding optimization in (15) differs from the one in [7],
which does not consider all possible combinations of the analog
precoders as is done in this work. In Section VI, we
show that (15) yields better results than [7]. The
problem in (15) requires visiting $Q_F Q_W$ nodes to estimate the
hybrid precoders. To reduce the complexity and the need for
the array responses, in the following section, we propose a
DL-based approach where we elaborate on the details of the
training data generation and network architecture.
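The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the candidate lists are random constant-modulus placeholders, $N_{\mathrm{T}}^{\mathrm{RF}} = K$ is assumed, and the names `sum_rate` and `exhaustive_search` are hypothetical.

```python
import numpy as np


def sum_rate(H_eff, F_BB, P, sigma2):
    """Sum-rate of eq. (14) for effective channel H_eff and baseband precoder F_BB."""
    K = H_eff.shape[0]
    A = H_eff @ F_BB
    M = np.eye(K) + (P / (K * sigma2)) * (A @ A.conj().T)
    # slogdet returns (sign, log|det|); convert the natural log to base 2.
    return np.linalg.slogdet(M)[1] / np.log(2.0)


def exhaustive_search(H, F_cands, W_cands, P, sigma2):
    """Algorithm 1: try every (q_F, q_W) pair and keep the best sum-rate."""
    K = H.shape[0]
    best = (-np.inf, None, None)
    for q_F, F_RF in enumerate(F_cands):
        for q_W, W_RF in enumerate(W_cands):
            # Row k of the effective channel: w_RFk^H H_k F_RF, eq. (13).
            H_eff = np.stack([W_RF[:, k].conj() @ H[k] @ F_RF for k in range(K)])
            F_BB = np.linalg.pinv(H_eff)               # zero-forcing precoder
            F_BB = F_BB / np.linalg.norm(F_RF @ F_BB, axis=0, keepdims=True)
            rate = sum_rate(H_eff, F_BB, P, sigma2)
            if rate > best[0]:
                best = (rate, q_F, q_W)
    return best


rng = np.random.default_rng(0)
K, N_T, N_R = 2, 8, 4
H = rng.standard_normal((K, N_R, N_T)) + 1j * rng.standard_normal((K, N_R, N_T))
F_cands = [np.exp(2j * np.pi * rng.random((N_T, K))) / np.sqrt(N_T) for _ in range(4)]
W_cands = [np.exp(2j * np.pi * rng.random((N_R, K))) / np.sqrt(N_R) for _ in range(4)]
best_rate, q_F_best, q_W_best = exhaustive_search(H, F_cands, W_cands, P=1.0, sigma2=0.1)
```

The double loop makes the $Q_F Q_W$ cost explicit: each iteration needs one pseudo-inverse and one determinant, which is the workload the learning-based approach aims to avoid at prediction time.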
V. LEARNING-BASED HYBRID PRECODING
In this part, we present our DL framework for hybrid precoding
design. The proposed network architecture is illustrated
in Fig. 2. The CNN-MIMO architecture consists of ten layers.
It accepts input data of size $N_{\mathrm{R}} \times N_{\mathrm{T}} \times 3$ and
yields a $K(N_{\mathrm{T}} + N_{\mathrm{R}}) \times 1$ vector at the output. The overall
network architecture of CNN-MIMO can be represented by
the function $\Pi(\cdot): \mathbb{R}^{N_{\mathrm{R}} \times N_{\mathrm{T}} \times 3} \rightarrow \mathbb{R}^{K(N_{\mathrm{R}}+N_{\mathrm{T}})}$. Let
$f^{(i)}(\cdot)$ denote the arithmetic operation of the $i$th layer in the
network. Then, the overall network can be written as
$$\Pi(\mathbf{X}) = f^{(10)}\left(f^{(9)}(\cdots f^{(1)}(\mathbf{X}) \cdots)\right) = \mathbf{z}, \tag{16}$$
where each layer has a certain task, and we
explicitly show the arithmetic operations for fully connected
layers and convolutional layers in the sequel.
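Let $\bar{\mathbf{W}} \in \mathbb{R}^{C_x \times C_y}$ be the weights of a fully connected layer
in the network with input $\bar{\mathbf{x}} \in \mathbb{R}^{C_x}$ and output $\bar{\mathbf{y}} \in \mathbb{R}^{C_y}$. The
$c_y$th element of the output of the layer can be given by the
inner product
$$\bar{y}_{c_y} = \langle \bar{\mathbf{W}}_{c_y}, \bar{\mathbf{x}} \rangle = \sum_{i} [\bar{\mathbf{W}}]^{T}_{c_y,i}\,\bar{x}_i, \tag{17}$$
for $c_y = 1, \ldots, C_y$, where $\bar{\mathbf{W}}_{c_y}$ is the $c_y$th column vector of $\bar{\mathbf{W}}$.
For a convolutional layer, define $\bar{\mathbf{X}} \in \mathbb{R}^{d_x \times d_x \times C_x}$ and
$\bar{\mathbf{Y}} \in \mathbb{R}^{d_y \times d_y \times C_y}$ as the feature maps and output of a convolutional
layer, respectively. Let us also define $d_x \times d_y$ as the size of the
convolutional kernel, and $C_x \times C_y$ as the size of the response
of the convolutional layer for each feature map. Then, the response
of a convolutional layer becomes
$$\bar{\mathbf{Y}}_{p_y,c_y} = \sum_{p_k,p_x} \langle \bar{\mathbf{W}}_{c_y,p_k}, \bar{\mathbf{X}}_{p_x} \rangle, \tag{18}$$
where $\bar{\mathbf{Y}}_{p_y,c_y}$ is the response for the 2-D spatial region $p_y$ in
the $c_y$th channel of the feature maps, $\bar{\mathbf{W}}_{c_y,p_k} \in \mathbb{R}^{C_x}$ denotes
the weights of the $c_y$th convolutional kernel, and $\bar{\mathbf{X}}_{p_x} \in \mathbb{R}^{C_x}$
is the input feature map at spatial position $p_x$. Hence, we define
$p_x$ and $p_k$ as the 2-D spatial positions in the feature maps and
convolutional kernels, respectively [42].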
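The two layer operations in (17) and (18) can be written out directly. The sketch below is a naive reference implementation for illustration; unit stride and no padding are assumptions, since neither is specified in the text, and the function names are hypothetical.

```python
import numpy as np


def fully_connected(W, x):
    """Eq. (17): output element c_y is the inner product of column c_y of W with x."""
    return W.T @ x


def conv_layer(X, kernels):
    """Eq. (18) with unit stride and no padding (both are assumptions here).

    X: (rows, cols, C_x) input feature maps.
    kernels: (C_y, k_h, k_w, C_x), one kernel per output channel.
    """
    rows, cols = X.shape[0], X.shape[1]
    C_y, k_h, k_w = kernels.shape[0], kernels.shape[1], kernels.shape[2]
    out_rows, out_cols = rows - k_h + 1, cols - k_w + 1
    Y = np.zeros((out_rows, out_cols, C_y))
    for r in range(out_rows):
        for c in range(out_cols):
            patch = X[r:r + k_h, c:c + k_w, :]         # inputs under the kernel
            for c_y in range(C_y):
                # Sum over kernel positions p_k of <W_{c_y,p_k}, X_{p_x}>.
                Y[r, c, c_y] = np.sum(kernels[c_y] * patch)
    return Y


W = np.arange(6.0).reshape(3, 2)                       # C_x = 3 inputs, C_y = 2 outputs
y_fc = fully_connected(W, np.ones(3))                  # column sums of W

X = np.ones((4, 5, 3))                                 # 4-by-5 map with 3 channels
Y = conv_layer(X, np.ones((2, 2, 2, 3)))               # two 2-by-2 kernels
```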
A. Training Data Generation
In order to train the network, we prepare a training dataset
for several channel realizations. We generate $N$ different
channel realizations for $K$ users. Next, each of these channel
matrices is corrupted by synthetic noise for $G$ realizations.
The noise is added to each term in the channel matrix
and we define the SNR for the training data generation
as $\mathrm{SNR}_{\mathrm{TRAIN}} = 20\log_{10}\left(\frac{|[\mathbf{H}_k^{(n,g)}]_{i,j}|^2}{\sigma^2_{\mathrm{TRAIN}}}\right)$, where $\sigma^2_{\mathrm{TRAIN}}$ is the
variance of the synthetic noise. Note that $[\mathbf{H}_k^{(n,g)}]_{i,j}$ denotes
the $(i,j)$th entry of the $k$th channel matrix for the $(n,g)$th
realization with $n = 1, \ldots, N$ and $g = 1, \ldots, G$.
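A possible way to generate the noisy channel copies is sketched below. It reads the $\mathrm{SNR}_{\mathrm{TRAIN}}$ definition literally and entry by entry, so that the noise variance of each entry is $|[\mathbf{H}]_{i,j}|^2 / 10^{\mathrm{SNR}_{\mathrm{TRAIN}}/20}$; this interpretation, the function name `corrupt_channel`, and the all-ones test channel are assumptions made for the example.

```python
import numpy as np


def corrupt_channel(H, snr_train_db, rng):
    """Add synthetic complex Gaussian noise to every entry of H.

    Reads the SNR_TRAIN definition literally: for each entry,
    sigma_TRAIN^2 = |[H]_ij|^2 / 10**(SNR_TRAIN / 20).
    """
    sigma2 = np.abs(H) ** 2 / 10.0 ** (snr_train_db / 20.0)
    noise = rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
    return H + np.sqrt(sigma2 / 2.0) * noise


rng = np.random.default_rng(0)
H_clean = np.ones((200, 200), dtype=complex)        # placeholder channel
G = 3                                               # noisy copies per channel
H_noisy = [corrupt_channel(H_clean, 20.0, rng) for _ in range(G)]
```

Each of the $G$ copies is centred on the clean channel, which is the same structure as the sampling step $\mathcal{CN}([\mathbf{H}_k^{(n)}]_{i,j}, \sigma^2_{\mathrm{TRAIN}})$ in Algorithm 2.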
The input of the network consists of three channels. In the
first channel, the absolute values of the entries in the channel
matrix are used. The second and the third channels include the
real and imaginary parts of the channel matrix, respectively.
This approach provides good features for the solution of the
problem [31]. Specifically, let $\mathbf{X} \in \mathbb{R}^{N_{\mathrm{R}} \times N_{\mathrm{T}} \times 3}$ be the input
of the network. Then, for a channel matrix $\mathbf{H} \in \mathbb{C}^{N_{\mathrm{R}} \times N_{\mathrm{T}}}$, the
first channel of the input is given by $[[\mathbf{X}]_{:,:,1}]_{i,j} = |[\mathbf{H}]_{i,j}|$.
The second and the third channels are given by
$[[\mathbf{X}]_{:,:,2}]_{i,j} = \mathrm{Re}\{[\mathbf{H}]_{i,j}\}$ and $[[\mathbf{X}]_{:,:,3}]_{i,j} = \mathrm{Im}\{[\mathbf{H}]_{i,j}\}$, respectively.
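The three-channel input can be assembled with a single stacking operation, as in this short sketch (the function name `channel_to_input` is hypothetical):

```python
import numpy as np


def channel_to_input(H):
    """Stack |H|, Re{H} and Im{H} into an N_R x N_T x 3 real-valued array."""
    return np.stack([np.abs(H), H.real, H.imag], axis=-1)


H = np.array([[3.0 + 4.0j, 1.0j]])                  # toy 1-by-2 channel
X = channel_to_input(H)
```

For the toy channel above, the entry $3 + 4j$ maps to the triple $(5, 3, 4)$ across the three input channels.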
The output of the network is composed of the analog
precoder and combiners. Let $\mathbf{z} \in \mathbb{R}^{N_{\mathrm{T}}K + N_{\mathrm{R}}K}$ be a real-valued
vector. Then we design the output as
$$\mathbf{z} = [\angle\{\mathrm{vec}(\mathbf{F}_{\mathrm{RF}})^{T}\}, \angle\{\mathrm{vec}(\mathbf{W}_{\mathrm{RF}})^{T}\}]^{T}, \tag{19}$$
where $\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_{\mathrm{T}} \times K}$ and $\mathbf{W}_{\mathrm{RF}} \in \mathbb{C}^{N_{\mathrm{R}} \times K}$. Hence, the
input-output pair of the network is $(\mathbf{X}, \mathbf{z})$. We summarize the
data generation process in Algorithm 2. The total number of
inputs is $T = NGK$ for $K$ users. Note that the input data is
composed of each user's channel information as in lines 7-12 of
Algorithm 2 and we record the analog precoder and combiner
associated with each user channel. Note also that the same
analog precoders are used for all noisy channel realizations.
This is to introduce synthetic noise in the input dataset to
make the network robust against the corrupted channel data
[13], [31].
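The label in (19) keeps only the phases of the analog precoder and combiners. The sketch below builds $\mathbf{z}$ and also maps it back to complex matrices; the inverse step relies on the constant-modulus constraints $|[\mathbf{F}_{\mathrm{RF}}]_{i,j}|^2 = 1/N_{\mathrm{T}}$ and $|[\mathbf{w}_{\mathrm{RF}_k}]_i|^2 = 1/N_{\mathrm{R}}$ and is an assumption about how predictions would be used, as are the function names.

```python
import numpy as np


def precoders_to_label(F_RF, W_RF):
    """Eq. (19): phases of vec(F_RF) followed by phases of vec(W_RF)."""
    # vec(.) stacks columns, hence the column-major (Fortran) order.
    return np.concatenate(
        [np.angle(F_RF.flatten(order="F")), np.angle(W_RF.flatten(order="F"))]
    )


def label_to_precoders(z, N_T, N_R, K):
    """Rebuild constant-modulus F_RF and W_RF from a phase vector z."""
    f_phase = z[: N_T * K].reshape((N_T, K), order="F")
    w_phase = z[N_T * K:].reshape((N_R, K), order="F")
    return np.exp(1j * f_phase) / np.sqrt(N_T), np.exp(1j * w_phase) / np.sqrt(N_R)


rng = np.random.default_rng(0)
N_T, N_R, K = 8, 4, 2
F_RF = np.exp(1j * rng.uniform(-np.pi, np.pi, (N_T, K))) / np.sqrt(N_T)
W_RF = np.exp(1j * rng.uniform(-np.pi, np.pi, (N_R, K))) / np.sqrt(N_R)

z = precoders_to_label(F_RF, W_RF)                  # length K * (N_T + N_R)
F_back, W_back = label_to_precoders(z, N_T, N_R, K)
```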
Fig. 2. (Top) The proposed network architecture. The input is the channel matrix of any user in the network and the output is the corresponding analog
precoder and combiners. (Bottom) The diagram for the training and prediction stage of the proposed DL framework.
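Algorithm 2 Training data generation for CNN-MIMO.
Input: $N$, $G$, $K$, $\mathrm{SNR}_{\mathrm{TRAIN}}$.
Output: Training data $\mathcal{D}_{\mathrm{TRAIN}}$.
1: Generate $N$ different realizations of the multi-user MIMO
   scenario with channel matrices $\{\mathbf{H}_k^{(n)}\}_{n=1}^{N}$ and
   corresponding feasible sets $\{\mathbb{F}^{(n)}\}_{n=1}^{N}$, $\{\mathbb{W}^{(n)}\}_{n=1}^{N}$ $\forall k$.
2: Initialize with $t = 1$ while the dataset length is $T = NGK$.
3: for $1 \leq n \leq N$ do
4:   for $1 \leq g \leq G$ do
5:     $[\mathbf{H}_k^{(n,g)}]_{i,j} \sim \mathcal{CN}([\mathbf{H}_k^{(n)}]_{i,j}, \sigma^2_{\mathrm{TRAIN}})$.
6:     Using $\mathbf{H}_k^{(n,g)}$, $\mathbb{F}^{(n)}$, $\mathbb{W}^{(n)}$ in Algorithm 1, find
       $\hat{\mathbf{F}}_{\mathrm{RF}}^{(n,g)}$ and $\hat{\mathbf{W}}_{\mathrm{RF}}^{(n,g)}$ using $\bar{q}_F^{(n,g)}$ and $\bar{q}_W^{(n,g)}$.
7:     for $1 \leq k \leq K$ do
8:       $[[\mathbf{X}^{(t)}]_{:,:,1}]_{i,j} = |[\mathbf{H}_k^{(n,g)}]_{i,j}|$.
9:       $[[\mathbf{X}^{(t)}]_{:,:,2}]_{i,j} = \mathrm{Re}\{[\mathbf{H}_k^{(n,g)}]_{i,j}\}$.
10:      $[[\mathbf{X}^{(t)}]_{:,:,3}]_{i,j} = \mathrm{Im}\{[\mathbf{H}_k^{(n,g)}]_{i,j}\}$ $\forall i, j$.
11:      $\mathbf{z}^{(t)} = [\angle\{\mathrm{vec}(\hat{\mathbf{F}}_{\mathrm{RF}}^{(n,g)})^{T}\}, \angle\{\mathrm{vec}(\hat{\mathbf{W}}_{\mathrm{RF}}^{(n,g)})^{T}\}]^{T}$.
12:      Construct the input-output pair $(\mathbf{X}^{(t)}, \mathbf{z}^{(t)})$.
13:      $t = t + 1$.
14:    end for $k$,
15:  end for $g$,
16: end for $n$,
17: Training data for CNN-MIMO is obtained from the collection
    of the input-output pairs as
    $\mathcal{D}_{\mathrm{TRAIN}} = \left((\mathbf{X}^{(1)}, \mathbf{z}^{(1)}), (\mathbf{X}^{(2)}, \mathbf{z}^{(2)}), \ldots, (\mathbf{X}^{(T)}, \mathbf{z}^{(T)})\right)$.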
B. Network Architecture
The proposed network shown in Fig. 2 is composed of ten
layers. The first layer is the input layer, which accepts the channel
matrix data of size $N_R \times N_T \times 3$, i.e., 3 "channels",
each of size $N_R \times N_T$. The second and fourth layers are
convolutional layers with 256 filters of size $2 \times 2$ that
extract the features hidden in the input data. We feed
the network with the real and imaginary parts of the channel
data, which provides a large number of features [13], [30] that
help the network learn the mapping from the input data
to the corresponding label data. After each convolutional
layer, there is a normalization layer (the third and fifth layers)
to normalize the output and provide better convergence. The sixth
and eighth layers are fully connected layers with 2048 units each.
Each fully connected layer is followed by a dropout layer (the seventh
and ninth layers) with a 50% dropout probability. The dropout layers
make the network less dependent on the initial weights. The
output layer is the regression layer with $K(N_R + N_T)$ units,
which contain the phase information of the analog precoders and
combiners. To select the network parameters such as the number
of layers, the number of filters and the kernel sizes, we conducted
a hyperparameter tuning process to achieve sufficiently
good network accuracy and sum-rate performance [11], [13],
[30]. The current network architecture with a kernel size of
$2 \times 2$ is one possible solution to the considered problem;
network structures with different kernel sizes yield similar
performance. In other words, although different kernel
sizes can also be used for this problem, in this work we adopted
the configuration identified by the tuning process, which provides
sufficient performance for the considered scenario with less
computational complexity [11], [13], [30].
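The ten-layer structure described above can be summarized as a plain configuration list. The following Python fragment is only a framework-independent sketch: the function name `cnn_mimo_layers` and the dictionary keys are hypothetical, and details that the text does not specify (type of normalization, padding, stride, activation functions) are deliberately left out.

```python
def cnn_mimo_layers(n_r, n_t, k):
    """Return the ten layers of the described network as (type, settings) pairs."""
    return [
        ("input", {"size": (n_r, n_t, 3)}),             # layer 1
        ("conv", {"filters": 256, "kernel": (2, 2)}),    # layer 2
        ("normalization", {}),                           # layer 3
        ("conv", {"filters": 256, "kernel": (2, 2)}),    # layer 4
        ("normalization", {}),                           # layer 5
        ("fully_connected", {"units": 2048}),            # layer 6
        ("dropout", {"probability": 0.5}),               # layer 7
        ("fully_connected", {"units": 2048}),            # layer 8
        ("dropout", {"probability": 0.5}),               # layer 9
        ("regression_output", {"units": k * (n_r + n_t)}),  # layer 10
    ]

# Setting used in the simulations: N_R = 9, N_T = 36, K = 3.
layers = cnn_mimo_layers(n_r=9, n_t=36, k=3)
print(len(layers), layers[-1][1]["units"])  # 10 135
```

For $N_R = 9$, $N_T = 36$ and $K = 3$, the output layer therefore has $3 \cdot (9 + 36) = 135$ units.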
The computational workload of a CNN results from the
intensive use of arithmetic operations in its layers. Most of
the operations occur in the convolutional parts of the network.
Hence, convolutional layers are responsible for more than 90%
of the execution time during inference [43]. In contrast
to the computations, most of the CNN weights reside in
the fully connected layers, which require approximately 90%
of the memory due to their large number of weights [43]. Hence,
the complexity of a CNN is directly proportional to the number
of parameters and the number of layers. The layers of the
proposed CNN structure are described above and the number
of parameters can be calculated as
$C^2\left(2N_{cv}(wh + 1) + ([N_{fc1} + 1] + [N_{fc2} + 1]) \cdot \frac{50}{100}\right)$
[43]. Here, $C = 3$ corresponds
to the number of channels, $w = h = 2$ is the filter size,
and $N_{cv} = 256$ is the number of filters in both convolutional
layers. The variables $N_{fc1} = N_{fc2} = 2048$ denote the
number of units in the fully connected layers, which are weighted
by the 50% dropout probability. Hence, the CNN-MIMO structure in
Fig. 2 has 41481 parameters.
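The arithmetic behind this parameter count can be checked directly. The short Python sketch below evaluates the expression given above with the stated values; the function name `cnn_mimo_parameter_count` and its argument names are hypothetical.

```python
def cnn_mimo_parameter_count(c=3, w=2, h=2, n_cv=256,
                             n_fc1=2048, n_fc2=2048, dropout_percent=50):
    """Evaluate C^2 * (2*N_cv*(w*h + 1) + ([N_fc1 + 1] + [N_fc2 + 1]) * 50/100)."""
    conv_term = 2 * n_cv * (w * h + 1)                          # 2 * 256 * 5 = 2560
    fc_term = ((n_fc1 + 1) + (n_fc2 + 1)) * dropout_percent / 100  # 4098 * 0.5 = 2049
    return c ** 2 * (conv_term + fc_term)                       # 9 * 4609

print(cnn_mimo_parameter_count())  # 41481.0
```

With $C^2 = 9$, the convolutional term equals 2560 and the fully connected term equals 2049, so the total is $9 \cdot 4609 = 41481$.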
C. Training
The CNN structure in Fig. 2 is realized and trained in
MATLAB on a PC with a single GPU and a 768-core
processor. We have used the stochastic gradient descent al-
gorithm with momentum 0.9 [44] and updated the network
parameters with learning rate 0.005 and mini-batch size of
500 samples for 100 epochs. As a loss function, we used the
MSE given by $\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T}\left(\mathbf{z}^{(t)} - f(\mathbf{X}^{(t)})\right)^2$, where $f(\mathbf{X})$ is
a function of the input data $\mathbf{X}$, which represents the nonlinear
transformation achieved by the network [11].
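To make the training rule concrete, the following Python sketch computes the MSE loss and performs a parameter update of stochastic gradient descent with momentum. It is an illustration under assumptions, not the MATLAB routine used here: the names `mse_loss` and `sgdm_step` are hypothetical, the squared error is summed over the entries of each label vector, and the classical heavy-ball form of the momentum update is assumed.

```python
def mse_loss(targets, predictions):
    """Average over samples of the squared error summed over label entries."""
    total = 0.0
    for z, f in zip(targets, predictions):
        total += sum((zi - fi) ** 2 for zi, fi in zip(z, f))
    return total / len(targets)

def sgdm_step(params, grads, velocity, lr=0.005, momentum=0.9):
    """One heavy-ball update: v <- momentum * v - lr * g, then p <- p + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity

# Two consecutive updates of a single scalar parameter with a constant gradient.
params, velocity = [1.0], [0.0]
params, velocity = sgdm_step(params, [2.0], velocity)
params, velocity = sgdm_step(params, [2.0], velocity)
```

The default values `lr=0.005` and `momentum=0.9` are the learning rate and momentum stated above; in practice the gradients would be computed over mini-batches of 500 samples.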
To train the proposed CNN structure, $N = 500$ different
multi-user scenarios are realized with $K = 3$ users (1500
channel realizations in total) as in Algorithm 2. To each
channel matrix, AWGN is added at three power levels,
$\mathrm{SNR}_{\mathrm{TRAIN}} \in \{15, 20, 25\}$ dB, with $G = 100$
noisy realizations per level to account
for different channel characteristics. The use of multiple
$\mathrm{SNR}_{\mathrm{TRAIN}}$ levels provides a wide range of corrupted data
in the training, which improves the learning and robustness
of the network. Hence, the total size of the training data
is $N_R \times N_T \times 3 \times 450000$. In the training process, 80%
and 20% of all generated data are selected as the training
and validation datasets, respectively. The validation aids in
hyperparameter tuning during the training phase and helps to
prevent the network from simply memorizing the training data
rather than learning general features for accurate prediction
with new data. The validation data is used to test the performance of
the network in the simulations for $J_T = 100$ Monte Carlo
trials. To prevent similarity between the test data
and the training data, we also add synthetic noise to the
test data, where the SNR during testing is defined similarly to
$\mathrm{SNR}_{\mathrm{TRAIN}}$ as $\mathrm{SNR}_{\mathrm{TEST}} = 20\log_{10}\left(\frac{|[\mathbf{H}]_{i,j}|^2}{\sigma^2_{\mathrm{TEST}}}\right)$. The number of
grid points is selected as $\bar{L} = 60$ for the azimuth and $\bar{L} = 20$
for the elevation angular sectors in Algorithm 1. In addition, the
propagation environment is modeled with $L = 10$ paths from
the users, and all the user directions, i.e., all the azimuth
and elevation angles, are selected uniformly at random from the
intervals $\phi \in [-30^\circ, 30^\circ]$ and $\theta \in [-20^\circ, 20^\circ]$, respectively
[6]. We use a sectorized angular range by setting the antenna
gains $g_R(\Theta_R^{(l,k)})$ and $g_T(\Theta_T^{(l,k)})$ to unity within these angular ranges
and zero otherwise, which increases the beamforming gain and
reduces interference [5]. Hence, the training
data includes a large number of scenarios where the users
are randomly located. For each scenario, the corresponding
precoder and combiners are obtained by Algorithm 1.
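The bookkeeping of this setup can be verified with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the dataset size is taken as the product of the numbers of scenarios, noisy realizations, users and SNR levels, and the noise variance is obtained by inverting the per-entry SNR definition given above.

```python
def dataset_size(n_scenarios, n_noisy, n_users, snr_levels_db):
    """Number of input-output pairs: N * G * K * (number of SNR levels)."""
    return n_scenarios * n_noisy * n_users * len(snr_levels_db)

def split_sizes(total, train_fraction=0.8):
    """Sizes of the training and validation subsets."""
    n_train = int(round(total * train_fraction))
    return n_train, total - n_train

def noise_variance(h, snr_db):
    """Invert SNR = 20*log10(|h|^2 / sigma^2) for sigma^2 (per channel entry)."""
    return abs(h) ** 2 / 10 ** (snr_db / 20)

T = dataset_size(500, 100, 3, [15, 20, 25])
print(T)               # 450000
print(split_sizes(T))  # (360000, 90000)
```

With $N = 500$, $G = 100$, $K = 3$ and three SNR levels, this reproduces $T = 450000$, of which 360000 samples are used for training and 90000 for validation.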
The training stage takes about 5 hours for $T = 450000$
samples. This process includes both the labeling and the input
data generation. Note that the training stage is performed only
once. Then, in the prediction stage, it takes only milliseconds
to estimate the hybrid precoders, as demonstrated in the
simulations (see Table I). Hence, the proposed approach,
which provides a high data rate and low latency, is attractive
for meeting the 5G requirements.
The trained network can work for different values of parameters such
as the number of users$^1$ $K$, the number of paths $L$, $\mathrm{SNR}_{\mathrm{TEST}}$
and $\mathrm{SNR}_{\mathrm{TRAIN}}$, which motivates the practical implementation
of the proposed DL framework. The proposed CNN structure
needs to be retrained if there is a change in parameters
such as $N_T$, $N_R$ and $N_{RF}^{T}$, which directly dictate the input and output
dimensions of the deep network. The performance of the
network also depends on the angular interval selected in $\mathcal{D}$
when designing the feasible sets $\mathcal{F}$ and $\mathcal{W}$, as well as on the
antenna gains used to obtain sectorized angular intervals.
D. Prediction
Once the CNN-MIMO is trained offline as demonstrated in
Fig. 2, it can be used for the prediction of the hybrid beam-
formers. In order to generate the test data in the prediction
stage, we have picked users randomly from the validation
data and the synthetic noise is also added to the test data
with SNRTEST to eliminate the similarity between the test and
training datasets. The corrupted channel data of each user is
fed to the network and the analog precoders are predicted
from the output layer of the network. Then, their phases are
quantized in $[0, 2\pi]$ with $2^B$ discrete points. Specifically, the
values of the quantized phases belong to the set $\left\{\frac{2\pi b}{2^B}\right\}_{b=1}^{2^B}$ to
allow the realization of the analog precoder and combiners in
a hardware-efficient manner.
VI. NUMERICAL SIMULATIONS
In this section, we present the performance of the proposed
method, CNN-MIMO, via several experiments where we train
the network with the parameters described in Section V-C, namely
$N = 500$, $K = 3$, $G = 100$, $\mathrm{SNR}_{\mathrm{TRAIN}} \in \{15, 20, 25\}$ dB,
learning rate 0.005, mini-batch size 500 and 100 epochs.
We compare the performance of CNN-MIMO with state-of-the-art
hybrid precoding techniques such as the manifold
optimization (MO) [45], the low-resolution hybrid beamforming
(LRHB) [8], SOMP [6] and the two-stage hybrid beamforming
(TS-HB) algorithm [7]. While manifold optimization and
SOMP were proposed for a single-user scenario, we adapt the
algorithms to the multi-user case by using the same strategy
for interference cancellation as in [7]. CNN-MIMO is also
compared with the DL-based approach MLP proposed in [27].
MLP is designed as described in [27] but adapted to the
multi-user scenario with the same training data used for
CNN-MIMO. As another benchmark, denoted as "No interference"
in the simulations, we present the performance of fully
digital beamforming and combining where the interference is
completely eliminated. In addition, the performance of the
precoders used in the test data (obtained from Algorithm 1) is
indicated as "Algorithm 1" in the experiments.
In Fig. 3, we present the achievable sum-rate performance
of the algorithms with respect to different SNR levels. The
design parameters of CNN-MIMO are given in Section V-B.
Moreover, we select the number of antennas at the BS and
at each user as $N_T = 36$ and $N_R = 9$, respectively. Synthetic
$^1$When the network is trained for $K_{\mathrm{TRAIN}}$ users, the output size of the
network is $\mathbf{z} \in \mathbb{R}^{N_T K_{\mathrm{TRAIN}} + N_R K_{\mathrm{TRAIN}}}$. Then we can use the trained
network for hybrid beamforming when there are $K \le K_{\mathrm{TRAIN}}$ users by
substituting the network output of size $(N_T K + N_R K) \times 1$ corresponding to
those $K$ users.
[Figure 3: sum-rate [bits/s/Hz] versus SNR [dB] for No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]
Fig. 3. Sum-rate versus SNR ($N_T = 36$, $N_R = 9$, $K = 3$, $B = 3$ and
$\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).
noise is added to both the channel matrices and the array
responses with $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB, and $B = 3$ quantization
bits are used. The number of users is $K = 3$ and there are
$L = 10$ paths for each user. As benchmarks, we use the
fully digital beamforming and the MO algorithm, which has
the best performance since it obtains near-optimum analog and
baseband precoders. Our CNN approach closely follows the
performance of the MO algorithm. In fact, CNN-MIMO provides
the highest sum-rate among the remaining algorithms.
Notably, although LRHB is a state-of-the-art technique based
on phase extraction and is regarded as the technique having
the best performance in the literature [8], CNN-MIMO
outperforms it. MLP has poorer performance
due to the lack of the feature extraction that is achieved by the
convolutional layers in CNN-MIMO. In particular, the
effectiveness of CNN-MIMO can be attributed to the maximization
of the sum-rate by visiting all possible combinations for the
analog parts at both the receiver and transmitter sides through
an exhaustive search, and to the well-trained deep network. The
best possible performance of CNN-MIMO is obtained
if its output is exactly the
same as the labels obtained by Algorithm 1. Hence,
the performance of CNN-MIMO is limited by the
performance of Algorithm 1. We observe that the performance
of CNN-MIMO is close to that of Algorithm 1, where the gap between
the two is due to the corruption in the input data. SOMP
and TS-HB have poorer performance than CNN-MIMO.
In particular, while SOMP was initially proposed for
the single-user case, we have adapted the algorithm to the
multi-user scenario, where the analog precoders are designed
based on the similarity between the optimum precoder and
the analog precoders. As a result, SOMP does not always
find the optimum weights maximizing the sum-rate [31]. The
TS-HB algorithm performs better than SOMP since it is
based on the maximization of the sum-rate, and its performance
converges to that of SOMP when there is a single
path from each user.
The feedback data, namely, the channel matrices $\{\mathbf{H}_k\}_{k=1}^{K}$
and the feasible array response sets $\mathcal{F}$ and $\mathcal{W}$, may not always
[Figure 4: three panels versus $\mathrm{SNR}_{\mathrm{TEST}}$ [dB]: (a) sum-rate [bits/s/Hz], (b) RMSE, (c) RMSE. Curves: Manifold Optimization, CNN-MIMO, LRHB, MLP, TS-HB and SOMP; panel (a) also shows No Interference and Algorithm 1.]
Fig. 4. Performance comparison for corrupted channel data. In (a), sum-rate
versus $\mathrm{SNR}_{\mathrm{TEST}}$ is given, whereas the RMSE for the precoder $\mathbf{F}_{RF}$ and the combiner
$\mathbf{W}_{RF}$ are shown in (b) and (c), respectively ($N_T = 36$, $N_R = 9$, $K = 3$,
$B = 3$ and SNR = 0 dB).
be perfectly available. To evaluate the robustness
of the algorithms against corrupted
feedback, we simulate their performance
for different $\mathrm{SNR}_{\mathrm{TEST}}$ levels in the same setting as in the
previous simulation. In this case, complex AWGN is added
to both the channel and array response data to resemble the
[Figure 5: sum-rate [bits/s/Hz] versus number of quantization bits for No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]
Fig. 5. Sum-rate versus angular resolution of the analog precoders ($N_T = 36$,
$N_R = 9$, SNR = 0 dB, $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).
deviations in the feedback data. The results are presented in
Fig. 4, where the achievable sum-rate is shown in Fig. 4(a)
while the RMS errors of the precoder $\mathbf{F}_{RF}$ and the combiner $\mathbf{W}_{RF}$
are shown in Figs. 4(b) and 4(c), respectively. Note that
Algorithm 1 is fed with perfect CSI to demonstrate the best
achievable performance. As can be seen from Fig. 4,
CNN-MIMO is more robust against the corruption in the channel
data than the other methods. Note that manifold
optimization, LRHB, MLP, and CNN-MIMO are only affected
by the corruption in the channel data since they
estimate the analog precoders directly, unlike SOMP and TS-HB, which
require the feasible sets $\mathcal{F}$ and $\mathcal{W}$ as input. As a result,
the performance of TS-HB and SOMP relies heavily on the
accuracy of both the channel matrix and the array response
sets. Moreover, the knowledge of the channel data and the feasible
sets $\mathcal{F}$ and $\mathcal{W}$ is only needed in the training stage of the
network to obtain the labels, and it is not used in the prediction
stage. In contrast, other algorithms such as SOMP and TS-HB
require this information to solve the hybrid precoding problem.
Overall, these results show the robustness of the proposed
CNN-MIMO.
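As a rough illustration of the error metric reported in Figs. 4(b) and 4(c), the following Python sketch computes a root-mean-square error between estimated analog beamformers and a reference beamformer over several trials. The exact normalization of the RMSE is not restated in this section, so the per-entry averaging used here is an assumption, and the function name `rmse` is hypothetical.

```python
import math

def rmse(estimates, reference):
    """RMSE between a list of estimated complex matrices and one reference matrix.

    Assumption: the squared error is averaged over all entries and all trials.
    """
    n_entries = len(reference) * len(reference[0])
    total = 0.0
    for est in estimates:
        for row_e, row_r in zip(est, reference):
            total += sum(abs(e - r) ** 2 for e, r in zip(row_e, row_r))
    return math.sqrt(total / (len(estimates) * n_entries))

# Two trials of a 1 x 2 beamformer; only one entry of the first trial deviates.
reference = [[1 + 0j, 0j]]
estimates = [[[1 + 0j, 1j]], [[1 + 0j, 0j]]]
print(rmse(estimates, reference))  # 0.5
```

In the experiments, the list of estimates would contain one beamformer per Monte Carlo trial.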
The analog precoders are designed with discrete phase
shifters with constant modulus to steer the beam in spatial
precoding. To assess the effect of the phase
resolution of the phase shifters, we present the sum-rate of the
algorithms for different quantization resolutions, where the
phases of the analog precoder and combiners are quantized
with $B \in \{1, \ldots, 8\}$ bits. The results are depicted in Fig. 5,
where we observe that the other algorithms converge after 4
bits while, remarkably, the proposed CNN approach achieves
a higher sum-rate starting from one-bit quantization.
In Fig. 6(a), the performance is evaluated for a varying
number of users, namely, $K \in \{2, \ldots, 8\}$, where $L = 10$
is fixed. Notably, CNN-MIMO performs better than the other
algorithms. In particular, the gap between "No interference"
and CNN-MIMO becomes larger as $K$ increases. We observe
that MLP outperforms LRHB
for $K \ge 5$ and exhibits robust performance like
CNN-MIMO, albeit with a certain performance loss. The main reason
is the use of training data prepared with Algorithm 1,
which provides more accurate beamformers than the other
algorithms. We also see that CNN-MIMO closely follows the
performance of Algorithm 1. However, the gap with respect to
"No interference" arises from
the insufficient performance of interference cancellation.
Hence, more effective algorithms should be developed to
handle the interference among the users.
In Fig. 6(b), we evaluate the performance of CNN-MIMO
when the number of paths for each user is not fixed. To this end, we
train the network with the same parameters except that
$L$ is selected uniformly at random from the interval $[1, 10]$. Using varying
$L$ values for different users reduces the similarity between the
channel data of the users, and we obtain satisfactory performance
of CNN-MIMO, similar to the observations made when $L$ is
fixed.
[Figure 6: two panels versus number of users: (a) sum-rate [bits/s/Hz], (b) spectral efficiency [bits/s/Hz]. Curves: No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]
Fig. 6. Sum-rate versus number of users. The number of paths is fixed as
$L = 10$ in (a), and $L$ is selected uniformly at random in the interval $[1, 10]$ in
(b) ($N_T = 100$, $N_R = 9$, SNR = 0 dB and $\mathrm{SNR}_{\mathrm{TEST}} = 20$
dB).
In Fig. 7, we illustrate the performance for a varying number
of BS antennas. As can be seen, similar observations can be
made. Specifically, CNN-MIMO performs better than the
other algorithms. Furthermore, we present the computation
times of the algorithms for different numbers of BS antennas
in Table I (in seconds). While the complexity of Algorithm 1 is
the highest due to the exhaustive search, DL-based approaches,
[Figure 7: sum-rate [bits/s/Hz] versus number of BS antennas for No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]
Fig. 7. Sum-rate versus number of BS antennas ($K = 3$, $N_R = 9$, SNR = 0
dB and $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).
TABLE I
COMPUTATION TIMES (IN SECONDS).
$N_T$   Algorithm 1   CNN-MIMO   MLP      LRHB     TS-HB    SOMP
4       0.1061        0.0039     0.0034   0.0059   0.0093   0.0122
16      0.1164        0.0043     0.0038   0.0113   0.0103   0.0139
64      0.1175        0.0049     0.0045   0.0159   0.0108   0.0216
100     0.1242        0.0052     0.0049   0.0318   0.0125   0.0282
i.e., CNN-MIMO and MLP, have the lowest computation time
compared to LRHB and the rest. MLP has slightly lower
complexity than CNN-MIMO due to its less complex structure;
however, it has poorer performance, as was shown in the
previous experiments. In addition, the complexity
of TS-HB and SOMP depends on the number
of elements in the feasible sets $\mathcal{F}$ and $\mathcal{W}$. It is observed that
TS-HB has less computation time than SOMP since it does
not follow an OMP stage to obtain the precoders but selects
the ones with the highest channel gain from the codebook
[7]. It is also worthwhile to mention the trade-off between the
computation time and the performance of CNN-MIMO. While
the MO algorithm has slightly better performance than
CNN-MIMO, the proposed DL framework computes the hybrid
beamformers significantly faster than the MO
algorithm. The complexity of MO also increases at a higher
rate than that of CNN-MIMO. This observation demonstrates
that CNN-MIMO is more useful in terms of computational
complexity even for a very large number of antennas, which
is the case in 5G systems. Hence, we believe the proposed
approach can be a promising technique for mmWave
systems where low complexity and robust performance are
required. The run times of CNN-MIMO can be further
reduced by implementing the network on dedicated
hardware such as FPGAs. For example, domain-specific
architectures have been implemented in [46] for AlexNet [43] and
VGG-16 for real-time image classification with 194 GOP/s
(billions of fixed-point operations per second) while consuming
only 300 mW. These promising results encourage us to develop
more energy-efficient DL approaches for problems in
communications systems.
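The computation-time comparison can be quantified directly from the values in Table I. The Python sketch below computes the ratio of each baseline's run time to that of CNN-MIMO; the dictionary `TABLE_I` simply transcribes the table, and the function name `speedup_over` is hypothetical.

```python
# Computation times in seconds, transcribed from Table I (keys are N_T).
TABLE_I = {
    4:   {"Algorithm 1": 0.1061, "CNN-MIMO": 0.0039, "MLP": 0.0034,
          "LRHB": 0.0059, "TS-HB": 0.0093, "SOMP": 0.0122},
    16:  {"Algorithm 1": 0.1164, "CNN-MIMO": 0.0043, "MLP": 0.0038,
          "LRHB": 0.0113, "TS-HB": 0.0103, "SOMP": 0.0139},
    64:  {"Algorithm 1": 0.1175, "CNN-MIMO": 0.0049, "MLP": 0.0045,
          "LRHB": 0.0159, "TS-HB": 0.0108, "SOMP": 0.0216},
    100: {"Algorithm 1": 0.1242, "CNN-MIMO": 0.0052, "MLP": 0.0049,
          "LRHB": 0.0318, "TS-HB": 0.0125, "SOMP": 0.0282},
}

def speedup_over(table, baseline, method="CNN-MIMO"):
    """Ratio of the baseline's run time to the method's run time for each N_T."""
    return {n_t: row[baseline] / row[method] for n_t, row in table.items()}

for name in ("Algorithm 1", "LRHB", "TS-HB", "SOMP"):
    ratios = speedup_over(TABLE_I, name)
    print(name, {n_t: round(r, 1) for n_t, r in ratios.items()})
```

For $N_T = 100$, for instance, the ratio with respect to Algorithm 1 is $0.1242 / 0.0052 \approx 23.9$, while MLP remains slightly faster than CNN-MIMO for every $N_T$ in the table.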
VII. CONCLUSIONS
We proposed a DL framework for hybrid precoding design
in multi-user mmWave MIMO systems. The proposed network
architecture is a CNN which accepts as input the channel
matrix of the users and gives at the output the analog precoder
and combiners. The proposed technique was compared with
both optimization- and greedy-based approaches as well as
DL-based techniques such as MLP. The effectiveness of
the proposed CNN approach was evaluated through several
experiments, which showed that CNN-MIMO achieves
better performance than the state-of-the-art hybrid precoding
approaches as well as less computation time. The effectiveness
of CNN-MIMO can be attributed to the use of an exhaustive
search to obtain the best analog precoders and combiners
in the training stage. To train the network, a large
training dataset of nearly half a million samples was
used. Notably, large training data provides robust performance
against deviations in the channel data. Moreover, we
showed that CNN-MIMO achieves more robust results in the
presence of imperfections regarding the channel matrix and
array responses.
REFERENCES
[1] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M.
Sayeed, “An Overview of Signal Processing Techniques for Millimeter
Wave MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 10,
pp. 436–453, April 2016.
[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K.
Soong, and J. C. Zhang, “What Will 5G Be?,” IEEE J. Sel. Areas
Commun., vol. 32, pp. 1065–1082, June 2014.
[3] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta,
O. Edfors, and F. Tufvesson, “Scaling Up MIMO: Opportunities and
Challenges with Very Large Arrays,” IEEE Signal Process. Mag.,
vol. 30, pp. 40–60, Jan 2013.
[4] L. Wei, R. Q. Hu, Y. Qian, and G. Wu, “Key elements to enable mil-
limeter wave communications for 5G wireless systems,” IEEE Wireless
Communications, vol. 21, pp. 136–143, December 2014.
[5] A. Alkhateeb, O. E. Ayach, G. Leus, and R. W. Heath, “Hybrid
precoding for millimeter wave cellular systems with partial channel
knowledge,” in 2013 Information Theory and Applications Workshop
(ITA), pp. 1–5, Feb 2013.
[6] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath,
“Spatially Sparse Precoding in Millimeter Wave MIMO Systems,” IEEE
Trans. Wireless Commun., vol. 13, pp. 1499–1513, March 2014.
[7] A. Alkhateeb, G. Leus, and R. W. Heath, “Limited feedback hybrid pre-
coding for Multi-User millimeter wave systems,” IEEE Trans. Wireless
Commun., vol. 14, pp. 6481–6494, Nov. 2015.
[8] Z. Wang, M. Li, Q. Liu, and A. L. Swindlehurst, “Hybrid Precoder
and Combiner Design With Low-Resolution Phase Shifters in mmWave
MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 12, pp. 256–
269, May 2018.
[9] M. Kokshoorn, H. Chen, Y. Li, and B. Vucetic, “Beam-On-Graph:
Simultaneous Channel Estimation for mmWave MIMO Systems With
Multiple Users,” IEEE Trans. Commun., vol. 66, pp. 2931–2946, July
2018.
[10] X. Zhai, Y. Cai, Q. Shi, M. Zhao, G. Y. Li, and B. Champagne, “Joint
Transceiver Design With Antenna Selection for Large-Scale MU-MIMO
mmWave Systems,” IEEE J. Sel. Areas Commun., vol. 35, pp. 2085–
2096, Sep. 2017.
[11] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521,
no. 7553, pp. 436–444, 2015.
[12] D. Yu and L. Deng, “Deep learning and its applications to signal and
information processing [exploratory dsp],” IEEE Signal Process. Mag.,
vol. 28, pp. 145–154, Jan 2011.
[13] A. M. Elbir, K. V. Mishra, and Y. C. Eldar, “Cognitive radar antenna
selection via deep learning,” IET Radar, Sonar & Navigation, vol. 13,
pp. 871–880(9), June 2019.
[14] Z. Jiang, S. Chen, A. F. Molisch, R. Vannithamby, S. Zhou, and Z. Niu,
“Exploiting Wireless Channel State Information Structures Beyond
Linear Correlations: A Deep Learning Approach,” IEEE Commun. Mag.,
vol. 57, pp. 28–34, March 2019.
[15] M. Feng and S. Mao, “Dealing with Limited Backhaul Capacity in
Millimeter-Wave Systems: A Deep Reinforcement Learning Approach,”
IEEE Commun. Mag., vol. 57, pp. 50–55, March 2019.
[16] H. Ye, G. Y. Li, and B. Juang, “Power of Deep Learning for Channel
Estimation and Signal Detection in OFDM Systems,” IEEE Wireless
Communications Letters, vol. 7, pp. 114–117, Feb 2018.
[17] H. Huang, J. Yang, H. Huang, Y. Song, and G. Gui, “Deep Learning
for Super-Resolution Channel Estimation and DOA Estimation Based
Massive MIMO System,” IEEE Trans. Veh. Technol., vol. 67, pp. 8549–
8560, Sep. 2018.
[18] Y. Long, Z. Chen, J. Fang, and C. Tellambura, “Data-Driven-Based
Analog Beam Selection for Hybrid Beamforming Under mm-Wave
Channels,” IEEE J. Sel. Topics Signal Process., vol. 12, pp. 340–352,
May 2018.
[19] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in 2017
IEEE 18th International Workshop on Signal Processing Advances in
Wireless Communications (SPAWC), pp. 1–5, July 2017.
[20] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep Reinforce-
ment Learning for Dynamic Multichannel Access in Wireless Networks,”
IEEE Transactions on Cognitive Communications and Networking,
vol. 4, pp. 257–265, June 2018.
[21] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep Learning
Based Communication Over the Air,” IEEE J. Sel. Topics Signal
Process., vol. 12, pp. 132–143, Feb 2018.
[22] V. Raj and S. Kalyani, “Backpropagating Through the Air: Deep
Learning at Physical Layer Without Channel Models,” IEEE Commun.
Lett., vol. 22, pp. 2278–2281, Nov 2018.
[23] C. Wen, W. Shih, and S. Jin, “Deep Learning for Massive MIMO CSI
Feedback,” IEEE Wireless Communications Letters, vol. 7, pp. 748–751,
Oct 2018.
[24] V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning
at physical layer without channel models,” IEEE Commun. Lett., vol. 22,
pp. 2278–2281, Nov. 2018.
[25] P. Dong, H. Zhang, G. Y. Li, N. Naderializadeh, and I. Gaspar, “Deep
cnn based channel estimation for mmwave massive mimo systems,”
ArXiv, vol. abs/1904.06761, 2019.
[26] H. He, C. Wen, S. Jin, and G. Y. Li, “Deep Learning-Based Channel
Estimation for Beamspace mmWave Massive MIMO Systems,” IEEE
Wireless Communications Letters, vol. 7, pp. 852–855, Oct 2018.
[27] H. Huang, Y. Song, J. Yang, G. Gui, and F. Adachi, “Deep-Learning-
based Millimeter-Wave Massive MIMO for Hybrid Precoding,” IEEE
Trans. Veh. Technol., pp. 1–1, 2019.
[28] T. Lin and Y. Zhu, “Beamforming Design for Large-Scale Antenna
Arrays Using Deep Learning,” arXiv e-prints, p. arXiv:1904.03657, Apr
2019.
[29] A. Alkhateeb, S. P. Alex, P. Varkey, Y. Li, Q. Z. Qu, and D. Tujkovic,
“Deep Learning Coordinated Beamforming for Highly-Mobile Millime-
ter Wave Systems,” IEEE Access, vol. 6, pp. 37328–37348, 2018.
[30] A. M. Elbir, “CNN-based precoder and combiner design in mmWave
MIMO systems,” IEEE Commun. Lett., vol. 23, no. 7, pp. 1240–1243,
2019.
[31] A. M. Elbir and K. V. Mishra, “Joint Antenna Selection and Hybrid
Beamformer Design using Unquantized and Quantized Deep Learning
Networks,” arXiv e-prints, p. arXiv:1905.03107, May 2019.
[32] X. Yu, J. Shen, J. Zhang, and K. B. Letaief, “Alternating minimization
algorithms for hybrid precoding in millimeter wave MIMO systems,”
IEEE J. Sel. Top. Signal Process., vol. 10, pp. 485–500, Apr. 2016.
[33] E. Torkildson, C. Sheldon, U. Madhow, and M. Rodwell, “Millimeter-
Wave Spatial Multiplexing in an Indoor Environment,” in 2009 IEEE
Globecom Workshops, pp. 1–6, Nov 2009.
[34] R. Méndez-Rial, C. Rusu, A. Alkhateeb, N. González-Prelcic, and R. W.
Heath, “Channel estimation and hybrid combining for mmWave: Phase
shifters or switches?,” in 2015 Information Theory and Applications
Workshop (ITA), pp. 90–97, Feb 2015.
[35] V. Raghavan and A. M. Sayeed, “Multi-antenna capacity of sparse
multipath channels,” IEEE Trans. Inf. Theory, 2006.
[36] T. S. Rappaport, F. Gutierrez, E. Ben-Dor, J. N. Murdock, Y. Qiao,
and J. I. Tamir, “Broadband Millimeter-Wave Propagation Measurements
and Models Using Adaptive-Beam Antennas for Outdoor Urban Cellular
Communications,” IEEE Trans. Antennas Propag., vol. 61, pp. 1850–
1859, April 2013.
[37] I. A. Hemadeh, K. Satyanarayana, M. El-Hajjar, and L. Hanzo,
“Millimeter-Wave Communications: Physical Channel Models, Design
Considerations, Antenna Constructions, and Link-Budget,” IEEE Com-
mun. Surveys Tuts., vol. 20, pp. 870–913, Secondquarter 2018.
[38] H. Huang, J. Yang, H. Huang, Y. Song, and G. Gui, “Deep learning for
super-resolution channel estimation and doa estimation based massive
mimo system,” IEEE Trans. Veh. Technol., vol. 67, pp. 8549–8560, Sept
2018.
[39] Z. Marzi, D. Ramasamy, and U. Madhow, “Compressive Channel
Estimation and Tracking for Large Arrays in mm-Wave Picocells,” IEEE
J. Sel. Topics Signal Process., vol. 10, pp. 514–527, April 2016.
[40] J. Wang, Z. Lan, C. woo Pyo, T. Baykas, C. sean Sum, M. A. Rahman,
J. Gao, R. Funada, F. Kojima, H. Harada, and S. Kato, “Beam codebook
based beamforming protocol for multi-Gbps millimeter-wave WPAN
systems,” IEEE J. Sel. Areas Commun., vol. 27, pp. 1390–1399, October
2009.
[41] E. Björnson, L. Van der Perre, S. Buzzi, and E. G. Larsson, “Massive
MIMO in Sub-6 GHz and mmWave: Physical, Practical, and Use-Case
Differences,” arXiv e-prints, p. arXiv:1803.11023, Mar 2018.
[42] J. Cheng, J. Wu, C. Leng, Y. Wang, and Q. Hu, “Quantized CNN: A
unified approach to accelerate and compress convolutional networks,”
IEEE Transactions on Neural Networks and Learning Systems, vol. 29,
no. 10, pp. 4730–4743, 2018.
[43] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in Neural Infor-
mation Processing Systems, pp. 1097–1105, 2012.
[44] C. M. Bishop, Pattern Recognition and Machine Learning. Springer,
New York, 2006.
[45] X. Yu, J. Shen, J. Zhang, and K. B. Letaief, “Alternating Minimization
Algorithms for Hybrid Precoding in Millimeter Wave MIMO Systems,”
IEEE J. Sel. Topics Signal Process., vol. 10, pp. 485–500, April 2016.
[46] B. Sun, L. Yang, P. Dong, W. Zhang, J. Dong, and C. Young,
“Ultra Power-Efficient CNN Domain Specific Accelerator with
9.3TOPS/Watt for Mobile and Embedded Applications,” arXiv e-prints,
p. arXiv:1805.00361, Apr 2018.
Ahmet M. Elbir received the B.S. degree with
Honors from Firat University in 2009 and the
Ph.D. degree from Middle East Technical University
(METU) in 2016, both in electrical engineering. He
is the recipient of 2016 METU best Ph.D. thesis
award for his doctoral studies. He has served as an Associate
Editor for IEEE Access since 2018. Currently,
he continues his studies at the Dept. of Electrical and
Electronics Engineering, Duzce University, Turkey.
His research interests include array signal process-
ing, sparsity-driven convex optimization, signal pro-
cessing for communications and deep learning for array signal processing.
Anastasios Papazafeiropoulos [S’06-M’10-SM’19]
received the B.Sc. degree (Hons.) in physics, the
M.Sc. degree (Hons.) in electronics and computers
science, and the Ph.D. degree from the University
of Patras, Greece, in 2003, 2005, and 2010, re-
spectively. From 2011 to 2012 and from 2016 to
2017, he was with the Institute for Digital Com-
munications at The University of Edinburgh, U.K.,
as a Post-Doctoral Research Fellow. From 2012
to 2014, he was a Research Fellow with Imperial
College London, U.K., awarded with a Marie Curie
fellowship (IEF-IAWICOM). He is currently a Vice-Chancellor Fellow at the
University of Hertfordshire, U.K. He is also a Visiting Research Fellow at SnT,
University of Luxembourg, Luxembourg. He has been involved in several
EPSRC and EU FP7 projects such as HIATUS and HARP. His research
interests span machine learning for wireless communications, massive MIMO,
heterogeneous networks, 5G wireless networks, full-duplex radio, mm-wave
communications, random matrix theory, hardware-constrained communica-
tions, and performance analysis of fading channels.