Automated, unsupervised inversion of multiwavelength lidar data with TiARA: assessment of retrieval performance of microphysical parameters using simulated data DETLEF MÜLLER,1,2,* EDUARD CHEMYAKIN,2 ALEXEI KOLGOTIN,3 RICH A. FERRARE,4 CHRIS A. HOSTETLER,4 AND ANTON ROMANOV5 1University of Hertfordshire, College Lane, Hatfield, AL10 9AB Hertfordshire, UK 2Science Systems and Applications, Inc., NASA Langley Research Center, Mail Stop 475, Hampton, Virginia 23681-2199, USA 3A. M. Prokhorov General Physics Institute, Moscow Region 1117942, Russia 4NASA Langley Research Center, Mail Stop 475, Hampton, Virginia 23681-2199, USA 5The National University of Science and Technology, Leninskii av. 4, Moscow 119049, Russia *Corresponding author: d.mueller@herts.ac.uk Received 28 September 2018; revised 23 December 2018; accepted 2 January 2019; posted 4 January 2019 (Doc. ID 346995); published 18 June 2019 We evaluate the retrieval performance of the automated, unsupervised inversion algorithm, Tikhonov Advanced Regularization Algorithm (TiARA), which is used for the autonomous retrieval of microphysical parameters of anthropogenic and natural pollution particles. TiARA (version 1.0) has been developed in the past 10 years and builds on the legacy of a data-operator-controlled inversion algorithm used since 1998 for the analysis of data from multiwavelength Raman lidar. The development of TiARA has been driven by the need to analyze in (near) real time large volumes of data collected with NASA Langley Research Center’s high-spectral-resolution lidar (HSRL-2). HSRL-2 was envisioned as part of the NASA Aerosols-Clouds-Ecosystems mission in response to the National Academy of Sciences (NAS) Decadal Study mission recommendations 2007. TiARA could thus also serve as an in- version algorithm in the context of a future space-borne lidar. We summarize key properties of TiARA on the basis of simulations with monomodal logarithmic-normal particle size distributions that cover particle radii from approx- imately 0.05 μm to 10 μm. The real and imaginary parts of the complex refractive index cover the range from non- absorbing to highly light-absorbing pollutants. Our simulations include up to 25% measurement uncertainty. The goal of our study is to provide guidance with respect to technical features of future space-borne lidars, if such lidars will be used for retrievals of microphysical data products, absorption coefficients, and single-scattering albedo. We investigate the impact of two different measurement-error models on the quality of the data products. We also obtain for the first time, to the best of our knowledge, a statistical view on systematic and statistical uncertainties, if a large volume of data is processed. Effective radius is retrieved to 50% accuracy for 58% of cases with an imaginary part up to 0.01i and up to 100% of cases with an imaginary part of 0.05i. Similarly, volume concentration, surface-area concentration, and number concentrations are retrieved to 50% accuracy in 56%–100% of cases, 99%–100% of cases, and 54%–87% of cases, respectively, depending on the imaginary part. The numbers represent measurement un- certainties of up to 15%. If we target 20% retrieval accuracy, the numbers of cases that fall within that threshold are 36%–76% for effective radius, 36%–73% for volume concentration, 98%–100% for surface-area concentration, and 37%–61% for number concentration. That range of numbers again represents a spread in results for different values of the imaginary part. At present, we obtain an accuracy of (on average) 0.1 for the real part. A case study from the ORCALES field campaign is used to illustrate data products obtained with TiARA. Published by The Optical Society under the terms of the Creative Commons Attribution 4.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI. https://doi.org/10.1364/AO.58.004981 Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4981 1559-128X/19/184981-28 Journal © 2019 Optical Society of America 1. INTRODUCTION NASA’s Aerosol-Clouds-Ecosystem (ACE) mission (https:// acemission.gsfc.nasa.gov/whitepapers.html) called for the de- velopment of active and passive remote sensors for investigating climate forcing by aerosol particles and clouds. One of the envisaged instruments is a multiwavelength high-spectral-reso- lution lidar (HSRL), which in its most sophisticated configu- ration would be a system that provides profiles of particle backscatter coefficients and particle linear-depolarization ratios at 355 nm, 532 nm, and 1064 nm, and extinction coefficients at 355 nm and 532 nm. NASA Langley Research Center has been developing two prototype HSRL instruments, dubbed HSRL-1 and HSRL-2, since 2000. These instruments are flown on aircraft and serve as research instruments and testing platforms for a potential space-borne HSRL. In this contribution, we present a summary of the perfor- mance characteristics of the Tikhonov Advanced Regularization Algorithm (TiARA). To our knowledge, TiARA is the world- wide first autonomous algorithm that allows for the automated, unsupervised retrieval of particle size distributions (PSDs) and their microphysical particle parameters based on the analysis of multiwavelength HSRL data. TiARA works with inversion with Tikhonov’s regularization [1]. TiARA allows us to infer uncertainty bars (in terms of bias and noise) from measure- ments of atmospheric aerosols with HSRL-2. In this first stage of the algorithm development, we have focused on optimi- zing TiARA for the case of spherical particle geometry, mainly for the lack of a light-scattering model that allows for a trust- worthy description of light-scattering at 180° of non-spherical particles. Thus, we analyzed backscatter coefficients only at 355 nm, 532 nm, and 1064 nm, and extinction coefficients at 355 nm and 532 nm, which we denote as “3β 2α” or “3 + 2” configuration. The software has been optimized for fast data processing to achieve real-time data analysis with regard to the duration of lidar measurements. The exact data processing times depend on the number of profiles and the number of individual points within a profile. Number of profiles and data points depend on the specifications of the used laser system, e.g., laser power and laser-pulse repetition rate, which influence signal-to-noise ratios of the input optical data, as well as other factors such as lidar return-signal averaging times and height of the observed atmospheric column. TiARA has been developed on the basis of our experience we have collected since 1997 [2] with our data-operator-controlled (manual) version of the inversion algorithm, e.g., [3–7]. Additional ideas and concepts were presented in, e.g., [8–10]. The initial main driver of developments of multiwavelength Raman lidar and lidar-data inversion methodology was the European Aerosol Research LIdar NETwork (EARLINET) [11] since the late 1990s. Data-analysis concepts were devel- oped through working with manually operated inversion algo- rithms. These concepts have been included and automated in TiARA. Additional concepts had to be developed for TiARA to ensure reliable data products of high quality. The general result of simulation studies carried out with the data-operator version of the inversion algorithm pointed towards 20% as the threshold for acceptable measurement uncertainty for retrieving meaningful values of particle effective radius, volume concentration, and surface-area concentration [8–10,12]. Meaningful result in that regard was defined as values that deviate less than 50% from the true values. Time-consuming manual analysis of the inversion results allowed us to identify and remove outliers in the final solution spaces. We present for the first time a detailed analysis of the retrieval performance with regard to number concentration. This paper focuses on the main performance features of TiARA based on an extensive study of results obtained with syn- thetic optical input data. We obtain, in part, significantly im- proved performance compared to the data-operator-controlled inversion algorithm in the case of PSD size parameters. Retrievals of high-quality complex refractive indices remain a challenge and will require more sophisticated approaches. First steps in that direction have recently been published [13–16]. We discuss the concept of uncertainty analysis we de- veloped in the context of TiARA, which includes separating bias (accuracy) and statistical noise (precision) of the inferred data products. We further note that first tests of the performance of TiARA with regard to experimental data focused on comparisons to airborne measurements of the same data products delivered by in situ instruments during two major field activities, i.e., the Two Column Aerosol Project [TCAP; https://www.arm.gov/research/ campaigns/amf2012tcap] and Deriving Information on Surface conditions from Column and Vertically Resolved Observations Relevant to Air Quality [(DISCOVER-AQ; https://discover-aq. larc.nasa.gov/]. Publications by [17,18] discuss in detail the quality of the data products of TiARA and point to the (in part) excellent performance characteristics of TiARA with regard to parameters of particle size distributions, i.e., effective radius and volume and surface-area concentrations. In Section 2, we present the theoretical background of TiARA. Results of simulations with synthetic data show the main features of TiARA in Section 3. In Section 4, we show a measurement example taken with HSRL-2. We summarize the main results in Section 5. The Appendix provides addi- tional background information on the simulations. 2. INVERSION ALGORITHM We summarize the general theory of the inversion methodol- ogy. We have published numerous papers on the manual inversion in the past 20 years. A summary of that work can be found in Ref. [19]. In the following text, we introduce a coherent nomenclature that will be used in future publications. We explain the new features that were introduced in the autonomous inversion, e.g., optimized look-up tables, fast data processing, and tools that are used in uncertainty analysis. A. Theory We solve Fredholm integral equations of the first kind [8,10,19,20] in the inversion of the multiwavelength HSRL and multiwavelength Raman lidar data. In generalized terms, these equations can be written as gl λi  Z ∞ 0 K l m, r, λi , sf rdr, (1) 4982 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article where gl λi are the true values of the optical data at the measurement wavelength λi. The subscript l denotes the type of optical data (β–backscatter and α–extinction). The terms K βm, r, λ, s and K αm, r, λ, s describe the backscatter and extinction kernel functions, respectively, which depend on the particles’ complex refractive index (CRI) m  mR − i · mI (mR denotes the real part, i · mI denotes the imaginary part), on the radius r of the particles, as well as on their shape properties s. In the following discussion, we consider only spherical particle geometry. The kernel functions K l m, r, λi, s can be calculated from Mie-scattering theory [21]. The term f r describes the true particle size distribu- tion expressed as the number of particles per unit volume be- tween particle radius r and r  dr. For easier reading, the reference to the shape property s will be omitted, as we do not need to introduce a numerical value for it. The modified version of Tikhonov’s inversion algorithm is explained in detail by [8,10,12,19,20,22]. Briefly, Eq. (1) can be rewritten in the following form: gp  Z rmax rmin K pm, rf rdr  ϵlimitsp , (2) where gp are the true values of the optical data, and subscript p  l , λi summarizes the kind and wavelength of the optical input data. The terms rmin and rmax denote the lower and upper integration limits of the particle radii within which the inver- sion is performed. We denote this radius range as rmin, rmax. The term ϵlimitsp is the mathematical residual error. This error results from the fact that the reconstruction of the PSD, which is one of the steps in this data inversion, does not perfectly re- produce the size limits of the true PSD f r; see Fig. 1. We will discuss this specific feature in a separate contribution in the context of uncertainty analysis that not only considers the effect of measurement uncertainties but also the effects of uncertain- ties introduced by the mathematical algorithm itself. Equation (2) has to be solved numerically [1,23–25] for the investigated size distribution f r. The expression f r is approximated by a sum that consists of the following linear superposition of base functions: f r  f r  ϵr  XNB j1 f jBjr  ϵr: (3) The term f r is an approximation of the solution of Eq. (2). The expression f j describes the constants or weight coefficients, and Bjr denotes base functions that, in our case, have a triangular shape on a logarithmic radius scale [12,20]. These functions have shown to work sufficiently well in approximating PSDs of monomodal and bimodal shapes [9,10,12]. NB is the number of base functions. In our algo- rithm, NB is always equal to or exceeds the number of optical data. A different concept is followed by [8,9], who prefer to keep the number of base functions nearly equal to the number of input optical data. Figure 1 shows the concept of the reconstruction of the in- vestigated PSDs. Details are given by [12,20]. Briefly, we use base functions of triangular shape that are distributed equidis- tant on a logarithmic radius grid between 10 nm and (in its latest version) 20 μm. The values f j are the weight factors that we obtain in Eq. (3). The weight factors allow us to reconstruct the true PSD in an approximate way. The differences in the reconstruction quality depend on a large number of factors, e.g., measurement uncertainties; if the true PSD is mono- or bimodal; if the main portion (maximum) of the PSD is located in the size range of Aitken mode particles (radius range below 50 nm) or accumulation mode particles (radius range approximately between 50 nm and 500 nm) or coarse mode (particle radius is larger than 500 nm); the fact that the inte- gration limits (minimum and maximum particle radii) are not known; the possibility to reconstruct particle size distributions with base functions of triangular shape; the number of base functions used in the reconstruction process; or the value of the complex refractive index of the particles and whether it depends on particle size and/or measurement wavelength. The solution carries the mathematical residual error ϵr, which is caused by the discretization of the true PSD with the chosen linear combination of base functions. Using Eqs. (2) and (3), we can write the optical data as linear combination, i.e., gp  XNB j1 Apjmf j  ϵp: (4) The expressions Apj and ϵp are calculated from the kernel func- tions, base functions, and mathematical residual errors as Apj  Z rmax rmin K pm, rBjrdr, (5) and ϵp  Z rmax rmin K pm, rϵrdr  ϵlimitsp : (6) We may write the observed optical data as vector g  gp , the weight coefficients as vector f  f j, and the errors as vector ϵ  ϵp. In that way, we can rewrite Eq. (4) in the following vector–matrix form: g  Af  ϵ. (7) The rectangular matrix A  Apj is the weight matrix, the elements of which are calculated from Eq. (5). In the following transformations, it is convenient to rewrite Eq. (7) as the vector–matrix equation Af  g: (8) The unknown vector of the weight coefficients f is connected by the matrix operator A with the optical data vector g. This Particle radius, logarithmically equidistant scale rmin rmax 1 2 3 4 5 Fig. 1. (left) Example of the qualitative reconstruction of a PSD with five base functions of triangular shape distributed in the radius interval rmin, rmax. The radius interval is denoted as inversion window in the text. (right) Eight base functions allow us to reconstruct bimodal size distributions. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4983 vector contains the error vector Δg, which includes in an ad- ditive way the measurement error and the mathematical residual errors. The operator A is nearly singular (we are dealing with an ill-conditioned operator); see, e.g., [26]. If we denote the transposed of matrix A as AT and ATA−1 denotes the inverse of matrix ATA, we obtain the simple solution of Eq. (8) for the weight factors, i.e., f  ATA−1ATg: (9) Equation (9) leads to nonphysical or unstable solutions [26] because the mathematical problem is ill posed. For that reason, we consider the following equation instead of Eq. (8) [1,26]: ATA  γHf  ATg: (10) The expression H is a smoothing matrix operator [26] that de- scribes the physical constraint that the retrieved PSD does not show large oscillations within an inversion window. The spe- cific form of H influences the maximum difference between the weight factors of successive base functions. γ is the non- negative regularization parameter or Lagrangian multiplier, which determines the degree of smoothing of the investigated PSD along the grid bins, i.e., the strength of the operator H. We search for a value of γ for which, in simple terms, Eq. (11) becomes a minimum. An exact description of this standard method can be found in, e.g., [20]. The matrix C  ATA  γH is symmetrical. We can apply diagonalization, e.g., [27] and find eigenvectors and eigenvalues. The above-mentioned near singularity of the operator A means that the smallest eigenvalue of matrix ATA is much smaller that its largest eigenvalue—10 orders of magnitude and even more. More information can be found in Ref. [28]. Regularization allows us to increase the minimum eigenvalue of matrix C by several orders of magnitude. In this way the condition number of matrix C decreases, and the solution of Eq. (10) becomes more stable than the solution of Eq. (9). B. Mathematical Tools and Methods Used in the Automated Unsupervised Algorithm Finding the solution requires the application of several con- straints. These constraints stabilize the inversion problem and help us reject mathematical solutions that are physically not rea- sonable. We use the simplifying assumption of a wavelength- and size-independent complex refractive index of the aerosol particles. The effect of this constraint on the quality of the retrieval products will be investigated in future studies. We as- sume that retrieving the wavelength dependence of the CRI likely will require that we combine data from active and passive remote sensing. The application of the gliding-inversion window and the grid of a priori chosen CRI values are the constraints that will be explained in the following subsections. 1. Inversion Window and Grid of CRI This section serves as an introduction and motivation for iden- tifying the solution space. The methodology that allows us to find the solution space is described in detail in Section 2.D. The rationale for using a gliding inversion window is given by [12,20]. In summary, the precise position of the investigated particle size distribution with regard to the particle size range is unknown unless supplementary information, e.g., from other particle size distribution measurements, is known. We solve this challenge by using a gliding inversion window. This window consists of the linear combination of base functions of variable positions along the investigated size range and variable width; see, e.g., [20]. Historically, we started out with 50 inversion windows [3,20]. We find that the quality of the retrieval can be improved if we use more inversion windows on a smaller radius search grid. The automated algorithm uses 91 inversion windows on the basis of an optimized databank of look-up kernel files for the processing of each 3 + 2 dataset. Figure 2 demonstrates the simplified example of how the set of inversion windows is formed in the automated inversion algorithm. This concept requires us to predefine two sets consisting of several minimal andmaximal particle radii. The two sets we used in our simulations might have an insignificant overlap. The top part of Fig. 2 shows a qualitative example of three entries in both minimal and maximal radii sets. The given example results in nine inversion windows because each of the three minimal radii rmin forms a separate inversion window rkmin, rkmax with each of the three maximal radii rmax [see Fig. 2]. The linear size of the inversion window, i. e., the difference rkmax − rkmin, in our simu- lations is always more than 0.38 μm.We exclude the narrow in- version windows because we consider them to be nonphysical. We discretize the CRI search space; see Fig. 2, shaded area. The real part of CRI, mR , ranges from 1.325 to 1.8 with step 0.025. The imaginary part, mI, varies from 0 to 0.1i with step 0.003i. We plan to extend the CRI search grid in our future studies. The real part of CRI, mR , will cover the range from 1.2 to 2 with step 0.025 [see Fig. 2]. Particle radius particle radius particle radius rmin,(1) rmax,(1) rmin,(2) rmax,(2) rmin,(3) rmax,(3) rmin,(4) rmax,(4) rmin,(5) rmax,(5) rmin,(6) rmax,(6) rmin,(7) rmax,(7) rmin,(8) rmax,(8) rmin,(9) rmax,(9) 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 CR Ii m ag in ar y pa rt CRI real part Fig. 2. (top) Gliding inversion window. We show the arrangement of the nine inversion windows. (bottom) Grid of the complex refrac- tive indices. The meaning of the grey-shaded area (real part from 1.325−1.8) is described in the text. 4984 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article In this way, we arrive at a set of individual mathematical solutions. Each solution number k within the complete set of solutions is characterized by an inversion window rkmin, rkmax, by a CRI value mk  mkR − mkI i, by a vector f k of weight coefficients, and by a vector gk of backcalculated optical data: Akjf kj  gk: (11) We calculate what we describe as the modified discrepancy vector [8] using the vectors f k and gk. We obtain ρk  gk − g, (12) ρk  Akjf kj − g, (13) and the normalized discrepancy value kρkk  1 NO XNO j1 kρjkk gj 100%: (14) The symbol ∥ means that every element of the vector f k is converted into its absolute value, NO is the number of optical data, ρkj denotes the jth component of vector ρk, and g k j denotes the jth component of vector gk. This expression does not take account of measurement uncertainties. We expect improved retrieval results if we take account of this information, too. We are working on an im- proved uncertainty analysis scheme. 2. Optimized Look-Up Tables The most time-consuming procedure during the microphysical parameters retrieval is the calculation of the values of the matrix A with the help of Eq. (5). According to our estimations, the direct computation of the light-scattering kernel functions K pm, r consumes more than 90% of the whole inversion time budget. In order to reduce computation time, we improved the efficiency of our algorithm by using look-up tables (LUTs). The elements K pm, r of the LUTs are pre-computed for the radius range r defined by the gliding inversion window and the grid of complex refractive indices. The automated algorithm uses an upgraded version of an LUT that was used in the manual version of the algorithm [12,20], which we denote as the “optimized databank.” This optimized databank contains the pre-integrated values of the matrix A. The optimized LUT is based on the following numerical properties of Eq. (5):Z rmax rmin K l ,nm, r, λBrdr   λ λˆ  3 Z λˆ λrmax λˆ λrmin K l ,nm, r, λˆBˆrdr, (15) Z rmax rmin K l ,sm, r, λBrdr  λ λˆ Z λˆ λrmax λˆ λrmin K l ,sm, r, λˆBˆrdr, (16) Z rmax rmin K l ,vm, r, λBrdr  Z λˆ λrmax λˆ λrmin K l ,vm, r, λˆBˆrdr, (17) where λ is the reference wavelength of the LUT, and λˆ is another wavelength. The term K l ,m, r, λ describes the back- scatter (l  β), extinction (l  α), absorption, and scattering number (n), surface area (s), and volume (v) kernel functions. These 4 kernel functions are calculated from Mie-scattering theory [12,20,21]. The triangular base functions Br and Bˆr are defined on the coordinated radii ranges rmin, rmax and λˆλ rmin, λˆλ rmax, respectively. Our three optimized LUTs were created using Eq. (15) for the number, Eq. (16) for the surface area, and Eq. (17) for the volume kernel functions [12,20]. Each LUT consists of four separate files that contain the results of the integration of the left sides of Eqs. (15)–(17) for the corresponding backscatter, extinction, absorption, and scattering kernel functions. The reference wavelength λ is always equal to 355 nm. We split the covered range of particle radii from 10 nm to 10 μm into 71 logarithmical equidistant grid bins r1,…, r71. These grid bins allow us to form 69 integration intervals rj1 , rj12, where j1  1,…, 69. Each interval has three grid bins rj1 , rj11, and rj12 that are used to create a triangular base function bj1r [see Fig. 3], and then carry out the integration with the corresponding kernel function following the left sides of Eqs. (15)–(17). The complex refractive index m ranges in the LUTs from 1.2 to 2 with step 0.005 for the real part (mR), and from 0 to 0.5 with step 0.0005 for the imaginary part (mI). For the practical purposes of data inversion, we normally form each of the NB triangular base functions Bjr [see Eqs. (4)–(5)] by using an odd number of optimized LUT base functions bj1r. Figure 3 shows a qualitative example where three functions bj1r were used to build one triangular base function Bjr that mathematically can be expressed as Bjr  1 2 bj1r  bj11r  1 2 bj12r: (18) The term Bjr in Eqs. (5) and (18) should be treated as Bˆr on the right-hand side of one of the Eqs. (15)–(17). Thus, the result of the right-side integration of any of the Eqs. (15)–(17) can be linearly expressed through the left-side values that were pre-calculated and stored in the optimized LUT. The other desired triangular base functions Bjr can be formed using lin- ear combinations of the LUT base functions b1r,…, b69r, which allows us to use the optimized LUT in a wide range of practical situations. Another difference to the original version of the LUT used by [12,20] is that we use LUTs not only for backscatter and extinction but also for absorption and scattering coefficients. Backscatter and extinction kernels are used for the inversion itself. We introduced absorption and scattering elements in order to speed up the computations of absorption and scatter- ing coefficients, and single-scattering albedo from the retrieved bj1(r) bj1+1(r) bj1+2(r) rj1 rj1+1 rj1+2 rj1+3 rj1+4 ln r Bj(r) 0 1 Fig. 3. Example of the linear combination of three optimized LUT base functions bj1r, bj11r, and bj12r into one inversion triangular base function Bjr. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4985 microphysical parameters. The optimized data bank can be stored in the operational memory of a personal computer, which allows us to further reduce data processing time. Storage of the original LUT that is used for the manual version of the algorithm is done on a harddrive. Another new feature of the optimized LUT is that inter- mediate inversion windows and/or CRIs can be used. We estimate these intermediate values from the entries of the matrices A by linear interpolation. 3. Parallelized Routines The inversion of lidar data allows for efficient parallelization of the computation steps. We discretize the search space in terms of CRI and inversion windows, which allows us to compute a finite set of input parameters. This set of input parameters forms the set of mathematical solutions. The analysis of Tikhonov’s regularization curve, computa- tion of the weight coefficients of the base functions, and com- putation of the microphysical parameters and their uncertainty bars can be done for each individual solution independently of each other. This feature allows us to split the inversion calcu- lations into independent computational threads, each one com- puting the mathematical solutions of the subset that has been selected for that specific thread. Within one application, each thread can be launched on a separate core of a multi-core shared-memory system. According to our experience, the time of calculations for such a sys- tem decreases almost linearly with increasing number of used cores. For example, if the inversion code runs in parallel mode on a 64-core server using all available cores, the computation time decreases by a factor 63 compared to computations with a single-core system. We implemented the software on an AMD Opteron server. It has four CPUs, 64 integer cores, and 32 floating point cores, and the processing frequency is 2.8 GHz. It takes slightly less than 6 s to analyze one set of 3 + 2 data, including a full uncertainty analysis, as described in Section 2.C. For comparison, we previously used a laptop with one Intel CPU, four cores, eight threads, and 2.4 GHz processing frequency. It took a little more than 22 s for a full analysis of one 3 + 2 dataset. Further increases of data processing speed are possible if we increase the number of cores. The speed of the microprocessors will increase in the future. We will also further improve the use of mathematical algorithms in order to increase the speed of the inversion algorithm. Our code also can be compiled and executed using standard message passing interface (MPI) libra- ries. The improvement of performance with the number of involved cores in this case will strongly depend on the features of the hardware of employed distributed-core systems. Finally, we can also increase the data processing speed if we use important auxiliary information that can be provided by multiwavelength HSRL or Raman lidar. For example, aerosol typing [29] is one way of reducing the search space in data in- version, which automatically increases the data processing speed. We achieved one of the main goals of our work set out several years ago. We are able to carry out real-time data analysis of airborne operation of HSRL-2 in the sense that there is no difference between the flight time of a research flight and the time it takes to deliver the final microphysical particle data products. C. Uncertainty Analysis 1. General Remarks Error computation is done in a numerical fashion, i.e., we in- vert the mean value of the optical dataset, and then we distort the optical data within the error bars. This procedure is time consuming and affected by conceptual uncertainties. We will explore these concepts in future work. The errors that we show in the results section are the com- bination of uncertainties resulting from uncertainties of the op- tical data and uncertainties/errors created by the inversion algorithm. We are working on concepts that will allow us to separate the uncertainties caused by the mathematically in- duced errors of the inversion algorithm and the uncertainties of the optical data. Mathematically induced errors are, e.g., choice of the grid of the CRI, choice of number and position of the inversion windows, choice of shape of the base functions, assumption that the CRI can be kept wavelength and size in- dependent in the retrievals, correct choice of the regularization parameter, and choice of regularization method. The error induced by the choice of the grid of the CRI presents a special challenge in uncertainty analysis. The CRI in itself is an unknown quantity in the data inversion scheme. In this first version of TiARA, we analyze the uncertainty caused by selecting only those individual inversion results that are defined by the minimum of the regularization curve [12,20]. We are aware that this minimum of the regularization curve may not always provide us with the best possible inver- sion result of the investigated PSDs. It is a challenge to process measurement uncertainties in data inversion in a realistic manner. Measurement uncertainties do not propagate during the inversion process in a way that allows for using an analytical expression of the effect of measurement un- certainties on the final data products. Small measurement uncer- tainties may lead to large uncertainties of the final data products. We know of only a few studies in which the authors inves- tigated the effect of uncertainties in individual measurement channels on the total uncertainties [30]. The authors found that uncertainties are additive with respect to the retrieval scheme they use. [31] used the methodology of optimal esti- mation to illustrate the propagation of measurement uncertain- ties in a 3 + 2 lidar system. The authors confirm that the mathematical system of equations is underdetermined and con- tains approximately four independent pieces of information. The authors further find that their retrieval procedure reduces the uncertainty with respect to the a priori value for all five retrieved variables. In particular, the comparison of the re- trieved data products to the desired target parameters is best for the size distribution parameters and worst for the complex re- fractive index. [32] explored the limitations of the retrievals with regard to particle size. Preliminary results indicated signifi- cantly increased retrieval uncertainties for particle radii below ≈50 nm (see Tables 1–4 in Ref. [32]). These results are strongly corroborated by the study of [31], who demonstrate that limited sensitivity to particle radii smaller than 50 nm leads to greatly increased uncertainties unless the retrieval size range is constrained to avoid them. One of the characteristics of ill-posed inverse problems is that a small amount of uncertainties or even the absence of uncer- tainties in the input data can lead to large uncertainties/errors 4986 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article of the final data products. There remain questions of how to treat measurement uncertainties in uncertainties analysis, how the er- ror distribution function affects the quality of the inver- sion results, and how big measurement uncertainties can be until the inversion results become meaningless. Studies published in mathematical literature often use unrealistically low uncertain- ties, i.e., input data uncertainties often do not exceed 1%. A description of how to treat measurement uncertainties in the retrieval process and uncertainty analysis based on the optimal estimation approach is given in Ref. [33]. [31] does not carry out an optimal estimation retrieval, but the uncer- tainty analysis used the optimal estimation method. Table 4 in Ref. [31] shows, on the basis of results for three different sets of measurement uncertainties of different sizes, how big measurement uncertainties can be. We currently work on an improved uncertainty analysis scheme and will consider in that study also the techniques used in the Generalized Aerosol retrieval from Radiometer and LIdar Combined data (GARLIC) and LIdar-Radiometer Inversion Code (LIRIC) algorithms, e.g., [34]. The formalism that describes accuracy and precision can be written as accuracy ≔ kf true − f avek, (19) precision:σ: (20) 2. Error Models We use a scheme we denote as “extreme-error model” (EEM), which has been used for the manual version of the inversion method in the past 20 years [3]. This method allows us to com- pute the uncertainty of the data products on the basis of a very simple assumption: if the slope of the extinction-related Ångström and/or backscatter-related Ångström exponent changes, the microphysical parameters also change, though not necessarily in a proportional way. We are aware that this extreme-error scheme comes with certain flaws. For example, it is an intuitive model in which we assume that extreme dis- tortions of the spectra of backscatter and extinction coefficients result in proportional distortions of the solution space of micro- physical parameters. This may not always be true, and we lack in a theoretical model that would help us in considering im- provements. However, the model has been used with rather good success in the past years, e.g., [35], and currently is our preferred model for uncertainty analysis. Because of this situation, we introduced for the first time another model that could be used in future uncertainty analysis. We call it the “Gauss-distributed-error model” (GEM). The uncertainty bars have a Gaussian-curve-like shape. We started exploring if such a model could provide us with a mathemati- cally sound uncertainty analysis and even replace EEM in future versions of TiARA. Both uncertainty-analysis methods will be explained in the following. Figure 4 is a graphical display of the two error models used in the simulation studies. Aside from the traditional way of expressing the uncertainty of the retrieval products in terms of error bars of one standard deviation, we also explored whether it is possible to split the uncertainty bars into accuracy (bias) and precision (noise). A first set of results on the accuracy of the derived parameters can be obtained from the statistical analysis of the simulation results; see Section 3.C. We will further develop this concept of accuracy and precision in future versions of TiARA. In the case of the EEM, we used a fixed error level for each of the five measurement channels. Table 1 shows the basic prin- ciple of this method. We used in our simulations uncertainty values of 5%, 10%, 15%, 20%, and 25%; see Table 5. Figure 4 (top) shows an example of the distortions for the case of using two extinction coefficients in the EEM. The number of combinations of the extinction input data is 32. We consider the three options of correct value, maximum underestimation, and maximum overestimation for a given un- certainty, respectively. In the case of the three backscatter co- efficients, 33 combinations are possible. Figure 4 (bottom) shows the example of the GEM. We show the spread around the “correct” optical data point. We define the measurement uncertainty as the value μ at which this Gauss-error function reaches its peak value. For example, run number 1 describes the situation in which the backscatter and extinction coefficients at 355 nm and 532 nm are overestimated by x%, whereas the backscatter EX TI N CT IO N CO EF F. (a .u. ) WAVELENGTH (nm) 355 532 Fig. 4. Illustration of (top) extreme-error model (EEM) and (bottom) Gauss-error model (GEM). The figure is explained in the text. Table 1. Distortion of Optical Data in EachMeasurement Channel in the Case of the EEMa Run Number β355 β532 β1064 α355 α532 1 + + − + + 2 + + − − − 3 + + − + − 4 + + − − + 5 − − + + + 6 − − + − − 7 − − + + − 8 − − + − + aThe + sign indicates that a data point in one of the five measurement channels is changed by +x%, where x% denotes one of the following values: 5%, 10%, 15%, 20%, or 25%. The − sign indicates that a data point is changed by −x%. Run number denotes the eight different inversion runs that we carried out for each optical data set. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4987 coefficient at 1064 nm is underestimated by x%. Theoretically, except for the case of 0% error, we find 242 combinations that could be tested for EEM. However, we can remove many of the scenarios for obvious reasons. Three of these combinations merely lead to a shift in the backscatter and extinction spectra, e.g., in the case of 5% error, there is the case of 0% shift (correct data),5% shift for all five data points, and −5% shift for all data points. Some of the dis- tortions are physically unreasonable, e.g., backscatter coefficients at 532 nm are lower than backscatter coefficients at 355 nm and 1064 nm, and could be identified in the optical profiles. With regard to the EEM, we therefore used nine inversion runs. One run is for the error-free data, and eight runs are for the distorted backscatter and extinction spectra according to Table 1. We then average the results of these nine runs, which gives us the final value of the data product including uncertainy bars. Based on our experience with the manual in- version, we know that this number is at the lower limit of error runs, but still give us statistically meaningful results [36]. With regard to the GEM, the error computation is done for a set of eight distorted input data, which are within the error bounds prescribed for each measurement channel according to Gaussian distribution and the error-free dataset, i.e., we carry out nine different inversions for each optical dataset as in the case of the EEM. We used a random number generator that produced data uncertainties according to the Gaussian-error distribution func- tion; see Fig. 3. This was done for each measurement channel and independently of each other. The mean value of each optical data point, which is the observed value and which we denote in the context of this work as correct value, is located at the μ-position of the Gauss distribution function. The deviation from the μ-position denotes the measurement uncer- tainty. If we denote the measurement uncertainty as x%, this value is reached at the 3-σ position of the error-distribution function, i.e., σ  x∕3. Accordingly, the distribution function is defined by its mean value and the geometric standard deviation σGM, which fully describes the measurement uncer- tainty around the exact value of the optical data point. In our simulations, we set the one standard deviation, de- noted as 1σ in Table 6, and constrain the maximum value of distortion—about three times larger than 1σ in Table 6. Following the formalism of EEM, we average the results of these nine runs, which gives us the final value of the data product including uncertainty bars. D. Unsupervised Identification of the Final Solution Space 1. Method The ill-posedness of the mathematical problem does not allow us to select one single solution for each of the final data prod- ucts. The use of constraints on the one hand reduces the instability of the inversion problem. On the other hand, the introduction of constraints allows us to find a match to the sought microphysical solutions, but we cannot perfectly re- produce the optical data. This “conundrum” regarding the identification of the correct solution space results in a multitude of individual solutions that fulfill the prescribed constraints as well as the constraint that both, the optical input data and the microphysical data products need to be reproduced within a certain range of uncertainty. We use all these individual solu- tions that we obtain from the inversion for computing the mean value and the uncertainty bars. The averaging procedure takes advantage of the fact that different solutions can have oscillations of opposite sign, and that, after averaging, the mean oscillations can become much smaller than the oscillations of individual solutions. In our case, taking the average of many “good” quasi-solutions smoothes the mean solution and gives an additional stabilization of the solution space. The aim of the unsupervised data analysis is to collect a set of individual solutions from the complete solution space we obtain from the inversion of each optical data set. This set of selected individual solutions then is averaged and forms the basis of the final inversion results, i.e., mean value and its uncertainty. One of the challenges we are dealing with is the selection of the vicinity around the minimum discrepancy, as it defines what we denote as the discrepancy averaging interval ρav [8]. We used the discrepancy averaging interval, i.e., we used a few percentages of individual solutions of the total number of all solutions. This latter solution space is denoted as initial solution space, which follows from the inversion of a single data- set, i.e., all solutions we obtain for all inversion windows and the grid of refractive indices tested in the inversion. This initial solution space includes all error runs for each optical input dataset according to the EEM or GEM method. The individual solutions of the initial solution space fulfill the discrepancy as described by [12] in the sense that ρk ≤ ρav  1% to 5%. The exact value of discrepancy that allows us to consider the corresponding individual solution part of the final-averaged solution space depends on various factors. For example, the input (measurement) uncertainties can be used as one of the constraints that define the final solution space. We will use measurement uncertainties as additional elements of information in future versions of TiARA. However, the use of this constraint only usually does not lead to the final solution, which is the average of individual solutions that fulfill ρk ≤ ρav, for which we obtain a tolerable uncer- tainty. In fact, even a small value of the discrepancy within this acceptable value of ρav can be linked to an individual solution of microphysical parameters that does not make sense from a physical point of view and thus has to be classified as outlier. The huge amount of optical data we obtain with airborne and space-borne lidar requires software that runs in an autono- mous mode, i.e., we need routines that allow us to identify the averaging interval automatically. Autonomous mode means that parameters such as discrepancy averaging interval ρav and amount of individual solutions N s that need to be averaged are estimated automatically. For that reason, we started out with an averaging pro- cedure [8] and introduced several modifications in order to allow for the unsupervised data analysis. These modifications simultaneously take into account the following constraints imposed on the solution space: 1. The number of solutions that are averaged is N S. Usually N S does not exceed a few percent of the total number 4988 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article of solutions (initial solution space) for a given optical input dataset. In our simulations, we set N S  500. That number is approximately 1% of the total number of solutions of the initial solution space. 2. The maximum discrepancy δmax is the number that defines all individual solutions with larger discrepancy and that is not considered in the averaging process. This maximum dis- crepancy is correlated with the measurement uncertainty of the optical input data. Usually this value does not exceed δmax  10% to 20%. 3. We define threshold uncertainties of effective radius (δr) and number concentration (δn). Usually, we set these values to 25% and 100%, respectively. These two constraints are set before the averaging procedure starts. This procedure involves the following steps: (a) The initial step is to find the first individual solution of the final solution space. This first solution can be found from the minimum value of the discrepancy, defined as ρ1, as it defines the values of the parameters r1eff and n 1 t . (b) After that first step, we select the number concentration n2t and effective radius r 2 eff that belong to δ 2, which is the second smallest discrepancy value. (c) We compare the values of r1eff and n 1 t and r 2 eff and n 2 t . If the relative deviations of kr1eff − r2eff k∕r1eff × 100% (21) and kn1t − n2t k∕n1t × 100% (22) do not exceed the respective thresholds δr and δn, the 1st and 2nd individual solutions of effective radius are averaged. The same is done for number concentration. If condition (21) and (22) are not fulfilled simultaneously, the 2nd individual solution is ignored. (d) We continue with this selection procedure. We test the k  1-th individual solution that has the discrepancy ρk1. This discrepancy value is the next nearest to the value of the discrepancy ρk of the k-th individual solution. We compare the parameters rk1eff and n k1 t with the averaged values of the k previous individual solutions r¯eff and n¯t . These average values of effective radius and number concentration are defined as p¯  1 k Xk j1 pk, p  nt , reff : (23) If the relative deviations defined by kr¯eff − rk1eff k∕r¯eff × 100% (24) and kn¯t − nk1t k∕n¯t × 100% (25) do not exceed the respective uncertainties δr and δn the 1st, 2nd,…, k-th, and k  1-th individual solutions are averaged. The same is done for the effective radii and number concen- trations that belong to these values of δ. If the relative deviations as described by (24) and (25) are not fulfilled, the k  1-th individual solution is ignored. (e) This selection process is continued as long as the num- ber of averaged solutions is less than N s or the discrepancy of the next individual solution is ρk ≤ δmax. (f ) After these routines are carried out, we obtain a subset of the individual solutions. This subset is constrained by the number N s and/or the maximum discrepancy δmax. The uncer- tainty (scatter) of number concentration and effective radius that arises from the averaged values is limited by δn and δr , respectively. Furthermore, we take account of additional con- straints, i.e., the radius range (rmin, rmax), that we want to use for the averaging procedure; we check the lidar ratios, the Ångström exponents, and the spectral slopes of these parame- ters that follow from the individual solutions we want to in- clude in the averaging procedure. Additional constraints can be set (or extracted, obtained) using a priori information about aerosol types, e.g., [29,37,38]. 2. Unsupervised, Automated Data Inversion: Summary Figure 5 shows the flowchart of the inversion software. The methods were described in the previous sections. The numbers in circles in the flowchart refer to text in Appendix A. Fig. 5. Flowchart of TiARA. The meanings of the numbers in circles in the flowchart are given in Appendix A. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4989 Table 2 summarizes the data products we present in this contribution. Additional data products that can be derived with TiARA are listed in Table 4. 3. TIARA: SENSITIVITY STUDY A. Preparation of Simulation Data In this section, we present a summary of the first sensitivity study carried out with TiARA. The number of simulation cases is sufficient to show the basic properties of TiARA. We made several simplifications of the underlying aerosol properties. We neglected the fact that in real aerosol situa- tions, particles have a size- and wavelength-dependent com- plex refractive index. One main simplification is that we assumed a logarithmic-normal monomodal shape of the in- vestigated PSDs. Results of pre-studies regarding bimodal PSDs are presented in Refs. [14–16]. We did not test bimodal PSDs in our con- tribution, because a significantly different simulation strategy other than the one used for simulations of monomodal PSDs is needed. Both modes (fine and coarse mode) cover a wide range of parameters (median radius and geometrical standard deviation in their own domains, respectively). Given the avail- able computer power, we cannot cover a large enough set of bimodal PSDs consisting of different combinations of the fine and coarse modes to derive statistically meaningful results. Figure 6 shows the monomodal logarithmic-normal PSDs we used in our simulations. The mathematical description has the form f r  1 r ffiffiffiffiffiffiffiffi ln σ p exp  − lnr − lnrmed2 2lnσ2  : (26) Monomodal logarithmic-normal PSDs can be characterized by three parameters. One parameter is the median radius (rmed), which defines the radius at which half of the particles in this PSD have radii lower than this radius and the other half of the particles have a radius larger than this radius. The second parameter is the geometrical standard deviation (mode width) σ, which defines the spread of the particle size distribution, i.e., the number of particles per radius interval between the minimum-chosen particle radius and the maximum-chosen particle radius. In that context, the expression ln σ is commonly referred to as mode width. The third parameter is the number of particles (per cm3) for which the PSD reaches its maximum. In our present study, we fixed the number of particles to 1∕cm3. We used PSDs in terms of volume concentration represen- tation in data inversion. Figure 6 shows that our PSDs not only cover the fine-mode range but also the coarse-mode range. We define in this study 500 nm particle radius as separation be- tween fine-mode and coarse-mode particles. This separation at 500 nm particle radius already allows us to take a first look at the performance of TiARA for particles in the coarse mode without explicitly considering bimodal PSDs. Figure 7 is a graphical display of particle effective radii and surface-area and volume concentrations tested in our study. Table 4 in Appendix A presents the numerical values of these parameters. The optical data were computed from these size distributions with the assumption of 1 particle per cm3. We do not need to use other values of particle number concentra- tion, as this parameter serves only as a multiplication factor of the extensive properties of the investigated PSDs, i.e., number, surface-area, and volume concentrations. The chosen values for median radius and mode width cover a comparably wide range of particle scenarios. In terms of num- ber concentration, the PSDs cover the fine-mode fraction; see Fig. 6, left. The inversion algorithm retrieves the PSDs in terms of volume concentration. If we convert the number con- centration to volume concentration, we see that PSDs based on volume-concentration distribution contain particles above 10 μm radius. That size range covers a significant part of the coarse mode of PSDs and thus needs to be considered in the µ µ µ Fig. 7. Graphical display of effective radius and surface-area and volume concentrations of the PSDs that formed the basis for the com- putation of the optical data used in the simulations. Each symbol rep- resents one specific combination of median radius and geometrical standard deviation. The numerical vales of the parameters are given in Appendix A. Table 2. Data Products of the Total PSD Inferred with TiARAa Parameter Effective radius Number concentration Surface-area concentration Volume concentration Imaginary part of the complex refractive index Real part of the complex refractive index aUncertainties of the complex refractive index are too high for archiving purposes. Fig. 6. PSDs that were tested in our simulation studies. 4990 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article reconstruction process. The LUTs developed for TiARA cover this radius range. Simulation studies with the manual version of the inversion software [8,12] showed that measurement uncertainties up to 20% still allow for inferring meaningful results of microphysical data products. The results presented in this contribution are based on five different uncertainty scenarios and two error models. In TiARA’s method of solution space analysis, we added several new components that were not used in the manual version, which was first published between the late 1990s and early 2000s. For that reason, we also explored at which threshold value of measurement uncertainty TiARA provides unacceptable microphysical data products. Table 3 shows that we used measurement-channel- independent uncertainties of 0%, 5%, 10%, 15%, 20%, and 25%. We also simulated four uncertainty scenarios in which the measurement uncertainties varied among the different mea- surement channels; see Table 6 in the Appendix. Analysis of these measurement-channel-dependent simulations is underway. The case of error-free data was also investigated, as it allows us to collect information on the possible minimum uncertainty of the data products. We hypothesize that this uncertainty is driven by the mathematically induced retrieval uncertainty of TiARA. The CRI in that case is assumed error free, i.e., we assume we know the correct value of the CRI. We will fur- ther discuss this topic in our next paper, which will detail our next version of the uncertainty-analysis scheme. In the following section, we present an example of the in- version of optical data obtained from a size distribution of a given geometrical standard deviation. The example serves as illustration of the general features of TiARA. B. Example: Case Study Figure 8 shows effective radius (first and second rows), and number (third and fourth rows) and surface-area (fifth and sixth rows) concentrations obtained with EEM. Figure 9 shows the results for volume concentrations and the complex refrac- tive index. For each parameter, we show the results for the case of error-free optical input data and the case of 15% measurement uncertainty. Each retrieval result (data point) is shown as mean value (symbols). Each data point follows from applying the averaging procedure described before. The 1-1 lines indicate perfect reproduction of the theoretical values of the investigated parameters, i.e., the 1-1 line is a mea- sure of the accuracy of the retrievals. The green lines show that the retrieved data products would need to be reduced, respec- tively, increased by 20% in order to be on the 1-1 line. The red lines indicate that the retrieved data products would need to be redruced by a factor of two, respectively, doubled in order to be on the 1-1 line. In addition, each data point is shown with an uncertainty bar. Each individual solution (mean value) is affected by a retrieval uncertainty caused by the mathematically induced error, which follows from applying the averaging procedure in the vicinity of minimum discrepancy and the uncertainty caused by the inversion of imprecise optical input data. This uncertainty bar can be interpreted as standard deviation of the individual mean solutions. The results for surface-area and volume concentrations are shown on a double-logarithmic scale. We chose this form of presentation because these two parameters cover a large range of values. Although the individual uncertainty bars are difficult to identify, we see that in most cases the uncertainties are below 50%. The mean values of effective radius largely remain within the red sector if the true values of the input optical data are exact, i.e., we achieve the minimum requirement we expect from TiARA: the mean values of the inversion results remain within 50% deviation from the true results. There are some outliers that seem to occur for the case of low real parts and large imagi- nary parts. The reason for these comparably strong outliers is unknown. The retrieved effective radius remains within the green sector, at least up to an effective radius of 0.5 μm, except for these few outliers. 50% is the upper value of statistical uncertainty of the mean value of effective radius. In many cases, this statistical noise is considerably less. The outliers belong mainly to the lowest real part tested in this study (1.4) and also show the strongest stat- istical uncertainty. 15% uncertainty of the optical data reduces the retrieval ac- curacy. The mean values remain in the red sector if effective radius is less than 0.5 μm and the imaginary part is ≤0.01. For stronger light-absorbing particles, i.e., imaginary parts larger than 0.01, we find cases in which effective radius is over- estimated by more than a factor of two. Statistical uncertainty also increases if measurement errors are 15%. In summary, we find that PSDs with particle effective radii below approximately 0.5 μm can be retrieved to at least 50% accuracy and better than 50% precision if par- ticle absorption is low, i.e., imaginary parts are ≤ 0.01. Such low values of the imaginary part result in single- scattering albedo (SSA) at 532 nm >0.9 if, at the same time, 0.1 μm < reff < 0.5 μm; additional information can be found in Ref. [36]. If particles are highly light-absorbing, there is an increasing chance of overestimating particle effective radius. The mean value of number concentration for the most part remains in the red sector. TiARA tends to underestimate Table 3. Error Scenarios of the Simulation Studiesa β355 β532 β1064 α355 α532 EEM 5 5% for all 5 input data EEM 10 10% for all 5 input data EEM 15 15% for all 5 input data EEM 20 20% for all 5 input data EEM 25 25% for all 5 input data GEM 5, 1σ 5% and 1.7% for all 5 input data GEM 10, 1σ 10% and 3.3% for all 5 input data GEM 15, 1σ 15% and 5% for all 5 input data GEM 20, 1σ 20% and 6.6% for all 5 input data GEM 25, 1σ 25% and 8% for all 5 input data aThe numbers 5,…, 25 describe the uncertainty of the input data in percent. In the case of GEM, we need a second number to describe the uncertainty. We use 1σ from the center value (0% uncertainty) within which this data uncertainty of 5% …., 25% is achieved; see Fig. 4 for illustration. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4991 Fig. 8. Example of retrieval performance for the case of size distributions with a geometrical standard deviation of 1.9 and median radii of 20 nm, 60 nm, 100 nm, 140 nm, 180 nm, 220 nm, 260 nm, and 300 nm (Table 5). We tested all 15 imaginary parts of the complex refractive index of the investigated particle size distributions (Table 5). We show the results for the imaginary parts 0i, 0.005i, 0.01i, 0.03i, and 0.05i. Each data point (with error bar) shows the average and standard deviation of multiple solutions obtained in one retrieval. The first row shows the results for effective radius and 0% data uncertainty. The second row shows the results for 15% data uncertainty and the use of EEM. The results for number concentration are shown in row 3 (0% data uncertainty) and row 4 (15% data uncertainty). Rows 5 and 6 show the results for surface-area concentration. The colored lines describe by how much the data values need to increase or decrease to coincide with the 1-1–line (gray line). Green = reduction/increase by 20% (denoted in the text as green sector); red = reduction/increase by factor 2 (red sector). The squares describe the results of the four real parts tested in the simulations: 1.4 (blue), 1.5 (green), 1.6 (orange), and 1.7 (black). The uncertainty bars for number concentration are turned by 90° for better readability. 4992 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article Fig. 9. Same as in Fig. 8 but for (rows 1 and 2) volume concentration, (rows 3 and 4) the real part, and (rows 5 and 6) the imaginary part of the complex refractive index. Meaning of the symbols and their colors are the same causedas in Fig. 8. Results for volume concentration again are shown in terms of percent-deviation from the true results. Meaning of the colored lines in the plots that show the results for the real part: reduction/increase of the mean values by 0.05 (green) and by 0.075 (red) before results coincide with the true values (1-1-line, gray). Meaning of colored lines for the imaginary part: reduction/increase by 0.006 (green) and reduction/increase by factor 2 (red) until mean values coincide with 1-1–line (gray). The uncertainty bars for real and imaginary parts of the complex refractive index are turned by 90° for better readability. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4993 number concentration if the input optical data are error free; see Fig. 8. This underestimation seems to grow with the in- creasing imaginary part of the complex refractive index of the investigated particles. The statistical error usually is less than 50%—in part, it is significantly less than 50%. The retrieval accuracy seems to improve with the increasing real part of the complex refractive index of the investigated particles. This pattern will be investigated in more detail in future studies. A measurement uncertainty of 15% amplifies the feature of underestimating number concentration. As in the case of error-free data, the retrieved number concentration tends to be underestimated stronger as the imaginary part increases. Statistical uncertainty also increases compared to the case of error-free optical data. This feature is particularly obvious for the cases in which number concentration is overestimated. As noted in previous studies [8,12,13,17,36] we find excep- tionally accurate inversion results for surface-area concentra- tion. An explanation for this mechanism has recently been published [14,15]. Surface-area concentration covered approx- imately three orders of magnitude in our simulation study. The results for surface-area concentration are inside the red sector, even if measurement errors are 15%. The only exceptions from this result are connected to the lowest values of surface-area concentration tested in our study. In that case, we significantly underestimate surface-area concentration if imaginary parts ≤0.01. In contrast, highly light-absorbing PSDs can be re- trieved comparably accurately. We are investigating the outliers that we find for low imagi- nary parts. One reason for this behavior could be that we cannot resolve in a trustworthy manner PSDs below particle radii of 50 nm. Such an underestimation would lead to an underestimation of surface-area concentration. In fact, number concentration would be underestimated even more in such a case. We find this underestimation of number concentration for PSDs that consist of a comparably high share of small par- ticles below 50 nm radius. We will address this issue in our next performance test of TiARA. We will use an upgraded version of LUT tables and an improved solution-space post-processing scheme [14–16]. We find accurate inversion results of volume concentration across four orders of magnitude. The quality of the inversion results seems to decrease for imaginary parts of 0.05i, i.e., the case of highly light-absorbing particles. We find two outliers in the plot for the imaginary parts 0.03i and 0.05i. The statistical uncertainty of the retrieved values is always below 50% except for these two outliers. In most cases, the statistical uncertainty is less than 20%–30%. We will carry out a more detailed analysis to identify what effect may be responsible for the two outliers. First, the input lidar ratios are 39,000 sr and 73,000 sr, respectivley. These data are produced by PSDs with very small mean radii of 0.02 μm and a real part of 1.7. In this case, the overlap between the available inversion windows and the PSDs likely is insuffi- cient. The inversion solution has a minimal discrepancy of ≈1.0e  4. This value is far too large to count as an acceptable solution. The next upgrade of TiARA will use an improved databank and other important modifications. Furthermore, we will be able to use a bigger set of simulation data, which hopefully will remove such outliers. The accuracy of the inversion results slightly worsens for 15% measurement uncertainty if the imaginary part is ≤0.01i. TiARA tends to overestimate volume concentration. Mean values still remain within the red sector except for very small volume concentrations, where we find a similar pattern to the one described for surface-area concentration, i.e., volume concentration may be underestimated by more than 50%. However, as in the case of surface-area concentration, this underestimation seems to disappear for large imaginary parts, i.e., >0.01i. TiARA seems to overestimate volume concentra- tion for large imaginary parts of 0.03i and 0.05i. The statistical noise also increases significantly. Figure 9 (center) shows the results for the retrieved real part. The true value was 1.6 in this simulation case. We used a search grid from 1.325–1.8 (real part) and 0i − 0.05i (imaginary part) in our simulations; see Fig. 2. We tested in summary four dif- ferent real parts: 1.4, 1.5, 1.6, and 1.7. As the final goal of our algorithm development, we targeted a retrieval accuracy of 0.05 to 0.075 for the real part. With regard to the imaginary part we aimed at 50% retrieval accuracy as the minimum goal. The main challenge in that regard lies in retrieving low values of the imaginary part, which we define as 0.01. We cannot use negative imaginary parts in our inversion. Based on the current way of how we select the final solution space, we naturally end up with an overestimation of the imaginary part. This overesti- mation is particularly strong if the true value of the imaginary part of the complex refractive index approaches the value zero, i.e., no light absorption. TiARA tends to overestimate the real part by more than 0.075 if the true value is 1.4 and the input optical data are error free. This overestimation disappears for a larger real part. The range of retrieved real parts for the different size distributions (effective radii) seems to vary in dependence on the underlying imaginary part. For example, in the case of the true value 1.6 (red symbols), we see that all solutions are clustered between approximately 1.65 and 1.7 if imaginary parts are small. In con- trast, if the imaginary part is 0.05i the solutions are spread out between real parts from 1.4 to 1.7. If the true value is 1.4, this pattern is nearly opposite, i.e., real parts vary across a wide range of numbers for low imaginary parts. In contrast, retrieved real parts seem to accumulate between 1.4 and 1.5 if the imaginary part is 0.05i. In summary, we find the following pattern of retrieval ac- curacy: the real part is overestimated, particularly if real parts are 1.4 or 1.5. Real parts are underestimated if values are 1.6 and 1.7. We assume that this pattern is largely based on the search space of the real and that TiARA is not sensitive to retrievals of the real part. Statistical uncertainty is as high as 0.1, which is another indicator of the insensitivity of TiARA with regard to retrieving the real part. Such uncer- tainties are larger than what we need to achieve with TiARA in view of the accuracy requirements for SSA; see, e.g., [36]. We are exploring methods that allow us to reduce the uncer- tainty of the retrieved CRI. Ref. [13] shows that a retrieval accuracy of at least 0.1 of the real part may be achievable. This accuracy would already significantly improve the accuracy 4994 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article of the data products obtained with TiARA, and particulary would improve the accuracy of the mean values of the SSA to approximately 0.075 or better; measurement uncertainty of the optical input data, of course, has influence on SSA, too. An example that illustrates our assumption is given in Fig. 6 of Ref. [36]. In the case of 15% measurement error, TiARA seems to generally overestimate the real part. The majority of retrieved real parts is at/above 1.6. We currently have no explanation for this pattern, i.e., this general overestimation when we move from error-free to erroneous optical input data. The insensitiv- ity of TiARA with regard to the real part retrieval might explain this pattern only in part. Another reason could be the position of the PSD along the particle radius grid. The analysis of this pattern, however, goes beyond the main goal of this paper, which is about identifying typical patterns and the general applicability of TiARA. However, these patterns will form the basis for a more refined analysis of our simulations. We hope that a better understand- ing of these patterns will help us improve TiARA in future. First results in that regard have been published in recent years [13–16,36]. Figures 9 (rows 5 and 6) shows the results of the retrieved imaginary parts. If the optical data are error free, TiARA over- estimates the imaginary part. This pattern is particularly strong for low imaginary parts. If the imaginary part is high, i.e., 0.03i and 0.05i, accuracy improves. The mean values are within the red sector. Still, we find on average an overestimation of the imaginary part in these two cases. The situation is similar in the case of 15% measurement error. The imaginary part is largely overestimated if the true values are 0i, 0.005i, and 0.01i, respectively. However, we find some (minor) differences compared to the case of 0% measure- ment uncertainty. The mean value for the most part stays within 0.006i if the real part is 1.4 or 1.5. A more detailed analysis on the basis of a larger set of simulations is needed to decide if this is a result that can be generalized, i.e., if better retrieval results for small imaginary parts can be obtained if real parts are small. We find better retrieval results of the imaginary part in the case of 0% data error. Accuracy is better than 0.006i if real parts are large, i.e., 1.6 and 1.7. The underes- timation of the retrieved values largely disappears if imaginary parts are 0.03i or 0.05i. Most of the mean values remain in the red sector. The statistical uncertainty is mainly in the range of 50% if the true imaginary parts are 0.03i or 0.05i. If imaginary parts are smaller (0.005i or 0.01i), the statistical uncertainty is as high as 100%. If the true imaginary part is 0i, uncertainties can be as high as −0.02i. In summary, most of the results we obtained for the imagi- nary part in this study confirm the findings we obtained with the manual version of the inversion algorithm (see also results presented in, e.g., Fig. 6 of Ref [36]): a) retrieval results are within 50% for highly light-absorbing particles, and b) imagi- nary parts can be retrieved to 100% if particles are low light- absorbing. Imaginary parts of 0i cannot be retrieved at the moment. C. Statistics: Histograms The previous case study illustrated for a few examples of PSDs what can be achieved with TiARA. It showed its strengths and weaknesses. As we are interested mainly in the mass production of inversion results for future space-borne missions, the more important question is how TiARA performs on a statistical basis; regardless of this demand, we will strive for improving inversion results of each individual case, as this, of course, nat- urally improves the statistical performance of TiARA. The stat- istical overview in this section allows us to further discuss the performance of TiARA in more general terms. In view of the assessment of the impact of data uncertainties on the inversion results, the following figures also include re- sults for GEM. We emphasize, that at present, we favor EEM over GEM in TiARA. EEM is more robust than GEM at the current status of algorithm development. We gain insight on the impact of extreme values of the retrieved microphysical parameters on the mean values of the data products, because we test extreme distortions of the optical spectra with the EEM. Knowledge of the possible largest and smallest values of each data product is valuable information in our uncertainty analysis and can be achieved with a comparably small number of error runs (eight). We cannot derive this information (ex- treme values of data products) from GEM. The strength of GEM lies in its use of the statistical nature of uncertainty bars of the optical input data. In this first step of developing this new scheme of uncertainty analysis, we used only eight error runs. Our idea was to make the inversion re- sults we obtained with GEM easier to compare with the results we obtained with EEM. Based on the current analysis of the simulation results we obtained with GEM, we realized that the number of error runs (eight) may not be sufficient to obtain a statistically significant set of individual solutions that need to be averaged for the final data products. We will increase the number of error runs to several hundred per dataset in a future study. We assume that we will reproduce the Gaussian-like shape of the distribution curve shown in Fig. 4 in a much better way. Despite the incompleteness of the simulation with GEM, we consider it relevant to show some results. The results help in better understanding the fundamental concepts of using GEM in future use of TiARA. Figure 10 shows a statistical overview of the deviation between the reconstructed microphysical parameters and the true values for all size distributions tested in this study, includ- ing the six error levels, i.e., 0%, 5%, 10%, 15%, 20%, and 25%. The figures show how results differ if we use EEM (gray-shaded columns and numbers in black) and GEM (green-shaded columns and numbers in green) for generating the eight erroneous datasets (for each PSD and set of CRI) in our simulations. The histogram distributions show the percent- age of simulated cases (eight PSDs, four real parts, 15 imaginary parts) that deviate by a given percentage from the true results of effective radius and number, surface-area, and volume con- centrations. The histogram columns show results for which the retrieval products deviate ≤50% from the true results. 50% retrieval accuracy is our priority requirement, though we target a 20% retrieval accuracy for as many data products as possible in a future version of TiARA. In the case of real and imagi- nary parts of the complex refractive index, we show absolute Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4995 deviations from the true results. The histogram columns refer to these deviations from the true results. These histogram distributions are complemented by hori- zontal gray lines (EEM) and green horizontal lines (GEM). These lines show the percentage of cases that deviate by ≤50% from the true results in the case of effective radius and number, surface-area, and volume concentrations. For ex- ample, Fig. 10(aa) shows that nearly 100% of all effective radii can be retrieved with less than 50% deviation (accuracy) from the true results. This holds true for both error models—the gray and the green horizontal lines overlap. The black and green numbers in each plot extend this analysis. Each number de- notes how many of all cases deviate less than 20% from the true results. The results give an impression of how far off we are from what we consider as maximum achievable performance. In the case of the real part, the horizontal lines and the num- bers in each plot describe absolute deviations from the true values (real part). The horizontal lines refer to 0.075 deviation (accuracy) from the true values. The numbers in the plots describe the percentage of cases that deviate by 0.05 from the true values. With regard to the imaginary part, the numbers in the plots describe the percentage of cases that deviate 0.006i from the true values.. The results derived from EEM and GEM differ. We first discuss the results for the size parameters, i.e., effective radius and number, surface-area, and volume concentrations. From a qualitative point of view, the distribution (spread) of histogram columns generally is wider for the results obtained with EEM compared to the GEM results. There is a larger number of individual simulation cases for which, on average, Fig. 10. Inversion results for all values of sigma, all median radii, and all real and imaginary parts of the complex refractive index (see Fig. 7 and Table 5). Each row shows the results for one of the six investigated data products. Each column shows the results for (from left to right): 0%, 5%, 10%, 15%, 20%, and 25% optical data uncertainty. The histograms show percentage deviations (0%–50%) from the true values for the cases (aa)–(df ) and the absolute errors for cases (ea) to (ff ). We show the results for the EEM (black/hatched columns) and the GEM (green columns). The gray (EEM) and green (GEM) horizontal lines show the total number of simulation cases for which the respective parameter can be recon- structed to better than 50% [(aa)–(df )]. The black (EEM) and green (GEM) numbers show the percentage of cases for which the data products can be retrieved to better than 20%. In the case of the real part [(ea)–(ef )], the horizontal lines (gray and green) represent results for absolute deviations less than 0.075 from the true values. The numbers represent deviation less than 0.05. In the case of the imaginary part [(fa)–(ff )], relative deviations, i.e., horizontal lines, are not defined, as we included the results for imaginary parts of 0i in our analysis. The numbers show the percentage of simulations for which the retrieved imaginary parts deviate less than 0.006i from the true values [(fa)–(ff )]. The y axis of all plots is scaled to 100%. Figure 16 in the Appendix provides an overview of the results based on individual scalings for each data product. 4996 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article the microphysical data products tend toward larger systematic deviations from the true values if we apply EEM compared to GEM. On average, GEM delivers more results that are close to 0% deviation from the true results compared to EEM. The inversion results tend to be better, i.e., narrower around the 0%-deviation point for measurement errors of 10%–15%. This result however needs to be confirmed in further simula- tion studies. We again note the exceptionally high retrieval quality of surface-area concentration. With regard to GEM, we notice on average a slight underestimation of the size param- eters for the individual simulations. The exception is surface- area concentration, which is overestimated by up to 10%. With regard to the real and imaginary parts, we find the following results. We systematically overestimate the real part in all cases considered in our study. Only for the case of high imaginary parts do we find situations in which we under- estimate the real part for some cases. If imaginary parts are large, there is a rather clear systematic behavior with regard to retrieval quality in dependence on uncertainty of the optical input data. The retrieval accuracy drops with increasing uncer- tainty of the input data if we take 0.075 as the threshold value for the retrieved real part. We do not find a significant performance difference in TiARA whether we use the EEM or the GEM in our simula- tions. If we use 0.05 as the threshold for the retrieved real part, we find fewer cases that fulfill this requirement if the imaginary part is large (0.03i and 0.05i). There is no clear pattern for smaller imaginary parts. With regard to the imaginary part, we again notice an over- estimation of the retrieved values compared to the theoretical values. The inversion results are larger than the theoretical val- ues, regardless of the used error model. The number of cases in which the imaginary part is derived to within 50% of the true value is comparably high. This number of cases increases with increasing imaginary part. No results are shown for the cases of 0i and 0.005i (horizontal gray and green lines) as the error of 50% cannot be computed in these cases. I.e., it is not de- fined for the case of 0iand cannot be computed for the case of 0.005iin view of the resolution of the grid that was used for the imaginary part. We see a slight systematic behavior of the retrieved values of the imaginary parts with regard to increasing uncertainty of the optical data. The number of cases that fall within the 50% threshold-level decreases with increasing uncertainty level. Again, it seems that 15% data uncertainty is an upper threshold for which a reasonable retrieval performance can be achieved. If we use the threshold-value 0.006i, we find that on aver- age 40%–50% of all cases fall within this uncertainty level. We do not see a clear pattern of retrieval quality in dependence on the uncertainty of the optical data used as input in the inversion. In summary, the uncertainties of the optical input data seem to have little influence on the retrieval quality of the real and imaginary parts of the CRI. This pattern points to a possibly low sensitivity of TiARA with regard to retrieving the CRI. In contrast, the algorithm is capable of deriving trustworthy values of particle size parameters, i.e., effective radius and number, surface-area, and volume concentrations. Finally, we show the unscaled version of this figure in Fig. 16 in Appendix D. This figure shows more clearly the height of the individual histogram columns and the distribu- tion of the height of the columns. This figure is our first attempt to illustrate the distribution function of the solution space with regard to the deviation from the true results. One goal of future work includes the development of a more comprehensive uncertainty analysis scheme and to investigate the properties of the distribution function. We are interested in properties such as the shape of this distribution function. For example, we want to find out if a specific shape exists for all data products, e.g., Gaussian-type distribution or skewed distribution, and if we can infer in a more straightforward man- ner systematic and statistical biases of the individual data prod- ucts from such histogram distributions. We will discuss in more detail these topics in our follow-up contribution that will deal with our updated uncertainty analysis scheme. D. Statistics: Color Maps Figure 11 summarizes the retrieval results in terms of color (heat) maps. This style of presentation provides us with a quicklook version of the performance of TiARA. We show the results for all investigated size distributions and all investigated error levels (EEM and GEM) for all micro- physical parameters. We show the number of simulated cases (in percent) that deviate by less than 20%, respectively, 50% from the true values of the size parameters (effective radius and number, surface-area, and volume concentrations). We show the results for 0.05 and 0.075 deviation from the true val- ues of the real part of the CRI. We show the results for 0.005i and 50% deviation from the true values of the imagi- nary part. The map separates the results according to the uncertainty level of the input optical data and the error model we used. In one case, we combined all results regardless of the imaginary part, but the results are separated according to the mode width that was used for the investigated PSDs. In the other case, we combined all results regardless of the mode width of the PSDs, but the results are separated according to the used imagi- nary parts of the investigated PSDs. We believe that separation according to the imaginary part of the CRI is necessary in this figure, as this parameter is the most sensitive input quantity for calculating SSA. SSA also depends on the PSD; thus, representation of the simula- tions according to a quantity related to particle size is also useful for the analysis of the performance of TiARA. We picked geo- metrical standard deviation as a first attempt in this type of data analysis. This parameter represents the variance, which is one additional quantity we want to provide as data product in the future. We look at other ways of presenting our results. These heat maps can be considered only as our first attempt of presenting the performance of TiARA in a way that allows for easy com- parison to performance changes compared to future versions of the software. We counted the solutions that fall within 20% and 50% deviation from the true results. We then defined color-coded intervals that reflect if, e.g., 68.3% (1-σ standard deviation), 95.4% (2-σ standard deviation), and 99.7% (3-σ standard Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4997 deviation) of all solutions are within the 20%-interval, respectively, 50%-interval of acceptable retrieval quality. Our goal is to develop a statistically robust framework of uncertainty analysis, and for that reason, we introduce the concept of (1, 2, 3)-sigma standard deviations. We are aware of the fact that such a denotation requires a statistical framework based 0 5 10 15 20 25 0 5 10 15 20 25 0 1 62% 51% 55% 39% 32% 91% 90% 88% 83% 74% 64% 0.005 2 69% 59% 44% 35% 32% 93% 100% 98% 77% 63% 58% 0.01 3 67% 58% 43% 38% 29% 93% 99% 94% 70% 66% 58% 0.03 4 58% 44% 38% 23% 23% 74% 78% 65% 55% 45% 43% 0.05 5 45% 29% 26% 25% 22% 61% 63% 50% 46% 44% 48% 1.5 6 63% 63% 57% 47% 33% 77% 78% 75% 69% 69% 65% 1.7 7 64% 57% 43% 27% 16% 79% 79% 77% 70% 55% 49% 1.9 8 64% 57% 22% 18% 16% 83% 84% 81% 55% 43% 39% 2.1 9 53% 38% 40% 27% 28% 88% 89% 81% 59% 48% 38% 2.3 10 55% 36% 42% 34% 32% 84% 92% 75% 68% 59% 55% 2.5 11 62% 38% 42% 40% 39% 82% 94% 84% 75% 77% 79% Extreme Error Model < 20% Deviation From True Values < 50% Deviation From True Values EFFECTIVE RADIUS Uncertainy of Optical Input Data in Percent mI theo Sigma 0 5 10 15 20 25 0 5 10 15 20 25 58% 63% 68% 56% 54% 55% 91% 90% 90% 89% 88% 91% 55% 60% 65% 59% 66% 60% 93% 96% 96% 96% 91% 93% 55% 54% 57% 63% 68% 58% 93% 97% 99% 97% 95% 88% 54% 54% 51% 47% 51% 43% 74% 78% 75% 78% 78% 73% 44% 51% 41% 33% 34% 32% 61% 63% 65% 58% 54% 55% 65% 61% 75% 62% 55% 57% 77% 75% 84% 77% 77% 75% 62% 66% 60% 52% 55% 51% 79% 79% 76% 70% 74% 75% 63% 57% 65% 56% 62% 51% 83% 81% 84% 87% 82% 80% 49% 56% 48% 45% 51% 45% 88% 93% 88% 86% 82% 77% 43% 51% 47% 51% 55% 50% 84% 92% 89% 94% 85% 86% 38% 48% 44% 43% 49% 42% 82% 88% 89% 86% 87% 88% Gauss Error Model < 20% Deviation From True Values < 50% Deviation From True Values 0 5 10 15 20 25 0 5 10 15 20 25 0 44% 38% 37% 41% 40% 31% 74% 69% 61% 69% 63% 64% 0.005 51% 49% 48% 50% 39% 33% 80% 75% 73% 78% 71% 73% 0.01 53% 46% 44% 46% 40% 33% 81% 75% 78% 82% 79% 72% 0.03 51% 52% 41% 37% 21% 18% 89% 88% 86% 71% 61% 49% 0.05 43% 34% 33% 17% 13% 10% 91% 81% 78% 67% 53% 48% 1.5 52% 52% 50% 39% 30% 27% 88% 86% 77% 63% 56% 59% 1.7 49% 41% 40% 46% 36% 23% 87% 78% 67% 72% 75% 74% 1.9 52% 48% 43% 44% 30% 24% 86% 73% 80% 80% 75% 67% 2.1 51% 43% 47% 38% 30% 24% 83% 76% 81% 81% 67% 56% 2.3 45% 46% 34% 30% 25% 22% 81% 77% 76% 73% 57% 56% 2.5 41% 33% 30% 31% 32% 28% 73% 75% 69% 71% 63% 55% Extreme Error Model < 20% Deviation From True Values < 50% Deviation From True Values mI theo Sigma NUMBER CONC. Uncertainy of Optical Input Data in Percent 0 5 10 15 20 25 0 5 10 15 20 25 44% 48% 46% 40% 28% 26% 74% 71% 74% 61% 61% 48% 51% 41% 49% 28% 44% 33% 80% 75% 76% 52% 70% 63% 53% 57% 48% 38% 26% 39% 81% 82% 75% 66% 61% 65% 51% 48% 43% 43% 28% 40% 89% 80% 77% 74% 56% 63% 43% 33% 39% 40% 21% 20% 91% 87% 75% 74% 60% 67% 52% 52% 64% 49% 39% 46% 88% 91% 87% 83% 80% 73% 49% 46% 40% 44% 32% 32% 87% 82% 81% 75% 67% 65% 52% 51% 47% 36% 26% 26% 86% 79% 76% 58% 61% 55% 51% 49% 42% 30% 24% 24% 83% 79% 77% 63% 54% 58% 45% 39% 38% 34% 30% 32% 81% 75% 67% 60% 56% 57% 41% 35% 38% 32% 25% 29% 73% 67% 65% 53% 52% 60% Gauss Error Model < 20% Deviation From True Values < 50% Deviation From True Values 0 5 10 15 20 25 0 5 10 15 20 25 0 98% 100% 100% 100% 72% 37% 99% 100% 100% 100% 100% 100% 0.005 99% 100% 100% 100% 81% 34% 99% 100% 100% 100% 100% 98% 0.01 100% 100% 100% 99% 84% 43% 100% 100% 100% 100% 100% 99% 0.03 98% 100% 99% 99% 95% 46% 100% 100% 100% 100% 100% 100% 0.05 98% 98% 100% 98% 96% 60% 100% 100% 100% 100% 100% 99% 1.5 96% 97% 99% 98% 80% 40% 98% 100% 100% 100% 100% 100% 1.7 97% 100% 100% 98% 83% 41% 100% 100% 100% 100% 100% 97% 1.9 100% 100% 100% 100% 87% 35% 100% 100% 100% 100% 100% 98% 2.1 100% 100% 100% 100% 91% 46% 100% 100% 100% 100% 100% 100% 2.3 100% 100% 100% 100% 85% 49% 100% 100% 100% 100% 100% 100% 2.5 100% 100% 100% 100% 87% 52% 100% 100% 100% 100% 100% 100% Sigma Extreme Error Model < 20% Deviation From True Values < 50% Deviation From True Values SURFACE-AREA CONC Uncertainy of Optical Input Data in Percent mI theo 0 5 10 15 20 25 0 5 10 15 20 25 98% 98% 100% 99% 96% 99% 99% 99% 100% 99% 97% 100% 99% 99% 97% 100% 99% 99% 99% 100% 98% 100% 100% 100% 100% 98% 100% 100% 98% 100% 100% 100% 100% 100% 100% 100% 98% 98% 99% 99% 100% 100% 100% 100% 100% 100% 100% 100% 98% 97% 98% 99% 96% 97% 100% 100% 100% 100% 100% 100% 96% 90% 95% 98% 90% 96% 98% 99% 97% 99% 97% 100% 97% 99% 98% 99% 97% 98% 100% 100% 100% 100% 99% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% Gauss Error Model < 20% Deviation From True Values < 50% Deviation From True Values 0 5 10 15 20 25 0 5 10 15 20 25 0 58% 56% 49% 56% 41% 28% 90% 89% 87% 83% 74% 68% 0.005 54% 64% 58% 44% 39% 37% 91% 98% 98% 78% 78% 69% 0.01 53% 59% 53% 44% 43% 33% 92% 99% 95% 71% 71% 59% 0.03 50% 58% 43% 37% 33% 31% 74% 77% 64% 54% 62% 53% 0.05 43% 46% 30% 26% 27% 33% 61% 63% 48% 48% 53% 51% 1.5 66% 59% 55% 54% 38% 32% 76% 76% 75% 69% 71% 69% 1.7 61% 58% 57% 46% 45% 39% 79% 79% 78% 71% 66% 55% 1.9 60% 60% 56% 26% 31% 29% 83% 83% 80% 57% 63% 45% 2.1 45% 48% 40% 45% 30% 25% 88% 89% 80% 58% 58% 47% 2.3 45% 58% 37% 39% 39% 33% 85% 92% 76% 70% 68% 62% 2.5 33% 56% 33% 38% 35% 36% 78% 92% 81% 74% 79% 82% mI theo Extreme Error Model < 20% Deviation From True Values < 50% Deviation From True Values Sigma VOLUME CONC. Uncertainy of Optical Input Data in Percent 0 5 10 15 20 25 0 5 10 15 20 25 58% 58% 64% 54% 54% 52% 90% 88% 88% 88% 86% 86% 54% 52% 61% 52% 56% 61% 91% 93% 93% 94% 91% 90% 53% 52% 57% 58% 67% 50% 92% 96% 97% 97% 95% 88% 50% 50% 49% 47% 46% 40% 74% 77% 75% 78% 77% 75% 43% 49% 40% 33% 33% 29% 61% 63% 64% 58% 53% 54% 66% 58% 72% 58% 55% 56% 76% 72% 84% 77% 75% 75% 61% 58% 60% 51% 49% 48% 79% 79% 76% 71% 76% 77% 60% 59% 60% 52% 57% 51% 83% 81% 84% 86% 83% 79% 45% 53% 47% 41% 50% 42% 88% 90% 85% 86% 81% 75% 45% 45% 44% 49% 53% 46% 85% 92% 85% 92% 84% 82% 33% 41% 42% 41% 43% 35% 78% 84% 84% 85% 84% 83% < 50% Deviation From True Values Gauss Error Model < 20% Deviation From True Values 0 5 10 15 20 25 0 5 10 15 20 25 0 26% 28% 24% 23% 25% 25% 29% 30% 28% 27% 27% 27% 0.005 29% 30% 28% 26% 26% 26% 36% 47% 38% 36% 28% 29% 0.01 33% 39% 36% 29% 26% 28% 46% 55% 48% 40% 32% 37% 0.03 38% 22% 31% 26% 32% 32% 55% 53% 51% 48% 48% 45% 0.05 43% 36% 31% 32% 29% 29% 67% 55% 43% 43% 53% 53% 1.5 44% 44% 43% 33% 31% 33% 59% 57% 54% 43% 40% 41% 1.7 39% 32% 30% 29% 24% 25% 51% 55% 42% 41% 37% 41% 1.9 35% 31% 27% 25% 27% 26% 48% 48% 43% 36% 36% 41% 2.1 29% 27% 28% 23% 28% 28% 42% 42% 36% 32% 36% 32% 2.3 28% 25% 26% 30% 27% 27% 40% 44% 37% 38% 38% 37% 2.5 29% 27% 26% 23% 28% 28% 39% 42% 38% 42% 38% 37% REAL PART Uncertainy of Optical Input Data in Percent < 0.05 Deviation From True Values < 0.075 Deviation From True Values mI theo Sigma Extreme Error Model 0 5 10 15 20 25 0 5 10 15 20 25 0 n/d n/d n/d n/d n/d n/d n/d n/d n/d n/d n/d n/d 0.005 26 30 28 25 25 26 26 33 32 31 35 32 0.01 47 53 52 45 46 41 47 56 54 53 54 50 0.03 62 61 58 54 58 58 62 60 63 59 63 61 0.05 92 90 78 79 84 83 92 88 85 82 87 83 1.5 49 54 54 45 41 40 49 56 56 54 52 51 1.7 48 49 45 42 46 44 48 45 49 49 53 45 1.9 48 44 42 42 41 42 48 50 46 42 49 44 2.1 43 44 40 37 43 41 43 46 47 43 46 44 2.3 41 45 38 38 41 41 41 44 42 39 43 43 2.5 42 45 40 40 43 41 42 42 41 43 43 44 Uncertainy of Optical Input Data in Percent mI theo Sigma Extreme Error Model Gauss Error Model < 50% Deviation From True Values IMAGINARY PART 0 5 10 15 20 25 0 5 10 15 20 25 32 38 32 27 28 25 32 49 44 40 41 39 38 52 45 40 39 33 38 53 49 53 50 48 47 53 52 45 46 41 47 56 54 53 54 50 29 24 26 23 19 14 29 25 30 21 28 24 26 23 18 22 21 18 26 29 23 18 18 18 45 48 41 31 32 25 45 49 56 45 36 44 46 44 37 32 34 33 46 44 48 41 42 34 38 37 36 34 36 27 38 48 39 36 47 36 28 31 30 34 30 23 28 45 36 31 33 35 24 33 34 30 29 27 24 31 29 34 35 32 25 34 28 26 23 23 25 37 31 35 35 34 < 0.006 Deviation From True Values Extreme Error Model Gauss Error Model 0 5 10 15 20 25 0 5 10 15 20 25 26% 30% 29% 26% 26% 23% 29% 34% 33% 31% 33% 30% 29% 34% 35% 34% 33% 31% 36% 49% 47% 47% 45% 42% 33% 46% 40% 42% 42% 38% 46% 58% 54% 52% 56% 52% 38% 23% 33% 28% 30% 30% 55% 49% 48% 45% 53% 52% 43% 46% 44% 38% 34% 41% 67% 68% 64% 52% 48% 58% 44% 42% 51% 42% 47% 43% 59% 63% 67% 54% 59% 59% 39% 33% 33% 37% 32% 28% 51% 52% 55% 48% 47% 41% 35% 43% 38% 33% 37% 33% 48% 59% 47% 44% 51% 46% 29% 37% 35% 27% 28% 32% 42% 48% 46% 41% 41% 45% 28% 29% 31% 34% 25% 28% 40% 43% 40% 42% 43% 44% 29% 31% 29% 28% 29% 30% 39% 44% 41% 42% 40% 44% < 0.05 Deviation From True Values < 0.075 Deviation From True Values Gauss Error Model Fig. 11. Results for all simulations, separated according to the imaginary part and the geometric standard deviation (mode width) of particle size distribution. The colors (see color legend at bottom of figure) visualize the number of simulation cases (in terms of percentages) that have a retrieval uncertainty that is better than 50%, respectively, 20% for effective radius and number, surface-area, and volume concentrations, i.e., 50% and 20% serve as cutoff values for the simulations that meet these constraints. The numbers in each cell provide the number of cases (in percent relative to the total number of cases that belong to each cell). With regard to the real part, the colors describe the number of cases that are retrieved to better than 0.075, respectively, 0.05. Only cases that meet this threshold value are included in the statistics. The imaginary-part retrieval is color coded on the basis of a retrieval result that is better than 50%, respectively, better than 0.006i. If values cannot be given (not defined) we inserted n/d in the cells. 4998 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article on Gaussian-like distributions of data products that can only in part be achieved at the moment. However, we prefer to introduce this standardized concept of representation of the results already at this stage of our research work, as it allows us to make the results presented in this study comparable to future performance tests of TiARA. We also avoid more arbitrary selections of intervals that could be used to reflect the perfor- mance of TiARA in the main region of our interest with regard to performance assessment, i.e., accurate retrievals in more than 50% of all simulation cases. The cells in the heat maps for which we achieve 68.3%, 95.5%, and 99.7% are shown in green (light, medium, dark) colors. Cases for which at least 50% of all investigated cases of effective radius and number, surface-area, and volume con- centrations fall into the selected uncertainty levels are shown in orange. If less than 50% of the cases are within the uncertainty levels, we show the cells in red color. One row of the map of results related to the imaginary part retrievals has no color. The values are not defined. The left part of the map shows the results for the EEM. The right half of the map shows the results for the GEM. We note once more that the EEM currently is our preferred method of uncertainty analysis, as the retrieval results seem to be more consistent with what we can expect from erroneous simulation data, i.e., retrieval uncertainties increase on average with in- creasing optical data input uncertainty. As mentioned before, the results we obtain with GEM are not conclusive, and we need to carry out more simulations and analysis of the results before we will consider switching to this new uncertainty analysis scheme. In order to illustrate the utility of Fig. 11, we will present two specific examples of how to interpret the figure. We select the SURFACE-AREA map, mI theo, and the row that shows the value 0.005 for the case: extreme error model: <20% deviation from the true values. The term mI theo refers to the correct value of the imaginary part of the CRI that was used for calculating the optical input data. We select column 20, which denotes the optical data un- certainty of 20%. We see a light green cell, and the value 81% inside. It means that if we take all simulations regardless of the sigma value and select the simulations for which the true value of the imaginary part was 0.005i, 81% of all surface-area concentration retrievals could be retrieved with less than 20% uncertainty. As another example, we select the IMAGINARY PART, Sigma map and the row that shows the value 1.9. We use the map Gauss error model: <50% deviation from true values. We select the column 5%, which again denotes the uncertainty of the optical input data. We see an orange cell and the value 50 inside. It means that we take all simulations obtained from PSDs with a geometrical standard deviation of 1.9 regardless of the imaginary part (0i, 0.005i, 0.01i, 0.03i, 0.05i); we ob- tain the imaginary part with less than 50% uncertainty for 50% of all investigated cases. With regard to the size parameters, on average, more cases fall within 50% than 20% retrieval uncertainty. This result holds true for both uncertainty models. With regard to the EEM, the retrieval performance on average worsens significantly stronger if we reduce the threshold from 50% to 20% deviation from the true value, compared to the results obtained with GEM. However, we need to take the results of better performance of GEM compared to EEM with caution. For example, in the case of <50% Deviation From True Values, it seems counter- intuitive that the retrieval quality of the size parameters is independent of the uncertainty of the optical input data. One explanation for this unusual behavior we obtain with GEM could be the low number of inversion runs (nine) that we use for each PSD and error level. As mentioned before, this number by far is not able to reproduce a Gaussian-shape-like error curve, i.e., the statistics of the distribution of the errone- ous input data may be (strongly) biased. For example, it may happen that the nine inversion runs accumulate around 0% uncertainty rather than covering the bell-shaped form of the Gaussian curve in a representative way (Section 3.C.2). This result shows that we still have insufficient understanding of how to correctly apply the GEM. The situation with regard to retrieval quality is different in the case of the CRI retrievals. With regard to the real part, con- siderably less than 50% of all cases are retrieved to better than 0.075, respectively, 0.05 accuracy. There are only a few exceptions. With regard to the imaginary part, we see the same poor retrieval quality. E. Statistics: Summary of Statistical Analysis Figure 12 summarizes the main performance characteristics of TiARA if we take the retrieval uncertainty of 50% deviation from the true results as benchmark. As noted a few times be- fore, this value is the primary threshold value we have been aiming at in this first performance study. Thus, we do not show the results for 20% deviation from the true results in this figure. We show the results in dependence on the used error model and the uncertainty of the optical input data. With regard to the EEM, we can derive effective radius to within 50% in more than 60% of the simulation cases if we keep the optical data uncertainty below 15%. The exceptions to this result are strongly light-absorbing particles, in which case, the retrieval uncertainty is on average larger; see Fig. 11. Approximately 60% of all simulation cases of number concen- tration retrievals fall within this threshold of 50%. Surface-area concentration retrievals are very robust. As mentioned before (Figs. 10 and 11), the retrieval quality of this parameter seems to depend little on optical input data uncertainties of up to 25%. The pattern of uncertainty of volume concentration retrievals with regard to optical data uncertainty is similar to the patterns we find for effective radius. Figure 12 once more confirms that the quality of the retrieval of the real part and the imaginary part of the CRI is not satisfactory, regardless of the uncertainty of the optical input data. We find that the quality of the microphysical data products worsens with increasing measurement error in the case of the EEM. In that sense, we find at least a pattern we expected at the start of our simulation work. In contrast, this pattern is missing if we use the GEM. As noted before, results for the GEMmay not represent the true statistics in view of the limited number of error runs for each optical dataset tested in this study. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 4999 4. ORACLES: MEASUREMENT EXAMPLE FROM 22 SEPTEMBER 2016 We present the results of the inversion of optical data taken with the HSRL-2 during the ObseRvations of Aerosols above CLouds and their IntEractionS (ORACLES) mission (https://espo.nasa. gov/oracles/content/ORACLES_Two-page4_ORACLES_Flyer). A more comprehensive description of microphysical particle properties of aerosols observed during ORACLES will be given in a future presentation. A. Optical Properties We used data collected during the flight from 08:00 and 14:40 UTC on 22 September 2016. Smoke from biomass burning was observed to heights up to 8 km above sea level. There was considerable variation of particle properties and particle concentration during the flight. Figure 13 provides a comprehensive overview on several key optical properties of the smoke. Particle Angström exponents provide us with some insight to particle size. Particle lidar ratios contain information on particle size and light-absorption capacity. Particle Angström exponents varied between 1 and 1.5 above approximately 3 km height above sea level during most of the flight time. These values point to particles in the fine- mode fraction, i.e., particles below approximately 500 nm radius. Below approximately 3 km height, particles were con- siderably larger. We excluded them from our data analysis. The linear particle depolarization ratios indicated a significant contribution from non-spherical particles, which most likely were dust particles. The lidar ratio at 532 nm is above approximately 65 sr in most parts of the curtain plot. Maximum values are around 90 sr. These high values of lidar ratios and Angström exponents above 1 (above approximately 2 km height above sea level) in- dicate the presence of aerosol pollution with comparably strong light-absorption capacity. Biomass-burning aerosol particles likely were the reason for the observed hight lidar ratios. The ratio of the lidar ratios at 355 nm and 532 nm, respec- tively, varies around 1. There is a tendency toward values 0 5 10 15 20 250 5 10 15 20 25 0 20 40 60 80 100 ERROR LEVEL [%]ERROR LEVEL [%] effective radius (a) CA SE S [% ] 0 5 10 15 20 25 number concentration (b) ERROR LEVEL [%] surface concentration (c) 0 5 10 15 20 25 ERROR LEVEL [%] ERROR LEVEL [%]ERROR LEVEL [%] volume concentration (d) 0 5 10 15 20 25 real part (e) 0 5 10 15 20 25 imaginary part (f) Fig. 12. Simulation results based on all PSDs and CRIs and the use of the extreme-error model (squares) and the Gauss-error model (stars). The symbols represent the sum (expressed in terms of percent of all cases) of all simulation cases for which the shown parameters are within 50% deviation from the true results. Fig. 13. Curtain plots of particle Angström exponents (extinction coefficients measured at 355 nm and 532 nm), lidar ratio at 532 nm, and ratio of lidar ratios measured at 355 nm and 532 nm. Data are shown up to 8 km height above sea level. The measurement time was from 8:00 UTC to 14:40 UTC. The temporal resolution of the data is 10 s; the vertical resolution is 100 m per data point. Geographical location of the aircraft was between approximately −22.7°N and −21.21°N and from 12.69°E to 6.85°E. In the case of the lidar ratio, the x-axis range was from −22.7°N to −22.97°N and from 12.69°E to 13.85°E. Data below approximately 3 km height above sea level were in part excluded from the analysis with TiARA (see Figs. 14 and 15) because these data points either did not meet the threshold values regarding signal-to-noise ratio needed for a reliable, high- quality data inversion or contained significant values of the particle linear depolarization ratio. As pointed out in previous publications and also in this contribution, we lack a reliable light-scattering model that can accurately describe particle backscatter coefficients at 180º that present key input data for TiARA. 5000 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article Fig. 14. Curtain plots of microphysical parameters (left column) and retrieval uncertainties in terms of absolute values of the data products (right column). We show (first row) number concentration, (second row) effective radius, (third row) surface-area concentration, and (forth row) volume concentration. The data products do not include results for particles below 50 nm particle radius. Measurement time, measurement location and height are the same as in Fig. 13. The individual profiles shown in Fig. 13 were averaged into 5 min averages and subsequently used for data inversion. We applied a 1 min gliding average to each set of 5 min profiles, which allows us to produce the curtain plots of microphysical parameters in this figure and Fig. 15. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 5001 below 1 (as low as approximately 0.75) in heights below 4 km and as high as 1.5 above 4 km above sea level. These ratios corroborate the assumption that comparably strong light- absorbing particles were observed during the flight. B. Microphysical Data Products Figure 14 shows curtain plots of number, surface-area, and vol- ume concentrations and particle effective radius. We used only optical data points that fulfilled quality assurance checks, e.g., optical data with strong signal-to-noise ratio, i.e., uncer- tainty of the optical data products below 15% and low linear particle depolarization ratios (to exclude contamination by dust particles) below 5%. Total number concentration in large portions of the plot does not exceed 1000 particles∕cm3. However, particle concentration significantly increases between approximately 9:30 and 11 UTC in flight levels between 3 km and 5 km height above sea level. We find number concentrations as high as 3000 particles∕cm3. After 12:00 UTC, we find another, though less significant, increase in number concentration, up to 1500 particles∕cm3. The uncertainty of the retrieved val- ues of number concentration for the most part of the curtain plot stays below approximately 250 particles∕cm3. The excep- tion is the region where we find the significantly increased values of number concentration. In that region, uncertainties are as large as approximately 750 particles∕cm3. That means retrieval errors are on average 25% for the complete cur- tain plot. Particle effective radius remains comparably constant above 4 km height between 9:00 and 11:30 UTC. Values are a little larger than 0.2 μm. Below 4 km height and particularly after 11:30 UTC, mean particle size seems to increase. We find particle effective radii above 0.4 μm. Retrieval uncertainties ac- cordingly increase in the portions of the curtain plot that show larger values of effective radius. The results for effective radius are rather noisy in these portions. In contrast, the section of the flight track that contains particles with a comparably con- stant value of effective radius (around 0.2 μm) shows rather low uncertainties. Uncertainties of effective radius are less than 0.04 μm, which translates into a relative uncertainty below 20%. Surface-area concentration and volume concentration behave in a fashion similar to number concentration. Values remain rather steady during most of the flight except for the flight time from 9:30–11:00 UTC. During that time, surface- area concentration increases from average values of less than 300 μm2∕cm3 to as high as 700 μm2∕cm3. As in the case of number concentration, we find again another slight increase of surface-area concentration after 12:00 UTC in flight levels up to 5.5 km height above sea level. Volume concentration for the most part of the flight stays below 20 μm3∕cm3 except during the time of the flight that shows the increase in number and surface-area concentra- tion. Volume concentration increases to values as large as 50 μm3∕cm3. On average, the retrieval results show more noise for surface-area concentration and for volume concentration after 11:30 UTC. Uncertainties on average stay below approx- imately 20% for volume concentration. With regard to surface-area concentration, uncertainties are below approxi- mately 10% in the region of increased number concentration. Outside this region, we find higher retrieval uncertainties, which also tend to vary significantly stronger from data point to data point. Uncertainties increase to 15%–20%. Figure 15 shows the contribution of particles in the fine- mode fraction (particle radius below 500 nm) of the number concentration to number concentration of the complete par- ticle size distribution. We also show the ratio of the fine-mode particle effective radius (we considered only particles with radius below 500 nm in our computation) to effective radius of the total PSD. These curtain plots provide us not only with valuable information on the share of particles in the fine-mode frac- tion of the retrieved particle size distribution, but also serve Fig. 15. Curtain plots of (top) ratio of number concentration in the fine mode fraction of the particle size distribution (particles with radius between 50 nm and 500 nm) to total number concen- tration and (bottom) ratio of effective radius of particle size distri- bution with particles less than 500 nm radius (and above 50 nm) to effective radius of the complete particle size distribution. Measurement time, measurement location and height are the same as in Fig. 13. The spatial and temporal resolutions of the curtain plots are the same as in Fig. 14. 5002 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article as a quality flag parameter. We are quite confident that we can derive particle effective radius and number, surface- area, and volume concentrations with comparably high quality if particle size is below 500 nm radius, as explained below. We are aware of the fact that retrieval uncertainties may increase with particle radius, particularly if particles are above approximately 1–2 μm radius because of the limited range of measurement wavelengths. The ratio, particularly that of number concentration, but also to some extent effective radius, provides us with a qualitative confidence value on the retrieval quality. If these ratios of fine-mode number concentration to total concentration and fine-mode effective radius to total effective radius are close to the value 1, we likely obtain comparably low retrieval uncertainties. We see that the ratio of fine-mode number concentration to total number concentration is close to 1 for nearly all the data points that were used for data inversion. There are only a few outliers, and we assume that these outliers are related mainly to larger than usual retrieval uncertainties. We are quite con- fident that most of the particles in this biomass-burning plume belonged to the fine-mode fraction of the particle size distributions. With regard to effective radius, we find a ratio of 0.8 or larger in the portion of the plume that was observed between 9:00 and 11:30 UTC above 4 km height above sea level. Again, this result indicates that most particles in this part of the plume belonged to the fine-mode fraction of the retrieved particle size distributions. We find similarly large values of this ratio in many other parts of the curtain plots, although the overall ap- pearance of the plot indicates more variation of effective radius. On the one hand, larger retrieval uncertainties may be one rea- son for the more noisy structure of the plot. On the other hand, larger effective radius observed after 11:30 UTC may be an- other reason. Particularly below 4 km height above sea level, we find stronger variations. 5. CONCLUSION We presented the main results of extensive simulation studies carried out with our new, autonomous inversion algorithm, TiARA. We applied the algorithm to a case of experimental data collected during ORACLES. The values of the data prod- ucts obtained with TiARA for the ORACLES case are in a reasonable range of numbers. Nevertheless, we need to con- firm the quality of the results with in situ data and a more detailed uncertainty analysis. Some of the strengths of TiARA are: • This methodology works with inversion with regulariza- tion, which allows for, e.g., great flexibility with regard to the resolution capacity of the investigated particle size distribu- tions. We do not need to assume any specific shape of the par- ticle size distribution. • TiARA allows for the unsupervised inversion of curtain plots of optical data acquired with NASA Langley’s airborne HSRL-2 instrument. TiARA is used for producing several mi- crophysical particle parameters, i.e., effective radius, and num- ber, surface-area, and volume concentrations to a level that until now could be obtained with the data-operator-controlled version of the inversion algorithm, e.g., Ref. [36]. • Contrary to the manual version of our inversion method- ology, we include for the first time number concentration as an additional standard data product in TiARA. • We managed to test different levels of uncertainties of op- tical input data. Although such work has been done in previous publications, it was never based on a statistically significant set of data. We tested hundreds of scenarios and used a wide range of data uncertainty levels ranging from 5% to 25%. We con- sider tests of such a range of uncertainties important, as they give us a much clearer picture on the maximum acceptable op- tical data uncertainty above which microphysical data products become useless. We find that 15% data uncertainty is a thresh- old value that needs to be considered in future instrument de- velopment. Of course, lower uncertainties are desirable, as they could further improve the data products (though no proof of this claim is available). • Another important outcome of this study is that data un- certainties above 20% clearly lead to a reduction of the quality (uncertainty respectively accuracy/precision) of data products obtained with TiARA. • Adding to this finding, i.e., impact of measurement un- certainties, we find that TiARA performs at least as well as our manually operated inversion algorithm with regard to a) quality of retrieved data products, and b) sensitivity to uncertainty of optical input data. • We obtain for the first time insight on the uncertainties of the retrieved data products by making use of the statistical analysis of a large number of simulations in which we use differ- ent properties of the investigated particle size distributions, i.e., size parameters and complex refractive index. This analysis, which could not have been achieved with a manual, data- operator-controlled algorithm, allowed us for the first time to introduce the concept of bias (or accuracy) and noise (or precision) in our uncertainty analysis. • The simulations corroborate findings we summarized in Refs. [17,18]. In these publications, we showed the quality of data products (retrieval accuracy of effective radius and volume and surface-area concentrations) obtained with TiARA for the case of inversion of experimental data. Despite these strengths, TiARA still requires improvements in order to meet the high demands that are required with regard to an accurate description of atmospheric aerosol pollution. We mention just a few important points that serve as guidance for future work. • TiARA currently has been tested only for particle size distributions of monomodal shape. Work on bimodal size dis- tributions is in progress but is affected by extreme challenges. It is a futile approach to combine two monomodal particle size distributions into bimodal distributions, particularly in view of the fact that we must obtain statistically significant results. The number of bimodal size distributions that can be generated from the monomodal size distributions used in this study leads to an amount of bimodal distributions that cannot be processed in any reasonable time frame, even with supercomputers that might be available to us. • We also need to consider that the weighting of each of the two monomodal size distributions that are needed to create a bimodal particle size distribution adds another element of Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 5003 complexity to the simulations. We thus need to think of smar- ter approaches to this specific problem of using bimodal particle size distributions in simulation studies. • The main challenge of retrieving an accurate and precise value of the CRI remains an open issue. We made progress in this work, as we now have a much clearer picture of where we stand, based on this statistical set of simulations. We also know from a previous study, [36], what uncertainty of the retrieved CRI (real and imaginary part) is permitted at maximum to ob- tain meaningful values of the SSA. Thus, the goal of future work will be to insert additional tools in TiARA that will help us to come closer to this ideal situation shown in Ref. [36]. • We assume size- and wavelength-independent CRIs in the simulations. This assumption is a serious restriction and likely affects simulations with bimodal size distributions con- siderably stronger than simulations with monomodal size dis- tributions. The number of possible scenarios that need to be tested to obtain statistically significant results is simply too large if we apply current strategies of simulation studies. • Despite the fact that we made progress with developing a robust uncertainty analysis scheme, we are still far from calling this task fulfilled. We tested two different uncertainty models (optical input data), i.e., the Gauss-error distribution model (GEM) and the extreme-error model (EEM). Both methods have advantages and disadvantages, though it is obvious that in the future, we must move to GEM. However, our simula- tions remain inconclusive with regard to the true performance of GEM, should, in the future, this uncertainty model of op- tical input data become the method of choice in TiARA. We are preparing a manuscript that will further elaborate on the topic of uncertainty analysis. We hope this work will bring us one step closer to solving this challenge in future versions of TiARA. • We explicitly excluded simulations with particle size distributions in which particles are of non-spherical shape. • We are still struggling with providing robust error bars (bias and noise), though we made significant progress in that regard in this contribution. • We exclude in our software the possibility of inverting data that represent particle sizes below 50 nm radius and above 10 micrometers. Although we are working on resolving that challenge at the larger particle radius range, we have not yet developed any realistic strategy that would allow us to identify particles below 30 nm radius, given the available lidar measure- ment wavelengths of 355 nm, 532 nm, and 1064 nm. In this simulation study, we targeted a statistically based uncertainty analysis, which is in contrast to studies on the uncertainties of inversion results carried out with the data- operator-controlled inversion algorithm. TiARA has been devel- oped for the large-scale processing of data from measurements with airborne and space-borne lidar. Thus, a statistical approach toward the performance characterization is necessary. We ob- tained first though very simple results in that regard, because of the amount of simulation data needed for a statistical analysis. Based on the statistical analysis of the simulations, we noticed that there is a systematic shift of the retrieved values of effective radius and number, surface-area, and volume concentrations compared to the true values. This shift is comparably small for surface-area concentration, and larger for the other param- eters. With regard to real and imaginary parts of the CRI, we found a significant shift. The spread of the retrieval results around the true values also differs for the different data prod- ucts. Surface-area concentration has the smallest spread. The largest spread is found for real and imaginary parts of the CRI. We selected a benchmark of 50% retrieval accuracy of effective radius and number, surface-area, and volume concen- trations as acceptable performance of this first version of TiARA. We achieve this goal if the optical input data are error free. This case also gives us insight into the mathematical uncertainties caused by TiARA. Simulations with different uncertainty levels of the input optical data show that satisfac- tory performance of TiARA on average can still be achieved if measurement uncertainties are ≤15%. Retrievals of the CRI are not satisfactory from a statistical point despite the situation that individual inversion cases deliver acceptable results. The retrieval accuracy of the real part needs to be better than 0.1 and that of the imaginary part should be better than 50% in order to achieve an accuracy of approximately 0.05 for SSA [36]. We achieve this accuracy of the CRI in less than 50% of all cases tested in this study. Particularly in the case of low light-absorbing particles, i.e., the imaginary part is less than 0.01i, deviations between retrieved and true value are considerably larger than 50%. For these reasons, we will not use TiARA for the retrieval of the CRI before simulations indicate an improvement of the quality of this data product. Simulations show that EEM is a comparably robust scheme for inferring uncertainty bars of the data products. As men- tioned before, we want to use a new model of uncertainty analysis, i.e., GEM, in future versions of TiARA. In the present study, we found that simulation results with GEM are still in- conclusive. For example, retrieval uncertainties do not increase in a clear way if uncertainties of the optical input data increase. Other results of this simulation study also suggest that we do not understand the mechanism of GEM in TiARA well enough to use it as standard procedure for our uncertainty analysis at the current stage of algorithm development. We note that we do not use TiARA for analyzing optical data of particles with non-spherical geometry if the linear par- ticle depolarization ratio is larger than approximately 5% [18]. Previous work has shown [32] that linear particle depolarization ratios below approximately 5% still provide us with reasonable values of the microphysical data products. We currently work on developing methods that allow us to invert optical data of depolarizing particles. A few results have been presented pre- viously [39,40]. That work was based on using the data- operator-controlled version of the inversion software. In future work, we will focus on the following main topics in order to further improve the performance of TiARA: • Application of TiARA to experimental data • Next version of post-processing of inversion results • Improvements of the quality of the derived CRI • Improved uncertainty analysis of data products • Simulations with bimodal particle size distributions Table 4 shows a more comprehensive list of data products provided by TiARA. In this contribution, we focussed on key parameters that can be found in publications that reported 5004 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article on the manually operated version of the inversion algo- rithm, e.g., [4,8]. For example, TiARA provides data products in which we sep- arate the fine-mode and the coarse-mode fraction of particle size distributions. We define 500 nm particle radius as the boundary between fine-mode and coarse-mode particles. In that regard, TiARA has the potential of providing additional information other than information on bulk parameters; see, e.g., [41]. TiARA provides detailed information on uncertainty levels that go beyond the usual concept of uncertainty bars. The “x” symbols in Table 4 mark the data products pre- sented in this contribution. The parameters marked by the “+” symbols can already be produced. We currently analyze the results of that part of the simulation studies. First results obtained from experimental data have recently been presented in Ref. [18]. The data products denoted by the “o” symbols are available for monomodal PSDs, but we want to include results based on bimodal PSDs, which have not yet been generated. We are cur- rently developing automated software that allows us to produce the result tables. The simulations denoted by the “#” symbol pose a challenge as we currently use the AERONET light- scattering model for mineral dust [42]. We are preparing a publication that summarizes inversion results of optical data that describe scattering by mineral dust observed during DISCOVER-AQ California and DISCOVER-AQ Texas. This contribution will serve as a basis for developing a methodology suitable for the inversion of data that describe light scattering by non-spherical particles at 180º. APPENDIX A: MEANING OF NUMBERS IN FIGURE 5 The following list provides information on where the individual parts of the flow chart (Fig. 5) are explained in the main body of the text: Table 4. Data Products Inferred with TiARAa Data Type Total PSD fm PSD cm PSD * ** Reff mp x o o Number mp x o o Surf.-area mp x o o Volume mp x o o Imag. part mp x o o ** Real part mp x o o ** Scat. coeff. op + o o * Abs. coeff. op + o o * SSA op + o o * Phase funct. op # # # * Asym. param. mp # # # * SNS ratio mp o o o aData products that will be presented in follow-up contributions are marked by plus signs (+) and circles (o). mp denotes microphysical parameters, and op denotes optical parameters. The # symbol means that these parameters will use a novel light-scattering model that can better reproduce particle backscattering and linear depolarization ratios at 180° scattering direction for non-spherical particles. *means that wavelength dependence is the effect of particle size and not the CRI, which is kept wavelength independent in the inversion algorithm. **means that the inversion algorithm produces this parameter intrinsically, i.e., it does not need to be calculated from the data products obtained in data inversion. SNS denotes spherical-to-non-spherical particle ratio. Ta b le 5. In p ut P ar am et er s o f S iz e D is tr ib ut io ns U se d in th e S im ul at io n S tu d ie sa M ed ia n R ad iu s (μ m ) E ff ec ti ve R ad iu s (μ m ) Su rf ac e- A re a C on ce nt ra ti on μ m 2 ∕c m 3 ) V ol um e C on ce nt ra ti on (μ m 3 ∕ cm 3 ) 20 0. 03 ; 0. 04 ; 0. 06 ; 0. 08 ; 0. 11 ; 0. 16 ; 7 e − 3 ; 8 .8 e − 3 ; 1 .2 e − 2 ; 1 .5 e − 2 ; 2 e − 2 ; 2 .7 e − 2 ; 7 e − 5 ; 1 .2 e − 4 ; 2 .1 e − 4 ; 4 e − 4 ; 7 .6 e − 4 ; 1 .5 e − 3 ; 60 0. 09 ; 0. 12 ; 0. 17 ; 0. 24 ; 0. 34 ; 0. 49 ; 6 .3 e − 2 ; 8 e − 2 ; 0. 10 ; 0. 14 ; 0. 18 ; 0. 24 ; 1 .9 e − 3 ; 3 .2 e − 3 ; 5 .8 e − 3 ; 1 .1 e − 2 ; 2 .1 e − 2 ; 4 e − 2 ; 10 0 0. 15 ; 0. 20 ; 0. 28 ; 0. 40 ; 0. 57 ; 0. 81 ; 0. 18 ; 0. 22 ; 0. 29 ; 0. 38 ; 0. 50 ; 0. 67 ; 8 .8 e − 3 ; 1 .5 e − 2 ; 2 .7 e − 2 ; 5 e − 2 ; 9 .5 e − 2 ; 0. 18 ; 14 0 0. 21 ; 0. 28 ; 0. 39 ; 0. 55 ; 0. 79 ; 1. 14 ; 0. 34 ; 0. 43 ; 0. 56 ; 0. 74 ; 0. 99 ; 1. 32 ; 2 .4 e − 2 ; 4 .1 e − 2 ; 7 .3 e − 2 ; 0. 14 ; 0. 26 ; 0. 50 ; 18 0 0. 27 ; 0. 36 ; 0. 50 ; 0. 71 ; 1. 02 ; 1. 47 ; 0. 57 ; 0. 72 ; 0. 93 ; 1. 22 ; 1. 63 ; 2. 18 ; 5 .1 e − 2 ; 8 .7 e − 2 ; 0. 16 ; 0. 29 ; 0. 55 ; 1. 07 ; 22 0 0. 33 ; 0. 45 ; 0. 62 ; 0. 87 ; 1. 25 ; 1. 79 ; 0. 85 ; 1. 07 ; 1. 39 ; 1. 83 ; 2. 44 ; 3. 26 ; 9 .4 e − 2 ; 0. 16 ; 0. 29 ; 0. 53 ; 1. 01 ; 1. 95 ; 26 0 0. 39 ; 0. 53 ; 0. 73 ; 1. 03 ; 1. 47 ; 2. 12 ; 1. 18 ; 1. 49 ; 1. 94 ; 2. 55 ; 3. 40 ; 4. 55 ; 0. 15 ; 0. 26 ; 0. 47 ; 0. 88 ; 1. 67 ; 3. 22 ; 30 0 0. 45 ; 0. 61 ; 0. 84 ; 1. 19 ; 1. 70 ; 2. 44 ; 1. 57 ; 1. 99 ; 2. 58 ; 3. 40 ; 4. 53 ; 6. 06 ; 0. 24 ; 0. 40 ; 0. 72 ; 1. 35 ; 2. 56 ; 4. 94 ; gs d 1. 5, 1. 7, 1. 9, 2. 1, 2. 3, 2. 5 R ea l pa rt 1. 4, 1. 5, 1. 6, 1. 7 Im ag . pa rt 0 i, 0 .0 0 0 1 i, 0 .0 0 1 i, 0 .0 0 2 5 i, 0 .0 0 5 i, 0 .0 0 7 5 i, 0 .0 1 i, 0 .0 1 5 i, 0 .0 2 i, 0 .0 2 5 i, 0 .0 3 i, 0 .0 3 5 i, 0 .0 4 i, 0 .0 4 5 i, 0 .0 5 i a W e us ed PS D s no rm al iz ed to on e pa rt ic le pe r cm 3 . gs d de no te s th e ge om et ric al st an da rd de vi at io n (m od e w id th ). Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 5005 1. Sections 2.A and 2.B.1; 2. Section 3.A and Appendix B, optical data gp; 3. Section 3.A and Appendix B, gp; 4. Section 2.B.2; particularly, we refer to the appendix in reference [16]; 5. Section 2.C.2 6. and Sections 2.C.2, 3.A, and Table 3 in the Appendix of reference C; 7. Sections 2.A and 2.B.1; 8. Section 2.D; 9. see Section 3.2 in reference [9]; 10. this post-processing option will be used in the next version of TiARA; 11. see Section 3.C in Ref. [4]; 12. this post-processing option will be used in the next version of TiARA. APPENDIX B: SIMULATIONS: INPUT PARAMETERS Table 5 lists the input parameters of microphysical parameters (median radius, geometrical standard deviation, and real and imaginary parts of the CRI) that were used to compute the op- tical data with a Mie-scattering algorithm [21]. The optical data were subsequently used as input for TiARA. The other param- eters listed in this table show the data products retrieved with TiARA, i.e., particle effective radius and number, surface-area, and volume concentrations. We are in the process of including particle radii below 50 nm into our LUTs, i.e., down to 1 nm. At that point, we will be able to analyze our simulations for which effective radii are less than 0.14 μm. APPENDIX C: SIMULATIONS WITH CHANNEL-DEPENDENT ERRORS Table 6 lists scenarios in which measurement uncertainty de- pends on the measurement channel. These simulations will provide us with additional insight on the robustness of TiARA. These simulations target experimental scenarios that Table 6. Scenarios of Measurement-Channel- Dependent Optical Data Uncertaintiesa β355 β532 β1064 α355 α532 EEM A 10 20 15 15 15 EEM B 10 5 15 20 20 EEM C 10 5 15 10 10 EEM D 5 5 5 8 10 GEM A, 1σ 10; 3.3 20; 6.6 15; 5 15; 5 15; 5 GEM B, 1σ 10; 3.3 5; 1.7 15; 5 20; 6.6 20; 6.6 GEM C, 1σ 10; 3.3 5; 1.7 15; 5 10; 3.3 10; 3.3 GEM D, 1σ 5; 1.7 5; 1.7 5; 1.7 8; 2.7 10; 3.3 aThe numbers are given in percent. Fig. 16. Inversion results as shown in Fig. 10. Results are not scaled to 100%. Meanings of symbols and colors are the same as in Fig. 10. 5006 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article are more realistic. For example, uncertainties of the optical pro- files may be different for the different measurement channels. We note that the four cases listed in this table by far do not describe the variety of scenarios that may occur under realistic measurement situations. Nonetheless, we will gain a better understanding if it is important that specific measurement channels should have lower measurement uncertainties than other measurement channels. Simulations have already been carried out for these scenar- ios. We will present the results in a future presentation. We will use the next version of TiARA that will include the improved data post-processing scheme described in Refs. [14–16]. APPENDIX D: HISTOGRAM DISTRIBUTIONS Figure 16 shows once more the histogram distributions of the data products presented in Fig. 10. In contrast to Fig. 10, we did not scale the results to 100% on the y axis. The shape of the distribution of the histogram columns can be clearly identified. As mentioned in the main body of the text, surface-area con- centration can be retrieved exceptionally well. There seems to be a comparably well-defined shape of the distribution of the histogram columns for all data products. The shape in terms of peak and width varies for the different data products. In the future, we plan to parameterize these distribution functions. Parametrization allows us to provide important performance characteristics of TiARA in a highly compressed way. Thus, parametrization will also allow us to identify performance im- provements of future versions of TiARA if they are significant. Furthermore, the parametrization allows us to compare our results to a study carried out in Ref. [30]. The authors of that study for the first time investigated the effects of systematic and random errors on the retrieval quality of particle microphysical data products based on their own inversion methodology. Funding. Langley Research Center (NASA Langley Research Center). Acknowledgment. We thank Dr. Matthias Tesche, for- merly University of Hertfordshire (UK), now at University of Leipzig (Germany), for preparing several figures in this publication. REFERENCES 1. A. N. Tikhonov and V. Y. Arsenin, eds., Solutions of Ill-Posed Problems (Wiley, 1977). 2. D. Müller, “Inversionsalgorithmus zur Bestimmung Physikalischer Partikeleigenschaften aus Mehrwellenlängen-Lidarmessungen (inver- sion algorithm for the determination of particle properties from multi- wavelength lidar measurements),” Ph.D. thesis (Universität 1997). 3. D. Müller, U. Wandinger, D. Althausen, I. Mattis, and A. Ansmann, “Retrieval of physical particle properties from lidar observations of extinction and backscatter at multiple wavelengths,” Appl. Opt. 37, 2260–2263 (1998). 4. D. Müller, U. Wandinger, and A. Ansmann, “Microphysical particle parameters from extinction and backscatter lidar data by inversion with regularization: experiment,” Appl. Opt. 39, 1879–1892 (2000). 5. T. Murayama, D. Müller, K. Wada, A. Shimizu, M. Sekigushi, and T. Tsukamato, “Characterization of Asian dust and Siberian smoke with multi-wavelength Raman lidar over Tokyo, Japan in spring 2003,” Geophys. Res. Lett. 31, L23103 (2004). 6. Y. M. Noh, D. Müller, I. Mattis, H. Lee, and Y. J. Kim, “Vertically resolved light-absorption characteristics and the influence of relative humidity on particle properties: multiwavelength Raman lidar observa- tions of East Asian aerosol types over Korea,” J. Geophys. Res. 116, D06206 (2011). 7. H. Baars, A. Ansmann, D. Althausen, R. Engelmann, B. Heese, D. Müller, P. Artaxo, M. Paixao, T. Pauliquevis, and R. Souza, “Aerosol profiling with lidar in the Amazon Basin during the wet and dry season,” J. Geophys. Res. 117, D21201 (2005). 8. I. Veselovskii, A. Kolgotin, V. Griaznov, D. Müller, U. Wandinger, and D. N. Whiteman, “Inversion with regularization for the retrieval of tropo- spheric aerosol parameters from multiwavelength lidar sounding,” Appl. Opt. 41, 3685–3699 (2002). 9. I. Veselovskii, A. Kolgotin, V. Griaznov, D. Müller, K. Franke, and D. N. Whiteman, “Inversion of multiwavelength Raman lidar data for retrieval of bimodal aerosol size distribution,” Appl. Opt. 43, 1180–1195 (2004). 10. C. Böckmann, I. Miranova, D. Müller, L. Scheidenbach, and R. Nessler, “Microphysical aerosol parameters from multiwavelength lidar,” J. Opt. Soc. Am. A 22, 518–528 (2005). 11. J. Bösenberg, V. Matthias, A. Amodeo, V. Amoiridi, A. Ansmann, J. M. Baldasano, I. Balin, D. Balis, C. Böckmann, A. Boselli, G. Carlson, A. Chaikovsky, G. Chourdakis, A. Comerón, F. D. Tomasi, R. Eixmann, V. Freudenthaler, H. Giehl, I. Grigorov, A. Hågård, M. Iarlori, A. Kirsche, G. Kolarov, L. Komguem, S. Kreipl, W. Kumpf, G. Larchevêque, H. Linné, R. Matthey, I. Mattis, L. Mona, D. Müller, S. Music, S. Nickovic, M. Pandolfi, A. Papayannis, G. Pappalardo, J. Pelon, C. Pérez, R. M. Perrone, R. Persson, D. P. Resendes, V. Rizi, R. Rocadenbosch, J. A. Rodriguez, L. Sauvage, L. Schneidenbach, R. Schumacher, V. Shcherbakov, V. Simeonov, P. Sobolewsky, N. Spinelli, I. Stachlewska, D. Stoyanov, T. Trickl, G. Tsaknakis, G. Vaughan, U. Wandinger, X. Wang, M. Wiegner, M. Zavrtanik, and C. Zerefos, “EARLINET: A European aerosol research lidar network to establish an aerosol climatology,” Technical report No. 348 (2003). 12. D. Müller, U. Wandinger, and A. Ansmann, “Microphysical particle parameters from extinction and backscatter lidar data by inversion with regularization: simulation,” Appl. Opt. 38, 2358–2368 (1999). 13. E. Chemyakin, D. Müller, S. Burton, A. Kolgotin, C. Hostetler, and R. Ferrare, “Arrange and average algorithm for the retrieval of aerosol parameters from multiwavelength high-spectral-resolution lidar/ Raman lidar data,” Appl. Opt. 53, 7252–7266 (2014). 14. A. Kolgotin, D. Müller, E. Chemyakin, and A. Romanov, “Improved identification of the solution space of aerosol microphysical properties derived from the inversion of profiles of lidar optical data, part 1: theory,” Appl. Opt. 55, 9839–9849 (2016). 15. A. Kolgotin, D. Müller, E. Chemyakin, and A. Romanov, “Improved identification of the solution space of aerosol microphysical properties derived from the inversion of profiles of lidar optical data, part 2: simulations with synthetic data,” Appl. Opt. 55, 9850–9865 (2016). 16. A. Kolgotin, D. Müller, E. Chemyakin, A. Romanov, and V. Alehnovich, “Improved identification of the solution space of aerosol microphysical properties derived from the inversion of profiles of lidar optical data, part 3: case studies,” Appl. Opt. 57, 2499–2513 (2018). 17. D. Müller, C. A. Hostetler, R. A. Ferrare, S. P. Burton, E. Chemyakin, A. Kolgotin, J. W. Hair, A. L. Cook, D. B. Harper, R. R. Rogers, R. W. Hare, C. S. Cleckner, M. D. Obland, J. Tomlinson, L. K. Berg, and B. Schmid, “Airborne multiwavelength high spectral resolution lidar (HSRL-2) observations during TCAP 2012: vertical profiles of optical and microphysical properties of a smoke/urban haze plume over the northeastern coast of the US,” Atmos. Meas. Tech. 7, 3487–3496 (2014). 18. P. Sawamura, R. H. Moore, S. P. Burton, E. Chemyakin, D. Müller, A. Kolgotin, R. A. Ferrare, C. A. Hostetler, L. D. Ziemba, A. J. Beyersdorf, and B. E. Anderson, “HSRL-2 aerosol optical measure- ments and microphysical retrievals vs. airborne in situ measurements during DISCOVER-AQ 2013: an intercomparison study,” Atmos. Chem. Phys. 17, 7229–7243 (2017). 19. A. Ansmann and D. Müller, “Lidar and atmospheric aerosol particles,” in Lidar Range-Resolved Optical Remote Sensing of the Atmosphere, C. Weitkamp, ed. (Springer, 2005), pp. 105–141. Research Article Vol. 58, No. 18 / 20 June 2019 / Applied Optics 5007 20. D. Müller, U. Wandinger, and A. Ansmann, “Microphysical particle parameters from extinction and backscatter lidar data by inversion with regularization: theory,” Appl. Opt. 38, 2346–2357 (1999). 21. C. F. Bohren and D. R. Huffman, eds., Absorption and Scattering of Light by Small Particles (Wiley, 1983). 22. D. Müller, U. Wandinger, D. Althausen, and M. Fiebig, “Comprehensive particle characterization from three-wavelength Raman-lidar observa- tions,” Appl. Opt. 40, 4863–4869 (2001). 23. D. L. Phillips, “A technique for the numerical solution of certain integral equations of the first kind,” J. Assoc. Comput. Mach. 9, 84–97 (1962). 24. S. Twomey, “The application of numerical filtering to the solution of integral equations encountered in indirect sensing measurements,” J. Franklin Inst. 279, 95–109 (1965). 25. V. E. Zuev and I. E. Naats, eds., Inverse Problems of Lidar Sensing of the Atmosphere (Springer, 1983). 26. S. Twomey, ed., Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements (Elsevier, 1977). 27. R. A. Horn and C. R. Johnson, eds., Matrix Analysis (Cambridge University, 1985). 28. I. Veselovskii, A. Kolgotin, V. Griaznov, D. Müller, and D. N. Whiteman, “Information content of multiwavelength lidar data with respect to microphysical particle properties derived from eigenvalue analysis,” Appl. Opt. 44, 5292–5303 (2005). 29. S. P. Burton, R. A. Ferrare, C. A. Hostetler, J. W. Hair, R. R. Rogers, M. D. Obland, C. F. Butler, A. L. Cook, D. B. Harper, and K. D. Froyd, “Aerosol classification using airborne high spectral resolution lidar measurements—methodology and examples,” Atmos. Meas. Tech. 5, 73–98 (2012). 30. D. Pérez-Ramírez, D. N. Whiteman, I. Veselovskii, A. Kolgotin, M. Korenskiy, and L. Alados-Arboledas, “Effects of systematic and ran- dom errors on the retrieval of particle microphysical properties from multiwavelength lidar measurements using inversion with regulariza- tion,” Atmos. Meas. Tech. 6, 3039–3054 (2013). 31. S. P. Burton, E. Chemyakin, X. Liu, K. Knobelspiesse, S. Stamnes, P. Sawamura, R. H. Moore, C. A. Hostetler, and R. A. Ferrare, “Information content and sensitivity of the 3β + 2α lidar measurement system for aerosol microphysical retrievals,” Atmos. Meas. Tech. 9, 5555–5574 (2016). 32. U. Wandinger, D. Müller, C. Böckmann, D. Althausen, V. Matthias, J. Bösenberg, V. Weiss, M. Fiebig, M. Wendisch, A. Stohl, and A. Ansmann, “Optical and microphysical characterization of biomass- burning and industrial-pollution aerosols from multiwavelength lidar and aircraft measurements,” J. Geophys. Res. 107, D21 (2002). 33. C. D. Rodgers, ed., Inverse Methods for Atmospheric Sounding: Theory and Practice (World Scientific, 2000). 34. V. Bovchaliuk, P. Goloub, T. Podvin, I. Veselovskii, D. Tanré, A. Chaikovsky, O. Dubovik, A. Mortier, A. Lopatin, M. Korenskiy, and S. Victori, “Comparison of aerosol properties retrieved using GARRLiC, LIRIC, and Raman algorithms applied to multi-wavelength lidar and sun/sky-photometer data,” Atmos. Meas. Tech. 9, 3391– 3405 (2016). 35. P. Sawamura, D. Müller, R. M. Hoff, C. A. Hostetler, R. A. Ferrare, J. W. Hair, R. R. Rogers, B. E. Anderson, L. D. Ziemba, A. J. Beyersdorf, K. L. Thornhill, E. L. Winstead, and B. N. Holben, “Aerosol optical and microphysical retrievals from a hybrid multiwavelength lidar dataset – DISCOVER-AQ 2011,” Atmos. Meas. Tech. 7, 3095–3112 (2014). 36. D. Müller, C. Böckmann, A. Kolgotin, L. Schneidenbach, E. Chemyakin, J. J. Rosemann, P. Znak, and A. Romanov, “Microphysical particle properties derived from inversion algorithms developed in the frame- work of EARLINET,” Atmos. Meas. Tech. 9, 5007–5035 (2016). 37. M. Tesche, A. Ansmann, D. Müller, D. Althausen, R. Engelmann, V. Freudenthaler, and S. Groß, “Vertically resolved separation of dust and smoke over Cape Verde by using multiwavelength Raman and polarization lidar during SAMUM 2008,” J. Geophys. Res. 114, D13202 (2009). 38. R. E. Mamouri and A. Ansmann, “Separating mixtures of aerosol types in airborne HSRL data,” Atmos. Meas. Tech. 7, 419–436 (2013). 39. I. Veselovskii, O. Dubovik, A. Kolgotin, M. Korenskiy, D. Whiteman, K. Allakhverdiev, and F. Huseyinoglu, “Linear estimation of particle bulk parameters from multi-wavelength lidar measurements,” Atmos. Meas. Tech. 5, 1135–1145 (2012). 40. D. Müller, I. Veselovskii, A. Kolgotin, M. Tesche, A. Ansmann, and O. Dubovik, “Vertical profiles of pure dust and mixed smoke-dust plumes inferred from inversion of multiwavelength Raman/polarization lidar data and comparison to AERONET retrievals and in situ observa- tions,” Appl. Opt. 52, 3178–3202 (2013). 41. M. de Graaf, A. Apituley, and D. Donovan, “Feasibility study of integral property retrieval for tropospheric aerosol from Raman lidar data using principle component analysis,” Appl. Opt. 52, 2173–2186 (2013). 42. O. Dubovik, A. Sinyuk, T. Lapyonok, B. N. Holben, M. Mishchenko, P. Yang, T. F. Eck, H. Volten, O. Muñoz, B. Veihelmann, W. J. van der Zande, J.-F. Leon, M. Sorokin, and I. Slutsker, “The application of spheroid models to account for aerosol particle nonsphericity in re- mote sensing of desert dust,” J. Geophys. Res. 111, D11208 (2006). 5008 Vol. 58, No. 18 / 20 June 2019 / Applied Optics Research Article