Evaluation of the CHUMS Child Bereavement Group: A Pilot Study Examining Statistical and Clinical Change

This article describes the largest evaluation of a UK child bereavement service to date. Change was assessed using conventional statistical tests as well as clinical significance methodology. Consistent with the fact that the intervention was offered on a universal, preventative basis, bereaved young people experienced a statistically significant, small to medium-sized decrease in symptoms over time. This change was equivalent across child age and gender. Type of bereavement had a slight impact on change when rated by parents. Potential clinical implications are highlighted, and various limitations are discussed that we hope to address using an experimental design in future research.

educational functioning, and regression in developmental milestones (Dowdney, 2000; Lutzke, Ayers, Sandler, & Barr, 1997). Furthermore, adjustment to bereavement can be even more difficult if the death itself is traumatic; in such cases the trauma of the event may impede grieving the loss, meaning that the event must be processed before adjustment to the loss can begin (e.g., Brown & Goodman, 2008; Cohen, Mannarino, Greenberg, Padlo, & Shipley, 2002).
In an effort to prevent the onset of psychological and adjustment problems following bereavement, childhood bereavement services offer support and intervention to bereaved children and their families on a universal basis, or according to the objective circumstances of the death (Currier, Neimeyer, & Berman, 2008). However, attempts to evaluate the effectiveness of these services have yielded equivocal results, and many important questions remain unanswered. In an attempt to establish a consensus to inform clinical practice, several reviews of the child bereavement outcome literature have been conducted.
The three existing narrative reviews concluded that there is evidence of mixed benefits for bereaved families (Schneiderman, Winders, Tallett, & Feldman, 1994), a small amount of quantitative evidence for universal approaches with bereaved children (Curtis & Newman, 2001), and, more recently, evidence that primary, secondary, and tertiary prevention may all be effective for children (Schut & Stroebe, 2005). However, two of these reviews (Curtis & Newman, 2001; Schneiderman et al., 1994) also noted that significant methodological flaws prevented them from drawing firm conclusions (cf. Dowdney, 2000).
With regard to quantitative reviews, Currier, Holland, and Neimeyer (2007) conducted a meta-analysis of 13 controlled child outcome studies and reported a small, nonsignificant overall effect size. The authors concluded that their results do not support the assumption that bereavement interventions with children have a significant influence on adjustment, and they contrasted this finding with the large positive effect sizes typically shown in reviews of general psychotherapy with children. However, generalizing the conclusions of this meta-analysis may be problematic because many of the included studies were conducted on samples that are not representative of clients who seek grief counselling in the real world (Larson & Hoyt, 2009). This is because, in these studies, participants experienced unusually lengthy delays between bereavement and intervention (M = 17.5 months). Therefore, many of the interventions offered in the studies summarized by Currier et al. (2007) may have occurred too late after bereavement, at a point when children were no longer affected. As Larson and Hoyt (2009) noted, if these lengthy delays between bereavement and intervention are atypical, or if recruitment procedures produced research participants who differ from actual clients in other ways (e.g., different levels of motivation for treatment), it is difficult to generalize the effect sizes derived from meta-analyses of such studies to grief therapy as actually practiced.
A subsequent meta-analysis of the child bereavement outcome literature found improvements amounting to a small to medium effect size across 15 controlled studies, and a medium effect size across 12 uncontrolled studies (Rosner, Kruse, & Hagl, 2010). Further analyses revealed that interventions for young people who showed some level of distress, impairment, or a diagnosis tended to show larger effect sizes than interventions for young people who were either nonsymptomatic or heterogeneous regarding symptom status (Rosner et al., 2010).
It is noteworthy that almost all of the controlled and uncontrolled child bereavement outcome studies to date have been conducted in the United States: There are only four published quantitative evaluations of UK child bereavement services (Bisson & Cullum, 1994; Black & Urbanowicz, 1987; Stokes, Wyer, & Crossley, 1997; Trickey & Nugus, 2011). Moreover, it is currently difficult to draw firm conclusions about the effectiveness of the UK child bereavement programs as a result of methodological flaws in some of these studies and equivocation in the wider child bereavement outcome literature regarding whether interventions are effective. As Trickey and Nugus (2011) wrote, "this leaves those offering such services in a dilemma. Should they stop providing services until there is an overwhelming weight of evidence demonstrating their effectiveness? Or should they carry on offering a service based on the relatively weak evidence, even if it may be proven ultimately to be less effective than they hoped?" (p. 30). Importantly, commissioners, referrers, young people, and families expect providers to deliver an effective service (Cape & Barkham, 2002). The purpose of this study was therefore to evaluate the effectiveness of one of the larger UK child bereavement services, using data collected routinely within the service. To quantify the effectiveness of the program, we examined self, parent, and teacher perceptions of (a) the average amount of children's improvement over time and (b) individual-level change, using Reliable Change Index (RCI) and clinical significance methodology (Jacobson & Truax, 1991).

Description of the Intervention
The CHUMS bereavement service offers support and intervention to referred young people aged 3-19 years. Following assessment, young people and parents/carers are offered a range of interventions. The most common is the group program, which has content similar to established interventions such as the Family Bereavement Programme (FBP; e.g., Sandler et al., 2003, 2010). The group program has evolved over time in response to child and parent feedback. For children aged 3-12 years, the group program consists of three "workshops" that are facilitated by a play therapist or counsellor and a number of trained volunteers. Workshops are held on three consecutive weekends and take place four times a year in a local school. The first workshop lasts 5 hr (including a lunch and afternoon break) to allow people to settle into the group; the two other workshops last 3 hr (including a mid-morning break). There is a child to staff ratio of 2:1. The school setting provides the opportunity for children and parents/carers to interact and engage in therapeutic activities separately and together. Young people work in groups of similar ages (this varies depending on referrals) and parents/carers form a parallel group that completes similar activities as well as content focusing on supporting parents in facing the challenges of caring for bereaved children and young people. Similar to the FBP (Sandler et al., 2003), the parent group seeks to improve the quality of family relationships and help parents with their own difficulties. The workshops predominantly involve group work with similar-aged children, but there are also regular therapeutic tasks with parents/carers. Separate activities for children and parents/carers enable age-appropriate discussions and allow activities to be targeted to individual needs. These separate activities initially take place within the same room, which provides comfort and security for children and parents alike. This set-up is particularly important for the younger children, who often approach parents for attachment needs (e.g., information and comforting) during the first workshop morning.
For teenagers aged 13-19 years, the group program consists of four consecutive evening 2.5-hr workshops (with a half-hour break) that are facilitated by a family care practitioner and a number of trained volunteers. The workshops take place in a hospice. There is a teenager to staff ratio of 3:1, and these workshops take place twice a year because they are generally in lower demand. Teenage workshops involve a mixture of whole-group therapeutic tasks, individual therapeutic tasks, and working in pairs. Teenagers work in groups of similar ages and complete similar therapeutic activities to the under-12s. However, to be acceptable and engaging to teenagers, the group adopts a more informal atmosphere than the child group and the nature of some of the therapeutic tasks is tailored to be more age-appropriate and engaging for adolescents (Stallard, 2005). For example, children create a storyboard of the death (describe life before the death, describe the death, describe what happened after the death, describe life now) and briefly talk through their storyboard (or have a volunteer do this for them if needed). Teenagers likewise complete a storyboard, but also have the opportunity to verbally tell the story of the event in small groups for a longer period of time. The decision was taken not to run a parallel group for parents/carers of teenagers following teenager feedback and because we wanted to respect the teenagers' increasing independence from their parents/carers (Garcia Preto, 1999). An ongoing support group is available for parents or carers on a drop-in basis if needed, which some carers of the teenagers attend.
Although there are some differences between the child and teenager groups, both groups have a similar theoretical underpinning and many of the therapeutic activities are similar. Both groups aim to encourage the following factors considered to facilitate adjustment to a significant person's death:

Social Support and Normalization
Bereaved young people tend to withdraw socially (Worden, 1996), and isolation is associated with depression in this group (Balk, 1990). Therefore, the social aspects of the program are considered therapeutic above and beyond their importance in facilitating the therapeutic activities. Many of the young people of all ages who attend the groups explicitly comment on their surprise that there are so many other people who are bereaved and who are "like them." This destigmatization, through the realization that they are not the only ones experiencing these things, seems to be helpful to young people of all ages (Metel & Barnes, 2011) but may be particularly important to the teenagers, who are more concerned with their standing among their peers (Leader, 1991; Stallard, 2005). This aspect of the groups is very similar to that of the FBP, which normalizes the experiencing of grief-related feelings and encourages their adaptive expression (Sandler et al., 2003).

Memory Activities
These are designed to facilitate reminiscing and communication about the person who died. Items are made that can be used after the workshops as an aide-memoire, both to enable the person to retain memories of the deceased and to facilitate ongoing discussions about that person within the family. This aspect of the therapeutic content is based on the notion of continuing bonds, which suggests that rather than ending their relationship with the deceased, people who are bereaved often find it helpful to continue that relationship and may benefit from support in this process. The therapeutic activities are focused toward enabling young people to change the nature of their relationship so that it continues to offer comfort and solace (Klass, Silverman, & Nickman, 1996; Stroebe, Schut, & Boerner, 2010). Evidence indicates that talking openly about the deceased, owning mementos of them, and forming an internal construction of the deceased are often associated with better adjustment to the loss (e.g., Black & Urbanowicz, 1987; Nickman, Silverman, & Normand, 1998).
Because most of the activities take place in groups of children of similar ages, and there is a high ratio of staff to young people, it is very easy to adapt each task to the particular needs of the group. There is more emphasis on play with the younger children, and more opportunities to talk for the teenagers (Fuggle, Dunsmuir, & Curry, 2012; Stallard, 2005). For example, children of all ages (and their parents) make a "salt-sculpture" by filling a jar with layers of salt of different colors; each color represents something about the person that died such as an attribute or a memory (e.g., "blue like his eyes, green for when we played football"). The younger children will be given a lot more assistance with this creative task and might only talk very briefly about what each layer represents. However, the older children may complete the task more independently, which often leads to them having conversations in twos and threes about what each layer represents. Once the task is completed, the younger children may be keen to return to their carers to show them their sculpture, whereas the teenagers are encouraged and supported to share details of their sculpture and what it means to their group. This is intended to be a fluid process that meets the needs of each specific group, but also roughly matches the developmental level of the young people.

Information and Meaning-Making
To "make meaning" of a death, young people may need information that has often not been forthcoming. This is partly achieved through sessions involving "ask the doctor" and "ask the undertaker." Young people are able to write any questions that they have for the doctor or the undertaker anonymously. These questions, and any questions asked verbally, are then answered in a friendly but "authoritative" way. These activities encourage young people to ask questions about the death and reassure parents that providing information outside of the workshops will be useful. Young people receive truthful, age-appropriate, and sometimes new information that enables them to begin the process of meaning-making both within and outside the groups. Other activities facilitate the creation of a coherent narrative of the event of the person's death; this is intended to minimize the chances of the account of the death being too frightening to think through, which could impede the process of grieving and meaning-making (Cohen, Mannarino, & Staron, 2006); for example, if the young person were too scared by the death to be sad about their loss.

Fostering Coping and Resilience
Many bereaved children and young people experience extremely strong feelings such as intense anger and sadness and they can sometimes struggle to know how to cope usefully (Dowdney, 2000). Similar to the FBP (Sandler et al., 2003), which is known to be effective, group activities are designed to normalize the experiencing of grief-related feelings and encourage their adaptive expression. Further work is done with a view to fostering coping and resilience based on cognitive behavioral principles. This includes discussions and activities aimed at fostering greater understanding of feelings, their links to thoughts and beliefs, and developing strategies for coping with them in a useful way. Although there is some uncertainty about the actual mechanisms of change when using cognitive behavioral therapy (CBT) with younger children (Grave & Blissett, 2004), there is sufficient evidence that CBT can be effective in helping children of a wide range of ages (5-18) to deal with difficult and strong emotions (e.g., Cartwright-Hatton, Roberts, Chitsabesan, Fothergill, & Harrington, 2004; National Institute for Health and Clinical Excellence, 2005; Shortt, Barrett, & Fox, 2001; Stallard, Simpson, Anderson, Hibbert, & Osborn, 2007) to justify its inclusion as part of the program. Each of the activities and discussions is titrated to take into account the particular children in each group, including their age and cognitive ability. With appropriate "scaffolding," even young children are able to make use of CBT activities (Wood, Bruner, & Ross, 1976).

Study Design
Service policy and clinical and ethical concerns meant that all young people needing services were offered the option of group participation. Because a passive control group was not possible, a naturalistic one-group pre-post design was used.

Participants
The CHUMS internal ethical review board approved the use of routinely collected existing data. Study participants were children attending 10 different groups between May 2009 and March 2011. Data were collected at initial assessment, generally within a month of referral, and during follow-up visits, generally within 5 weeks of attending the group program. Pre- and postgroup data from a parent, child, or teacher were available for 168 children. The mean age of participants was 9.86 years (SD = 3.30; range = 3-16 years); 44.0% of participants were male; and 45.2% of participants had experienced a parent's death (father = 32.7%, mother = 12.5%, brother = 8.2%, sister = 7.4%, grandfather = 13.1%, grandmother = 6.0%, multiple = 13.1%, other = 7.2%). Thirty-six children and adolescents (24 females; range = 9-15 years) completed the self-report Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997).

INSTRUMENTS
Outcome Measure
The SDQ is widely used as a screening tool for psychiatric problems in clinical practice and is increasingly being used as a measure of child psychological problems in etiological, longitudinal, and service evaluation studies (Vostanis, 2006). The SDQ is routinely used as an outcome measure in three of the main UK child bereavement services (CHUMS, Winston's Wish, and Child Bereavement UK) and is used universally by the Outcome Research Consortium of the UK Child and Adolescent Mental Health Services (www.corc.uk.net) and Children and Young People Improving Access to Psychological Therapies outcome monitoring and evaluation initiatives. The SDQ assesses young people's behaviors, emotions, and relationships and consists of 25 items covering five subscales: hyperactivity/inattention, emotional symptoms, conduct problems, peer relation problems, and prosocial behavior. The first four subscales can be summed to generate a total difficulties score (range = 0-40). For children aged 3-16, there are versions that can be completed by parents/carers or teachers. For young people aged 11-16, there is also a self-report version. There is strong evidence for the validity of the SDQ, including its five-factor structure (Goodman, 2001). The reliability of the SDQ is also satisfactory, whether judged by internal consistency (mean Cronbach's α = .73) or retest stability after 4 to 6 months (M = .62) (Goodman, 2001). In the present sample, interrater agreement as determined by intraclass correlation coefficients between SDQ raters (e.g., parent and teacher) ranged from .46 to .66 at assessment and .34 to .65 at follow-up.
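The scoring rule described above can be illustrated with a minimal Python sketch. It assumes each subscale is scored 0-10 (as implied by four subscales summing to a 0-40 total); the function name, dictionary keys, and example scores are hypothetical and are not taken from the study.

```python
# Hypothetical helper: sums the four difficulty subscales of the SDQ.
# Assumption: each subscale score lies between 0 and 10.
DIFFICULTY_SUBSCALES = ("hyperactivity", "emotional", "conduct", "peer")


def total_difficulties(subscale_scores):
    """Return the SDQ total difficulties score (0-40).

    The prosocial subscale is deliberately left out of the sum.
    """
    total = 0
    for name in DIFFICULTY_SUBSCALES:
        score = subscale_scores[name]
        if not 0 <= score <= 10:
            raise ValueError(f"{name} score {score} is outside 0-10")
        total += score
    return total


# Made-up scores for one child; prosocial is present but ignored.
example = {"hyperactivity": 6, "emotional": 4, "conduct": 3,
           "peer": 2, "prosocial": 8}
print(total_difficulties(example))  # 15
```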

Background Variables
Background variables used were the child's age, gender, and the type of bereavement the child experienced. Sample size meant that three categories were formed for type of bereavement: immediate family (death of father, mother, or sibling), grandparent, and multiple losses. The latter category did not overlap with immediate family and grandparent categories.

Group-Level Change
Mean change on the SDQ from baseline to follow-up was investigated using separate factorial mixed-model analyses of variance (ANOVAs). Main and interaction effects were examined in relation to time, gender, and SDQ informant (parent, teacher, child). Type of bereavement and age were examined as moderator variables. The alpha level for statistical significance was set a priori at .05 and Cohen's d effect sizes were calculated. The data were checked for normality and variance homogeneity.
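As an illustration of how a pre-post effect size of this kind can be computed, the following Python sketch standardizes the mean change by the pooled SD of the two time points. This is only one common convention; the text does not state which variant of Cohen's d was used, so the formula choice, the function name, and the example data are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d_pre_post(baseline, follow_up):
    """Mean reduction in score divided by the pooled SD of both time points.

    Assumption: pooled-SD standardization; other variants (e.g., dividing
    by the baseline SD alone) would give somewhat different values.
    """
    pooled_sd = sqrt((stdev(baseline) ** 2 + stdev(follow_up) ** 2) / 2)
    return (mean(baseline) - mean(follow_up)) / pooled_sd


# Made-up total difficulties scores for five children.
baseline = [12, 15, 9, 20, 14]
follow_up = [10, 13, 9, 16, 12]
d = cohens_d_pre_post(baseline, follow_up)
print(round(d, 2))  # 0.58
```

A positive value here means that scores fell from baseline to follow-up, that is, fewer reported difficulties.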

Individual-Level Change
Several methodologies have been developed to examine individual change during psychotherapy (clinical significance). The consensus across three reviews (Atkins, Bedics, McGlinchey, & Beauchaine, 2005; Maassen, 2000; Wise, 2004) is that the Jacobson and Truax method is optimal (Jacobson, Follette, & Revenstorf, 1984; Jacobson & Truax, 1991). First, a reliable change index (RCI) was calculated for each child using the formula $\mathrm{RCI} = (X_2 - X_1) / S_{\mathrm{diff}}$, with $S_{\mathrm{diff}} = \sqrt{2 S_E^2}$ and $S_E = SD_1 \sqrt{1 - r}$, where $X_2$ denotes the individual's follow-up SDQ score, $X_1$ denotes the individual's baseline SDQ score, $SD_1$ is the SD of group scores at baseline, and $r$ is the standardization sample's internal consistency (see Goodman, 2001). RCIs larger than 1.96 in absolute value are unlikely to occur by chance and indicate reliable change with 95% confidence. Second, Jacobson and Truax's Method C was used to determine clinical significance because boxplots revealed that baseline and follow-up distributions overlapped substantially (Jacobson & Truax, 1991). Therefore, normative British data and cutoff scores on the SDQ (Meltzer, Gatward, Goodman, & Ford, 2000) were used to interpret clinically significant change. These two steps were used to group individuals into four categories: recovered (individual passed normative range cutoff and RCI in the positive direction), improved (individual passed RCI criterion in the positive direction), unclassified (individual passed neither criterion), and deteriorated (individual passed normative range cutoff and RCI in the negative direction).
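The two-step procedure can be sketched in Python as follows. The sketch assumes the standard Jacobson and Truax (1991) form of the index, RCI = (X2 - X1) / S_diff with S_diff = sqrt(2 * S_E^2) and S_E = SD1 * sqrt(1 - r). The baseline SD, the reliability value, and the cutoff used in the example are hypothetical placeholders rather than the study's values, and the handling of the cutoff (based on where the follow-up score falls) is one possible reading of the four categories.

```python
from math import sqrt


def reliable_change_index(baseline, follow_up, sd_baseline, reliability):
    """RCI = (X2 - X1) / S_diff, following Jacobson and Truax (1991)."""
    standard_error = sd_baseline * sqrt(1 - reliability)
    s_diff = sqrt(2 * standard_error ** 2)
    return (follow_up - baseline) / s_diff


def classify_change(baseline, follow_up, sd_baseline, reliability,
                    cutoff, criterion=1.96):
    """Assign one of the four categories described in the text.

    Lower SDQ scores mean fewer difficulties, so a negative RCI is
    change in the positive (improving) direction.
    Assumption: reliable worsening that ends below the cutoff is
    returned as "unclassified" in this sketch.
    """
    rci = reliable_change_index(baseline, follow_up, sd_baseline, reliability)
    in_clinical_range = follow_up >= cutoff
    if rci < -criterion:
        return "improved" if in_clinical_range else "recovered"
    if rci > criterion and in_clinical_range:
        return "deteriorated"
    return "unclassified"


# Hypothetical parameters, for illustration only.
SD_BASELINE = 6.0   # placeholder baseline SD of the group
RELIABILITY = 0.73  # placeholder internal consistency
CUTOFF = 17         # placeholder normative cutoff

print(classify_change(20, 10, SD_BASELINE, RELIABILITY, CUTOFF))  # recovered
print(classify_change(30, 20, SD_BASELINE, RELIABILITY, CUTOFF))  # improved
print(classify_change(12, 10, SD_BASELINE, RELIABILITY, CUTOFF))  # unclassified
print(classify_change(10, 20, SD_BASELINE, RELIABILITY, CUTOFF))  # deteriorated
```

With these placeholder values, S_diff is about 4.41, so a change of roughly 9 or more points on the total difficulties score is needed to count as reliable.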

RESULTS
Differences Between the Three SDQ Versions at Baseline and Follow-Up
Table 1 displays the means for the three SDQ versions at baseline and follow-up for males and females. Statistically significant differences were observed between the three SDQ informants at baseline, F(2, 259) = 12.13, p < .001, and at follow-up, F(2, 259) = 6.34, p < .001.
Follow-up analyses revealed that the mean teacher SDQ score was significantly lower than the mean child SDQ score at baseline (p < .05, Cohen's d = .77) and follow-up (p < .05, Cohen's d = .68). Mean teacher SDQ score was also significantly lower than mean parent SDQ score at baseline (p < .05, Cohen's d = .57), and borderline significantly lower at follow-up (p = .056, Cohen's d = .31). There were no significant differences between parent and child ratings at either time point.

Improvement on the SDQ and Gender Effects
Separate mixed-model ANOVAs were conducted to explore change in SDQ means from baseline to follow-up for each of the three SDQ versions (Table 2). The three Time × Gender interaction effects were nonsignificant (p > .10), indicating that, on average, males and females changed equally over the course of the intervention. Statistically significant main effects for the intervention were revealed on all three SDQ versions; the average reduction of the SDQ total difficulties score corresponded to a medium effect size for the child and parent SDQs but was more modest for the teacher SDQ. In addition, there were statistically significant main effects for gender on the parent and teacher SDQs: On average, parents and teachers reported statistically significantly higher total difficulties scores in males, amounting to medium effect sizes (Tables 1 and 2). Although no statistically significant gender differences emerged on the child SDQ, the effect size was of a comparable magnitude to the parent and teacher gender main effects and the means (Tables 1 and 2) indicated that, on average, young people self-reported females as having higher total difficulties scores.

Average Improvement Depending on SDQ Informant
A statistically significant difference emerged in the average amount of improvement between the three SDQ versions, F(2, 259) = 5.58, p = .004. Follow-up analyses revealed that the average reduction on the SDQ was somewhat larger on the parent SDQ (n = 119, M = −3.7) in comparison to the teacher SDQ (n = 107, M = −1.8), corresponding to a modest effect size (Cohen's d = .31). The average amount of improvement on the child SDQ (n = 36, M = −2.6) did not differ significantly from the other two versions.

Moderator Analysis of Improvement on the SDQ
There was a noticeable and statistically significant moderating effect for type of bereavement on the amount of improvement on the parent SDQ only, F(2, 110) = 4.19, p = .02. Follow-up analyses showed that the amount of improvement in children who lost someone from within their immediate family was significantly smaller (p < .05, M = 2.6) than that reported for children who had lost a grandparent (M = 5.6, Cohen's d = .52) or experienced multiple losses (M = 6.1, Cohen's d = .60). On the teacher SDQ, type of loss did not significantly moderate the average amount of change, F(2, 97) = .25, p = .78. These analyses were not possible for child SDQs because of very small group sizes. The potential moderating influence of age on the individual amount of improvement was examined. There was no evidence that the age of the child was significantly correlated with individual change scores (child SDQ: r_s = −.10, p = .67; parent SDQ: r_s = .05, p = .68; teacher SDQ: r_s = .11, p = .41).
Table 3 indicates that, depending on the person evaluating the change, 13%-18% of young people were deemed recovered. The change for these young people was large enough to be regarded as reliable with 95% confidence and they moved out of the likely psychiatric "caseness" range. Two percent of young people were deemed improved, indicating that change for these individuals was large enough to be regarded as reliable with 95% confidence, but these individuals did not move out of the likely psychiatric "caseness" range. The majority of young people's change was not large enough to be considered reliable with 95% confidence (75%-87%) and a small proportion of young people reliably deteriorated (0%-8%).

Clinical Significance
Scatterplots were created to illustrate reliable and clinically significant change for the three SDQ versions (Figure 1). Baseline SDQ scores are plotted on the x-axis and follow-up SDQ scores are plotted on the y-axis. For clarity, similar individual data values were combined into groups of data. Horizontal and vertical dashed lines define the cutoff for clinical significance, with values above the dashed lines suggesting likely psychiatric "caseness." Diagonal lines represent the upper and lower boundaries of unreliable change corresponding to an RCI band of 2 SDs (95% CI). Only the data points outside of these bands are said to have made a large enough amount of change to be regarded as reliable. Data points below the horizontal cutoff line and the lower diagonal line represent reliable and clinically significant improvement with 95% confidence; data points above the horizontal cutoff line and the upper diagonal line represent reliable deterioration with 95% confidence. Overall, relatively few cases demonstrated a reliable and clinically significant improvement. The majority of the children attending the program were already below the clinical cutoff at baseline and hence their potential amount of improvement on the SDQ was limited and not large enough to be considered reliable. Consequently, most of the individual change scores fall within and not outside the RCI bands.
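The diagonal boundaries in such a scatterplot can be computed as in the following Python sketch. It assumes the bands are placed at the baseline score plus or minus 1.96 × S_diff, with S_diff = SD1 × sqrt(2 × (1 − r)) as in the Jacobson and Truax (1991) method; the function name and the SD and reliability values are hypothetical placeholders, not the study's figures.

```python
from math import sqrt


def unreliable_change_band(baseline, sd_baseline, reliability, criterion=1.96):
    """Return (lower, upper) follow-up scores bounding unreliable change.

    Follow-up scores strictly outside this interval count as reliable
    change for a child with the given baseline score.
    """
    s_diff = sd_baseline * sqrt(2 * (1 - reliability))
    half_width = criterion * s_diff
    return baseline - half_width, baseline + half_width


# Hypothetical parameters: baseline score 20, SD 6.0, reliability .73.
lower, upper = unreliable_change_band(20, 6.0, 0.73)
print(round(lower, 2), round(upper, 2))  # 11.36 28.64
```

Plotting these two values for every possible baseline score traces the lower and upper diagonal lines; a child whose follow-up score falls between them is counted as showing no reliable change.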

DISCUSSION
To the best of our knowledge, this is the largest evaluation of a UK child bereavement service to date, and the first in the literature to examine individual-level change. The results indicate that bereaved young people who participated in the CHUMS group program experienced a statistically significant, medium-sized decrease in symptoms over time when rated by parents and children, and a statistically significant, small-sized decrease in symptoms over time when rated by teachers. The amount of improvement was equivalent irrespective of the child's age or gender. The magnitude of these effects is similar to that reported in a meta-analysis of uncontrolled child bereavement interventions (Rosner et al., 2010). However, as with the meta-analytic results, these group-level effects can only be tentatively interpreted as evidence of effectiveness because of the absence of a control group, which would account for natural recovery. The results should also be interpreted with caution, given the concern in the literature regarding whether symptom-based measures make sensitive outcome measures when examining grief outcomes (e.g., Currier et al., 2008; Neimeyer & Hogan, 2001; Neimeyer, Hogan, & Laurie, 2008). With this in mind, the reliable change and clinical significance methodology provided complementary information to the group-level analyses. These analyses demonstrated that 13%-18% of young people were deemed recovered at the end of the program when a (very rigorous) cutoff of 2 SDs was used. The majority of young people's individual-level change was not large enough to be considered reliable with 95% confidence. However, these results are unsurprising when it is considered that the intervention was offered on a universal basis to potentially prevent the onset or deterioration of psychological and adjustment problems in bereaved children and young people using the service. Indeed, these results correspond well with evaluations of preventive interventions for other types of psychological problems such as depression (Rosner et al., 2010). These results are consistent with the fact that most bereaved children and adolescents adjust to bereavement without difficulty (Dowdney, 2000; Worden, 1996) and would not therefore evidence changes large enough to be considered statistically reliable. However, a plausible alternative explanation for these results is that the single, generic, symptom-based measure that was used to assess change over time may have missed important facets of young people's bereavement adaptation.
It is noteworthy that the type of bereavement had a moderate influence on change over time when rated using the parent SDQ. This analysis showed that individuals whose bereavement involved an immediate family member benefitted significantly less than those who experienced multiple deaths or the death of a grandparent. This finding is consistent with the fact that, in general, the death of a parent or sibling would be expected to involve more practical (e.g., need to move house or schools, or to change social roles) and psychological (e.g., loss of an important attachment relationship) changes than the death of a grandparent. The finding regarding the multiple deaths category cannot be readily interpreted because no information was collected regarding the nature of these deaths or the young person's relationship to the deceased persons. The fact that this moderating effect was nonsignificant when rated using the teacher SDQ raises the possibility that the SDQ rater's own mental health may have impacted their ratings of young people.

IMPLICATIONS FOR CLINICAL PRACTICE
The results provide some potential implications for clinical practice with bereaved children and adolescents. However, in light of the significant limitations of this pilot study (which are discussed further below), we note that the following clinical implications should be interpreted tentatively. The individual-level change results draw attention to the potential usefulness to clinicians and families alike of data regarding individual deterioration over time, particularly in light of the observed group differences in SDQ reporting. For example, child bereavement services could use SDQ (or other) data to monitor and identify deterioration in specific young people, corroborate written and verbal reports, identify discrepancies between different sources of information, and use all the information gathered to inform clinical conversations with young people and families about potential additional interventions (e.g., more of the same or an alternative intervention such as 1:1 therapy). Although we recognize their potential disadvantages, the use of outcome measures every session could potentially identify individuals that were deteriorating sooner, rather than waiting until follow-up.
Two other potential clinical implications arise from the group differences in SDQ reporting. In keeping with a review of the child bereavement literature (Dowdney, 2000) and empirical work examining the posttraumatic stress disorder diagnosis in children (Meiser-Stedman, Smith, Glucksman, Yule, & Dalgleish, 2008), teacher ratings of hyperactivity/inattention, emotional symptoms, conduct problems and peer relation problems (which the SDQ total difficulties score measures) were significantly lower than parent and self-reports, and the average reduction on the SDQ was somewhat larger on the parent SDQ in comparison to the teacher SDQ. A number of factors may potentially explain these group differences: (a) young people's difficulties may be most apparent outside of school; (b) parents and young people are more attuned than teachers to the needs and well-being of young people; (c) parents may be more directly affected by the death and this may have impacted their perception and reporting of child problems (see Meiser-Stedman et al., 2008); (d) young people experience fewer problems in a school environment, for example because of distraction from their difficulties or because of social support from peers and teachers; or (e) expectancy or socially desirable responding effects. Awareness of this finding may be of use to clinicians during assessment and ongoing monitoring, particularly in cases where one or more parents have died and the teacher and child perceptions are all that is available. However, because data were not collected regarding parent mental health problems, the extent to which parental reports regarding children's mental health problems may have been influenced by the parent's own problems is unknown.
It is also noteworthy that young people reported males as having lower total problem scores than females, and that this pattern of results was opposite to parent and teacher reports of problems. Because the literature suggests that following bereavement, males tend to show higher rates of overall psychological difficulties and more externalizing problems (e.g., aggression) than females (e.g., Dowdney, 2000; Haine, Ayers, Sandler, & Wolchik, 2008), this finding suggests a discrepancy between self-perceptions or self-reports, and observable distress and behavioural problems. This result testifies to the importance of gathering information from multiple informants and may need to be borne in mind when interpreting young people's self-reports.

LIMITATIONS AND FUTURE RESEARCH
This preliminary study suffers from various significant limitations, some of which are perhaps inevitable in a service rather than research setting. The main limitation to this study was the absence of a control group. Although relatively common in the existing child bereavement outcome literature (see Rosner et al., 2010), a one group pre-post design is limited in its internal validity as change on outcome measures cannot be attributed with certainty to the intervention. There is therefore a pressing need for future research to use an experimental design to rigorously evaluate the effectiveness of child bereavement services in the United Kingdom.
Another major limitation of this study was the size of the self-report sample. This is so small (particularly for males) that there is a good chance these results may change with a larger sample. We therefore advise that great caution is taken when interpreting the self-report data. Furthermore, it is not possible to know how representative the current results are, as no information was kept regarding group attendance for young people or their parents. This means that average effectiveness may have been underestimated by including participants who did not receive enough sessions to expect an impact. On the other hand, the follow-up data may have lacked adequate representation from those who missed sessions or who dropped out of the intervention. It therefore remains unknown whether nonattendance or drop-out took place and, if this did occur, whether this resulted in a selection bias that distorted the average effectiveness estimates. Finally, very few variables were measured by the service that could be used to potentially explain individual differences in bereavement adjustment and well-being (e.g., multiple vs. single death; cause and nature of death; relationship with the deceased; time between bereavement and assessment or intervention). These issues clearly point to the need for more systematic and detailed data collection procedures in the service.
Next, the SDQ is a general, symptom-based measure of psychological problems and, as such, will not have measured all the nuanced and multidimensional manifestations of bereavement adaptation which may warrant attention in therapy (e.g., the degree to which the bereaved person's identity was constructed around or entwined with the deceased; the specific attachment relationship the young person had with the deceased; the presence of unhelpful appraisals about self, others and the deceased; the use of unhelpful coping styles such as rumination and (inflexible) avoidance; current social support and the ability to talk about the death; social role change as a result of bereavement; impaired functioning; Boelen, van den Bout, & van den Hout, 2003; Maccallum & Bryant, 2013; Neimeyer & Hogan, 2001; Stroebe et al., 2007, 2010). Although the CHUMS group program has a clear theoretical underpinning, the use of a single symptom-based measure means that it is not possible to comment on the extent to which the intervention achieved some of its theoretical aims. We are also unable to comment on the degree to which potential mechanisms of change (e.g., constructing a subjective sense of understanding in the death; posttraumatic growth: Currier et al., 2008) may have explained the outcomes observed. Although it is often impractical to administer and analyze a range of outcome measures in routine clinical practice, these issues highlight the importance of not solely using symptom-based measures, and of administering a number of outcome measures in child bereavement services. Such information could very usefully contribute to individualised clinical formulations and be used to measure grief adjustment more accurately.
Moreover, the service routinely collects large amounts of qualitative data regarding satisfaction, functioning and change. Systematically recording and analyzing this information to supplement the quantitative data would provide useful additional outcome data. Likewise, lack of outcome data concerning the parents/carers participating in the program means that the potential effectiveness of this element of the intervention, and its relative effect in supporting or facilitating adjustment in young people, is currently unknown. Further research is needed to determine the optimum length of the group program as well as predictors and moderators of change. Moreover, the widespread assumption in the literature that group programs are superior to individual therapy requires empirical testing.

CONCLUSION
This preliminary study was useful in examining the potential effectiveness of one of the larger UK child bereavement services. Although causation could not be established, the intervention effects were comparable in size to those observed in other uncontrolled intervention studies and provide a basis for the continued evaluation of UK child bereavement programs in future research. This study highlighted important limitations with the service's current data collection procedures that we hope to address in a subsequent, more rigorous, service evaluation design, such as an RCT (the first in the United Kingdom). This service evaluation provided some preliminary indications regarding the potential usefulness of comparing different perceptions of change and exploring statistical and clinical significance to identify change at the group and individual level.

ACKNOWLEDGMENTS
We would like to thank all of the children and families who routinely provide data to the service. This research was conducted whilst the first author was at the