A Pediatric Appropriateness Evaluation Protocol for Iran Children Hospitals


Anvar Esmaili 1 , Hesam Seyedin 2 , * , Obeidollah Faraji 1 , Jalal Arabloo 1 , Yaghoub Qahraman Bamdady 3 , Shahin Shojaee 3 , Saadat Hesam 3

1 Department of Health Management and Economics, School of Public Health, Tehran University of Medical Sciences, Tehran, IR Iran

2 Health Management and Economics Research Centre, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, IR Iran

3 Mahabad Imam Khomeini Hospital, Urmia University of Medical Sciences, Urmia, IR Iran

How to Cite: Esmaili A, Seyedin H, Faraji O, Arabloo J, Qahraman Bamdady Y, et al. A Pediatric Appropriateness Evaluation Protocol for Iran Children Hospitals, Iran Red Crescent Med J. 2014 ; 16(7):e16602. doi: 10.5812/ircmj.16602.


Iranian Red Crescent Medical Journal: 16 (7); e16602
Published Online: July 5, 2014
Article Type: Research Article
Received: December 4, 2013
Revised: February 1, 2014
Accepted: February 26, 2014




Background: Utilization review programs are an appropriate way to decrease expenditure and increase the efficiency of healthcare systems.

Objectives: This paper presents an instrument to measure the level of appropriate admissions and days of stay (DOS) in the pediatric public hospitals of Iran.

Materials and Methods: The American version of the Pediatric Appropriateness Evaluation Protocol (PAEP) was modified and adjusted by a group of physicians. A retrospective study was then carried out on 100 randomly selected patients. The reliability of the instrument was tested based on the consensus of reviewers using the PAEP. In addition, the external validity of the instrument was studied by comparing the evaluations of the reviewers using the PAEP with the individual judgments of three clinicians in two public teaching hospitals. Finally, reliability and validity were also assessed using the kappa statistic.

Results: In inter-rater reliability testing, there was a high level of agreement between reviewers applying the instrument to the admission and days-of-stay criteria. Overall agreement was > 77%; specific inappropriate agreement and specific appropriate agreement were > 61% and > 72%, respectively. In validity testing, the instrument had a sensitivity of > 0.75, a specificity of > 0.67, and positive and negative predictive values of > 0.93 and > 0.55, respectively. The kappa statistics for the reviewers (using the instrument for admission and days of stay criteria) were substantial (k = 0.755 and 0.71). They were also substantial for clinicians (k = 0.73 and 0.66).

Conclusions: These results showed that the modified PAEP is a reliable and valid instrument to study the appropriateness of admission and days of stay in Iranian hospitals. Because developing countries, particularly those in the Middle East, have similar conditions and culture, the results of this study (with minor changes) could be applied in these countries too.


Keywords: Reliability; Validity; Iran; Hospital

Copyright © 2014, Iranian Red Crescent Medical Journal. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/) which permits copy and redistribute the material just in noncommercial usages, provided the original work is properly cited.

1. Background

Efficient and cost-effective use of resources is very important for countries such as Iran, where the resources allocated to the health care system are limited. The total health expenditure (as % of GDP) in Iran was reported at 5.60 in 2010 (1). Hospital expenditure in Iran, however, rose more than threefold from 2002 to 2007 (2). Although the costs of health care in Iran are much lower than in developed countries, concerns regarding rising expenditures and the limited efficiency of hospitals are increasing (3). Given the inappropriate admissions in some hospitals, utilization review instruments are clearly useful for decreasing costs (4). In addition, given the method of payment (fee for service) and its effect on increasing bed occupancy in Iranian hospitals (5), such an instrument can also help limit demand and control the costs of services.

The implementation of these programs presents a practical solution to the problems of increase in cost and lack of efficiency. Therefore, such an implementation must be based on a practical method, which is both reliable and valid. One of the most extensively used utilization tools for assessing pediatric admissions and days of stay (DOS) is the Pediatric Appropriateness Evaluation Protocol (PAEP). Kreger and Restuccia (6) modified this tool from its adult version (7). Werneke et al. (8) in their study found that the North American PAEP had limited validity for evaluating British pediatric admissions as well as DOS and concluded that utilization review instruments developed in one health system may not be transferable to another.

2. Objectives

This paper illustrates the modification and adjustment of the PAEP and its reliability and validity to measure the level of appropriate admissions and DOS in public pediatric hospitals in Iran.

3. Materials and Methods

3.1. Cross-Cultural Translation

The tool was translated from English into Persian (9) using the process of cross-cultural translation through the following steps: 1) translation from English to Persian; 2) organizing a working group, including two experienced pediatricians, one methodologist, one English language professional, and one translator to construct the first Persian draft; 3) pilot-testing of the draft on medical records of patients; 4) second meeting of the working group to construct a new consensus version; 5) translating from Persian to English and re-evaluating the instrument by the working group.

3.2. Reliability and Validity

The translated PAEP was modified and adjusted in a two-stage process (6). Five physicians (three pediatricians and two general practitioners) made some modifications to the American version of the PAEP to be used in the Iranian context using a nominal group technique. The modified PAEP (Appendix 1) was then used by reviewers (the physicians and researchers [two nurses and the first author]) in a retrospective study in order to examine the inter-rater reliability and external validity of the modified tool.

This study was performed on 100 case records randomly selected from two public teaching hospitals at Tehran University of Medical Sciences in Iran from 21 November 2012 to March 2013. One of the authors summarized the medical records of the patients in a standardized abstract format (based on expert panel opinion). To safeguard patients’ confidentiality, the standardized abstract format was copied, and patients’ identifications (ID) were deleted. The physicians were identified with an anonymous ID code.

The sample size was calculated assuming a disagreement degree of 30%, with a 2-tailed confidence interval of ± 10% and 95% confidence. A minimum sample of 84 hospital admissions was calculated, with 25% more added to compensate for exclusion-associated losses (in total: 105). Patients admitted for elective surgery, burns, intensive care, or psychiatric problems, and patients older than 18 years, were excluded.
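The arithmetic behind these figures can be sketched with the usual normal-approximation formula for estimating a proportion, n = z² p(1 − p) / d². This is only an illustrative check: the source does not state which formula was used, and the function names below are hypothetical. Under this formula the reported minimum of 84 is reproduced when z is rounded to 2 (an assumption); with z = 1.96 the same formula gives 81.

```python
import math


def sample_size_for_proportion(p, margin, z):
    """Minimum n to estimate a proportion p within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    The rounding guards against floating-point noise before taking the ceiling.
    """
    return math.ceil(round(z ** 2 * p * (1 - p) / margin ** 2, 6))


def inflate_for_losses(n, extra_fraction):
    """Increase n by a fraction to compensate for expected exclusions."""
    return math.ceil(round(n * (1 + extra_fraction), 6))


# Assumption: z rounded to 2 for 95% confidence (not stated in the source).
n_min = sample_size_for_proportion(p=0.30, margin=0.10, z=2.0)
n_total = inflate_for_losses(n_min, extra_fraction=0.25)
print(n_min, n_total)
```

With 105 records drawn and 100 retained, the share excluded is 5/105, which matches the 4.76% reported in the Results.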

Before performing the reviews, the reviewers were trained by using the PAEP reviewers’ manual. Then, the reviewers independently and concurrently evaluated medical records. Along with assessing the admission details, the group also assessed 153 DOS in which the patients stayed in the hospitals longer than 48 hours.

Inter-rater reliability was tested by calculating the level of overall agreement and specific agreement between reviewers’ assessments based on the PAEP (three pediatricians, two general practitioners and three researchers). Overall agreement is the proportion of judgments in which two reviewers agree. Specific inappropriate agreement is the proportion of judgments (among those judged to be inappropriate by at least one of the two reviewers) that are rated as being inappropriate by both reviewers. Specific appropriate agreement is also calculated in a similar way (6). In addition, overall agreement between reviewers was evaluated by the kappa (k) statistic (10).
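The agreement measures defined above can be written out for a pair of reviewers and a binary rating (appropriate or inappropriate). The following is a minimal sketch, not the authors' analysis code (the study used SPSS); the function name and the counts are hypothetical and serve only to show how the four quantities relate.

```python
def agreement_measures(a, b, c, d):
    """Agreement between two reviewers from a 2x2 table of counts.

    a: both reviewers rate the case appropriate
    b: only reviewer 1 rates it appropriate
    c: only reviewer 2 rates it appropriate
    d: both reviewers rate the case inappropriate
    """
    n = a + b + c + d
    overall = (a + d) / n                      # proportion of identical judgments
    saa = a / (a + b + c)                      # specific appropriate agreement
    sia = d / (b + c + d)                      # specific inappropriate agreement
    # Agreement expected by chance, from each reviewer's marginal totals.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (overall - p_expected) / (1 - p_expected)
    return {"overall": overall, "saa": saa, "sia": sia, "kappa": kappa}


# Hypothetical counts for illustration only; these are not data from the study.
result = agreement_measures(a=70, b=5, c=5, d=20)
print({name: round(value, 3) for name, value in result.items()})
```

Note that the specific agreements ignore the cell in which both reviewers give the opposite rating, which is why they are lower than the overall agreement for the same table.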

In order to test the validity of the PAEP, a separate group of clinicians (three experienced pediatricians) assessed 100 admissions and 153 DOS, using individual judgment concerning the appropriateness of the admissions and DOS. The assessments of the three groups of reviewers based on the PAEP were compared with those of the clinicians (11). All the raters (reviewers and clinicians) were asked to judge whether each admission and DOS was appropriate or not. Sensitivity, specificity, and positive and negative predictive values of the developed tool were calculated. The experienced clinicians’ judgment was employed as the gold standard in these analyses (12). Finally, the kappa coefficient was also calculated to evaluate the agreement between the reviews using the PAEP and the experienced clinicians’ judgment.
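As a sketch of how these four validity measures are obtained when the clinicians' judgment is taken as the reference, the example below treats "appropriate" as the positive class. That choice is an assumption, since the source does not state which rating was coded as positive, and the counts and function name are hypothetical.

```python
def validity_measures(tp, fp, fn, tn):
    """Validity of an instrument against a reference standard.

    Assumption: "appropriate" is the positive class.
    tp: instrument appropriate, clinicians appropriate
    fp: instrument appropriate, clinicians inappropriate
    fn: instrument inappropriate, clinicians appropriate
    tn: instrument inappropriate, clinicians inappropriate
    """
    return {
        "sensitivity": tp / (tp + fn),  # reference-appropriate cases the tool detects
        "specificity": tn / (tn + fp),  # reference-inappropriate cases the tool detects
        "ppv": tp / (tp + fp),          # tool-appropriate ratings that are correct
        "npv": tn / (tn + fn),          # tool-inappropriate ratings that are correct
    }


# Hypothetical counts for illustration only; these are not data from the study.
measures = validity_measures(tp=68, fp=4, fn=7, tn=21)
print({name: round(value, 3) for name, value in measures.items()})
```

Sensitivity and specificity are conditioned on the reference rating, whereas the predictive values are conditioned on the instrument's rating and therefore shift with the proportion of appropriate cases in the sample.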

Statistical analysis was performed using the Statistical Package for Social Sciences (Windows version 10.0; SPSS Inc. Chicago, United States). Landis and Koch’s guiding principles were employed in interpreting the kappa levels. According to these guidelines, coefficients between 0.41 and 0.60 are considered moderate, those between 0.61 and 0.80 substantial, and those between 0.81 and 1.00 almost perfect (13). The Ethics Committee of Tehran University of Medical Sciences approved this study (on October 3, 2012 with approval No. 90-04-136-16139-97822).

4. Results

4.1. The Consensus Process

There were no fundamental changes between the American PAEP and its Iranian version (IR-PAEP). Regarding the admission criteria, the nominal group made some changes to the criteria of “severity of illnesses.” Criteria 8 and 13, “electrolyte abnormality” and “procedures for which outpatient departments are not responsible,” were the major concerns of the group. In criterion 8, the following values were added: “BUN > 45 mg/dL,” “BS ≤ 200-, or BS ≥ 50- mmol/L,” “WBC ≤ 15000, or WBC ≥ 2500”; in criterion 13, the following sub-criteria were added: “unbearable pain,” “abdominal tenderness,” and “foreign body ingestion.” In addition, “hematocrit < 30%” was added to criterion 9 and “seizures” to criterion 15, and “lack of alternative care,” “social acceptance,” and “provision of care in case there is a need for time to take the patient to other centers” were taken into consideration.

Regarding the DOS criteria, the nominal group agreed to change “nursing/life support services” to “nursing/life support services (where/when no alternative care exists or there is no individual to be trained in order to do any of the procedures at home)”. Finally, the group unanimously removed the criterion “IM medication for at least 8 hours that day”. These items differ significantly from the American PAEP.

4.2. Reliability and Validity Testing of the Instrument

We selected 105 hospital admissions by simple random sampling, of which 4.76% were excluded. From the remaining 100 patient files, 324 days of stay were obtained. The days of admission and discharge were then excluded. Files that lacked information referring to the day of clinical-file evaluation (n = 39) or had incomplete notations (n = 32) were also excluded, leaving 153 days of stay in the study sample.

The reliability in admissions and DOS were almost the same regardless of using override option. In general, the reliability in the samples decreased when the overrides were considered (without overrides = 73.5% and with overrides = 65.5%). Specific inappropriate agreement without overrides was equal to 81.5% and specific inappropriate agreement with overrides was 71%. As there is a possibility for overrides to create bias (6), we avoided using the override option.

The results obtained in this study are shown in Tables 1 to 5. Tables 1 and 2 show selected characteristics and the distribution of clinical diagnoses for all admissions. Table 3 shows the level of agreement of the reviewers for the IR-PAEP criteria. In general, overall agreement on the assessment of admissions and DOS was very high (96% and 88%, respectively), and Cohen’s kappa coefficients (0.755 and 0.71, respectively) showed substantial agreement.

There was also a similar level of overall agreements on admissions and DOS among pediatricians (91% and 88%), general practitioners (96% and 91%), and researchers (95% and 94%, respectively). Kappa coefficients showed substantial agreement (0.75 and 0.73) among pediatricians and complete agreement in general practitioners (0.86 and 0.80) and researchers (0.81 and 0.84, respectively).

The findings in Table 3 were compared with the results in Table 4, in which the subjective judgment of the clinicians was regarded as the gold standard. Table 5 shows the sensitivity, specificity, positive predictive value, and negative predictive value, which were 0.91, 0.85, 0.96, and 0.70 for admissions and 0.83, 0.92, 0.97, and 0.67 for DOS, respectively. Cohen’s kappa for admissions and DOS (0.673 and 0.618, respectively) showed substantial agreement.

The IR-PAEP on admissions had the highest sensitivity (0.93) and specificity (0.88) in the researchers’ results and the lowest sensitivity (0.87) and specificity (0.86) in the pediatricians’ results. The IR-PAEP on DOS had almost similar sensitivity and specificity in all groups (0.82-0.84 and 0.90-0.91, respectively). Positive and negative predictive values were almost similar in all groups (0.65-0.695 and 0.62-0.71, respectively). Kappa coefficients on admissions and DOS showed substantial agreement in the pediatricians’ results (0.637 and 0.61), the general practitioners’ results (0.71 and 0.62), and the researchers’ results (0.673 and 0.625, respectively).

The overall agreement for reviewers using the IR-PAEP on admissions and DOS was higher (92% and 88%) than the overall agreement of the clinicians using their subjective judgment (83% and 84%, respectively). Furthermore, the kappa coefficients for the reliability of the IR-PAEP among reviewers on admissions and DOS were higher (k = 0.755 and 0.71) than those of the clinicians using their subjective judgment (k = 0.73 and 0.66, respectively).

Table 1. Selected Characteristics of the Study Population
Age Group, y
3 ≥72
3 <28
Place of Residence

Table 2. Distribution of Clinical Diagnoses for all Admissions
Seizures and fever | 14
Urinary tract infection | 13
Acute upper respiratory infections | 12
Lower respiratory infections | 8
Cardiac diseases | 7
Other diagnosis | 23

Table 3. Inter-Rater Reliability of the PAEP on Admissions and DOS a,b
Measure | All Reviewers, A | All Reviewers, DOS | Pediatricians, A | Pediatricians, DOS | GP, A | GP, DOS | Researchers, A | Researchers, DOS
Overall agreement, % | 96 | 88 | 91 | 88 | 96 | 91 | 95 | 94
SAA, % | 93 | 89 | 89 | 82 | 95 | 84 | 93 | 91
SIA, % | 80 | 83 | 75 | 69 | 83 | 77 | 81 | 86
Cohen’s k (95% CI for k) | 0.755 (0.59-0.94) | 0.71 (0.50-0.96) | 0.75 (0.66-0.85) | 0.73 (0.68-0.79) | 0.86 | 0.80 | 0.81 (0.72-0.88) | 0.84 (0.71-0.96)

a Abbreviations: GP, general practitioners; A, admission; DOS, day of stay; SAA, specific appropriate agreement; SIA, specific inappropriate agreement; CI, confidence interval.

b Reviewers: pediatricians, general practitioners and trained PAEP reviewers (researchers). Researchers: two nurses and the first author. Overrides: for admission = 4.12%, and for DOS = 4.5%. Appropriateness: average appropriate and inappropriate ratings by PAEP reviewers for admission = 70.2% and 23.2%, and for DOS = 57.7% and 37.8%, respectively; P < 0.0001.

Table 4. Agreement Among Judgments of Clinicians on Admissions and DOS a,b
Criterion | Overall Agreement, % | SAA, % | SIA, % | Cohen’s k (95% CI for k)
Admission | 83 | 91 | 70 | 0.73 (0.67-0.77)
DOS | 84 | 89 | 72 | 0.66 (0.48-0.90)

a Abbreviations: SAA, specific appropriate agreement; SIA, specific inappropriate agreement; DOS, day of stay; CI, confidence interval.

b P < 0.0001. Uncertainty: clinicians ‘cannot decide’ for admission = 7.1% and for DOS = 6.5%; total appropriate and inappropriate ratings by clinicians for admission = 74.6% and 18.6%, and for DOS = 68% and 25.5%, respectively. Disagreement: on admission = 17% and on DOS = 16%.

Table 5. Validity of the PAEP When Compared with the Judgments of Clinicians a,b
Measure | All Reviewers, A | All Reviewers, DOS | Pediatricians, A | Pediatricians, DOS | GP, A | GP, DOS | Researchers, A | Researchers, DOS
Cohen’s k (95% CI for k) | 0.673 (0.51-0.82) | 0.618 (0.43-0.72) | 0.637 (0.51-0.82) | 0.61 (0.52-0.71) | 0.71 (0.69-0.82) | 0.62 (0.51-0.75) | 0.673 (0.62-0.71) | 0.625 (0.43-0.72)

a Abbreviations: GP, general practitioners; A, admission; DOS, day of stay; PPV, positive predictive value; NPV, negative predictive value; CI, confidence interval.

b Overall agreement: for admission = 93%, and for DOS = 85.5%; P < 0.0001. Raters: pediatricians, general practitioners, trained PAEP reviewers (researchers) and clinicians. Researchers: two nurses and the first author.

5. Discussion

The current study represents the first effort to develop an instrument for measuring the extent of appropriateness of admissions and DOS in pediatric hospitals in the Iranian context. In this research, some criteria were modified and adjusted for admission, including removing the criterion of “intramuscular medication” and considering “lack of alternative care”, “social acceptance”, and “provision of care in case there is a need for time to take the patient to other centers”, changes that were similar to those in the UK study (4).

The admission of suspected cases of childhood seizure is one example of such differences: these cases are routinely admitted to hospital in Iran, but not in the US (6) or the UK (4). In general, the important changes relate to the criteria dealing with “severity of illnesses”.

Also, for the DOS criteria, a hospital stay needed only for check-ups or for the provision of paramedical services was considered unacceptable in the Iranian setting, except for “interval care”, which is similar to the UK study (4). The results of the study showed that the instrument is highly replicable, as the agreement between the reviewers on admissions and DOS was 96% and 88%, with a k statistic of 0.755 and 0.71, respectively.

According to the classification of Landis and Koch, the k statistic values showed a substantial level of reliability. Therefore, non-physicians could also be trained to employ the IR-PAEP; in other words, they can achieve results as reliable as those of physicians. Regarding the admission criteria of the IR-PAEP instrument, the level of overall agreement among all reviewers is 96%, which is higher than the agreement reported by the developers of the PAEP in the UK (83%) (4). In our study, the level of overall agreement between researchers is 95%, which is similar to the levels reported by the developers of the PAEP in the US (94%) (7) and the UK (96%) (4). Additionally, the level of overall agreement between clinical raters is 83%. This is much higher than that reported by the developers of the PAEP in the UK (59.5%) (4). Furthermore, in this research the overall agreement between physician and non-physician reviewers is 93%, whereas the developers in the UK obtained 68% agreement (8).

Considering the reliability of the DOS criteria for the IR-PAEP instrument, the value of the k statistic among all reviewers is 0.71, whereas the developers of the PAEP in the UK obtained a value of 0.53. Among the researchers in our study, the value was 0.84, which is similar to the value obtained in the UK study (4). According to the results of the study, the “sensitivity” and “specificity” values were 0.83 and 0.92, respectively. These results are broadly comparable to those reported in the USA (“sensitivity” = 0.93 and “specificity” = 0.78) (11). The PAEP was modified and adjusted in the UK by Esmail (4) for use in pediatric practice. That modified instrument yielded a high level of inter-rater reliability. However, that study did not validate the instrument using separate specialist panels.

In another study in the UK, the PAEP was applied only to the admission criteria, and a high level of inter-rater reliability was obtained with the modified tool. In the validity exercise using separate expert panels, however, the PAEP had limited validity, and it was not recommended for the assessment of UK pediatric practice in general hospitals.

Our results are similar to the North American studies. The similarity and differences of results for validity scores between these countries can be due to differences in payment system to the physicians, whether it is a fee for services or capitation (8).

Because high overall agreement can occur with low k scores when the prevalence of the factor under investigation is either very high or very low, the decision to employ an instrument should not be based solely on agreement coefficients. The relevance and suitability of the criteria should be considered as well. It is advised that the prevalence of the situation to be measured should not be higher than 50% (14). There is no evidence from Iranian pediatric hospitals showing the exact rate of inappropriate admissions and DOS, but according to the findings of seven local studies in adults, the percentage of inappropriate hospital admissions and DOS ranged from 6-22.8% and 6.2-61.2%, respectively (15-21). Therefore, we think that the effect of prevalence in this study is not important.
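The dependence of kappa on prevalence can be illustrated with two made-up 2x2 tables that share the same overall agreement. The counts and the function name below are hypothetical and are not taken from the study; the sketch only shows why a high percentage agreement does not by itself guarantee a high kappa.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for two raters and a binary rating.

    a: both rate appropriate, d: both rate inappropriate,
    b and c: the two kinds of disagreement.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement computed from each rater's marginal totals.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Both hypothetical tables have 90% overall agreement (10 disagreements in 100).
balanced = cohen_kappa(a=45, b=5, c=5, d=45)  # about half the cases inappropriate
skewed = cohen_kappa(a=88, b=5, c=5, d=2)     # inappropriate cases are rare

print(round(balanced, 3), round(skewed, 3))
```

In the skewed table most of the observed agreement is already expected by chance, so kappa falls sharply even though the raters disagree on the same number of cases.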

Because there is no gold standard, a panel of clinicians is one way to address the problem (12). A panel of clinicians can be considered ‘the next-best thing’. As a gold standard, however, it has limitations, since differences between clinicians’ judgments are generally large. In our study, a substantial level of agreement on admissions and DOS was obtained among the clinicians (k = 0.73 and 0.66, respectively).

When employing the IR-PAEP, an important point of concern is the reliability of the protocol when it is used in different settings (22). In Iran, all public hospitals are centralized and the case mix in various hospitals is not dissimilar. Consequently, the modified version of the protocol is applicable in other hospitals across the country.

The retrospective nature of the study may be regarded as its major limitation. The results of our study show that the IR-PAEP in its present structure has adequate reliability and validity to measure the extent of appropriateness of admission and DOS. Therefore, it is recommended that the instrument be utilized in pediatric public hospitals in Iran. Because developing countries, particularly those in the Middle East, have similar conditions and culture, the results of this study (with minor changes) could be used in these countries.



References

1. Tabibzadeh M. An Evaluation on the Health System Progress and Economic Development Indicators in Iran. Online Int In Res J. 2013;3(4):182-92.

2. Mehrara M, Fazaeli A. A Study on Health Expenditures in Relation with Economics Growth in Middle East and North Africa (MENA) Countries. J Health Admin. 2009;12(35):49-60.

3. Statistical Center of Iran. [Annual Report, 2002 to 2007]. 2011.

4. Esmail A. Development of the Paediatric Appropriateness Evaluation Protocol for use in the United Kingdom. J Public Health Med. 2000;22(2):224-30.

5. Zaboli R, Seyedin S, Khosravi SST. Effect of per-case reimbursement on performance indicators of a military hospital's wards. Mil Med J. 2011;13(3):155-8.

6. Kreger BE, Restuccia JD. Assessing the need to hospitalize children: pediatric appropriateness evaluation protocol. Pediatrics. 1989;84(2):242-7.

7. Gertman PM, Restuccia JD. The appropriateness evaluation protocol: a technique for assessing unnecessary days of hospital care. Med Care. 1981;19(8):855-71.

8. Werneke U, Smith H, Smith IJ, Taylor J, MacFaul R. Validation of the paediatric appropriateness evaluation protocol in British practice. Arch Dis Child. 1997;77(4):294-8.

9. Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol. 1993;46(12):1417-32.

10. Cohen J. A Coefficient of Agreement for Nominal Scales. Educ Psychol Meas. 1960;20(1):37-46.

11. Kemper KJ, Fink HD, McCarthy PL. The reliability and validity of the pediatric appropriateness evaluation protocol. QRB Qual Rev Bull. 1989;15(3):77-80.

12. Werneke U, MacFaul R. Evaluation of appropriateness of paediatric admission. Arch Dis Child. 1996;74(3):268-73.

13. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-74.

14. Hoehler FK. Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. J Clin Epidemiol. 2000;53(5):499-503.

15. Bakhtari Aghdam F, Mohammadpoorasl R. Admissions and days of stay in patients based on appropriate assessment protocol in Tabriz Imam Khomeini Hospital. J Tabriz Uni Med Sci. 2006;30(2):35-9.

16. Fokari J, Ghiasi A. Assessment of inappropriateness admissions and inpatient on base the appropriateness evaluation protocol in Alinasab hospital at Tabriz. J Hospital. 2009;9(3):39-43.

17. Hatam N, Askarian M, Sarikhani Y, Ghaem H. Necessity of admissions in selected teaching university affiliated and private hospitals during 2007 in Shiraz, Iran. Arch Iran Med. 2010;13(3):230-4.

18. Nabilu B, Mohebbi I, Alinezhad H. Productivity of Hospital Beds: Evaluation of Inpatient Bed Days in the West Azerbaijan Selected Hospitals. J Nurs Midwife Urmia Uni Med Sci. 2012;10(4).

19. Pourreza A, Kavousi Z, Mahmoudi M, Batebi A. Admission and numbers of days of staying of inpatient on the basis of the appropriateness evaluation protocols in two Tehran University of Medical Sciences hospitals. J Pub Health Institute Pub Health Res. 2006;4(3):73-84.

20. Yaghoobifar M, Maskani K, Akaberi A, Shahabipoor F. The Rate of Inappropriate Admission and day of stay of Patients in Hospitals of Sabzevar. J Sabzevar Univ Med Service. 2011;18(3):224-32.

21. Ouladsahebmadarek E, Seidhejazie M, Rashidi M, Sahhaf F, Fardiazar Z. Evaluation of the appropriateness of hospital stay in gynecological wards in Tabriz Teaching Hospitals. Pak J Med Sci. 2009;25(5):852-6.

22. O'Neill D, Pearson M. Appropriateness of hospital use in the United Kingdom: a review of activity in the field. Int J Qualit Health Care. 1995;7(3):239.