Refining the evaluation of operating room performance.

Kim MJ, Williams RG, Boehler ML, Ketchum JK, Dunnington GL. 

Journal of Surgical Education. 2009 Nov-Dec



An accurate and consistent evaluation of resident operative performance is necessary but difficult to achieve. This study continues the examination of the Southern Illinois University (SIU) operative performance rating system (OPRS) by studying additional factors that may influence reliability, accuracy, and interpretability of results.


OPRS evaluations of surgical residents completed by SIU faculty from 2001 to 2008 were analyzed for the most frequently rated procedures to determine (1) the elapsed time from the procedure until completion of the rating, (2) the response patterns for procedure-specific and global surgical skills items, and (3) whether individual evaluating surgeons differed in the stringency of their ratings of resident operative performance.
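Item (1) reduces to simple timestamp arithmetic. A minimal sketch, assuming each evaluation record holds a start timestamp and a completion timestamp (the function names, record layout, and dates below are hypothetical and are not study data), computes per-evaluation delays and reports the median and the quickest delay in a days-and-hours form.

```python
from datetime import datetime, timedelta
from statistics import median

def completion_delays(starts, completions):
    """Elapsed time from each start timestamp to its completed rating."""
    return [done - start for start, done in zip(starts, completions)]

def as_days_hours(delay):
    """Render a timedelta as 'D days, H hours', dropping minutes and seconds."""
    return f"{delay.days} days, {delay.seconds // 3600} hours"

# Hypothetical records: five evaluations, all started at the same moment.
start = datetime(2008, 3, 1, 8, 0)
offsets = [timedelta(hours=20), timedelta(days=3, hours=6),
           timedelta(days=8, hours=12), timedelta(days=15), timedelta(days=25)]
delays = completion_delays([start] * 5, [start + d for d in offsets])

# statistics.median accepts timedelta values: it sorts them and, for an
# even count, averages the two middle delays.
print(as_days_hours(median(delays)))  # → 8 days, 12 hours
print(as_days_hours(min(delays)))     # → 0 days, 20 hours
```

Reporting the median rather than the mean keeps a few very late evaluations from dominating the summary of typical delay.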


In all, 566 evaluations were analyzed, comprising open colectomy (n = 125), open inguinal hernia (n = 103), laparoscopic cholecystectomy (n = 199), and excisional biopsy (n = 139). The number of residents evaluated per training level (PGY) ranged from 88 to 161. The median time to completion of evaluations was 11 days, 9 hours. The quickest evaluation was completed 18 hours after assignment. Most were completed between 4.5 and 22 days after assignment. Procedure-specific and global scale scores resulted in similar rank-ordering of performances (single-measure intraclass correlation using the consistency model = 0.88; 95% confidence interval [CI] = 0.87-0.90) and similar absolute OPRS scores (single-measure intraclass correlation using the absolute agreement model = 0.89; 95% CI = 0.87-0.90). Evaluating surgeons differed in stringency of ratings across procedures (average difference = 1.4 of 5 possible points). Resident performance improved with increasing PGY level for all 4 procedures.
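Comparing rank-ordering and comparing absolute scores call for different single-measure intraclass correlation models: the consistency model ignores a systematic offset between the two item types, while the absolute agreement model penalizes it. The sketch below assumes a table with one row per rated performance and one column per item type (procedure-specific score and global score) and computes the two-way single-measure ICCs in the McGraw and Wong formulation. The function names and toy scores are hypothetical and are not study data.

```python
from statistics import fmean

def two_way_mean_squares(scores):
    """Mean squares for rows, columns, and error of an n x k score table."""
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    # Residual after removing row (performance) and column (item type) effects.
    ss_err = sum((scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(n) for j in range(k))
    return (ss_rows / (n - 1), ss_cols / (k - 1),
            ss_err / ((n - 1) * (k - 1)))

def icc_consistency(scores):
    """ICC(C,1): agreement in rank-ordering; column offsets are ignored."""
    k = len(scores[0])
    msr, _, mse = two_way_mean_squares(scores)
    return (msr - mse) / (msr + (k - 1) * mse)

def icc_absolute(scores):
    """ICC(A,1): agreement in absolute values; column offsets lower the ICC."""
    n, k = len(scores), len(scores[0])
    msr, msc, mse = two_way_mean_squares(scores)
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))

# Hypothetical: every global score sits exactly 0.5 points above its
# procedure-specific counterpart.
base = [3.0, 4.0, 2.5, 4.5, 3.5]
shifted = [[a, a + 0.5] for a in base]
print(round(icc_consistency(shifted), 3))  # → 1.0
print(round(icc_absolute(shifted), 3))     # → 0.833
```

A constant half-point offset leaves the ranking untouched, so the consistency ICC stays at 1, but the absolute agreement ICC drops because the two item types no longer assign the same scores.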


Substantial time elapses between performance in the operating room and completion of the evaluation. This raises the question of whether surgeons remember the nuances of the procedure well enough to rate performance accurately. The item type used for rating affects neither the absolute rating assigned nor the rank ordering of performances. Differences in evaluator stringency indicate the need for multiple observations of each resident's performance by multiple surgeons. These findings are the foundation for an upcoming multi-institutional trial.

PubMed ID 20142134