Reliability and Validity of the DISCOVER Assessment 


This section summarizes studies conducted to establish the reliability and validity of the DISCOVER Assessment.  To accommodate non-technical readers, many terms and procedures familiar to researchers are explained in plain language, and some conclusions are generalized from the underlying data.  Researchers interested in the statistical particulars of the various studies can click the “Details” graphic where available.

     The DISCOVER Assessment was developed and refined over a 13-year period.  DISCOVER research has been supported by the Office of Bilingual Education and Minority Languages Affairs and the Javits Gifted and Talented Education Program.  The Assessment has been used with multicultural populations in the United States and abroad and with students from varied economic levels.

     Reliability:  Reliability is a measure of how consistently an instrument accomplishes its intended purpose.  For example, an instrument with high reliability will produce consistent results when administered under similar circumstances by different individuals.  DISCOVER Assessment researchers have examined consistency in several categories.  They have compared results obtained when the Assessment is administered by DISCOVER staff with results obtained when it is administered by trained non-DISCOVER staff.  They have also compared results from Observers with varying levels of experience.

     The studies focused on Observers, the trained individuals who observe problem-solving strategies during the Assessment.  After trained Observers viewed the same assessment, the researchers analyzed the results to determine whether individual Observers reached the same conclusions.  The researchers found that the difference between DISCOVER staff and trained non-DISCOVER personnel was small, but that agreement varied considerably with the Observers' level of experience.

     Observers were categorized as Novice (having observed fewer than 10 Assessments), Experienced (10-29 Assessments), and Expert (30 or more Assessments).  Agreement among Novice Observers varied considerably, occurring anywhere from 47% to 92% of the time.  Agreement among Expert Observers, however, was between 92% and 100%.  Interestingly, experience did not seem to be a factor in agreement on the “Definitely a Superior Problem Solver” rating, the rating used by most schools as a criterion for placement in special programs.  Observers across all experience levels agreed on this rating 95% of the time.
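
     For readers who want to see how an agreement percentage of this kind can be computed, the following is a minimal sketch in Python.  It assumes simple pairwise percent agreement: every pair of Observers is compared on every student, and the matching comparisons are counted.  The source does not state which agreement statistic the DISCOVER researchers used, so this is an illustration only.  The function name, the rating labels, and the ratings themselves are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(ratings_by_observer):
    """Return the fraction of (observer pair, student) comparisons that match.

    ratings_by_observer is a list with one entry per Observer; each entry
    lists that Observer's rating for every student, in the same student order.
    """
    matches = 0
    comparisons = 0
    for first, second in combinations(ratings_by_observer, 2):
        for rating_a, rating_b in zip(first, second):
            comparisons += 1
            if rating_a == rating_b:
                matches += 1
    if comparisons == 0:
        raise ValueError("need at least two observers and one student")
    return matches / comparisons


# Hypothetical ratings for four students from three Observers.
# The labels are placeholders, not the Assessment's actual rating scale.
observer_1 = ["definitely", "probably", "maybe", "unknown"]
observer_2 = ["definitely", "probably", "probably", "unknown"]
observer_3 = ["definitely", "maybe", "maybe", "unknown"]

agreement = pairwise_agreement([observer_1, observer_2, observer_3])
print(f"Pairwise agreement: {agreement:.0%}")
```

     In this made-up example, 8 of the 12 pairwise comparisons match, so agreement is about 67%.  All three Observers agree on the first student, which mirrors the pattern described above of high agreement on the top rating even when overall agreement is lower.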

     Recent studies, across all levels of experience, have shown agreement among DISCOVER personnel to average 81%, with 100% agreement on the “Definitely” rating.  Overall agreement between DISCOVER personnel and school district teams averaged 85%, and overall agreement among members of the district teams averaged 82%.

     Because Assessment reliability depends, to some degree, upon experience, sufficient training and practice for new Observers, as well as continual use of the skills learned, are essential.  Observers are required to be certified by DISCOVER trainers.  Re-certification and supplemental training are required at two-year intervals.

     Validity:  Validity, in a broad sense, is a determination of whether an instrument actually measures what it is intended to measure.  Several categories of validity exist.  Here the focus will be on three that are specifically related to the DISCOVER Assessment:  Theoretical, Concurrent, and Predictive.

     Theoretical Validity addresses the extent to which results align with expectations given the underlying theory.  Incorporated into the design of DISCOVER is Dr. Maker's belief that all races have roughly an equal percentage of exceptional or “gifted” individuals.  Individuals may not be gifted in the same ways, but when the different intelligences are considered, each group is gifted in roughly equal proportion.  The assumption, therefore, is that if a school uses the DISCOVER Assessment as a placement mechanism for gifted programs, the ethnic balance in these programs will reflect the overall ethnic composition of the school.  Approximate ethnic balance does, in fact, occur when the DISCOVER Assessment is used as a placement tool.  The percentage of students who receive the highest ratings (relative to their group size) is similar across ethnic, cultural, language, and economic groups (Nielson, 1994; Maker, 1997; Sarouphim, 1999).
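
     The comparison "relative to their group size" can be illustrated with a short Python sketch.  It computes, for each group, the share of students who received the highest rating, so that groups of different sizes can be compared directly.  The group names and counts below are invented for illustration and are not data from the cited studies; the function name is likewise hypothetical.

```python
def top_rating_rates(group_sizes, top_rated_counts):
    """Return, for each group, the fraction of its students with the top rating."""
    rates = {}
    for group, size in group_sizes.items():
        if size <= 0:
            raise ValueError(f"group {group!r} has no students")
        rates[group] = top_rated_counts.get(group, 0) / size
    return rates


# Hypothetical school: number of students assessed in each group ...
group_sizes = {"Group A": 200, "Group B": 100, "Group C": 50}
# ... and how many of them received the highest rating.
top_rated_counts = {"Group A": 20, "Group B": 9, "Group C": 5}

rates = top_rating_rates(group_sizes, top_rated_counts)
for group, rate in rates.items():
    print(f"{group}: {rate:.1%} received the highest rating")

# A small gap between the highest and lowest rate indicates balance.
spread = max(rates.values()) - min(rates.values())
print(f"Largest gap between groups: {spread:.1%}")
```

     In this invented example, the raw counts differ widely (20, 9, and 5 students), yet the rates are 10%, 9%, and 10%.  Similar rates across groups are what the underlying theory predicts.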

     Concurrent Validity compares a new instrument with established instruments that are intended to measure the same items.  Concurrent validity for the Assessment has proven difficult to establish because of a lack of similar instruments.  As discussed in the Problem Solving section, most other tests look at problem types one and two only.  DISCOVER collects information on all six problem types, which makes comparison more complex.  Nevertheless, some degree of correlation between aspects of the Assessment and other established instruments would be expected.  With few exceptions, such correlations have been found.  The DISCOVER researchers use the information collected to fine-tune Assessment procedures and to improve these correlations.  Click the details button for research data.
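
     As a rough illustration of what "correlation" between two instruments means, the Python sketch below computes the Pearson correlation coefficient for two lists of scores from the same students.  The source does not say which correlation statistic the DISCOVER researchers used, so Pearson's r is an assumption here, and the scores are invented.  A value near 1 means students who score high on one instrument tend to score high on the other; a value near 0 means little relationship.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from each mean (unnormalized covariance).
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return covariance / (spread_x * spread_y)


# Hypothetical scores for the same five students on two instruments.
new_instrument = [3, 5, 2, 4, 5]
established_instrument = [61, 88, 55, 70, 79]

r = pearson_r(new_instrument, established_instrument)
print(f"Pearson r = {r:.2f}")
```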


     Predictive Validity addresses whether a test can be used to predict who will do well on certain activities and whether similar results will be repeated over a number of years.  This type of validity is difficult to determine because intellectual growth is dynamic.  The strengths measured by DISCOVER are not fixed in an individual and can increase or decrease depending on how they are developed.  As a result, ratings can fluctuate from year to year, and it is difficult to isolate whether a fluctuation occurs because the individual has changed or because of the Assessment design.  The staff continues to study these issues.  One earlier study compared results over a three-year period.  In this study, Romanoff (1999) examined a problem-solving assessment (PSA) containing three of the DISCOVER assessment activities.  The PSA was used with students referred by their teachers as showing promise; the students who were referred, tested, and selected were compared with those who were referred, tested, and not selected.  Scores in reading and math, as measured by North Carolina end-of-grade tests and averaged across grades 3, 4, and 5, were significantly higher for students identified as gifted than for those not identified (M=84.03 for gifted and 58.37 for non-gifted).  Romanoff (1999) also found that the differences between gifted African American and Caucasian students were not as great as those between non-gifted African American and Caucasian students.
