Sources and Rates of Errors in Methods of Individual Identification for North Atlantic Right Whales

Timothy R. Frasier, Philip K. Hamilton, Moira W. Brown, Scott D. Kraus, Bradley N. White
DOI: http://dx.doi.org/10.1644/08-MAMM-A-328.1; pages 1246-1255; first published online: 15 October 2009


Many long-term studies of wildlife populations rely on individual identification based on natural markings or genetic profiling, or both. However, only rarely are these 2 independent data sets systematically compared with each other to estimate the error rates inherent in these studies. Here, >25 years of photo-identification data on the endangered North Atlantic right whale (Eubalaena glacialis) were compared with high-resolution genetic profiles, available for >75% of the individuals in the photo-identification catalog, in order to identify sources and rates of errors associated with both methods of individual identification. The resulting estimates were 0.0309 errors/identification for the photo-identification data, and 0.00121 errors/locus and 0.0327 errors/multilocus profile for the genetic data. These are among the lowest error rates yet reported, and indicate that the approaches used provide reliable means of individual identification for this species. However, despite these low error rates, the large size of the data sets results in a nonnegligible estimated number of errors, indicating that the potential for these errors needs to be incorporated into other analyses that are based on these data. A similar situation likely exists in other long-term studies where, although error rates are assumed to be low, the size of the data set results in a large number of errors that will influence subsequent analyses. Regularly conducting and reporting extensive database comparisons such as this is invaluable for maintaining the integrity of long-term data sets by identifying where sources of error are occurring and how protocols can be improved to lower error rates in the future.

Key words
  • error rates
  • genotyping
  • individual identification
  • photo-identification
  • right whale

The ability to recognize individuals through space and time is a great asset in studies of wildlife populations, enhancing the ability to perform detailed assessments of behavior (Goodall 1986), ecology (Durant et al. 2004), demography (Coulson et al. 2001), and evolutionary biology (Paterson et al. 1998). Additionally, individual-based data are often a necessity in the field of conservation biology, where such detail is frequently required for identifying the factors influencing recovery, and for designing appropriate management or conservation strategies (e.g., Coltman et al. 2003; McComb et al. 2001). This utility has led to the development of artificial marking techniques that allow for the identification of individuals from a wide range of species (e.g., Clutton-Brock and Pemberton 2004; Foerster et al. 2003; Le Boeuf and Reiter 1988). However, for an increasing number of species, and for whales and dolphins in particular, natural characteristics have been recognized that are stable through time within each individual, yet variable among individuals, and thus provide reliable markers for individual identification (e.g., Clutton-Brock et al. 1982; Hammond et al. 1990).

Molecular techniques also are becoming one of the primary methods for individual identification in population studies. This approach was originally based on the finding that the variability detectable at restriction fragment length polymorphism and minisatellite markers provides adequate resolution for individual-specific genetic profiles (Amos and Hoelzel 1990; Gibbs et al. 1990; Jeffreys et al. 1985; Quinn et al. 1987). Now this resolution is most often obtained through the analysis of multiple microsatellite loci (Paetkau and Strobeck 1994; Palsbøll et al. 1997). Additionally, the recognition that adequate amounts of DNA can be obtained from “noninvasive” sampling of biological material, such as feces (Gillett et al. 2008; Reed et al. 1997) or hair (Banks et al. 2003; Woods et al. 1999), has allowed individual-based studies of many wildlife populations for which this level of detail would otherwise be unavailable.

Although the benefits of individual identification in wildlife studies are numerous, the implications of the errors that inevitably occur in such studies have only recently begun to receive adequate attention. These recent evaluations are largely focused on genetic identification and associated techniques, with little being reported on the error rates associated with identification of individuals based on natural markings. Indeed, despite at least 70 years of using natural markings for such purposes (Lorenz 1937), a relatively recent paper on humpback whales (Megaptera novaeangliae) claims to be “the first large-scale test of errors in individual identification by natural markings for any species” (Stevick et al. 2001:1862). Because the individual-based data obtained from these studies are frequently used to estimate fundamental aspects of population biology such as abundance, trends through time, and, for endangered species, extinction risk (e.g., Fujiwara and Caswell 2001), estimating the associated error rates should be a high priority in order to properly understand the biases and sources of variability associated with these estimates.

The recent surge in studies assessing the error rates associated with genetic-based methods of individual identification is largely due to data from studies based on noninvasive sampling techniques and ancient DNA (Bonin et al. 2004; Pompanon et al. 2005). DNA extracted from these biological materials is known to be of low quantity or quality, or both, and thus strict protocols have been developed to minimize, as well as report, the genotyping errors that result from these data, and to incorporate these error rates into any subsequent analysis and interpretation (Creel et al. 2003; Paetkau 2003). However, more recent studies have found that nonnegligible genotyping error rates also are widespread in studies based on DNA of high quantity and quality, and that even small error rates can have large effects on subsequent interpretation of the data (Bonin et al. 2004; Hoffman and Amos 2005). It has therefore been suggested that estimates of genotyping error rates should be reported in association with all individual-based genetic data sets (Hoffman and Amos 2005).

Right whales (genus Eubalaena) have patches of raised cornified skin on their heads called callosities (Payne et al. 1983). The callosity patterns are different for each individual and are stable through time, and thus provide a means of individual identification based on these natural markings (e.g., Best 1990; Kraus et al. 1986). Consistent photo-identification research on the endangered North Atlantic right whale (Eubalaena glacialis), based on callosity patterns and other distinguishing characteristics such as ventral pigmentation and scarring, began in 1980 and has been ongoing, almost year-round, ever since (Hamilton et al. 2007; Kraus et al. 1986). The population size is estimated at ∼400 individuals, and the rate at which “new” whales, whose births were not documented, are added to the photo-identification database is so low (∼1–2%/year) that the direct count of individuals is thought to be an accurate census of population size (Clapham et al. 1999). Thus, comprehensive individual-based photo-identification data are available for the majority of individuals in this species.

Tissue sample collection from individual North Atlantic right whales for genetic analyses began in 1988 (Brown et al. 1991), and as a result of close collaboration between several research teams, samples have been collected from the vast majority of identified individuals. High-resolution genetic profiling protocols (based on 35 microsatellite loci) have been developed for this species (Frasier et al. 2006), and individual-specific genetic profiles are currently available for more than 75% of all identified individuals.

Combined, the high-resolution photo-identification and genetic databases for such a large portion of the species provide a powerful tool for understanding their biology, and also provide great potential for improving our understanding of the biology of small populations in general. The photo-identification or genetic databases, or both, provide the basis for the majority of research and conservation initiatives for this species (Kraus and Rolland 2007). Thus, North Atlantic right whale research and conservation hinge on the reliability of both the photo-identification and genetic methods of individual identification. Because these techniques provide 2 independent methods of identifying individuals and are almost always collected in tandem, they serve as excellent checks against each other. Here, we present an extensive comparison of the 2 databases for the purposes of quantifying the error rates associated with both methods of individual identification and identifying the source of those errors. These data are essential for assessing the reliability of both techniques, for allowing this error to be incorporated into subsequent analyses and assessments, and for minimizing such errors in the future. Additionally, through these analyses we were able to document other discrepancies and inconsistencies between the 2 databases that do not necessarily result in errors in individual identification (e.g., in sample labeling and recording).

Materials and Methods

Sources of individual identification data.—Images of North Atlantic right whales were collected primarily during shipboard and aerial surveys conducted throughout the known range (see Brown et al. 2007; Kenney et al. 2001). Photographs from opportunistic sightings and some research efforts go back as far as 1935, but uninterrupted photo-identification studies began in 1980. From 1980 through 2002, photographs were preserved on slide or print film, with a transition to digital images in 2003. These survey and photo-identification data were collected by many organizations and institutions, which are part of a collaborative research effort referred to as the North Atlantic Right Whale Consortium (www.rightwhaleweb.org). Photographs submitted by all members of the Consortium are processed, cataloged, and maintained by researchers at the New England Aquarium, who actually perform the photo-identification analyses and curate the photo-identification catalog and database. Briefly, all incoming photographs are compared to identified whales in the North Atlantic right whale catalog (Hamilton and Martin 1999), and all identifications are confirmed by 1 or 2 experienced matchers. All photo-identification analyses are conducted by eye and matchers follow strict protocols to reduce the chance of errors (Hamilton et al. 2007). Although some computer programs have been developed to aid identification in some right whale species (e.g., Hiby and Lovell 2001), these have not proven to be helpful with North Atlantic right whales. Since 2003, all new images have been collected with digital cameras, and many of the older slides have been digitized. These data are integrated into a completely electronic, online database and matching platform called DIGITS (Digital Image Gathering and Information Tracking System—Hamilton 2007). The DIGITS program is designed for sophisticated matching queries and easy data entry and retrieval. 
The database consists of >500,000 slides, prints, and digital images collected during the ∼46,000 sightings of >500 individuals photographed since 1935.

Skin samples have been collected for genetic analyses since 1988 by several members of the Consortium, primarily using the method described in Brown et al. (1991), but with the following modifications: fishing line is no longer used to retrieve crossbow bolts; instead, the bolts are equipped (Excalibur Crossbow Inc., Kitchener, Ontario, Canada) with a compressed foam float that acts as both a float and a stop collar, so that the bolt bounces out of the whale after reaching the desired depth of penetration; the tip used is the one described as “Tip #2” in Palsbøll et al. (1991), screwed directly onto the end of the bolt; and an Excalibur Vixen crossbow (Excalibur Crossbow Inc.) with a 150-pound draw weight is generally used. Samples are stored in a 20% dimethylsulfoxide solution saturated with NaCl (Seutin et al. 1991). Individual-specific genetic profiles are obtained for each sample based on genotype analysis at 35 microsatellite loci as described in Frasier et al. (2006). The North Atlantic right whale genetic database is curated and maintained at Trent University, Peterborough, Ontario, Canada.

All research and sample collection followed guidelines approved by the American Society of Mammalogists (Gannon et al. 2007), and were approved by the appropriate governing bodies in both Canada and the United States. A detailed list of the permit numbers (for both countries) under which these data were collected can be found in appendix A of Kraus and Rolland (2007).

Database comparison.—The photo-identification and genetic databases were compared in 2 ways in order to detect individual identification errors, and discrepancies in data collection or sample labeling or both, respectively. Individual identification errors were detected by 2 methods. First, we compared the genetic profiles for all replicate sampling events of each whale. These represented cases where examination of the photo-identification data suggested that the same whale was sampled multiple times. If a discrepancy was found between replicate genetic profiles of the same whale, the genetic and photo-identification data were both reassessed to identify the source of the discrepancy. Second, we compared all genotypes in the database against each other to identify cases where the same whale was sampled, but for which photo-identification had either not been able to match both sightings involved (false negative error) or the 2 sightings were matched as different whales (false positive error). When such cases were found, the raw photo-identification and genetic data were examined to identify the source of error.
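Both identification checks can be sketched as a single pairwise screen over all genotyped samples. This is an illustration under stated assumptions, not the authors' software: each sample is a hypothetical (sample ID, catalog ID, profile) tuple, None marks an unmatched sighting or an unscored locus, and the `min_loci` and `max_mismatches` thresholds are placeholders because the text does not give values.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # None marks a locus that was not scored
Sample = Tuple[str, Optional[str], List[Genotype]]  # (sample ID, catalog ID or None, profile)

REPLICATES_DISAGREE = "replicate profiles of 1 whale disagree"
FALSE_NEGATIVE = "same genotype, sighting not matched (false negative)"
FALSE_POSITIVE = "same genotype, matched as different whales (false positive)"


def compare_profiles(a: List[Genotype], b: List[Genotype]) -> Tuple[int, int]:
    """Return (loci scored in both profiles, how many of those differ)."""
    compared = differing = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip loci missing from either profile
        compared += 1
        differing += sorted(ga) != sorted(gb)  # allele order is irrelevant
    return compared, differing


def screen(samples: List[Sample], min_loci: int = 20,
           max_mismatches: int = 0) -> List[Tuple[str, str, str]]:
    """Compare every sample with every other and flag conflicts between
    the genetic and photo-identification data."""
    flagged = []
    for (id_a, cat_a, prof_a), (id_b, cat_b, prof_b) in combinations(samples, 2):
        compared, differing = compare_profiles(prof_a, prof_b)
        if compared < min_loci:
            continue  # too few shared loci for a reliable comparison
        same_whale_by_photo = cat_a is not None and cat_a == cat_b
        if same_whale_by_photo and differing > 0:
            # 1st check: replicate samples of 1 photo-identified whale
            flagged.append((id_a, id_b, REPLICATES_DISAGREE))
        elif not same_whale_by_photo and differing <= max_mismatches:
            # 2nd check: genetically identical samples with conflicting photo-ID
            label = FALSE_NEGATIVE if cat_a is None or cat_b is None else FALSE_POSITIVE
            flagged.append((id_a, id_b, label))
    return flagged
```

For a database of a few hundred profiles the quadratic all-pairs scan is cheap. Raising `max_mismatches` above 0 would let 2 samples that differ only by a genotyping error still be flagged as the same whale.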

Although data on mother–calf pairs also are available for error rate estimation in this species, this approach is known to underestimate the true error rates because of the lower resolution inherent in such comparisons (e.g., Hoffman and Amos 2005): a calf need share only 1 allele with its mother at each locus, so any error that leaves a shared allele intact goes undetected. Because a large number of replicate samples were available to detect errors, and these provide the highest resolution in error rate analyses, the data on mother–calf pairs were not used for error rate estimation, but instead were used a posteriori to compare the expected number of mother–calf mismatches (based on the error rate estimated from replicate sampling events) to the observed number within the data set, and thus provide a “check” on our estimated error rate.
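The mother–calf check relies on simple Mendelian exclusion, sketched below. This is a minimal illustration assuming only the mother's genotype is known (no paternal data) and unscored loci are stored as None; it ignores mutation and null (nonamplifying) alleles, both of which can also produce apparent mismatches.

```python
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # None marks an unscored locus


def mother_calf_conflicts(mother: List[Genotype], calf: List[Genotype]) -> List[int]:
    """Return the indices of loci at which mother and calf share no allele.

    The calf inherits 1 allele from its mother at every locus, so sharing
    none is a Mendelian inconsistency that points to a genotyping or
    labeling error. An error that leaves a shared allele in place is
    invisible to this test, which is why it underestimates error rates.
    """
    conflicts = []
    for i, (m, c) in enumerate(zip(mother, calf)):
        if m is None or c is None:
            continue  # locus not scored in one of the 2 profiles
        if not set(m) & set(c):
            conflicts.append(i)
    return conflicts
```

For example, a mother typed 203/205 and a calf typed 207/209 at the same locus would be flagged, whereas a miscalled calf genotype of 205/209 would pass even if the true genotype were 205/207.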

Discrepancies in data collection or sample labeling for the photo-identification and genetic databases were detected as follows. For any given day, photo-identified whales are lettered so that each sighting has a unique record that can be used as a reference that is independent of the actual identification of the whale. For example, if 2 whales were photographed on 20 July 2002, they would be labeled as whale “A” 20-July-2002, and whale “B” 20-July-2002. If whale “B” also was sampled that day, the sample tube would be labeled as “B” 20-July-2002. The sample labels often include the habitat area in which the sample was collected, as well as the name of the research vessel. This labeling strategy minimizes sample confusion when whales are being sampled on the same day by different research teams. We compared the sample labels associated with each sample between the 2 databases. When discrepancies were identified, we assessed the associated field and laboratory notes to identify the source of the discrepancy. Other discrepancies in data collection or sample labeling were detected while reviewing the individual identification data.
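The label comparison amounts to joining the 2 databases on a shared sample key and checking that both record the same identity-independent sighting reference. The sketch below assumes a hypothetical sample ID present in both databases and a hypothetical `SightingRef` record; because it uses plain equality, a vessel or habitat field recorded in only 1 database would also be flagged.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SightingRef:
    """Identity-independent reference for 1 sighting: the date plus the
    letter given to the whale that day (vessel and habitat area optional)."""
    day: date
    letter: str
    vessel: Optional[str] = None
    area: Optional[str] = None


def compare_labels(photo_db: Dict[str, SightingRef],
                   genetic_db: Dict[str, SightingRef]) -> List[Tuple[str, str]]:
    """List sample IDs whose sighting reference differs between the
    photo-identification and genetic databases, or is present in only 1."""
    problems = []
    for sample_id in sorted(photo_db.keys() | genetic_db.keys()):
        photo, genetic = photo_db.get(sample_id), genetic_db.get(sample_id)
        if photo is None or genetic is None:
            problems.append((sample_id, "present in only 1 database"))
        elif photo != genetic:
            problems.append((sample_id, f"photo-ID {photo} vs genetics {genetic}"))
    return problems
```

Each flagged sample would then be traced through the field and laboratory notes, as described above.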

Error rate calculation.—We followed the approaches outlined in Pompanon et al. (2005) for calculating error rates. Genotyping error rates were calculated per locus, as well as per multilocus profile (described in Table 1). For the photo-identification data, 1 sighting was considered the type sighting and additional sightings were considered potential matches. If a whale was matched and genetically sampled twice, we counted it as 1 replicate event (and potential matching error), not 2. If it was matched and sampled 4 times, we counted it as 3 replicate events and potential matching errors. This approach represents the most accurate assessment of true error because it uses the number of opportunities to detect an error as the denominator, as opposed to the total number of sampling or identification events.
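The counting rule for photo-identification can be sketched in a few lines. This is an illustration assuming a hypothetical list holding the photo-identified whale ID of every genetically sampled sighting.

```python
from collections import Counter
from typing import Iterable


def replicate_events(sampled_sightings: Iterable[str]) -> int:
    """Number of opportunities to detect a matching error.

    `sampled_sightings` holds the photo-identified whale ID for every
    genetically sampled sighting. The 1st sighting of each whale is its type
    sighting, so a whale sampled n times contributes n - 1 replicate events.
    """
    counts = Counter(sampled_sightings)
    return sum(n - 1 for n in counts.values())


# Hypothetical IDs: W1 sampled twice (1 event), W2 4 times (3), W3 once (0).
print(replicate_events(["W1", "W1", "W2", "W2", "W2", "W2", "W3"]))  # 4
```

Summed over whales, the n − 1 rule reduces to the number of sightings minus the number of whales; this is how the 282 sightings of 120 whales reported below yield 162 potential matching errors, and 5 detected errors give 5/162 ≈ 0.0309 errors/identification.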

Table 1

Example of methods used to calculate genotyping error rates using hypothetical data. Shown are the methods used to calculate the error rate per locus, as well as per multilocus profile.

Individual  Sample  Locus 1      Locus 2      Locus 3      Totals                            Error rate per profile
                    Allele 1/2   Allele 1/2   Allele 1/2
1           1       100/102      203/203      186/188      1 error in 2 replicate profiles   1/2 = 0.5
2           1       102/102      203/205      188/188      1 error in 1 replicate profile    1/1 = 1.0
Totals              1 error      1 error      0 errors     2 errors in 9 replicate loci      2 errors in 3 replicate profiles
Error rate          1/3 = 0.33   1/3 = 0.33   0/3 = 0      0.22 errors per locus             0.67 errors per profile
per locus
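The arithmetic of Table 1 can be reproduced with a short script. This is a sketch under stated assumptions: genotypes are unordered allele pairs, the first sample of each individual is the reference, and a replicate profile counts as erroneous if any of its loci disagree with the reference. The replicate genotypes are hypothetical values chosen only to reproduce the table's totals, and every locus is assumed to be scored in every sample (the real data have fewer replicate loci, 7,466, than 245 profiles × 35 loci would give).

```python
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]   # unordered pair of allele sizes
Profile = List[Genotype]     # one genotype per locus, fixed locus order


def genotyping_error_rates(samples: Dict[str, List[Profile]]):
    """Return (errors per locus, errors per profile, per-locus rates)."""
    n_loci = len(next(iter(samples.values()))[0])
    locus_errors = [0] * n_loci
    replicate_profiles = profiles_with_errors = 0
    for profiles in samples.values():
        reference = profiles[0]              # first sample of each individual
        for replicate in profiles[1:]:       # each later sample is 1 replicate
            replicate_profiles += 1
            bad = [sorted(r) != sorted(q) for r, q in zip(reference, replicate)]
            for i, is_bad in enumerate(bad):
                locus_errors[i] += is_bad
            profiles_with_errors += any(bad)
    per_locus_rates = [e / replicate_profiles for e in locus_errors]
    per_locus = sum(locus_errors) / (replicate_profiles * n_loci)
    per_profile = profiles_with_errors / replicate_profiles
    return per_locus, per_profile, per_locus_rates


# Hypothetical data shaped like Table 1: individual 1 has 3 samples
# (2 replicate profiles), individual 2 has 2 samples (1 replicate profile).
TABLE1 = {
    "1": [[(100, 102), (203, 203), (186, 188)],
          [(100, 100), (203, 203), (186, 188)],   # dropout at locus 1
          [(100, 102), (203, 203), (186, 188)]],
    "2": [[(102, 102), (203, 205), (188, 188)],
          [(102, 102), (203, 203), (188, 188)]],  # dropout at locus 2
}

per_locus, per_profile, by_locus = genotyping_error_rates(TABLE1)
print(f"{per_locus:.2f} errors/locus, {per_profile:.2f} errors/profile")
# 0.22 errors/locus, 0.67 errors/profile
```

Applied to the right whale data, the same definitions give 9/7,466 ≈ 0.00121 errors per locus and 8/245 ≈ 0.0327 errors per profile, since the 9 errors fell in 8 profiles.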


Results

Errors in genetic identification.—The database comparison resulted in 245 replicate multilocus profiles, representing 7,466 replicate loci from samples collected between 1988 and 2006. Nine genotyping errors were found in 8 profiles, resulting in error rate estimates of 0.0327 errors per multilocus profile (8/245) and 0.00121 errors per locus (9/7,466). No locus was particularly error-prone, with no errors found in 28 loci, 1 error found in 5 loci, and 2 errors found in 2 loci. There was a positive relationship between variability of loci and their estimated error rates; however, these trends were not statistically significant (Fig. 1). There was a significant positive relationship between mean allele size and error rate (P < 0.01; Fig. 1), which agrees with previous studies suggesting that loci with larger polymerase chain reaction products are more error-prone (e.g., Hoffman and Amos 2005).

Fig. 1

Relationship between error rates and characteristics of the microsatellite loci for North Atlantic right whales (Eubalaena glacialis). Included are comparisons of error rate versus a) polymorphic information content (PIC—Botstein et al. 1980), b) allele size, c) number of alleles, and d) expected heterozygosity (H—Nei 1978). Linear regression lines, R² estimates, and P-values (based on analysis of variance tests) also are included.

An error rate of 0.0327 per profile indicates that about 1 profile in every 31 should contain errors. The genetic database currently contains profiles for 391 individuals, of which 151 (39%) have had multiple samples genotyped. Because replicate samples are available as “checks” for these profiles, they are not expected to contain errors. Genetic profiles of the remaining 240 genotyped individuals are therefore expected to contain 240 × 0.0327 ≈ 8 errors, based on the estimated error rate. We used data from mother–calf pairs to assess whether this estimate accurately represents the expected error rates for this data set. Profiles are available for 168 mother–calf pairs, and therefore we expected approximately 168 × 0.0327 ≈ 5 errors to be found. Comparing the genetic profiles of identified mother–calf pairs resulted in the detection of 5 errors, indicating that the error rate estimate obtained from the replicate profile comparison accurately estimates the expected number of errors for the rest of the data set.
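The expected-error arithmetic in this paragraph can be checked in a few lines. The sketch uses only counts given in the text and the unrounded rate 8/245 rather than 0.0327; the rounded results are the same either way.

```python
# Counts from the text; the per-profile rate is 8 erroneous profiles out of
# 245 replicate profiles (0.0327 when rounded).
errors_per_profile = 8 / 245

genotyped_individuals = 391
with_replicate_samples = 151
mother_calf_pairs = 168

unchecked = genotyped_individuals - with_replicate_samples   # 240 profiles
expected_in_database = unchecked * errors_per_profile        # about 7.8
expected_in_pairs = mother_calf_pairs * errors_per_profile   # about 5.5
profiles_per_error = 1 / errors_per_profile                  # about 30.6

print(round(expected_in_database), round(expected_in_pairs), round(profiles_per_error))  # 8 5 31
```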

There were 3 identified causes of the 9 genotyping errors: weak amplification of a locus, abnormal peak morphology, and scoring error (homozygous individuals incorrectly scored as heterozygotes). Three (33%) of the genotyping errors were found in individuals where the locus of interest showed only weak amplification, which resulted in only 1 allele being observed in individuals that were actually heterozygous (i.e., “allelic dropout”; Table 2). Although the scoring protocols for this project include rules against scoring loci that show weak amplification (e.g., peak heights below a set number of relative fluorescence units) to minimize the potential for misscoring individuals due to allelic dropout, these protocols were apparently too lenient and have been altered to reduce these errors in the future.
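A minimum-peak-height rule of the kind described can be sketched as follows. The threshold value is hypothetical (the text does not report the value used, only that it was tightened), and the input is assumed to be the candidate allele peaks remaining after stutter and noise have been excluded.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, float]  # (allele size in bp, peak height in RFU)

MIN_RFU = 200.0  # hypothetical threshold; the study does not report its value


def call_locus(peaks: List[Peak], min_rfu: float = MIN_RFU) -> Optional[Tuple[int, int]]:
    """Return a diploid genotype, or None if the locus should stay unscored.

    If any candidate allele peak is below `min_rfu`, amplification is judged
    too weak: a heterozygote's 2nd allele may have dropped out, so even a
    single visible peak is not trusted as a homozygote.
    """
    if not peaks or min(height for _, height in peaks) < min_rfu:
        return None
    sizes = sorted(size for size, _ in peaks)
    if len(sizes) == 1:
        return (sizes[0], sizes[0])  # 1 strong peak: homozygote
    if len(sizes) == 2:
        return (sizes[0], sizes[1])  # 2 strong peaks: heterozygote
    return None  # more than 2 peaks cannot be a clean diploid call
```

Raising `min_rfu` trades more unscored loci for fewer dropout-driven miscalls, which is the direction of the protocol change described above.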

Table 2

Sources of discrepancies between the genetic and photo-identification data for North Atlantic right whales (Eubalaena glacialis). Included are the photo-identification and genotyping error rates, as well as the discrepancies detected in data collection and recording. Note that the majority of discrepancies in data collection and recording do not represent errors (see text for further explanation).

Stage                          Comparisons   Source of discrepancy                                        No. discrepancies        Errors/identification
Photo-identification           162           Multiple whales photographed when thought to be just 1       2
                                             Only 1 side of head photographed; poor photograph quality    1
                                             Matching calf from birth year to images in subsequent years  1
                                             Whale not cataloged at time of sampling                      1
                                             Totals                                                       5                        0.0309
Data collection and recording  726           Errors resulting in identification discrepancies
                                               Multiple-whale groups                                      9
                                               Recording or labeling error                                8
                                               Other or unknown                                           20
                                             Discrepancies not resulting in identification conflicts
                                               Dead whales                                                36
                                               Data recording of a specific team                          14
                                               Whale seen multiple times                                  13
                                             Totals (discrepancies not necessarily errors)                77                       0.106
Genetic profiling              245 profiles  Allelic dropout                                              3
                               7,466 loci    Abnormal peak morphology                                     4
                                             Scoring error                                                2
                                             Totals                                                       9 errors in 8 profiles   0.0327 errors/profile, 0.00121 errors/locus

Four (44%) of the genotyping errors were due to abnormal peak morphology, which resulted in nonallele peaks being scored as actual alleles. The cause of the altered morphology is not clear, but the phenotype was poor separation of peaks, resulting in “blobs” of fluorescence in and around the expected allele peaks. This phenotype was apparently not due to poor electrophoresis conditions, because other loci within the same multiplex had clean morphologies. Replacing primers at the 1st sign of this abnormal morphology alleviated the problem, suggesting that the primers or the fluorescent tag, or both, had become compromised. To minimize the potential for primer degradation and contamination in our laboratory, primers are distributed into 1-time-use aliquots, so each aliquot is thawed and opened only once. Thus, degradation or contamination from repeated handling or freeze–thaw cycles appears to be an unlikely explanation. Similar problems have been reported for more than a decade (e.g., Hengen 1995), but there do not yet appear to be adequate data to assess what processes or factors are compromising the primers. Manufacturers state that primers are stable for at least 6 months if kept frozen, which leaves open the possibility that primer quality deteriorates over longer periods even at −20°C. All of the primers for which this phenotype was observed were >1 year old; thus, primer or label deterioration while frozen appears to be at least partially responsible for the abnormal morphology.

The remaining 2 genotyping errors were scoring errors, in which homozygous individuals were scored as heterozygotes. These errors occurred because amplification of these loci in these samples produced more polymerase chain reaction product than in most other samples, saturating the allele peaks (i.e., they reached the highest detectable fluorescence). Because the true allele peak was clipped at this ceiling, the adjacent stutter peak, although of much lower true intensity, appeared to be of similar height. Based solely on peak height, the individuals appeared to be heterozygotes; however, closer examination of the peak morphologies in the raw data indicated that only 1 allele was present. Examination of the raw data is always carried out as part of the scoring protocol for this project, but the discrepancy was apparently missed in these 2 cases.
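These 2 cases suggest a simple automated flag, sketched below under explicit assumptions: the detector ceiling (`SATURATION_RFU`) and repeat-unit length are hypothetical and instrument- or locus-specific, and the check relies on the common observation that stutter peaks sit 1 repeat unit below the true allele. A call is flagged for raw-data review, not rejected, because a genuine heterozygote with adjacent alleles would also match the pattern.

```python
from typing import List, Tuple

Peak = Tuple[int, float]  # (fragment size in bp, peak height in RFU)

SATURATION_RFU = 8000.0  # hypothetical detector ceiling; instrument-specific
REPEAT_UNIT_BP = 2       # hypothetical: assumes a dinucleotide-repeat locus


def needs_raw_data_review(peaks: List[Peak],
                          ceiling: float = SATURATION_RFU,
                          repeat_bp: int = REPEAT_UNIT_BP) -> bool:
    """Flag an apparent heterozygote that may be a saturated homozygote.

    Stutter products are typically 1 repeat unit shorter than the true
    allele. If the longer peak is clipped at the detector ceiling and the
    shorter peak sits exactly 1 repeat below it, the 'second allele' may be
    stutter that only looks comparably tall because the true peak is clipped.
    """
    if len(peaks) != 2:
        return False
    (short_bp, _), (long_bp, long_height) = sorted(peaks)
    return long_height >= ceiling and long_bp - short_bp == repeat_bp
```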

Errors in photo-identification.—The database comparison resulted in 282 sightings of 120 different whales that had replicate photo-identification events that also could be checked with the genetic data. These 282 sightings provided 162 opportunities to detect a matching error (282 sightings minus 1 type sighting for each of the 120 whales). This number is lower than the 245 replicate profiles available for the genetic analysis because some replicate profiles represent repeated genotyping of the same sample, which does not constitute a distinct sampling (and thus photo-identification) event. Five photo-identification errors were detected, resulting in an estimated photo-identification error rate of 0.0309 errors/identification (Table 2). Three were false positives (incorrectly identifying 1 individual as another) and 2 were false negatives (failing to match 2 sightings of the same individual).

All of the photo-identification errors resulted from a combination of poor data collection, poor images, and mistakes in matching. In 2 cases, the error occurred because field researchers recorded that a group of images was all of 1 whale, when in fact the photographs were of 2 different whales. In the 1st case, 5 of the 6 frames (all right heads) represented the whale to which the group of photographs had been matched, but 1 frame (a left head) was of a different whale. The matcher did not notice this during photo-analysis, and it was this single left-head frame that showed the whale that was sampled genetically. Had the matcher used only the single frame of the sampling event, the matching error would not have been made. In the 2nd case, the incorrect grouping of images resulted in a false negative. A single frame of the whale's right head was collected at the time of sampling and was grouped with other images of a left head thought to be from the same sighting. The matching notes showed that the matcher had reviewed the whale that was actually sampled, but because the images of the left head did not match that whale's catalog file, the potential identification was dismissed. Again, this matching error would not have occurred if the images from the sighting had been grouped correctly.

For another error, a single, oblique, dark image was the only photographic record of a sampling event. This photograph was misidentified as a similar-looking whale. On review, this single image could never be confirmed as either whale, and confirming it as a match was an error in judgment. The 4th photographic error involved matching sightings of a calf with its mother in its birth year to sightings of that whale in subsequent years. Calves are notoriously difficult to match between their calf year and subsequent years because their identifying features can change significantly (Hamilton et al. 2007). In hindsight, the calf images were not of adequate quality to be matched reliably to sightings in subsequent years. The 5th error occurred because the sampled whale had not yet been cataloged. When that individual was eventually cataloged, the sighting at which it was sampled was missed, and its 1st sighting in the database was a full 2 years after the genetic sampling. When a whale is cataloged, it is compared to all previous unmatched sightings; the sighting at which this whale was sampled was missed at this step for unknown reasons.

Data collection or sample labeling discrepancies.—We also identified discrepancies in data recording or sample labeling procedures, or both, in our databases. Not all of these discrepancies were due to errors; some resulted from different data being recorded in each database. For example, the photo-identification and genetic databases had different sampling dates for most samples obtained from dead whales. This occurred because necropsies often take several days, and a sample often is not collected on the 1st day that the dead individual is reported. In the photo-identification database only 1 “sighting” is given to each dead whale, which means that all data associated with a necropsy are assigned to the 1st day on which the individual was found. However, the actual sampling date is written on the tube containing the sample for genetic analysis, resulting in different dates being recorded for dead individuals in the 2 databases. Although this is a discrepancy between the databases, it does not represent an error per se, because the data in both databases are correct, just different. Nevertheless, such discrepancies indicate where sample collection and recording protocols need to be improved. Importantly, this situation does not result in a discrepancy in individual identification, because both databases record the same individual as having been sampled; only the data associated with that sample have been recorded differently. Because not all discrepancies in sample labeling and recording were due to errors, nor resulted in discrepancies in individual identification, they were divided into 2 categories: those that resulted in discrepancies in individual identification and those that did not.

Out of the 726 sample label–record comparisons between the databases, 37 (5.1%) contained errors that resulted in a discrepancy regarding which individual was sampled, whereas 77 (10.6%) contained discrepancies that did not result in differences in which individual was recorded as sampled (Table 2). All 37 of the discrepancies resulting in conflicting data on individual identification were due to data collection and recording errors. Among the errors whose cause could be identified, the most frequent (9/37 = 24%) occurred during data collection from large social groups, where many whales were present and highly active, making it difficult to keep track of the identities and locations of the whales present. The 2nd most frequent cause (8/37 = 22%) was mistakes in data recording, such as the date being mislabeled on the sample tube. The remaining discrepancies were due to a variety of other errors, or their cause could not be determined (Table 2).

The vast majority of the 77 discrepancies that did not result in conflicting individual identification data were not due to errors, but instead resulted from data (all of which were correct) being recorded differently in each database (Table 2). Thirty-six (47%) of these discrepancies were due to data associated with dead whales, as described above. Fourteen (18%) represented samples collected by 1 specific research team that uses several different identifiers for each sample, with a different identifier recorded in the photo-identification and genetic databases. Thirteen (17%) were due to whales that were seen twice on the same day, by either the same or different research teams, and were therefore given 2 different letters, with each database recording the sampled whale under a different letter. The remaining 14 (18%) were due to a variety of other data collection and recording errors or discrepancies, or both.
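The percentages in the 2 preceding paragraphs follow directly from the category counts; the short script below recomputes them as a consistency check. The counts are taken from Table 2 and the text (the 14 "other" cases without identification conflicts are given only in the text), and the category names are paraphrased.

```python
COMPARISONS = 726  # sample label-record comparisons between the databases

# Counts from Table 2 and the text.
identification_conflicts = {"multiple-whale groups": 9,
                            "recording or labeling error": 8,
                            "other or unknown": 20}
no_identification_conflict = {"dead whales": 36,
                              "identifiers of a specific team": 14,
                              "whale seen twice in 1 day": 13,
                              "other": 14}


def percent(part: int, whole: int, digits: int = 0) -> float:
    """Share of `whole`, in percent, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


n_conflicts = sum(identification_conflicts.values())    # 37
n_other = sum(no_identification_conflict.values())      # 77
print(percent(n_conflicts, COMPARISONS, 1), percent(n_other, COMPARISONS, 1))  # 5.1 10.6
for group, total in ((identification_conflicts, n_conflicts),
                     (no_identification_conflict, n_other)):
    print({cause: percent(n, total) for cause, n in group.items()})
```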


Discussion

All 3 of the error rates estimated here (genotyping, photo-identification, and sample labeling or recording) are strikingly similar to those reported by Palsbøll et al. (1997) and Stevick et al. (2001) for North Atlantic humpback whales, the only other published data set for which both photo-identification based on natural markings and genetic data were compared against each other to estimate errors in individual identification. These similarities across data types were unexpected and seem remarkable given that these studies represent different species, population sizes, research teams, and photo-identification and genetic profiling techniques and protocols. One implication is that errors in individual identification may occur at similar frequencies in other long-term studies using similar techniques.

The estimated error rates for the genetic identification techniques, 0.121% per locus and 3.27% per multilocus profile, are low compared with those of most studies, but similar to the rates reported for North Atlantic humpback whales by Palsbøll et al. (1997) and Stevick et al. (2001). Using those data, Palsbøll et al. (1997) reported an estimated error rate of 0.11% per locus. However, studies that combine genetic and photo-identification data are rare; more data are available from studies that use only genetic techniques for individual identification, and the error rates in these are higher. For example, in a study based on tissue samples from brown bears (Ursus arctos) in Sweden, Bonin et al. (2004) estimated their genotyping error rate to be 17.6% per multilocus profile. Similarly, in an ongoing study of Antarctic fur seals (Arctocephalus gazella), Hoffman and Amos (2005) estimated their average genotyping error rate to be 0.455% per locus.
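Rates expressed per locus and per multilocus profile are not directly comparable, because each profile is scored at many loci. Under the simplifying assumption that errors occur independently and with equal probability at every locus, a per-locus rate e over L loci implies a per-profile rate of 1 - (1 - e)^L. This independence model is an illustrative assumption. It is not necessarily how the cited studies derived their figures, and the locus counts in the example are hypothetical.

```python
def per_profile_rate(per_locus: float, n_loci: int) -> float:
    """Probability that a profile contains at least 1 error, ASSUMING errors
    are independent across loci and share the same per-locus probability."""
    if not 0.0 <= per_locus <= 1.0:
        raise ValueError("per_locus must be a probability")
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return 1.0 - (1.0 - per_locus) ** n_loci


def per_locus_rate(per_profile: float, n_loci: int) -> float:
    """Inverse of per_profile_rate under the same independence assumption."""
    if not 0.0 <= per_profile < 1.0:
        raise ValueError("per_profile must be in [0, 1)")
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return 1.0 - (1.0 - per_profile) ** (1.0 / n_loci)


# Illustration only: hypothetical locus counts, using the 0.455% per-locus
# rate quoted for Antarctic fur seals.
for loci in (10, 20, 30):
    print(loci, round(per_profile_rate(0.00455, loci), 4))
```

The point of the sketch is qualitative. Even a small per-locus rate compounds across loci, so a per-profile rate will usually be much larger than the per-locus rate from the same data.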

Applying the identified error rates to the entire genetic database for right whales yields an expectation of 8 errors. This approach to estimating database-wide error rates was validated by the comparison of mother–calf pairs, which revealed exactly the number of errors expected from the identified error rates. Together, these results show that the genetic database for North Atlantic right whales is a robust and comprehensive genetic inventory of a wild population, and that the protocols currently in use provide reliable means of individual identification. Moreover, all detected errors were preventable, and our analyses have shown how protocols can be improved to minimize such errors in the future and increase the integrity of this data set.

The estimated photo-identification error rate of 3.09% also is remarkably similar to that for North Atlantic humpback whales. Stevick et al. (2001) discovered 14 photo-identification errors out of 414 replicate identifications, resulting in an estimated error rate of 3.38%, or 1 error in every 30 identifications. Interestingly, all of the photo-identification errors of Stevick et al. (2001) were false negatives (incorrectly identifying 2 sightings of the same whale as different), whereas the errors detected here were a mix of false positives (incorrectly identifying 2 sightings of different whales as the same) and false negatives. One important difference between the 2 species that may have influenced these results is that humpbacks can be identified from a single photograph of the underside of the tail, whereas right whales often require photographs of both sides of the head. Thus, false positives may be more likely in the data set for right whales if photographs are only available for 1 side of the head. Indeed, 3 of the 5 right whale identification errors would have been avoided if there had been photographs of both sides of the head correctly assigned to the sighting (see the first 2 rows of Table 2).
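Rates estimated from a few hundred comparisons carry considerable sampling uncertainty, which is worth keeping in mind when judging whether 2 rates are similar. As a sketch, the Wilson score interval can be computed from the counts of Stevick et al. (2001). The Wilson interval is a standard binomial confidence interval chosen here for illustration; neither study is stated to have used it.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Stevick et al. (2001): 14 errors in 414 replicate identifications.
rate = 14 / 414
low, high = wilson_interval(14, 414)
print(f"rate = {rate:.2%} (1 in {1 / rate:.0f}), 95% CI {low:.2%} to {high:.2%}")
```

The resulting interval spans roughly 2% to 6%, so rates of about 3% from 2 modest samples are consistent with each other, but the data cannot rule out moderately different true rates.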

The photo-identification process has protocols in place that should have caught all of these errors. At least 2 people review every identification, and both the person who makes the initial identification and those who confirm it are supposed to ensure that every image from a sighting matches the whale in question and that the images are of adequate quality to support the identification. The error rates indicate that this checking phase of the process needs greater emphasis. The errors also underline the importance of recording and reviewing the frames taken at the exact time of sampling: for species that require multiple images of different parts of the head and body to reliably identify individuals, these specific photographs form the strongest link to the genetic data.

It is difficult to determine whether the photo-identification error rate from this subsample reflects the error rate for the entire photo-identification database of >46,000 records. If it does, roughly 1 in every 32 records could be misidentified, and the entire database would contain ∼1,400 errors. Although this is possible, there are several reasons why the actual error rate may be lower. Because the majority of the population has been genetically sampled, researchers are often trying to obtain a skin sample from a specific whale and may do so under more difficult circumstances (in poor light, from an angle that is poor for photographs, or within large social groups where body parts and heads are easily confused). Such situations would make the photo-identification error rate higher for sightings in which a genetic sample is collected than for the photo-identification database in general. In addition, the researcher collecting the sample is always on the bow of the boat and, on some vessels, can be quite distant from the data recorder and photographers. This distance can hamper communication and thus weaken the link between the photographs and the sampling event.
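The database-wide figure is a simple extrapolation: error rate times number of records. The sketch below makes that explicit, assuming that every record independently has the same error probability. As the paragraph notes, this assumption may well not hold. With a rate of 3.09% and 46,000 records, it gives about 1,421 expected errors, in line with the ∼1,400 quoted in the text.

```python
import math


def expected_errors(rate: float, n_records: int) -> tuple[float, float]:
    """Expected error count and its binomial standard deviation, ASSUMING each
    record independently has the same error probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability")
    mean = rate * n_records
    sd = math.sqrt(n_records * rate * (1 - rate))
    return mean, sd


# Photo-identification error rate from this study applied to ~46,000 records.
mean, sd = expected_errors(0.0309, 46_000)
print(f"expected ~{mean:.0f} errors (binomial spread +/- {1.96 * sd:.0f})")
```

The binomial spread reflects only record-to-record chance. It ignores the much larger uncertainty in the rate itself, which was estimated from a small subsample, so it understates the true uncertainty in the database-wide count.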

Another consideration is that the error rate does not directly assess the quality of matching for photographs taken during aerial surveys, because genetic samples cannot be collected from the air. Aerial surveys provide a substantial proportion of the sightings in the database (35% and growing). In some cases, identification information from aerial surveys is of higher quality than that from shipboard surveys, because an aerial photograph can show the entire head and callosity pattern in a single frame, whereas shipboard images often show only 1 side of the head at a time. This avoids errors caused by the heads of 2 different whales being mistaken for those of 1 whale in a sighting. However, the distance to the whale, the speed of the platform, and whales being partially underwater can all result in blurry, distant, or obscured images. These quality issues are more likely to result in false negatives (the sighting not being identified) than in false positives. Given these opposing effects on the quality of aerial data, it is difficult to determine whether the database-wide error rate would be higher or lower than that estimated here for data collected from boats.

Analyses of photographs of humpback whales include quantification of photographic quality, as well as of the distinctiveness of individuals (e.g., Friday et al. 2000; Stevick et al. 2001). Such scores make it possible to incorporate error rates into subsequent analyses. For example, Stevick et al. (2001) show that none of the detected errors involved photographs of the highest quality. Thus, if a specific analysis must assume that there are no errors and can accommodate a smaller data set, it can be restricted to images of the highest quality. Conversely, if errors can be accounted for and a larger sample size is desired, photographs with a range of quality scores can be used. In this way, quality scores can guide decisions about which data are best suited to different analyses. The photographs in the catalog of North Atlantic right whales are not scored for quality or distinctiveness. Although the catalog would benefit from including both assessments, they would require unique protocols, because multiple images are often used to make an identification. What needs to be assessed is therefore not the quality or distinctiveness of a single image, but rather the quality of an identification based on a series of photographs. For example, some whales are very distinctive and can be identified from a single image of 1 side of the head, other individuals are distinctive only when both sides of the head (or additional body parts) are photographed, and some whales are relatively indistinct even when all the identification information is available. Future approaches could develop criteria for coding the composite information from multiple images, or select a single image as the type image for the sighting and code only the information available in that image.
It would also be useful to have each sighting coded for the quality of the data behind that sighting (an assessment of potential confusion based upon the number of whales in the area, the number and skill of the people collecting the data, etc.). If appropriate protocols are established, incorporating photograph quality, data quality, and distinctiveness scores for each sighting could be used to filter a subset of high-quality identifications for population assessments and to determine which sightings require a more rigorous photo-identification review.
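The proposed use of per-sighting scores can be sketched as a simple filter. Every field name, the 1 to 3 scoring scale, and the thresholds below are hypothetical. The right whale catalog does not currently store such scores, and any real protocol would need to define them. Sightings that meet all thresholds form the high-confidence subset; the rest are flagged for more rigorous review.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    # All fields and the 1-3 scale are HYPOTHETICAL, for illustration only.
    sighting_id: str
    photo_quality: int    # composite quality of the images behind the sighting
    data_quality: int     # potential for confusion (whales nearby, observer skill, etc.)
    distinctiveness: int  # how distinctive the whale is given the available images


def split_by_quality(sightings, min_photo=3, min_data=3, min_distinct=2):
    """Return (high-confidence subset, sightings flagged for further review)."""
    keep, review = [], []
    for s in sightings:
        if (s.photo_quality >= min_photo
                and s.data_quality >= min_data
                and s.distinctiveness >= min_distinct):
            keep.append(s)
        else:
            review.append(s)
    return keep, review


# Usage with made-up sightings.
sightings = [
    Sighting("A", photo_quality=3, data_quality=3, distinctiveness=3),
    Sighting("B", photo_quality=2, data_quality=3, distinctiveness=3),
    Sighting("C", photo_quality=3, data_quality=3, distinctiveness=1),
]
keep, review = split_by_quality(sightings)
print([s.sighting_id for s in keep], [s.sighting_id for s in review])
```

Exposing the thresholds as parameters mirrors the trade-off described for humpback data. Strict thresholds give a smaller, cleaner subset, and looser thresholds give a larger sample whose errors must then be modeled.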

The consequences of errors in individual identification are likely to be greatest for photo-identification data, which form the basis of all analyses of species status and trends through time. Most of these analyses are based on mark-recapture approaches and assume that the photo-identification data do not contain errors (e.g., Fujiwara and Caswell 2001). Mark-recapture analyses are sensitive to errors in individual identification (e.g., Amstrup et al. 2005), so these studies need to incorporate the potential for errors to accurately estimate demographic parameters and the confidence intervals around those estimates. Assessing errors is particularly important given the critically endangered status of North Atlantic right whales, for which changing the data on just 1 individual can have dramatic effects on the estimated sustainability of the entire species (Kareiva 2001). Moreover, mark-recapture approaches are more sensitive to false positive errors than to false negatives, with false positives potentially resulting in smaller estimates of population size (e.g., Mills et al. 2000).
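The studies cited use more elaborate models, but the direction of the effect can be seen in the simplest 2-sample mark-recapture estimator, the Lincoln-Petersen estimator N = n1 × n2 / m2. Here n1 and n2 are the numbers of animals identified in each sample, and m2 is the number in the 2nd sample matched to the 1st. A false positive makes an unmarked animal look like a recapture, which inflates m2 and lowers N. A false negative does the reverse. The counts below are invented for illustration, and the sketch is not meant to quantify the relative sensitivity claimed above.

```python
def lincoln_petersen(n1: int, n2: int, m2: int) -> float:
    """Simple Lincoln-Petersen population estimate N = n1 * n2 / m2."""
    if m2 <= 0:
        raise ValueError("m2 (recaptures) must be positive")
    return n1 * n2 / m2


# Hypothetical counts: 100 whales identified in each of 2 samples, 50 matched.
baseline = lincoln_petersen(100, 100, 50)
with_false_positives = lincoln_petersen(100, 100, 55)  # 5 spurious matches
with_false_negatives = lincoln_petersen(100, 100, 45)  # 5 missed matches
print(baseline, round(with_false_positives, 1), round(with_false_negatives, 1))
```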

Errors at the rates estimated here are likely to cause fewer problems for genetic analyses. The reported errors would have the largest effect on paternity assignment and estimates of male reproductive success. However, the potential for errors (at a rate higher than that estimated here) was accounted for in those analyses (Frasier et al. 2007) and is thus already incorporated into the data analysis and interpretation.

Although most discrepancies found between the 2 methods of individual identification were not due to errors in data collection and recording, these discrepancies still represent potential sources of confusion. All of the data collection and recording discrepancies were avoidable and could have been prevented by better communication among data collectors and recorders. Thus, training procedures for sample collectors will be reevaluated, and instructions for sample labeling and recording will be revised. Evaluating and improving existing protocols is a primary benefit of performing an analysis such as ours, which also indicates where discrepancies may occur in similar studies of wild populations. Estimating error rates was originally thought to be relevant only to genetic studies based on “noninvasive” sampling techniques that yield DNA of low quantity or quality, or both. Only recently have error rates been estimated in studies based on tissue samples that yield DNA of high quantity and quality (Bonin et al. 2004; Hoffman and Amos 2005). These studies indicate that all genetic studies likely have nonnegligible error rates, which should be investigated and incorporated into subsequent analyses and interpretations. Our analyses revealed an error rate of 3.27% per multilocus profile despite the use of high-quality DNA, rigorous scoring protocols, and highly trained personnel (e.g., Paetkau 2003). Although this error rate results in an expectation of only 8 errors in the genetic database, this number will grow as the study continues and the database expands. We encourage others to rigorously evaluate error rates in their studies, even when these rates are thought to be low.


This work would not be possible without an extraordinary collaborative effort to collect photographs of right whales for individual identification and skin samples for genetic analyses. For a complete list of the collaborators and contributors, and the scientific research permits under which the research was conducted, see table 4.1, appendices A and B, and acknowledgments in Kraus and Rolland (2007). Photographs have been contributed by literally hundreds of people and organizations. All data were provided through the North Atlantic Right Whale Consortium (www.rightwhaleweb.org). Funds were contributed for this work by the National Marine Fisheries Service, Natural Sciences and Engineering Research Council of Canada, Canadian Whale Institute, and a Canada Research Chair grant to BNW. We thank 2 anonymous reviewers for providing useful comments that improved the quality of this manuscript.


  • Associate Editor was William F. Perrin.

Literature Cited
