Interpreting yDNA Test Results

1.  Introduction
2.  STR test results
3.  SNP test results and haplotrees
4.  NGS test results, including BigY
5.  FTDNA database size and test kit prefixes
6.  Further information

1.  Introduction

There are two types of basic yDNA tests: STRs and SNPs (pronounced "snip"s).  An analogy is that STRs identify the leaves of a tree, while SNPs identify the twigs. 

STRs are liable to relatively frequent mutations, whereas SNP mutations are much more stable.  

ySTR tests and ySNP tests complement one another.  However the first test is always a STR test, and individuals tackling genetic genealogy for the first time can ignore SNP tests to start with. 

ySTR tests predict, and ySNP tests confirm an individual man's haplogroup (analogy: the branches of a tree).  Haplogroups are used in ethnicity/Deep Ancestry studies, which relate to the pre-surname era and are not directly relevant to surname genealogy.  However two men with the same surname but different haplogroups cannot be paternally related to one another during the surname era.  

2.  STR test results

FTDNA publish the results of an STR test in a confidential on-line webpage from which a paper certificate can be downloaded.  The results, commonly known as the testee's haplotype, or genetic signature, comprise a number of markers, typically 37, mostly identified by a "DYS" number, and a count of the number of times these markers are repeated, known as the marker count. 

The yDNA STR test results of a single participant are of little value until they are compared with those of another participant.  From such comparisons the probability of the two participants sharing a common ancestor can be assessed.  Clusters or groups of participants sharing a common ancestor within the surname era, typically the last millennium, are known as genetic families (aka genetic groups or surname branches; the terms "lineage" and "cluster" are also used, but such groupings may include participants whose common ancestor lived before the surname era).

 While determining the marker counts that make up a genetic signature is a strict scientific process, determining and expressing the probabilities of the comparisons of genetic signatures sharing a common ancestor is still a developing art.  The probabilities are complex mathematical functions dependent on many variables, including the number of markers tested, the number and magnitude of the mismatching markers, and the different rates of mutation of individual markers (slow mutating markers are useful for grouping participants’ results, fast mutating markers for differentiating between results).  And some assumption is needed of the possible number of generations elapsed since the most recent common paternal ancestor (MRCA).  
Several tools may be used to assess comparisons of two participants' genetic signatures.  The tools I use include the following.  Some FTDNA customers and other project administrators prefer to rely on FTDNA's "Matches" pages, which I discuss in section 2.8 below.
2.1         Haplogroups, the DNA signatures associated with basic ethnic groups used in Deep Ancestry Studies.  FTDNA predict the relevant haplogroup from each genetic signature.  Participants with different haplogroups are not genealogically related within the surname era.  NB Haplogroup predictions can be confirmed by SNP ("snip") tests - see section 3 below.
2.2         Number of matching markers.  This simple indicator can be used to give a rough indication of the probability of the number of generations since the two DNA signatures shared a common ancestor. The following table is from FTDNA's former faq512:
Number of            Probability that the MRCA was not more than
matching markers     this number of generations ago
                     50%        90%        95%
10 of 10             16.5       56         72
11 of 12             17         39         47
12 of 12             7          23         29
23 of 25             11         23         27
24 of 25             7          16         20
25 of 25             3          10         13
35 of 37             6          12         14
36 of 37             4          8          10
37 of 37             2 to 3     5          7
65 of 67             6          12         14
66 of 67             4          8          9
67 of 67             2          4          6
107 of 111           7          11         13
108 of 111           5          10         11
109 of 111           4          8          9
110 of 111           2          6          7
111 of 111           1          3 to 4
To place these numbers of generations in context, it is unlikely the surname Irwin existed more than about 24 generations ago.  See also "Genetic Distance" below.
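As a rough illustration of how the table can be read against the 24-generation figure, the following Python sketch stores only the 95% column and asks whether that bound falls within the surname era.  The names GENERATIONS_95, SURNAME_ERA_GENERATIONS and within_surname_era are my own, not FTDNA's, and the "111 of 111" row is left out because its cells are ambiguous in the table as reproduced here.  It is a reading aid only, not a probability calculation.

```python
# 95% column of the table above: (matching markers, markers tested) -> generations.
# The "111 of 111" row is omitted because its values are ambiguous in the source table.
GENERATIONS_95 = {
    (10, 10): 72, (11, 12): 47, (12, 12): 29,
    (23, 25): 27, (24, 25): 20, (25, 25): 13,
    (35, 37): 14, (36, 37): 10, (37, 37): 7,
    (65, 67): 14, (66, 67): 9, (67, 67): 6,
    (107, 111): 13, (108, 111): 11, (109, 111): 9, (110, 111): 7,
}

# Figure quoted in the text for the likely age of the surname Irwin.
SURNAME_ERA_GENERATIONS = 24


def within_surname_era(matching: int, tested: int) -> bool:
    """True if the 95% bound on generations to the MRCA is within the surname era."""
    return GENERATIONS_95[(matching, tested)] <= SURNAME_ERA_GENERATIONS


print(within_surname_era(12, 12))  # 29 generations is more than 24
print(within_surname_era(24, 25))  # 20 generations is within 24
```

On this reading even a perfect 12-marker match does not place the common ancestor within the surname era at the 95% level, whereas 24 of 25 does.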

This table, and others like it, are only a very rough guide.  The probabilities shown are averages, and are often misleading for individual comparisons.  As will be seen, our Study includes two brothers who have only 23 of 25 matching markers, while we have about two dozen participants with 37 of 37 matching markers, of whom half also have 67 of 67 matching markers, none of whom has been able to use this Study to determine their genealogical relationship (unlike several other participants with non-identical, albeit close, genetic matches who have succeeded in identifying genealogical relations).  These examples illustrate that the mutation of individual markers is a random process.
2.3    Genetic distance ("GD").  This is the simplest measure.  Genetic Distances are expressed as the total difference in marker counts over the number of markers compared, e.g. ‘0/12’ or ‘1/37’.  There are various models for calculating genetic distance.  FTDNA now calculate Genetic Distance as the sum of the differences of individual marker counts, e.g. a distance of 3 may comprise three 1-step mismatches, or one 2-step mismatch plus one 1-step mismatch.
NB1  Different rules apply for multi-copy markers such as DYS 385, 389, 464 and YCA: see ;
NB2  FTDNA's Y-DNA "Matches" pages assume a "Match" when men have STRs with a GD of 4/37 or less (or 1/12 or 2/25 or 6/67 or 10/111 or less).  See section 2.8 below.
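The sum-of-differences calculation can be sketched in a few lines of Python.  This is a minimal illustration, not FTDNA's actual algorithm: it treats every marker as single-copy and so ignores the different rules for multi-copy markers mentioned in NB1.  The function names and the marker counts in the example are hypothetical; the cut-offs are those listed in section 2.8.

```python
def genetic_distance(a: dict, b: dict) -> tuple:
    """Sum of absolute differences in marker counts over the markers both men tested.

    Assumption: every marker is treated as single-copy (no special rules for
    DYS 385, 389, 464 or YCA).  Returns (distance, number of markers compared).
    """
    shared = sorted(set(a) & set(b))
    distance = sum(abs(a[marker] - b[marker]) for marker in shared)
    return distance, len(shared)


# Cut-off GD at each resolution, as listed in section 2.8.
MATCH_CUTOFFS = {12: 1, 25: 2, 37: 4, 67: 6, 111: 10}


def is_listed_match(distance: int, markers: int) -> bool:
    """True if a GD at this resolution is within the cut-off for the Matches pages."""
    return distance <= MATCH_CUTOFFS[markers]


# Hypothetical marker counts, for illustration only.
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 9}

print(genetic_distance(man_1, man_2))  # one 1-step plus one 2-step mismatch
print(is_listed_match(4, 37), is_listed_match(5, 37))
```

In the example the two men differ by one step at DYS390 and two steps at DYS391, giving a distance of 3 over the 4 markers compared.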

2.4     Time since Most Recent Common Ancestor (TMRCA).  Tables and graphs may be used to convert Genetic Distance into the number of generations since two participants shared a common ancestor.  When this measure is expressed in years it is readily comprehensible, and it is a powerful tool for deep ancestry studies, but alas the margins of error are so great that it is a most unreliable tool when used within the surname era.  And like the number of matching markers and Genetic Distance, TMRCA dates are also unreliable because they assume a single average mutation rate for all markers, while in practice the average mutation rates for individual markers vary enormously.

TMRCAs may also be calculated from SNPs (see below).  Average SNP mutation rates are more reliable than average STR mutation rates - for BigY tests the average mutation rate is once per 83 years, or about one SNP mutation every three generations.  But the SNP mutation rate for a particular individual may be far from the average:  our Study has one individual who has had just two mutations since c.1350 - an average rate of one SNP mutation per 10 generations - and another individual who has had 17 SNP mutations during this period, an average mutation rate of one SNP per generation.
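The arithmetic behind these individual rates can be sketched as follows.  The generation length of 30 years and the end year of 2020 are my assumptions for illustration (the text states neither), and the function name is hypothetical, so the results only approximate the rounded figures quoted above.

```python
YEARS_PER_GENERATION = 30  # assumption: the text does not state a generation length


def generations_per_snp(snp_count: int, start_year: int, end_year: int,
                        years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Average number of generations between SNP mutations on one paternal line."""
    generations = (end_year - start_year) / years_per_generation
    return generations / snp_count


# The two individuals mentioned above, counted from c.1350 to an assumed 2020.
slow_line = generations_per_snp(2, 1350, 2020)
fast_line = generations_per_snp(17, 1350, 2020)
print(round(slow_line, 1), round(fast_line, 1))
```

With these assumptions the slow line works out at roughly eleven generations per SNP and the fast line at a little over one, against an average of about three.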
2.5         FTDNA’s ‘TiP’ probability.  This is a tool that encompasses a large number of variables in a single probability figure.  Unlike Matches, GDs and TMRCAs, TiP probabilities take account of differing average mutation rates of individual markers, and respond to "resolution" (i.e. the number of markers analysed): as the resolution is increased from 12 to 37 to 67, so the TiP probabilities of common ancestry of two participants tend to polarise towards 0% or 100%.  But there are many exceptions to this generalization, and 12-marker TiP %s are particularly unreliable.  The number of generations since a possible MRCA is also important.  For most people with the same or similar surnames this is typically a maximum of 24 generations.  Coincidentally this is the number of generations since the Irwin traditions of the time of Robert the Bruce.  TiP probabilities reduce if genealogical research has shown the common ancestor must have been more than, say, 8 generations ago.

FTDNA's TiP tool is powerful but complex and, as its content is commercially confidential, controversial amongst geneticists.  It represents FTDNA's best understanding of the impact of differing average mutation rates of individual markers.  TiP% data is still almost unique in utilizing a weighted average mutation rate to calculate TMRCA (Time to Most Recent Common Ancestor) data, but alas I believe the TMRCA data it produces gives a misleading impression of accuracy and, since 2016, results that are biased, i.e. the true TiP%s should be lower than shown.

Nevertheless this Study still uses TiPs as the best available tool for assessing the relative probabilities of an individual participant sharing a common ancestor with a modal participant.  It assumes that if the 24-generation, no-paper-trail TiP % for the highest available resolution, known as the TiP Score, is over 60% (formerly a TiP of over 80%) - see Appendix C of the accompanying Supplementary Paper No.1 ("Towards Improvement ....") - the two can be grouped together as members of a “genetic family”.  The participant having the genetic signature that is most common within each genetic family is known as the modal participant.  The genetic signature of the modal participant may be the signature of the common ancestor of the genetic family, but this is not necessarily so.
Participants with the surname Irwin or similar who do not closely match any other participant are known as "Singletons".  Participants with other surnames who have a "match" with one or more Irwins etc but a TiP Score of less than 60% are considered to be Mismatches, known as "False positives", and are not included in the Study statistics.  Participants with other surnames whose yDNA Matches shown on their FTDNA personal page are all or nearly all Irwins (or similar) are known as "NPEs" (see below), as are participants with the surname Irwin (or similar) but whose yDNA Matches on their FTDNA personal page are nearly all some other particular surname.
2.6       NPEs.  In practice the biological male ancestral line has sometimes experienced a change of surname, and in surname DNA studies such events are euphemistically termed Non Paternal Events.  Examples of NPEs include:
  1. A formal change of surname, typically a 20th century event, but sometimes earlier, e.g. to inherit land from a father-in-law.
  2. An informal change of surname, typically in the 13th to 19th centuries, for example when a young boy's father died and he was given the surname of his mother (in Scotland females retained their maiden names until the 19th century) or, if she remarried, of his step-father; or if a boy was orphaned or a waif, and was given the name of his guardian.
  3. A change of surname before these had become strictly hereditary, typically in the 12th to 17th centuries, for example a patronymic when a boy was given the forename of his father, or a man became known by his nickname or occupation, or by where he lived or came from, or when a clan member, tenant, apprentice, servant or slave took the surname of his master, laird or chief. This practice seems to have been particularly prevalent in the Scottish Borders.  Sometimes such 'alias' surnames were used concurrently with paternal surnames, which later lapsed. 
  4. An illegitimacy or infidelity, covert or otherwise, at any period, where the child was given the surname of his mother or her partner or husband.
NPEs can be manifest in two ways: those that today use the Irwin surname or similar but share the yDNA of some other surname, and those that share the yDNA of one of the Irwin genetic families but today use a different surname.  In the case of the latter I require a TiP Score of 95% for an individual testee to "qualify" for membership of our Study.  For further discussion of the interpretation of test results see section 7 and Appendix D of the accompanying Supplementary Paper 8, slides 25-30 of the lecture at Supplementary Paper 9, and my contribution at

Awareness that one's paternal ancestry included a NPE can be disappointing, particularly to genealogists who have long believed they are descended from a particular branch of their surname.  But it is important to remember that a majority of NPEs were not associated with any untoward event, that most surnames are not derived from a single ancestor, that DNA studies are never 100% proof of anything, and that some NPE branches of a surname may be older than branches that are not NPEs.  The heritage of a surname can be shared by all its branches.  For inspiring examples of how genealogical research can resolve NPE test results see the accompanying Supplementary Paper No.8 and, if you can get hold of a copy, Richard Hill's fascinating book Finding Family.

2.7.     Singletons, Mismatches, False Positives and Convergence/Back mutations.   Participants with the surname Irwin or similar who do not (yet) closely match any other participant are considered to be "Singletons".  This is, of course, hopefully only a temporary status!

Participants with a surname dissimilar to Irwin who have a close match with an Irwin at low STR resolutions but who fail to make the TiP Score 95% cut-off at higher STR resolutions, or whose SNP test results are not compatible with those of the Irwin participants with whom they otherwise match, are termed False Positives.

False Positives are one example of Convergence, a term used in genetic genealogy to describe the process whereby two different genetic signatures have mutated over time - experienced "back mutations" - to become identical or near identical, resulting in an accidental or coincidental match.  Many of the "Matches" identified on the FTDNA Y-DNA "Matches" web pages that have different surnames can be explained by convergence.  Convergence is more likely at lower resolutions (1-12 or 1-25 markers) than at higher ones (1-67 or 1-111 markers).
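The grouping rules described in sections 2.5 to 2.7 can be summarised, very roughly, as a decision rule.  The Python sketch below is my simplification and not the procedure actually followed in this Study: it looks only at surname and TiP Score, ignores SNP evidence and the pattern of surnames among a man's Matches, and uses hypothetical names throughout.

```python
from typing import Optional

FAMILY_TIP_SCORE = 0.60  # cut-off for participants with the surname Irwin or similar
NPE_TIP_SCORE = 0.95     # stricter cut-off for participants with other surnames


def classify(irwin_like_surname: bool, best_tip_score: Optional[float]) -> str:
    """Label a participant from his best TiP Score against any modal participant.

    best_tip_score is None when he matches nobody in the Study.
    Simplification: SNP results and the surnames of his Matches are ignored.
    """
    if irwin_like_surname:
        if best_tip_score is not None and best_tip_score > FAMILY_TIP_SCORE:
            return "genetic family member"
        return "Singleton"
    if best_tip_score is None:
        return "not included"
    if best_tip_score >= NPE_TIP_SCORE:
        return "NPE"
    return "False Positive"


print(classify(True, 0.72))   # Irwin with a TiP Score over 60%
print(classify(True, None))   # Irwin matching nobody
print(classify(False, 0.97))  # other surname passing the 95% cut-off
print(classify(False, 0.80))  # other surname failing the 95% cut-off
```

The asymmetry between the two cut-offs reflects the extra caution applied to men whose surname gives no independent support for the match.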

2.8    FTDNA's Y-DNA Matches pages.   These pages have been prepared by FTDNA to help their customers understand the results of their yDNA tests.  Participants in a well-developed surname project such as this are better served by our Main Results table, which shows the participants to whom they are most closely matched, but some explanation of the Matches pages is in order. The following points are relevant:
- A participant's "Matches" are identified by their e-mail address but not by their kit number.  For privacy reasons neither FTDNA nor administrators ever include e-mail address and kit number at the same time.  "Matches" pages help matching participants to contact each other, but for reasons I explain below this exercise is unlikely to be profitable, and will typically result in a very poor response level as recipients soon tire of unsolicited and ill-founded approaches.
- "Matches" can be identified at different levels of resolution (i.e. 12, 25, 37, 67 or 111 markers), up to the level of the participant concerned.  Thus if he has tested to 37 markers he cannot have matches at 67 or 111 markers.
- "Matches" are ranked by Genetic Distance (see above).  "Matches" at 12 marker level include participants with GDs of 0 and 1; at 25 marker level they include GDs of 0, 1 and 2; at 37 marker level they include GDs of 0, 1, 2, 3 and 4; at 67 marker level they include GDs of 0, 1, 2, 3, 4, 5, and 6; at 111 markers they include GDs of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
- These "cut-off" GDs of 1, 2, 4, 6 and 10 are arbitrary.  Participants with higher GDs may be related within the surname era, but the probability of this is lower and these participants are not listed on FTDNA's Matches pages - they are "false negatives".  For example our Study has identified several men who have tested L555 positive but have GDs of 5/37 or 6/37 from the L555 modal STR values.  We even have examples of two men who are L555 positive but have a GD of 13/37 from one another.
- Similarly participants who are listed as "Matches" (i.e. have GDs within the cut-offs of 1, 2, 4, 6 or 10) are not necessarily related within the surname era.  This is especially likely if the surnames are dissimilar, or if their haplogroups are different.  These "false positives" occur because of convergence (see above).
- Conversely "Matches" of participants with dissimilar surnames may be true matches disguised by a NPE (see above) in the ancestry of one of the participants.
- Any individual participant will obviously have many more "Matches" at 12 markers than at 111 markers.  Indeed "Matches" at 12 markers are usually best ignored.  In theory if another participant is not a "Match" at a given level he will not be a "Match" at a higher level.  In practice a "Match" can occasionally appear at a higher level due to convergence.
- The number of "Matches" that individual participants have can vary widely.  Few or even no "Matches" may be listed at higher resolutions simply because few or no participants have (yet) tested to that level.  The number of Matches at mid-level, say at 37 markers, will vary from participant to participant depending on the number of participants with similar surnames who have tested.  Participants with a large number of "Matches" with other participants with similar surnames will be members of a large genetic family.
- The number of Matches at mid-level, say at 37 markers, will also vary from participant to participant depending on how close their STR signature is to that of a popular haplogroup such as M222, U106, M269 etc.  Such individuals may have hundreds of "Matches", but many of these will be False positives, i.e. with surnames dissimilar to the participant's own and also dissimilar to each other.  This contrasts with a NPE "Match", which is typically characterised by his "Matches" being limited to two surnames.
- The number of "Matches" a man has is constantly increasing as more men take YDNA tests.  
- Small Genetic Distances alone (the basis of FTDNA's "Matches") should not be seen as indicative of a close genealogical relationship;  other evidence such as both similarity of surname AND some geographical connection or shared common ancestor should be determined before attempting to contact a "Match".  This is especially relevant to members of large genetic families such as the Border Irwins, at least until SNP evidence suggests such a relationship.

2.9     Caution.  Prospective participants should therefore be aware that some DNA test results have unexpected implications.  Disappointments can occur for several reasons:
  • if there has been a NPE in the paternal ancestral line;
  • if the results contradict some cherished genealogical research or tradition;
  • if there has been a mutation in recent generations and two known relatives have different DNA signatures (while most fathers, sons, brothers and first cousins have identical DNA signatures, a few have mismatches of 1/37 or even 2/37);
  • if the comparisons are indeterminate, e.g. if a participant appears genetically unrelated to anyone else in the study;
  • if the test does not lead to identifying any "new" genealogical relatives (because few surname DNA studies have sampled more than 1% of those with the surname who are alive today).
Notwithstanding these contingencies over 90% of participants in our Study have been shown to be in one of the various genetic families that have been identified.

3.  SNP test results and haplotrees

3.1      SNPs. 

SNPs (pronounced "snip"s) are another form of analysing yDNA samples.  An analogy is that STRs identify the leaves of a tree, but SNPs identify the twigs, and sub-clades and haplogroups identify the main branches.  Although ySTR tests and ySNP tests complement one another, STRs are liable to relatively frequent mutations, whereas SNP mutations are much more stable, and so analyses based on SNP data are much more reliable.  But as each man inherits hundreds of SNPs, and the relative ages of the mutations that gave rise to the younger SNPs are not readily apparent, these analyses are not straightforward.

SNP test results are very different from STR test results.  In single SNP tests and (multiple) SNP Pack tests the saliva sample simply tests positive or negative, e.g. L555+ or L555-, i.e. it is binary, and not probabilistic.  Next Generation Sequencing (NGS) tests such as BigY are much more sophisticated and comprehensive.  For more details of SNP tests see Supplementary Paper 5. 

The nomenclature used in describing SNP test results can be confusing.  SNP tests are usually identified by an alphanumeric such as L21 or L555.  L555 or L555+ indicates the test was positive for this SNP, L555- means it was negative.  Negative test results are usually not reported as they are so numerous.  The prefix letter indicates the laboratory/individual which/who has named the SNP, e.g. SNPs prefixed “L” were named by FTDNA (who confusingly also use the prefixes BY and FT).  Other organizations use different prefixes (for readers curious about these prefixes see ).  Confusingly many SNPs have synonyms, e.g. L21 = M529, L555 = S393.  SNPs are also known by their location on the human genome, e.g. L555 aka S393 is located at position 7779294.   See for a full list of SNPs with their synonyms and locations on the human genome.  FTDNA no longer "name" Private SNPs (see below) but refer to them by their position on the human genome.

As SNPs are very similar to haplogroups (to the lay genealogist these two terms may be considered synonyms) they used to be given names such as R1b1a2a1a2c or R1b-L21, which are both further synonyms for L21.  

A further nomenclature challenge is the meaning of terms such as "Variants", "known SNPs", "private SNPs" and "terminal SNPs", not least because as more and more SNPs are discovered these labels will change.  Private SNPs were sometimes used to relate to those specific to a surname, and terminal SNPs to the most recent/youngest known SNP.  FTDNA are now using the term "Terminal SNP" to denote the most recent SNP that is shared by two or more BigY testees, and "Private SNPs" as those SNPs not shared by any other BigY testee.  Of course these demarcations can change as more individuals take the BigY test, and may change as the FTDNA haplotree is refined (see below).  Because of lack of definition and the inherent instability of these terms I prefer not to use them, though the relative terms upstream SNPs and downstream SNPs are sometimes useful - see below.

3.2       Haplotrees.  All SNPs can be placed on a haplotree (aka phylogenetic tree), a genetic family tree that goes back to the genetic Adam.  Until recently haplotrees were mainly relevant to genetic anthropologists and others interested in ethnicity/Deep Ancestry studies, and their relevance to genetic genealogists in general and to this Study in particular had been very limited.  However since about 2010 SNPs and haplotrees have become increasingly relevant to genetic genealogists as more and more SNPs are discovered and haplotrees expand downstream towards and now even into the surname era, i.e. the last millennium.  SNPs have been identified from mutations that occurred during the 20th century.  Downstream haplotrees such as the L555 Border Irwin haplotree (see below) are "at the cutting edge" and are throwing new light on many genealogical challenges.

The position of a SNP on a haplotree may be indicated in various ways, thus L21 may be termed R-L21 or R1b-L21, where R is the haplogroup and R1b is the sub-clade.  Or some call R1b the haplogroup and R1b1a2a1a2c the sub-clade that defines L21.  The latter hierarchical form is logical but both clumsy and liable to be changed, so more descriptive forms such as R>M343>M269>P312>L21 are now more popular.  Thus L555 may be termed R1b>M343>M269>P312>L21>Z251>L555, or simply as R1b-L555, or some intermediate description if preferred.
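The descriptive form lends itself to simple handling by computer.  The following Python sketch, with function names of my own choosing, splits such a path into its SNP names, derives the short label, and tests whether one SNP lies upstream of another on the same path.  It assumes both names appear in the path given and makes no attempt to handle synonyms such as L21 = M529.

```python
def parse_path(path: str) -> list:
    """Split a descriptive path such as 'R1b>M343>M269' into its names, oldest first."""
    return [name.strip() for name in path.split(">")]


def short_label(path: str) -> str:
    """Short form made from the first and last names, e.g. 'R1b-L555'."""
    names = parse_path(path)
    return f"{names[0]}-{names[-1]}"


def is_upstream(path: str, older: str, younger: str) -> bool:
    """True if 'older' appears before 'younger' on the path.

    Raises ValueError if either name is not on the path.
    """
    names = parse_path(path)
    return names.index(older) < names.index(younger)


L555_PATH = "R1b>M343>M269>P312>L21>Z251>L555"

print(short_label(L555_PATH))
print(is_upstream(L555_PATH, "L21", "L555"))
```

Here L21 is upstream of L555, and L555 is downstream of every other SNP on the path.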

SNPs are being discovered so frequently (and their relationships to one another occasionally revised) that alas there is no single, up-to-date haplotree, and if there were it would be too cumbersome to replicate graphically.  Several haplotrees are relevant to this Study:

- FTDNA's haplotree (at their personal webpage/account under Y-DNA > Haplotree & SNPs) used to be very outdated, but since spring 2016 has been much more comprehensive, especially for SNPs they have "discovered" themselves.  SNPs that have tested positive are shown in green, SNPs that have tested negative are shown in red.  Untested SNPs are shown in orange (upstream, presumed positive), grey (upstream, presumed negative) or blue (downstream).

NB On FTDNA's public pages (e.g. ) and on the main results table of this Study, haplogroups confirmed by SNP testing are shown in green, haplogroups predicted from STR data are shown in red.  FTDNA's haplogroup predictions are very reliable.

- ISOGG's haplotree (at ) is more comprehensive but less up-to-date and excludes many downstream aka Private SNPs.  Like the FTDNA haplotree this is presented as a table, with the oldest SNPs on the left, the younger "sons" and "grandsons" successively indented towards the right.

- Alex Williamson's excellent Big Tree (at ) is restricted to P312 and its downstream SNPs (including L555), but includes few SNPs identified by SNP Pack tests.  This haplotree is presented more like a conventional family tree, with the oldest SNPs at the top, and successive "sons" and "grandsons" below.

- The Clan Irwin haplotree (at LATEST ANALYSIS UPDATE) is edited (in BigTree format) to show only the haplotree branches relevant to the 40+ Genetic families identified in this Study.  This includes the Border Irwin L555 SNP, but no details downstream thereof. 

- The Border Irwins L555 haplotree (a downstream amplification of the Clan Irwin haplotree) is now shown in two formats: (1) within the Master Results table in the LATEST RESULTS TABLE (FTDNA format), and (2) in the Border Irwins section of LATEST ANALYSIS UPDATE (BigTree format).  The latter is now extended to include testees who can be included because of L555 Pack test results, single SNP test results, STR data, Family Finder connections or genealogical relationships.


3.3      TMRCAs.  The number of true generations separating each "son"/"grandson" on a haplotree varies greatly, depending on when relevant mutations occurred.  On average one BigY500 SNP mutation occurred every 125 years, or about once every 4 generations, and one BigY700 SNP occurred every 84 years, or about once every 3 generations.  However for any individual the mutations of the SNPs they have inherited may be separated by 15 or more generations, or by just one generation (our Study has an example of two brothers, of whom only one inherited his very own Private SNP!).  TMRCAs calculated for individual branch lines on the basis of average SNP mutation rates are more reliable than those based on average STR mutation rates, but are still unreliable, even when averaged over several lineages within a particular surname branch.
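To show how an average rate turns SNP counts into an age, here is a minimal Python sketch.  Averaging the counts on two lines below their shared ancestor is my assumption about how such an estimate might be made, not a description of FTDNA's or this Study's method, and the SNP counts in the example are hypothetical.  Given the variation between individuals noted above, the result should be read as a very rough central figure.

```python
# Average years per SNP mutation quoted in the text for each test.
YEARS_PER_SNP = {"BigY500": 125, "BigY700": 84}


def estimate_tmrca_years(snps_line_a: int, snps_line_b: int, test: str) -> float:
    """Rough age of a shared ancestor from SNP counts on two descendant lines.

    Assumption: the two counts are simply averaged and multiplied by the
    average years per SNP for the test taken by both men.
    """
    average_snps = (snps_line_a + snps_line_b) / 2
    return average_snps * YEARS_PER_SNP[test]


# Hypothetical example: 6 and 10 SNPs below the shared ancestor, both BigY700.
print(estimate_tmrca_years(6, 10, "BigY700"))
```

The same counts under the BigY500 rate would give a noticeably older date, which is one reason results from the two tests should not be mixed without adjustment.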

3.4    FTDNA's BigY Matches data.  I personally find this confusing and refer L555 testees to the results shown in this Study.


4.  Next Generation Sequencing (NGS) e.g. BigY test results
These tests are expensive and complex.  They give much more comprehensive STRs and SNPs.  For details see Supplementary Paper 5.

5.  FTDNA database size and test kit prefixes
The number of Matches any DNA test kit can have is in part a function of the size of the relevant database of the testing company concerned.  For yDNA tests FTDNA's database is by far the largest in the world, which is one of the reasons this Study recommends members take a FTDNA yDNA test.  FTDNA do not publish the size of their databases, but a quantitative estimate of their direct-to-customer DNA database was attempted by Martin McDowell in February 2020 using his awareness of FTDNA's kit numbering system.  He found the following:
- non-prefixed kits:  925,000
- IN (International) kits:  84,000
- MK (Multi-kit orders, USA):   67,000
- MI  (Multi-kits, international):  54,000
- AM (Amazon orders):  32,000
- N (transfers from National Genographic):  271,000
- B (transfers from other testing companies):  612,000
- 27 other letter prefixes: 71,000
Some of these kits have not been used, but he estimated that at that date FTDNA's database exceeded 1,700,000.  By 2021 it probably exceeds 2 million.
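The figures above can be totalled with a short Python sketch.  The dictionary keys are my own labels for the prefixes listed; the counts are those given in the list, and the fraction calculated is only a lower bound because the database is said to exceed 1,700,000.

```python
# Kit counts by prefix, as listed above (February 2020 estimate).
KITS_BY_PREFIX = {
    "none": 925_000,
    "IN": 84_000,
    "MK": 67_000,
    "MI": 54_000,
    "AM": 32_000,
    "N": 271_000,
    "B": 612_000,
    "other": 71_000,
}

total_kits = sum(KITS_BY_PREFIX.values())
print(total_kits)  # 2116000 kit numbers issued

# Lower bound on the share of issued kits in the database, taking the
# "exceeded 1,700,000" estimate at its minimum.
used_fraction = 1_700_000 / total_kits
print(round(used_fraction, 2))
```

The total of kit numbers issued is therefore somewhat over two million, of which at least about four fifths were estimated to be in the database.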

FTDNA's yDNA Matches pages draw on all their kits, irrespective of any prefix, although for technical reasons Matches of some transfers from other testing companies may be limited.

6.  Further guidance on understanding test results