Including your data in this analysis

Can you include my data?

Am I eligible? Currently, our formal efforts are restricted to Y-DNA testers who have taken a BigY test at Family Tree DNA and are positive for a clade within R-U106. However, we are now (November 2017) working on a solution that will allow other next-generation sequencing tests to be included (e.g. YElite, WGS) and extend our analysis to the wider R-M269 community. To participate, see our instructions for downloading your data and uploading it to the Data Warehouse.

Privacy of your data and informed consent

Your data security and privacy are important. The data agreement you will be entering into by participating in this project can be found on the data upload page. The information below will help guide you through some of the implications of submitting your data, and the options you have to safeguard your privacy. It also describes our standard procedures, and the logic behind the ethical decisions we have made regarding your data privacy. It is important that you understand these before submitting your raw data, or submitting raw data on someone else's behalf. Note that submission to this project is not covered by Family Tree DNA's privacy policy.

Key points:

General concerns about Y-DNA testing:

DNA testing sometimes uncovers family histories that do not match historical records. There are a variety of reasons behind this, and they can include hidden adoptions or cuckoldries. You should consider these carefully when undertaking any DNA testing. Any detailed analysis of your DNA may identify these, but if they have not been brought to light already, their presence is unlikely to be found from our analysis.

Aside from this, you should be aware that any copy of data placed on an internet site may be subject to malicious attacks, including hacking. Safeguards are meant to prevent this from happening, but can never be completely secure. There is therefore a small but unavoidable risk that data may be stolen.

What does my raw data contain?

For BigY testers, your raw data contains three files: variants.vcf, regions.bed and a readme file. The variants.vcf file contains the Y-DNA SNPs that are called positive or negative in your data. This largely overlaps with the "known variants" and "novel variants" pages on your BigY results, and also contains some quality information. The regions.bed file contains the regions of your Y chromosome where an ancestral or derived (negative/positive) call was made. These data do not contain any information about autosomal DNA, mitochondrial DNA, or exome DNA, which is purposely removed by Family Tree DNA before we receive the results. The data can identify your family to within a few hundred years, but can very rarely identify you as a person without explicit testing of your close male-line relatives.
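As an illustration of how these two files relate, here is a minimal Python sketch. It assumes the files follow the standard conventions for their formats: tab-separated VCF data lines with a 1-based position in the second column, and BED lines with 0-based, half-open start and end coordinates. It also assumes the BED regions do not overlap. The function names and the sample values in the usage note are hypothetical and are not taken from any real kit.

```python
from bisect import bisect_right


def parse_bed(lines):
    """Return sorted (start, end) intervals from BED lines (0-based, half-open)."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end = line.split("\t")[:3]
        regions.append((int(start), int(end)))
    return sorted(regions)


def parse_vcf_positions(lines):
    """Return {1-based position: (ref, alt)} from VCF data lines."""
    variants = {}
    for line in lines:
        # Lines starting with '#' are VCF metadata or the column header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        variants[int(fields[1])] = (fields[3], fields[4])
    return variants


def is_covered(regions, pos):
    """True if the 1-based position lies inside one of the BED intervals.

    Assumes the intervals are sorted and do not overlap.
    """
    pos0 = pos - 1  # convert to the 0-based coordinate used by BED
    i = bisect_right(regions, (pos0, float("inf"))) - 1
    return i >= 0 and regions[i][0] <= pos0 < regions[i][1]
```

For example, with the made-up region line `"chrY\t100\t200"`, `is_covered(parse_bed(["chrY\t100\t200"]), 150)` is true, while position 201 falls outside the region. A position that is absent from the regions file has no call at all, which is different from a negative call.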

Testers with other companies (e.g. YSeq WGS and FGC YElite/WGS) will have similar sets of results. Whole Genome Sequencing (WGS) tests will contain autosomal, mitochondrial and (depending on the test) exome DNA results. These are not relevant to us, so we remove them from the analysis and do not make them public.

Does my data contain any medically relevant information?

There should be no medically important information in most people's data. Medically important mutations occur very rarely on the Y chromosome, where most genes have lost their functionality. With few exceptions, reliable Y-linked medically relevant mutations are so far limited to male infertility. If the tester has fathered a child, then the potential for medically relevant information on the Y chromosome is extremely limited. If not, Family Tree DNA's policy is to let you know privately if any known fertility-related mutations have occurred (e.g. a complete DYS464 deletion).

How will my data be stored?

We have recently (October 2017) had to move our data storage to James Kane's Data Warehouse. This is a secure archive, accessible only to a small list of haplogroup administrators. At any point, however, circumstances may dictate that we have to opt for a different solution. The conditions of the data policy would be upheld if this were to occur.

What personal information will be used?

Our normal policy is to only use two pieces of personal information: your most-distant known ancestor's (MDKA) surname, and your kit number. There are several motivators behind this. Your kit number allows your entry to be connected back to your ancestral information and STR profile at Family Tree DNA in a way that reduces errors in the analysis. Your MDKA surname is a piece of information that can be used to check we have the right match (e.g. we haven't made a typo in the kit number). Surnames are also useful indicators of close relationships.

Can my information be included more anonymously?

Yes. We have several kit owners who do not want to be identified by surname and/or kit number. In these cases, we can obfuscate either piece of information. Our normal procedure is to replace surnames with geographical points of origin (e.g. "Spain"), and replace kit numbers with the haplogroup that person is in. We prefer not to do this if possible, for the cross-checking purposes outlined above: data uploaded in this way ends up truly anonymous, as not even the U106 project's administrators will be aware of who it belongs to.
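The substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual tooling: the function name `public_label`, the record fields (`mdka_surname`, `kit_number`, `origin`, `haplogroup`) and the sample values are all hypothetical.

```python
def public_label(record, hide_surname=False, hide_kit=False):
    """Return the two identifiers that would be shown for one tester.

    A hidden surname is replaced by a geographical point of origin, and a
    hidden kit number is replaced by the tester's haplogroup.
    """
    surname = record["origin"] if hide_surname else record["mdka_surname"]
    kit = record["haplogroup"] if hide_kit else record["kit_number"]
    return {"surname": surname, "kit": kit}


# Hypothetical record; none of these values belong to a real tester.
example = {
    "mdka_surname": "Smith",
    "kit_number": "N00000",
    "origin": "Spain",
    "haplogroup": "R-U106",
}
```

Calling `public_label(example, hide_surname=True, hide_kit=True)` keeps only the origin and the haplogroup, so nothing in the result can be cross-checked against a Family Tree DNA account.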

Can this data be used to identify me?

Normally, only if you choose to be identified. Since the data only includes Y-chromosomal information, it can be used to identify your family if another person with the same surname tests. Further testing can be used to refine that, but you can only be identified from your DNA as an individual if every other male line within the last handful of generations can be ruled out by death or direct DNA testing. Of course, we cannot prevent you being identified by other methods. For example, your kit number is linked to your account at Family Tree DNA. Anyone with access to that account can readily identify you as an individual. Normally that is restricted to Family Tree DNA (Gene by Gene) staff and volunteer administrators, who are bound by Family Tree DNA's privacy policy.

Informed consent for testees who are not the kit holder

Kit holders must take responsibility for obtaining informed consent from the person whose DNA is being tested.

The bottom line

At the end of the day, the project administrators are known, named individuals. For us, the link between our genetic data and ourselves is plain for all to see. We have chosen to put our genetic data in the public domain because we consider the risks associated with this to be very small, and the return in understanding of our origins to be much greater. We hope, having understood the risks and benefits, that you will feel the same.