Privacy of your data and informed consent


Your data security and privacy are important. The data agreement you will be entering into by participating in this project can be found on the data upload page. The information below will help guide you through some of the implications surrounding submitting your data, and what options you have to safeguard your privacy. It also describes our standard procedures, and the logic behind the ethical decisions we have made regarding your data privacy. It is important that you understand these before submitting your raw data, or submitting raw data on someone else's behalf.

Key points:

At the end of the day, the project administrators are known, named individuals. For us, the link between our full genetic data and ourselves is plain for all to see. We have chosen to put our genetic data in the public domain because we consider the risks associated with this to be very small, and the return on the understanding of our origins to be much greater. We hope, having understood the risks and benefits, that you will feel the same.

Disclaimer: The information here does not represent legal or medical advice. It is prepared in good faith for informative purposes, and is complete within the bounds of conciseness and of our personal knowledge.

Specific concerns and FAQs

Legal protection

Data content

Use of personal data

How will my data be stored and processed?

Centralised data storage takes place in the United States. Data will be stored in the Data Warehouse: this is a secure, password-protected archive. Access is enabled for only a small list of haplogroup administrators. At any point, however, legal or technical conditions may dictate that we have to opt for a different solution. The conditions of the data policy would be upheld if this were to occur.

Data processing for this project will take place in the United Kingdom and United States. However, the centralised data storage means it will be available to a selected few other administrators (currently [11 Mar 2018] US-based James Kane and Alex Williamson). Additional administrators, who will also have access to raw data, may join in future from other countries.

Submitters give consent, via the data submission form, for anonymised, processed data to be sent to third parties. These are selected by the project administrators, with the expectation (and, depending on circumstances, legal requirement) that these are limited to people with a genuine research interest.

What legal protection does my data have?

The rights and protection afforded to individuals depend on the country of origin of the data, the country in which the data is being stored and processed, and the country from which it is transmitted. Additional institutional statutes may apply, as well as the agreement consented to in our privacy policy. Note that submission to this project is not covered by Family Tree DNA's privacy policy, as you are choosing independently to make this data available.

UK and EU law: The UK Data Protection Bill (DPB) and EU General Data Protection Regulation (GDPR) are among the laws covering the treatment of personal and genetic material within the United Kingdom and European Union, respectively, and the latter will continue to be part of UK law following the UK's exit from the EU.

UK and EU users should be aware that the Haplogroup R Data Warehouse is run by James Kane, not directly by us, and that they are sending data for storage in the United States. This archive is open to other researchers (currently [11 Apr 2018] only Alex Williamson and Jef Treece) who are based in the US: EU/UK laws may not apply (or may not apply in full) to data that the user personally exports outside the EU. While we aim to abide by the full principles and guidance of the GDPR and DPB, as "hobby" projects (i.e. not necessarily under the "large scale" definition of the GDPR), our liability under the GDPR, the DPB, and the associated US implementations via the Privacy Shield Programme may be limited.

US law: In the US, laws may be applicable on either the national or state level (see a lay summary on privacy law and genetic data). The US has a less-well-developed body of legislation guaranteeing personal privacy than the UK or EU: fewer than half of states even require written consent to disclose genetic information.

Note that some US law-enforcement agencies are using genetic data from genealogical databases (e.g. GedMatch) to trace victims or perpetrators of crime. It is unlikely that our database would be used for such a purpose, but we may theoretically, under certain circumstances, be requested to co-operate with law enforcement agencies in any country on such grounds. Such requests will be considered individually, with due consideration to the balance of users' privacy, national and international law, and the rights of (and due respect for) victims of crime.

In the event of a problem: Regardless of the legal situation, if you feel your data security has been (or may be) compromised by this project, we would welcome correspondence. Security issues regarding data storage in our Warehouse may be addressed to James Kane; issues regarding data processing and release may be addressed to Iain McDonald. Data can be edited or fully removed from our storage and analysis facilities and, on receipt of a request to do so, we will aim to do this promptly.

Informed consent for testers and kit holders

Informed consent is taken through acceptance of our data policy. For this reason, we cannot accept data e-mailed directly to us unless it is accompanied by a statement acknowledging consent to this policy.

If you are uploading genetic data that is not yours, as a kit holder you must take responsibility for obtaining informed consent from the person whose DNA is being tested, unless that person is deceased. Informed consent can only be given by persons above the age of legal consent and with mental capacity to manage their own affairs. (See EU General Data Protection Regulation (GDPR) Article 7, Article 9 Section 2(b); UK Data Protection Bill (DPB) Article 84(2), among others.)

General concerns about Y-DNA testing

DNA testing in general sometimes uncovers family histories that do not match historical records. There are a variety of reasons behind this, and they can include hidden adoptions or cuckoldries. Most commonly, these manifest themselves as a change of surname in the genetic lineage. You should consider these carefully when undertaking DNA testing in general. Any detailed analysis of your DNA may identify these. If your research has not brought these to light by this stage, their presence is unlikely to be found from our analysis. However, we can advise on specific circumstances before you submit your data to our project.

Furthermore, testers must take some personal responsibility for the secrets in their own family. This may include situations where revealing their own identity could impact on the lives of others in their family and beyond, e.g. through adoptions, infidelity, etc. Please be considerate of others when publicising the results of your own DNA.

Aside from this, you should be aware that any copy of data placed on an internet site may be subject to malicious attacks, including hacking. Safeguards are in place to prevent this from happening, but no system can ever be completely secure. There is therefore a small but unavoidable risk that data may be stolen.

What does my raw data contain?

For BigY testers, your raw data contains three files: variants.vcf, regions.bed and a readme file. The variants.vcf file contains the Y-DNA SNPs that are called positive or negative in your data. This largely overlaps with the "known variants" and "novel variants" pages on your BigY results, and also contains some quality information. The regions.bed file contains the regions of your Y chromosome where an ancestral or derived (negative/positive) call was made. These data do not contain any information about autosomal DNA, mitochondrial DNA, or exome DNA, which is purposely removed by Family Tree DNA before we receive the results. The data can identify your family to within a few hundred years, but can very rarely identify you as a person without explicit testing of your close male-line relatives.
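If you would like to check for yourself what these two files contain before uploading, a short script along the following lines may help. It is only a sketch: it assumes the files follow the standard VCF and BED text layouts (tab-separated, with the chromosome name in the first column), and the sample lines, positions and function names are invented for illustration rather than taken from any real kit or from our own tools.

```python
def records_per_chromosome(vcf_lines):
    """Count variant records per chromosome in the lines of a VCF file."""
    counts = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first VCF column
        counts[chrom] = counts.get(chrom, 0) + 1
    return counts


def covered_bases(bed_lines):
    """Sum the lengths of BED intervals (start is 0-based, end is exclusive)."""
    total = 0
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue  # skip blank, comment and track/browser lines
        total += int(fields[2]) - int(fields[1])
    return total


# Invented sample content, standing in for variants.vcf and regions.bed.
sample_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chrY\t2649696\t.\tG\tA\t99\tPASS\t.\n",
    "chrY\t2650701\t.\tC\tT\t99\tPASS\t.\n",
]
sample_bed = [
    "chrY\t2649000\t2651000\n",
    "chrY\t2700000\t2700500\n",
]

print(records_per_chromosome(sample_vcf))  # only a Y-chromosome entry is expected
print(covered_bases(sample_bed))           # total number of bases with a call
```

To run this on your own data, pass an open file instead of the sample lists, e.g. `with open("variants.vcf") as fh: print(records_per_chromosome(fh))`. If any chromosome other than Y appears in the output, the file contains more than Y-DNA.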

Testers with other companies (e.g. YSeq WGS and FGC YElite/WGS) will have similar sets of results. Whole Genome Sequencing (WGS) tests will contain autosomal, mitochondrial and (depending on the test) exome DNA results. These are not relevant to us, so we remove them from the analysis and do not make them public.
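For illustration only, removing non-Y results from a WGS file in VCF format could look something like the sketch below. This is not a description of our actual processing pipeline: the function name, the sample records and the assumption that the Y chromosome is labelled "Y" or "chrY" are all ours, chosen for the example.

```python
# Labels commonly used for the Y chromosome in VCF files (an assumption;
# check the header of your own file for the naming it uses).
Y_NAMES = {"Y", "chrY"}


def keep_y_only(vcf_lines):
    """Yield header lines, plus only those records whose CHROM is the Y chromosome."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
        elif line.split("\t", 1)[0] in Y_NAMES:
            yield line  # keep Y-chromosome records; everything else is dropped


# Invented sample records: one autosomal, one mitochondrial, one Y-chromosomal.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.\n",
    "chrM\t150\t.\tC\tT\t60\tPASS\t.\n",
    "chrY\t2781653\t.\tG\tA\t70\tPASS\t.\n",
]

kept = list(keep_y_only(sample))
for line in kept:
    print(line, end="")
```

Testers who prefer not to send autosomal or mitochondrial results at all could apply a filter of this kind to their own file before uploading.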

Does my data contain any medically relevant information?

There should be no medically important information in most people's Y-DNA data. Medically important mutations occur very rarely on the Y chromosome, where most genes have lost their functionality. With few exceptions, medically relevant mutations reliably linked to the Y chromosome are thought to be limited to male infertility. If the tester has fathered a child, then the potential for medically relevant information on the Y chromosome is extremely limited. If not, it is our understanding that Family Tree DNA's policy is to let you know privately if any known fertility-related mutations have occurred (e.g. a complete DYS464 deletion). Hence, while we cannot provide absolute certainty, to the best of our knowledge, most people's submitted Y-DNA will have negligible medical relevance.

What personal information will be used?

Our normal policy is to only use two pieces of personal information as identifiers: your most-distant known ancestor's (MDKA) surname, and your kit number. Any information you supply in the free-text upload to describe your MRCA or other information may also be used.

This balances the legal* and ethical need to anonymise data (see Can this data be used to identify me?) with the need for users to identify themselves and genealogically relevant matches in the data. A contact e-mail address is collected by the Warehouse administrator for the purposes of data administration (see privacy policy). It may be made available to matching kit holders on request, but only with the uploader's consent. (*EU General Data Protection Regulation (GDPR) Article 1, Section 156; Article 5, Section 1(e); among others).

Our reason for using your kit number is that it allows your uploaded data to be connected back to your ancestral information and STR profile at Family Tree DNA. This provides an independent check for all administrators and users that the data being uploaded are assigned correctly to an individual, minimising errors, and allows people to identify relevant genetic information about the people in their haplogroup. The MDKA surname provides an additional check (e.g. to make sure we haven't made a typo in the kit number anywhere), and acts as a useful indicator of close relationships to other testers looking for matches. Geographical information about your MDKA is used to map historical migrations in a statistical sense. This data can be anonymised (see Can my information be included more anonymously?).

Can my information be included more anonymously?

Yes. We have several kit owners who do not want to be identified by surname and/or kit number. In these cases, either piece of information can be obfuscated on upload by the user. We normally request that surnames are replaced by geographical countries or regions of origin (e.g. "Spain"), and kit numbers replaced with the lead SNP of the haplogroup assigned by FTDNA (e.g. "R-S4004"). Data uploaded in this way ends up truly anonymous, as not even the project's administrators will be aware of who it belongs to. However, we prefer that users do not do this if possible: DNA testing works by comparison to others, so if this information isn't shared, we cannot guarantee to provide such accurate information for you, and it seriously impacts the usefulness of your data for your genetic matches (for more information on why, see What personal information will be used?).
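As a rough illustration of the substitution described above, the sketch below replaces a surname with a region and a kit number with a lead SNP. The obfuscation itself is done by the user when filling in the upload form; the `KitLabel` class, the `obfuscate` function and the kit number and surname shown are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KitLabel:
    """The two identifiers normally attached to an upload (hypothetical structure)."""
    kit_number: str
    mdka_surname: str


def obfuscate(label, region=None, lead_snp=None):
    """Return a copy with the surname and/or kit number replaced by the substitutes given."""
    return replace(
        label,
        kit_number=lead_snp if lead_snp else label.kit_number,
        mdka_surname=region if region else label.mdka_surname,
    )


original = KitLabel(kit_number="123456", mdka_surname="Smith")  # invented values

# Hide both identifiers, using the example substitutes from the text above.
fully_hidden = obfuscate(original, region="Spain", lead_snp="R-S4004")

# Hide only the surname and keep the kit number.
surname_hidden = obfuscate(original, region="Spain")

print(fully_hidden)
print(surname_hidden)
```

Either identifier can be replaced independently, which matches the "and/or" choice offered to kit owners.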

Can this data be used to identify me?

Normally, only if you choose to be identified. We take care to reduce personal information down to the bare genealogical information useful for our analysis (see What personal information will be used?). We only use and store Y-chromosomal information. We remove autosomal and mitochondrial data from WGS tests.

We ask that your kit is represented by an ancestral surname and your assigned kit number (see also Can my information be included more anonymously?). This information alone cannot identify you as an individual, but could be used to identify you if you create a paper trail leading back to it. If you are worried about being identified personally, you should take normal online data precautions to avoid creating a paper trail that leads back to personal information. An example might be posting your kit number or a family tree online, under a username that you use on social media, which can then connect you back to an organisation or institution. If you remain concerned by this, you can anonymise some of your information when it is input.

The likelihood that your genetic information alone could identify you as an individual is remote. Depending on the commonness of your Y-DNA haplogroup, the genetic information you supply could be linked to a branch of your family (e.g. as a descendant of a specific person): indeed, that is a specific goal of this project. However, you can only be securely identified as a specific person from this data if every other male line within the last handful of generations can be ruled out by death or direct DNA testing.

Note that your kit number is linked to your account at your testing company. Anyone with access to that account can readily identify personal information associated with you, which may include your name, and e-mail and physical addresses. For Family Tree DNA, that access is normally restricted to Family Tree DNA (Gene by Gene) staff and volunteer administrators, who are bound by Family Tree DNA's privacy policy.

Currently, there is normally negligible risk in associating your Y-DNA genetic data with you as a person, to the extent that the administrators of this project are happy to have that information made public, and we encourage you to take that as a guide. While different circumstances may apply to your own situation, we encourage users to learn about what information they are sharing and the relatively small risk they are exposing themselves to. The information you reveal here and elsewhere is your choice, and you retain the right for your submitted personal and genetic data to be removed entirely from this site.

Full statement on the UK Data Protection Bill (DPB) and EU General Data Protection Regulation (GDPR)

[This statement is undergoing internal review. Further information is available on request.]