Privacy of your data and informed consent

Summary

Your data security and privacy are important: the reputation of genetic genealogy, and the future discoveries we can make, rely on us treating your data ethically and in a way you are happy with. The data agreement you will be entering into by participating in this project can be found on the data upload page. The information below is intended as a guide to help you make an informed decision about whether to submit your data, and what options you have to safeguard your privacy. It also describes our standard procedures, and the logic behind the ethical decisions we have made regarding your data privacy. It is important that you understand these before submitting your raw data, or submitting raw data on someone else's behalf.

Key points:

You are asked to publicly release the Y-DNA mutations found in your test results. These include variants that might be private to your family or, in some cases, to yourself. For BigY testers, this information is already available to your matches.
You are asked to publicly identify your results by your kit number and ancestral family name, which we ask you to supply on submission. A facility exists to anonymise this data if you do not wish to provide these, but it is helpful to us if you do.
DNA test results can contain medically important information. However, Y-DNA test results are less likely to do so, except for specific fertility problems that can be easily identified.
The test and Warehouse have been designed to limit information that is of a personal nature, medically relevant, or important for any health or other insurance purposes.
DNA testing in general risks exposing family secrets. These risks are inherent in any test results you have already publicly released.

At the end of the day, the project administrators are known, named individuals. We have chosen to make our full Y-DNA data public. We have put it in the public domain because we consider the risks associated with this to be very small, and the return on the understanding of our origins to be much greater. We hope, having understood the risks and benefits, that you will feel the same.

Disclaimer: The information here does not represent legal or medical advice. It has had no approval by any ethical, medical or legal entity. It is prepared in good faith for informative purposes, and is complete only within the bounds of conciseness and of our personal knowledge.

Specific concerns and FAQs

How will my data be stored and processed?

Centralised data storage takes place in the United States. Data will be stored in the Data Warehouse: this is a secure, password-protected archive. Access is granted only to a small list of haplogroup administrators. At any point, however, legal or technical conditions may dictate that we have to opt for a different solution. The conditions of the data policy would be upheld if this were to occur.

Data processing for this project will take place in the United Kingdom and United States. However, the centralised data storage means it will be available to a selected few other administrators (currently [11 Mar 2018] US-based James Kane and Alex Williamson). Additional administrators, who will also have access to raw data, may join in future from other countries. This list may be updated without notification.

Submitters' consent for anonymised, processed data to be sent to third parties is given via the data submission form. These third parties are selected by the project administrators, with the expectation (and, depending on circumstances, legal requirement) that they are limited to named people with a genuine research interest.

What legal protection does my data have?

The rights and protection afforded to individuals depend on the country of origin of the data, the country in which the data is being stored and processed, and the country from which it is transmitted. Additional institutional statutes may apply, as well as the agreement consented to in our privacy policy. Note that submission to this project is not covered by Family Tree DNA's privacy policy, as you are choosing independently to make this data available.

UK and EU law: The UK Data Protection Bill (DPB) and EU General Data Protection Regulation (GDPR) are among the laws covering the treatment of personal and genetic material within the United Kingdom and European Union, respectively, and the latter will continue to be part of UK law following the UK's exit from the EU.

UK and EU users should be aware that the Haplogroup R Data Warehouse is run by James Kane, so they are sending data for storage in the United States. This archive is open to other researchers (currently [11 Apr 2018] only Alex Williamson and Jef Treece) who are based in the US: EU/UK laws may not apply (or may not apply in full) to data that the user personally exports outside the EU. While we aim to abide by the full principles and guidance of the GDPR and DPB, as "hobby" projects (i.e. not necessarily under the "large scale" definition of the GDPR), our liability under the GDPR, the DPB, and the associated US implementations via the Privacy Shield Programme, may be limited.

US law: In the US, laws may be applicable on either the national or state level (see a lay summary on privacy law and genetic data). The US has a less-well-developed body of legislation guaranteeing personal privacy than the UK or EU: fewer than half of states even require written consent to disclose genetic information.

Note that some US law-enforcement agencies are using genetic data from genealogical databases (e.g. GedMatch) to trace victims or perpetrators of crime. It is unlikely that our database would be used for such a purpose, but we may theoretically, under certain circumstances, be requested to co-operate with law-enforcement agencies in any country on such grounds. Such requests will be considered individually, with due consideration to the balance of users' privacy, national and international law, and the rights of (and due respect for) victims of crime.

In the event of a problem: Regardless of the legal situation, if you feel your data security has been (or may be) compromised by this project, we would welcome correspondence. Security issues regarding data storage in our Warehouse may be addressed to James Kane; issues regarding data processing and release may be addressed to Iain McDonald. Data can be edited or fully removed from our storage and analysis facilities and, on receipt of a request to do so, we will aim to do this promptly.

Informed consent for testers and kit holders

Informed consent is taken through acceptance of our data policy. For this reason, we cannot accept data e-mailed directly to us unless it is accompanied by a statement acknowledging consent to this policy.

If you are uploading genetic data that is not yours, as a kit holder you must take responsibility for obtaining informed consent from the person whose DNA is being tested, unless that person is deceased. Informed consent can only be given by persons above the age of legal consent and with the mental capacity to manage their own affairs. (See EU General Data Protection Regulation (GDPR) Article 7, Article 9 Section 2(b); UK Data Protection Bill (DPB) Article 84(2), among others.)

General concerns about Y-DNA testing

DNA testing in general sometimes uncovers family histories that do not match historical records, so-called non-paternity events or NPEs. There are a variety of reasons this can happen, and they can include hidden adoptions or cuckoldries. Most commonly, these manifest themselves as a change of surname in the genetic lineage. You should consider these carefully when undertaking DNA testing in general. Any detailed analysis of your DNA may identify these. If your research has not brought these to light by this stage, their presence is unlikely to be found from our analysis. However, we can advise on specific circumstances before you submit your data to our project.

Furthermore, testers must take some personal responsibility for the secrets in their own family. This may include situations where revealing their own identity could impact on the lives of others in their family and beyond, e.g. through adoptions, infidelity, etc. Please be considerate of others when publicising the results of your own DNA.

Aside from this, you should be aware that any copy of data placed on an internet site may be subject to malicious attacks, including hacking. Safeguards are in place to prevent this from happening, but no system can ever be completely secure. This leaves a small but unavoidable risk that data may be stolen.

What does my raw data contain?

For BigY testers, the raw (VCF) data we ask for contain only information on the Y chromosome and are not your complete (BAM) test results. These data consist of three files: variants.vcf, regions.bed and a readme file. The variants.vcf file contains the Y-DNA SNPs that are called positive or negative in your data. This is an unfiltered version of the "known variants" and "private variants" pages on your BigY results, and includes quality information on them. The regions.bed file contains the regions of your Y chromosome where an ancestral or derived (negative/positive) call was made. These data do not contain any information about autosomal DNA, mitochondrial DNA, or exome DNA, which is purposely removed by Family Tree DNA before we receive the results. The data can identify your family to within a few hundred years, but can very rarely identify you as a person without explicit testing of your close male-line relatives.
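
To make the contents of these two files more concrete, here is a minimal Python sketch of how a variant and a covered region could be read and compared. It is an illustration only, not the project's actual tooling. It assumes the standard tab-separated VCF and BED layouts; the names parse_vcf_line, parse_bed_line and is_covered, and the coordinates in the usage note, are hypothetical. One detail worth knowing is that VCF positions are 1-based, whereas BED intervals are 0-based and exclude their end coordinate.

```python
from typing import Iterable, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position, as defined by the VCF format
    ref: str   # reference allele
    alt: str   # alternate allele
    qual: str  # kept as text, because VCF allows "." for a missing value


def parse_vcf_line(line: str) -> Variant:
    """Read the fixed leading columns of one VCF data (non-header) line."""
    fields = line.rstrip("\n").split("\t")
    # Standard column order: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
    return Variant(fields[0], int(fields[1]), fields[3], fields[4], fields[5])


def parse_bed_line(line: str) -> Tuple[str, int, int]:
    """Read the three mandatory BED columns: chrom, start, end."""
    fields = line.split()
    return fields[0], int(fields[1]), int(fields[2])


def is_covered(chrom: str, pos: int,
               regions: Iterable[Tuple[str, int, int]]) -> bool:
    """True if a 1-based VCF position lies inside any BED region.

    A BED interval [start, end) is 0-based and half-open, so the 1-based
    position p is covered when start < p <= end.
    """
    return any(c == chrom and start < pos <= end for c, start, end in regions)
```

For example, with the single made-up region "chrY 100 200", positions 101 to 200 count as covered, while positions 100 and 201 do not.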

Testers with other companies (e.g. YSeq WGS and FGC YElite/WGS) will have similar sets of results. Whole Genome Sequencing (WGS) tests will contain autosomal, mitochondrial and (depending on the test) exome DNA results. These are not relevant to us, so we remove them from the analysis and do not make them public.
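
As a rough illustration of what removing non-Y results can involve, the Python sketch below keeps only the Y-chromosome records of a VCF file. It is not the project's actual pipeline: the function name keep_y_only is hypothetical, and the chromosome labels "Y" and "chrY" are an assumption that may not match every testing company's files.

```python
from typing import Iterable, Iterator

# Assumed labels for the Y chromosome; other naming schemes exist.
Y_NAMES = {"Y", "chrY"}


def keep_y_only(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines unchanged, and only those data lines whose
    first (CHROM) column names the Y chromosome."""
    for line in lines:
        if line.startswith("#"):
            # Header and metadata lines carry no genotype calls.
            yield line
            continue
        chrom = line.split("\t", 1)[0]
        if chrom in Y_NAMES:
            yield line
```

This sketch leaves header lines untouched, so metadata that merely names other chromosomes would remain in the output; only the autosomal and mitochondrial calls themselves are dropped.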

Does my data contain any medically relevant information?

We do not expect medically significant information in most people's Y-DNA data. Medically important mutations do occur on the Y chromosome but very rarely. Most Y-DNA genes have lost their functionality, and the purpose of the Y chromosome is now mainly to trigger male-specific processes during human development. Consequently, medically relevant mutations reliably linked to the Y chromosome are normally limited to male infertility. If the tester has fathered a child, then the potential for medically relevant information on the Y chromosome is extremely limited.

Specific infertility-related issues can be identified easily in tests, such as a deletion of DYS464. Others are more subtle. Factors like genetic stability or mosaic loss of Y may have affected the acquisition of results, but will not normally be apparent in the results. Hence, while we cannot provide absolute certainty, current research implies that most people's submitted Y-DNA data will have negligible medical relevance.

What personal information will be used?

Our normal policy is to only use two pieces of personal information as identifiers: your most-distant known ancestor's (MDKA) surname, and your kit number. These will be made public. Any information you supply in the free-text upload to describe your MRCA or other information may also be used but is not automatically shared.

This balances the legal and ethical need to anonymise data (see Can this data be used to identify me?), with the need for users to identify themselves and genealogically relevant matches in the data. A contact e-mail address is collected by the Warehouse administrator for the purposes of data administration (see privacy policy), and will only be made available to administrators, except with the uploader's explicit consent.

Our reason for using your kit number is that it allows your uploaded data to be connected back to your ancestral information and STR profile at Family Tree DNA (we also ask you to upload your STR data if you are happy to). This provides an independent check for all administrators and users that the data being uploaded are assigned correctly to an individual. This minimises errors and ensures that administrators of genetic projects know which of their members have uploaded. The MDKA surname provides an additional check (e.g. to make sure you haven't made a typo in the kit number anywhere). It is also a useful indicator of close relationships: the origins of surnames can be traced using matches within a surname group. Geographical information about your MDKA is used to map historical migrations in a statistical sense. All of this data can be anonymised (see Can my information be included more anonymously?), but we strongly encourage you to include as much of it as possible.

Can my information be included more anonymously?

Yes. We have several kit owners who do not want to be identified by surname and/or kit number, especially in places where Nordic surnames are used. In these cases, either piece of information can be obfuscated on upload by the user. If you choose to anonymise your surname, we suggest it is replaced by "Anon" or similar, and that you include your geographical country of origin (e.g. "Norway") so that we know why. If you anonymise your kit number, we suggest you replace it with the lead SNP of the haplogroup assigned by FTDNA (e.g. "R-S4004"). Data uploaded in this way ends up truly anonymous, as not even the project's administrators will be aware of who it belongs to. However, we prefer that users do not do this if possible: DNA testing works by comparison to others, so if this information isn't shared, we cannot guarantee to provide such accurate information for you, and it seriously reduces the usefulness of your data for your genetic matches (for more information on why, see What personal information will be used?).
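
The substitutions suggested above can be summarised in a short Python sketch. This is purely illustrative and is not part of the upload form: the Submission record, its field names, the anonymise function and the "Anon (country)" format are all hypothetical, and the kit number in the usage note is a placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Submission:
    kit_number: str     # identifier assigned by the testing company
    mdka_surname: str   # most-distant known ancestor's surname
    country: str        # geographical country of origin
    lead_snp: str       # lead SNP of the assigned haplogroup, e.g. "R-S4004"


def anonymise(sub: Submission, hide_surname: bool = False,
              hide_kit: bool = False) -> Submission:
    """Return a copy with the chosen identifiers obfuscated.

    The surname becomes "Anon" plus the country of origin, and the kit
    number becomes the lead SNP, following the suggestions in the text.
    """
    surname = f"Anon ({sub.country})" if hide_surname else sub.mdka_surname
    kit = sub.lead_snp if hide_kit else sub.kit_number
    return replace(sub, kit_number=kit, mdka_surname=surname)
```

For instance, calling anonymise on a record with the placeholder kit number "000000", country "Norway" and lead SNP "R-S4004", with both options switched on, gives a record labelled "Anon (Norway)" and "R-S4004". With both options left off, the record is returned unchanged.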

Can this data be used to identify me?

Normally, only if you choose to allow it. We take care to reduce personal information down to the bare genealogical information useful for our analysis (see What personal information will be used?). We only use and store Y-chromosomal information. We remove autosomal and mitochondrial data from WGS tests as it is not relevant to us.

We ask that your kit is represented by an ancestral surname and your assigned kit number (see also Can my information be included more anonymously?). This information alone cannot identify you as an individual, but could be used to identify you if you create a paper trail leading back to it. If you are worried about being identified personally, you should take normal online data precautions to avoid creating a paper trail that leads back to personal information. An example of a paper trail might be posting your kit number or a family tree online, under a username that you use on social media, which can then connect you back to an organisation or institution. If you remain concerned by this, you can anonymise some of your information when it is input.

The chance that your genetic information alone could identify you as an individual is remote. Depending on the commonness of your Y-DNA haplogroup, the genetic information you supply could be linked to a branch of your family (e.g. as a descendant of a specific person): indeed, that is a specific goal of this project. However, you can only be securely identified as a specific person from this data if every other male line within the last handful of generations can be ruled out by death or direct DNA testing.

Note that your kit number is linked to your account at your testing company. Anyone with access to that account can readily identify personal information associated with you, which may include your name, e-mail address and physical address. For Family Tree DNA, that access is normally restricted to Family Tree DNA (Gene by Gene) staff and volunteer administrators, who are bound by Family Tree DNA's privacy policy.

Currently, there is normally negligible risk in associating your Y-DNA genetic data with you as a person, to the extent that the administrators of this project are happy to have that information made public, and we encourage you to take that as a guide. While different circumstances may apply to your own situation, we encourage users to learn about what information they are sharing and the relatively small risk they are exposing themselves to. The information you reveal here and elsewhere is your choice, and you retain the right for your submitted personal and genetic data to be removed entirely from the Warehouse and its subsidiaries.