Tuesday, June 28, 2011

SnpMap version 1.0.4

I just posted a new version to fix a bug in the shared positive calculation and a file read permissions issue on newer versions of Windows. See the earlier posts for the link to the file. The link is the same.

Monday, June 27, 2011

SnpMap version 1.0.3

Some people brought to my attention the fact that versions 1.0.1 and 1.0.2 did not run on Windows XP. I've corrected that in version 1.0.3 by creating a new installer that doesn't have the problems of the Microsoft installer. All versions of SnpMap require Microsoft .Net v. 3.5, but this version's installer won't automatically install .Net like the previous versions did. I also changed the positive symbols because the symbols I was using aren't supported on Windows XP. The download link is the same as before. It now downloads the setup exe, not a zip file. It's also a couple of megabytes smaller. In the future I'll have two setup programs: a small one with just the updated exe file, and the full setup that includes the data files.

Before installing version 1.0.3, I recommend uninstalling the previous version of SnpMap. Version 1.0.3 puts the files in a different location.

Friday, June 24, 2011

SnpMap version 1.0.2 Introduction

I just posted a new version of SnpMap that adds a few little features. Here is the link:
 SnpMap v.1.0.2
Now would be a good time to give an introduction to SnpMap. The purpose of SnpMap is to help you explore your autosomal genome data from 23andMe or FamilyTreeDNA. In particular, it's meant to help identify segments of your chromosomes that show ancestry, either recent or ancient, by their similarity to different regions of the world and the populations within those regions. One thing you'll discover is that, genetically, people all over the globe are quite similar. At some positions on the chromosome, people can have alternate alleles for that position. Some of these are the SNP locations that the DNA services test. Although any allele can occur in people from any part of the globe, studies have shown that the frequencies of the alleles are often different among people from different parts of the world. For most SNPs the frequency differences are small, too small to be very useful for identifying the region or population from which a person might have ancestry. But there are some SNPs that have sizable frequency differences. These differences usually vary over distance, so that the difference is small between adjacent populations, or even adjacent regions, but significant between distant regions. Finding these SNPs and displaying information about them is what SnpMap helps you do.

After you load your genome data using the File menu in the upper left corner, select a chromosome, choose your options, and click on the Recalculate button. You'll then see something like the picture above. If you checked more than one region, SnpMap will compare those regions. If you checked only one region, SnpMap will compare the checked populations within that region. The grid will display the SNPs of interest, according to your option selections. You'll see the SNP physical base pair location, SNP identifier (like rs####), your alleles for that SNP, and region or population frequency information for your alleles. This frequency info will be in the form ##.#/##.## followed by possible markers. The two numbers are the frequencies at which each of your alleles occurs within a region or population. For example, in the first row, I am homozygous TT. Among people in sub-Saharan Africa, the occurrence of T is 0.9% (i.e. 9 out of 1000 people will have this allele), but among Europeans T occurs in 26.7% of people (267 out of 1000).
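
To make the frequency column concrete, here is a minimal sketch in Python of how such a cell could be put together for one region. It is not SnpMap's actual code. The function name and the dictionary layout are my own assumptions for illustration, and I format both numbers to one decimal place. The only real numbers are the two T frequencies from the example row.

```python
def frequency_cell(genotype, allele_freqs):
    """Build the 'freq/freq' text for one region from a two-letter genotype.

    allele_freqs maps an allele letter to its frequency in percent
    (hypothetical layout, chosen only for this illustration).
    """
    first, second = genotype[0], genotype[1]
    return f"{allele_freqs[first]:.1f}/{allele_freqs[second]:.1f}"


# Frequencies of T taken from the example row described in the text.
africa = {"T": 0.9}
europe = {"T": 26.7}

africa_cell = frequency_cell("TT", africa)
europe_cell = frequency_cell("TT", europe)
print("Africa:", africa_cell)
print("Europe:", europe_cell)
```

Because I am homozygous at this SNP, both numbers in each cell are the same. At a heterozygous SNP the two numbers would come from two different alleles.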

SnpMap will add display markers to point out notable situations. The '≠' symbol is the negative marker. It indicates that your allele occurs at a very low frequency within that region or population, and there is at least one other region or population that shows frequencies several times higher. The '●' symbol is the positive marker. It indicates that your allele occurs several times more frequently in that region/population than in any of the others. This marker is not that common, especially when comparing many regions/populations, because there are not very many SNPs where a single region/population has a uniquely higher frequency for an allele. The '◦' symbol is the shared positive marker. This is new to this version, and is described in more detail below. In version 1.0.2 and earlier, I used a happy face for the positive marker, and an empty circle for shared positive, but changed them to support Windows XP. Some of the screenshots still have the older symbols.
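
The negative and positive rules can be sketched in a few lines of Python. This is only an illustration of the idea, not the code SnpMap runs. The text above says "very low" and "several times" without giving numbers, so the `low` and `ratio` thresholds here are values I made up, and `mark_allele` is a hypothetical name.

```python
NEGATIVE = "≠"
POSITIVE = "●"


def mark_allele(freqs, low=2.0, ratio=3.0):
    """Return {region: marker} for one allele.

    freqs maps region name -> allele frequency in percent.
    low and ratio are assumed thresholds, not SnpMap's real ones.
    """
    markers = {}
    for region, f in freqs.items():
        others = [v for r, v in freqs.items() if r != region]
        if not others:
            markers[region] = ""  # nothing to compare against
            continue
        highest_other = max(others)
        if f <= low and highest_other > low and highest_other >= ratio * f:
            # Rare here, but several times more common somewhere else.
            markers[region] = NEGATIVE
        elif f > 0 and f >= ratio * highest_other:
            # Several times more common here than in every other region.
            markers[region] = POSITIVE
        else:
            markers[region] = ""
    return markers


# The T allele from the example row: 0.9% in Africa, 26.7% in Europe.
marks = mark_allele({"Africa": 0.9, "Europe": 26.7})
print(marks)
```

With only two regions, one region's negative is usually the other's positive. Adding a third region with a frequency close to the highest one removes the positive marker, which is why positives become rarer as more regions are compared.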

To help you learn more about population data and SNPs, you can right-click on the populations list or on an SNP row, and a menu will display with links to the Alfred and NIH website pages containing detailed information.

You can control which SNP rows SnpMap displays by using the check boxes above the SNP grid.
When you check one of these boxes, it tells SnpMap to display all rows where that type of marker is found, including when there are other marks as well.
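
Here is a rough Python sketch of that filtering rule: a row is shown if it carries at least one checked marker type, no matter what other marks it has. The marker type names and the rs numbers are placeholders I invented for the example, and I treat a row with no marks as the "unmarked" type.

```python
def visible_rows(rows, checked):
    """Keep the rows that carry at least one checked marker type.

    rows: list of (snp_id, set_of_marker_types)
    checked: set of marker types whose boxes are ticked
    A row with an empty marker set counts as "unmarked".
    """
    shown = []
    for snp_id, kinds in rows:
        effective = kinds or {"unmarked"}
        if effective & checked:  # any overlap means the row is displayed
            shown.append(snp_id)
    return shown


# Placeholder rows, not real SNP identifiers.
rows = [
    ("rs0001", {"positive"}),
    ("rs0002", {"positive", "single_negative"}),
    ("rs0003", {"double_negative"}),
    ("rs0004", set()),
]

shown = visible_rows(rows, {"positive"})
print(shown)
```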

Positive Markers
If you check the Positive box, rows like the one above will be displayed. Here, there is an SNP at which I am heterozygous, and one of the alleles has a frequency of 17.3%, which is quite a bit higher than in any of the other regions. If there are other SNPs like this in that area of my chromosome, it would indicate a segment showing African ancestry from one of my parents, because in that area of my chromosome it looks like it is 50% African. If there are two markers, it means you are homozygous at that SNP, and your alleles from both parents appear to be positive for that region/population. Checking the Positive Markers box will display all single and double positive rows.

Double Negative Markers
If you check the Double Negative box, rows like the one above will be displayed. Double negatives show SNPs where you received the same allele from each parent, and that allele is unlikely for that region/population. These markers are useful for determining the beginning and end of segments. If a particular column shows lots of double negatives, and few or no positive markers, you probably don't have any detectable segments from that region/population, and it's safe to uncheck it and recalculate. This will give better results for the remaining regions/populations.
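
If you wanted to turn that rule of thumb into something mechanical, it could look like the Python sketch below. SnpMap does not do this for you. The cutoffs for "lots" and "few" are my own guesses, and the counts in the example are invented just to show the shape of the data.

```python
def columns_to_uncheck(marker_counts, min_double_neg=20, max_positive=1):
    """List regions with many double negatives and few or no positives.

    marker_counts maps region -> {"double_negative": n, "positive": m}.
    min_double_neg and max_positive are assumed cutoffs.
    """
    candidates = []
    for region, counts in marker_counts.items():
        many_negatives = counts.get("double_negative", 0) >= min_double_neg
        few_positives = counts.get("positive", 0) <= max_positive
        if many_negatives and few_positives:
            candidates.append(region)
    return candidates


# Invented counts for one chromosome, for illustration only.
counts = {
    "Europe": {"double_negative": 2, "positive": 14},
    "East Asia": {"double_negative": 35, "positive": 0},
    "Africa": {"double_negative": 25, "positive": 6},
}

to_uncheck = columns_to_uncheck(counts)
print(to_uncheck)
```

Note that Africa stays checked in this example even though it has many double negatives, because it also has several positives that may belong to a real segment.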

Single Negative Markers
If you check the Single Negative box, rows like the one above will be displayed. Single negative markers show where one of your alleles is unlikely for that region/population. It's a somewhat weak indicator by itself, because while it says one allele doesn't look like it's associated with that region/population, it says nothing about the other allele. It is only really useful in context with other markers and nearby SNPs.


Unmarked
Check the Unmarked box if you want to see all the SNPs that have no other marks. You will probably only rarely use this. I added it so you could see your entire set of SNPs, and see the marked SNPs in that context.

Shared Positive Markers
This is something I just added. If you check this, you'll see rows like the one above. The threshold for seeing normal positive markers is pretty high since the frequency for that region/population must be much higher than all the others. Because of this you don't get many normal positive markers unless comparing just 2 or 3 regions/populations. When comparing more than that you might want to try showing shared positive markers as well. This marker will be displayed when 2 or 3 regions/populations have higher frequencies than all the rest. Typically you will see adjacent regions displaying shared positives. You might see Africa and North Africa showing a shared positive, or Europe and the Middle East. This will show markers for SNPs where the allele distributions are more widespread, which would have been missed if showing only regular positive markers.
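
The shared positive idea can be sketched in Python like this. Again, this is an illustration under my own assumptions and not the calculation SnpMap uses: the `ratio` threshold is made up, `shared_positive` is a hypothetical name, and the example frequencies are invented. It looks for a group of 2 or 3 regions whose lowest frequency is still several times higher than the highest frequency among all the rest.

```python
def shared_positive(freqs, ratio=3.0, max_group=3):
    """Return the regions sharing a positive marker, or an empty list.

    freqs maps region -> allele frequency in percent.
    ratio is an assumed threshold for "several times higher".
    """
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    for size in range(2, max_group + 1):
        if size >= len(ranked):
            break  # the group must leave at least one region out
        top, rest = ranked[:size], ranked[size:]
        lowest_top = top[-1][1]
        highest_rest = rest[0][1]
        if lowest_top > 0 and lowest_top >= ratio * highest_rest:
            return sorted(name for name, _ in top)
    return []


# Invented frequencies: two adjacent regions stand out from the others.
example = {"Europe": 30.0, "Middle East": 27.0, "Africa": 2.0, "East Asia": 3.0}
group = shared_positive(example)
print(group)
```

In this example neither Europe nor the Middle East would earn a normal positive marker, because each is matched by the other, but together they stand well apart from the remaining regions.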

In my next post, I will show some examples of real segments, and pointers on identifying them.

Tuesday, June 21, 2011

SnpMap Application

This blog is about a little program I wrote to help me explore my own genome data and that of some of my relatives. I've made it available to anyone who's interested in this topic. Currently, SnpMap allows you to view your SNP data, and see how your data compares to other populations and regions of the world.

In future posts here, I'll show some of the ways I use the program and interpret the results.