Let me help you unravel the mysteries of GEDCOM
GEDCOM is an acronym for GEnealogical Data COMmunication. In short, it is the language by which different genealogy software programs talk to one another. Its purpose is to exchange data between dissimilar programs without anyone having to re-enter all the data manually on a keyboard.

Roughly fifteen years ago, the Church of Jesus Christ of Latter-day Saints announced a file format called GEDCOM. This proposed standard file format was designed to allow different genealogy programs to exchange data. There was only one problem at the time: the only program that could read and write this type of data was the one written by the Church itself.

GEDCOM is a standard, not a program. Each genealogy program that is going to share data this way has to be written by its programmers to read and write these files. To complete an exchange of data, both programs have to support the format. If you are trying to transfer data from one program to another and discover that one of them does not support GEDCOM, you are out of luck.

Slowly, over a period of several years, other genealogy programs added the ability to read and write GEDCOM files. It became possible to move data from one genealogy program to another without manually re-typing everything. Now you can simply export your file from one genealogy program in this format and then import the file into another. All of today's major genealogy programs will import and export GEDCOM data.

Data transfer may still be a problem for those using older genealogy programs without this capability; many people still find their data trapped in these "islands." For them, there is no easy solution.

Unlike the "dark ages" of the 1980s, it is now common for people to use two or three or even more genealogy programs. You may find one program that you prefer to use for storing all the bits of information that you encounter in your research efforts.
However, you might prefer the printed reports or multimedia scrapbook features of a different program. Thanks to GEDCOM, you can easily move your data from one program to another. You can also share information with distant cousins who use yet other genealogy programs.

The instructions for creating or reading GEDCOM files vary from one program to another. Consult your program's Help files to find the exact sequence of steps it requires.

Here is an extract from the beginning of a typical GEDCOM file:

0 HEAD
1 SOUR TMG
2 VERS 6.0
2 NAME TMG (R)
2 CORP Wholley Genes
3 ADDR PO Box 88
4 CONT Anywhere, U.S.A
1 DEST Gedcom55
1 DATE 16 Nov 2007
1 SUBM @S0@
1 FILE Jones.ged
1 GEDC
2 VERS
2 FORM LINEAGE_LINKED
1 CHAR ANSI
0 @S0@ SUBM
1 NAME Not Given
1 ADDR Not Supplied
2 CONT
0 @I1@ INDI
1 NAME Joseph Patrick /Jones/
2 GIVN Joseph Patrick
2 SURN Jones
1 SEX M
1 BIRT
2 DATE 6 Nov 1885
2 PLAC Sudbury, MA
2 SOUR @S2@
3 PAGE pg 96
3 QUAY 3
1 DEAT
2 DATE 11 Oct 1959
2 PLAC Sweetland, MA
(rest of file omitted)

The file contains genealogy data in a structured format. It uses numbers to indicate the hierarchy and tags to identify individual pieces of information. A level number of zero marks the first line of a record, and the tag on that line indicates the type of record. The top line in any GEDCOM file is the HEADER record, which marks the beginning of the file. Words that are more than four letters long are typically abbreviated; here, HEADER is written as HEAD.

A number "1" shows that the line in question is one level below the "zero" line: it is subordinate to that line and contains additional information about it. In the second line of the above file, the entry "1 SOUR TMG" indicates that this file was created by (SOURce) TMG, The Master Genealogist, a popular genealogy program for Windows.
The number "2" on the next line shows that it is subordinate to the preceding line with a number 1 in it. In this case, the line "2 VERS 6.0" indicates that the file was written with version 6.0 of that program. Below that you see a line labeled ADDR (address) and another labeled CONT (the previous line is CONTinued here).

Scanning a bit further down the file, you will see the following:

0 @I1@ INDI

Again, the zero indicates the beginning of a new record. The "at" signs bracket the record identifier. In this case, the record is of an INDIvidual, and it is individual #1 (I1) in the database. Succeeding lines show events, such as birth, marriage, and death, along with subordinate lines listing dates and places. You will also note an entry of "2 SOUR @S2@," which indicates that a source citation for the event can be found in SOURce record S2 later in this file.

INDI, NAME, BIRT, DEAT, SEX, SOUR, and the others are called GEDCOM "tags." The standard defines many tags and even allows user-defined tags for situations it does not cover. Of course, user-defined tags are usually not understood by the receiving program, so their value is limited. They may help define data within the program that created them, but they will usually not carry over to a new program via GEDCOM.

You need to be aware that the GEDCOM standard is not perfect, and neither are its implementations. For one thing, not all the data fields are specified precisely in the GEDCOM specification. For another, the programmers of the various genealogy programs did not all interpret the specification in exactly the same manner. For instance, your present genealogy program might be perfectly happy with a birth date listed as "after 1847 but before 1852." However, once that information is exported in a GEDCOM file and then imported into a different program, the birth date may say something else.
The receiving program may expect exact dates and be unable to handle anything that says "after" or "before," especially not both in the same statement. Typically, the receiving program simply leaves the date blank. Sadly, one or two genealogy programs will accept the first date found on the line and disregard everything else.

Another problem is that not all genealogy programs have the same ideas about databases. One program may have only one field for "occupation," assuming that no one on the face of the earth ever changed careers. Another genealogy program may be able to record multiple occupations during a person's lifetime. When transferring data via GEDCOM from the more powerful program to the simpler one, some of these occupations will be lost. These are a couple of simple examples; you can find numerous other inconsistencies when moving data between dissimilar programs.

Another limitation is that the present GEDCOM standard was created before multimedia became popular. You can transfer textual data, such as names, dates, and locations, rather well in GEDCOM. However, transferring scanned images, sound clips, and movies from one genealogy program to another is almost impossible to accomplish via GEDCOM files. The present GEDCOM standard can point to the location of multimedia files on a hard drive. In theory, this should suffice. However, in my experience of moving data around in many genealogy programs, I have rarely seen multimedia files handled properly.

There is another problem with translating from one program to another: that of data integrity. Translating from one program's database to GEDCOM is rather like translating from one spoken language to another. The basics work, but subtleties and details sometimes do not translate well. Then, when translating to the third language (the receiving genealogy program's database), more translation losses creep in.
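
To make the date and occupation losses concrete, here is a minimal Python sketch. It is not taken from any real genealogy program: `simple_import` is a hypothetical importer that stores one exact birth year and one occupation, and the sample lines are invented for illustration. The "BET ... AND ..." form is how GEDCOM 5.5 writes a date range, and OCCU is its occupation tag.

```python
import re

# Invented GEDCOM-style lines, as a richer exporting program might write them.
EXPORTED = [
    "0 @I1@ INDI",
    "1 NAME Joseph Patrick /Jones/",
    "1 BIRT",
    "2 DATE BET 1847 AND 1852",
    "1 OCCU Farmer",
    "1 OCCU Blacksmith",
]


def simple_import(lines):
    """Hypothetical receiving program: one exact birth year, one occupation."""
    person = {"birth_year": None, "occupation": None, "dropped": []}
    for line in lines:
        # Split into level, tag, and (optionally) the rest of the line.
        level, tag, *rest = line.split(" ", 2)
        value = rest[0] if rest else ""
        if tag == "DATE":
            # Simplification: every DATE is treated as the birth date.
            # Only the first four-digit year is kept; BET/AND are ignored.
            match = re.search(r"\d{4}", value)
            person["birth_year"] = int(match.group()) if match else None
        elif tag == "OCCU":
            if person["occupation"] is None:
                person["occupation"] = value
            else:
                # No field available for a second occupation.
                person["dropped"].append(line)
    return person


result = simple_import(EXPORTED)
print(result["birth_year"], result["occupation"], result["dropped"])
```

Here the importer ends up with a birth year of 1847 and the single occupation "Farmer"; the "between" range and the second occupation do not survive the trip.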
I well remember reading a technical manual some years ago that had been written in Japanese and then translated into Chinese. At a later date, the Chinese version was translated into English. The resulting English manual was barely readable. The same may happen when translating a database from Program A into GEDCOM and then from GEDCOM into Program B.

A new method of transferring data between different genealogy programs was announced some time ago by Wholly Genes Software. Their GenBridge technology reads data from one program directly into a second program without requiring a "double translation" via GEDCOM. The result is a much more accurate transfer process. However, very few genealogy developers have adopted GenBridge. To date, this technology is available in only a few programs. The Master Genealogist and Family Tree Super Tools (both produced by Wholly Genes), The Pocket Genealogist, and GedStar Pro are the only ones I can think of.

Despite all the shortcomings, GEDCOM is still a simple and somewhat effective method of transferring genealogy data from one program to another. Most of the data will transfer properly, and there are easy ways of reviewing the data to look for errors. The names, dates, and locations normally transfer correctly. Text, events, notes, and source citations may not always transfer perfectly. The exact problems encountered will depend upon the two genealogy programs involved.

Most modern genealogy programs create an error log listing any imported data that the receiving program did not understand. You can read that log file to see what the program flagged, then go in manually and fix the errors. While tedious, this is still a lot better than re-keying everything!

Two and a half years ago a new GEDCOM standard was proposed that is to be based upon XML, a markup language that is popular on the World Wide Web. This new standard should greatly improve data transfer accuracy.
See http://www.familysearch.org/GEDCOM/GedXML60.pdf for details. However, don't look for this new version any time soon. It has been a proposal for more than two and a half years, and nothing has happened in that time. Older versions of GEDCOM have been around for more than fifteen years, and only minor improvements have been made in that time. I expect that GEDCOM 6.0 will not appear in genealogy programs for several more years, if ever.

As an interesting side note, there were plans back in 2003 to create a program called "gedify" to convert GEDCOM 6.0 files to the older GEDCOM 5.5 standard so that older genealogy programs could read data created with the new format. This seems not to have progressed, but there is little need for such a program, as no one is yet creating GEDCOM 6.0 files!
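
For readers who like to experiment, the line structure described earlier (a level number, an optional record identifier in "at" signs, a tag, and an optional value) can be read with a short Python sketch. The names `parse_line` and `build_tree` are hypothetical, chosen for illustration rather than taken from any GEDCOM library, and the sketch ignores details such as character sets and joining CONT lines.

```python
import re

# level, optional @XREF@ record identifier, tag, optional value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")

# A few lines from the extract shown earlier in the article.
SAMPLE = """\
0 HEAD
1 SOUR TMG
2 VERS 6.0
0 @I1@ INDI
1 NAME Joseph Patrick /Jones/
1 BIRT
2 DATE 6 Nov 1885
2 PLAC Sudbury, MA
2 SOUR @S2@
"""


def parse_line(line):
    """Split one line into (level, record identifier, tag, value)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a GEDCOM-style line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def build_tree(text):
    """Nest each line under the nearest preceding line one level above it."""
    roots = []
    stack = []  # stack[n] is the most recent node seen at level n
    for raw in text.splitlines():
        if not raw.strip():
            continue
        level, xref, tag, value = parse_line(raw.strip())
        if level > len(stack):
            raise ValueError(f"level skips a step: {raw!r}")
        node = {"xref": xref, "tag": tag, "value": value, "children": []}
        del stack[level:]  # forget deeper levels from earlier lines
        if level == 0:
            roots.append(node)  # a zero starts a new record
        else:
            stack[-1]["children"].append(node)
        stack.append(node)
    return roots


for record in build_tree(SAMPLE):
    print(record["tag"], record["xref"], [c["tag"] for c in record["children"]])
```

The stack is what turns the level numbers into a hierarchy: a "2" line is attached to whichever "1" line most recently preceded it, exactly as a person reading the file would do.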
