README file for Ch13_anno_v3.0.txt
----------------------------------------

Current annotation version: 3.0; release date: March 13th, 2007

Brief descriptions of the main changes since the previous release of the
annotation file can be found in the file Ch13_annotation.releasenotes.

This README file was last modified: March 13th, 2007


I. Contents of Ch13_anno_v3.0.txt
---------------------------------------

This is the third major release of the annotation file accompanying the
Chicken 13k cDNA microarray ("the Chicken13k array") manufactured by the
Genomics Resource of the Fred Hutchinson Cancer Research Center (FHCRC).

The annotation file is a tab-delimited, plain ASCII text file containing
annotation and other information pertinent to the clones used to
construct the Chicken13k array. The file comprises 14 columns of data,
with each data row serving as a record of information for a feature
(i.e. spot) present on the microarray (in some instances, several
features on the array represent the same gene or EST). There are two
header lines in the file: the first line has the name and version of the
annotation file (in column 1); the second line is a row of column header
labels.

Following are descriptions of the 14 columns of data, which fall into
four rough categories:

- Local information or information about the clone provider:

  1. Array Id -- an identifier serving to link each feature to a unique
     identifier in the image analysis output file (for GenePix Pro the
     output file is the .gpr file). Each id is unique to its feature (no
     two rows have the same Array Id value).

  2. Source -- a designation of the group or institution that provided
     the clone represented by this feature.

  3. Library -- the cDNA library of ESTs or genes from which this
     clone's material was procured. This column of information was
     compiled for us by Dr. Joan Burnside.

- Common identifiers for the clones:

  4. Source Clone Name -- an identifier for the corresponding clone,
     issued by the clone's source.
     This column may have the same value appearing in two or more rows,
     since different features on the array may represent the same clone;
     the same point applies as well for values in the remaining columns.

  5. GenBank Accession -- the GenBank Accession number associated with
     this clone in the GenBank database at NCBI. Not all of the clones
     are registered yet in GenBank, so not all rows will have values in
     this column. 'NA' indicates we do not have an Accession for a given
     clone.

- BLAT-determined chromosome location:

  6. BLAT Alignments -- a column indicating whether a unique chromosome
     location has been determined for this clone, using BLAT. Possible
     values are 'none', 'single', 'multiple' and 'NA'. Further
     information on how the four columns of information in this group
     were obtained can be found below, in Section V.

  7. Chromosome -- the name of the chromosome on which the clone has
     been located, if a unique position was determined for the clone;
     otherwise this column will be blank.

  8. Start -- the starting index on the chromosome of the BLAT alignment
     locating this clone on the chromosome, if a location was
     determined; otherwise this column will be blank.

  9. End -- the ending index on the chromosome of the BLAT alignment
     locating this clone on the chromosome, if a location was
     determined; otherwise this column will be blank.

- Annotation:

  10. Source Assigned Annotation -- annotation provided by the clone's
      source. Not all sources provided this information, so not all rows
      will have values in this column. 'NA' indicates none available.

  11. TGI Assigned TC -- an identifier for the tentative consensus (TC)
      if TGI has constructed a consensus sequence that includes this
      clone. Further information on how TGI constructs TCs is provided
      in Section VI. 'NA' indicates no TC was available.

  12. TGI TC Annotation -- an annotation derived by TGI for the TC
      representing this clone.
      TGI's method of acquiring annotation for their TCs is described
      below, in Section VI. 'NA' indicates none available.

  13. UniGene -- the UniGene identifier associated with this clone in
      the UniGene database at NCBI. Each UniGene identifier was
      determined by its association with a GenBank accession number
      from the Chicken13k array. 'NA' indicates we do not have a
      UniGene identifier for a given clone.

  14. Comment -- an additional remark pertaining to data in the row; the
      word 'none' appears if no remarks are required.


II. Provenance of Cloned Material
---------------------------------

There are 15,769 features on the present array. These represent clones
that have been obtained in large part from the following four
collections:

  1) the UMIST collection, maintained by Dr. Dave Burt of the University
     of Manchester Institute of Science and Technology.

  2) the DKFZ collection at Heinrich-Pette-Institute, maintained by
     Dr. Jean-Marie Buerstedde.

  3) the DT40 collection at Fred Hutchinson Cancer Research Center,
     maintained by Dr. Paul Neiman.

  4) the T-Cell ("pat-clones") and Lymphoid ("pgn1c-clones") Libraries,
     maintained by Dr. Joan Burnside of the University of Delaware and
     the Delaware Biotechnology Institute.

Representation on the array from these four sources breaks down as
follows:

  University of Manchester ("UMIST"):  11,447 clones; ~72.6%
  Heinrich-Pette-Institute ("DKFZ"):      378 clones;  ~2.4%
  Fred Hutchinson ("FHCRC"):            1,801 clones; ~11.4%
  U. Delaware ("UDEL"):                 1,983 clones; ~12.6%

The remaining 160 features (~1%) serve as controls: four Arabidopsis
genes (Cab, NAC1, PRKase, and RUBISCO) as negative controls and chicken
genomic DNA as a positive control, all spotted on each of the 32 blocks
of the array.


III. How to Obtain Further Information Regarding the Clones on this Array
-------------------------------------------------------------------------

To obtain more information about a particular clone represented on the
Chicken13k array, users are encouraged to search the following web
sites, as appropriate for the clone's source, using either the Source
Clone Name or the GenBank Accession number for the given clone, obtained
from the Ch13_anno_v3.0.txt annotation file.

UMIST Clones (clones whose Array Ids are of the form "C00041..."):

  Web site: http://www.chick.umist.ac.uk
  Click on "ID Search".
  Enter a list of one or more Source Clone Names or GenBank Accession
  numbers in the text pane, as indicated.
  Click on "Search".

DKFZ Clones (clones whose Array Ids are of the form "DKFZ426..."):

  Web site: http://www.ncbi.nlm.nih.gov
  Search "Nucleotides" using the GenBank Accession number
  (e.g. BE139972).

DT40 Clones (clones whose Array Ids are of the form "DT40subNB..."):

  Web site: http://www.ncbi.nlm.nih.gov
  Search "Nucleotides" using the GenBank Accession number
  (e.g. BE139972).

Pat Clones and pgn1c Clones (clones whose Array Ids are of the form
"Pat_..."):

  Web site: http://www.chickest.udel.edu
  Click on "Search By Clone ID".
  Type in the Source Clone Name (e.g. 'pat.pk0023.e5.f' or
  'pgn1c.pk002.b19').
  NB: this is case sensitive. Use lower case for ALL entries.
  Select a Library (if relevant; otherwise leave on the default setting,
  "Any Type").
  Click on "Search".
  On the resulting page, click on either "BLASTX Hits" or "BLASTN Hits".


IV. Obtaining Clones represented on the Chicken13k Array
--------------------------------------------------------

Inquiries about access to clones used in the construction of the array
should be addressed to:

UMIST clones:
  via Web Form at ARK-Genomics:
  http://www.ark-genomics.org/order/order.php

DKFZ clones:
  Jean-Marie Buerstedde, PhD
  Director, GSF-Institute for Molecular Radiobiology
  e-mail: buersted@gsf.de

DT40 clones:
  Jeffrey Delrow, PhD
  Director, Genomics Resource
  Fred Hutchinson Cancer Research Center
  e-mail: jdelrow@fhcrc.org

Pat & PGN1C clones:
  Joan Burnside, PhD
  Professor
  Delaware Biotechnology Institute
  e-mail: joan@udel.edu


V. Determining Values for BLAT Alignment and Chromosome Locations
-----------------------------------------------------------------

If a clone had an associated GenBank Accession number, that Accession
number was used to query UCSC's on-line Genome Table Browser containing
the May 2006 Build of the Gallus gallus genome to acquire BLAT alignment
results. All resulting alignments were then filtered, based on the
conditions:

  score/alignment length <= 90%
  alignment length >= 75bp

Chromosome name, and start and end locations, were only recorded in
cases where a clone had a unique alignment surviving this filtering,
indicated by 'single' under the BLAT Alignments column. Occurrences of
'multiple' indicate cases where two or more distinct alignments for a
clone made it through the filter. 'none' indicates either that no
alignments survived the filter or that we lacked both a GenBank
Accession and sequence data for the clone. In the case of both
'multiple' and 'none', the columns for Chromosome, Start and End are
left blank. For control spots, 'NA' appears under BLAT Alignments,
rather than 'none'.


VI. Regarding TGI Annotation Data
----------------------------------

Columns 11 and 12 of Ch13_anno_v3.0.txt contain information extracted
from the EST Annotation table maintained in TGI's GgGI.
TGI obtains this information by the following method:

ESTs on record in GgGI, obtained from dbEST, are clustered and
"tentative consensus" sequences (TCs) are inferred for each resulting
cluster. The annotation obtained for each resulting TC is then
attributed to all of the ESTs making up the cluster represented by that
TC. There are, however, two ways in which TGI assigns annotations to its
TCs.

  A) If the cluster represented by a TC includes any of TGI's own
     curated expressed transcripts, then the annotations for these
     expressed transcripts are concatenated, with the result serving as
     the annotation for the TC itself.

  B) For any TC not including any TGI expressed transcripts, a BLAT
     search is conducted for alignments against an in-house,
     non-redundant database derived from several well-curated
     public-domain protein databases (including SwissProt, GenPept and
     PIR), and the annotation from a selected best hit from this search
     is used as annotation for the TC.

The annotations derived by method B) can be distinguished by the
presence of the modifiers 'homologue to', 'similar to' and 'weakly
similar to', which occur at the start of the annotation and indicate
percent identity of the alignment result. In all annotations, the
modifiers 'complete' and 'partial (PERCENT)' occurring at the tail of
the annotation indicate percent coverage of the TC query by the aligned
subject.

To obtain this annotation information directly from the TGI web site, go
to:

  http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gireport.pl?gudb=g_gallus

Enter either the EST sequence name or GenBank Accession number in the
appropriate text field.

A more thorough description of TGI's procedure for obtaining annotation
information can be found at the following URL:

  http://compbio.dfci.harvard.edu/tgi/gifaq.html


VII. Disclaimer
----------------

The information supplied in this file is based on data provided by the
four clone providers, in conjunction with data extracted from publicly
accessible databases.
We provide preliminary information to assist array users with cursory
data analysis only. We make no claims on the validity of the information
provided by the clone providers or obtained through the public-domain
resources. It should also be noted that much of the information
contained within these resources is continually in flux. We encourage
independent informatics analysis, and use of the clone providers' own
resources cited above, in Sections III and IV. Furthermore, as with all
experimental data, independent verification of microarray findings is
highly recommended.


VIII. Contact info
----------------

Questions/comments concerning the compilation of this file should be
addressed to:

  Jeffrey Delrow, PhD
  Director, Genomics Resource
  Fred Hutchinson Cancer Research Center
  e-mail: jdelrow@fhcrc.org

or

  Ryan Basom
  Systems Analyst/Programmer
  Genomics Resource
  Fred Hutchinson Cancer Research Center
  e-mail: rbasom@fhcrc.org


IX. Acknowledgments
------------------

Version 3.0 of the Chicken 13k annotation file was compiled by Ryan
Basom, who would like to thank Denise Mauldin of the Peter Nelson Lab at
FHCRC for her assistance with answering questions about the scoring of
BLAT alignments.

Version 3.0 of the Chicken 13k annotation file was built upon the
framework laid down during the creation of v2.0 of the Chicken 13k
annotation file, which was compiled through the joint effort of Ryan
Basom and Mark Aronszajn. For their past contributions we would like to
express sincere thanks to the following people for their help and
advice.

Thanks to Joan Burnside from University of Delaware and the Delaware
Biotechnology Institute for her continued help and encouragement
regarding the annotation project.

Thanks as well to Angie Hinrichs of the UCSC Genome Browser project for
helping us with the procedures we followed to determine chromosome
locations for our clones. This included help in the use of BLAT and
other nice utilities in the BlatSuite of tools provided by her group.
So thanks to the UCSC Genome Browser staff and researchers, generally,
for providing such useful, high-quality tools.

And finally, a nod of sincere appreciation to Dr. Janet Young of the
Trask lab at Fred Hutchinson Cancer Research Center, for her cheerful
contribution of time and wisdom on numerous occasions, answering our
frequent questions about what annotation information to look for, how to
acquire it, and how to sort it all out once we had it.
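

X. Example: Reading the Annotation File
---------------------------------------

For users who wish to load the annotation file programmatically, the
layout described in Section I (tab-delimited text, a first line holding
the file name and version, a second line holding the column labels) can
be read along the following lines. This is only a minimal sketch in
Python, not a tool supplied with the array. It assumes that the labels
in the second header line match the column names listed in Section I
exactly ('BLAT Alignments', 'Chromosome', 'Start', 'End'); that should
be checked against the file itself. The function names are hypothetical.

```python
import csv
from collections import Counter

# ASSUMPTION: these labels match the second header line of the file
# exactly as the columns are named in Section I of this README.
BLAT_COL = "BLAT Alignments"
CHROM_COL, START_COL, END_COL = "Chromosome", "Start", "End"


def read_annotation(lines):
    """Return one dict per feature row, keyed by column label.

    `lines` is any iterator over text lines, e.g. an open file.
    """
    lines = iter(lines)
    next(lines)  # first header line: name and version of the file
    # The second header line supplies the dictionary keys.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def blat_summary(rows):
    """Count rows per BLAT Alignments value (none/single/multiple/NA)."""
    return Counter(row[BLAT_COL] for row in rows)


def location(row):
    """Return (chromosome, start, end) for a 'single' row, else None.

    Per Section V, only 'single' rows carry a chromosome location.
    """
    if row[BLAT_COL] != "single":
        return None
    return row[CHROM_COL], int(row[START_COL]), int(row[END_COL])
```

With the file opened in text mode, the rows returned by read_annotation
can be passed to blat_summary to tally the four BLAT Alignments values,
or filtered with location to keep only the uniquely placed clones.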