DIRECTORY STRUCTURE:

        sequences/${ENCODE_REGION}/${COMMON_NAME}.${ENCODE_REGION}.fa
        RepeatMasker/${ACCESSION}.${VERSION}.out
        trf/${ACCESSION}.${VERSION}.bed
        alignments/${ALIGNER}/*.maf
        alignments/${ALIGNER}/${CONSERVATION}/

Each FASTA file will have all the sequence entries for a given species/region.
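
As a rough illustration of the layout above, the following Python sketch builds
the relative paths from their components. The function names are hypothetical
and not part of this release; only the path templates come from the listing
above.

```python
from pathlib import PurePosixPath


def fasta_path(common_name: str, encode_region: str) -> PurePosixPath:
    """sequences/${ENCODE_REGION}/${COMMON_NAME}.${ENCODE_REGION}.fa"""
    return PurePosixPath("sequences") / encode_region / f"{common_name}.{encode_region}.fa"


def repeatmasker_path(accession: str, version: str) -> PurePosixPath:
    """RepeatMasker/${ACCESSION}.${VERSION}.out"""
    return PurePosixPath("RepeatMasker") / f"{accession}.{version}.out"


def trf_path(accession: str, version: str) -> PurePosixPath:
    """trf/${ACCESSION}.${VERSION}.bed"""
    return PurePosixPath("trf") / f"{accession}.{version}.bed"
```

For instance, fasta_path("rat", "ENr223") gives sequences/ENr223/rat.ENr223.fa.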

HEADER STRUCTURE:

>${COMMON_NAME}|${ENCODE_REGION}|${FREEZE_DATE}|${NCBI_TAXON_ID}|${ASSEMBLY_PROVIDER}|${ASSEMBLY_DATE}|${ASSEMBLY_ID}|${CHROMOSOME}|${CHROMOSOME_START}|${CHROMOSOME_END}|${ACCESSION}.${VERSION}|${NUM_BASES}|${NUM_N}|${THIS_CONTIG_NUM}|${TOTAL_NUM_CONTIGS}|${COMMENT}

Where:

	${COMMON_NAME}		like 'baboon' or 'dusky_titi'
	${ENCODE_REGION}	like 'ENm001' or 'ENr223'
	${FREEZE_DATE}		like 'AUG-2004'; latest date for inclusion in this freeze of the set of sequences encompassing the ENCODE regions
	${NCBI_TAXON_ID}	like '9555' or '9523'
	${ASSEMBLY_PROVIDER}	like 'NISC' or 'RGSC'
	${ASSEMBLY_DATE}	like 'NOV-2003' or '21-JUN-2003'; Date associated with the specific sequence assembly represented in this ENCODE freeze
	${ASSEMBLY_ID}		like 'rn3' or 'panTro1'
	${CHROMOSOME}		like 'chr1' or 'chr19_random'
	${CHROMOSOME_START}	[1 based]
	${CHROMOSOME_END}	[1 based]
	${ACCESSION}.${VERSION}	like 'NT_107546.1'
	${NUM_BASES}		Total number of called bases in the sequence entry, including N's
	${NUM_N}		Total number of N's in the sequence entry
	${THIS_CONTIG_NUM}	ID of this sequence contig (see next variable)
	${TOTAL_NUM_CONTIGS}	Total number of sequence contigs syntenic to a human region
	${COMMENT}		Free-text comment, like 'This is an example I hope we all agree on.'

Example:

>rat|ENr223|AUG-2004|10116|RGSC|NOV-2003|rn3|chr8|83281297|83487179|NT_107495.1|205883|133587|1|2|This is an example I hope we all agree on.

Not all fields need to contain information. For example, when
${ASSEMBLY_PROVIDER} = NISC, there will be no ${ASSEMBLY_ID} or
chrom:start-stop coordinates (${CHROMOSOME}, ${CHROMOSOME_START},
${CHROMOSOME_END}).
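
A minimal Python sketch of how such a header line could be split into its
sixteen fields. The field names and the parse_header function are hypothetical.
The sketch assumes that a field without information appears as an empty string
between two '|' characters, and that only the final comment field may itself
contain a '|'; neither assumption is stated in this README.

```python
# Field order follows the HEADER STRUCTURE line; the lowercase names are
# illustrative only.
FIELDS = [
    "common_name", "encode_region", "freeze_date", "ncbi_taxon_id",
    "assembly_provider", "assembly_date", "assembly_id", "chromosome",
    "chromosome_start", "chromosome_end", "accession_version",
    "num_bases", "num_n", "this_contig_num", "total_num_contigs", "comment",
]


def parse_header(line):
    """Split one FASTA header line into a dict of field name -> string or None."""
    if not line.startswith(">"):
        raise ValueError("header line must start with '>'")
    # Limit the split so any '|' inside the trailing comment is preserved.
    parts = line[1:].rstrip("\r\n").split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    # Assumption: empty fields (e.g. no ASSEMBLY_ID for NISC) map to None.
    return {name: (value if value else None) for name, value in zip(FIELDS, parts)}


example = (">rat|ENr223|AUG-2004|10116|RGSC|NOV-2003|rn3|chr8|83281297|83487179|"
           "NT_107495.1|205883|133587|1|2|This is an example I hope we all agree on.")
header = parse_header(example)
```

All values are kept as strings; numeric fields such as num_bases can be
converted with int() once it is known that they are present.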


Data Release Terms
------------------
All data in this directory and any subdirectories is subject to the terms 
of the ENCODE Project Data Release Policy of the National Human Genome Research
Institute.  This policy is posted at:

http://www.genome.gov/12513440
http://genome.ucsc.edu/encode/terms.html