Tissue samples
AgResearch collected tissue samples for the following 8 libraries:
- POSSUM_01-C-POSSUM-IMMUNE-2KB (pooled tissue from Spleen, Lymph
node, Splenocytes stimulated with ConA for 2 hours, Splenocytes
stimulated with ConA for 19 hours)
Vector: PDNRLIB; Approximately equal quantities of RNA from 4
tissues were pooled for cDNA library generation. The tissues were: 1.
Spleen tissue, 2. Lymph node tissue, 3. Splenocytes stimulated with
ConA for 2 hours, 4. Splenocytes stimulated with ConA for 19 hours.
- POSSUM_01-POSSUM-C-BRAIN-2KB (hypothalamus and whole pituitary
glands)
Vector: PDNRLIB; Approximately equal quantities of RNA from: 1.
brain tissue from the hypothalamus region from 3 individuals. 2.
whole pituitary glands collected from 18 individuals.
- POSSUM_01-POSSUM-C-OVARY-2KB (ovaries)
Vector: PDNRLIB; Approximately equal quantities of RNA collected
from ovaries at the following 8 stages of development: 1. Late
follicular (3 tissue samples). 2. Mid luteal (3 tissue samples). 3.
Juvenile (2 tissue samples). 4. Mid pregnancy (2 tissue samples). 5.
Early luteal (2 tissue samples). 6. Early/Mid follicular (2 tissue
samples). 7. Anaestrous (2 tissue samples). 8. Lactating (2 tissue
samples).
- POSSUM_01-POSSUM-GUT-2KB (pooled epidermal cells scraped from
Duodenum, Ileum, Jejunum, Proximal Colon, Distal Colon, Caecum, Peyer's
patches)
Vector: PDNRLIB; Approximately equal quantities of RNA from
epidermal cells scraped from the following 7 regions of the possum
gut were pooled for cDNA library generation: 1.
Duodenum, 2. Ileum, 3. Jejunum, 4. Proximal Colon, 5. Distal Colon,
6. Caecum, 7. Peyer's patches.
- POSSUM_01-POSSUM-KIDNEY-2KB (kidney cortex and kidney medulla)
Vector: PDNRLIB; Approximately equal quantities of RNA from Kidney
Cortex and Kidney Medulla were pooled for cDNA library generation.
- POSSUM_01-POSSUM-LIVER-2KB (liver)
Vector: PDNRLIB; RNA prepared from Liver was used for cDNA library
generation.
- POSSUM_01-POSSUM-C-EMBRYO-2KB (mixed)
Vector: PDNRLIB; Approximately equal quantities of RNA from the
following 8 sources: 1. Whole embryos (2 tissue samples). 2. Whole
Joeys 8-11 days old (8 tissue samples). 3. Early male reproductive
tract, day 9 to day 57 (13 tissue samples). 4. Early female
reproductive tract, day 22 to day 66 (13 tissue samples). 5. Mid
male reproductive tract, day 73 to day 113 (10 samples). 6. Mid
female reproductive tract, day 78 to day 115 (7 tissue samples). 7.
Late male reproductive tract, day 119 to day 168 (7 tissue samples).
8. Late female reproductive tract, day 119 to day 136 (3 tissue
samples).
- POSSUM_01-POSSUM-C-REPROTRACT-2KB (Oviduct, Cul-de-sac and
Uterus)
Vector: PDNRLIB; Approximately equal quantities of RNA from the
Oviduct, Cul-de-sac and Uterus. Tissue from each of these three
regions of the reproductive tract was collected at 8 different
physiological states: Late follicular, Mid luteal, Juvenile, Mid
pregnancy, Early luteal, Early/Mid follicular, Anaestrous,
Lactating. Each state contributed approximately equal amounts (by
weight) of tissue to each of the 3 tissue pools, and one RNA
preparation was purified from each pool.
cDNA libraries
For each tissue sample a normalised cDNA library was prepared by
Clontech from RNA supplied
by AgResearch.
cDNA sequences
The cDNA clones were sequenced by TIGR.
Sequence processing
TIGR provided the raw sequence data (trace files) together with the
nucleotide sequences and their quality values.
- Quality filtering
Low quality sequence ends have been removed using a custom script.
- Vector trimming (seqclean)
Vector sequences and polyA tails have been removed using TIGR's
seqclean.
- Preclustering (BLAST)
An all-against-all BLAST was performed to find ESTs which might
belong to the same group. This step was necessary to avoid
overloading the clustering tool (cap3) used in the subsequent step.
Preclustering only decided which ESTs were submitted to cap3
together, so that ESTs that might belong to the same group ended up
in the same chunk.
- Clustering (cap3)
ESTs were clustered (grouped) into contigs using cap3.
- Some statistics from contig build CS40:
111,767 ESTs clustered into 12,013 contigs and 55,749 singletons
(67,762 sequences in total).
Contigs are on average 1,244.25 bp long and 4.66 sequences deep and
contain 0.45% ambiguous bases (Ns).
Singletons are on average 751.44 bp long and contain 0.72% ambiguous
bases (Ns).
- Mapping (BLAST)
EST clusters were mapped (compared) against the genomic sequence of
the closest available relative, the gray short-tailed opossum
Monodelphis domestica.
- Annotation (BLAST)
In July 2006 EST clusters were compared against the non-redundant
protein database nr.
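The preclustering step described above can be pictured as finding groups of ESTs that are linked by BLAST hits and then packing whole groups into chunks for cap3. The following Python sketch illustrates that idea only; it is not the code used for the CS40 build. The function names, the list-of-pairs input and the max_chunk_size parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def precluster(est_ids, blast_pairs):
    """Group ESTs that are linked, directly or indirectly, by a BLAST hit.

    est_ids     -- iterable of EST identifiers
    blast_pairs -- iterable of (query_id, subject_id) hits; a hypothetical
                   input, e.g. parsed from all-against-all BLAST output
    """
    parent = {est: est for est in est_ids}

    def find(est):
        # Walk up to the group representative, shortening the path on the way.
        parent.setdefault(est, est)
        while parent[est] != est:
            parent[est] = parent[parent[est]]
            est = parent[est]
        return est

    for query, subject in blast_pairs:
        root_q, root_s = find(query), find(subject)
        if root_q != root_s:
            parent[root_s] = root_q  # merge the two groups

    groups = defaultdict(list)
    for est in list(parent):
        groups[find(est)].append(est)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def chunk_groups(groups, max_chunk_size):
    """Pack whole groups into chunks so that no group is split.

    A group larger than max_chunk_size still gets a chunk of its own.
    """
    chunks, current = [], []
    for group in groups:
        if current and len(current) + len(group) > max_chunk_size:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be handed to cap3 separately. ESTs without any hit to another EST form groups of size one.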
The idea
Brushtail possum (Trichosurus vulpecula) ESTs were compared
(mapped) against the genomic sequence of the closest available relative,
the gray short-tailed opossum Monodelphis domestica.
The Generic Genome Browser (GBrowse) lets you visually inspect the
mapping and compare sequence annotation across a number of species.
GBrowse opens in a separate window which is divided into the
following parts:
- Instructions
Of particular interest are the Examples and [Help].
Selecting chr2 from the Examples will show chr2 in the genome
browser. Clicking [Help] will bring up general help for
GBrowse.
- Search
Here you can search for your favourite region or keyword.
Use Scroll/Zoom to select the region and the level of detail
shown.
- Overview
This gives you an overview of the complete region (e.g. a complete
chromosome).
Click on the axis to change position (symbolized by the red
rectangle).
- Details
Here you can see all the information in various groups / datasets
for a selected region. You can't display more than 5 Mbp in this
detailed view.
- Tracks
Here you can select the datasets / tracks you are interested in:
just check or uncheck the checkbox in front of a track name.
Clicking on any of the track names will bring up information about
this specific dataset.
- Display Settings
Here you can adjust the display.
- Add your own tracks
You are not limited to our tracks; you can add your own. It's very
easy, and you can access instructions by clicking on [Help].
The content
- Opossum Assembly Contigs
The topmost line (track) of the genome browser shows the opossum
genome. The opossum contigs have been sequenced and assembled by the
Broad
Institute. The current build is called monDom4 (January 2006).
Other available tracks are:
Try GBrowse.
- Select the Program
The NCBI offers a concise program selection table. Read the
remainder of the NCBI document for more info on BLAST.
- Select the Database
Depending on your choice of Program, all valid databases will be
displayed, e.g.:
A) nucleotide databases
- Possum GenBank ESTs (as at 6/2006)
- CS40 Possum EST contigs (as at 7/2006)
- Human RefSeq mRNAs (as at 4/2005)
- Mouse RefSeq mRNAs (as at)
B) protein databases
- All non-redundant protein sequences (as at)
GenBank CDS translations+PDB+SwissProt+PIR+PRF
- Non-redundant Swissprot sequences (as at)
Swissprot without swissnew
- Mouse RefSeq proteins (as at 4/2005)
- Human RefSeq proteins (as at)
- Select E-value
The E-value (Expectation value) tells you how many database hits of the
same computationally judged quality you can expect to occur simply by
chance. In a sense it is a measure of false positives. An E-value of 0
means that no false positives are predicted, so the smaller the E-value,
the better. An E-value of e-04 or smaller (e.g. e-10) is considered
significant. e-04 translates to 0.000 1, e-10 translates to 0.000 000
000 1.
- Select max hits
Limit the number of database hits to be reported. If you search with
a very common motif you can choose not to get hundreds of database hits
reported but just the top 5.
- Paste your FASTA sequence into the large text field.
Sequences in FASTA format start with a one line header beginning with a
'>'. Following the header are any number of lines with just the sequence
data (no numbers). See the following example:
>My test-sequence
CATGCAGACGATGCTAGCTAGCTGATCGATCGATGCTAGCATGCATGCTAGTAC
- Press the "Run Blast" button to perform a search against all sequences
in the database.
Your sequences will be submitted to the AgResearch BLAST server
and returned when completed. Your results are stored on the server
for 1 week so that you can retrieve them at a later date. Take note
of the URL to your results (e.g.
http://www.possumbase.org.nz/cgi-bin/blast_results.py?filename=wpVuSjHQYi7zExVB2CXjhCKAo),
as the file names are assigned randomly and we cannot identify your
results for you.
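Before pasting a query it can help to check that it really is in FASTA format as described above. The following Python sketch is one way to do that. The function name parse_fasta is hypothetical, and the rule that sequence lines may contain letters only is an assumption based on the "no numbers" remark; the server itself may be more lenient.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Raises ValueError if sequence data appears before a '>' header line or
    if a sequence line contains anything other than letters (assumed rule).
    """
    records = []
    header, chunks = None, []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError(
                    f"line {line_number}: sequence data before a '>' header")
            if not line.isalpha():
                raise ValueError(
                    f"line {line_number}: sequence lines may contain letters only")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>My test-sequence
CATGCAGACGATGCTAGCTAGCTGATCGATCGATGCTAGCATGCATGCTAGTAC
"""
records = parse_fasta(example)
```

A sequence copied together with position numbers (for example from a GenBank record) would be rejected by this check, which is the most common cause of a malformed query.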
Here you can download the complete set of annotated ESTs (assembled into
contigs) as a compressed file cs40annotated.zip.
What does the data look like?
cs40annotated.seq is a FASTA file containing all Trichosurus vulpecula
mRNA as at June 2006, assembled into contigs and annotated via BLAST against
all non-redundant protein sequences (nr protein). Each description line
includes the DNA composition, the library expression of the contig, and the
top hit against nr protein with its E-value.
Let's look at an example:
>GI|108827484|GB|EC281509.1|EC281509_CS40
DNA=(1%N 19%C 19%G 32%A 29%T ) Expr= 1=POSSUM_01-POSSUM-C-BRAIN-2KB(1) NR
protein=ref|XP_534617.2| PREDICTED: similar to NEDD8-conjugating enzyme [Canis
familiaris(eval= 8.00E-57)
CTGGGGAATGGTATGAGGCTCCCCAGGGTAAAAGTTGTAGTAATGTTCAC
ATTGGAGAGCAAATTGAAGAAGGATGAGCACCTTAAAGGATCCCATCTGT
GTGGCCCAGCCTCAAACTCCACATGGAAAGTATCTGTGAGGGATAAACTG
CTTATTAAAGAGGTTGAAGAGCTCGAAGCCAATTTACCTTGTACACGTAA
AGTGAATTTTCCTGATCCAAACAAGCTTCACTACTTTCAACTAAGAGTCA
CTCCAGATGAGGGTCACTACCAGGGTGGAAAATTTTGGTTTAGATAGAAG
TCCCTGATGCTTACAAAATGCTGCCTCCCAAAGTAAAATGTTTGACTAGA
ATCTGGCACCCTAACATCATAGTGATGGGGGAAATCTTTCTGAGCTTACT
AAGAGAACATTCATTGGACAGCACTGGATGGCCTTCCACAAGAACATTAA
GGGATATTGTATAGGAATTAAACTTTTTTTATTTACCAACCTTTTGAATT
CCTAACCCACAGTTTCTTGCATATAGTATCAGTTCCCTAAACACTTCTGA
ATGTATACTACTCATGAATAACACTTTTAACTTGCATTGGTATAGACACT
AGATTAATTAAAGTGTAGAAGTCCTGGATTAAAANNAACTCCAAGCCAAT
GCAGATNNNCATCTTGCTANATTTAGTCACCAAAATAATTGAAATTGTTT
TCTGTTACATCACCTTTGAAATGNCTC
- name:
GI|108827484|GB|EC281509.1|EC281509_CS40
GI number: 108827484
Accession number: EC281509, version: 1, contig build: CS40
- DNA composition:
DNA=(1%N 19%C 19%G 32%A 29%T )
- library expression of contig:
Expr=
1=POSSUM_01-POSSUM-C-BRAIN-2KB(1)
The first number is the total number of ESTs comprising the contig
(contig depth).
Then all libraries in which member ESTs are expressed are listed
together with their individual counts.
- top hit against nr protein:
NR protein=ref|XP_534617.2|
PREDICTED: similar to NEDD8-conjugating enzyme [Canis familiaris
- E-value:
(eval= 8.00E-57)
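The fields listed above can be pulled out of a description line with a few regular expressions. The following Python sketch is based solely on the single example shown here, so the patterns are assumptions: in particular, how several libraries are separated within Expr= is not documented, and the sketch simply collects every name(count) pair it finds. The function name parse_description and the constant SIGNIFICANT_EVALUE are hypothetical; the threshold of 1e-4 is the one given in the BLAST instructions.

```python
import re

SIGNIFICANT_EVALUE = 1e-4  # "e-04 or smaller is considered significant"


def parse_description(description):
    """Split one description line (joined onto a single line) into fields."""
    # DNA composition, e.g. "DNA=(1%N 19%C 19%G 32%A 29%T )"
    composition = {}
    dna = re.search(r"DNA=\(([^)]*)\)", description)
    if dna:
        for percent, base in re.findall(r"(\d+)%([A-Z])", dna.group(1)):
            composition[base] = int(percent)

    # Library expression, e.g. "Expr= 1=POSSUM_01-POSSUM-C-BRAIN-2KB(1)"
    depth, libraries = None, {}
    expr = re.search(r"Expr=\s*(\d+)=(.*?)(?=\s*NR protein=|$)", description)
    if expr:
        depth = int(expr.group(1))
        for name, count in re.findall(r"([\w-]+)\((\d+)\)", expr.group(2)):
            libraries[name] = int(count)

    # Top nr hit and E-value, e.g. "NR protein=ref|...| ... (eval= 8.00E-57)"
    top_hit, evalue = None, None
    hit = re.search(r"NR protein=(.*?)\(eval=\s*([^)\s]+)\s*\)", description)
    if hit:
        top_hit = hit.group(1).strip()
        evalue = float(hit.group(2))

    return {
        "composition": composition,
        "depth": depth,
        "libraries": libraries,
        "top_hit": top_hit,
        "evalue": evalue,
        "significant": evalue is not None and evalue <= SIGNIFICANT_EVALUE,
    }


example = ("DNA=(1%N 19%C 19%G 32%A 29%T ) "
           "Expr= 1=POSSUM_01-POSSUM-C-BRAIN-2KB(1) "
           "NR protein=ref|XP_534617.2| PREDICTED: similar to "
           "NEDD8-conjugating enzyme [Canis familiaris(eval= 8.00E-57)")
fields = parse_description(example)
```

For the example line, fields["depth"] is 1, fields["libraries"] maps POSSUM_01-POSSUM-C-BRAIN-2KB to 1, and the hit counts as significant because 8.00E-57 is far below 1e-4.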
Naming scheme
cs40annotated.seq contains 67,762 sequences (note: these sequences
are contigs and singletons) with the following naming scheme:
- Singleton (sequence does not belong to any group as determined
through preclustering)
e.g. >GI|108827484|GB|EC281509.1|EC281509_CS40 (given is GI number:
108827484, accession number: EC281509, version: 1, contig build:
CS40)
- Singleton (sequence does not belong to any group as determined
through clustering)
e.g. >050706CS40000705FFFFF (this AgResearch internal numbering
scheme means that this sequence was preclustered into the precluster
050706CS40000705; singletons are then counted using the hexadecimal
system, e.g. FFFFF)
- Contig (a number of sequences could be clustered to one group)
e.g. >050706CS4000070500001 (this AgResearch internal numbering
scheme means that various sequences were clustered into cluster
00001 of precluster 050706CS40000705)
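The naming scheme above can be recognised mechanically. The following Python sketch splits a sequence name into its parts. Both patterns are inferred from the three example names only, so the field widths (a 16-character precluster identifier followed by a 5-character hexadecimal suffix) are assumptions, and the function name parse_name is hypothetical.

```python
import re

# GenBank-style singleton, e.g. GI|108827484|GB|EC281509.1|EC281509_CS40
GENBANK_NAME = re.compile(
    r"^GI\|(\d+)\|GB\|([A-Z0-9]+)\.(\d+)\|[A-Z0-9]+_(CS\d+)$")

# AgResearch internal name, e.g. 050706CS40000705FFFFF (assumed widths)
INTERNAL_NAME = re.compile(r"^(\d{6}CS\d{8})([0-9A-F]{5})$")


def parse_name(name):
    """Split a sequence name (with or without leading '>') into its parts."""
    name = name.strip().lstrip(">")
    match = GENBANK_NAME.match(name)
    if match:
        return {
            "kind": "genbank_singleton",
            "gi": int(match.group(1)),
            "accession": match.group(2),
            "version": int(match.group(3)),
            "build": match.group(4),
        }
    match = INTERNAL_NAME.match(name)
    if match:
        return {
            "kind": "internal",
            "precluster": match.group(1),
            "suffix": match.group(2),
            # The suffix is read as a hexadecimal counter.
            "suffix_value": int(match.group(2), 16),
        }
    raise ValueError(f"unrecognised sequence name: {name!r}")
```

The sketch deliberately does not decide whether an internal name belongs to a contig or to a singleton, because the text gives only one example of each and no explicit rule for telling them apart.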