Manual for the SCPD Database
Saurabh Sinha
March 7, 2000
Introduction
The SCPD database provides information on regulatory elements and transcription
factors in genes of the yeast Saccharomyces cerevisiae. It provides easy
access to:
-
sequences of genes and ORFs (that contain instances of known regulatory
elements) by name
-
detailed description and occurrences of regulatory elements and transcription
factors
-
tools to analyze motifs in genes and ORFs
-
provision to submit information to the database
-
other miscellaneous information such as binding affinities of certain motifs
and links to other relevant sites
(Henceforth, the term 'factors' will refer to transcription factors and
regulatory elements, and the term 'genes' will refer to genes and ORFs.)
This page by default displays a table that shows a large number of genes
with mapped sites. Each table entry is a hyperlink to information about
that gene. A specific gene's information can also be retrieved by entering
the gene's name in a form at the top of this page.
Information on a specific gene
By default this page displays information about the mapped sites (highlighted
in red) in the gene: the location of each site and the sequence of the
adjoining regions.
This page offers the following options:
-
Get mapped sites: same as above.
-
Get putative sites: lists all putative sites in the gene, i.e., sites that
have sequence similarity with a factor's binding sites. The location and
sequence of each site, along with the putative factor's name, are displayed.
-
Get intergenic region: displays the sequence of the region upstream of
the gene, up to the start (or end) of the next gene.
-
Retrieve sequence: displays the sequence of the gene.
This page by default tabulates all known factors. A specific factor may
also be searched for by name.
This page offers the following options:
-
Get factor and element list: this is the table of all known factors, displayed
by default.
-
Get consensus list: this is a list of all known factor consensus sequences.
Each entry is a (factor, consensus sequence) pair.
-
Get matrix list: this is a list of all known factor weight matrices. Each
entry is a (factor, weight matrix) pair.
-
Get distribution of mapped sites: link does not work.
-
Get summary of distribution: Tabulates the number of occurrences of each
factor in the genome.
-
Get correlation between factors: tabulates highly correlated factor pairs,
i.e., pairs of factors that tend to occur together.
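To illustrate how the consensus list and the matrix list relate to each other, the sketch below derives a consensus sequence from a weight matrix and scores a candidate site against it. It assumes that a weight matrix stores one count per base at each site position; the manual does not state SCPD's exact matrix format, so this layout, the function names, and the example counts are all hypothetical.

```python
from typing import Dict, List

# Assumed layout (not confirmed by the manual): one dict per site position,
# mapping each base to the number of known sites with that base there.
Matrix = List[Dict[str, int]]


def consensus_from_matrix(matrix: Matrix) -> str:
    """Return the most frequent base at each position.

    Ties are broken alphabetically so the result is deterministic.
    """
    return "".join(min(col, key=lambda b: (-col[b], b)) for col in matrix)


def score_site(matrix: Matrix, site: str) -> int:
    """Sum the counts of the observed base at each position of a candidate site."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix width")
    return sum(col.get(base, 0) for col, base in zip(matrix, site.upper()))


# Invented counts for a 4-position site; these are NOT taken from SCPD.
example_matrix: Matrix = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 7, "G": 0, "T": 3},
]

print(consensus_from_matrix(example_matrix))
print(score_site(example_matrix, "AGAT"))
```

A site identical to the consensus receives the highest possible score under this scheme, and sites that deviate from it score lower.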
Information on a specific factor
This page offers the following options:
-
Get regulated genes: lists the genes that are regulated by this factor.
Each gene is a hyperlink to information about that gene (see above for
its format).
-
Get sites: lists the sites where this factor binds in the regulated genes.
-
Get consensus: retrieves the consensus sequence of the factor's binding
site, if known.
-
Get matrix: retrieves the weight matrix describing the factor's binding
sites, if applicable.
-
Get affinity data: displays the binding affinity of the factor to each
of its binding sites.
-
Get genomewise distribution: lists all occurrences of this factor, along
with the name of the gene and the location and sequence of the binding
site.
-
Sort by copy No.: sorts the genes regulated by this factor by the number
of copies of its binding site in each gene.
-
Sort by function category: sorts the genes regulated by this factor by
their regulatory function.
Analysis tools
-
Retrieve promoter sequence: allows retrieval of the upstream region of a
gene. The boundaries of the region can be specified.
-
Search existing motif: determines whether a sequence pattern (motif)
occurs in the binding sites of the known factors. The motif can be up to
20 characters long and may include spacers ('N'). Optionally, approximate
matching can be used, which tolerates up to a specified number of mismatches.
-
Search putative regulatory elements: determines whether an input sequence
(in FASTA format) has putative occurrences of some motif.
-
Group genes according to function categories: groups a set of genes,
specified by name, by their function categories.
-
Repetitive sequence analyzer: this is a JavaScript tool that finds repetitive
elements (whose nature can be specified in detail by the user) in a set
of user-specified sequences. The page gives very clear instructions on
how to use the tool.
-
Motif (>6-mer) distribution: this provides a chromosome-wise search for a
user-specified motif. The result can be displayed according to the genomic
location or the location in the upstream regions of genes.
-
K-tuple relative information: "The k-tuple relative information between two
sequences is defined as the log[(n1+1)/(n1+n2+2)], where n1 is the number
of k-tuples present in the first sequence while n2 is the number in the
second sequence." Given two user-specified sequences (in FASTA format), the
tool computes this statistic for a user-specified tuple length. The
biological relevance of this statistic is unclear.
-
Palindrome sequences: this is a JavaScript tool that searches for palindromic
substrings in an input sequence (in raw data format). Detailed instructions
for using this tool are available on this page.
-
Multisequence alignment by GibbsDNA: this is a tool for multiple sequence
alignment by GibbsDNA. The tool is heavily parameterized, so the alignment
can be appropriately fine-tuned.
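Two of the tools above are simple enough to sketch offline. The Python below is not SCPD's code: it is a minimal illustration, under stated assumptions, of motif matching with 'N' spacers and a mismatch limit, and of the k-tuple relative information statistic. The function names are hypothetical. For the statistic, the sketch assumes that n1 and n2 are the occurrence counts of one particular k-tuple in the first and second sequence, and that the logarithm is natural; the manual specifies neither.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def find_motif(sequence: str, motif: str,
               max_mismatches: int = 0) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each window matching the motif.

    An 'N' in the motif matches any base and never counts as a mismatch.
    """
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(1 for m, s in zip(motif, window)
                         if m != "N" and m != s)
        if mismatches <= max_mismatches:
            hits.append((start, window))
    return hits


def count_ktuples(sequence: str, k: int) -> Counter:
    """Count every overlapping k-tuple in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def ktuple_relative_information(seq1: str, seq2: str,
                                k: int) -> Dict[str, float]:
    """Compute log[(n1+1)/(n1+n2+2)] for each k-tuple seen in either sequence.

    Assumption: n1 and n2 are per-k-tuple counts, and log is the natural log.
    """
    counts1, counts2 = count_ktuples(seq1, k), count_ktuples(seq2, k)
    return {
        t: math.log((counts1[t] + 1) / (counts1[t] + counts2[t] + 2))
        for t in sorted(set(counts1) | set(counts2))
    }


exact_hits = find_motif("ACGTACGA", "CGNA")
loose_hits = find_motif("ACGTACGA", "CGNA", max_mismatches=2)
info = ktuple_relative_information("AAAC", "AAGG", k=2)
print(exact_hits, loose_hits, info)
```

Under this reading, a k-tuple that occurs equally often in both sequences scores log(1/2), and the score rises toward zero as the k-tuple becomes more specific to the first sequence.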
Submit a record to SCPD
This feature allows users to submit a new entry to
the database. One can submit a gene, a consensus or a weight-matrix record.
The ER and Object models for the database are presented
in a diagram. This documentation is intended for readers who know these
models from database theory. The ER model is easier to comprehend.
This page lists all factors for which data on binding affinity (to sites)
is available. Each listed factor is a hyperlink to its binding affinity
data. (This information can also be accessed from the page dedicated to
that factor's information; see above.)
Links
Links to other online databases are listed.