Manual for the SCPD Database

Saurabh Sinha
March 7, 2000

Introduction

The SCPD database provides information on regulatory elements and transcription factors in genes of the yeast Saccharomyces cerevisiae. It offers:

sequences of genes and ORFs that contain instances of known regulatory elements, retrievable by name
detailed descriptions and occurrences of regulatory elements and transcription factors
tools to analyze motifs in genes and ORFs
a facility for submitting information to the database
other miscellaneous information, such as binding affinities of certain motifs and links to other relevant sites

(Henceforth, the term 'factors' refers to both transcription factors and regulatory elements, and the term 'genes' refers to both genes and ORFs.)

Genes

By default, this page displays a table of the genes that have mapped sites. Each table entry is a hyperlink to information about that gene. Information on a specific gene can also be retrieved by entering the gene's name in the form at the top of the page.

Information on a specific gene

By default, this page displays the mapped sites in the gene (highlighted in red): the location of each site and the sequence of the adjoining regions.

This page offers the following options:

Get mapped sites: the default view described above.
Get putative sites: lists all putative sites in the gene, i.e., sites that have sequence similarity with a factor's binding sites. The location and sequence of each site, along with the putative factor's name, are displayed.
Get intergenic region: displays the sequence of the region upstream of the gene, up to the start (or end) of the next gene.
Retrieve sequence: displays the sequence of the gene.
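
The "Get intergenic region" option can be pictured as a small coordinate calculation. The following is a minimal sketch, not SCPD's actual implementation: the Gene record, the 1-based inclusive coordinates, and the chrom_length parameter are all assumptions made for illustration, and neighbours that overlap the gene are simply ignored.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

class Gene(NamedTuple):
    """Hypothetical gene record; not a structure defined by SCPD."""
    name: str
    start: int   # leftmost chromosome coordinate (1-based, inclusive)
    end: int     # rightmost chromosome coordinate (1-based, inclusive)
    strand: str  # "+" or "-"

def upstream_intergenic(gene: Gene, neighbours: Sequence[Gene],
                        chrom_length: int) -> Optional[Tuple[int, int]]:
    """Return (left, right) bounds of the region upstream of `gene`,
    extending to the nearest neighbouring gene, or None if it is empty."""
    if gene.strand == "+":
        # Upstream lies to the left: stop at the closest gene ending before us.
        ends = [g.end for g in neighbours if g.end < gene.start]
        left = max(ends) + 1 if ends else 1
        right = gene.start - 1
    else:
        # Upstream lies to the right: stop at the closest gene starting after us.
        starts = [g.start for g in neighbours if g.start > gene.end]
        left = gene.end + 1
        right = min(starts) - 1 if starts else chrom_length
    return (left, right) if left <= right else None

# Toy coordinates, invented for this example.
a = Gene("geneA", 100, 200, "+")
b = Gene("geneB", 300, 400, "+")
c = Gene("geneC", 500, 600, "-")
region_b = upstream_intergenic(b, [a, c], chrom_length=1000)
region_c = upstream_intergenic(c, [a, b], chrom_length=1000)
```

Because the neighbour's own orientation is not consulted, the boundary is that gene's start or its end, whichever faces the gene of interest, which matches the "start (or end) of the next gene" wording above.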

Regulatory elements and transcription factors

By default, this page tabulates all known factors. A specific factor can also be searched for by name.

This page offers the following options:

Get factor and element list: this is the table of all known factors, displayed by default.
Get consensus list: this is a list of all known factor consensus sequences. Each entry is a (factor, consensus sequence) pair.
Get matrix list: this is a list of all known factor weight matrices. Each entry is a (factor, weight matrix) pair.
Get distribution of mapped sites: link does not work.
Get summary of distribution: tabulates the number of occurrences of each factor in the genome.
Get correlation between factors: tabulates highly correlated factor pairs, i.e., pairs of factors that tend to occur together.
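
To make the relationship between the consensus list and the matrix list concrete, here is a minimal sketch of how a consensus string could be read off a weight matrix. It assumes the matrix is a table of base counts per position; the 0.6 threshold, the use of 'N' for positions with no dominant base, and the toy counts are all illustrative choices, not values or rules taken from SCPD.

```python
from typing import Dict, List

def consensus_from_counts(matrix: List[Dict[str, int]],
                          threshold: float = 0.6) -> str:
    """Derive a consensus from per-position base counts.

    A position is reported as its most frequent base if that base accounts
    for at least `threshold` of the counts, and as 'N' otherwise.
    Each column is assumed to be non-empty.
    """
    consensus = []
    for column in matrix:
        total = sum(column.values())
        base, count = max(column.items(), key=lambda kv: kv[1])
        dominant = total > 0 and count / total >= threshold
        consensus.append(base if dominant else "N")
    return "".join(consensus)

# Toy count matrix (one dict per motif position), invented for this example.
toy_matrix = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 3, "C": 3, "G": 2, "T": 2},
]
toy_consensus = consensus_from_counts(toy_matrix)
```

The matrix carries more information than the consensus: the last position above collapses to 'N', while the matrix still records how often each base was seen there.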

Information on a specific factor

This page offers the following options:

Get regulated genes: lists the genes that are regulated by this factor. Each gene is a hyperlink to information about that gene (see above for its format).
Get sites: lists the sites where this factor binds in the regulated genes.
Get consensus: retrieves the consensus sequence of the factor's binding site, if known.
Get matrix: retrieves the weight matrix describing the factor's binding sites, if applicable.
Get affinity data: displays the binding affinity of the factor to each of its binding sites.
Get genomewise distribution: lists all occurrences of this factor, along with the name of the gene and the location and sequence of the binding site.
Sort by copy No.: sorts the genes regulated by this factor by the number of copies of its binding site in each gene.
Sort by function category: sorts the genes regulated by this factor by their function category.

Analysis tools

Retrieve promoter sequence: allows retrieval of the upstream region of a gene. The boundaries of the region can be specified.
Search existing motif: determines whether a sequence pattern (motif) occurs in the binding sites of the known factors. The motif may be up to 20 characters long and may contain spacers ('N'). Approximate matching that tolerates up to a specified number of mismatches is optionally available.
Search putative regulatory elements: determines whether an input sequence (in FASTA format) has putative occurrences of some motif. The motif can be specified in three ways:

Using predefined matrix and consensus: the motif that is searched is one of the known factor consensus sequences or weight matrices in the database.
Using self-defined consensus: the motif that is searched is a user-specified sequence, along with a parameter for allowing mismatches.
Using self-defined matrix: the motif that is searched is a user-specified weight matrix.
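
The consensus-based searches described above (a motif with 'N' spacers and a mismatch allowance) can be sketched as a simple sliding-window scan. This is only an illustration under stated assumptions: the function name find_motif is hypothetical, 'N' is assumed to match any base without counting as a mismatch, and only the given strand is scanned. It is not the algorithm SCPD itself uses.

```python
from typing import List

def find_motif(sequence: str, motif: str, max_mismatches: int = 0) -> List[int]:
    """Return the 0-based start positions at which `motif` matches `sequence`.

    'N' in the motif is treated as a spacer that matches any base.
    A window is a hit if it differs from the motif at no more than
    `max_mismatches` non-spacer positions.
    """
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(1 for m, s in zip(motif, window)
                         if m != "N" and m != s)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits

# Toy inputs, invented for this example.
exact_hits = find_motif("AACGTACGTT", "CGNAC")            # spacer absorbs the T
fuzzy_hits = find_motif("ACGTAGGT", "ACGA", max_mismatches=1)
```

A weight-matrix search would replace the mismatch count with a per-position score summed over the window and compared against a cutoff.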

Group genes according to function categories: a set of user-specified genes (given by name) is grouped by function category.
Repetitive sequence analyzer: a JavaScript tool that finds repetitive elements in a set of user-specified sequences. The user can specify in detail the kind of repeats to look for. The page gives clear instructions on how to use the tool.
Motif (>6-mer) distribution: provides a chromosome-wise search for a user-specified motif. The result can be displayed according to genomic location or according to location in the upstream regions of genes.
K-tuple relative information: "The k-tuple relative information between two sequences is defined as the log[(n1+1)/(n1+n2+2)], where n1 is the number of k-tuples present in the first sequence while n2 is the number in the second sequence." Given two user-specified sequences (in FASTA format), the tool computes this statistic for a user-specified tuple length. The biological relevance of this statistic is unclear.
Palindrome sequences: a JavaScript tool that searches for palindromic substrings in an input sequence (in raw data format). Detailed instructions for using this tool are available on its page.
Multisequence alignment by GibbsDNA: performs multiple sequence alignment using GibbsDNA. The tool exposes many parameters, so the alignment can be fine-tuned.
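
The k-tuple relative information quoted above can be computed in a few lines. The sketch below rests on two assumptions that the quoted definition does not settle: n1 and n2 are read as the occurrence counts of one particular k-tuple in the first and second sequence, so that one value is produced per k-tuple, and the logarithm is taken to be natural. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

def ktuple_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-tuple in `seq`."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def relative_information(seq1: str, seq2: str, k: int) -> Dict[str, float]:
    """Return log[(n1+1)/(n1+n2+2)] for each k-tuple seen in either sequence.

    n1 and n2 are the counts of that k-tuple in seq1 and seq2 (an assumed
    reading of the definition); a missing tuple has count 0.
    """
    c1, c2 = ktuple_counts(seq1, k), ktuple_counts(seq2, k)
    return {t: math.log((c1[t] + 1) / (c1[t] + c2[t] + 2))
            for t in set(c1) | set(c2)}

# Toy sequences, invented for this example.
info = relative_information("ACAC", "ACGT", k=2)
```

Under this reading, a k-tuple that is much more frequent in the first sequence gives a value close to 0, and one seen only in the second sequence gives a strongly negative value.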

Submit a record to SCPD

This feature allows users to submit a new entry to the database. One can submit a gene, a consensus or a weight-matrix record:

Submit a gene record: allows the user to submit information about a gene, such as its name, ORF, coordinates, sequence, orientation, and reference.
Submit a consensus record: allows the user to submit a consensus sequence for a factor, along with its name and reference.
Submit a matrix record: allows the user to submit the weight matrix for a factor.

Documentation

The ER and Object models for the database are presented in a diagram. This documentation is intended for readers who already know these models from database theory. Of the two, the ER model is the easier to comprehend.

Collection of binding affinity and expression data

This page lists all factors for which data on binding affinity (to sites) is available. Each listed factor is a hyperlink to its binding affinity data. (This information can also be accessed from the page dedicated to that factor; see above.)

Links

Links to other online databases are listed.
