import java.util.*;

/**
 * Classes that implement SequenceSearch provide a set of sequence
 * comparison and search methods that can be used to find sequences,
 * in a collection of sequences, that have close matches to some
 * pattern, and tell where in each sequence the close matches are.
 *
 * The class should do whatever setup it needs to get ready to perform
 * searches, such as reading in data files, creating internal data
 * structures, etc., in its constructor, as there are no
 * methods in this interface to tell it to do those things.
 *
 * Because there were two protein translation options, this interface
 * requires a method that tells which translation is in use.
 */
public interface SequenceSearch {

    // Although it cannot be part of the interface, a SequenceSearch class
    // should provide a constructor that takes no parameters, and does any
    // necessary setup.

    // Methods that search the entire collection of gene sequences or their
    // translated protein sequences.

    /**
     * Search through all gene sequences and find matches to the given
     * pattern that have a Hamming distance no more than the given value.
     * Return a Collection containing a SearchResult object for each sequence
     * with at least one sufficiently close match.
     *
     * @param pattern - the pattern to be used in the distance comparison
     * @param maxDistance - the largest distance that should be accepted
     *        as a match
     * @return a Collection containing a SearchResult object for each sequence
     *         with matches
     */
    public Collection searchGenesByHammingDistance(
            String pattern, int maxDistance);

    /**
     * Search through all protein sequences and find matches to the given
     * pattern that have a Hamming distance no more than the given value.
     * Return a Collection containing a SearchResult object for each sequence
     * with at least one sufficiently close match.
     *
     * @param pattern - the pattern to be used in the distance comparison
     * @param maxDistance - the largest distance that should be accepted
     *        as a match
     * @return a Collection containing a SearchResult object for each sequence
     *         with matches
     */
    public Collection searchProteinsByHammingDistance(
            String pattern, int maxDistance);

    /**
     * Search through all gene sequences and find matches to the given
     * pattern that have an edit distance no more than the given value.
     * Return a Collection containing a SearchResult object for each sequence
     * with at least one sufficiently close match.
     *
     * @param pattern - the pattern to be used in the distance comparison
     * @param maxDistance - the largest distance that should be accepted
     *        as a match
     * @return a Collection containing a SearchResult object for each sequence
     *         with matches
     */
    public Collection searchGenesByEditDistance(
            String pattern, int maxDistance);

    /**
     * Search through all protein sequences and find matches to the given
     * pattern that have an edit distance no more than the given value.
     * Return a Collection containing a SearchResult object for each sequence
     * with at least one sufficiently close match.
     *
     * @param pattern - the pattern to be used in the distance comparison
     * @param maxDistance - the largest distance that should be accepted
     *        as a match
     * @return a Collection containing a SearchResult object for each sequence
     *         with matches
     */
    public Collection searchProteinsByEditDistance(
            String pattern, int maxDistance);

    // Methods that search one sequence of any sort, finding match locations.

    /**
     * Searches a sequence for substrings that match a given pattern, using
     * Hamming distance. A substring matches if its distance is no more
     * than the given maximum. Returns the locations of the matches in a
     * List of Integers. The location is the index in the sequence of the
     * beginning of the substring that was a sufficiently close match.
     *
     * @param sequence - the sequence to search for matches
     * @param pattern - the pattern to be used in the distance comparison
     * @param maxDistance - the largest distance that should be accepted
     *        as a match
     * @return a List of Integers containing locations of the matches in this
     *         sequence
     */
    public List searchOneSequenceByHammingDistance(
            String sequence, String pattern, int maxDistance);

    /**
     * Searches a sequence for substrings that match a given pattern, using
     * edit distance. A substring matches if its distance is no more
     * than the given maximum. Returns the locations of the matches in a
     * List of Integers. The location is the index in the sequence of the
     * beginning of the substring that was a sufficiently close match.
     *
     * @param sequence - the sequence to search for matches
     * @param pattern - the pattern to be used in the distance comparison
     * @param maxDistance - the largest distance that should be accepted
     *        as a match
     * @return a List of Integers containing locations of the matches in this
     *         sequence
     */
    public List searchOneSequenceByEditDistance(
            String sequence, String pattern, int maxDistance);

    // Methods that compare strings and return a "distance" that measures
    // how dissimilar the strings are.

    /**
     * Computes the Hamming distance between two strings.
     *
     * @param a - one string to compare
     * @param b - the other string to compare
     * @return the Hamming distance between the two strings
     */
    public int hammingDistance(String a, String b);

    /**
     * Computes the edit distance between two strings.
     *
     * @param a - one string to compare
     * @param b - the other string to compare
     * @return the edit distance between the two strings
     */
    public int editDistance(String a, String b);

    // Methods that tell what your class supports.

    /**
     * Returns the "version number" of the protein translation procedure.
     * This should be 1 if the class provides the original translation
     * procedure, or 2 if it uses the updated procedure.
     *
     * @return version number of the protein translation procedure
     */
    public int translationVersion();
}
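
// The distance and single-sequence search methods documented above can be
// sketched as follows. This is only an illustrative sketch, not the required
// implementation. The class name SequenceDistanceSketch is hypothetical and
// deliberately does not implement SequenceSearch. It assumes that "edit
// distance" means Levenshtein distance with unit costs, and that Hamming
// distance is only defined for strings of equal length; the interface
// itself specifies neither point.

```java
// Hypothetical helper class; not part of the SequenceSearch interface.
final class SequenceDistanceSketch {

    private SequenceDistanceSketch() { }

    // Counts the positions at which two strings differ.
    // Assumption: strings of unequal length are rejected, since the
    // interface does not say how they should be handled.
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("strings must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Assumption: edit distance is Levenshtein distance, where each
    // insertion, deletion, or substitution costs 1. Uses two rows of the
    // dynamic-programming table instead of the full table.
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // Slides a window of the pattern's length along the sequence and records
    // the start index of every window within maxDistance of the pattern.
    static java.util.List<Integer> searchOneSequenceByHammingDistance(
            String sequence, String pattern, int maxDistance) {
        java.util.List<Integer> locations = new java.util.ArrayList<Integer>();
        int window = pattern.length();
        for (int start = 0; start + window <= sequence.length(); start++) {
            String candidate = sequence.substring(start, start + window);
            if (hammingDistance(candidate, pattern) <= maxDistance) {
                locations.add(start);
            }
        }
        return locations;
    }
}
```

// Under these assumptions, searching "ACGTACGT" for "ACGA" with a maximum
// distance of 1 reports locations 0 and 4. An edit-distance search would
// also need to consider substrings whose length differs from the pattern's,
// which this sketch does not attempt.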