I installed BLAST and CLUSTALW.

I downloaded the protein and genome files for all the Bacteria.

Using an awk script, I filtered out all the proteins lacking an upstream region of at least 50 base pairs.
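The filtering step can be sketched in Python as follows. This is not the original awk script, only a minimal illustration under stated assumptions: each gene is described by a start, end, and strand; the genes are sorted by start position on a single linear replicon; and "upstream region" is taken to mean the non-coding gap between a gene and its nearest neighbour on the 5' side. The `Gene` type, the `filter_by_upstream` function, and the coordinate convention are all hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def upstream_gap(genes: List[Gene], i: int, genome_length: int) -> int:
    """Number of bases between genes[i] and its neighbour on the 5' side.

    Assumes genes are sorted by start and the replicon is treated as linear.
    """
    gene = genes[i]
    if gene.strand == "+":
        # Upstream of a forward-strand gene lies before its start.
        previous_end = genes[i - 1].end if i > 0 else 0
        return gene.start - previous_end - 1
    # Upstream of a reverse-strand gene lies after its end.
    next_start = genes[i + 1].start if i < len(genes) - 1 else genome_length + 1
    return next_start - gene.end - 1


def filter_by_upstream(genes: List[Gene], genome_length: int,
                       min_upstream: int = 50) -> List[Gene]:
    """Keep only genes with at least min_upstream bases of upstream gap."""
    return [g for i, g in enumerate(genes)
            if upstream_gap(genes, i, genome_length) >= min_upstream]
```

Overlapping neighbours give a negative gap under this definition, so such genes are dropped along with those whose gap is simply too short.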

Using the filtered proteins, I created a BLAST database to query against.

I took the filtered protein list for Streptococcus pyogenes and ran it as the query against the new BLAST database.
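For anyone repeating the database and query steps, the commands might look roughly like the ones built below. This sketch assumes the current BLAST+ command-line tools (`makeblastdb` and `blastp`); the original run may well have used the older `formatdb` and `blastall` programs instead. The file names, the E-value cutoff, and the choice of tabular output are assumptions, not details from the run described here.

```python
import shlex
from typing import List


def makeblastdb_cmd(fasta_path: str, db_name: str) -> List[str]:
    """Build the command that turns a protein FASTA file into a BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query_path: str, db_name: str, out_path: str,
               evalue: float = 1e-5) -> List[str]:
    """Build the protein-vs-protein search command with tabular output."""
    return [
        "blastp",
        "-query", query_path,
        "-db", db_name,
        "-outfmt", "6",          # tab-separated, one hit per line
        "-evalue", str(evalue),  # assumed cutoff, not from the original run
        "-out", out_path,
    ]


def as_shell(cmd: List[str]) -> str:
    """Render a command list as a shell-safe string for logging."""
    return shlex.join(cmd)
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it, provided the BLAST+ binaries are on the `PATH`.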

I processed the results using an awk script, gathering together all the proteins and upstream segments into files for later processing.
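The grouping of results can be sketched in Python like this. It is not the awk script that was actually used; it assumes the BLAST output is in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11), and it assumes that self-hits should be skipped because the query proteins are also present in the database. The function names and the E-value cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_hits(tabular_text: str, max_evalue: float = 1e-5) -> Dict[str, Set[str]]:
    """Map each query protein to the set of subject proteins it matched."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        # Skip self-hits (assumed unwanted) and weak matches.
        if query != subject and evalue <= max_evalue:
            hits[query].add(subject)
    return dict(hits)


def rank_queries(hits: Dict[str, Set[str]]) -> List[str]:
    """Order queries by number of distinct matches, largest first."""
    return sorted(hits, key=lambda q: (-len(hits[q]), q))
```

From here, each query's set of subject IDs could be used to pull the matching protein and upstream sequences into one file per query for alignment.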

I took three of the top hits (each with on the order of 500 matches) and ran CLUSTALW on them.