CSE143 Winter 2005 Project 3
Deep Aqua, optimal Tic-Tac-Toe player
Part A due: Saturday, March 5, at 9:00 pm. Assignments may be turned
in up to 24 hours late without penalty; however, you should try to finish
part A by the original deadline.
Part B due: Friday, March 11 at 11 am. No late assignments
will
be accepted.
Goals
The primary goal of this project is to give you experience using
recursion, and more generally, algorithms -- systematic procedures for
accomplishing a task. You'll also get some more practice in
splitting up a problem into convenient classes, and a bit of
reinforcement on putting together a simple GUI for user interaction.
Background
The bread-and-butter of classical artificial intelligence is search -- evaluate all the options
for what to do and pick the best. Often, a task to be
accomplished has multiple steps, and multiple options at each
step. In that case, the search procedure starts at the beginning,
and considers each option for the first step, to evaluate the benefit
of taking that step in helping complete the task. But to
evaluate a given first step, the search has to examine all the options
for the second step, that follow from that first step, and see how well
they would do. And
to evaluate an option for the second step, it has to check out all
possible third steps that follow from that second step. And so
on, until at some point, a sequence of steps reaches a point where
the task is either accomplished or failed -- or, more generally, some
final state is reached, and that state has some "value" as far as
accomplishing the task goes. After evaluating all possible paths,
there will be one (or several) paths with the highest value -- any of
those will do.
As you might imagine, examining all possible sequences of steps can
take quite a bit of time. In fact, for all but the simplest
tasks, there just isn't enough time to check out all
possibilities. Just think about how many cases we'd need to
check: Say that, for each step, there are k options, and say that
we expect to find a final state after n steps. How many sequences
are there? Well, there are k choices for the first step, and for
each of those there are k choices for the second, so that's k^2 so
far. For each of those k^2 pairs of first-and-second
steps, there are k third steps -- now we're up to k^3.
For all n steps, we'll need to try k^n sequences. That
is, the complexity of this sort of search is O(k^n) -- this
is exponential in the expected number of steps, which can very rapidly
turn into "not before the sun burns out".
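To make that growth concrete, here's a small sketch (our own illustration, not part of the assignment) that counts the k^n sequences directly:

```java
// Quick illustration of how k options per step over n steps
// multiply into k^n total move sequences to examine.
public class SearchCost {
    // Number of distinct move sequences: k choices at each of n steps.
    static long sequences(long k, int n) {
        long total = 1;
        for (int i = 0; i < n; i++) {
            total *= k;   // each additional step multiplies the count by k
        }
        return total;
    }

    public static void main(String[] args) {
        // Even modest numbers blow up fast:
        System.out.println(sequences(3, 5));    // 243
        System.out.println(sequences(30, 10));  // 590490000000000 -- chess-like territory
    }
}
```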
Nevertheless, we're going to try it on a very small problem -- playing
tic-tac-toe. In playing a game like this, the "task" is to make
moves until we get to the end of the game, and the value is our final
score. For tic-tac-toe, we can win, draw, or lose -- win has the
highest value, then draw, then lose. (By tradition, we give a
positive score (e.g. 1) to a win, nothing (0) for a draw, and a
negative score (e.g. -1) for a loss.) So when we assign a value
to a sequence of moves, what matters is what's at the end -- the value
of the whole sequence of moves is our score at the end.
If we're playing tic-tac-toe (or any other two-player game), we control
only our own moves. So if we're going to search for the best
move, we can imagine picking each of our
moves in turn, but the usefulness of those moves depends on what our
opponent does, so if we don't know what they're going to pick, how can
we possibly evaluate our own moves? Well, we can imagine what
we'd do, if we were the opponent -- we can make their moves for them,
in the hypothetical games we're evaluating. So, do we want them
to make random moves? Or be an average player? Or should we
assume they're going to mess up? No! Selecting a move for
our opponent that is less than the best they could do is wishful
thinking on our part, and is risky, because the real opponent won't
want to give us any breaks -- they're going to do the best they can,
and we have no reason to expect they won't choose their best
move. So, when we play for them, we should do just that -- choose
their best move against us.
How do we figure out what would be our opponent's best move? The
same way we figure out ours -- search. And if we're careful to be
somewhat general, we can use our same search procedure, or something
very like it, for our moves and theirs.
What's different when we're playing for our own player, versus for the
opponent? It's in what value we assign to moves. It's
easiest to think of the value of moves in terms of our own player's
score in both cases. When we make a move, we're trying to make
our score as large as possible. When the opponent makes a move,
they're trying to make our score as small as possible. So when we
alternately make our moves and theirs, we're alternately trying to
maximize, then minimize, our score. This scheme of assuming both
we and the opponent make the best moves we can, and that our best is
what maximizes our score, and their best is what minimizes our score,
is called minimax
search. It's the optimal choice for any two-player "zero-sum"
game (a game where if one wins, the other loses), and is at the heart
of even the most sophisticated game-playing computer programs.
Not administrivia
Unlike previous projects, there is really not a separate first part
that can be done without taking into account what will come later. The
design decisions you make depend on knowing what you're ultimately
going to be using your code for. So although we'll split the
requirements into two sections, you
should read the entire project
description up front, and keep in mind, as you're doing the first part,
what you'll need for the second. Also, the first part is
short -- you
should go straight on to the second when you're done, and not wait for
the part A due date.
Administrivia
For this project you may work with a partner of your own
choosing, possibly from a different section, or you may work alone. If
you don't choose either of these options, you will be assigned a new
partner from your section, as in past projects. If you work with a
partner, you and your partner should turn in a single set of files.
After the final part of the project is done, you will individually
produce a written report. We strongly suggest that you work with a
partner, as it will be very helpful to have someone to talk to about
detailed design and implementation decisions.
Grading for part A: This is a progress check. We will try to
give you quick feedback on the scale of 0-3:
3=no major problems, 2=something needs to be fixed, 1=serious trouble,
and 0=no credible effort. As always, besides the stated
requirements, be sure to include appropriate JavaDoc and
other comments in your code, use meaningful names, indent sensibly,
provide toString() methods where appropriate, and so forth.
Grading for part B: This
is
the final code turnin for
the project. If you work with a partner, you and your partner
will
receive the same scores on the programming parts of the project.
Programming projects will be evaluated both for how well the code works
and how well it is written, each on a scale of 0 to 4 (see the project
assessment guidelines on the main projects page for details).
Overview
In this project, you'll be implementing a tic-tac-toe playing program
using minimax search, and making it possible for a human to play
against your automated player. You'll start by allowing two
humans to play each other, with a simple GUI, then adding the option to
play against the computer (and perhaps even have two automated players
play each other).
For purposes of the progress check only, we're dividing this into part
A, where you'll get the human-versus-human version running, with a
mouse-operated GUI, and part B, where you'll add the automated
player. But as mentioned above, you should keep in mind at all
times what you'll need to do for part B. We'll include some notes
on what to watch out for as you start on part A. In the Details
section, we'll note whether each topic is included in part A or part B,
and will summarize the requirements for each at the end.
Just in case anyone is unsure of the rules of tic-tac-toe, here is a
brief description (which is all that's necessary...). Tic-tac-toe is
played on a board with 9 squares, arranged in 3 rows and 3 columns.
Two players alternately mark an unmarked square with their symbol -- an
X for the player who starts first, and O for the player who starts
second. A player's goal is to get three of their own symbols in a row,
either horizontally, vertically, or diagonally. Play stops either when
one player wins, or when all squares have been filled without a win --
in that case, the game is a draw.
Details
Dividing up the problem (part A)
This is an opportunity for you to get some practice in splitting up an
application into classes. Think of what objects there are in a
tic-tac-toe game, what tasks each has to perform, and what information
you need to have for each.
There isn't One Best Way to do this, and, in fact, you may find that
what's convenient with two human players isn't so convenient when one
player is part of your code. For instance...
- Human players using a GUI will just make their moves at
will, but an automated player (or a human player using a command line
(text) interface), will need to be asked for a move. So when you
have an automated player, what will do the asking? Two humans
will "run" the game themselves, by taking turns, but what will control
the game if one or both players don't take the initiative, but instead
need to be asked? It may be helpful to think about what you'd do
with two humans if you were doing just a command line interface.
- Two human players are the same type of thing -- it would be
easy to just build them directly into the game. In fact, because
you know whether it's X or O that plays next, you don't even need to
know which human clicked the mouse in your GUI! An automated
player is rather different, but it would be nice to make the two types
"plug-replaceable", i.e. not to have to change anything else, but just
"plug in" different types of players.
- When you implement minimax, you'll need to represent many
different possible moves, without losing track of what the current
state of the game actually is. With two human players making
their own decisions, you don't need any recursive search, so all you
need to know is the current state of the game.
- The MVC architecture is your friend. It tells you to
keep the model (e.g. the game state) separate from the GUI. When
you're doing a minimax search, you don't want to drag an entire GUI
along with each possible move's game state!
Representing the state of the tic-tac-toe game (part A)
To support playing the game, you'll need a way to...
- ...represent the "state" of the game -- what Xs and Os are
on the board, and whose turn is next.
- ...tell if the game is over, and what the outcome is.
- ...get moves from the players.
- ...decide if a player's proposed move is legal. (It's
not if there's a symbol already there.)
We suggest that you make it possible to use each of these features
without a GUI, so you can test by executing single method calls from
DrJava or from JUnit tests. JUnit tests are not required for this project, but
you may find them helpful as you debug your code.
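As a starting point, here is one minimal sketch of such a state class. The representation (a 9-element array, read left to right, top to bottom) and all the names (Board, move, winner, and so on) are our own illustration; your design may differ:

```java
// One possible (hypothetical) representation of a tic-tac-toe state.
public class Board {
    public static final char EMPTY = ' ', X = 'X', O = 'O';
    private final char[] cells = new char[9];  // squares 0-8, row by row
    private char next = X;                     // X always moves first

    public Board() { java.util.Arrays.fill(cells, EMPTY); }

    // A proposed move is legal only if the square is still empty.
    public boolean isLegal(int square) { return cells[square] == EMPTY; }

    public void move(int square) {
        if (!isLegal(square)) throw new IllegalArgumentException("square occupied");
        cells[square] = next;
        next = (next == X) ? O : X;            // other player's turn
    }

    // Returns X or O if that player has three in a row, EMPTY otherwise.
    public char winner() {
        int[][] lines = {{0,1,2},{3,4,5},{6,7,8},   // rows
                         {0,3,6},{1,4,7},{2,5,8},   // columns
                         {0,4,8},{2,4,6}};          // diagonals
        for (int[] line : lines) {
            char c = cells[line[0]];
            if (c != EMPTY && c == cells[line[1]] && c == cells[line[2]]) return c;
        }
        return EMPTY;
    }

    // The game is over on a win, or when the board fills up (a draw).
    public boolean isOver() {
        if (winner() != EMPTY) return true;
        for (char c : cells) if (c == EMPTY) return false;
        return true;
    }
}
```

Because none of this touches Swing, every method can be exercised from DrJava or a JUnit test with single calls, as suggested above.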
A simple GUI (part A)
Put together a simple
GUI that lets players select moves with the mouse. Because time
is short, and we don't want you bogged down in learning new Swing
classes, we'll give more explicit recommendations for this GUI.
You'll need three things:
- ...the board itself -- a 3 x 3 grid of JButtons will work
nicely.
- ...a button to start or reset the game.
- ...a place for a message, to tell who should move next, or
to tell when the game is over and what the outcome is.
You'll want to use a JFrame as the top level window as usual, with a
JPanel to display the game board and another one beneath it for the control
button and message.
Recall that the default layout manager for a JPanel displays the components
that are added to it from left to right. For the game board, we'd like
to specify instead that the buttons that make up the game cells be laid
out in a 3 x 3 grid. A different layout manager, GridLayout, is perfect
for this. So when you create the JPanel to display the game, instead
of using the no-argument JPanel constructor, use this sequence to specify
that you want to use GridLayout as the layout manager, and it should
have 3 rows and 3 columns:
JPanel grid = new JPanel(new GridLayout(3, 3));
Then create 9 buttons and add them to the grid -- they should fill in
left to right, then top to bottom. In order to keep your buttons from
shrinking to nothing when you "pack" your JFrame, you should set their
preferred size to something reasonable.
For instance:
JButton button = new JButton();
button.setPreferredSize(new Dimension(80, 80));
grid.add(button);
Then add this panel to the JFrame using its getContentPane().add() method.
For your start button and a status message area, you can put another
JPanel in (say) the south part of your content pane. A convenient
component for a text message is a JLabel -- it's like a JButton in that
it can display a line of text, or an icon, but it does nothing in
response to mouse clicks. JLabel has a setText method that you
can use to change what message it displays.
If you don't give JPanel a specific layout manager as above, it has a
FlowLayout -- this arranges components mainly left to right, top to
bottom, but doesn't constrain them to be the same size, and it will
"flow" them onto multiple rows as needed, depending on their size and
the width of the window. If you add a JButton and a JLabel, they'll
likely end up side by side, which is what you want.
Running a human versus human game (part A)
You've seen how to get button press events (look back at
BallSimControl.java). Since you know whether X or O plays next,
you don't need anything but a mouse click on a button to tell both what
square the move is in and which symbol is needed. You can display
the symbol right on the button -- you can change the symbol with
setText. (If a player clicks a button that's already been used
for a move, you can just ignore it --
don't worry about telling the player they're being silly.)
Make sure you can stop the game when someone wins, or when it's a draw
(e.g. stop responding to attempts to move).
Display a message to tell who should move next, or, when the game is
over, what the outcome was.
Implementing an automated player -- minimax search (part B)
When your automated player is called on to choose a move, it may be
starting from some partly-filled board. From there, it will need
to check each allowed move. What it needs for each possible move
is the expected score if it chooses that move. Then it can choose
a move with the best score. In this case, where the player is
making a move for itself, the "best" score is the highest. By
tradition, since it's picking the highest score, when the player tries
out a move for itself, it's called "Max".
The expected score from making a move is also called the "value" of the
move. (We sometimes also say this is the value of the game state
that results from making that move.)
How do we get the value of one of these moves? Well, if the
move ends the game, then its value is the final score: +1 for a win, 0
for a draw, -1 for a loss.
If the move didn't end the game, then it's time to make a move
pretending we're the opponent. So we do almost exactly the same
thing: Starting from this new game state (including the
hypothetical move the player is trying out), we go through all allowed
moves for the opponent, and get their values. Again, these are
our player's expected scores. But now, the opponent does their
worst to us, and chooses the smallest, and returns that as the value of
their move. Because it chooses the minimum score, the opponent is
traditionally called "Min".
We just said, when the opponent tries out their moves, it has to
evaluate them. How we do that is suspiciously similar to what we
did before. If the move ends the game, we return the final
score. If not, we go on to...try out moves for our player again.
So now we've come full circle. Surely we don't want to replicate
all the player's search code right here, and then replicate the
opponent's code, and then the player's code, and then... This is
where recursion comes in -- we already have (or soon will have) a method
that tries out our player's moves and returns the best value. So, we
just call that. And inside that call, it may need to try out
opponent moves again, but we've got that covered, because it's just
using the same opponent code as before.
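Putting the pieces together, here is one compact sketch of the full minimax recursion for tic-tac-toe. The representation (a raw char[9] board, read left to right, top to bottom) and the names are our own illustration, not a required design:

```java
// A sketch of minimax for tic-tac-toe. value() returns the score from
// X's point of view: +1 for an X win, 0 for a draw, -1 for an O win.
public class Minimax {
    static final int[][] LINES = {{0,1,2},{3,4,5},{6,7,8},   // rows
                                  {0,3,6},{1,4,7},{2,5,8},   // columns
                                  {0,4,8},{2,4,6}};          // diagonals

    static char winner(char[] b) {
        for (int[] ln : LINES) {
            char c = b[ln[0]];
            if (c != ' ' && c == b[ln[1]] && c == b[ln[2]]) return c;
        }
        return ' ';
    }

    static boolean full(char[] b) {
        for (char c : b) if (c == ' ') return false;
        return true;
    }

    // Value of the position for X, with 'player' ('X' or 'O') to move.
    static int value(char[] b, char player) {
        char w = winner(b);
        if (w == 'X') return 1;        // end states stop the recursion --
        if (w == 'O') return -1;       // without them, it never terminates
        if (full(b)) return 0;
        // Max (X) wants the largest value; Min (O) wants the smallest.
        int best = (player == 'X') ? -2 : 2;
        for (int sq = 0; sq < 9; sq++) {
            if (b[sq] != ' ') continue;
            b[sq] = player;                               // try the move...
            int v = value(b, player == 'X' ? 'O' : 'X');  // ...recurse for the other side...
            b[sq] = ' ';                                  // ...then undo it
            if (player == 'X') best = Math.max(best, v);
            else               best = Math.min(best, v);
        }
        return best;
    }
}
```

Note that all the working data (sq, v, best) lives in local variables, so every recursive call gets its own copy -- the reentrancy point discussed below.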
There are two major gotchas when writing a method that calls itself (or
calls something that calls it back, or any series of calls that can
come around and call the same thing again).
First is, we have to stop some time. If you think your program is
really slow, it may not be slow, but in an infinite recursion.
Unlike an infinite loop, an infinite recursion can't go on
indefinitely. This is because on each method call, Java needs
some memory for parameter values and other local variables of the
method it's calling, and some place to remember where to go back to
when the method is done. It can't get rid of this memory until
the method returns. If the method keeps calling itself, Java will
have to keep allocating more of these "call frames". Eventually
it will run out of memory, and you'll get a
StackOverflowError. So be sure you correctly detect end states!
Second, our recursive methods must be "reentrant". This means it
must be possible for any number of callers to be using the same code
without interfering with each other. If the method relies on
storing information in instance variables, or other "global" variables,
then when a recursive call comes along and changes those itself, the
original call's information will be gone. So recursive methods
must store any information they need only in local variables, as each
call gets its own copy of these.
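Here's a toy illustration (entirely made up, nothing to do with tic-tac-toe) of the reentrancy pitfall: the broken version stashes its data in an instance variable, which the recursive calls overwrite behind its back.

```java
public class Reentrancy {
    private int n;  // shared instance variable: every call sees the same copy

    // Broken: stores its argument in the field, recurses, then reads the
    // field back -- but the recursive call has already overwritten it.
    int sumBroken(int arg) {
        n = arg;
        if (n == 0) return 0;
        int rest = sumBroken(n - 1);  // this call sets n for itself!
        return n + rest;              // n is now 0, not arg
    }

    // Correct: each call keeps its value in a parameter (a local),
    // so the recursive calls can't interfere with each other.
    int sum(int arg) {
        if (arg == 0) return 0;
        return arg + sum(arg - 1);
    }
}
```

sum(3) correctly returns 6, while sumBroken(3) returns 0, because by the time each call adds in "its" n, the innermost call has already set n to 0.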
Running a human versus machine game (Part B)
Now that you have an automated player, you need to let it take the
place of one of the human players. Unlike the human versus human
case, the players won't take turns on their own -- you'll have to
alternately get a move from the human and the automated player.
Be careful not to let the human sneak extra moves in before the
automated player has had a chance to take their turn!
Summary of what to do for part A
- Come up with a way to represent the state of a tic-tac-toe
game.
- Let two human players play against each other, using a
simple GUI.
Summary of what to do for part B
- Make an automated player, using minimax search.
- Allow a human to play the automated player through the GUI.
Hints
Understanding minimax -- It's good to try
out a minimax search by hand, drawing all the possible moves that
branch out from each game state. But this gets big very
fast. What you might do instead is try a 2 x 2 version of
tic-tac-toe.
Multiple game states -- When you need to try out
lots of moves starting from some state, you could make an object
representing each new move's state from scratch, specifying what's in
each square. An alternative is to take the starting state and a
move, and make a new state based on those.
Different types of players -- This is an opportunity to try
making your own interface or (abstract) superclass to represent a
generic player. Because the human player operating a GUI and the
automated player that has to be asked for moves are rather different,
giving them a common interface may involve some compromises.
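As one sketch of what such a common interface might look like (the names Player, symbol, and chooseMove are our own invention, and the char[9] board representation is just an assumption):

```java
// A possible "plug-replaceable" player interface: the game loop calls
// chooseMove() whenever it's this player's turn.
interface Player {
    char symbol();                   // 'X' or 'O'
    int chooseMove(char[] board);    // index 0-8 of the chosen square
}

// A trivial automated player that just takes the first open square --
// a useful stand-in while the minimax player is still being written.
class FirstOpenPlayer implements Player {
    private final char sym;
    FirstOpenPlayer(char sym) { this.sym = sym; }
    public char symbol() { return sym; }
    public int chooseMove(char[] board) {
        for (int i = 0; i < board.length; i++)
            if (board[i] == ' ') return i;
        return -1;  // no move available (board is full)
    }
}
```

A minimax player and a GUI-backed human player could then both implement Player, and the game loop needn't know which is which.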
Write the code first, remove redundancy later -- When you're
implementing minimax, you may find you have two very similar sets of
code, one for "Min" moves and one for "Max" moves. If you can
find a way to merge them right away, that's great, but don't get hung
up on this. Another way to do it is to go ahead and get it
working with two almost identical sets of code, then afterward look for
tricks to combine the two.
Here's
another case where you might want to tolerate some redundancy to begin
with. You've probably noticed that sometimes before you start
into a loop, you have to "prime the pump" by doing something for the
first time outside the loop, that you repeat inside the loop. The
same sort of thing happens with recursion, only there, the pump-priming
usually includes the call to start the recursion, and inside the
recursive method, the parallel code includes the recursive call.
Later, if you want to try to get rid of the duplication, one trick
is to make an intermediate method, and call it both from where the
recursion is started and from inside the recursive method. The
intermediate method then calls the original recursive method.
That is, you have a method a that calls method b, then b calls a.
This is a perfectly valid form of recursion. (Note you may not
need this trick at all, so don't worry if it doesn't match how you set
up the recursion.)
Extra credit
We strongly recommend that you get your basic requirements running
before trying for extra credit.
Optimize your search -- There are many opportunities for
making your search faster. Some of these have to do with not
re-doing work you've already done, others with realizing you're on a
hopeless path. For instance:
- It's possible to reach the same game state in tic-tac-toe
by different sequences of moves. So any state you reach, you might have already evaluated.
To avoid evaluating it again, you could save the value. You'd
want a way to store it so you could find it given the state. By
now, you can probably guess what Java library class is appropriate for
this... But there's a problem. If you're making new game
state objects on each move, you won't have the exact same object when
you want to look up a previous state. So you'll need to change
how equals() works, to compare the contents of the state, not the
reference.
- An additional wrinkle is that some states are mirror images
of each other, or rotated versions of each other. Does this make
any difference to the eventual score? So you could extend your
equality check to allow these equivalent states as well.
- At any point when you're trying out a set of moves, you
have a current best that you have to beat. If you hit some end
state that is going to spoil that entire "branch" of the recursion, so
that there's no way you can beat that previous best, you can quit
evaluating that branch.
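The first bullet's idea -- caching state values in a map, keyed by a state class that compares by contents -- might be sketched like this (the names and the char[9] representation are illustrative assumptions):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// A key class for caching ("memoizing") state values in a HashMap.
// Because lookups use *new* objects, not the original ones, the key
// must compare by contents: override equals() and hashCode() together.
public class StateKey {
    private final char[] cells;

    public StateKey(char[] cells) {
        this.cells = cells.clone();  // defensive copy: keys must not change
    }

    @Override public boolean equals(Object other) {
        return other instanceof StateKey
            && Arrays.equals(cells, ((StateKey) other).cells);
    }

    @Override public int hashCode() {
        return Arrays.hashCode(cells);  // must be consistent with equals()
    }

    public static void main(String[] args) {
        Map<StateKey, Integer> cache = new HashMap<>();
        cache.put(new StateKey("X O      ".toCharArray()), 1);
        // A *different* object with the same contents finds the cached value:
        System.out.println(cache.get(new StateKey("X O      ".toCharArray())));  // 1
    }
}
```

Extending equals() to treat mirrored or rotated boards as equal, per the second bullet, would shrink the cache further; remember that hashCode() must then return the same value for all boards your equals() considers equal.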
Turn your application into an applet -- An applet is a Java
program that can be loaded from a web server and run in a
browser. If you convert your GUI into an applet, and package all
your code up appropriately, you can add your tic-tac-toe game to your
own web page. Look at these Java tutorial sections on writing
applets, and on using
Swing components in applets.
Show what the automated player is doing -- E.g. when the
automated player is choosing, pop up a window or highlight the board
showing each move it's trying. Give it a "next" button to step through
the moves slowly.
Make a general two-player game driver -- Can you come up
with an abstract representation of a game -- perhaps a game interface
-- so that you can reuse your minimax search for other games?
For more information (optional, but if you're interested in this stuff...)
Artificial Intelligence: A
Modern Approach, by Russell and Norvig, has a chapter on search
for game playing. This is ch. 6 in the 2nd edition, ch. 5 in the
1st edition.
Even far more complex game-playing programs, like Deep Blue, which beat
the best human chess player, Garry Kasparov, in 1997, use
minimax. But they don't run games all the way out to a win or
loss! (Think about how many options there are for each next move
-- the "branching factor" k we mentioned above when talking about
complexity -- and how many moves there might be to the end of a chess
game -- that was n above. So how big is k^n for
chess? Riiiight... And anyway, moves in championship chess
have time limits. So chess-playing programs stop at some depth of
future moves, and they estimate the "future value" of the board
positions they do reach -- this substitutes for the actual score at the
end. The competence of a chess program depends not only on how
many moves it can check in the time available, but also how well it
estimates the value of a board position. These "evaluation
functions" are often carefully-guarded secrets.
If you'd like to read about the software (and the specialized computer
hardware) that were required to beat Garry Kasparov, a description of
the events and technology is recorded here. There's
an article by the UW's own William Calvin about how
humans play chess. Or go to the ACM
(Association for Computing Machinery) Library web site, and search
for "deep blue" -- there are several apropos articles. (If you
can't access articles here from outside the UW, try it from a UW
machine -- we may have a site license for their content.)
Why did the "background" section say "classical AI" depends on
search? That implies it's history -- what's wrong with it?
And if that's "classical", what's "modern"? Search takes too long
for most realistic problems. We've got more effective methods
now, that don't require exponential runtimes. They also deal with
the fact that the real world is imprecise, and often what we want it an
approximation, or a good-enough solution. These new methods go by
names like "machine learning" and "data mining". Instead of
search, they depend on statistics...
Explaining how this works is beyond the scope of this writeup. ;-)
What to Turn In
Use the appropriate turnin form for part A or part B to
turn in the Java source files that make up your project, including any
JUnit tests. If your project uses any other files, such as images, turn
those in also. If you have many files, including things like images,
you can bundle them into an archive file (zip, jar, or tar) and turn
that in. Multiple turnins are fine and highly recommended if you are
planning to add extra credit features. Once you've got something that
meets the
basic requirements, turn that in. Then if you add to your project, turn
in the extended version(s) later -- we'll grade the last one you turn in.