4/30/99: The primary goals for this project
have not changed since 4/21/99. This is a good thing, as it means our vision
of the project has not changed significantly. For reference, a
repetition of our previous goals follows.
Two main features are vitally important. These are:
1. Efficient memory management.
Currently the test cases are stored in files of upwards of 200MB.
It is the client's wish that this be reduced so that larger test cases can be run,
and so that the search space is not as large.
Thus any significant reduction in storage size (by at least some constant factor)
is desired. Because of the sheer size of the data, there may be limitations as to
what platforms and what software are available to use.
2. No loss of information. There must be no loss of any information that
is currently kept or planned to be used in the immediate future.
Any information that is currently kept by the inefficient dictionary system
must be able to be either directly accessed or calculated from our storage space.
Secondary Goals
4/30/99: Some of the secondary goals have
changed slightly, and some have been dropped entirely. Timestamping is
probably going to be a higher priority, as it will give our tables
a unique key for the values, making life significantly easier.
Using the database for queries, however, has been largely dropped,
as the client has concerns that too many things would have to be
rewritten on his end to accommodate this feature. The platform issues
and user interface have, for the moment, not changed, though our
choice of MySQL as the database management system
lends itself to easily portable code. Thus, here are our revised secondary goals:
1. Timestamping. Currently there is no way
to test any temporal relationships. With very efficient memory
management this becomes a feasible goal. Based on our schema ideas,
it is unclear whether this will increase the data stored
significantly -- the expected answer is that it will not.
Some research on this is being done, but for the time being it is assumed that we will either be forced
to impose an arbitrary ordering of some kind, or take as input the timestamps that the user sends us.
2. User interface. The client has expressed virtually no interest
in a nice, easy user interface. However, most databases have some
kind of nice user interface, as this makes debugging easier,
makes the client's life easier, and provides better abstraction and
encapsulation of the data. Depending on the platform, a clean
user interface can be very easy or very hard, and thus this goal will be
platform dependent.
3. Platform. The client's main platform is Sun systems.
However, the client has expressed interest in other platforms, and does
not mind what platform eventually gets used, so long as it is a platform that
the program already runs on. It is currently the plan to use NT as our main platform.
It would be nice to make the program easily portable to other platforms.
4/30/99: This part of the project plan has also not changed for the most part,
save for the erroneous information about making new invariants.
The Database project will consist of three modules in its
current form. The first module takes the data that comes out of
the dynamic invariant program and converts it to a form that our
database can understand. The second module is the database itself; a
separate instance of the database will be created for every run of
the client's program. The final module will be the interface to the
database. It will handle returning information from the database in
the form that the client expects.
4/30/99: We have committed to using MySQL as our database management system.
Furthermore, we all have accounts on a Sun, and are setting up the database system there.
The current idea is to build and test the database on the Sun, then port it
over to wherever the user would like, provided it is somewhere on the research Suns.
For software, the group will be using MySQL, a free
SQL server/database program, as the DBMS. This program has several advantages.
One is that it is extraordinarily portable. There are versions that
run on NT, Linux, Solaris, *BSD, and others. This is a good thing,
as our client uses a Sun as their primary platform, and it is
desirable to have the database on the same platform as the program it
is designed to run with.
MySQL also has a familiar interface. There is an API for all of the platforms mentioned
above, it is primarily programmable in C/C++, and it is not hard to understand.
Thus, we can use any platform we choose for development, so long as that platform supports MySQL.
MySQL satisfies all our requirements for a database program save one --
as far as we have seen, it cannot use arrays as data structures in the tables.
However, it can create tables on the fly, it can store arbitrary-sized strings,
it can be accessed remotely with ease, and it has reasonably decent data storage size.
Finally, several people in the department have knowledge of MySQL,
including the TA Ryan Satogata. Thus, we will be able to get support for the DBMS.
There are other requirements for software on the front end of the system.
One module that must be constructed is the script that takes the text output of
the invariant program and transforms it into something our database can understand.
Currently scripting languages such as Perl and Python are the top candidates.
Preliminary work has been done with Perl, and it is the front runner.
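As a rough illustration of what this translation script might do, here is a minimal sketch in Python (the other candidate language named above). The input layout is an assumption made purely for the example: records separated by blank lines, with the program point name on the first line and one name=value pair on each following line. The real output of the invariant program may look quite different, and parse_trace is a hypothetical name.

```python
def parse_trace(text):
    """Parse a trace in the ASSUMED layout described above.

    Returns a list of (point_name, {variable: value}) tuples in the
    order the records appear in the text. All values stay strings.
    """
    records = []
    for block in text.strip().split("\n\n"):
        # Drop empty lines so that extra blank lines between records are harmless.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        point_name = lines[0]
        values = {}
        for line in lines[1:]:
            # Split on the first "=" only, so values may contain "=".
            name, _, value = line.partition("=")
            values[name.strip()] = value.strip()
        records.append((point_name, values))
    return records
```

Keeping the result as plain (name, values) pairs would leave the decision about table layout entirely to the database module.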
Hardware specifications are as follows: A Sun machine, running OS
5.0 or greater. This is because the development machine and the
target machine both have these traits.
The ER diagram is somewhat erroneous, as
the current schema does not easily port to a "good" ER diagram.
There are not many relationships that need to be captured --
for the most part, the information retrieval will be handled via
queries of one form or another that do not involve many joins, and
thus the relationships between tables are not as important.
Our schema plan is as follows: For every program point we keep
track of its name, the type of point (begin, end, loop), the
variable names of the function, the variable types of the function,
and comparable variables. For each uniquely identified program point
(by name), we create a separate table. That table's name is
dependent on the program point's name, and has entries for each
instance of a program point's calls. Because these entries vary from
function to function (one function can have any number of
parameters), we have to create a new table for every
function/program point. Each row is uniquely identified by
a timestamp, which is either assigned based on the row's relative distance
from the beginning of the data trace file or given to us.
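The table-per-program-point idea can be sketched as follows. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, it stores every value as text, it assumes a program point always reports the same variables, and the names table_name and load_trace as well as the "pp_" prefix are hypothetical. The separate table of program point metadata (type, variable types, comparable variables) is left out.

```python
import re
import sqlite3


def table_name(point_name):
    """Turn a program point name into a safe table name (hypothetical rule)."""
    return "pp_" + re.sub(r"\W", "_", point_name)


def load_trace(conn, records, timestamps=None):
    """Store each record in the table belonging to its program point.

    records: list of (point_name, {variable: value}) in trace order.
    timestamps: optional list of integers supplied by the user; when it
    is omitted, the position of the record in the trace is used instead.
    """
    for position, (point, values) in enumerate(records):
        ts = position if timestamps is None else timestamps[position]
        names = list(values)
        table = table_name(point)
        # One column per variable, plus the timestamp as the unique key.
        column_defs = ", ".join(
            ["ts INTEGER PRIMARY KEY"] + ['"%s" TEXT' % n for n in names]
        )
        conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, column_defs))
        column_list = ", ".join(["ts"] + ['"%s"' % n for n in names])
        marks = ", ".join("?" for _ in range(len(names) + 1))
        conn.execute(
            'INSERT INTO "%s" (%s) VALUES (%s)' % (table, column_list, marks),
            [ts] + [values[n] for n in names],
        )
    conn.commit()
```

Because the default timestamp is the position in the whole trace, it is unique across all tables, so the original interleaving of program points could be recovered by merging the tables on that column.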
Note that there are several concerns about this schema that have
kept us from adopting it completely. One is how it deals
with multi-dimensional arrays -- we will have to talk with Michael
about this. Another is the creation of tables and timestamps -- it
may be more efficient to store identical calls to a function
together, and increment a variable called "count". For the time
being, this is what we will do.
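To make the "count" idea concrete, here is a small sketch in Python of how identical calls might be collapsed before storage. It assumes the trace has already been turned into (point_name, {variable: value}) pairs, and collapse_identical is a hypothetical name; nothing here reflects a decided design.

```python
from collections import Counter


def collapse_identical(records):
    """Count how often each distinct (program point, values) pair occurs.

    records: list of (point_name, {variable: value}) pairs.
    Returns a Counter keyed by (point_name, sorted tuple of value pairs).
    """
    counts = Counter()
    for point, values in records:
        # Sort the items so that dictionary ordering does not affect the key.
        key = (point, tuple(sorted(values.items())))
        counts[key] += 1
    return counts
```

One point to weigh with this approach is that a collapsed entry no longer carries a separate timestamp for each call, which may conflict with the timestamping goal unless the timestamps are kept some other way.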