Project Plan and Platform 4/30/99

Primary Goals

4/30/99: The primary goals for this project have not changed since 4/21/99. This is a good thing, as it means our vision of the project has not changed significantly. For reference, our previous goals are repeated below.

Two main features are vitally important. These are:

1. Efficient memory management. Currently the test cases are stored in files of upwards of 200MB. The client wishes this to be reduced so that larger test cases can be run and so that the search space is not as large. Thus any significant reduction in storage size (by at least some constant factor) is desired. Because of the sheer size of the data, there may be limitations on which platforms and software are available to use.

2. No loss of information. There must be no loss of any information that is currently kept or planned to be used in the immediate future. Any information that is currently kept by the inefficient dictionary system must be able to be either directly accessed or calculated from our storage space.

Secondary Goals

4/30/99: Some of the secondary goals have changed slightly, and some have been dropped entirely. Timestamping is probably going to be a higher priority, as it will give our tables a unique key for the values, making life significantly easier. Using the database for queries, however, has been mostly dropped, as the client is concerned that too much would have to be rewritten on his end to accommodate this feature. The platform issues and user interface have, for the moment, not changed, though our choice of MySQL as the definite database management system lends itself to easily portable code. Thus, here are our revised secondary goals:

1. Timestamping. Currently there is no way to test temporal relationships. With very efficient memory management this becomes a feasible goal. Based on our schema ideas, it is unclear whether this will significantly increase the amount of data stored -- the expected answer is that it will not. Some research on this is being done, but for the time being it is assumed that we will either have to insert an arbitrary ordering of some kind or take as input the timestamps that the user sends us.

2. User interface. The client has expressed virtually no interest in a nice, easy user interface. However, most databases have some kind of user interface, as this makes debugging easier, makes the client's life easier, and better supports abstraction and encapsulation of data. Depending on the platform, a clean user interface can be very easy or very hard to build, and thus will be platform independent.

3. Platform. The client's main platform is Sun systems. However, the client has expressed interest in other platforms and does not mind which platform eventually gets used, so long as it is one that the program already runs on. The current plan is to use NT as our main platform. It would be nice to make the program at least easily portable to other platforms.
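
The two timestamping options named in goal 1 (an arbitrary ordering that we insert, or timestamps supplied by the user) can be sketched roughly as follows. This is only an illustration in Python, one of the scripting languages under consideration; the function name and argument layout are hypothetical and not part of any existing code.

```python
from typing import List, Optional, Sequence, Tuple


def assign_timestamps(
    samples: Sequence[str],
    supplied: Optional[Sequence[int]] = None,
) -> List[Tuple[int, str]]:
    """Pair every sample with a timestamp that can serve as a unique key.

    Hypothetical helper: if the user supplies timestamps we use them,
    otherwise we fall back to the sample's position in the trace, which
    gives an arbitrary but consistent ordering.
    """
    if supplied is None:
        # Option 1: arbitrary ordering based on position in the trace.
        return list(enumerate(samples))
    if len(supplied) != len(samples):
        raise ValueError("need exactly one timestamp per sample")
    if len(set(supplied)) != len(supplied):
        # A timestamp is only useful as a key if it is unique.
        raise ValueError("supplied timestamps must be unique")
    # Option 2: timestamps sent to us by the user.
    return list(zip(supplied, samples))
```

Either way, each sample ends up with exactly one timestamp, which is what lets it act as the key for a row.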


Finished Project Overview

4/30/99: This part of the project plan has also not changed for the most part, save the erroneous information about making new invariants.

The Database project, in its current form, will consist of three modules. The first module takes the data that comes out of the dynamic invariant program and converts it to a form that our database can understand. The second module is the database itself; a separate instance of the database will be created for every run of the client's program. The final module will be the interface to the database. It will handle returning information from the database in the form that the client expects.


Platform Specifications

4/30/99: We have committed to using MySQL as our database management system. Furthermore, we all have an account on a Sun, and are setting up the database system there. The current idea is to build and test the database on the Sun, then port it over to wherever the user would like us to, provided it is somewhere on the research Suns.

For software, the group will be using MySQL, a free SQL server/database program, as the DBMS. This program has several advantages. One is that it is extraordinarily portable. There are versions that run on NT, Linux, Solaris, *BSD, and others. This is a good thing, as our client uses a Sun as their primary platform, and it is desirable to have the database on the same platform as the program it is designed to run with.

MySQL also has a familiar interface. There is an API for all of the platforms mentioned above; it is primarily programmable in C/C++ and is not hard to understand. Thus, we can use any platform we choose for development, so long as that platform supports MySQL.

MySQL satisfies all our requirements for a database program save one -- as far as we have seen, it cannot use arrays as data structures in the tables. However, it can create tables on the fly, can store arbitrary-sized strings, can be accessed remotely with ease, and has reasonably decent data storage size.
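
One possible workaround for the missing array type, given that arbitrary-sized strings can be stored, is to encode each array as a string on the way in and decode it on the way out. The sketch below is an assumption on our part, not a decided design; the function names are hypothetical, and the JSON encoding is just one convenient choice that also happens to handle nested arrays.

```python
import json
from typing import Any, List


def encode_array(values: List[Any]) -> str:
    """Encode a (possibly nested) list as a compact string for a text column."""
    # separators=(",", ":") drops the optional whitespace to save space.
    return json.dumps(values, separators=(",", ":"))


def decode_array(text: str) -> List[Any]:
    """Recover the original list from its stored string form."""
    return json.loads(text)
```

Because decoding returns exactly what was encoded, this approach is compatible with the "no loss of information" goal, at the cost of the database not being able to look inside the array.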

Finally, several people in the department have knowledge of MySQL, including the TA Ryan Satogata. Thus, we will be able to get support for the DBMS.

There are other software requirements on the front end of the system. One module that must be constructed is the script that takes the text output of the invariant program and transforms it into something our database can understand. Currently, scripting languages such as Perl and Python are the top candidates. Preliminary work has been done with Perl, and it is the front runner.
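
To give a feel for what such a script does, here is a minimal sketch in Python. The input format is purely hypothetical (a program point name on one line, then `name=value` lines, with a blank line ending each record); the real output of the invariant program may look quite different, and the function name is ours.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_trace(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (program_point_name, {variable: value}) for each record.

    Assumes the hypothetical record layout described above.
    """
    point = None
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current record, if there is one.
            if point is not None:
                yield point, values
            point, values = None, {}
        elif point is None:
            # The first non-blank line of a record names the program point.
            point = line
        else:
            name, _, value = line.partition("=")
            values[name] = value
    if point is not None:
        # The final record may not be followed by a blank line.
        yield point, values
```

Each yielded pair corresponds to one call of a program point and could then be turned into one row of the database.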

Hardware specifications are as follows: A Sun machine, running OS 5.0 or greater. This is because the development machine and the target machine both have these traits.


ER Diagram/Schema

The ER diagram is somewhat erroneous, as the current schema does not easily port to a "good" ER diagram. There are not many relationships that need to be captured -- for the most part, information retrieval will be handled via queries of one form or another that do not involve many joins, and thus the relationships between tables are not as important.

Our schema plan is as follows: For every program point we keep track of its name, the type of point (begin, end, loop), the variable names of the function, the variable types of the function, and comparable variables. For each uniquely identified program point (by name), we create a separate table. That table's name depends on the program point's name, and the table has an entry for each instance of the program point being called. Because these entries vary from function to function (one function can have any number of parameters), we have to create a new table for every function/program point. Each of these rows is uniquely identified by a timestamp that is either assigned based on its relative distance from the beginning of the data trace file, or given to us.
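
As a rough sketch of the table-per-program-point idea, the Python below builds a CREATE TABLE statement from a program point's name and its variables. Everything specific here is an assumption made for illustration: the `pp_` and `v_` prefixes, the `ts` column name, the type mapping, and the example program point name are all hypothetical.

```python
import re
from typing import Dict

# Hypothetical mapping from traced variable types to SQL column types.
SQL_TYPES = {"int": "INT", "double": "DOUBLE", "string": "TEXT"}


def sql_identifier(name: str) -> str:
    """Replace characters that are not letters, digits or '_' with '_'.

    Note that two different names can collapse to the same identifier,
    so a real implementation would need to check for collisions.
    """
    return re.sub(r"\W", "_", name)


def create_table_sql(point_name: str, variables: Dict[str, str]) -> str:
    """Build one CREATE TABLE statement for one program point."""
    # The timestamp is the unique key for each row.
    columns = ["ts BIGINT NOT NULL PRIMARY KEY"]
    for var_name, var_type in variables.items():
        columns.append("v_" + sql_identifier(var_name) + " " + SQL_TYPES[var_type])
    table = "pp_" + sql_identifier(point_name)
    return "CREATE TABLE " + table + " (" + ", ".join(columns) + ")"
```

For example, `create_table_sql("push:::BEGIN", {"x": "int", "size": "int"})` produces `CREATE TABLE pp_push___BEGIN (ts BIGINT NOT NULL PRIMARY KEY, v_x INT, v_size INT)`. Since the column list is derived from the function's own variables, a function with any number of parameters gets a table of matching width.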

Note that there are several concerns about this schema that have not fully convinced us to adopt it completely. One is how it deals with multi-dimensional arrays -- we will have to talk with Michael about this. Another is the creation of tables and timestamps -- it may be more efficient to store identically called functions together, and increase some variable called "count". For the time being, this is what we will do.
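
The "count" alternative mentioned above can be sketched as follows: identical calls are collapsed into a single entry, and a counter records how many times each occurred. The function name and the dictionary representation of a call are hypothetical, and this is only meant to illustrate the trade-off.

```python
from collections import Counter
from typing import Dict, List, Tuple


def collapse_identical(
    samples: List[Dict[str, str]],
) -> List[Tuple[Dict[str, str], int]]:
    """Group identical calls together, returning (values, count) pairs."""
    # Dictionaries are not hashable, so each call is turned into a sorted
    # tuple of (variable, value) pairs before counting.
    counts = Counter(tuple(sorted(sample.items())) for sample in samples)
    return [(dict(key), count) for key, count in counts.items()]
```

With this layout, three calls of which two are identical take two rows instead of three. A count by itself does not record when each call happened, so this saving trades off against the timestamping goal.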
