ScalabilityTest

This is a rough rendering of a page from the old Prevayler wiki. Please see the new wiki for current documentation.

The scalability test's goal is to compare Prevayler's scalability to that of DBMSs such as Oracle, MySQL and SQLServer.

It is composed of two tests:
#Query Scalability Test - Measures how many queries the test subjects can handle per second.
#Manipulation Scalability Test - Measures how many manipulations the test subjects can handle per second.
The tests run against a defined subject interface. There are two scalability test subject implementations: the JDBC implementation, which uses any SQL database to store its objects, and the Prevayler implementation, which uses plain Java objects.


How does each scalability test work?
First, it creates a number of RecordObjects. This number can be configured to 100k, 1M or 10M objects. When the subject is a database, these RecordObjects are mapped directly to rows in a table.

Then, the test runs for several one-minute rounds. For each round, the number of test threads is increased by one. Each test thread performs as many queries (query test) or manipulations (manipulation test) as it can during that minute.
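The round structure described above can be sketched roughly as follows. This is only an illustration, not the actual test code: the class name RoundRunnerSketch, the method runRound and the shortened 200 ms rounds are assumptions made for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the round loop; not the real scalability test code.
public class RoundRunnerSketch {

    /** Runs one round with the given thread count; returns operations per second. */
    static double runRound(int threads, long roundMillis, Runnable operation) {
        AtomicLong completed = new AtomicLong();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(roundMillis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                // Each test thread performs as many operations as it can until the round ends.
                while (System.nanoTime() - deadline < 0) {
                    operation.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(roundMillis + 10_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get() * 1000.0 / roundMillis;
    }

    public static void main(String[] args) {
        AtomicLong dummyWork = new AtomicLong();
        // One more thread per round; 200 ms rounds stand in for the one-minute rounds.
        for (int threads = 1; threads <= 3; threads++) {
            double perSecond = runRound(threads, 200, dummyWork::incrementAndGet);
            System.out.printf("%d thread(s): %.0f operations/second%n", threads, perSecond);
        }
    }
}
```

In a real run, the operation passed to runRound would be a query or a manipulation against the test subject, and each round would last one minute.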

The tests are limited to very simple operations so that the relational databases can cope:
-Each query is a simple attribute equality query (WHERE NAME='NAME1234') and returns a List of 10, 100 or 1000 RecordObjects depending on the number of RecordObjects initially created.
-Each manipulation is a transaction with one RecordObject deletion, one insertion and one update, all by ID.
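As a rough illustration of the two operations, a plain-Java subject might look like the sketch below. The names InMemorySubjectSketch, Record, queryByName and manipulate are invented for this example, and the fields of the real RecordObject are not shown on this page, so treat everything here as an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory test subject; not the actual Prevayler implementation.
public class InMemorySubjectSketch {

    /** Stand-in for RecordObject, reduced to an id and a name. */
    static final class Record {
        final long id;
        final String name;
        Record(long id, String name) { this.id = id; this.name = name; }
    }

    private final Map<Long, Record> byId = new HashMap<>();
    private final Map<String, List<Record>> byName = new HashMap<>();

    synchronized void insert(Record r) {
        byId.put(r.id, r);
        byName.computeIfAbsent(r.name, k -> new ArrayList<>()).add(r);
    }

    synchronized void delete(long id) {
        Record old = byId.remove(id);
        if (old == null) return;
        List<Record> sameName = byName.get(old.name);
        sameName.remove(old);
        if (sameName.isEmpty()) byName.remove(old.name);
    }

    /** Attribute equality query, comparable to WHERE NAME='NAME1234'. */
    synchronized List<Record> queryByName(String name) {
        return new ArrayList<>(byName.getOrDefault(name, Collections.emptyList()));
    }

    /** One manipulation: a deletion, an insertion and an update, all by ID. */
    synchronized void manipulate(long deleteId, Record toInsert, long updateId, String newName) {
        delete(deleteId);
        insert(toInsert);
        if (byId.containsKey(updateId)) {
            // Records are immutable here, so an update replaces the old record.
            delete(updateId);
            insert(new Record(updateId, newName));
        }
    }

    public static void main(String[] args) {
        InMemorySubjectSketch subject = new InMemorySubjectSketch();
        for (long id = 0; id < 1000; id++) {
            subject.insert(new Record(id, "NAME" + (id % 100)));
        }
        System.out.println(subject.queryByName("NAME42").size() + " records named NAME42");
    }
}
```

The name index is what keeps the equality query cheap; a JDBC subject would get the same effect from a database index on the NAME column.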

Isn't it unfair for the database to have to read from disk while Prevayler uses RAM only?
You must give your database enough RAM so that it can keep all its data blocks in RAM and not have to read from disk either. ;) The results are only considered valid if you do that.


See: ScalabilityTestResults for results and easy instructions on how to run the tests for yourself.
What if the query has complex conditions (AND, OR, etc.) and complex relations (JOIN, INNER JOIN, etc.)? As I see it, a good RDBMS must not only store data but also provide a way to retrieve the needed data fast.
I'm not talking about a simple guest-book-like table here, but about a real, big enterprise database.


AND? OR? JOIN? INNER-JOIN? All too simple. How about a query based on polymorphically evaluated methods based on the current millisecond? Can your database even handle that? Even object databases are notoriously better than relational databases for complex queries. I have chosen a very simple query on purpose to handicap Prevayler and give relational databases a better chance. --KlausWuestefeld.


OK, this is what I'm looking for. I can download and use Prevayler on my next project, but too bad... I don't know how to perform complex queries on my collections. Are there any samples or resources I can visit to learn more about this, so I can rely on Prevayler to perform my database queries?


See: HowDoIQueryMyObjects?


Yes, prove that it's "TOO SIMPLE". Java's collections (e.g. TreeMap, HashMap) use only one key, so what if a query involves more than one key (e.g. a query based on the name and zip code)?


You are not restricted to using the Java collections API. See: HowDoIQueryMyObjects? and NoMorePorridge.
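Even with the standard collections alone, a lookup on two attributes is possible by using a composite key. The following is a minimal sketch under that assumption; Person, NameAndZip and CompositeKeySketch are hypothetical names, not part of Prevayler or of the pages linked above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical example of indexing objects by two attributes at once.
public class CompositeKeySketch {

    /** Illustrative business object with two searchable attributes. */
    static final class Person {
        final String name;
        final String zipCode;
        Person(String name, String zipCode) { this.name = name; this.zipCode = zipCode; }
    }

    /** Composite key: two keys are equal only when both name and zip code are equal. */
    static final class NameAndZip {
        final String name;
        final String zipCode;
        NameAndZip(String name, String zipCode) { this.name = name; this.zipCode = zipCode; }

        @Override public boolean equals(Object other) {
            if (!(other instanceof NameAndZip)) return false;
            NameAndZip that = (NameAndZip) other;
            return name.equals(that.name) && zipCode.equals(that.zipCode);
        }

        @Override public int hashCode() { return Objects.hash(name, zipCode); }
    }

    private final Map<NameAndZip, List<Person>> index = new HashMap<>();

    void add(Person p) {
        index.computeIfAbsent(new NameAndZip(p.name, p.zipCode), k -> new ArrayList<>()).add(p);
    }

    /** Returns everyone with exactly this name and zip code (possibly an empty list). */
    List<Person> find(String name, String zipCode) {
        return index.getOrDefault(new NameAndZip(name, zipCode), Collections.emptyList());
    }

    public static void main(String[] args) {
        CompositeKeySketch people = new CompositeKeySketch();
        people.add(new Person("Ann", "12345"));
        people.add(new Person("Ann", "12345"));
        people.add(new Person("Ann", "99999"));
        System.out.println(people.find("Ann", "12345").size() + " match(es) for Ann in 12345");
    }
}
```

The key class must define equals and hashCode consistently, otherwise HashMap lookups with a freshly built key will not find the stored entries.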


Everybody, just don't click the "NoMorePorridge" link; it gives you nothing... no solution there... it's just a joke (I don't know whose idea this was!).
Prevayler is a great concept, but it is not ready to be used for serious applications. We can say that Prevayler is a very powerful boat engine, but it's only an engine; it cannot float in the water yet... so it's no use!
If you claim Prevayler is the solution... think again, since it is not complete! I know we are not restricted to using the Java collections API, but... what else? Maybe somebody has an API to do queries, but somebody else does NOT.
So how can we use Prevayler for our project if we don't have any query API?


Have you taken a look at the collections APIs and query languages in HowDoIQueryMyObjects? Have you even entered that page?


Hi, I am new to this discussion and IMHO there are some valid points on both sides:
- On one hand, SQL-minded people should realize that doing a natural join between tables is a quite inefficient way to collect related data. When manipulating classes, most related information is connected through associations between classes. One-to-many associations are implemented very efficiently using Java collections. Many-to-many associations can be implemented using ad hoc classes. And queries can be written efficiently by navigating through these links and applying different filters at each level (this is how pre-relational databases worked too).
- On the other hand, the relational model gives you a flexibility that is not addressed by class-oriented languages like Java. If your user needs to collect information in a new, unanticipated way, it should be quite easy to write a new SQL query collecting the information and to add some indexes to the tables to get a decent result. With class-oriented languages, by contrast, you will probably need to refactor your classes to add a new association, and think twice about how to populate this association for existing objects and how to handle the migration (see Schema Evolution problems on that matter). Chances are that these repeated difficulties will make your developers find workarounds to implement those evolutions, which will take you far from the object paradigm...
For me, class-oriented languages are perfect for developing application/web servers, network clients, GUI frameworks, image processing, etc., but are just too inflexible for building business apps. I would love to see Prevayler available for an object-oriented language like Self (by object-oriented, I mean that objects are not built by classes, allowing for more flexibility).
Just my 2 eurocents.
Arnaud.Clere@free.fr

Build your own Prevayler in Self then. It's just ten classes (in a class-oriented language), you'd do it in no time. -- JonTirsen

You're right. I need to talk less and code more  :)
Arnaud



Prevayler is great! Previously, I wanted to use FirebirdSQL to store my data, as it could grow large, and everyone advised me to store my data in an RDBMS, mainly because they are good at handling large quantities of data, and you don't have to go through the hassle of parsing CSV or using your own database file format. Naturally, I overlooked one of Java's main features: Object Serialization. I didn't know what it was until I was confronted with Prevayler. Prevayler inspired me to implement a similar system in my server application. While it handles multiple data types, at most a few hundred thousand of each, it still manages to write 10 million of my custom business objects of some 8 different types (that's 80 million) to disk in less than 20 seconds! And my application uses less memory, creates less garbage (no more JDBC overhead) and is faster. It even saved me the cost of having to put up a separate database server. Thanks, Prevayler!
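For readers who, like the commenter above, have not used Java's object serialization before, here is a minimal sketch of writing a list of objects to disk and reading it back. It uses only the standard java.io classes; it is neither Prevayler's snapshot code nor the commenter's system, and the names SnapshotSketch, Account and roundTrip are invented for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical example of plain Java serialization; not Prevayler's API.
public class SnapshotSketch {

    /** Illustrative business object; it only needs to implement Serializable. */
    static final class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String owner;
        Account(long id, String owner) { this.id = id; this.owner = owner; }
    }

    static void writeSnapshot(File file, ArrayList<Account> accounts) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(accounts); // writes the list and every object it references
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static ArrayList<Account> readSnapshot(File file) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (ArrayList<Account>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes the given number of accounts to a temporary file and returns how many were read back. */
    static int roundTrip(int count) {
        ArrayList<Account> accounts = new ArrayList<>();
        for (long id = 0; id < count; id++) accounts.add(new Account(id, "owner" + id));
        File file;
        try {
            file = File.createTempFile("snapshot", ".ser");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            writeSnapshot(file, accounts);
            return readSnapshot(file).size();
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) {
        System.out.println("Read back " + roundTrip(100_000) + " accounts");
    }
}
```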

A question concerning the previous comment. You say you put 80 million objects into memory? What types do you put there? How much RAM does all that occupy? Which JDK do you use?
Storing in memory is great! But the JDK I use (Sun's) seems to use way too much memory. I know because I tried to port my Postgres database (dumped size: only 14M; 200.000 records in all) into memory. The first 30.000 filled the default 64M that Java allocates!
So, I'm looking for the JDK with the smallest memory footprint. Any suggestions?

I wish memory size wasn't an issue already. Database world would be SO much simpler and nicer :-) Wherefore art thou Athlon64? :-)
--
Atus

Just set the JVM's memory size higher; you can use close to 2GB with most JVMs (given that you have enough RAM).
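On Sun's JVM the maximum heap size is set with the -Xmx launcher option, for example java -Xmx1500m HeapSizeSketch. The small program below, whose class name is just an example, prints the limit the running JVM actually ended up with, which is a quick way to check that the option took effect.

```java
// Hypothetical helper for checking the heap limit of the running JVM.
public class HeapSizeSketch {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```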

While testing this great project, I noticed that inserting and deleting data actually does take pretty long. Doing queries on the data (which is in memory) is of course very, VERY fast. But of course a database with plenty of memory and caching would be fast too. If you run the same query a few thousand times on a database (like Oracle), then I guess Oracle has cached this data in memory and will thus be almost as fast as Prevayler... or am I wrong here?

You're wrong, for one simple reason: there's too much code between Oracle and your application to be faster than Prevayler. There's the query planner, caching code, network protocol, JDBC driver, and a stack of interfaces between your app and your Oracle data. With Prevayler, you have Java method calls, which are fast.
-- CarlosVillela