Performance Comparison of Database Access over the Internet: Java Servlets vs CGI

T. Andrew Yang Ralph F. Grove

yang@grove.iup.edu rfgrove@computer.org

Indiana University of Pennsylvania, Computer Science Department

Stright 319, IUP, Indiana, PA 15705, USA

FAX#: (724) 357-2724

Corresponding Author: T. Andrew Yang

Keywords: Performance Comparison, Web Server, Web-Database Connectivity, Servlet, CGI

Abstract

Our recent work on database access using Java servlets (see Yang and Kim 1999) focused on measuring the performance of sequential versus concurrent connection schemes between the web server and the database server. In this paper, we extend that work by comparing the performance of database access through servlets and through CGI scripts in the Internet environment. To ensure a fair comparison, all the parameters in both sets of experiments are identical, except for the connectivity mechanism between the web server and the database server. The first section of this draft paper gives an introduction to the 3-tier WWW model and its integration with Java servlets or CGI to enable database connectivity. It is followed by a discussion of the servlets that we developed to experiment with distributed data access, and of the two types of servlet-database connection schemes (sequential vs concurrent). The findings from the earlier performance metering experiments using Java servlets are then summarized. The configuration of the performance comparison experiments using servlets and CGI is described in the following section. The paper concludes with an analysis of the experiments comparing the performance of servlets vs CGI.

1 Introduction

With the increasing popularity of the Internet, especially the World Wide Web (WWW), transparent access to information stored on multiple database servers has become a desirable feature. It is the responsibility of web developers to design access to data that may reside on multiple database servers across the network. CGI allows a web developer to write scripts that answer user requests and access database servers. The Java Servlets API, which was introduced by JavaSoft in 1997 and included in its Java Development Kit version 1.1 and above, has been considered one of the most promising server-side alternatives to CGI.

In the past, we explored the integration of Java applets and JDBC (Java Database Connectivity) for access to database servers on the WWW (see Yang et al 1998). When JDBC is integrated with servlets, a 3-tier client/server model is formed, with the web server and its servlets as the middle tier and the database servers at the back end. Our recent work on database access using Java servlets (see Yang and Kim 1999) focused on measuring the performance of sequential versus concurrent connection schemes between the web server and the database server. In this paper, we extend that work by comparing the performance of web-server/database-server connectivity using servlets and CGI scripts, respectively, in the Internet environment.

2 The 3-Tier Client/Server Model

Figure 1. The Applet Mechanism. Figure 2. The Servlet Mechanism.

A servlet is the server-side equivalent of an applet. While an applet is a piece of Java code that is transmitted from a web server to a client and then loaded by the client to answer user requests, a servlet is a piece of Java code that is loaded by the web server when triggered by a user request. The different mechanisms underlying the applet and servlet technologies are illustrated in Figure 1 and Figure 2, respectively.

2.1 Servlets and Databases

When JDBC is used in a servlet, a three-tier application is created. The three-tier computing model is illustrated in Figure 3.

The first tier of such an application can use any number of Java-enabled browsers. It uses either an applet or an HTML form for user input, and it receives and displays the result of the database query returned from the second tier (the web server).

The second tier is implemented with a web server and Java servlets that encapsulate the specific logic of the application at hand. The Java servlet is able to access the database and returns an HTML page listing the data (see Hunter and Crawford 1998, Moss 1999).

Figure 3. A Three-Tier Client/Server Model.

The third tier consists of databases managed by a database management system (DBMS). The servlets running as part of the second tier interact with this DBMS to retrieve and/or update the databases indirectly. The answer returned from the DBMS is sent to the servlet, which then forwards it to the web browser as an HTML page.

2.2 CGI and Database Connectivity

The mechanism underlying CGI (Common Gateway Interface) is similar to that of Java servlets. As the more established method, CGI has been widely used in WWW applications to provide on-line database connectivity. Perl has been the dominant scripting language for CGI, although other languages can also be used.

The main difference between CGI and Java servlets, when used as the connectivity mechanism between a web server and a database server, is how they are activated. A CGI script is activated by the web server each time a request for that script arrives. A Java servlet, by contrast, remains alive once it has been activated. We are interested in the impact of this difference on the performance of database access and on the overall throughput of the web server.

3 Measuring the Performance of Servlet-DBMS Connections

Various types of connections between the servlet and the database server have been proposed. Two kinds of servlet-DBMS connections, for instance, were described by Hunter and Crawford (1998): one is a servlet using a pool of connections to the database, and the other is a pool of servlets simultaneously connecting to a database.

In our earlier experiments (see Yang and Kim 1999), we focused on the performance comparison of two types of servlet-DBMS connection schemes. In the sequential connection scheme, the servlet creates a connection (in the init( ) method) to the database server the first time the servlet is invoked. The subsequent data access queries sent to the servlet are forwarded to the DBMS via the same connection. The requests are sequentially synchronized and processed in a first-come-first-served manner.

In the concurrent connection scheme, each time the servlet is invoked it creates a new connection to the database server (in the service( ) method). These connections are handled as concurrent processes in the system. Presumably these concurrent processes can be executed by the system simultaneously and overlapping of execution time between these processes is possible.
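The two schemes can be sketched in plain Java as follows. This is a minimal illustration and not the servlets used in the experiments: StubConnection is a hypothetical stand-in for a JDBC connection, the handler classes only mimic the init( ) and service( ) methods of a servlet, and the query string is an assumed example. The demo calls each handler in a simple loop, so it shows only how many connections each scheme opens, not real concurrency.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a JDBC connection (assumption, not the authors' code). */
final class StubConnection {
    static final AtomicInteger OPENED = new AtomicInteger();

    StubConnection() { OPENED.incrementAndGet(); }   // count every connection that is opened

    String query(String sql) { return "result of " + sql; }

    void close() { }
}

/** Sequential scheme: one connection is opened in init() and shared by all requests. */
final class SequentialHandler {
    private StubConnection shared;

    void init() { shared = new StubConnection(); }

    // synchronized: requests use the single shared connection one at a time
    synchronized String service(String sql) { return shared.query(sql); }
}

/** Concurrent scheme: every call to service() opens its own connection. */
final class ConcurrentHandler {
    String service(String sql) {
        StubConnection own = new StubConnection();
        try {
            return own.query(sql);
        } finally {
            own.close();
        }
    }
}

final class ConnectionSchemesDemo {
    /** Sends n requests through each scheme; returns {sequential opens, concurrent opens}. */
    static int[] countOpens(int n) {
        StubConnection.OPENED.set(0);
        SequentialHandler sequential = new SequentialHandler();
        sequential.init();
        for (int i = 0; i < n; i++) sequential.service("SELECT * FROM authors");
        int sequentialOpens = StubConnection.OPENED.get();

        StubConnection.OPENED.set(0);
        ConcurrentHandler concurrent = new ConcurrentHandler();
        for (int i = 0; i < n; i++) concurrent.service("SELECT * FROM authors");
        return new int[] { sequentialOpens, StubConnection.OPENED.get() };
    }

    public static void main(String[] args) {
        int[] opens = countOpens(20);
        System.out.println("sequential opens: " + opens[0] + ", concurrent opens: " + opens[1]);
    }
}
```

With 20 requests, the sequential handler opens one connection and the concurrent handler opens 20, which is the structural difference the experiments set out to measure.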

Our initial hypothesis with regard to the performance of these two types of servlet-DBMS connections was that the concurrent version would outperform the sequential version. The hypothesis was based on the assumption that concurrent processing of the connections would result in earlier completion of the queries than sequential processing of those queries. The results of the experiments turned out to be more nuanced than our initial hypothesis suggested.

3.1 Parameters of the Experiments

Figure 4 illustrates the experimental setting we used in this project to measure the performance of servlet-database connectivity. A Microsoft Access database local to the web server represents the database component of the experimental system. To eliminate the network overhead from the performance figures, we did not include a remote database server in the experiments. For the purpose of this experiment we used a single table named authors, which defines 9 columns, beginning with an Author ID Number as the primary key. First name, last name, phone, address, city, state, zip code and contract status fill the rest of the table. There are nine tuples in the table.

The servlets used Java’s JDBC API for database access. JDBC is the embedded SQL facility for Java (Friedrichs and Jubin 1999, Siple 1997, Yang et al 1998). It enables a Java program to maintain database connections and manipulate the data stored in the database via the connections. Figure 4 also shows the sequence of events that would occur given a user request. Each of the events is labeled with its order in the sequence.

Figure 4. The Configuration of Experiments Measuring Servlet-DBMS Connections.

We designed performance-metering tools using Java and JavaScript to test the two types of servlets. The experimenter enters into a text field in an HTML form the number of connection requests to be made from this particular client. When the 'Execute' button is pressed, a JavaScript function sends that many connection requests to the underlying servlet, which forwards the requests to the DBMS using either the sequential or the concurrent connection scheme.

Time-stamping was used as the measuring method. The servlet first records the system time (the start-time) before it submits the query to the DBMS. It then submits the query. When the query returns, the servlet records the system time again (the completion-time) and saves the start-time, the completion-time, and the elapsed time into a data file. Once the experiment was completed, the data files were fed into an analysis program. The program calculated the sum of the elapsed times of the individual queries (SOD), as well as the overall elapsed time between the start of the first query and the completion of the last query in that experiment (OET).
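The time-stamping step can be sketched as a small wrapper around the query call. This is an assumed illustration, not the metering code used in the experiments: the class and method names are hypothetical, the query is passed in as a generic Supplier rather than a real JDBC call, and the entries are kept in memory instead of being written to a data file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper that time-stamps each query and logs the result. */
final class QueryTimer {
    /** One log entry: start-time, completion-time, and elapsed time, all in ms. */
    static final class Entry {
        final long start;
        final long completion;
        final long elapsed;

        Entry(long start, long completion) {
            this.start = start;
            this.completion = completion;
            this.elapsed = completion - start;
        }

        /** Tab-separated line, in the form such an entry might take in a data file. */
        String toLine() { return start + "\t" + completion + "\t" + elapsed; }
    }

    // synchronized list so that several request threads can log safely
    private final List<Entry> log = Collections.synchronizedList(new ArrayList<>());

    /** Records the system time before and after the query, then logs both. */
    <T> T timed(Supplier<T> query) {
        long start = System.currentTimeMillis();       // start-time
        T result = query.get();                        // submit the query
        long completion = System.currentTimeMillis();  // completion-time
        log.add(new Entry(start, completion));
        return result;
    }

    List<Entry> entries() { return log; }

    public static void main(String[] args) {
        QueryTimer timer = new QueryTimer();
        timer.timed(() -> "stand-in for a SELECT result");
        for (Entry e : timer.entries()) System.out.println(e.toLine());
    }
}
```

Only the call inside timed( ) is measured, so any work the servlet does before or after the query is left out of the per-query elapsed time.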

Clients:	2		4		10		15
Connection Requests per Client:	20	100	20	100	20	100	20	100
Sequential:	#1	#3	#5	#7	#9	#11	#13	#15
Concurrent:	#2	#4	#6	#8	#10	#12	#14	#16

Table 1. Parameter Settings of the Experiments.

Table 1 shows the configurations of the experiments. For each version of the servlets, four configurations of clients were used: 2, 4, 10 and 15 clients. For each configuration of clients, two different numbers of connection requests per client were used: 20 and 100 connection requests. The complete set of experiments thus contained 16 individual experiments.
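The layout of Table 1 can be reproduced with a short enumeration. The numbering formula below is inferred from the table (odd numbers for the sequential scheme, even numbers for the concurrent scheme) and the class and method names are hypothetical; the sketch also shows the total number of requests issued in each configuration.

```java
final class ExperimentGrid {
    static final int[] CLIENTS = {2, 4, 10, 15};
    static final int[] REQUESTS_PER_CLIENT = {20, 100};

    /** Experiment number as laid out in Table 1: odd = sequential, even = concurrent. */
    static int experimentNumber(int clientIdx, int requestIdx, boolean concurrent) {
        return 4 * clientIdx + 2 * requestIdx + (concurrent ? 2 : 1);
    }

    /** Total connection requests in one experiment: clients x requests per client. */
    static int totalRequests(int clientIdx, int requestIdx) {
        return CLIENTS[clientIdx] * REQUESTS_PER_CLIENT[requestIdx];
    }

    public static void main(String[] args) {
        for (int c = 0; c < CLIENTS.length; c++) {
            for (int r = 0; r < REQUESTS_PER_CLIENT.length; r++) {
                System.out.println(CLIENTS[c] + " clients x " + REQUESTS_PER_CLIENT[r]
                        + " requests = " + totalRequests(c, r) + " total"
                        + " (sequential #" + experimentNumber(c, r, false)
                        + ", concurrent #" + experimentNumber(c, r, true) + ")");
            }
        }
    }
}
```

The largest configuration, 15 clients with 100 requests each, yields 1,500 requests and corresponds to experiments #15 and #16.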

Three performance figures (in ms), Sum of Difference (SOD), Overall Elapsed Time (OET), and Non-Connection-Related Time (NCRT), were employed in comparing the performance of the sequential and concurrent connection schemes. SOD is the sum of the elapsed times of all the individual connections in a particular experiment. OET is the elapsed time between the beginning of the first connection and the completion of the last connection in that experiment.

The major difference between these two performance figures is that SOD covers only the time spent over the connection between the servlet and the DBMS. OET, however, includes the SOD plus the time spent by the servlets on other tasks, such as memory management and waiting for clients' requests (i.e., NCRT). Each NCRT is the difference between the respective OET and SOD.
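The three figures can be computed from the logged time stamps as sketched below. This is an assumed reconstruction of what the analysis program calculates, not the original program: each record is a hypothetical {start-time, completion-time} pair in ms, the sample values are invented for illustration, and the input is assumed to be non-empty.

```java
final class ConnectionMetrics {
    /** SOD: sum of (completion - start) over all connections, in ms. */
    static long sod(long[][] records) {
        long sum = 0;
        for (long[] r : records) sum += r[1] - r[0];
        return sum;
    }

    /** OET: time from the earliest start to the latest completion, in ms. */
    static long oet(long[][] records) {
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (long[] r : records) {
            first = Math.min(first, r[0]);
            last = Math.max(last, r[1]);
        }
        return last - first;
    }

    /** NCRT: the part of OET not spent over the servlet-DBMS connection. */
    static long ncrt(long[][] records) {
        return oet(records) - sod(records);
    }

    public static void main(String[] args) {
        // Invented sample: three connections as {start, completion} in ms.
        long[][] records = { {0, 30}, {100, 135}, {400, 430} };
        System.out.println("SOD=" + sod(records) + " OET=" + oet(records)
                + " NCRT=" + ncrt(records));   // prints SOD=95 OET=430 NCRT=335
    }
}
```

In the sample, the three connections take 30, 35, and 30 ms, so SOD is 95 ms, while the run spans 430 ms overall; the remaining 335 ms is NCRT.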

3.2 Analysis of the Experiments

Number of clients	2		4		10		15
Requests per client	20	100	20	100	20	100	20	100
Sequential exp#	#1	#3	#5	#7	#9	#11	#13	#15
SOD (ms)	1,410	6,766	1,282	13,522	9,782	47,839	14,898	73,706
OET (ms)	25,173	112,071	11,426	198,806	30,114	202,642	55,229	379,015
NCRT (ms)	23,763	105,305	10,144	185,284	20,332	154,803	40,331	305,309
Concurrent exp#	#2	#4	#6	#8	#10	#12	#14	#16
SOD (ms)	1,392	6,469	2,554	12,704	8,435	51,383	33,331	249,978
OET (ms)	17,124	114,124	29,032	205,165	36,002	367,146	64,433	401,768
NCRT (ms)	15,732	107,655	26,478	192,461	27,567	315,763	31,102	151,790

Table 2. Raw data obtained from the experiments.

Table 2 shows the raw data from the experiments. The differences, in terms of SODs, NCRTs, and OETs, between the compatible pairs of experiments are depicted in Figures 5, 6, and 7, respectively. Compatible pairs of experiments are those with the same number of clients and the same number of connection requests; the only parameter that differs within a compatible pair is the type of connection.

A. Connection-Related Time (SOD)

Figure 5. Comparison of connection time (SOD)

It was observed from the raw data that each database connection took about 30 ms on average, given the simple SELECT query we used. As shown in Figure 5, among experiments with the same connection scheme but different numbers of clients, the SODs are roughly proportional to the total number of connections. An exception is when the number of clients is 15 and the connection scheme is concurrent (#14 and #16), where the connection time increased significantly. We noticed from the collected data that some of the connections in these two experiments took hundreds or even thousands of ms to complete. A plausible explanation is that, due to the large number of concurrent channels between the servlet and the DBMS, the DBMS was not able to service some of the requests in a timely manner, resulting in poor overall quality of service.

Between compatible pairs of experiments, the times spent over the servlet-database connection were quite comparable when the number of clients was 2, 4, or 10. When the number of clients increased to 15, their respective performance became dramatically different, due to the significant increase in overhead placed on the DBMS by the large number of concurrent connections, as indicated earlier.

B. NCRT and OET

While SOD measures the time spent by the servlet(s) over database connections, NCRT covers the remaining time spent by the servlet(s) in completing the processing of all the user requests. This includes the time incurred by the internal processing of the servlets, such as function calls and memory management, as well as the time the servlets spend waiting for the arrival of user requests. Therefore, factors such as the load on the clients' processors and network delay have some impact on NCRT.

Figure 6. Comparison of Non-Connection-Related Time (NCRT)

Figure 7. Comparison of Overall Elapsed Time (OET)

As depicted in Figure 6, significantly higher NCRTs were incurred by the sequential servlet when the number of clients reached 15. This phenomenon, we believe, was caused by the large number of user requests (up to 1,500) that needed to be scheduled by the servlet to share the only connection to the DBMS.

When the number of clients was 15 and the number of requests per client was 100, the NCRT of the concurrent servlet (exp#16) dropped significantly. Our explanation is that the other factors mentioned above (client processors, network delay, etc.) contributed to this phenomenon.

Figure 7 shows the Overall Elapsed Time (OET) incurred by the two servlets. In all but one compatible pair in Table 2 (#1 vs #2, 2 clients with 20 requests each), the sequential servlet outperformed the concurrent servlet.

3.3 Lessons Learned from the Earlier Experiments

Scheme	Advantage (+)	Disadvantage (-)
Sequential connection	nearly uniform overhead upon the DBMS	very high servlet overhead at high traffic
Concurrent connection	comparatively lower servlet overhead at high traffic	very high DBMS overhead at high traffic

Table 3. Summary of the trade-offs of the two connection schemes.

An important lesson learned from our earlier experiments was that, contrary to the common belief in the superiority of concurrent processing over sequential processing, the actual performance of concurrent computing depends on various parameters in the distributed environment. Table 3 summarizes the pros and cons of both connection schemes.

Based on the strengths and weaknesses of the two connection schemes, we have made the following observations:

· When the number of connection requests becomes large^[1], a high performance database server is desirable when the concurrent scheme is employed by the servlet.

· Similarly, at high traffic, a high performance web server is desirable if the sequential scheme is employed by the servlet.

· When the database server at the back end is not powerful enough, a sequential servlet is desirable.

4 Measuring the Performance of Servlet vs. CGI-Script DBMS Connections

The main purpose of these experiments is to compare the performance of Java servlets vs CGI scripts with regard to database access over the Internet. In addition, we re-configured the experiment parameters so that some of the findings from the earlier sets of experiments, in which only servlets were used, could be verified. A major change to the parameters of these new sets of experiments is the number of clients used. As in the earlier experiments, for each number of clients, two request loads were used: 20 and 100 requests per client. The sequential and concurrent schemes remained part of the parameters.

4.1 Configuration of the Experiments

The configuration of the system is as depicted in Figure 4, except that the connection module can be either Java servlets or CGI in the respective set of experiments. When Java servlets were used as the connection module, JRUN was used as the servlet engine. MySQL was used as the DBMS in both sets of experiments. CGI scripting was implemented using Perl 5, along with the Perl/MySQL driver module 1.2209. We used a Pentium II machine running RedHat Linux 6.0 as the server. The machine ran Apache as the web server. Figure 8 shows the hardware and software configuration used in these experiments.

Figure 8. System Configurations

In this experiment, the earlier trials using servlets to access a database were repeated, and the trials were extended by using CGI scripts as an alternative access mechanism. In the case of Servlets, both sequential and concurrent connections were made, as described in Section 3. In the case of CGI, database access requests were submitted without synchronization at the level of the CGI scripts. Although the main interest of comparison was between Servlets and CGI, these experiments were also intended to validate the relative performance between sequential and concurrent database access given by the earlier experiments, though no direct comparison is possible since the server platform was not the same.

Table 4 shows the configuration of the experiments with respect to the number of clients and the number of database requests per client. Each client consisted of a desktop PC running Netscape with a unique network connection. In the case of Servlets, the multiple requests were generated through server-side execution of multiple <servlet> tags in the requested document. In the case of CGI, the multiple requests were implemented by embedding the database operations (connect( ), prepare( ), execute( ), etc.) within a loop of a CGI script. For each experiment, the same three figures (SOD, OET, and NCRT) were collected or computed. Each individual experiment was assigned a unique experiment number.

Clients:	2		4		10		15		20
Connections Per Client:	20	100	20	100	20	100	20	100	20	100
Sequential:	#1	#3	#5	#7	#9	#11	#13	#15	#17	#19
Concurrent:	#2	#4	#6	#8	#10	#12	#14	#16	#18	#20
CGI	#1c	#3c	#5c	#7c	#9c	#11c	#13c	#15c	#17c	#19c

Table 4: Experiments Configuration

4.2 Analysis of the Experiments

Table 5 shows the raw data from the experiments. The differences between the three sets of experiments are shown in Figures 9 and 10. The Y-axis values are in milliseconds for the figures.

Sequential Servlet	#1	#3	#5	#7	#9	#11	#13	#15	#17	#19
SOD	1885	8854	3988	18205	9774	47699	14078	71740	19348	94459
OET	4486	21176	9539	45457	22476	116914	34103	178934	45687	241582
NCRT	2601	12322	5551	27252	12702	69215	20025	107194	26339	147123
Concurrent Servlet	#2	#4	#6	#8	#10	#12	#14	#16	#18	#20
SOD	2249	9663	4601	22407	11925	59673	16955	87863	24515	120499
OET	5800	24707	10377	51419	27094	136861	81362	215367	55570	277005
NCRT	3551	15044	5776	29012	15169	77188	64407	127504	31055	156506
CGI	#1c	#3c	#5c	#7c	#9c	#11c	#13c	#15c	#17c	#19c
SOD	135	483	367	1310	2500	5305	2053	5912	5989	29602
OET	1369	23591	4924	22769	6356	25900	24997	47854	32874	65450
NCRT	1234	23108	4557	21459	3856	20595	22944	41942	26885	35848

Table 5: Raw Data Collected from the Experiment

Figure 9 shows a comparison of SOD for sequential servlet vs. concurrent servlet vs. CGI access. In the case of servlets, concurrent access is slightly more expensive than sequential access. With 20 clients issuing 100 requests each, for example, the SODs are about 120 vs. 94 seconds, a difference of about 26 seconds. Though the difference varies over the experiments, concurrent access is consistently more expensive.

The difference between servlet and CGI performance was significant. With 20 clients issuing 100 requests each, for example, the SOD of the CGI script was 3 to 4 times lower than that of the servlets. CGI also outperformed the servlets as the number of clients increased, in both the 20 requests per client and the 100 requests per client cases.
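The figures quoted for 20 clients issuing 100 requests each can be rechecked from the SOD values in Table 5. The sketch below is only a worked calculation; the class and method names are hypothetical.

```java
import java.util.Locale;

final class SodComparison {
    /** How many times larger the servlet SOD is than the CGI SOD. */
    static double ratio(long servletSodMs, long cgiSodMs) {
        return (double) servletSodMs / cgiSodMs;
    }

    public static void main(String[] args) {
        // SOD in ms from Table 5, 20 clients x 100 requests
        long sequential = 94459;   // sequential servlet, #19
        long concurrent = 120499;  // concurrent servlet, #20
        long cgi = 29602;          // CGI

        System.out.println(String.format(Locale.ROOT,
                "concurrent - sequential = %.1f s", (concurrent - sequential) / 1000.0));
        System.out.println(String.format(Locale.ROOT,
                "sequential / CGI = %.2f", ratio(sequential, cgi)));
        System.out.println(String.format(Locale.ROOT,
                "concurrent / CGI = %.2f", ratio(concurrent, cgi)));
    }
}
```

The calculation gives a difference of about 26 seconds between the two servlet schemes, and servlet-to-CGI ratios of about 3.19 (sequential) and 4.07 (concurrent).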

Figure 9 - Comparison of SOD

Figure 10 shows a comparison of NCRT. Generally, NCRT for concurrent access is slightly higher than that for sequential access. The data point for 15 clients / 20 connections appears to be an aberration, possibly caused by an unexpected and sudden server load at the time of this trial. Subsequent trials across the entire range of parameters confirmed this suspicion. In the case of NCRT, CGI also outperformed both the sequential and the concurrent servlets in most configurations, although not in all of them (in #3c, for example, the NCRT of CGI exceeded that of both servlets). The difference became more significant in the case of 100 requests per client as the number of clients increased.

Figure 10 - Comparison of NCRT

References

Friedrichs, J., Jubin, H. (1999) Java Thin-Client Programming for a Network Computing Environment, Prentice Hall.

Hunter, J., Crawford, W. (1998) Java Servlet Programming, O’Reilly & Associates, Inc., Sebastopol, CA.

Moss, K. (1999) Java Servlets With CDROM (2nd Ed.), McGraw-Hill Book Company.

Siple, M.D. (1997) The Complete Guide to Java Database Programming, McGraw-Hill Book Company.

Yang, A., Kim, J. (1999) Performance Metering of Distributed Access Using Java Servlets, Proceedings of the ADBIS'99 Conference (Advances in Databases and Information Systems), University of Maribor, Slovenia.

Yang, A., Linn, J., Quadrato, D. (1998) Developing Integrated Web and Database Applications Using JAVA Applets and JDBC Drivers, Proceedings of the 29th ACM SIGCSE Technical Symposium, Atlanta, GA.

^[1] The actual threshold would depend on the power of the database server. In our experiments, when the total number of connection requests reached 1,500, the underlying Access database showed signs of degraded quality of service.