Performance Comparison of Database Access over the Internet - Java Servlets vs CGI
yang@grove.iup.edu rfgrove@computer.org
Indiana University of Pennsylvania, Computer Science Department
Stright 319, IUP, Indiana, PA 15705, USA
FAX#: (724) 357-2724
Corresponding Author: T. Andrew Yang
Keywords: Performance Comparison, Web Server, Web-Database Connectivity, Servlet, CGI
Abstract
Our recent work on database access using Java servlets (see Yang and Kim 1999) focused on the performance metering of sequential versus concurrent connection schemes between the web server and the database server. In this paper, we extend that work by comparing the performance of database access between servlets and CGI scripts in the Internet environment. To ensure a fair comparison, all the parameters in both sets of experiments are identical, except for the connectivity mechanism between the web server and the database server. The first section of this draft paper gives an introduction to the 3-tier WWW model and its integration with Java servlets or CGI to enable database connectivity. It is followed by a discussion of the servlets that we developed to experiment with distributed data access, and of the two types of servlet-database connection schemes (sequential vs concurrent). The findings from the earlier performance metering experiments using Java servlets are then summarized. The configuration of the performance comparison experiments using servlets and CGI is described in the following section. The paper concludes with an analysis of the experiments comparing the performance of servlets vs CGI.
1 Introduction
With
the increasing popularity of the Internet, especially the world wide web (WWW),
transparent access to information stored on multiple database servers has
become a desirable feature. It is the
responsibility of the web developers to design the access of data from possibly
multiple database servers across the network. CGI allows a web developer to
write CGI scripts to answer user requests and to access database servers. The Java Servlets API, which was introduced
by JavaSoft in 1997 and included in its Java Development Kit version 1.1 and
above, has been considered one of the most promising alternatives
to CGI for server-side development.
In
the past, we explored the integration of Java applets and JDBC (Java Database
Connectivity) for the access of database servers on the WWW (see Yang et al
1998). When JDBC is integrated with
servlets, a 3-tier client/server model is formed, with the web server
integrated with servlets being the middle tier and the database servers at the
back end. Our recent work on database
access using Java servlets (see Yang and Kim 1999) focused on the performance
metering of sequential versus concurrent connection schemes between
the web server and the database server.
In this paper, we extend the work by comparing the performance
of web-server/database-server connectivity using, respectively, servlets and
CGI scripts in the Internet environment.
2 The 3-Tier Client/Server Model
Figure 1. The Applet Mechanism. Figure 2. The Servlet Mechanism.
A servlet is the server-side equivalent of an
applet. While an applet is a piece of
Java code that is transmitted from a web server to a client and then loaded by
the client to answer user requests, a servlet is a piece of Java code that is
loaded by the web server when triggered by a user request. The different mechanisms underlying
applet and servlet technology are illustrated in Figure 1 and Figure 2,
respectively.
When
JDBC is used in a servlet, a three-tier application is created. The three-tier computing model is
illustrated in Figure 3.
The
first tier of such an application could use any number of Java-enabled
browsers. It uses either an applet or
an HTML form for user input, and it receives and displays the result of the
database query returned from the 2nd tier (the web server).
The second tier is implemented with a web server and Java
servlets that encapsulate the specific logic of the application at hand. The Java servlet is able to access the
database and returns an HTML page listing the data (see Hunter and Crawford
1998, Moss 1999).
Figure 3. A Three-Tier Client/Server Model.
The
third tier consists of databases managed by a database management system. The servlets running as part of the second
tier interact with this DBMS to indirectly retrieve and/or update the
databases. The answer returned from the
DBMS is sent to the servlet, which then forwards it to the web browser as an
HTML page.
The
mechanism underlying CGI (Common Gateway Interface) is similar to that of Java
servlets. Being a more established
method, CGI scripts have been widely used in WWW applications to provide
on-line database connectivity. Perl has
been used as the dominant scripting language with CGI, although other languages
can also be used.
The
main difference between CGI and Java servlets, when used as the
connectivity mechanism between a web server and a database server, is how they
are activated. A CGI
script is activated by the web server each time a request for the CGI script
arrives. A Java servlet, by contrast,
remains alive once it is activated.
We are interested in the impact of this difference between the two
mechanisms, with a focus on the performance of database access and the overall
throughput of the web server.
3 Measuring the Performance of Servlet-DBMS Connections
Various
types of connections between the servlet and the database server have been
proposed. Two kinds of servlet-DBMS
connections, for instance, were described by Hunter and Crawford (1998):
one is a servlet using a pool of connections to the database, and the other is a
pool of servlets simultaneously connecting to a database.
In
our earlier experiments (see Yang and Kim 1999), we focused on the performance
comparison of two types of servlet-DBMS connection schemes. In the sequential connection scheme, the
servlet creates a connection (in the init( ) method) to the database server the first time the servlet is
invoked. The subsequent data access
queries sent to the servlet are forwarded to the DBMS via the same
connection. The requests are
sequentially synchronized and processed in a first-come-first-served manner.
In the
concurrent connection scheme, each
time the servlet is invoked it creates a new connection to the database server
(in the service( ) method). These connections are handled as concurrent
processes in the system. Presumably
these concurrent processes can be executed by the system simultaneously and
overlapping of execution time between these processes is possible.
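The two schemes can be sketched in plain Java without the servlet API. In this sketch, `StubConnection` is a hypothetical stand-in for a JDBC connection, and the handler classes and their `init`/`service` methods only mimic the roles of the servlet methods named above; none of this is the code used in the experiments.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a database connection; counts how many were opened. */
class StubConnection {
    static final AtomicInteger opened = new AtomicInteger();

    StubConnection() {
        opened.incrementAndGet();
    }

    String query(String sql) {
        return "result of " + sql;
    }

    void close() {
        // Nothing to release in this stub.
    }
}

/** Sequential scheme: one connection, created once and shared by all requests. */
class SequentialHandler {
    private StubConnection shared;

    // Plays the role of init(): runs once, when the handler is first loaded.
    void init() {
        shared = new StubConnection();
    }

    // Plays the role of service(): synchronized, so requests are served one at a time.
    synchronized String service(String sql) {
        return shared.query(sql);
    }
}

/** Concurrent scheme: every request opens (and closes) its own connection. */
class ConcurrentHandler {
    String service(String sql) {
        StubConnection own = new StubConnection();
        try {
            return own.query(sql);
        } finally {
            own.close();
        }
    }
}

class ConnectionSchemes {
    public static void main(String[] args) {
        SequentialHandler seq = new SequentialHandler();
        seq.init();
        ConcurrentHandler con = new ConcurrentHandler();
        for (int i = 0; i < 3; i++) {
            seq.service("SELECT * FROM authors");
            con.service("SELECT * FROM authors");
        }
        // One shared connection plus three per-request connections.
        System.out.println("connections opened: " + StubConnection.opened.get());
    }
}
```

The sketch only makes the structural difference visible: the sequential handler pays the connection cost once and serializes requests, while the concurrent handler pays it on every request and leaves scheduling to the system.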
Our
initial hypothesis with regard to the performance of these two types of
servlet-DBMS connections was that the concurrent version would outperform the
sequential version. The hypothesis was
based on the fact that concurrent processing of the connections would result in
earlier completion of the queries, compared to the sequential processing of
those queries. The results from the
experiments turned out to be more interesting than our initial hypothesis
suggested.
Figure
4 illustrates the experimental setting we have used in this project to measure
the performance of servlets-database connectivity. A Microsoft Access database
local to the web server represents the database component of the experimental
system. To eliminate the network
overhead from the performance figures, we did not include a remote database
server in the experiments. For the
purpose of this experiment we used a single table named authors, which defines 9 columns beginning
with an Author ID Number as the primary key.
First name, last name, phone, address, city, state, zip code and
contract status fill the rest of the table.
There are nine tuples in the table.
The
servlets used Java’s JDBC API for database access. JDBC is the call-level SQL interface for Java (Friedrichs and Jubin
1999, Siple 1997, Yang et al 1998). It
enables a Java program to maintain database connections and manipulate the data
stored in the database via the connections.
Figure 4 also shows the sequence of events that would occur given a user
request. Each of the events is labeled
with its order in the sequence.
Figure 4. The Configuration of Experiments Measuring Servlet-DBMS Connections.
We
have designed performance-metering tools using Java and JavaScript to test the
two types of servlets. The experimenter
enters into a text field in an HTML form
the number of connection requests to be made from that particular
client. When the 'Execute' button is
pressed, a JavaScript routine sends that many connection requests to the underlying
servlet, which forwards the requests to the DBMS using either the sequential or the
concurrent connection scheme.
Time-stamping
was used as the measuring method. The
servlet first records the system time (the start-time) before it submits the
query to the DBMS. It then submits the
query. When the query returns, the
servlet records the system time (the completion-time) again and saves the
start-time, the completion-time, and the elapsed time into a data file. Once the experiment was completed, the data
files were fed into an analysis program.
The program calculated the sum of the elapsed time for each of the
individual queries (SOD), as well as the overall elapsed time between the start
of the first query and the completion of the last query in that experiment
(OET).
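A minimal sketch of the time-stamping step, assuming `System.currentTimeMillis()` as the clock. The `Timing` and `QueryTimer` names are hypothetical, an in-memory list stands in for the data file, and the query is simulated by a `Runnable` rather than a real JDBC call.

```java
import java.util.ArrayList;
import java.util.List;

/** One measured query: start-time, completion-time, and elapsed time, all in ms. */
class Timing {
    final long start;
    final long completion;
    final long elapsed;

    Timing(long start, long completion) {
        this.start = start;
        this.completion = completion;
        this.elapsed = completion - start;
    }
}

class QueryTimer {
    // Stand-in for the data file that the servlet writes to.
    final List<Timing> log = new ArrayList<>();

    /** Records the system time before and after running the query. */
    Timing timeQuery(Runnable query) {
        long start = System.currentTimeMillis();
        query.run(); // submit the query and wait for it to return
        long completion = System.currentTimeMillis();
        Timing t = new Timing(start, completion);
        log.add(t);
        return t;
    }

    public static void main(String[] args) {
        QueryTimer timer = new QueryTimer();
        Timing t = timer.timeQuery(() -> {
            try {
                Thread.sleep(30); // simulated query latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(t.start + " " + t.completion + " " + t.elapsed);
    }
}
```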
Type of Connection | Connection Requests per Client | 2 Clients | 4 Clients | 10 Clients | 15 Clients |
Sequential | 20 | #1 | #5 | #9 | #13 |
Sequential | 100 | #3 | #7 | #11 | #15 |
Concurrent | 20 | #2 | #6 | #10 | #14 |
Concurrent | 100 | #4 | #8 | #12 | #16 |
Table 1. Parameter Settings of the Experiments.
Table
1 shows the configurations of the experiments.
For each version of the servlets, four configurations of clients were
used: 2, 4, 10 and 15 clients. For each
configuration of clients, two different numbers of connection requests per
client were used: 20 and 100 connection requests. The complete set of experiments thus contained 16 individual
experiments.
Three performance figures (in ms), Sum of Difference
(SOD), Overall Elapsed Time (OET),
and Non-Connection-Related Time (NCRT),
were employed in comparing the performance of the sequential and concurrent
connection schemes. SOD is the sum of
the elapsed times of all the individual connections in that particular
experiment. OET is the elapsed time
between the beginning of the first connection and the completion of the last
connection in a particular experiment.
The major difference between these two types of
performance figures is that SOD deals with only the time spent over the
connection between the servlet and the DBMS.
OET, however, includes the SOD plus the time spent by the servlets at
other tasks such as memory management, time spent in waiting for clients'
requests, etc. (i.e., NCRT). Each of
the NCRTs is the difference between the respective OET and SOD.
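The three figures follow directly from the recorded start-times and completion-times. The sketch below is one way to compute them; the class name, method names, and sample timestamps are illustrative assumptions, not the analysis program or data from the experiments.

```java
class Metrics {
    /** SOD: sum of the elapsed times of the individual connections. */
    static long sod(long[] starts, long[] completions) {
        long sum = 0;
        for (int i = 0; i < starts.length; i++) {
            sum += completions[i] - starts[i];
        }
        return sum;
    }

    /** OET: time from the start of the first connection to the completion of the last. */
    static long oet(long[] starts, long[] completions) {
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (int i = 0; i < starts.length; i++) {
            first = Math.min(first, starts[i]);
            last = Math.max(last, completions[i]);
        }
        return last - first;
    }

    /** NCRT: the difference between OET and SOD. */
    static long ncrt(long[] starts, long[] completions) {
        return oet(starts, completions) - sod(starts, completions);
    }

    public static void main(String[] args) {
        // Illustrative timestamps in ms (not taken from the experiments).
        long[] starts      = {0, 100, 250};
        long[] completions = {30, 135, 280};
        // SOD = 30 + 35 + 30 = 95, OET = 280 - 0 = 280, NCRT = 280 - 95 = 185
        System.out.println("SOD=" + sod(starts, completions)
                + " OET=" + oet(starts, completions)
                + " NCRT=" + ncrt(starts, completions));
    }
}
```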
number of clients | 2 | 2 | 4 | 4 | 10 | 10 | 15 | 15 |
requests | 20 | 100 | 20 | 100 | 20 | 100 | 20 | 100 |
exp# (sequential) | #1 | #3 | #5 | #7 | #9 | #11 | #13 | #15 |
SOD | 1,410 | 6,766 | 1,282 | 13,522 | 9,782 | 47,839 | 14,898 | 73,706 |
OET | 25,173 | 112,071 | 11,426 | 198,806 | 30,114 | 202,642 | 55,229 | 379,015 |
NCRT | 23,763 | 105,305 | 10,144 | 185,284 | 20,332 | 154,803 | 40,331 | 305,309 |
exp# (concurrent) | #2 | #4 | #6 | #8 | #10 | #12 | #14 | #16 |
SOD | 1,392 | 6,469 | 2,554 | 12,704 | 8,435 | 51,383 | 33,331 | 249,978 |
OET | 17,124 | 114,124 | 29,032 | 205,165 | 36,002 | 367,146 | 64,433 | 401,768 |
NCRT | 15,732 | 107,655 | 26,478 | 192,461 | 27,567 | 315,763 | 31,102 | 151,790 |
Table 2. Raw data obtained from the experiments.
Table
2 shows the raw data from the experiments.
The differences, in terms of SODs, NCRTs, and OETs, between the compatible pairs of experiments are
depicted in Figures 5, 6, and 7 respectively.
Compatible pairs of experiments are those with the same number of
clients and the same number of connection requests. The control parameter between a compatible pair of experiments is
the type of connection.
A. Connection-Related Time (SOD)
Figure 5. Comparison of connection time (SOD)
It
was observed from the raw data that each database connection took about 30 ms on average, given the simple SELECT query we
used. As shown in Figure 5, among
experiments with the same connection scheme but with different numbers of
clients, the SODs are basically proportional to the 'total number of
connections'. An exception is when the
number of clients is 15 and the connection scheme is concurrent (#14 and #16), where the connection time increased
significantly. We had noticed from the
collected data that some of the connections in the two experiments took
hundreds or even thousands of ms before completion. A plausible explanation is that, due to the large number of
concurrent channels between the servlet and the DBMS, the DBMS was not able to
service some of the requests in a timely manner, resulting in poor overall
quality of service.
Between
compatible pairs of experiments, the times spent over servlet-database
connection were quite comparable when the number of clients was 2, 4, or
10. When the number of clients
increased to 15, their respective performance became dramatically different,
due to the significant increase of overhead placed over the DBMS by the large
number of concurrent connections, as indicated earlier.
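The per-connection averages behind these observations can be recomputed from the SOD rows of Table 2 by dividing each SOD by the total number of connections (clients times requests per client). The sketch below does this; the class and method names are illustrative assumptions.

```java
class AverageConnectionTime {
    /** Average time per connection in ms: SOD divided by the total number of connections. */
    static double averageMs(long sod, int clients, int requestsPerClient) {
        return (double) sod / ((long) clients * requestsPerClient);
    }

    public static void main(String[] args) {
        int[] clients  = {2, 2, 4, 4, 10, 10, 15, 15};
        int[] requests = {20, 100, 20, 100, 20, 100, 20, 100};
        // SOD values (ms) copied from Table 2.
        long[] sequential = {1410, 6766, 1282, 13522, 9782, 47839, 14898, 73706};
        long[] concurrent = {1392, 6469, 2554, 12704, 8435, 51383, 33331, 249978};
        for (int i = 0; i < clients.length; i++) {
            System.out.printf("%2d clients x %3d requests: sequential %.1f ms, concurrent %.1f ms%n",
                    clients[i], requests[i],
                    averageMs(sequential[i], clients[i], requests[i]),
                    averageMs(concurrent[i], clients[i], requests[i]));
        }
    }
}
```

For the two concurrent experiments with 15 clients (#14 and #16) the averages come out above 100 ms per connection, while every other experiment stays at about 51 ms or less.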
B. NCRT and OET
While
SOD measures the time spent by the servlet(s) over database connections, NCRT
includes time spent by the servlet(s) in completing the processing of all the
user requests. These times include the
time incurred to the internal processing of the servlets, such as function
calls, memory management, etc., as well as time spent by the servlets when
waiting for the arrival of user requests.
Therefore, factors such as overhead placed upon the underlying
processors of the clients, the network delay, etc., would have some impact on
NCRT.
Figure 6. Comparison of Non-Connection-Related Time (NCRT)
Figure 7. Comparison of Overall Elapsed Time (OET)
As
depicted in Figure 6, significantly higher NCRTs were incurred by the sequential servlet when the number of
clients reached 15. This phenomenon, we believe,
was caused by the large number of user requests (1,500) that needed to be
scheduled by the servlet to share the only connection to the DBMS.
When
the number of clients was 15 and the number of requests per client was 100, the
NCRT of the concurrent servlet
(exp#16) dropped significantly. Our
explanation is that the other factors mentioned above (client processors, network
delay, etc.) contributed to this phenomenon.
Figure
7 shows the Overall Elapsed Time (OET) incurred by the two servlets. In both cases the sequential servlet outperformed the concurrent servlet.
Connection scheme | Advantage (+) | Disadvantage (-) |
sequential connection | nearly uniform overhead upon the DBMS | very high servlet overhead at high traffic |
concurrent connection | comparatively lower servlet overhead at high traffic | very high DBMS overhead at high traffic |
Table 3. Summary of the trade-offs of the two connection schemes.
An
important lesson learned from our earlier experiments was that, contrary to the
common belief in the superiority of concurrent processing over sequential
processing, the actual performance of concurrent computing depends on various
parameters in the distributed environment.
Table 3 summarizes the pros and cons of both connection schemes.
Based
on the strengths and weaknesses of the two connection schemes, we have made the
following observations:
· When the number of connection requests becomes large[1], a high performance database server is desirable when the concurrent scheme is employed by the servlet.
· Similarly, at high traffic, a high performance web server is desirable if the sequential scheme is employed by the servlet.
· When the database server at the back end is not powerful enough, a sequential servlet is desirable.
4 Measuring the Performance of Servlet vs. CGI-Script DBMS Connections
The
main purpose of the experiments is to compare the performance of Java servlets
vs CGI scripts with regard to database access over the Internet. In addition, we re-configured the experiment
parameters such that some of the findings from the earlier sets of experiments,
in which only servlets were used, may be verified. A major change to the parameters of these new sets of experiments
is the number of clients used. As in
the earlier experiments, for each number of clients, two request
loads were used: 20 and 100 requests per client. The sequential and concurrent
schemes remained part of the parameters.
The configuration of the system is as depicted in Figure 4, except that the connection module can be either Java servlets or CGI in the respective set of experiments. When Java servlets were used as the connection module, JRUN was used as the servlet engine. MySQL was used as the DBMS in both sets of experiments. CGI scripting was implemented using Perl 5, along with the Perl/MySQL driver module 1.2209. We used a Pentium II machine running RedHat Linux 6.0 as the server. The machine ran Apache as the web server. Figure 8 shows the hardware and software configuration used in these experiments.
Figure 8. System Configurations
In this experiment, the earlier trials using servlets to access a database were repeated, and the trials were extended by using CGI scripts as an alternative access mechanism. In the case of Servlets, both sequential and concurrent connections were made, as described in Section 3. In the case of CGI, database access requests were submitted without synchronization at the level of the CGI scripts. Although the main interest of comparison was between Servlets and CGI, these experiments were also intended to validate the relative performance between sequential and concurrent database access given by the earlier experiments, though no direct comparison is possible since the server platform was not the same.
Table 4 shows the configuration of the experiments with respect to the number of clients and the number of database requests per client. Each client consisted of a desktop PC running Netscape with a unique network connection. In the case of Servlets, the multiple requests were generated through server-side execution of multiple <servlet> tags in the requested document. In the case of CGI, the multiple requests were implemented by embedding the database operations (connect( ), prepare( ), execute( ), etc.) within a loop of a CGI script. For each experiment, the same three measures (SOD, OET, and NCRT) were collected or computed. Each individual experiment is assigned a unique experiment number.
Clients: | 2 | 2 | 4 | 4 | 10 | 10 | 15 | 15 | 20 | 20 |
Connections Per Client: | 20 | 100 | 20 | 100 | 20 | 100 | 20 | 100 | 20 | 100 |
Sequential: | #1 | #3 | #5 | #7 | #9 | #11 | #13 | #15 | #17 | #19 |
Concurrent: | #2 | #4 | #6 | #8 | #10 | #12 | #14 | #16 | #18 | #20 |
CGI: | #1c | #3c | #5c | #7c | #9c | #11c | #13c | #15c | #17c | #19c |
Table 4: Experiments Configuration
Table
5 shows the raw data from the experiments. The differences between the three
sets of experiments are shown in Figures 9 and 10. The Y-axis values are in milliseconds for the figures.
Sequential Servlet | #1 | #3 | #5 | #7 | #9 | #11 | #13 | #15 | #17 | #19 |
SOD | 1885 | 8854 | 3988 | 18205 | 9774 | 47699 | 14078 | 71740 | 19348 | 94459 |
OET | 4486 | 21176 | 9539 | 45457 | 22476 | 116914 | 34103 | 178934 | 45687 | 241582 |
NCRT | 2601 | 12322 | 5551 | 27252 | 12702 | 69215 | 20025 | 107194 | 26339 | 147123 |
Concurrent Servlet | #2 | #4 | #6 | #8 | #10 | #12 | #14 | #16 | #18 | #20 |
SOD | 2249 | 9663 | 4601 | 22407 | 11925 | 59673 | 16955 | 87863 | 24515 | 120499 |
OET | 5800 | 24707 | 10377 | 51419 | 27094 | 136861 | 81362 | 215367 | 55570 | 277005 |
NCRT | 3551 | 15044 | 5776 | 29012 | 15169 | 77188 | 64407 | 127504 | 31055 | 156506 |
CGI | #1 | #3 | #5 | #7 | #9 | #11 | #13 | #15 | #17 | #19 |
SOD | 135 | 483 | 367 | 1310 | 2500 | 5305 | 2053 | 5912 | 5989 | 29602 |
OET | 1369 | 23591 | 4924 | 22769 | 6356 | 25900 | 24997 | 47854 | 32874 | 65450 |
NCRT | 1234 | 23108 | 4557 | 21459 | 3856 | 20595 | 22944 | 41942 | 26885 | 35848 |
Table 5: Raw Data Collected from the Experiment
Figure 9 shows a comparison of SOD for sequential servlet vs. concurrent servlet vs. CGI access. In the case of servlets, concurrent access is slightly more expensive than sequential access. With 20 clients issuing 100 requests each, for example, the SODs are about 120 vs. 94 seconds, a difference of about 26 seconds. Though the difference varies over the experiments, concurrent access is consistently more expensive.
The difference between servlet and CGI performance was significant. With 20 clients issuing 100 requests each, for example, the SOD of the CGI script was 3 to 4 times lower than that of the servlets. CGI also outperformed the servlets as the number of clients increased, in both the 20 requests per client and the 100 requests per client cases.
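The figures quoted for the largest configuration can be checked directly against the SOD rows of Table 5. The short sketch below, with illustrative class and method names, computes the servlet-to-servlet difference and the servlet-to-CGI ratios for 20 clients issuing 100 requests each.

```java
class SodComparison {
    /** How many times larger the servlet SOD is than the CGI SOD. */
    static double ratio(long servletSod, long cgiSod) {
        return (double) servletSod / cgiSod;
    }

    public static void main(String[] args) {
        // SOD values (ms) from Table 5, 20 clients x 100 requests each.
        long sequential = 94459;
        long concurrent = 120499;
        long cgi = 29602;

        System.out.println("concurrent - sequential = " + (concurrent - sequential) + " ms");
        System.out.printf("sequential / CGI = %.2f%n", ratio(sequential, cgi));
        System.out.printf("concurrent / CGI = %.2f%n", ratio(concurrent, cgi));
    }
}
```

The difference is 26,040 ms, and the two ratios are roughly 3.2 and 4.1.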
Figure 9 - Comparison of SOD
Figure
10 shows a comparison of NCRT. Generally, NCRT for concurrent access is
slightly higher than that for sequential access. The data point for 15 clients
/ 20 connections appears to be an aberration, possibly caused by an unexpected
and sudden server load at the time of this trial. Subsequent trials across the entire range of parameters confirmed
this suspicion. In the case of NCRT,
CGI also outperforms both the sequential and the concurrent servlets. The difference became more significant in
the case of 100 requests per client when the number of clients increased.
Figure 10 - Comparison of NCRT
References
Friedrichs, J., Jubin, H. (1999) Java Thin-Client Programming for a Network Computing Environment, Prentice Hall.
Hunter, J., Crawford, W. (1998) JAVA Servlet Programming, O’Reilly & Associates, Inc., Sebastopol, CA.
Moss, K. (1999) Java Servlets With CDROM (2nd Ed.), McGraw-Hill Book Company.
Siple, M.D. (1997) The Complete Guide to Java Database Programming, McGraw-Hill Book Company.
Yang, A., Kim, J. (1999) Performance Metering of Distributed Access Using Java Servlets, Proceedings of the ADBIS'99 Conference (Advances in Databases and Information Systems), University of Maribor, Slovenia.
Yang, A., Linn, J., Quadrato, D. (1998) Developing Integrated Web and Database Applications Using JAVA Applets and JDBC Drivers, Proceedings of the 29th ACM SIGCSE Technical Symposium, Atlanta, GA.
[1] The actual threshold would depend on the power of the database server. In our experiments, when the total number of connection requests reached 1,500, the underlying Access database showed signs of deteriorated quality of service.