Performance Comparison of Database Access over the Internet

- Java Servlets vs CGI

   T. Andrew Yang                                         Ralph F. Grove

         yang@grove.iup.edu                                   rfgrove@computer.org

 

Indiana University of Pennsylvania, Computer Science Department

Stright 319, IUP, Indiana, PA 15705, USA

FAX#: (724) 357-2724

 

Corresponding Author: T. Andrew Yang

 

Keywords: Performance Comparison, Web Server, Web-Database Connectivity, Servlet, CGI

 

Abstract

Our recent work on database access using Java servlets (see Yang and Kim 1999) focused on the performance metering of sequential versus concurrent connection schemes between the web server and the database server.  In this paper, we extend that work by comparing the performance of database access through servlets and through CGI scripts in the Internet environment.  To ensure a fair comparison, all the parameters in both sets of experiments are identical, except for the connectivity mechanism between the web server and the database server.  The first section of this draft paper introduces the 3-tier WWW model and its integration with Java servlets or CGI to enable database connectivity.  It is followed by a discussion of the servlets that we developed to experiment with distributed data access, and of the two types of servlet-database connection schemes (sequential vs concurrent).  The findings from the earlier performance metering experiments using Java servlets are then summarized.  The following section describes the configuration of the performance comparison experiments using servlets and CGI.  The paper concludes with an analysis of the experiments comparing the performance of servlets and CGI.

1   Introduction

With the increasing popularity of the Internet, especially the World Wide Web (WWW), transparent access to information stored on multiple database servers has become a desirable feature.  It is the responsibility of web developers to design access to data that may reside on multiple database servers across the network.  CGI allows a web developer to write scripts that answer user requests and access database servers.  The Java Servlet API, which was introduced by JavaSoft in 1997 and included in its Java Development Kit version 1.1 and above, has been considered one of the most promising alternatives to CGI for server-side development.

In the past, we explored the integration of Java applets and JDBC (Java Database Connectivity) for access to database servers on the WWW (see Yang et al 1998).  When JDBC is integrated with servlets, a 3-tier client/server model is formed, with the servlet-enabled web server as the middle tier and the database servers at the back end.  Our recent work on database access using Java servlets (see Yang and Kim 1999) focused on the performance metering of sequential versus concurrent connection schemes between the web server and the database server.  In this paper, we extend that work by comparing the performance of web-server/database-server connectivity using servlets and CGI scripts, respectively, in the Internet environment.

 


2   The 3-Tier Client/Server Model

Figure 1. The Applet Mechanism.                                 Figure 2. The Servlet Mechanism.

A servlet is the server-side equivalent of an applet.  While an applet is a piece of Java code that is transmitted from a web server to a client and then loaded by the client to answer user requests, a servlet is a piece of Java code that is loaded by the web server when triggered by a user request.  The different mechanisms underlying applet and servlet technology are illustrated in Figure 1 and Figure 2, respectively.

2.1     Servlets and Databases

When JDBC is used in a servlet, a three-tier application is created.  The three-tier computing model is illustrated in Figure 3.

The first tier of such an application could use any number of Java-enabled browsers.  It uses either an applet or an HTML form for user input, and it receives and displays the result of the database query returned from the 2nd tier (the web server).


The second tier is implemented with a web server and Java servlets that encapsulate the specific logic of the application at hand.  The Java servlet accesses the database and returns an HTML page listing the data (see Hunter and Crawford 1998, Moss 1999).

Figure 3. A Three-Tier Client/Server Model.

The third tier consists of databases managed by a database management system.  The servlets running as part of the second tier interact with this DBMS to indirectly retrieve and/or update the databases.  The answer returned from the DBMS is sent to the servlet, which then forwards it to the web browser as an HTML page.

2.2     CGI and Database Connectivity

The mechanism underlying CGI (Common Gateway Interface) is similar to that of Java servlets.  As the more established method, CGI has been widely used in WWW applications to provide on-line database connectivity.  Perl has been the dominant scripting language for CGI, although other languages can also be used.

The main difference between CGI and Java servlets, when used as the connectivity mechanism between a web server and a database server, lies in how they are activated.  A CGI script is activated by the web server each time a request for the script arrives.  A servlet, in contrast, remains alive once it has been activated.  We are interested in the impact of this difference on the performance of database access and on the overall throughput of the web server.
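The difference in activation can be pictured with a small simulation.  The sketch below is only an illustration under stated assumptions: ActivationModel and Handler are hypothetical names, not part of the Servlet API or of any CGI library, and the code merely counts how often start-up work would be performed.  It says nothing about how expensive that work is on a real server.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the two activation styles; not real servlet or CGI code.
class ActivationModel {
    // Counts how many times start-up work (loading, initialization) is performed.
    static final AtomicInteger startups = new AtomicInteger();

    static class Handler {
        Handler() { startups.incrementAndGet(); }   // start-up work happens here
        String handle(int requestId) { return "response-" + requestId; }
    }

    // CGI style: a fresh handler is started for every incoming request.
    static int cgiStyle(int requests) {
        startups.set(0);
        for (int i = 0; i < requests; i++) {
            new Handler().handle(i);
        }
        return startups.get();
    }

    // Servlet style: the handler is started once and stays alive.
    static int servletStyle(int requests) {
        startups.set(0);
        Handler handler = new Handler();
        for (int i = 0; i < requests; i++) {
            handler.handle(i);
        }
        return startups.get();
    }

    public static void main(String[] args) {
        System.out.println("CGI-style start-ups:     " + cgiStyle(100));
        System.out.println("Servlet-style start-ups: " + servletStyle(100));
    }
}
```

For 100 requests, the CGI-style loop performs 100 start-ups and the servlet-style loop performs one.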

 

3          Measuring the Performance of Servlet-DBMS Connections

Various types of connections between the servlet and the database server have been proposed.  Two kinds of servlet-DBMS connections, for instance, are described by Hunter and Crawford (1998): one is a servlet using a pool of connections to the database, and the other is a pool of servlets simultaneously connecting to a database.

In our earlier experiments (see Yang and Kim 1999), we focused on the performance comparison of two types of servlet-DBMS connection schemes. In the sequential connection scheme, the servlet creates a connection (in the init( ) method) to the database server the first time the servlet is invoked.  The subsequent data access queries sent to the servlet are forwarded to the DBMS via the same connection.  The requests are sequentially synchronized and processed in a first-come-first-served manner.

In the concurrent connection scheme, each time the servlet is invoked it creates a new connection to the database server (in the service( ) method).  These connections are handled as concurrent processes in the system.  Presumably these concurrent processes can be executed by the system simultaneously and overlapping of execution time between these processes is possible. 
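The two schemes can be sketched in plain Java as follows.  This is a minimal model, not the servlets used in the experiments: FakeConnection, SequentialHandler, ConcurrentHandler and ConnectionSchemes are hypothetical names, the connection class is a stand-in for a JDBC connection, and the query text is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a database connection (a real servlet would use JDBC).
class FakeConnection {
    static final AtomicInteger opened = new AtomicInteger();
    FakeConnection() { opened.incrementAndGet(); }
    String query(String sql) { return "rows for: " + sql; }
}

// Sequential scheme: one connection, created once, shared by all requests.
class SequentialHandler {
    private FakeConnection shared;
    void init() { shared = new FakeConnection(); }          // plays the role of init( )
    synchronized String service(String sql) {               // requests are serialized
        return shared.query(sql);
    }
}

// Concurrent scheme: every request opens its own connection.
class ConcurrentHandler {
    String service(String sql) {                            // plays the role of service( )
        FakeConnection own = new FakeConnection();
        return own.query(sql);
    }
}

class ConnectionSchemes {
    // Issues `requests` queries on separate threads; returns the number of connections opened.
    static int run(boolean sequential, int requests) {
        FakeConnection.opened.set(0);
        SequentialHandler seq = new SequentialHandler();
        ConcurrentHandler con = new ConcurrentHandler();
        if (sequential) seq.init();
        Thread[] threads = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            threads[i] = new Thread(() -> {
                String sql = "SELECT * FROM authors";       // assumed query text
                if (sequential) seq.service(sql); else con.service(sql);
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for requests", e);
            }
        }
        return FakeConnection.opened.get();
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + run(true, 40) + " connection(s)");
        System.out.println("concurrent: " + run(false, 40) + " connection(s)");
    }
}
```

In this model the sequential handler funnels every request through a single synchronized method, while the concurrent handler shifts the load to the database side by opening one connection per request.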

Our initial hypothesis with regard to the performance of these two types of servlet-DBMS connections was that the concurrent version would outperform the sequential version.  The hypothesis was based on the assumption that concurrent processing of the connections would complete the queries earlier than sequential processing of the same queries.  The results from the experiments turned out to be more interesting than our initial hypothesis suggested.

 

3.1      Parameters of the Experiments

Figure 4 illustrates the experimental setting used in this project to measure the performance of servlet-database connectivity.  A Microsoft Access database local to the web server represents the database component of the experimental system.  To eliminate network overhead from the performance figures, we did not include a remote database server in the experiments.  For the purpose of this experiment we used a single table named authors, which defines 9 columns, beginning with an Author ID Number as the primary key.  First name, last name, phone, address, city, state, zip code and contract status fill the rest of the table.  There are nine tuples in the table.

The servlets used Java’s JDBC API for database access.  JDBC is the standard Java API for issuing SQL statements to a database (Friedrichs and Jubin 1999, Siple 1997, Yang et al 1998).  It enables a Java program to maintain database connections and manipulate the data stored in the database via the connections.  Figure 4 also shows the sequence of events that would occur given a user request.  Each of the events is labeled with its order in the sequence.

 

Figure 4. The Configuration of Experiments Measuring Servlet-DBMS Connections.

 

We designed performance-metering tools using Java and JavaScript to test the two types of servlets.  The experimenter enters, in a text field of an HTML form, the number of connection requests to be made from that particular client.  When the 'Execute' button is pressed, a JavaScript routine sends that many connection requests to the underlying servlet, which forwards the requests to the DBMS using either the sequential or the concurrent connection scheme.

Time-stamping was used as the measuring method.  The servlet first recorded the system time (the start-time) before submitting the query to the DBMS.  When the query returned, the servlet recorded the system time again (the completion-time) and saved the start-time, the completion-time, and the elapsed time into a data file.  Once an experiment was completed, the data files were fed into an analysis program.  The program calculated the sum of the elapsed times of the individual queries (SOD), as well as the overall elapsed time between the start of the first query and the completion of the last query in that experiment (OET).
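The time-stamping and the two aggregate figures can be sketched as follows.  This is a minimal illustration rather than the authors' analysis program: QueryTiming and Sample are hypothetical names, and the three sample records in main are made-up values chosen only to show the arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

class QueryTiming {
    // One record per query: start-time and completion-time in ms.
    static final class Sample {
        final long start;
        final long completion;
        Sample(long start, long completion) {
            this.start = start;
            this.completion = completion;
        }
        long elapsed() { return completion - start; }
    }

    // Times one query by reading the system clock before and after it runs.
    static Sample time(Runnable query) {
        long start = System.currentTimeMillis();
        query.run();
        long completion = System.currentTimeMillis();
        return new Sample(start, completion);
    }

    // SOD: sum of the elapsed times of the individual queries.
    static long sod(List<Sample> samples) {
        long sum = 0;
        for (Sample s : samples) sum += s.elapsed();
        return sum;
    }

    // OET: from the start of the first query to the completion of the last one.
    static long oet(List<Sample> samples) {
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (Sample s : samples) {
            first = Math.min(first, s.start);
            last = Math.max(last, s.completion);
        }
        return last - first;
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample(0, 30));      // illustrative values only
        samples.add(new Sample(100, 135));
        samples.add(new Sample(250, 280));
        System.out.println("SOD = " + sod(samples) + " ms");   // 30 + 35 + 30 = 95
        System.out.println("OET = " + oet(samples) + " ms");   // 280 - 0 = 280
    }
}
```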

Type of       Connection requests per client (experiment number)
connection    2 clients             4 clients             10 clients            15 clients
Sequential    20 (#1)    100 (#3)   20 (#5)    100 (#7)   20 (#9)    100 (#11)  20 (#13)   100 (#15)
Concurrent    20 (#2)    100 (#4)   20 (#6)    100 (#8)   20 (#10)   100 (#12)  20 (#14)   100 (#16)

Table 1. Parameter Settings of the Experiments.

Table 1 shows the configurations of the experiments.  For each version of the servlets, four configurations of clients were used: 2, 4, 10 and 15 clients.  For each configuration of clients, two different numbers of connection requests per client were used: 20 and 100 connection requests.  The complete set of experiments thus contained 16 individual experiments.
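The experiment grid of Table 1 can be enumerated mechanically.  The sketch below is an illustration only: ExperimentGrid and its method names are hypothetical, and the numbering formula is simply read off the layout of Table 1 (odd numbers are sequential, even numbers are concurrent).

```java
class ExperimentGrid {
    static final int[] CLIENTS = {2, 4, 10, 15};
    static final int[] REQUESTS_PER_CLIENT = {20, 100};
    static final String[] SCHEMES = {"sequential", "concurrent"};

    // Experiment number following the layout of Table 1.
    static int number(int clientIdx, int requestIdx, int schemeIdx) {
        return 4 * clientIdx + 2 * requestIdx + schemeIdx + 1;
    }

    // Total number of connection requests issued in one experiment.
    static int totalRequests(int clientIdx, int requestIdx) {
        return CLIENTS[clientIdx] * REQUESTS_PER_CLIENT[requestIdx];
    }

    public static void main(String[] args) {
        for (int c = 0; c < CLIENTS.length; c++) {
            for (int r = 0; r < REQUESTS_PER_CLIENT.length; r++) {
                for (int s = 0; s < SCHEMES.length; s++) {
                    System.out.printf("#%d: %s, %d clients x %d requests = %d total%n",
                            number(c, r, s), SCHEMES[s], CLIENTS[c],
                            REQUESTS_PER_CLIENT[r], totalRequests(c, r));
                }
            }
        }
    }
}
```

The largest configuration, 15 clients with 100 requests each, issues 1,500 connection requests in total.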

Three performance figures (in ms) were employed in comparing the performance of the sequential and concurrent connection schemes: Sum of Difference (SOD), Overall Elapsed Time (OET), and Non-Connection-Related Time (NCRT).  SOD is the sum of the elapsed times of all the individual connections in a particular experiment.  OET is the elapsed time between the beginning of the first connection and the completion of the last connection in a particular experiment.

The major difference between SOD and OET is that SOD covers only the time spent over the connection between the servlet and the DBMS.  OET, however, includes the SOD plus the time spent by the servlets on other tasks, such as memory management and waiting for clients' requests (i.e., NCRT).  Each NCRT is the difference between the respective OET and SOD.
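As a worked instance of this definition, using the values reported for experiment #1 in Table 2:

```latex
\mathrm{NCRT} = \mathrm{OET} - \mathrm{SOD}, \qquad
\mathrm{NCRT}_{\#1} = 25{,}173\ \mathrm{ms} - 1{,}410\ \mathrm{ms} = 23{,}763\ \mathrm{ms}
```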


3.2      Analysis of the Experiments

Number of clients       2                   4                   10                  15
Requests per client     20        100       20        100       20        100       20        100

exp# (sequential)       #1        #3        #5        #7        #9        #11       #13       #15
SOD (ms)                1,410     6,766     1,282     13,522    9,782     47,839    14,898    73,706
OET (ms)                25,173    112,071   11,426    198,806   30,114    202,642   55,229    379,015
NCRT (ms)               23,763    105,305   10,144    185,284   20,332    154,803   40,331    305,309

exp# (concurrent)       #2        #4        #6        #8        #10       #12       #14       #16
SOD (ms)                1,392     6,469     2,554     12,704    8,435     51,383    33,331    249,978
OET (ms)                17,124    114,124   29,032    205,165   36,002    367,146   64,433    401,768
NCRT (ms)               15,732    107,655   26,478    192,461   27,567    315,763   31,102    151,790

Table 2.  Raw data obtained from the experiments.

 

Table 2 shows the raw data from the experiments.  The differences, in terms of SODs, NCRTs, and OETs, between the compatible pairs of experiments are depicted in Figures 5, 6, and 7 respectively.  Compatible pairs of experiments are those with the same number of clients and the same number of connection requests.  The control parameter between a compatible pair of experiments is the type of connection.

 

A.     Connection-Related Time (SOD)

Figure 5. Comparison of connection time (SOD)


It was observed from the raw data that each database connection took on average about 30 ms, given the simple SELECT query we used.  As shown in Figure 5, among experiments with the same connection scheme but different numbers of clients, the SODs are roughly proportional to the total number of connections.  An exception occurs when the number of clients is 15 and the connection scheme is concurrent (#14 and #16), where the connection time increased significantly.  The collected data show that some of the connections in these two experiments took hundreds or even thousands of ms to complete.  A plausible explanation is that, due to the large number of concurrent channels between the servlet and the DBMS, the DBMS was not able to service some of the requests in a timely manner, resulting in poor overall quality of service.

Between compatible pairs of experiments, the times spent over the servlet-database connection were comparable when the number of clients was 2, 4, or 10.  When the number of clients increased to 15, the performance of the two schemes diverged dramatically, due to the significant increase in overhead placed on the DBMS by the large number of concurrent connections, as indicated earlier.
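The per-connection averages behind this comparison can be recomputed from the SOD values in Table 2.  The sketch below is an illustration only: PerConnectionTime is a hypothetical name, and the inputs are the SOD values and client/request counts of experiments #1, #15 and #16.

```java
class PerConnectionTime {
    // Average connection time in ms: SOD divided by the total number of connections.
    static double average(long sodMs, int clients, int requestsPerClient) {
        return (double) sodMs / (clients * requestsPerClient);
    }

    public static void main(String[] args) {
        // SOD values taken from Table 2.
        System.out.printf("#1  (sequential, 2 x 20):   %.2f ms%n", average(1410, 2, 20));
        System.out.printf("#15 (sequential, 15 x 100): %.2f ms%n", average(73706, 15, 100));
        System.out.printf("#16 (concurrent, 15 x 100): %.2f ms%n", average(249978, 15, 100));
    }
}
```

With these inputs, the concurrent experiment #16 averages more than three times the per-connection time of its sequential counterpart #15, which is the divergence at 15 clients described above.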

 

B.     NCRT and OET

While SOD measures the time spent by the servlet(s) over database connections, NCRT covers the remaining time spent by the servlet(s) in completing the processing of all the user requests.  This includes the time consumed by the internal processing of the servlets, such as function calls and memory management, as well as the time the servlets spend waiting for the arrival of user requests.  Therefore, factors such as the load on the clients' processors and network delay have some impact on NCRT.

Figure 6. Comparison of Non-Connection-Related Time (NCRT)


Figure 7. Comparison of Overall Elapsed Time (OET)

As depicted in Figure 6, significantly higher NCRTs were incurred by the sequential servlet when the number of clients reached 15.  This phenomenon, we believe, was caused by the large number of user requests (1500) that needed to be scheduled by the servlet to share the only connection to the DBMS.

When the number of clients was 15 and the number of requests per client was 100, the NCRT of the concurrent servlet (exp#16) dropped significantly.  Our explanation is that the other factors mentioned above (client processors, network delay, etc.) contributed to this phenomenon.

Figure 7 shows the Overall Elapsed Time (OET) incurred by the two servlets.  With the exception of the smallest configuration (#1 vs. #2), the sequential servlet had the lower OET in every compatible pair in Table 2.

3.3      Lessons Learned from the Earlier Experiments

Sequential connection
    + nearly uniform overhead upon the DBMS
    - very high servlet overhead at high traffic

Concurrent connection
    + comparatively lower servlet overhead at high traffic
    - very high DBMS overhead at high traffic

Table 3.  Summary of the trade-offs of the two connection schemes.

An important lesson learned from our earlier experiments was that, contrary to the common belief in the superiority of concurrent processing over sequential processing, the actual performance of concurrent computing depends on various parameters in the distributed environment.  Table 3 summarizes the pros and cons of both connection schemes.

Based on the strengths and weaknesses of the two connection schemes, we make the following observations:

· When the number of connection requests becomes large[1], a high-performance database server is desirable if the concurrent scheme is employed by the servlet.

· Similarly, at high traffic, a high-performance web server is desirable if the sequential scheme is employed by the servlet.

· When the database server at the back end is not powerful enough, a sequential servlet is desirable.

 

4          Measuring the Performance of Servlet vs. CGI-Script DBMS Connections

The main purpose of these experiments is to compare the performance of Java servlets and CGI scripts with regard to database access over the Internet.  In addition, we re-configured the experiment parameters so that some of the findings from the earlier set of experiments, in which only servlets were used, could be verified.  A major change to the parameters of the new experiments is the number of clients, which now ranges up to 20 (see Table 4).  As in the earlier experiments, two request volumes were used for each number of clients: 20 and 100 requests per client.  The sequential and concurrent schemes remained part of the parameters.

4.1      Configuration of the Experiments

The configuration of the system is as depicted in Figure 4, except that the connection module can be either Java servlets or CGI in the respective set of experiments.  When Java servlets were used as the connection module, JRun was used as the servlet engine.  MySQL was used as the DBMS in both sets of experiments.  CGI scripting was implemented using Perl 5, along with the Perl/MySQL driver module 1.2209.  The server was a Pentium II machine running Red Hat Linux 6.0, with Apache as the web server.  Figure 8 shows the hardware and software configuration used in these experiments.

Figure 8.  System Configurations

In this experiment, the earlier trials using servlets to access a database were repeated, and the trials were extended by using CGI scripts as an alternative access mechanism.  In the case of servlets, both sequential and concurrent connections were made, as described in Section 3.  In the case of CGI, database access requests were submitted without synchronization at the level of the CGI scripts.  Although the main interest was the comparison between servlets and CGI, these experiments were also intended to validate the relative performance of sequential and concurrent database access observed in the earlier experiments, though no direct comparison is possible since the server platform was not the same.

Table 4 shows the configuration of the experiments with respect to the number of clients and the number of database requests per client.  Each client consisted of a desktop PC running Netscape with a unique network connection.  In the case of servlets, the multiple requests were generated through server-side execution of multiple <servlet> tags in the requested document.  In the case of CGI, the multiple requests were implemented by embedding the database operations (connect( ), prepare( ), execute( ), etc.) within a loop of a CGI script.  For each experiment, the same three measures (SOD, OET, and NCRT) were collected or computed.  Each individual experiment is assigned a unique experiment number.

 

Clients:                 2           4           10          15          20
Connections per client:  20    100   20    100   20    100   20    100   20    100
Sequential:              #1    #3    #5    #7    #9    #11   #13   #15   #17   #19
Concurrent:              #2    #4    #6    #8    #10   #12   #14   #16   #18   #20
CGI:                     #1c   #3c   #5c   #7c   #9c   #11c  #13c  #15c  #17c  #19c

Table 4: Experiments Configuration

4.2      Analysis of the Experiments

Table 5 shows the raw data from the experiments. The differences between the three sets of experiments are shown in Figures 9 and 10.  The Y-axis values are in milliseconds for the figures.

 

 

 

Sequential Servlet   #1       #3       #5       #7       #9       #11      #13      #15      #17      #19
SOD (ms)             1885     8854     3988     18205    9774     47699    14078    71740    19348    94459
OET (ms)             4486     21176    9539     45457    22476    116914   34103    178934   45687    241582
NCRT (ms)            2601     12322    5551     27252    12702    69215    20025    107194   26339    147123

Concurrent Servlet   #2       #4       #6       #8       #10      #12      #14      #16      #18      #20
SOD (ms)             2249     9663     4601     22407    11925    59673    16955    87863    24515    120499
OET (ms)             5800     24707    10377    51419    27094    136861   81362    215367   55570    277005
NCRT (ms)            3551     15044    5776     29012    15169    77188    64407    127504   31055    156506

CGI                  #1       #3       #5       #7       #9       #11      #13      #15      #17      #19
SOD (ms)             135      483      367      1310     2500     5305     2053     5912     5989     29602
OET (ms)             1369     23591    4924     22769    6356     25900    24997    47854    32874    65450
NCRT (ms)            1234     23108    4557     21459    3856     20595    22944    41942    26885    35848

Table 5: Raw Data Collected from the Experiment

Figure 9 shows a comparison of SOD for the sequential servlet, the concurrent servlet, and CGI access.  In the case of servlets, concurrent access is somewhat more expensive than sequential access.  With 20 clients issuing 100 requests each, for example, the SODs are about 120 vs. 94 seconds, a difference of roughly 26 seconds.  Though the difference varies over the experiments, concurrent access is consistently more expensive.

The difference between servlet and CGI performance was significant.  With 20 clients issuing 100 requests each, for example, the SOD of the servlets was 3 to 4 times that of the CGI script.  CGI also outperformed the servlets as the number of clients increased, in both the 20 requests per client and the 100 requests per client cases.
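The "3 to 4 times" figure follows directly from the SOD values in Table 5 for 20 clients issuing 100 requests each.  The sketch below only reproduces that arithmetic; SodRatio is a hypothetical name.

```java
class SodRatio {
    // Ratio of a servlet SOD to the CGI SOD for the same configuration.
    static double ratio(long servletSodMs, long cgiSodMs) {
        return (double) servletSodMs / cgiSodMs;
    }

    public static void main(String[] args) {
        long sequential = 94459;   // experiment #19 in Table 5
        long concurrent = 120499;  // experiment #20 in Table 5
        long cgi = 29602;          // CGI, 20 clients x 100 requests, in Table 5
        System.out.printf("sequential / CGI = %.2f%n", ratio(sequential, cgi));
        System.out.printf("concurrent / CGI = %.2f%n", ratio(concurrent, cgi));
    }
}
```

The sequential ratio comes out slightly above 3 and the concurrent ratio slightly above 4.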

Figure 9 - Comparison of SOD

 

Figure 10 shows a comparison of NCRT.  Generally, NCRT for concurrent access is slightly higher than that for sequential access.  The concurrent-servlet data point for 15 clients / 20 connections (#14) appears to be an aberration, possibly caused by an unexpected and sudden server load at the time of this trial.  Subsequent trials across the entire range of parameters confirmed this suspicion.  In the case of NCRT, CGI also generally outperforms both the sequential and the concurrent servlets.  The difference becomes more significant in the case of 100 requests per client as the number of clients increases.

Figure 10 - Comparison of NCRT

References

Friedrichs, J., Jubin, H. (1999) Java Thin-Client Programming for a Network Computing Environment, Prentice Hall.

 

Hunter, J., Crawford, W. (1998) JAVA Servlet Programming, O’Reilly & Associates, Inc., Sebastopol, CA.

 

Moss, K. (1999) Java Servlets With CDROM (2nd Ed.), McGraw-Hill Book Company.

 

Siple, M.D. (1997) The Complete Guide to Java Database Programming, McGraw-Hill Book Company.

 

Yang, A., Kim, J. (1999) Performance Metering of Distributed Access Using Java Servlets, Proceedings of the ADBIS'99 Conference (Advances in Databases and Information Systems), University of Maribor, Slovenia.

 

Yang, A., Linn, J., Quadrato, D. (1998) Developing Integrated Web and Database Applications Using JAVA Applets and JDBC Drivers, Proceedings of the 29th ACM SIGCSE  Technical Symposium, Atlanta, GA.

 



[1] The actual threshold depends on the power of the database server.  In our experiments, when the total number of connection requests reached 1,500, the underlying Access database showed signs of degraded quality of service.