CSCI 5333 Data Base Management
System
Spring 2002 (1/14-5/6)
Chapter 5: Storage Structures
5.1 Basic Concepts
- Physical level issues: storage of data in the secondary
storage devices; efficient access of physical data; auxiliary
data structure for the secondary storage - indexes; access methods
- A storage hierarchy of mixed storage media
- Primary storage: main memory, cache memory
- Secondary storage: magnetic disks, optical disks, and
tapes.
- Volatile versus nonvolatile storage
- On-line versus off-line devices
- Exercise: Compare the tradeoffs of
primary storage versus secondary storage media.
- The process of physical database design involves
choosing from among the options the particular data organization techniques
that best suit the given application requirements.
- DBMS system implementers must study data organization
techniques so that they can implement them efficiently and thus provide
the DBA and users of the DBMS with sufficient options.
- Primary file organizations determine how the
records of a file are physically placed on the disk, and hence how the records
can be accessed.
- A heap file (or unordered file) places
the records on disk in no particular order by appending new records at
the end of the file.
- A sorted file (or sequential file
) keeps the records ordered by the value of a particular field (called
the sort key).
- A hashed file uses a hash function applied
to a particular field (called the hash key) to determine a record’s placement
on disk.
- Other primary file organizations, such as B-trees,
R-trees, Quad-trees, etc., use tree structures.
NOTE: The terminology may carry
different meaning in different context. Sequential files
in COBOL , for example, are unordered files.
- A secondary organization or auxiliary access
structure allows efficient access to the records of a file based on
alternate fields than those that have been used for the primary file organization.
Most of these exist as indexes. (See Chapter 6)
- Exercise: Given a relation called
STUDENT, as shown on page 128, what would be an appropriate
primary file organization to store the records? If
you choose to store the file as a sorted file, wich attribute
should be the sort key? Justify your answer.
- Exercise (Cont.): What indexes would you like
to build on top of the primary organization?
- Exercise (Cont.): Is there a universal
rule in making the above decisions?
Go to the
Index
5.2.1 Hardware Description of Disk Devices
- Bits, Bytes (Characters), KB, MB, GB, ...
- See Figures 13.1 and 13.2: tracks, surfaces, cylinders,
sectors, disk read/write head, disk rotation
- Compare: Disk blocks (or pages) versus disk
sectors
- Question: What is the operating system's
role in disk initializaiton (or formatting)?
- Question: How is the hardware
address of a block determined?
- Question: How is the
disk buffer (or cluster) used during disk read/write operations, respectively?
- Exercise: Where are the following
devices or software located: disk controller, disk driver, and disk
manager ? Explain their respective functionality.
- Important disk parameters: See Table 13.1
and Appendix C.
Rotation (or spindle) speed, rotation
delay, latency, seek time, block transfer time, bulk transfer
rate
- Question: Explain why "locating
data on disk is a major bottleneck in database applications"?
- Question: Do you agree with
the following statement on page 121: "The file structures we discuss here
and in Chapter 6 attempt to minimize the number of block
transfers needed to locate and transfer the required data
from disk to main memory" ? Justify your answer.
- Exercise: Answer the questions in exercise
13.23.
5.4 Buffering of Blocks
- Concurrent operations achieved by interleaved
operations (versus by parallel operations): See Figure 5.5,
p.127
- Double Buffering: See Figure 5.6, p.128
Ø
Go to the
Index
5.5 Placing File Records on Disk
- Fixed-Length versus Variable-Length Records:
See Fig. 5.7, p.130 for examples
- Question: What are the reasons a file
may be made up of variable-length records? See page 129.
- Compare: sector, block,
record
- Blocking factor (bfr) for a
file: The number of records that can fit into a block.
B: the block size
R: the record size
B >= R
bfr = floor (B/R)
Unused space in the block = B - (bfr * R) bytes
- Unspanned versus Spanned
Record Organization in a Block: See Fig. 5.8, page 132
- For variable-length
records using spanned organization, each
block may store a different number of records.
- The number of blocks b needed for
a file of r records:
b = ceiling (r/bfr) blocks
, bfr represents the average number of records per block
- Question: How do you
determine the number of blocks needed in a file using unspanned
organization?
- 5.5.4 Alternative Methods in Allocating
File Blocks on Disk
- Contiguous allocation
- Linked allocation
- Clustered allocation
- Indexed allocation
- Exercise: Draw
diagrams to illustrate the different allocation methods.
- Question: Which program is in
charge of maintaining the file headers (also called file descriptors
)?
- Question: Wh
at are the data stored in the file headers?
- The goal of a good file organization
is to locate the block that contains a desired record with a minimal
number of block transfers.
5.6 Typical Operations on Files
- Retrieval versus Update operations
- Exercise: Draw a UML diagram
to illustrate the relationships among the following concepts -
files, records, disks, blocks, memory, disk buffers, programs, and program
variables.
- Exercise: Examine
each of the operations listed on pp.133-134 and explain how each of the
operations is carried out by using the UML diagram you create from the exercise
above.
- Record-at-a-time operations versus
set-at-a-time operations
- Compare: a file
organization versus an access method
- Static versus
dynamic files
Ø
Go to the
Index
- Algorithm 5.1 (p.138): Binary search
on a sequential file
- Algorithm 5.2 (p.141): Internal
Hashing algorithms
- Questions: What is collision
in hashing? How are collision handled? (See
p.141 )
- Question: In external
hashing, what is the minimum number of disk retrieval operations
? How about the maximum number of disk retrieval operations?
See Fig. 5.11 and 5.12 (pp.143-144).
- Exercise: Compare the three
primary file organization methods by listing their respective characteristics,
strength, and weakness.
5.7 Files of Unordered Records (Heap Files)
5.8 Files of Ordered Records (Sorted Files)
5.9 (External) Hashing Techniques
Ø
Go to the
Index
|
Main Page
Biography
Teaching
Research
Services
Other Links
Last updated:
4/02
|