Introduction to Data Structures: Basic Principles
Basic Principles
Data structures are generally based on the ability of a computer to fetch and store data
at any place in its memory, specified by an address, a bit string that can itself be
stored in memory and manipulated by the program. Thus the record and array data
structures are based on computing the addresses of data items with arithmetic
operations, while the linked data structures are based on storing addresses of data
items within the structure itself. Many data structures use both principles, sometimes
combined in non-trivial ways (as in XOR linking).
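The two principles can be illustrated with a minimal C sketch. The record type `struct item` and the helper names `weight_field` and `related_item` are hypothetical, introduced here only for illustration. The first helper computes a field's address arithmetically from the record's base address; the second reads an address that is stored inside the record.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, NULL */

/* Hypothetical record type, used only for illustration. */
struct item {
    int id;
    double weight;
    struct item *related;  /* a stored address: the linking principle */
};

/* Record principle: a field's address is the record's base address
 * plus a fixed offset known at compile time. */
double *weight_field(struct item *it)
{
    assert(it != NULL);
    return (double *)((char *)it + offsetof(struct item, weight));
}

/* Linked principle: the address of the related item is not computed,
 * it is read from memory. */
struct item *related_item(const struct item *it)
{
    assert(it != NULL);
    return it->related;
}
```

In ordinary code one would simply write `&it->weight` and `it->related`; the explicit arithmetic only spells out what the compiler typically does behind the first expression.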
[Figure: a linked list whose nodes contain two fields: an integer value and a link to
the next node.]
Linked lists are among the simplest and most common data structures; they provide
an easy implementation for several important abstract data types,
including stacks, queues, associative arrays, and symbolic expressions.
The principal benefit of a linked list over a conventional array is that the order of the
linked items may differ from the order in which the data items are stored in memory
or on disk. For that reason, linked lists allow nodes to be inserted or removed at any
point in the list with a constant number of operations, provided a link to the
neighboring node is already at hand.
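A minimal sketch of constant-time insertion and removal in a singly-linked list follows. The names `struct node`, `insert_after`, and `remove_after` are assumptions chosen for this example, and both functions assume the caller already holds a pointer to the node before the point of change.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

/* A node with two fields: an integer value and a link to the next node. */
struct node {
    int value;
    struct node *next;
};

/* Splice new_node in right after prev: two pointer writes,
 * regardless of how long the list is. */
void insert_after(struct node *prev, struct node *new_node)
{
    assert(prev != NULL && new_node != NULL);
    new_node->next = prev->next;
    prev->next = new_node;
}

/* Unlink and return the node that follows prev, or NULL if there is none. */
struct node *remove_after(struct node *prev)
{
    assert(prev != NULL);
    struct node *removed = prev->next;
    if (removed != NULL) {
        prev->next = removed->next;
        removed->next = NULL;
    }
    return removed;
}
```

Neither function moves any other element, which is what distinguishes this from inserting into the middle of an array.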
On the other hand, linked lists by themselves do not allow random access to the data,
or any form of efficient indexing. Thus, many basic operations (such as obtaining
the last node of the list, finding a node that contains a given datum, or locating the
place where a new node should be inserted) may require scanning most of the list
elements.
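The scanning cost can be seen in a short sketch. The function names `list_last` and `list_find` are hypothetical; both walk the links one node at a time, so their running time grows with the length of the list.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

struct node {
    int value;
    struct node *next;
};

/* Return the last node, or NULL for an empty list.
 * Every node must be visited to get there. */
struct node *list_last(struct node *head)
{
    if (head == NULL)
        return NULL;
    while (head->next != NULL)
        head = head->next;
    return head;
}

/* Return the first node holding wanted, or NULL if no node does.
 * In the worst case the whole list is scanned. */
struct node *list_find(struct node *head, int wanted)
{
    for (struct node *p = head; p != NULL; p = p->next) {
        if (p->value == wanted)
            return p;
    }
    return NULL;
}
```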
Linked lists can be implemented in most languages. Languages such
as Lisp and Scheme have the data structure built in, along with operations to access
the linked list. Procedural languages, such as C, or object-oriented languages, such
as C++ and Java, typically rely on mutable references to create linked lists.
[Figure: a singly-linked list whose nodes contain two fields: an integer value and a
link to the next node.]
In a doubly-linked list, each node contains, besides the next-node link, a second link
field pointing to the previous node in the sequence. The two links may be
called forward(s) and backward(s), or next and prev(ious).
[Figure: a doubly-linked list whose nodes contain three fields: an integer value, the
link forward to the next node, and the link backward to the previous node.]
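A doubly-linked node and its two basic operations might be sketched as follows. The names `struct dnode`, `dlist_insert_after`, and `dlist_unlink` are assumptions made for this example; the list is assumed to be NULL-terminated at both ends.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

/* Three fields: an integer value, a forward link, and a backward link. */
struct dnode {
    int value;
    struct dnode *next;
    struct dnode *prev;
};

/* Insert new_node immediately after pos, fixing up to four links. */
void dlist_insert_after(struct dnode *pos, struct dnode *new_node)
{
    assert(pos != NULL && new_node != NULL);
    new_node->prev = pos;
    new_node->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = new_node;
    pos->next = new_node;
}

/* Unlink node given only the node itself; the backward link means
 * no search for the predecessor is needed. */
void dlist_unlink(struct dnode *node)
{
    assert(node != NULL);
    if (node->prev != NULL)
        node->prev->next = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;
    node->next = NULL;
    node->prev = NULL;
}
```

Note that `dlist_unlink` does not update any separate head pointer the caller may keep; a caller removing the first node would have to do that itself.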
The technique known as XOR-linking allows a doubly-linked list to be implemented
using a single link field in each node. However, this technique requires the ability to
do bit operations on addresses, and therefore may not be available in some high-level
languages.
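The idea behind XOR-linking can be sketched in C, which does permit the needed bit operations once a pointer is converted to an integer. This is an illustrative sketch, not a recommended implementation: the names `struct xnode`, `xor_step`, and `xor_append` are hypothetical, the code assumes the platform provides `uintptr_t` (an optional type), and it assumes a null pointer converts to the integer 0, which holds on common platforms.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* uintptr_t */

struct xnode {
    int value;
    uintptr_t link;  /* (address of previous node) XOR (address of next node) */
};

/* Given the node we arrived from and the current node, recover the
 * neighbour on the other side: link ^ came_from cancels one address
 * and leaves the other. Works in either direction. */
struct xnode *xor_step(const struct xnode *came_from, const struct xnode *current)
{
    assert(current != NULL);
    return (struct xnode *)(current->link ^ (uintptr_t)came_from);
}

/* Append node after tail (tail is NULL for an empty list).
 * Returns the new tail. */
struct xnode *xor_append(struct xnode *tail, struct xnode *node)
{
    assert(node != NULL);
    node->link = (uintptr_t)tail;        /* prev = tail, next = NULL (0) */
    if (tail != NULL)
        tail->link ^= (uintptr_t)node;   /* old next was NULL (0) */
    return node;
}
```

Traversal has to carry two pointers (the current node and the one just left), since a single node's `link` field is meaningless on its own.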
In a multiply-linked list, each node contains two or more link fields, each field being
used to connect the same set of data records in a different order (e.g., by name, by
department, by date of birth, etc.). (While doubly-linked lists can be seen as special
cases of multiply-linked lists, the fact that the two orders are opposite to each other
leads to simpler and more efficient algorithms, so they are usually treated as a
separate case.)
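As an illustration, the sketch below threads the same records onto two independent chains, one ordered by name and one by department. The record layout `struct employee` and the functions `insert_by_name` and `insert_by_department` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */
#include <string.h>  /* strcmp */

/* One record, two link fields: each link threads the same records
 * in a different order. */
struct employee {
    const char *name;
    int department;
    struct employee *next_by_name;
    struct employee *next_by_department;
};

/* Insert e into the chain sorted by name; returns the (possibly new) head. */
struct employee *insert_by_name(struct employee *head, struct employee *e)
{
    assert(e != NULL);
    struct employee **link = &head;
    while (*link != NULL && strcmp((*link)->name, e->name) < 0)
        link = &(*link)->next_by_name;
    e->next_by_name = *link;
    *link = e;
    return head;
}

/* Insert e into the chain sorted by department number; returns the head. */
struct employee *insert_by_department(struct employee *head, struct employee *e)
{
    assert(e != NULL);
    struct employee **link = &head;
    while (*link != NULL && (*link)->department < e->department)
        link = &(*link)->next_by_department;
    e->next_by_department = *link;
    *link = e;
    return head;
}
```

Each record is stored once; only the link fields differ between the two orderings, so the caller keeps one head pointer per ordering.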
In a circular doubly-linked list, the only change is that the end, or "tail", of the
list is linked back to the front, or "head", of the list, and vice versa.
Once the array is set up, access to any element is convenient and fast with the [ ]
operator. (Extra for experts) Array access with expressions such as scores[i] is
almost always implemented using fast address arithmetic: the address of an element is
computed as an offset from the start of the array, which requires only one
multiplication and one addition.
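The address arithmetic can be written out by hand in C. In this sketch the element type `int` and the helper name `element_address` are assumptions made for illustration; the text does not say what type the scores array holds.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, NULL */

/* Address of element i, computed explicitly:
 * one multiplication (i * element size) and one addition (base + offset). */
int *element_address(int *base, size_t i)
{
    assert(base != NULL);
    return (int *)((char *)base + i * sizeof(int));
}
```

For an array declared as `int scores[100];`, `element_address(scores, i)` yields the same pointer as `&scores[i]` for any valid index, which is why indexing costs the same no matter how far into the array the element lies.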
The disadvantages of arrays are...
1) The size of the array is fixed: 100 elements in this case. Most often this size is
specified at compile time with a simple declaration such as the one in the scores
example. With a little extra effort, choosing the size can be deferred until the array is
created at runtime, but after that it remains fixed. (Extra for experts) You can go to the
trouble of dynamically allocating an array in the heap and then dynamically resizing it
with realloc(), but that requires some real programmer effort.
2) Because of (1), the most convenient thing for programmers to do is to allocate
arrays which seem "large enough" (e.g. the 100 in the scores example). Although
convenient, this strategy has two disadvantages: (a) most of the time there are just 20
or 30 elements in the array, so 70% or more of its space is wasted. (b) If
the program ever needs to process more than 100 scores, the code breaks. A surprising
amount of commercial code has this sort of naive array allocation, which wastes space
most of the time and crashes on special occasions. (Extra for experts) For relatively
large arrays (larger than 8k bytes), the virtual memory system may partially
compensate for this problem, since the "wasted" elements are never touched.
3) (minor) Inserting new elements at the front is potentially expensive because
existing elements need to be shifted over to make room.
Linked lists have their own strengths and weaknesses, but they happen to be strong
where arrays are weak. The array's features all follow from its strategy of allocating
the memory for all its elements in one block of memory. Linked lists use an entirely
different strategy. As we will see, linked lists allocate memory for each element
separately and only when necessary.
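The two allocation strategies can be sketched side by side. This is an illustration under stated assumptions rather than a definitive implementation: the type and function names (`struct int_array`, `array_append`, `list_push_front`, `list_free`) are hypothetical, and the doubling growth policy is one common choice, not something the text prescribes. The first half shows the "real programmer effort" of a heap array resized with realloc(); the second half allocates one node per element, only when an element arrives.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, NULL */
#include <stdlib.h>  /* malloc, realloc, free */

/* Strategy 1: all elements in one contiguous block, grown when full. */
struct int_array {
    int *data;
    size_t length;    /* elements in use */
    size_t capacity;  /* elements allocated */
};

/* Append value, doubling the block when it is full.
 * Returns 0 on success, -1 if the allocation fails. */
int array_append(struct int_array *a, int value)
{
    assert(a != NULL);
    if (a->length == a->capacity) {
        size_t new_capacity = a->capacity ? a->capacity * 2 : 4;
        int *grown = realloc(a->data, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;  /* a->data is still valid and unchanged */
        a->data = grown;
        a->capacity = new_capacity;
    }
    a->data[a->length++] = value;
    return 0;
}

/* Strategy 2: one small allocation per element, made only when needed. */
struct node {
    int value;
    struct node *next;
};

/* Push value on the front of the list. Returns the new head,
 * or NULL if the allocation fails (the old list is left untouched). */
struct node *list_push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Release every node of the list. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

A caller would start from `struct int_array a = {NULL, 0, 0};` or `struct node *head = NULL;`. The array version may hold unused capacity and occasionally copies its contents when it grows; the list version never holds spare slots but pays for a separate allocation and a link field per element.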