Linked List Implementation in C#.NET
Last Updated: 11 January 2005
One of the challenges faced by an application designer is
reading in large sequential files and having to also load the data in processing
arrays as it is read. For example, let's say we have a large data file and we
have groupings within that file bounded by data indicators that signal a
switching of the group. We read and process the file by group and when we are
finished with the group, the data buffers must be refreshed. This data structure
at first sounds like an array. However, here is the real issue: How big do we
build the array? This is a question not easily answered when one cannot
determine the "largest case" scenario for an array. And the issue
grows even larger when we have a requirement to have what is known as a
"jagged array" where dimensions are not always equal or uniform.
In this article, I will refer to an actual project where a
large input file was opened up for sequential read and the data within that file
was logically divided into groups with varying amounts of related data per
group. For the data within the group, we might have to access any one of the
rows to determine if it is there in order to make processing decisions. In the
old days, we would try to estimate the maximum number of rows in a group and
build the array that size. This can cause a couple of issues. First, we do not
really know the maximum number of rows that will be in any given group -- we can
only estimate. If we estimate wrong, then this leads to an error where the array
is not large enough. Second, even if we size it right, there is tremendous
memory waste because the array area must be allocated in a fixed length. Figure
1 below shows this situation.
Figure 1
Figure 1 shows a diagram of a group structure
if fixed length arrays are used. GP1 is the group header record from the
sequential file. We need to store the record in the array because there are
items of data needed within it and therefore we have to allocate a full array
row for it. SGA, SGB, and SGC are all different segments types that are stored
in the array within group GP1. The segment type is also like a sub-grouping
within GP1 and their individual segments are suffixed with a number 1, 2, 3,
etc. Every cell in the diagram marked "Unused" is just that. They are
"Unused" because the number of elements in that particular segment are
assigned a cell but the array is fixed length and we therefore have to allocate
up to the maximum number of cells. Every cell marked "Unused" is a
wasted memory area.
There are two data structures that were used
to build a jagged array and solve these problems. The first structure was a
linked-list implementation which had to be built through code and the second is
a built-in C# structure known as an ArrayList. This article will discuss how
that using a combination of the two structures will yield an overall structure
as is shown in Figure 2:
Figure 2
The Linked List and ArrayList Structures
The linked list structure is basically a series of nodes with a
"next" pointer that binds them like a chain. The advantages of this
type of structure are:
The disadvantages are:
In our implementation, we use a C# structure known as an
ArrayList to store the linked lists that will be the offshoot branches along the
chain. Why do we use an ArrayList and not another linked list? Simple. It is for
simplicity sake only. If we were to use a series of branches of linked lists,
then we would have to have more complex implementations of the GoToNode method
in the dll. We would have to traverse the tree to find a specific node on a sub branch. With an ArrayList, we simply push the entire sublist on the ArrayList
and can pop it off with one index (as it will only hold one list in our
implementation here).
Implementation: Building the DLL
The basic building block of a linked list is
a structure known as the tree node. We started this application by building a .dll
named LinkList.dll. Initially, the class declaration to define a node within the
linked list is shown in figure 3. The squares in figure 2 above would be
referred to as nodes.
public
class
LlNode // This is
the node object within a linked list
{
public string
status; //
You can put a status indicator in here
public ArrayList
minornode = new ArrayList();
// This is the offshoot branch
public string
data; //
This is the data buffer -- put your data here
public LlNode
next; //
Will point to the next node in the list
}; //
*** LlNode declaration ***
Figure 3
The field "status" is a string area
that you can use for anything to indicate some kind of state information or
anything you want for that matter. The data within this field exists for the
life of the node. The next field within the class is of type ArrayList and is
named minornode. Here, we will actually place another linked list as a branch
node. The segments SGA, SGB, and SGC displayed in figure 2 actually start at
minornode. The string area "data" is the actual data buffer and will
hold the input record area. And finally, but very important, is the field next
which is actually a pointer or references another node associated with the list
to build.
Once we have the node defined, we want to
create a class for the linked list data structure and methods. Our class will
have the following methods:
| LinkedList() |
This is the constructor method |
| GoToNode(int pos) |
This method is invoked to position to an absolute location
(indicated by pos) on the linked list |
| PushNode(LlNode newnode, int pos) |
We use this method to push a new node onto the linked list
(which in our implementation is somewhat like a stack in that nodes are
placed on the list in a last-in-first-out method) |
| ListLen() |
Returns an integer value of the linked list length. |
| PrintList() |
To print the contents of an entire linked list on the
console. |
| Exists(L1Node pnode, string srch, ref int pindex) |
This method will return a boolean to indicate if a type of
data segment (i.e., SGA, SGB, etc.) exists on the linked list tree. This
was used in the implementation for validating that required segments were
there. |
Remember, what we want is a way to read in an associated grouping of records
into a jagged array, manipulate or extract the information from the array, and
then dispose of the grouping in memory when finished. This is useful for
processing situations like reading in all personal information for an account
holder, getting the information you need, and then disposing of the structure.
When using fixed-length arrays, you have to initialize each cell. This can be
time consuming and resource intensive. With a linked list, all you have to do is
reset or declare "new" the list array object.
Creating a New List
To instantiate a new linked list you must declare a type of LinkedList and
construct the new object. For example, to declare and instantiate the LinkedList
object MyList we would have the following code:
public static LinkedList MyList = new LinkedList();
Nodes are part of the list therefore we must instantiate a
new node with the following code:
public static LlNode MyNode = new LlNode();
Building the List
When building the list, we work at the node level first. We fill the node
with the input record data and then push it onto the list much the same way we
would treat a stack structure. In following code example, we add the data from
the input flat file record to the instance of node and then use the PushNode
method to place it on the list:
LinkedList templist = new LinkedList();
// templist is a new linked list structure
LlNode node = new LlNode // Llnode is the node
that will get placed on the list
node.data = sInRec // the data buffer of node now has the input record data
templist.PushNode(node,0) // Push the node on the list at the entry
point
Processing the Records
Our sample file has the following records in it:
HDR 0100
GRP GROUP1
SGT SEGMENT A1
SGT SEGMENT A2
SGT SEGMENT A3
SGT SEGMENT A4
GRP GROUP2
SGT SEGMENT A1 G2
GRP GROUP3
SGT SEGMENT A1
SGT SEGMENT A2
SGT SEGMENT A3
SGT SEGMENT A4
SGT SEGMENT A5
SGT SEGMENT A6
SGT SEGMENT A7
SGT SEGMENT A8
This file, as listed above, has a header record (HDR), three group records
(GRP), and various segments (SGT) per group.
Now, let's trace the life cycle of a record from flat file to linked list to
disposal. Looking at the program listing LinkedListMain.cs in the class
MainClass, we see that the program has one argument on command line: the input
text filename (above) which we named datafile.txt. The file is opened for input
via the StreamReader class and read sequentially. For each record read, control
is passed to the "Process" method:
StreamReader oInfile = File.OpenText(sInfilename);
sInRec = oInfile.ReadLine();
iState = HEADER;
while (sInRec != null) {
Process();
sInRec = oInfile.ReadLine();
}
The method "Process" starts by passing each record through a simple
state machine. Basically, if the first three bytes of the record is
"HDR" then this indicates an informational header record.
The "state" becomes the constant HEADER. The code next looks
for a record type (first 3 bytes in the record) of GRP. The state then becomes
the constant GRP.
if (sType == "HDR") {
iState = HEADER;
}
if (sType == "GRP" && iState == HEADER) {
MyNode.status = "Initialized";
iState = GRP; }
else
if (sType == "GRP"
&& iState == GRP) {
MyNode.data = "MyNode";
MyList.PushNode(MyNode, 0);
Console.WriteLine("Show
list");
MyList.PrintList();
bExists =
MyList.Exists(MyNode,"GRP",ref pindex);
PrintMyList(0); // Print the
grouping node
PrintMyList(1); // Print all
segments in the group (branch)
}
At the occurrence of the GRP record, a new list node or LlNode is created.
This will store the linked list (or really a pointer to the linklist stored in
minornode) of all segments within the grouping. A temporary list named templist
is created and a new instance of the list node area MyNode is declared. The data
from the flat file record is placed in the MyNode.data buffer and the node is
pushed on at position 0 in the linked list templist.
switch (sType) {
case "GRP":
// We can put some checks here to make sure
// that we do not add more than one GRP per grouping
// This is the first segment in this loop and thus a
// new grouping List node is declared;
MyNode = new LlNode();
// node is the ListNode object where the segment's data
// is stored.
LlNode node = new LlNode();
// templist will ultimately end up
on the minornode ArrayList
templist = new LinkedList();
node.data = sInRec;
// add the data for this segment to
templist
templist.PushNode(node,0);
// push templist onto the ArrayList minornode
MyNode.minornode.Add(templist);
MyNode.status = "nothing";
break;
Figure 4 shows the structure of the list at the occurrence of a new grouping
event:

Figure 4
After the GRP segment, the program expects the data segments within the group
or SGT records. The program will read as many as there are before the next GRP
record. Here is where the branch list gets built off of the minor node of MyNode
for each SGT record read.
case "SGT":
// Declare new instance of node -- it will store the input rec
node = new LlNode();
node.data = sInRec;
// Declare a new working list
templist = new LinkedList();
// Insert node into working list
templist.PushNode(node,0);
// Check to see if there is already a "SGT" list in this loop node
// The Exists() method will return the index of the segment if true
bexists = MyList.Exists(MyNode,"SGT",ref pindex);
Figure 5 shows the resulting structure after reading and processing the SGT
segments:

Figure 5
The linked list just keeps building until the end of the flat file or another
GRP record is encountered to indicate a new grouping. Note that in order to
"push" the templist onto the ArrayList indicated in minornode, we must
remove the old list first then insert the new list with the RemoveAt and Insert
methods of the ArrayList class.
Upon encountering a new GRP record, we can extract, process, and manipulate
the data in the linked list before discarding it. Before placing a new GRP
segment in the new grouping, we initialize or create a new list node by doing:
MyNode = new LlNode();. At the point where MyNode is declared new, the old
memory allocated for the linked list branching from MyNode is gone and the .NET
garbage collector comes along to reuse it. A new node with the same name,
MyNode, is now prepared to point to a new list.
By using the method PrintMyList(node index), we can dump the contents of the
entire list to the console. This method travels down the chain from the end to
the start in order to print the list in first in first out order. Why is the
list built in reverse? It is because we are always pushing a new node on at
position 0 with the PushNode method. This is much easier than trying to figure
out where the end of list is and append to the end. In the PrintMyList method,
the order is reversed so that you have the impression that the records are on
the list in original order.
A Recommendation
Use the Visual Studio or DBGCLR debuggers to correct any bugs found in your
linked list implementation. Otherwise you will get thoroughly confused when
trying to isolate problems in the linked list processing. These debuggers do an
excellent job in "exploding" the linked list and letting you see the
entire tree. Figure 6 shows a Visual Studio .NET debugging session for this
program with the linked list exploded:

Figure 6
Some Final Thoughts
This is a simple example of a combination linked list and ArrayList. The
intent is for you to be able to take this code and adapt and modify to fit your
particular data processing needs. Without structures like this, we are forced to
table grouped information in fixed-types of structures such as random access
flat files, database tables, and fixed-length arrays. All of these structures
are either memory-intensive or input/output intensive. The linked list is very
efficient on memory and very fast. When you have to do large volume table
processing of sequential data give this structure consideration. It just might
do the trick for you.
Attachments:
Demo Project Files: LinkedListMain.zip
Dll Project Files: LinkedListLib.Zip