Part 4: Purpose of Implementations and Algorithms
Data Structures in Java
By Richard Baldwin
Baldwin explains that the core collection interfaces in the Java Collections
Framework allow collections to be manipulated without regard for how they are
implemented.
Java Programming, Lecture Notes #1356
Preface
This is the fourth lesson in a miniseries on Java data
structures and the Java Collections Framework. The first lesson in the
miniseries was entitled Data
Structures in Java: Part 1, Getting Started.
The purpose of this miniseries is to help you learn the essential features
of Object-Oriented data structures in Java using the Collections Framework.
Supplementary material
I recommend that you also study the other lessons in my extensive collection
of online Java tutorials. You will find those lessons published at Gamelan.com. However, as
of the date of this writing, Gamelan doesn't maintain a consolidated index of
my Java tutorial lessons, and sometimes they are difficult to locate
there. You will find a consolidated index at Baldwin's Java
Programming Tutorials.
The index on my site provides links to the lessons at Gamelan.com .
Preview
At least three things are included in a collections
framework:
- interfaces
- implementations
- algorithms
The previous lesson discussed the purpose of the
interfaces. This lesson will discuss the purpose of the implementations
and the algorithms in the Collections Framework.
Introduction
In the previous lesson, entitled Data Structures in Java: Part 3, Purpose
of Framework Interfaces, we learned that the framework provides nine
concrete implementations of the interfaces in the framework. These nine
implementation classes are available for immediate instantiation to produce
objects to satisfy your collection needs.
We also learned that the framework provides three incomplete
implementations. These classes are available for you to use as a starting
point in defining your own implementations. Default implementations of
many of the interface methods are provided in the incomplete implementations.
Discussion
and Sample Program
Purpose of implementations
The implementations in the Java Collections Framework are the
concrete definitions of the classes that implement the core collection
interfaces. For example, as of JDK 1.3, the concrete implementations
in the Java Collections Framework are provided by the following nine classes.
- HashSet
- TreeSet
- LinkedList
- ArrayList
- Vector
- HashMap
- WeakHashMap
- TreeMap
- Hashtable
Available for immediate use
These classes are available for immediate use to instantiate collection
objects.
As you can see, there are two classes that obviously fall into the set
category, two that obviously fall into the list category, and three that
obviously fall into the map category. I will have more to say
about the details of these classes in subsequent lessons.
This leaves two additional classes whose names don't readily divulge the
category in which they belong.
Vector and Hashtable classes
The classes Vector and Hashtable were part of Java even before
the Java Collections Framework became available. The Vector class
can be used to instantiate objects that fall in the general list
category.
The Hashtable class can be used to instantiate objects that fall in
the map category.
These two classes have been upgraded to make them compatible with the
Collections Framework.
Abstract implementations
In addition to the concrete implementations listed above, the following
three classes partially implement the interfaces, but are not intended for
instantiation. Rather, they are intended to be extended into new concrete
classes that you define.
- AbstractSet
- AbstractList
- AbstractMap
Therefore, by either using one of the three classes listed
above as a starting point, or by starting from scratch and fully implementing
one or more of the interfaces, you can provide new concrete implementations to
augment the framework to include collections that meet your special needs.
Purpose of algorithms
Algorithms are methods (not necessarily exposed) that provide useful
capabilities, such as searching and sorting. For example, the Collection
interface declares an exposed method named contains.
The contains method
- Receives an incoming
reference of type Object as a parameter
- Searches the collection
looking for an element that matches the incoming reference
- Returns true if the
collection on which the method is invoked contains the specified element
Different classes, different
implementations
You can safely invoke the contains method on any object instantiated
from a class that properly implements the Collection interface, even if
you don't know the actual type of the collection object.
The manner in which the search will be performed will probably differ from
one concrete implementation of the interface to the next. For example, a TreeSet
object will perform the search very rapidly with a time cost of only log(n)
where n is the number of elements. On the other hand, for the same number
of elements, because of a different underlying data structure, a search on an ArrayList
object will probably require more time than a search on a TreeSet
object. As the number of elements increases, the difference in time cost
between the two will also increase.
Consider the sample program shown in Listing 1. This program compares
the search speed of the ArrayList class and the TreeSet
class. A detailed discussion of the program follows Listing 1.
/*File SpeedTest01 Copyright 2001 R.G.Baldwin **************************************/
import java.util.*;
public class SpeedTest01{ public static void main( String args[]){ new Worker().doIt(); }//end main() }//end class SpeedTest01
class Worker{ public void doIt(){ int size = 500000; //Create a TreeSet object Collection aTree = new TreeSet(); //Populate the TreeSet object with // random values. The add() method // for a set rejects duplicates. Random rnGen = new Random(); for(int ct = 0; ct < size; ct++){ aTree.add(new Double( rnGen.nextDouble())); }//end for loop //Create and populate an ArrayList // object with the same random // values Collection aList = new ArrayList(aTree); //Extract a value near the center // of the ArrayList object to use // as a test case. Object testVal = ((List)aList).get(size/2); //Search for the test value in each // of the collection objects. // Measure and display the time // required to perform the search // in each case. long start = new Date().getTime(); boolean found = aList.contains(testVal); long stop = new Date().getTime(); System.out.println( found + " " + (stop - start)); start = new Date().getTime(); for(int x = 0; x < 10000; x++){ found = aTree.contains(testVal); }//end for loop stop = new Date().getTime(); System.out.println( found + " " + (stop - start));
}//end doIt() }// end class Worker
Listing 1 |
The program begins by instantiating a TreeSet object and populating
it with approximately 500,000 elements as shown in Listing 2 below. The
values encapsulated in the objects referred to by the elements in the
collection are produced by a random number generator.
Recall that the add method of a Set object rejects duplicate elements,
so there may be fewer than 500,000 elements in the object after it is
populated.
public void doIt(){ int size = 500000;
Collection aTree = new TreeSet();
Random rnGen = new Random(); for(int ct = 0; ct < size; ct++){ aTree.add(new Double( rnGen.nextDouble())); }//end for loop
Listing 2 |
One of the capabilities of the Collection Framework is to create a new Collection
object and populate it with the contents of an existing Collection
object of a different (or the same) actual type.
The code in Listing 3 instantiates an ArrayList object and populates
it with the contents of the existing TreeSet object. As a result,
we then have two different Collection objects of different actual types
containing the same elements.
Collection aList = new ArrayList(aTree);
Listing 3 |
The objective of this program is to compare the times required to search for
and find an element in each of the collections. Thus, we need a target
element to search for.
The code in Listing 4 below extracts a value near the center of the ArrayList
object using an index to find and extract the value. This is a very fast
operation on a List object. This value is saved in testVal
to be used later for test purposes.
Note that the reference to the ArrayList object was saved as type Collection
in Listing 3 above.
Note also that it was necessary to cast that reference to type List in
Listing 4 below in order to invoke the get method on the
reference. This is because the Collection interface does not
declare a method named get. Rather, the get method is added
to the List interface to define a more specialized form of collection.
Object testVal = ((List)aList).get(size/2);
Listing 4 |
The code in Listing 5 below invokes the contains method to search for
the test value in each of the collections. It uses the system clock to
measure the time required to find the element in each case. I will assume
that you understand how to use the Date class for this purpose, and
won't provide a detailed explanation.
long start = new Date().getTime(); boolean found = aList.contains(testVal); long stop = new Date().getTime(); System.out.println( found + " " + (stop - start)); start = new Date().getTime(); for(int x = 0; x < 10000; x++){ found = aTree.contains(testVal); }//end for loop stop = new Date().getTime(); System.out.println( found + " " + (stop - start));
}//end doIt()
Listing 5 |
The output from the program was:
true 171
true 30
The first line of output applies to the ArrayList object, and the
second line applies to the TreeSet object.
As we would expect, the test value was successfully found in both cases;
hence the display of true in both cases.
Time required to search ArrayList
The output indicates that approximately 171 milliseconds were required to
find the test value in the ArrayList object.
Time required to search TreeSet object
On the other hand, the time required to find the test value in the TreeSet
object was so small that it wasn't even measurable within the granularity of
the system clock (other experiments have caused me to believe that the
granularity of the system clock on this machine is at least ten milliseconds).
Hence, the original reported time required to find the test value in the TreeSet
object was zero.
In order to get a measurable time value to search the TreeSet object,
I had to wrap the invocation of the contains method in a for-loop and
search for the same value 10,000 times in succession. Thus, the time
required to find the test value in the TreeSet object was approximately
0.0030 milliseconds as compared to 171 milliseconds for the ArrayList
object.
(I'll let you do the arithmetic to see if this makes sense in terms of
the expected time cost to search the two different types of objects.
Don't forget the extra overhead of the for-loop.)
Different implementations
This is a graphic demonstration that even though both objects can be treated
as type Collection, and the contains method can be invoked on
either object in a polymorphic manner, the actual implementations of the two
objects and the implementations of the contains methods in those two
objects are different.
Each type of collection has advantages and disadvantages, depending on your
needs.
Polymorphic behavior applies
The important point is that if you receive a reference to the collection
object as type Collection, you can invoke the contains method on
that reference without regard to the underlying structure of the collection
object. This is because polymorphic behavior applies.
Very briefly, polymorphic behavior means that the actual method that is
executed is the appropriate method for that type of object regardless of the
type of the reference to the object. This is one of the great advantages
of using the Java Collections Framework and passing collection objects among
methods as interface types.
Sorting algorithms
Some of the implementations of the Java Collection Framework maintain their
elements in a random order, and other implementations maintain their elements
in a sorted order. Thus, the framework also provides sorting
algorithms. However, the sorting algorithms used to maintain the order of
the collections are not exposed in the way that the search algorithm is exposed
(via the contains() method). Rather, the sorting algorithms are implicit
in those implementations that need them, and are absent from those
implementations that don't need them.
Now for a little quiz
Let's see if you are still awake. Select the words in one pair of
parentheses in the following statement that causes the statement to be true.
The interfaces in the Collections Framework make it possible to manipulate
the contents of collections in a manner that is (dependent on) (independent
of) the underlying implementation of each collection.
And the answer is ...
The interfaces in the Collections Framework make it possible to manipulate
the contents of collections in a manner that is independent of the underlying
implementation of each collection. That is the beauty of basing the
framework on interfaces that declare polymorphic methods.
I placed this question here to drive home the point that the methods
declared in the Collection interface can be invoked on collection
objects in a polymorphic manner.
That is to say, as a user of an object instantiated from a class that
properly implements the Collection interface, you can invoke the methods
declared in that interface on a reference to the object and be confident that
the actual method that is invoked will be the version that is appropriate for
the class from which the object was instantiated. This is polymorphic
behavior.
In the event that you need to invoke a method that is not declared in the Collection
interface (such as the get() method in Listing 4 above), you can pass
the reference as one of the more specialized sub-interfaces of Collection,
such as Set.
Benefits of using the Collections Framework
The Java
Tutorial from Sun lists and explains the following benefits of using the
Java Collections Framework.
- It reduces programming
effort
- It increases program speed
and quality
- It allows interoperability
among unrelated APIs
- It reduces the effort to
learn and use new APIs
- It reduces effort to design
new APIs
- It fosters software reuse
For a detailed explanation of these benefits, I am simply going to refer you
directly to this excellent online book.
Summary
So, let's recap some of what we have learned in this and the
previous lessons.
The core collection interfaces in the Collections Framework are shown in
Listing 6.
The basic purpose of the core collection interfaces in the Java Collections Framework
is to allow collections to be manipulated without regard for how the
collections are implemented, provided of course that the implementations comply
with the contracts.
The framework provides the following nine concrete implementations (classes)
of the interfaces shown in Listing 1:
- HashSet
- TreeSet
- LinkedList
- ArrayList
- Vector
- HashMap
- WeakHashMap
- TreeMap
- Hashtable
For example, the classes TreeSet and ArrayList are concrete
implementations of the Collection interface as shown in the above
list. (Actually, they are concrete implementations of sub-interfaces
of Collection. JDK 1.3 doesn't provide any direct implementations of
Collection).
A collection object instantiated from the class TreeSet and a
collection object instantiated from the class ArrayList can each be
viewed as being of the interface type Collection.
Methods having the same signatures can be used to manipulate either
collection with confidence that the behavior of the method will be appropriate
for the actual type of collection involved.
The framework also provides the following incomplete implementations of the
core interfaces:
- AbstractSet
- AbstractList
- AbstractMap
The purpose of these implementations is to provide you with a starting point
for defining your own concrete implementations for more specialized
collections.
What's
Next?
In the next lesson, I will dig a little deeper into the
details of the Java Collections Framework. I will begin the next lesson
with a sample program that illustrates the basic purpose of the core
collection interfaces, which is to allow collections to be manipulated
without regard for how the collections are implemented, as long as the
implementation meets the contract of the interface.
Copyright 2000, Richard G. Baldwin. Reproduction in whole or in part
in any form or medium without express written permission from Richard Baldwin
is prohibited.
Richard
Baldwin is a college professor and private consultant whose primary
focus is a combination of Java and XML. In addition to the many
platform-independent benefits of Java applications, he believes that a
combination of Java and XML will become the primary driving force in the
delivery of structured information on the Web.
Richard has participated in numerous consulting projects involving Java,
XML, or a combination of the two. He frequently provides onsite Java
and/or XML training at the high-tech companies located in and around Austin,
Texas. He is the author of Baldwin's Java Programming Tutorials,
which has gained a worldwide following among experienced and aspiring Java
programmers. He has also published articles on Java Programming in Java Pro
magazine.
Richard holds an MSEE degree from Southern Methodist University and has
many years of experience in the application of computer technology to real-world
problems. |