Part 6: Duplicate Elements, Ordered Collections, Sorted Collections, and Interface Specialization
Data Structures in Java
By Richard Baldwin
In this lesson Baldwin shows you that all concrete implementations in the
Java Collections Framework (JDK 1.3) implement a subinterface of the Collection
interface.
Java Programming, Lecture Notes #1360
Preface
This is the sixth lesson in a miniseries on Java data
structures and the Java Collections Framework. The first lesson in the
miniseries was entitled Data
Structures in Java: Part 1, Getting Started. The previous lesson was
entitled Data Structures in Java: Part
5, The Core Collection Interfaces.
The purpose of this miniseries is to help you learn the essential features
of Object-Oriented data structures in Java using the Collections Framework.
Viewing tip
You may find it useful to open another copy of this lesson in a separate
browser window. That will make it easier for you to scroll back and forth
among the different figures and listings while you are reading about them.
Supplementary material
I recommend that you also study the other lessons in my extensive collection
of online Java tutorials. You will find those lessons published at Gamelan.com. However, as
of the date of this writing, Gamelan doesn't maintain a consolidated index of
my Java tutorial lessons, and sometimes they are difficult to locate
there. You will find a consolidated index at Baldwin's Java
Programming Tutorials.
The index on my site provides links to the lessons at Gamelan.com.
Preview
In the previous lesson, entitled Data Structures in Java: Part 5, The Core
Collection Interfaces, you learned that the Java Collections Framework
defines six core interfaces, in two distinct trees. You learned the names
and the inheritance structure of those interfaces. You also learned about
their purpose. You saw how the interfaces declare polymorphic methods
that apply to implementations of the interfaces, and you learned about the optional
methods of the Collection interface.
In this lesson you will learn that all of the implementations of the
interfaces in the Java Collections Framework implement one of the subinterfaces
of the Collection interface. You will learn that a Set
object cannot contain duplicate elements, but a List object can contain
duplicate elements.
You will learn about the difference between ordered collections and sorted
collections. You will also learn about ascending order and the natural
ordering of objects. In addition, you will learn how more specialized
stipulations are placed on interfaces as you progress down the interface
inheritance hierarchy of the Java Collections Framework.
Discussion
Let's start with a quiz
As is often the case, I am going to begin this lesson with a little quiz
just to make sure that you are awake. Is the following statement True or
False?
The TreeSet class is a direct implementation of
the Collection interface.
The answer is False. The TreeSet class is not a
direct implementation of the Collection interface. Rather, the TreeSet
class is a direct implementation of the SortedSet interface. The SortedSet
interface extends the Set interface, and the Set interface
extends the Collection interface.
The root of the collection hierarchy
The Collection interface is the root of the collection
hierarchy. JDK 1.3 doesn't provide any direct implementations of the Collection
interface. All of the implementations of the interfaces in the Java
Collections Framework implement one of the subinterfaces of the Collection
interface.
What does Sun say about this?
Here is what the Sun documentation has to say on the topic of the Collection
interface:
"The SDK does not provide any direct implementations of this
interface: it provides implementations of more specific subinterfaces like Set
and List. This interface is typically used to pass collections around and
manipulate them where maximum generality is desired."
The Sun documentation also states:
"Bags or multisets (unordered collections that may contain duplicate
elements) should implement this interface directly."
However, JDK 1.3 does not provide any implementations for bags or
multisets. If you need collections of these types, you will need to
define the implementation classes yourself.
What about duplicate elements?
Some implementations of Collection allow duplicate elements, and
others do not. Implementations of the List interface (such as
ArrayList) allow duplicate elements. Implements of Set and SortedSet
(such as TreeSet) do not allow duplicate elements. This was
illustrated in a previous lesson entitled Data Structures in Java: Part 5, The Core
Collection Interfaces.
A sample program in that lesson created two collection objects and applied
the polymorphic add() method to add the same elements to each
collection. One of the collection objects was of type ArrayList,
and the other collection object was of type TreeSet. The elements
added to each collection contained one pair of duplicate elements. The
duplicate element was automatically excluded from the TreeSet object,
but was retained in the ArrayList object.
So, what is a set?
According to Sun, a Set is a "collection that contains no
duplicate elements ... this interface models the mathematical set
abstraction." An object of type Set is typically used to model
collections such as Social Security numbers where duplicates are not allowed.
And, what is a list?
Also according to Sun, a List is "An ordered collection (also
known as a sequence). The user of this interface has precise control over where
in the list each element is inserted. The user can access elements by their
integer index (position in the list), and search for elements in the
list."
Ordered is not the same as sorted
Note that an ordered collection is not the same as a sorted
collection. The fact that the collection is ordered derives from the fact
that each element in the collection has a specific position specified by an
index. In a sorted collection, the position of each element is determined
by its value relative to the values of its predecessors and successors.
Sun goes on to say, "Unlike sets, lists typically allow duplicate
elements. More formally, lists typically allow pairs of elements e1 and e2 such
that e1.equals(e2), and they typically allow multiple null elements if they
allow null elements at all."
Is ascending sort order always required?
Not all implementations of the Collection interface maintain the
elements in ascending sort order. Some may, and others do not. For
example, as discussed above, implementations of the List interface (such
as ArrayList) do not maintain their elements in sorted order at all.
In other words, the position of an element in an ArrayList does not
depend on the value of the element.
On the other hand, implementations of the interfaces named SortedSet (such
as TreeSet) and SortedMap do maintain their elements in sorted
order. However, that order is not necessarily ascending. When an
object is instantiated from a class that implements the SortedSet
interface, the sorting order for that object can be established by providing an
object instantiated from a class that implements the Comparator
interface. In that case, the author of the implementing class determines
the order imposed on the elements in the collection.
Does case matter in String objects?
For example, if your SortedSet object contains references to String
objects, the natural ascending sort would take the difference between upper
case and lower case characters into account. However, you might prefer
that case be ignored when establishing the sorted order. You can
accomplish this by providing an object of a class that implements the Comparator
interface and which defines the compare() and equals() methods in
such a way as to eliminate case considerations for comparisons of String
objects. (The lesson entitled Swing
from A to Z: Analyzing Swing Components, Part 1, Concepts contains a
sample program named Introspect03 that implements the Comparator interface for
exactly this purpose.)
Subinterfaces have more stipulations
As you progress down the inheritance hierarchy, you find that additional
stipulations apply at each level of inheritance. As an example, according
to Sun, "The Set interface places additional stipulations, beyond those
inherited from the Collection interface, on the contracts of all constructors
and on the contracts of the add, equals and hashCode
methods."
The important point is that specific subinterfaces of the Collection
interface can define requirements that do not apply to all subinterfaces of the
Collection interface. For example, the add() method of the Set
interface stipulates the following:
"Adds the specified element to this set if it is
not already present."
On the other hand, the add() method of the Collection
interface simply states:
"Ensures that this collection contains the
specified element."
Thus, the contract for the add() method of an object
of a class that implements the Set interface is more specialized than
the contract for the add() method of an object of a class that
implements Collection interface.
An additional stipulation on the constructor for a Set object is that
all constructors must create a set that contains no duplicate elements.
Stipulations on SortedSet
The SortedSet interface extends the Set interface. The SortedSet
interface contains the following stipulation that makes it more specialized
than a Set.
"A set that further guarantees that its iterator will traverse the
set in ascending element order, sorted according to the natural ordering of its
elements (see Comparable), or by a Comparator provided at sorted set creation
time."
Let's end with a quiz
I'm going to finish this lesson with several questions in the form of a quiz.
To ensure that this is a learning experience, I will provide an explanation in
addition to the answer for each question.
Q1 True or False? A
collection that implements the List interface maintains its elements in
ascending alphanumeric order.
The answer to question 1 is false. Unlike collections that implement
the SortedSet interface, the order of the elements in a collection that
implements the List interface is not based on the values of the objects
referred to by the elements in the list.
Q2 True or False? A
collection that implements the List interface is an unordered
collection.
The answer to question 2 is also false. A collection that implements
the List interface is an ordered collection (also known as a
sequence). According to Sun, "The user of the interface has
precise control over where in the list each element is inserted." Elements
can be inserted and retrieved on the basis of their integer index (position
in the list) using the following methods:
add(int
index, Object element)
get(int index)
Valid index values are positive integers that begin with zero. When
the add method is used to insert an element at a specific position in
the sequence, the element currently at that position (if any) and any
subsequent elements are shifted toward higher index values to make room for the
new element.
Another version of the add method takes a reference to an object as
an incoming parameter and appends the specified element to the end of the
collection.
The get method simply returns the element at the specified position
in the collection.
The List interface also declares various other methods that can be
used to manipulate the contents of the collection.
Q3 True or False? A
collection that implements the List interface is allowed to contain
duplicate values.
The answer to question 3 is true. Unlike a collection that implements
the Set interface, a collection that implements the List
interface is typically allowed to contain duplicate values. More
formally, according to Sun, "lists typically allow pairs of elements e1 and e2 such
that e1.equals(e2),
and they typically allow multiple null elements if they allow null elements at
all."
Q4 True or False? The
contracts of the methods in the List interface are the same as the
contracts of the methods inherited from the Collection interface.
The answer to question 4 is false. According to Sun, "The List
interface places additional stipulations, beyond those specified in the Collection
interface, on the contracts of the iterator, add, remove,
equals,
and hashCode
methods."
For example, the iterator() method (for both the List and Collection
interfaces) returns an iterator over the elements in the collection.
For the Collection interface, there are no guarantees concerning the
order in which the elements are returned by the methods of the Iterator
object.
On the other hand, the iterator() method for the List
interface returns an iterator over the elements in the collection in proper
sequence, where the sequence is determined by the numeric index. In other
words, when you invoke the methods of the Iterator object on a List,
the elements will be returned in the proper sequence as determined by a numeric
index.
Similarly, according to Sun, the SortedSet interface "guarantees
that its iterator will traverse the set in ascending element order, sorted
according to the natural ordering of its elements (see Comparable), or by a
Comparator provided at sorted set creation time."
Summary
In this lesson you learned that all of the implementations
of the interfaces in the Java Collections Framework implement one of the
subinterfaces of the Collection interface. You learned that a Set
object cannot contain duplicate elements, but a List object can contain duplicate
elements.
You learned about the difference between ordered collections and sorted
collections. You also learned about ascending order and the natural
ordering of objects. In addition, you learned how more specialized
stipulations are placed on interfaces as you progress down the interface
inheritance hierarchy of the Java Collections Framework.
What's
Next?
The SortedSet interface "guarantees that its
iterator will traverse the set in ascending element order, sorted according to
the natural ordering of its elements (see Comparable), or by a Comparator
provided at sorted set creation time." In the next lesson, I
will show you how to use the Comparator interface to control the sorted
order of your collections.
Copyright 2000, Richard G. Baldwin. Reproduction in whole or in part
in any form or medium without express written permission from Richard Baldwin
is prohibited.
Richard
Baldwin is a college professor and private consultant whose primary
focus is a combination of Java and XML. In addition to the many
platform-independent benefits of Java applications, he believes that a
combination of Java and XML will become the primary driving force in the
delivery of structured information on the Web.
Richard has participated in numerous consulting projects involving Java,
XML, or a combination of the two. He frequently provides onsite Java
and/or XML training at the high-tech companies located in and around Austin,
Texas. He is the author of Baldwin's Java Programming Tutorials,
which has gained a worldwide following among experienced and aspiring Java
programmers. He has also published articles on Java Programming in Java Pro
magazine.
Richard holds an MSEE degree from Southern Methodist University and has
many years of experience in the application of computer technology to real-world
problems. |