Wednesday, October 27, 2010

Data Structures : Just a start towards long journey...

We, the so called "Future Computer Scientists or Software Engineeres", who are going to change a lot of technical things in near future know very well that computers operate on bits, but we do not usually think in these terms; sincerely speaking, many a times we don't like to. Although we know that integer is a sequence of 32 bits(not necessarily), we prefer seeing it from our point of view as an entity with it's own individuality. This individuality of integer get reflected on operations that can be performed on integers but not on variables of different types.

When we talk about this indivduality, three same looking terms “data types”, “data structures” and “abstract data type” comes in our mind but clearly stating, all these terms have different meanings. So why don't we have a look on simple meanings and definitions of these terms. First, data type in a programming language states the set of values that a variable of that type may assume, operations that can be performed in that variable and the way the values of that type are stored. Second, an abstract data type is a mathematical model, packaged with various operations defined on that model. An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations. Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms. We shall design algorithms in terms of ADT's and we generally do, but to implement an algorithm in a given programming language we must find some way of representing the ADT's in terms of the data types and operators supported by language itself. To represent the mathematical model underlaying an ADT, we use “data structures”, which are collection of variables, possibly of several data types, connected in different ways. Formaly speaking, a “data structure” is a particular way of storing and organizing data in a computer so that it can be used efficiently.

Basic data types which we generally work on and see their use in daily life are array, stack, queue, link lists, tree, graphs. These data structure are furthur divided into sub-branches and midified for special purposes. Some different and advanced data structures are also added into this list. After covering basic implementation and definitions of these data structures, we will discuss about some advanced ones.

Array data structure is an arrangement of items at equally spaced addresses in computer memory. These all items are most of the times of same kind or are enveloped in same kind objects. Arrays are generally used where a list or grouping of items is needed. We can easily retrieve, add, delete and change elements in array as arrays have there elements at contiguous memory locations in same size cells. We can access these cells in any order using indexing. As array is my favorite data structure, I will surely be taking its side. With small or sometimes more difficulty, we can implement almost all data structures using arrays. A common problem with array is that it is a static data type and its size should be knoen n advance. Link list does not suffer with this problem and generally prefered because of this reason only.

Linked list is a data structure, that consists of a sequence of data records such that in each record there is a field that contains a reference or link to the next record of the sequence. Like arrays, they are also very simple and common data structures and can help in implementing almost all data structures. Although they do not provide random access to the elements but this drawback is overpowered by its dynamicity, which allow adding or deleting any number of items from list. Beacuse of being dynamic nature and no need of continuos memory locations, link list are generally preferred to implement data structure like trees and graphs. Link list are further divided into streams because of number of links in a record and the direction they point.

Stack is a data structure which is based on the principle of Last In First Out. Stack and queues play a very important role in system level tasks and are most involved in one of the most important features of operating system, i.e. Process Management. We see a lot of practical life usage of stack. Most common example of stack is set of plates placed one over another. We can place and pick plate from the top only.

Queue is a data structure which is based on the principle of First In First Out but modified implementation of queue results in some more data structures like Double Ended Queue, Circular Queue and Priority Queue. Most common example of simple queue is any queue for say movie tickets or line of Pepsi bottles in packing module of factory. A new item always comes at end any element is taken from the other end only. The order of taking elements out can be changed and it can result in queues with different behaviour. Ends at which new items can be appended and items can be removed also result in different behaviour and different types of queues.

Tree, Graph and Linked lists are very close to each other and share a basic imlementation related concept that each of these have links in one record refering to other ones. They are different in number of links and usage generally, but we can clearly say that trees are some special cases of graphs. Different properties of trees result into different type of trees and these properties can be number of links or the way data is stored and ordered in the records or say nodes of data structures. Graphs also vary because of some similar kind of properties

There are some more data structures which are hybrid or say modified ones of what we discussed till now. These are made using these data structures and their crossover. These include Hash tables, Different types of trees like BST, AVL Tree, Red Back Tree, B Tree, B+ Tree, B* Tree, B# Tree, Splay Tree and this list of trees goes on. If one just thinks about the number of all these data structures, I came to know about a fairly large collection. We have nearby 15 types of arrays, 10 types of lists, among trees there are 20 binary trees, 10 B trees, 15 heaps, 10 tries, 10 multiway trees, 25 space partitioning trees, 8-10 application specific trees, 10 hashes, 5 graphs and I am still not confident about the completion of the list.

One of specialized branch of these data structures is Concurrent data structures. Concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads or processes on a computer. At old times, such data structures were used on uniprocessor machines with operating systems supporting multithreading. Concurrent data structures, intended for use in parallel or distributed computing environments, differ from Sequential data structures, intended for use on uniprocessor machine. Main difference is that in a sequential environment one specifies the data structure's properties and checks that they are implemented correctly, by providing safety properties. In a concurrent environment, the specification must also describe liveness properties which an implementation must provide. Safety properties usually state that something bad never happens, while liveness properties state that something good keeps happening. Concurrent data structures are significantly more difficult to design and to verify as being correct than their sequential counterparts. This additional difficulty comes because of concurrency, exacerbated by the fact that threads must be thought of as being completely asynchronous.

After concurrency, these comes some special large size data structures which are meant for speial purposes assigned to them like, file structure handling, memory management, database management or working on large mathematical equations. These data structures mostly include trees like B+ Tree, R Trees, Segment Tree, Disjoint set data structures.

While we talk about this data structures, we can't just leave without having a look on the use of these data structures and places where they can best used. Arrays are generally used to store similar kind or ordered or unordered information. It mostly contains of basic data types. Lists are used as counterparts of arrays but with facility of dynamicity. Stack is used in operating system for process management and to handle recursion. We can handle recursion ourselves and do it with help of iteration and using stacks. Queue is also used by operating system for process scheduling and process synchronization. Trees are used to make decisions, choose among options, for database applications and for memory management. Graphs are mostly related to network related problems and also used in social networking also.

Though this not everything about data structures, an interested person can see the vast area of data structures and their usage. One simple statement is that almost any problem or process make use of data structures in direct or indirect way.