The soft corner...

Wednesday, October 27, 2010

Data Structures : Just a start towards long journey...

We, the so called "Future Computer Scientists or Software Engineeres", who are going to change a lot of technical things in near future know very well that computers operate on bits, but we do not usually think in these terms; sincerely speaking, many a times we don't like to. Although we know that integer is a sequence of 32 bits(not necessarily), we prefer seeing it from our point of view as an entity with it's own individuality. This individuality of integer get reflected on operations that can be performed on integers but not on variables of different types.

When we talk about this indivduality, three same looking terms “data types”, “data structures” and “abstract data type” comes in our mind but clearly stating, all these terms have different meanings. So why don't we have a look on simple meanings and definitions of these terms. First, data type in a programming language states the set of values that a variable of that type may assume, operations that can be performed in that variable and the way the values of that type are stored. Second, an abstract data type is a mathematical model, packaged with various operations defined on that model. An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations. Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms. We shall design algorithms in terms of ADT's and we generally do, but to implement an algorithm in a given programming language we must find some way of representing the ADT's in terms of the data types and operators supported by language itself. To represent the mathematical model underlaying an ADT, we use “data structures”, which are collection of variables, possibly of several data types, connected in different ways. Formaly speaking, a “data structure” is a particular way of storing and organizing data in a computer so that it can be used efficiently.

Basic data types which we generally work on and see their use in daily life are array, stack, queue, link lists, tree, graphs. These data structure are furthur divided into sub-branches and midified for special purposes. Some different and advanced data structures are also added into this list. After covering basic implementation and definitions of these data structures, we will discuss about some advanced ones.

Array data structure is an arrangement of items at equally spaced addresses in computer memory. These all items are most of the times of same kind or are enveloped in same kind objects. Arrays are generally used where a list or grouping of items is needed. We can easily retrieve, add, delete and change elements in array as arrays have there elements at contiguous memory locations in same size cells. We can access these cells in any order using indexing. As array is my favorite data structure, I will surely be taking its side. With small or sometimes more difficulty, we can implement almost all data structures using arrays. A common problem with array is that it is a static data type and its size should be knoen n advance. Link list does not suffer with this problem and generally prefered because of this reason only.

Linked list is a data structure, that consists of a sequence of data records such that in each record there is a field that contains a reference or link to the next record of the sequence. Like arrays, they are also very simple and common data structures and can help in implementing almost all data structures. Although they do not provide random access to the elements but this drawback is overpowered by its dynamicity, which allow adding or deleting any number of items from list. Beacuse of being dynamic nature and no need of continuos memory locations, link list are generally preferred to implement data structure like trees and graphs. Link list are further divided into streams because of number of links in a record and the direction they point.

Stack is a data structure which is based on the principle of Last In First Out. Stack and queues play a very important role in system level tasks and are most involved in one of the most important features of operating system, i.e. Process Management. We see a lot of practical life usage of stack. Most common example of stack is set of plates placed one over another. We can place and pick plate from the top only.

Queue is a data structure which is based on the principle of First In First Out but modified implementation of queue results in some more data structures like Double Ended Queue, Circular Queue and Priority Queue. Most common example of simple queue is any queue for say movie tickets or line of Pepsi bottles in packing module of factory. A new item always comes at end any element is taken from the other end only. The order of taking elements out can be changed and it can result in queues with different behaviour. Ends at which new items can be appended and items can be removed also result in different behaviour and different types of queues.

Tree, Graph and Linked lists are very close to each other and share a basic imlementation related concept that each of these have links in one record refering to other ones. They are different in number of links and usage generally, but we can clearly say that trees are some special cases of graphs. Different properties of trees result into different type of trees and these properties can be number of links or the way data is stored and ordered in the records or say nodes of data structures. Graphs also vary because of some similar kind of properties

There are some more data structures which are hybrid or say modified ones of what we discussed till now. These are made using these data structures and their crossover. These include Hash tables, Different types of trees like BST, AVL Tree, Red Back Tree, B Tree, B+ Tree, B* Tree, B# Tree, Splay Tree and this list of trees goes on. If one just thinks about the number of all these data structures, I came to know about a fairly large collection. We have nearby 15 types of arrays, 10 types of lists, among trees there are 20 binary trees, 10 B trees, 15 heaps, 10 tries, 10 multiway trees, 25 space partitioning trees, 8-10 application specific trees, 10 hashes, 5 graphs and I am still not confident about the completion of the list.

One of specialized branch of these data structures is Concurrent data structures. Concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads or processes on a computer. At old times, such data structures were used on uniprocessor machines with operating systems supporting multithreading. Concurrent data structures, intended for use in parallel or distributed computing environments, differ from Sequential data structures, intended for use on uniprocessor machine. Main difference is that in a sequential environment one specifies the data structure's properties and checks that they are implemented correctly, by providing safety properties. In a concurrent environment, the specification must also describe liveness properties which an implementation must provide. Safety properties usually state that something bad never happens, while liveness properties state that something good keeps happening. Concurrent data structures are significantly more difficult to design and to verify as being correct than their sequential counterparts. This additional difficulty comes because of concurrency, exacerbated by the fact that threads must be thought of as being completely asynchronous.

After concurrency, these comes some special large size data structures which are meant for speial purposes assigned to them like, file structure handling, memory management, database management or working on large mathematical equations. These data structures mostly include trees like B+ Tree, R Trees, Segment Tree, Disjoint set data structures.

While we talk about this data structures, we can't just leave without having a look on the use of these data structures and places where they can best used. Arrays are generally used to store similar kind or ordered or unordered information. It mostly contains of basic data types. Lists are used as counterparts of arrays but with facility of dynamicity. Stack is used in operating system for process management and to handle recursion. We can handle recursion ourselves and do it with help of iteration and using stacks. Queue is also used by operating system for process scheduling and process synchronization. Trees are used to make decisions, choose among options, for database applications and for memory management. Graphs are mostly related to network related problems and also used in social networking also.

Though this not everything about data structures, an interested person can see the vast area of data structures and their usage. One simple statement is that almost any problem or process make use of data structures in direct or indirect way.

Thursday, August 20, 2009

Object-Oriented Programming...

Object-Oriented Programming(OOP) uses "Objects" – data structures consisting of datafields and methods – and their interactions to design applications and computer programs. Programming techniques may include features such as Information Hiding, Data Abstraction, Encapsulation, Modularity, Polymorphism, and Inheritance. It was not commonly used in mainstream software application development until the early 1990s. Many modern programming languages now support OOP.

One of the principal advantages of object-oriented programming techniques over procedural programming techniques is that they enable programmers to create modules that do not need to be changed when a new type of object is added. A programmer can simply create a new object that inherits many of its features from existing objects. This makes object-oriented programs easier to modify.

The methodology focuses on data rather than processes, with programs composed of self-sufficient modules (objects) each containing all the information needed to manipulate its own data structure. This is in contrast to the existing modular programming which had been dominant for many years that focused on the function of a module, rather than specifically the data, but equally provided for code reuse, and self-sufficient reusable units of programming logic, enabling collaboration through the use of linked modules (subroutines). This more conventional approach, which still persists, tends to consider data and behavior separately.

An object-oriented program may thus be viewed as a collection of cooperating objects, as opposed to the conventional model, in which a program is seen as a list of tasks (subroutines) to perform.

OOP can be used to translate from real-world phenomena to program elements (and vice versa). OOP was even invented for the purpose of physical modeling in the Simula-67 programming language. However, not everyone agrees that direct real-world mapping is facilitated by OOP, or is even a worthy goal; Bertrand Meyer argues in Object-Oriented Software Construction that a program is not a model of the world but a model of some part of the world; "Reality is a cousin twice removed". At the same time, some principal limitations of OOP had been noted.

However, Niklaus Wirth said of OOP in his paper "Good Ideas through the Looking Glass", "This paradigm closely reflects the structure of systems 'in the real world', and it is therefore well suited to model complex systems with complex behaviours.

Thursday, July 30, 2009

Structured Programming - Little Introduction

Structured Programming (sometimes known as Modular Programming) is a subset of Procedural Programming that enforces a logical structure on the program being written to make it more efficient and easier to understand and modify.

Structured programming was first suggested by Corrado Bohm and Guiseppe Jacopini. The two mathematicians demonstrated that any computer program can be written with just three structures: Decisions, Sequences and Loops.

1) In "Decision" one of a number of statements is executed depending on the state of the program. This is usually expressed with keywords such as if..then..else..endif, switch, or case.

2) "Sequence" refers to an ordered execution of statements.

3) In "Loop" a statement is executed until the program reaches a certain state or operations are applied to every element of a collection. This is usually expressed with keywords such as while, repeat, for or do..until. Often it is recommended that each loop should only have one entry point (and in the original structural programming, also only one exit point), and a few languages enforce this.

Structured programming frequently (but not always) employs a top-down design model, in which developers map out the overall program structure into separate subsections. A defined function or set of similar functions is coded in a separate module or submodule, which means that code can be loaded into memory more efficiently and that modules can be reused in other programs. After a module has been tested individually, it is then integrated with other modules into the overall program structure.

It is possible to do structured programming in almost any procedural programming language, but since about 1970 when structured programming began to gain popularity as a technique, most new procedural programming languages have included features to encourage structured programming, (and sometimes have left out features that would make unstructured programming easy). Some of famous structured languages are ALGOL, PASCAL, Ada, C etc.

Coders should break larger pieces of code into shorter subroutines (functions and procedures in some languages) that are small enough to be understood easily. In general, programs should use global variables sparingly; instead, subroutines should use local variables and take arguments by either value or reference. These techniques help to make isolated small pieces of code easier to understand without having to understand the whole program at once.

Strictly speaking, in a structured programming language, any code structure should have only one entry point and one point of exit; many languages such as C allow multiple paths to a structure's exit (such as "continue", "break", and "return"), which can bring both advantages and disadvantages in readability.

So, Happy Programming.....