CS #01 | Data Structure


Introduction

  • Linear Data Structure
    • Array
    • Linked List
    • Stack
    • Queue
  • Non-Linear Data Structure
    • Tree
    • Heap
    • Hash
    • Graph
  • Data Types vs Data Structures
    • Data Type defines the kind of values a variable can hold and the operations that can be performed on those values (int, float, char, bool)
    • Data Structure is a way of organizing and storing data in memory so that it can be accessed and manipulated efficiently.


1. Array

  • Definition : A data structure that stores a collection of elements (usually of the same type) in contiguous memory locations.
  • Characteristics
    • Fixed size : Once created the size of an array cannot be changed (in low-level languages like c/c++)
    • Same data type : All elements must be of the same type
    • Contiguous memory : Elements are stored next to each other in memory, which makes access very fast
    • Index-based access : Elements are accessed using indices
  • Operations
    • Access, O(1) : the memory address can be calculated directly
    • Update, O(1) : arr[i] = x
    • Traversal, O(n) : loop through array
    • Insertion/Deletion, O(n) : elements may need to be shifted
  • Advantages
    • Very fast random access (by indexing)
    • Easy to use and implement
  • Disadvantages
    • Fixed size (cannot grow/shrink easily)
    • Costly insertion and deletion (need shifting)
    • Wasted memory if array is allocated larger than needed
arr = [1, 2, 3, 4]
print(arr[0]) # Access
arr[2] = 99 # Update
[print(x) for x in arr] # Traversal


2. Linked List

  • Definition : a data structure where elements (called nodes) are connected using pointers (links).
    • Each node contains data (the value) and pointer (reference) to the next node in the list
  • Characteristics :
    • Dynamic size : Can easily grow or shrink at runtime
    • Non-contiguous memory : Nodes can be stored anywhere in memory
    • Pointer based navigation : Accessing an element requires following pointers from the head node
  • Operations
    • Access, O(n) : must traverse from head
    • Update, O(1) : if node is known node.data=x
    • Traversal, O(n)
    • Insertion/Deletion, O(1) if position is known, otherwise O(n) to find the spot
  • Advantages
    • Dynamic size
    • Efficient insertions/deletions (no shifting required like arrays)
  • Disadvantages
    • No random access (must traverse sequentially)
    • Extra memory overhead (storing pointers)
    • Cache unfriendly (nodes not stored contiguously)
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
# Cread nodes
head = Node(10)
head.next = Node(20)
head.next.next = Node(30)

# Traverse
current = head
while current:
    print(current.data)
    current = current.next


3. Stack

  • Definition : a linear data structure that follows the LIFO (Last In, First Out) principles
  • Characteristics
    • LIFO : Last inserted elements is the first one removed
    • Restricted Access : Only the top elemnet can be accessed, added, or removed
    • Can be implemented using arrays or linked lists.
  • Operations
    • Push, O(1)
    • Pop, O(1)
    • Traversal, O(n)
  • Advantages
    • Simple to use and efficient for undo/redo, recursion, parsing, backtracking
    • Fast operations (all O(1) for push,pop)
  • Disadvantages
    • Limited access (only top elements is available)
    • FIxed size if implemented using an array (unless dynamically resized)
  • Real-world Uses
    • Undo/Redo in text editors
    • Browser history (back button)
stack = []
# push
stack.append(10)
stack.append(20)
stack.append(30)
# pop
top = stack.pop()


4. Queue

  • Definition : a linear data structure that follows FIFO (First In, First Out) principles
  • Characteristics
    • FIFO : First Inserted element is the first one removed
    • Front : where elements are removed
    • Rear : where elements are added
  • Operations
    • Enqueue, O(1)
    • Dequeue, O(1)
    • Traversal, O(n)
  • Advantages
    • Maintains order of processing
    • Simple to implement
  • Disadvantages
    • Random access is not allowed
  • Real-World uses
    • Printers
    • Customer Services Lines

"""
# double ended queue
It is highly efficient O(1). While a standard list's list.pop(0) takes time because it shifts
all remaining elements, deque is implemented as a doubly linked list and performs this instantly.

"""

from collections import deque

queue = deque()
# Enqueue
queue.append(10)
queue.append(20)
queue.append(30)
# Dequeue
front = queue.popleft()


5. Tree

  • Definition : A tree is a non-linear hierarchical data structure consisting of nodes and connected by edges
    • A tree starts swith a special node called the root
    • Each node can have child nodes (subtrees)
    • Nodes without children are called leaves
  • Characteristics
    • Hierarchical structure
    • Root node at the top : every other node has exactly one parent
    • Edges connect nodes
  • Operations
    • Insertion : Depends on tree type
    • Deletion : More complex
    • Traversal & Search O(logn) : Visit nodes in a specific order, efficient in BST
      • BFS (Breadth First Search) : Level order
        • using queue (from collections import deque)
      • DFS (Depth First Search) : Preorder, Inorder, Postorder
        • Preorder : root -> left -> right
        • Inorder : left -> root -> right
        • Postorder : left -> right -> root
  • Advantages
    • Represent hierarchical data naturally
    • Enable fast search, insert, delete, in balanced trees O(logn)
  • Disadvantages
    • More complex to implement than linear structures
    • Performance can degrade if tree becomes unbalanced
  • Real-World Uses
    • File Systems (folders, subfolders)
    • Database (B-trees)
    • API responses
    • AI/ML (decision trees)
  • Variants
    • Binary Search Tree (BST) : left < root < right
      • inorder traversal → sorted array
# Binary Tree
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
# Create nodes
root = Node(10)
root.left = Node(5)
root.right = Node(20)

# Traversal (preorder) - recursive
def preorder(node):
    if node:
        print(node.data)
        preorder(node.left)
        preorder(node.right)

# Traversal (inorder) - recursive
def inorder(node):
    if node:
        inorder(node.left)
        print(node.data)
        inorder(node.right)

preorder(root)
inorder(root)



6. Heap

  • Definition : Special complete binary tree that satisfies the heap propery
    • Max Heap : Every parent node is greater than or equal to its children
    • Min Heap : Every parent node is less than or equal to its children
  • Characteristics
    • Complete Binary Tree : All levels are filled, except possibly the last
    • Heap Property : Parent node compares consistently with children
    • Efficient operations : Insertions and deletions keep the tree balanced
  • Operations
    • Insert, O(logn): new element bubbled up to restore heap property
      • To maintain the complete binary tree structure
        • Insert the new element at the next available position
        • And then, restore the heap property
    • Delete, O(logn): root removed, last element moved to root, then bubbled down
    • Heapify, O(n) : build heap from an array
  • Advantages
    • Guarantees fast access to min/max element
    • Useful for priority-based tasks
    • Efficient for sorting (Heap Sort)
  • Disadvantages
    • Not good for fast searching arbitrary elements (only root is fast)
    • Requires extra work to maintain heap property after insertions/deletions
  • Real world Uses
    • Heap sort : efficient O(nlogn) sorting
import heapq
# Create a min-heap
heap = []
heapq.heappush(heap, 20)
heapq.heappush(heap, 5)
heapq.heappush(heap, 15)
print(heap) #[5, 20, 15]

# pop (remove smalles element)
print(heapq.heappop(heap))



7. Hash Map

  • Definition : A data structure that stores key-value pairs and provides fast access to values based on their keys.
  • Characteristics
    • Key-Value Mapping : Each unique key maps to a value
    • Hash Function : Converts a key into an index
    • Collision Handling : Two different keys may hash to the same index.
    • Dynamic Resizing : Often grows when load factor (items/buckets) gets too high
  • Operations
    • Insert, O(1) average, O(n) worst-case
    • Search, O(1) average, O(n) worst-case
    • Delete, O(1)
    • Traversal, O(n)
  • Advantages
    • Extremely fast lookups on average, O(1)
    • Great for dictionaries, caches, symbol tables.
    • Flexible keys (can be strings, numbers, tuples)
  • Disadvantages
    • Collisions degrade performance
    • Unordered (no natural order of keys)
    • Requires a good hash function to avoid clustering (Hbase)
      • When the hash function does not distribute keys evenly across the table, many keys end up in the same region (or in the same bucket)
      • A hash function takes a key and converts it into a number. That number is then mapped into the hash table’s array index
        • e.g. using modulo : hash(“apple”) → 9324983 → store at index(bucket) No.3
  • Real world Uses
    • Databases (Indexing with hash-based indexes)
# create a hash table
hash = {}
# Insert
hash["apple"] = 10
hash["banana"] = 20
# Search
print(hash["apple"]
# Delete
del hash["apple"]
# Traverse
for key, value in hash.items():
    print(key, value)


8. Graph

  • Definition : A non-linear data structure consisting of a set of nodes (vertices) and a set of edges that connect pairs of nodes
  • Characteristics
    • Vertices : represents entities
    • Edges : represent relationships between vertices
    • Directed vs Undirected : Edges have a direction (A → B) vs edges are two way (A - B)
    • Weighted vs Unweighted : Edges have cost and weights vs All edges treated equally
    • Cycles vs Acyclic
  • Operations
    • Traversal
    • Insert/Delete veritices and edges, O(1) to O(V^2)
    • Search
      • DFS (Depth First Search) : Explore deeply first
        • stack or recursive() and [visited]
        • To explore all possible paths and record the characteristics of each path
      • BFS (Breadth First Search) : Explore level by level.
        • with queue (collections.deque) and [visited]
        • When the graph size is extremely large
  • Advantages
    • Very flexible, can model complex relationships
    • Efficient algorithms exist for pathfinding, connectivity, shortest paths
  • Disadvantages
    • More complex than trees or arrays.
    • Some operations can be expensive or large graphs
  • Real world Uses
    • Social networks
    • Recommendation Engines
from collections import defaultdict

class Graph:
    def __init__(self):
        self.graph = defaultdict(list)

    def add_edge(self, u, v):
        self.graph[u].append(v)


    def bfs(self, start):
        visited = set()
        queue = [start]
        while queue:
            node = queue.pop(0)
            if node not in visited:
                print(node, end=" ")
                visited.add(node)
                queue.extend(self.graph[node])
g = Graph()
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(1, 2)
g.add_edge(2, 0)
g.add_edge(2, 3)
g.bfs(2)  # 2, 0, 3, 1