CS #01 | Data Structure

2019-08-01 1. Computer Science Comments

Introduction

Linear Data Structure
- Array
- Linked List
- Stack
- Queue
Non-Linear Data Structure
- Tree
- Heap
- Hash
- Graph
Data Types vs Data Structures
- Data Type defines the kind of values a variable can hold and the operations that can be performed on those values (int, float, char, bool)
- Data Structure is a way of organizing and storing data in memory so that it can be accessed and manipulated efficiently.

1. Array

Definition : A data structure that stores a collection of elements (usually of the same type) in contiguous memory locations.
Characteristics
- Fixed size : Once created the size of an array cannot be changed (in low-level languages like c/c++)
- Same data type : All elements must be of the same type
- Contiguous memory : Elements are stored next to each other in memory, which makes access very fast
- Index-based access : Elements are accessed using indices
Operations
- Access, O(1) : the memory address can be calculated directly
- Update, O(1) : arr[i] = x
- Traversal, O(n) : loop through array
- Insertion/Deletion, O(n) : elements may need to be shifted
Advantages
- Very fast random access (by indexing)
- Easy to use and implement
Disadvantages
- Fixed size (cannot grow/shrink easily)
- Costly insertion and deletion (need shifting)
- Wasted memory if array is allocated larger than needed

arr = [1, 2, 3, 4]
print(arr[0]) # Access
arr[2] = 99 # Update
[print(x) for x in arr] # Traversal

2. Linked List

Definition : a data structure where elements (called nodes) are connected using pointers (links).
- Each node contains data (the value) and pointer (reference) to the next node in the list
Characteristics :
- Dynamic size : Can easily grow or shrink at runtime
- Non-contiguous memory : Nodes can be stored anywhere in memory
- Pointer based navigation : Accessing an element requires following pointers from the head node
Operations
- Access, O(n) : must traverse from head
- Update, O(1) : if node is known node.data=x
- Traversal, O(n)
- Insertion/Deletion, O(1) if position is known, otherwise O(n) to find the spot
Advantages
- Dynamic size
- Efficient insertions/deletions (no shifting required like arrays)
Disadvantages
- No random access (must traverse sequentially)
- Extra memory overhead (storing pointers)
- Cache unfriendly (nodes not stored contiguously)

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
# Cread nodes
head = Node(10)
head.next = Node(20)
head.next.next = Node(30)

# Traverse
current = head
while current:
    print(current.data)
    current = current.next

3. Stack

Definition : a linear data structure that follows the LIFO (Last In, First Out) principles
Characteristics
- LIFO : Last inserted elements is the first one removed
- Restricted Access : Only the top elemnet can be accessed, added, or removed
- Can be implemented using arrays or linked lists.
Operations
- Push, O(1)
- Pop, O(1)
- Traversal, O(n)
Advantages
- Simple to use and efficient for undo/redo, recursion, parsing, backtracking
- Fast operations (all O(1) for push,pop)
Disadvantages
- Limited access (only top elements is available)
- FIxed size if implemented using an array (unless dynamically resized)
Real-world Uses
- Undo/Redo in text editors
- Browser history (back button)

stack = []
# push
stack.append(10)
stack.append(20)
stack.append(30)
# pop
top = stack.pop()

4. Queue

Definition : a linear data structure that follows FIFO (First In, First Out) principles
Characteristics
- FIFO : First Inserted element is the first one removed
- Front : where elements are removed
- Rear : where elements are added
Operations
- Enqueue, O(1)
- Dequeue, O(1)
- Traversal, O(n)
Advantages
- Maintains order of processing
- Simple to implement
Disadvantages
- Random access is not allowed
Real-World uses
- Printers
- Customer Services Lines


"""
# double ended queue
It is highly efficient O(1). While a standard list's list.pop(0) takes time because it shifts
all remaining elements, deque is implemented as a doubly linked list and performs this instantly.

"""

from collections import deque

queue = deque()
# Enqueue
queue.append(10)
queue.append(20)
queue.append(30)
# Dequeue
front = queue.popleft()

5. Tree

Definition : A tree is a non-linear hierarchical data structure consisting of nodes and connected by edges
- A tree starts swith a special node called the root
- Each node can have child nodes (subtrees)
- Nodes without children are called leaves
Characteristics
- Hierarchical structure
- Root node at the top : every other node has exactly one parent
- Edges connect nodes
Operations
- Insertion : Depends on tree type
- Deletion : More complex
- Traversal & Search O(logn) : Visit nodes in a specific order, efficient in BST
  - BFS (Breadth First Search) : Level order
    - using queue (from collections import deque)
  - DFS (Depth First Search) : Preorder, Inorder, Postorder
    - Preorder : root -> left -> right
    - Inorder : left -> root -> right
    - Postorder : left -> right -> root
Advantages
- Represent hierarchical data naturally
- Enable fast search, insert, delete, in balanced trees O(logn)
Disadvantages
- More complex to implement than linear structures
- Performance can degrade if tree becomes unbalanced
Real-World Uses
- File Systems (folders, subfolders)
- Database (B-trees)
- API responses
- AI/ML (decision trees)
Variants
- Binary Search Tree (BST) : left < root < right
  - inorder traversal → sorted array

# Binary Tree
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
# Create nodes
root = Node(10)
root.left = Node(5)
root.right = Node(20)

# Traversal (preorder) - recursive
def preorder(node):
    if node:
        print(node.data)
        preorder(node.left)
        preorder(node.right)

# Traversal (inorder) - recursive
def inorder(node):
    if node:
        inorder(node.left)
        print(node.data)
        inorder(node.right)

preorder(root)
inorder(root)

6. Heap

Definition : Special complete binary tree that satisfies the heap propery
- Max Heap : Every parent node is greater than or equal to its children
- Min Heap : Every parent node is less than or equal to its children
Characteristics
- Complete Binary Tree : All levels are filled, except possibly the last
- Heap Property : Parent node compares consistently with children
- Efficient operations : Insertions and deletions keep the tree balanced
Operations
- Insert, O(logn): new element bubbled up to restore heap property
  - To maintain the complete binary tree structure
    - Insert the new element at the next available position
    - And then, restore the heap property
- Delete, O(logn): root removed, last element moved to root, then bubbled down
- Heapify, O(n) : build heap from an array
Advantages
- Guarantees fast access to min/max element
- Useful for priority-based tasks
- Efficient for sorting (Heap Sort)
Disadvantages
- Not good for fast searching arbitrary elements (only root is fast)
- Requires extra work to maintain heap property after insertions/deletions
Real world Uses
- Heap sort : efficient O(nlogn) sorting

import heapq
# Create a min-heap
heap = []
heapq.heappush(heap, 20)
heapq.heappush(heap, 5)
heapq.heappush(heap, 15)
print(heap) #[5, 20, 15]

# pop (remove smalles element)
print(heapq.heappop(heap))

7. Hash Map

Definition : A data structure that stores key-value pairs and provides fast access to values based on their keys.
Characteristics
- Key-Value Mapping : Each unique key maps to a value
- Hash Function : Converts a key into an index
- Collision Handling : Two different keys may hash to the same index.
- Dynamic Resizing : Often grows when load factor (items/buckets) gets too high
Operations
- Insert, O(1) average, O(n) worst-case
- Search, O(1) average, O(n) worst-case
- Delete, O(1)
- Traversal, O(n)
Advantages
- Extremely fast lookups on average, O(1)
- Great for dictionaries, caches, symbol tables.
- Flexible keys (can be strings, numbers, tuples)
Disadvantages
- Collisions degrade performance
- Unordered (no natural order of keys)
- Requires a good hash function to avoid clustering (Hbase)
  - When the hash function does not distribute keys evenly across the table, many keys end up in the same region (or in the same bucket)
  - A hash function takes a key and converts it into a number. That number is then mapped into the hash table’s array index
    - e.g. using modulo : hash(“apple”) → 9324983 → store at index(bucket) No.3
Real world Uses
- Databases (Indexing with hash-based indexes)

# create a hash table
hash = {}
# Insert
hash["apple"] = 10
hash["banana"] = 20
# Search
print(hash["apple"]
# Delete
del hash["apple"]
# Traverse
for key, value in hash.items():
    print(key, value)

8. Graph

Definition : A non-linear data structure consisting of a set of nodes (vertices) and a set of edges that connect pairs of nodes
Characteristics
- Vertices : represents entities
- Edges : represent relationships between vertices
- Directed vs Undirected : Edges have a direction (A → B) vs edges are two way (A - B)
- Weighted vs Unweighted : Edges have cost and weights vs All edges treated equally
- Cycles vs Acyclic
Operations
- Traversal
- Insert/Delete veritices and edges, O(1) to O(V^2)
- Search
  - DFS (Depth First Search) : Explore deeply first
    - stack or recursive() and [visited]
    - To explore all possible paths and record the characteristics of each path
  - BFS (Breadth First Search) : Explore level by level.
    - with queue (collections.deque) and [visited]
    - When the graph size is extremely large
Advantages
- Very flexible, can model complex relationships
- Efficient algorithms exist for pathfinding, connectivity, shortest paths
Disadvantages
- More complex than trees or arrays.
- Some operations can be expensive or large graphs
Real world Uses
- Social networks
- Recommendation Engines

from collections import defaultdict

class Graph:
    def __init__(self):
        self.graph = defaultdict(list)

    def add_edge(self, u, v):
        self.graph[u].append(v)


    def bfs(self, start):
        visited = set()
        queue = [start]
        while queue:
            node = queue.pop(0)
            if node not in visited:
                print(node, end=" ")
                visited.add(node)
                queue.extend(self.graph[node])
g = Graph()
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(1, 2)
g.add_edge(2, 0)
g.add_edge(2, 3)
g.bfs(2)  # 2, 0, 3, 1