Introduction
- Linear Data Structure
- Array
- Linked List
- Stack
- Queue
- Non-Linear Data Structure
- Data Types vs Data Structures
- Data Type defines the kind of values a variable can hold and the operations that can be performed on those values (int, float, char, bool)
- Data Structure is a way of organizing and storing data in memory so that it can be accessed and manipulated efficiently.
1. Array
- Definition : A data structure that stores a collection of elements (usually of the same type) in contiguous memory locations.
- Characteristics
- Fixed size : Once created the size of an array cannot be changed (in low-level languages like c/c++)
- Same data type : All elements must be of the same type
- Contiguous memory : Elements are stored next to each other in memory, which makes access very fast
- Index-based access : Elements are accessed using indices
- Operations
- Access, O(1) : the memory address can be calculated directly
- Update, O(1) :
arr[i] = x
- Traversal, O(n) : loop through array
- Insertion/Deletion, O(n) : elements may need to be shifted
- Advantages
- Very fast random access (by indexing)
- Easy to use and implement
- Disadvantages
- Fixed size (cannot grow/shrink easily)
- Costly insertion and deletion (need shifting)
- Wasted memory if array is allocated larger than needed
arr = [1, 2, 3, 4]
print(arr[0]) # Access
arr[2] = 99 # Update
[print(x) for x in arr] # Traversal
2. Linked List
- Definition : a data structure where elements (called nodes) are connected using pointers (links).
- Each node contains data (the value) and pointer (reference) to the next node in the list
- Characteristics :
- Dynamic size : Can easily grow or shrink at runtime
- Non-contiguous memory : Nodes can be stored anywhere in memory
- Pointer based navigation : Accessing an element requires following pointers from the head node
- Operations
- Access, O(n) : must traverse from head
- Update, O(1) : if node is known
node.data=x
- Traversal, O(n)
- Insertion/Deletion, O(1) if position is known, otherwise O(n) to find the spot
- Advantages
- Dynamic size
- Efficient insertions/deletions (no shifting required like arrays)
- Disadvantages
- No random access (must traverse sequentially)
- Extra memory overhead (storing pointers)
- Cache unfriendly (nodes not stored contiguously)
class Node:
def __init__(self, data):
self.data = data
self.next = None
# Cread nodes
head = Node(10)
head.next = Node(20)
head.next.next = Node(30)
# Traverse
current = head
while current:
print(current.data)
current = current.next
3. Stack
- Definition : a linear data structure that follows the LIFO (Last In, First Out) principles
- Characteristics
- LIFO : Last inserted elements is the first one removed
- Restricted Access : Only the top elemnet can be accessed, added, or removed
- Can be implemented using arrays or linked lists.
- Operations
- Push, O(1)
- Pop, O(1)
- Traversal, O(n)
- Advantages
- Simple to use and efficient for undo/redo, recursion, parsing, backtracking
- Fast operations (all O(1) for push,pop)
- Disadvantages
- Limited access (only top elements is available)
- FIxed size if implemented using an array (unless dynamically resized)
- Real-world Uses
- Undo/Redo in text editors
- Browser history (back button)
stack = []
# push
stack.append(10)
stack.append(20)
stack.append(30)
# pop
top = stack.pop()
4. Queue
- Definition : a linear data structure that follows FIFO (First In, First Out) principles
- Characteristics
- FIFO : First Inserted element is the first one removed
- Front : where elements are removed
- Rear : where elements are added
- Operations
- Enqueue, O(1)
- Dequeue, O(1)
- Traversal, O(n)
- Advantages
- Maintains order of processing
- Simple to implement
- Disadvantages
- Random access is not allowed
- Real-World uses
- Printers
- Customer Services Lines
"""
# double ended queue
It is highly efficient O(1). While a standard list's list.pop(0) takes time because it shifts
all remaining elements, deque is implemented as a doubly linked list and performs this instantly.
"""
from collections import deque
queue = deque()
# Enqueue
queue.append(10)
queue.append(20)
queue.append(30)
# Dequeue
front = queue.popleft()
5. Tree
- Definition : A tree is a non-linear hierarchical data structure consisting of nodes and connected by edges
- A tree starts swith a special node called the root
- Each node can have child nodes (subtrees)
- Nodes without children are called leaves
- Characteristics
- Hierarchical structure
- Root node at the top : every other node has exactly one parent
- Edges connect nodes
- Operations
- Insertion : Depends on tree type
- Deletion : More complex
- Traversal & Search O(logn) : Visit nodes in a specific order, efficient in BST
- BFS (Breadth First Search) : Level order
- using queue (
from collections import deque)
- DFS (Depth First Search) : Preorder, Inorder, Postorder
- Preorder : root -> left -> right
- Inorder : left -> root -> right
- Postorder : left -> right -> root
- Advantages
- Represent hierarchical data naturally
- Enable fast search, insert, delete, in balanced trees O(logn)
- Disadvantages
- More complex to implement than linear structures
- Performance can degrade if tree becomes unbalanced
- Real-World Uses
- File Systems (folders, subfolders)
- Database (B-trees)
- API responses
- AI/ML (decision trees)
- Variants
- Binary Search Tree (BST) : left < root < right
- inorder traversal → sorted array
# Binary Tree
class Node:
def __init__(self, data):
self.data = data
self.left = None
self.right = None
# Create nodes
root = Node(10)
root.left = Node(5)
root.right = Node(20)
# Traversal (preorder) - recursive
def preorder(node):
if node:
print(node.data)
preorder(node.left)
preorder(node.right)
# Traversal (inorder) - recursive
def inorder(node):
if node:
inorder(node.left)
print(node.data)
inorder(node.right)
preorder(root)
inorder(root)
6. Heap
- Definition : Special complete binary tree that satisfies the heap propery
- Max Heap : Every parent node is greater than or equal to its children
- Min Heap : Every parent node is less than or equal to its children
- Characteristics
- Complete Binary Tree : All levels are filled, except possibly the last
- Heap Property : Parent node compares consistently with children
- Efficient operations : Insertions and deletions keep the tree balanced
- Operations
- Insert, O(logn): new element bubbled up to restore heap property
- To maintain the complete binary tree structure
- Insert the new element at the next available position
- And then, restore the heap property
- Delete, O(logn): root removed, last element moved to root, then bubbled down
- Heapify, O(n) : build heap from an array
- Advantages
- Guarantees fast access to min/max element
- Useful for priority-based tasks
- Efficient for sorting (Heap Sort)
- Disadvantages
- Not good for fast searching arbitrary elements (only root is fast)
- Requires extra work to maintain heap property after insertions/deletions
- Real world Uses
- Heap sort : efficient O(nlogn) sorting
import heapq
# Create a min-heap
heap = []
heapq.heappush(heap, 20)
heapq.heappush(heap, 5)
heapq.heappush(heap, 15)
print(heap) #[5, 20, 15]
# pop (remove smalles element)
print(heapq.heappop(heap))
7. Hash Map
- Definition : A data structure that stores key-value pairs and provides fast access to values based on their keys.
- Characteristics
- Key-Value Mapping : Each unique key maps to a value
- Hash Function : Converts a key into an index
- Collision Handling : Two different keys may hash to the same index.
- Dynamic Resizing : Often grows when load factor (items/buckets) gets too high
- Operations
- Insert, O(1) average, O(n) worst-case
- Search, O(1) average, O(n) worst-case
- Delete, O(1)
- Traversal, O(n)
- Advantages
- Extremely fast lookups on average, O(1)
- Great for dictionaries, caches, symbol tables.
- Flexible keys (can be strings, numbers, tuples)
- Disadvantages
- Collisions degrade performance
- Unordered (no natural order of keys)
- Requires a good hash function to avoid clustering (Hbase)
- When the hash function does not distribute keys evenly across the table, many keys end up in the same region (or in the same bucket)
- A hash function takes a key and converts it into a number. That number is then mapped into the hash table’s array index
- e.g. using modulo : hash(“apple”) → 9324983 → store at index(bucket) No.3
- Real world Uses
- Databases (Indexing with hash-based indexes)
# create a hash table
hash = {}
# Insert
hash["apple"] = 10
hash["banana"] = 20
# Search
print(hash["apple"]
# Delete
del hash["apple"]
# Traverse
for key, value in hash.items():
print(key, value)
8. Graph
- Definition : A non-linear data structure consisting of a set of nodes (vertices) and a set of edges that connect pairs of nodes
- Characteristics
- Vertices : represents entities
- Edges : represent relationships between vertices
- Directed vs Undirected : Edges have a direction (A → B) vs edges are two way (A - B)
- Weighted vs Unweighted : Edges have cost and weights vs All edges treated equally
- Cycles vs Acyclic
- Operations
- Traversal
- Insert/Delete veritices and edges, O(1) to O(V^2)
- Search
- DFS (Depth First Search) : Explore deeply first
- stack or recursive() and [visited]
- To explore all possible paths and record the characteristics of each path
- BFS (Breadth First Search) : Explore level by level.
- with queue (collections.deque) and [visited]
- When the graph size is extremely large
- Advantages
- Very flexible, can model complex relationships
- Efficient algorithms exist for pathfinding, connectivity, shortest paths
- Disadvantages
- More complex than trees or arrays.
- Some operations can be expensive or large graphs
- Real world Uses
- Social networks
- Recommendation Engines
from collections import defaultdict
class Graph:
def __init__(self):
self.graph = defaultdict(list)
def add_edge(self, u, v):
self.graph[u].append(v)
def bfs(self, start):
visited = set()
queue = [start]
while queue:
node = queue.pop(0)
if node not in visited:
print(node, end=" ")
visited.add(node)
queue.extend(self.graph[node])
g = Graph()
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(1, 2)
g.add_edge(2, 0)
g.add_edge(2, 3)
g.bfs(2) # 2, 0, 3, 1