(辞書とその実装: 二分探索木など)
Data Structures and Algorithms
8th lecture, November 14, 2024
https://www.sw.it.aoyama.ac.jp/2024/DA/lecture8.html
Martin J. Dürst
© 2009-24 MartinJ. Dürst 青山学院大学
- Leftovers and summary of last lecture
- Sorting algorithms faster than O(n log n)
- The dictionary ADT
- Binary trees and their traversal methods
- Binary search trees
- Balanced trees
- Stable sorting
- Sorting in various programming languages (C/Ruby)
- Quicksort is a very efficient algorithm for sorting
- In the worst case, quicksort is O(n2); on average, O(n log n)
- Quicksort is a good example for the use of average time complexity and randomized algorithms
- Implementing quicksort requires careful attention to many details
- Animation of many sorting algorithms shows their behavior and properties: sort.svg
- Sorting based on pairwise comparison is Ω(nlogn)
- One person sorts 5'000 pages
- 8 people together sort 30'000 pages
- 218341.368 seconds (⇒about 61 hours)
- 61010·103·1010 (units? way too big!)
- O(40000) (how many seconds could this be)
- Calulation of actual time backwards from big-O notation: 1second/operation, n=3000, O(n2) ⇒ 9'000'000 seconds?
- A O(n) algorithm (example: "5 seconds per page")
- For 8 people, having only one person work towards the end of the algorithm
- For humans, binary sorting is constraining (sorting into 3~20 parts is better)
- Using bubble sort (868 days without including breaks or sleep)
- Prepare 1010 boxes (problem: space, cost, distance for walking)
- Forgetting time for preparation, cleanup, breaks,...
- Submitting just a program, or just a list of numbers
- Report too short
- All sorting algorithms studied so far assume an arbitrary distribution of values
- Decisions are made by pairwise comparisons of values
→depth of decision tree is at least Ω(n log n) - If there is some knowledge about the value distribution, improvements are possible
- Extreme example: Integers from 1 to n
→Final place of data can be predicted exactly
→O(n) - This observation leads to two additional algorithms: bin sort and radix sort
(also called bucket sort)
Example: Sorting by student number
- Separate data into 10 parts using most significant digit
- Apply recursively to less significant digits
- To manage memory, split separation into two phases
(one_digit_stable_sort
in 8binradix.rb)- Calculate size of each part
- Move data items
- Complexity is O(n k), where k is the number of digits
- Implementation in Ruby:
conceptual_bin_sort
in 8binradix.rb
- Sort once for each digit, starting with the least significant digit
- No need to partition data
- A stable sorting method is necessary
- Complexity is O(n k), where k is the number of digits
- Implementation in Ruby:
radix_sort
in 8binradix.rb
Bin Sort | Radix Sort | |
---|---|---|
Complexity | O(n k) | O(n k) |
First digit sorted | most significant digit | least significant digit |
Last digit sorted | least significant digit | most significant digit |
Direction | → | ← |
Stable sort needed | No | Yes |
Data partitioning needed/possible | Yes | No |
- In practice, hybrid sorting algorithms are frequently used
- Example: Timsort
- Combination of merge sort, insertion sort, and several optimizations
- Efficient on many kinds of real input data (semi-ordered)
- Efficient for indirect data access
- Default implementation in Python, Java,...
- Recently, computers and CPUs do not get faster, but smaller, and larger in numbers
- Parallelization becomes important
- For some tasks, parallelization can be very difficult
- For some tasks, parallelization can be easy
- Many sorting algorithms are eazy to parallelize:
- Bubblesort
- Mergesort
- Quicksort
- Binsort
(caution: Dictionary ADT ≠ dictionary(book))
- For each data item, there is:
- A key (e.g. student number): Used to identify the data item, e.g. during search
- A value: All the information besides the key (may be empty, or very large)
- Operations
- New
- Insert
- Delete
- Search/find
- Sorted array: Search is O(log n)(binary search), insertion/deletion is O(n)
- Unordered array/linear list: Search and deletion are O(n), insertion is O(1)
- Ideally, search/insertion/deletion should all be O(log n) or even O(1)
- Binary search tree (this lecture)
- Balanced tree (lecture 9)
- Hashing (lecture 10)
- Graph: Consisting of nodes and edges
- Tree: The root does not have any parent; all other nodes have exactly one parent
- Binary tree: Each node has ≦2 children
(similar to complete binary tree, but layers do not have to be full or left-aligned)
- Depth first
- Preorder
- Inorder
- Postorder
- Breadth first
- Binary tree
- Each node contains one data item
- For any node with key k:
- All the keys in the left subtree will be <k (or ≦k)
- All the keys in the right subtree will be >k (or ≧k)
⇒Inorder traversal produces keys in increasing order
- What to do with multiple identical keys is implementation-dependent
- Start searching from the root node
- If the search key, compared to the current node key, is:
- The same: Return the data item at the current node
- Smaller: Search in left subtree (recursion)
- Greater: Search the right subtree (recursion)
- The empty node: Terminate seach (not found!)
- Start insertion from the root node
- If the inserted key, compared to the current node key, is:
- Smaller: Insert item into left subtree (recursion)
- Greater: Insert item into right subtree (recursion)
- The same: Give up/replace/insert into right subtree, ... (implementation dependent)
- If the current node is empty: Insert item here as a new node (with two empty nodes as children)
- Find the node to delete (same as search)
- If the number of (real, non-empty) children is:
- 0: Delete current node (replace with empty node)
- 1: Replace current node with child
- 2: Replace current node with smallest child in the right subtree
(or biggest child in left subtree)
- Share a single special node (
NilNode
) wherever there is no child node - Pseudocode/implementation in Ruby: 8bintree.rb
- Execution speed depends on the height of the tree
- Best height is O(log n)
- Worst height is O(n)
- Average height is O(log n)
(assuming that all input orders have the same probability)
- In the worst case, the shape of a general search tree is the same as the shape of a linear list
- The order of insertions/deletions cannot be changed
- An algorithm where data gets available little-by-little is are called an online algorithm
- An algorithm where all data is available at the start is called an offline algorithm
- Therefore, selecting the dividing item using randomization is impossible
- In the case of a complete binary tree, insertion/deletion take too much time
Solution: A tree that is to some degree (but not perfectly) balanced
(definition/invariants)
- The number of children for each node is 2, 3, or 4
- If a node has k children, it stores k-1 keys and data items
(if the number of children is 2, then this is the same as a binary search tree node) - The keys stored in a node are the separators for the subtrees
- The tree is of uniform height
- In the lowest layer of the tree, the nodes have no children
- Bin sort and radix sort are O(n k)
- A dictionary is an ADT storing values that can be found using keys
- A binary search tree is a way to implement a dictionary
- Operations on binary search trees are O(log n) on average, but O(n) in the worst case
- This problem can be addressed using balanced trees
(no need to submit)
- Calculate the minimum and maximum height h of a binary search tree with n data items
- Calculate the minimum and maximum height of a 2-3-4 tree with n data items
- Using various examples, think about how to insert items into a 2-3-4 tree and propose an algorithm
- bin sort
- ビンソート
- most significant digit
- 最上位の桁
- radix sort
- 基数整列
- least significant digit
- 最下位の桁
- balanced tree
- 平衡木
- traversal method
- 辿り方
- binary search tree
- 二分探索木
- balanced tree
- 平衡木
- hashing
- ハッシュ法
- binary tree
- 二分木
- parallelization
- 並列化
- depth first
- 深さ優先
- preorder
- 行きがけ順
- inorder
- 通りがけ順
- postorder
- 帰りがけ順
- breadth first
- 幅優先
- key
- キー、鍵
- implementation-dependent
- 実装依存
- top-down 2-3-4 tree
- トップダウン 2-3-4 木
- uniform height
- 一定の高さ