Data Structures and Algorithms: Dictionaries and their Implementation: Binary Trees, ... (2024)

(辞書とその実装: 二分探索木など)

Data Structures and Algorithms

8th lecture, November 14, 2024

https://www.sw.it.aoyama.ac.jp/2024/DA/lecture8.html

Martin J. Dürst

© 2009-24 Martin J. Dürst, Aoyama Gakuin University

Today's schedule:

  • Leftovers and summary of last lecture
  • Sorting algorithms faster than O(n log n)
  • The dictionary ADT
  • Binary trees and their traversal methods
  • Binary search trees
  • Balanced trees

Leftovers of last lecture:

  • Stable sorting
  • Sorting in various programming languages (C/Ruby)

Summary of last lecture:

  • Quicksort is a very efficient sorting algorithm
  • In the worst case, quicksort is O(n²); on average, it is O(n log n)
  • Quicksort is a good example of the use of average time complexity and of randomized algorithms
  • Implementing quicksort requires careful attention to many details
  • An animation of many sorting algorithms shows their behavior and properties: sort.svg
  • Sorting based on pairwise comparisons is Ω(n log n)

Comments on the previous homework (estimating the time to sort pages):

  1. One person sorts 5'000 pages
  2. 8 people together sort 30'000 pages

Problems observed in the submitted answers:
  • 218341.368 seconds (⇒about 61 hours)
  • 6·10¹⁰·10³·10¹⁰ (units? way too big!)
  • O(40000) (how many seconds could this be?)
  • Calculation of actual time backwards from big-O notation: 1 second/operation, n = 3000, O(n²) ⇒ 9'000'000 seconds?
  • An O(n) algorithm (example: "5 seconds per page")
  • For 8 people, having only one person work towards the end of the algorithm
  • For humans, binary sorting is constraining (sorting into 3~20 parts is better)
  • Using bubble sort (868 days without including breaks or sleep)
  • Prepare 10¹⁰ boxes (problem: space, cost, distance for walking)
  • Forgetting time for preparation, cleanup, breaks,...
  • Submitting just a program, or just a list of numbers
  • Report too short

Sorting faster than O(n log n):

  • All sorting algorithms studied so far assume an arbitrary distribution of values
  • Decisions are made by pairwise comparisons of values
    → the depth of the decision tree is at least Ω(n log n) (see the short derivation after this list)
  • If there is some knowledge about the value distribution, improvements are possible
  • Extreme example: Integers from 1 to n
    →Final place of data can be predicted exactly
    →O(n)
  • This observation leads to two additional algorithms: bin sort and radix sort
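
For reference, the Ω(n log n) bound mentioned above follows from the standard decision-tree argument (a sketch, not part of the original slides): a comparison-based algorithm must distinguish all n! possible input orders, so its decision tree needs at least n! leaves, and a binary tree with n! leaves has height at least log₂(n!). In LaTeX:

    h \;\ge\; \log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i
      \;\ge\; \frac{n}{2} \log_2 \frac{n}{2}
      \;=\; \Omega(n \log n)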

Bin sort (also called bucket sort)

Example: Sorting by student number

  • Separate data into 10 parts using most significant digit
  • Apply recursively to less significant digits
  • To manage memory, split separation into two phases
    (one_digit_stable_sort in 8binradix.rb)
    1. Calculate size of each part
    2. Move data items
  • Complexity is O(n k), where k is the number of digits
  • Implementation in Ruby: conceptual_bin_sort in 8binradix.rb (a simplified sketch follows below)
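
As an illustration only (not the course's conceptual_bin_sort), here is a minimal Ruby sketch of recursive bin sort, assuming keys are non-negative integers with a fixed number of decimal digits:

    # Sketch of recursive bin sort (bucket sort) on fixed-width decimal keys.
    # Not the course implementation (8binradix.rb); keys are assumed to be
    # non-negative integers with at most `digits` decimal digits.
    def bin_sort(items, digits)
      return items if items.size <= 1 || digits == 0
      divisor = 10**(digits - 1)                  # selects the most significant digit
      buckets = Array.new(10) { [] }
      items.each { |x| buckets[(x / divisor) % 10] << x }
      # Recursively sort each bucket on the remaining, less significant digits.
      buckets.flat_map { |bucket| bin_sort(bucket, digits - 1) }
    end

    p bin_sort([30125, 15023, 30124, 15099], 5)   # => [15023, 15099, 30124, 30125]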

Radix sort:

  • Sort once for each digit, starting with the least significant digit
  • No need to partition data
  • A stable sorting method is necessary
  • Complexity is O(n k), where k is the number of digits
  • Implementation in Ruby: radix_sort in 8binradix.rb (a simplified sketch follows below)
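
As an illustration only (not the course's radix_sort), here is a minimal Ruby sketch of least-significant-digit radix sort, again assuming non-negative integer keys with a fixed number of decimal digits:

    # Sketch of LSD radix sort; distributing into buckets and concatenating
    # them preserves the previous order within each bucket, i.e. it is stable.
    def radix_sort(items, digits)
      (0...digits).each do |d|                    # least significant digit first
        divisor = 10**d
        buckets = Array.new(10) { [] }
        items.each { |x| buckets[(x / divisor) % 10] << x }
        items = buckets.flatten                   # concatenate buckets in order
      end
      items
    end

    p radix_sort([170, 45, 75, 90, 802, 24, 2, 66], 3)
    # => [2, 24, 45, 66, 75, 90, 170, 802]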

Comparison of bin sort and radix sort:

                                       Bin Sort                    Radix Sort
  Complexity                           O(n k)                      O(n k)
  First digit sorted                   most significant digit      least significant digit
  Last digit sorted                    least significant digit     most significant digit
  Direction                            most → least significant    least → most significant
  Stable sort needed                   No                          Yes
  Data partitioning needed/possible    Yes                         No
  • In practice, hybrid sorting algorithms are frequently used
  • Example: Timsort
    • Combination of merge sort, insertion sort, and several optimizations
    • Efficient on many kinds of real input data (semi-ordered)
    • Efficient for indirect data access
    • Default sorting algorithm in Python, Java, ...
  • Recently, computers and CPUs do not get much faster, but they get smaller and more numerous
  • Parallelization is becoming important
  • For some tasks, parallelization can be very difficult
  • For some tasks, parallelization can be easy
  • Many sorting algorithms are easy to parallelize:
    • Bubblesort
    • Mergesort
    • Quicksort
    • Binsort

The dictionary ADT (caution: the ADT "dictionary" is not the same as a dictionary in the sense of a book)

  • For each data item, there is:
    • A key (e.g. student number): Used to identify the data item, e.g. during search
    • A value: All the information besides the key (may be empty, or very large)
  • Operations
    • New
    • Insert
    • Delete
    • Search/find
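
For intuition, Ruby's built-in Hash already provides exactly these dictionary operations (it is implemented with hashing, the topic of lecture 10); the student data below is made up:

    # The dictionary ADT operations, shown with Ruby's built-in Hash
    # (keys: student numbers, values: names; example data is made up).
    students = Hash.new                      # New
    students[15800042] = "Aoyama Hanako"     # Insert
    students[15800077] = "Shibuya Taro"      # Insert
    p students[15800042]                     # Search/find => "Aoyama Hanako"
    students.delete(15800077)                # Delete
    p students[15800077]                     # => nil (not found)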

Possible implementations of a dictionary:

  • Sorted array: Search is O(log n) (binary search), insertion/deletion are O(n)
  • Unordered array/linear list: Search and deletion are O(n), insertion is O(1)
  • Ideally, search/insertion/deletion should all be O(log n) or even O(1)
    • Binary search tree (this lecture)
    • Balanced tree (lecture 9)
    • Hashing (lecture 10)
  • Graph: Consisting of nodes and edges
  • Tree: The root does not have any parent; all other nodes have exactly one parent
  • Binary tree: Each node has ≦2 children
    (similar to complete binary tree, but layers do not have to be full or left-aligned)

Tree traversal methods:

  • Depth first
    • Preorder
    • Inorder
    • Postorder
  • Breadth first
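
To make the four orders concrete, here is a minimal Ruby sketch (the node representation is an assumption for this example, not the course code):

    # Depth-first (pre-/in-/postorder) and breadth-first traversal of a
    # binary tree; empty subtrees are represented by nil.
    Node = Struct.new(:key, :left, :right)

    def preorder(n, out = [])    # node, then left subtree, then right subtree
      return out if n.nil?
      out << n.key
      preorder(n.left, out)
      preorder(n.right, out)
    end

    def inorder(n, out = [])     # left subtree, then node, then right subtree
      return out if n.nil?
      inorder(n.left, out)
      out << n.key
      inorder(n.right, out)
    end

    def postorder(n, out = [])   # left subtree, then right subtree, then node
      return out if n.nil?
      postorder(n.left, out)
      postorder(n.right, out)
      out << n.key
    end

    def breadth_first(root)      # visit the nodes level by level, using a queue
      out, queue = [], [root].compact
      until queue.empty?
        n = queue.shift
        out << n.key
        queue << n.left unless n.left.nil?
        queue << n.right unless n.right.nil?
      end
      out
    end

    tree = Node.new(4, Node.new(2, Node.new(1, nil, nil), Node.new(3, nil, nil)),
                       Node.new(6, nil, nil))
    p preorder(tree)       # => [4, 2, 1, 3, 6]
    p inorder(tree)        # => [1, 2, 3, 4, 6]
    p postorder(tree)      # => [1, 3, 2, 6, 4]
    p breadth_first(tree)  # => [4, 2, 6, 1, 3]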

Binary search tree:

  • A binary tree
  • Each node contains one data item
  • For any node with key k:
    • All the keys in the left subtree are < k (or ≦ k)
    • All the keys in the right subtree are > k (or ≧ k)

    ⇒Inorder traversal produces keys in increasing order

  • What to do with multiple identical keys is implementation-dependent

Searching in a binary search tree:

  • Start the search at the root node
  • If the current node is empty: Terminate the search (not found!)
  • Otherwise, if the search key, compared to the current node's key, is:
    • The same: Return the data item at the current node
    • Smaller: Search in the left subtree (recursion)
    • Greater: Search in the right subtree (recursion)
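
A minimal Ruby sketch of this search (here an empty subtree is represented by nil; the course code 8bintree.rb shares a single NilNode instead):

    # Search in a binary search tree; empty subtrees are nil.
    Node = Struct.new(:key, :value, :left, :right)

    def bst_search(node, key)
      return nil if node.nil?                  # empty node: terminate, not found
      if key == node.key                       # the same: return the data item
        node.value
      elsif key < node.key                     # smaller: search the left subtree
        bst_search(node.left, key)
      else                                     # greater: search the right subtree
        bst_search(node.right, key)
      end
    end

    root = Node.new(5, "five", Node.new(3, "three", nil, nil), Node.new(8, "eight", nil, nil))
    p bst_search(root, 3)    # => "three"
    p bst_search(root, 7)    # => nil (not found)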

Insertion into a binary search tree:

  • Start insertion from the root node
  • If the inserted key, compared to the current node key, is:
    • Smaller: Insert item into left subtree (recursion)
    • Greater: Insert item into right subtree (recursion)
    • The same: Give up/replace/insert into right subtree, ... (implementation dependent)
  • If the current node is empty: Insert item here as a new node (with two empty nodes as children)
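
A matching Ruby sketch of insertion (again with nil for empty subtrees; equal keys go to the right subtree here, one of the implementation-dependent choices mentioned above). Building a tree this way and traversing it inorder also illustrates that the keys come out in increasing order:

    # Insertion into a binary search tree; returns the (possibly new) subtree root.
    Node = Struct.new(:key, :value, :left, :right)

    def bst_insert(node, key, value)
      return Node.new(key, value, nil, nil) if node.nil?   # empty: new leaf here
      if key < node.key
        node.left = bst_insert(node.left, key, value)      # smaller: insert left
      else
        node.right = bst_insert(node.right, key, value)    # greater/equal: insert right
      end
      node
    end

    def inorder_keys(node, keys = [])          # inorder traversal of the keys
      return keys if node.nil?
      inorder_keys(node.left, keys)
      keys << node.key
      inorder_keys(node.right, keys)
    end

    root = nil
    [5, 3, 8, 1, 4].each { |k| root = bst_insert(root, k, "item #{k}") }
    p inorder_keys(root)   # => [1, 3, 4, 5, 8], i.e. in increasing key order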

Deletion from a binary search tree:

  • Find the node to delete (same as search)
  • If the number of (real, non-empty) children is:
    • 0: Delete the current node (replace it with an empty node)
    • 1: Replace the current node with its child
    • 2: Replace the current node with the smallest node in its right subtree
      (or the biggest node in its left subtree)
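
A Ruby sketch of deletion under the same assumptions (nil for empty subtrees, distinct keys); a node with two children is replaced by the smallest node of its right subtree, as described above:

    # Deletion from a binary search tree; returns the new subtree root.
    Node = Struct.new(:key, :value, :left, :right)

    def bst_delete(node, key)
      return nil if node.nil?                                  # key not present
      if key < node.key
        node.left = bst_delete(node.left, key)
      elsif key > node.key
        node.right = bst_delete(node.right, key)
      elsif node.left.nil?                                     # 0 or 1 child:
        return node.right                                      #   use the other subtree
      elsif node.right.nil?
        return node.left
      else                                                     # 2 children:
        successor = node.right                                 #   smallest node in the
        successor = successor.left until successor.left.nil?   #   right subtree
        node.key, node.value = successor.key, successor.value
        node.right = bst_delete(node.right, successor.key)     #   remove the duplicate
      end
      node
    end

    root = Node.new(5, "five", Node.new(3, "three", nil, nil),
                               Node.new(8, "eight", nil, nil))
    root = bst_delete(root, 5)
    p [root.key, root.left.key, root.right]   # => [8, 3, nil]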

Implementation of a binary search tree:

  • Share a single special node (NilNode) wherever there is no child node
  • Pseudocode/implementation in Ruby: 8bintree.rb
  • Execution speed depends on the height of the tree
  • Best height is O(log n)
  • Worst height is O(n)
  • Average height is O(log n)
    (assuming that all input orders have the same probability)

Problems with binary search trees:

  • In the worst case, the shape of a binary search tree is the same as the shape of a linear list
  • The order of insertions/deletions cannot be changed
    • An algorithm where data becomes available little by little is called an online algorithm
    • An algorithm where all data is available at the start is called an offline algorithm
  • Therefore, randomly selecting the item used for splitting (as in quicksort) is impossible
  • Keeping the tree complete (perfectly balanced) would make insertion/deletion take too much time

Solution: A tree that is to some degree (but not perfectly) balanced

2-3-4 trees (definition/invariants)

  • The number of children for each node is 2, 3, or 4
  • If a node has k children, it stores k-1 keys and data items
    (if the number of children is 2, then this is the same as a binary search tree node)
  • The keys stored in a node are the separators for the subtrees
  • The tree is of uniform height
  • In the lowest layer of the tree, the nodes have no children
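
As a small illustration (not an insertion algorithm, which is part of the homework below), here is a Ruby sketch of a 2-3-4 node and a check of the structural invariants listed above (number of keys vs. children, uniform height); the node representation is an assumption for this example:

    # A 2-3-4 tree node: up to 3 keys and, for internal nodes, 2-4 children.
    Node234 = Struct.new(:keys, :children)     # children == [] for a leaf

    # Returns the depths of all leaves, checking the children/keys invariant.
    def leaf_depths(node, depth = 0)
      return [depth] if node.children.empty?
      unless (2..4).cover?(node.children.size) &&
             node.keys.size == node.children.size - 1
        raise "2-3-4 invariant violated"
      end
      node.children.flat_map { |c| leaf_depths(c, depth + 1) }
    end

    leaf = ->(k) { Node234.new([k], []) }
    root = Node234.new([10, 20], [leaf.call(5), leaf.call(15), leaf.call(25)])
    p leaf_depths(root).uniq.size == 1         # => true (uniform height)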

Summary:

  • Bin sort and radix sort are O(n k)
  • A dictionary is an ADT storing values that can be found using keys
  • A binary search tree is a way to implement a dictionary
  • Operations on binary search trees are O(log n) on average, but O(n) in the worst case
  • This problem can be addressed using balanced trees

Homework (no need to submit)

  • Calculate the minimum and maximum height h of a binary search tree with n data items
  • Calculate the minimum and maximum height of a 2-3-4 tree with n data items
  • Using various examples, think about how to insert items into a 2-3-4 tree and propose an algorithm

Glossary

  bin sort: ビンソート
  most significant digit: 最上位の桁
  radix sort: 基数整列
  least significant digit: 最下位の桁
  balanced tree: 平衡木
  traversal method: 辿り方
  binary search tree: 二分探索木
  hashing: ハッシュ法
  binary tree: 二分木
  parallelization: 並列化
  depth first: 深さ優先
  preorder: 行きがけ順
  inorder: 通りがけ順
  postorder: 帰りがけ順
  breadth first: 幅優先
  key: キー、鍵
  implementation-dependent: 実装依存
  top-down 2-3-4 tree: トップダウン 2-3-4 木
  uniform height: 一定の高さ