Assignment 4

Submission: Upload a single .zip file named u{Student ID}.zip (e.g., u6880517.zip) to MS Teams Classroom.

Typesetting. For written questions, typeset your answers and produce a PDF named hw4.pdf; place it at the top level of your zip. We do not accept scanned or handwritten submissions. Proper typesetting means math symbols — summations, fractions, inequalities — are rendered with correct notation, bounds, and spacing, as you would see in a textbook.

AI Usage Policy. Before you begin, remind yourself of the AI usage policy and guideline on the course website. In brief: Don’t offload your learning and thinking to AI. Use AI to help you learn.

Counting Bad Pairs (20 pts)

Save your work in Zombies.kt and ZombiesTest.kt.

In the village of Salaya, humans and zombies live side by side — indistinguishable to the eye. When the villagers line up in a row, humans arrange themselves tallest-to-shortest, but zombies act erratically. Given a list of heights, a pair of positions $(i, j)$ with $i < j$ is bad if $\mathtt{hs}[i] < \mathtt{hs}[j]$ (the person at position $i$ is shorter than the one at position $j$, violating the tall-to-short norm).

(i) Specification. Write a formal precondition/postcondition spec for:

fun countBad(hs: List<Int>): Int

(ii) Implementation. Implement countBad so that it runs in $O(n \log n)$ time, where $n$ = hs.size. A correct but slow $O(n^2)$ reference:

fun countBadSlow(hs: List<Int>): Int {
    var count = 0
    for (i in hs.indices)
        for (j in i + 1 until hs.size)
            if (hs[i] < hs[j]) count++
    return count
}

Your fast implementation must return the same answer. (Hint: Write a merge-like algorithm that simultaneously sorts the list and accumulates the count of bad pairs. When you’re done, it should look almost identical to merge sort with one extra piece of bookkeeping.)

For reference, the following examples should hold:

countBad(listOf(35, 22, 10)) $= 0$
countBad(listOf(3, 1, 4, 2)) $= 3$
countBad(listOf(5, 4, 11, 7)) $= 4$
countBad(listOf(1, 7, 22, 13, 25, 4, 10, 34, 16, 28, 19, 31)) $= 49$

(iii) Testing. Write a JUnit test suite in ZombiesTest.kt that verifies your implementation, including the examples above and any boundary cases you identify.

(iv) Complexity. Analyze the running time of your implementation. State and justify a $\Theta$ bound in terms of $n$.

Binary Search: Last Occurrence (15 pts)

Save your work in BinarySearchLast.kt and BinarySearchLastTest.kt.

When a key appears multiple times, the standard binary search algorithm may return the index of any occurrence. Your task is to implement a variant that always returns the index of the last occurrence.

(i) Specification. Write a formal precondition/postcondition spec for:

fun binarySearchLast(a: List<Int>, k: Int): Int?

The function should return $\max\{i \mid a[i] = k\}$, or null if $k \notin a$. For example:

binarySearchLast(listOf(1, 2, 2, 2, 4, 5), 2) returns 3
binarySearchLast(listOf(1, 2, 2, 2, 4, 5), 0) returns null
binarySearchLast(listOf(1, 2, 2, 2, 4, 5), 5) returns 5

(ii) Implementation. Implement binarySearchLast in $O(\log n)$ time, where $n$ = a.size. Your solution must remain $O(\log n)$ even when $k$ appears roughly $n$ times in a row — scanning linearly from a found occurrence will not receive full credit.

(iii) Testing. Write a JUnit test suite in BinarySearchLastTest.kt. Include at a minimum: the examples above, the case where k is absent, and cases that test the boundaries of your binary search range.

Rank and Select (25 pts)

Save your work in Rank.kt and RankTest.kt.

The rank of an element $e$ with respect to a list $L$, written $r(L, e)$, is the number of elements in $L$ strictly less than $e$. For example, with $L = [10, 21, 32, 53, 54]$:

$r(L, 9) = 0$
$r(L, 10) = 0$
$r(L, 33) = 3$ (since $10$, $21$, $32$ are all less than $33$)

Throughout, let $n$ = a.size, $m$ = b.size, and $A + B$ denote the concatenation of the two lists. You are guaranteed that $A + B$ contains no repeated numbers.

(i) Specification. Write formal precondition/postcondition specs for both functions below.

(ii) Subtask I — rank. Implement:

fun rank(a: List<Int>, b: List<Int>, e: Int): Int

which takes two sorted lists and returns $r(A+B,\, e)$ without materializing $A+B$. Your implementation must run in $O(\log(n+m))$ time.

(iii) Subtask II — select. Implement:

fun select(a: List<Int>, b: List<Int>, k: Int): Int?

which returns the element of rank $k$ in $A+B$ — equivalently, (a + b).sorted()[k] (only faster). Return null if no such element exists. Your implementation must run in $O(\log^2(n+m))$ time. (Note: $\log n + \log m \in O(\log(n+m))$ when $n, m > 0$.)

Bonus. This is not required. Redo Subtask II in $O(\log(n+m))$ time.

(iv) Testing. Write a JUnit test suite in RankTest.kt covering both functions. Include cases for elements present and absent, boundary indices ($k = 0$, $k = n+m-1$), and a case where one of the two lists is empty.

(v) Complexity. State and justify the running time of each of your implementations.

Stuttering Substring (20 pts)

Save your work in Stutter.kt and StutterTest.kt.

We reviewed the stuttering-substring problem in class. This question asks you to implement it efficiently.

Definitions. We use a non-standard meaning of “substring” in this course: $A$ is a substring of $B$ if the characters of $A$ appear in $B$ in order, but not necessarily contiguously (you may skip characters of $B$). Formally, $A$ is a substring of $B$ if there exist indices $0 \leq i_0 < i_1 < \cdots < i_{n-1} < |B|$ such that $B[i_k] = A[k]$ for all $k$.

For example, "cat" is a substring of "excavate" (positions 2, 5, 7), but "vet" is not.

The stutter of string $A$ by factor $k$, written $A^{(k)}$, repeats each character of $A$ consecutively $k$ times. $A^{(0)}$ is the empty string (a substring of any $B$). For example:

"hello" $^{(2)}$ = "hheelllloo"
"hi" $^{(5)}$ = "hhhhhiiiii"

The stuttering-substring problem: given strings $A$ and $B$, find the largest $k \geq 0$ such that $A^{(k)}$ is a substring of $B$.

(i) Specification and implementation — isSubstr. Write a formal spec and implement:

fun isSubstr(a: String, b: String): Boolean

that returns whether a is a substring of b in the sense defined above. Your implementation must run in $O(|a| + |b|)$ time. (Hint: Use two pointers advancing through a and b — once a character of b is examined, it never needs to be revisited.)

(ii) Complexity analysis — isSubstr. Justify why your implementation meets the $O(|a| + |b|)$ bound. In particular, argue that each character of a and each character of b is “touched” at most once.

(iii) Specification and implementation — stutter. Write a formal spec and implement:

fun stutter(a: String, k: Int): String

that produces $A^{(k)}$. For a string of length $n$ this should run in $O(nk)$ time.

(iv) Specification and implementation — maxStutter. Write a formal spec and implement:

fun maxStutter(a: String, b: String): Int

that returns the answer to the stuttering-substring problem for the pair (a, b). Both a and b are guaranteed to be non-empty.

Your implementation must run in $O((m + n)\log(m/n))$ time, where $n = |a|$ and $m = |b|$. (Hint: Observe that if $A^{(k)}$ is a substring of $B$ then $A^{(k-1)}$ is too — so the set of valid $k$ values is a contiguous range $\{0, 1, \dots, k^*\}$. What does that suggest about how to find $k^*$ efficiently? What is the largest $k$ worth trying?)

(v) Complexity analysis — maxStutter. Analyze the running time of your implementation. Define what a “probe” is in your algorithm and argue: (a) the number of probes is $O(\log(m/n))$, and (b) each probe costs $O(m + n)$. Conclude that the total running time is $O((m+n)\log(m/n))$.

(vi) Testing. Write a JUnit test suite in StutterTest.kt covering all three public functions. At a minimum include:

isSubstr: a case where a is empty, where a == b, where a is a substring but not a contiguous one, and where a is not a substring.
stutter: $k = 0$, $k = 1$, and $k > 1$.
maxStutter: a case where the answer is 0, a case where the answer is 1, and a case where $|b|$ is a large multiple of $|a|$.

Quick Sort Recurrence (20 pts)

Written question — answer in your PDF.

Consider the recurrence

\[f(n) = n+1 + \frac{2}{n}\bigl(f(n-1) + f(n-2) + \dots + f(1)\bigr), \qquad f(0) = 0.\]

This recurrence represents the average running time of the randomized quicksort algorithm. (For those who have taken Discrete Math: this is the expected running time.) For now, we won’t discuss how one arrives at such a recurrence — your goal is to solve it for a closed form.

(i) Consider $f(n)$ and $f(n-1)$ for $n \geq 2$:

\[f(n) = (n+1) + \frac{2}{n}\bigl(f(n-1) + f(n-2) + \dots + f(1)\bigr)\] \[f(n-1) = n + \frac{2}{n-1}\bigl(f(n-2) + f(n-3) + \dots + f(1)\bigr)\]

Multiplying the first equation by $n$ and the second by $(n-1)$:

\[n \cdot f(n) = n(n+1) + 2\bigl(f(n-1) + f(n-2) + \dots + f(1)\bigr) \tag{A}\] \[(n-1)f(n-1) = n(n-1) + 2\bigl(f(n-2) + f(n-3) + \dots + f(1)\bigr) \tag{B}\]

Subtracting (B) from (A) yields

\[n \cdot f(n) = 2n + (n+1)f(n-1). \tag{$*$}\]

Your task in this step is to understand the derivation above. No work to submit.

(ii) Let $g(n) = \dfrac{f(n)}{n+1}$. Rewrite equation ($*$) in terms of $g$. (Hint: divide both sides of ($*$) by $n(n+1)$.)

(iii) Find a closed form for $g(n)$. The recurrence isn’t in the standard table, but it unravels directly. As a warm-up, here is how to solve the related recurrence $h(n) = h(n-1) + \dfrac{1}{n}$ with $h(0) = 0$:

\[h(n) = h(n-1) + \frac{1}{n} = h(n-2) + \frac{1}{n} + \frac{1}{n-1} = \cdots = \frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{2} + 1.\]

This sum has a name: the $n$-th Harmonic number $H_n = \displaystyle\sum_{k=1}^{n} \frac{1}{k}$, so $h(n) = H_n$. Apply the same unraveling technique to find a closed form for $g(n)$.

(iv) Use your closed form for $g(n)$ to derive a closed form for $f(n)$. Then use the fact below to conclude that $f(n) \in O(n \log n)$.

Fact: $H_n \leq 1 + \ln n$, where $\ln$ denotes the natural logarithm.