22/11/2025
Heap vs Sorted Array: Why Building a Heap is Linear
One of the most surprising results in algorithm analysis is that building a heap takes only time, while sorting an array requires comparisons. This isn't just a lucky optimization—it's a fundamental consequence of how many valid arrangements exist for each structure. Let's explore why.
The Sorted Array: Only One Way
For distinct elements, there is exactly one valid sorted arrangement. Out of all possible permutations, only one satisfies the sorted order. The probability of randomly arriving at this configuration is:
Why Comparisons Are Required
To find this single correct arrangement through comparisons, we can model the sorting process as a decision tree. Each comparison eliminates some permutations from consideration:
shInitial state: n! possible permutations remain Compare a[0] vs a[1] / \ a[0]<a[1] a[1]<a[0] / \ n!/2 remain n!/2 remain | | [more comparisons] [more comparisons] | | v v Eventually: 1 permutation remains (sorted)Initial state: n! possible permutations remain Compare a[0] vs a[1] / \ a[0]<a[1] a[1]<a[0] / \ n!/2 remain n!/2 remain | | [more comparisons] [more comparisons] | | v v Eventually: 1 permutation remains (sorted)
Each comparison can at best halve the number of remaining possibilities (it's a binary decision). To narrow down from possibilities to 1, we need at least:
Using Stirling's approximation:
Let's visualize this with a concrete example for elements :
shAll 24 permutations: [1234, 1243, 1324, 1342, 1423, 1432, 2134, 2143, 2314, 2341, 2413, 2431, 3124, 3142, 3214, 3241, 3412, 3421, 4123, 4132, 4213, 4231, 4312, 4321] After "compare 1st vs 2nd": - If 1st < 2nd: eliminates ~12 permutations - If 1st > 2nd: eliminates ~12 permutations After several more comparisons: ...eventually converges to: [1234] ← The only valid sorted outputAll 24 permutations: [1234, 1243, 1324, 1342, 1423, 1432, 2134, 2143, 2314, 2341, 2413, 2431, 3124, 3142, 3214, 3241, 3412, 3421, 4123, 4132, 4213, 4231, 4312, 4321] After "compare 1st vs 2nd": - If 1st < 2nd: eliminates ~12 permutations - If 1st > 2nd: eliminates ~12 permutations After several more comparisons: ...eventually converges to: [1234] ← The only valid sorted output
The decision tree has depth .
Key insight: Because there's only one target out of possibilities, we need comparisons to identify it.
The Heap: Many Valid Arrangements
A max-heap satisfies the heap property: every parent is greater than or equal to its children. However, unlike a sorted array, many different arrangements can satisfy this property for the same set of elements.
Consider the array as a max-heap:
shOriginal heap: 5 / \ 3 4 / \ 1 2Original heap: 5 / \ 3 4 / \ 1 2
We can swap the left and right children at any node and still maintain the heap property:
shSwapped at root: 5 / \ 4 3 ← Swapped 3 and 4 / \ 1 2 Swapped at left child: 5 / \ 3 4 / \ 2 1 ← Swapped 1 and 2 Multiple swaps: 5 / \ 4 3 / \ 2 1 ← Both levels swappedSwapped at root: 5 / \ 4 3 ← Swapped 3 and 4 / \ 1 2 Swapped at left child: 5 / \ 3 4 / \ 2 1 ← Swapped 1 and 2 Multiple swaps: 5 / \ 4 3 / \ 2 1 ← Both levels swapped
All of these are valid max-heaps! This flexibility is the key: there are many more valid heap configurations than there is a single sorted arrangement.
Counting the Number of Heaps:
Let's derive exactly how many distinct max-heap structures can be formed from distinct elements.
Recurrence Relation
For a heap of size with root element (the maximum), we partition the remaining elements into:
- A left subtree of size
- A right subtree of size
The left subtree can be any valid heap of size , giving possibilities. Similarly, the right subtree has possibilities.
But which elements go left vs right? For a complete binary tree of size , the size of the left subtree is determined by the tree shape. However, we can choose any elements from the remaining elements to place in the left subtree. That's ways.
This gives the recurrence:
where and are determined by the complete binary tree structure (left subtree as full as possible), and .
Exact Formula
There's an elegant closed form for :
where the product is over all nodes in the heap tree, and is the size of the subtree rooted at node .
Why does this work? The numerator represents all possible arrangements of elements. The denominator counts the "over-counting" due to heap symmetries. Each subtree of size has internal orderings that don't matter for the heap property, so we divide by to remove those redundant arrangements.
Small Checks
Let's verify this formula for small values:
:
Heap: [1]
✓
:
shHeap: 2 | 1Heap: 2 | 1
Subtree sizes: root has , left child has
✓
:
shHeap: 3 / \ 1 2 (or swap: 2 and 1)Heap: 3 / \ 1 2 (or swap: 2 and 1)
Subtree sizes: root has , both children have each
✓
Using recurrence: ✓
:
shPossible heaps: 4 4 4 / \ / \ / \ 3 2 3 1 2 3 ... / / / 1 2 1 And more variations...Possible heaps: 4 4 4 / \ / \ / \ 3 2 3 1 2 3 ... / / / 1 2 1 And more variations...
for a complete tree of size 4
Subtree sizes: root=, left child=, left-left child=, right child=
✓
Using recurrence: ✓
Asymptotic Analysis:
To understand how grows, let's analyze the denominator .
In a complete binary tree:
- There are levels
- Level has nodes (for )
- Nodes at level have subtrees of size
The product becomes:
Taking logarithms:
The dominant terms come from large , where and we have terms overall. Working through the algebra:
But more precisely, the sum evaluates to approximately for some constant , meaning:
Therefore:
Using Stirling's approximation , we get:
Key observation: While , we have but with a much smaller exponent than .
More specifically: compared to .
Why Heap Construction is Linear
Now the punchline: because there are exponentially many valid heaps (roughly ), the probability of building a valid heap is much higher:
This is vastly larger than for sorted arrays. In the decision tree model:
Mathematical proof sketch: The standard heapify algorithm works bottom-up, with each level requiring work proportional to the number of nodes times their height. The total is:
The sum converges to a constant (specifically, 1), so:
The combinatorial argument reinforces this: we don't need to make comparisons because we're not searching for one specific arrangement among possibilities. We're searching for any of the valid heaps, which requires only comparisons.
Conclusion
The fundamental difference:
- Sorted array: 1 valid arrangement out of → requires comparisons
- Max-heap: valid arrangements → requires only comparisons
This isn't just a clever algorithm optimization—it's a deep mathematical truth about the flexibility of heap structures versus the rigidity of sorted order. The heap property is local (parent children), allowing many global configurations, while sorting is global (total order), allowing only one.
References:
leimao.github.io/blog/Heap-Building-Asymptotic-Algorithm
geeksforgeeks.org/dsa/number-ways-form-heap-n-distinct-integers
This article was written with help from Claude, in parts of Latex & some paragraphs