数据结构 | 优先队列堆

一　优先队列

优先队列是一些按照重要性或优先级来组织的对象。按优先级组织的二叉检索树，其平均情况下插入和删除操作的总时间代价为 $\Theta(n\log n)$ ，但这可能回导致二叉检索树不平衡，降低性能。这样一来，堆数据结构应运而生。

二　堆

（一）定义

堆由以下两条性质来定义：

它是一棵完全二叉树；
其中存储的数据局部有序。

注意到堆和二叉检索树的区别。二叉检索树定义了一组完全排序的结点，而堆的排序是局部的——堆的兄弟结点之间并无必然联系。

根据局部有序的不同，有以下两种堆：

最大堆-任意一个结点存储的值都大于或等于其任意一个子结点存储的值。
最小堆-任意一个结点存储的值都小于或等于其任意一个子结点存储的值。

这两种堆都各有所长。下文都以最大堆为例。

单击此处查看最大堆的类声明及其成员函数的实现

下面是从教材中摘录的最大堆的类声明及其成员函数的实现。

// Heap class
template <typename E, typename Comp> class heap {
private:
  E* Heap;          // Pointer to the heap array
  int maxsize;         // Maximum size of the heap
  int n;               // Number of elements now in the heap

  // Helper function to put element in its correct place
  void siftdown(int pos) {
    while (!isLeaf(pos)) { // Stop if pos is a leaf
      int j = leftchild(pos);  int rc = rightchild(pos);
      if ((rc < n) && Comp::prior(Heap[rc], Heap[j]))
        j = rc;            // Set j to greater child's value
      if (Comp::prior(Heap[pos], Heap[j])) return; // Done
      swap(Heap, pos, j);
      pos = j;             // Move down
    }
  }

public:
  heap(E* h, int num, int max)     // Constructor
    { Heap = h;  n = num;  maxsize = max;  buildHeap(); }
  int size() const       // Return current heap size
    { return n; }
  bool isLeaf(int pos) const // True if pos is a leaf
    { return (pos >= n/2) && (pos < n); }
  int leftchild(int pos) const
    { return 2*pos + 1; }    // Return leftchild position
  int rightchild(int pos) const
    { return 2*pos + 2; }    // Return rightchild position
  int parent(int pos) const  // Return parent position
    { return (pos-1)/2; }
  void buildHeap()           // Heapify contents of Heap
    { for (int i=n/2-1; i>=0; i--) siftdown(i); }

  // Insert "it" into the heap
  void insert(const E& it) {
    Assert(n < maxsize, "Heap is full");
    int curr = n++;             //这里先使用n再递增。因为一共n个元素，下标由0至n-1，故新增元素下标是递增前的n
    Heap[curr] = it;            // Start at end of heap
    // Now sift up until curr's parent > curr
    while ((curr!=0) &&
           (Comp::prior(Heap[curr], Heap[parent(curr)]))) {
      swap(Heap, curr, parent(curr));
      curr = parent(curr);
    }
  }
  // Remove first value
  E removefirst() {
    Assert (n > 0, "Heap is empty");
    swap(Heap, 0, --n);       // Swap first with last value
    if (n != 0) siftdown(0);  // Siftdown new root val
    return Heap[n];             // Return deleted value
  }

  // Remove and return element at specified position
  E remove(int pos) {
    Assert((pos >= 0) && (pos < n), "Bad position");
    if (pos == (n-1)) n--; // Last element, no work to do
    else
    {
      swap(Heap, pos, --n);          // Swap with last value
      while ((pos != 0) &&
             (Comp::prior(Heap[pos], Heap[parent(pos)]))) {
        swap(Heap, pos, parent(pos)); // Push up large key
        pos = parent(pos);
      }
      if (n != 0) siftdown(pos);     // Push down small key
    }
    return Heap[n];
  }
};

类maxheap对基于数组的实现进行了两项调整：

堆的结点根据其在堆中的逻辑位置（实际上就是数组的下标）指示，而不是使用指针指向。
指向所用数组的指针作为构造函数的参数。Heap = h;中h就是数组的位置。

（二）分析

1.`siftdown`函数

// Helper function to put element in its correct place
void siftdown(int pos) {
  while (!isLeaf(pos)) { // Stop if pos is a leaf
    int j = leftchild(pos);  int rc = rightchild(pos);
    if ((rc < n) && Comp::prior(Heap[rc], Heap[j]))
      j = rc;            // Set j to greater child's value
    if (Comp::prior(Heap[pos], Heap[j])) return; // Done
    swap(Heap, pos, j);
    pos = j;             // Move down
  }
}

siftdown函数将当前结点与子结点比较并进行适当地移动。这个函数的思路是这样的：

获取pos的左右子结点。
在第一个if语句中，通过Comp类的prior函数比较二者值的大小（注意到在最大堆中，排序应该是从大到小的。如果prior函数的第一个参数大于第二个参数，则返回True）。该语句的结果总是j被赋较大值的下标。
第二个if语句是判断当前pos处的值是否已经处在合适的位置，即其值比Heap[j]大（注意在上一步中Heap[j]已经用来表示左右子结点中的较大者），如是，则意味着pos处的值已经在合适的位置上，因此直接结束循环，并返回；否则将二者中较大的值所在的位置与pos位置进行交换，最后将pos向下（值较大的位置）移动，如此循环，直至到达叶结点或者到达某个合适的分支结点。这样就实现了将元素放到合适的位置上。

2.`insert`函数和`buildHeap`函数

void buildHeap()           // Heapify contents of Heap
    { for (int i=n/2-1; i>=0; i--) siftdown(i); }

// Insert "it" into the heap
void insert(const E& it) {
  Assert(n < maxsize, "Heap is full");
  int curr = n++;             //这里先使用n再递增。因为一共n个元素，下标由0至n-1，故新增元素下标是递增前的n
  Heap[curr] = it;            // Start at end of heap
  // Now sift up until curr's parent > curr
  while ((curr!=0) &&
          (Comp::prior(Heap[curr], Heap[parent(curr)]))) {
    swap(Heap, curr, parent(curr));
    curr = parent(curr);
  }
}

insert函数用于插入值到堆中。为了保持完全二叉树的形状，插入不能从根结点开始，而是应该末尾位置开始上移调整至合适的位置。
buildHeap函数用于建堆。之前提到堆的初始化基于数组（一个完全二叉树），建堆过程要做的是调整每一个结点的位置。教材提到一种源于归纳法的较好算法。假设根的左右子树已经是堆，并且根的元素值为R，那么就有两种情况：

R的值大于或等于其两个子结点，此时形成堆结构；
R的值小于某一个或全部两个子结点的值，此时R与其中的较大值交换位置，然后继续重复本流程，直至到达某一层，使它大于它的子结点或者它成为叶结点。

这就是函数siftdown完成的内容，因此在buildHeap函数中会调用它。注意到之前提到的假设：根的左右子树已经是堆，因此建堆过程由后向前；同时叶结点不能再下移，因此不需要访问叶结点。所以，从最后一个分支结点开始，也就是int i=n/2-1;，逐个向上执行函数siftdown。

实际上，建堆的方式除了使用buildHeap函数的方式，还可以使用insert函数将值逐个插入堆中。关于二者的时间复杂度分析如下：

函数insert：在最差情况下，插入的值要从堆的底部移动到堆的顶端，这意味着每一次调用该函数的时间代价是 $\Theta(\log n)$ ，插入n个值则是 $\Theta(n\log n)$ 。
函数buildHeap：该函数代价是调用函数siftdown的代价之和。在一棵完全二叉树中，大约有一半的结点是叶结点，无需向下移动。然后，四分之一的结点在叶结点上一层，只需移动一层。以此类推，在树中每向上一层，结点的数目就为前一层的一半，所需移动层数加1。因此树中所有元素所需移动的最大距离是
$\sum\limits_{i=1}^{\log n}(i-1)\frac{n}{2^i}=\frac{n}{2}\sum\limits_{i=1}^{\log n}\frac{i-1}{2^{i-1}}$
其中 $(i-1)$ 是移动层数， $n/(2^i)$ 是从叶结点层往前第i层的结点数。该式表明在最差情况下，建堆的时间代价是 $\Theta(n)$ 。这优于逐个插入的方法。

3.`removefirst`函数和`remove`函数

// Remove first value
E removefirst() {
  Assert (n > 0, "Heap is empty");
  swap(Heap, 0, --n);       // Swap first with last value
  if (n != 0) siftdown(0);  // Siftdown new root val
  return Heap[n];             // Return deleted value
}

// Remove and return element at specified position
E remove(int pos) {
  Assert((pos >= 0) && (pos < n), "Bad position");
  if (pos == (n-1)) n--; // Last element, no work to do
  else
  {
    swap(Heap, pos, --n);          // Swap with last value
    while ((pos != 0) &&
            (Comp::prior(Heap[pos], Heap[parent(pos)]))) {
      swap(Heap, pos, parent(pos)); // Push up large key
      pos = parent(pos);
    }
    if (n != 0) siftdown(pos);     // Push down small key
  }
  return Heap[n];
}

这两个删除函数的思想是相同的，即都是将堆中最后一个位置上的元素与被删除元素交换，来保持堆的形状，然后使用siftdown函数调整位置。利用--n使元素数目自减，这样堆中的操作就不会纳入最后一个元素。请注意，这里不能释放该元素，否则就会造成空间丢失。

一 优先队列

二 堆

（一）定义

（二）分析

1.siftdown函数

2.insert函数和buildHeap函数

3.removefirst函数和remove函数

一　优先队列

二　堆

1.`siftdown`函数

2.`insert`函数和`buildHeap`函数

3.`removefirst`函数和`remove`函数