Abstract: A system and method for recovering from failures in the disk access path of a clustered computing system. Each node of the clustered computing system is provided with proxy software for handling physical disk access requests from applications executing on the node and for directing the disk access requests to an appropriate server to which the disk is physically attached. The proxy software on each node maintains state information for all pending requests originating from that node. In response to detection of a failure along the disk access path, the proxy software on all of the nodes directs all further requests for disk access to a secondary node physically attached to the same disk.
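
As an illustration only, the per-node proxy behavior summarized in the abstract might be sketched as follows. The class name `DiskProxy`, its methods, and the string server names are hypothetical and do not appear in the source. Returning the pending request ids on failure, so that a caller could reissue them, is an assumption about how the maintained state would be used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DiskProxy:
    """Hypothetical per-node proxy; all names here are illustrative."""
    primary: str                      # server node attached to the primary tail
    secondary: str                    # server node attached to the secondary tail
    active: str = ""                  # server currently receiving requests
    pending: Dict[int, str] = field(default_factory=dict)  # request id -> payload
    _next_id: int = 0

    def __post_init__(self) -> None:
        # Requests initially go to the primary server node.
        self.active = self.primary

    def submit(self, payload: str) -> Tuple[int, str]:
        """Record a request as pending; return (request id, target server)."""
        self._next_id += 1
        self.pending[self._next_id] = payload
        return self._next_id, self.active

    def complete(self, request_id: int) -> None:
        """Drop a request from the pending state once the server has answered."""
        self.pending.pop(request_id, None)

    def on_failure(self) -> List[int]:
        """Direct further requests to the secondary server.

        Returns the ids still pending so a caller could reissue them
        (an assumption; the abstract only says the state is maintained).
        """
        self.active = self.secondary
        return sorted(self.pending)
```

In this sketch a request submitted after `on_failure()` is routed to the secondary server, while the returned ids identify requests that were in flight when the failure was detected.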

We claim:

1. A clustered multi-processing system comprising:

  • at least three interconnected nodes wherein less than all nodes are server nodes, each node including a memory;
  • a multi-ported disk having at least a primary tail physically attached to a primary server node and a secondary tail physically attached to a secondary server node;
  • a disk access request mechanism, coupled to the nodes, for communicating a disk access request from an originating node to a server node physically attached to the disk along one of at least a primary disk access path and a secondary disk access path defined between the originating node, the server nodes and the disk;
  • a failure detection mechanism, coupled to the nodes, for detecting failures along one of the primary disk access path and the secondary disk access path; and,
  • proxy logic stored in the memory on each of the nodes and coupled to the failure detection mechanism, for redirecting subsequent disk access requests along a non-failing disk access path to the disk, when a failure is detected;
  • said proxy logic comprising a two-phase commit protocol including:
    • a coordinator node being adapted for broadcasting a suspend message to participant nodes to suspend access to a failed disk access path and waiting for an acknowledge message from all participant nodes;
    • each participant node receiving the suspend message being adapted for suspending said access to the failed disk access path, sending the acknowledge message to the coordinator node confirming suspension of said access to the failed disk access path, and waiting for a resume message from the coordinator node;
    • the coordinator node being further adapted for sending the resume message upon receipt of the acknowledge message from said all participant nodes; and
    • said each participant node being further adapted for redirecting said subsequent disk access requests along the non-failing disk access path to the disk, upon receipt of the resume message.
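
The suspend, acknowledge and resume exchange recited above can be sketched as follows. This is a minimal in-process illustration, not the claimed implementation: direct method calls stand in for messages between nodes, and the names `Participant`, `Coordinator`, `on_suspend`, `on_resume` and `fail_over` are hypothetical. Returning `False` when an acknowledgement is missing is an assumption, since the claim does not say what happens in that case.

```python
from enum import Enum
from typing import List


class State(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"


class Participant:
    """Hypothetical participant node; method calls stand in for messages."""

    def __init__(self, name: str, path: str = "primary") -> None:
        self.name = name
        self.path = path              # disk access path currently in use
        self.state = State.ACTIVE

    def on_suspend(self, failed_path: str) -> bool:
        # Phase 1: stop issuing requests on the failed path, then acknowledge.
        self.state = State.SUSPENDED
        return True                   # the acknowledge message

    def on_resume(self, new_path: str) -> None:
        # Phase 2: redirect subsequent requests along the non-failing path.
        self.path = new_path
        self.state = State.ACTIVE


class Coordinator:
    """Hypothetical coordinator node driving the two phases."""

    def __init__(self, participants: List[Participant]) -> None:
        self.participants = participants

    def fail_over(self, failed_path: str, new_path: str) -> bool:
        # Broadcast suspend and collect an acknowledgement from every participant.
        acks = [p.on_suspend(failed_path) for p in self.participants]
        if not all(acks):
            return False              # assumption: do not resume without all acks
        # All participants are suspended, so resume may now be sent.
        for p in self.participants:
            p.on_resume(new_path)
        return True
```

Because resume is sent only after every participant has acknowledged, no node in this sketch issues a request on the new path while another node may still be using the failed one.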

  • A. M. Gheith and J. L. Peterson, "Shared Virtual Disk for a Cluster of Processors with Separate I/O Devices and Shared Memory", IBM Technical Disclosure Bulletin, vol. 36, No. 06B, pp. 375-377, Jun. 1993.
  • H. Bardsley III et al., "Dynamic Storage Subsystem Path Switching", IBM Technical Disclosure Bulletin, vol. 32, No. 11, pp. 168-169, Apr. 1990.
  • J. C. O'Quin et al., "Takeover Scheme for Control of Shared Disks", IBM Technical Disclosure Bulletin, vol. 32, No. 2, pp. 378-380, Jul. 1989.