Oracle 12c RAC: Server Weight-Based Node Eviction

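Most clusters have an eviction algorithm for situations where split-brain may occur; some are crude and simple, others relatively complex. In Oracle RAC the eviction algorithm is fairly complex, and it has been carried across many releases. Yet even many veteran DBAs with years of experience cannot describe it accurately and in detail.

The algorithm is as follows: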

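  • First, to survive, a node must be able to access more than half of the voting disks, that is, at least floor(n/2)+1 out of n. A node that cannot meet this condition does not take part in the eviction algorithm and leaves the cluster on its own.
  • Second, for nodes that meet the voting disk minimum: when a break in the heartbeat (interconnect) network splits the cluster into several sub-clusters, the sub-cluster with more nodes survives, and the sub-clusters with fewer nodes are evicted through the kill block on the voting disk.
  • Third, for nodes that meet the voting disk minimum: when the heartbeat network break produces sub-clusters with the same number of nodes, the sub-cluster containing the lowest node number survives and the other sub-clusters are evicted.
  • Finally, if the heartbeat network is intact and the cluster has three voting disks (A, B, C), with node 1 able to access A and B and node 2 able to access B and C, each node still sees a majority (2 of 3), so cluster membership stays unchanged and no eviction occurs.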
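The rules above can be sketched in a few lines of Python. This is only an illustrative model, not Oracle's implementation: the function names `has_voting_disk_majority` and `pick_survivor_pre_12c` are hypothetical, and a sub-cluster is simply modeled as a set of node numbers.

```python
from typing import List, Set


def has_voting_disk_majority(accessible: int, total: int) -> bool:
    """Rule 1: a node must see more than half of the voting disks."""
    return accessible >= total // 2 + 1


def pick_survivor_pre_12c(cohorts: List[Set[int]]) -> Set[int]:
    """Rules 2 and 3: the largest sub-cluster survives; on a size tie,
    the sub-cluster containing the lowest node number survives."""
    # Sort key: larger size first (negated), then lowest node number.
    return min(cohorts, key=lambda c: (-len(c), min(c)))


if __name__ == "__main__":
    # Three voting disks: seeing two of them is a majority, seeing one is not.
    print(has_voting_disk_majority(2, 3), has_voting_disk_majority(1, 3))
    # A 1-vs-2 split: the larger sub-cluster survives.
    print(pick_survivor_pre_12c([{1}, {2, 3}]))
    # A 2-vs-2 split: the sub-cluster holding node 1 survives.
    print(pick_survivor_pre_12c([{1, 2}, {3, 4}]))
```

The sketch assumes every node in every sub-cluster has already passed the voting disk check; nodes that fail it would have left the cluster before this selection runs.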

This algorithm remained in use until 12c. Starting with 12.1.0.2, Oracle introduced a new algorithm for RAC eviction, called Server Weight-Based Node Eviction.

The official documentation describes it as follows:

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cwadd/oracle-clusterware-administration.html#GUID-48000840-68BA-4BCF-9A74-D5C18D77013F

You can configure the Oracle Clusterware failure recovery mechanism to choose which cluster nodes to terminate or evict in the event of a private network (cluster interconnect) failure.
In a split-brain situation, where a cluster experiences a network split, partitioning the cluster into disjoint cohorts, Oracle Clusterware applies certain rules to select the surviving cohort, potentially evicting a node that is running a critical, singleton resource.
You can affect the outcome of these decisions by adding value to a database instance or node so that, when Oracle Clusterware must decide whether to evict or terminate, it will consider these factors and attempt to ensure that all critical components remain available. You can configure weighting functions to add weight to critical components in your cluster, giving Oracle Clusterware added input when deciding which nodes to evict when resolving a split-brain situation.
You may want to ensure that specific nodes survive the tie-breaking process, perhaps because of certain hardware characteristics, or that certain resources survive, perhaps because of particular databases or services. You can assign weight to particular nodes, resources, or services, based on the following criteria:
  • You can assign weight only to administrator-managed nodes.
  • You can assign weight to servers or applications that are registered Oracle Clusterware resources.
Weight contributes to importance of the component and influences the choice that Oracle Clusterware makes when managing a split-brain situation. With other critical factors being equal between the various cohorts, Oracle Clusterware chooses the heaviest cohort to survive.

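(Excerpt from the official documentation)

In short, a DBA can manually adjust the weight of a node or instance so that certain nodes are deliberately kept when an eviction may occur. My own understanding is that this deliberate choice may be made because the node to keep has better hardware, or, with the introduction of Extended RAC, because it is closer to the application, or because it has better storage performance, and so on.

Weight can be configured at three levels: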

  • To assign weight to database instances or services, you use the -css_critical yes parameter with the srvctl add database or srvctl add service commands when adding a database instance or service. You can also use the parameter with the srvctl modify database and srvctl modify service commands. (Configuring weight at the database level keeps the nodes hosting that database alive, since in many RAC clusters the instances of a given database do not run on every node.)
  • To assign weight to non ora.* resources, use the -attr "CSS_CRITICAL=yes" parameter with the crsctl add resource and crsctl modify resource commands when you are adding or modifying resources.
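  • To assign weight to a server, use the -css_critical yes parameter with the crsctl set server command. (Configuring weight at the physical server level keeps that physical node from being evicted.)

With the new algorithm in place, the way RAC nodes are evicted changes slightly: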

  • If the sub-clusters are of the different sizes, the functionality is same as in previous releases. (If the sub-clusters have different node counts, the earlier eviction algorithm described at the start of this article applies.)
  • If all the sub-clusters are of the same size, the functionality has been modified as follows:
    • If the sub-clusters have equal node weights, the sub-cluster with the lowest node number survives to ensure that, in a two-node cluster, the node with the lowest node number survives. (With equal node counts and equal weights, the sub-cluster containing the lowest node number survives.)
    • If the sub-clusters have unequal node weights, the sub-cluster having the higher weight survives to ensure that, in a two-node cluster, the node with the lowest node number gets evicted due to the lower weight. (With equal node counts but unequal weights, the sub-cluster containing the heavier node survives, which is not necessarily the one with the lowest node number as before.)
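The modified tie-break can be sketched as follows. This is a rough model under stated assumptions, not Oracle's actual code: `pick_survivor_weighted` and `cohort_weight` are hypothetical names, each node's weight is modeled as a plain integer (for example 1 for a node marked css_critical and 0 otherwise), and a sub-cluster's weight is assumed to be the sum of its nodes' weights. How Oracle Clusterware really computes weight is not described in the excerpt above.

```python
from typing import Dict, List, Set


def cohort_weight(cohort: Set[int], weights: Dict[int, int]) -> int:
    """Assumed model: a sub-cluster's weight is the sum of its nodes' weights."""
    return sum(weights.get(node, 0) for node in cohort)


def pick_survivor_weighted(cohorts: List[Set[int]],
                           weights: Dict[int, int]) -> Set[int]:
    """Size first (as in previous releases), then higher weight,
    then lowest node number as the final tie-break."""
    return min(
        cohorts,
        key=lambda c: (-len(c), -cohort_weight(c, weights), min(c)),
    )


if __name__ == "__main__":
    # Two-node cluster, node 2 carries weight: node 2 survives, node 1 is evicted.
    print(pick_survivor_weighted([{1}, {2}], {1: 0, 2: 1}))
    # Equal weights: falls back to the lowest node number.
    print(pick_survivor_weighted([{1}, {2}], {1: 0, 2: 0}))
    # Different sizes: weight is not consulted, the larger sub-cluster survives.
    print(pick_survivor_weighted([{1}, {2, 3}], {1: 5}))
```

Putting size ahead of weight in the sort key mirrors the first bullet: weight only matters once the sub-clusters are the same size.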

When an eviction occurs, according to 12c: Which Node Will Survive when Split Brain Takes Place (Doc ID 1951726.1), a log entry like the following appears:

2014-11-24 14:25:41.140615 : CSSD:1117321536: clssnmrCheckSplit: Waiting for node weights, stamp(311972654)

As shown above, the split check takes the configured node weights into account.
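To check whether a given CSSD log shows this behavior, one could scan it for that message. The snippet below is a minimal sketch: the regular expression is derived only from the single sample line above, so the exact format of other entries is an assumption, and `find_weight_checks` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Pattern modeled on the one sample line quoted above; other log formats may differ.
WEIGHT_CHECK = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s*:\s*CSSD:\d+:\s*"
    r"clssnmrCheckSplit: Waiting for node weights"
)


def find_weight_checks(lines: Iterable[str]) -> List[str]:
    """Return the timestamps of entries where the split check waited for node weights."""
    return [m.group("ts") for line in lines if (m := WEIGHT_CHECK.search(line))]


if __name__ == "__main__":
    sample = [
        "2014-11-24 14:25:41.140615 : CSSD:1117321536: "
        "clssnmrCheckSplit: Waiting for node weights, stamp(311972654)",
    ]
    print(find_weight_checks(sample))
```

The function takes any iterable of lines, so it can be fed an open file object for whichever CSSD trace file is being examined.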

–END–
