Question: We are using RAC for High Availability. Which is better, two nodes or three nodes? It has been suggested that "the more nodes, the better." What is the optimal number of RAC nodes?
Answer: First, remember that RAC only protects against server failure; you must still mirror the disks and provide network redundancy if you want to approach 100% availability. In my experience, RAC is not as good for scalability as the "scale up" approach (adding capacity to a single server), and the most common use of RAC is for HA, covering server failover.
In deciding on the number of nodes for HA, estimate each node's probability of failure (from its Mean Time Between Failures, or MTBF) and multiply those probabilities together to get the probability that the whole cluster is down at once. It's all about covering the probability of server failure. When used exclusively for HA, two-node RAC is ideal, especially when the nodes are geographically separated and connected via high-speed dark fiber networks.
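As a rough illustration of that arithmetic (assuming independent node failures and a single availability figure per node, which are simplifying assumptions rather than real Oracle metrics), a short sketch:

```python
# Rough sketch: probability that every node in a RAC cluster is down at
# the same time, assuming node failures are independent and each node
# has the same availability (simplifying assumptions for illustration).

def cluster_outage_probability(node_availability: float, nodes: int) -> float:
    """Probability that all nodes are down simultaneously."""
    per_node_downtime = 1.0 - node_availability
    return per_node_downtime ** nodes

# Hypothetical figure: each node is up 99.5% of the time.
for n in (1, 2, 3):
    p = cluster_outage_probability(0.995, n)
    print(f"{n} node(s): P(total outage) = {p:.2e}")
```

With these hypothetical numbers, a second node drops the chance of a total outage from 5 in 1,000 to 2.5 in 100,000; a third node buys comparatively little extra protection, which is one reason two-node clusters are so common for pure HA.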
Oracle ACE Andy Kerber notes that two-node clusters are ideal under some circumstances:
"Well, 3 nodes can be better than two if your workload is fully scalable, and there is no resource contention among the three nodes. I would venture to say that the majority of RAC systems are 2 node clusters, and are set up that way as part of a HA system instead of a workload scaling system."
Oracle Certified Master Steve Karam notes the considerations when choosing the optimal number of RAC nodes for failover:
"Three nodes are 'better' because if one crashes, you won't be failing the entirety of your load over to a single solitary server. In a three node RAC, it allows two nodes to fail before the situation is 'critical.'
Uptime is a function of many unknown variables (power to the server room, stable memory consumption, hardware stability, etc.) over time. One server = a boundary condition that limits your chances of uptime by introducing a constant: one unknown variable being negative at any time = total breakdown. More servers = wider boundaries = less emphasis on the unknown variables that can ultimately cause downtime. Now your uptime is a function of many unknown variables per server over time. More servers = less chance of total downtime.
At the same time, saying that more nodes are always better is false. More nodes are better until you approach the limits imposed by your back-end cluster interconnect. Thus the boundary condition concept runs into a new boundary condition: the number of servers you can have before your performance tanks, which is a different equation entirely.
Interestingly enough, you could use the boundary condition argument to make the case for SSD (in my hastily assembled opinion). If performance is based upon the ease with which data can travel from the database to the end user, the unknown variables are the amount of data (block gets), the location of data (RAM or disk), concurrency of access (latches, locks), and other obstacles (misc. waits). The most common tuning practice is to reduce the number of block gets, thereby limiting the first unknown. By using SSD we can moot the second unknown and always ensure our data comes from RAM. RAC adds a new variable to the 'location' unknown (RAM, remote RAM, disk), but it limits the third unknown (concurrency), since the load is spread across multiple machines.
It can also limit other obstacles (waits) if the waits are node-dependent (for instance, network waits would not be limited or removed by going RAC, but PX waits could). By this logic, you could say the best system imaginable would be a RAC cluster powered by SSD with a tiny buffer cache on each node, running well-tuned SQL:
• Well-tuned SQL limits the ill effects of the 'block gets' unknown
• SSD with a tiny buffer cache negates the location unknown by nulling out disk and limiting remote RAM, leaving SSD as the main location of data. This also reduces the ill effects of the cluster interconnect bandwidth/latency boundary condition
• Multiple nodes limit the effects of the concurrency unknown (though the single-database boundary means hot blocks are still possible)
Thus your tuning knobs become two simple concepts: the number of nodes you have (based on concurrency) and the number of block gets you request (based on access paths/data requirements).
Simply put, I would say that our bottlenecks (single points of contention or failure) are math's boundary conditions. They are the limits within which we must operate. Except that as DBAs, we get to do something mathematicians, physicists, etc. don't get to do: change the boundaries."
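To make the 'performance boundary' point concrete, here is a toy model (my own illustrative assumption, not an Oracle formula) in which each added node costs a fixed fraction of per-node throughput to interconnect overhead, so total throughput peaks and then declines as nodes are added:

```python
# Toy model (illustrative only, not an Oracle formula): each extra node
# adds raw capacity but also adds cache-fusion traffic on the cluster
# interconnect, so per-node efficiency falls as the node count grows.

def cluster_throughput(nodes: int,
                       per_node_tps: float = 1000.0,      # hypothetical baseline
                       interconnect_penalty: float = 0.08  # hypothetical loss per extra node
                       ) -> float:
    efficiency = max(0.0, 1.0 - interconnect_penalty * (nodes - 1))
    return nodes * per_node_tps * efficiency

for n in range(1, 11):
    print(f"{n:2d} nodes: ~{cluster_throughput(n):6.0f} TPS")
```

With these made-up numbers the curve peaks around six or seven nodes and falls off afterward; the exact shape depends entirely on your workload and interconnect, but it illustrates why "more nodes is always better" eventually breaks down.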
See my related notes on determining the optimal number of RAC nodes:
2008 Market Survey of SSD vendors for Oracle: There are many vendors who offer rack-mount solid-state disks that work with Oracle databases, and the competitive market ensures that product offerings will continuously improve while prices fall. SearchStorage notes that SSD will soon replace platter disks and that hundreds of SSD vendors may enter the market:
As of June 2008, many of the major hardware vendors (including Sun and EMC) are replacing slow disks with RAM-based disks, and Sun has announced that all of its large servers will offer SSD. As of June 2008, here are the major SSD vendors for Oracle databases (listed alphabetically):
2008 rack-mount SSD Performance Statistics: SearchStorage has done a comprehensive survey of rack-mount SSD vendors, showing the fastest rack-mount SSD devices (as of May 15, 2008):
Choosing the right SSD for Oracle: When evaluating SSD for Oracle databases, you need to consider performance (throughput and response time), reliability (Mean Time Between Failures) and TCO (total cost of ownership). Most SSD vendors will provide a test RAM disk array for benchmark testing so that you can choose the vendor who offers the best price/performance ratio.
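As a sketch of that evaluation (the vendor names and figures below are placeholders, not real benchmark results), you could fold measured throughput, price, and MTBF into a simple ranking like this:

```python
# Sketch of a price/performance comparison for SSD candidates.
# All names and figures are placeholders for illustration only.

from dataclasses import dataclass

@dataclass
class SsdOption:
    name: str
    iops: float          # throughput measured in your own benchmark
    price_usd: float     # total cost of the array
    mtbf_hours: float    # vendor-quoted mean time between failures

def price_performance(option: SsdOption) -> float:
    """IOPS delivered per dollar spent (higher is better)."""
    return option.iops / option.price_usd

candidates = [
    SsdOption("vendor_a", iops=100_000, price_usd=250_000, mtbf_hours=1_000_000),
    SsdOption("vendor_b", iops=80_000, price_usd=150_000, mtbf_hours=750_000),
]

for c in sorted(candidates, key=price_performance, reverse=True):
    print(f"{c.name}: {price_performance(c):.2f} IOPS/$, MTBF {c.mtbf_hours:,} hrs")
```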