Sometimes adding spindles is not enough to resolve a storage performance problem. Today – with high-performance SSDs – it is increasingly important to build a good understanding of the architecture of a storage array. This will help you determine the caveats and ask the right questions to get the maximum performance out of your IT infrastructure.
Some time ago, I was confronted with a storage performance problem. The disk array installed in an appliance was able to move 600 GB/hour with a 60% write / 40% read mix. When I contacted the vendor, they advised me to install more spindles, as this would distribute the load and gain performance. Upon validation of the system specifications – in particular the storage array controller – it became clear this would not resolve the situation: the storage controller was already at maximum performance! This is a perfect example of a CPU-bound system.
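To put that 600 GB/hour figure in perspective, here is a quick back-of-envelope conversion (my own arithmetic) into the per-second numbers you would compare against the controller's data sheet:

```python
# Convert the appliance's 600 GB/hour (60% write / 40% read)
# into MB/s figures that a controller spec sheet can be
# compared against.
total_mb_per_s = 600 * 1024 / 3600   # 600 GB/hour expressed in MB/s
write_mb_per_s = total_mb_per_s * 0.60
read_mb_per_s = total_mb_per_s * 0.40

print(f"total: {total_mb_per_s:.1f} MB/s")   # total: 170.7 MB/s
print(f"write: {write_mb_per_s:.1f} MB/s")   # write: 102.4 MB/s
print(f"read:  {read_mb_per_s:.1f} MB/s")    # read:  68.3 MB/s
```

If the controller is already saturated at roughly this rate, spreading the same load over more spindles changes nothing.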
The goal of this blog post is to explain the difference between CPU-bound and disk-bound. Continue reading
SAN switches are the core components of your Storage Area Network. Therefore it's important to monitor these devices correctly to ensure the operational continuity of your storage infrastructure.
The first approach is to implement Brocade Network Advisor (BNA). BNA is a tool used to manage, monitor and report on Fibre Channel SAN switches (performance and throughput visualization, and much more) and comes at a cost (a platform hosting the software plus software licenses/support). Brocade Network Advisor can be integrated with Microsoft System Center Operations Manager through a Management Pack (plugin).
A different approach is to monitor the devices with plain SNMP traps. This blog post will guide you through the configuration process of SNMP on Brocade SAN switches. We will use Microsoft System Center Orchestrator to collect the information in the SNMP traps and push it into an incident management tool. The use of Microsoft System Center is not mandatory; a tool like Nagios could do the trick as well. Continue reading
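As a teaser, the switch-side part of that configuration boils down to a few snmpconfig calls on the Fabric OS CLI; a hedged sketch (community strings and trap recipients are set interactively and are site-specific):

```
switch:admin> snmpconfig --show snmpv1         # review current communities and trap recipients
switch:admin> snmpconfig --set snmpv1          # interactive: set community string and trap host IP
switch:admin> snmpconfig --set mibCapability   # enable the MIBs/trap groups you want the switch to send
```

The exact prompts differ per FOS release, so check the command reference for your firmware version.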
I would like to share some of my personal best practices for upgrading the firmware on Brocade SAN switches and/or directors. This blog post is divided into several sections:
- Pre-upgrade tasks;
- Upgrading the SAN switch;
- Post-upgrade tasks;
- Alternative FTP server;
- PuTTY log of a SAN switch upgrade.
First of all, it's important to mention that every Brocade SAN switch has two firmware partitions. The Fabric OS (also referred to as "FOS") is booted from the active partition, whereas the secondary partition provides the ability to perform a non-disruptive firmware upgrade and acts as a fallback mechanism in case the firmware on the primary partition is damaged.
Both firmware versions can be listed with the firmwareshow command, which prints them under "Appl  Primary/Secondary Versions".
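For reference, a hedged example of what firmwareshow prints (the version numbers are illustrative):

```
switch:admin> firmwareshow
Appl     Primary/Secondary Versions
------------------------------------------
FOS      v7.4.1
         v7.4.1
```

When both partitions show the same version, the last upgrade was committed successfully.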
The firmware on the SAN switch can be upgraded disruptively or non-disruptively; the latter takes some more time. When you are upgrading SAN switch components in a live production environment, it's highly advisable to use the non-disruptive approach.
The firmware upgrade path can be found in the "Brocade Fabric OS vA.B.CD Release Notes". In general, a non-disruptive upgrade is only supported from the directly preceding release (identified as "B" in the version scheme), so you step through the releases one at a time. For example: v7.1.2b > v7.2.1c > v7.3.1d > v7.4.1. Continue reading
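That one-release rule can be sketched in a few lines of code. This is my own illustrative helper, not a Brocade tool – always confirm the supported path in the official release notes:

```python
import re

def parse_fos(version):
    """Split e.g. 'v7.1.2b' into (major, minor) = (7, 1)."""
    m = re.match(r"v(\d+)\.(\d+)\.", version)
    if not m:
        raise ValueError(f"unrecognised FOS version: {version}")
    return int(m.group(1)), int(m.group(2))

def non_disruptive_hop(current, target):
    """Rough sketch of the rule: a hop is non-disruptive only to the
    next minor release within the same major, or within one release
    (patch-level updates)."""
    cur, tgt = parse_fos(current), parse_fos(target)
    return tgt == (cur[0], cur[1] + 1) or tgt == cur

print(non_disruptive_hop("v7.1.2b", "v7.2.1c"))  # True
print(non_disruptive_hop("v7.1.2b", "v7.3.1d"))  # False: skips v7.2.x
```

In other words: to get from v7.1.x to v7.4.x non-disruptively, you pass through v7.2.x and v7.3.x first.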
MPIO, or multipathing, is a mechanism to mitigate the effects of a failure (HBA failure, switch failure, ...) by routing the storage traffic over an alternate path between the server and the storage device.
In normal situations, the system is configured redundantly to avoid single points of failure. This means:
- redundant storage (at least 2 controllers);
- redundant fabrics;
- redundant host bus adapters.
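On the host side you can verify that all those paths are actually up; two hedged examples (the right command depends on the operating system and MPIO stack in use):

```
# Linux device-mapper multipath: list each LUN with its paths and their state
multipath -ll

# Windows native MPIO: show the load-balance policy and paths per disk
mpclaim -s -d
```

If a path is missing here, the redundancy only exists on paper.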
Iometer is an I/O subsystem measurement and characterization tool for single and clustered systems. It was originally developed by Intel Corporation and in 2001 it was handed over to the Open Source Development Lab (OSDL). It has gained popularity ever since, and practically every IT professional has used it at least once.
Iometer is a very powerful tool, but if it's improperly used it will give you distorted results. Therefore it's advised to follow these guidelines:
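One classic distortion is a test file small enough to fit entirely in array cache, which yields impossibly good numbers. A quick sanity check (my own arithmetic, not part of Iometer) is to translate a measured result into IOPS per spindle:

```python
def iops_per_disk(throughput_mb_s, io_size_kb, spindles):
    """Translate a measured throughput into IOPS per spindle.
    If the result far exceeds what a single disk can deliver
    (roughly 150-200 IOPS for a 10k rpm drive), the array cache
    is probably absorbing the workload and the result is distorted."""
    total_iops = throughput_mb_s * 1024 / io_size_kb
    return total_iops / spindles

# Example: 400 MB/s of 8 KB random I/O reported on 24 spindles
print(round(iops_per_disk(400, 8, 24)))  # 2133 -> suspiciously high for rotating disks
```

Whenever the per-disk figure looks too good to be true, re-run the test with a working set larger than the array cache.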
Recently I encountered a very weird phenomenon at one of my customers, and we had a very hard time determining the root cause of the issue.
My customer buys his servers in sets of twelve (one rack) at a time. All servers are equipped with a dual-port Fibre Channel Host Bus Adapter (HBA). Each port is connected to a different fabric (a TOP and a BOT fabric).
One of the racks, freshly installed a few weeks before the maintenance weekend in which we performed a storage, SAN switch and server upgrade, was causing a whole bunch of issues. Oddly enough, all switches were in a healthy state, no errors were visible in the error log, and all ports had successfully completed the Fabric Login process (FLOGI).
My customer uses HP ProLiant DLxxx G7 servers with a combination of QLogic and Emulex Fibre Channel cards. The Fibre Channel switches are HP-branded: HP Brocade 8/40 SAN switches.
As a first step, we verified the port configuration:
- Fixed speed? Yes, 8G.
- Fill word? Configured with mode 3 (aa-then-ia: attempts hardware arbff-arbff (mode 1) first; if that attempt fails to go into active state, software idle-arb (mode 2) is used. Mode 3 is preferable to modes 1 and 2 as it covers more cases.)
and of course the port statistics (and in more detail, the port errors). Here I came to the conclusion that the numbers were completely static. This means the port was online in the fabric, but since no counters were incrementing, the port was not actually being used – even though it was zoned with a storage array and a disk had been presented.
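The checks above map to a handful of Fabric OS commands; a hedged sketch (port 0 is a placeholder for the port you are investigating):

```
switch:admin> portcfgshow 0         # verify fixed speed and the configured fill word mode
switch:admin> portcfgfillword 0,3   # set fill word mode 3 (aa-then-ia) if needed
switch:admin> portstatsshow 0       # frame counters -- static numbers mean no traffic is flowing
switch:admin> porterrshow           # error counters (enc errors, crc, link failures) per port
```

Running portstatsshow twice a few minutes apart quickly shows whether any frames are moving at all.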
When configuring a brand new Brocade SAN switch, all settings are at their factory defaults.
You can log in to the switch by opening an SSH/Telnet session to the default IP address 10.77.77.77.
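The typical first steps after logging in with the default admin credentials look like this (the switch name and values are placeholders for your own environment):

```
switch:admin> ipaddrset              # assign a static management IP, netmask and gateway
switch:admin> switchname SW_TOP_01   # give the switch a meaningful name
switch:admin> tstimezone --interactive   # set the correct time zone
switch:admin> date                   # verify (and if needed set) the date and time
```

Changing the default admin password right away is strongly advisable as well.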
Hope this helps.
When a LUN is presented to a physical Windows server, it's fairly easy to link the LUN to the mounted disk within the operating system.
You can simply open the Disk Management tool (diskmgmt.msc) and click on the properties of the disk itself (not the partition!).
On the general tab, you can find the LUN number in the Location section.
In our case, some LUNs are presented directly to a virtual machine running Windows 2008 R2 (as Raw Device Mappings). The disks themselves are used within a virtual cluster configuration.
The customer asked me to match each disk within the virtual machine to the corresponding disk on the storage array (an HP EVA 8400).
If you think this is an easy task, think twice! The procedure below will assist you in this matter.
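While the full procedure is longer, a hedged starting point is dumping the SCSI LUN numbers and disk serials from WMI inside the guest, so they can be compared against the LUN identifiers on the array:

```
C:\> wmic diskdrive get Index,SCSIBus,SCSITargetId,SCSILogicalUnit,SerialNumber
```

The SerialNumber column is what you line up against the identifier the storage array reports for each presented LUN.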
HP Insight Remote Support can be used for automatic call creation and for monitoring a wide range of components. When I install the software, it's mainly for monitoring storage components (such as HP P6000 arrays, SAN switches, management servers, etc.).
The procedure below has been validated on HP Insight Remote Support 7.0.5 (build 0193).
In some situations, you want to control the sender address used for outgoing messages. The procedure below describes how you can alter the sender e-mail address.
By default, e-mails are sent using the following sender address: InsightRemoteSupport@hp.com.
When creating a new volume on an HP P2000 G3, the system throws the following error in the GUI:
"Unable to create volume NewVolume. The vdisk was not found on this system."
Eventually we tried to create the volume using the CLI, but this gave us a similar error message:
"Error: The vdisk was not found on this system."