The complete practical guide to architecting storage networks for maximum efficiency and quickly troubleshooting congestion issues.

Data is the most important entity in a data center. In addition to being stored securely for the long term, data must be accessible to applications with low latency so that high performance can be maintained 24/7. To meet this application performance goal, the storage infrastructure services must be designed accordingly.

In Detecting, Troubleshooting, and Preventing Congestion in Storage Networks, leading Cisco experts take a practical approach to explaining the congestion handling mechanisms of transport technologies such as FC, FCoE, RoCE, and TCP. The authors share a proven troubleshooting methodology developed through years of firsthand experience, as well as analytical techniques for monitoring storage fabrics and gaining predictive insights. Through real-world experiences and case studies, you'll learn what questions to ask, how to start planning, what exists today, what components still must evolve, and how to drive value by building custom applications for detecting congestion in large-scale storage networks.

* Optimize user experience with faster resolution of congestion in storage networks in production data centers
* Master congestion handling mechanisms in technologies like FC, FCoE, RoCE, and TCP
* Applicable to networks connected to all storage array types and vendors, including Dell, HPE, IBM, NetApp, and Hitachi
* Real-world case studies and troubleshooting methodology ensuring storage SLAs are consistently met
* Increase uptime with custom analytical tools for predicting and resolving congestion
* Boost storage infrastructure efficiency in a hybrid cloud model
* Save on employee training and reduce support ticket hassles with vendors
Paresh Gupta, CCIE No. 36645, has almost two decades of experience in the computer industry. Currently, as a senior leader of Technical Marketing Engineering for Cisco, he drives the technical and market evolution of products, technologies, and solutions such as SAN Analytics, Nexus Dashboard, UCS, MDS, and Nexus switches. He has been testing and validating congestion in storage networks for many years. In his multiple roles, he has invented/patented many ideas, developed many features, and trained thousands of people in sales, partner, and customer communities. Paresh is the creator of the full-blown traffic monitoring apps for Cisco UCS Servers (UTM) and MDS switches (MTM). Hundreds of organizations use Paresh's apps in production around the world.

Edward Mazurek, CCIE No. 6448, has more than 40 years of experience in the computer networking industry. The first 18 were with IBM, supporting products such as Virtual Machine (VM) and VTAM. He has spent the last 22+ years with the Cisco TAC. As a principal engineer, he supports data center networking technologies, including storage-area networking, Fibre Channel, and FCoE in the MDS, UCS, and Nexus 5000, 6000, 7000, and 9000 series. He holds two CCIEs, SNA/IP Integration (2000) and Storage Area Networking (SAN), and is presently a CCIE Emeritus. Ed has spearheaded the congestion-handling mechanisms on the Cisco MDS 9000 switches and Nexus 9000 switches. Based on his deep understanding, Ed developed an app that is being used by other Cisco engineers and continues to be in high demand by various partners. In addition to inventing multiple features on Cisco products, Ed holds multiple patents in the field of network congestion.
Table of Contents
Introduction

Chapter 1 Introduction to Congestion in Storage Networks
  Types of Storage in a Data Center
  Storage Protocols, Transports, and Networks
  Storage Networks
  Congestion in Storage Networks: An Overview
  NVMe over Fabrics
  Quality of Service (QoS)
  Summary
  References

Chapter 2 Understanding Congestion in Fibre Channel Fabrics
  Fibre Channel Flow Control
  Congestion Spreading in Fibre Channel Fabrics
  Frame Flow Within a Fibre Channel Switch
  The Effects of Bit Errors on Congestion
  B2B Credit Loss and Recovery
  Fibre Channel Counters Summary
  Summary
  References

Chapter 3 Detecting Congestion in Fibre Channel Fabrics
  Congestion Detection Workflow
  Congestion Detection Metrics
  Congestion Detection Metrics on Cisco MDS Switches
  Automatic Alerting
  Detecting Congestion Using Remote Monitoring Platforms
  Detecting Congestion Due to Slow Drain and Overutilization
  Slow Drain and Overutilization at the Same Time
  Detecting Congestion on Long-Distance Links
  Summary
  References

Chapter 4 Troubleshooting Congestion in Fibre Channel Fabrics
  Troubleshooting Methodology and Workflow
  Hints and Tips for Troubleshooting Congestion
  Cisco MDS NX-OS Commands for Troubleshooting Congestion
  Case Study 1: Finding Congestion Culprits and Victims in a Single-Switch Fabric
  Case Study 2: Credit Loss Recovery Causing Frame Drops
  Case Study 3: Overutilization on a Single Device Causing Massive Congestion Problems
  Case Study 4: Long-Distance ISLs Causing Congestion
  Summary
  References

Chapter 5 Solving Congestion with Storage I/O Performance Monitoring
  Why Monitor Storage I/O Performance?
  How and Where to Monitor Storage I/O Performance
  Cisco SAN Analytics Architecture
  Understanding I/O Flows in a Storage Network
  I/O Flow Metrics
  I/O Operations and Network Traffic Patterns
  Summary
  References

Chapter 6 Preventing Congestion in Fibre Channel Fabrics
  An Overview of Eliminating or Reducing Congestion
  Link Capacity
  Congestion Recovery by Disconnecting the Culprit Device
  Congestion Recovery by Dropping Frames
  Traffic Segregation
  Congestion Prevention Using Rate Limiters on Storage Arrays
  Congestion Prevention Using Dynamic Ingress Rate Limiting on Switches
  Preventing Congestion by Notifying the End Devices
  Network Design Considerations
  Summary
  References

Chapter 7 Congestion Management in Ethernet Storage Networks
  Ethernet Flow Control
  Understanding Congestion in Lossless Ethernet Networks
  Detecting Congestion in Lossless Ethernet Networks
  Troubleshooting Congestion in Lossless Ethernet Networks
  Preventing Congestion in Lossless Ethernet Networks
  Lossless Traffic with VXLAN
  Summary
  References

Chapter 8 Congestion Management in TCP Storage Networks
  Understanding Congestion in TCP Storage Networks
  Storage I/O Performance Monitoring
  Preventing Congestion in TCP Storage Networks
  Detecting Congestion in TCP Storage Networks
  Troubleshooting Congestion in TCP Storage Networks
  iSCSI and NVMe/TCP in a Lossless Network
  iSCSI and NVMe/TCP with VXLAN
  Fibre Channel over TCP/IP (FCIP)
  Modified TCP Implementations
  Summary
  References

Chapter 9 Congestion Management in Cisco UCS Servers
  Cisco UCS Architecture
  Understanding Congestion in a UCS Domain
  Detecting Congestion in a UCS Domain
  The UCS Traffic Monitoring (UTM) App
  Summary
  References