{"id":57697,"date":"2020-10-08T10:45:49","date_gmt":"2020-10-08T15:45:49","guid":{"rendered":"https:\/\/blog.cpanel.com\/?p=57697"},"modified":"2020-10-08T10:45:49","modified_gmt":"2020-10-08T15:45:49","slug":"disk-io-errors-troubleshooting-on-linux-servers","status":"publish","type":"post","link":"https:\/\/devel.www.cpanel.net\/blog\/tips-and-tricks\/disk-io-errors-troubleshooting-on-linux-servers\/","title":{"rendered":"Disk IO Errors: Troubleshooting on Linux Servers"},"content":{"rendered":"\n
Disk IO (input\/output) issues are a common cause of poor performance on web hosting servers. Hard drives have speed limits, and if software tries to read or write too much data too quickly, applications and users are forced to wait. To put it another way, storage devices can be a bottleneck that stops the server from reaching its full performance potential. <\/p>\n\n\n\n
Disk IO is not the only cause of slow servers, so in this article, we\u2019ll explain how to use Linux IO stats to identify disk IO issues and how to diagnose and fix servers with storage bottlenecks. <\/p>\n\n\n\n
Because disk IO is so important to server performance, IO problems can manifest in many different ways:<\/p>\n\n\n\n
If you observe any of these, high IO loads might be the culprit, but how do you know it\u2019s a storage bottleneck rather than a problem with the network or processor?<\/p>\n\n\n\n
From the user\u2019s perspective, an IO bottleneck might look just like network latency, among other problems. It\u2019s prudent to make sure that the server\u2019s storage is the real culprit so we don\u2019t waste time and money fixing the wrong problem.<\/p>\n\n\n\n
We\u2019ll need to use diagnostic tools on the server\u2019s command line, so log in with SSH<\/a>. <\/p>\n\n\n\n First, let\u2019s see if the CPU is waiting for disk operations to complete. Type \u201ctop\u201d and press enter.<\/em> This launches the top<\/em> tool, which shows server statistics and a list of running processes. The wa <\/em>metric shows IO-wait: the percentage of time the CPU spends waiting for IO to complete. <\/p>\n\n\n\n IO-wait is one of a series of processor activity figures in the %CPU row. The row also includes:<\/p>\n\n\n\n On the single-CPU server in the example images, these are straightforward to understand. Our server\u2019s CPU spends 59 percent of its time waiting for IO instead of processing data. An IO-wait above 1 percent may indicate the server\u2019s hard drives are struggling to supply the processor with data. <\/p>\n\n\n\n On multi-core and multi-processor servers, it\u2019s a little more complex. Because top <\/em>adds the CPU utilization figures for all cores, they can exceed 100 percent. As a rule, if the IO-wait percentage is greater than 1 when divided by the number of CPU cores, then the processor must wait before it can process data. For example, on a 4-core system with a wa <\/em>of 10 percent, the per-core IO-wait is around 2.5 percent, so the processors are forced to wait. <\/p>\n\n\n\n A high IO-wait doesn\u2019t always mean there is an IO bottleneck, but it is a valuable clue, especially when it correlates with observed performance issues. To discover the cause, we need to investigate further with vmstat<\/em>, which shows statistics for IO, CPU, and memory activity, among others. <\/p>\n\n\n\n We\u2019re asking vmstat <\/em>to show us ten readings at one-second intervals (vmstat 1 10). The first line shows average IO stats since the last reboot, and the subsequent lines show real-time statistics.<\/p>\n\n\n\n We are interested in the io<\/em> column, which is divided into input (bi) and output (bo). 
It shows that large amounts of data were written to a storage device throughout the test period. Compared to the average loads in the first row, the IO system is being seriously stressed. <\/p>\n\n\n\n Next, we want to know which hard drive is under load. To find out, we can use iostat. <\/em><\/p>\n\n\n\n The -m option tells iostat to display statistics in megabytes per second, and -d says we\u2019re interested in device utilization. <\/p>\n\n\n\n The device called vda<\/em> is writing 730 MB of data each second. Whether that\u2019s a problem depends on the capabilities of the server and the device, but with the observed performance degradation and high IO-wait times, it\u2019s reasonable to conclude that excessive disk IO on vda<\/em> is the cause of our issues. <\/p>\n\n\n\n There is one other piece of information that could help us narrow things down: the mount point of the vda <\/em>device. The mount point is the directory on the server\u2019s filesystem the device is connected to. You can find it with the lsblk <\/em>command.<\/em><\/p>\n\n\n\n We can see that vda <\/em>has one partition called vda1 <\/em>and that it\u2019s mounted on the root of the filesystem (\/). On this server, that information is not particularly helpful; it only has one mounted device. However, on a server with several storage devices, lsblk <\/em>can help you narrow down where the data is being written and, from that, which application is likely writing it. <\/p>\n\n\n\n Once you have identified the affected drive, there are several approaches you can take to mitigate disk IO issues.<\/p>\n\n\n\n For example, you may want to try changing a few settings to see if performance improves before you upgrade hard drives or memory. Three steps to try, in order, are:<\/p>\n\n\n\n 1. Turn on write caching<\/p>\n\n\n\n 2. Turn on direct memory access<\/p>\n\n\n\n 3. 
Upgrade Server Hardware<\/p>\n\n\n\n Here\u2019s how to do that:<\/p>\n\n\n\n Write caching collects data for multiple writes in a RAM cache before writing them permanently to the drive. Because it reduces the number of hard drive writes, it can improve performance in some scenarios. <\/p>\n\n\n\n Write caching can cause data loss if the server\u2019s power is cut before the cache is written to the disk. Don\u2019t activate write caching if you want to minimize the risk of data loss. <\/p>\n\n\n\n The hdparm <\/em>utility can turn write caching on and off. It may not be installed by default on your server, but you can install it from CentOS\u2019s core repository with:<\/p>\n\n\n\n The following command turns write caching on:<\/p>\n\n\n\n To turn write caching off, use:<\/p>\n\n\n\n Direct Memory Access (DMA) allows the server\u2019s components to access its RAM directly, without going via the CPU. It can significantly increase hard drive performance in some scenarios. <\/p>\n\n\n\n To enable DMA, run the following command, replacing \/dev\/hda <\/em>with your hard drive:<\/p>\n\n\n\n You can turn DMA off with:<\/p>\n\n\n\n DMA isn\u2019t available on all servers and, with virtual servers in particular, you may not be able to modify hard drive settings.<\/p>\n\n\n\n If configuration tweaks don\u2019t solve your IO issues, it\u2019s time to think about upgrading, replacing, or reorganizing the server\u2019s hardware. <\/p>\n\n\n\n Disk IO bottlenecks can be tricky to diagnose, but the process we\u2019ve outlined here will help you to quickly determine whether you have an IO problem, which drives are affected, and what you can do to improve your server\u2019s performance. <\/p>\n\n\n\n As always, if you have any feedback or comments, please let us know. We are here to help in the best ways we can. 
You\u2019ll find us on Discord<\/a>, the cPanel forums<\/a>, and Reddit<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":" Disk IO errors (input\/output) issues are a common cause of poor performance on web hosting servers. Hard drives have speed limits, and if software tries to read or write too much data too quickly, applications and users are forced to wait. To put it another way, storage devices can be a bottleneck that stops the […]<\/p>\n","protected":false},"author":77,"featured_media":65541,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[61],"tags":[],"class_list":["post-57697","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tips-and-tricks"],"acf":[],"yoast_head":"\n<\/figure>\n\n\n\n
vmstat 1 10<\/code><\/pre>\n\n\n\n
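Reading ten rows of vmstat output by eye can be error-prone, so here is a minimal sketch that averages the bo (blocks written out) and wa (IO-wait) columns. The summarize_vmstat function name is our own invention, and the sketch assumes the usual two-line vmstat header; it skips the first sample because that row reports averages since the last reboot.

```shell
#!/bin/sh
# summarize_vmstat: hypothetical helper (not part of vmstat) that reads
# vmstat output on stdin and averages the bo and wa columns.
summarize_vmstat() {
  awk '
    NR == 2 {                 # second header line holds the column names
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NR <= 3 { next }          # skip the banner line and the since-boot sample
    {
      bo += $(col["bo"])      # blocks written out per second
      wa += $(col["wa"])      # percentage of CPU time spent in IO-wait
      n++
    }
    END {
      if (n) printf "avg_bo=%.1f avg_wa=%.1f\n", bo / n, wa / n
    }
  '
}
```

One way to use it is to pipe the command above into the function: vmstat 1 10 | summarize_vmstat. Columns are located by name rather than by position, which should tolerate small layout differences between vmstat versions.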
<\/figure>\n\n\n\n
iostat -md<\/code><\/pre>\n\n\n\n
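On a server with many devices, it can help to pick out the busiest writer automatically. The following is a rough sketch, assuming iostat prints a header row that starts with "Device" and includes a MB_wrtn/s column, as it does with the -m option; busiest_writer is a hypothetical helper name, not an iostat feature.

```shell
#!/bin/sh
# busiest_writer: hypothetical helper that reads `iostat -md` output on
# stdin and prints the device with the highest MB_wrtn/s value.
busiest_writer() {
  awk '
    BEGIN { max = -1 }
    $1 ~ /^Device/ {          # header row: find the MB_wrtn/s column
      for (i = 1; i <= NF; i++) if ($i == "MB_wrtn/s") c = i
      next
    }
    c && NF >= c && $c + 0 > max {   # device rows after the header
      max = $c + 0
      dev = $1
    }
    END {
      if (dev != "") printf "%s %.2f\n", dev, max
    }
  '
}
```

For example: iostat -md | busiest_writer. Keep in mind that iostat without an interval reports averages since boot, so the result describes long-term load rather than what is happening right now.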
<\/figure>\n\n\n\n
<\/figure>\n\n\n\n
How To Fix Disk IO Issues<\/strong><\/h2>\n\n\n\n
Turn on Write Caching<\/strong><\/h3>\n\n\n\n
yum install hdparm<\/code><\/pre>\n\n\n\n
hdparm -W1 \/dev\/sda<\/code><\/pre>\n\n\n\n
hdparm -W0 \/dev\/sda<\/code><\/pre>\n\n\n\n
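Before changing the setting, it is worth checking what the drive currently reports. Running hdparm with -W and no value queries the write-caching state instead of setting it. The sketch below turns that output into a simple on, off, or unknown answer; write_cache_state is a hypothetical helper, and it assumes hdparm prints a line of the form "write-caching =  1 (on)", which may differ between versions and drives.

```shell
#!/bin/sh
# write_cache_state: hypothetical helper that reads `hdparm -W <device>`
# output on stdin and prints on, off, or unknown.
write_cache_state() {
  awk '
    /write-caching/ {
      found = 1
      if ($3 == "1") print "on"        # e.g. " write-caching =  1 (on)"
      else if ($3 == "0") print "off"  # e.g. " write-caching =  0 (off)"
      else print "unknown"             # e.g. "not supported"
    }
    END { if (!found) print "unknown" }
  '
}
```

For example: hdparm -W \/dev\/sda | write_cache_state. Checking the state before and after running the commands above confirms whether the drive actually accepted the change.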
Turn on Direct Memory Access<\/strong><\/h3>\n\n\n\n
hdparm -d1 \/dev\/hda<\/code><\/pre>\n\n\n\n
hdparm -d0 \/dev\/hda<\/code><\/pre>\n\n\n\n
Upgrade Server Hardware<\/strong><\/h3>\n\n\n\n