Linux, Storage

GlusterFS FUSE Hanging on CentOS 7

Seeing strange GlusterFS hangs when using the native FUSE client on CentOS? This was a bit of a bitch, actually, because it was hard to reproduce. Eventually, the only semi-regular way to reproduce it was to create lots of small files from multiple servers at the same time.
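The small-file reproduction described above can be sketched in Python. This is a minimal sketch, not the exact load that triggered the hang: the worker count, file count, file size, and file naming are all assumptions, and the script would need to run on several clients at once against the same Gluster mount.

```python
import os
import socket
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_small_files(directory, count, size=1024, prefix="f"):
    """Write `count` files of `size` bytes into `directory`; return how many were written."""
    payload = b"x" * size
    for i in range(count):
        path = os.path.join(directory, f"{prefix}-{i:06d}.dat")
        with open(path, "wb") as fh:
            fh.write(payload)
    return count


def hammer(directory, workers=8, files_per_worker=500):
    """Run several writers at once; each uses its own prefix so names never collide."""
    host = socket.gethostname()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(create_small_files, directory, files_per_worker, 1024, f"{host}-w{n}")
            for n in range(workers)
        ]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    # A temporary directory keeps the sketch safe to run anywhere.
    # Point it at a directory on the Gluster mount to generate real load.
    with tempfile.TemporaryDirectory() as target:
        print(hammer(target, workers=4, files_per_worker=100))
```

Including the hostname in the prefix lets the same script run from multiple servers against one directory without the writers overwriting each other.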

The Behavior

The volume would still be mounted, but it would hang. The only indication of a problem was the console hanging when trying to run df or otherwise use the filesystem.
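Since a hung mount also hangs whatever shell runs df against it, a probe with a timeout is a safer way to check. The sketch below is an assumption about how one might do that, not part of the original troubleshooting: it calls os.statvfs (the same kind of filesystem query df makes) in a background thread and gives up after a few seconds. The function name and the 5 second default are hypothetical.

```python
import os
import threading


def mount_responds(path, timeout=5.0):
    """Return True if os.statvfs(path) completes within `timeout` seconds."""
    outcome = {}

    def probe():
        try:
            os.statvfs(path)
            outcome["ok"] = True
        except OSError:
            # Path missing or not accessible: treat as "not healthy".
            outcome["ok"] = False

    # A daemon thread will not keep the interpreter alive if the call never returns.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    # No entry in `outcome` means the call is still blocked, i.e. the mount looks hung.
    return outcome.get("ok", False)


if __name__ == "__main__":
    print(mount_responds("/"))
```

On a genuinely hung mount the probe thread stays blocked in the background; the point is only that the caller gets an answer back instead of hanging with it.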

The kern.log also shows long waits on either the application running on top or the FUSE client itself.
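If those long waits appear as the kernel's standard hung-task warnings ("INFO: task NAME:PID blocked for more than N seconds."), they can be pulled out of the log with a short script. This is a sketch under that assumption; the sample lines are made up for illustration and are not taken from the affected servers.

```python
import re

# Matches the kernel's hung-task warning and captures task name, PID, and wait time.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task warning in `lines`."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append((match.group("comm"), int(match.group("pid")), int(match.group("secs"))))
    return hits


# Illustrative log lines only (hypothetical host, PIDs, and timestamps).
SAMPLE = [
    "Jun  1 10:00:00 web01 kernel: INFO: task glusterfs:2345 blocked for more than 120 seconds.",
    "Jun  1 10:00:00 web01 kernel: Not tainted 3.10.0-327.10.1.el7.x86_64 #1",
    "Jun  1 10:02:00 web01 kernel: INFO: task java:4567 blocked for more than 120 seconds.",
]

if __name__ == "__main__":
    for comm, pid, secs in find_hung_tasks(SAMPLE):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

To run it against a real log, pass an open file object instead of SAMPLE, since the function only needs an iterable of lines.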

Note: I was actually able to make the NFS client hang, but we don't want to use the NFS client because we would lose the graceful failover features, among other things. Performance has been reported to be an issue with the FUSE client, but I was able to tune this pretty well. I won't go into that here.

The Solution

The base CentOS 7 kernel is pretty old. I mean, it still gets updates, but it's still 3.10.0-327.10.1 as of June 2016. Instead of compiling our own kernel, I grabbed the RPMs from ELRepo (http://elrepo.org/tiki/tiki-index.php).

After many days of troubleshooting, testing, and tuning, installing this kernel solved the issue. No more lock-ups or fop STAT / LOCK issues.

I didn't want to go to the mainline 4.6 kernel, so I opted for the 4.5.4-1 stable kernel. You should also be aware that these are VMs running under VMware.
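To see which hosts are still on the stock kernel, the release string from uname -r can be compared against the target numerically. A minimal sketch follows; the helper names are hypothetical, and only the leading major.minor.patch numbers are compared, so package release suffixes such as -327.10.1 are ignored.

```python
import platform
import re


def kernel_tuple(release):
    """Turn '3.10.0-327.10.1.el7.x86_64' into (3, 10, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def needs_upgrade(release, target="4.5.4-1.el7.elrepo"):
    """True if `release` is older than `target`, comparing the numbers, not the strings."""
    return kernel_tuple(release) < kernel_tuple(target)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, "needs upgrade" if needs_upgrade(running) else "is new enough")
```

Comparing integer tuples matters here because a plain string comparison would rank 3.10 below 3.9.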

Here's a quickly hacked-together Ansible playbook to handle the upgrade and verification for you via yum.

Versions

  • GlusterFS Server – 3.7.11-1 (April 18 2016)
  • GlusterFS FUSE Client – 3.7.11-1
  • Old kernel 3.10.0-327.10.1.el7
  • New kernel 4.5.4-1.el7.elrepo

- hosts: all
  become: true
  vars:
    kernel_version: "4.5.4-1.el7.elrepo"

  tasks:
    - name: Read Kernel Version
      command: uname -r
      register: result

    # Failing here drops hosts that already run the target kernel from the rest of the play.
    - name: Has kernel upgrade already completed
      fail:
        msg: "Kernel version already {{ kernel_version }}"
      when: kernel_version in result.stdout

    # Remove the stock packages that the kernel-ml equivalents replace.
    - name: Uninstall Existing Kernel Packages
      yum:
        name: "{{ item }}"
        state: absent
      with_items:
        - kernel-headers
        - kernel-tools
        - kernel-tools-libs

    - name: Install New Kernel Packages
      yum:
        name: "{{ item }}"
        update_cache: yes
        state: present
        disable_gpg_check: yes
      with_items:
        - "kernel-ml-{{ kernel_version }}"
        - "kernel-ml-devel-{{ kernel_version }}"
        - "kernel-ml-headers-{{ kernel_version }}"
        - "kernel-ml-tools-{{ kernel_version }}"
        - "kernel-ml-tools-libs-{{ kernel_version }}"
        - "kernel-ml-tools-libs-devel-{{ kernel_version }}"

    # Entry 0 is the first menu entry, which is the newly installed kernel.
    - name: Set Boot Time Option for Kernel
      command: grub2-set-default 0

    - name: Change grub2 configs
      command: grub2-mkconfig -o /boot/grub2/grub.cfg

    # Still the old kernel at this point; printed for the record before rebooting.
    - name: Read Kernel Version
      command: uname -r
      register: result
      ignore_errors: true

    - name: Print Kernel Version
      debug: var=result.stdout_lines

    - name: Restart server
      command: "{{ item }}"
      async: 0
      poll: 0
      with_items:
        - "shutdown -r +1"
      ignore_errors: true

    # 20848 is the port used in this environment (presumably SSH); adjust it for yours.
    - name: Wait for server to reboot
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 20848
        state: started
        delay: 90
        timeout: 300
      delegate_to: localhost

    - name: Read Kernel Version
      command: uname -r
      register: result
      ignore_errors: true

    - name: Print Kernel Version
      debug: var=result.stdout_lines

    - name: Did kernel upgrade fail
      fail:
        msg: "Kernel does not match {{ kernel_version }}, actual kernel is {{ result.stdout }}"
      when: kernel_version not in result.stdout
