Contact
Site: US UK AU |
Nexcess Blog

File Lock Contention

October 15, 2014 1 Comment RSS Feed

File LockI’m sure that we’ve all experienced issues that deal with file lock (flock) contention. More often than not, within the Nexcess tech community, we see this occuring in Magento (typically the sessions files). As such, I figured I’d put throw together something that will allow you to troubleshooting what’s going on more efficiently. Firstly, lets review the two basic types of locks:

READ/SHARED lock:

This type of lock is generally acquired when a process needs to obtain information from a data structure. Typically, these types of locks are non-blocking (they won’t prevent other processes from acquiring a read lock), since retrieval of data won’t affect what other processes are seeing. For example, process (A) reading from data structure (1) won’t affect the information that is seen by process (B), which might be reading from the same data structure. Because this type of lock can be shared among multiple processes, it’s often referred to as a ‘SHARED’ lock.

WRITE/EXCLUSIVE lock:

This type of lock is acquired when a process needs to write information to a data structure. WRITE locks are blocking, meaning that they will prevent other processes from acquiring either a READ or WRITE lock on the data structure. For example, process (A), which is writing to data structure (1) will prevent reads from processes (B), and writes from process (C) to/from the data structure. This allows all data to be thoroughly written to the data structure prior to another process reading from (or writing to) the data structure. Since this type of lock is generally associated with only one process at a time, it’s often referred to as an ‘EXCLUSIVE’ lock.

The concept of READ/WRITE locks extends to other portions of the system (for example, MySQL — read up on ACID (http://dev.mysql.com/doc/refman/5.6/en/mysql-acid.html) transactions for a more in-depth explanation), but for now I’ll cover file locks as seen on the filesystem.

Examples:

Let’s take a look how exclusive locks work. For this example, I’ll make use of the ‘flock’ command.

STEP 1: Create a file

First I’ll create a file called /tmp/flock.file:

 $ touch /tmp/flock.file 

STEP 2: Create a file lock

Next, I’ll acquire a WRITE/EXCLUSIVE lock on the file. The time duration for this lock will be 1 day (via the ‘sleep’ command). For more information on how the flock command works, refer to the man page (or refer to http://linux.die.net/man/1/flock):

$ flock --exclusive /tmp/flock.file --command 'sleep 1d' &
	[1] 654 

STEP 3: Create contending processes

Now I’ll create ten additional processes that will each request a WRITE/EXCLUSIVE lock on the same file (/tmp/flock.file). These processes will last for 5 seconds before echoing that they have completed (to stdout):

 $ for i in {1..10}; do flock --exclusive /tmp/flock.file --command 'sleep 5s && echo Process $i completed' & done
	[1] 2683
	[2] 2684
	[3] 2685
	[4] 2686
	[4] 2687
	[5] 2688
	[6] 2689
	[7] 2690
	[8] 2691
	[9] 2692 

STEP 4: List processes

Alright, so I performed a bunch of bash hoo-rah.. what next? Well, now I’ll use the ‘ps’ (process snapshot) command (you can also use the top, htop and/or atop utilities, with the appropriate columns set) to view my commands. I’m partial to ps, since I can tailor the output to my liking. Again, refer to the man page for whatever utility that you chose, to gain a better understanding of the output/flags/options that you can employ (since I’m using ps, I’ll include its online man page: http://linux.die.net/man/1/ps):

 $ ps -wwwC flock -o user,ppid,pid,stat,wchan:30,etime,command
	USER      PPID   PID STAT WCHAN                              ELAPSED COMMAND
	jrome741  3617   654 S    wait                                 01:14 flock --exclusive /tmp/flock.file --command sleep 1d
	jrome741  3617  2683 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2684 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2685 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2686 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2687 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2688 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2689 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2690 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2691 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed
	jrome741  3617  2692 S    flock_lock_file_wait                 00:23 flock --exclusive /tmp/flock.file --command sleep 5s && echo Process $i completed 

The -C flock portion will list all processes associated with the ‘flock’ command. The WCHAN column will show the kernel function on which a process is waiting. If you aren’t yet familiar with wait channels, no worries; you won’t need to know much about them at this point. The important part is that you notice the ‘flock_lock_file_wait’ in the WCHAN column. This shows that the process has halted from any userspace code execution (essentially application-level coding written by the user), and is instead waiting on the kernel coding (kernelspace) to acquire some sort of file lock.

STEP 5: Check it out using the procfs filesystem

I can use the /proc filesystem to see the same situation, albeit presented in different format. It’s important to note that not every system you come across will have a procfs mount; you can verify if your system does have one by the output of ‘mount | grep ‘type proc’):

 $ grep FLOCK /proc/locks
	1: FLOCK  ADVISORY  WRITE 654 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2683 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2685 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2686 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2687 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2689 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2692 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2688 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2691 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2690 08:02:108973579 0 EOF
	1: -> FLOCK  ADVISORY  WRITE 2684 08:02:108973579 0 EOF
	6: FLOCK  ADVISORY  WRITE 2200 08:02:72134542 0 EOF 

The general header for this file (it will vary on different kernels) is

 ID:prefix	<FLOCK>	<WRITE/READ LOCK> <PID> <MAJOR DEVICE #>:<MINOR DEVICE #>:<INODE OF LOCKED FILE> <SIZE/OFFSET> <EOF> 

You can see that there are a number of FLOCK entries. You’ll notice that there are a number of entries containing the same ID as well; suggesting that these are all related. If you look at the third to last column (containing the major and minor device numbers, and the inode reference), you’ll see that they are identical. The major and minor device numbers will tell you the block device/partition on which the locked resource lies. For mappings, you can use either the lsblk command (if it’s available on your system):

 $ lsblk
	NAME   KNAME MOUNTPOINT                      FSTYPE MAJ:MIN MODEL              SIZE STATE   MODE       TYPE
	sda    sda                                            8:0   SanDisk SD5SG212 119.2G running brw-rw---- disk
	|-sda1 sda1  /boot                           xfs      8:1                    972.7M         brw-rw---- part
	|-sda2 sda2  /                               xfs      8:2                     19.1G         brw-rw---- part
	`-sda3 sda3  /home                           xfs      8:3                     99.2G         brw-rw---- part 

Or you can use a combination of the procfs and another utility such as df:

 $ cat /proc/partitions
	major minor  #blocks  name
	
	   8        0  125034840 sda
	   8        1     995998 sda1
	   8        2   20000925 sda2
	   8        3  104037885 sda3
	
	$ df -hT
	Filesystem     Type   Size  Used Avail Use% Mounted on
	/dev/sda2      xfs     20G   13G  6.7G  65% /
	/dev/sda3      xfs    100G   27G   73G  27% /home
	/dev/sda1      xfs    968M   47M  922M   5% /boot 

The inode is a little trickier to find; since there isn’t an easy way to list a mapping of inode <-> file on the system. However, that’s ok; we’re not out of options yet!!

STEP 6: Run an strace

Let’s run an strace (http://linux.die.net/man/1/strace) on one of the stalled processes from our ps/top/atop/htop command:

$ strace -fp 2683
	Process 2683 attached - interrupt to quit
	flock(3, LOCK_EX <unfinished ...>
	Process 2683 detached 

You should receive output that resembles the above. If you don’t, simply choose another stalled process and try again. At max, you should only need to try twice.
You’ll see that it’s waiting on the system call flock. From looking at the rest of rest of the line, you can surmise that the type of file lock is an EXCLUSIVE or WRITE lock (LOCK_EX). For those that are interested in the general syntax of a system call, you can refer to the OS man page, or check out http://linux.die.net/man/2/flock for a general guide). The number directly after the opening parenthesis is known as a file descriptor: an integer that is used to refer to certain resources. I’m going to make use of this little guy, using my trusty lsof (list open files) utility (http://linux.die.net/man/8/lsof):

STEP 7: Determine the file in question using lsof and file descriptor of strace

First I’ll run lsof with the -p -ad flags. Make certain that you use the -a flag as well to logically AND two options together; my command is currently read as: “list the open files belonging to PID 2683 AND with a file descriptor of 3”. Since lsof options are logically OR’d together by default, without the -a flag, the command would be read as: “list the open files belonging to PID 2683 OR with a file descriptor of 3” … not what you want (give it a try and you’ll see what I mean).

 $ lsof -p 2683 -ad3
	COMMAND  PID     USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
	flock   2683 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file 

STEP 8: Determine processes that are contending for the file

Hey!! That’s the file that I previously placed a lock on! If you take a look, I also now have a inode (NODE column) <-> file (NAME column) mapping. I can now find all processes that are requesting use of the /tmp/flock.file resource by using the -f flag with lsof (refer to the man page for additional information regarding this):

 $ lsof -f -- /tmp/flock.file
	COMMAND  PID     USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
	flock    654 jrome741    3rW  REG    8,2        0 108973579 /tmp/flock.file
	sleep    655 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2683 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2684 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2685 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2686 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2687 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2688 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2689 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2690 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2691 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file
	flock   2692 jrome741    3r   REG    8,2        0 108973579 /tmp/flock.file 

There are a few things to note with this;

  • (a) The PIDs match those reported in both your ps command as well as (possibly) the /proc/locks output
  • (b) The device column matches that reported in the /proc/locks output
  • (c) The NODE (inode) column matches that reported in the /proc/locks output
  • (d) The OFFSET/SIZE column matches that reported in the /proc/locks output
  • (e) The name of the resource under contention is the last column. Of all of the information, this is the most human-friendly column.
  • (f) The file descriptor (FD) column contains the file descriptor that we saw in our strace (note that the numerical value may vary across processes).

We also see, within the FD column, that there are letters. I’ve included the basis of what the man page has to offer regarding these letters; refer to the man page of lsof for additional information:

 >> FD is followed by one of these characters, describing the mode under which the file is open:
	>> 
	>>      r for read access;
	>>      w for write access;
	>>      u for read and write access;
	>>      space if mode unknown and no lock character follows;
	>>      `-' if mode unknown and lock character follows.
	>> 
	>> The mode character is followed by one of these lock characters, describing the type of lock applied to the file:
	>> 
	>>      r for read lock on part of the file;
	>>      R for a read lock on the entire file;
	>>      w for a write lock on part of the file;
	>>      W for a write lock on the entire file;
	>>      u for a read and write lock of any length;
	>>      U for a lock of unknown type;
	>>        space if there is no lock. 

You’ll see that almost all of the processes have an ‘r’ as their mode; one process (PID 654) contains a ‘W’, signifying that it holds a WRITE lock on the entire file. Since WRITE locks are exclusive, all other processes will wait until this process relinquishes the lock.

STEP 9: Monitor process, and/or kill process (as a last resort)

WEELLL, being as smart as I am, and setting the flock command for 1 day, those other processes will be waiting for a while before they ever get to acquire their lock. I’ll now terminate the process possessing the exclusive lock, so as to speed things up a bit.

*************

NOTE: In production, you will not want to kill the process in order to relinquish the lock. The only reason that I’m doing it in this example is because I already know what the process is doing (sleeping for 1 day). On a production server, you *won’t* know for certain what the process is doing until you investigate using a utility such as strace.

*************

$ kill 654

Alright, so now that I’ve killed the process, we should soon see the other processes complete. After 5 seconds, we do see one complete!

Process 1 completed

The other processes follow suit, every five second afterwards

 Process 2 completed
	Process 10 completed
	Process 6 completed
	Process 4 completed
	Process 7 completed
	Process 3 completed
	Process 9 completed
	Process 5 completed
	Process 8 completed 

Why do they not complete in the same order that we started them? This is because they’re all contending for the lock; it’s basically a free-for-all as to which process acquires the lock first (race conditions, scheduling, semaphores and other system stuff that you needn’t necessarily worry about). They’re not aware of, nor do they care about, other processes contending for the same lock (at least not until they find themselves waiting again). Their only purpose is to acquire the lock whenever they can. Since each of the contending processes themselves acquire an exclusive lock on the process, you’ll see them complete in 5 second intervals.

Now, it’s worthwhile to test the following and note the results (note that I’ve decreased the amount of time that the processes in the loop sleep — from 5 seconds to 1 second):

 ## Process 1 obtains an EXCLUSIVE lock, everything else requests a SHARED lock
	$ flock --exclusive /tmp/flock.file --command 'sleep 1d' &
	$ for i in {1..10}; do flock --shared /tmp/flock.file --command "sleep 1s && echo Process $i completed" & done	

	## Process 1 obtains an SHARED lock, everything else requests a SHARED lock
	$ flock --shared /tmp/flock.file --command 'sleep 1d' &
	$ for i in {1..10}; do flock --shared /tmp/flock.file --command "sleep 1s && echo Process $i completed" & done	

	## Process 1 obtains an SHARED lock, everything else requests a EXCLUSIVE lock
	$ flock --shared /tmp/flock.file --command 'sleep 1d' &
	$ for i in {1..10}; do flock --exclusive /tmp/flock.file --command "sleep 1s && echo Process $i completed" & done	 

After running through these three, it should become evident that the following statements generally hold true:

  • (a) Exclusive locks will block both shared and other exclusive locks
  • (b) Shared locks will not block anything

What does all of this meaaaaaaaaann?!?!?

Imagine if you will, the following:

--> The 11 processes that we were previously working with are php-fpm children
	--> The /tmp/flock.file file is now a /home/foobarco/foobar.com/html/var/session/sess_oldmcdonaldhadafarm session file

NOW we’re talking (NOT)!! Assuming that the above are true, we can see where issues might arise. In the previous examples, I was at a slight advantage, since I set up the initial environment (Steps 1-3), and then searched for a specific process in Step 4. I then worked my way backwards to prove everything. The only difference is that, when dealing with client coding, we’ll start at Step 4, using one of the following commands:

 $ ps -wwwC php-fpm -o user,ppid,pid,stat,wchan:30,etime,command		  ## List by the command php-fpm (assuming a php-fpm server)
	$ ps -wwwC httpd -o user,ppid,pid,stat,wchan:30,etime,command			  ## List by the command httpd (assuming a server configured to use mod_php)
	$ ps -wwwC lshttpd -o user,ppid,pid,stat,wchan:30,etime,command		  ## List by the command lshttpd (assuming a litespeed server)
	$ ps -wwwu <UNIXUSER> -o user,ppid,pid,stat,wchan:30,etime,command	## List processes belonging to the unix user <UNIXUSER> 

You won’t necessarily know how long the process holding the lock is going to last, nor will you initially know what file/resource is under contention. The latter unknown will be uncovered in steps 6 and 7.

You can see where issues might arise on the server when it comes to file contention. For example, consider the following:

--> Initial process holding the lock is performing some task that requires an unknown amount of time
	--> Additional incoming processes for the same user require the locked resource

It’s easy to imagine that, under a server configured to use PHP-FPM, the user will soon reach max_children. This could lead to other issues as well (memory exhaustion related to poor coding multiplied by the number of stalled processes, maxClients being reached, etc). This can also be manifested in a myriad of symptoms ‘downstream’ and ‘upstream’ of PHP-FPM (i.e. sleeping queries in the database, since PHP-FPM may have opened a connection to MySQL, but hasn’t necessarily had the chance to send additional information to the server).

**** NOTE *****

Once you reach Step 9, it’s important NOT to kill the process; I only did this in the example to expedite the point that I was trying to make at the time. Instead, now that you have found the process that possesses the lock, you should run an strace on the process to determine what is causing it to hold onto the lock for such a long time. For instance, you might see that the process is

-- Iterating over cache on disk
	-- Attempting to grab a remote resource
	-- Stuck in a coding loop
	-- Waiting on a complex MySQL query

Learning how to obtain and relay this information will oftentimes prove valuable preventing future occurences of the same issue.

Posted in: Nexcess
  • ErikHansen

    Thanks for the post! Those are some helpful commands to see which processes have file locks, and what files are being locked.