Python Generator Find Files With Wildcard

This is a neat way to generate file names in a directory that match a specific pattern, I use this to generate a list of files exported out of hive to load into S3.

Python3 Subprocess and Rsync Deadlock Strace Timeout

I recently came across a tough to debug issue where I was calling a shell script from python using the subprocess module, this shell script called rsync, no matter what I would always run into a timeout situation. I fired up strace and noticed that the process was in a timeout state.

select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)

I looked at the subprocess documentation and apparently using pipes will fill the system pipe buffer.

Warning

This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

I was baffled, I finally took the approach to eliminate stderr and stdout and just check the return status of the command using run(). Here is what I finally came up with, and all was well.

Hope you find this and it helps you.