Automate pg_dump pg_restore Of Tables From Config File Send Slack Update

You can use this python code to setup a cron that will sync postgres tables from one database to another. This will read from a config file and will be able to do multiple tables from the same run. This can be useful to sync a daily table from source to destinations. This will also send a alert to slack if its ok or critical.


Python Remove Files That Match Pattern Older Than N Days

Neat little script that implements find in pure python, this can be passed different patterns and directories. The script will walk the directories and match the patterns, it will then generate a list of files and get the ctime of each. Some comparison is done against a date you set and removes them. This is great for cleaning up application logs that clog up the filesystem.

Mass Rename Files In Gcloud With Python Multiprocessing Parallel Gsutil

I had been tasked with renaming in place, up in the cloud, not bringing the files down locally, 50000 files. I looked at using wildcards with gsutil however I was not able to remove what I wanted from the file, so I set out on creating a shell script to perform the task, I created a listing of files with gsutil and did some awk magic to get just the filenames into listing2.txt. I wrote the following loop.

This will rename the files stripping out what I wanted, files go from:

work-data-sample__0_0_1.csv.gz to data-sample__0_0_1.csv.gz

I launched it and noticed something odd, it was only iterating over the list and making one call to the gcloud api to rename the file. This was going to take forever, it actually took 24 hours. I did some reading of the docs and saw that gsutil has a -m option for multiprocessing, I also checked the source code and it looks like gsutil is multiprocess out of the box.

gsutil source code:

This is basically saying if the OS can handle multiprocessing, lets spawn the same amount of processes that the system has cpus, and then set the thread count to 5. So my for loop in bash would of taken forever with -m option as well.

So I created some python code that would solve this issue, it would perform all the steps in one, list the files and substring out the filename, and use pythons multiprocessing to spawn 25 workers to do the api calls in chunks. I learned a lot from this and I hope it helps others, I will add comments in the code to show whats going on.

You can see the process spawns 25 worker processes that will iterate over the list and perform the move in chunks.

Python Function Execute Subprocess With Timeout

I have a project that rsync’s data from an RPM repository for a local version of this repo. The issue I was faced with was the remote mirror would sometimes stop the rsync due to overloaded network or other unforeseen issues. I wanted to use rsyncs hashing algorithm to have it start right where it left off so I wrote a function to do this. If 900 seconds was hit it usually meant there was an issue with the transfer. I also want to state here that I observed the rsync stop serving issue on many mirrors so it was not just an issue with the TCP network. I use this in production and it logs each iteration or restart. The function below will also kill the current rsync so multiple copies are not running at the same time. I also only wanted to perform 5 iterations of rsync so I use a while loop here.

Here are the individual rsync commands in the INI configuration.

Here is how I call the execute_jobs_timeout() function:

The function:

Log Snippet showing each command executing:

Python Generator Find Files With Wildcard

This is a neat way to generate file names in a directory that match a specific pattern, I use this to generate a list of files exported out of hive to load into S3.

Python3 Subprocess and Rsync Deadlock Strace Timeout

I recently came across a tough to debug issue where I was calling a shell script from python using the subprocess module, this shell script called rsync, no matter what I would always run into a timeout situation. I fired up strace and noticed that the process was in a timeout state.

select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)

I looked at the subprocess documentation and apparently using pipes will fill the system pipe buffer.


This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

I was baffled, I finally took the approach to eliminate stderr and stdout and just check the return status of the command using run(). Here is what I finally came up with, and all was well.

Hope you find this and it helps you.

Nagios Check Postgres Table Date Column Against now()

I had a situation where a daily sync of a table from one database to another was failing. This table was updated daily so the query should return something like this when it was synced correctly:

I use Nagios very heavily and I setup a custom plugin to check the query’s date against today’s date, this should warn or critical based on user supplied arguments. Here is what a failure looks like when running from the nagios servers command line. This worked well at alerting me when the sync failed, this was integrated into the nagios subsystem and emails and slack alerts are generated as expected.


I needed a quick plugin to warn me if one of our AWS REDSHIFT instances had a table count above 6000 and alert critical if above 7000. I decided to write a python plugin for nagios to do the chore. You can see the source code and the example of executing it below on the nagios host.

Python Backup WORDPRESS Site / DATABASE and HTML

I have this blog hosted on a LINODE dedicated LINUX server. It’s about 10 dollars a month for a 1 core system with about 250GB of disk space and 1GB of RAM, this server runs the common LAMP stack, I needed a quick and dirty script to backup MYSQL database and the PHP code contained in the /var/www/html folder. I wanted the script to compress the contents of both and move them into a directory with the correct date. See the comments below outlining the code and the action of running it.

So you can see we generated 2 files in a dated directory, I chose to use both zip and gunzip for compression algoritims. To view the contents you can run the normal linux commands to extract the files.

So there you have it, I can tar up the entire dated directory for easy offsite backup now of my entire site Hope this helps someone, feel free to copy the source code and change at will.


Python Mysql Connector

Thought I would try my hand at some SQL programming with Python, I was stuck using a Windows machine(BLAH) I wanted to setup a MySQL database and 1 table for testing. I am on a Windows machine so I installed WAMP which is a Windows Apache Mysql Php server. You can get the installer for Windows here:

You can get the python library / module here:

Mysql Python
Using PhpMyAdmin I setup a database and added the table.


As you can see I created a function that added data into my new database. I pass it 4 parameters, id-name-dept-salary, when you execute this script the database gets populated with the correct data. Pretty cool!!

Here is how you can see the help and call the command properly:
help command

Here is what you can see when performing a select * from the table.
database command

Here is the awesome IDE pyCharm and the results from running the code.
Here is an example of how to query that database from the table.

Here is the output of the fetch command using ID as a parameter:
database command

Took me a while to figure out the syntax, so hopefully this helps someone.