lambda.us-east-1.amazonaws.com

January 17, 2024January 17, 2024 admin2 Comments

Recently while working on one of our EMR projects that uses lambdas and airflow, I ran into the following timeout issue:

botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://lambda.us-east-1.amazonaws.com/2015-03-31/functions/infrastructure-lambda-incoming/invocations"

1	botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://lambda.us-east-1.amazonaws.com/2015-03-31/functions/infrastructure-lambda-incoming/invocations"

We have a lambda that was invoked from boto3 in a Airflow step that would update dynamo db with values needed for our pipeline. This function worked in previous tests with no issues. We did add to the lambda function which was causing it to take longer than normal. When we tested the lambda from the console, the function worked fine, albeit it took a bit longer than the previous version. When calling from Airflow we would continually run into the timeout issue, causing the function to be executed multiple times during retries.

I thought to test this function from the awscli and it revealed the issue, the default boto3 timeout is 60 seconds, this was longer than our lambda was taking. So even though we set the lambda timeout to 4 minutes, boto was timing out at 1 minute, never getting the response back from lambda. The way we fixed this was to have boto3 setup a lambda_config that had a longer timeout.

lambda_config = config.Config(
    read_timeout=900,
    connect_timeout=900,
    retries={"max_attempts": 0}
)

lambda_config = config.Config(

read_timeout=900,

connect_timeout=900,

retries={"max_attempts": 0}

)

def call_lambda(**kwargs):
    lambda_client = boto3.client('lambda', region_name=REGION, config=lambda_config)
    payload = {--snipped--}
    # invoke the lambda and wait for it to complete
    response = lambda_client.invoke(
        FunctionName=lambda_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload),
    )
    if response["StatusCode"] != 200:
        raise AirflowException("Preprocess lambda has failed")

def call_lambda(**kwargs):

lambda_client = boto3.client('lambda', region_name=REGION, config=lambda_config)

payload = {--snipped--}

# invoke the lambda and wait for it to complete

response = lambda_client.invoke(

FunctionName=lambda_name,

InvocationType="RequestResponse",

Payload=json.dumps(payload),

)

if response["StatusCode"] != 200:

raise AirflowException("Preprocess lambda has failed")

RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn’t match a supported version!

November 27, 2023November 28, 2023 adminLeave a comment

I ran into this issue on a CENTOS8 server that has yet to be updated to RHEL8, after upgrading some packages via Pip:

[root@server01 site-packages]# python3.6
Python 3.6.8 (default, Sep 10 2021, 09:13:53) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
/usr/local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
>>> quit()

[root@server01 site-packages]# python3.6

Python 3.6.8 (default, Sep 10 2021, 09:13:53)

[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import requests

/usr/local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!

RequestsDependencyWarning)

>>> quit()

Turns out:

Module python3-requests is not compatible with locally installed third party module urllib3 of version 1.26.8 and get conflicting with Red Hat provided python3-urllib3 version 1.24.2-5.el8.

I was able to get around this by upgrading URLLIB3 and REQUESTS:

[root@server01 site-packages]# pip3.6 install -U urllib3 requests

1	[root@server01 site-packages]# pip3.6 install -U urllib3 requests

Works Now:

[root@server01 site-packages]# python3.6
Python 3.6.8 (default, Sep 10 2021, 09:13:53) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> quit()

[root@server01 site-packages]# python3.6

Python 3.6.8 (default, Sep 10 2021, 09:13:53)

[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import requests

>>> quit()

Python Linux Find Files With Pattern Accessed Older Than N Days And Remove

July 7, 2022July 14, 2022 adminLeave a comment

This is a neat utility that you can use to keep in your sysadmin bag of tricks, it walks the directory you define recursively and grabs all the file access times and stores them into a list, it then compares them against a command line parameter for days ago. If its older than N days it will remove the file. What’s really nice about this utility is it has a debug mode, this way you can see what will be deleted before you remove debug and execute it.

#!/usr/bin/env python3

import argparse
import fnmatch
import os
import sys
from datetime import datetime, timedelta
from pathlib import Path

# set date now.
now = datetime.today()

# setup dir to clean
home = str(Path.home())
target_dir = '/home/jasonr' # CHANGE TO WHERE YOU WANT TO SEARCH

# dir to clean
dirs_to_clean = target_dir

# setup cli arguments.
parser = argparse.ArgumentParser(
    description='''
[--days_ago 60] will keep 60 days worth of files.
[--debug yes] will print out statements with no actions.''',
    formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument('--days_ago',
                    help='[--days_ago NN]')
parser.add_argument('--debug',
                    help='[--debug (yes|no)')
args = parser.parse_args()

# allowed arguments from cli.
accepted_cli_args = ['yes', 'no']

# sanity check, assign days to keep on system.
if args.days_ago is None:
    days = 60
else:
    days = args.days_ago

# define a list of patterns
patterns = ['*.csv', '*.txt'] # YOU CAN ADD ANY PATTERN TO LIST

# sanity check, assign debug true or false
if args.debug in accepted_cli_args:
    if args.debug == 'yes':
        debug = True
    else:
        debug = False
else:
    print("{0}: Wrong parameter --debug (yes or no): [{1}]"
          .format(now, args.debug))
    sys.exit(1)


def find_files(dir_to_clean):
    file_list = []
    days_ago = datetime.now() - timedelta(days=int(days))
    for root, dirs, files in os.walk(dir_to_clean):
        for pattern in patterns:
            for filename in fnmatch.filter(files, pattern):
                file_list.append(os.path.join(root, filename))
                file_list.sort()

    for file in file_list:
        try:
            file_atime = datetime.fromtimestamp(os.path.getatime(file))
        except Exception as e:
            print("{0}: File Access Time Get Failed: [{1}]"
                  .format(now, e))
        if file_atime < days_ago:
            if os.path.isfile(file):
                try:
                    if not debug:
                        print("{0}: Removing file: [{1}]"
                              .format(now, file))
                        os.remove(file)
                    else:
                        print("{0}: DEBUG: Removing file: [{1}]"
                              .format(now, file))
                except OSError as e:
                    print("{0}: File Clean Up Failed: [{1}]"
                          .format(now, e))
                    sys.exit(1)


# main function.
def main():
    find_files(dirs_to_clean)


if __name__ == "__main__":
    main()

#!/usr/bin/env python3

import argparse

import fnmatch

import os

import sys

from datetime import datetime, timedelta

from pathlib import Path

# set date now.

now = datetime.today()

# setup dir to clean

home = str(Path.home())

target_dir = '/home/jasonr' # CHANGE TO WHERE YOU WANT TO SEARCH

# dir to clean

dirs_to_clean = target_dir

# setup cli arguments.

parser = argparse.ArgumentParser(

description='''

[--days_ago 60] will keep 60 days worth of files.

[--debug yes] will print out statements with no actions.''',

formatter_class=argparse.RawTextHelpFormatter)

parser.add_argument('--days_ago',

help='[--days_ago NN]')

parser.add_argument('--debug',

help='[--debug (yes|no)')

args = parser.parse_args()

# allowed arguments from cli.

accepted_cli_args = ['yes', 'no']

# sanity check, assign days to keep on system.

if args.days_ago is None:

days = 60

else:

days = args.days_ago

# define a list of patterns

patterns = ['*.csv', '*.txt'] # YOU CAN ADD ANY PATTERN TO LIST

# sanity check, assign debug true or false

if args.debug in accepted_cli_args:

if args.debug == 'yes':

debug = True

else:

debug = False

else:

print("{0}: Wrong parameter --debug (yes or no): [{1}]"

.format(now, args.debug))

sys.exit(1)

def find_files(dir_to_clean):

file_list = []

days_ago = datetime.now() - timedelta(days=int(days))

for root, dirs, files in os.walk(dir_to_clean):

for pattern in patterns:

for filename in fnmatch.filter(files, pattern):

file_list.append(os.path.join(root, filename))

file_list.sort()

for file in file_list:

try:

file_atime = datetime.fromtimestamp(os.path.getatime(file))

except Exception as e:

print("{0}: File Access Time Get Failed: [{1}]"

.format(now, e))

if file_atime < days_ago:

if os.path.isfile(file):

try:

if not debug:

print("{0}: Removing file: [{1}]"

.format(now, file))

os.remove(file)

else:

print("{0}: DEBUG: Removing file: [{1}]"

.format(now, file))

except OSError as e:

print("{0}: File Clean Up Failed: [{1}]"

.format(now, e))

sys.exit(1)

# main function.

def main():

find_files(dirs_to_clean)

if __name__ == "__main__":

main()

[jasonr@sb-jralph-8 ~]$ python3 finder.py --days_ago 90 --debug yes
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/awscli/examples/emr/create-cluster-synopsis.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/cryptography-3.3.2-py3.8.egg-info/top_level.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/README.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsa.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsb.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsc.txt]

[jasonr@sb-jralph-8 ~]$ python3 finder.py --days_ago 90 --debug yes

2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/awscli/examples/emr/create-cluster-synopsis.txt]

2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/cryptography-3.3.2-py3.8.egg-info/top_level.txt]

2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/README.txt]

2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsa.txt]

2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsb.txt]

2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsc.txt]

AWS EMR ImportError: this version of pandas is incompatible with numpy < 1.17.3

May 10, 2022August 5, 2022 admin7 Comments

I found another one that I thought was worth writing a quick blog post about. We use AWS Elastic Map Reduce with transient clusters, so in order to get the python libraries installed, we need to use the bootstrap feature. We ran into many issues trying the standard bootstrap script which looked something like this:

[09:43:14] jason@jralph-mbp14:~ $ cat bootstrap.sh
aws s3 cp s3://bucket1-us-east-1/EMR/requirements.txt .
sudo python3 -m pip install -r requirements.txt

[09:43:14] jason@jralph-mbp14:~ $ cat bootstrap.sh

aws s3 cp s3://bucket1-us-east-1/EMR/requirements.txt .

sudo python3 -m pip install -r requirements.txt

The contents of requirements.txt looked like this:

[09:43:14] jason@jralph-mbp14:~ $ cat requirements.txt
boto3
botocore
awscli
requests
scikit-learn
numpy
pandas

[09:43:14] jason@jralph-mbp14:~ $ cat requirements.txt

boto3

botocore

awscli

requests

scikit-learn

numpy

pandas

We would get all the nodes in the cluster to bootstrap properly however the logs showed the following:

Traceback (most recent call last):
  File "analysis.py", line 6, in <module>
    import pandas as pd
  File "/usr/local/lib64/python3.7/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import (
  File "/usr/local/lib64/python3.7/site-packages/pandas/compat/__init__.py", line 15, in <module>
    from pandas.compat.numpy import (
  File "/usr/local/lib64/python3.7/site-packages/pandas/compat/numpy/__init__.py", line 27, in <module>
    f"this version of pandas is incompatible with numpy < {_min_numpy_ver}\n"
ImportError: this version of pandas is incompatible with numpy < 1.17.3
your numpy version is 1.16.5.
Please upgrade numpy to >= 1.17.3 to use this pandas version

Traceback (most recent call last):

File "analysis.py", line 6, in <module>

import pandas as pd

File "/usr/local/lib64/python3.7/site-packages/pandas/__init__.py", line 22, in <module>

from pandas.compat import (

File "/usr/local/lib64/python3.7/site-packages/pandas/compat/__init__.py", line 15, in <module>

from pandas.compat.numpy import (

File "/usr/local/lib64/python3.7/site-packages/pandas/compat/numpy/__init__.py", line 27, in <module>

f"this version of pandas is incompatible with numpy < {_min_numpy_ver}\n"

ImportError: this version of pandas is incompatible with numpy < 1.17.3

your numpy version is 1.16.5.

Please upgrade numpy to >= 1.17.3 to use this pandas version

And when trying to import from pyspark, we saw this:

Traceback (most recent call last):
  File "analysis.py", line 6, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

Traceback (most recent call last):

File "analysis.py", line 6, in <module>

import pandas as pd

ModuleNotFoundError: No module named 'pandas'

After speaking with AWS support, it turns out this was a known issue. When a cluster is launched, EMR first provisions the EC2 instances, after that it runs the bootstrap actions. Thus, when the bootstrap action runs, it installs the desired version. However, since the applications are installed after the bootstrap action, these applications override the custom installation for the Python packages. In order to get around the issue of the version being overridden, the workaround is to make use of a Bootstrap Action that delays the installation of the packages until the nodes are fully up and running. This will resolve the conflict that we have been seeing with pandas and numpy. Here is what our final working bootstrap.sh looks like, hope this helps, it was a tough one to solve:

#!/bin/bash
set -x

cat > /var/tmp/fix-bootstap.sh <<'EOF'
#!/bin/bash
set -x

while true; do
    NODEPROVISIONSTATE=`sed -n '/localInstance [{]/,/[}]/{
    /nodeProvisionCheckinRecord [{]/,/[}]/ {
    /status: / { p }
    /[}]/a
    }
    /[}]/a
    }' /emr/instance-controller/lib/info/job-flow-state.txt | awk ' { print $2 }'`

    if [ "$NODEPROVISIONSTATE" == "SUCCESSFUL" ]; then
        echo "Running my post provision bootstrap"
        # Enter your code here
        sudo python3 -m pip install --upgrade pip
        sudo python3 -m pip install boto3
        sudo python3 -m pip install botocore
        sudo python3 -m pip install sklearn
        sudo python3 -m pip install requests
        sudo python3 -m pip install numpy
        sudo python3 -m pip install pandas
        echo '-------BOOTSTRAP COMPLETE-------' 

        exit
    else
        echo "Sleeping Till Node is Provisioned"
        sleep 10
    fi
done

EOF

chmod +x /var/tmp/fix-bootstap.sh
nohup /var/tmp/fix-bootstap.sh  2>&1 &

#!/bin/bash

set -x

cat > /var/tmp/fix-bootstap.sh <<'EOF'

#!/bin/bash

set -x

while true; do

NODEPROVISIONSTATE=`sed -n '/localInstance [{]/,/[}]/{

/nodeProvisionCheckinRecord [{]/,/[}]/ {

/status: / { p }

/[}]/a

}

/[}]/a

}' /emr/instance-controller/lib/info/job-flow-state.txt | awk ' { print $2 }'`

if [ "$NODEPROVISIONSTATE" == "SUCCESSFUL" ]; then

echo "Running my post provision bootstrap"

# Enter your code here

sudo python3 -m pip install --upgrade pip

sudo python3 -m pip install boto3

sudo python3 -m pip install botocore

sudo python3 -m pip install sklearn

sudo python3 -m pip install requests

sudo python3 -m pip install numpy

sudo python3 -m pip install pandas

echo '-------BOOTSTRAP COMPLETE-------'

exit

else

echo "Sleeping Till Node is Provisioned"

sleep 10

done

EOF

chmod +x /var/tmp/fix-bootstap.sh

nohup /var/tmp/fix-bootstap.sh 2>&1 &

Automate pg_dump pg_restore Of Tables From Config File Send Slack Update

July 6, 2020July 15, 2020 adminLeave a comment

You can use this python code to setup a cron that will sync postgres tables from one database to another. This will read from a config file and will be able to do multiple tables from the same run. This can be useful to sync a daily table from source to destinations. This will also send a alert to slack if its ok or critical.

[logging]
log_file = pg_table_sync_dev_to_prod.log
log_path = /home/postgres

[pg_table_source_dest]
public.jason_test_table1 = public.jason_test_table1
public.jason_test_table2 = public.jason_test_table2
public.jason_test_table3 = public.jason_test_table3

[hosts]
source_db = db-sbx01
dest_db = db10

[database]
dev_db = devdb
prod_db = proddb

[dump_location]
local_location = /u04/pg_data_dumps/transfer_tables/

[slack]
webhook = https://hooks.slack.com/services/<yourwebhookhere>

[logging]

log_file = pg_table_sync_dev_to_prod.log

log_path = /home/postgres

[pg_table_source_dest]

public.jason_test_table1 = public.jason_test_table1

public.jason_test_table2 = public.jason_test_table2

public.jason_test_table3 = public.jason_test_table3

[hosts]

source_db = db-sbx01

dest_db = db10

[database]

dev_db = devdb

prod_db = proddb

[dump_location]

local_location = /u04/pg_data_dumps/transfer_tables/

[slack]

webhook = https://hooks.slack.com/services/<yourwebhookhere>

__author__ = 'jralph'
__version__ = '1.0.0'

import configparser
import os
import sys
import logging
import subprocess
import shlex
import socket
import datetime
import requests

# set hostname.
hostname = socket.gethostname()

# set date now.
now = datetime.datetime.now()

# obtain script name and assign to variable.
script_name = sys.argv[0].split('.')[0]

# sanity check for configuration environment variable.
if "INI_PATH" not in os.environ.keys():
    print('INI_PATH is not set, check the .bashrc')
    sys.exit(1)

# parse the configuration sections of the ini file.
config = configparser.ConfigParser()
try:
    config.read(os.environ['INI_PATH'] + '/pg_table_sync_dev_prod.ini')
    config.sections()
    log_file = config.get('logging', 'log_file')
    log_path = config.get('logging', 'log_path')
    slack_hook = config.get('slack', 'webhook')
except configparser.NoSectionError as e:
    print('FATAL: Command failed with error [{0}]'.format(e))

# setup logging.
try:
    logging.basicConfig(filename='%s/%s' % (log_path, log_file),
                        format='%(asctime)s %(message)s',
                        datefmt='%m-%d-%Y %I:%M:%S %p -',
                        level=logging.DEBUG)
except NameError as e:
    print('FATAL: Command failed with error [{0}]'.format(e))

# get hosts and tablenames.
try:
    pg_tables_to_sync = dict(config.items('pg_table_source_dest'))
    source_db = config.get('hosts', 'source_db')
    dest_db = config.get('hosts', 'dest_db')
    dump_location = config.get('dump_location', 'local_location')
    dev_db = config.get('database', 'dev_db')
    prod_db = config.get('database', 'prod_db')
except (configparser.NoSectionError, NameError) as e:
    logging.critical('FATAL: Command failed with error [{0}]'.format(e))


# pg_dump function.
def pg_dump():
    cmd_list = []
    tables = {}
    try:
        tables = sorted(pg_tables_to_sync.items())
    except NameError as e:
        logging.critical('FATAL: Command failed with error [{0}]'.format(e))
    for key, value in tables:
        dump_cmd = 'pg_dump -Fc -h {0} -d {1} -t {2} -f {4}{2}.{3}.pgdump'.format(
            source_db, dev_db, key, now.strftime("%Y%m%d"), dump_location)
        cmd_list.append(dump_cmd)
    return cmd_list

# pg_restore function.
def pg_restore():
    cmd_list = []
    tables = {}
    try:
        tables = sorted(pg_tables_to_sync.items())
    except NameError as e:
        logging.critical('FATAL: Command failed with error [{0}]'.format(e))
    for key, value in tables:
        dump_cmd = 'pg_restore -c -h {0} -d {1} {4}{2}.{3}.pgdump'.format(
            dest_db, prod_db, key, now.strftime("%Y%m%d"), dump_location)
        cmd_list.append(dump_cmd)
    return cmd_list

# send to slack function.
def send_to_slack(slack_url, state, command, date_format, priority, target_os):
    slack_data = {'attachments': [
        {
            "fallback": "Required plain-text summary of the attachment.",
            "color": priority,
            "pretext": "PG Table Sync",
            "author_name": command,
            "text": "%s" % date_format,
            "fields": [
                {
                    "title": "%s" % target_os,
                    "value": state,
                    "short": "false"
                }
            ],
            "footer": "AFS Slack",
            "footer_icon": "https://platform.slack-edge.com"
                           "/img/default_application_icon.png"
        }
    ]}
    response = requests.post(
        slack_url, json=slack_data)
    if response.status_code != 200:
        raise ValueError(
            'Request to slack returned an error %s, the response is:\n%s'
            % (response.status_code, response.text))


# execute with logging.
def execute_jobs(cmd):
    try:
        logging.info('Start Command: [%s]' % cmd)
        subprocess.run(shlex.split(cmd), check=True)
        logging.info('Command Success: [%s]' % cmd)
        try:
            send_to_slack(slack_hook, 'Ok', cmd,
                          datetime.datetime.today(), 'good', hostname)
        except ValueError as e:
            logging.critical('FATAL: Slack post failed with error [%s]'
                             % e)
    except subprocess.CalledProcessError as e:
        logging.critical('[%s] FATAL: Command failed with error [%s]'
                         % (cmd, e))
        try:
            send_to_slack(slack_hook, 'Critical', cmd,
                          datetime.datetime.today(), 'danger', hostname)
        except ValueError as e:
            logging.critical('FATAL: Slack post failed with error [%s]'
                             % e)


# main
def main():
    for command in pg_dump():
        execute_jobs(command)
    for command in pg_restore():
        execute_jobs(command)

    logging.info('finished ' + script_name)


if __name__ == "__main__":
    main()

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

__author__ = 'jralph'

__version__ = '1.0.0'

import configparser

import os

import sys

import logging

import subprocess

import shlex

import socket

import datetime

import requests

# set hostname.

hostname = socket.gethostname()

# set date now.

now = datetime.datetime.now()

# obtain script name and assign to variable.

script_name = sys.argv[0].split('.')[0]

# sanity check for configuration environment variable.

if "INI_PATH" not in os.environ.keys():

print('INI_PATH is not set, check the .bashrc')

sys.exit(1)

# parse the configuration sections of the ini file.

config = configparser.ConfigParser()

try:

config.read(os.environ['INI_PATH'] + '/pg_table_sync_dev_prod.ini')

config.sections()

log_file = config.get('logging', 'log_file')

log_path = config.get('logging', 'log_path')

slack_hook = config.get('slack', 'webhook')

except configparser.NoSectionError as e:

print('FATAL: Command failed with error [{0}]'.format(e))

# setup logging.

try:

logging.basicConfig(filename='%s/%s' % (log_path, log_file),

format='%(asctime)s %(message)s',

datefmt='%m-%d-%Y %I:%M:%S %p -',

level=logging.DEBUG)

except NameError as e:

print('FATAL: Command failed with error [{0}]'.format(e))

# get hosts and tablenames.

try:

pg_tables_to_sync = dict(config.items('pg_table_source_dest'))

source_db = config.get('hosts', 'source_db')

dest_db = config.get('hosts', 'dest_db')

dump_location = config.get('dump_location', 'local_location')

dev_db = config.get('database', 'dev_db')

prod_db = config.get('database', 'prod_db')

except (configparser.NoSectionError, NameError) as e:

logging.critical('FATAL: Command failed with error [{0}]'.format(e))

# pg_dump function.

def pg_dump():

cmd_list = []

tables = {}

try:

tables = sorted(pg_tables_to_sync.items())

except NameError as e:

logging.critical('FATAL: Command failed with error [{0}]'.format(e))

for key, value in tables:

dump_cmd = 'pg_dump -Fc -h {0} -d {1} -t {2} -f {4}{2}.{3}.pgdump'.format(

source_db, dev_db, key, now.strftime("%Y%m%d"), dump_location)

cmd_list.append(dump_cmd)

return cmd_list

# pg_restore function.

def pg_restore():

cmd_list = []

tables = {}

try:

tables = sorted(pg_tables_to_sync.items())

except NameError as e:

logging.critical('FATAL: Command failed with error [{0}]'.format(e))

for key, value in tables:

dump_cmd = 'pg_restore -c -h {0} -d {1} {4}{2}.{3}.pgdump'.format(

dest_db, prod_db, key, now.strftime("%Y%m%d"), dump_location)

cmd_list.append(dump_cmd)

return cmd_list

# send to slack function.

def send_to_slack(slack_url, state, command, date_format, priority, target_os):

slack_data = {'attachments': [

{

"fallback": "Required plain-text summary of the attachment.",

"color": priority,

"pretext": "PG Table Sync",

"author_name": command,

"text": "%s" % date_format,

"fields": [

{

"title": "%s" % target_os,

"value": state,

"short": "false"

}

"footer": "AFS Slack",

"footer_icon": "https://platform.slack-edge.com"

"/img/default_application_icon.png"

}

]}

response = requests.post(

slack_url, json=slack_data)

if response.status_code != 200:

raise ValueError(

'Request to slack returned an error %s, the response is:\n%s'

% (response.status_code, response.text))

# execute with logging.

def execute_jobs(cmd):

try:

logging.info('Start Command: [%s]' % cmd)

subprocess.run(shlex.split(cmd), check=True)

logging.info('Command Success: [%s]' % cmd)

try:

send_to_slack(slack_hook, 'Ok', cmd,

datetime.datetime.today(), 'good', hostname)

except ValueError as e:

logging.critical('FATAL: Slack post failed with error [%s]'

% e)

except subprocess.CalledProcessError as e:

logging.critical('[%s] FATAL: Command failed with error [%s]'

% (cmd, e))

try:

send_to_slack(slack_hook, 'Critical', cmd,

datetime.datetime.today(), 'danger', hostname)

except ValueError as e:

logging.critical('FATAL: Slack post failed with error [%s]'

% e)

# main

def main():

for command in pg_dump():

execute_jobs(command)

for command in pg_restore():

execute_jobs(command)

logging.info('finished ' + script_name)

if __name__ == "__main__":

main()

LOGGING EXAMPLE:

07-06-2020 11:44:28 AM - Start Command: [pg_dump -Fc -h db-sbx01 -d db1 -t public.jason_test_table -f /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706.pdump]
07-06-2020 11:44:30 AM - Command Success: [pg_dump -Fc -h db-sbx01 -d db1 -t public.jason_test_table -f /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706.pgdump]
07-06-2020 11:44:30 AM - Starting new HTTPS connection (1): hooks.slack.com
07-06-2020 11:44:30 AM - https://hooks.slack.com:443 "POST /services/T04MEPB2K/B72JPEUUB/nfEqv7bsKafUUjLoKgo0oT5S HTTP/1.1" 200 22
07-06-2020 11:44:30 AM - Start Command: [pg_restore -c -h db10 -d db1 /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706]
07-06-2020 11:44:31 AM - Command Success: [pg_restore -c -h db10 -d db1 /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706]
07-06-2020 11:44:31 AM - Starting new HTTPS connection (1): hooks.slack.com
07-06-2020 11:44:31 AM - https://hooks.slack.com:443 "POST /services/T04MEPB2K/B72JPEUUB/nfEqv7bsKafUUjLoKgo0oT5S HTTP/1.1" 200 22
07-06-2020 11:44:31 AM - finished PgTableSyncDevProd

07-06-2020 11:44:28 AM - Start Command: [pg_dump -Fc -h db-sbx01 -d db1 -t public.jason_test_table -f /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706.pdump]

07-06-2020 11:44:30 AM - Command Success: [pg_dump -Fc -h db-sbx01 -d db1 -t public.jason_test_table -f /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706.pgdump]

07-06-2020 11:44:30 AM - Starting new HTTPS connection (1): hooks.slack.com

07-06-2020 11:44:30 AM - https://hooks.slack.com:443 "POST /services/T04MEPB2K/B72JPEUUB/nfEqv7bsKafUUjLoKgo0oT5S HTTP/1.1" 200 22

07-06-2020 11:44:30 AM - Start Command: [pg_restore -c -h db10 -d db1 /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706]

07-06-2020 11:44:31 AM - Command Success: [pg_restore -c -h db10 -d db1 /u04/pg_data_dumps/transfer_tables/public.jason_test_table.20200706]

07-06-2020 11:44:31 AM - Starting new HTTPS connection (1): hooks.slack.com

07-06-2020 11:44:31 AM - https://hooks.slack.com:443 "POST /services/T04MEPB2K/B72JPEUUB/nfEqv7bsKafUUjLoKgo0oT5S HTTP/1.1" 200 22

07-06-2020 11:44:31 AM - finished PgTableSyncDevProd

Python Remove Files That Match Pattern Older Than N Days

January 16, 2020January 22, 2020 adminLeave a comment

Neat little script that implements find in pure python, this can be passed different patterns and directories. The script will walk the directories and match the patterns, it will then generate a list of files and get the ctime of each. Some comparison is done against a date you set and removes them. This is great for cleaning up application logs that clog up the filesystem.

#!/usr/bin/python3.5

import fnmatch
import os
from datetime import datetime, timedelta
from pathlib import Path

# set variables for dirs to clean.
log_path = os.environ["LOG_PATH"]
user_prod_home = str(Path.home())

# set lists of dirs and patterns to clean
dirs_to_clean = [log_path, user_prod_home]
patterns = ['*.log', 'app_*']


# function to loop and search patterns and rm files.
def find_files(dir_to_clean):
    file_list = []
    days_ago = datetime.now() - timedelta(days=60)
    for root, dirs, files in os.walk(dir_to_clean):
        for pattern in patterns:
            for filename in fnmatch.filter(files, pattern):
                file_list.append(os.path.join(root, filename))
                file_list.sort()

    for file in file_list:
        file_ctime = datetime.fromtimestamp(os.path.getctime(file))
        if file_ctime < days_ago:
            if os.path.isfile(file):
                try:
                    print("Removing file :[{0}]".format(file))
                    os.remove(file)
                except OSError as e:
                    print('File Clean Up Failed: [{0}]'.format(e))


# main function
def main():
    for dirs in dirs_to_clean:
        find_files(dirs)


if __name__ == "__main__":
    main()

#!/usr/bin/python3.5

import fnmatch

import os

from datetime import datetime, timedelta

from pathlib import Path

# set variables for dirs to clean.

log_path = os.environ["LOG_PATH"]

user_prod_home = str(Path.home())

# set lists of dirs and patterns to clean

dirs_to_clean = [log_path, user_prod_home]

patterns = ['*.log', 'app_*']

# function to loop and search patterns and rm files.

def find_files(dir_to_clean):

file_list = []

days_ago = datetime.now() - timedelta(days=60)

for root, dirs, files in os.walk(dir_to_clean):

for pattern in patterns:

for filename in fnmatch.filter(files, pattern):

file_list.append(os.path.join(root, filename))

file_list.sort()

for file in file_list:

file_ctime = datetime.fromtimestamp(os.path.getctime(file))

if file_ctime < days_ago:

if os.path.isfile(file):

try:

print("Removing file :[{0}]".format(file))

os.remove(file)

except OSError as e:

print('File Clean Up Failed: [{0}]'.format(e))

# main function

def main():

for dirs in dirs_to_clean:

find_files(dirs)

if __name__ == "__main__":

main()

Mass Rename Files In Gcloud With Python Multiprocessing Parallel Gsutil

September 24, 2019January 4, 2020 admin2 Comments

I had been tasked with renaming in place, up in the cloud, not bringing the files down locally, 50000 files. I looked at using wildcards with gsutil however I was not able to remove what I wanted from the file, so I set out on creating a shell script to perform the task, I created a listing of files with gsutil and did some awk magic to get just the filenames into listing2.txt. I wrote the following loop.

This will rename the files stripping out what I wanted, files go from:

work-data-sample__0_0_1.csv.gz to data-sample__0_0_1.csv.gz

for files in $(cat listing2.txt) ; do  
    echo "Renaming: $files --> ${files#work-}"
    gsutil mv gs://gs-bucket/jason_testing/$files gs://gs-bucket/jason_testing/${files#work-}
done

for files in $(cat listing2.txt) ; do

echo "Renaming: $files --> ${files#work-}"

gsutil mv gs://gs-bucket/jason_testing/$files gs://gs-bucket/jason_testing/${files#work-}

done

I launched it and noticed something odd, it was only iterating over the list and making one call to the gcloud api to rename the file. This was going to take forever, it actually took 24 hours. I did some reading of the docs and saw that gsutil has a -m option for multiprocessing, I also checked the source code and it looks like gsutil is multiprocess out of the box.

gsutil source code:

should_prohibit_multiprocessing, unused_os = ShouldProhibitMultiprocessing()
if should_prohibit_multiprocessing:
  DEFAULT_PARALLEL_PROCESS_COUNT = 1
  DEFAULT_PARALLEL_THREAD_COUNT = 24
else:
  DEFAULT_PARALLEL_PROCESS_COUNT = min(multiprocessing.cpu_count(), 32)
  DEFAULT_PARALLEL_THREAD_COUNT = 5

should_prohibit_multiprocessing, unused_os = ShouldProhibitMultiprocessing()

if should_prohibit_multiprocessing:

DEFAULT_PARALLEL_PROCESS_COUNT = 1

DEFAULT_PARALLEL_THREAD_COUNT = 24

else:

DEFAULT_PARALLEL_PROCESS_COUNT = min(multiprocessing.cpu_count(), 32)

DEFAULT_PARALLEL_THREAD_COUNT = 5

This is basically saying if the OS can handle multiprocessing, lets spawn the same amount of processes that the system has cpus, and then set the thread count to 5. So my for loop in bash would of taken forever with -m option as well.

So I created some python code that would solve this issue, it would perform all the steps in one, list the files and substring out the filename, and use pythons multiprocessing to spawn 25 workers to do the api calls in chunks. I learned a lot from this and I hope it helps others, I will add comments in the code to show whats going on.

#!/usr/bin/env python3.5
import subprocess
import multiprocessing
import datetime
import shlex


class GsRenamer:
    def __init__(self):
        self.gs_cmd = '/home/jasonr/google-cloud-sdk/bin/gsutil'
        self.file_list = []
        self.final_rename_list = []
        self.now = datetime.datetime.now()

    # method to call subprocess on each command I feed it.
    def execute_jobs(self, cmd):
        try:
            print('{0} INFO: Running rename command: [{1}]'.format(self.now,
                                                                    cmd))
            subprocess.run(shlex.split(cmd), check=True)
        except subprocess.CalledProcessError as e:
            print('[{0}] FATAL: Command failed with error [{1}]').format(cmd,
                                                                         e)

    # method to get all the filenames from the bucket in gcloud and split it to
    # get the filenames. Load the results into a list. filter the list for any
    # blank lines. 
    def get_filenames_from_gs(self):
        cmd = [self.gs_cmd, 'ls',
               'gs://gs-bucket/jason_testing']
        output = subprocess.Popen(cmd, stdout=subprocess.PIPE,
        universal_newlines=True).communicate()[0].splitlines()
        for full_file_path in output:
            file_name  = full_file_path.split('/')[-1]
            self.file_list.append(file_name)
            self.file_list = list(filter(None, self.file_list))


    # method to iterate over the list and create the commands,
    # also leverage pythons string replace to rename the source
    # and target file. build the commands and load into a list.
    # Also use pythons multiprocessing module to spawn 25 processes
    # that hit the api in chunks cutting down the time to rename
    # all the files in place.
    def rename_files(self, string_original, string_replace):
        for files in self.file_list:
            renamed_files = files.replace(string_original,
                                          string_replace)
            rename_command = "{0} mv gs://gs-bucket_testing/{1} " \
                             "gs://gs-bucket/jason_testing/{2}" \
                             .format(self.gs_cmd, files, renamed_files)
            self.final_rename_list.append(rename_command)
        self.final_rename_list.sort()
        multiprocessing.pool = multiprocessing.Pool(
            processes=25)
        multiprocessing.pool.map(self.execute_jobs, self.final_rename_list)


def main():
    gsr = GsRenamer()
    gsr.get_filenames_from_gs()
    gsr.rename_files('work-', '')


if __name__ == "__main__":
    main()

#!/usr/bin/env python3.5

import subprocess

import multiprocessing

import datetime

import shlex

class GsRenamer:

def __init__(self):

self.gs_cmd = '/home/jasonr/google-cloud-sdk/bin/gsutil'

self.file_list = []

self.final_rename_list = []

self.now = datetime.datetime.now()

# method to call subprocess on each command I feed it.

def execute_jobs(self, cmd):

try:

print('{0} INFO: Running rename command: [{1}]'.format(self.now,

cmd))

subprocess.run(shlex.split(cmd), check=True)

except subprocess.CalledProcessError as e:

print('[{0}] FATAL: Command failed with error [{1}]').format(cmd,

# method to get all the filenames from the bucket in gcloud and split it to

# get the filenames. Load the results into a list. filter the list for any

# blank lines.

def get_filenames_from_gs(self):

cmd = [self.gs_cmd, 'ls',

'gs://gs-bucket/jason_testing']

output = subprocess.Popen(cmd, stdout=subprocess.PIPE,

universal_newlines=True).communicate()[0].splitlines()

for full_file_path in output:

file_name = full_file_path.split('/')[-1]

self.file_list.append(file_name)

self.file_list = list(filter(None, self.file_list))

# method to iterate over the list and create the commands,

# also leverage pythons string replace to rename the source

# and target file. build the commands and load into a list.

# Also use pythons multiprocessing module to spawn 25 processes

# that hit the api in chunks cutting down the time to rename

# all the files in place.

def rename_files(self, string_original, string_replace):

for files in self.file_list:

renamed_files = files.replace(string_original,

string_replace)

rename_command = "{0} mv gs://gs-bucket_testing/{1} " \

"gs://gs-bucket/jason_testing/{2}" \

.format(self.gs_cmd, files, renamed_files)

self.final_rename_list.append(rename_command)

self.final_rename_list.sort()

multiprocessing.pool = multiprocessing.Pool(

processes=25)

multiprocessing.pool.map(self.execute_jobs, self.final_rename_list)

def main():

gsr = GsRenamer()

gsr.get_filenames_from_gs()

gsr.rename_files('work-', '')

if __name__ == "__main__":

main()

You can see the process spawns 25 worker processes that will iterate over the list and perform the move in chunks.

jasonr    4612  0.3  0.1 403772 10968 pts/0    Sl+  13:34   0:00  |           \_ python3.5 renamer.py
jasonr    4758  0.0  0.0 176432  8080 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4759  0.0  0.0 176432  8080 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4760  0.0  0.0 176432  8080 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4761  0.0  0.0 176432  8084 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4762  0.0  0.0 176432  8088 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4763  0.0  0.0 176432  8092 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4764  0.0  0.0 176432  8092 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4765  0.0  0.0 176432  8092 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4766  0.0  0.0 176432  8100 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
jasonr    4767  0.0  0.0 176432  8100 pts/0    S+   13:34   0:00  |               \_ python3.5 renamer.py
--SNIP--

jasonr 4612 0.3 0.1 403772 10968 pts/0 Sl+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4758 0.0 0.0 176432 8080 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4759 0.0 0.0 176432 8080 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4760 0.0 0.0 176432 8080 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4761 0.0 0.0 176432 8084 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4762 0.0 0.0 176432 8088 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4763 0.0 0.0 176432 8092 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4764 0.0 0.0 176432 8092 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4765 0.0 0.0 176432 8092 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4766 0.0 0.0 176432 8100 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

jasonr 4767 0.0 0.0 176432 8100 pts/0 S+ 13:34 0:00 | \_ python3.5 renamer.py

--SNIP--

Python Function Execute Subprocess With Timeout

June 23, 2019October 15, 2020 adminLeave a comment

I have a project that rsync’s data from an RPM repository for a local version of this repo. The issue I was faced with was the remote mirror would sometimes stop the rsync due to overloaded network or other unforeseen issues. I wanted to use rsyncs hashing algorithm to have it start right where it left off so I wrote a function to do this. If 900 seconds was hit it usually meant there was an issue with the transfer. I also want to state here that I observed the rsync stop serving issue on many mirrors so it was not just an issue with the TCP network. I use this in production and it logs each iteration or restart. The function below will also kill the current rsync so multiple copies are not running at the same time. I also only wanted to perform 5 iterations of rsync upon error or timeout so I use a while loop here.

Here are the individual rsync commands in the INI configuration.

[rsync_cmds]
rsync01 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/
rsync02 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/
rsync03 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/centosplus/x86_64/ 7/centosplus/x86_64/
rsync04 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/extras/x86_64/ 7/extras/x86_64

[rsync_cmds]

rsync01 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/

rsync02 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/

rsync03 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/centosplus/x86_64/ 7/centosplus/x86_64/

rsync04 = /usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/extras/x86_64/ 7/extras/x86_64

Here is how I call the execute_jobs_timeout() function:

rsync_commands = dict(config.items('rsync_cmds'))
def rsync_data():
    for name, cmds in sorted(rsync_commands.items()):
        execute_jobs_timeout(cmds)

rsync_commands = dict(config.items('rsync_cmds'))

def rsync_data():

for name, cmds in sorted(rsync_commands.items()):

execute_jobs_timeout(cmds)

The function:

def execute_jobs_timeout(cmd):
    iteration = 0
    while iteration < 5:
        proc = subprocess.Popen(shlex.split(cmd),
                                start_new_session=True)
        try:
            logger.info('Start Command: [%s]' % sanitize(cmd))
            stdout_data, stderr_data = proc.communicate(timeout=900)
            if proc.returncode != 0:
                logger.critical(
                    "%r failed, status code %s stdout %r stderr %r" % (
                        sanitize(cmd), proc.returncode,
                        stdout_data, stderr_data))
                iteration += 1
                if iteration == 5:
                    logger.critical('Execute Jobs Failed After 5 Iterations.')
                    break
                continue
            logger.info('Success: [%s]' % sanitize(cmd))
            break
        except (subprocess.TimeoutExpired, subprocess.SubprocessError) as e:
            os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
            logger.warning('[%s]' % e)
            logger.info('Restarting [%s]' % sanitize(cmd))
            iteration += 1
            if iteration == 5:
                logger.critical('Execute Jobs Failed After 5 Iterations.')
                break
            continue

def execute_jobs_timeout(cmd):

iteration = 0

while iteration < 5:

proc = subprocess.Popen(shlex.split(cmd),

start_new_session=True)

try:

logger.info('Start Command: [%s]' % sanitize(cmd))

stdout_data, stderr_data = proc.communicate(timeout=900)

if proc.returncode != 0:

logger.critical(

"%r failed, status code %s stdout %r stderr %r" % (

sanitize(cmd), proc.returncode,

stdout_data, stderr_data))

iteration += 1

if iteration == 5:

logger.critical('Execute Jobs Failed After 5 Iterations.')

break

continue

logger.info('Success: [%s]' % sanitize(cmd))

break

except (subprocess.TimeoutExpired, subprocess.SubprocessError) as e:

os.killpg(os.getpgid(proc.pid), signal.SIGKILL)

logger.warning('[%s]' % e)

logger.info('Restarting [%s]' % sanitize(cmd))

iteration += 1

if iteration == 5:

logger.critical('Execute Jobs Failed After 5 Iterations.')

break

continue

Log Snippet showing each command executing:

2019-05-25 03:15:03,872 - __main__ - INFO - Restarting [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/] - devdbadmin
2019-05-25 03:15:03,875 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/] - devdbadmin
2019-05-25 03:27:53,801 - __main__ - INFO - Success: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/] - devdbadmin
2019-05-25 03:27:53,821 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin
2019-05-25 03:42:53,821 - __main__ - WARNING - [Command '['/usr/local/bin/rsync', '-a', 'rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/', '7/updates/x86_64/']' timed out after 899.9999316609465 seconds] - devdbadmin
2019-05-25 03:42:53,822 - __main__ - INFO - Restarting [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin
2019-05-25 03:42:53,850 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin
2019-05-25 03:57:53,851 - __main__ - WARNING - [Command '['/usr/local/bin/rsync', '-a', 'rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/', '7/updates/x86_64/']' timed out after 899.9999369028956 seconds] - devdbadmin
2019-05-25 03:57:53,852 - __main__ - INFO - Restarting [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin
2019-05-25 03:57:53,854 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin
2019-05-25 04:01:28,522 - __main__ - INFO - Success: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin
2019-05-25 04:01:28,524 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/centosplus/x86_64/ 7/centosplus/x86_64/] - devdbadmin
2019-05-25 04:16:28,527 - __main__ - WARNING - [Command '['/usr/local/bin/rsync', '-a', 'rsync://mirror.cogentco.com/CentOS/7/centosplus/x86_64/', '7/centosplus/x86_64/']' timed out after 899.9999288369436 seconds] - devdbadmin

2019-05-25 03:15:03,872 - __main__ - INFO - Restarting [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/] - devdbadmin

2019-05-25 03:15:03,875 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/] - devdbadmin

2019-05-25 03:27:53,801 - __main__ - INFO - Success: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/os/x86_64/ 7/x86_64/] - devdbadmin

2019-05-25 03:27:53,821 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin

2019-05-25 03:42:53,821 - __main__ - WARNING - [Command '['/usr/local/bin/rsync', '-a', 'rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/', '7/updates/x86_64/']' timed out after 899.9999316609465 seconds] - devdbadmin

2019-05-25 03:42:53,822 - __main__ - INFO - Restarting [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin

2019-05-25 03:42:53,850 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin

2019-05-25 03:57:53,851 - __main__ - WARNING - [Command '['/usr/local/bin/rsync', '-a', 'rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/', '7/updates/x86_64/']' timed out after 899.9999369028956 seconds] - devdbadmin

2019-05-25 03:57:53,852 - __main__ - INFO - Restarting [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin

2019-05-25 03:57:53,854 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin

2019-05-25 04:01:28,522 - __main__ - INFO - Success: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/updates/x86_64/ 7/updates/x86_64/] - devdbadmin

2019-05-25 04:01:28,524 - __main__ - INFO - Start Command: [/usr/local/bin/rsync -a rsync://mirror.cogentco.com/CentOS/7/centosplus/x86_64/ 7/centosplus/x86_64/] - devdbadmin

2019-05-25 04:16:28,527 - __main__ - WARNING - [Command '['/usr/local/bin/rsync', '-a', 'rsync://mirror.cogentco.com/CentOS/7/centosplus/x86_64/', '7/centosplus/x86_64/']' timed out after 899.9999288369436 seconds] - devdbadmin

Python Generator Find Files With Wildcard

February 13, 2018February 13, 2018 adminLeave a comment

This is a neat way to generate file names in a directory that match a specific pattern, I use this to generate a list of files exported out of hive to load into S3.

def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in sorted(files):
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename

def find_files(directory, pattern):

for root, dirs, files in os.walk(directory):

for basename in sorted(files):

if fnmatch.fnmatch(basename, pattern):

filename = os.path.join(root, basename)

yield filename

local_dir = '/mnt/share/etl/date/'
for files in find_files(local_dir,'*.gz'):
    key = files[1:]
    try:
        awss3.upload(key,files)
        log_msg = ('uploading file: [{0}] to S3').format(files)
        log.write(log_msg)
    except Exception as e:
        log_msg = ('ERROR: {0} uploading file: [{0}] to S3').format(e,files)
        log.write(log_msg, 'error')

local_dir = '/mnt/share/etl/date/'

for files in find_files(local_dir,'*.gz'):

key = files[1:]

try:

awss3.upload(key,files)

log_msg = ('uploading file: [{0}] to S3').format(files)

log.write(log_msg)

except Exception as e:

log_msg = ('ERROR: {0} uploading file: [{0}] to S3').format(e,files)

log.write(log_msg, 'error')

Python3 Subprocess and Rsync Deadlock Strace Timeout

February 11, 2018October 9, 2019 admin1 Comment

I recently came across a tough to debug issue where I was calling a shell script from python using the subprocess module, this shell script called rsync, no matter what I would always run into a timeout situation. I fired up strace and noticed that the process was in a timeout state.

select(4, NULL, [3], [3], {60, 0}) = 0 (Timeout)

I looked at the subprocess documentation and apparently using pipes will fill the system pipe buffer.

Warning

This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

I was baffled, I finally took the approach to eliminate stderr and stdout and just check the return status of the command using run(). Here is what I finally came up with, and all was well.

stdbuf -oL -e0 /usr/local/bin/rsync --outbuf=N -avz

1	stdbuf -oL -e0 /usr/local/bin/rsync --outbuf=N -avz

def execute_jobs(cmd):
     try:
         logger.info('Start Command: [%s]' % cmd)
         subprocess.run(shlex.split(cmd), check=True)
         logger.info('Command Success: [%s]' % cmd)
     except subprocess.CalledProcessError as e:
         logger.critical('[%s] FATAL: Command failed with error [%s]' % (cmd,e))

def execute_jobs(cmd):

try:

logger.info('Start Command: [%s]' % cmd)

subprocess.run(shlex.split(cmd), check=True)

logger.info('Command Success: [%s]' % cmd)

except subprocess.CalledProcessError as e:

logger.critical('[%s] FATAL: Command failed with error [%s]' % (cmd,e))

Hope you find this and it helps you.

Jason R. Ralph

Linux All Day Everyday

Tag: python

botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: https://lambda.us-east-1.amazonaws.com

RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn’t match a supported version!

Python Linux Find Files With Pattern Accessed Older Than N Days And Remove

AWS EMR ImportError: this version of pandas is incompatible with numpy < 1.17.3

Automate pg_dump pg_restore Of Tables From Config File Send Slack Update

Python Remove Files That Match Pattern Older Than N Days

Mass Rename Files In Gcloud With Python Multiprocessing Parallel Gsutil

Python Function Execute Subprocess With Timeout

Python Generator Find Files With Wildcard

Python3 Subprocess and Rsync Deadlock Strace Timeout