I needed a utility to alert our team when any long running queries were running on a production postgres cluster. I came up with the following python code that achieves just that. This would alert slack if an active query exceeds 45 mins. The script takes in user parameters as well, I will demonstrate the way to call it. Hope it helps someone.
CRON CALL:
1 2 |
### postgres long running query check ### */15 * * * * /usr/bin/python2.7 /home/postgres/bin/pg_long_running_query.py --database proddb --dbhost proddb01 --user postgres --alert_time_mins 45 >> /home/postgres/pg_long_running_query.log 2>&1 |
CODE:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
#!/usr/bin/python2.7 __author__ = "Jason Ralph" import psycopg2 import psycopg2.extras import argparse import urllib def send_message_to_slack(text): import requests import json webhook_url = 'https://hooks.slack.com/services/--redacted--' slack_data = {'text': "%s" % text} response = requests.post( webhook_url, data=json.dumps(slack_data), headers={'Content-Type': 'application/json'} ) if response.status_code != 200: raise ValueError( 'Request to slack returned an error %s, the response is:\n%s' % (response.status_code, response.text) ) def get_long_running_queries(): parser = argparse.ArgumentParser(description='Check long Running ' 'Queries On Postgres ' 'Databases And Alert') parser.add_argument('--database', help='target database') parser.add_argument('--dbhost', help='target dbhost') parser.add_argument('--user', help='database user') parser.add_argument('--alert_time_mins', help='alert time in mins: e.g 30') args = parser.parse_args() conn = psycopg2.connect("dbname='%s' host='%s' user='%s' port=5432" % (args.database, args.dbhost, args.user)) sql = ("""SELECT pid, usename, now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE (now() - pg_stat_activity.query_start) > interval '"%s" minutes';""") % args.alert_time_mins cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor) cursor.execute(sql) count = 0 while True: row = cursor.fetchone() if row is None: break if row['usename'] == 'postgres': continue if row['state'] == 'idle': continue count += 1 pid = row['pid'] user = row['usename'] duration = row['duration'] query = row['query'] state = row['state'] msg_items = ['LONG RUNNING QUERY ON HOST: %s\n' % args.dbhost, 'PID: %s\n' % pid, 'DURATION: %s\n' % duration, 'QUERY: %s\n' % query, 'STATE: %s\n' % state, 'USER: %s\n' % user, 'COUNT: %s\n' % count] msg = ''.join(msg_items) send_message_to_slack(msg) conn.close() def main(): get_long_running_queries() if __name__ == '__main__': main() |
SLACK MESSAGE:
1 2 3 4 5 6 7 |
LONG RUNNING QUERY ON HOST: proddb01 PID: 30270 DURATION: 0:55:02.748624 QUERY: SELECT --redacted-- STATE: active USER: dbuser COUNT: 1 |