HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out

June 22, 2023 admin

I recently had an issue where one of our EMR clusters failed to install its Python modules with pip during bootstrap. I checked the logs and saw that we ran into the following error:

HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out

I wanted pip to survive a slow connection instead of dying on a timeout, and I also wanted it to retry on failure. By adding the following to my bootstrap.sh, I raised the pip socket timeout to 100 seconds and bumped the retries up to 10. I have not seen the issue since I applied the new settings.

sudo python3 -m pip --timeout 100 --retries 10 install --upgrade pip
sudo python3 -m pip --timeout 100 --retries 10 install

From the PIP help page:

  --retries <retries>         Maximum number of retries each connection should attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
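
If the bootstrap step is driven from Python rather than a shell script, the same two flags can be passed through. This is only a sketch: `build_pip_command` and `install` are hypothetical helper names, and only `--timeout` and `--retries` come from the pip help quoted above.

```python
import subprocess
import sys


def build_pip_command(packages, timeout=100, retries=10, upgrade=False):
    """Build a pip install command with a longer socket timeout and more retries."""
    cmd = [sys.executable, "-m", "pip",
           "--timeout", str(timeout),
           "--retries", str(retries),
           "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.extend(packages)
    return cmd


def install(packages, **kwargs):
    """Run pip and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_pip_command(packages, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_pip_command(["pip"], upgrade=True)))
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to log the exact command before it runs.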

Upgrade Rocky Linux 8 to 9 CLI

May 18, 2023 admin

I thought I would share how I upgraded the server that runs this blog from Rocky 8 to Rocky 9 without a clean install. I want to mention that this is a do-at-your-own-risk post; this upgrade path is not officially supported.

!!!Do not attempt this if you do not have backups and a way to fully recover your system.!!!

The first step I took was to go to the Rocky download site and make sure I grabbed the latest GPG keys, release and repos packages:

https://download.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/r/

You will need to modify the command below to match the versions you find at the above URL. Once that is done, you can run it.

[root@rocky-us-east-jasonralph ~]# dnf install -y https://download.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/r/{rocky-gpg-keys-9.2-1.4.el9.noarch.rpm,rocky-release-9.2-1.4.el9.noarch.rpm,rocky-repos-9.2-1.4.el9.noarch.rpm}
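
Since the only thing that changes between point releases is the version string, the three URLs in the command above can be assembled from it. A minimal sketch, assuming the file naming stays the same as the 9.2-1.4 packages shown here; `release_rpm_urls` is a hypothetical helper, not an official tool.

```python
BASE_URL = "https://download.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/r/"
PACKAGES = ("rocky-gpg-keys", "rocky-release", "rocky-repos")


def release_rpm_urls(version, base_url=BASE_URL):
    """Return the three release RPM URLs for a version string such as '9.2-1.4'."""
    return [f"{base_url}{name}-{version}.el9.noarch.rpm" for name in PACKAGES]


if __name__ == "__main__":
    # Prints a dnf command equivalent to the brace-expanded one above.
    print("dnf install -y " + " ".join(release_rpm_urls("9.2-1.4")))
```

Check the directory listing first and pass in whatever version is actually published there.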

One roadblock was that dnf did not like that I had the remi and epel release 8 packages installed. Once I removed them it went fine. This is the error I hit:

[root@rocky-us-east-jasonralph ~]# dnf install -y https://download.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/r/{rocky-gpg-keys-9.2-1.4.el9.noarch.rpm,rocky-release-9.2-1.4.el9.noarch.rpm,rocky-repos-9.2-1.4.el9.noarch.rpm}
Last metadata expiration check: 0:20:38 ago on Thu 18 May 2023 10:27:24 PM UTC.
rocky-gpg-keys-9.2-1.4.el9.noarch.rpm                                                                                                                                                                                                   169 kB/s |  12 kB     00:00    
rocky-release-9.2-1.4.el9.noarch.rpm                                                                                                                                                                                                    403 kB/s |  23 kB     00:00    
rocky-repos-9.2-1.4.el9.noarch.rpm                                                                                                                                                                                                      298 kB/s |  12 kB     00:00    
Error: 
 Problem: problem with installed package remi-release-8.7-2.el8.remi.noarch
  - package remi-release-8.7-2.el8.remi.noarch requires system-release(releasever) = 8, but none of the providers can be installed
  - package remi-release-8.4-1.el8.remi.noarch requires system-release(releasever) = 8, but none of the providers can be installed
  - package remi-release-8.5-2.el8.remi.noarch requires system-release(releasever) = 8, but none of the providers can be installed
  - package remi-release-8.5-3.el8.remi.noarch requires system-release(releasever) = 8, but none of the providers can be installed
  - package remi-release-8.6-1.el8.remi.noarch requires system-release(releasever) = 8, but none of the providers can be installed
  - cannot install both rocky-release-9.2-1.4.el9.noarch and rocky-release-8.7-1.2.el8.noarch
  - conflicting requests
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Find the epel and remi release rpms:

[root@rocky-us-east-jasonralph ~]# rpm -qa| grep release
remi-release-8.7-2.el8.remi.noarch
epel-release-8-19.el8.noarch
rocky-release-8.7-1.2.el8.noarch
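
Picking the conflicting packages out of that listing can be sketched as below. This is an illustration under my own assumptions: the regular expression only looks for `*-release` packages tagged `.el8`, and `third_party_el8_release_packages` is a hypothetical name. Review the result before removing anything.

```python
import re

# Matches release packages built for EL8, e.g. "epel-release-8-19.el8.noarch".
EL8_RELEASE = re.compile(r"^(?P<name>[a-z0-9_-]+-release)-\d.*\.el8\b")


def third_party_el8_release_packages(rpm_lines, keep=("rocky-release",)):
    """Return EL8 release packages other than the distro's own release package."""
    found = []
    for line in rpm_lines:
        line = line.strip()
        match = EL8_RELEASE.match(line)
        if match and match.group("name") not in keep:
            found.append(line)
    return found


if __name__ == "__main__":
    listing = [
        "remi-release-8.7-2.el8.remi.noarch",
        "epel-release-8-19.el8.noarch",
        "rocky-release-8.7-1.2.el8.noarch",
    ]
    print(third_party_el8_release_packages(listing))
```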

Remove them:

[root@rocky-us-east-jasonralph ~]# yum remove epel-release-8-19.el8.noarch remi-release-8.7-2.el8.remi.noarch

With those packages removed, re-run the install to move your system from 8 to 9:

[root@rocky-us-east-jasonralph ~]# dnf install -y https://download.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/r/{rocky-gpg-keys-9.2-1.4.el9.noarch.rpm,rocky-release-9.2-1.4.el9.noarch.rpm,rocky-repos-9.2-1.4.el9.noarch.rpm}

I ignored this error. It is a scriptlet failure reported against the rocky-gpg-keys package, and it did not seem to cause any harm:

  Running scriptlet: rocky-gpg-keys-8.7-1.2.el8.noarch                                                                                                                                                                                                              6/6 
Line is not an assignment at '/usr/lib/sysctl.d/50-redhat.conf:8': (null)
Couldn't write '1' to 'net/ipv4/conf/*/rp_filter', ignoring: No such file or directory
warning: %transfiletriggerin(systemd-239-68.el8_7.4.x86_64) scriptlet failed, exit status 1

Error in <unknown> scriptlet in rpm package rocky-gpg-keys

Verify:

[root@rocky-us-east-jasonralph ~]# cat /etc/rocky-release
Rocky Linux release 9.2 (Blue Onyx)

Rebuild the RPM database so that it now uses SQLite:

[root@rocky-us-east-jasonralph ~]# rpm --rebuilddb

That's it, reboot:

[root@rocky-us-east-jasonralph ~]# reboot

I did have some issues with dnf where I needed to reset some modules.

[root@rocky-us-east-jasonralph ~]#  dnf check
Modular dependency problems:

 Problem 1: conflicting requests
  - nothing provides module(platform:el8) needed by module httpd:2.4:8070020230406163027:3b9f49c4.x86_64
 Problem 2: conflicting requests
  - nothing provides module(platform:el8) needed by module mariadb:10.3:8060020220913075833:d63f516d.x86_64
 Problem 3: conflicting requests
  - nothing provides module(platform:el8) needed by module mysql:8.0:8060020221025174942:d63f516d.x86_64
 Problem 4: conflicting requests
  - nothing provides module(platform:el8) needed by module nginx:1.14:8040020210610090123:9f9e2e7e.x86_64

I needed to reset each of the modules that dnf check complained about. There may be more on your system:

[root@rocky-us-east-jasonralph ~]# dnf module reset httpd:2.4 mariadb:10.3 mysql:8.0 nginx:1.14
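
If the list is long, the name:stream arguments for that reset can be pulled out of the dnf check output. A rough sketch, assuming the output looks like the lines I pasted above; `modules_to_reset` is a hypothetical helper and the regular expression is mine, not something dnf provides.

```python
import re

# Captures "httpd" and "2.4" from "... needed by module httpd:2.4:8070...:3b9f49c4.x86_64".
MODULE_RE = re.compile(r"needed by module (?P<name>[^:\s]+):(?P<stream>[^:\s]+):")


def modules_to_reset(dnf_check_output):
    """Return unique name:stream specs in the order they appear in the output."""
    seen = []
    for match in MODULE_RE.finditer(dnf_check_output):
        spec = f"{match.group('name')}:{match.group('stream')}"
        if spec not in seen:
            seen.append(spec)
    return seen


if __name__ == "__main__":
    sample = """
 Problem 1: conflicting requests
  - nothing provides module(platform:el8) needed by module httpd:2.4:8070020230406163027:3b9f49c4.x86_64
 Problem 2: conflicting requests
  - nothing provides module(platform:el8) needed by module mariadb:10.3:8060020220913075833:d63f516d.x86_64
"""
    print("dnf module reset " + " ".join(modules_to_reset(sample)))
```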

That seemed to fix it, good luck.

Python Linux Find Files With Pattern Accessed Older Than N Days And Remove

July 7, 2022 (updated July 14, 2022) admin

This is a neat utility to keep in your sysadmin bag of tricks. It walks the directory you define recursively, collects every file that matches a list of patterns, and compares each file's access time against a command line parameter for days ago. If a file was last accessed more than N days ago, it is removed. What’s really nice about this utility is that it has a debug mode, so you can see what would be deleted before you turn debug off and execute it for real.

#!/usr/bin/env python3

import argparse
import fnmatch
import os
import sys
from datetime import datetime, timedelta
from pathlib import Path

# set date now.
now = datetime.today()

# setup dir to clean
home = str(Path.home())
target_dir = '/home/jasonr' # CHANGE TO WHERE YOU WANT TO SEARCH

# dir to clean
dirs_to_clean = target_dir

# setup cli arguments.
parser = argparse.ArgumentParser(
    description='''
[--days_ago 60] will keep 60 days worth of files.
[--debug yes] will print out statements with no actions.''',
    formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument('--days_ago',
                    help='[--days_ago NN]')
parser.add_argument('--debug',
                    help='[--debug (yes|no)]')
args = parser.parse_args()

# allowed arguments from cli.
accepted_cli_args = ['yes', 'no']

# sanity check, assign days to keep on system.
if args.days_ago is None:
    days = 60
else:
    days = args.days_ago

# define a list of patterns
patterns = ['*.csv', '*.txt'] # YOU CAN ADD ANY PATTERN TO LIST

# sanity check, assign debug true or false
if args.debug in accepted_cli_args:
    if args.debug == 'yes':
        debug = True
    else:
        debug = False
else:
    print("{0}: Wrong parameter --debug (yes or no): [{1}]"
          .format(now, args.debug))
    sys.exit(1)


def find_files(dir_to_clean):
    file_list = []
    days_ago = datetime.now() - timedelta(days=int(days))
    for root, dirs, files in os.walk(dir_to_clean):
        for pattern in patterns:
            for filename in fnmatch.filter(files, pattern):
                file_list.append(os.path.join(root, filename))
    # sort once after the walk instead of after every append.
    file_list.sort()

    for file in file_list:
        try:
            file_atime = datetime.fromtimestamp(os.path.getatime(file))
        except Exception as e:
            print("{0}: File Access Time Get Failed: [{1}]"
                  .format(now, e))
            # skip this file: file_atime is unset or left over from the previous file.
            continue
        if file_atime < days_ago:
            if os.path.isfile(file):
                try:
                    if not debug:
                        print("{0}: Removing file: [{1}]"
                              .format(now, file))
                        os.remove(file)
                    else:
                        print("{0}: DEBUG: Removing file: [{1}]"
                              .format(now, file))
                except OSError as e:
                    print("{0}: File Clean Up Failed: [{1}]"
                          .format(now, e))
                    sys.exit(1)


# main function.
def main():
    find_files(dirs_to_clean)


if __name__ == "__main__":
    main()

[jasonr@sb-jralph-8 ~]$ python3 finder.py --days_ago 90 --debug yes
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/awscli/examples/emr/create-cluster-synopsis.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/cryptography-3.3.2-py3.8.egg-info/top_level.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/README.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsa.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsb.txt]
2022-07-07 11:22:57.524454: DEBUG: Removing file: [/home/jasonr/aws/dist/docutils/parsers/rst/include/isoamsc.txt]
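
The heart of the script is the access time comparison, and it can be tried safely in a scratch directory. The sketch below is a simplified variant rather than the script itself: `find_stale_files` is a hypothetical name, it only returns the matching paths and never deletes, and it takes the current time as a parameter so it is easy to test. Keep in mind that access times are only as good as the filesystem mount options; with relatime or noatime they are not updated on every read.

```python
import fnmatch
import os
import time


def find_stale_files(root, patterns, days, now=None):
    """Return sorted paths under root matching patterns whose atime is older than days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not any(fnmatch.fnmatch(name, p) for p in patterns):
                continue
            path = os.path.join(dirpath, name)
            try:
                atime = os.path.getatime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Dry run only: list candidates in the current directory.
    for path in find_stale_files(".", ["*.csv", "*.txt"], days=90):
        print("would remove:", path)
```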

Node Application Stopped Sending Updates To Slack – can’t identify protocol

June 24, 2021 admin

I wanted to share my experience with a Node application that I support. This particular application is an API, and it logs each and every request it receives to an internal Slack channel. Our team uses this channel for many things: to verify when the API is in maintenance, to check that requests are processing, to see the overall health of the API, etc.

Once in a while, out of nowhere, we would stop receiving these updates in Slack. I set out to troubleshoot why this was happening. At first we thought that we were hitting the Slack rate limits, which are clearly defined here:

https://api.slack.com/docs/rate-limits

However, after reading the linked doc, I was skeptical. The API does serve a lot of requests, but not enough to hit their limit. We have 2 servers that process the API requests and send Slack messages, and when the messages stopped, they stopped on both servers, not just one. We had also run into this before, and restarting the service fixed the issue, so I was sure we had not hit the rate limit. On top of that, trying to send a manual Slack update using curl would not work! I knew this had to be something with the Linux OS itself, and not the Slack service.

I tried netstat to see if we were hitting some type of OS limit, and all looked well. Next I tried one of my favorite tools, lsof. At first I grepped for deleted to see if something was being held and not released, but I did not see anything that stood out. Next I grepped for node and, lo and behold, I saw this:

[root@ip-172-x-x-x ~]# lsof | grep node
--SNIP--
node       1794 nodeuser   19u     sock                0,6       0t0     651101 can't identify protocol
node       1794 nodeuser   20w      REG              202,1 209793922     294970 /opt/afs/mc_api_logs/debug.log
node       1794 nodeuser   21w      REG              202,1   2409554     274199 /opt/afs/mc_api_logs/exceptions.log
node       1794 nodeuser   22w      REG              202,1    572278     294971 /opt/afs/mc_api_logs/error.log
node       1794 nodeuser   23w      REG              202,1   2409554     274199 /opt/afs/mc_api_logs/exceptions.log
node       1794 nodeuser   24w      REG              202,1   2258649     294980 /opt/afs/mc_api_logs/warn.log
node       1794 nodeuser   25w      REG              202,1   2409554     274199 /opt/afs/mc_api_logs/exceptions.log
node       1794 nodeuser   26w      REG              202,1         0     294989 /opt/afs/mc_api_logs/info.log
node       1794 nodeuser   27w      REG              202,1   2409554     274199 /opt/afs/mc_api_logs/exceptions.log
node       1794 nodeuser   28u     IPv4              13731       0t0        TCP *:pcsync-https (LISTEN)
node       1794 nodeuser   29u     sock                0,6       0t0     512828 can't identify protocol
node       1794 nodeuser   30u     sock                0,6       0t0      14507 can't identify protocol
node       1794 nodeuser   31u     sock                0,6       0t0      14028 can't identify protocol
node       1794 nodeuser   32u     sock                0,6       0t0      15183 can't identify protocol
node       1794 nodeuser   33u     sock                0,6       0t0      15628 can't identify protocol
node       1794 nodeuser   34u     sock                0,6       0t0      16346 can't identify protocol
node       1794 nodeuser   35u     sock                0,6       0t0      15778 can't identify protocol
node       1794 nodeuser   36u     sock                0,6       0t0      16847 can't identify protocol
node       1794 nodeuser   37u     sock                0,6       0t0      17512 can't identify protocol
node       1794 nodeuser   38u     sock                0,6       0t0      25572 can't identify protocol
node       1794 nodeuser   39u     sock                0,6       0t0      18437 can't identify protocol
--SNIP--
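
With thousands of descriptors it is easier to count these lines than to eyeball them. Here is a small sketch that tallies the "can't identify protocol" sockets per process from lsof output. It assumes the default column layout shown above (COMMAND, PID, USER, FD, TYPE, ...), and `count_unidentified_sockets` is a hypothetical helper name.

```python
from collections import Counter

MARKER = "can't identify protocol"


def count_unidentified_sockets(lsof_output):
    """Count 'can't identify protocol' sockets per (command, pid)."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # fields[4] is the TYPE column in the default lsof layout.
        if len(fields) >= 5 and fields[4] == "sock" and line.rstrip().endswith(MARKER):
            counts[(fields[0], fields[1])] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: lsof | python3 this_script.py
    for (command, pid), total in count_unidentified_sockets(sys.stdin.read()).most_common():
        print(command, pid, total)
```

A count that keeps climbing between runs is a good hint that the process is leaking sockets.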

My eyes went right to “can’t identify protocol”. I opened up a browser and started to research, and the first hit when searching for “can’t identify protocol” was a Stack Overflow question with the solution:

https://stackoverflow.com/questions/7911840/seeing-too-many-lsof-cant-identify-protocol

When lsof prints “Can’t identify protocol”, this usually relates to sockets (it should also say ‘sock’ in the relevant output lines).

So, somewhere in your code you are probably connecting sockets and not closing them properly (perhaps you need a finally block).

I suggest you step through your code with a debugger (easiest to use your IDE, potentially with a remote debugger, if necessary), while running lsof side-by-side. You should eventually be able to see which thread / line of code is creating these File Descriptors.

It turns out that the Node application was opening file descriptors / sockets and not closing them properly, which caused the process to hit the hard limit on open files / file descriptors. You can view the hard and soft limits like so: switch to the user that the application runs as and run:

[nodeuser@ip-172-x-x-x ~]$ ulimit -Hn
4096
[nodeuser@ip-172-x-x-x ~]$ ulimit -Sn
1024
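
The same limits can be read from inside a process, which is handy for a quick health check. This sketch uses Python's standard `resource` module (Unix only) and counts open descriptors through /proc, which is Linux specific. The helper names are hypothetical, and reading another user's /proc/PID/fd needs the right permissions.

```python
import os
import resource


def nofile_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors via /proc (Linux only); None if it cannot be read."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} open={open_fd_count()}")
```

Comparing the open count against the limits over time shows how close a process is to the ceiling before it actually hits it.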

So you can see that nodeuser has a hard limit of 4096 open files, and because the application was not closing them properly, we hit the ceiling. This explains why restarting the server or the process fixed it: the restart released the open file descriptors, and the process was able to open sockets again. I spoke with the developer and we researched it. It looks like one of the modules we were using was the cause of the issue, or perhaps we were using it wrong. I found this out from this question:
https://stackoverflow.com/questions/24922745/node-js-winston-how-to-safely-drain-a-logger

Question:

I have experimented with instantiating and closing winston loggers as (half) described on https://github.com/flatiron/winston#instantiating-your-own-logger, to no avail. I run into trouble closing file transports of Winston’s – walking through it’s source code, I found that the proper way to close off a logger would seem to be the close method. I expected this to take care of closing the transport file used by the logger – however that turned out to be not so.

Varying in frequency according to node.js server load, winston would still hold on to many transport files, infinitely long after the close method had been called for them, indefinitely long after no new writes were being initiated to them. I observed that through the node.js process file descriptors table (lsof -p). Even though close has been called for a Winston logger, it would indefinitely keep the file descriptor of the log file “in use”, i.e. the log file never gets really closed. Thus leaking file descriptors and eventually making the node.js process bump into the ulimit (-n) limit after my application has been up for long.

Should there be a specific programming pattern for draining a Winston logger such that it can be eventually closed?

Answer:

Create only one logger instance and then derive children from it. In this case, winston will hold only one open file handler. Might also be better for performance.
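
To illustrate the pattern in that answer, here is an analogy using Python's standard logging module. It is not winston and not our application's code, and the logger names are made up. The idea is the same though: one logger owns the single file handler, and every component gets a child that has no handler of its own and passes its records up to the parent.

```python
import logging


def build_root_logger(name, log_path):
    """Create one logger with a single file handler; safe to call more than once."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler (and file descriptor)
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def child_logger(parent, component):
    """Derive a child; it opens no file and propagates records to the parent."""
    return parent.getChild(component)


if __name__ == "__main__":
    api_log = build_root_logger("api", "api.log")
    child_logger(api_log, "requests").info("request received")
    child_logger(api_log, "slack").info("update sent")
```

However many children are created, the process holds one open log file.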

So that was it, the developers agreed and set out to create a patch, problem solved.

AWS CLI Max Concurrent Requests Tuning

January 3, 2020 (updated November 11, 2021) admin

In this post I would like to go over how I tuned a test server for copying / syncing files from the local filesystem to S3 over the internet. If you have ever had to do this, you will have noticed that as the file count grows, so does the time it takes to upload the files to S3. After some web searching I found out that the AWS CLI lets you tune its S3 config to allow more concurrency than the default (see the AWS CLI S3 Config documentation).

The parameter that we will be playing with is max_concurrent_requests. It has a default value of 10, which means the CLI runs at most 10 requests against S3 at a time. Let's see if we can make some changes to that value and get some performance gains. My test setup is as follows:

2 x Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
8GB RAM
CentOS release 6.10 (Final)

I have 56 files of roughly 102MB each in the test directory:

-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_7.csv.gz
-rw-r--r-- 1 jasonr domain^users 102M Sep 24 11:44 sample__0_0_53.csv.gz
-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_6.csv.gz
-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_8.csv.gz
-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_55.csv.gz
--snip--
[jasonr@jr-sandbox jason_test]$ ls| wc -l
56

For the first test I am going to run aws s3 sync with no changes, so out of the box it should use 10 max_concurrent_requests. Let's use the Linux time command to measure how long it takes to copy all 56 files to S3. I will delete the folder on S3 between iterations to keep the tests the same. You can also watch the connections to port 443 via netstat and count them to see what is going on. Across all the tests, my best result came at a setting of 250. As you will see, you need to play with the setting to get the best result, and the best value will change with the server specs.

1. 1m25.919s with the default configuration:

[jasonr@jr-sandbox jason_test]$ time aws s3 sync . s3://dev-redshift/jason_sync_test/
upload: ./sample__0_0_0.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_0.csv.gz
upload: ./sample__0_0_10.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_10.csv.gz
upload: ./sample__0_0_11.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_11.csv.gz
upload: ./sample__0_0_12.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_12.csv.gz
upload: ./sample__0_0_13.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_13.csv.gz
--snip--

real	1m25.919s
user	0m35.153s
sys	0m15.879s

2. Now let's set max_concurrent_requests to 20 and try again. You can do this with the command below. After running the sync again we can see a little gain:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 20
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 20
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
20

real	1m13.277s
user	0m36.186s
sys	0m16.462s

3. Bumped up to 50 shows a bit more gain:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 50
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 50

[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
49
real	1m0.720s
user	0m37.669s
sys	0m19.344s

4. Bumped up to 100, I start to notice that we lost some speed:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 100
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 100
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
95
real	1m4.212s
user	0m39.737s
sys	0m21.847s

5. Bumped up to 250 we see the best result so far:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 250
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 250
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
234
real	0m55.036s
user	0m42.841s
sys	0m21.409s

6. Bumped up to 500, we lose performance, most likely because of limits on the machine's resources.

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 500
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 500
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
465
real	1m16.593s
user	0m50.336s
sys	0m25.806s
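A side note on the connection counts in the steps above: `grep 443` matches any line that contains 443, so a listening socket or a local port such as 44312 would be counted too. Here is a minimal Python sketch of a stricter count, assuming the usual Linux `netstat -an` column layout. The function name and the sample lines are made up for illustration and are not from my test box.

```python
def count_https_connections(netstat_output, port=443):
    """Count ESTABLISHED TCP connections whose remote port equals `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if state == "ESTABLISHED" and foreign.rsplit(":", 1)[-1] == str(port):
            count += 1
    return count


# Hypothetical sample in the `netstat -an` layout (documentation IP ranges).
SAMPLE = """\
tcp        0      0 10.0.0.5:44312          203.0.113.10:443        ESTABLISHED
tcp        0      0 10.0.0.5:44313          203.0.113.11:443        ESTABLISHED
tcp        0      0 10.0.0.5:22             198.51.100.9:50443      ESTABLISHED
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN
"""

print(count_https_connections(SAMPLE))  # prints 2
```

A plain `grep 443 | wc -l` over the same four sample lines would report 4, since every line contains the string 443 somewhere.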

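So to wrap up: you can tune the number of concurrent requests the aws cli is allowed to make to s3. In the runs shown above, 250 gave the fastest time (about 55 seconds) and 500 the slowest (about 1 minute 17 seconds), so you will need to experiment with this setting to get the best results for your machine.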
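To compare the runs without eyeballing them, the `real` times can be converted to seconds and the lowest picked. This is only a sketch: the timings are copied from the runs above, and the helper names `to_seconds` and `fastest` are my own, not part of the aws cli.

```python
import re

# "real" wall-clock times from the runs above, keyed by max_concurrent_requests.
REAL_TIMES = {
    20: "1m13.277s",
    50: "1m0.720s",
    100: "1m4.212s",
    250: "0m55.036s",
    500: "1m16.593s",
}


def to_seconds(real):
    """Convert bash `time` output such as '1m13.277s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real)
    if match is None:
        raise ValueError("unexpected time format: %r" % real)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def fastest(times):
    """Return (setting, seconds) for the run with the lowest wall-clock time."""
    setting = min(times, key=lambda key: to_seconds(times[key]))
    return setting, to_seconds(times[setting])


if __name__ == "__main__":
    for setting in sorted(REAL_TIMES):
        print("%4d -> %.3fs" % (setting, to_seconds(REAL_TIMES[setting])))
    print("fastest setting:", fastest(REAL_TIMES)[0])
```

A single run per setting is noisy, so treat the winner as a starting point rather than a final answer.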

Python Backup WORDPRESS Site / DATABASE and HTML

May 7, 2016 admin

I have this blog hosted on a LINODE dedicated LINUX server. It’s about 10 dollars a month for a 1 core system with about 250GB of disk space and 1GB of RAM. This server runs the common LAMP stack, and I needed a quick and dirty script to back up the MYSQL database and the PHP code contained in the /var/www/html folder. I wanted the script to compress the contents of both and move them into a directory named with the current date. See the comments below outlining the code and what happens when it runs.

# Linux server kernel version details.
[22:47:51] root@jasonralph:~/py_backup_site_full # uname -a
Linux jasonralph.jasonralph.org 4.5.0-x86_64-linode65 #2 SMP Mon Mar 14 18:01:58 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

# Linux distro information
[22:52:15] root@jasonralph:~/py_backup_site_full # cat /etc/centos-release
CentOS release 6.7 (Final)

# Free output
[22:59:58] root@jasonralph:~/py_backup_site_full # free -m
             total       used       free     shared    buffers     cached
Mem:           991        796        195          0         62        329
-/+ buffers/cache:        403        587
Swap:          511          0        511

# Here you can see the python binary location on the filesystem.
[22:46:36] root@jasonralph:~/py_backup_site_full # which python
/usr/local/bin/python

# Now you can see the python version I am using. 
[22:46:50] root@jasonralph:~/py_backup_site_full # python --version
Python 2.7.6

# Long listing of the backup folder I have set up.
[22:47:48] root@jasonralph:~/py_backup_site_full # ls -l
total 4
-rwxr-xr-x 1 root root 1560 May  7 22:46 wordpress_backup.py

# cat of the python script to see the source code. 

[23:02:39] root@jasonralph:~/py_backup_site_full # cat wordpress_backup.py
#!/usr/local/bin/python
import optparse
import os
import datetime
import shutil
from subprocess import Popen, PIPE

date = datetime.datetime.now().strftime('%Y%m%d-%s')
f_date = datetime.datetime.now().strftime('%Y%m%d')

def backup_all_databases():
    args = ['mysqldump', '-u', 'root', '-pPASSWORD', '--all-databases']
    with open("%s.sql.gz" % f_date, 'wb') as f:
        p1 = Popen(args, stdout=PIPE)
        p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
        p1.stdout.close()
        p2.wait()
        p1.wait()

def tar_html_folder():
    output_filename_1 = "%s.html_dir"  % f_date
    output_filename_2 = "%s.html_dir.zip"  % f_date
    dir_name = '/var/www/html'
    dst = "%s" % date
    shutil.make_archive(output_filename_1, 'zip', dir_name)
    shutil.move(output_filename_2, dst)

def main():
    archive_path = "%s" % date
    os.mkdir(archive_path, 0755)
    backup_all_databases()
    src_file = "%s.sql.gz" % f_date
    dst = "%s" % date
    shutil.move(src_file, dst)
    tar_html_folder()

if __name__ == "__main__":
    main()
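The script above is Python 2 only (the `0755` octal literal is a syntax error on Python 3), and `%s` in `strftime` is a platform extension rather than a documented format code. Below is a rough Python 3 sketch of the same idea, not a drop-in replacement. It assumes a MySQL option file holding the credentials, passed as `defaults_file`, so the password stays off the command line. All function names and the timestamp format are my own choices for illustration.

```python
import datetime
import gzip
import os
import shutil
import subprocess


def make_backup_dir(base_dir, now=None):
    """Create and return a dated directory such as <base_dir>/20160507-230341."""
    now = now or datetime.datetime.now()
    path = os.path.join(base_dir, now.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(path, mode=0o755)
    return path


def dump_databases(dest_dir, stamp, defaults_file):
    """Stream `mysqldump --all-databases` into <dest_dir>/<stamp>.sql.gz."""
    out_path = os.path.join(dest_dir, "%s.sql.gz" % stamp)
    # --defaults-extra-file has to be the first option on the command line.
    args = ["mysqldump", "--defaults-extra-file=%s" % defaults_file,
            "--all-databases"]
    with gzip.open(out_path, "wb") as out:
        proc = subprocess.Popen(args, stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError("mysqldump exited with an error")
    return out_path


def archive_html(dest_dir, stamp, html_dir):
    """Zip the contents of html_dir into <dest_dir>/<stamp>.html_dir.zip."""
    base_name = os.path.join(dest_dir, "%s.html_dir" % stamp)
    return shutil.make_archive(base_name, "zip", html_dir)


def run_backup(base_dir, html_dir, defaults_file):
    """Run both steps and return the dated directory that holds the results."""
    now = datetime.datetime.now()
    stamp = now.strftime("%Y%m%d")
    dest_dir = make_backup_dir(base_dir, now)
    dump_databases(dest_dir, stamp, defaults_file)
    archive_html(dest_dir, stamp, html_dir)
    return dest_dir

# Example call (paths are placeholders):
# run_backup("/root/py_backup_site_full", "/var/www/html", "/root/.my.cnf")
```

Writing both files straight into the dated directory avoids the separate `shutil.move` calls, and checking the `mysqldump` exit code means a failed dump does not quietly leave a truncated file behind.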

# Let's execute the script and check the contents of the directory it creates.
[23:02:41] root@jasonralph:~/py_backup_site_full # python wordpress_backup.py
-- Warning: Skipping the data of table mysql.event. Specify the --events option explicitly.

# Long listing of the directory.
[23:03:47] root@jasonralph:~/py_backup_site_full # ls -l
total 8
drwxr-xr-x 2 root root 4096 May  7 23:03 20160507-1462676621
-rwxr-xr-x 1 root root 1059 May  7 23:02 wordpress_backup.py

# Change directory to the newly created folder. 
[23:03:50] root@jasonralph:~/py_backup_site_full # cd 20160507-1462676621/

# Long listing of the files in the newly created folder. 
[23:03:55] root@jasonralph:~/py_backup_site_full/20160507-1462676621 # ls -l
total 79084
-rw-r--r-- 1 root root 80614105 May  7 23:03 20160507.html_dir.zip
-rw-r--r-- 1 root root   360709 May  7 23:03 20160507.sql.gz

So you can see we generated 2 files in a dated directory. I chose to use both zip and gzip for compression. To view the contents you can run the normal linux commands to extract the files.

# Extracting the files using unzip and gunzip
[23:03:55] root@jasonralph:~/py_backup_site_full/20160507-1462676621 # unzip 20160507.html_dir.zip && gunzip 20160507.sql.gz

# Long listing of the files 
[23:08:31] root@jasonralph:~/py_backup_site_full/20160507-1462676621 # ls -l
total 80660
-rw-r--r-- 1 root root 80614105 May  7 23:03 20160507.html_dir.zip
-rw-r--r-- 1 root root  1973075 May  7 23:03 20160507.sql
drwxr-xr-x 3 root root     4096 May  7 23:08 jasonralph

# Head command showing the first 10 lines of the sql file. 
[23:11:33] root@jasonralph:~/py_backup_site_full/20160507-1462676621 # head -n 10 20160507.sql
-- MySQL dump 10.13  Distrib 5.1.73, for redhat-linux-gnu (x86_64)
--
-- Host: localhost    Database:
-- ------------------------------------------------------
-- Server version       5.1.73

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;

# Long listing of the jasonralph folder in /var/www/html
[23:12:48] root@jasonralph:~/py_backup_site_full/20160507-1462676621 # ls jasonralph/wordpress/
black_cat_blink.gif  image.jpg    MW_Line.mp4  rh.jpg               wp-admin              wp-config.php         wp-content   wp-links-opml.php  wp-mail.php      wp-settings.php   xmlrpc.php
delete_file.mvg      index.php    out.png      tux_linux_white.jpg  wp-blog-header.php    wp-config.php.bkup    wp-cron.php  wp-load.php        wp-pass.php      wp-signup.php
expolit.svg          license.txt  readme.html  wp-activate.php      wp-comments-post.php  wp-config-sample.php  wp-includes  wp-login.php       wp-register.php  wp-trackback.php
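If you only want to check what is inside a backup without extracting it, Python's standard `zipfile` and `gzip` modules can do the same inspection. This is a small sketch in Python 3; the helper names `list_zip` and `head_gzip` are made up for this example.

```python
import gzip
import itertools
import zipfile


def list_zip(path, limit=10):
    """Return up to `limit` member names from a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()[:limit]


def head_gzip(path, lines=10):
    """Return the first `lines` lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, lines)]

# Example calls against the files from the listing above:
# list_zip("20160507.html_dir.zip")
# head_gzip("20160507.sql.gz")
```

Both helpers read the archives in place, so the compressed files stay untouched for the offsite copy.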

So there you have it: I can now tar up the entire dated directory for an easy offsite backup of my entire site, jasonralph.org. Hope this helps someone; feel free to copy the source code and change it at will.
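The tar step could be sketched with Python's `tarfile` module as follows. The function name is hypothetical, and I use an uncompressed tar on the assumption that the two files inside are already compressed.

```python
import os
import tarfile


def tar_backup_dir(backup_dir):
    """Write <backup_dir>.tar next to the dated directory and return its path."""
    backup_dir = backup_dir.rstrip(os.sep)
    tar_path = backup_dir + ".tar"
    with tarfile.open(tar_path, "w") as tar:
        # arcname keeps only the dated folder name inside the archive,
        # not the full absolute path.
        tar.add(backup_dir, arcname=os.path.basename(backup_dir))
    return tar_path

# Example call against the directory created above:
# tar_backup_dir("/root/py_backup_site_full/20160507-1462676621")
```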

Best,
Jason

PYTHON – Script to download youtube videos for offline viewing

April 3, 2016 (updated April 24, 2016) admin

I was interested in viewing this video of a news conference (USENIX 2016) on my trip home on the Metro North train, NYC => CT. The trip is about an hour and 10 minutes from Manhattan’s Grand Central Terminal to Milford, CT, on the express train that is. My concern was that I would have choppy internet service on the way, since I recently updated my laptop and the built-in Verizon Mobile card was not activated yet. I would need to use my ATT iPhone as a hotspot, which proved to be very shaky at times. A colleague of mine recommended a website for making youtube videos available for offline viewing. The name of this site was:

http://www.keepvid.com

Right off the rip I was concerned that this site was infested with malware and any other bullshit associated with a free video ripping service. I used the site and was able to download the video I was interested in, but who knows how sick my Windows based machine just got. I could have contracted anything from this site.

I thought about this and said, there has to be a better way, or a python lib for this, and lo and behold a search came up with PYTUBE:
https://github.com/nficano/pytube

This library had some interesting features and blew away the keepvid site in regards to flexibility. Here is an explanation of what this library can do. Please have a look at the examples below; I will do my best to narrate them.

Here I use PIP to install the PYTUBE lib. You can ignore the DEPRECATION warning about my outdated python that blares at you for being such an idiot.

[root@jasonralph ~]# pip install pytube
DEPRECATION: Python 2.6 is no longer supported by the Python core team, please upgrade your Python. A future version of pip will drop support for Python 2.6
Collecting pytube
  Using cached pytube-6.1.8.tar.gz
Installing collected packages: pytube
  Running setup.py install for pytube ... done
Successfully installed pytube-6.1.8

Next up you can see that I am setting a variable yt (this is the video you want to download). Using python’s Pretty Print lib you can call pprint(yt.get_videos()) to see what formats are available for download.

Please have a look at the comments in the code for a bit more detail on what is going on. In this example I am using Pulp_Fiction.mp4 as the filename I want the video to have when downloaded.

[jralph@jasonralph ~]$ cat py_video_downloader.py
from pytube import YouTube
from pprint import pprint

yt = YouTube("http://www.youtube.com/watch?v=Ik-RsDGPI5Y")

pprint(yt.get_videos())

print(yt.filename)

yt.set_filename('Pulp_Fiction.mp4')

# Notice that the list is ordered by lowest resolution to highest. If you
# wanted the highest resolution available for a specific file type, you
# can simply do:
print(yt.filter('mp4')[-1])
# <Video: H.264 (.mp4) - 720p>

# You can also get all videos for a given resolution
pprint(yt.filter(resolution='720p'))


video = yt.get('mp4', '720p')

# NOTE: get() can only be used if and only if one object matches your criteria.
# for example:

pprint(yt.videos)


video.download('/home/jralph/')

Ok so here is what it looks like when you execute the program:

[23:32:59] jralph@jasonralph:~ $ python py_video_downloader.py
[<Video: MPEG-4 Visual (.3gp) - 144p - Simple>,
 <Video: MPEG-4 Visual (.3gp) - 240p - Simple>,
 <Video: Sorenson H.263 (.flv) - 240p - N/A>,
 <Video: H.264 (.mp4) - 360p - Baseline>,
 <Video: H.264 (.mp4) - 720p - High>,
 <Video: VP8 (.webm) - 360p - N/A>]
Pulp Fiction - Dancing Scene
<Video: H.264 (.mp4) - 720p - High>
[<Video: H.264 (.mp4) - 720p - High>]
/usr/lib/python2.6/site-packages/pytube/api.py:141: DeprecationWarning: videos property deprecated. Use `get_videos()` instead.
  "instead.", DeprecationWarning)
[<Video: MPEG-4 Visual (.3gp) - 144p - Simple>,
 <Video: MPEG-4 Visual (.3gp) - 240p - Simple>,
 <Video: Sorenson H.263 (.flv) - 240p - N/A>,
 <Video: H.264 (.mp4) - 360p - Baseline>,
 <Video: H.264 (.mp4) - 720p - High>,
 <Video: VP8 (.webm) - 360p - N/A>]
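The "highest resolution for a file type" selection can be illustrated without pytube at all. The sketch below uses plain stand-in tuples copied from the output above. `Video`, `height` and `best` are hypothetical names for this example and are not part of the pytube API.

```python
from collections import namedtuple

# Stand-in records built from the listing above (not real pytube objects).
Video = namedtuple("Video", "codec extension resolution")

VIDEOS = [
    Video("MPEG-4 Visual", "3gp", "144p"),
    Video("MPEG-4 Visual", "3gp", "240p"),
    Video("Sorenson H.263", "flv", "240p"),
    Video("H.264", "mp4", "360p"),
    Video("H.264", "mp4", "720p"),
    Video("VP8", "webm", "360p"),
]


def height(video):
    """Turn a resolution label such as '720p' into the integer 720."""
    return int(video.resolution.rstrip("p"))


def best(videos, extension):
    """Return the highest-resolution entry with the given file extension."""
    matches = [video for video in videos if video.extension == extension]
    if not matches:
        raise ValueError("no video with extension %r" % extension)
    return max(matches, key=height)


print(best(VIDEOS, "mp4"))
# prints Video(codec='H.264', extension='mp4', resolution='720p')
```

Taking an explicit `max` instead of the last list element does not depend on the list order, which matters here because the full listing ends with a 360p webm entry after the 720p mp4 one.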

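As you can see we now have a new file with the video we asked for, ready to watch without a streaming internet connection. Here is an ls to show it (the name ends in .mp4.mp4, apparently because the filename I set already included the extension):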

[23:33:02] jralph@jasonralph:~ $ ls -ltr
total 32352
drwxr-xr-x 2 root   root       4096 Mar 20 22:50 image_staging
drwxr-xr-x 2 root   root       4096 Mar 20 22:51 JR.ORG_SITE_BACKUPS
-rwxrwxr-x 1 jralph jralph      691 Apr  3 00:34 py_video_downloader.py
-rw-rw-r-- 1 jralph jralph 33113243 Apr  3 23:33 Pulp_Fiction.mp4.mp4
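Judging by the listing, the extension gets appended to whatever filename was set, which is how `Pulp_Fiction.mp4` turned into `Pulp_Fiction.mp4.mp4`. One way to guard against that is to strip a matching extension before setting the name. This helper is hypothetical and not part of pytube.

```python
import os


def strip_extension(filename, extension):
    """Drop a trailing '.<extension>' so it is not added a second time."""
    root, ext = os.path.splitext(filename)
    if ext.lower() == "." + extension.lower():
        return root
    return filename


print(strip_extension("Pulp_Fiction.mp4", "mp4"))  # prints Pulp_Fiction
```

The cleaned name could then be passed to `yt.set_filename(...)` in place of the literal string used in the script above.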

As always, I am sure there are better ways to do this and I am sure there is cleaner code. Most of this code was taken right from the site of the author, who is a badass. Here is his link:

https://github.com/nficano/pytube

Hope you liked,
J$0N

CYGWIN – clear.exe from scratch C Program

February 14, 2016 admin

Jason R. Ralph

Linux All Day Everyday

Tag: linux
