ImportError: cannot import name ‘is_s3express_bucket’

October 31, 2024 admin

Another day, another odd error. My colleague and I were alerted to an issue on one of our MWAA Airflow 2.5.1 environments.

ImportError: cannot import name 'is_s3express_bucket'

The above error occurred when a new DAG tried to instantiate S3 via botocore/boto3. We traced it down to the s3transfer package. Apparently, in this commit:
https://github.com/boto/s3transfer/commit/3b50c31bb608188cdfb0fc7fd8e8cd03b6b7b187

Support was added for S3 Express, and from what we gathered we needed to upgrade three packages in our constraints.txt to solve the import issue.

=========================================
--constraint "/usr/local/airflow/dags/constraints.txt"

airflow==2.5.1
s3transfer==0.10.3
boto3==1.35.52
botocore==1.35.52
...................................
# Other existing packages
=========================================

After making these package version changes we were back in business.
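
As a quick sanity check after an upgrade like this, a short script can compare the installed versions against the pins. The sketch below is illustrative rather than part of our actual fix: find_mismatches is a hypothetical helper name, and the pins are simply the three versions listed above.

```python
from importlib import metadata

# Pins copied from the constraints listing above.
PINS = {"s3transfer": "0.10.3", "boto3": "1.35.52", "botocore": "1.35.52"}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Running it inside the environment prints one line per package that is missing or at a different version, and nothing when all three pins match.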

Redshift Serverless Data Sharing Query aborted due to read failure on a perm block

May 1, 2024 admin

Here is an interesting error that we recently encountered with one of our Redshift Serverless and Redshift Provisioned clusters. We have a data sharing setup where the serverless DB is the producer cluster of certain key tables. We share these tables to a provisioned Redshift cluster via data sharing.

When querying this particular table on the provisioned cluster through the data share with Python (psycopg2) and Airflow, we received the following error:

2024-04-29 18:06:25,073 - ERROR -   | Query aborted due to read failure on a perm block.
  | HINT:  Please try again.
2024-04-29 18:06:25,073 - ERROR -   | Stacktrace:
2024-04-29 18:06:25,073 - ERROR -   | Traceback (most recent call last):
  |   File "/usr/local/airflow/.local/lib/python3.10/site-packages/soda/execution/query/query.py", line 122, in fetchone
  |     cursor.execute(self.sql)
  | psycopg2.errors.IoError: Query aborted due to read failure on a perm block.
  | HINT:  Please try again.

We opened a support case with AWS and were informed that this is due to a metadata mismatch that can be resolved by running an update against the shared table on the producer side. After running this update we were back in business and things operated as normal. AWS support described the mitigation as follows:

The following query can be executed on the producer cluster as a mitigation: “UPDATE <TABLE_NAME> SET <COLUMN_NAME> = 1 WHERE false;” where TABLE_NAME is the name of the table on which queries are failing and COLUMN_NAME is the name of any column in this table. This query will not result in any actual change to the producer’s data, but will result in synchronizing the metadata pertaining to TABLE_NAME on the consumer and thus letting subsequent datasharing queries go through successfully.

"UPDATE <TABLE_NAME> SET <COLUMN_NAME> = 1 WHERE false;"

— Jason Ralph
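
If several shared tables are affected, the mitigation statement above can be generated rather than typed by hand. This is only a sketch: build_noop_update and quote_ident are hypothetical helpers, the table and column names in the example are made up, and the statement still has to be run on the producer with whatever client you normally use.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_noop_update(table: str, column: str) -> str:
    """Build the no-op UPDATE from the AWS guidance for one table and column.

    `table` may be schema-qualified (for example "sales.orders").
    """
    qualified = ".".join(quote_ident(part) for part in table.split("."))
    return f"UPDATE {qualified} SET {quote_ident(column)} = 1 WHERE false;"


# Hypothetical table and column names, for illustration only.
print(build_noop_update("sales.orders", "id"))
```

Because the WHERE clause is always false, the generated statement matches no rows, which is the behavior described in the AWS guidance.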

AWS Apache Managed Airflow EMR ModuleNotFoundError: No module named ‘requests’ Bootstrap

November 2, 2021 (updated November 9, 2021) admin

I came across another fun one the other day. We are in the process of migrating our on-premises elastic map reduce system into the cloud. We are using AWS EMR, with AWS Managed Airflow running the DAGs. We came across an odd situation with a PySpark application. When using Airflow with a SparkSubmitHook, the job would bootstrap just fine according to the run logs, but it would fail with No module named 'requests' when the application tried to import it. This was very odd, since the same application runs just fine via spark-submit when called from the master node command line.

I decided to investigate the differences. Our bootstrap script for installing Python modules via pip, which we call from the EMR API RunJobFlow call, looks like this:

#!/bin/bash
pip_bin=pip3
${pip_bin} install --user -U pip
${pip_bin} install --user boto3
${pip_bin} install --user boto
${pip_bin} install --user requests
${pip_bin} install --user psycopg2-binary

This is very basic: all it does is upgrade pip and then run pip install for each of the modules. When checking the bootstrap log I could see that pip upgraded, went out to the repo, and installed the packages just fine. So why were we getting the No module named 'requests' error when executing through Airflow?

After a ton of googling and research I found the issue and applied a solution that worked. It turns out Airflow runs as the root user when bootstrapping, and as you may have noticed we use the --user argument in pip. This tells pip to install the packages into the calling user's home directory. The kicker is that the code is run by the hadoop user on the EMR cluster nodes after executing from Airflow, so the hadoop user is unable to access the requests module, since root installed it with --user.

I changed the bootstrap script to the following and it all started working. By removing --user and prefixing with sudo, the packages now get installed in a location that is available to all users. I am sure there are better ways to do this, and I am still learning and researching, but if you run into this, the change below will get you out of the woods.

#!/bin/bash
sudo python3 -m pip install \
    boto3 \
    boto \
    requests \
    psycopg2-binary
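
To confirm this kind of per-user mismatch on a node, a small diagnostic can report where a module would be imported from for whoever runs it. This is a sketch, not something from our bootstrap: describe_module is a hypothetical helper, and it only uses the standard library.

```python
import importlib.util
import os
import site
import sysconfig


def describe_module(name):
    """Report whether `name` is importable for the current user, and from where."""
    spec = importlib.util.find_spec(name)
    return {
        "user": os.environ.get("USER", "unknown"),
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        # Where `pip install --user` puts packages for this user.
        "user_site": site.getusersitepackages(),
        # Where a system-wide install puts pure-Python packages.
        "global_site": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in describe_module("requests").items():
        print(f"{key}: {value}")
```

Running it once as root and once as hadoop should show different user_site paths, and found will be False for whichever user cannot see the package.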

After some further research and testing, we decided to use a requirements.txt file that is called by the bootstrap shell script in the RunJobFlow call. First create a requirements.txt file. I like to hardcode the versions so nothing changes unexpectedly when you bootstrap a new cluster and it reaches out to PyPI to get the packages.

https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html

Add your desired packages and version numbers to a file called requirements.txt like below:

boto3==1.17.54
boto==2.49.0
requests==2.18.4
psycopg2-binary==2.8.6
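
Since the point of the file is that every version is hardcoded, a small check can flag any line that is not an exact pin before the file is uploaded. This is a sketch under simple assumptions: unpinned_lines is a hypothetical helper, and it only recognizes plain name==version lines (no extras, markers, or URLs).

```python
import re

# Matches a plain "name==version" requirement and nothing fancier.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[^\s=]+$")


def unpinned_lines(text):
    """Return the requirement lines in `text` that are not exact name==version pins."""
    bad = []
    for line in text.splitlines():
        stripped = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if stripped and not PINNED.match(stripped):
            bad.append(stripped)
    return bad


requirements = """\
boto3==1.17.54
boto==2.49.0
requests==2.18.4
psycopg2-binary==2.8.6
"""
print(unpinned_lines(requirements))
```

An empty list means every line is pinned; anything returned is a line that could resolve to a different version on the next cluster bootstrap.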

Then you will need to copy this file into a bucket you have access to:

aws s3 cp requirements.txt s3://YOUR_S3_BUCKET_NAME/requirements.txt

Then create a shell script that has the following, call it bootstrap.sh:

#!/bin/bash

set -x 

echo '-----------RUNNING BOOTSTRAP------------------------'

echo '-----------COPYING REQUIREMENTS FILE LOCALLY--------'

aws s3 cp s3://YOUR_S3_BUCKET_NAME/requirements.txt .

echo '-----------INSTALLING REQUIREMENTS------------------'

sudo python3 -m pip install -r requirements.txt

echo '-----------DONE BOOTSTRAP---------------------------'

Copy that shell script to your bucket:

aws s3 cp bootstrap.sh s3://YOUR_S3_BUCKET_NAME/bootstrap.sh

And execute it via the bootstrap actions in the RunJobFlow EMR API call:

"BootstrapActions": [
    {
      "Name": "string",
      "ScriptBootstrapAction": {
        "Path": "s3://YOUR_S3_BUCKET_NAME/bootstrap.sh"
      }
    }
  ],
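
If the cluster is launched from Python, the same fragment can be built as a plain data structure. This is a minimal sketch: bootstrap_actions is a hypothetical helper, the action name is made up, and the comment about boto3 assumes you pass the result as the BootstrapActions argument of the EMR client's run_job_flow call along with the rest of your cluster definition.

```python
import json


def bootstrap_actions(bucket, script="bootstrap.sh", name="Install Python requirements"):
    """Build the BootstrapActions list pointing at a script stored in S3."""
    return [
        {
            "Name": name,
            "ScriptBootstrapAction": {"Path": f"s3://{bucket}/{script}"},
        }
    ]


actions = bootstrap_actions("YOUR_S3_BUCKET_NAME")
print(json.dumps({"BootstrapActions": actions}, indent=2))

# With boto3 this list would be passed along with the other cluster settings, e.g.
# boto3.client("emr").run_job_flow(..., BootstrapActions=actions)
```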

As you can see, the shell script will be executed, which copies the requirements.txt file locally and then runs pip install -r against it to install all the packages. If you want to see the log on a running cluster, you can SSH to the master node and view the logs here to watch the bootstrapping take place:

/emr/instance-controller/log/bootstrap-actions

You should see the stdout log like so:

-----------RUNNING BOOTSTRAP------------------
-----------COPYING REQUIREMENTS FILE LOCALLY--------
Completed 67 Bytes/67 Bytes (629 Bytes/s) with 1 file(s) remaining
download: s3://YOUR_S3_BUCKET_NAME/requirements.txt to ./requirements.txt
-----------INSTALLING REQUIREMENTS------------------
Collecting boto==2.48.0
  Downloading boto-2.48.0-py2.py3-none-any.whl (1.4 MB)
Collecting boto3==1.6.15
  Downloading boto3-1.6.15-py2.py3-none-any.whl (128 kB)
Collecting requests==2.18.4
  Downloading requests-2.18.4-py2.py3-none-any.whl (88 kB)
Collecting psycopg2-binary==2.8.6
  Downloading psycopg2_binary-2.8.6-cp37-cp37m-manylinux1_x86_64.whl (3.0 MB)
Collecting botocore<1.10.0,>=1.9.15
  Downloading botocore-1.9.23-py2.py3-none-any.whl (4.1 MB)
Collecting s3transfer<0.2.0,>=0.1.10
  Downloading s3transfer-0.1.13-py2.py3-none-any.whl (59 kB)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.7/site-packages (from boto3==1.6.15->-r jason_requirements.txt (line 2)) (0.10.0)
Collecting urllib3<1.23,>=1.21.1
  Downloading urllib3-1.22-py2.py3-none-any.whl (132 kB)
Collecting certifi>=2017.4.17
  Downloading certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting idna<2.7,>=2.5
  Downloading idna-2.6-py2.py3-none-any.whl (56 kB)
Collecting chardet<3.1.0,>=3.0.2
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Requirement already satisfied: docutils>=0.10 in /usr/lib/python3.7/site-packages (from botocore<1.10.0,>=1.9.15->boto3==1.6.15->-r jason_requirements.txt (line 2)) (0.14)
Collecting python-dateutil<2.7.0,>=2.1
  Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194 kB)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/site-packages (from python-dateutil<2.7.0,>=2.1->botocore<1.10.0,>=1.9.15->boto3==1.6.15->-r jason_requirements.txt (line 2)) (1.13.0)
Installing collected packages: boto, python-dateutil, botocore, s3transfer, boto3, urllib3, certifi, idna, chardet, requests, psycopg2-binary
  Attempting uninstall: boto
    Found existing installation: boto 2.49.0
    Uninstalling boto-2.49.0:
      Successfully uninstalled boto-2.49.0
Successfully installed boto-2.48.0 boto3-1.6.15 botocore-1.9.23 certifi-2021.10.8 chardet-3.0.4 idna-2.6 psycopg2-binary-2.8.6 python-dateutil-2.6.1 requests-2.18.4 s3transfer-0.1.13 urllib3-1.22
-----------DONE BOOTSTRAP---------------------

Hope this helps.

Capture AWS CLI Output With Timestamps On Each Line Of Output

December 31, 2020 admin

I needed a way to capture output from the aws cli into a log file with timestamps. Out of the box, the aws cli output has no timestamps. If you execute an aws s3 cp command, something like this:

aws s3 cp s3://jason-test-bucket-1/test_part_00 s3://jason-test-bucket-2/jason_test/

You will see output like so:

copy: s3://jason-test-bucket-1/test_part_00 to s3://jason-test-bucket-2/jason_test/test_part_00

As you can see, the aws cli output does not include a timestamp on each event. So I scoured the internet and found out some interesting things. It turns out that the aws cli, out of the box, separates its output with carriage returns instead of newlines, so standard awk piping methods were not working. The aws cli also lets you change the output format, so I needed to add a cli parameter to set the output to text. Next we needed to use tr to substitute the carriage returns with newlines. Finally, we can pipe to awk and print a timestamp on each output event from the aws cli. The final command looks like this:

#!/bin/bash
log='test.log'
aws s3 --output text cp s3://jason-test-bucket-1/test_part_00 s3://jason-test-bucket-2/jason_test/ | tr "\r" "\n" > >(awk '{print strftime("%Y-%m-%d:%H:%M:%S ") $0}') | tee >> $log 2>&1

Produces the following in the log file which is my desired result:

2020-12-31:13:32:13 Completed 726.3 KiB/726.3 KiB (3.8 MiB/s) with 1 file(s) remaining
2020-12-31:13:32:13 copy: s3://jason-test-bucket-1/test_part_00 to s3://jason-test-bucket-2/jason_test/test_part_00
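
The same idea, splitting on carriage returns and stamping each event, can be sketched in Python for cases where the output is captured by a script instead of a shell pipeline. This is an illustration, not what I ran: timestamp_lines is a hypothetical helper, and unlike the awk version it skips empty chunks.

```python
import re
from datetime import datetime


def timestamp_lines(raw, now=datetime.now):
    """Split `raw` on carriage returns or newlines and prefix each chunk with a timestamp."""
    stamped = []
    for chunk in re.split(r"[\r\n]+", raw):
        if chunk.strip():  # ignore empty pieces between separators
            stamped.append(f"{now().strftime('%Y-%m-%d:%H:%M:%S')} {chunk}")
    return stamped


# Sample text shaped like the aws cli output above, with \r between events.
sample = "Completed 726.3 KiB/726.3 KiB (3.8 MiB/s) with 1 file(s) remaining\rcopy: a to b\n"
for line in timestamp_lines(sample):
    print(line)
```

The now parameter exists only so the clock can be swapped out when testing; in normal use the default is fine.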

I hope this helps someone else as it was a bear to solve for me.

AWS CLI Max Concurrent Requests Tuning

January 3, 2020 (updated November 11, 2021) admin

In this post I would like to go over how I tuned a test server for copying/syncing files from the local filesystem to S3 over the internet. If you have ever had to do this, you will have noticed that as the file count grows, so does the time it takes to upload the files to S3. After some web searching I found out that AWS allows you to tune the config to allow more concurrency than the default (see the AWS CLI S3 Config documentation).

The parameter that we will be playing with is max_concurrent_requests. It has a default value of 10, which allows only 10 concurrent requests to the AWS API for S3. Let's see if we can change that value and get some performance gains. My test setup is as follows:

2 x Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
8GB RAM
CentOS release 6.10 (Final)

I have 56 files of roughly 102 MB each in the test directory:

-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_7.csv.gz
-rw-r--r-- 1 jasonr domain^users 102M Sep 24 11:44 sample__0_0_53.csv.gz
-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_6.csv.gz
-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_8.csv.gz
-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_55.csv.gz
--snip--
[jasonr@jr-sandbox jason_test]$ ls| wc -l
56

For the first test I am going to run aws s3 sync with no changes, so out of the box it should use a max_concurrent_requests of 10. Let's use the Linux time command to measure how long it takes to copy all 56 files to S3. I will delete the folder on S3 before each iteration to keep the test the same. You can also view the connections on port 443 via netstat and count them to show what's going on. Across all the tests, my best result came with a setting of 250. As you will see, you need to play with the setting to get the best result, and the best value will change along with the server specs.

1. 1m25.919s with the default configuration:

[jasonr@jr-sandbox jason_test]$ time aws s3 sync . s3://dev-redshift/jason_sync_test/
upload: ./sample__0_0_0.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_0.csv.gz
upload: ./sample__0_0_10.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_10.csv.gz
upload: ./sample__0_0_11.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_11.csv.gz
upload: ./sample__0_0_12.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_12.csv.gz
upload: ./sample__0_0_13.csv.gz to s3://dev-redshift/jason_sync_test/sample__0_0_13.csv.gz
--snip--

real	1m25.919s
user	0m35.153s
sys	0m15.879s

2. Now let's set max_concurrent_requests to 20 and try again. You can do this with the command below. After running, we can see a little gain:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 20
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 20
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
20

real	1m13.277s
user	0m36.186s
sys	0m16.462s
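
When scripting test runs like these, it can help to confirm which value the config file currently holds. The following is a minimal sketch, assuming the file looks like the one shown above: read_max_concurrent_requests is a hypothetical helper, it only handles the [default] style section name, and the fallback of 10 is the default mentioned earlier.

```python
import configparser

SAMPLE = """\
[default]
s3 =
    max_concurrent_requests = 20
"""


def read_max_concurrent_requests(config_text, profile="default", fallback=10):
    """Return s3.max_concurrent_requests for `profile`, or `fallback` if it is not set."""
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(profile) or not parser.has_option(profile, "s3"):
        return fallback
    # The indented lines under "s3 =" arrive as one multi-line value.
    for line in parser.get(profile, "s3").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "max_concurrent_requests":
            return int(value.strip())
    return fallback


print(read_max_concurrent_requests(SAMPLE))
```

To check the real file, pass in the contents of ~/.aws/config instead of SAMPLE.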

3. Bumped up to 50 shows a bit more gain:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 50
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 50

[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
49
real	1m0.720s
user	0m37.669s
sys	0m19.344s

4. Bumped up to 100, I start to notice that we lost some speed:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 100
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 100
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
95
real	1m4.212s
user	0m39.737s
sys	0m21.847s

5. Bumped up to 250 we see the best result so far:

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 250
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 250
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
234
real	0m55.036s
user	0m42.841s
sys	0m21.409s

6. Bumped up to 500, we lose performance, most likely due to the machine resources.

[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 500
[jasonr@jr-sandbox jason_test]$ cat ~/.aws/config 
[default]
s3 =
    max_concurrent_requests = 500
[root@jr-sandbox ~]# netstat -an| grep 443| wc -l
465
real	1m16.593s
user	0m50.336s
sys	0m25.806s

So to wrap up, you can tune the number of concurrent requests allowed from the aws cli to S3. You will need to play with this setting to get the best results for your machine.
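
To compare the runs side by side, the real times from the six tests above can be converted to seconds and ranked. This is just a convenience sketch: to_seconds and summarize are hypothetical helper names, and the numbers are copied from the time output in each step.

```python
import re

# "real" times from the six runs above, keyed by max_concurrent_requests.
REAL_TIMES = {
    10: "1m25.919s",
    20: "1m13.277s",
    50: "1m0.720s",
    100: "1m4.212s",
    250: "0m55.036s",
    500: "1m16.593s",
}


def to_seconds(stamp):
    """Convert a bash `time` value such as '1m25.919s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarize(times):
    """Return (fastest_setting, {setting: seconds})."""
    seconds = {setting: to_seconds(value) for setting, value in times.items()}
    return min(seconds, key=seconds.get), seconds


best, seconds = summarize(REAL_TIMES)
for setting, secs in sorted(seconds.items()):
    print(f"{setting:>4}: {secs:7.3f}s  ({seconds[10] / secs:.2f}x vs default)")
print("fastest setting:", best)
```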

PSQL Connect To AWS Redshift From Windows 10 PowerShell

March 16, 2018 (updated December 20, 2019) admin

Coming from a completely Linux background, I was tasked with connecting to an AWS Redshift cluster or a Postgres cluster via Windows PowerShell and psql. I knew it was possible, but searching the internet only came up with CMD prompt solutions. You will need to install Postgres on Windows 10 to get access to the psql binary, which you can get here:
https://www.postgresql.org/download/windows/

When I attempted the connection via PowerShell, I was faced with the following error:

PS C:\WINDOWS\system32> psql.exe -h afs-rs-dev02.us-east-1.redshift.amazonaws.com  -p 5439 -U awsmaster benchmark01
Password for user awsmaster:
psql: FATAL:  invalid value for parameter "client_encoding": "WIN1252"

It turns out you need to set the PGCLIENTENCODING environment variable on the PowerShell command line. We expected as much, but could not nail down the syntax at first. A colleague of mine and I eventually found it:

PS C:\WINDOWS\system32> $env:PGCLIENTENCODING='utf-8';
PS C:\WINDOWS\system32> psql.exe -h afs-rs-dev02.us-east-1.redshift.amazonaws.com  -p 5439 -U awsmaster benchmark01
Password for user awsmaster:
psql (10.1, server 8.0.2)
WARNING: Console code page (437) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

benchmark01=#

Once this is set, you can connect to PG as normal.
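
If psql is launched from a script rather than typed into PowerShell, the same variable can be set for just that process. This is a sketch under the assumption that psql is on the PATH: psql_command is a hypothetical helper, and the host, port, user, and database are the ones from the session above.

```python
import os


def psql_command(host, port, user, database, encoding="utf-8"):
    """Return (argv, env) for launching psql with PGCLIENTENCODING set."""
    # Copy the current environment and add the client encoding on top.
    env = dict(os.environ, PGCLIENTENCODING=encoding)
    argv = ["psql", "-h", host, "-p", str(port), "-U", user, database]
    return argv, env


argv, env = psql_command(
    "afs-rs-dev02.us-east-1.redshift.amazonaws.com", 5439, "awsmaster", "benchmark01"
)
print(" ".join(argv))

# To actually connect, hand both to subprocess:
# import subprocess; subprocess.run(argv, env=env)
```

Setting the variable on a copy of the environment keeps the change local to the psql process instead of the whole session.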

Jason R. Ralph

Linux All Day Everyday

Tag: aws
