{"id":791,"date":"2020-01-03T09:54:57","date_gmt":"2020-01-03T14:54:57","guid":{"rendered":"http:\/\/jasonralph.org\/?p=791"},"modified":"2021-11-11T16:45:51","modified_gmt":"2021-11-11T21:45:51","slug":"aws-cli-max-concurrent-requests-tuning","status":"publish","type":"post","link":"https:\/\/jasonralph.org\/?p=791","title":{"rendered":"AWS CLI Max Concurrent Requests Tuning"},"content":{"rendered":"<p>In this post I would like to go over how I tuned a test server for copying \/ syncing files from the local filesystem to S3 over the internet.  If you ever had the task of doing this, you will notice that as the file count grows, so does the time it takes to upload the files to S3.  After some web searching I found out that AWS allows you to tune the config to allow more concurrency than default.<br \/>\n<a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/topic\/s3-config.html\">AWS CLI S3 Config<\/a><\/p>\n<p>The parameter that we will be playing with is <strong>max_concurrent_requests<\/strong><br \/>\nThis has a default value of 10, which allows only 10 requests to the AWS API for S3.  Lets see if we can make some changes to that value and get some performance gains.  
My test setup is as follows:<\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n2 x Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz\r\n8GB RAM\r\nCentOS release 6.10 (Final)\r\n<\/pre>\n<p>I have 56 files of roughly 102MB each in the test directory:<\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_7.csv.gz\r\n-rw-r--r-- 1 jasonr domain^users 102M Sep 24 11:44 sample__0_0_53.csv.gz\r\n-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_6.csv.gz\r\n-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_8.csv.gz\r\n-rw-r--r-- 1 jasonr domain^users 101M Sep 24 11:44 sample__0_0_55.csv.gz\r\n--snip--\r\n[jasonr@jr-sandbox jason_test]$ ls| wc -l\r\n56\r\n<\/pre>\n<p>For the first test I will run aws s3 sync with no changes, so out of the box it uses the default of 10 max_concurrent_requests.  Let's use the Linux time command to measure how long it takes to copy all 56 files to S3.  I delete the folder on S3 between iterations so every run starts from the same state.  You can also count the open port 443 connections with netstat to see what is going on.  Across all my tests the best result came at 250, so as you can see you will need to play with the setting to find what works best; the sweet spot will change along with the server specs. <\/p>\n<p>1. 1m25.919s with the default configuration:<\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n[jasonr@jr-sandbox jason_test]$ time aws s3 sync . 
s3:\/\/dev-redshift\/jason_sync_test\/\r\nupload: .\/sample__0_0_0.csv.gz to s3:\/\/dev-redshift\/jason_sync_test\/sample__0_0_0.csv.gz\r\nupload: .\/sample__0_0_10.csv.gz to s3:\/\/dev-redshift\/jason_sync_test\/sample__0_0_10.csv.gz\r\nupload: .\/sample__0_0_11.csv.gz to s3:\/\/dev-redshift\/jason_sync_test\/sample__0_0_11.csv.gz\r\nupload: .\/sample__0_0_12.csv.gz to s3:\/\/dev-redshift\/jason_sync_test\/sample__0_0_12.csv.gz\r\nupload: .\/sample__0_0_13.csv.gz to s3:\/\/dev-redshift\/jason_sync_test\/sample__0_0_13.csv.gz\r\n--snip--\r\n\r\nreal\t1m25.919s\r\nuser\t0m35.153s\r\nsys\t0m15.879s\r\n<\/pre>\n<p>2. Now let's set max_concurrent_requests to 20 and try again; you can do this with the command below.  After running, we see a small gain. <\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 20\r\n[jasonr@jr-sandbox jason_test]$ cat ~\/.aws\/config \r\n[default]\r\ns3 =\r\n    max_concurrent_requests = 20\r\n[root@jr-sandbox ~]# netstat -an| grep 443| wc -l\r\n20\r\n\r\nreal\t1m13.277s\r\nuser\t0m36.186s\r\nsys\t0m16.462s\r\n<\/pre>\n<p>3. Bumping up to 50 shows a bit more gain:<\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 50\r\n[jasonr@jr-sandbox jason_test]$ cat ~\/.aws\/config \r\n[default]\r\ns3 =\r\n    max_concurrent_requests = 50\r\n\r\n[root@jr-sandbox ~]# netstat -an| grep 443| wc -l\r\n49\r\nreal\t1m0.720s\r\nuser\t0m37.669s\r\nsys\t0m19.344s\r\n<\/pre>\n<p>4. 
Bumping up to 100, I start to notice that we lose some speed:<\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 100\r\n[jasonr@jr-sandbox jason_test]$ cat ~\/.aws\/config \r\n[default]\r\ns3 =\r\n    max_concurrent_requests = 100\r\n[root@jr-sandbox ~]# netstat -an| grep 443| wc -l\r\n95\r\nreal\t1m4.212s\r\nuser\t0m39.737s\r\nsys\t0m21.847s\r\n<\/pre>\n<p>5. Bumping up to 250, we see the best result so far:<\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 250\r\n[jasonr@jr-sandbox jason_test]$ cat ~\/.aws\/config \r\n[default]\r\ns3 =\r\n    max_concurrent_requests = 250\r\n[root@jr-sandbox ~]# netstat -an| grep 443| wc -l\r\n234\r\nreal\t0m55.036s\r\nuser\t0m42.841s\r\nsys\t0m21.409s\r\n<\/pre>\n<p>6. Bumping up to 500, we lose performance, most likely due to machine resource limits. <\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n[jasonr@jr-sandbox jason_test]$ aws configure set default.s3.max_concurrent_requests 500\r\n[jasonr@jr-sandbox jason_test]$ cat ~\/.aws\/config \r\n[default]\r\ns3 =\r\n    max_concurrent_requests = 500\r\n[root@jr-sandbox ~]# netstat -an| grep 443| wc -l\r\n465\r\nreal\t1m16.593s\r\nuser\t0m50.336s\r\nsys\t0m25.806s\r\n<\/pre>\n<p>To wrap up: you can tune the number of concurrent requests the AWS CLI makes to S3, but you will need to experiment with the setting to find the value that works best for your machine. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post I would like to go over how I tuned a test server for copying \/ syncing files from the local filesystem to S3 over the internet. 
If you ever had the task of doing this, you will notice that as the file count grows, so does the time it takes to upload [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38,1],"tags":[62,78,12,79,80],"class_list":["post-791","post","type-post","status-publish","format-standard","hentry","category-coding-thoughts","category-general-code","tag-aws","tag-cli","tag-linux","tag-s3","tag-tuning"],"_links":{"self":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts\/791","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=791"}],"version-history":[{"count":7,"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts\/791\/revisions"}],"predecessor-version":[{"id":952,"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts\/791\/revisions\/952"}],"wp:attachment":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=791"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}