{"id":985,"date":"2022-07-07T11:28:23","date_gmt":"2022-07-07T15:28:23","guid":{"rendered":"https:\/\/jasonralph.org\/?p=985"},"modified":"2022-07-14T22:44:51","modified_gmt":"2022-07-15T02:44:51","slug":"python-linux-find-files-with-pattern-accessed-older-than-n-days","status":"publish","type":"post","link":"https:\/\/jasonralph.org\/?p=985","title":{"rendered":"Python Linux Find Files With Pattern Accessed Older Than N Days And Remove"},"content":{"rendered":"<p>This is a neat utility to keep in your sysadmin bag of tricks. It walks the directory you define recursively, collects every file that matches one of the configured patterns into a list, and compares each file&#8217;s access time against the number of days passed on the command line. If a file was last accessed more than N days ago, it is removed. What&#8217;s really nice about this utility is its debug mode: run it with <code>--debug yes<\/code> first to see what would be deleted, then run it with <code>--debug no<\/code> to actually remove the files. <\/p>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n#!\/usr\/bin\/env python3\r\n\r\nimport argparse\r\nimport fnmatch\r\nimport os\r\nimport sys\r\nfrom datetime import datetime, timedelta\r\nfrom pathlib import Path\r\n\r\n# set date now.\r\nnow = datetime.today()\r\n\r\n# setup dir to clean\r\nhome = str(Path.home())\r\ntarget_dir = '\/home\/jasonr' # CHANGE TO WHERE YOU WANT TO SEARCH\r\n\r\n# dir to clean\r\ndirs_to_clean = target_dir\r\n\r\n# setup cli arguments.\r\nparser = argparse.ArgumentParser(\r\n    description='''\r\n[--days_ago 60] will keep 60 days worth of files.\r\n[--debug yes] will print out statements with no actions.''',\r\n    formatter_class=argparse.RawTextHelpFormatter)\r\nparser.add_argument('--days_ago',\r\n                    help='[--days_ago NN]')\r\nparser.add_argument('--debug',\r\n                    help='[--debug (yes|no)]')\r\nargs = parser.parse_args()\r\n\r\n# allowed arguments from cli.\r\naccepted_cli_args = ['yes', 'no']\r\n\r\n# sanity check, assign days to keep on system.\r\nif 
args.days_ago is None:\r\n    days = 60\r\nelse:\r\n    days = args.days_ago\r\n\r\n# define a list of patterns\r\npatterns = ['*.csv', '*.txt'] # YOU CAN ADD ANY PATTERN TO LIST\r\n\r\n# sanity check, assign debug true or false\r\nif args.debug in accepted_cli_args:\r\n    if args.debug == 'yes':\r\n        debug = True\r\n    else:\r\n        debug = False\r\nelse:\r\n    print(\"{0}: Wrong parameter --debug (yes or no): [{1}]\"\r\n          .format(now, args.debug))\r\n    sys.exit(1)\r\n\r\n\r\ndef find_files(dir_to_clean):\r\n    file_list = []\r\n    days_ago = datetime.now() - timedelta(days=int(days))\r\n    for root, dirs, files in os.walk(dir_to_clean):\r\n        for pattern in patterns:\r\n            for filename in fnmatch.filter(files, pattern):\r\n                file_list.append(os.path.join(root, filename))\r\n\r\n    # sort once, after the walk has collected every match.\r\n    file_list.sort()\r\n\r\n    for file in file_list:\r\n        try:\r\n            file_atime = datetime.fromtimestamp(os.path.getatime(file))\r\n        except Exception as e:\r\n            print(\"{0}: File Access Time Get Failed: [{1}]\"\r\n                  .format(now, e))\r\n            # skip this file, there is no access time to compare.\r\n            continue\r\n        if file_atime < days_ago:\r\n            if os.path.isfile(file):\r\n                try:\r\n                    if not debug:\r\n                        print(\"{0}: Removing file: [{1}]\"\r\n                              .format(now, file))\r\n                        os.remove(file)\r\n                    else:\r\n                        print(\"{0}: DEBUG: Removing file: [{1}]\"\r\n                              .format(now, file))\r\n                except OSError as e:\r\n                    print(\"{0}: File Clean Up Failed: [{1}]\"\r\n                          .format(now, e))\r\n                    sys.exit(1)\r\n\r\n\r\n# main function.\r\ndef main():\r\n    find_files(dirs_to_clean)\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    main()\r\n<\/pre>\n<pre class=\"theme:solarized-dark lang:default decode:true \" >\r\n[jasonr@sb-jralph-8 
~]$ python3 finder.py --days_ago 90 --debug yes\r\n2022-07-07 11:22:57.524454: DEBUG: Removing file: [\/home\/jasonr\/aws\/dist\/awscli\/examples\/emr\/create-cluster-synopsis.txt]\r\n2022-07-07 11:22:57.524454: DEBUG: Removing file: [\/home\/jasonr\/aws\/dist\/cryptography-3.3.2-py3.8.egg-info\/top_level.txt]\r\n2022-07-07 11:22:57.524454: DEBUG: Removing file: [\/home\/jasonr\/aws\/dist\/docutils\/parsers\/rst\/include\/README.txt]\r\n2022-07-07 11:22:57.524454: DEBUG: Removing file: [\/home\/jasonr\/aws\/dist\/docutils\/parsers\/rst\/include\/isoamsa.txt]\r\n2022-07-07 11:22:57.524454: DEBUG: Removing file: [\/home\/jasonr\/aws\/dist\/docutils\/parsers\/rst\/include\/isoamsb.txt]\r\n2022-07-07 11:22:57.524454: DEBUG: Removing file: [\/home\/jasonr\/aws\/dist\/docutils\/parsers\/rst\/include\/isoamsc.txt]\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>This is a neat utility that you can use to keep in your sysadmin bag of tricks, it walks the directory you define recursively and grabs all the file access times and stores them into a list, it then compares them against a command line parameter for days ago. 
If it&#8217;s older than N days [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,4],"tags":[114,113,85,82,12,24],"class_list":["post-985","post","type-post","status-publish","format-standard","hentry","category-general-code","category-python","tag-access-time","tag-atime","tag-files","tag-find","tag-linux","tag-python-2"],"_links":{"self":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts\/985","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=985"}],"version-history":[{"count":4,"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts\/985\/revisions"}],"predecessor-version":[{"id":989,"href":"https:\/\/jasonralph.org\/index.php?rest_route=\/wp\/v2\/posts\/985\/revisions\/989"}],"wp:attachment":[{"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=985"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=985"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jasonralph.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=985"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}