If you are looking to utilize Python to manipulate your directory tree or files on your system, there are many tools to help, including Python's standard os module. The following is a simple/basic recipe to assist with finding certain files on your system by file extension.
If you have had the experience of "losing" a file in your system where you don't remember its location and are not even sure of its name, though you remember its type, this is where you might find this recipe useful.
In a way this recipe is a combination of How to Traverse a Directory Tree and Recursive Directory Traversal in Python: Make a list of your movies!, but we'll tweak it a bit and build upon it in part two.
To script this task, we can use the
walk
function in the os.path
module or the walk
function in the os
module (using Python version 2.x or Python 3.x, respectively).Recursion with os.path.walk in Python 2.x
Theos.path.walk
function takes 3 arguments:arg
- an arbitrary (but mandatory) argument.visit
- a function to execute upon each iteration.top
- the top of the directory tree to walk.
arg
.Here is the definition of step:
os.path.walk
function, not by the user. The three arguments that walk passes on each iteration are:ext
- the arbitrary argument given toos.path.walk
.dirname
- the directory name for that iteration.names
- the names of all files underdirname
.
os.path.walk
.The second line ensures our
ext
string is lowercase. The
third line begins our loop of the argument names, which is a list type.
The fourth line is how we retrieve the names of files with the
extension we want, using the string method endswith
to test for a suffix.The final line prints the path of any file that passes the suffix (extension) test, concatenating the
dirname
argument to the name (with the appropriate system-dependent separator).Now after combining our step function with the walk function, the script looks something like this:
wx_py
installed in the site-packages for Python 2.7, the output looks like this:Recursion with os.walk in Python 3.x
Now let's do the same using Python 3.x.The
os.walk
function in Python 3.x works differently,
providing a few more options than the other. It takes 4 arguments, and
only the first is mandatory. The arguments (and their default values) in
order are:top
- the root of the directory to walk.topdown(=True)
- boolean designating top-down or bottom-up walking.onerror(=None)
- name of a function to call if an error occurs.followlinks(=False)
- boolean designating whether or not to follow symbolic links.
Instead of defining a separate function to call as with step we will write the
os.walk
generator into the loop that went into the step
function. Like the Python 2.x version, os.walk
produces 3 values we can use for every iteration (the directory path,
the directory names, and the filenames), but this time they are in the
form of a 3-tuple, so we have to adjust our method accordingly. Other
than that we won't change the extension suffix test at all, so the
script ends up looking something like this:with
statement (as in Reading and Writing Files in Python) like so:logname
, and the third argument to os.path.walk
. The with statement has replaced the print
statement. Because of the nature of os.path.walk
function, step
is required to open up the log file, write to it, and close it every
time it finds a file name; this won't cause any errors but is a bit
awkward. We must also note that because the log file is opened up in
append mode, it will not overwrite a log file that exists already, it will only append to the file. This means if we run the script 2 or more times in a row without changing the logname
, the results for each run will be added to the same file, which may not be desirable.The modified version Python 3.x script is much less awkward:
results
string, and then when the search is over, the results are written to
the log file. Unlike the Python 2.x version, the log file is opened in write mode, meaning any existing log file will
be overwritten. In both cases the log file will be written in the same
directory as the script (because we didn't specify a full path name).With that we have a simple script to find files of a certain extension under a file tree and log those results. In the parts that follow we'll build upon this adding functionality to search for multiple file types, avoid certain paths, and more.
No comments:
Post a Comment