Navigating Files and Directories¶
This tutorial assumes you have already opened or logged into a Unix Shell Environment. This year, we will be practicing these commands within the URI/UMASSD Unity Cluster.
The notes below are modified from the excellent course notes of Ryan Abernathey, who modified them from the Unix Shell tutorial that is freely available on the Software Carpentry website. I highly recommend checking out the full version for further reading. The material is being used here under the terms of the Creative Commons Attribution license.
]$
The dollar sign is a prompt, which shows us that the shell is waiting for input; your shell may use a different character as a prompt and may add information before the prompt.
Next, let's find out where we are by running a command called pwd (which stands for "print working directory"). This tells you where you are in the current directory structure. Recall from earlier in the course that the path reveals a folder's position within the directory structure. pwd provides the absolute path to your current position. In my case, it is /users/urihpc25:
]$ pwd
/users/urihpc25
Directory structure is the organization of files and other data objects into drives and folders.¶
- The root directory (written /) is the lowest level in the structure. This is usually equivalent to the hard drive.
- The subfolders sit within the root directory.
The path specifies the location of a file in the directory structure.¶
- A path lists the folders to pass through, separated by /, to reach a file or folder.
- A path can be absolute. Absolute paths are specified with respect to the root directory, so they begin with /.
- A path can be relative to the current location.
Paths are used to help executable files find the other files they need.
Example: If I have an .ipynb in /pylibrary/spongebob and I want to use it to load an image in /Users/Downloads, I can specify the location inside the notebook as:
imread('/Users/Downloads/myimage.png') # absolute path
# or
imread('../../Users/Downloads/myimage.png') # relative path, from /pylibrary/spongebob
The same two forms work with shell commands such as cd:
]$ cd ../brice # relative path, when I'm in spongebob
]$ cd /pylibrary/brice # absolute path
# The ../ sequence means "go up one level" in the directory structure.
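As a minimal sketch of the same idea at the command line, the snippet below builds a throwaway copy of the example folders inside a temporary directory and reaches brice once by a relative path and once by an absolute path. The variable name demo and the use of mktemp are assumptions made for illustration; they are not part of the course setup.

```shell
# Build a scratch copy of the example tree so nothing real is touched.
# "demo" is a hypothetical name; mktemp -d creates an empty temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/pylibrary/spongebob" "$demo/pylibrary/brice"

# Start inside spongebob and move to brice with a relative path.
cd "$demo/pylibrary/spongebob"
cd ../brice
via_relative=$(pwd)

# Go somewhere else, then reach brice with an absolute path.
cd /
cd "$demo/pylibrary/brice"
via_absolute=$(pwd)

# Both routes should print the same location.
echo "$via_relative"
echo "$via_absolute"
```

The relative form only works from spongebob (or another folder next to brice), while the absolute form works from anywhere.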
Listing contents¶
To see the contents of the current directory, use the command ls, which is short for "list":
]$ ls
anaconda batch.script batch_scripts dask-worker-space data pkgs README scratch TERMS this.png
ls prints the names of the files and directories in the current directory in alphabetical order. We can see more detail by typing ls -l:
]$ ls -l
drwxr-xr-x 5 urihpc25 uri_fall2021 4096 Oct 25 14:23 anaconda
lrwxrwxrwx. 1 urihpc25 uri_fall2021 27 Jun 17 2014 batch.script -> /gpfs/home/doc/batch.script
lrwxrwxrwx. 1 urihpc25 uri_fall2021 28 Jun 17 2014 batch_scripts -> /gpfs/home/doc/batch_scripts
lrwxrwxrwx. 1 urihpc25 uri_fall2021 23 Sep 14 09:50 data -> /gpfs/data/uri_fall2021
drwxr-xr-x 2 urihpc25 uri_fall2021 4096 Oct 25 18:13 datum
-rw-r--r-- 1 urihpc25 uri_fall2021 145777 Oct 25 19:21 lsexplore.png
drwxr-xr-x 2 urihpc25 uri_fall2021 4096 Oct 25 18:43 ocg404
drwxr-xr-x 2 urihpc25 uri_fall2021 4096 Oct 25 13:02 pkgs
lrwxrwxrwx. 1 urihpc25 uri_fall2021 21 Jun 17 2014 README -> /gpfs/home/doc/README
-rw-r--r-- 1 urihpc25 uri_fall2021 8686 Feb 19 2018 README.GTF
lrwxrwxrwx. 1 urihpc25 uri_fall2021 22 Sep 14 09:50 scratch -> /gpfs/scratch/urihpc25
lrwxrwxrwx. 1 urihpc25 uri_fall2021 20 Jun 17 2014 TERMS -> /gpfs/home/doc/TERMS
This format gives a whole lot of information. The first column contains the entry type (d for a directory, l for a link, - for a regular file) followed by the permissions for the owner, the group, and all other users. The 3rd column gives the owner (urihpc25), the 4th column the group name, the 5th column the file size in bytes, and columns 6-8 the modification date. This is a little like the text version of what you see in Finder on Mac or Explorer on Windows.
If you want to see all of the options for ls, type man ls to get the manual.
]$ man ls
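To see the long-listing columns on files you control, here is a small sketch that creates one folder and one file in a temporary directory and then pulls out the first character of the permissions column. The name demo is hypothetical, and cut (which extracts characters from each line) is not otherwise covered in these notes.

```shell
# Scratch area; "demo" is a hypothetical name used only for this sketch.
demo=$(mktemp -d)
mkdir "$demo/folder"
printf 'hello\n' > "$demo/file.txt"   # a 6-byte regular file

# Long listing of the scratch area: one line per entry.
ls -l "$demo"

# The first character of the first column is the entry type.
# -d lists the directory itself rather than its contents.
type_of_folder=$(ls -ld "$demo/folder" | cut -c1)
type_of_file=$(ls -l "$demo/file.txt" | cut -c1)
echo "folder: $type_of_folder   file: $type_of_file"
```

The last line should report d for the folder and - for the file, matching the first characters seen in the listing above.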
Navigating around¶
To change locations within the directories, use the cd command, which stands for "change directory".
The two commands below have the same result. The first uses a relative path and the second uses an absolute path to change into the data directory.
]$ cd data
]$ cd /users/urihpc25/data
]$ pwd
/users/urihpc25/data
This command moves up one level:
]$ cd ../
]$ pwd
/users/urihpc25
The .. is a special directory name meaning the parent of the current directory.
Now take a guess at what this command will do:
]$ pwd
/users/urihpc25
]$ cd ../../users
]$ pwd
Where have we ended up after that? Take a guess and give it a try.
NOTE: cd without an argument will return you to your home directory, which is great if you've gotten lost in your own filesystem.
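If you would like to practice these moves without touching your real files, the following sketch uses a made-up tree a/b/c inside a temporary directory. All names here (demo, a, b, c) are hypothetical.

```shell
# Scratch tree a/b/c; all names here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c"

cd "$demo/a/b/c"
cd ..            # up one level: now in a/b
one_up=$(pwd)

cd ../..         # up two more levels: now in the scratch directory itself
two_up=$(pwd)

cd a/b/c         # back down again with a relative path
cd               # no argument: jump to the home directory

echo "$one_up"
echo "$two_up"
pwd              # shows where the bare cd took us
```

Each .. in a path climbs exactly one level, so ../.. climbs two.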
Tab Completion¶
Some names can be long and cumbersome to type:
]$ ls dask-worker-space
Using tab completion, the shell will search the directory and offer matches. If you type
]$ ls da
and then press the Tab key (you may need to press it twice), the shell offers two options:
dask-worker-space/ data/
Adding one more letter, the 's', leaves only one option, so pressing Tab again will produce:
]$ ls dask-worker-space/
Arrow keys¶
By using the Up/Down arrow keys at the command prompt, you can cycle through your command history. This is useful if you need to repeat some commands or refresh your memory on what you did.
Key Points:¶
- The file system is responsible for managing information on the disk.
- Information is stored in files, which are stored in directories (folders).
- Directories can also store other directories, which forms a directory tree.
- cd path changes the current working directory.
- ls path prints a listing of a specific file or directory; ls on its own lists the current working directory.
- pwd prints the user's current working directory.
- / on its own is the root directory of the whole file system.
- A relative path specifies a location starting from the current location.
- An absolute path specifies a location from the root of the file system.
- Directory names in a path are separated with '/' on Unix, but '\' on Windows.
- '..' means 'the directory above the current one'; '.' on its own means 'the current directory'.
Make, Copy, Delete¶
We're about to generate some new Python code and files; to keep things organized, make a new directory:
]$ mkdir ocg404
mkdir means "make directory". Since ocg404 is a relative path (i.e., doesn't have a leading slash), the new directory is created in the current working directory, and ls reveals its existence:
]$ ls
anaconda batch.script batch_scripts dask-worker-space data ocg404 pkgs README scratch TERMS this.png
We can add a .py script to this directory by using the Nano editor and passing it the name of our new file, which we just made up:
]$ nano ocg404/script1.py
Type in a few lines of code:
print("Hello, world")
Once done, press Ctrl-X (press the Ctrl or Control key and, while holding it down, press the X key). You will be asked if you want to save the file. Choose yes (press Y, then Enter to confirm the file name), and the file is saved and closed.
If you want to make a new version of script1.py, but not lose the original version, then you can make a copy:
]$ cd ocg404/
]$ cp script1.py script2.py
To rename the file instead of copying it, use the mv or move command:
]$ mv script1.py script2.py
After this, script1.py will cease to exist. Note that, by default, mv replaces script2.py without warning if a file with that name already exists.
To delete a file, use the rm command. Assuming we are still inside ocg404/, this deletes script2.py:
]$ rm script2.py
Likewise, this achieves the same result if our current directory is one level up, in /users/urihpc25:
]$ rm ocg404/script2.py
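The make, copy, rename, and delete steps above can be replayed end to end in a scratch directory. This is only a sketch: demo is a hypothetical name, printf with > stands in for typing the line into Nano, and the renamed file is called script3.py here so that each step leaves a visibly different result.

```shell
# Work inside a throwaway directory; "demo" is a hypothetical name.
demo=$(mktemp -d)
cd "$demo"

mkdir ocg404
# Stand-in for editing with nano: write one line of Python into the file.
printf 'print("Hello, world")\n' > ocg404/script1.py

cd ocg404
cp script1.py script2.py   # copy: script1.py and script2.py both exist
mv script1.py script3.py   # rename: script1.py is gone, script3.py appears
rm script2.py              # delete: script2.py is gone for good

ls                         # only script3.py should remain
```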
NOTE: Deleting Is Forever
The Unix shell doesn't have a trash bin that we can recover deleted files from (though most graphical interfaces to Unix do). Instead, when we delete files, they are unhooked from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there's no guarantee they'll work in any particular situation, since the computer may recycle the file's disk space right away.
Suppose we want to delete the ocg404/ directory and all of its contents:
]$ rm ocg404/
rm: cannot remove ‘ocg404’: Is a directory
To delete the folder and all of its contents, we can tell rm to recursively descend into the directory (-r) and remove everything without asking for confirmation (-f):
]$ rm -rf ocg404
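Because rm -rf cannot be undone, it can help to try the different ways of removing a directory on disposable folders first. The sketch below assumes a temporary directory (demo and empty_dir are hypothetical names) and also uses rmdir, a command not covered above that removes a directory only when it is empty.

```shell
# Disposable playground; "demo" and "empty_dir" are hypothetical names.
demo=$(mktemp -d)
cd "$demo"
mkdir ocg404 empty_dir
touch ocg404/script2.py    # ocg404 now contains one (empty) file

# Plain rm refuses to remove a directory; hide its error and say so instead.
rm ocg404 2>/dev/null || echo "plain rm refuses to remove a directory"

# rmdir removes a directory only if it is empty, which makes it a safe default.
rmdir empty_dir

# -r is needed for a directory that still has contents.
rm -r ocg404

ls                         # prints nothing: both directories are gone
```

Leaving out -f means rm may still ask before removing write-protected files, which is a small extra safety net.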
Wild cards¶
Shell commands have lots of shortcuts for handling many files. This is necessary because, without a mouse, you can't do things like SHIFT + Click to select multiple files. Wildcards such as * are a great example of this.
]$ ls *.* # List all files whose names contain a dot, whatever the extension (e.g. .csv, .txt, .doc)
]$ ls *.csv # List all files of type .csv
]$ rm *.png # Delete all files of type .png. This is powerful, but also dangerous!
]$ ls 2021* # List all the files that begin with 2021, regardless of their extension.
In my ocg404 directory, I have a bunch of geotiff files and associated metadata. I want to get a list of the Landsat 5 files. These all start with LT05:
[urihpc25@login005 ocg404]$ ls -l LT05* # List all the files that begin with LT05, regardless of their extension.
-rw-r--r-- 1 urihpc25 uri_fall2021 34884 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_ANG.txt
-rw-r--r-- 1 urihpc25 uri_fall2021 60124178 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_B1.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 60124178 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_B2.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 60124178 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_B3.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 60124178 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_B4.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 60124178 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_B5.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 60124178 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_B6.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 60124178 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_B7.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 120189198 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_BQA.TIF
-rw-r--r-- 1 urihpc25 uri_fall2021 17395 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_GCP.txt
-rw-r--r-- 1 urihpc25 uri_fall2021 7392 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_MTL.txt
-rw-r--r-- 1 urihpc25 uri_fall2021 132825010 Oct 25 20:16 LT05_L1TP_012031_20110902_20160831_01_T1.tar.gz
-rw-r--r-- 1 urihpc25 uri_fall2021 261826 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_VER.jpg
-rw-r--r-- 1 urihpc25 uri_fall2021 119248 Aug 30 2016 LT05_L1TP_012031_20110902_20160831_01_T1_VER.txt
These are a few simple examples of what are referred to as filename substitutions.
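Since rm with a wildcard is powerful but dangerous, one cautious habit is to preview what a pattern matches before deleting anything. The sketch below works in a temporary directory with made-up file names; putting echo in front of a command simply prints the command after the shell has filled in the wildcard.

```shell
# Scratch directory with a few made-up files; all names are hypothetical.
demo=$(mktemp -d)
cd "$demo"
touch 2021_a.csv 2021_b.txt 2020_c.csv plot1.png plot2.png

echo *.csv        # prints: 2020_c.csv 2021_a.csv
echo 2021*        # prints: 2021_a.csv 2021_b.txt

# Preview: echo shows the rm command exactly as the shell would run it.
echo rm *.png     # prints: rm plot1.png plot2.png

# Only after checking the preview, delete for real.
rm *.png
ls
```

The shell expands the pattern before rm ever runs, so rm itself never sees the *.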
Key Points¶
- cp old new copies an existing file named old into a new file named new.
- mkdir path creates a new directory.
- mv old new moves (renames) a file or directory.
- rm path removes (deletes) a file.
- Use of the Control key may be described in many ways, including Ctrl-X, Control-X, and ^X.
- The shell does not have a trash bin: once something is deleted, it's really gone.
- Depending on the type of work you do, you may need a more powerful text editor than Nano.
Additional notes:¶
Good names for files and directories¶
Complicated names of files and directories can make your life painful when working on the command line. Here we provide a few useful tips for the names of your files.
- Don't use whitespaces. Whitespaces can make a name more meaningful, but since whitespace is used to separate arguments on the command line, it is better to avoid them in the names of files and directories. You can use - or _ instead of whitespace.
- Don't begin the name with - (dash). Commands treat names starting with - as options.
- Stick with letters, numbers, . (period), - (dash) and _ (underscore). Many other characters have special meanings on the command line, and might confuse your operation.
If you need to refer to names of files or directories that have whitespace or another non-alphanumeric character, you should surround the name in quotes ("").
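To illustrate the naming tips, the sketch below creates two awkwardly named files in a temporary directory and then cleans them up. The file names are invented for the example, and the ./ prefix used for the dash case is one common workaround rather than something covered in these notes.

```shell
# Scratch directory; the awkward file names are made up for this sketch.
demo=$(mktemp -d)
cd "$demo"

# A name with a space must be quoted, or the shell sees two arguments.
touch "my notes.txt"
ls -l "my notes.txt"

# Renaming it with an underscore removes the need for quotes.
mv "my notes.txt" my_notes.txt

# A name starting with a dash looks like an option to most commands.
# Prefixing it with ./ (the current directory) makes it clearly a path.
touch ./-draft.txt
rm ./-draft.txt

ls                # only my_notes.txt should remain
```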
Which Editor?¶
When we say, "nano is a text editor," we really do mean "text": it can only work with plain character data, not tables, images, or any other human-friendly media. We use it in examples because it is one of the least complex text editors. However, because of this trait, it may not be powerful enough or flexible enough for the work you need to do after this workshop. On Unix systems (such as Linux and Mac OS X), many programmers use Emacs or Vim (both of which require more time to learn), or a graphical editor such as Gedit. On Windows, you may wish to use Notepad++. Windows also has a built-in editor called notepad that can be run from the command line in the same way as nano for the purposes of this lesson.
No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. If you use your computer's start menu, it may want to save files in your desktop or documents directory instead. You can change this by navigating to another directory the first time you "Save As..."