Move running commands in Linux to blog website

author: alex <alex@pdp7.net> 2026-02-22 18:23:37 +0100
committer: alex <alex@pdp7.net> 2026-02-22 18:26:11 +0100
commit: 7993f53c88f341a18f8823c4d9855dc895125bda (patch)
tree: 96740386fff0a7e265e649a1282dca08cff33c89 /blog/content/notes/tech/running-commands-in-linux.gmi
parent: 1e12a1428290563788102b57740d572b491e8a47 (diff)
1 files changed, 259 insertions, 0 deletions
diff --git a/blog/content/notes/tech/running-commands-in-linux.gmi b/blog/content/notes/tech/running-commands-in-linux.gmi
new file mode 100644
index 00000000..4fe4a004
--- /dev/null
+++ b/blog/content/notes/tech/running-commands-in-linux.gmi
@@ -0,0 +1,259 @@
+# Running commands in Linux
+
+## Motivating examples
+
+=> https://cwe.mitre.org/data/definitions/1337.html The 2021 CWE Top 25 Most Dangerous Software Weaknesses helps focus on the biggest security issues that developers face.
+
+=> https://cwe.mitre.org/data/definitions/78.html Number 5 on that list is Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').
+
+Software developers often write code that invokes other programs. For example, shell scripts tend to be mostly composed of invocations of programs such as find, grep, etc. Even software developed in languages such as Python, C, or Java often invokes other programs.
+
+Python software developers use the subprocess module to perform this task. Other languages provide similar facilities.
+
+Consider the two following Python sessions to execute an equivalent to the bash statement "cat /etc/passwd":
+
+```
+$ python3
+>>> import subprocess
+>>> subprocess.run(["cat", "/etc/passwd"])
+```
+
+```
+$ python3
+>>> import subprocess
+>>> subprocess.run("cat /etc/passwd", shell=True)
+```
+
+Both sessions use the same run function, with different values of the shell parameter (the shell parameter defaults to False). When executing a command with many arguments, shell=True seems to be terser: "a b c d e" is shorter and easier to read than ["a", "b", "c", "d", "e"]. Readable code is easier to maintain, so a software developer could prefer the shell=True version.
+
+However, using shell=True can easily introduce the "OS Command Injection" weakness.
+
+Create a file named "injection.py" with the following contents:
+
+```
+import sys
+import subprocess
+
+subprocess.run(f"cat {sys.argv[1]}", shell=True)
+```
+
+This program uses the cat command to display the contents of a file.
+For example, if you run (using Python 3.6 or higher):
+
+```
+$ python3 injection.py /etc/passwd
+```
+
+The terminal shows the contents of the `/etc/passwd` file.
+
+However, if you run:
+
+```
+$ python3 injection.py '/etc/passwd ; touch injected'
+```
+
+The terminal shows the same file, but a file named `injected` also appears in the current directory.
+
+Create a file named "safe.py" with the following contents:
+
+```
+import sys
+import subprocess
+
+subprocess.run(["cat", sys.argv[1]])
+```
+
+Running "python3 safe.py /etc/passwd" has the same behavior as using injection.py. However, repeating the command that creates a file using safe.py results in:
+
+```
+$ python3 safe.py '/etc/passwd ; touch injected'
+cat: '/etc/passwd ; touch injected': No such file or directory
+```
+
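+injection.py is vulnerable to "OS Command Injection" because it interpolates user input into a string that a shell then parses (shell=True), whereas safe.py passes the whole string to cat as a single argument and is not.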
+
+If a malicious user can get strings such as "/etc/passwd ; touch injected" to code that uses shell=True, then the user can execute arbitrary code in the system. Code that does not handle user input might not be exposed to such issues, but user input might creep in and introduce unexpected vulnerabilities. Avoiding the use of `shell=True` and similar features can be safer than making sure that user input is correctly handled in all cases.
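+
+When a command string really has to go through a shell, the shlex.quote function from Python's standard library is one way to neutralize special characters in a value before interpolating it. The following is a minimal sketch, not a complete defense; build_cat_command is a hypothetical helper name that does not appear in the scripts above:
+
+```python
+import shlex
+
+
+def build_cat_command(path):
+    """Return a shell command string that passes path to cat as one word."""
+    # shlex.quote wraps values containing shell metacharacters in single
+    # quotes, so ';' and spaces lose their special meaning.
+    return f"cat {shlex.quote(path)}"
+
+
+print(build_cat_command("/etc/passwd"))
+print(build_cat_command("/etc/passwd ; touch injected"))
+```
+
+The second line printed is cat '/etc/passwd ; touch injected', which a shell treats as a request to display one oddly named file. Passing a list without shell=True, as safe.py does, remains the simpler option.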
+
+## Writing shell scripts that handle files with spaces in their names
+
+Create a file called backup.sh with the following contents:
+
+```
+#!/bin/bash
+
+for a in $1/* ; do
+    cp $a $a.bak
+done
+```
+
+Run the following statements in the terminal to create a sample directory with files.
+
+```
+$ mkdir backup_example_1
+$ for a in $(seq 1 9) ; do echo $a >backup_example_1/$a ; done
+```
+
+These statements create the backup_example_1 directory, and files named 1 ... 9.
+
+The backup.sh script creates a copy of each file in a directory. If you run:
+
+```
+$ bash backup.sh backup_example_1/
+```
+
+Then the script will copy 1 to 1.bak, and so on.
+
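+However, if you create a new directory with files whose names have spaces:
+
+```
+$ mkdir backup_example_2
+$ for a in $(seq 1 9) ; do echo $a >backup_example_2/"file $a" ; done
+```
+
+Then the backup.sh script does not work correctly. The unquoted $a is split at the space, so cp receives "backup_example_2//file" and "1" as separate arguments. With GNU cp, the script prints one error per file, starting with one like this (the exact wording depends on the cp version):
+
+```
+$ bash backup.sh backup_example_2/
+cp: target '1.bak' is not a directory
+```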
+
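+In order to fix the script, quote the variable expansions so that each file name reaches cp as a single argument, and keep the * outside the quotes so that bash still expands it. Change the contents of backup.sh to:
+
+```
+#!/bin/bash
+
+for a in "$1"/* ; do
+    cp "$a" "$a.bak"
+done
+```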
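+
+For comparison, the same backup can be sketched in Python, where each file name stays a single value and is never split into words. This is an illustration rather than a replacement for the script; backup_directory is a hypothetical name:
+
+```python
+import shutil
+from pathlib import Path
+
+
+def backup_directory(directory):
+    """Copy every regular file in directory to <name>.bak and return the copies."""
+    copies = []
+    # sorted() reads the whole listing first, so the new .bak files
+    # are not picked up again during the same run.
+    for source in sorted(Path(directory).iterdir()):
+        if source.is_file():
+            target = source.with_name(source.name + ".bak")
+            shutil.copy(source, target)
+            copies.append(target)
+    return copies
+```
+
+Calling backup_directory("backup_example_2") copies "file 1" to "file 1.bak" without any quoting, because no shell parses the names.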
+
+## Background
+
+### int main(int argc, char *argv[])
+
+Programs written in C for Linux define a function called main that is the entry point of the program. Documents such as the N2310 draft of the C language standard describe the main function. Page 11, section 5.1.2.2.1, "Program startup", provides a common definition of main:
+
+```
+int main(int argc, char *argv[]) { /* ... */ }
+```
+
+=> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf The N2310 draft of the C language standard
+
+The argc parameter contains the **c**ount of the arguments provided to the program. The argv parameter contains their **v**alues.
+
+Create a file named argv.c with the following contents:
+
+```
+#include <stdio.h>
+
+int main(int argc, char *argv[]) {
+    for(int i=0; i<argc; i++) {
+        printf("Argument %d -%s-\n", i, argv[i]);
+    }
+}
+```
+
+Compile the file by running the following command:
+
+```
+$ cc argv.c
+```
+
+This produces an executable file named "a.out". This executable will print the arguments you provide via the command line:
+
+```
+$ ./a.out
+Argument 0 -./a.out-
+```
+
+```
+$ ./a.out arg1 arg2 arg3
+Argument 0 -./a.out-
+Argument 1 -arg1-
+Argument 2 -arg2-
+Argument 3 -arg3-
+```
+
+Note that argument 0 is the name used to run the executable file itself.
+
+When you quote an argument, the shell passes it to the program as a single string:
+
+```
+$ ./a.out "a b" c
+Argument 0 -./a.out-
+Argument 1 -a b-
+Argument 2 -c-
+```
+
+So the first argument is "a b" (without quotes).
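+
+The same experiment can be reproduced from Python, whose sys.argv plays the role of argv. In this sketch, a parent process starts a child interpreter with a list of arguments and the child prints what it received; CHILD_CODE and run_child are names invented for the example:
+
+```python
+import subprocess
+import sys
+
+# Program for the child interpreter: print every argument after the script name.
+CHILD_CODE = (
+    "import sys\n"
+    "for i, arg in enumerate(sys.argv[1:], start=1):\n"
+    "    print(f'Argument {i} -{arg}-')\n"
+)
+
+
+def run_child(*args):
+    """Start a child interpreter with the given arguments and return its output."""
+    command = [sys.executable, "-c", CHILD_CODE, *args]
+    result = subprocess.run(command, capture_output=True, text=True, check=True)
+    return result.stdout
+
+
+print(run_child("a b", "c"), end="")
+```
+
+Because no shell is involved, the string "a b" needs no quotes to stay in one piece: each list element becomes exactly one argument.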
+
+### exec(3)
+
+UNIX-like operating systems provide the "exec" family of functions to invoke commands. "man 3 exec" describes the exec family of functions in Linux. Linux provides the execl, execlp, execle, execv, execvp, and execvpe functions. These functions allow us to execute a command from within a C program.
+
+Create a file named execlp.c with the following contents:
+
+```
+#include <stdlib.h>
+#include <unistd.h>
+
+int main() {
+    exit(execlp("cat", "cat", "/etc/passwd", NULL));
+}
+```
+
+Compile the file by running the following command:
+
+```
+$ cc execlp.c
+```
+
+This produces an executable file named "a.out".
+Execute it:
+
+```
+$ ./a.out
+```
+
+This is equivalent to running in a shell the statement "cat /etc/passwd".
+
+This article does not describe the intricacies of the exec family of functions. However, let's analyze the call to execlp.
+
+The exec functions whose name contains a "p" look up the command to execute by searching for executables named like the first argument in the directories listed in the PATH environment variable. In the example, execlp looks up the cat executable in directories such as /usr/bin.
+
+The second argument to execlp is also the name of the program: it becomes the zeroth argument (argv[0]) that the new program receives.
+
+Note that in the preceding argv.c example, the zeroth argument is the name of the program being executed. Some executables in Linux systems are present under different names (using symbolic links). For example, xzcat is a symbolic link to xz. Running xzcat or xz runs the same executable file, but the executable uses the zeroth argument to change its behavior.
+
+This technique is a simple way to "share" code between similar programs. The BusyBox project provides many common utilities, such as ls and cat, in a single executable. By sharing code among all utilities, the BusyBox executable is smaller.
+
+The rest of the parameters to execlp are the arguments for the executable file.
+
+In a way, exec functions "call" the main function of other programs. The parameters to exec are "passed" to the main function.
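+
+Python exposes the same family of functions through its os module, which makes the argument layout easy to try out. The sketch below assumes an echo executable is available on the PATH; CHILD_CODE and run_exec_demo are names made up for the example. The exec call runs in a throwaway child interpreter because, on success, it replaces the calling process and never returns:
+
+```python
+import subprocess
+import sys
+
+# Code for a throwaway child interpreter that replaces itself with echo.
+# os.execlp(file, arg0, arg1, ...) mirrors the C call
+# execlp("echo", "echo", "hello from echo", NULL).
+CHILD_CODE = 'import os; os.execlp("echo", "echo", "hello from echo")'
+
+
+def run_exec_demo():
+    """Run CHILD_CODE in a separate interpreter and return what it printed."""
+    result = subprocess.run(
+        [sys.executable, "-c", CHILD_CODE], capture_output=True, text=True
+    )
+    return result.stdout
+
+
+print(run_exec_demo(), end="")
+```
+
+The first "echo" is the file that execlp looks up in the PATH, and the second "echo" is the zeroth argument handed to the new program.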
+
+### Shells
+
+Programs such as bash provide a way to execute other programs. When you type a statement such as "cat /etc/passwd", bash parses the statement into a command to execute and arguments. Then, bash uses an exec function to run the program with arguments.
+
+The simplest bash statements are words separated by spaces, of the form "arg0 arg1 arg2 ... argn".
+
+On such a statement, bash executes something like:
+
+```
+execlp(arg0, arg0, arg1, ..., argn, NULL)
+```
+
+And the program will receive the string arg0 as the zeroth argument, arg1 as the first argument, and so forth.
+
+However, a user running cat might want to view a file whose name contains spaces.
+
+The statement "cat a b" has two arguments: a and b. For each argument, cat prints the file of that name. So the "cat a b" statement prints the contents of the a and b files, not of a file named "a b".
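+
+The splitting step can be approximated with the shlex module from Python's standard library. This is only a sketch of the idea: shlex.split follows POSIX-style quoting rules but does not expand globs or variables, so it is not a full model of bash. The variable names are made up for the example:
+
+```python
+import shlex
+
+# Plain words separated by spaces become separate arguments.
+plain = shlex.split("cat a b")
+# Quotes or a backslash keep the space inside a single argument.
+quoted = shlex.split('cat "a b"')
+escaped = shlex.split("cat a\\ b")
+
+print(plain)
+print(quoted)
+print(escaped)
+```
+
+The first statement yields three words (cat, a, and b), while the quoted and escaped forms yield two (cat and "a b"), which is how cat can be asked for a single file whose name contains a space.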
+
+## Further reading
+
+=> http://teaching.idallen.com/cst8177/13w/notes/000_find_and_xargs.html Using find -exec or xargs to process pathnames with other commands
+=> https://infosec.exchange/@david_chisnall/115116683569142801 Early UNIX did glob expansion in the shell not because that’s more sensible than providing a glob and option parsing API in the standard library, but because they didn’t have enough disk space or RAM to duplicate code and they didn’t have shared libraries... For example, on FreeBSD, I often do pkg info foo* to print info about packages that start with some string. If I forget to quote the last argument, this behaves differently depending on whether the current directory contains one or more files that have the prefix that I used. If they do, the shell expands them and pkg info returns nothing because I don’t have any installed packages that match those files. If they don’t, the shell passes the star to the program, which does glob expansion but against a namespace that is not the filesystem namespace. The pkg tool knows that this argument is a set of names of installed packages, not files in the current directory, but it can’t communicate that to the shell and so the shell does the wrong thing. Similarly, on DOS the rename command took a load of source files and a destination file or pattern. You could do rename *.c *.txt and it would expand the first pattern, then do the replacement based on the two patterns. UNIX’s mv can’t do that and I deleted a bunch of files by accident when I started using Linux because it’s not obvious to a user what actually happens when you write mv *.c *.txt. There is a GNU (I think?) rename command and its syntax is far more baroque than the DOS one because it is fighting against the shell doing expansion without any knowledge of the argument structure.
+
+## TODO
+
+=> https://news.ycombinator.com/item?id=36722570 SSH particularities