path: root/programming/python
author     alex <alex@pdp7.net>  2026-03-01 20:03:15 +0100
committer  alex <alex@pdp7.net>  2026-03-01 20:03:15 +0100
commit     d2ebc40be0d31b66a3a9f90f7a6bd0ed0dccbe7c (patch)
tree       9f3e1a00c15254861444a96e52f02f2065dee9d4 /programming/python
parent     af4ff4abd127145ccd4d58d060c982456ba7a334 (diff)
Condense most Python articles to single article in blog website
Diffstat (limited to 'programming/python')
-rw-r--r--  programming/python/creating_nice_python_cli_tools.md     40
-rw-r--r--  programming/python/dependency_handling.md               111
-rw-r--r--  programming/python/project_setup.md                     114
-rw-r--r--  programming/python/scraping_with_selenium_on_docker.md    7
4 files changed, 0 insertions, 272 deletions
diff --git a/programming/python/creating_nice_python_cli_tools.md b/programming/python/creating_nice_python_cli_tools.md
deleted file mode 100644
index b192da1a..00000000
--- a/programming/python/creating_nice_python_cli_tools.md
+++ /dev/null
@@ -1,40 +0,0 @@
-Following this advice can make your tools easy to install by others, pleasant to use, robust, cross-platform, and powerful.
-
-* Use [my suggestions for setting up Python projects](project_setup.md), particularly:
- * Provide instructions for installing your tool using [pipx](https://github.com/pypa/pipx).
- Using pipx, people can install and upgrade your script using a simple command that requires no administrative privileges (but it requires having Python and pipx installed).
- * If you are using [uv](https://docs.astral.sh/uv/), following the recommendations above:
- * Use [entry points](https://docs.astral.sh/uv/concepts/projects/config/#entry-points), so when installing your tool via pipx or other means, your scripts are added to the user's path.
- * Dependencies you define will be installed automatically along with your application.
- This reduces the effort required to use your application when it depends on third-party libraries.
- However, I would still advise avoiding unnecessary dependencies (for simple HTTP requests you can use the standard library; for complex requests, a third-party library might be much simpler).
- As you are using pipx, those dependencies will be installed to an isolated virtualenv, so they will not interfere with anything on your system.
- * As your application is properly packaged, you can split your code into different Python files and use imports without issues.
-* If your application requires secrets, such as credentials or others, consider using:
- * The standard [getpass](https://docs.python.org/3/library/getpass.html) module.
- This prompts for a string on the command line, hiding what the user types.
- * The [keyring](https://pypi.org/project/keyring/) library.
- This stores secrets using your operating system facilities.
-* Use the [appdirs](https://pypi.org/project/appdirs/) library to obtain "user paths", such as the user's directories for configuration, cache, or data.
- appdirs knows the proper paths for Linux, macOS and Windows.
- So for example, if your tool caches files and uses appdirs to find the cache directory, you might gain benefits such as cache files being excluded from backups.
-* If your tool requires significant time to complete a process:
- * Use the [tqdm](https://tqdm.github.io/) library to add a progress bar.
- * But also consider using the standard [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) module to add parallelism if you can.
- The [map](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map) function is particularly easy to use.
- Use it with a [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) if the parallel tasks are IO-bound or invoke other programs, or with [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) if they perform significant CPU work in Python (to avoid the [GIL](https://wiki.python.org/moin/GlobalInterpreterLock)).
- * Consider using the standard [logging](https://docs.python.org/3/library/logging.html) module with a format that uses a timestamp, so users can inspect how much time is spent in different parts of the program.
- You can also use the logging module to implement flags such as `--debug` and `--verbose`.
-* Although fancier tools exist, the standard [argparse](https://docs.python.org/3/library/argparse.html) module is good enough for most argument parsing.
- It has decent support for [sub-commands](https://docs.python.org/3/library/argparse.html#sub-commands), and the linked document describes a very nice pattern to define functions for sub-commands, under "One particularly effective way of handling sub-commands..."
- Provide help text for non-obvious parameters.
- argparse supports a lot of different argument types with a lot of functionality out of the box, such as enumerated options, integers, and file names.
- The main reason to use a fancier argument parser is that argparse does not have autocomplete support, but you can add [argcomplete](https://github.com/kislyuk/argcomplete) to an argparse program with minimal modifications to retrofit autocomplete.
-* Remember that the standard [json](https://docs.python.org/3/library/json.html) module is built-in.
- You can use it to add a mode to your tool that generates JSON output instead of human-readable output, for easy automation of your tool, maybe using [jq](https://stedolan.github.io/jq/) or [fx](https://github.com/antonmedv/fx).
-* Use the standard [subprocess](https://docs.python.org/3/library/subprocess.html) module to execute other commands.
- * Remember never to use `shell=True`; among other things, your tool will then work correctly with file names containing spaces.
- * Use `check=True` so if the subprocess fails, an exception will be raised.
- This is likely the best default behavior: although the resulting error is a bit ugly, it prevents failures from passing silently and is a safe option.
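
The subprocess advice above can be sketched as follows; `run_command` is a hypothetical helper written for illustration, not a library function:

```python
import subprocess
import sys

def run_command(args):
    """Run a command safely and return its standard output."""
    # shell=False (the default) passes arguments directly to the program,
    # so file names containing spaces are handled correctly.
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    # Run a trivial Python one-liner as the child process.
    print(run_command([sys.executable, "-c", "print('hello')"]))
```
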
-
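The `Executor.map` pattern mentioned above can be sketched like this; `check_host` is a made-up stand-in for a real IO-bound task such as an HTTP request or a subprocess call:

```python
from concurrent.futures import ThreadPoolExecutor

def check_host(host):
    # Placeholder for an IO-bound task; here we just build a result
    # string so the example is self-contained.
    return f"{host}: ok"

hosts = ["alpha.example", "beta.example", "gamma.example"]

# Executor.map preserves input order in its results, even though the
# calls themselves run concurrently across the pool's threads.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(check_host, hosts))

print(results)
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is enough to parallelize CPU-bound work instead.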
-You can find examples for many of those techniques in my [repos](https://github.com/alexpdp7?tab=repositories&q=&type=&language=python&sort=).
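
The sub-command pattern from the argparse documentation can be sketched like this; `mytool`, `greet`, and `version` are made-up names for illustration:

```python
import argparse

def cmd_greet(args):
    return f"hello, {args.name}"

def cmd_version(args):
    return "1.0"

def build_parser():
    parser = argparse.ArgumentParser(prog="mytool")
    subparsers = parser.add_subparsers(dest="command", required=True)

    greet = subparsers.add_parser("greet", help="greet someone")
    greet.add_argument("name", help="who to greet")
    # set_defaults(func=...) is the pattern the argparse docs describe:
    # each sub-parser stores the function that handles it.
    greet.set_defaults(func=cmd_greet)

    version = subparsers.add_parser("version", help="print the version")
    version.set_defaults(func=cmd_version)
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    return args.func(args)

if __name__ == "__main__":
    print(main())
```
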
diff --git a/programming/python/dependency_handling.md b/programming/python/dependency_handling.md
deleted file mode 100644
index 8acf0cee..00000000
--- a/programming/python/dependency_handling.md
+++ /dev/null
@@ -1,111 +0,0 @@
-# Some brief notes about Python dependency management
-
-This article is mostly written for people who have already used Setuptools and have faced issues derived from its "limitations".
-Specifically, if you have seen files named `requirements.txt` and have wondered how they work, what problem they solve, and whether they are something you should investigate, I hope you find this article interesting.
-
-If you are starting to write Python software and are looking at an introductory text about distributing your software and using dependencies, I recommend skipping directly to the "new generation" Python packaging tools.
-This way, you can avoid most of the complexities in this post.
-You can also check out the [Python Packaging User Guide](https://packaging.python.org/en/latest/) and [my own prescriptive project setup recommendations](project_setup.md).
-
-Most programs can use third-party libraries to implement parts of their functionality without implementing everything from scratch.
-
-pip is the recommended package installer for Python.
-Python installers include pip, although pip is a component that can be installed separately from Python.
-Some Linux distributions separate pip from the main Python package (for example, Debian has a `python3` package and a `python3-pip` package), but a Python install without `pip` is not really fully functional for many purposes.
-
-pip fetches Python packages from diverse sources and adds them to a Python installation.
-Python packages can specify other packages as dependencies, so when pip installs a package, it also installs the required dependency chain.
-
-The traditional mechanism for packages to specify dependencies is Setuptools and other closely related projects.
-
-## About Setuptools
-
-Setuptools is a build and distribution system based on the distutils module that was part of the base Python library.
-
-Package metadata in Setuptools can be defined in many different ways, such as a `setup.py` file, a `setup.cfg` file, or a `pyproject.toml` file.
-In these files, you list the dependencies for your package, specifying the name of the package and constraints.
-
-Constraints define which version of a dependency you want to use.
-A constraint need not be an exact version; it can be a range of versions, or a condition such as "lower than version n".
-
-(Constraints additionally can specify other restrictions, such as requiring different versions for different Python versions, and other interesting possibilities.)
-
-In my opinion, although you can package applications and libraries properly using Setuptools, doing it correctly requires considerable knowledge and effort, and is error-prone.
-
-## Version locking and `requirements.txt`
-
-There is a dependency-management approach that can be very effective in many cases.
-
-This approach involves differentiating between "applications" and "libraries".
-
-Libraries are Python packages meant to be used as a dependency by other Python code.
-Applications are Python code that may use other libraries as dependencies, but which no other Python code depends on.
-
-### Specifying dependencies for libraries
-
-Libraries specify coarse but safe dependency requirements.
-
-Suppose that we are developing the foo library.
-The foo library depends on the bar library.
-The bar library uses a versioning scheme similar to semantic versioning.
-When we develop the foo library, we use version 1.2.3 of the bar library.
-
-Then, we specify that the foo library depends on the bar library, with a version constraint like `>=1.2.3, <1.3`.
-This version constraint lets the library be used with version 1.2.4, which is likely compatible with the code in the foo library and may even bring valuable bug fixes.
-However, the 1.3.0 version of the bar library would not be a valid dependency.
-This is probably a good idea; version 1.3.0 may contain changes that the foo code is incompatible with.
-(When we later create new versions of the foo library, we may want to consider depending on newer versions of the bar library, and possibly update the code so it continues working correctly.)
-
-This helps reduce conflicts.
-As libraries specify coarse dependencies, the chances of two libraries having incompatible requirements are lower.
-However, specifying coarse dependencies probably requires more testing to ensure that if different dependency versions are installed, the library works correctly.
-
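In a modern `pyproject.toml`, the foo library's constraint on bar might look like this (a minimal sketch using the hypothetical names and versions from the example above):

```toml
[project]
name = "foo"
version = "1.0.0"
dependencies = [
    "bar>=1.2.3,<1.3",
]
```
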
-### Specifying dependencies for applications
-
-Applications specify exact dependency requirements.
-
-While libraries are not usually run on their own, applications are executed directly by end users.
-If a library does not work well, then you can temporarily go back to an older version or apply other fixes.
-But if an application does not work correctly, you have worse problems.
-
-If you specify exact dependency versions for an application, its users will always run a single combination of dependencies, which makes it much easier to keep things robust.
-
-A popular approach is for applications to specify Setuptools requirements with coarse versioning (just like libraries do), but to provide a list of the specific versions used for development and deployment.
-To create this list of dependencies, you can install your application using pip or some other mechanism, then extract a list of the dependency versions that were installed and store it in a file.
-For example, you can do this by executing:
-
-```
-$ pip install . # executed from the root of the application source code
-$ pip freeze >requirements.txt
-```
-
-Later on, if you install the application using the following command:
-
-```
-$ pip install -r requirements.txt
-```
-
-Then you will always install the same set of dependencies, preventing issues caused by updated dependencies.
-
-Note: pip and other package installers do *not* automatically use `requirements.txt` or any similar file; they only consult the `setup.cfg` file and the other metadata files defined by Setuptools.
-If you do not install your application explicitly using `pip install -r requirements.txt`, you will probably install a different set of dependencies.
-
-## Beyond version locking
-
-Following the approach above can be enough to use dependencies correctly.
-
-However, maintaining the Setuptools version dependencies and `requirements.txt`, while straightforward, is tedious.
-Also, this approach to dependency management is not obvious, and can be hard to get completely right.
-
-For these reasons, several projects have appeared that implement approaches similar to the one described above, but more automatic and prescriptive.
-These projects often automatically manage a file equivalent to `requirements.txt`, while the developer specifies only coarse dependencies for applications.
-
-Some of these tools are listed on [a page about relevant packaging projects](https://packaging.python.org/en/latest/key_projects/) maintained by the [Python Packaging Authority](https://www.pypa.io/).
-Look for tools about managing dependencies and packaging.
-
-Thanks to some improvements in the Python ecosystem, pip can nowadays correctly install packages built with many different packaging tools.
-
-These projects can also offer some other improvements, so I would encourage Python developers to investigate them and try them out.
-
-However, note that, applied correctly, Setuptools and manual version locking are perfectly valid ways to manage Python code dependencies.
-Also, there are projects such as [pip-tools](https://github.com/jazzband/pip-tools) that complement Setuptools, addressing many of the issues described here, without requiring entirely new packaging tools.
diff --git a/programming/python/project_setup.md b/programming/python/project_setup.md
deleted file mode 100644
index a5f0c789..00000000
--- a/programming/python/project_setup.md
+++ /dev/null
@@ -1,114 +0,0 @@
-There is a significant amount of Python project tooling. This document collects my personal recommendations on how to set up a Python project.
-
-It is not meant to reflect the best or most common practices, just my personal taste.
-
-# Use pipx
-
-Pipx is a tool that installs Python packages to your user environment. It creates an isolated environment for every tool, so if you install multiple packages they won't have version conflicts. It also takes care of adding a package's entry-point scripts to your user path.
-
-uv can do much the same, and additionally it can install most Python versions.
-However, at the time of writing this, pipx is available as a package in many Linux distributions, while uv is not.
-
-If your project can be packaged so that it works with pipx, then many Linux users will be able to install it with pipx after installing pipx with their package manager.
-
-uv can be more convenient for software that requires specific versions of Python not available in Linux distributions, but uv itself generally cannot be installed with Linux package managers.
-
-# Use uv
-
-When using third-party dependencies in your Python code, it is very valuable to avoid installing any project-specific dependency outside the project.
-
-To achieve that, traditionally virtualenvs are used; those are miniature Python installations where you can install any library you want. Virtualenvs need to be explicitly activated to be used, so it is easy to have a virtualenv for each Python project you are working on.
-
-uv is a tool that leverages virtualenvs to manage a project's dependencies, managing virtualenvs automatically.
-uv can also manage Python distributions, automatically downloading Python versions other than those already on your system.
-
-There are many similar tools, such as pipenv, and there are many different ways to specify a project's dependencies (`setup.py`, `requirements.txt`, etc.); uv provides a convenient way to do everything.
-
-Consider reading [some brief notes about Python dependency management](dependency_handling.md).
-
-# Test your code
-
-Write the necessary amount of tests so you can make changes to your code with confidence.
-
-If you find yourself iterating over a piece of code slowly, try to isolate the code you are writing so it can be tested in isolation for faster iteration.
-
-## Use pytest for testing
-
-Python provides *two* testing frameworks in its standard library, but they have some limitations:
-
-* `unittest` is an xUnit-style testing framework which follows non-PEP-8 naming conventions (probably because it was modeled on Java's JUnit), so extra work needs to be done to make your test cases PEP-8 compliant
-* `doctest` is a tool which allows you to run tests embedded in docstrings. For some code, it is great and helps you provide good, up-to-date documentation. However, a significant amount of code is awkward to test using `doctest`.
-
-Use `doctest` whenever you can, but outside that, use `pytest` to write PEP-8-compliant tests.
-
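As an illustration of the doctest style, here is a minimal sketch; `slugify` is a hypothetical function, not from any library:

```python
def slugify(title):
    """Convert an article title into a URL slug.

    The examples below double as tests that doctest can run:

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Python  Project Setup ")
    'python-project-setup'
    """
    # Lowercase, split on any whitespace, and rejoin with hyphens.
    return "-".join(title.lower().split())

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```
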
-Ensure that your test suite runs correctly by running `pytest` without any arguments.
-
-Use plain Python's `assert` statements to check assertions in your tests; `pytest` does some magic to provide nice error messages on failed assertions.
-
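A minimal sketch of a pytest-style test using plain asserts; the module and function names are hypothetical:

```python
# Hypothetical test module (e.g. test_versions.py); pytest collects
# functions named test_* automatically when run without arguments.

def normalize_version(version):
    """Drop a leading "v" and split a version string into integer parts."""
    return tuple(int(part) for part in version.lstrip("v").split("."))

def test_normalize_version():
    # Plain assert statements: on failure, pytest's assertion rewriting
    # shows both sides of the comparison in the error message.
    assert normalize_version("v1.2.3") == (1, 2, 3)
    assert normalize_version("0.10") == (0, 10)
```
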
-## Gate your changes with testing
-
-Set up your version control so changes cannot be made to your main codeline without passing continuous integration tests (and possibly, code review).
-
-# Perform automated code formatting and static checking
-
-> [!NOTE]
-> I have been using [ruff](https://github.com/astral-sh/ruff) recently.
-> Not enough to recommend it unconditionally over flake8/black, but I am liking it so far.
-> Consider testing it.
-> It requires slightly less configuration and it comes with more lints.
-
-## Use Ruff
-
-Use Ruff to format and lint your code.
-
-# Version control
-
-## Use a minimal gitignore file
-
-See [use gitignore properly](../git/git_advice.md#use-gitignore-properly).
-
-## Keep your code together
-
-All the code you modify as part of the project should be kept in a single repository so you can make atomic changes. If you find yourself making changes across multiple repositories and having to coordinate them, consider merging those repositories.
-
-Use git submodules or similar mechanisms to refer to code you modify that must be kept external.
-
-Use [Josh](../git/combining_repos_with_josh_filter.md) to publish parts of the repository outside the main repository if needed.
-
-# Support multiple modern versions of Python
-
-Unless you have a specific requirement to support Python 2, don't.
-
-It is reasonable to support multiple versions of Python 3 from 3.4 onwards. Supporting the oldest versions might limit the features you can use (although features from more modern versions have been backported), so evaluate which operating systems and versions you need to support and try to support Python versions readily available for them (in Linux, by using mainline distro repos, for instance).
-
-Even if you are not running your code using the latest versions of Python, try to support all the newest available versions.
-
-Use continuous integration to run your tests in all supported versions of Python.
-
-# Use ipython and ipdb
-
-Add ipython and ipdb as development dependencies.
-
-# Versioning
-
-Unless you have a specific requirement to support multiple versions of your code or to distribute to a platform that *requires* versioning (such as PyPI), do not explicitly version your code but allow implicit versioning (e.g. it should be possible to identify which Git commit deployed code comes from).
-
-# Documentation
-
-Provide a `README` containing:
-
-* The purpose of the code
-* How to use the code
-* How to develop the code
-
-If the `README` becomes unwieldy, separate usage instructions to `USAGE` and/or development instructions to `HACKING`.
-
-Provide docstrings detailing the external interface of Python modules. Provide internal comments in modules detailing implementation.
-
-If you are developing a library/framework, consider using Sphinx. Sphinx can create a documentation website for a Python project, taking advantage of docstrings.
-
-# Distribution
-
-If your code can be executed from a command line, consider documenting installation via `pipx`.
-
-If your code has dependencies that are not trivial to install (such as Pandas), consider publishing a Docker image or using dependencies that are simpler to install. Design your Docker images so rebuilding the image on most changes is fast.
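
A Dockerfile for such an image might be structured like this. This is a rough sketch assuming a uv-managed project; the `mytool` entry point is made up, and the exact uv flags should be checked against current uv documentation:

```dockerfile
# Copying only the dependency metadata first means the expensive
# dependency-install layer is rebuilt only when pyproject.toml or
# uv.lock change, not on every code edit.
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --no-install-project
# Now copy the (frequently changing) source code and install the project.
COPY . .
RUN uv sync
CMD ["uv", "run", "mytool"]
```
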
diff --git a/programming/python/scraping_with_selenium_on_docker.md b/programming/python/scraping_with_selenium_on_docker.md
deleted file mode 100644
index 61ba1c12..00000000
--- a/programming/python/scraping_with_selenium_on_docker.md
+++ /dev/null
@@ -1,7 +0,0 @@
-Don't use Selenium, use [Playwright](https://playwright.dev/python/):
-
-* Playwright automatically sets up headless browsers.
-* Provides convenient abstractions for locating elements in a page (mostly no XPath required; it can match "intelligently" using text).
-* Has a handy UI tool that records your actions in a browser and writes equivalent *readable* Playwright code.
-
-Further reading: https://new.pythonforengineers.com/blog/web-automation-dont-use-selenium-use-playwright/