Smart open - read/write to STDIN/STDOUT or file
Michał Moroz
Often you'll need a tool to either save its output to a file or write it directly to the console. Or, the other way around, to read either from a file or from standard input. Here's code that simplifies this and implements the standard convention of treating - (the minus sign), passed straight from the program arguments, as standard input or output.
The code
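    import sys
    import contextlib

    @contextlib.contextmanager
    def smart_open(filename=None, mode='r', pipe=None, **kwargs):
        if filename and filename != '-':
            # A real path was given: open it like open() would
            fh = open(filename, mode=mode, **kwargs)
        elif pipe is not None:
            # No path (or '-'), but the caller supplied a handle
            fh = pipe
        else:
            # Fall back to the standard stream matching the mode
            fh = sys.stdin if mode.startswith('r') else sys.stdout
        try:
            yield fh
        finally:
            # Never close the standard streams
            if fh not in (sys.stdout, sys.stderr, sys.stdin):
                fh.close()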
Usage
When to use
Use if one or more of the following criteria are met:
- The tool needs to output to a single file or to standard output
- The tool needs to read either standard input or one or more files (or both)
- The choice of input/output depends on arguments passed to the tool
Don't use it if:
- You are writing to multiple files
  - Trying to fit smart_open would only complicate things there
  - A custom solution would be a better fit
- You're trying to log some information
  - Use a proper logging library, such as logging, Loguru or structlog, for that purpose
- You're moving forwards and backwards in a file with seek and tell
  - Standard input/output generally doesn't support this when connected to a pipe or a terminal
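If you're unsure whether a given handle supports seek and tell, you can ask it at runtime. Here's a minimal sketch using the seekable() method that Python file objects provide. The helper name supports_seek is made up for illustration, and an in-memory buffer and an OS pipe stand in for a regular file and piped standard input.

```python
import io
import os


def supports_seek(fh):
    """Return True if the handle reports that seek() and tell() will work."""
    # Hypothetical helper: handles returned by open(), io.StringIO and the
    # sys.std* streams all expose seekable().
    return fh.seekable()


# An in-memory buffer behaves like a regular file here.
print(supports_seek(io.StringIO("some text")))

# The read end of a pipe behaves like piped standard input.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "r") as reader, os.fdopen(write_fd, "w") as writer:
    print(supports_seek(reader))
```

Standard input may report itself as seekable when it is redirected from a regular file, so a runtime check like this is safer than assuming either way.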
Also, see the conclusion section if you want to replace a bunch of open calls with smart_open, as there are a couple of cases where in-place replacement would not work.
Reading files or sys.stdin
The simplest thing you can do with this function is to read contents of a file.
    from somewhere import smart_open
    import sys

    def main():
        with smart_open(sys.argv[1], 'r') as f:
            contents = f.read()
        # do something with contents
        print(contents)

    if __name__ == '__main__':
        main()
If the filename is provided, its contents will be read and printed out.
$ python script.py somefile.txt
[... contents of some file]
If the filename happens to be -, then sys.stdin will be read instead.
$ cat somefile.txt | python script.py -
[... contents of some file]
or
$ python script.py - < somefile.txt
[... contents of some file]
That makes it a lot easier to use your Python tools in Bash scripts, where output is often piped between a plethora of tools.
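When input arrives through a pipe it can be large, so it may be worth iterating over lines instead of calling read(). Below is a rough sketch of that idea. The count_lines helper is hypothetical, smart_open is repeated from the code section so the snippet runs on its own, and an in-memory buffer passed through the pipe argument stands in for real piped input.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same function as in the code section, repeated so this runs standalone.
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


def count_lines(filename='-', pipe=None):
    """Count lines one at a time, without loading the whole input."""
    with smart_open(filename, 'r', pipe=pipe) as f:
        return sum(1 for _ in f)


# An in-memory buffer stands in for piped input so the example runs anywhere.
print(count_lines('-', pipe=io.StringIO("first\nsecond\nthird\n")))
```

In a real script you would drop the pipe argument and pass sys.argv[1] as the filename, exactly as in the example above.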
Writing files or outputting to sys.stdout
In the same way, you can open a file for writing. If - is provided as the filename, smart_open writes to sys.stdout by default.
    from somewhere import smart_open
    import sys

    def main():
        with smart_open(sys.argv[1], 'w') as f:
            f.write("Hello World\n")

    if __name__ == '__main__':
        main()
Then, the script can be run in a similar fashion to the one specified above.
$ python script.py output.txt
or
$ python script.py -
[... some output]
sys.stderr and the pipe argument
You can easily set sys.stderr as the output:
    # pipe is used only when filename is empty or '-'
    with smart_open(filename, 'w', pipe=sys.stderr) as f:
        f.write("Error: Something bad has happened\n")
However, such usage raises the question of whether this code should use a logging library instead. Be careful not to abuse this feature.
The pipe argument can accept an arbitrary file handle, so you can pass something non-standard there, such as a handle to /dev/null for a quiet mode. Please note that this stretches beyond the original contract smart_open provides, and may lead to abuse, too.
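To make the pipe argument more concrete, here is a small sketch with two non-standard handles: the null device for a quiet mode, and an in-memory buffer for capturing output (handy in tests). Both uses are my own illustration rather than part of the original contract, and smart_open is repeated so the snippet runs on its own.

```python
import contextlib
import io
import os
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same function as in the code section, repeated so this runs standalone.
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


# Quiet mode: everything written ends up in the null device.
with smart_open('-', 'w', pipe=open(os.devnull, 'w')) as fh:
    fh.write("nobody will see this\n")

# Capturing output in memory instead of printing it.
buffer = io.StringIO()
with smart_open('-', 'w', pipe=buffer) as fh:
    fh.write("captured\n")
    captured = buffer.getvalue()  # read it before the block exits

print(repr(captured))
print(buffer.closed)
```

One thing to keep in mind: any handle passed as pipe is closed when the with block exits, unless it is one of the three standard streams. That is why the buffer is read inside the block.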
Integrations
With Docopt
I love working with Docopt, and here's where the library shines:

    """
    Save some data.

    Usage:
        script [-o FILE]

    Options:
        -o FILE  File to be saved to [default: -]
    """
    from docopt import docopt
    from somewhere import smart_open

    def main():
        args = docopt(__doc__, version="1.0")
        with smart_open(args['-o'], 'w') as fh:
            fh.write("Hello, World!")

    if __name__ == "__main__":
        main()
Because we set the default value of the -o option to -, the program will by default print the result to standard output, which in most cases would be the expected outcome.
$ python script.py
[... output]
or
$ python script.py -o file.txt
With argparse
We can achieve the same with argparse:
    import argparse
    from somewhere import smart_open

    def main():
        parser = argparse.ArgumentParser(
            description="A simple script to demonstrate argparse."
        )
        parser.add_argument(
            '-o',
            '--output',
            type=str,
            default='-',
            help='Output file [default: -]'
        )
        args = parser.parse_args()
        with smart_open(args.output, 'w') as fh:
            fh.write("Hello, World!")

    if __name__ == "__main__":
        main()
The result would be the same.
Conclusion
The main benefit of the smart_open function is that we do not need to branch our code depending on whether we want to save/read a file or write/read from STDOUT/STDIN. As the saying goes, it just works.
There are two caveats though:
The first one is that smart_open doesn't replicate open()'s signature fully, as it adds a pipe keyword argument just after the mode argument. So make sure that every argument after mode is a keyword argument:
    with smart_open(filename, 'r', encoding='utf-8') as f:
        ...
Even more importantly, contextmanager forces us to use with statements to open files. So the following would not work:
    f = smart_open(filename, 'w')
    # won't work! f is a context manager here, not a file handle
    f.write("Hello World")
Both of these could be fixed with enough refactoring, but that would complicate the function even further for little gain, at least to me.
But feel free to modify the code to your liking.
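As one example of such a modification, the first caveat can at least be made to fail loudly. In the sketch below, a bare * after mode makes pipe and everything after it keyword-only, so an open()-style positional buffering argument raises a TypeError right away instead of silently being treated as pipe. The name smart_open_strict is made up for illustration, and this is just one possible variant, not a replacement for the original.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def smart_open_strict(filename=None, mode='r', *, pipe=None, **kwargs):
    # Identical to smart_open except for the bare * in the signature.
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


# A third positional argument (buffering in open()) is now rejected up front.
try:
    smart_open_strict('-', 'w', -1)
except TypeError as exc:
    print("rejected:", exc)

# Keyword arguments keep working as before.
with smart_open_strict('-', 'w', pipe=io.StringIO()) as fh:
    fh.write("Hello World\n")
```

The second caveat stays as it is: the function still has to be used in a with statement.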