Smart open - read/write to STDIN/STDOUT or file

Michał Moroz

Often you'll find yourself needing to either save output to a file or write it directly to the console. Or the other way around: reading either from a file or from standard input. Here's code that simplifies this and implements the standard convention of using - (the minus sign) to denote standard input or output, so the value can be passed straight through from program arguments.

The code

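import sys
import contextlib


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    """Open filename, or fall back to a standard stream when it is empty or '-'."""
    if filename and filename != '-':
        # A real path: open it the way the built-in open() would
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        # No path (or '-'), but the caller supplied a handle to use instead
        fh = pipe
    else:
        # No path (or '-'): stdin for reading modes, stdout for everything else.
        # Extra kwargs (e.g. encoding) are ignored in this branch.
        fh = sys.stdin if mode.startswith('r') else sys.stdout

    try:
        yield fh
    finally:
        # Never close the standard streams; anything else, including a
        # handle passed via pipe, is closed on exit
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()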

Usage

When to use

Use it if one or more of the following criteria are met:

  1. The tool needs to output to a single file or to standard output
  2. The tool needs to read either standard input or one or more files (or both)
  3. The choice of input/output depends on arguments passed to the tool

Don't use it if:

  1. You are writing to multiple files
    1. Trying to fit smart_open would only complicate things there
    2. A custom solution would be a better fit
  2. You're trying to log some information
    1. Use a proper logging library, such as logging, Loguru or StructLog, for that purpose
  3. You're moving back and forth in a file with seek() and tell()
    1. Standard input/output usually doesn't support this, since pipes and terminals aren't seekable

Also, see the conclusion section if you want to replace a bunch of open calls with smart_open, as there are a couple of cases where a drop-in replacement would not work.

Reading files or sys.stdin

The simplest thing you can do with this function is to read the contents of a file.

from somewhere import smart_open

import sys


def main():
    with smart_open(sys.argv[1], 'r') as f:
        contents = f.read()

    # do something with contents
    print(contents)


if __name__ == '__main__':
    main()

If a filename is provided, its contents are read and printed out.

$ python script.py somefile.txt
[... contents of some file]

If the filename is -, sys.stdin is read instead.

$ cat somefile.txt | python script.py -
[... contents of some file]

or

$ python script.py - < somefile.txt
[... contents of some file]

That makes it a lot easier to use your Python tools in Bash scripts, where output is often piped through a plethora of other tools.
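The "When to use" list also mentions reading one or more files, or both files and standard input. A minimal sketch of that case is a tiny cat-like function. The name concatenate and its out parameter are hypothetical, chosen for this illustration; smart_open is repeated so the snippet runs on its own.

```python
import contextlib
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same helper as in "The code" above, repeated so this snippet runs on its own
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


def concatenate(paths, out=None):
    """Hypothetical helper: copy each input to out (default: sys.stdout), like a tiny cat.

    An empty list means "read standard input", and '-' may also be mixed
    in with real file names.
    """
    out = sys.stdout if out is None else out
    for path in paths or ['-']:
        with smart_open(path, 'r') as fh:
            for line in fh:
                out.write(line)
```

In a script you would call concatenate(sys.argv[1:]), so that python script.py a.txt - b.txt reads a.txt, then standard input, then b.txt.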

Writing files or outputting to sys.stdout

In the same way, you can open a file for writing. If the filename is - (or empty), smart_open writes to sys.stdout instead.

from somewhere import smart_open

import sys


def main():
    with smart_open(sys.argv[1], 'w') as f:
        f.write("Hello World\n")


if __name__ == '__main__':
    main()

The script can then be run the same way as the reading example above.

$ python script.py output.txt

or

$ python script.py -
Hello World
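Because the destination is just a string, a function can take it as a parameter with - as the default, and callers decide whether output lands in a file or on the console. This is a sketch under that assumption; save_greeting is a hypothetical name, and smart_open is repeated so the snippet is self-contained.

```python
import contextlib
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same helper as in "The code" above, repeated so this snippet runs on its own
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


def save_greeting(name, output='-'):
    # Hypothetical helper: '-' (the default) means standard output,
    # anything else is treated as a file path
    with smart_open(output, 'w', encoding='utf-8') as fh:
        fh.write(f"Hello, {name}!\n")
```

Since smart_open looks up sys.stdout at call time, output sent to - can be captured with contextlib.redirect_stdout, which is handy in tests.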

sys.stderr and the pipe argument

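You can also send output to sys.stderr by passing it as pipe. Keep in mind that pipe is only used when the filename is empty or -; a real filename still takes precedence:

with smart_open(filename, 'w', pipe=sys.stderr) as f:
    f.write("Error: Something bad has happened\n")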

However, such usage raises the question of whether this code should use a logging library instead. Be careful not to abuse this feature.

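The pipe argument accepts an arbitrary file handle, so you can pass something non-standard, such as a handle to /dev/null (open(os.devnull, 'w')) for a quiet mode. Because such a handle is not one of the standard streams, smart_open closes it when the with block exits. Please note that this stretches beyond the original contract smart_open provides, and may lead to abuse, too.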
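A quiet mode along those lines could look like the sketch below. The quiet flag and the write_output name are hypothetical; the devnull handle is only opened when the filename would otherwise fall back to standard output, so a real file name still wins and no handle is leaked.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same helper as in "The code" above, repeated so this snippet runs on its own
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


def write_output(text, filename='-', quiet=False):
    """Hypothetical helper: write text to filename ('-' = stdout), or discard it when quiet."""
    # pipe is only consulted when filename is empty or '-', so only open
    # devnull in that case
    use_pipe = quiet and filename in (None, '', '-')
    pipe = open(os.devnull, 'w') if use_pipe else None
    # smart_open closes the devnull handle on exit, since it is not a standard stream
    with smart_open(filename, 'w', pipe=pipe) as fh:
        fh.write(text)
```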

Integrations

With Docopt

I love working with Docopt, and this is where it pairs especially well with smart_open:

"""
Save some data.

Usage:
    script [-o FILE]

Options:
    -o FILE  File to be saved to [default: -]
"""
from docopt import docopt

from somewhere import smart_open


def main():
    args = docopt(__doc__, version="1.0")

    with smart_open(args['-o'], 'w') as fh:
        fh.write("Hello, World!")


if __name__ == "__main__":
    main()

Because we set the default value of the -o option to -, the program prints the result to standard output unless told otherwise, which in most cases is the expected outcome.

$ python script.py
[... output]

or

$ python script.py -o file.txt

With argparse

We can achieve the same with argparse:

import argparse

from somewhere import smart_open


def main():
    parser = argparse.ArgumentParser(
        description="A simple script to demonstrate argparse."
    )

    parser.add_argument(
        '-o',
        '--output',
        type=str,
        default='-',
        help='Output file [default: -]'
    )

    args = parser.parse_args()

    with smart_open(args.output, 'w') as fh:
        fh.write("Hello, World!")


if __name__ == "__main__":
    main()

The result would be the same.
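The same pattern extends to input. As a sketch, an optional positional argument with nargs='?' and a default of - lets the script read from a file or from standard input, while -o picks the destination. The upper-casing is only a stand-in transformation, and build_parser and main(argv) are hypothetical names; passing argv explicitly just makes the function easy to test.

```python
import argparse
import contextlib
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same helper as in "The code" above, repeated so this snippet runs on its own
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


def build_parser():
    parser = argparse.ArgumentParser(description="Upper-case text from a file or stdin.")
    # nargs='?' makes the positional optional; '-' means standard input
    parser.add_argument('input', nargs='?', default='-', help='Input file [default: -]')
    parser.add_argument('-o', '--output', default='-', help='Output file [default: -]')
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:]
    args = build_parser().parse_args(argv)
    with smart_open(args.input, 'r') as src, smart_open(args.output, 'w') as dst:
        dst.write(src.read().upper())
```

With this, cat notes.txt | python script.py, python script.py notes.txt and python script.py - -o out.txt all behave as you'd expect from a Unix filter.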

Conclusion

The main benefit of the smart_open function is that we do not need to branch our code depending on whether we want to save/read a file or write/read from STDOUT/STDIN. As the saying goes, it just works.

There are two caveats though:

The first one is that smart_open doesn't fully replicate open()'s signature: it inserts a pipe parameter right after mode, where open() has buffering. So make sure that every argument after mode is passed as a keyword argument:

with smart_open(filename, 'r', encoding='utf-8') as f:
    ...
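To see why this matters, here is a small sketch with a hypothetical encoding_used helper. Passed positionally, 'latin-1' lands in pipe; because the path is a real file, pipe is then ignored and the file silently opens with the platform default encoding. With - as the filename it would be worse: the string itself would be yielded as the "file handle".

```python
import contextlib
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same helper as in "The code" above, repeated so this snippet runs on its own
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


def encoding_used(path, *args, **kwargs):
    # Hypothetical helper: reports which encoding the opened handle ended up with
    with smart_open(path, 'r', *args, **kwargs) as fh:
        return fh.encoding


# encoding_used(path, encoding='latin-1')  -> 'latin-1' (keyword: reaches open())
# encoding_used(path, 'latin-1')           -> platform default (positional: swallowed by pipe)
```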

Even more importantly, contextmanager forces us to use with statements to open files, so the following would not work:

f = smart_open(filename, 'w')
# won't work: f is a context manager, not a file handle,
# so this raises AttributeError
f.write("Hello World")

Both of these could be fixed with enough refactoring, but that would complicate the function even further for little gain, at least in my view.
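If you do need the handle to outlive a single with block, for example inside an object, one option that leaves smart_open untouched is contextlib.ExitStack. The sketch below is one way to do it; the Writer class is hypothetical, and smart_open is repeated so the snippet runs on its own.

```python
import contextlib
import sys


@contextlib.contextmanager
def smart_open(filename=None, mode='r', pipe=None, **kwargs):
    # Same helper as in "The code" above, repeated so this snippet runs on its own
    if filename and filename != '-':
        fh = open(filename, mode=mode, **kwargs)
    elif pipe is not None:
        fh = pipe
    else:
        fh = sys.stdin if mode.startswith('r') else sys.stdout
    try:
        yield fh
    finally:
        if fh not in (sys.stdout, sys.stderr, sys.stdin):
            fh.close()


class Writer:
    """Hypothetical wrapper that keeps a smart_open handle alive across method calls."""

    def __init__(self, filename='-'):
        self._stack = contextlib.ExitStack()
        # enter_context() runs smart_open's setup now and registers its cleanup for later
        self.fh = self._stack.enter_context(smart_open(filename, 'w'))

    def write(self, text):
        self.fh.write(text)

    def close(self):
        # Runs smart_open's finally block: closes real files, leaves stdout alone
        self._stack.close()
```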

But feel free to modify the code to your liking.