Everything You Ever Wanted to Know About Python’s Import Machinery

Write better structured modules and packages

Paul Papacz
15 min read · Dec 17, 2020

If you’ve been working with Python for a while, you’ve probably come across the “__main__ idiom”. It consists of a couple of lines of code that usually look like this:
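A minimal sketch of the idiom (the body of `main()` here is only a placeholder):

```python
def main():
    print('Hello World')


if __name__ == '__main__':
    main()
```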

In this article, I would like to explore the meaning of these lines in greater depth and use this common pattern as a starting point for an exploration of Python’s import machinery. This should help you better understand what happens during import, and also help you bring structure to your own modules and packages. (This article refers to the standard CPython implementation and Python version 3.6.)

Executing a Module With the Interpreter

When a module like the one shown above (module_a.py) is passed to the interpreter (e.g. as python module_a.py) on the command line, Python’s import machinery collects information about the module, and defines and sets several attributes that can be used to control the module’s behaviour. These attributes are set before any of the code in the module is executed and are accessible from within the module. A list of those attributes can be found here.

Another thing that happens when the interpreter is invoked with a file is that the __main__ module gets initialised, and the statements in the file are executed and become part of that module’s namespace. The __main__ module’s __name__ attribute is set to the string value __main__. More on this below.

To make more sense of the paragraphs above, let’s add a few statements to module_a.py in order to inspect the attributes set by the Python interpreter:
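The original gist is not reproduced here; the following sketch of module_a.py is reconstructed from the output shown below, laid out so that the globals() loop falls on lines 4–5 and the guard on line 15, matching the line references in the text:

```python
"""Module A"""

print('globals Module A:')
for k, v in dict(globals()).items():
    print(f'{repr(k)}: {repr(v)}')


def function_a():
    print('Hello World')


def main():
    function_a()


if __name__ == '__main__':
    main()
```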

globals() is a built-in function that returns a dictionary containing all the symbols (variables, functions, etc.) defined in the current namespace. Line 4 in the code above copies the dictionary returned by globals() before iterating over it and printing its keys and values. (It is necessary to operate on a copy because the loop variables k and v become part of the module’s namespace and would change the dictionary at every step. Lines 4–5 are only there to print the variables in a more readable way; you might as well replace them with print(globals()) or a similar statement.)

When you execute this code via:

$ python module_a.py

you should see output that looks very similar to the following (slightly truncated for readability):

$ python module_a.py
globals Module A:
'__name__': '__main__'
'__doc__': 'Module A'
'__package__': None
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': None
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>
'__file__': 'module_a.py'
'__cached__': None
Hello World

As you can see, most of the attributes described in the official documentation are defined and have a value assigned to them. __file__ contains the file name (typically the relative or full path to the file), __doc__ contains the module’s docstring, and __name__ is set to the string __main__. These attributes are now defined in the __main__ module’s namespace and can be accessed directly, as is done in line 15 of module_a above. Since the value of __name__ is __main__ in this case, the main() function gets called, which in turn calls function_a(), which prints out Hello World.

It is only by convention that the function called after the __name__ check is named main. It is up to the script’s author to decide what should happen at this point: you can call any (defined) function or method, or run more complex initialisation code. More on that later.

Importing a Module

If we want to use functions, classes, etc, which are defined in module_a in another module, let’s say in module_b, we can easily import module_a in there. Assuming that both files are in the same directory, the code could simply look like:
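A minimal sketch of module_b.py (assuming module_a.py sits in the same directory):

```python
"""Module B"""

import module_a
```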

If you pass module_b to the interpreter now, you’ll see output like the following (truncated for readability):

$ python module_b.py
globals Module A:
'__name__': 'module_a'
'__doc__': 'Module A'
'__package__': ''
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='module_a', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='/path/to/module_a.py')
'__file__': '/path/to/module_a.py'
'__cached__': '/path/to/__pycache__/module_a.cpython-36.pyc'
'__builtins__': {'__name__': 'builtins', ...}

The output is generated by the print statements inside module_a.py, after it has been imported by module_b.py.

Notice the differences between this and the first output: __name__ is now set to the module’s name rather than __main__, __file__ is an absolute path to the file the module has been imported from, __spec__ is set to an instance of ModuleSpec (see here for more information), and __builtins__ is set to the builtins’s module dictionary (this is a CPython implementation detail).

Also, notice that Hello World does not get printed out. Since module_a’s __name__ is now set to its name (rather than __main__), the check in line 15 of module_a.py prevents main() from being called when the module is loaded and imported, and therefore function_a() never gets called.

To get a better understanding of the variables in module_b’s namespace let’s add a print out to module_b.py:
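A sketch of the extended module_b.py, with a globals() printout mirroring the one in module_a.py:

```python
"""Module B"""

import module_a

print('globals module B:')
for k, v in dict(globals()).items():
    print(f'{repr(k)}: {repr(v)}')
```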

Running this should produce output similar to the following:

$ python module_b.py
globals Module A:
'__name__': 'module_a'
'__doc__': 'Module A'
'__package__': ''
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='module_a', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='/path/to/module_a.py')
'__file__': '/path/to/module_a.py'
'__cached__': '/path/to/__pycache__/module_a.cpython-36.pyc'
'__builtins__': {'__name__': 'builtins', ...}
globals module B:
'__name__': '__main__'
'__doc__': 'Module B'
'__package__': None
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': None
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>
'__file__': 'module_b.py'
'__cached__': None
'module_a': <module 'module_a' from '/path/to/module_a.py'>

The first half should look the same as before, while the second half should look similar to the output when we ran python module_a.py but with module_a replaced with module_b in most places. On top of that, the namespace now also contains module_a, which makes it (and everything defined in it) accessible inside module_b.

One more thing to notice is that __package__ in module_a’s namespace is set to an empty string, while __package__ in module_b’s namespace is set to None. Python tries to determine whether a module is part of a package. Since module_a is being imported (by module_b in this case), it could at least potentially be part of a package, so the variable is set to an empty string; module_b, on the other hand, is executed directly, which implies it is not being treated as part of a package (in this particular execution).

The output shows us that module_a has been successfully imported into module_b, its function definitions have been loaded and can be accessed, e.g.:
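For instance, by appending a (hypothetical) call like this at the bottom of module_b.py:

```python
module_a.function_a()
```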

which would print out Hello World (and the variables inside module_a’s namespace upon import).

The __main__ Module

As mentioned above, when a module is executed by invoking the interpreter directly, the __main__ module is initialised in order to provide the namespace for the top-level environment of the program. To get a better understanding of what that means, we can use Python’s sys module to get a list of loaded modules (sys is one of the few modules that gets initialised on interpreter start-up). For this, let’s create a new module with the following content:
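The file itself is not reproduced in the original; here is a sketch of module_c.py, reconstructed from the output below:

```python
"""Module C"""

import sys

print('modules')
for k, v in sorted(sys.modules.items()):
    print(f'{repr(k)}: {repr(v)}')
```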

The sys.modules variable holds a dictionary of all modules that have been loaded so far (but not necessarily imported into the current namespace). Sorting for convenience and printing them gives something like the following (truncated for readability):

$ python module_c.py
modules
'__main__': <module 'module_c' from '/path/to/module_c.py'>
'_bootlocale': <module '_bootlocale' from '/usr/lib/python3.6/_bootlocale.py'>
'_codecs': <module '_codecs' (built-in)>
...
...
...
'warnings': <module 'warnings' from '/usr/lib/python3.6/warnings.py'>
'weakref': <module 'weakref' from '/usr/lib/python3.6/weakref.py'>
'zipimport': <module 'zipimport' (built-in)>

It is a mapping between module names (by which the modules can be accessed) and module instances (a module is itself a Python object) for all loaded modules. In other words, the modules listed are known to the interpreter and can be imported inside the given module. This is also the first place Python searches when a module is imported.

As you can see, the first entry is the __main__ module, which has been initialised with module_c’s content. This means we can use this mapping to further convince ourselves that the __main__ module’s namespace and the current module’s namespace are the exact same thing. To do this, let’s create another module with the following content:
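The original file is not shown; the following hypothetical module_d.py produces the True/False/True/True output below when run directly (the exact checks in the original may have differed):

```python
"""Module D"""

import sys

main_module = sys.modules['__main__']

print(main_module.__name__ == __name__)          # the two names match
print(main_module.__name__ == 'module_d')        # but neither is the file name
print(main_module.__dict__ is globals())         # same namespace dictionary
print(main_module is sys.modules.get(__name__))  # same module object in memory
```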

Running this should give:

$ python module_d.py
True
False
True
True

This shows us that not only is the value of __name__ the same in both cases, but the two also refer to the same object in memory.

Understanding __main__.py

In addition to the “__main__ idiom”, Python offers a way of achieving the same effect by creating a file called __main__.py inside a project directory, alongside the actual module files. This can be useful when a project has become very large and you would like to split the logic into multiple files/modules, or if you want to keep functionality strictly compartmentalised.

Imagine a package with the following directory structure:

my_package/
├── __main__.py
├── module_x.py
└── module_y.py

and files with the following content:
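The file contents are not reproduced in the original; the following sketches are reconstructed from the output below:

```python
# --- my_package/module_x.py ---
"""Module X"""

print('globals Module X:')
for k, v in dict(globals()).items():
    print(f'{repr(k)}: {repr(v)}')


def function_x():
    print('function x')


# --- my_package/module_y.py ---
"""Module Y"""

print('globals Module Y:')
for k, v in dict(globals()).items():
    print(f'{repr(k)}: {repr(v)}')


def function_y():
    print('function y')


# --- my_package/__main__.py ---
"""Main module"""

import module_x
import module_y

print('globals main:')
for k, v in dict(globals()).items():
    print(f'{repr(k)}: {repr(v)}')

module_x.function_x()
module_y.function_y()
```

(When a directory is passed to the interpreter, that directory is prepended to sys.path, which is why the plain `import module_x` in __main__.py works.)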

It is now possible to pass the directory to the Python interpreter for execution, which gives output like the following (truncated for readability):

$ python my_package
globals Module X:
'__name__': 'module_x'
'__doc__': 'Module X'
'__package__': ''
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='module_x', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='my_package/module_x.py')
'__file__': 'my_package/module_x.py'
'__cached__': 'my_package/__pycache__/module_x.cpython-36.pyc'
'__builtins__': {'__name__': 'builtins', ...}
globals Module Y:
'__name__': 'module_y'
'__doc__': 'Module Y'
'__package__': ''
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='module_y', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='my_package/module_y.py')
'__file__': 'my_package/module_y.py'
'__cached__': 'my_package/__pycache__/module_y.cpython-36.pyc'
'__builtins__': {'__name__': ...}
globals main:
'__name__': '__main__'
'__doc__': 'Main module'
'__package__': ''
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='__main__', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='my_package/__main__.py')
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>
'__file__': 'my_package/__main__.py'
'__cached__': 'my_package/__pycache__/__main__.cpython-36.pyc'
'module_x': <module 'module_x' from 'my_package/module_x.py'>
'module_y': <module 'module_y' from 'my_package/module_y.py'>

function x
function y

Most of the output is similar to what has been described above, but the fact that it is printed at all, and the order in which it is printed, give us insight into what the Python interpreter is doing.

We see module_x’s namespace variables, followed by module_y’s and the __main__ module’s. Since __main__.py is the only place we have done any imports so far, this tells us that the interpreter automatically picks up whatever is in __main__.py and executes it as if it had been specified on the command line directly (this is not exactly true, as the paths would in most cases be absolute instead of relative).

The next thing to notice is that the __name__ attribute of module_x and module_y is set to the respective module name, as you would expect for imported modules, while __name__ is set to __main__ for __main__.py.

Notice how module_x and module_y are part of the namespace in __main__.py (as you would expect since we are importing them), and we can make calls to functions defined inside those modules.

The last two lines show us that the two calls to functions defined in module_x and module_y are executed as well.

Be aware that every line in each module gets executed automatically upon import (the function definitions inside module_x and module_y are statements that get executed as well, while the functions themselves are not called).

It is also possible to pass in the absolute path to the package, i.e.:

$ python /path/to/my_package

The result should be the same, with absolute instead of relative paths in the output.

Advantages of Using the __main__ Idiom

Whether it is via the if __name__ == '__main__' “guard” statement or via a __main__.py file, one key advantage is the separation of the logic defined in your modules from its execution. Whether you should use it, how to structure a project, and which pieces of logic should go where will generally depend on what the code does and how it is intended to be used. Here are a few common patterns to consider.

Testing

Imagine module_a from the first example didn’t have the guard statement and called the main function every time the module is imported somewhere. If you wanted to write a test for function_a, you would have to import module_a in your test script, which would immediately call main() and subsequently function_a(). In this particular case that might not be a big deal, but if function_a had more impactful side effects (say, writing a file to a specified location), you would most likely want to avoid that, or at least have more control over it.

Command Line Arguments

Another use-case is a module or package that is designed to run stand-alone (as a command line script) but which also defines logic that might be used (via imports) in other modules. Since your project is designed to run from the command line, it is likely to have some form of command-line argument processing, using argparse or a comparable library.

It might also be necessary to perform other initial steps like reading and checking configuration files, setting up a logger, etc. These and other things may be unnecessary or even counterproductive when the module is imported as part of another project.

Imports

If your project uses libraries that are only relevant when it is executed directly, it can make sense to import those libraries in __main__.py and thereby avoid importing them when your code gets imported somewhere else.

Understanding __init__.py

A more common file to find in Python projects and modules is __init__.py. The official documentation tells us that

when a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace.

This means that __init__.py serves a different purpose than __main__.py and we can use the method described above to understand the differences in more detail.

Let’s extend my_package from above and add an __init__.py file:

my_package/
├── __init__.py
├── __main__.py
├── module_x.py
└── module_y.py

where __init__.py has the following content:
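The file content is not reproduced in the original; here is a sketch of my_package/__init__.py, reconstructed from the output shown in the following sections:

```python
"""Init module"""

print('globals init:')
for k, v in dict(globals()).items():
    print(f'{repr(k)}: {repr(v)}')


def package_level_function():
    print('package level function')
```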

When we now pass the package to the interpreter (python my_package), we should see the exact same output as in the previous example, since nothing has changed for this way of executing the package.

The main difference comes in when we treat the package as an actual package and import it.

Importing a Package

To tease out the subtleties, I will take a step-by-step approach. We start a Python interpreter session without any parameters and use our two-liner from above to get an idea of the current namespace, i.e.:

$ python
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> for k, v in dict(globals()).items():
... print(f'{repr(k)}: {repr(v)}')
...
'__name__': '__main__'
'__doc__': None
'__package__': None
'__loader__': <class '_frozen_importlib.BuiltinImporter'>
'__spec__': None
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>

This looks very similar to the case where we passed a module directly to the interpreter. As you might have expected, the interpreter created a module named __main__ and populated some of the module-related variables.

In the next step, we import my_package:

>>> import my_package 
globals init:
'__name__': 'my_package'
'__doc__': 'Init module'
'__package__': 'my_package'
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='my_package', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='/path/to/my_package/__init__.py', submodule_search_locations=['/path/to/my_package'])
'__path__': ['/path/to/my_package']
'__file__': '/path/to/my_package/__init__.py'
'__cached__': '/path/to/my_package/__pycache__/__init__.cpython-36.pyc'
'__builtins__': {'__name__': 'builtins', ...}

and see the namespace variables for __init__.py printed out.

Notice how, in this case, __name__ as well as __package__ is set to the string value my_package. This shows us that everything in __init__.py has been executed upon import (including the function definition), and that a new module (and with it a new namespace) has been created that contains bindings to everything defined in __init__.py.

After importing my_package in the interpreter session, let’s use globals() to inspect the namespace and ensure that the package is available:

>>> for k, v in dict(globals()).items(): 
... print(f'{repr(k)}: {repr(v)}')
'__name__': '__main__'
'__doc__': None
...
...
'my_package': <module 'my_package' from '/path/to/my_package/__init__.py'>

The last line should show my_package now.

To further convince ourselves, we can run a few checks like these:

>>> my_package.__name__
'my_package'
>>> my_package.__file__
'/path/to/my_package/__init__.py'

And eventually:

>>> my_package.package_level_function() 
package level function

The function defined in __init__.py is immediately accessible, the two modules (module_x and module_y), however, are not:

>>> my_package.module_x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'my_package' has no attribute 'module_x'

Importing a Module From a Package

In order to make the modules accessible, and to further explore what happens when we import one of the modules from my_package, let’s start a new interpreter session and run the following import statement:

$ python
...
>>> from my_package import module_x

We see two things happen in this case: the namespace variables of __init__.py are printed out, followed by the namespace variables of module_x:

globals init:
'__name__': 'my_package'
'__doc__': 'Init module'
'__package__': 'my_package'
...
...
'__builtins__': {'__name__': 'builtins', ...}
globals module X:
'__name__': 'my_package.module_x'
'__doc__': 'Module X'
'__package__': 'my_package'
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='my_package.module_x', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='/path/to/my_package/module_x.py')
'__file__': '/path/to/my_package/module_x.py'
'__cached__': '/path/to/my_package/__pycache__/module_x.cpython-36.pyc'
'__builtins__': {'__name__': 'builtins', ...}

In other words, everything in __init__.py has been executed before importing module_x and executing everything inside it.

Also notice how module_x’s __name__ has been set to the module’s fully-qualified name, while __package__ has been set to my_package. Printing out the variables in the main namespace shows us:

>>> for k, v in dict(globals()).items(): 
... print(f'{repr(k)}: {repr(v)}')
...
'__name__': '__main__'
'__doc__': None
'__package__': None
'__loader__': <class '_frozen_importlib.BuiltinImporter'>
'__spec__': None
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>
'module_x': <module 'my_package.module_x' from '/path/to/my_package/module_x.py'>

In other words, the last line tells us that my_package.module_x is now bound to a variable called module_x in the namespace and that it is accessible (while my_package isn’t):

>>> module_x.function_x() 
function x
>>>
>>> my_package.package_level_function()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'my_package' is not defined

This is not surprising, since my_package didn’t show up in the namespace variables.

To further explore what is going on, let’s have a look at sys.modules. In the same session, run the following few lines (output truncated for readability):

>>> import sys
>>> for k, v in sorted(sys.modules.items()):
... print(f'{repr(k)}: {repr(v)}')
...
'__future__': <module '__future__' from '/usr/lib/python3.6/__future__.py'>
'__main__': <module '__main__' (built-in)>
...
'my_package': <module 'my_package' from '/path/to/my_package/__init__.py'>
'my_package.module_x': <module 'my_package.module_x' from '/path/to/my_package/module_x.py'>
...
'zlib': <module 'zlib' (built-in)>

The long list that is printed out again contains all the modules loaded by the interpreter up to this moment. We notice that my_package as well as my_package.module_x have been loaded, but only my_package.module_x has been imported and bound to a name in the main namespace (which means it shows up when printing globals and can be accessed in the interpreter).

Importing a Package Module

Let’s see what happens when we import the module using its fully-qualified name. To do so, we start a new Python interpreter session and type the following:

$ python
...
>>> import my_package.module_x

The output is very similar to the previous case: the variables in __init__.py’s namespace are printed out, followed by those in module_x:

globals init:
'__name__': 'my_package'
'__doc__': 'Init module'
'__package__': 'my_package'
...
globals module X:
'__name__': 'my_package.module_x'
'__doc__': 'Module X'
'__package__': 'my_package'
...

The difference becomes clearer when we inspect the variables in the main namespace:

>>> for k, v in dict(globals()).items():
... print(f'{repr(k)}: {repr(v)}')
...
'__name__': '__main__'
'__doc__': None
'__package__': None
'__loader__': <class '_frozen_importlib.BuiltinImporter'>
'__spec__': None
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>
'my_package': <module 'my_package' from '/path/to/my_package/__init__.py'>

We see that, in contrast to the method above, my_package (instead of module_x) is now defined in the namespace, and we cannot access module_x directly but have to use its fully-qualified name:

>>> module_x.function_x() 
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'module_x' is not defined
>>>
>>> my_package.module_x.function_x()
function x

It is also possible to call functions and everything else defined in __init__.py:

>>> my_package.package_level_function() 
package level function

To further understand what is going on, let’s inspect my_package with the help of Python’s dir function (which produces a list similar to globals().keys()):

>>> for v in dir(my_package): 
... print(repr(v))
...
'__builtins__'
'__cached__'
'__doc__'
'__file__'
'__loader__'
'__name__'
'__package__'
'__path__'
'__spec__'
'module_x'
'package_level_function'

As we can see, module_x has become part of my_package’s namespace, which explains why we cannot call it directly.

In other words, this way we have loaded, imported, and bound my_package to a variable in the main namespace, while binding module_x to a variable in my_package’s namespace. This behaviour is also described in the documentation of the __import__ function, which gets called during import.

Importing * From a Package

Python’s official tutorial does a great job of explaining what happens when you import * from a package. Let’s use the method described in this article to get a better understanding in a fresh interpreter session:

$ python
...
>>> from my_package import *
globals init:
'__name__': 'my_package'
'__doc__': 'Init module'
'__package__': 'my_package'
'__loader__': <_frozen_importlib_external.SourceFileLoader ...>
'__spec__': ModuleSpec(name='my_package', loader=<_frozen_importlib_external.SourceFileLoader ...>, origin='/path/to/my_package/__init__.py', submodule_search_locations=['/path/to/my_package'])
'__path__': ['/path/to/my_package']
'__file__': '/path/to/my_package/__init__.py'
'__cached__': '/path/to/my_package/__pycache__/__init__.cpython-36.pyc'
'__builtins__': {'__name__': 'builtins', ...}

As expected, the code in __init__.py has been executed, but nothing else. We can further check this by inspecting namespace variables:

>>> for k, v in dict(globals()).items():
... print(f'{repr(k)}: {repr(v)}')
...
'__name__': '__main__'
'__doc__': None
'__package__': None
'__loader__': <class '_frozen_importlib.BuiltinImporter'>
'__spec__': None
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>
'package_level_function': <function package_level_function ...>

The only additional object available is package_level_function; module_x and module_y have been neither imported nor loaded.

If we would like to change this, we can follow the instructions in the tutorial and add __all__ to the package. __init__.py is the ideal place to add this variable, so we modify the file to look like this:

In a new interpreter session we repeat the import:

$ python
...
>>> from my_package import *

As expected, we see the __init__.py printouts followed by the module_x printouts:

globals init:
'__name__': 'my_package'
'__doc__': 'Init module'
'__package__': 'my_package'
...
'__all__': ['module_x']
globals module X:
'__name__': 'my_package.module_x'
'__doc__': 'Module X'
'__package__': 'my_package'
...

Notice that __all__ is now defined in __init__’s namespace, which is the reason why module_x gets imported.

We can now further inspect the main namespace:

>>> for k, v in dict(globals()).items():
... print(f'{repr(k)}: {repr(v)}')
...
'__name__': '__main__'
'__doc__': None
'__package__': None
'__loader__': <class '_frozen_importlib.BuiltinImporter'>
'__spec__': None
'__annotations__': {}
'__builtins__': <module 'builtins' (built-in)>
'module_x': <module 'my_package.module_x' from '/path/to/my_package/module_x.py'>

As expected, we see module_x in the namespace; however, package_level_function is not directly accessible in this case (and neither is my_package itself). In other words, excluding things from __all__ allows you to “hide” objects, functions, variables, etc., defined in __init__.py that you might use for the initialisation of your package but that are not supposed to be exposed to the user (e.g. because they have a name that is likely to clash with other imports).

Conclusion

There are a lot of subtleties associated with Python’s import machinery, but it is a powerful tool that does a lot of heavy lifting for you when it comes to finding modules in your file tree, and loading and importing them. It also gives you a lot of flexibility and convenience when importing modules.

Python also allows you to segregate the logic in your project in a way that will help others understand the project’s structure more easily.

There is much more to the import machinery, and the official Python Tutorial is generally a great reference for intermediate and advanced programmers.

If you are like me, observing the internals in a simple way, like the one described in this article, is a great way of solidifying your knowledge and getting a better grasp of the language’s details.
