You have seen how you can reuse code in your program by defining functions once. What if you wanted to reuse a number of functions in other programs that you write? As you might have guessed, the answer is modules.
There are various methods of writing modules, but the simplest way is to create a file with a `.py` extension that contains functions and variables.
Another method is to write the modules in the native language in which the Python interpreter itself was written. For example, you can write modules in the C programming language and when compiled, they can be used from your Python code when using the standard Python interpreter.
A module can be imported by another program to make use of its functionality. This is how we can use the Python standard library as well. First, we will see how to use the standard library modules.
Example (save as `using_sys.py`):
import sys
print('The command line arguments are:')
for i in sys.argv:
    print(i)
print('\n\nThe PYTHONPATH is', sys.path, '\n')
Output:
$ python3 using_sys.py we are arguments
The command line arguments are:
using_sys.py
we
are
arguments
The PYTHONPATH is ['/Users/swaroop/byte_of_python',
'/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3',
'/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/plat-darwin',
'/usr/local/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/lib-dynload',
'/usr/local/lib/python3.3/site-packages']
How It Works:
First, we import the `sys` module using the `import` statement. Basically, this translates to us telling Python that we want to use this module. The `sys` module contains functionality related to the Python interpreter and its environment i.e. the *sys*tem.
When Python executes the `import sys` statement, it looks for the `sys` module. In this case, it is one of the built-in modules, and hence Python knows where to find it.
If the module is not built into the interpreter, for example a module written in Python, then the interpreter searches for it in the directories listed in its `sys.path` variable. If the module is found, the statements in the body of that module are run and the module is made available for you to use. Note that this initialization is done only the first time that a module is imported.
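The "run only once" behaviour can be observed with a small experiment. This is a minimal sketch, assuming a throwaway module called `hello_once` (a made-up name) that is written to a temporary directory just for the demonstration:

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module to a temporary directory. The module name
# 'hello_once' is hypothetical and exists only for this illustration.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'hello_once.py'), 'w') as f:
    f.write("print('Running the body of hello_once')\n")
    f.write("counter = 0\n")

# Make the directory searchable, like any other entry in sys.path.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import hello_once            # first import: the module body runs and prints
hello_once.counter += 1

import hello_once            # second import: the body is not run again
print('counter is', hello_once.counter)

# Imported modules are remembered in the sys.modules dictionary.
print('hello_once' in sys.modules)
```

The message from the module body appears a single time, and the change made to `counter` survives the second `import`, because Python reuses the module object it already loaded.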
The `argv` variable in the `sys` module is accessed using the dotted notation i.e. `sys.argv`. It clearly indicates that this name is part of the `sys` module. Another advantage of this approach is that the name does not clash with any `argv` variable used in your program.
The `sys.argv` variable is a list of strings (lists are explained in detail in a later chapter). Specifically, `sys.argv` contains the list of command line arguments i.e. the arguments passed to your program using the command line.
If you are using an IDE to write and run these programs, look for a way to specify command line arguments to the program in the menus.
Here, when we execute `python3 using_sys.py we are arguments`, we run the module `using_sys.py` with the `python3` command and the other things that follow are arguments passed to the program. Python stores the command line arguments in the `sys.argv` variable for us to use.
Remember, the name of the script running is always the first argument in the `sys.argv` list. So, in this case we will have `'using_sys.py'` as `sys.argv[0]`, `'we'` as `sys.argv[1]`, `'are'` as `sys.argv[2]` and `'arguments'` as `sys.argv[3]`. Notice that Python starts counting from 0 and not 1.
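Since the script name always sits at index 0, a common pattern is to split it off from the real arguments. The following is a small sketch; the helper name `split_arguments` is made up for this example, and the list is hard-coded so that the snippet gives the same result no matter how it is launched:

```python
def split_arguments(argv):
    """Separate the script name (index 0) from the remaining arguments."""
    script_name = argv[0]
    arguments = argv[1:]
    return script_name, arguments


# In a real program you would pass sys.argv here instead of a literal list.
script_name, arguments = split_arguments(['using_sys.py', 'we', 'are', 'arguments'])
print('Script:', script_name)
print('Arguments:', arguments)
```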
The `sys.path` variable contains the list of directory names where modules are imported from. When you run a script, the first entry in `sys.path` is the directory containing that script, as in the output above; in the interactive interpreter, the first entry is an empty string, which stands for the current directory. The remaining entries come from the `PYTHONPATH` environment variable, if it is set, and from installation-dependent defaults. This means that you can directly import modules located in the same directory as your program. Otherwise, you will have to place your module in one of the directories listed in `sys.path`.
Note that the current directory is the directory from which the program is launched. Run `import os; print(os.getcwd())` to find out the current directory of your program.
Importing a module is a relatively costly affair, so Python does some tricks to make it faster. One way is to create byte-compiled files with the extension `.pyc`, which is an intermediate form that Python transforms the program into (remember the introduction section on how Python works?). This `.pyc` file is useful when you import the module the next time from a different program: the import will be faster since a portion of the processing required in importing a module is already done. Also, these byte-compiled files are platform-independent.
Note: On Python 3.2 and later, these `.pyc` files are created in a `__pycache__` subdirectory next to the corresponding `.py` files (older versions wrote them into the same directory as the `.py` files). If Python does not have permission to write to that directory, then the `.pyc` files will not be created.
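If you are curious where the byte-compiled file for a given source file ends up on your Python version, you can ask Python directly. This is a rough sketch using the standard `py_compile` and `importlib.util` modules; the file name `example_module.py` is invented for the demonstration:

```python
import importlib.util
import os
import py_compile
import tempfile

# Create a small source file in a temporary directory.
# 'example_module.py' is a hypothetical name used only here.
source_dir = tempfile.mkdtemp()
source_path = os.path.join(source_dir, 'example_module.py')
with open(source_path, 'w') as f:
    f.write("greeting = 'hello'\n")

# Ask Python where the cached bytecode for this source file would live.
expected_pyc = importlib.util.cache_from_source(source_path)
print('Expected location:', expected_pyc)

# Byte-compile explicitly. Importing the module would normally
# trigger this step for you behind the scenes.
pyc_path = py_compile.compile(source_path)
print('Compiled file exists:', os.path.exists(pyc_path))
```

On current Python 3 versions the printed location normally lies inside a `__pycache__` folder and the file name includes the interpreter version.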
If you want to directly import the `argv` variable into your program (to avoid typing `sys.` every time), then you can use the `from sys import argv` statement.
In general, you should avoid using this statement and use the `import` statement instead, since your program will avoid name clashes and will be more readable.
Example:
from math import sqrt
print("Square root of 16 is", sqrt(16))
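To see the kind of name clash being warned about, consider the sketch below. The local `sqrt` function is a made-up helper that happens to share its name with `math.sqrt`:

```python
import math


def sqrt(x):
    """A hypothetical local helper that shares its name with math.sqrt."""
    return 'my own sqrt of {}'.format(x)


# With a plain `import math`, both functions remain reachable.
print(sqrt(16))
print(math.sqrt(16))

# This rebinds the name `sqrt`, silently hiding the function defined above.
from math import sqrt
print(sqrt(16))
```

After the `from math import sqrt` line, the local helper can no longer be reached by its name, and nothing warns you about it.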
Every module has a name, and statements in a module can find out the name of their module. This is handy for figuring out whether the module is being run standalone or being imported. As mentioned previously, when a module is imported for the first time, the code it contains gets executed. We can use this to make the module behave in different ways depending on whether it is being used by itself or being imported from another module. This can be achieved using the `__name__` attribute of the module.
Example (save as `using_name.py`):
if __name__ == '__main__':
    print('This program is being run by itself')
else:
    print('I am being imported from another module')
Output:
$ python3 using_name.py
This program is being run by itself
$ python3
>>> import using_name
I am being imported from another module
>>>
How It Works:
Every Python module has its `__name__` defined. If this is `'__main__'`, that implies that the module is being run standalone by the user and we can take appropriate actions.
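A common way to put this to use is to keep the reusable code in functions and run the "program" part only when the file is executed directly. A minimal sketch, with the function names `greet` and `main` chosen just for illustration:

```python
def greet(name):
    """Reusable function that other modules can import."""
    return 'Hello, ' + name


def main():
    """Code that should run only when this file is executed directly."""
    print(greet('world'))


if __name__ == '__main__':
    main()
```

When another program imports this file, it gets access to `greet` without the greeting being printed as a side effect.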
Creating your own modules is easy; you have been doing it all along! This is because every Python program is also a module. You just have to make sure it has a `.py` extension. The following example should make it clear.
Example (save as `mymodule.py`):
def sayhi():
    print('Hi, this is mymodule speaking.')

__version__ = '0.1'
The above was a sample module. As you can see, there is nothing particularly special about it compared to our usual Python program. We will next see how to use this module in our other Python programs.
Remember that the module should be placed either in the same directory as the program from which we import it, or in one of the directories listed in `sys.path`.
Another module (save as `mymodule_demo.py`):
import mymodule
mymodule.sayhi()
print('Version', mymodule.__version__)
Output:
$ python3 mymodule_demo.py
Hi, this is mymodule speaking.
Version 0.1
How It Works:
Notice that we use the same dotted notation to access members of the module. Python makes good reuse of the same notation to give the distinctive 'Pythonic' feel to it so that we don't have to keep learning new ways to do things.
Here is a version utilising the `from..import` syntax (save as `mymodule_demo2.py`):
from mymodule import sayhi, __version__
sayhi()
print('Version', __version__)
The output of `mymodule_demo2.py` is the same as the output of `mymodule_demo.py`.
Notice that if there was already a `__version__` name declared in the module that imports `mymodule`, there would be a clash. This is also likely because it is common practice for each module to declare its version number using this name. Hence, it is always recommended to prefer the `import` statement even though it might make your program a little longer.
You could also use:
from mymodule import *
This will import all public names such as `sayhi` but would not import `__version__`, because names that start with an underscore are skipped by this form of import.
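You can check which names a star-import brings in with a small experiment. The sketch below writes a stand-in module called `star_demo` (a hypothetical name, with the same kind of contents as `mymodule`) to a temporary directory, then runs the star-import inside a separate namespace dictionary so that the imported names are easy to list:

```python
import importlib
import os
import sys
import tempfile

# Create a stand-in module. 'star_demo' and '_private' are invented
# names used only for this demonstration.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'star_demo.py'), 'w') as f:
    f.write("def sayhi():\n    return 'Hi'\n")
    f.write("_private = 'single underscore'\n")
    f.write("__version__ = '0.1'\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# Run the star-import in its own namespace so we can inspect the result.
namespace = {}
exec('from star_demo import *', namespace)

# exec() adds '__builtins__' to the dictionary itself, so leave it out.
imported = sorted(name for name in namespace if name != '__builtins__')
print(imported)
```

Only `sayhi` shows up in the list; both `_private` and `__version__` are left out because their names begin with an underscore.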
Zen of Python
: One of Python's guiding principles is that "Explicit is better
than Implicit". Run `import this` to learn more and see
[this StackOverflow discussion](http://stackoverflow.com/questions/228181/zen-of-python)
which lists examples for each of the principles.
You can use the built-in `dir` function to list the identifiers that an object defines. For example, for a module, the identifiers include the functions, classes and variables defined in that module.
When you supply a module name to the `dir()` function, it returns the list of the names defined in that module. When no argument is passed to it, it returns the list of names defined in the current module.
Example:
$ python3
>>> import sys # get list of attributes, in this case, for the sys module
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__package__',
'__stderr__', '__stdin__', '__stdout__', '_clear_type_cache',
'_compact_freelists', '_current_frames', '_getframe', 'api_version', 'argv',
'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright',
'displayhook', 'dllhandle', 'dont_write_bytecode', 'exc_info', 'excepthook',
'exec_prefix', 'executable', 'exit', 'flags', 'float_info', 'getcheckinterval',
'getdefaultencoding', 'getfilesystemencoding', 'getprofile',
'getrecursionlimit', 'getrefcount', 'getsizeof', 'gettrace',
'getwindowsversion', 'hexversion', 'intern', 'maxsize', 'maxunicode',
'meta_path', 'modules', 'path', 'path_hooks', 'path_importer_cache',
'platform', 'prefix', 'ps1', 'ps2', 'setcheckinterval', 'setprofile',
'setrecursionlimit', 'settrace', 'stderr', 'stdin', 'stdout', 'subversion',
'version', 'version_info', 'warnoptions', 'winver']
>>> dir() # get list of attributes for current module
['__builtins__', '__doc__', '__name__', '__package__', 'sys']
>>> a = 5 # create a new variable 'a'
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'a', 'sys']
>>> del a # delete/remove a name
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'sys']
>>>
How It Works:
First, we see the usage of `dir` on the imported `sys` module. We can see the huge list of attributes that it contains.
Next, we use the `dir` function without passing parameters to it. By default, it returns the list of attributes for the current module. Notice that the names of imported modules are also part of this list.
To observe `dir` in action, we define a new variable `a` and assign it a value, then check `dir` again; the list now contains an additional entry with the same name. We remove the variable/attribute of the current module using the `del` statement, and the change is reflected again in the output of the `dir` function.
A note on `del`: this statement is used to delete a variable/name. After the statement has run, in this case `del a`, you can no longer access the variable `a`; it is as if it never existed at all.
Note that the `dir()` function works on any object. For example, run `dir(print)` to learn about the attributes of the `print` function, or `dir(str)` for the attributes of the `str` class.
There is also a `vars()` function which can give you the attributes and their values, but it only works for objects that have a `__dict__` attribute, so it will not work in all cases.
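The difference between the two functions can be seen in a short sketch that uses the standard `math` module and a plain integer as test subjects:

```python
import math

# vars() on a module returns a dictionary mapping names to values.
module_attributes = vars(math)
print(module_attributes['pi'])

# dir() works on an integer and lists its attribute names...
print('bit_length' in dir(42))

# ...but vars() fails, because an int has no __dict__ attribute.
vars_failed = False
try:
    vars(42)
except TypeError as error:
    vars_failed = True
    print('vars() did not work:', error)
```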
By now, you must have started observing the hierarchy of organizing your programs. Variables usually go inside functions. Functions and global variables usually go inside modules. What if you wanted to organize modules? That's where packages come into the picture.
Packages are just folders of modules with a special `__init__.py` file that indicates to Python that this folder is special because it contains Python modules.
Let's say you want to create a package called 'world' with subpackages 'asia', 'africa', etc. and these subpackages in turn contain modules like 'india', 'madagascar', etc.
This is how you would structure the folders:
- <some folder present in the sys.path>/
    - world/
        - __init__.py
        - asia/
            - __init__.py
            - india/
                - __init__.py
                - foo.py
        - africa/
            - __init__.py
            - madagascar/
                - __init__.py
                - bar.py
Packages are just a convenience to hierarchically organize modules. You will see many instances of this in the standard library.
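To try out such a layout without touching your real folders, you can build part of it in a temporary directory and import from it. This is only a sketch: the helper `make_package` and the `capital` function placed inside `foo.py` are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()


def make_package(*parts):
    """Create a folder under root and give it an empty __init__.py."""
    path = os.path.join(root, *parts)
    os.makedirs(path, exist_ok=True)
    open(os.path.join(path, '__init__.py'), 'w').close()
    return path


# Build the world/asia/india branch of the structure shown above.
make_package('world')
make_package('world', 'asia')
india_dir = make_package('world', 'asia', 'india')

# Put a module with one hypothetical function inside the innermost package.
with open(os.path.join(india_dir, 'foo.py'), 'w') as f:
    f.write("def capital():\n    return 'New Delhi'\n")

# The folder that contains 'world' must be listed in sys.path.
sys.path.insert(0, root)
importlib.invalidate_caches()

from world.asia.india import foo
print(foo.__name__)
print(foo.capital())
```

The dotted name `world.asia.india.foo` mirrors the folder hierarchy, which is the same dotted notation used for accessing names inside a module.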
Just like functions are reusable parts of programs, modules are reusable programs. Packages are another hierarchy to organize modules. The standard library that comes with Python is an example of such a set of packages and modules.
We have seen how to use these modules and create our own modules.
Next, we will learn about some interesting concepts called data structures.