[[problem_solving]]
== Problem Solving

We have explored various parts of the Python language. Now we will look at how all these parts fit
together by designing and writing a program which _does_ something useful. The idea is to learn
how to write a Python script on your own.

=== The Problem

The problem we want to solve is:

__________________________________________________
I want a program which creates a backup of all my important files.
__________________________________________________

Although this is a simple problem, there is not enough information for us to get started with the
solution. A little more *analysis* is required. For example, how do we specify _which_ files are to
be backed up? _How_ are they stored? _Where_ are they stored?

After analyzing the problem properly, we *design* our program. We make a list of things about how
our program should work. In this case, I have created the following list of how _I_ want it to
work. If you do the design, you may not come up with the same list, since every person has their
own way of doing things, and that is perfectly okay.

- The files and directories to be backed up are specified in a list.
- The backup must be stored in a main backup directory.
- The files are backed up into a zip file.
- The name of the zip archive is the current date and time.
- We use the standard `zip` command available by default in any standard GNU/Linux or Unix
  distribution. Note that you can use any archiving command you want as long as it has a command
  line interface.

.For Windows users
[NOTE]
Windows users can http://gnuwin32.sourceforge.net/downlinks/zip.php[install] the `zip` command from
the http://gnuwin32.sourceforge.net/packages/zip.htm[GnuWin32 project page] and add
`C:\Program Files\GnuWin32\bin` to the system `PATH` environment variable, similar to
<<dos_prompt,what we did for recognizing the python command itself>>.

=== The Solution

As the design of our program is now reasonably stable, we can write the code, which is an
*implementation* of our solution.

Save as `backup_ver1.py`:

[source,python]
--------------------------------------------------
include::programs/backup_ver1.py[]
--------------------------------------------------

Output:

--------------------------------------------------
include::programs/backup_ver1.txt[]
--------------------------------------------------

Now, we are in the *testing* phase, where we test that our program works properly. If it doesn't
behave as expected, then we have to *debug* our program, i.e. remove the *bugs* (errors) from the
program.

If the above program does not work for you, copy the line printed after the `Zip command is` line
in the output, paste it in the shell (on GNU/Linux and Mac OS X) or `cmd` (on Windows), see what
the error is and try to fix it. Also check the zip command manual for what could be wrong. If this
command succeeds, then the problem might be in the Python program itself, so check that it exactly
matches the program written above.

.How It Works
You will notice how we have converted our *design* into *code* in a step-by-step manner.

We make use of the `os` and `time` modules by first importing them. Then, we specify the files and
directories to be backed up in the `source` list. The target directory is where we store all the
backup files and this is specified in the `target_dir` variable. The name of the zip archive that
we are going to create is the current date and time, which we generate using the `time.strftime()`
function. It will also have the `.zip` extension and will be stored in the `target_dir` directory.

Notice the use of the `os.sep` variable - this gives the directory separator according to your
operating system, i.e. it will be `'/'` in GNU/Linux, Unix and Mac OS X, and `'\\'` in Windows
(the older, classic Mac OS used `':'`). Using `os.sep` instead of these characters directly makes
our program portable across these systems.

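
To make the naming scheme concrete, here is a minimal sketch of how such a target path can be put
together. The value of `target_dir` below is only a placeholder chosen for illustration, so
substitute a directory that exists on your own system.

[source,python]
--------------------------------------------------
import os
import time

# Hypothetical main backup directory, built with os.sep so that the
# separator matches the operating system the script runs on.
target_dir = os.sep + 'tmp' + os.sep + 'backup'

# Archive name: target directory + separator + timestamp + '.zip'
target = target_dir + os.sep + time.strftime('%Y%m%d%H%M%S') + '.zip'

print(target)
--------------------------------------------------

On GNU/Linux this prints something of the form `/tmp/backup/<timestamp>.zip`, where the timestamp
depends on when you run it.
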
The `time.strftime()` function takes a specification such as the one we have used in the above
program. The `%Y` specification will be replaced by the year with the century. The `%m`
specification will be replaced by the month as a decimal number between `01` and `12`, and so
on. The complete list of such specifications can be found in the
http://docs.python.org/2/library/time.html#time.strftime[Python Reference Manual].

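
A small sketch may help to see the specifications in action. It formats a fixed moment instead of
the current time, purely so that the result is reproducible. The date itself is an arbitrary
choice for this illustration.

[source,python]
--------------------------------------------------
import time

# A fixed moment (15 March 2024, 09:05:07) so the result is reproducible.
# The last three fields are the weekday (Monday is 0), the day of the
# year and the daylight-saving flag.
moment = time.struct_time((2024, 3, 15, 9, 5, 7, 4, 75, -1))

print(time.strftime('%Y%m%d%H%M%S', moment))  # 20240315090507
print(time.strftime('%Y-%m-%d', moment))      # 2024-03-15
--------------------------------------------------

When the second argument is left out, `time.strftime()` uses the current local time, which is what
a backup script wants.
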
We create the name of the target zip file using the addition operator which _concatenates_ the
strings, i.e. it joins the strings together and returns a new one. Then, we create a string
`zip_command` which contains the command that we are going to execute. You can check if this
command works by running it in the shell (GNU/Linux terminal or DOS prompt).

The `zip` command that we are using has some options and parameters passed. The `-q` option
indicates that the zip command should work **q**uietly. The `-r` option specifies that the zip
command should work **r**ecursively for directories, i.e. it should include all the subdirectories
and files. The two options are combined and specified in a shortcut as `-qr`. The options are
followed by the name of the zip archive to create, followed by the list of files and directories to
back up. We convert the `source` list into a string using the `join` method of strings, which we
have already seen how to use.

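
Here is a rough sketch of how the command string can be assembled. The paths are hypothetical and
the exact formatting used in `backup_ver1.py` may differ.

[source,python]
--------------------------------------------------
# Hypothetical inputs, for illustration only.
source = ['/home/user/notes', '/home/user/code']
target = '/tmp/backup/20240315090507.zip'

# ' '.join(source) turns the list into one space-separated string.
zip_command = 'zip -qr {0} {1}'.format(target, ' '.join(source))

print('Zip command is:')
print(zip_command)
--------------------------------------------------

Note that a path containing spaces would need to be quoted before being placed in such a command,
otherwise the shell would treat it as several separate arguments.
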
Then, we finally *run* the command using the `os.system` function, which runs the command as if it
were run from the *system*, i.e. in the shell - it returns `0` if the command was successful, else
it returns an error number.

Depending on the outcome of the command, we print the appropriate message that the backup has
failed or succeeded.

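
The check on the return value can be sketched as follows. The `run_backup` helper and its messages
are made up for this illustration, and the `exit` commands are harmless stand-ins for the real zip
command.

[source,python]
--------------------------------------------------
import os


def run_backup(command):
    """Run a shell command and report success based on its exit status."""
    if os.system(command) == 0:
        return 'Successful backup'
    return 'Backup FAILED'


# 'exit 0' succeeds and 'exit 3' fails, without touching any files.
print(run_backup('exit 0'))
print(run_backup('exit 3'))
--------------------------------------------------

The exact non-zero number returned for a failure depends on the operating system, which is why the
sketch only compares against `0`.
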
That's it, we have created a script to take a backup of our important files!

.Note to Windows Users
[NOTE]
Instead of double backslash escape sequences, you can also use raw strings. For example, use
`'C:\\Documents'` or `r'C:\Documents'`. However, do *not* use `'C:\Documents'` since you end up
using an unknown escape sequence `\D`.

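
A quick way to convince yourself that the two safe spellings mean the same thing:

[source,python]
--------------------------------------------------
escaped = 'C:\\Documents'   # backslash written as an escape sequence
raw = r'C:\Documents'       # raw string: the backslash is kept as typed

print(escaped == raw)   # True
print(raw)              # C:\Documents
--------------------------------------------------
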
Now that we have a working backup script, we can use it whenever we want to take a backup of the
files. This is called the *operation* phase or the *deployment* phase of the software.

The above program works properly, but (usually) first programs do not work exactly as you
expect. For example, there might be problems if you have not designed the program properly or if
you have made a mistake when typing the code. Accordingly, you will have to go back to the
design phase or you will have to debug your program.

=== Second Version

The first version of our script works. However, we can make some refinements to it so that it
works better on a daily basis. This is called the *maintenance* phase of the software.

One of the refinements I felt was useful is a better file-naming mechanism - using the _time_ as
the name of the file, stored within a directory named after the current _date_, which in turn sits
inside the main backup directory. The first advantage is that your backups are stored in a
hierarchical manner and are therefore much easier to manage. The second advantage is that the
filenames are much shorter. The third advantage is that separate directories help you check
whether you have made a backup for each day, since a directory is created only if you have made a
backup on that day.

Save as `backup_ver2.py`:

[source,python]
--------------------------------------------------
include::programs/backup_ver2.py[]
--------------------------------------------------

Output:

--------------------------------------------------
include::programs/backup_ver2.txt[]
--------------------------------------------------

.How It Works
Most of the program remains the same. The changes are that we check if there is a directory with
the current day as its name inside the main backup directory using the `os.path.exists`
function. If it doesn't exist, we create it using the `os.mkdir` function.

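
The directory check can be sketched like this. A temporary directory stands in for the main backup
directory so that the snippet can be run anywhere. The variable names are assumptions and need not
match those in `backup_ver2.py`.

[source,python]
--------------------------------------------------
import os
import tempfile
import time

# A temporary directory stands in for the main backup directory here.
target_dir = tempfile.mkdtemp()

# Sub-directory named after the current date, for example 20240315.
today = target_dir + os.sep + time.strftime('%Y%m%d')

# Create the day's directory only if it is not already there.
if not os.path.exists(today):
    os.mkdir(today)
    print('Successfully created directory', today)

# The archive itself is then named after the current time.
target = today + os.sep + time.strftime('%H%M%S') + '.zip'
print(target)
--------------------------------------------------
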
=== Third Version

The second version works fine, but when there are lots of backups, I find it hard to tell what
each backup was for! For example, I might have made some major changes to a program or
presentation, and I want to associate those changes with the name of the zip archive. This can be
easily achieved by attaching a user-supplied comment to the name of the zip archive.

WARNING: The following program does not work, so do not be alarmed. Please follow along because
there's a lesson in here.

Save as `backup_ver3.py`:

[source,python]
--------------------------------------------------
include::programs/backup_ver3.py[]
--------------------------------------------------

Output:

--------------------------------------------------
include::programs/backup_ver3.txt[]
--------------------------------------------------

.How This (does not) Work
*This program does not work!* Python says there is a syntax error, which means that the script does
not satisfy the structure that Python expects to see. When we observe the error given by Python, it
also tells us the place where it detected the error. So we start *debugging* our program from that
line.

On careful observation, we see that the single logical line has been split into two physical lines
but we have not specified that these two physical lines belong together. Basically, Python has
found the addition operator (`+`) without its second operand in that logical line and hence it
doesn't know how to continue. Remember that we can specify that the logical line continues in the
next physical line by the use of a backslash at the end of the physical line. So, we make this
correction to our program. This correction of the program when we find errors is called *bug
fixing*.

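
As a small reminder of how the backslash joins physical lines, consider this sketch (the variable
names are made up for the illustration):

[source,python]
--------------------------------------------------
first = 'backup'
second = 'archive'

# The trailing backslash tells Python that the logical line continues on
# the next physical line. Without it, the line ending in '+' would be a
# syntax error.
name = first + '_' + \
    second + '.zip'

print(name)  # backup_archive.zip
--------------------------------------------------
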
=== Fourth Version

Save as `backup_ver4.py`:

[source,python]
--------------------------------------------------
include::programs/backup_ver4.py[]
--------------------------------------------------

Output:

--------------------------------------------------
include::programs/backup_ver4.txt[]
--------------------------------------------------

.How It Works
This program now works! Let us go through the actual enhancements that we made in version 3. We
take in the user's comment using the `input` function and then check if the user actually entered
something by finding out the length of the input using the `len` function. If the user has just
pressed `enter` without entering anything (maybe it was just a routine backup or no special changes
were made), then we proceed as we have done before.

However, if a comment was supplied, then this is attached to the name of the zip archive just
before the `.zip` extension. Notice that we are replacing spaces in the comment with underscores -
this is because managing filenames without spaces is much easier.

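
The naming logic can be sketched as a small function. The name `archive_name` and its parameters
are hypothetical, introduced only so that the idea can be tried without typing a comment at a
prompt. In the actual script the comment comes from `input`.

[source,python]
--------------------------------------------------
import os


def archive_name(today, now, comment):
    """Build the archive path, appending the comment if one was given."""
    if len(comment) == 0:
        # No comment: name the archive after the time only.
        return today + os.sep + now + '.zip'
    # Comment given: attach it before '.zip', with spaces as underscores.
    return today + os.sep + now + '_' + comment.replace(' ', '_') + '.zip'


print(archive_name('20240315', '090507', ''))
print(archive_name('20240315', '090507', 'added new examples'))
--------------------------------------------------
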
=== More Refinements

The fourth version is a satisfactorily working script for most users, but there is always room for
improvement. For example, you can include a _verbosity_ level for the program where you can specify
a `-v` option to make your program more talkative or a `-q` option to make it _quiet_.

Another possible enhancement would be to allow extra files and directories to be passed to the
script at the command line. We can get these names from the `sys.argv` list and we can add them to
our `source` list using the `extend` method provided by the `list` class.

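
One possible shape for this enhancement is sketched below. The helper name `add_extra_sources`
and the paths are assumptions for illustration. In a real script you would pass `sys.argv` as the
second argument.

[source,python]
--------------------------------------------------
def add_extra_sources(source, argv):
    """Add every command-line argument after the script name to source."""
    # argv[0] is the name of the script itself, so we skip it.
    source.extend(argv[1:])
    return source


# Hypothetical default list and a stand-in for sys.argv.
source = ['/home/user/notes']
fake_argv = ['backup.py', '/home/user/photos', '/home/user/music']

print(add_extra_sources(source, fake_argv))
--------------------------------------------------
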
The most important refinement would be to not use the `os.system` way of creating archives and
instead use the http://docs.python.org/2/library/zipfile.html[zipfile] or
http://docs.python.org/2/library/tarfile.html[tarfile] modules to create these archives. They are
part of the standard library and are already available for you to use, without depending on an
external zip program being installed on your computer.

However, I have been using the `os.system` way of creating a backup in the above examples purely
for pedagogical purposes, so that the example is simple enough to be understood by everybody but
real enough to be useful.

Can you try writing the fifth version that uses the
http://docs.python.org/2/library/zipfile.html[zipfile] module instead of the `os.system` call?

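
As a starting point for that exercise, here is a minimal sketch of the `zipfile` module archiving a
single throwaway file. The file names are made up, and walking through whole directories is
deliberately left for you to work out.

[source,python]
--------------------------------------------------
import os
import tempfile
import zipfile

# Create a small throwaway file so that the example is self-contained.
work_dir = tempfile.mkdtemp()
sample = os.path.join(work_dir, 'notes.txt')
with open(sample, 'w') as f:
    f.write('remember to back up\n')

target = os.path.join(work_dir, 'backup.zip')

# 'w' creates a new archive and ZIP_DEFLATED asks for compression.
with zipfile.ZipFile(target, 'w', zipfile.ZIP_DEFLATED) as archive:
    # arcname is the name under which the file is stored in the archive.
    archive.write(sample, arcname='notes.txt')

# Open the archive again to list what it contains.
with zipfile.ZipFile(target) as archive:
    print(archive.namelist())  # ['notes.txt']
--------------------------------------------------
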
=== The Software Development Process

We have now gone through the various *phases* in the process of writing software. These phases
can be summarised as follows:

1. What (Analysis)
2. How (Design)
3. Do It (Implementation)
4. Test (Testing and Debugging)
5. Use (Operation or Deployment)
6. Maintain (Refinement)

A recommended way of writing programs is the procedure we have followed in creating the backup
script: Do the analysis and design. Start implementing with a simple version. Test and debug
it. Use it to ensure that it works as expected. Now, add any features that you want and continue to
repeat the Do It-Test-Use cycle as many times as required.

Remember:

[quote,'http://97things.oreilly.com/wiki/index.php/Great_software_is_not_built,_it_is_grown[Bill de hÓra]']
__________________________________________________
Software is grown, not built.
__________________________________________________

=== Summary

We have seen how to create our own Python programs/scripts and the various stages involved in
writing such programs. You may find it useful to create your own program just like we did in this
chapter so that you become comfortable with Python as well as problem-solving.

Next, we will discuss object-oriented programming.