123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393 |
- [[data_structures]]
- == Data Structures
- Data structures are basically just that - they are *structures* which can hold some *data*
- together. In other words, they are used to store a collection of related data.
- There are four built-in data structures in Python - _list, tuple, dictionary and set_. We will see
- how to use each of them and how they make life easier for us.
- [[list]]
- === List
- A `list` is a data structure that holds an ordered collection of items i.e. you can store a
- *sequence* of items in a list. This is easy to imagine if you can think of a shopping list where
- you have a list of items to buy, except that you probably have each item on a separate line in your
- shopping list whereas in Python you put commas in between them.
- The list of items should be enclosed in square brackets so that Python understands that you are
- specifying a list. Once you have created a list, you can add, remove or search for items in the
- list. Since we can add and remove items, we say that a list is a *mutable* data type i.e. this type
- can be altered.
- === Quick Introduction To Objects And Classes
- Although I've been generally delaying the discussion of objects and classes till now, a little
- explanation is needed right now so that you can understand lists better. We will explore this topic
- in detail in a <<oop,later chapter>>.
- A list is an example of usage of objects and classes. When we use a variable `i` and assign a value
- to it, say integer `5` to it, you can think of it as creating an *object* (i.e. instance) `i` of
- *class* (i.e. type) `int`. In fact, you can read `help(int)` to understand this better.
- A class can also have *methods* i.e. functions defined for use with respect to that class only. You
- can use these pieces of functionality only when you have an object of that class. For example,
- Python provides an `append` method for the `list` class which allows you to add an item to the end
- of the list. For example, `mylist.append('an item')` will add that string to the list
- `mylist`. Note the use of dotted notation for accessing methods of the objects.
- A class can also have *fields* which are nothing but variables defined for use with respect to that
- class only. You can use these variables/names only when you have an object of that class. Fields
- are also accessed by the dotted notation, for example, `mylist.field`.
- Example (save as `ds_using_list.py`):
- [source,python]
- --------------------------------------------------
- include::programs/ds_using_list.py[]
- --------------------------------------------------
- Output:
- --------------------------------------------------
- include::programs/ds_using_list.txt[]
- --------------------------------------------------
- .How It Works
- The variable `shoplist` is a shopping list for someone who is going to the market. In `shoplist`,
- we only store strings of the names of the items to buy but you can add _any kind of object_ to a
- list including numbers and even other lists.
- We have also used the `for..in` loop to iterate through the items of the list. By now, you must
- have realised that a list is also a sequence. The speciality of sequences will be discussed in a
- <<sequence,later section>>.
- Notice the use of the trailing comma in the `print` statement to indicate that we want to end the
- output with a space instead of the usual line break. Think of the comma as telling Python that we
- have more items to print on the same line.
- Next, we add an item to the list using the `append` method of the list object, as already discussed
- before. Then, we check that the item has been indeed added to the list by printing the contents of
- the list by simply passing the list to the `print` statement which prints it neatly.
- Then, we sort the list by using the `sort` method of the list. It is important to understand that
- this method affects the list itself and does not return a modified list - this is different from
- the way strings work. This is what we mean by saying that lists are _mutable_ and that strings are
- _immutable_.
- Next, when we finish buying an item in the market, we want to remove it from the list. We achieve
- this by using the `del` statement. Here, we mention which item of the list we want to remove and
- the `del` statement removes it from the list for us. We specify that we want to remove the first
- item from the list and hence we use `del shoplist[0]` (remember that Python starts counting from
- 0).
- If you want to know all the methods defined by the list object, see `help(list)` for details.
- [[tuple]]
- === Tuple
- Tuples are used to hold together multiple objects. Think of them as similar to lists, but without
- the extensive functionality that the list class gives you. One major feature of tuples is that they
- are *immutable* like strings i.e. you cannot modify tuples.
- Tuples are defined by specifying items separated by commas within an optional pair of parentheses.
- Tuples are usually used in cases where a statement or a user-defined function can safely assume
- that the collection of values i.e. the tuple of values used will not change.
- Example (save as `ds_using_tuple.py`):
- [source,python]
- --------------------------------------------------
- include::programs/ds_using_tuple.py[]
- --------------------------------------------------
- Output:
- --------------------------------------------------
- include::programs/ds_using_tuple.txt[]
- --------------------------------------------------
- .How It Works
- The variable `zoo` refers to a tuple of items. We see that the `len` function can be used to get
- the length of the tuple. This also indicates that a tuple is a <<sequence,sequence>> as well.
- We are now shifting these animals to a new zoo since the old zoo is being closed. Therefore, the
- `new_zoo` tuple contains some animals which are already there along with the animals brought over
- from the old zoo. Back to reality, note that a tuple within a tuple does not lose its identity.
- We can access the items in the tuple by specifying the item's position within a pair of square
- brackets just like we did for lists. This is called the _indexing_ operator. We access the third
- item in `new_zoo` by specifying `new_zoo[2]` and we access the third item within the third item in
- the `new_zoo` tuple by specifying `new_zoo[2][2]`. This is pretty simple once you've understood the
- idiom.
- .Tuple with 0 or 1 items
- [NOTE]
- An empty tuple is constructed by an empty pair of parentheses such as `myempty = ()`. However, a
- tuple with a single item is not so simple. You have to specify it using a comma following the first
- (and only) item so that Python can differentiate between a tuple and a pair of parentheses
- surrounding the object in an expression i.e. you have to specify `singleton = (2 , )` if you mean
- you want a tuple containing the item `2`.
- .Note for Perl programmers
- [NOTE]
- A list within a list does not lose its identity i.e. lists are not flattened as in Perl. The same
- applies to a tuple within a tuple, or a tuple within a list, or a list within a tuple, etc. As far
- as Python is concerned, they are just objects stored using another object, that's all.
- [[dictionary]]
- === Dictionary
- A dictionary is like an address-book where you can find the address or contact details of a person
- by knowing only his/her name i.e. we associate *keys* (name) with *values* (details). Note that the
- key must be unique just like you cannot find out the correct information if you have two persons
- with the exact same name.
- Note that you can use only immutable objects (like strings) for the keys of a dictionary but you
- can use either immutable or mutable objects for the values of the dictionary. This basically
- translates to say that you should use only simple objects for keys.
- Pairs of keys and values are specified in a dictionary by using the notation `d = {key1 : value1,
- key2 : value2 }`. Notice that the key-value pairs are separated by a colon and the pairs are
- separated themselves by commas and all this is enclosed in a pair of curly braces.
- Remember that key-value pairs in a dictionary are not ordered in any manner. If you want a
- particular order, then you will have to sort them yourself before using it.
- The dictionaries that you will be using are instances/objects of the `dict` class.
- Example (save as `ds_using_dict.py`):
- [source,python]
- --------------------------------------------------
- include::programs/ds_using_dict.py[]
- --------------------------------------------------
- Output:
- --------------------------------------------------
- include::programs/ds_using_dict.txt[]
- --------------------------------------------------
- .How It Works
- We create the dictionary `ab` using the notation already discussed. We then access key-value pairs
- by specifying the key using the indexing operator as discussed in the context of lists and
- tuples. Observe the simple syntax.
- We can delete key-value pairs using our old friend - the `del` statement. We simply specify the
- dictionary and the indexing operator for the key to be removed and pass it to the `del`
- statement. There is no need to know the value corresponding to the key for this operation.
- Next, we access each key-value pair of the dictionary using the `items` method of the dictionary
- which returns a list of tuples where each tuple contains a pair of items - the key followed by the
- value. We retrieve this pair and assign it to the variables `name` and `address` correspondingly
- for each pair using the `for..in` loop and then print these values in the for-block.
- We can add new key-value pairs by simply using the indexing operator to access a key and assign
- that value, as we have done for Guido in the above case.
- We can check if a key-value pair exists using the `in` operator.
- For the list of methods of the `dict` class, see `help(dict)`.
- .Keyword Arguments and Dictionaries
- [TIP]
- If you have used keyword arguments in your functions, you have already used dictionaries! Just
- think about it - the key-value pair is specified by you in the parameter list of the function
- definition and when you access variables within your function, it is just a key access of a
- dictionary (which is called the _symbol table_ in compiler design terminology).
- [[sequence]]
- === Sequence
- Lists, tuples and strings are examples of sequences, but what are sequences and what is so special
- about them?
- The major features are *membership tests*, (i.e. the `in` and `not in` expressions) and *indexing
- operations*, which allow us to fetch a particular item in the sequence directly.
- The three types of sequences mentioned above - lists, tuples and strings, also have a *slicing*
- operation which allows us to retrieve a slice of the sequence i.e. a part of the sequence.
- Example (save as `ds_seq.py`):
- [source,python]
- --------------------------------------------------
- include::programs/ds_seq.py[]
- --------------------------------------------------
- Output:
- --------------------------------------------------
- include::programs/ds_seq.txt[]
- --------------------------------------------------
- .How It Works
- First, we see how to use indexes to get individual items of a sequence. This is also referred to as
- the _subscription operation_. Whenever you specify a number to a sequence within square brackets as
- shown above, Python will fetch you the item corresponding to that position in the
- sequence. Remember that Python starts counting numbers from 0. Hence, `shoplist[0]` fetches the
- first item and `shoplist[3]` fetches the fourth item in the `shoplist`sequence.
- The index can also be a negative number, in which case, the position is calculated from the end of
- the sequence. Therefore, `shoplist[-1]` refers to the last item in the sequence and `shoplist[-2]`
- fetches the second last item in the sequence.
- The slicing operation is used by specifying the name of the sequence followed by an optional pair
- of numbers separated by a colon within square brackets. Note that this is very similar to the
- indexing operation you have been using till now. Remember the numbers are optional but the colon
- isn't.
- The first number (before the colon) in the slicing operation refers to the position from where the
- slice starts and the second number (after the colon) indicates where the slice will stop at. If the
- first number is not specified, Python will start at the beginning of the sequence. If the second
- number is left out, Python will stop at the end of the sequence. Note that the slice returned
- _starts_ at the start position and will end just before the _end_ position i.e. the start position
- is included but the end position is excluded from the sequence slice.
- Thus, `shoplist[1:3]` returns a slice of the sequence starting at position 1, includes position 2
- but stops at position 3 and therefore a *slice* of two items is returned. Similarly, `shoplist[:]`
- returns a copy of the whole sequence.
- You can also do slicing with negative positions. Negative numbers are used for positions from the
- end of the sequence. For example, `shoplist[:-1]` will return a slice of the sequence which
- excludes the last item of the sequence but contains everything else.
- You can also provide a third argument for the slice, which is the _step_ for the slicing (by
- default, the step size is 1):
- [source,python]
- --------------------------------------------------
- >>> shoplist = ['apple', 'mango', 'carrot', 'banana']
- >>> shoplist[::1]
- ['apple', 'mango', 'carrot', 'banana']
- >>> shoplist[::2]
- ['apple', 'carrot']
- >>> shoplist[::3]
- ['apple', 'banana']
- >>> shoplist[::-1]
- ['banana', 'carrot', 'mango', 'apple']
- --------------------------------------------------
- Notice that when the step is 2, we get the items with position 0, 2,... When the step size is 3, we
- get the items with position 0, 3, etc.
- Try various combinations of such slice specifications using the Python interpreter interactively
- i.e. the prompt so that you can see the results immediately. The great thing about sequences is
- that you can access tuples, lists and strings all in the same way!
- [[set]]
- === Set
- Sets are _unordered_ collections of simple objects. These are used when the existence of an object
- in a collection is more important than the order or how many times it occurs.
- Using sets, you can test for membership, whether it is a subset of another set, find the
- intersection between two sets, and so on.
- [source,python]
- --------------------------------------------------
- >>> bri = set(['brazil', 'russia', 'india'])
- >>> 'india' in bri
- True
- >>> 'usa' in bri
- False
- >>> bric = bri.copy()
- >>> bric.add('china')
- >>> bric.issuperset(bri)
- True
- >>> bri.remove('russia')
- >>> bri & bric # OR bri.intersection(bric)
- {'brazil', 'india'}
- --------------------------------------------------
- .How It Works
- The example is pretty much self-explanatory because it involves basic set theory mathematics taught
- in school.
- [[references]]
- === References
- When you create an object and assign it to a variable, the variable only _refers_ to the object and
- does not represent the object itself! That is, the variable name points to that part of your
- computer's memory where the object is stored. This is called *binding* the name to the object.
- Generally, you don't need to be worried about this, but there is a subtle effect due to references
- which you need to be aware of:
- Example (save as `ds_reference.py`):
- [source,python]
- --------------------------------------------------
- include::programs/ds_reference.py[]
- --------------------------------------------------
- Output:
- --------------------------------------------------
- include::programs/ds_reference.txt[]
- --------------------------------------------------
- .How It Works
- Most of the explanation is available in the comments.
- Remember that if you want to make a copy of a list or such kinds of sequences or complex objects
- (not simple _objects_ such as integers), then you have to use the slicing operation to make a
- copy. If you just assign the variable name to another name, both of them will ''refer'' to the same
- object and this could be trouble if you are not careful.
- .Note for Perl programmers
- [NOTE]
- Remember that an assignment statement for lists does **not** create a copy. You have to use slicing
- operation to make a copy of the sequence.
- [[more_strings]]
- === More About Strings
- We have already discussed strings in detail earlier. What more can there be to know? Well, did you
- know that strings are also objects and have methods which do everything from checking part of a
- string to stripping spaces!
- The strings that you use in program are all objects of the class `str`. Some useful methods of
- this class are demonstrated in the next example. For a complete list of such methods, see
- `help(str)`.
- Example (save as `ds_str_methods.py`):
- [source,python]
- --------------------------------------------------
- include::programs/ds_str_methods.py[]
- --------------------------------------------------
- Output:
- --------------------------------------------------
- include::programs/ds_str_methods.txt[]
- --------------------------------------------------
- .How It Works
- Here, we see a lot of the string methods in action. The `startswith` method is used to find out
- whether the string starts with the given string. The `in` operator is used to check if a given
- string is a part of the string.
- The `find` method is used to locate the position of the given substring within the string; `find`
- returns -1 if it is unsuccessful in finding the substring. The `str` class also has a neat method
- to `join` the items of a sequence with the string acting as a delimiter between each item of the
- sequence and returns a bigger string generated from this.
- === Summary
- We have explored the various built-in data structures of Python in detail. These data structures
- will be essential for writing programs of reasonable size.
- Now that we have a lot of the basics of Python in place, we will next see how to design and write a
- real-world Python program.
|