Why Classes
A note on modularity and encapsulation...
Why ORFs and codons beg to be implemented as classes
Constructing a Codon class
To construct the class use the class keyword - in much the same way you use the def keyword when defining a function.
# a class - a custom compound data type with functionality
class Codon:
    pass
pass is a Python keyword that just means "do nothing". It is needed here because Python requires at least one statement inside the class statement.
Creating an object
Your Codon class is a Codon object factory. Use it to make a Codon object like this:
# making an object
codon = Codon()
print codon
As you can see when you print it, codon is an instance of the Codon class, i.e. a Codon object.
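As a small sketch of the "factory" idea: each call to Codon() should give you a new, separate object, and isinstance tells you which class an object was made from. The names first and second are only for illustration, and print is written with parentheses so that the lines run on both Python 2 and Python 3.

```python
class Codon:
    pass

# two calls to the class give two separate objects
first = Codon()
second = Codon()

print(isinstance(first, Codon))   # was first made from the Codon class?
print(first is second)            # are first and second the very same object?
```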
Attributes
Classes and objects have attributes. You can make an aa and a triplet attribute like this:
# attributes
codon.triplet = "ATG"
from exerciseWeek4 import aminoAcidMap
codon.aa = aminoAcidMap["ATG"]
print codon.triplet
print codon.aa
Just as a method is a function defined inside a class, an attribute is a variable that belongs to an object. Note the .something syntax for creating/setting and accessing attributes.
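To illustrate that attributes set this way belong to one particular object, here is a minimal sketch with two Codon objects. The names start and stop are made up for the example, and hasattr is the built-in function that checks whether an object has a given attribute.

```python
class Codon:
    pass

start = Codon()
stop = Codon()

# each object gets its own triplet attribute
start.triplet = "ATG"
stop.triplet = "TAG"

print(start.triplet)
print(stop.triplet)

# no aa attribute has been set on start, so hasattr reports that it is missing
print(hasattr(start, "aa"))
```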
The special method for Initializing
Now, we don't want to have to set the triplet and aa data by hand in every object we make. It would be nice if this happened automatically every time we made a new object using the Codon class. To do this we need to define an initialization method for the class. This method is always called __init__ so that Python recognizes it as the function to be used for initialization. A method is defined in much the same way as an ordinary function. What makes them different is the following three things:
- Methods are defined inside a class, whereas functions can be defined anywhere.
- A method is called on an object made from the class it is defined in: object.method().
- The first parameter (not argument) of a method is the object that the method is called on. This variable is called self by convention. It is there so the method can access attributes and other methods in the object.
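The three points above can be sketched in a few lines. The method name describe is hypothetical and only serves the example. Calling the method on the object and calling it through the class with the object as an explicit first argument should give the same result, which shows where self comes from.

```python
class Codon:
    def describe(self):
        # self is the object the method was called on
        return "I am a codon with triplet " + self.triplet

codon = Codon()
codon.triplet = "ATG"

# the usual way: Python passes codon as self for us
via_object = codon.describe()

# the same call spelled out: codon is handed over as the first parameter
via_class = Codon.describe(codon)

print(via_object)
print(via_class)
```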
Let's write an __init__ method that sets the two attributes. Note how the argument "ATG" given to the class for constructing the codon object is passed to the __init__ method as the second parameter. The first parameter is always self.
# the initialization method
class Codon:
    # first parameter is the object the method is called on - self by convention
    def __init__(self, triplet):
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

codon = Codon("ATG")
print codon
print codon.triplet
print codon.aa
This __init__ method is only called when the object is constructed and only serves to do the initial things needed to create the object the way we want it.
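If you want to convince yourself that __init__ runs exactly once per object, a sketch like the following may help. The calls list is a hypothetical helper added only to record each run, and the small aminoAcidMap dictionary is a stand-in for the one from the course module, assuming one-letter amino acid codes.

```python
# Assumed stand-in for the course's aminoAcidMap (only two entries).
aminoAcidMap = {"ATG": "M", "GTT": "V"}

calls = []  # hypothetical helper: records every time __init__ runs

class Codon:
    def __init__(self, triplet):
        calls.append(triplet)
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

codon = Codon("ATG")      # __init__ runs here
aa = codon.aa             # reading attributes does not run it again
triplet = codon.triplet
other = Codon("GTT")      # a second object means a second run

print(calls)
```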
A method with no arguments
Task: write a method that prints the aa attribute.
class Codon:

    def __init__(self, triplet):
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

    def printAA(self):
        print self.aa

codon = Codon("ATG")
codon.printAA()
Notice how the printAA method takes no arguments but still takes one parameter because self (the object it is called on) is always the first parameter of a method.
Calling the printAA method is pretty much like saying "Hey codon - print your amino acid!". Notice how the perspective is different from using a function, where you would say "Hey function - print the amino acid of this codon object!"
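The two perspectives can be put side by side in a short sketch. The names getAA and getAminoAcid are made up for the comparison, and the one-entry aminoAcidMap is an assumed stand-in for the course dictionary. Both versions return the value instead of printing it so the results can be compared.

```python
# Assumed stand-in for the course's aminoAcidMap.
aminoAcidMap = {"ATG": "M"}

class Codon:
    def __init__(self, triplet):
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

    def getAA(self):
        # method: "Hey codon - what is your amino acid?"
        return self.aa

def getAminoAcid(codon):
    # function: "Hey function - what is the amino acid of this codon?"
    return codon.aa

codon = Codon("ATG")
from_method = codon.getAA()
from_function = getAminoAcid(codon)

print(from_method)
print(from_function)
```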
A method with argument(s)
Task: write a method isSameAA that returns True if the AAs are the same and False otherwise:
class Codon:

    def __init__(self, triplet):
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

    def isSameAA(self, other):
        return self.aa == other.aa

    def printAA(self):
        print self.aa

codon1 = Codon("GTT")
codon1.printAA()
codon2 = Codon("GTC")
codon2.printAA()
print codon1.isSameAA(codon2)
If you call the method like this: codon1.isSameAA(codon2), the self parameter receives the value of codon1 and the other parameter receives the value of codon2.
Operator overloading
You may have wondered why all the following makes sense in Python:
4 == 5                 # returns False because the numbers are different
[1,2,3] == [1,2,3]     # returns True because all elements are the same
"kasper" == "munch"    # returns False because the strings are not identical
How does Python know what is "meant" when we test the equality of such different things? The answer is that Python interprets x == y as x.__eq__(y). As you know, everything is an object in Python. What happens is that Python looks in the x object for a method called __eq__ (eq for equal). If it finds it, as it does for integers, lists, strings and many other objects, it runs that method and returns whatever that method returns. If the class does not define one, Python falls back to checking whether x and y are the very same object. This means that we can write our own __eq__ method to define what happens when two Codon objects are compared with ==. Actually, we already did this with our isSameAA method, so all we need to do is rename it to __eq__. And voila, now we can write codon1 == codon2 and Python will know that what we mean is whether the two codons code for the same amino acid. We have overloaded the == operator with new meaning.
The same kind of thing happens when you print an object. A way to tell Python what to print is by defining the __str__ method. This method should return (not print) what print is supposed to print. We did this already too. All we need to do is rename printAA to __str__ and make it return the aa string attribute instead of printing it.
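To see what defining __eq__ changes, the sketch below compares two nearly identical classes. A class that does not define __eq__ still supports ==, but the comparison then only asks whether both sides are the same object. PlainCodon is a hypothetical name for the version without __eq__, and the small aminoAcidMap is an assumed stand-in for the course dictionary.

```python
# Assumed stand-in for the course's aminoAcidMap (GTT and GTC both code for valine).
aminoAcidMap = {"GTT": "V", "GTC": "V"}

class PlainCodon:
    # no __eq__ here, so == only checks for the same object
    def __init__(self, triplet):
        self.aa = aminoAcidMap[triplet]

class Codon:
    def __init__(self, triplet):
        self.aa = aminoAcidMap[triplet]

    def __eq__(self, other):
        # two codons are equal if they code for the same amino acid
        return self.aa == other.aa

plain_result = PlainCodon("GTT") == PlainCodon("GTC")
codon_result = Codon("GTT") == Codon("GTC")

print(plain_result)
print(codon_result)
```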
class Codon:

    def __init__(self, triplet):
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

    def __eq__(self, other):
        print self.aa, other.aa
        return self.aa == other.aa

    def __str__(self):
        return self.aa

codon1 = Codon("GTT")
codon2 = Codon("GTC")
print codon1, codon2
print codon1 == codon2
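The __str__ method is not only used by print. As a minimal sketch, the built-in str function and %s string formatting should pick it up as well. The one-entry aminoAcidMap is again an assumed stand-in, and as_text and label are names invented for the example.

```python
# Assumed stand-in for the course's aminoAcidMap.
aminoAcidMap = {"GTT": "V"}

class Codon:
    def __init__(self, triplet):
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

    def __str__(self):
        # return (not print) the text that represents this codon
        return self.aa

codon = Codon("GTT")

as_text = str(codon)                  # str() calls codon.__str__()
label = "amino acid: %s" % codon      # %s formatting uses str() too

print(as_text)
print(label)
```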
Constructing an ORF Class
Using classes in classes...
Task: write an __init__ method that takes an argument that is an orf sequence. The __init__ method should split the orf into codons and make an attribute that is a list of Codon objects. To help you, here is the code you used in the ORF exercise for the same purpose:
seq = "ATGGTTTAG"
codonList = []
for i in range(0, len(seq), 3):
    codonList.append(seq[i:i+3])
Once you have done this, implement
- a __str__ method that returns the amino acid sequence as a string.
- an __eq__ method that returns True if the amino acid strings are identical and False otherwise.
print '=' * 30

class ORF:

    def __init__(self, seq):
        self.codonList = []
        for i in range(0, len(seq), 3):
            s = seq[i:i+3]
            c = Codon(s)
            self.codonList.append(c)
        #self.codonList = [Codon(seq[i:i+3]) for i in range(0, len(seq), 3)]

    def __str__(self):
        s = ''
        for codon in self.codonList:
            s += str(codon)
        return s

    def __eq__(self, other):
        # compare the two objects involved, not the global variable orf
        return str(self) == str(other)

orf = ORF("ATGGTTTAG")
print orf

orf1 = ORF("ATGGTTTAG")
orf2 = ORF("ATGGTATAG")
print orf1 == orf2
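For comparison, here is a more compact sketch of the same ORF class, using the list comprehension hinted at in the commented-out line and "".join for building the string. The four-entry aminoAcidMap is an assumed stand-in for the course dictionary, and using "*" for the stop codon is an assumption of this example.

```python
# Assumed stand-in for the course's aminoAcidMap; "*" for stop is an assumption.
aminoAcidMap = {"ATG": "M", "GTT": "V", "GTA": "V", "TAG": "*"}

class Codon:
    def __init__(self, triplet):
        self.triplet = triplet
        self.aa = aminoAcidMap[triplet]

    def __str__(self):
        return self.aa

class ORF:
    def __init__(self, seq):
        # one Codon object per three bases
        self.codonList = [Codon(seq[i:i+3]) for i in range(0, len(seq), 3)]

    def __str__(self):
        # glue the amino acids of all codons together
        return "".join([str(codon) for codon in self.codonList])

    def __eq__(self, other):
        # equal if the amino acid strings are identical
        return str(self) == str(other)

orf1 = ORF("ATGGTTTAG")
orf2 = ORF("ATGGTATAG")

print(str(orf1))
print(orf1 == orf2)
```

The two sequences differ in one base, but GTT and GTA both code for valine, so the two ORF objects compare as equal.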
Now that you have learned how Python knows how to test the equality of pairs of very different things, you probably also have a clue how it knows how to iterate over very different things like strings, lists and dictionaries.
for character in "kasper":
    print character

for element in [1,2,3,4]:
    print element

for key in {1: 5, 6: 2}:
    print key
The secret is that when an object is placed after the in keyword in a for loop, Python looks for an __iter__ method, which must return an object that has a method called next. In our case we just return self, the object itself. But in doing so we need to make sure that the object has a next method that returns the things we want, one by one. For every iteration of the for loop, next is called on the object. Here we implement it so that it returns each Codon object in turn (which prints as its amino acid). We use the idx attribute to keep track of which element we are at. Once there are no more elements to return, we tell Python by writing raise StopIteration. You don't have to know why, but you can look it up.
print '=' * 30

class ORF:

    def __init__(self, seq):
        self.codonList = []
        for i in range(0, len(seq), 3):
            s = seq[i:i+3]
            c = Codon(s)
            self.codonList.append(c)
        #self.codonList = [Codon(seq[i:i+3]) for i in range(0, len(seq), 3)]

    def __str__(self):
        s = ''
        for codon in self.codonList:
            s += str(codon)
        return s

    def __eq__(self, other):
        # compare the two objects involved, not the global variable orf
        return str(self) == str(other)

    def __iter__(self):
        # start from the first codon every time a new loop begins
        self.idx = 0
        return self

    def next(self):
        if self.idx < len(self.codonList):
            i = self.idx
            self.idx += 1
            return self.codonList[i]
        else:
            raise StopIteration

orf = ORF("ATGGTTTAG")
print orf

orf1 = ORF("ATGGTTTAG")
orf2 = ORF("ATGGTATAG")
print orf1 == orf2

for codon in orf:
    print codon

for codon in orf:
    print codon, codon.triplet
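What the for loop does behind the scenes can be sketched by driving the same protocol by hand with the built-in iter and next functions. Triplets is a hypothetical, stripped-down class that holds plain triplet strings rather than Codon objects. Python 3 looks for a method named __next__ instead of next, so this sketch defines __next__ and adds next as a second name for it, which lets it run on both versions.

```python
class Triplets:
    def __init__(self, seq):
        self.items = [seq[i:i+3] for i in range(0, len(seq), 3)]

    def __iter__(self):
        # reset the position and hand back the object itself
        self.idx = 0
        return self

    def __next__(self):
        if self.idx < len(self.items):
            item = self.items[self.idx]
            self.idx += 1
            return item
        raise StopIteration

    next = __next__   # Python 2 looks for the name next

triplets = Triplets("ATGGTTTAG")

iterator = iter(triplets)   # what a for loop does first: calls __iter__
first = next(iterator)      # then it asks for one element at a time
second = next(iterator)
third = next(iterator)

# one request too many: the object signals that it is done
try:
    next(iterator)
    finished = False
except StopIteration:
    finished = True

# a fresh loop starts from the beginning because __iter__ resets idx
again = [t for t in triplets]
print(again)
```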