{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bi 1x 2015: Introduction to Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*This tutorial was generated from an IPython notebook. You can download the notebook [here](intro_to_python.ipynb).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have already installed [Anaconda](https://store.continuum.io/cshop/anaconda/). Anaconda contains most of what we need to do scientific computing with Python. At the most basic level, it has Python 2.7. It contains other modules we will make heavy use of, the three most important ones being [NumPy](http://www.numpy.org/), [matplotlib](http://matplotlib.org/), and [IPython](http://ipython.org). We will also make heavy use of [SciPy](http://www.scipy.org/) and [scikit-image](http://scikit-image.org/) throughout the course.\n", "\n", "In this tutorial, we will first learn some of the basics of the Python programming language while exploring the properties of NumPy's very useful (and as we'll see, ubiquitous) `ndarray` data structure. Finally, we'll load some data and use matplotlib to generate plots." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started with Anaconda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To launch Anaconda, simply double click on the Anaconda icon and a window with four options will appear. \n", "\n", "The second option is to launch an IPython notebook. IPython notebooks are great for creating tutorials - in fact the tutorial you're following right now is in an IPython notebook! The beauty of using an IPython notebook is that you can combine professional typesetting with individual sections of code. The code can be run section by section, or the whole document can be run at once.\n", "\n", "For most of your programming, you will use the Integrated Development Environment called Spyder. To launch Spyder, click on the last option. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Navigating Spyder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Open Spyder and you will see that the window is divided into two sides. On the left is the Editor. This what you will use to type up your code. \n", "\n", "On the right is the console. Commands can by typed into the console and will be run immediately (when you hit enter). Conveniently, by default the console uses IPython, which is much easier to work with than a standard Python prompt.\n", "\n", "It is possible to run your entire program by typing it line by line into the console, but this is strongly discouraged. While it is often convenient when you first start programming to type a command into the console to see if it has worked or failed, the commands you need are often lost, and going back to rerun parts of the program require that you type the commands in again, or seach through your command history. This is time consuming and can lead to mistakes. Use the console as a resource to make sure individual lines of code are working correctly while building your code, but always build and run your code from the Editor. You can run your code in the console by using `run my_code.py`, where the file `my_code.py` contains your script.\n", "\n", "SETTING THE PATH: \n", "This is important, because it's the most common reason many people's code fails when they first open Spyder. You have to set the path, that way Spyder knows where to go looking to find the files you want it to run. \n", "\n", "To set the path, cick on the large blue folder on the top right of Spyder and navigate to the directory where your files are located. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started Programming" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get started, we're going to learn some basic commands. 
These can be run within this IPython notebook, but I would suggest that you type them into the console if you can, as typing them will help you remember the commands." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `Hello, world.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a new programmer, your first task is to greet the world with your new set of skills! We'll start by printing `Hello, world.` In the console, type the following: " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, world.\n" ] } ], "source": [ "print('Hello, world.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do you see \"`Hello, world.`\" printed to the screen right below your command? Python is taking your input and printing it out in the console. If you were to type this into your editor and run the `.py` document, \"`Hello, world.`\" would still be shown in the console; this is where Python will print anything you ask to see. \n", "\n", "Here we see the syntax for function calls in Python. Function arguments are enclosed in parentheses. We also learned another important piece of Python syntax: strings are enclosed in single (or double) quotes.\n", "\n", "Now type the following into your console: " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, world.\n" ] } ], "source": [ "# This prints hello world to the console\n", "print('Hello, world.')  # this is the command" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how even though you added words to the line, only \"`Hello, world.`\" is printed. The `#` starts a comment; anything after the `#` will not be read or interpreted by Python. This is how you add notes to your program about what certain lines do. 
Including comments in your code is essential so that other people can read your code and understand what you are trying to accomplish. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python 2 and Python 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two versions of Python that are currently in wide use, Python 2 and Python 3. Python 3 is the future, but many (many, many) packages are still written in Python 2. We will use Python 2 for this course. However, Python 3 has two changes that are very useful and not backwards compatible. The most important is that division in Python 3 is different. We'll demonstrate first by dividing two numbers in Python 2." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "5 / 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's right, division of integers returns the floor. This is not the case in Python 3, and we will use Python 3's division operator. We haven't talked about modules yet (we will below), but to ensure that Python 3 division happens, **you need to have the following at the beginning of every bit of code you write.**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Set up Python 2 so that it divides and prints like Python 3\n", "from __future__ import division, print_function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have also specified that we will use Python 3 style printing, which we have implicitly done already. This is cosmetic, so we won't talk about it more here.\n", "\n", "**Warning**: Integer division is a very common source of bugs in Python. Keep this in mind when debugging code. 
Make sure you have imported `division` from \"the `__future__`!\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variable Assignment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To assign a variable in Python, we use an `=`, just like you would expect from any math class. Arithmetic operations are also as expected: `+`, `-`, `*`, `/`. \n", "\n", "Try the following: " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a is 3\n", "b is 27\n", "c is 16.5\n", "d is 16\n" ] } ], "source": [ "a = 3\n", "\n", "b = a**3\n", "\n", "c = (b + 2*a) / 2\n", "\n", "d = (b + 2*a) // 2\n", "\n", "print('a is', a)\n", "print('b is', b)\n", "print('c is', c)\n", "print('d is', d)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the assignment of the variable `b`, we used the `**` operator. This means \"raise to the power of.\" In the assignment of the variable `d`, we used the `//` operator. This does Python 2-style integer division, returning the floor of the result of the division." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists, tuples, slicing, and indexing " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list is a mutable array, meaning it can be edited after it is created. Let's explore below! Notice that a list is created using `[]`. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_list is [1, 2, 3, 4]\n", "my_list is now [3.14, 2, 3, 4]\n", "the last element in my_list is 4\n" ] } ], "source": [ "my_list = [1,2,3,4]\n", "print('my_list is', my_list)\n", "\n", "my_list[0] = 3.14\n", "print('my_list is now', my_list)\n", "print('the last element in my_list is', my_list[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that indexing is done with brackets, `[]`. 
Notice also that **in Python, indexing starts at zero**! Also notice that we can index the last element of a list with `-1`.\n", "\n", "A tuple is just like a list, but it's immutable, meaning that once created, it cannot be changed. Let's try the same thing as above, but with a tuple, created with `()`. Notice that indexing into the tuple is still done with `[]`, as in the last line of the cell below. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tuple 1 is (1, 2, 3, 4)\n" ] }, { "ename": "TypeError", "evalue": "'tuple' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m# This will make Python scream at us because tuples are immutable.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mmy_tuple\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m3.14\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" ] } ], "source": [ "# Create a tuple and print it\n", "my_tuple = (1,2,3,4)\n", "print('tuple 1 is', my_tuple) \n", "\n", "# This will make Python scream at us because tuples are immutable.\n", "my_tuple[0] = 3.14" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the error? Python is objecting because it cannot replace the `1` in `my_tuple[0]` with `3.14`; that operation is not supported. If you try printing out `my_tuple` again, you will see it hasn't been changed. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A string is just a bunch of letters, or letters and numbers, strung together. It can also be indexed, like we did above with the list and the tuple. Let's look at our favorite phrase. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The fifth letter in the phrase is o\n", "The first four letters are Hell\n" ] } ], "source": [ "my_string = 'Hello, world.'\n", "print('The fifth letter in the phrase is', my_string[4])\n", "print('The first four letters are', my_string[0:4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "IMPORTANT! Python interprets `[0:4]` as $[0,4)$, so be careful when pulling out strings of specific length. Pulling small strings out of our larger string is called slicing, and can also be done with lists and tuples. This can be very powerful, as we can even pull out pieces at regular intervals. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4, 5]\n", "[6, 7, 8, 9, 10]\n", "[1, 3, 5, 7, 9]\n", "[2, 5, 8]\n" ] } ], "source": [ "my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n", "\n", "a = my_list[0:5]\n", "print(a)\n", "\n", "b = my_list[5:]\n", "print(b)\n", "\n", "c = my_list[0:10:2]\n", "print(c)\n", "\n", "d = my_list[1:10:3]\n", "print(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure you notice how we create lists c and d. We select the entries in the list from position 0 to 9, selecting every 2 or 3, respectively. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objects, types, and methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python is object-oriented, and all values in a program are objects. 
An object consists of an identity (where it is stored in memory), a type (a definition of how the object is represented), and data (the value of the object). An object of a given type can have various methods that operate on the data of the object. How do we keep track of what our variables are? Fortunately, Python has a function for this, called `type`. Let's try it out. First, we'll create some new objects. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the type of a is <type 'int'>\n", "the type of b is <type 'float'>\n", "the type of my_list is <type 'list'>\n", "the type of my_list[0] is <type 'int'>\n", "the type of my_list[1] is <type 'float'>\n", "the type of my_list[2] is <type 'str'>\n" ] } ], "source": [ "a = 4\n", "b = 4.6\n", "\n", "my_list = [1, 3.49, 'bi1x']\n", "\n", "print('the type of a is', type(a))\n", "print('the type of b is', type(b))\n", "print('the type of my_list is', type(my_list))\n", "print('the type of my_list[0] is', type(my_list[0]))\n", "print('the type of my_list[1] is', type(my_list[1]))\n", "print('the type of my_list[2] is', type(my_list[2]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is most important to notice here is that `my_list` is a list, and that it can contain many different objects, from numbers to strings. \n", "\n", "The data are very straightforward: they are the numbers and values that you associate with your variable. \n", "\n", "Finally, objects have methods that can perform operations on the data. A method is called similarly to a function. This is best seen by example."
 ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the number of 5's in the list is 2\n", "the number of 4's in the list is 1\n", "[1, 1, 3, 3, 4, 5, 5, 13, 17, 19, 31]\n", "[1, 1, 3, 3, 4, 5, 5, 13, 17, 19, 31, 'bi1x']\n" ] } ], "source": [ "my_list = [1 , 5 , 4 , 13 , 3 , 5 , 19 , 31 , 3 , 1 , 17]\n", "\n", "print('the number of 5\\'s in the list is', my_list.count(5))\n", "print('the number of 4\\'s in the list is', my_list.count(4))\n", "\n", "# Sort the list in place\n", "my_list.sort()\n", "\n", "print(my_list)\n", "\n", "# Tack on a string to the end of the list\n", "my_list.append('bi1x')\n", "print(my_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, a list has several methods, including `count` and `sort`. They are called just like functions, with arguments in parentheses. The name of the method comes after the object name followed by a dot\n", "(`.`).\n", "\n", "The `count` method takes a single argument and returns the number of times that argument appears in the list. The `sort` method takes no arguments (but still requires open and closed parentheses to be called), and sorts the list in place. Note that `my_list` has been changed and is now sorted. We also use the method `append`, which adds another element to the end of a list.\n", "\n", "IPython conveniently allows you to see what methods and data are available by entering the object name followed by a dot (`.`), and then pressing tab. Try it!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scientific software packages such as [Matlab](http://www.mathworks.com/products/matlab/) and [Mathematica](http://www.wolfram.com/mathematica/) are commonly optimized for a flavor of scientific computing (such as matrix computation in the case of Matlab) and are rather full-featured. 
On the other hand, Python is a programming language. It was not specifically designed to do scientific computing. So, plain old Python is very limited in scientific computing capability.\n", "\n", "However, Python is very flexible and allows use of **modules**. A module contains classes, functions, attributes, data types, etc., beyond what is built in to Python. In order to use a module, you need to import it to make it available for use. So, as we begin working on data analysis and simulations in Bi 1x, we need to import modules we will use.\n", "\n", "The first things we will import come from the `__future__` module, which we have already seen. This is a special module that enables use of Python 3 standards while running Python 2. Having these in place in your code will help you in the future when you eventually migrate to Python 3. In addition to `division` and `print_function`, you can also import `unicode_literals` and `absolute_import` to make your code more fully Python 3 compliant. The latter two are not necessary for Bi 1x, though." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from __future__ import division, print_function, \\\n", "    absolute_import, unicode_literals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The construction `from <module> import <attributes>` puts `<attributes>` (the things you imported) in the namespace. This construction allows you to pick and choose what attributes you want to import from a given module. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to note that until we imported the `__future__` module, its capacities were not available. Keep that in mind: *Plain old Python won't do much until you import a module.*\n", "\n", "Let's now import one of the major workhorses of our class, NumPy!"
 ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "circumference / diameter = 3.14159265359\n", "cos(pi) = -1.0\n" ] } ], "source": [ "# Importing is done with the import statement\n", "import numpy as np\n", "\n", "# We now have access to some handy things, e.g., np.pi\n", "print('circumference / diameter = ', np.pi)\n", "print('cos(pi) = ', np.cos(np.pi))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that we used the `import ... as` construction. This enabled us to abbreviate the name of the module so we do not have to type `numpy` each time.\n", "\n", "Also, notice that to access the (approximate) value of $\\pi$ in the `numpy` module, we prefaced the name of the attribute (`pi`) with the module name followed by a dot (`np.`). This is generally how you access attributes in modules.\n", "\n", "We're already getting dangerous with Python. So dangerous, in fact, that we'll write our own module!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Writing your own module (and learning a bunch of syntax!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modules are stored in files ending in `.py`. As an example, we will create a module that finds the roots of the quadratic equation\n", "\n", "\\begin{align}\n", "ax^2 + bx + c = 0.\n", "\\end{align}\n", "\n", "Using the Editor in Spyder, we'll create a file called `quadratic.py` containing the code below. The file should be saved in a directory on Python's module search path (`sys.path`, which includes the directories in your `PYTHONPATH` environment variable and usually the present working directory) so that the interpreter can find it." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\"\"\"\n", "*** This should be stored in a file quadratic.py. 
***\n", "\n", "Quadratic formula module\n", "\"\"\"\n", "from __future__ import division, print_function\n", "import numpy as np\n", "\n", "\n", "# ############\n", "def discriminant(a, b, c):\n", "    \"\"\"\n", "    Returns the discriminant of a quadratic polynomial\n", "    a * x**2 + b * x + c = 0.\n", "    \"\"\"\n", "    return b**2 - 4.0 * a * c\n", "\n", "\n", "# ############\n", "def roots(a, b, c):\n", "    \"\"\"\n", "    Returns the roots of the quadratic equation\n", "    a * x**2 + b * x + c = 0.\n", "    \"\"\"\n", "    delta = discriminant(a, b, c)\n", "    root_1 = (-b + np.sqrt(delta)) / (2.0 * a)\n", "    root_2 = (-b - np.sqrt(delta)) / (2.0 * a)\n", "\n", "    return root_1, root_2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a whole bunch of syntax in there to point out.\n", "- Even though we may have already imported NumPy and items from `__future__` in our Python session, we need to explicitly import them (and any other module we need) in the `.py` file. This ensures that any time we call the function it has the operations it needs to run. \n", "- A function is defined within a module with the `def` statement. It has the function prototype, followed by a colon.\n", "- **Indentation in Python matters!** Everything indented after the `def` statement is part of the function. Once the indentation goes back to the level of the `def` statement, you are no longer in the function.\n", "- We can have multiple functions in a single module (in one `.py` file).\n", "- The `return` statement is used to return the result of a function. If multiple objects are returned, they are separated by commas.\n", "- The text within triple quotes is a **doc string**. Doc strings say what the function or module does. These are essential for people to know what your code is doing.\n", "\n", "Now, let's test our new module out! Note that because this tutorial was prepared in an IPython notebook, we do not import the module because it was not created in a separate file. 
We have noted how the syntax changes in the comments." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "roots: 3.0 -0.666666666667\n" ] } ], "source": [ "# When you run from the console, you will need to import the module.\n", "# Uncomment the line below.\n", "# import quadratic as qd\n", "\n", "# Python has nifty syntax for making multiple definitions on the same line\n", "a, b, c = 3.0, -7.0, -6.0\n", "\n", "# Call the function and print the result. You will call qd.roots,\n", "# since you imported the function from a module\n", "root_1, root_2 = roots(a, b, c)\n", "print('roots:', root_1, root_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Very nice! Now, let's try another example. This one might have a problem...." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "roots: nan nan\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/Justin/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py:26: RuntimeWarning: invalid value encountered in sqrt\n", "/Users/Justin/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py:27: RuntimeWarning: invalid value encountered in sqrt\n" ] } ], "source": [ "# Specify a, b, and c that will give imaginary roots\n", "a, b, c = 1.0, -2.0, 2.0\n", "\n", "# Call the function and print the result (again, call qd.roots)\n", "root_1, root_2 = roots(a, b, c)\n", "print('roots:', root_1, root_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oh no! It gave us `nan`, which means \"not a number,\" as our roots. It also gave some warning that it encountered invalid (negative) arguments for the `np.sqrt` function. The roots should be $1 \\pm i$, where $i = \\sqrt{-1}$. 
We will use this opportunity to introduce Python's control flow, starting with an `if` statement." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Control flow: the `if` statement" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will decree that our quadratic equation solver only handles real roots, so it will raise an **exception** if an imaginary root is encountered. So, we modify the contents of the file `quadratic.py` as follows." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\"\"\"\n", "*** This should be stored in a file quadratic.py. ***\n", "\n", "Quadratic formula module\n", "\"\"\"\n", "from __future__ import division, print_function\n", "import numpy as np\n", "\n", "\n", "# ############\n", "def discriminant(a, b, c):\n", "    \"\"\"\n", "    Returns the discriminant of a quadratic polynomial\n", "    a * x**2 + b * x + c = 0.\n", "    \"\"\"\n", "    return b**2 - 4.0 * a * c\n", "\n", "\n", "# ############\n", "def roots(a, b, c):\n", "    \"\"\"\n", "    Returns the roots of the quadratic equation\n", "    a * x**2 + b * x + c = 0.\n", "    \"\"\"\n", "    delta = discriminant(a, b, c)\n", "\n", "    if delta < 0.0:\n", "        raise ValueError('Imaginary roots! We only do real roots!')\n", "    else:\n", "        root_1 = (-b + np.sqrt(delta)) / (2.0 * a)\n", "        root_2 = (-b - np.sqrt(delta)) / (2.0 * a)\n", "\n", "    return root_1, root_2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have now exposed the syntax for a Python `if` statement. The conditional expression ends with a colon, just like the `def` statement. Note the indentation of blocks of code after the conditionals. (We actually did not need the `else` statement, because the program would just continue without the exception, but I left it there for illustrative purposes. 
It is actually preferred not to have the `else` statement.)\n", "\n", "Now if we re-import the module (we can use the Python function `reload` in the console for this), the `if` statement will catch our imaginary roots and raise an exception. Note that *you must reload (or start over again and import) the module before your changes take effect.*" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "ename": "ValueError", "evalue": "Imaginary roots! We only do real roots!", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m# Pass in parameters that will give imaginary roots (use qd.roots)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m2.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2.0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0mroot_1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mroot_2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mroots\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mroots\u001b[0;34m(a, b, c)\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 27\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdelta\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 28\u001b[0;31m \u001b[0;32mraise\u001b[0m 
\u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Imaginary roots! We only do real roots!'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 29\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 30\u001b[0m \u001b[0mroot_1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0mb\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqrt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdelta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m2.0\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Imaginary roots! We only do real roots!" ] } ], "source": [ "# Reload the quadratic module using the abbreviated name we already defined\n", "# (Uncomment below)\n", "# reload(qd)\n", "\n", "# Pass in parameters that will give imaginary roots (use qd.roots)\n", "a, b, c = 1.0, -2.0, 2.0\n", "root_1, root_2 = roots(a, b, c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This threw the appropriate exception.\n", "\n", "Congrats! You wrote a functioning module. But now is an important lesson.... " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to do something many times, we can use a `for` loop. This is again best learned by example. Let’s make a function that counts the number of times a subsequence is present in a sequence of DNA. We create a new file called `dna_counter.py` containing the code below."
 ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def n_subseq(seq, subseq):\n", "    \"\"\"\n", "    Given a sequence seq, returns the number of occurrences of subseq.\n", "    \"\"\"\n", "    # Determine the lengths of the sequence and subsequence\n", "    len_subseq = len(subseq)\n", "    len_seq = len(seq)\n", "\n", "    # First make sure subseq is not longer than seq.\n", "    if len_subseq > len_seq:\n", "        return 0\n", "\n", "    # We loop through the sequence to check for matches\n", "    num_subseq = 0\n", "    for i in range(0, len_seq - len_subseq + 1):\n", "        if seq[i:i+len_subseq] == subseq:\n", "            num_subseq += 1  # The += 1 increases the value of a variable by 1\n", "\n", "    # We are done looping now, so return the number of subsequences\n", "    return num_subseq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how it works!" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 3 GATs in the sequence.\n" ] } ], "source": [ "# Uncomment line below\n", "# import dna_counter as dnac\n", "\n", "seq = 'ACTGTACGATCGAGCGATCGAGCGAGTCATTACGACTGAGATCC'\n", "\n", "subseq = 'GAT'\n", "\n", "# Call dnac.n_subseq\n", "n_gat = n_subseq(seq, subseq)\n", "\n", "# Print the result\n", "print('There are %d GATs in the sequence.' % n_gat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we know it works, let’s look at how the loop was constructed. In the statement at the beginning of the loop, we use the function `range` to define an *iterator*. Calling `range(n)` creates a list of integers from `0` to `n-1`. The `for` statement says that the variable `i` will successively take the values of the iterator."
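 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a small sketch of the point above (the numbers here are only illustrative and are not part of `dna_counter.py`), we can look at the values `range` produces and watch `i` step through them. We wrap `range` in `list` so that the printed result looks the same in Python 2 and Python 3." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# The integers 0 through 4; range(5) stops just before 5\n", "print(list(range(5)))\n", "\n", "# range(start, stop) begins at start and stops just before stop\n", "for i in range(2, 5):\n", "    print('i is', i)"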
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Keyword arguments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before concluding our quick trip through the very basics of Python and moving on to NumPy, I want to show a very handy tool in Python: keyword arguments. Before, when we defined a function, we specified its arguments as variable names separated by commas in the `def` statement. We can also\n", "specify keyword arguments. Here is an example from our quadratic equation solver." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\"\"\"\n", "Quadratic formula module\n", "\"\"\"\n", "from __future__ import division, print_function\n", "import numpy as np\n", "\n", "\n", "# ############\n", "def discriminant(a, b, c):\n", "    \"\"\"\n", "    Returns the discriminant of a quadratic polynomial\n", "    a * x**2 + b * x + c = 0.\n", "    \"\"\"\n", "    return b**2 - 4.0 * a * c\n", "\n", "\n", "# ############\n", "def roots(a, b, c, print_discriminant=False,\n", "          message_to_the_world='Bi 1x rules'):\n", "    \"\"\"\n", "    Returns the roots of the quadratic equation\n", "    a * x**2 + b * x + c = 0.\n", "\n", "    If print_discriminant is True, prints discriminant to screen\n", "    \"\"\"\n", "    delta = discriminant(a, b, c)\n", "    if print_discriminant:\n", "        print('discriminant =', delta)\n", "\n", "    if message_to_the_world is not None:\n", "        print('\\n' + '*'*len(message_to_the_world))\n", "        print(message_to_the_world)\n", "        print('*'*len(message_to_the_world) + '\\n')\n", "\n", "    if delta < 0.0:\n", "        raise ValueError('Imaginary roots! We only do real roots!')\n", "    else:\n", "        root_1 = (-b + np.sqrt(delta)) / (2.0 * a)\n", "        root_2 = (-b - np.sqrt(delta)) / (2.0 * a)\n", "\n", "    return root_1, root_2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function `roots` now has two keyword arguments. They are signified by the equals sign in the function definition. 
\n", "\n", "The function has three required arguments, `a`, `b`, and `c`. If the keyword arguments are omitted in the function call, they take on the default values specified in the function definition.\n", "\n", "In the example, the default for `print_discriminant` is `False` and the default for `message_to_the_world` is `'Bi 1x rules'`. Furthermore, the ordering of keyword arguments does not matter when calling the function. They are passed in the function call with the same `name=value` syntax used in the function definition.\n", "\n", "Let's try it!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a, b, c = 3.0, -7.0, -6.0\n", "\n", "# Call qd.roots\n", "root_1, root_2 = roots(a, b, c)\n", "print('roots:', root_1, root_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we did not specify the keyword arguments, the defaults were used. We can specify other values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "root_1, root_2 = roots(a, b, c, print_discriminant=True,\n", "                       message_to_the_world='Bi 1x TAs are the best!')\n", "print('roots:', root_1, root_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Intro to NumPy, SciPy, and Matplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**If you are trying to do a task that you think might be common, it's probably part of NumPy or some other package.** Look, or ask Google, first. In this case, NumPy has a function called `roots` that computes the roots of a polynomial. To figure out how to use it, we can either look at the docstring or look in the [NumPy and SciPy documentation online](http://docs.scipy.org/doc/) (the documentation for `np.roots` is available [here](http://docs.scipy.org/doc/numpy/reference/generated/numpy.roots.html)). 
To look at the docstring, you can enter the following at an IPython prompt:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.roots?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that we need to pass the coefficients of the polynomial whose roots we want as an \"`array_like`\" object. We will discuss what this means in a moment, but for now, we will just use a **list** to specify our coefficients and call the `np.roots` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Define the coefficients in a list (using square brackets)\n", "coeffs = [3.0, -7.0, -6.0]\n", "\n", "# Call np.roots. It returns an np.ndarray with the roots\n", "roots = np.roots(coeffs)\n", "print('Roots for (a, b, c) = (3, -7, -6):', roots)\n", "\n", "# It even handles complex roots!\n", "roots = np.roots([1.0, -2.0, 2.0])\n", "print('Roots for (a, b, c) = (1, -2, 2): ', roots)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some `array_like` data types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous example, we used a list as an `array_like` data type. Python has several native data types. We have already mentioned `int`s and `float`s, though we were not very explicit about it. Python's native `array_like` data types are **lists** and **tuples**. Internally, NumPy functions convert these into **NumPy arrays**, the `array_like` data type we will use most often. NumPy arrays are your new best friend." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `np.ndarray`: maybe your new best friend" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists and tuples can be useful, but for many applications in data analysis, the `np.ndarray`, which we will colloquially call a \"**NumPy array**,\" is most often used. 
NumPy arrays are created using the `np.array` function with a list or tuple as an argument. Once created, we can do all sorts of things with them. Let's play!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make some arrays to see what they look like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "array_1 = np.array([1, 2, 3, 4])\n", "print('array_1:')\n", "print(array_1, '\\n')\n", "\n", "array_2 = np.array([[1, 2], [1, 2]])\n", "print('array_2:')\n", "print(array_2, '\\n')\n", "\n", "array_3 = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])\n", "print('array_3:')\n", "print(array_3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes you want an array of all zero values. NumPy's `np.zeros` function makes one with the shape you give it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "zero_array = np.zeros((3, 4))\n", "print(zero_array, '\\n')\n", "print('the dimensions are', zero_array.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's see how we can do operations on some arrays." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.array([1, 2, 3])\n", "b = np.array([4.0, 5.0, 6.0])\n", "\n", "# Arithmetic operations are done elementwise\n", "print('a:      ', a)\n", "print('b:      ', b)\n", "print('a + b:  ', a + b)\n", "print('a * b:  ', a * b)\n", "print('1.0 + a:', 1.0 + a)\n", "print('a**2:   ', a**2)\n", "print('b**a:   ', b**a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check the data type of our arrays."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(a.dtype)\n", "print(b.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also change the type of the entries of our arrays, for example, from integers to floating point numbers or the other way around." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "array_a = a.astype(float)\n", "print(array_a)\n", "\n", "array_b = b.astype(int)\n", "print(array_b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slicing is also intuitive. As with the string slice we used in `n_subseq`, `start:stop` selects indices from `start` up to, but not including, `stop`. For a two-dimensional array, the row slice comes first and the column slice second, separated by a comma." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])\n", "\n", "print(a, '\\n')\n", "print(a[0:3, 2:3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# view second column\n", "a[:, 1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# see all entries below the value of 10\n", "a[a < 10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using slices, we can reassign values to the entries in an `np.ndarray`. For example, say we wanted the third row to be all zeros." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a[2, :] = np.zeros(a[2, :].shape)\n", "\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also reshape arrays."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = a.reshape(2, 8)\n", "print(a, '\\n')\n", "\n", "a = a.reshape(4, 4)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Subpackages in NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you import NumPy, you get a set of core functions, such as `np.dot`. Some of NumPy’s more specialized functionality is organized into subpackages. For example, random number generation lives in `numpy.random`. It is reachable as `np.random`, but it is often convenient to import the subpackage under its own name,\n", "as we do here." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from numpy import random\n", "a = random.rand(4, 4)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NumPy functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some useful NumPy functions we think you might want to use! Go through these one-by-one in the console." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# create evenly spaced points\n", "np.linspace(0, 1, 10)\n", "\n", "# matrix or vector dot products\n", "np.dot(a, a)\n", "\n", "# concatenate along the row dimension (axis=0, the default)\n", "np.concatenate((a, a))\n", "\n", "# concatenate along the column dimension\n", "np.concatenate((a, a), axis=1)\n", "\n", "# transpose (omit semicolon to see output)\n", "np.transpose(a);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `matplotlib`: our primary plotting tool" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`matplotlib` is a tool for plotting the data stored in NumPy arrays. We will mostly use the interactive plotting module, `matplotlib.pyplot`, which we will abbreviate as `plt`. Its syntax is quite simple, and best learned by example. 
\n", "\n", "We will now write a script to plot some functions. Make a file called `my_first_mpl_plot.py`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "\"\"\"\n", "Make some plots!\n", "\"\"\"\n", "import numpy as np\n", "from numpy import random\n", "import matplotlib.pyplot as plt\n", "\n", "# Magic function for inline graphics in the IPython notebook\n", "# (leave this line out of a plain .py script)\n", "%matplotlib inline\n", "\n", "# Make an x-variable for plotting\n", "x = np.linspace(0, 2*np.pi, 200)\n", "\n", "# This is a nice function\n", "y_1 = np.exp(np.sin(x))\n", "\n", "# We can make another one\n", "y_2 = np.exp(np.cos(x))\n", "\n", "\n", "# We can make some random data to plot as well\n", "x_rand = random.rand(20) * 2 * np.pi\n", "y_rand = random.rand(20) * 3.0\n", "\n", "# Now plot them.\n", "plt.plot(x, y_1, '-')  # The '-' means to use a line plot\n", "plt.plot(x, y_2, '-')\n", "plt.plot(x_rand, y_rand, 'ko')  # The 'ko' means to plot as black circles\n", "\n", "# Label the axes.\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "\n", "# We can save it as a PDF as well\n", "plt.savefig('my_first_mpl_plot.pdf')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Programming style" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**[PEP](http://legacy.python.org/dev/peps/pep-0001/)**s (Python Enhancement Proposals) document suggestions and guidelines for the Python language, its development, etc. My personal favorite is [PEP8, the Style Guide for Python Code](http://legacy.python.org/dev/peps/pep-0008/). This was largely written by Guido van Rossum, the inventor of Python and its benevolent dictator for life. It details how you should style your code. As Guido says, code is much more often read than written. I strongly urge you to follow PEP8 the best you can. It's a lot to read, so I will highlight important points here. If you follow these guidelines, your code will be much more readable than otherwise. 
This is particularly useful because you are working in groups and the TAs need to grade your work.\n", "\n", "- Limit line widths to 79 characters (the line continuation character for Python is `\\`).\n", "- Use spaces between all operators except `**`. E.g., `a = b + c * d**2`.\n", "- In function calls, use a space after each comma. Use no spaces before and after the equals sign when using keyword arguments. E.g., `my_fun(a, b, c=True)`.\n", "- Do not use excess space when indexing. E.g., `a[i]`, not `a [ i ]`.\n", "- Function names should be lowercase, with words separated by underscores as necessary to improve readability.\n", "- Class names should be CamelCase.\n", "- Comment lines should appear immediately before the code they describe. Use in-line comments sparingly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This concludes our introductory tour of Python with some NumPy, SciPy, and matplotlib thrown in for good measure. There is still *much* to learn, but you will pick up more and more tricks and syntax as we go along.\n", "\n", "In the next tutorial, we will use Python to do some image processing. That is, we will write code to extract data of interest from images. As you get more and more proficient, coding (particularly in Python, in my opinion) will be more and more empowering and FUN!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }