Python is Definitely Not a Snake
What is Python? You probably know it’s a programming language. But is Python a program? A language? An ecosystem? It’s all of these and more. What is the actual software that lets us create such useful scripts, modules, packages, and frameworks?
Many programmers, not to mention managers, and dare I say training companies, are justifiably confused about exactly what it takes to develop applications with Python.
This article will discuss which versions and implementations of Python exist, why you might want to choose a specific version/implementation, and how to manage multiple versions of Python on the same computer.
A History Lesson
Guido van Rossum first released Python version 0.9 to the alt.sources USENET newsgroup in 1991. After that release, Guido and many other developers continued to enhance Python for many years, through many major and minor releases. Version 1.0 was released in 1994.
Python 2.5 was released in 2006, and like all releases so far, was backwards compatible all the way back to version 1.0. Around this time, however, the chief Python developers realized that there were some things they’d like to change. Some components were obsolete, badly named, or poorly implemented.
Some of these things could be quietly fixed, but many of them would require changes to the language that would affect existing code. A new version of Python that broke legacy code would be a disaster. The solution to this dilemma was to create a new branch of Python development.
Around the time Python 2.5 was released, Guido and friends started working on Python 3 (jokingly called Python 3000) while simultaneously working on Python 2.6. In October of 2008, Python 2.6 was released, quickly followed in December by Python 3.0. Many of the new features developed for Python 3 were backported to 2.6, as long as they didn’t break existing 2.x code.
After 2.6, it was decided that 2.7 would be the last main release of Python 2. Of course there would be maintenance updates to fix bugs, but the focus of new Python development shifted completely to Python 3. As I write this, in 2015, there is no good reason to use Python 2 unless you have a large amount of legacy code written in Python 2. Guido has set the end-of-life date for Python 2 to be 2020.
You will find that there is not one “killer feature” in Python 3 that makes it better than Python 2. Rather, there are dozens of small and medium improvements. The most visible change might be the new print() function, which replaces the old Python 2 print statement. This new function gives much greater flexibility, allowing the user to specify the separator between items (sep), as well as what is printed after the list of items (end).
Python 2 (separator is always one space)
>>> x = 5
>>> y = 10
>>> z = 15
>>> print x, y, z
5 10 15
Python 3 (separator and end-of-line character can be specified)
>>> x = 5
>>> y = 10
>>> z = 15
>>> print(x, y, z)
5 10 15
>>> print(x, y, z, sep=',')
5,10,15
>>> print(x, y, z, sep='')
51015
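The end parameter controls what is printed after the last item. Here is a minimal sketch that uses sep and end together. The io.StringIO buffer is only there to make the exact characters easy to inspect, and the variable names are illustrative.

```python
import io

# Capture print() output in a buffer so the exact characters are visible.
buffer = io.StringIO()

x, y, z = 5, 10, 15

# sep goes between the items; end replaces the default trailing newline.
print(x, y, z, sep=", ", end="!\n", file=buffer)

# end=" " suppresses the newline, so the next print() continues the same line.
print("total:", end=" ", file=buffer)
print(x + y + z, file=buffer)

output = buffer.getvalue()
print(output, end="")
```

The file argument is another thing the old print statement could only express with special syntax; here it is just one more keyword argument.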
Another improvement: In Python 2, the ASCII string type (str) was also used for binary data, and there was a second type, unicode, to handle Unicode characters. This was confusing and sometimes tricky to work with, so in Python 3 we now have the bytes type for binary data, and all strings (still type str) are Unicode. This is a lot simpler to work with.
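To make the str/bytes split concrete, here is a small Python 3 sketch. The sample word and the choice of UTF-8 are my own assumptions for illustration; any text and any encoding would show the same idea.

```python
# str holds Unicode text; bytes holds raw binary data.
text = "caf\u00e9"                 # four characters, the last one is e-acute
data = text.encode("utf-8")        # explicit conversion from text to bytes

print(type(text).__name__, len(text))   # character count
print(type(data).__name__, len(data))   # byte count (e-acute takes two bytes in UTF-8)

# Decoding with the same encoding gets the original text back.
round_trip = data.decode("utf-8")
print(round_trip == text)

# Python 3 refuses to mix the two types silently.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print("mixing str and bytes allowed:", mixed_ok)
```

Because the conversion is always explicit, encoding mistakes tend to show up right where they happen rather than far downstream.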
These differences slowed the adoption of Python 3. At first, this was because, although all of the standard library (the included modules) had been ported to Python 3, some of the most useful “external” modules had not. After a while, some programmers put up the Python 3 Wall of Shame to show which modules had been ported. When the proportion of ported modules reached 50%, they renamed it to the Python 3 Wall of Superpowers. Nowadays, the vast majority of useful modules have been ported.
See https://docs.python.org/3/whatsnew/3.0.html for the official summary of differences between versions 2 and 3.
There are several implementations of Python, written in different languages and with different internals. This section describes the various implementations and why you might choose them.
CPython is the “normal” implementation of Python that most people use. It is written in C, and is available in binary or source form from http://www.python.org/download. While there is not an ANSI standard for Python, CPython is considered the reference implementation, and it is what the other implementations in this section emulate.
CPython is implemented as an executable interpreter, usually named python.exe or python, depending on the platform.
Bottom line: Unless you have very special requirements, you should be using CPython.
All of the following implementations are workalikes to CPython; some do not implement every feature, and not all modules written for CPython are available for them.
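If you are not sure which implementation a given python command actually runs, the standard platform module can tell you. This is a minimal sketch; the function name describe_interpreter is just something I made up for the example.

```python
import platform


def describe_interpreter():
    """Return the implementation name and version of the running interpreter."""
    # python_implementation() returns a name such as "CPython", "PyPy",
    # "Jython", or "IronPython".
    name = platform.python_implementation()
    version = platform.python_version()
    return "{} {}".format(name, version)


print(describe_interpreter())
```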
The language Cython (not to be confused with CPython) is an extension of Python. It adds optional static type declarations and lets you call C code directly. The three main advantages of Cython are speed, speed, and speed. Your Python source code is converted to C, which is then compiled by a standard C compiler.
As you might guess, the overriding use case for Cython is to speed up your code; a secondary reason is to easily create C extensions.
You might want to use Cython if you have a scientific or data analysis algorithm that is too slow in normal Python and not implemented in SciPy or one of its submodules.
Available from: http://cython.org/. Note: Cython is not available via pyenv.
PyPy is an implementation of Python that is designed for fast execution via a just-in-time compiler. It can save memory as well. It supports most popular Python modules. Other than speed and memory, a third advantage of PyPy is that it incorporates the features of Stackless, described below.
Use PyPy if it is important to greatly improve the performance of large number-crunching Python applications. Do not use PyPy if you are writing small utility scripts, your code has a lot of I/O, or if you need C extensions. Also, PyPy may not work if you need certain widely used libraries that are written in C. This includes some modules in the standard library.
It’s best to think of PyPy as an experimental version, although I’m sure the developers would take issue with me for saying that.
Available from: http://pypy.org/ or use pyenv, as described below.
Jython is an implementation of Python in Java. It provides an interpreter, the standard library, and other tools. In addition, it can load and use classes written in Java. Jython releases tend to lag behind CPython releases.
The use case for Jython is when you have a Java codebase that you need to access from Python.
Available from: http://www.jython.org/ or use pyenv.
IronPython is to .NET what Jython is to Java, in pretty much every way. IronPython scripts can import and use any libraries in the .NET framework.
Available from: http://ironpython.net/download/ or use pyenv, as described below.
Stackless Python (sometimes called just “Stackless”) is a Python implementation that uses microthreads, which avoids some limitations of threading in CPython. However, it does not allow a program to utilize multiple CPU cores.
Available from: http://www.stackless.com/ or use pyenv. (At the time of this writing, the web site is having technical issues: “Dear visitor, due to technical problems, the Stackless website is currently almost completely down.”)
Corralling Your Code Into Useful Units
Larry Wall, the developer of Perl, once said “Suppose you went back to Ada Lovelace and asked her the difference between a script and a program. She’d probably look at you funny, then say something like: Well, a script is what you give the actors, but a program is what you give the audience.” For most interpreted languages like Python, a script is a program.
A script is the basic unit of Python programming. It consists of a text file containing Python commands, and is executed by the Python interpreter. Python scripts typically end in .py. While this is not, strictly speaking, required, it is a good idea. An application can be a single script (one file), but generally applications contain a “main” script and many modules.
Many scripts have a main() function that is the program entry point, but this is a convention, not a language requirement.
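Here is a minimal sketch of that convention. The names greet and main are my own choices for the example; Python attaches no special meaning to either, and it is the __name__ check at the bottom that actually starts the script.

```python
import sys


def greet(name):
    """Build the greeting text for one name."""
    return "Hello, {}!".format(name)


def main(args):
    """Entry point by convention only; Python never calls this automatically."""
    names = args or ["world"]
    for name in names:
        print(greet(name))
    return 0


# __name__ is "__main__" only when this file is run as a script,
# not when it is imported by another script.
if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the real work inside functions means the same file can later be imported as a module without anything running on import.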
A Python module is what many languages call a library.
A module is just a specialized script. The only thing that makes a module different from a script is what the developer puts in it. There is no special extension to the file, nor special directives in the code.
Modules contain classes, functions, and variables. A module can contain a single class, multiple classes, classes and functions, or any other combination. Typically there are no top-level statements outside of classes and functions; some modules might, however, contain initialization code.
Many modules, including dozens from the standard library, are written so that they can not only be imported as modules, but also directly executed as scripts. Examples include unittest, pdb (the Python debugger), profile, and calendar. As an example, the unittest module can be imported into scripts for creating tests, but you can also run it from the command line to discover and run tests like this:
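python -m unittest discover
The -m module option tells Python to find the named module on its import path and run it as a script, that is, with __name__ set to "__main__". Whether that ends up calling a main() function is up to the module's author.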
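As a rough illustration of a module that works both ways, this sketch runs the standard calendar module as a script in a child process and then uses the same module through import. It assumes a Python 3 interpreter whose calendar command line accepts a year and a month as arguments.

```python
import calendar
import subprocess
import sys

# Run the module as a script: the equivalent of "python -m calendar 2015 1".
# sys.executable is the interpreter running this code, so the child process
# uses the same Python installation.
result = subprocess.run(
    [sys.executable, "-m", "calendar", "2015", "1"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
print(result.stdout)

# Use the very same module as a library instead.
as_library = calendar.month(2015, 1)
print(as_library)
```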
Extension Modules (stretching Python)
All Python modules are written in Python, right? Well, no. Many Python modules are hybrids, partially written in C/C++, in order to use existing non-Python libraries. Such modules have a layer of Python that makes calls to compiled C. The compiled C can be created by hand, or generated by a tool such as SWIG or Cython. These modules are called extension modules.
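You can ask Python where a module comes from without reading its source. The sketch below is one way to do that on CPython with importlib; module_kind is a hypothetical helper name, and the categories it reports are my own simplification.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify a top-level module by how the import system would load it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    # Modules compiled into the interpreter itself report a fixed origin string.
    if origin in ("built-in", "frozen"):
        return origin
    # Compiled extension modules end in a platform suffix such as .so or .pyd.
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension module"
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "python source"
    return "other"


for name in ("json", "sys", "_socket"):
    print(name, "->", module_kind(name))
```

The answer for a module like _socket depends on how your interpreter was built, which is why the example prints it instead of assuming a result.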
Packages are used to organize modules. A package is a folder that contains one or more modules or other packages. The special module __init__.py may be created in each package folder to initialize the package. In Python 2, __init__.py must be present in a package; in Python 3.3 and later, it is optional.
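To show how little there is to a package, this sketch builds a throwaway one in a temporary folder and imports it. The package name shapes_demo, the module circle, and the area function are all invented for the example.

```python
import importlib
import os
import sys
import tempfile

# Build this layout in a temporary folder:
#   shapes_demo/
#       __init__.py
#       circle.py
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "shapes_demo")
os.mkdir(package_dir)

with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("PACKAGE_READY = True\n")  # initialization code runs on first import

with open(os.path.join(package_dir, "circle.py"), "w") as f:
    f.write("import math\n\n\ndef area(radius):\n    return math.pi * radius ** 2\n")

# Make the folder that CONTAINS the package importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

import shapes_demo
from shapes_demo import circle

print(shapes_demo.PACKAGE_READY)
print(round(circle.area(1.0), 5))
```

Note that it is the parent folder, not the package folder itself, that goes on the import path.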
“Framework” is not an official term like “module” or “package,” but is used to describe a collection of related packages that provide tools for accomplishing a particular task. Some important Python frameworks include Django (web development), Flask (web development), Twisted (async networking), and SciPy (scientific and engineering development).
You can think of a framework as a megapackage.
Mommy, where does CPython come from?
CPython is available from several different sources. While python.org is the official source, different companies have created distributions of Python that bundle extra features with the default version. All of these bundles provide CPython and the standard library; they also provide enhanced installers and a larger selection of Python packages.
python.org provides the “standard,” “normal,” “reference,” and “everyday” version of Python, as described above. The documentation for Python 2 and 3, both current and previous releases, is also available here, as well as many other Python resources.
Anaconda is a scientific Python distribution and is by far my favorite way to get Python. It has a simple but powerful graphical installer, it adds 200 useful modules right out of the box, and it has a simple tool for selectively installing about 100 additional modules. There are installers for Windows, Mac, and Linux, and for both Python 2 and Python 3.
Anaconda is provided for free from Continuum Analytics, which also has commercial Python-related products.
I highly recommend this distribution, as it contains everything you would get from python.org, plus many packages you don’t even know you need yet.
Get Anaconda from http://continuum.io/download.
Canopy Express is a free scientific Python distribution developed by Enthought. The scientific, engineering, and big data communities have a lot to thank Enthought for, as it has generously subsidized development of the SciPy ecosystem. Like Continuum, Enthought also has paid versions of Canopy and other tools. If you want to use any of the Enthought tools, get Canopy Express; otherwise, stick with Anaconda.
Get Canopy Express and other Enthought tools from https://store.enthought.com/
Python(x, y) is a scientific Python distribution that features the Spyder IDE and the PyQt4 package for GUI programming. As a subset of Anaconda, it only supports Python 2, and only supports Windows. I do not recommend it at this time. For more information, see http://python-xy.github.io/.
ActivePython is one of the family of open source languages that are packaged by ActiveState. This company was founded in 1997, when Perl was very popular, to make a Windows-friendly distribution of Perl. They eventually branched out to other languages and other platforms, so that binary installers for ActivePython, ActivePerl, and ActiveTcl are available for Windows, Mac, Solaris, HP-UX, and AIX.
There is currently no compelling reason to install ActivePython unless you already are using other ActiveState languages, and even then it is not necessary.
There are many reasons you might end up with more than one installation of Python on your machine. You might need to support both Python 2-based and Python 3-based applications. I teach Python, so I currently have five or six versions installed on my main laptop, and several versions each on other platforms.
I found out the hard way that managing them manually is a pain. I tried changing my PATH environment variable, making symbolic links, defining aliases, and unholy mixtures of all three.
Fortunately, some programmers created pyenv, which makes managing multiple installations of Python very simple.
Pyenv is a command line tool that will install any of a large list of Python implementations (189 currently on my MacBook Pro). Once you have installed two or more implementations, you can select which one is the “global” implementation. The Python interpreter itself, and scripts such as pip, pydoc, and 2to3, will be executed from the current implementation. You can easily switch to another implementation.
Advanced users can also create “local,” or folder-specific, implementations.
Here is what pyenv looks like in use:
$ pyenv versions
  system
  2.6.8
  2.7.10
  3.5.0
  3.6-dev
  anaconda-2.3.0
* anaconda3-2.3.0 (set by /Users/jstrick/.pyenv/version)
  pv-2.7.10
  pypy-2.6.1
  venv-speedtest
$ python -V
Python 3.4.3 :: Anaconda 2.3.0 (x86_64)
$ pyenv global 3.6-dev
$ python -V
Python 3.6.0a0
$ pyenv global 3.6-dev
$ pyenv global 2.7.10
$ python -V
Python 2.7.10
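After switching, it can be reassuring to confirm from inside Python which interpreter you actually got. This is a small sketch using only the sys module; interpreter_summary is a made-up helper name, and the path it prints will of course differ on your machine.

```python
import sys


def interpreter_summary():
    """Return the running interpreter's version and the path to its executable."""
    major, minor, micro = sys.version_info[:3]
    return "{}.{}.{} at {}".format(major, minor, micro, sys.executable)


print(interpreter_summary())

# sys.version_info compares like a tuple, which makes version checks easy.
if sys.version_info < (3, 0):
    print("Running under Python 2")
else:
    print("Running under Python 3 or later")
```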
Installing Modules and Packages
The primary tool for installing modules and packages is pip (the name is usually explained as the recursive acronym “Pip Installs Packages”). It is beyond the scope of this article to explain all the features of pip, but generally speaking, if you are connected to the Internet, you can install nearly any Python module from the command line by running
pip install module-or-package
This is sweeping some details under the rug, but for most packages, on most platforms, it works just fine.
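When several Pythons are installed, one of those details is making sure pip installs into the interpreter you mean. A common way to do that is to run pip as a module of that interpreter. The sketch below builds such a command and only runs it if the module is missing; the helper names are hypothetical, and requests is used purely as an example package name.

```python
import importlib.util
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(module_name, package=None):
    """Install a package only if module_name cannot already be imported.

    Returns True if an install was attempted, False if nothing was needed.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    subprocess.check_call(pip_install_command(package or module_name))
    return True


print(pip_install_command("requests"))

# json ships with the standard library, so nothing gets installed here.
print(ensure_installed("json"))
```

The optional package argument covers the cases where the name you install differs from the name you import.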
For help installing pip, see https://pip.pypa.io/en/latest/installing/.
For more information on using pip, see http://python-packaging-user-guide.readthedocs.org/en/latest/installing/.
Note: pip is already installed if you’re using Python 2.7.9 or later in the Python 2 line, or Python 3.4 or later.
What did we learn today?
When you say “I am a Python programmer,” it means you can write Python code using any of the Python interpreters described in this article.
You will be in good company. According to the Tiobe Index, Python is the 4th most popular language in the world today, and the first that is not C or one of its close derivatives.
However, when you say “I just installed Python,” it can mean many different things. The good news is that Python is Python, language-wise, so you can install the implementation that best meets your needs.
Author: John Strickler, one of Accelebrate’s Python instructors