The potential risks of coding in Python

With the growing power of computers, scientists across all fields are starting to appreciate computer modelling and big data analysis as valuable tools for their research. A particularly favoured language is Python, since it is easy to get into compared to statically typed, compiled languages such as C++. Because of Python's popularity, a great number of code libraries and tools are available for free. I myself use Python for analysis and visualisation of simulation results (I write my simulations in C++ though). I have also been exposed to using Python during a couple of commercial projects.

While many like Python because of its ease of use and versatility, I will argue here that these can also be potentially dangerous when it comes to managing larger projects. I would like to illustrate the potential risks that one might come across, when attention is not paid to good coding practices, with the following analogy:

Imagine that you are writing a coursework essay during your degree. There are good and bad courses and there are good and bad teachers. Coding in Python can become like writing a really bad essay and getting away with it. Here is why:

1. No structure

You are allowed to write your essay with any structure you want. You skip an introduction and go straight to your main arguments. You then mention a couple of references in the end. You get a B, which in your mind validates this approach to writing essays.

In Python, it is very easy for your code to become a mess, especially if the fluidity that Python allows is misused. There is no prescribed structure to a Python program - you can write in-line code, you can put it into functions or into classes. What can end up happening in bigger projects is that there is a mixture of everything. This makes reading and maintaining complex programs really difficult.

2. You are allowed to redefine words and say things that are not true

In your essay on African animals, you say that termites are social insects and that rhinos are mammals. A few pages later, you claim that termites are really stones and that rhinos can fly. You get an A - what a creative way of thinking about reality!

In Python, a variable can change its data type at run time. Also, a parameter to a function can take on any data type, determined at run time. Consider the following code snippet:

def whichWeekDayIsIt(date):
   """
   A function that returns the name of the week
   day, based on a date.
   """

It is simple enough to understand the function signature, right? Yes, until someone changes the body of the function to cover cases where date is not a single date, but a list of dates. You can then end up calling the function as:

myDay = whichWeekDayIsIt(date=["01022016", "02022016"])

The parameter name 'date' loses its original meaning. A similar thing would happen if you changed the parameter name to 'dates' and then wanted to call it with a single date, not a list. Because variables have no declared data types, you either have to name variables in a very generic way, defeating the purpose of naming them, or name them in a way that is only partially true.
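To make this concrete, here is a minimal sketch of what such a changed function body might look like. The body, the WEEK_DAYS list and the example dates are my own assumptions for illustration; the original function body is not shown anywhere above.

```python
import datetime

# Hypothetical lookup table; index 0 matches datetime.date.weekday() == 0.
WEEK_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def whichWeekDayIsIt(date):
    """Hypothetical body: accepts a single date or a list of dates."""
    if isinstance(date, list):
        # 'date' is really several dates here, and the result is a list too.
        return [WEEK_DAYS[d.weekday()] for d in date]
    return WEEK_DAYS[date.weekday()]

single = whichWeekDayIsIt(datetime.date(2016, 2, 1))
several = whichWeekDayIsIt(date=[datetime.date(2016, 2, 1),
                                 datetime.date(2016, 2, 2)])
print(single)   # Monday
print(several)  # ['Monday', 'Tuesday']
```

Both calls succeed, but neither the parameter name nor the signature tells the caller that the return value is sometimes a string and sometimes a list.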

The second problem with having no data type is that when it comes to passing variables into functions, you have to rely purely on documentation of that function to determine what you should pass in (which, more times than I am comfortable with, is missing in open-source libraries). Alternatively, you have to look into the body of that function to see how a particular parameter is used and to deduce what data type your variable should have. This is sometimes really difficult.

3. The meaning of words changes based on the number of spaces in front of them

Everyone makes mistakes during typing. But in the essay for this course, a word " dog" that has one space in front of it means a dog, while the phrase "  dog" (two spaces) means a flying saucer. Unfortunately, your word processor does not highlight multiple spaces as an error, which means that you end up writing about UFOs, while what you really wanted to do is describe your pet dog.

The fact that Python relies purely on indentation to determine which logic block a line of code belongs to makes it fairly easy to create bugs that are difficult to spot. Consider the following code snippet:

numOfDrivers = 0
for car in cars:
   print("car {}".format(car))
   numOfDrivers += 1

The number of drivers will be equal to the number of cars. Now imagine that you are editing the code and you make the following indentation mistake:

numOfDrivers = 0
for car in cars:
   print("car {}".format(car))
   print("something else")
numOfDrivers += 1

The number of drivers will now be 1. It is possible that you will not realise this mistake until your code delivers unexpected results. In the worst-case scenario, you will never go back to your code and you will draw incorrect conclusions from your results. For example: "For every ten cars, there is one driver, we must have self-driving cars on the road!"

Let's compare this to what happens in other languages, like C++, Java, etc.:

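// C++ version; assumes <iostream> is included and cars is a std::vector<std::string>.
int numOfDrivers = 0;
for (std::size_t i = 0; i < cars.size(); i++) {
   std::cout << "car " << cars[i] << "\n";
   std::cout << "something else" << "\n";
numOfDrivers += 1;
}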

Here, the same mistake of unindenting the line that counts the number of drivers was made. However, because the code has a stronger structure and uses the curly brackets { } to identify the code block in the for loop, the resulting number of drivers will still be correct.

4. The only feedback that you get is that you failed or passed your course

You cannot ask your course coordinator about your research for the essay, because he is too busy or too lazy to answer his emails. You spend weeks writing your essay, submit it and fail the course when it's too late to change anything.

Python has no separate compilation step. This means that you can simply run a Python program without first turning the code into an executable file. Yes, it may be faster to develop (small) applications this way. But in my opinion, it actually makes you less productive in the long run. I have lost count of how many times my program failed after an hour or so of running, because of some simple typo that a compiler could have picked up straight away.
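The kind of failure I mean can be sketched as follows. The function summarise and its verbose flag are hypothetical names made up for this illustration; the point is only that a misspelled variable name is not reported until the line that uses it is actually executed.

```python
def summarise(results, verbose=False):
    """Hypothetical analysis step that runs at the end of a long job."""
    total = sum(results)
    if verbose:
        # Typo: 'totl' instead of 'total'. Nothing complains until this line runs.
        print("total:", totl)
    return total

# The faulty branch is skipped, so the typo goes unnoticed.
quiet = summarise([1, 2, 3])

# Only when the branch is taken does Python raise a NameError, at run time.
try:
    summarise([1, 2, 3], verbose=True)
    failure = None
except NameError as err:
    failure = err

print(quiet)                    # 6
print(type(failure).__name__)   # NameError
```

If the verbose branch is only reached after an hour of computation, that hour is lost before the typo shows up.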

5. Lack of control over volume

You need to print your essay to hand it in and you are allowed any formatting you like. Because you have absolutely no idea about how printing works, your essay ends up being triple spaced, with headings that span across pages. When you print it, it takes up a whole room.

Long-running Python programs often behave as if they leak memory. Have you ever tried running a program that takes a few hours to execute and produces a lot of graphs using matplotlib? Python, like other languages that use a garbage collector, was designed based on the philosophy that it is the computer, not the programmer, that should decide how memory is allocated and when it is freed up.

What this means in practice is that it is very hard to optimise memory usage. Sure, things run smoothly for simple programs that show off a Hello world message and maybe calculate a few things. For a scientific application though, that requires complex computations and uses big data, this can become a problem very quickly.
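One common way in which memory use grows without you noticing is a forgotten reference: as long as anything still points at an object, the garbage collector will not free it. The sketch below illustrates this using only the standard library. The Result class and the history list are hypothetical stand-ins for a large object (a figure, a data array) and for some container that quietly keeps it alive.

```python
import gc
import weakref

class Result:
    """Stand-in for a large object such as a figure or a data array."""
    def __init__(self, size):
        self.data = [0.0] * size

# Hypothetical module-level list that quietly keeps every result alive.
history = []

def run_step(size):
    result = Result(size)
    history.append(result)       # the lingering reference
    return weakref.ref(result)   # weak reference: lets us check if the object still exists

probe = run_step(1000)
gc.collect()
still_alive = probe() is not None   # 'history' still refers to the object

history.clear()                     # drop the last reference explicitly
gc.collect()
freed = probe() is None             # the object can now be reclaimed

print(still_alive, freed)           # True True
```

The programmer never allocated or freed anything by hand, yet the only way to get the memory back was to find and remove the reference that kept the object alive.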

6. Each sheet of paper is from a different manufacturer. Your printer dies horribly.

Because you do not produce your own printing paper, and because paper is really hard to find in bulk, you have to go on the Internet and buy individual sheets (so you can print out your thousand-page essay). Some paper manufacturers are very bad at their job but no one knows about it. One particular sheet had stone fragments in it and broke your printer.

It is very common in Python to use libraries, mainly to speed up development, or because some people truly know better how to write code. However, the curse of open-source and open-access applies to program libraries too. Not everyone knows how to code efficiently. Not everyone knows how to document things properly.

A Python application may be using 20 other libraries to do what it is supposed to. You end up installing libraries all the time, especially when you pick up someone else's project. Libraries often depend on other libraries that depend on other libraries. A lot of time might be required to make sure that your computer can run a complex Python program.

Secondly, most libraries are made by people who are simply trying to get by writing code for their own application. Some of those people are kind enough to want to share their code, but they often do not go the extra mile to make their code usable in general. Poor documentation, lack of naming conventions, and other bad coding practices are very common.

Conclusions and recommendations

Just like with any other programming language, there are things that Python is good at and things that it is bad at. I find it a great tool for writing initial exploration code, or small scripts to maintain large numbers of files and directories. Also, I use matplotlib extensively as it can produce some really nice graphs with relative ease.

However, for the reasons mentioned above, care should be taken when writing Python code, so that when your project grows, you are able to maintain it with ease. I suggest doing the following:

  1. Decide how you would structure your code and stick to it - is this a functional program? Or do you want to rely on classes? If, however, mixing the two makes sense for some reason, then make sure that you put classes and modules into different Python packages.
  2. Name your variables in a way that makes it clear what they are used for.
  3. When accepting parameters into functions, make sure to include data type checks at the top of those functions. Handle situations when a parameter value is not of the desired data type(s), for example by displaying a warning or throwing an exception.
  4. Create unit tests, i.e., separate functions that will test if the parts of your main program return expected outputs when given specific input.
  5. Use Cython to call optimised C++ functions whenever possible and appropriate.
  6. When using open-access libraries, research multiple options. Make sure that you understand what a library does and that it executes faster than its alternatives.
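Points 3 and 4 from the list above can be sketched together as follows. This is only a minimal illustration under my own assumptions: the function week_day_name, the WEEK_DAYS list and the two test functions are hypothetical names, not part of any existing library.

```python
from datetime import date

# Hypothetical lookup table; index 0 matches date.weekday() == 0.
WEEK_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def week_day_name(day):
    """Return the English week day name for a single datetime.date."""
    # Point 3: check the data type at the top of the function.
    if not isinstance(day, date):
        raise TypeError(f"expected datetime.date, got {type(day).__name__}")
    return WEEK_DAYS[day.weekday()]

# Point 4: separate functions that check outputs for specific inputs.
def test_returns_expected_name():
    assert week_day_name(date(2016, 2, 1)) == "Monday"

def test_rejects_list_of_dates():
    try:
        week_day_name([date(2016, 2, 1)])
    except TypeError:
        return
    raise AssertionError("a list of dates should have been rejected")

test_returns_expected_name()
test_rejects_list_of_dates()
```

Here the tests are simply called at the bottom of the file. In a larger project they would normally live in their own module and be collected by a test runner.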

Finally, learn about good coding practices and software sustainability in general and for Python. This will allow you to create your own set of tricks that will help you and others around you along the way.
