6 Best Practices to make your python project production level

Balakrishnan Sathiyakugan
7 min read · Feb 10, 2022

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”

I recently developed a Python authentication SDK for the WSO2 Identity Server and mentored an intern on an anomaly detection project. Currently, I do most of my data science work in Python. While working on these projects, we incorporated specific standards that helped them meet production-level quality. I want to share some key learnings from them, in the hope that they help Python developers build stable, readable, and extensible production-ready code.

I have structured this blog into the following sections.

  1. Keep it modular, clean & efficient
  2. Optimizing your code
  3. Logging and Instrumentation
  4. Testing
  5. Documentation
  6. Version control and Code review

1. Keep it modular, clean & efficient

When you work in the industry or contribute to an open-source project, there is a good chance that your code will run in production. Production code should be reliable, efficient, and bug-free, but this is not always the case. One way to get closer to that goal is to break the code logically into functions and modules.

Modular code is complex code broken down into smaller parts, i.e., functions or modules. We get functions when we break the code into smaller units that each perform a specific task. We get modules (Python files) when we group these functions. We can create and test these modules independently. In many cases, these modules are reusable in other components.

Tips for writing modular code.

  • DRY (Don't Repeat Yourself)

Suppose you have written a code segment that reads files. Don't repeat that code in another place; instead, generalize and consolidate repeated code into functions or loops.

For example:

Bad code

name = 'kugan'
if name == 'kugan':
    print('Hi')
if name == 'sathy':
    print('Hi')
if name == 'sathiyakugan':
    print('Hi')

We can rewrite this more cleanly as follows:

name = 'kugan'
people = ['kugan', 'sathy', 'sathiyakugan']
if name in people:
    print('Hi')

  • Abstract out the logic to improve readability

Abstracting logic into functions with descriptive names improves readability, and you can often go in different directions when doing so. Although your code can become more readable when you abstract out logic into functions, it is possible to over-engineer this and end up with far too many modules, so use your judgment.

  • Minimize the number of entities (functions, classes, modules, etc.)

There are trade-offs to having function calls instead of inline logic. If you have broken up your code into an unnecessary number of functions and modules, you will have to jump around everywhere to view the implementation details of something that may be too small to be worth extracting. Creating more modules doesn't necessarily result in effective modularization.

  • Functions should do one thing.

Each function should focus on doing one thing. The way to know that a function is doing more than "one thing" is if you can extract another part from it with a name that is not merely a restatement of its implementation.
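
As a minimal sketch of the tips above, here is a small routine split into single-purpose functions. The function names and the 'name,score' input format are hypothetical, chosen only for illustration:

```python
def parse_scores(lines):
    """Turn 'name,score' lines into (name, score) tuples."""
    records = []
    for line in lines:
        name, score = line.strip().split(",")
        records.append((name, int(score)))
    return records


def passing(records, threshold=50):
    """Keep only the records whose score meets the threshold."""
    return [(name, score) for name, score in records if score >= threshold]


def average_score(records):
    """Return the mean score, or 0.0 for an empty list."""
    if not records:
        return 0.0
    return sum(score for _, score in records) / len(records)


lines = ["kugan,80", "sathy,40", "sathiyakugan,60"]
passed = passing(parse_scores(lines))
print(passed)                  # [('kugan', 80), ('sathiyakugan', 60)]
print(average_score(passed))   # 70.0
```

Each function can be tested on its own, and none of the names merely restates the implementation. Splitting further (for example, a separate function just for the int() conversion) would likely be the over-engineering the tips warn about.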

2. Optimizing your code

There is almost always an easy and elegant way to do something in Python, so search for it before you write code. Visiting StackOverflow is alright. Always remember: writing your own function for something the language or standard library already provides is not Pythonic.
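
One illustration of this point, as a sketch with made-up sample data: counting items by hand works, but collections.Counter from the standard library already does it.

```python
from collections import Counter

words = ["spam", "egg", "spam", "ham", "spam", "egg"]

# Hand-rolled counting: correct, but re-implements what the standard library offers.
counts = {}
for word in words:
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

# The same result with collections.Counter.
counter = Counter(words)

print(counter.most_common(1))  # [('spam', 3)]
```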

Here are a couple of tricks I found useful while I’ve been writing the code.

  • collections.defaultdict():
    Usually, a Python dictionary raises a KeyError if you try to get an item with a key that is not currently in the dictionary. A defaultdict, in contrast, simply creates any item that you try to access (provided, of course, that it does not exist yet).
    The argument is a factory such as list, bool, int, or a custom class, and each missing key gets the value that factory produces. For example,
    >>> vocab = collections.defaultdict(list)
    and then if you do
    >>> vocab['kugan']
    it will return [].
    However, although missing keys are given default values on access, the in operator still works as usual. For example,
    >>> 'b' in vocab
    will still return False. In this way, you can avoid the key checks in your code.
  • zip(iterable1, iterable2) pairs up the elements of the two iterables by index. For example, with a = [1, 2, 3] and b = [4, 5, 6], list(zip(a, b)) returns [(1, 4), (2, 5), (3, 6)], i.e. a list of tuples. Note that zip() itself returns a zip object, so we normally wrap it in list() or set() to see the result. You can also do the reverse operation, i.e. unzip, by putting an asterisk before the zipped object. For example, list(zip(*zip(a, b))) gives back [(1, 2, 3), (4, 5, 6)].
  • For string concatenation, "".join() is usually faster than repeated +=, especially when a string is built from many pieces in a loop. For example, a = "hello" followed by a += " world" may create a new string on every +=, whereas " ".join(["hello", "world"]) builds the result in one pass. join() takes an iterable of strings, so we pass the pieces as a list. Because the pieces live in a list, you can use all of the normal list operations on them for insertions, deletions, and modifications before joining, and appending to a list performs well. There are other efficient ways of concatenating strings in Python that are worth learning about.
  • When you need both the index and the element in a for loop, instead of for i in range(len(arr)) you can write for i, element in enumerate(arr).
  • For binary search, the bisect module provides bisect, bisect_left, and bisect_right, which return the index at which a number can be inserted while keeping the list sorted, provided the list is already sorted.
  • collections.deque() is a generalization of stacks and queues that supports append and pop at either end in O(1) time.
    We can use it in real-world applications such as:
    - Undo operations in software.
    - Storing history in browsers.
    - Implementing both stacks and queues.
  • heapq provides a priority queue implementation that you can use for many problems that involve finding the best element in a dataset. It offers a solution that is easy to use and highly effective.
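
The tricks above can be sketched together in one short script. The sample data (names, pages, tasks) is invented for illustration; the calls themselves are all from the Python standard library.

```python
import bisect
import heapq
from collections import defaultdict, deque

# defaultdict: group words by first letter without checking for keys.
vocab = defaultdict(list)
for word in ["kugan", "sathy", "kiwi"]:
    vocab[word[0]].append(word)

# zip and enumerate: walk two lists in step, with a running index.
names = ["a", "b", "c"]
scores = [1, 2, 3]
pairs = [(i, name, score) for i, (name, score) in enumerate(zip(names, scores))]

# str.join: build one string from many pieces.
sentence = " ".join(["hello", "world"])

# bisect: find the insertion point in an already sorted list.
sorted_nums = [1, 3, 5, 7]
index = bisect.bisect_left(sorted_nums, 4)  # 4 would go at index 2

# deque: O(1) appends and pops at both ends.
history = deque(["page1", "page2"])
history.append("page3")
history.appendleft("page0")
last = history.pop()        # 'page3'
first = history.popleft()   # 'page0'

# heapq: repeatedly pull the smallest (here, most urgent) item.
tasks = [(3, "write docs"), (1, "fix bug"), (2, "add tests")]
heapq.heapify(tasks)
most_urgent = heapq.heappop(tasks)  # (1, 'fix bug')
```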

3. Logging and Instrumentation

We usually print outputs to inspect whether our logic behaves as expected. However, we may face multiple issues in the production environment if we use print instead of logging. Here are the reasons why it is not a good practice.

  1. If your code runs somewhere without access to a console, the print output may fail or simply be lost.
  2. print makes it hard to include additional context, such as timestamps, severity levels, or the name of the module that produced the message.
  3. The print statement only displays messages on the console. Recording logging data inside a file or sending it over the internet needs additional work.

Log messages help you understand the context in which a result or failure occurred. When you use logging, choose the appropriate level.

Debug: anything that happens in the program. 
Error: any error that occurs.
Info: all user-driven or system-specific actions, such as regularly scheduled operations.
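
A minimal sketch of these levels using the standard logging module follows. The logger name "anomaly_job" and the safe_divide function are hypothetical examples, not part of any real project.

```python
import logging

# Configure once, near the program's entry point.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logger = logging.getLogger("anomaly_job")  # hypothetical logger name


def safe_divide(a, b):
    """Divide a by b, logging at a level that matches each event."""
    logger.debug("safe_divide called with a=%s, b=%s", a, b)
    try:
        result = a / b
    except ZeroDivisionError:
        logger.error("Division by zero for a=%s", a)
        return None
    logger.info("Division finished")
    return result


safe_divide(6, 3)
safe_divide(1, 0)
```

Unlike print, the same calls can be redirected without touching the function: for example, passing filename= to logging.basicConfig writes the records to a file instead of the console.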

To know more about logging, refer to this article

4. Testing

Testing your code is essential before deployment. It helps you catch errors and faulty conclusions before they make any significant impact. Testing is a crucial aspect of software engineering, but it is often neglected in data science. Today, employers look for data scientists who can properly prepare their code for an industrial setting, which includes testing it.
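
As a small sketch of what a unit test can look like, here is a test case written with the standard unittest module. The normalize function is a hypothetical data-preparation helper invented for this example.

```python
import unittest


def normalize(values):
    """Scale values to the range [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([0, 5, 10]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        # Edge case: avoid dividing by zero when all values are equal.
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])


# Run the tests programmatically; in a project you would more likely
# run `python -m unittest` from the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
result = unittest.TextTestRunner().run(suite)
```

The second test is the kind of edge case that is easy to miss when checking results by eye in a notebook.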

Here are some resources to learn about testing

5. Documentation

Documentation is additional text or explanatory information that comes with, or is embedded in, the code of the software. It helps explain complex parts of the code, making it easier to navigate and quickly conveying how and why different components of your program are used. Several types of documentation can be added at various levels of the program.

1. Inline comments — line level
2. Docstrings — module and function level
3. Project documentation — project level
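
The first two levels can be sketched in a single file. The moving_average function here is a hypothetical example; project-level documentation, such as a README, lives outside the code and is not shown.

```python
"""Utilities for summarizing sensor readings (module-level docstring)."""


def moving_average(values, window):
    """Return the moving average of `values` over `window` consecutive items.

    Args:
        values: A list of numbers.
        window: Number of items per average; must be at least 1.

    Returns:
        A list with len(values) - window + 1 averages (empty if too short).
    """
    # Inline comment: explain why, not what. A running sum would avoid
    # re-adding each window, but this form is easier to read.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```

Docstrings are available at runtime, so help(moving_average) shows the same text a reader sees in the source.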

Here are some resources to learn about the documentation

Here are a few READMEs from some popular projects:

6. Version control and Code review

How do you currently maintain your code so that it is effortlessly available to you tomorrow? Do you keep a backup of your code? Can you easily share your code with other developers? Do you have files called 'v1', 'v2', 'v3', and so on? Is your finished code file called 'final1', 'final2', 'final3', or 'Latest Final'? This is a stressful way to work. Version control systems help us manage changes to files. There are many version control systems; we use Git with GitHub as our version control solution.

Here are some resources to learn about version control systems

The following resources offer valuable methods and tools for managing model versions and large amounts of data.

Code reviews

Code reviews are a common practice at work, and for a good reason. Reviewing each other's code can help catch errors, ensure readability, check that standards are being met for production-level code, and share knowledge among a team.

They are beneficial for the reviewer, the reviewee, and the team. Ideally, code is reviewed by someone who works in the same domain, since there are domain-specific errors and standards to check for. In data science, these include data leakage, misinterpretation of features, and inappropriate evaluation methods.

Code reviews benefit everyone by promoting best programming practices and preparing code for production. Let's go over what to look for in a code review and some tips on conducting one.

Here are some resources to learn about the code reviews

Finally, I would like to thank the Udacity course that helped me gather the resources for this article.

I hope you found this article helpful. Thanks. Happy Coding.

Written by Balakrishnan Sathiyakugan

Microsoft Certified Azure Data Engineer & Data Scientist | Google CodeU | Patent Holder | Data Analytics
