Style Guide for Python Code

The browser uses very few Python scripts. Most are one-shot scripts that were used when building a track. We archive 
them in this repo but do not run them a lot anymore.

CGIs in Python

We have only 1-2 CGI scripts in Python (e.g. hgGeneGraph and hgMirror, which
runs only on GBIB) and they do not get a lot of usage. However, they do exist
and the pyLib directory contains hgLib3.py with ports of e.g. the menu,
bottleneck, cart parsing and cgi argument parsing, often with the same function
names as their kent C equivalents. So writing CGIs in Python is possible, as 
long as they are not computationally intensive. We are not using a special Python
webserver, we are running CGIs so far like we run C programs, this costs us 200 msec
at startup, but makes management on our web servers much easier. For the two CGIs,
it's certainly sufficient.

PYTHON VERSIONS

Python2 is not used anymore anywhere, and Python3 is now required. The problem of
version incompatibility is vexing in Python, even sometimes among the 3.x
versions. You can usually work around it buy sticking to the basic Python 3.6
or so features and not using the very advanced features. Testing on a very
recent Python version can help. hgLib3.py uses one single external package, the
MySQL library, which comes with it. It should be possible to not 

CALLING C CODE

It's possible to call C library functions directly from Python. But in practice
we only call C binaries via exec(), because of the memory management issue. If you
find yourself doing a lot of that, it may be better to write C directly.

CODE CONVENTIONS

INDENTATION AND SPACING

Each block of code is indented by 4 spaces from the previous block. Do not use
tabs to separate blocks of code. The indentation convention differs from the C coding style
found in src/README, which uses 4-base indents/8-base tabs. Common editor configurations 
for disallowing tabs are:
    
    vim:
        Add "set expandtab" to .vimrc

    emacs:
        Add "(setq-default indent-tabs-mode nil)" to .emacs

Lines are no more than 100 characters wide.

INTERPRETER DIRECTIVE

The first line of any Python script should be:
    #!/usr/bin/env python3

This line will invoke python3 found in the user's PATH. It ensures that scripts developed 
by UCSC can be distributed and explicitly states which Python version was used to develop the scripts.

The kent repo contains a few Python2.7 scripts. These are mostly archived
versions of scripts that are not run anymore.
     
NAMING CONVENTIONS

Use mixedCase for symbol names: the leading character is not capitalized and all
successive words are capitalized. (Classes are an exception, see below.) Non-UCSC
Python code may follow other conventions and does not need to be adapted to
these naming conventions.   

Abbreviations follow rules in src/README:

    Abbreviation of words is strongly discouraged.  Words of five letters and less should 
    generally not be abbreviated. If a word is abbreviated in general it is abbreviated 
    to the first three letters:
       tabSeparatedFile -> tabSepFile
    In some cases, for local variables abbreviating to a single letter for each word is okay:
       tabSeparatedFile -> tsf
    In complex cases you may treat the abbreviation itself as a word, and only the
    first letter is capitalized:
       genscanTabSeparatedFile -> genscanTsf
    Numbers are considered words.  You would represent "chromosome 22 annotations"
    as "chromosome22Annotations" or "chr22Ann." Note the capitalized 'A" after the 22.


Imports

The most correct way to import something in Python is by specifying its containing module:
    import os
    from ucscGb import ra
 
    Then, the qualified name can be used:
        someRa = ra.RaFile()
   
    Do not import as below, as this may cause local naming conflicts:
        from ucscGb.ra import RaFile
        from ucscGb.track import *

Imports should follow the structure:
        
    1. Imports should be at the top of the file. Each section should be separated by a blank line:

        a. standard library imports

        b. third party package/module imports

        c. local package/module imports

For more information, see the "Imports" section:
    http://www.python.org/dev/peps/pep-0008/

Classes

CapitalCase names. Note the leading capital letter to distinguish between a ClassName and 
a functionName. Underscores are not used, except for private internal classes, 
where the name is preceded by double underscores which Python recognizes as private.

Methods

mixedCase names. The leading character is not capitalized, but all successive words are 
capitalized. Underscores are not used, except for private internal methods, 
where the name is preceded by double underscores which Python recognizes as private.

In general try to keep methods around 20 lines.

Functions

mixedCase names. The leading character is not capitalized, but all 
successive words are capitalized.

In general try to keep methods around 20 lines.

Variables

mixedCase names. Underscores are not used, except for private internal variables, 
where the name is preceded by double underscores which Python recognizes as private.

COMMENTING

Comments should follow the conventions:

    1. Every module should have a paragraph at the beginning. Single class modules may 
        omit paragraph in favor of class comment.

    2. Use Python's docstring convention to embed comments, using """triple double quotes""":
       http://www.python.org/dev/peps/pep-0257/

TESTING

Testing can be carried out using the unittest module in Python:
    http://docs.python.org/library/unittest.html

This module allows for self-running scripts, which are self-contained and should provide
their own input and output directories and files. The scripts themselves are composed of 
one or more classes, all of which inherit from unittest.TestCase and contain one or more 
methods which use various asserts or failure checks to determine whether a test passes or not.

    Structure:

        1. At the start of a script import unittest module:
            import unittest

        2. A test case is created as a sub-class of unittest.TestCase:
            class TestSomeFunctions(unittest.TestCase):

        3. Test method names should begin with 'test' so that the test runner is 
           aware of which methods are tests:
            def testSomeSpecificFuntion(self):

        4. Define a setUp method to run prior to start of each test.
            def setUp(self):

        5. Define a tearDown method to run after each test.
            def tearDown(self):

        6. To invoke tests with a simple command-line interface add the following lines:
            if __name__ == '__main__':
                unittest.main()
           
    For other ways to run tests see:
                http://docs.python.org/library/unittest.html
