python notes

Python features a dynamic type system and automatic memory management and supports multiple programming paradigms.[1]

Python Syntax

Python doc


builtin function

builtin functions (import builtins)

abs() dict() help() min() setattr()
all() dir() hex() next() slice()
any() divmod() id() object() sorted()
ascii() enumerate() input() oct() staticmethod()
bin() eval() int() open() str()
bool() exec() isinstance() ord() sum()
bytearray() filter() issubclass() pow() super()
bytes() float() iter() print() tuple()
callable() format() len() property() type()
chr() frozenset() list() range() vars()
classmethod() getattr() locals() repr() zip()
compile() globals() map() reversed() __import__()
complex() hasattr() max() round()
delattr() hash() memoryview() set()

dict

  • mydict.items() returns the (key, value) pairs: a list in Python 2, a view object in Python 3
    for key, val in mydict.items():

  • check whether a dict has any elements
    Note: I think this method is not easy to read for beginners.
    As a mnemonic, read bool() as "has something in it", for example:
    if bool(mydict): reads as "if mydict has something in it";
    if not bool(mylist): reads as "if mylist has nothing in it"

    >>> mydict = {}
    >>> bool(mydict)
    False
  • iteritems() returns an iterator, which saves memory. Python 2 only.

  • dict.pop()
    pop(key[, default])
    key: the key to remove
    default: the value to return if key is not present

    mydict.pop('mykey', None)

    PS: alternatively:

    if key in mydict:
        del mydict[key]
  • dict comprehension

    • filter a dictionary according to a condition
      {k: v for k, v in points.items() if v[0] < 5 and v[1] < 5}
  • dict.get()
    doc:
    dict.get(k[,d]) -> dict[k] if k in dict, else d. d defaults to None.
    Note that if k is not in dict, dict.get won’t add k to dict

  • dict initialization
    The following two ways are equivalent:

    myd = dict(age=27)
    myd = {"age": 27}
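The dict notes above can be tried out together in a minimal sketch. The inventory data and variable names are made up for illustration and are not from the notes.

```python
# Hypothetical sample data, used only for illustration.
inventory = {"apple": 3, "pear": 0}

# get() returns the default for a missing key and leaves the dict unchanged.
missing = inventory.get("kiwi", 0)
keys_after_get = sorted(inventory)

# pop() with a default removes the key if present and never raises KeyError.
removed = inventory.pop("pear", None)
absent = inventory.pop("kiwi", None)

# An empty dict is falsy, a non-empty one is truthy.
is_empty = not {}
has_items = bool(inventory)

# Dict comprehension that keeps only entries with a positive count.
in_stock = {name: count for name, count in inventory.items() if count > 0}
```

Passing None as the default to pop() is what makes the one-liner safe; without a default, a missing key raises KeyError.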

tuple

my_tuple = ('haha',)  # the right way to create a one-element tuple
var = ('haha')        # parentheses alone do not make a tuple
type(var)             # <class 'str'>

set

Intersection, union, difference, symmetric difference:

s1 & s2  # intersection
s1 | s2  # union
s1 - s2  # difference
s1 ^ s2  # symmetric difference
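A small sketch of the four operators on concrete sets; the values of s1 and s2 are illustrative assumptions.

```python
s1 = {1, 2, 3}
s2 = {3, 4}

intersection = s1 & s2          # elements in both sets
union = s1 | s2                 # elements in either set
difference = s1 - s2            # elements in s1 but not in s2
symmetric_difference = s1 ^ s2  # elements in exactly one of the sets

# The method forms accept any iterable, not only sets.
via_method = s1.intersection([3, 4])
```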

max()


max(iterable, *[, key, default])
max(arg1, arg2, *args[, key])

Return the largest item in an iterable or the largest of two or more arguments.

If one positional argument is provided, it should be an iterable. The largest item in the iterable is returned. If two or more positional arguments are provided, the largest of the positional arguments is returned.

There are two optional keyword-only arguments. The key argument specifies a one-argument ordering function like that used for list.sort(). The default argument specifies an object to return if the provided iterable is empty. If the iterable is empty and default is not provided, a ValueError is raised.

If multiple items are maximal, the function returns the first one encountered. This is consistent with other sort-stability preserving tools such as sorted(iterable, key=keyfunc, reverse=True)[0] and heapq.nlargest(1, iterable,key=keyfunc).
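A short sketch of the key and default arguments described above; the word list is an illustrative assumption.

```python
words = ["pear", "fig", "plum", "kiwi"]

# key=len compares by length; three words tie at length 4,
# and max() returns the first one encountered.
longest = max(words, key=len)

# default is returned for an empty iterable instead of raising ValueError.
fallback = max([], default=None)

# With two or more positional arguments, the largest argument is returned.
largest_of_args = max(3, 7, 5)

try:
    max([])
    raised = False
except ValueError:
    raised = True
```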

all()/any()

Return True if all elements of the iterable are true (or if the iterable is empty). Equivalent to:

def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True
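any() is the counterpart: it returns True if at least one element is true, and False for an empty iterable. A sketch of the equivalent loop, named any_ here only to avoid shadowing the built-in:

```python
def any_(iterable):
    """Pure-Python equivalent of the built-in any()."""
    for element in iterable:
        if element:
            return True
    return False


empty_all = all([])      # vacuously true
empty_any = any_([])     # nothing is true in an empty iterable
mixed_any = any_([0, "", 3])
mixed_all = all([0, "", 3])
```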

eval/repr

repr(obj) converts obj to its string representation.
For many built-in types, obj == eval(repr(obj)).
example of repr:

>>> sites = {'runoob': 'runoob.com', 'google': 'google.com'}
>>> repr(sites)
"{'runoob': 'runoob.com', 'google': 'google.com'}"

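A minimal sketch of the repr/eval round trip. The sample data and the Opaque class are illustrative assumptions; ast.literal_eval is shown as a safer alternative to eval for plain literals, which the notes themselves do not mention.

```python
import ast

original = {"name": "HAL", "ids": [1, 2, 3], "ratio": 0.5}
text = repr(original)

# eval() rebuilds the object from its repr for built-in literals.
restored = eval(text)

# ast.literal_eval() only accepts literals, so it cannot run arbitrary code.
restored_safely = ast.literal_eval(text)


class Opaque:
    """A class without a custom __repr__."""


# The default repr looks like '<...Opaque object at 0x...>' and is not valid
# Python, so the round trip does not hold for arbitrary objects.
opaque_text = repr(Opaque())
```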
filter

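filter() applies the given function to each element in turn and keeps or discards the element depending on whether the return value is True or False.
Since Python 3, filter() returns a filter object (an iterator) instead of a list.

def is_odd(n):
    return n % 2 == 1

list(filter(is_odd, [1, 2, 4, 5, 6, 9, 10, 15]))  # [1, 5, 9, 15]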

getattr()/setattr()/hasattr()/delattr()

getattr(x, 'foobar') is equivalent to x.foobar

All these functions are similar. See the docs for more.
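A brief sketch of the four functions on one object; the Point class is a made-up example.

```python
class Point:
    def __init__(self, x):
        self.x = x


p = Point(1)

x_value = getattr(p, "x")        # same as p.x
y_default = getattr(p, "y", 0)   # the default avoids AttributeError

setattr(p, "y", 5)               # same as p.y = 5
had_y = hasattr(p, "y")

delattr(p, "y")                  # same as del p.y
has_y_after_delete = hasattr(p, "y")
```

These are mostly useful when the attribute name is only known at run time, for example when it comes from a config file.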

enumerate

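If mylist is a two-dimensional array (a list of rows):

for i, line in enumerate(mylist):
    ...

enumerate() yields (index, item) tuples, so here i is the row index and line is the row itself.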

next(iterator[, default])
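A minimal sketch of next() with and without the default argument; the list contents are illustrative.

```python
numbers = iter([10, 20])

first = next(numbers)
second = next(numbers)

# The iterator is now exhausted: with a default, next() returns it
# instead of raising StopIteration.
exhausted = next(numbers, None)

try:
    next(numbers)
    raised = False
except StopIteration:
    raised = True
```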

print

  • print(end='str')

staticmethod(function)

PS:

@classmethod means: when this method is called, we pass the class as the first argument instead of the instance of that class (as we normally do with methods). This means you can use the class and its properties inside that method rather than a particular instance.
@staticmethod means: when this method is called, we don’t pass an instance of the class to it (as we normally do with methods). This means you can put a function inside a class but you can’t access the instance of that class (this is useful when your method does not use the instance).

In short: a classmethod involves the class but not its instances; a staticmethod involves neither, yet is still closely related to the class.
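A sketch of the difference, using a hypothetical Temperature class that is not from the notes: the classmethod receives the class and acts as an alternative constructor, while the staticmethod is a plain helper that lives in the class namespace.

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees  # stored in Celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is the class itself, so subclasses get instances of their own type.
        return cls(cls.to_celsius(fahrenheit))

    @staticmethod
    def to_celsius(fahrenheit):
        # Receives neither the instance nor the class.
        return (fahrenheit - 32) * 5 / 9


boiling = Temperature.from_fahrenheit(212)
freezing_celsius = Temperature.to_celsius(32)
```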

sum()

sum(iterable[, start]): returns start (default 0) plus the sum of the items of iterable.

vars()

From python doc:

Return the __dict__ attribute for a module, class, instance, or any other object with a __dict__ attribute.
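A small sketch of vars() on an instance; the Config class and its attributes are illustrative assumptions.

```python
class Config:
    def __init__(self):
        self.host = "localhost"
        self.port = 8080


cfg = Config()

attrs = dict(vars(cfg))                     # snapshot of the instance attributes
same_object = vars(cfg) is cfg.__dict__     # vars() returns the live __dict__

# Because it is the live dict, writing to it changes the instance.
vars(cfg)["port"] = 9090
```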

Python: class attributes / instance attributes

Python class attributes and instance attributes (very good)

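C.__name__    # name of class C (a string)
C.__doc__     # docstring of class C
C.__bases__   # tuple of all base classes of class C
C.__dict__    # attributes of class C
C.__module__  # module in which class C is defined
C.__class__   # the class that instance C belongs to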
  • module.__file__
    contains the path of the module!

file

  1. f.readline()/readlines()/write()/writelines()

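    • readline()  # reads one line per call
    • readlines()  # returns all lines as a list
    • write()  # does not add a newline automatically
    • writelines()  # does not add newlines either
  2. f.closed
    Note that closed is an attribute, not a method; its value is True/False

  3. for line in f: usage
    The following code prints the contents of f

    for line in f:
        print(line, end='')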
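The file methods above can be exercised end to end in a short sketch. The temporary file and its contents are assumptions for illustration; note that neither write() nor writelines() appends a newline for you.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("first\n")                        # newline must be written explicitly
    f.writelines(["second\n", "third\n"])     # same for every item here

closed_after_with = f.closed                  # the with block closed the file

with open(path) as f:
    first_line = f.readline()                 # one line, including its "\n"
    remaining = f.readlines()                 # the rest, as a list

with open(path) as f:
    # Iterating over the file reads it lazily, line by line.
    stripped = [line.rstrip("\n") for line in f]
```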

list comprehension

  • two-level list comprehension
    content = f.readlines()
    word_list = [word for line in content for word in line.split()]

Understanding the syntax:
a list comprehension turns:

for ...
    for/if ...
        do something ...

into:
do something ... for ... for/if ...
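A sketch showing that the comprehension and the nested loops give the same result; the sample lines stand in for the output of f.readlines() and are made up.

```python
content = ["the quick brown\n", "fox jumps\n"]

# Comprehension form: the for clauses appear in the same order as nested loops.
word_list = [word for line in content for word in line.split()]

# Equivalent nested loops.
loop_result = []
for line in content:
    for word in line.split():
        loop_result.append(word)

# A trailing if clause filters the innermost items.
long_words = [word for line in content for word in line.split() if len(word) > 3]
```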

string

1. strip([chars]) removes leading and trailing characters

By default strip() removes leading and trailing whitespace (whitespace includes \n, \t and \r).

The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:

>>> 'www.example.com'.strip('cmowz.')
'example'
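A short sketch of strip() and its one-sided variants. The strings are illustrative; the last case shows why chars should be read as a set of characters rather than a prefix.

```python
raw = "\t  hello world \r\n"

clean = raw.strip()        # both ends
left_only = raw.lstrip()   # leading whitespace only
right_only = raw.rstrip()  # trailing whitespace only

# chars is a set: every leading or trailing 'w' and '.' is removed.
stripped = "www.example.com".strip("w.")

# Gotcha: this also eats the first 'w' of "wow", which a prefix would not.
surprising = "www.wow.com".strip("w.")
```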

2. format

  • significant digits

    >>> format(12.456789, '.3g')
    '12.5'
    >>> format(12.456789, '.3f')
    '12.457'
    >>> format(12.456789, '.3e')
    '1.246e+01'
    >>> format(12.456789, '.3%')
    '1245.679%'
  • '{0:.2f} {1:s} {2:d}'.format(v0[, v1[, v2...]])

  • the comma in '{:,.2f}' groups large numbers by thousands, e.g. one million: 1,000,000.00

  • keyword arguments can be used

  • conversion applied before formatting: {!s}, {!r}

  • the old-style string formatting operator is %, e.g. '%.4f' % value

custom object (define __format__ in a class)

class HAL9000(object):
    def __format__(self, format):
        if format == 'open-the-pod-bay-doors':
            return "I'm afraid I can't do that."
        return 'HAL 9000'

datetime

from datetime import datetime
'{:%Y-%m-%d %H:%M}'.format(datetime(2001, 2, 3, 4, 5))

Another example:

# expiry and timestamp are strings like 2018-06-29T12:00:00.000Z and 2018-04-19T18:44:44.320Z
FMT = '%Y-%m-%dT%H:%M:%S.%fZ'  # .%fZ matches the fractional seconds and the trailing Z
time_delta = datetime.strptime(expiry, FMT) - datetime.strptime(timestamp, FMT)

named placeholder

Use keys as placeholders and pass the values as a dictionary or as keyword arguments.

data = {'first': 'Hodor', 'last': 'Hodor!'}
'{first} {last}'.format(**data)
# or
'{first} {last}'.format(first='Hodor', last='Hodor!')

Getitem and Getattr
Parametrized formats
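
The two headings above carry no example in these notes. A minimal sketch of both features, with made-up data (person, scores, Plant): replacement fields can index into or read attributes of an argument, and parts of the format spec can themselves be filled from arguments.

```python
class Plant:
    kind = "tree"


person = {"first": "Jean-Luc", "last": "Picard"}
scores = [4, 8, 15]

# Getitem: keys are written without quotes inside the field.
getitem_text = "{p[first]} {p[last]}".format(p=person)
index_text = "{s[1]}".format(s=scores)

# Getattr: attribute access with a dot.
getattr_text = "{p.kind}".format(p=Plant())

# Parametrized formats: nested fields supply alignment, width and precision.
aligned = "{:{align}{width}}".format("test", align="^", width=10)
precise = "{:.{prec}f}".format(2.718281828, prec=3)
```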

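3. rjust(width[, fillchar]) (ljust() and center() are similar)

Returns a new string with the original right-aligned and padded with spaces (or fillchar) to length width. If width is less than the length of the string, the original string is returned.

4. zfill(n) pads with zeros on the left to n characters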

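5. split(sep=None, maxsplit=-1)  # Python 3 signature; str and num are not keyword names

  • maxsplit: the maximum number of splits.
  • note that splitting on an explicit separator often produces empty strings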
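
A short sketch of where the empty strings come from and how maxsplit limits the work; the sample strings are illustrative.

```python
csv_line = "a,,b,"

# An explicit separator keeps empty fields between and after separators.
with_empties = csv_line.split(",")

# maxsplit=1 splits only at the first separator.
limited = "k=v=w".split("=", 1)

# With no separator, runs of whitespace count as one and no empties appear.
whitespace_split = "  a   b  ".split()

# Dropping the empty strings afterwards.
non_empty = [field for field in csv_line.split(",") if field]
```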

6. join

out.write(" ".join(mylist))
# joins the items of mylist with " "
  • elegant use
    out.write(" ".join(map(str, iterable)))

7. encode()/decode()

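  • str.encode(encoding="utf-8", errors="strict")

    errors sets the error-handling scheme. The default is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error().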
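
A sketch of how the errors argument changes the result when a character cannot be encoded; the sample text is an assumption.

```python
text = "caf\u00e9"  # "café"; the last character is not ASCII

utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

ignored = text.encode("ascii", errors="ignore")            # drops the character
replaced = text.encode("ascii", errors="replace")          # substitutes "?"
xml = text.encode("ascii", errors="xmlcharrefreplace")     # numeric entity

try:
    text.encode("ascii")  # errors="strict" is the default
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```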

or / and operator

ref: How do the “and” and “or” operators work in Python?
Two additional behaviors:

  • short circuiting
    if param is not None and param['foo'] == 3:
        pass

    Because of short circuiting, if param is None, param['foo'] is never evaluated, so no error is raised.

  • Objects can have a boolean value

    Objects can have a boolean value, and boolean operators respect that.

    >>> my_list = ["abc", "123"]
    >>> my_list or []
    ['abc', '123']
  • short circuiting when objects have a boolean value
    Still take a list as an example:

    >>> [] and ["abc", "123"]
    []
    >>> # NOTE: when the left operand is truthy, and returns the right operand
    >>> [1, 2] and ["abc", "123"]
    ['abc', '123']

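Because and/or return one of their operands rather than a strict bool, they are often used for defaults and guards. A minimal sketch with hypothetical helper names (greet, first_item), including one pitfall:

```python
def greet(name=None):
    # "or" returns the first truthy operand, so None and "" fall back.
    display_name = name or "stranger"
    return "Hello, " + display_name


def first_item(items):
    # "and" returns the left operand if it is falsy, otherwise the right one.
    return items and items[0]


# Pitfall: a legitimate falsy value such as 0 is also replaced by the default.
count = 0 or 10
```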
ternary operator

result = '5 is larger than 3' if 5 > 3 else '5 is not larger than 3'

deep copy

  • import copy; copy.deepcopy(mydict) makes a deep copy
  • dict.copy(mydict) (i.e. mydict.copy()) makes only a shallow copy
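
A small sketch of the difference between a shallow and a deep copy; the nested sample dict is an assumption.

```python
import copy

original = {"tags": ["a", "b"]}

shallow = original.copy()        # new dict, but the inner list is shared
deep = copy.deepcopy(original)   # the inner list is copied as well

original["tags"].append("c")
```

After the append, the shallow copy sees the new item because it shares the list, while the deep copy does not.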

CLI

  • python -c "print('hello world')"
  • python -m mymodule
    1. sys.path is changed: the current directory is put first instead of the script's directory
    2. otherwise it is roughly equal to python mymodule.py

other useful things

try-exception

ref: Is it a good practice to use try-except-else in Python? (Answer is: Yes)

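  1. This usage is standard practice

  2. so-called race condition
    With an if statement, the problem may appear only after your check (for example, you check that the disk has enough free space, and the space runs out right after the check). try-except does not have this problem.

  3. try-except is very efficient
    CPython already implements code for exception checking at every step, regardless of whether you actually use exceptions or not

  4. The statements after finally: are always executed, even if a return statement has already run! [2] [3]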

    finally will execute no matter what, even if another line is being evaluated with a return statement.

e.g.[3]:

def demo(whatever):  # wrapper name is illustrative; return is only valid inside a function
    try:
        try_this(whatever)
    except SomeException as the_exception:
        handle_SomeException(the_exception)
    else:
        return something()
    finally:
        return True

NOTE: In this process something() is called and executed, but its result is not returned! finally hijacks the flow, and the final return value is True.
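
A runnable sketch of the behaviour in the note, with stand-in functions (something, demo) that are assumptions made for illustration. Newer Python versions may emit a SyntaxWarning for a return inside finally, which is one more reason to avoid the pattern in real code.

```python
calls = []


def something():
    calls.append("something")
    return "from else"


def demo():
    try:
        pass                    # nothing fails, so the else branch runs
    except ValueError:
        pass
    else:
        return something()      # evaluated, but its value is discarded
    finally:
        return True             # overrides the pending return value


result = demo()
```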

  1. raise or raise ... from! Don’t raise Error
    Always use a bare raise (not raise Error) when re-raising, because this way the stack trace is kept.
    Or use raise ... from, which also preserves the traceback through exception chaining:

    try:
        try_this(whatever)
    except SomeException as the_exception:
        handle(the_exception)
        raise  # simply re-raise to the upper level; this keeps the stack trace
        # or do it like this:
        # raise DifferentException from the_exception
  2. args of error
    e.g.:

    import sys
    import traceback

    try:
        raise ValueError("Bala bala", 3.14)
    except ValueError as e:
        print(e.args)
        # output is ('Bala bala', 3.14)
        # print traceback
        print(traceback.format_exc())
        # or
        print(traceback.format_tb(sys.exc_info()[2]))

Mind the raise: without raise, it is as if no error occurred! For example:

try:
    ValueError("Bala bala", 3.14)  # the exception object is created but never raised
except ValueError as e:
    print(e.args)
# no output
  1. traceback
    import sys

    try:
        raise ValueError("Bala bala", 3.14)
    finally:
        print(sys.exc_info())
        # exc_info contains three objects: a type object, an exception object and a traceback object
        # Note: if the error is caught by an 'except' clause, then in the 'finally:' clause `print(sys.exc_info())` prints (None, None, None)
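
A minimal sketch of exception chaining with raise ... from. ConfigError and parse_port are hypothetical names introduced only for this example.

```python
import traceback


class ConfigError(Exception):
    """Hypothetical application-level error."""


def parse_port(raw):
    try:
        return int(raw)
    except ValueError as error:
        # "from error" records the original exception as __cause__.
        raise ConfigError("invalid port: {!r}".format(raw)) from error


try:
    parse_port("80a")
except ConfigError as error:
    cause = error.__cause__
    report = traceback.format_exc()  # contains both tracebacks
```

The formatted report shows the original ValueError first, then a line saying it was the direct cause of the ConfigError, so no part of the stack is lost.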

best practice

ref: Error codes vs exceptions: critical code vs typical code
Use error codes only when you really know every error that can occur inside out; in general, use exceptions, which make debugging much easier.

TIPS

Raising exceptions when an exception is already present in Python 3

ipython

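  1. Getting started
  • “?” shows help and information
    ?save shows the usage of the save command, or the signature of an object
    ??your_function shows the source code
  • !pwd: prefix a shell command with ! to run it
  • %hist
  • %edit opens an editor
  • %save your_filename 1-30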

conda

  • miniconda
    # pip install conda # does not work!
    You have to download miniconda to use conda as a package manager.

  • Managing Python environments with conda

  • anaconda
    Anaconda usage summary

  • conda install scipy  # install scipy

    • conda install --download-only ipython-notebook
    • conda install --offline -f ***.tar.bz2  # offline install
  • conda list  # list installed packages

  • Environments
    conda create -n env_name python=2.7 # create new env
    conda env list # list all env
    conda env export > environment.yml # export to file
    source activate your_env_name # enter your env.
    source activate root # to switch back

  • update python major version
    I searched a lot on Google and in the end just ran rm -rf ~/Anaconda and re-installed it.
    What was said on Stack Overflow (How do I upgrade to Python 3.6 with conda?) is true: it’s very hard to update the Python major version in place.

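  • upgrade pip in anaconda!
    pip install --upgrade pip
    pip then tries to upgrade itself and deletes the old version of pip.
    However, upgrading pip on a Mac requires root permission! As a result the old version is deleted but the new version is not installed!
    Fix: easy_install pip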

pip

  • Usage: pip --help
    for more info
  • install
    Installing pip with yum on CentOS 7

    First install the EPEL repository:
      yum -y install epel-release
      After that completes, install pip:
      yum -y install python-pip
      After installation, clear the cache:
      yum clean all

  • upgrade
    pip install --upgrade your_package

error

  • Could not fetch URL https://pypi.python.org/simple/pytest-cov/

    Could not fetch URL https://pypi.python.org/simple/pytest-cov/: There was a problem confirming the ssl certificate: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600) - skipping
    Could not find a version that satisfies the requirement pytest-cov (from versions: )
    No matching distribution found for pytest-cov

    Fix:

    pip install --trusted-host pypi.python.org pytest-cov

My Snippet

  • translate in one line:
    Translate word by word: [your_dict[i] if i in your_dict else i for i in some_words]
  • string to number: commas as thousands separators
    import locale
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
    locale.atoi('1,000,000')
    locale.atof('1,000,000.53')
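
Both snippets can be sketched without extra setup. The dictionary, word list and parse_number helper are illustrative assumptions; dict.get(word, word) is an equivalent, shorter form of the conditional, and the replace-based parser is a locale-independent alternative that only handles US-style comma grouping.

```python
your_dict = {"hello": "bonjour", "world": "monde"}
some_words = ["hello", "big", "world"]

# get(word, word) falls back to the word itself when there is no translation.
translated = [your_dict.get(word, word) for word in some_words]


def parse_number(text):
    """Parse a number that uses commas as thousands separators."""
    return float(text.replace(",", ""))


million = parse_number("1,000,000")
with_fraction = parse_number("1,000,000.53")
```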

Tools

Engineering skills

OOP

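  • factory functions
    From Core Python Programming: factory functions look a bit like functions, but they are actually classes; when you call one, you are really creating an instance of that type, just like a factory producing goods.
    e.g., list() and set() are factory functions: given an iterable, they produce an instance of list or set.

  • similarities and differences between association, aggregation and composition [4] [5]
    In common: each describes a relationship between two classes; class A and class B are used as the example below.
    Difference: the degree of dependence between A and B

                            Association               Aggregation               Composition
    Degree of dependence    A can exist without B,    A can exist without B     A cannot exist without B
                            B can exist without A
    Example                 B is A's friend           B is A's shirt            B is A's heart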
  • __init__
    Some people may think setting all instance variables in __init__ is cleaner.
    However, notice that setting an attribute there makes it accessible throughout the class, even if you only want it accessible to one method.
    So my point is: do not create and initialize a variable until you need it.

  • TIPS

    • UML (Unified Modeling Language) itself is rarely used nowadays, but the concepts it introduced are widely known and very useful.
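
The friend / shirt / heart example for association, aggregation and composition can be sketched in code. The classes below are hypothetical and only one of several reasonable ways to model the three relationships.

```python
class Heart:
    def __init__(self):
        self.beating = True


class Shirt:
    def __init__(self, color):
        self.color = color


class Person:
    def __init__(self, name, shirt=None):
        self.name = name
        self.heart = Heart()   # composition: created and owned by the person
        self.shirt = shirt     # aggregation: passed in, can outlive the person
        self.friends = []      # association: loose references in both directions

    def befriend(self, other):
        self.friends.append(other)
        other.friends.append(self)


shirt = Shirt("blue")
alice = Person("Alice", shirt=shirt)
bob = Person("Bob")
alice.befriend(bob)
```

The person builds its own heart, receives a shirt that exists independently, and merely refers to friends who neither own nor are owned by it.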

best practice

  • my_return = my_funct(para=para) is fine
    The para on the left is the parameter name and the one on the right is the variable name; writing it this way is allowed and clear.

debug

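Methods to internalize [3]

As for the answers that say to reproduce the bug locally first: forget it. A bug that is that easy to reproduce usually means even the most basic logic-coverage testing was not done properly. Compared with C/C++, dynamic languages are fortunate: exceptions come with a detailed stack trace, so you only need to write it to the log, and the error messages are usually fairly clear. The key point is that none of the logging that should be there may be missing, and it is strictly forbidden to log only the error message without the stack trace when an exception occurs. Ultimately, though, finding and fixing bugs depends on good program structure, the necessary defensive code (such as validating the arguments of key functions) and an automated testing process; debugging in production is only mending the fence after the sheep are lost.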
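
A minimal sketch of logging the stack trace together with the error message, using the standard logging module. The logger name, the divide function and the in-memory stream are assumptions made so the example is self-contained.

```python
import io
import logging

log_stream = io.StringIO()                    # stands in for a real log file
logger = logging.getLogger("debug_notes_demo")
logger.addHandler(logging.StreamHandler(log_stream))
logger.setLevel(logging.ERROR)
logger.propagate = False


def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception() logs at ERROR level and appends the traceback.
        logger.exception("divide(%r, %r) failed", a, b)
        raise


try:
    divide(1, 0)
except ZeroDivisionError:
    pass

logged = log_stream.getvalue()
```

Using logger.error() with only the message would lose the traceback; logger.exception() (or exc_info=True) keeps it.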

test

pytest is quite good! Start with it!
unittest vs pytest vs nose
Pytest vs Unittest vs Nose (a detailed comparison)
To read:
Writing unit tests in Python: How do I start?
Improve Your Python: Understanding Unit Testing
Python automated testing (read first)
The most complete automated testing workflow

TIPS

unrar file with password in python?

IT IS VERY HARD! STOP IT.
Just run unrar x 001.rar -ppassword in shell!

  • pyunpack + patool + unrar (sucks!)
    To unrar a file, you need to install 2 Python libs and 1 shell command (unrar).
    And in the end you still can’t unrar files with a password!

Python's drawbacks

  • efficiency
    • slower than Java
    • global interpreter lock (GIL), which makes multi-threading suck
  • hard to distribute (compared to Java et al., Python depends on installed packages)
  • easily decompiled
    • workaround
      Encapsulate the whole program in a web service; compile with Cython? (I’m not sure how this works)

Comparison with other languages

  • ruby
    Ruby's biggest advantage is Ruby on Rails

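My experience

  • People say Python has lots of packages and is convenient, but packages can hide many pitfalls (bugs or bad practice); for example, I ran into abuse of **kwargs in Pillow. These traits make Python great for prototyping but hard to use for building stable, efficient, consistent large applications.
  • There are quite a few counter-intuitive "features" [2]
    For example:
    >>> a = ([1], [2])
    >>> a[0] += [3]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> a
    ([1, 3], [2])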
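
Why the list changes even though the statement fails: a[0] += [3] first extends the list in place and only then tries to assign the result back into the tuple, and that second step is what raises. A small sketch that reproduces this and shows an in-place call that avoids the assignment:

```python
a = ([1], [2])

try:
    a[0] += [3]   # step 1: the list is extended in place; step 2: the tuple assignment fails
except TypeError as error:
    message = str(error)

# Calling extend() mutates the list without assigning into the tuple.
b = ([1], [2])
b[0].extend([3])
```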

reference


  1. Python

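  2. What are some examples of obvious bugs that get described as features?

  3. What general routines do veteran programmers use to fix bugs?

  4. N Randhawa’s answer

  5. Association, aggregation, composition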
