Python: Difference between revisions

From Elvanör's Technical Wiki
Latest revision as of 09:24, 4 June 2024

Important changes from Java

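  • When you pass an object to a function and "reassign" it through the function's parameter name, the caller's variable is not reassigned, because Python rebinds the local name only. Mutating the object in place (for example, appending to a list) is still visible to the caller. Note that Java treats object references the same way, so the real contrast is with languages that support pass-by-reference.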
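The difference between rebinding a parameter and mutating the object it refers to can be sketched as follows. The function and variable names are hypothetical and only serve the illustration.

```python
def rebind(items):
    # Rebinds the local name only; the caller's list is untouched.
    items = [99]


def mutate(items):
    # Changes the object itself, which the caller also refers to.
    items.append(99)


data = [1, 2, 3]

rebind(data)
after_rebind = list(data)   # copy taken to show the list is unchanged

mutate(data)
after_mutate = list(data)   # the appended element is visible here

print(after_rebind, after_mutate)
```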

Useful techniques

  • To iterate over two lists at the same time, use the zip built-in function:
for a, b in zip(list1, list2):
  • To use the condition ? a : b construct:
a if condition else b
  • Converting a Decimal object to a standard (text) representation is surprisingly hard in Python. The normalize() method is dangerous on its own because it may switch to exponent notation: Decimal("100").normalize() gives you Decimal("1E+2"). For this reason, never use the result of normalize() directly. Use one of the two following techniques instead:
d = d.quantize(Decimal(1)) if d == d.to_integral_value() else d.normalize()
d = d.normalize() + Decimal(0)
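A minimal sketch comparing plain normalize() with the two techniques above. The helper name to_plain and the sample values are assumptions made for illustration. The last line shows a third option that is not in the original note: formatting with the "f" presentation type, which never uses exponent notation.

```python
from decimal import Decimal


def to_plain(d):
    """Strip trailing zeros without switching to exponent notation."""
    return d.quantize(Decimal(1)) if d == d.to_integral_value() else d.normalize()


samples = [Decimal("100"), Decimal("100.00"), Decimal("1.50")]

# Plain normalize(): integers with trailing zeros end up in exponent form.
naive = [str(d.normalize()) for d in samples]

# Technique 1: quantize integral values, normalize the others.
plain = [str(to_plain(d)) for d in samples]

# Technique 2: adding Decimal(0) brings the exponent back to zero or below.
plus_zero = [str(d.normalize() + Decimal(0)) for d in samples]

# Alternative (not from the original note): fixed-point formatting.
formatted = [format(d.normalize(), "f") for d in samples]

print(naive, plain, plus_zero, formatted)
```

Both techniques go through the current decimal context, so values with more digits than the context precision (28 by default) may be rounded or raise InvalidOperation.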

Casts

  • To cast a float to an integer, you can use the built-in int() function.
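One detail worth keeping in mind is that int() truncates toward zero instead of rounding. A small sketch with arbitrary sample values, alongside math.floor() and round() for comparison:

```python
import math

values = [2.7, -2.7]

truncated = [int(v) for v in values]        # drops the fractional part
floored = [math.floor(v) for v in values]   # rounds toward negative infinity
rounded = [round(v) for v in values]        # rounds to the nearest integer

print(truncated, floored, rounded)
```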

Exceptions

  • Sample code to deal with an exception:
	import sys

	try:
	    doSomething()
	except Exception as inst:
	    print(inst.args)
	    print(sys.exc_info()[0])

This will give you information about the raised exception type.

Scopes

Within a module, a function can read module-level variables normally. However, assigning to one inside the function does not work as you might expect: the assignment creates a new local variable instead. To assign a module variable inside a function, you need to declare it with the global keyword first.

global myVariable

This is quite strange!
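A short sketch of the three cases described above: reading a module variable, accidentally shadowing it, and assigning it through global. The names are hypothetical.

```python
counter = 0  # module-level variable


def read_only():
    # Reading a module variable works without any declaration.
    return counter + 1


def shadowing_assignment():
    # This creates a new local variable; the module variable is untouched.
    counter = 100
    return counter


def increment():
    # The global declaration makes the assignment target the module variable.
    global counter
    counter += 1


read_value = read_only()
shadowing_assignment()
after_shadowing = counter

increment()
after_increment = counter

print(read_value, after_shadowing, after_increment)
```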

Classes

  • You don't need to explicitly declare class fields, as they are dynamically created the first time you assign them.
  • Every instance method should take "self" as its first argument.
  • Accessing an object attribute, when the attribute does not exist, results in an exception. Code like:
if object.myAttribute:

will not work as expected if myAttribute was not assigned. One easy solution is to define in the class initializer method (__init__) the attribute and set it equal to None.
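The following sketch shows the exception and the fix described above. The class names are made up for the example. It also shows getattr() with a default value, which is another way to read an attribute that may not exist.

```python
class WithoutInit:
    pass


class WithInit:
    def __init__(self):
        # Defining the attribute up front makes "if obj.myAttribute:" safe.
        self.myAttribute = None


bare = WithoutInit()
try:
    if bare.myAttribute:
        pass
    raised = False
except AttributeError:
    raised = True  # the attribute was never assigned

safe = WithInit()
safe_check = bool(safe.myAttribute)  # False, and no exception

# Alternative: supply a default when the attribute might be missing.
fallback = getattr(bare, "myAttribute", None)

print(raised, safe_check, fallback)
```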

Profiling & Memory

Getting memory usage from outside the Python process

  • Use the following command; it gives you the percentage of RAM used and the size of the memory used by the process:
ps -u | grep python

Sizes

  • On 64-bit CPython, a float uses 24 bytes, and you need an additional 8 bytes to hold a pointer to the object in a list, for instance (there are no primitive types in Python; everything is an object). So that's 32 bytes per float in a list.
  • If you want to optimize RAM, you should consider using numpy or other libraries to hold large amounts of floats.
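The per-float cost can be checked with sys.getsizeof(). In the sketch below, the standard-library array module stands in for numpy so that the example has no dependencies; it stores raw 8-byte doubles in a similar way. The exact byte counts depend on the platform and interpreter version.

```python
import sys
from array import array

n = 100_000
values = [float(i) for i in range(n)]

# A list holds pointers; each float object is counted separately.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array of type "d" packs the raw C doubles contiguously.
packed = array("d", values)
array_bytes = sys.getsizeof(packed)

print(f"list:  {list_bytes / n:.1f} bytes per float")
print(f"array: {array_bytes / n:.1f} bytes per float")
```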

Basic profiling techniques

  • Be careful about using tracemalloc; it needs to be initialized and substantially increases memory usage (so, for instance, RSS results are no longer meaningful). Never run tracemalloc in production.
  • A simple technique is to use the following code, which gives you the resident set size (RSS) of your process in MiB.
import psutil
memoryInfo = psutil.Process().memory_info()
rss = memoryInfo.rss / 1024 ** 2
  • Note that when using CPython, the interpreter does not always release memory it allocated back to the OS. It allocates memory in chunks (called arenas), and an arena cannot be released while any part of it is still in use. However, an arena is typically 256 KiB and freed space in the arena will be reused by CPython, so that's usually not an issue.
  • Pympler can also be useful; it will give you the memory space occupied by a given Python object (taking into account its references to other Python objects). Most basic usage would be:
from pympler import asizeof
asizeof.asized(orderBooks, detail=1).format()
  • If you want information about an object that is an instance of a class, you should provide detail=2 so that Pympler will print out details about all the instance attributes (attributes of a class instance are normally stored in a dictionary, where the keys are the attribute names and the values are the attribute values).
asizeof.asized(myObject, detail=2).format()
  • It can also be very useful to exclude some properties (objects) from an asized() call. For instance:
sizer = asizeof.Asizer()
sizer.exclude_refs(myObject.sampleProperty, myObject.httpSession)
sizer.asized(myObject, detail=2).format()
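For reference, the tracemalloc initialization mentioned at the top of this list looks roughly like this. It is a sketch for development use only, and the workload is a made-up placeholder.

```python
import tracemalloc

# Tracing must be started before the allocations you want to observe.
tracemalloc.start()

# Hypothetical workload that allocates many small strings.
data = [str(i) * 10 for i in range(50_000)]

snapshot = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

# Group allocations by source line and keep the largest entries.
top = snapshot.statistics("lineno")[:3]
for stat in top:
    print(stat)
print(f"current={current / 1024 ** 2:.1f} MiB, peak={peak / 1024 ** 2:.1f} MiB")
```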


cProfile

  • This profiler is built into Python. It produces profiling data about the CPU time spent in each function, and can thus be useful to locate performance bottlenecks.
  • You can use snakeviz as a GUI to better analyze the results produced by cProfile.
  • To start profiling, use:
import cProfile
profiler = cProfile.Profile()
profiler.enable()
  • Note that any method printing or exporting profiling data (such as print_stats() or dump_stats()) "pauses" the profiler, because it internally calls create_stats(), which stops data collection. You need to call profiler.enable() again manually; data collection then resumes and aggregates into the same profiler object without resetting the statistics. This is a bit surprising and easy to miss.
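The pause-and-resume behavior described above can be sketched as follows. The busy() function is a hypothetical workload, and the report is written to an in-memory buffer through pstats.

```python
import cProfile
import io
import pstats


def busy(n):
    # Hypothetical workload used only to produce profiling data.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy(100_000)

# Building a Stats object calls profiler.create_stats(), which stops collection.
first_report = io.StringIO()
pstats.Stats(profiler, stream=first_report).sort_stats("cumulative").print_stats(5)

# Resume collection; the data gathered so far is kept.
profiler.enable()
busy(100_000)
profiler.disable()

final_report = io.StringIO()
stats = pstats.Stats(profiler, stream=final_report)
stats.sort_stats("cumulative").print_stats(5)

# Keys of stats.stats are (filename, line, function name); index 1 is the call count.
busy_calls = sum(v[1] for k, v in stats.stats.items() if k[2] == "busy")
print(final_report.getvalue())
print("busy() calls recorded:", busy_calls)
```

To inspect the data in snakeviz, profiler.dump_stats() with a file name of your choice writes the same statistics to disk.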