How can I distribute a Python function in PySpark to speed up the computation with the least amount of work?