Data Scientist is considered the sexiest job in the world, at least in the tech world, and it is true that the results you can get by applying Machine Learning techniques sometimes seem like sorcery. Therefore, a lot of people are trying to become data scientists.
A lot of them use Python, and almost all online courses also use Python to teach the concepts.
Python is a lovely programming language. It is easy to get into and the final code is, sometimes, even beautiful. But all of this means little when you face the GIL (Global Interpreter Lock). There are some solutions, like multiprocessing, which works well when you can split the data into big chunks. Other approaches apply horizontal scaling techniques in order to get the effect of vertical scaling. What does that mean? If you have read about the GIL, you already know that a Python process will effectively use only one CPU core of your PC to run Python code, so why not launch more Pythons and split the load between them using load balancing, messaging systems, and so on?
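The multiprocessing option mentioned above can be sketched roughly as follows. This is only a minimal illustration, assuming the work is CPU-bound and the data splits cleanly into independent chunks; the helper names (split_into_chunks, sum_of_squares, parallel_sum_of_squares) are made up for this example and the sum of squares is just a stand-in for real processing.

```python
from multiprocessing import Pool


def split_into_chunks(data, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """CPU-bound work applied to one chunk (a stand-in for real processing)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Process each chunk in a separate Python process and combine the results."""
    chunks = split_into_chunks(data, workers)
    with Pool(processes=workers) as pool:
        # Each worker process has its own interpreter and its own GIL.
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this file.
    print(parallel_sum_of_squares(list(range(1_000_000))))
```

Every chunk is pickled and sent to a worker, so this tends to pay off only when the chunks are big and the per-chunk work outweighs the cost of moving the data between processes.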
The first approach scales up and the second one scales out. But both of them seem to be too much for someone who only wants to write some Python code, so let's go back to the data science world.
Typing an algorithm in Python is a pleasure and the ecosystem is great. Libraries like NumPy or SciPy will help you a lot, and you can also find higher-level libraries like scikit-learn or TensorFlow. So, how could I not use Python for Machine Learning?
Some weeks ago I spent some time watching videos and presentations about Erlang, and I remembered two things:
- Francesco Cesarini, Founder & Technical Director of Erlang Solutions, defined Erlang as an orchestration language.
- Demonware, the company behind the infrastructure for Call of Duty, uses Erlang to handle connections and tasks and, especially, to control Python (slides).
Why don't I try to handle my Python code using Elixir? That way I would be able to scale and, especially, to add fault tolerance.
With these things in mind I began to code Piton, a library that uses Erlang Ports, thanks to ErlPort, to let Elixir and Python communicate directly.
The first step is to have a Python project. I am going to use a simple example: a Fibonacci calculator.
def fib(n):
    # Naive recursive Fibonacci: fib(0) = 0, fib(1) = fib(2) = 1
    if n < 0:
        raise Exception("No negative values !!!")
    if n == 0:
        return 0
    if n < 3:
        return 1
    return fib(n - 1) + fib(n - 2)
Then, create the module of your own Port using Piton.Port:
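defmodule MyPoolPort do
  use Piton.Port

  # Mandatory: path to the Python project and the interpreter to run
  def start(), do: MyPoolPort.start([path: Path.expand("python_folder"), python: "python"], [])

  # Wrapper around execute(): pid, Python module atom, Python function atom, arguments
  def fib(pid, n), do: MyPoolPort.execute(pid, :functions, :fib, [n])
end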
It is mandatory to have a start() function that provides the path to the Python project and the Python interpreter, which could belong to a virtual environment. Then you can define as many functions as you need. I recommend creating some wrappers for the execute() function, which only needs the pid of the process connected to one Python, the atom of the Python module, the atom of the Python function and a list of arguments for the Python function.
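To make the idea of addressing a call by "module atom, function atom, argument list" more concrete, here is a small Python-only sketch of that kind of dispatch. It is not how ErlPort or Piton are implemented; the execute helper below is hypothetical and only illustrates how two names plus a list of arguments are enough to locate and run a function.

```python
import importlib


def execute(module_name, function_name, args):
    """Import a module by name, look up a function by name, and call it with args."""
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    return function(*args)


if __name__ == "__main__":
    # A standard-library module is used here as a stand-in; with a project like
    # the one above, the names would be those of your own module and function.
    print(execute("math", "factorial", [5]))
```

Because the caller only sends names and plain values, the side that owns the interpreter stays free to decide where and in which process the function actually runs.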
Now we only have to launch our Piton.Pool, indicating which module it is going to use and the number of Pythons we want to run, and use it:
iex> {:ok, pool} = Piton.Pool.start_link([module: MyPoolPort, pool_number: 2], [])
{:ok, #PID<0.176.0>}
iex> Piton.Pool.execute(pool, :fib, [20])
6765