Article Series
Part II: Numpy implementation
Part III: Numba and Cython implementation (you are here)
This is Part III of a series of three posts. In Parts I and II, I discussed pure Python and NumPy implementations for computing pair-wise distances under periodic boundary conditions. In this post, I show how to use Numba and Cython to speed up the Python code further.
Skip to see the summary of benchmark results.
At some point, optimized Python code is not strictly Python code anymore. For instance, in this post, Cython lets us make our code very efficient. However, strictly speaking, Cython is not Python. It is a superset of Python, which means that any Python code can be compiled as Cython code but not vice versa. To see the performance boost, one needs to write Cython code. So what is stopping you from just writing C/C++ code instead and being done with it? I believe there is always a balance between the performance of the code and the effort you put into writing it. As I will show here, using Numba or writing Cython code is straightforward if you are familiar with Python. Hence, I always prefer to optimize the Python code rather than rewrite it in C/C++, because it is more cost-effective for me.
Just to reiterate, the computation is to calculate pair-wise distances between every pair of $N$ particles under periodic boundary conditions. The positions of the particles are stored in an array/list of the form [[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]]. The separation between two particles $i$ and $j$ along one dimension is calculated as follows,
$\Delta_{ij} = \sigma_{ij} - \left[ \sigma_{ij}/L \right] \cdot L$
where $\sigma_{ij}=x_i-x_j$, $L$ is the length of the simulation box edge, $\left[\cdot\right]$ denotes the nearest integer, and $x_i$ and $x_j$ are the positions of the two particles. For more information, please read Part I.
Numba is an open-source JIT compiler that translates a subset of Python and NumPy code into fast machine code.
Numba has existed for a few years. I remember trying it a few years ago and not having a good experience with it. Now it is much more mature and very easy to use, as I will show in this post.
On their website, it is stated that Numba can make Python code as fast as C or Fortran. Numba also provides a way to parallelize the for loop. First, let’s try to implement a serial version. Numba’s official documentation recommends using Numpy with Numba. Following the suggestion, and using the Numpy code demonstrated in Part II, I have the Numba version,
import numpy as np
import numba
from numba import jit

@jit(nopython=True, fastmath=True)
def pdist_numba_serial(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.
    positions: a numpy array with form np.array([[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]])
    l: the length of edge of box (cubic/square box)
    return: a condensed 1D array
    """
    # determine the number of particles
    n = positions.shape[0]
    # create an empty array storing the computed distances
    pdistances = np.empty(int(n*(n-1)/2.0))
    for i in range(n-1):
        D = positions[i] - positions[i+1:]
        # the three-argument form of np.round needs a preallocated output array
        out = np.empty_like(D)
        D = D - np.round(D / l, 0, out) * l
        distance = np.sqrt(np.sum(np.power(D, 2.0), axis=1))
        # start and end of the slice of the condensed array filled by row i
        idx_s = int((2 * n - i - 1) * i / 2.0)
        idx_e = int((2 * n - i - 2) * (i + 1) / 2.0)
        pdistances[idx_s:idx_e] = distance
    return pdistances
Using Numba is almost (see blue box below) as simple as adding the decorator @jit(nopython=True, fastmath=True) to our function.
Inside the function pdist_numba_serial, we basically copied the code from Part II, except for the line D = D - np.round(D / l) * l in the original code. Instead, we need to use np.round(D / l, 0, out), as pointed out here.
pdist_numba_serial is a serial implementation. The nature of pair-wise distance computation allows us to parallelize the process by simply distributing pairs to multiple cores/threads. Fortunately, Numba provides a very simple way to do that. The for loop in pdist_numba_serial can be parallelized by replacing range with prange and adding parallel=True to the decorator,
from numba import prange

# add parallel=True to the decorator
@jit(nopython=True, fastmath=True, parallel=True)
def pdist_numba_parallel(positions, l):
    # determine the number of particles
    n = positions.shape[0]
    # create an empty array storing the computed distances
    pdistances = np.empty(int(n*(n-1)/2.0))
    # use prange here instead of range
    for i in prange(n-1):
        D = positions[i] - positions[i+1:]
        out = np.empty_like(D)
        D = D - np.round(D / l, 0, out) * l
        distance = np.sqrt(np.sum(np.power(D, 2.0), axis=1))
        idx_s = int((2 * n - i - 1) * i / 2.0)
        idx_e = int((2 * n - i - 2) * (i + 1) / 2.0)
        pdistances[idx_s:idx_e] = distance
    return pdistances
There are some caveats when using prange: a race condition can occur when different iterations write to the same memory. In our case, however, there is no race condition since the distance calculations for different pairs are independent of each other, i.e. there is no communication between cores/threads. For more information on parallelizing with Numba, refer to their documentation.
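To make the independence argument concrete, here is a minimal sketch in plain Python, assuming the same index arithmetic as in pdist_numba_parallel. The helper row_slice is hypothetical (it is not part of the original code), and it uses integer division instead of int(... / 2.0), which should be equivalent here because the products involved are always even.

```python
def row_slice(i, n):
    """Start and end of the slice of the condensed array written by iteration i."""
    idx_s = (2 * n - i - 1) * i // 2
    idx_e = (2 * n - i - 2) * (i + 1) // 2
    return idx_s, idx_e

n = 6
slices = [row_slice(i, n) for i in range(n - 1)]

# Each slice starts exactly where the previous one ends, so no two
# iterations of the loop ever write to the same element.
contiguous = all(slices[k][1] == slices[k + 1][0] for k in range(len(slices) - 1))
print(slices, contiguous)
```

The slices tile the condensed array from 0 to n*(n-1)/2 without overlap, which is why the iterations can safely run on different threads.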
Benchmark
Now let’s benchmark the two Numba implementations. The result is shown below,
Compared to the fastest Numpy implementation shown in Part II, the serial Numba implementation provides more than a threefold speedup. As one can see, the parallel version is about twice as fast as the serial version on my 2-core laptop. I didn’t test on machines with more cores, but I expect the speedup to scale linearly with the number of cores.
I am sure there are more advanced techniques to make the Numba version even faster (using CUDA, for instance). I would argue that the implementations above are the most cost-effective.
As demonstrated above, Numba provides a very simple way to speed up Python code with minimal effort. However, if we want to go further, it is probably better to use Cython.
Cython is basically a superset of Python. It allows Cython/Python code to be translated to C/C++ and then compiled to machine code using a C/C++ compiler. In the end, you have a C module you can import directly in Python.
Similar to the Numba versions, I show both serial and parallel versions of the Cython implementation,
# load the Cython extension in Jupyter Notebook (run this line in its own cell)
%load_ext Cython

%%cython --force --annotate
import cython
import numpy as np
from libc.math cimport sqrt
from libc.math cimport nearbyint

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True) # do not check division; may lead to a 20% speedup
def pdist_cython_serial(double [:,:] positions not None, double l):
    cdef Py_ssize_t n = positions.shape[0]
    cdef Py_ssize_t ndim = positions.shape[1]
    pdistances = np.zeros(n * (n-1) // 2, dtype = np.float64)
    cdef double [:] pdistances_view = pdistances
    cdef double d, dd
    cdef Py_ssize_t i, j, k
    for i in range(n-1):
        for j in range(i+1, n):
            dd = 0.0
            for k in range(ndim):
                d = positions[i,k] - positions[j,k]
                d = d - nearbyint(d / l) * l
                dd += d * d
            # position of pair (i, j) in the condensed 1D array
            pdistances_view[j - 1 + (2 * n - 3 - i) * i // 2] = sqrt(dd)
    return pdistances
Some Remarks
Declare static types for variables using cdef. For instance, cdef double d declares that the variable d has a double/float type.
Import sqrt and nearbyint from the C library instead of using Python functions. The general rule is to use C functions directly whenever possible.
positions is a Numpy array and is declared using Typed Memoryviews.
Similar to positions, pdistances_view accesses the memory buffer of the numpy array pdistances. Values are assigned to pdistances through pdistances_view.
It is useful to use %%cython --annotate to display the analysis of the Cython code. This way, you can inspect potential slowdowns in the code. The analysis will highlight lines where Python interaction occurs. In this particular example, it is very important to keep the core part (the nested loop) free from Python interaction. For instance, if we don’t use sqrt and nearbyint from libc.math but instead just use Python’s math.sqrt and built-in round, then you won’t see much speedup, since there is a lot of overhead in calling these Python functions inside the loop.
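The index expression used when writing into pdistances_view can be checked in plain Python. The sketch below assumes the same formula as in pdist_cython_serial; condensed_index is a hypothetical helper name introduced only for this illustration.

```python
from itertools import combinations

def condensed_index(i, j, n):
    """Position of pair (i, j), with i < j, in the condensed distance array."""
    return j - 1 + (2 * n - 3 - i) * i // 2

n = 5
# combinations yields pairs in the same order as the nested loops:
# (0,1), (0,2), ..., (0,4), (1,2), ..., (3,4)
pairs = list(combinations(range(n), 2))
indices = [condensed_index(i, j, n) for i, j in pairs]
print(indices)
```

If the formula is right, the indices simply count up from 0 to n*(n-1)/2 - 1, so every pair gets its own slot and the output has the same layout as the Numpy and Numba versions.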
Similar to Numba, Cython also allows parallelization. The parallelization is achieved using OpenMP. First, to use OpenMP with Cython, we need to import the needed modules,
from cython.parallel import prange, parallel
Then, replace the for i in range(n-1) in the serial version with
with nogil, parallel():
    for i in prange(n-1, schedule='dynamic'):
Everything else remains the same. Here I follow the example in Cython’s official documentation.
schedule='dynamic' hands out iterations to threads as they request them. Other options include static, guided, etc. See here for full documentation.
I had some trouble compiling the parallel version directly in the Jupyter Notebook. Instead, it is compiled as a standalone module. The .pyx file and setup.py file can be found here.
Benchmark
The result of benchmarking pdist_cython_serial and pdist_cython_parallel is shown in the figure below,
As expected, the serial version is about half as fast as the parallel version on my 2-core laptop. The serial version is more than two times faster than its Numba counterpart.
In this series of posts, using the computation of pair-wise distances under periodic boundary conditions as an example, I showed various ways to optimize Python code using built-in Python functions (Part I), NumPy (Part II), and Numba and Cython (this post). The benchmark results from all of the functions tested are summarized in the table below,
Function | Average time (ms) | Speedup |
---|---|---|
pdist | 1270 | 1 |
pdist_v2 | 906 | 1.4 |
pdist_np_broadcasting | 160 | 7.9 |
pdist_np_naive | 110 | 11.5 |
pdist_numba_serial | 20.7 | 61 |
pdist_numba_parallel | 12.6 | 101 |
pdist_cython_serial | 5.84 | 217 |
pdist_cython_parallel | 3.19 | 398 |
The times are measured for $N=1000$. The parallel versions are tested on a 2-core machine.
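As a quick arithmetic check, the speedup column can be recomputed from the measured times. The sketch below only uses the numbers from the table; the variable names are made up for this illustration.

```python
# Average run times in milliseconds, copied from the summary table (N = 1000)
timings_ms = {
    "pdist": 1270,
    "pdist_v2": 906,
    "pdist_np_broadcasting": 160,
    "pdist_np_naive": 110,
    "pdist_numba_serial": 20.7,
    "pdist_numba_parallel": 12.6,
    "pdist_cython_serial": 5.84,
    "pdist_cython_parallel": 3.19,
}

# Speedup relative to the pure Python baseline
baseline = timings_ms["pdist"]
speedups = {name: baseline / t for name, t in timings_ms.items()}
for name, s in speedups.items():
    print(f"{name:24s} {s:6.1f}")

# Gain from going parallel on the 2-core machine
numba_gain = timings_ms["pdist_numba_serial"] / timings_ms["pdist_numba_parallel"]
cython_gain = timings_ms["pdist_cython_serial"] / timings_ms["pdist_cython_parallel"]
print(f"numba parallel gain: {numba_gain:.2f}, cython parallel gain: {cython_gain:.2f}")
```

At $N=1000$ the parallel gain comes out at roughly 1.6 for Numba and 1.8 for Cython, somewhat below the ideal factor of 2 for two cores.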
Article Series
Part II: Numpy implementation (you are here)
Part III: Numba and Cython implementation
This is Part II of a series of three posts on optimizing Python code. Using the example of computing pair-wise distances under periodic boundary conditions, I explore several ways to optimize the Python code, including a pure Python implementation without any third-party libraries, a Numpy implementation, and implementations using Numba or Cython.
In this post, I show how to use Numpy to do the computation. I will demonstrate two different implementations.
Just to reiterate, the computation is to calculate pair-wise distances between every pair of $N$ particles under periodic boundary conditions. The positions of the particles are stored in an array/list of the form [[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]]. The separation between two particles $i$ and $j$ along one dimension is calculated as follows,
$\Delta_{ij} = \sigma_{ij} - \left[ \sigma_{ij}/L \right] \cdot L$
where $\sigma_{ij}=x_i-x_j$, $L$ is the length of the simulation box edge, $\left[\cdot\right]$ denotes the nearest integer, and $x_i$ and $x_j$ are the positions of the two particles. For more information, please read Part I.
By naive, I mean that we simply treat a numpy array like a normal Python list and use some basic numpy functions to compute quantities such as sums, means, powers, etc. To get to the point, the code is the following,
import numpy as np

def pdist_np_naive(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.
    positions: a numpy array with form np.array([[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]])
    l: the length of edge of box (cubic/square box)
    return: a condensed 1D array
    """
    # determine the number of particles
    n = positions.shape[0]
    # create an empty array storing the computed distances
    pdistances = np.empty(int(n*(n-1)/2.0))
    for i in range(n-1):
        # separations between particle i and all particles after it
        D = positions[i] - positions[i+1:]
        D = D - np.round(D / l) * l
        distance = np.sqrt(np.sum(np.power(D, 2.0), axis=1))
        # start and end of the slice of the condensed array filled by row i
        idx_s = int((2 * n - i - 1) * i / 2.0)
        idx_e = int((2 * n - i - 2) * (i + 1) / 2.0)
        pdistances[idx_s:idx_e] = distance
    return pdistances
Benchmark
n = 100
positions = np.random.rand(n,2)
%timeit pdist_np_naive(positions,1.0)
2.7 ms ± 376 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The performance is not bad. This is roughly a 4 times speedup compared to the pure Python implementation shown in Part I (it might not be as fast as one would expect, since the Python code shown in the previous post is already well-optimized). Is there any way we can speed up the calculation? We know that for loops can be very slow in Python. Hence, eliminating the for loop in the example above might be the right direction. It turns out that we can achieve this by fully utilizing the broadcasting feature of numpy.
To get rid of the loops in the code above, we need to find some numpy-native way to do the same thing. One typical method is to use broadcasting. Consider the following example,
a = np.array([1,2,3])
b = 4
a + b
>>> array([5,6,7])
This is a simple example of broadcasting. The underlying operation, in this case, is a loop over the elements of a that adds the value of b to each of them. Instead of writing the loop ourselves, we can simply write a+b and numpy will do the rest. The term “broadcasting” is used in the sense that b is stretched to have the same dimension as a, and then element-by-element arithmetic operations are performed. Because broadcasting is implemented in C under the hood, it is much faster than writing the for loop explicitly.
The nature of pair-wise distance computation requires doubly nested loops which iterate over every possible pair of particles. It turns out that such a task can also be done using broadcasting. Again, I recommend reading the official documentation on broadcasting. Example 4 on that page is a nested loop. Look at the example, shown below,
import numpy as np
a = np.array([0.0,10.0,20.0,30.0])
b = np.array([1.0,2.0,3.0])
a[:, np.newaxis] + b
>>> array([[ 1., 2., 3.],
           [ 11., 12., 13.],
           [ 21., 22., 23.],
           [ 31., 32., 33.]])
Notice that the + operation is applied to every possible pair of elements from a and b. It is equivalent to the code below,
a = np.array([0.0,10.0,20.0,30.0])
b = np.array([1.0,2.0,3.0])
c = np.empty((len(a), len(b)))
for i in range(len(a)):
    for j in range(len(b)):
        c[i,j] = a[i] + b[j]
Broadcasting has a much simpler syntax and is faster in many cases (but not all) than explicit loops. Let’s look at another example, shown below,
a = np.array([[1,2,3],[-2,-3,-4],[3,4,5],[5,6,7],[7,6,5]])
diff = a[:, np.newaxis] - a
print('shape of array [a]:', a.shape)
print('shape of array [diff]:', diff.shape)
>>> shape of array [a]: (5, 3)
>>> shape of array [diff]: (5, 5, 3)
Array a, with shape (5,3), represents 5 particles with coordinates in three dimensions. If we want to compute the differences between every pair of particles along each dimension, a[:, np.newaxis] - a does the job. The quantity a[:, np.newaxis] - a has shape (5,5,3), whose first and second dimensions are the particle indices and whose third dimension is spatial.
Following this path, we reach the final code to compute the pair-wise distances under periodic boundary condition,
def pdist_np_broadcasting(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.
    positions: numpy array storing the positions of each particle. Shape: (nxdim)
    l: edge size of simulation box
    return: nxn distance matrix
    """
    D = positions[:, np.newaxis] - positions # D is a nxnxdim matrix/array
    D = D - np.around(D / l) * l
    # unlike the pdist_np_naive above, pdistances here is a distance matrix with shape nxn
    pdistances = np.sqrt(np.sum(np.power(D, 2.0), axis=2))
    return pdistances
Benchmark
n = 100
positions = np.random.rand(n,2)
%timeit pdist_np_broadcasting(positions, 1.0)
>>> 1.43 ms ± 649 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This is about twice as fast as the naive numpy implementation.
pdist_np_broadcasting returns an array with shape (n,n), which can be considered a distance matrix whose element [i,j] is the distance between particles i and j. As you can see, this matrix is symmetric and hence contains duplicated information. There are probably better ways than the one shown here that compute only the upper triangle of the matrix instead of the full one.
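If only the condensed form is needed, the upper triangle can at least be extracted after the fact with np.triu_indices. This is a minimal sketch, not a faster algorithm: the full (n,n) matrix is still computed first, and condensed_from_square is a hypothetical helper name used only here.

```python
import numpy as np

def condensed_from_square(dmatrix):
    """Return the strict upper triangle of a square distance matrix as a 1D array."""
    n = dmatrix.shape[0]
    # k=1 skips the diagonal; indices come out row by row,
    # i.e. in the order (0,1), (0,2), ..., (1,2), ...
    rows, cols = np.triu_indices(n, k=1)
    return dmatrix[rows, cols]

# toy symmetric "distance matrix" for three particles
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
condensed = condensed_from_square(d)
print(condensed)
```

The resulting order matches the condensed 1D layout returned by pdist_np_naive, so the two outputs can be compared element by element.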
Now let’s make a final systematic comparison between pdist_np_naive and pdist_np_broadcasting. I benchmark the performance for different values of n and plot the run time as a function of n. The result is shown in the figure below,
The result is surprising. The broadcasting version is faster only when the number of particles is smaller than about 200. For larger data sets, the naive implementation turns out to be faster. What is going on? After googling a little bit, I found these StackOverflow questions 1, 2, 3. It turns out that the problem may lie in memory usage and access. Using the memory-profiler, I can compare the memory usage of the two versions as a function of n (see the figure below). The result shows that pdist_np_broadcasting uses much more memory than pdist_np_naive, which could explain the difference in speed.
The origin of the difference in memory usage is that in pdist_np_naive the computation is split into the individual iterations of the for loop, whereas pdist_np_broadcasting performs the computation in one single batch. pdist_np_naive executes D = positions[i] - positions[i+1:] inside the loop, and every single iteration only creates an array D with fewer than n rows. On the other hand, D = positions[:, np.newaxis] - positions and D = D - np.around(D / l) * l in pdist_np_broadcasting create several temporary arrays of shape (n, n, dim), i.e. on the order of n*n elements each.
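A back-of-the-envelope estimate makes the difference tangible. The sketch below assumes float64 entries (8 bytes each) and only counts a single temporary array in each version; the function names are hypothetical.

```python
def temp_bytes_broadcasting(n, dim, itemsize=8):
    """Bytes in one (n, n, dim) temporary, as created in pdist_np_broadcasting."""
    return n * n * dim * itemsize

def temp_bytes_naive(n, dim, itemsize=8):
    """Bytes in the largest (n - 1, dim) temporary, created at i = 0 in pdist_np_naive."""
    return (n - 1) * dim * itemsize

n, dim = 1000, 3
big = temp_bytes_broadcasting(n, dim)
small = temp_bytes_naive(n, dim)
print(big, small, big / small)
```

For n = 1000 in three dimensions this is about 24 MB per temporary in the broadcasting version versus about 24 kB in the naive version, so the latter is far more likely to stay in the CPU cache.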
First, both numpy implementations shown here are several times faster than the pure Python implementation. When working with numerical computation, using Numpy will usually give better performance. One counterexample is appending to a list/array, where Python’s append is much faster than numpy’s append.
Many online tutorials and posts recommend using numpy’s broadcasting feature whenever possible. This is largely correct. However, the example given here shows that the details of how broadcasting is used matter. Numpy’s official documentation states,
There are also cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation
pdist_np_broadcasting is one example where broadcasting might hurt performance. I guess the take-home message is: do not neglect space complexity (memory requirements) when optimizing code, and numpy’s broadcasting is not always a good idea.
In the next post, I will show how to use Numba and Cython to boost the computation speed even more.
In this series of posts, several different Python implementations are provided to compute the pair-wise distances under periodic boundary conditions. The performance of each method is benchmarked for comparison. I will investigate the following methods.
Article Series
Part I: Python implementation only using built-in libraries (you are here)
Part II: Numpy implementation
Part III: Numba and Cython implementation
In molecular dynamics simulations or other simulations of a similar type, one of the core computations is computing the pair-wise distances between particles. Suppose we have $N$ particles in our system; the time complexity of computing their pair-wise distances is $O(N^2)$. This is the best we can do when the whole set of pair-wise distances is needed. The good thing is that in actual simulations, in most cases, we don’t care about a distance if it is larger than some threshold. In such a case, the complexity can be greatly reduced to $O(N)$ using a neighbor list algorithm.
In this post, I won’t implement the neighbor list algorithm. I will assume that we do need all the distances to be computed.
If there is no periodic boundary condition, the pair-wise distances can be calculated directly using the SciPy function scipy.spatial.distance.pdist, which is pretty fast. However, with periodic boundary conditions, we need to roll our own implementation. For a simple demonstration without loss of generality, the simulation box is assumed to be cubic with its lower left forward corner at the origin. Such a setup simplifies the computation.
The basic algorithm for calculating the distance under periodic boundary conditions is the following,
$\Delta = \sigma - \left[\sigma/L\right] \cdot L$
where $\sigma = x_i - x_j$ and $L$ is the length of the simulation box edge. $\left[\cdot\right]$ denotes the nearest integer. $x_i$ and $x_j$ are the positions of particles $i$ and $j$ along one dimension. This computes the separation between two particles along one dimension. The full distance is the square root of the sum of $\Delta^2$ over all dimensions.
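A small worked example may help to see what the formula does. The sketch below is only an illustration in plain Python; minimum_image is a hypothetical helper name, and the box length and coordinates are made-up values.

```python
import math

def minimum_image(xi, xj, l):
    """Separation between xi and xj along one dimension in a periodic box of edge l."""
    sigma = xi - xj
    # round() returns the nearest integer, which plays the role of [.] in the formula
    return sigma - round(sigma / l) * l

l = 10.0
# Two particles near opposite walls: the direct separation is -8.0,
# but the nearest periodic image is only 2.0 away.
dx = minimum_image(1.0, 9.0, l)
# Two particles that are already close: the separation is unchanged.
dy = minimum_image(1.0, 4.0, l)
# Full distance: square root of the sum of the squared separations
dist = math.sqrt(dx ** 2 + dy ** 2)
print(dx, dy, dist)
```

Here the x separation wraps around the box (from -8.0 to 2.0) while the y separation stays at -3.0, so the full distance is the square root of 13.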
Basic setup:
All code shown uses Python version 3.7.
The number of particles is n.
The positions of all particles are stored in a list/array of the form [[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]] where [xi,yi,zi] are the coordinates of particle i.
The length of the simulation box edge is l.
We will use libraries and tools such as numpy, itertools, math, numba, cython.
To clarify first, by pure, I mean that only built-in libraries of Python are allowed. numpy, scipy or any other third-party libraries are not allowed. Let us first define a function to compute the distance between just two particles.
import math

def distance(p1, p2, l):
    """
    Computes the distance between two points, p1 and p2.
    p1/p2: python lists with form [x1, y1, z1] and [x2, y2, z2] representing the coordinates of the two points
    l: the length of edge of box (cubic/square box)
    return: a number (distance)
    """
    dim = len(p1)
    # separation along each dimension
    D = [p1[i] - p2[i] for i in range(dim)]
    # apply the periodic correction to each component, then sum the squares
    distance = math.sqrt(sum(map(lambda x: (x - round(x / l) * l) ** 2.0, D)))
    return distance
Now we can define the function to iterate over all possible pairs to give the full list of pair-wise distances,
def pdist(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.
    positions: a python list in which each element is a list of coordinates
    l: the length of edge of box (cubic/square box)
    return: a condensed 1D list
    """
    n = len(positions)
    pdistances = []
    # loop over every unique pair (i, j) with i < j
    for i in range(n-1):
        for j in range(i+1, n):
            pdistances.append(distance(positions[i], positions[j], l))
    return pdistances
The function pdist returns a list containing the distances of all pairs. Let’s benchmark it!
import numpy as np
n = 100
positions = np.random.rand(n,3).tolist() # convert to python list
%timeit pdist(positions, 1.0)
14.8 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Such speed is sufficient if n is small. In the above example, we already utilize the built-in map function and a list comprehension to speed up the computation. Can we speed up our code further using only built-in libraries? It turns out that we can. Notice that in the function pdist, there is a nested loop. What that loop does is iterate over all the combinations of particles. Luckily, the built-in module itertools provides a function combinations to do just that. Given a list object lst or other iterable object, itertools.combinations(lst, r=2) generates an iterator over all unique pair-wise combinations of elements from lst without duplicates. For instance, list(itertools.combinations([1,2,3], r=2)) will return [(1,2),(1,3),(2,3)]. Utilizing this function, we can rewrite the pdist function as follows,
import itertools
import math

def pdist_v2(positions, l):
    # itertools.combinations returns an iterator
    all_pairs = itertools.combinations(positions, r=2)
    return [math.sqrt(sum(map(lambda p1, p2: (p1 - p2 - round((p1 - p2) / l) * l) ** 2.0, *pair))) for pair in all_pairs]
Explanation:
First, we use itertools.combinations() to return an iterator all_pairs over all possible combinations of particles. r=2 means that we only want pair-wise combinations (no triplets, etc.).
We loop over all_pairs with a list comprehension of the form [do_something(pair) for pair in all_pairs].
pair is a tuple of the coordinates of two particles, ([xi,yi,zi],[xj,yj,zj]).
We use *pair to unpack the tuple object pair and then use map and a lambda function to compute the squared separation along each dimension. Inside the lambda, p1 and p2 are the coordinates of the two particles along one dimension.
Rigorously speaking, itertools.combinations takes an iterable object as an argument and returns an iterator. I recommend reading this article and the official documentation to understand the concepts of iterable/iterator/generator, which are very important for advanced Python programming.
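To see that the itertools version visits the pairs in the same order as the nested loops, the two approaches can be compared on a tiny input. This is a minimal sketch with made-up positions; pdist_loops and pdist_combinations are hypothetical names that mirror pdist and pdist_v2.

```python
import itertools
import math

def pdist_loops(positions, l):
    """Nested-loop version: visits pairs (i, j) with i < j."""
    out = []
    for i in range(len(positions) - 1):
        for j in range(i + 1, len(positions)):
            sq = [(a - b - round((a - b) / l) * l) ** 2.0
                  for a, b in zip(positions[i], positions[j])]
            out.append(math.sqrt(sum(sq)))
    return out

def pdist_combinations(positions, l):
    """itertools version: map with two iterables pairs up the coordinates."""
    return [math.sqrt(sum(map(lambda p1, p2: (p1 - p2 - round((p1 - p2) / l) * l) ** 2.0, *pair)))
            for pair in itertools.combinations(positions, r=2)]

positions = [[0.1, 0.2], [0.9, 0.4], [0.5, 0.5]]
pair_order = list(itertools.combinations(range(len(positions)), r=2))
d_loops = pdist_loops(positions, 1.0)
d_comb = pdist_combinations(positions, 1.0)
print(pair_order)
print(d_loops)
print(d_comb)
```

Both functions return one distance per pair, in the order given by pair_order, so the results can be compared element by element.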
Now let’s benchmark pdist_v2 and compare it to pdist. To make the comparison systematic, I benchmark the performance for different values of n and plot the run time as a function of n. The result is shown below,
If this is plotted on a log-log scale, one can readily see that both curves scale as $N^2$ which is expected.
The pdist_v2 implementation is about 38% faster than the pdist version. I guess the take-home message from this result is that replacing explicit for loops with tools like map and itertools can boost performance. However, one needs to make a strategic decision here, as the pdist version with the explicit loop is much more readable and easier to understand, whereas pdist_v2 requires a more or less advanced understanding of Python. Sometimes the readability and maintainability of code are more important than its performance.
In the benchmark code above, we convert the numpy array of positions to a Python list. Since a numpy array can be treated just like a Python list (but not vice versa), we can instead directly provide a numpy array as the argument to both pdist and pdist_v2. However, one can experiment a little bit to see that using a numpy array directly actually slows down the computation a lot (about 5 times slower on my laptop). The message is that mixing numpy arrays with built-in functions such as map or itertools harms performance. Instead, one should always try to use numpy-native functions if possible when working with numpy arrays.
In the next post, I will show how to use Numpy to do the same calculation but faster than the pure python implementation shown here.
Skip to the bottom of this page to see the demonstration.
Since I got this site running, I have been wanting to be able to embed some kind of interactive plot in my blog post. For instance, say I want the user to be able to perform some machine learning computations and then visualize the result. Currently, there are a few options to achieve this,
Pure Javascript solution. Both computation and visualization are performed using javascript. This process can happen either on the server or in the browser.
Combination of python and javascript
I have always been amazed by things people can do with javascript, such as deep learning using javascript inside your browser. But I can’t imagine javascript taking over python in scientific computing in the near future. I, personally, am much more comfortable with python. Besides, the language has a much more mature scientific library ecosystem. Being able to use python to perform the computation part is essential, hence leaving us only with the second option, which is that python code runs either on a server or directly inside the browser.
Using a server to perform computations means communication with the server. This can have some drawbacks,
In my experience with Binder, the second point can be a dealbreaker. The solution would be simple. Just eliminate the server step! However, for a long time javascript was the only programming language the browser could interpret directly, so no server means that we need to find some way to run python code in the browser directly. There are quite a few options, such as PyPy.js. However, it was not possible to use Numpy, Pandas and many other scientific/data analysis libraries in the browser until the Pyodide project came out recently. Pyodide allows python code to run inside the browser through WebAssembly. The best thing is that it allows one to use a few of the most popular scientific libraries including Numpy, Matplotlib, Pandas, Scipy and even Scikit-learn inside the browser! In fact, to my understanding, any python library can in principle be used through Pyodide. I am by no means an expert on how Pyodide works. I suggest reading their blog post and checking out the project github repository.
I have been experimenting with Pyodide for a few days. In this post, I would like to give a proof-of-concept demonstration. Since I deal with random walks a lot in my research, I would like to make a simple random walk animation demonstration which
In this example, I will use python code to generate the trajectory of a simple 2D random walker and use plotly.js to handle the visualization.
For demonstration purpose, the random walk in this example is simple,
Here is the python code for generating such random walk,
The following code can certainly be rewritten in javascript, but the simplicity of python’s syntax and its ecosystem of scientific libraries greatly lowers the barrier to writing code for more complex computations compared to other languages (this is just my opinion).
# load numpy library
import numpy as np

# function for generating random walk
# it takes the number of steps as only parameter
def walk(n):
    # check if the number of steps is an integer
    if int(n) != n:
        print('number of steps should be an integer')
        return None
    # the initial position is (0,0)
    xy_0 = np.array([0.0, 0.0])
    # generate displacements of each step
    dxdy = np.random.randn(int(n), 2)
    # cumulative sum displacement to get positions at each step
    xy = xy_0 + np.cumsum(dxdy, axis=0)
    # insert the initial position at the head of the array
    xy = np.vstack((xy_0, xy))
    # since javascript has no 2D array, it is better to
    # return the x-position and y-position, separately
    return xy[:,0], xy[:,1]
Now we would like to be able to call this python code in the browser on demand. The browser then does the calculation and gets two arrays which contain the $x$ coordinates and $y$ coordinates. Then we can use plotly.js to animate the trajectory.
For better maintainability, I suggest putting the python code in a github gist and fetching the content on the fly. It has the extra benefit that the python code can be modified without rebuilding the site.
Before I continue, I would like to point out that one of the biggest problems of Pyodide is that it is very large. To use it, the browser needs to download about 24 MB of code, and the Numpy library needs another 8 MB, which leads to a total download size of 32 MB. I want the user to download Pyodide only when they want to. To achieve this, I dynamically load the Pyodide script only when the initialization button is clicked (see demonstration below).
The python code is called through
gistFetchPromise.then(res => {pyodide.runPython(res)})
where gistFetchPromise is the promise object of fetching the gist content. Note that the python code needs to be parsed as a raw string. The pyodide.runPython() function is called to execute the python code. Once it is executed, all the python objects are available in the browser. The defined python function walk can be accessed through pyodide.globals.walk. Here is an example,
// Here is the javascript code
// we assign the python function [walk] to javascript [walk]
let walk = pyodide.globals.walk;
// we can call the function [walk] in javascript
let [x,y] = walk(1000);
// now x and y have values of positions of our random walker
The communication between python and javascript is two-way, meaning that we can access javascript variables/objects/functions in python as well. This notebook has some examples.
Once we get the calculated positions x
and y
, we can use plotly.js to plot the result. Fortunately, plotly.js provides a relative simple API for animation. One can also use Bokeh, D3, or any other web visualization tool out there. It is even possible to do the visualization in python directly since Pyodide also work with Matplotlib. However, at this stage, I think it is more straight forward to use a javascript library to handle the visualization since it is designed to manipulate DOM (HTML) after all.
I don’t want to make this post super long, so I won’t go into the details of the visualization part. The full JavaScript code we need to load in the page can be found here. The file includes the code for fetching the gist, the visualization using plotly.js, the Pyodide code and the event handlers for the buttons.
Here is the end product! Click the button Initialize Pyodide to download Pyodide and load NumPy. Once the initialization is finished (it can take about 20 seconds, or even longer on a slow network. Not good, I know …), the buttons Reset, Start and Pause will become clickable and green. Then enter a step number (or use the default of 100) and hit the Start button to watch the animation of a 2D random walker. Click Pause to pause the animation at any time and Start to resume. Click the Reset button to reset the random walker.
Every time you hit Reset and Start, a new random walk trajectory is generated directly inside your browser. There is no server involved whatsoever!
Since Pyodide uses WebAssembly, older browsers cannot run the demonstration. You can check whether your browser supports WebAssembly. I recommend using the latest version of desktop Chrome or Firefox for the best experience.
ls
to list a directory sometimes can cause it to freeze for a few seconds. I tried to use it for probably a month and eventually gave up. I resorted back to iTerm and appreciated its speed which I never thought about before. Although I still couldn’t appreciate its design at that time, its speed is fast enough to not hinder my work in any way.
Two months ago, I noticed that hyper.js had released version 3, which they claimed to be much faster than the previous version. I gave it another try, and the new version is indeed much faster than the one I used before. But after several days of usage, I could not ignore the noticeable lag (probably around 100 ms). One may argue such a tiny lag is not a big deal, but I find it unbearable if I intend to use it as my main terminal.
The same story goes for Atom. Again, it looks much better than Sublime Text or Vim (Neovim) and has a superb plugin ecosystem. But it is slow, an experience shared by many people. Its new version certainly feels much faster than the version I used one year ago. However, once I installed a few plugins, I started to notice slowdowns when opening a new file, in the response from the linter, etc. And due to its high memory usage, once you have several applications running, it becomes too slow to work with.
Furthermore, my research often requires me to open large MD simulation trajectory files (200 MB on the small end, usually 1 GB and above), which neither Atom nor even Sublime Text is able to handle. I have to use Vim (actually Neovim) in such cases. I imagine many people who deal with large data files daily will find that Vim/Neovim is their only truly reliable text editor.
Both Hyper.js and Atom are not native applications but are built with the Electron framework, which means they are essentially web apps/sites running on your local computer. I do see the appeal of Electron, which gives developers the ability to write cross-platform applications using JavaScript, and maybe it is the future. A good example of an Electron-based application is Visual Studio Code, which I have been using for a while and which seems to be written with performance in mind. I do hope more apps follow this path.
P.S. I just read the terminal latency benchmark by danluu. The data there suggests hyper.js is faster than iTerm (even back in 2017)! However, the benchmark is rather simple compared to day-to-day use. I do notice that the memory consumed by hyper.js is much higher than that of other terminals. This could be the reason why I find it frequently freezes (I usually have a bunch of applications running at the same time: Jupyter Notebook, a bunch of tabs in Chrome, a PDF reader, VMD, Sublime Text, etc.).
The Netlify CMS provides a preview pane which reflects any editing in real time. However, the default preview pane lacks some functionality I need, such as the ability to render math expressions and highlight syntax in code blocks. Fortunately, it provides ways to customize the preview pane. The API registerPreviewTemplate can be used to render customized preview templates. One can provide a React component, and the API uses it to render the template. This functionality allows me to incorporate markdown-it and prismjs directly into the preview pane.
In this post, I will demonstrate,
I guess a simple preview template would render a title and the body of the markdown text. Using the variable entry provided by Netlify CMS, the template can be written as follows,
// Netlify CMS exposes two React methods, "createClass" and "h"
import htm from 'https://unpkg.com/htm?module';
const html = htm.bind(h);
var Post = createClass({
  render() {
    const entry = this.props.entry;
    const title = entry.getIn(["data", "title"], null);
    const body = entry.getIn(["data", "body"], null);
    return html`
      <body>
        <main>
          <article>
            <h1>${title}</h1>
            <div>${body}</div>
          </article>
        </main>
      </body>
    `;
  }
});
In the example shown above, I use the htm npm module to write JSX-like syntax without the need for compilation at build time. It is also possible to directly use the method h provided by Netlify CMS (an alias for React’s createElement) to write the render template, which is the method given in their official examples.
this.props.entry is exposed by the CMS. It is an immutable collection containing the collection data defined in config.yml.
entry.getIn(["data", "title"]) and entry.getIn(["data", "body"]) access the collection fields title and body, respectively.
The problem with the template shown above is that the variable body is just a raw string in markdown syntax, which is not processed to be rendered as HTML. Thus, we need a way to parse body and convert it into HTML. To do this, I choose to use markdown-it.
import markdownIt from "markdown-it";
import markdownItKatex from "@iktakahiro/markdown-it-katex";
import Prism from "prismjs";
// customize markdown-it
let options = {
  html: true,
  typographer: true,
  linkify: true,
  highlight: function (str, lang) {
    // class name shared by the <pre> and <code> elements
    var languageString = "language-" + lang;
    if (Prism.languages[lang]) {
      return '<pre class="' + languageString + '"><code class="' + languageString + '">' + Prism.highlight(str, Prism.languages[lang], lang) + '</code></pre>';
    } else {
      return '<pre class="' + languageString + '"><code class="' + languageString + '">' + Prism.util.encode(str) + '</code></pre>';
    }
  }
};
var customMarkdownIt = new markdownIt(options);
// register the KaTeX plugin imported above so that math expressions are rendered
customMarkdownIt.use(markdownItKatex);
The above code demonstrates how to import markdown-it as a module and how to configure it.
The highlight part of the options allows prism.js to add classes to code blocks, which are then used for CSS styling (hence highlighting).
I recommend using import to load the prism.js module in order to use babel-plugin-prismjs to bundle all the dependencies. I had trouble getting prism.js to work in the browser using require instead of import.
Now that we have loaded markdown-it, the body can be translated to HTML using,
const bodyRendered = customMarkdownIt.render(entry.getIn(["data", "body"]));
To render bodyRendered, we have to use dangerouslySetInnerHTML, which is provided by React to insert a raw HTML string into the DOM. Finally, the code for the template is,
var Post = createClass({
  render() {
    const entry = this.props.entry;
    const title = entry.getIn(["data", "title"], null);
    const body = entry.getIn(["data", "body"], null);
    const bodyRendered = customMarkdownIt.render(body || '');
    return html`
      <body>
        <main>
          <article>
            <h1>${title}</h1>
            <div dangerouslySetInnerHTML=${{__html: bodyRendered}}></div>
          </article>
        </main>
      </body>
    `;
  }
});
CMS.registerPreviewTemplate('posts', Post);
Note that there is a new line at the end. There, we use the method registerPreviewTemplate to register our template Post for the CMS collection named posts.
Now, I have shown 1) how to write a simple template for the preview pane and 2) how to use markdown-it and prism.js in the template. However, the code shown above cannot be executed in the browser, since the browser has no access to markdown-it and prismjs, which live in your local node_modules directory. Here enters rollup.js, which looks into the node modules markdown-it and prismjs, takes all the necessary code and bundles it into one big file with no external dependencies. In this way, the code can be executed directly inside the browser. To set up rollup.js, we need a config file,
// rollup.config.js
const builtins = require('rollup-plugin-node-builtins');
const commonjs = require('rollup-plugin-commonjs');
const nodeResolve = require('rollup-plugin-node-resolve');
const json = require('rollup-plugin-json');
const babel = require('rollup-plugin-babel');
export default {
  input: 'src/admin/preview.js',
  output: {
    file: 'dist/admin/preview.js',
    format: 'esm',
  },
  plugins: [
    nodeResolve({browser:true}),
    commonjs({ignore: ["conditional-runtime-dependency"]}),
    builtins(),
    json(),
    babel({
      "plugins": [
        ["prismjs", {
          "languages": ["javascript", "css", "markup", "python", "clike"]
        }]
      ]
    })
  ]
};
src/admin/preview.js is the path of the template code.
esm tells rollup.js to bundle the code as an ES module.
To perform the bundling, one can either run rollup --config in the terminal if rollup.js is installed globally, or add it as an npm script. The config above tells rollup.js to generate the file dist/admin/preview.js.
To use the template, the final step is to include it as a <script type=module> tag. Add the following to your admin/index.html,
<body>
<script type=module src="/admin/preview.js"></script>
</body>
See this screenshot
.psf file from LAMMPS data file
In the VMD console, use the command cd to navigate to the directory where the LAMMPS data file is located. Then, run the following commands
topo readlammpsdata your_data_file bond
animate write psf your_psf_file
If the commands run successfully, you should find your .psf file generated in the directory. To use the .psf file, first load it and then load the trajectory file. You should then be able to use functionalities such as drawing methods, coloring methods, etc.
Sometimes, we want to follow a certain molecule through the trajectory. However, the targeted molecule may diffuse in the simulation box, making the visualization difficult. We want to make the camera focus on the molecule of interest. Here is a method to do this.
In Extension-Analysis-RMSD Visualizer Tool, use the molecule you want to focus on as the Atom Selection. Then run ALIGN. You can now watch the trajectory with the selected molecule at the center of the camera.
To render publication-quality images, follow the good practices below
Diffuse material, or the AO-optimized AOShiny, AOChalky, or AOEdgy materials provided in VMD.
AO Ambient factor to 1.0, and the AO Direct factor to 0.8 as an initial starting point.
File - Render - Tachyon or TachyonInternal, or use the render command to do the same.
express.js and axios. First, I create a simple express server, and secondly, I use axios to make an HTTP call to the server created.
The following is the code for our little express server.
// require the express
const express = require('express')
// create a express instance
const app = express()
// specify the port we want to listen to
const port = 3000
// define a data for illustration purpose
const mydata = {a:1,b:2,c:3}
app.get('/', (req, res) => res.json(mydata))
app.listen(port, () => console.log(`Example app listening on port ${port}!`))
Save the above code to the file myexpress-server.js. Now if you run node myexpress-server.js in your terminal and open http://localhost:3000 in your browser, you should see the values of mydata printed on the screen! We have now successfully set up a small express server!
axios to make http call
Now we want to acquire our mydata from some external place. We can use axios to make an API call to the express server we built and get our mydata object. Let’s write our axios code,
// require axios
const axios = require('axios')
// define our axios.get function
const getData = async () => {
  try {
    const mydata = await axios.get('http://localhost:3000')
    console.log(mydata.data)
  } catch (error) {
    console.error(error)
  }
}
// call our function
getData()
Save the above code to a file named myaxios.js. Now a) start our express server by running node myexpress-server.js in the terminal, and b) run our axios code in another terminal window using node myaxios.js. Voilà, you can see the data of our mydata object printed in the terminal!
The Fredholm integral equation of the first kind is written as,
$f(x)=\int_{a}^{b}K(x,s)p(s)\mathrm{d}s.$
The problem is to find $p(s)$, given that $f(x)$ and $K(x,s)$ are known. This equation occurs quite often in many different areas.
Here we describe a discretization-based method to solve the Fredholm integral equation. The integral equation is approximately replaced by a Riemann summation over grids,
$f(x_i)=\sum_j \Delta_s K(x_i, s_j) p(s_j)$
where $\Delta_s$ is the grid size along the dimension $s$ and $x_i$, $s_j$ are the grid points with $i$ and $j$ indicating their indices. When grid size $\Delta_s\to0$, the summation converges to the true integral. It is more convenient to write it in the matrix form,
$\boldsymbol{f} = \boldsymbol{\Delta}_s \boldsymbol{K} \boldsymbol{p}$
where
$\boldsymbol{f}=(f(x_1), f(x_2),\cdots,f(x_n))^{\mathrm{T}},$
$\boldsymbol{K}= \begin{pmatrix} K(x_1,s_1) & K(x_1,s_2) & \cdots & K(x_1,s_m) \\ K(x_2,s_1) & K(x_2,s_2) & \cdots & K(x_2,s_m) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_n,s_1) & K(x_n,s_2) & \cdots & K(x_n,s_m) \end{pmatrix}$
$\boldsymbol{p} = (p(s_1),p(s_2),\cdots,p(s_m))^{\mathrm{T}}$
and $\boldsymbol{\Delta}_s = \Delta_s \boldsymbol{I}$ with $\boldsymbol{I}$ being the identity matrix of dimension $n \times n$.
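The discretization can be checked on a toy problem with a known answer. The sketch below is only an illustration under stated assumptions: it uses a midpoint grid, a made-up kernel $K(x,s)=xs$ and $p(s)=2s$ on $[0,1]$ (for which $f(x)=2x/3$ exactly), and plain Python lists instead of Numpy so that it runs anywhere. The function name riemann_forward is hypothetical.

```python
def riemann_forward(kernel, p, a, b, m, xs):
    """Approximate f(x) = int_a^b K(x, s) p(s) ds on a midpoint grid with m cells."""
    delta_s = (b - a) / m
    s_grid = [a + (j + 0.5) * delta_s for j in range(m)]
    # K_matrix[i][j] = K(x_i, s_j) and p_vec[j] = p(s_j)
    K_matrix = [[kernel(x, s) for s in s_grid] for x in xs]
    p_vec = [p(s) for s in s_grid]
    # f_i = Delta_s * sum_j K_ij p_j, i.e. the matrix form f = Delta_s K p
    return [delta_s * sum(k_ij * p_j for k_ij, p_j in zip(row, p_vec))
            for row in K_matrix]

# toy problem: K(x, s) = x * s and p(s) = 2 s on [0, 1], so that f(x) = 2 x / 3
xs = [0.0, 0.5, 1.0]
f_num = riemann_forward(lambda x, s: x * s, lambda s: 2.0 * s, 0.0, 1.0, 1000, xs)
f_exact = [2.0 * x / 3.0 for x in xs]
```

With 1000 cells the two lists agree to better than $10^{-6}$, which illustrates the convergence of the summation to the integral as $\Delta_s\to0$.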
Now solving the Fredholm integral equation is equivalent to solving a system of linear equations. The standard approach is ordinary least squares linear regression, which finds the vector $\boldsymbol{p}$ minimizing the norm $\vert\vert \boldsymbol{\Delta}_s \boldsymbol{K} \boldsymbol{p}-\boldsymbol{f}\vert\vert_2^2$. In principle, the Fredholm integral equation may have non-unique solutions, thus the corresponding linear system is also ill-posed. The most commonly used method for ill-posed problems is Tikhonov regularization, which minimizes
$\vert\vert\boldsymbol{\Delta}_s \boldsymbol{K} \boldsymbol{p}-\boldsymbol{f}\vert\vert_2^2+\alpha^2\vert\vert\boldsymbol{p}\vert\vert_2^2$
Note that this is actually a special case of Tikhonov regularization (also called ridge regularization), with $\alpha$ being a parameter.
In many cases, both $f(x)$ and $p(s)$ are probability density functions (PDFs), and $K(x,s)$ is a conditional PDF, equivalent to $K(x\vert s)$. Thus, there are two constraints on the solution $p(s)$, that is $p(s)\geq 0$ and $\int p(s)\mathrm{d}s = 1$. These two constraints translate to $p(s_i)\geq 0$ for any $s_i$ and $\Delta_s\sum_i p(s_i)=1$. Hence, we need to solve the Tikhonov regularization problem subject to these two constraints.
In the following, I will show how to solve the Tikhonov regularization problem with both equality and inequality constraints. First, I will show that the Tikhonov regularization problem with the non-negativity constraint can be easily translated to a regular non-negative least squares (NNLS) problem, which can be solved using an active set algorithm.
Let us construct the matrix,
$\boldsymbol{A}= \begin{pmatrix} \Delta_s \boldsymbol{K} \\ \alpha \boldsymbol{I} \end{pmatrix}$
and the vector,
$\boldsymbol{b}= \begin{pmatrix} \boldsymbol{f}\\ \boldsymbol{0} \end{pmatrix}$
where $\boldsymbol{I}$ is the $m\times m$ identity matrix and $\boldsymbol{0}$ is the zero vector of size $m$. It is easy to show that the Tikhonov regularization problem $\mathrm{min}(\vert\vert\boldsymbol{\Delta}_{s} \boldsymbol{K} \boldsymbol{p} - \boldsymbol{f}\vert\vert_{2}^{2}+\alpha^2\vert\vert\boldsymbol{p}\vert\vert_{2}^{2})$ subject to $\boldsymbol{p}\geq 0$ is equivalent to the regular NNLS problem,
$\mathrm{min}(\vert\vert\boldsymbol{A}\boldsymbol{p}-\boldsymbol{b}\vert\vert_2^2),\mathrm{\ subject\ to\ }\boldsymbol{p}\geq 0$
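The equivalence follows in one line from the definitions of $\boldsymbol{A}$ and $\boldsymbol{b}$, because the squared norm of a stacked vector is the sum of the squared norms of its blocks:
$\vert\vert\boldsymbol{A}\boldsymbol{p}-\boldsymbol{b}\vert\vert_2^2 = \left\vert\left\vert \begin{pmatrix} \Delta_s \boldsymbol{K}\boldsymbol{p}-\boldsymbol{f} \\ \alpha \boldsymbol{p} \end{pmatrix} \right\vert\right\vert_2^2 = \vert\vert \Delta_s \boldsymbol{K}\boldsymbol{p}-\boldsymbol{f}\vert\vert_2^2+\alpha^2\vert\vert\boldsymbol{p}\vert\vert_2^2$
The two objectives are therefore identical for every $\boldsymbol{p}$, and the constraint $\boldsymbol{p}\geq 0$ carries over unchanged.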
Now we add the equality constraint, $\Delta_s\sum_i p(s_i)=1$ or $\boldsymbol{1}\boldsymbol{p}=1/\Delta_s$ written in matrix form. My implementation of solving such problem follows the algorithm described in Haskell and Hanson ^{[1]}. According to their method, the problem becomes another NNLS problem,
$\mathrm{min}(\vert\vert\boldsymbol{1}\boldsymbol{p}-1/\Delta_s\vert\vert_2^2+\epsilon^2\vert\vert\boldsymbol{A}\boldsymbol{p}-\boldsymbol{b}\vert\vert_2^2),\mathrm{\ subject\ to\ }\boldsymbol{p}\geq 0$
The solution to the above equation converges to the true solution when $\epsilon\to0^+$. Now I have described the algorithm to solve the Fredholm equation of the first kind when $p(s)$ is a probability density function. I call the algorithm described above as non-negative Tikhonov regularization with equality constraint (NNETR).
Here I show the core code of the algorithm described above.
# core algorithm of non-negative equality Tikhonov regularization (NNETR)
import numpy as np
import scipy.optimize

def NNETR(K, f, Delta, epsilon, alpha):
    # the first step: stack K and alpha * I (Tikhonov problem as NNLS)
    A_nn = np.vstack((K, alpha * np.identity(K.shape[1])))
    b_nn = np.hstack((f, np.zeros(K.shape[1])))
    # the second step: weight these rows by epsilon and append the
    # equality constraint (the entries of the solution sum to 1)
    A_nne = np.vstack((epsilon * A_nn, np.full(A_nn.shape[1], 1.0)))
    b_nne = np.hstack((epsilon * b_nn, 1.0))
    # Use NNLS solver provided by scipy
    sol, residue = scipy.optimize.nnls(A_nne, b_nne)
    # the solver works with Delta * p, so the solution is divided by Delta (grid size)
    sol = sol/Delta
    return sol, residue
Haskell, Karen H., and Richard J. Hanson. “An algorithm for linear least squares problems with equality and nonnegativity constraints.” Mathematical Programming 21.1 (1981): 98-118. ↩︎
However, the above examples only focus on taking sequential input data and producing sequential predictions. My problem is learning a mapping between two matrices, which is multidimensional. After researching a little bit, I found that a Multidimensional Recurrent Neural Network can be used here. If you google “Multidimensional Recurrent Neural Network”, the first entry would be this paper by Alex Graves, et al. However, I want to point out that almost exactly the same idea was proposed back in 2003 in the context of protein contact map prediction in this paper.
I have never had any experience with neural networks before. Instead of learning from scratch, I decided that it is probably more efficient to find an available GitHub repo and study the code from there. Fortunately, I did find very good example code here.
The question is: can MDLSTM learn the mapping between two matrices? From basic linear algebra, we know there are two types of mapping: linear and non-linear. So it is natural to study the problem in two cases. Any linear mapping can be represented by a matrix. For simplicity, I use a random matrix $M$ to represent the linear mapping we want to learn, and apply it to a Gaussian field matrix $I$ to produce a new transformed matrix $O$, i.e. $O = M\cdot I$. We feed $I$ and $O$ into our MDLSTM network as our inputs and targets. Since our goal is to predict $O$ given the input $I$, and the values of the elements of $O$ are continuous rather than categorical, we use a linear activation function and mean square error as our loss function.
import numpy as np

def fft_ind_gen(n):
    # integer wavenumbers in FFT order: non-negative first, then negative
    a = list(range(0, int(n / 2 + 1)))
    b = list(range(1, int(n / 2)))
    b.reverse()
    b = [-i for i in b]
    return a + b

def gaussian_random_field(pk=lambda k: k ** -3.0, size1=100, size2=100, anisotropy=True):
    def pk2(kx_, ky_):
        if kx_ == 0 and ky_ == 0:
            return 0.0
        if anisotropy:
            if kx_ != 0 and ky_ != 0:
                return 0.0
        return np.sqrt(pk(np.sqrt(kx_ ** 2 + ky_ ** 2)))

    noise = np.fft.fft2(np.random.normal(size=(size1, size2)))
    amplitude = np.zeros((size1, size2))
    for i, kx in enumerate(fft_ind_gen(size1)):
        for j, ky in enumerate(fft_ind_gen(size2)):
            amplitude[i, j] = pk2(kx, ky)
    return np.fft.ifft2(noise * amplitude)

def next_batch_linear_map(bs, h, w, mapping, anisotropy=True):
    x = []
    for i in range(bs):
        o = gaussian_random_field(pk=lambda k: k ** -4.0, size1=h, size2=w, anisotropy=anisotropy).real
        x.append(o)
    x = np.array(x)
    y = []
    for idx, item in enumerate(x):
        y.append(np.dot(mapping, item))
    y = np.array(y)
    # data normalization
    for idx, item in enumerate(x):
        x[idx] = (item - item.mean())/item.std()
    for idx, item in enumerate(y):
        y[idx] = (item - item.mean())/item.std()
    return x, y
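The helper fft_ind_gen returns the integer wavenumbers in the order used by the FFT output, non-negative frequencies first and negative ones after. Here is a small check, with the function repeated so the snippet runs on its own. The sizes 8 and 5 are arbitrary choices for illustration.

```python
def fft_ind_gen(n):
    # same logic as above, repeated so this snippet is self-contained
    a = list(range(0, int(n / 2 + 1)))
    b = list(range(1, int(n / 2)))
    b.reverse()
    b = [-i for i in b]
    return a + b

even_case = fft_ind_gen(8)  # [0, 1, 2, 3, 4, -3, -2, -1]
odd_case = fft_ind_gen(5)   # [0, 1, 2, -1], only 4 entries for n = 5
```

For even n the list has n entries, one per row or column of the field. For odd n it is one entry short, so the last row or column of the amplitude array would stay at zero. The sizes used in this post (10 and the default 100) are even, so this does not matter here.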
Note that we normalize the matrix elements so that their mean is zero and their variance is 1. We can visualize the mapping by plotting the matrices
h, w = 10, 10
batch_size = 10
linear_map = np.random.rand(h, w)
batch_x, batch_y = next_batch_linear_map(batch_size, h, w, linear_map)
fig, ax = plt.subplots(1,3)
ax[0].imshow(batch_x[0], cmap='jet', interpolation='none')
ax[1].imshow(linear_map, cmap='jet', interpolation='none')
ax[2].imshow(batch_y[0], cmap='jet', interpolation='none')
ax[0].set_title(r'$\mathrm{Input\ Matrix\ }I$')
ax[1].set_title(r'$\mathrm{Linear\ Mapping\ Matrix\ }M$')
ax[2].set_title(r'$\mathrm{Output\ Matrix\ }O$')
ax[0].axis('off')
ax[1].axis('off')
ax[2].axis('off')
plt.tight_layout()
plt.show()
As shown, the matrix $M$ maps $I$ to $O$. Such transformation is called linear mapping. I will show that MDLSTM can indeed learn this mapping up to reasonable accuracy. I use the codes here. The following code is the training part
anisotropy = False
learning_rate = 0.005
batch_size = 200
h = 10
w = 10
channels = 1
x = tf.placeholder(tf.float32, [batch_size, h, w, channels])
y = tf.placeholder(tf.float32, [batch_size, h, w, channels])
linear_map = np.random.rand(h,w)
hidden_size = 100
rnn_out, _ = multi_dimensional_rnn_while_loop(rnn_size=hidden_size, input_data=x, sh=[1, 1])
# use linear activation function
model_out = slim.fully_connected(inputs=rnn_out,
num_outputs=1,
activation_fn=None)
# use a little different loss function from the original code
loss = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(y, model_out))))
grad_update = tf.train.AdamOptimizer(learning_rate).minimize(loss)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=False))
sess.run(tf.global_variables_initializer())
# Add tensorboard (really useful)
train_writer = tf.summary.FileWriter('Tensorboard_out' + '/MDLSTM',sess.graph)
steps = 1000
mypredict_result = []
loss_series = []
for i in range(steps):
    batch = next_batch_linear_map(batch_size, h, w, linear_map, anisotropy)
    st = time()
    batch_x = np.expand_dims(batch[0], axis=3)
    batch_y = np.expand_dims(batch[1], axis=3)
    mypredict, loss_val, _ = sess.run([model_out, loss, grad_update], feed_dict={x: batch_x, y: batch_y})
    mypredict_result.append([batch_x, batch_y, mypredict])
    print('steps = {0} | loss = {1:.3f} | time {2:.3f}'.format(str(i).zfill(3),
                                                               loss_val,
                                                               time() - st))
    loss_series.append([i+1, loss_val])
The loss as a function of steps is shown in the figure below. It seems the loss saturates around 70-75. Now let’s see how well our neural network learns. The following figures show five predictions on newly generated random input matrices. The results are pretty good for the purpose of illustration. I am sure there is room for improvement.
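To put the saturated loss value in perspective, note that the loss in the training code is the square root of the summed squared error over the whole batch, not a per-element mean. Assuming the batch shape used above (200 x 10 x 10 x 1), it can be converted into a root-mean-square error per matrix element. The helper name rmse_from_loss is made up for this sketch.

```python
import math

def rmse_from_loss(loss, batch_size=200, h=10, w=10, channels=1):
    # loss = sqrt(sum of squared errors), so RMSE = loss / sqrt(number of elements)
    n_elements = batch_size * h * w * channels
    return loss / math.sqrt(n_elements)

low, high = rmse_from_loss(70.0), rmse_from_loss(75.0)  # roughly 0.49 and 0.53
```

Since the targets are normalized to unit variance, a loss of 70-75 corresponds to an error of about half a standard deviation per element.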
I choose the square of the matrix, $I^{2}$, as the test for nonlinear mapping.
def next_batch_nonlinear_map(bs, h, w, anisotropy=True):
    x = []
    for i in range(bs):
        o = gaussian_random_field(pk=lambda k: k ** -4.0, size1=h, size2=w, anisotropy=anisotropy).real
        x.append(o)
    x = np.array(x)
    y = []
    for idx, item in enumerate(x):
        y.append(np.dot(item, item))  # only changes here
    y = np.array(y)
    # data normalization
    for idx, item in enumerate(x):
        x[idx] = (item - item.mean())/item.std()
    for idx, item in enumerate(y):
        y[idx] = (item - item.mean())/item.std()
    return x, y
The following images show the loss function and the results.
As you can see, the results are not great, but they are very promising.
To illustrate this method, which is used to generate static conformations of a Gaussian chain, we first need to consider the dynamics of such a system. It is well known that the dynamics of a Gaussian/ideal chain can be modeled by the Brownian motion of beads connected along a chain, which is guaranteed to give the correct equilibrium ensemble. The model is called the “Rouse model” and is very well studied. I strongly suggest the book The Theory of Polymer Dynamics by Doi and Edwards to understand the method used here. I also found useful material here. I will not go through the details of the derivation of the solution of the Rouse model. In short, the motion of a Gaussian chain is just a linear combination of an infinite number of independent normal modes. Mathematically, that is,
$\mathbf{R}_{n}=\mathbf{X}_{0}+2\sum_{p=1}^{\infty}\mathbf{X}_{p}\cos\big(\frac{p\pi n}{N}\big)$
where $\mathbf{R}_{n}$ is the position of the $n^{th}$ bead and $\mathbf{X}_{p}$ are the normal modes. $\mathbf{X}_{p}$ is the solution of the Langevin equation $\xi_{p}\frac{\partial}{\partial t}\mathbf{X}_{p}=-k_{p}\mathbf{X}_{p}+\mathbf{f}_{p}$. This is a special case of the Ornstein-Uhlenbeck process, and the equilibrium solution of this equation is just a normal distribution with mean $0$ and variance $k_{\mathrm{B}}T/k_{p}$.
$X_{p,\alpha}\sim \mathcal{N}(0,k_{\mathrm{B}}T/k_{p})\quad, \quad\alpha=x,y,z$
where $k_{p}=\frac{6\pi^{2}k_{\mathrm{B}}T}{N b^{2}}p^{2}$, $N$ is the number of beads (or number of steps) and $b$ is the Kuhn length.
This suggests that we can first generate the normal modes. Since the normal modes are independent of each other and are just Gaussian random numbers, this is easy and straightforward to do. We then transform them to the actual positions of the beads using the first equation, which gives us the conformations. This may seem nontrivial at first glance, but it should give us the correct result. To test this, let’s implement the algorithm in Python.
import numpy as np

def generate_gaussian_chain(N, b, pmax):
    # N = number of beads
    # b = Kuhn length
    # pmax = maximum p modes used in the summation
    modes = range(1, pmax + 1)
    # compute normal modes xpx, xpy and xpz
    # (list comprehensions instead of map/xrange, so this also runs on Python 3)
    xpx = np.asarray([np.random.normal(scale=np.sqrt(N * b**2.0 / (6 * np.pi**2.0 * p**2.0))) for p in modes])
    xpy = np.asarray([np.random.normal(scale=np.sqrt(N * b**2.0 / (6 * np.pi**2.0 * p**2.0))) for p in modes])
    xpz = np.asarray([np.random.normal(scale=np.sqrt(N * b**2.0 / (6 * np.pi**2.0 * p**2.0))) for p in modes])
    # compute cosine terms
    cosarray = np.asarray([np.cos(p * np.pi * np.arange(1, N + 1) / N) for p in modes])
    # transform normal modes to actual position of beads
    x = 2.0 * np.sum(np.resize(xpx, (len(xpx), 1)) * cosarray, axis=0)
    y = 2.0 * np.sum(np.resize(xpy, (len(xpy), 1)) * cosarray, axis=0)
    z = 2.0 * np.sum(np.resize(xpz, (len(xpz), 1)) * cosarray, axis=0)
    return np.dstack((x, y, z))[0]
Note that there is a parameter called pmax. Although the actual position is a linear combination of an infinite number of normal modes, numerically we must truncate this summation. pmax sets the number of normal modes computed. Also, in the above code, we use NumPy broadcasting to make the code concise and efficient. Let’s use this code to generate three conformations with different values of pmax and plot them
# N = 300
# b = 1.0
conf1 = generate_gaussian_chain(300, 1.0, 10) # pmax = 10
conf2 = generate_gaussian_chain(300, 1.0, 100) # pmax = 100
conf3 = generate_gaussian_chain(300, 1.0, 1000) # pmax = 1000
fig = plt.figure(figsize=(15,5))
# matplotlib codes here
plt.show()
The three plots show the conformations with $p_{\mathrm{max}}=10$, $p_{\mathrm{max}}=100$ and $p_{\mathrm{max}}=1000$, with $N=300$ and $b=1$. As clearly shown here, a larger number of modes gives a more accurate result. The normal modes with small p correspond to the low-frequency motion of the chain, so with a small pmax we are only considering the low-frequency modes. The conformation generated can be considered a somewhat coarse-grained representation of a Gaussian chain. The larger pmax is, the more high-frequency normal modes are included, leading to a more detailed structure. The coarse-graining process can be vividly observed in the above figure from right to left (large pmax to small pmax). To test that our conformations are indeed Gaussian chains, we compute the mean square end-to-end distance to check whether we get the correct Flory scaling ($\langle R_{ee}^{2}\rangle = b^{2}N$).
As shown in the above plot, we indeed get the correct scaling result, $\langle R_{ee}^{2}\rangle = b^{2}N$. When using this method, care should be taken in setting the parameter pmax, which is the number of normal modes computed. This number should be large enough to ensure a correct result. The longer the chain is, the larger pmax should be.
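The effect of truncating the sum on $\langle R_{ee}^{2}\rangle$ can be estimated directly from the mode expansion. This is only a sketch under one assumption: the chain ends are taken at $n=0$ and $n=N$ in the first equation (the code above indexes beads from 1 to N, which differs by terms of order $1/N$). Then $\cos(p\pi)-1$ vanishes for even $p$ and equals $-2$ for odd $p$, so only odd modes contribute and $\langle R_{ee}^{2}\rangle = \frac{8Nb^{2}}{\pi^{2}}\sum_{\mathrm{odd}\ p\leq p_{\mathrm{max}}}p^{-2}$. The function name ree2_ratio is made up for this illustration.

```python
import math

def ree2_ratio(pmax):
    """<R_ee^2> / (N b^2) when only modes p = 1..pmax are kept (ends at n = 0 and n = N)."""
    odd_sum = sum(1.0 / p**2 for p in range(1, pmax + 1, 2))
    return 8.0 / math.pi**2 * odd_sum

ratios = {pmax: ree2_ratio(pmax) for pmax in (10, 100, 1000)}
```

Under this assumption the ratio is about 0.96 for pmax = 10 and approaches 1 from below as pmax grows, since the sum over odd $p$ converges to $\pi^{2}/8$.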
This article was originally posted on my old Wordpress blog here.
LAMMPS is a very powerful molecular dynamics simulation package I use in my daily research. In our research group, we mainly run Langevin Dynamics (LD) or Brownian Dynamics (BD) simulations. However, for some reason, LAMMPS doesn’t provide a way to do Brownian Dynamics (BD) simulation. Both LD and BD can be used to sample the correct canonical ensemble, which is sometimes also called the NVT ensemble.
BD is the large-friction limit of LD, where the inertia is neglected. Thus BD is also called overdamped Langevin Dynamics. It is important to know the difference between LD and BD, since many people seem to use these two terms interchangeably, which is simply not correct.
The equation of motion of LD is,
$m \ddot{\mathbf{x}} = -\nabla U(\mathbf{x}) - m\gamma \dot{\mathbf{x}}+\mathbf{R}(t)$
where $m$ is the mass of the particle, $\mathbf{x}$ is its position and $\gamma$ is the damping constant. $\mathbf{R}(t)$ is the random force, which is subject to the fluctuation-dissipation theorem, $\langle \mathbf{R}(0)\cdot\mathbf{R}(t) \rangle = 2m\gamma\delta(t)/\beta$. Here $\gamma=\xi/m$, where $\xi$ is the drag coefficient. $\mathbf{R}(t)$ is nowhere differentiable, and its integral is called a Wiener process. Denote the Wiener process associated with $\mathbf{R}(t)$ as $\omega(t)$. It has the property $\omega(t+\Delta t)-\omega(t)=\sqrt{\Delta t}\theta$, where $\theta$ is a Gaussian random variable with zero mean and variance $2m\gamma/\beta$.
$\langle \theta \rangle = 0\quad\quad\langle \theta^{2}\rangle = 2m\gamma/\beta$
The fix langevin command provided in LAMMPS is the numerical simulation of the above equation. LAMMPS uses a very simple integration scheme. It is the velocity-Verlet algorithm, where the force on a particle includes the friction drag term and the noise term. Since it is just a first-order algorithm in terms of the random noise, it cannot be used in the large-friction case. Thus the langevin fix in LAMMPS is mainly used as a way to control the temperature (thermostat) in the simulation to sample the conformation space. However, in many cases we want to study the dynamics of the system of interest realistically, where friction is much larger than inertia. We then need to do BD simulation.
For an overdamped system, $\gamma=\xi/m$ is very large. Taking the limit $\gamma=\xi/m\to\infty$, the bath becomes infinitely dissipative (overdamped), and we can neglect the left side of the LD equation. Thus for BD, the equation of motion becomes
$\dot{\mathbf{x}}=-\frac{1}{\gamma m}\nabla U(\mathbf{x})+\frac{1}{\gamma m}\mathbf{R}(t)$
The first-order integration scheme for the above equation is called the Euler-Maruyama algorithm, given as
$\mathbf{x}(t+\Delta t)-\mathbf{x}(t)=-\frac{\Delta t}{m\gamma}\nabla U(\mathbf{x})+\sqrt{\frac{2\Delta t}{m\gamma\beta}}\omega(t)$
where $\omega(t)$ is a normal random variable with zero mean and unit variance. Since velocities are no longer well defined for BD, only the positions are updated. The implementation of this scheme in LAMMPS is straightforward. Based on the source files fix_langevin.cpp and fix_langevin.h in LAMMPS, I wrote a custom fix for BD myself. The core part of the code is the following. The whole code is here.
void FixBD::initial_integrate(int vflag)
{
  double dtfm;
  double randf;

  // update x of atoms in group
  double **x = atom->x;
  double **f = atom->f;
  double *rmass = atom->rmass;
  double *mass = atom->mass;
  int *type = atom->type;
  int *mask = atom->mask;
  int nlocal = atom->nlocal;
  if (igroup == atom->firstgroup) nlocal = atom->nfirst;

  if (rmass) {
    for (int i = 0; i < nlocal; i++)
      if (mask[i] & groupbit) {
        dtfm = dtf / rmass[i];
        randf = sqrt(rmass[i]) * gfactor;
        x[i][0] += dtv * dtfm * (f[i][0]+randf*random->gaussian());
        x[i][1] += dtv * dtfm * (f[i][1]+randf*random->gaussian());
        x[i][2] += dtv * dtfm * (f[i][2]+randf*random->gaussian());
      }
  } else {
    for (int i = 0; i < nlocal; i++)
      if (mask[i] & groupbit) {
        dtfm = dtf / mass[type[i]];
        randf = sqrt(mass[type[i]]) * gfactor;
        x[i][0] += dtv * dtfm * (f[i][0]+randf*random->gaussian());
        x[i][1] += dtv * dtfm * (f[i][1]+randf*random->gaussian());
        x[i][2] += dtv * dtfm * (f[i][2]+randf*random->gaussian());
      }
  }
}
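As the code above shows, the implementation of the integration scheme is simple. dtv is the time step $\Delta t$, dtfm is $1/(\gamma m)$, and randf is $\sqrt{2m\gamma/(\Delta t\beta)}$. Multiplying these out, the noise term dtv * dtfm * randf equals $\sqrt{2\Delta t/(m\gamma\beta)}$, which matches the Euler-Maruyama formula.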
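To make the update rule concrete outside of LAMMPS, here is a minimal NumPy sketch of one Euler-Maruyama step. It is only an illustration of the formula above, not the code of the custom fix: the function names bd_step and harmonic_force, the harmonic test potential, and all parameter values are assumptions made for this example.

```python
import numpy as np

def bd_step(x, force, dt, m, gamma, beta, rng):
    """One Euler-Maruyama step of overdamped (Brownian) dynamics.

    Mirrors the variable names used in the LAMMPS snippet:
    dtfm = 1/(gamma*m), randf = sqrt(2*m*gamma/(dt*beta)).
    """
    dtfm = 1.0 / (gamma * m)
    randf = np.sqrt(2.0 * m * gamma / (dt * beta))
    noise = rng.standard_normal(x.shape)  # unit-variance Gaussian numbers
    return x + dt * dtfm * (force(x) + randf * noise)

def harmonic_force(x, k=1.0):
    """Force of the (assumed) test potential U(x) = k*x^2/2."""
    return -k * x

# Example: 1000 independent particles in a harmonic well.
rng = np.random.default_rng(0)
x = np.zeros((1000, 3))
for _ in range(2000):
    x = bd_step(x, harmonic_force, dt=0.01, m=1.0, gamma=10.0, beta=1.0, rng=rng)
```

For this test potential the variance of each coordinate should relax towards $1/(\beta k)$, which gives a quick sanity check on the noise amplitude.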
The Euler-Maruyama scheme is a simple first-order algorithm. Many studies have been done on higher-order integration schemes that allow a larger time step to be used. I also implemented a method shown in this paper. The integration scheme, called BAOAB, is very simple, given as
$\mathbf{x}(t+\Delta t)-\mathbf{x}(t)=-\frac{\Delta t}{m\gamma}\nabla U(\mathbf{x})+\sqrt{\frac{\Delta t}{2m\gamma\beta}}(\omega(t+\Delta t)+\omega(t))$
The source code of this method can be downloaded here. In addition, feel free to fork my Github repository for fix bd and fix bd/baoab. I have done some tests and have been using this code in my research for a while without finding any problems. But please test the code yourself if you intend to use it, and any feedback is welcome if you do find problems.
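The only difference from Euler-Maruyama in the BAOAB formula is that each step uses the average of two consecutive random numbers, and the second of them is reused as the first one in the next step. A minimal NumPy sketch of that bookkeeping follows. It is my reading of the formula above, not the fix bd/baoab source; the function name run_bd_baoab and the harmonic test force are assumptions.

```python
import numpy as np

def run_bd_baoab(x0, force, dt, m, gamma, beta, n_steps, rng):
    """Integrate overdamped dynamics with the two-noise update
    x(t+dt) = x(t) + dt/(m*gamma) * F(x) + sqrt(dt/(2*m*gamma*beta)) * (w_new + w_old).
    """
    x = np.array(x0, dtype=float)
    amp = np.sqrt(dt / (2.0 * m * gamma * beta))
    w_old = rng.standard_normal(x.shape)
    for _ in range(n_steps):
        w_new = rng.standard_normal(x.shape)
        x = x + dt / (m * gamma) * force(x) + amp * (w_new + w_old)
        w_old = w_new  # reuse this step's fresh noise in the next step
    return x

def harmonic_force(x, k=1.0):
    """Force of the (assumed) test potential U(x) = k*x^2/2."""
    return -k * x

rng = np.random.default_rng(0)
x_final = run_bd_baoab(np.zeros((1000, 3)), harmonic_force,
                       dt=0.01, m=1.0, gamma=10.0, beta=1.0,
                       n_steps=2000, rng=rng)
```

Note that the scheme needs one stored random vector per particle between steps, which is the main extra state compared with Euler-Maruyama.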
To decide whether to use LD or BD in a simulation, one needs to compare the relevant timescales. Consider a free particle governed by the Langevin equation. Solving for the velocity autocorrelation function leads to $\langle v(0)v(t)\rangle=(kT/m)e^{-\gamma t}$. This shows that the relaxation time for momentum is $\tau_m = 1/\gamma=m/\xi$. There is another timescale, called the Brownian timescale, given by $\tau_{BD}=\sigma^2\xi/kT$, where $\sigma$ is the size of the particle. $\tau_{BD}$ is the time it takes the particle to diffuse over a distance comparable to its own size. If $\tau_{BD}\gg \tau_m$ and you are not interested in the dynamics on the timescale $\tau_m$, then BD can be used, since the momentum can be safely integrated out. However, if these two timescales are comparable or $\tau_{BD}<\tau_m$, then only LD can be used, because the momentum cannot be neglected in this case. To make the problem more complicated, most simulations involve more than just these two timescales, such as the relaxation time of bond vibrations. Fortunately, in practice, comparing these two timescales is good enough in many cases.
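The comparison above is easy to script. The sketch below computes both timescales from $m$, $\xi$, $\sigma$ and $kT$ and applies a simple rule of thumb. The function names and the separation factor of 100 used to stand in for "$\gg$" are assumptions chosen for illustration, not values from any reference.

```python
def relaxation_timescales(m, xi, sigma, kT):
    """Return (tau_m, tau_BD) for a particle of mass m, drag coefficient xi,
    size sigma, at thermal energy kT (all in consistent units)."""
    tau_m = m / xi                 # momentum relaxation time, 1/gamma
    tau_bd = sigma**2 * xi / kT    # time to diffuse about one particle size
    return tau_m, tau_bd

def suggest_integrator(m, xi, sigma, kT, separation=100.0):
    """Suggest 'BD' only when tau_BD exceeds tau_m by the (assumed) factor."""
    tau_m, tau_bd = relaxation_timescales(m, xi, sigma, kT)
    return "BD" if tau_bd > separation * tau_m else "LD"

# In reduced units with m = sigma = kT = 1:
print(suggest_integrator(m=1.0, xi=100.0, sigma=1.0, kT=1.0))  # strongly damped
print(suggest_integrator(m=1.0, xi=0.5, sigma=1.0, kT=1.0))    # weakly damped
```

Because $\tau_{BD}/\tau_m = \sigma^2\xi^2/(m\,kT)$, the ratio grows quadratically with the drag coefficient, so the strongly damped case separates the two timescales very quickly.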
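Big Bend NP is located on the border between Texas and Mexico, with a river called the Rio Grande marking the boundary. The whole park is part of the Chihuahuan Desert. Because it is so remote (the drive from Austin took us 8 hours, and even San Antonio, the closest big city to the park, is 6 hours away), it is one of the least-visited national parks in the US (of the 59 national parks nationwide, Big Bend ranks 42nd in number of visitors). The most popular activities here are hiking and camping. The park's trails total 180 miles (290 km), and because visitors are so few, camping and hiking here make for a very memorable experience. The other thing Big Bend is famous for is its night sky: it has the darkest nights in the US (the surroundings are so desolate that there is almost no light pollution). Regrettably, we did not get to properly enjoy the Milky Way here. If you have the chance, I recommend staying up past midnight to see it. Here is the park's official page on stargazing.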
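From May to August, the desert areas have extremely dry, hot weather during the day. For people without much wilderness survival experience, hiking and camping for more than two days on the Big Bend desert plains in summer is life-threatening because of sun exposure and lack of water. So we decided to stay in the High Chisos. The High Chisos is a mountain range that rises abruptly from the central part of the park. Its highest point is about 3000 feet above the desert plain, and even at the hottest time of summer the temperatures here are bearable. That is why most people who come to Big Bend to hike and camp in summer stay in the High Chisos.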
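Accommodation inside Big Bend comes in two kinds: camping and lodging.
As the name suggests, lodging means staying in a built room inside the park, with air conditioning and a shower. It is located in the Chisos Basin. Almost everything online says you need to book more than half a year in advance, but we got lucky this time: because we changed our plans at the last minute on the second day, we went straight to the front desk after coming down from the mountain and, to our surprise, got the only room left. If you need a shower, air conditioning, and a comfortable night's sleep, the lodge is the best choice. It sits at the center of the park, so driving to anywhere else in the park is most convenient from here. On top of that, it is at a relatively high elevation, so even in summer the nights are fairly cool.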
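But coming to Big Bend without trying camping would really be a pity. Camping falls roughly into two kinds: campgrounds and backcountry camping. Campgrounds are probably the most common way to camp in the park. A campground is a place built by the park specifically for visitors to camp; the sites are close to each other and there are public restrooms. Big Bend has three campgrounds: Chisos Basin Campground, Cottonwood Campground, and Rio Grande Village Campground. Chisos Basin Campground is in the same place as the lodging described above, and the other two are on the west and east sides of the park respectively. Rio Grande Village is the only campground where you can park an RV. The advantages of a campground are the restrooms and the safety: you are surrounded by other campers, so there is no need to worry about wildlife. If you just want to try sleeping in a tent, a campground is the best choice. Each campground has some sites that can be reserved online in advance (reservation website); the rest are walk-up, meaning you go to the campground yourself, find a campsite that is not yet taken, and self check-in at the campsite entrance. If you did not manage to reserve online, I suggest arriving in the morning or even before dawn, since some people usually leave very early. If you show up in the afternoon during peak season, it is basically impossible to find a spot.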
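But if you want to truly experience Big Bend, I definitely recommend backcountry camping. The advantage of backcountry camping is that there is nobody else around you; it is real camping in the wild. Backcountry camping again falls roughly into two kinds: primitive roadside campsites and backpacking. Roadside campsites can be reached directly by car, and many people who come to Big Bend in an RV choose this option. Note that the roadside campsites are all on the desert plain, and the park does not allow you to keep the engine running for long periods to run the air conditioning, so I think this option only suits fall and winter (unless you don't mind the heat). In fall and winter, with a larger group and a more relaxed pace in mind, renting an RV for roadside camping is the option I recommend most.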
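For those who want to truly immerse themselves in nature, backpacking is without doubt the best way. Backpacking again comes in two kinds: camping at park-designated sites beside the trails, and wilderness camping. Wilderness camping, as the name suggests, means you can camp and hike anywhere (on the park's desert plains). You need to tell the park staff your daily itinerary (rough hiking route and each day's camping spot) in advance. This requires plenty of desert survival experience: you may well hike for a week without seeing a single living person, and if you run into danger there is basically no one to help. Beginners should forget about it; it is not worth dying for. So we chose the first option. These campsites are all beside the park's most popular High Chisos trails, and every campsite is close to the hiking route, so there is not much danger. None of these backpacking campsites can be reserved; you can only book them at the visitor center on the day of your first planned night. This website lets you check the status of each campsite. In addition, this PDF gives very detailed information on each campsite. Our plan was a three-day, two-night trip starting from the Chisos Basin, picking one campsite on the way up and one on the way down, so that we could fully enjoy the whole trip.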
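First I want to say that the two of us had absolutely no camping experience before this, so we were beginners in the truest sense. Hiking in Big Bend has a certain level of difficulty. We completed about half: the planned three-day, two-night trip turned into two days and one night partway through, and the main reason we gave up the second night was that my wife was too scared to sleep at night… So if you have some experience, don't hesitate, and if you have none, don't back down either. We had some concerns at first and considered staying at the park hotel or a campground. About a week or so before leaving, while searching online for guides, I stumbled upon this forum, bigbendchats. Information online about hiking in Big Bend is really scarce (trip reports or tips in Chinese simply don't exist, which is one of the reasons I wrote this), and this forum helped enormously. It has detailed resources, experience posts, and discussion threads; I found the vast majority of the useful information there.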
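Below is a day-by-day account of the main parts of our trip.
On Friday morning we got our gear ready and put it in the car. At 5:30 pm, right after group meeting, we set off directly from campus and arrived around 11:30 pm at the motel we had booked in Fort Stockton. Once checked in, as soon as the clock hit midnight, we went to the campsite status website to work out which campsites were still free. We finally settled on the campsites for the two nights (LW3 and SW3). Although summer is not peak season, it was a bit of a pity that the most popular campsites had already been taken by earlier arrivals.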
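On Saturday we got up at 7 am and left at 8. Fort Stockton is about an hour and a half's drive from the park entrance, and it is another half hour or more from the entrance to the visitor center in the Chisos Basin. In peak season, if you want a camping permit you need to apply at the visitor center as early as possible. The permit is just a form that you need to carry with you. We got our permit at the visitor center at 10 without trouble, packed up our gear, and set off promptly. We chose the western route (Laguna Meadow Trail) up the mountain to reach LW3, our campsite for the first night. What should have been a 3-hour hike took us a full 6 hours because our packs were too heavy. LW3 is about a mile from the main trail, with two other campsites around it, the nearest a few hundred meters away, so it is a very secluded spot. Once the tent was up, I went looking around for signs of wildlife. Around the campsite I found several piles of deer droppings, and also a very obvious trail regularly used by black bears (a bear trail), with a pile of bear droppings every few dozen meters. The droppings were all very dry, which presumably means there had been no "visitors" here for a while. I also found a small puddle, though the water had already turned yellow. I filtered a bottle of it with the water filter plus purification tablets, but it was still yellow. We agonized over it for a long time and didn't drink it, and in the end used it to wash our hands. A little after 5 pm we boiled water and ate our freeze-dried food, which was surprisingly good. We went to bed before dark. Since this was our first time camping in the wild, we woke up many times during the night, always feeling that something was moving around us. Because we had to conserve water, I was very thirsty at night but couldn't bring myself to drink. That night made me truly appreciate, for the first time, how precious water is… Of course, the scariest thing for me was the bugs. There is a fly-like insect here that bites, raising very itchy welts the size of an egg. My wife said my blood must taste too good, because the bugs didn't bite her even once. However, because my wife was too scared to leave the tent at night, we sadly didn't get to see the Big Bend starry sky this time.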
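On the second day we got up a little after 8 am and had breakfast. Half of our water was already gone, our packs were still heavy, and we had a long way to go that day, so after much agonizing we reluctantly decided to give up the second night of camping and instead go down the mountain the same day, cutting across the Colima Trail to the eastern route. From our campsite we climbed for another hour or so to reach the Colima Trail, and then the long descent began. After crossing the Colima Trail we arrived at Boot Canyon, one of the most popular places in the High Chisos. There is a small creek here that is the only "reliable" water source in the whole High Chisos. When we got there the creek was flowing well and there were lots of birds. After lunch in Boot Canyon (freeze-dried kung pao chicken) and a short rest, we continued down. One stretch of the eastern route is very steep, and we were glad we had come up the western side the day before, otherwise we would probably have given up halfway on day one. After a long, seemingly endless descent of more than three hours, we finally reached base camp at the Chisos Basin. We went straight to the store for a bottle of Coke, and what we wanted most was a cold shower. By then it was past 3 pm. Thinking that driving straight back to Austin would be too tiring, we decided to try our luck at the Basin campground and see whether there was any free spot. We did find a few empty sites, but none had a shade shelter, and since the sun was at its fiercest we gave up on the idea of staying at the campground. We then checked lodging outside the park, and everything was full. That left us with only one option: go to the lodge and see whether any room was left. We didn't have much hope, but we actually grabbed the last vacant room. It was a bit pricey, but the two of us were nearly collapsing and didn't care… After a shower we had a wonderful nap, and at 6 pm we got up and went to Sotol Vista to watch the sunset. At first we were so tired that we just wanted to sleep through to the next day. Luckily we didn't, otherwise we would have missed this unforgettable sunset.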
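On the third day, while the morning was still not too hot, we drove over to Rio Grande Village for a look. The campground was basically empty. There is a very short trail that climbs to the top of a small hill, from which you can see a Mexican village in the distance. The Rio Grande makes a nearly 180-degree bend here, so from the hilltop, Mexico is actually on both your left and your right. The most amusing thing was seeing a Mexican man on a donkey leisurely cross the Rio Grande from the Mexican side, wander into the US, and then just as slowly cross the river back to Mexico. Thinking of Trump's plan to build a wall, I looked around: this place is so vast and so remote that the plan seemed like a pipe dream.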
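Around 10 am we decided to head back to Austin, and on the way out of the park we stopped at the Fossil Discovery Exhibit. Eons ago Big Bend was actually a sea; later it became a wetland, and after that a desert. Dinosaurs once flourished here. The exhibit is built next to a former dig site and shows some locally excavated fossils (including the skull of a giant prehistoric crocodile, a pterosaur fossil with a wingspan of around 10 meters, a 2-meter-tall Alamosaurus leg bone, and so on). There is also a trail here where visitors can look for fossils, and any fossils you find you can take home. But it was already noon when we got there, and because of the heat we left in a hurry after seeing the exhibit.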
I have been doing this by hand for a while, but eventually it occurred to me how nice it would be to make it automatic. So I wrote this Python script to carry out the procedures described in those three articles. I am sure there is a more elegant way to do this, but this is what I have so far and it works.
import paramiko
import sys
import subprocess
import socket
import argparse

# function to get available port
def get_free_port():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(('localhost',0))
    s.listen(1)
    port = s.getsockname()[1]
    s.close()
    return port

# print out the output from paramiko SSH connection
def print_output(output):
    for line in output:
        print(line)

parser = argparse.ArgumentParser(description='Locally open IPython Notebook on remote server\n')
parser.add_argument('-t', '--terminate', dest='terminate', action='store_true', \
                    help='terminate the IPython notebook on remote server')
args = parser.parse_args()

host="***" # host name
user="***" # username

# write a temporary python script to upload to server to execute
# this python script will get available port number
def temp():
    with open('free_port_tmp.py', 'w') as f:
        f.write('import socket\nimport sys\n')
        f.write('def get_free_port():\n')
        f.write('    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n')
        f.write("    s.bind(('localhost', 0))\n")
        f.write('    s.listen(1)\n')
        f.write('    port = s.getsockname()[1]\n')
        f.write('    s.close()\n')
        f.write('    return port\n')
        f.write("sys.stdout.write('{}'.format(get_free_port()))\n")
        f.write('sys.stdout.flush()\n')

def connect():
    # create SSH client
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.load_system_host_keys()
    client.connect(host, username=user)

    # generate the temp file and upload to server
    temp()
    ftpClient = client.open_sftp()
    ftpClient.put('free_port_tmp.py', "/tmp/free_port_tmp.py")

    # execute python script on remote server to get available port id
    stdin, stdout, stderr = client.exec_command("python /tmp/free_port_tmp.py")
    stderr_lines = stderr.readlines()
    print_output(stderr_lines)

    port_remote = int(stdout.readlines()[0])
    print('REMOTE IPYTHON NOTEBOOK FORWARDING PORT: {}\n'.format(port_remote))

    # start the notebook on the remote server inside a detached tmux session
    ipython_remote_command = (
        "source ~/.zshrc;tmux new-session -d -s remote_ipython_session "
        "'ipython notebook --no-browser --port={}'".format(port_remote)
    )
    stdin, stdout, stderr = client.exec_command(ipython_remote_command)
    stderr_lines = stderr.readlines()

    if len(stderr_lines) != 0:
        if 'duplicate session: remote_ipython_session' in stderr_lines[0]:
            print("ERROR: \"duplicate session: remote_ipython_session already exists\"\n")
            sys.exit(0)
        print_output(stderr_lines)

    # delete the temp files on local machine and server
    subprocess.run('rm -rf free_port_tmp.py', shell=True)
    client.exec_command('rm -rf /tmp/free_port_tmp.py')
    client.close()

    port_local = int(get_free_port())
    print('LOCAL SSH TUNNELING PORT: {}\n'.format(port_local))

    # forward the local port to the remote notebook port, using the
    # user and host configured above
    ipython_local_command = "ssh -N -f -L localhost:{}:localhost:{} {}@{}".format(
        port_local, port_remote, user, host)
    subprocess.run(ipython_local_command, shell=True)

def close():
    # create SSH client
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.load_system_host_keys()
    client.connect(host, username=user)

    stdin, stdout, stderr = client.exec_command("source ~/.zshrc;tmux kill-session -t remote_ipython_session")
    stderr_lines = stderr.readlines()
    if len(stderr_lines) == 0:
        print('Successfully terminate the IPython notebook\n')
    else:
        print_output(stderr_lines)
    client.close()

if args.terminate:
    close()
else:
    connect()
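This script does the following:
- Connects to the remote server over SSH using paramiko.
- Uploads a temporary Python script to the server and uses paramiko to execute it. This script gets an available port on localhost.
- Starts IPython Notebook on that port on the remote server, inside a tmux session. The script sources ~/.zshrc first because I use zsh. You can modify that part of the code based on your situation.
- Finds an available port on the local machine and sets up an SSH tunnel from it to the remote port.
If the script runs successfully, you will see something like this.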
If you want to check whether IPython Notebook is really running on the remote machine, use the command tmux ls there. A tmux session named remote_ipython_session should exist.
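As an aside on the port lookup used in the script: binding a socket to port 0 asks the operating system to pick any unused port, and getsockname() then reports which one it chose. A slightly more compact sketch of the same idea, using the socket as a context manager so that it is always closed, could look like this (the host parameter is an addition for illustration):

```python
import socket

def get_free_port(host="localhost"):
    """Ask the OS for a currently unused TCP port on `host` and return it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))             # port 0 means "let the OS choose"
        return s.getsockname()[1]     # (address, port) -> port

port = get_free_port()
print(port)
```

One caveat with this approach is that the port is only guaranteed to be free at the moment of the call. Another process could in principle grab it before the notebook or the SSH tunnel binds to it, although in practice this is rarely a problem.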
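In your browser, open http://localhost:50979 (replace 50979 with the local port printed by the script). You should be able to access your IPython notebook. To terminate the IPython notebook on the remote machine, simply run the script again with the -t (or --terminate) option.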
LAMMPS
source codes. This note focuses on the compute style of LAMMPS, which is used to compute certain quantities during a simulation run. Of course you can also compute these quantities in post-processing. However, it is usually faster to do it during the simulation, since you can take advantage of all the distances, forces, etc. generated during the simulation instead of computing them again afterwards. I will go through the LAMMPS source files compute_gyration.h and compute_gyration.cpp. I am not very familiar with C++, so I will also explain some language-related details that I learned while studying the code. I hope this article is helpful to anyone who wants to modify an existing LAMMPS compute style or write their own.
compute_gyration.h
#ifdef COMPUTE_CLASS

ComputeStyle(gyration,ComputeGyration)

#else

#ifndef LMP_COMPUTE_GYRATION_H
#define LMP_COMPUTE_GYRATION_H

#include "compute.h"

namespace LAMMPS_NS {

class ComputeGyration : public Compute {
 public:
  ComputeGyration(class LAMMPS *, int, char **);
  ~ComputeGyration();
  void init();
  double compute_scalar();
  void compute_vector();

 private:
  double masstotal;
};

}

#endif
#endif
First part of this code
#ifdef COMPUTE_CLASS
ComputeStyle(gyration,ComputeGyration)
#else
is where this specific compute style is defined. If you want to write your own compute style, say for the intermediate scattering function, then you would write something like this
#ifdef COMPUTE_CLASS
ComputeStyle(isf,ComputeISF) // ISF stands for intermediate scattering function
#else
Moving on to the rest: #include "compute.h" and namespace LAMMPS_NS bring in the base class and the namespace. There is nothing special here; you need these in the header file of every compute style.
class ComputeGyration : public Compute {
 public:
  ComputeGyration(class LAMMPS *, int, char **);
  ~ComputeGyration();
  void init();
  double compute_scalar();
  void compute_vector();

 private:
  double masstotal;
You can see that the above code has the overall structure class A : public B. This basically means that our derived class A inherits all the public and protected members of class B. More details can be found here, here and here
Next, we declare two types of members of our derived class, public and private. Public members are those we want other code to be able to access, and private members are those used only within the scope of the derived class. Now let's look at the public class members. Note that there is no return type in the declarations of ComputeGyration and ~ComputeGyration. They are called the class constructor and the class destructor. The constructor is usually used to set up the initial values of certain member variables, and the destructor to release what the constructor allocated, as we will see later in compute_gyration.cpp. Note that for some compute styles such as compute_msd.h, the destructor is virtual, that is, virtual ~ComputeMSD instead of just ~ComputeMSD. This is because the class ComputeMSD is itself inherited by the derived class ComputeMSDNonGauss, so you need to declare the base destructor as virtual. Look at this page for more details. Now let's move forward.
void init();
double compute_scalar();
void compute_vector();
Here the functions init, compute_scalar and compute_vector are all base class members that are already declared in compute.h. However, they are all virtual functions, which means that they can be overridden in the derived class, here ComputeGyration. This and this page provide some basic explanations of the use of virtual functions. Here is a list, shown in the LAMMPS documentation, of some examples of the virtual functions you can use in your derived class.
In our case, the gyration computation returns a scalar and a vector, so we need compute_scalar() and compute_vector(). The private member masstotal is a quantity calculated locally that is only used within the class and is not needed by the rest of the code.
compute_gyration.cpp
Now let’s look at the compute_gyration.cpp
.
#include <math.h>
#include "compute_gyration.h"
#include "update.h"
#include "atom.h"
#include "group.h"
#include "domain.h"
#include "error.h"
Here the necessary header files are included. The names of these header files are self-explanatory. For instance, update.h declares the functions that update the timestep, etc.
ComputeGyration::ComputeGyration(LAMMPS *lmp, int narg, char **arg) :
  Compute(lmp, narg, arg)
{
  if (narg != 3) error->all(FLERR,"Illegal compute gyration command");

  scalar_flag = vector_flag = 1;
  size_vector = 6;
  extscalar = 0;
  extvector = 0;

  vector = new double[6];
}
The above code defines what the constructor ComputeGyration actually does. :: is called the scope operator. It is used to specify that the function being defined is a member of the class ComputeGyration (in our case the constructor, which has the same name as its class) and not a regular non-member function. The structure ComputeGyration(...) : Compute(...) is called a member initializer list. Here it initializes the base class Compute by calling its constructor with the arguments lmp, narg, arg. narg is the number of arguments provided. scalar_flag, vector_flag, size_vector, extscalar and extvector are all flag parameters defined in compute.h. For instance, scalar_flag = 1/0 indicates that we will/won't use the function compute_scalar() in our derived class. The meaning of each parameter is explained in compute.h. The line vector = new double[6] dynamically allocates memory for an array of length 6. Normally the syntax of the new operator looks like this
double *vector = NULL;
vector = new double[6];
Here the line double *vector = NULL is actually split between compute.h and compute.cpp: the pointer vector is declared in compute.h and its value is set to NULL in compute.cpp.
ComputeGyration::~ComputeGyration()
{
delete [] vector;
}
The above code specifies the destructor, that is, what is executed when an object of class ComputeGyration goes out of scope or is deleted. In this case, it deletes the array vector allocated above, which holds the gyration tensor. The syntax of the delete operator for arrays is delete [] vector. Details of new and delete can be found here.
void ComputeGyration::init()
{
masstotal = group->mass(igroup);
}
This part performs one-time setup such as initialization. The operator -> is just syntactic sugar: class->member is equivalent to (*class).member. What group->mass(igroup) does is call the member function mass() of class group with the given group ID and return the total mass of this group. How the value of igroup is set can be seen in compute.cpp. It comes from the second argument of the compute command.
double ComputeGyration::compute_scalar()
{
  invoked_scalar = update->ntimestep;

  double xcm[3];
  group->xcm(igroup,masstotal,xcm);
  scalar = group->gyration(igroup,masstotal,xcm);
  return scalar;
}
invoked_scalar is defined in the base class Compute. Its value is the last timestep on which compute_scalar() was invoked. ntimestep is a member of class update and holds the current timestep. The xcm function of class group calculates the center-of-mass coordinates; the result is stored in xcm. The gyration function calculates the radius of gyration of a group given the total mass and the center of mass of the group. The total mass is calculated in init(), and so that it can be accessed here, it is declared as a private member in compute_gyration.h. Notice that there is no explicit code here to calculate the gyration scalar, because the member function that does this job is already defined in class group, so we just need to call it. However, we also want to calculate the gyration tensor, and for that we need to write the function ourselves.
void ComputeGyration::compute_vector()
{
  invoked_vector = update->ntimestep;

  double xcm[3];
  group->xcm(igroup,masstotal,xcm);

  double **x = atom->x;
  int *mask = atom->mask;
  int *type = atom->type;
  imageint *image = atom->image;
  double *mass = atom->mass;
  double *rmass = atom->rmass;
  int nlocal = atom->nlocal;

  double dx,dy,dz,massone;
  double unwrap[3];

  double rg[6];
  rg[0] = rg[1] = rg[2] = rg[3] = rg[4] = rg[5] = 0.0;

  for (int i = 0; i < nlocal; i++)
    if (mask[i] & groupbit) {
      if (rmass) massone = rmass[i];
      else massone = mass[type[i]];

      domain->unmap(x[i],image[i],unwrap);
      dx = unwrap[0] - xcm[0];
      dy = unwrap[1] - xcm[1];
      dz = unwrap[2] - xcm[2];

      rg[0] += dx*dx * massone;
      rg[1] += dy*dy * massone;
      rg[2] += dz*dz * massone;
      rg[3] += dx*dy * massone;
      rg[4] += dx*dz * massone;
      rg[5] += dy*dz * massone;
    }
  MPI_Allreduce(rg,vector,6,MPI_DOUBLE,MPI_SUM,world);

  if (masstotal == 0.0) return;
  for (int i = 0; i < 6; i++) vector[i] = vector[i]/masstotal;
}
The above code does the actual computation of the gyration tensor.
Here is the meaning of each variable:
- x: 2D array of the positions of atoms.
- mask: array of group information for each atom. if (mask[i] & groupbit) checks whether the atom is in the group on which we want to perform the calculation.
- type: type of each atom.
- image: image flags of atoms. For instance, a value of 2 means add 2 box lengths to get the unwrapped coordinate.
- mass: mass of atoms.
- rmass: mass of atoms with finite size (meaning that they can have rotational motion). Notice that the mass of such a particle is set by density and diameter, not directly by the mass. That's why there are two variables, rmass and mass. To extract the mass of atom i, use rmass[i] or mass[type[i]].
- nlocal: number of atoms on one processor.
Look at the line domain->unmap(x[i],image[i],unwrap): domain.cpp tells us that the function unmap returns the unwrapped coordinates of an atom in unwrap. The following several lines calculate the gyration tensor. The MPI call MPI_Allreduce(rg,vector,6,MPI_DOUBLE,MPI_SUM,world) sums each of the six components of rg over all processors, stores the result in vector, and then distributes vector to all the processors. Refer to this article for details.
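For checking results in post-processing, the same tensor can be written in a few lines of NumPy. This is a minimal single-process sketch, not LAMMPS code: it assumes the coordinates passed in are already unwrapped (the job of domain->unmap above) and there is no MPI reduction. The function name gyration_tensor is made up for this example.

```python
import numpy as np

def gyration_tensor(x, mass):
    """Mass-weighted gyration tensor of unwrapped positions x (N x 3).

    Returns the six components in the same order as the LAMMPS code:
    xx, yy, zz, xy, xz, yz.
    """
    x = np.asarray(x, dtype=float)
    mass = np.asarray(mass, dtype=float)
    masstotal = mass.sum()
    xcm = (mass[:, None] * x).sum(axis=0) / masstotal   # center of mass
    dx, dy, dz = (x - xcm).T
    rg = np.array([
        np.sum(mass * dx * dx), np.sum(mass * dy * dy), np.sum(mass * dz * dz),
        np.sum(mass * dx * dy), np.sum(mass * dx * dz), np.sum(mass * dy * dz),
    ])
    return rg / masstotal

# Two unit-mass atoms at x = +1 and x = -1.
print(gyration_tensor([[1, 0, 0], [-1, 0, 0]], [1, 1]))
```

The sum of the first three components is the mass-weighted mean squared distance from the center of mass, that is, the squared radius of gyration.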
Here are two good articles about understanding and hacking LAMMPS code.
Basically, the simplest version of this algorithm breaks down into the following steps:
For random walks on a 3D cubic lattice, there are only 9 distinct rotation operations.
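The count of 9 comes from 3 coordinate axes times 3 non-trivial angles (90, 180 and 270 degrees). As a sanity check, the matrices can be generated rather than typed by hand. The sketch below is one way to do that; the function name lattice_rotations is an assumption made for this example.

```python
import numpy as np

def lattice_rotations():
    """Return the 9 rotations by 90, 180, 270 degrees about x, y, z
    as integer 3x3 matrices, shape (9, 3, 3)."""
    rotations = []
    for axis in range(3):
        # the two axes spanning the plane perpendicular to `axis`, in cyclic order
        i, j = (axis + 1) % 3, (axis + 2) % 3
        for quarter_turns in (1, 2, 3):
            angle = quarter_turns * np.pi / 2
            c = int(round(np.cos(angle)))
            s = int(round(np.sin(angle)))
            R = np.zeros((3, 3), dtype=int)
            R[axis, axis] = 1
            R[i, i] = R[j, j] = c
            R[i, j] = -s
            R[j, i] = s
            rotations.append(R)
    return np.array(rotations)

rots = lattice_rotations()
for R in rots:
    # a proper rotation is orthogonal and has determinant +1
    assert np.array_equal(R @ R.T, np.eye(3, dtype=int))
    assert round(np.linalg.det(R)) == 1
```

Checking the determinant is a cheap way to catch a sign typo in a hand-written matrix, since a single flipped sign typically turns a rotation into a reflection with determinant -1.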
Some references on Pivot algorithm
The implementation of this algorithm in Python is very straightforward. The raw file can be found here
import numpy as np
import timeit
from scipy.spatial.distance import cdist

# define a dot product function used for the rotate operation
def v_dot(a):
    return lambda b: np.dot(a,b)

class lattice_SAW:
    def __init__(self,N,l0):
        self.N = N
        self.l0 = l0
        # initial configuration. Usually we just use a straight chain as inital configuration
        self.init_state = np.dstack((np.arange(N),np.zeros(N),np.zeros(N)))[0]
        self.state = self.init_state.copy()

        # define a rotation matrix
        # 9 possible rotations: 3 axes * 3 possible rotate angles(90,180,270)
        # (the y-axis 270 degree entry is [[0,0,-1],[0,1,0],[1,0,0]] so that
        # it is a proper rotation with determinant +1)
        self.rotate_matrix = np.array([[[1,0,0],[0,0,-1],[0,1,0]],[[1,0,0],[0,-1,0],[0,0,-1]]
        ,[[1,0,0],[0,0,1],[0,-1,0]],[[0,0,1],[0,1,0],[-1,0,0]]
        ,[[-1,0,0],[0,1,0],[0,0,-1]],[[0,0,-1],[0,1,0],[1,0,0]]
        ,[[0,-1,0],[1,0,0],[0,0,1]],[[-1,0,0],[0,-1,0],[0,0,1]]
        ,[[0,1,0],[-1,0,0],[0,0,1]]])

    # define pivot algorithm process where t is the number of successful steps
    def walk(self,t):
        acpt = 0
        # loop until the number of successful steps reaches t
        while acpt < t:
            pick_pivot = np.random.randint(1,self.N-1) # pick a pivot site
            pick_side = np.random.choice([-1,1]) # pick a side

            if pick_side == 1:
                old_chain = self.state[0:pick_pivot+1]
                temp_chain = self.state[pick_pivot+1:]
            else:
                old_chain = self.state[pick_pivot:]
                temp_chain = self.state[0:pick_pivot]

            # pick a symmetry operator
            symtry_oprtr = self.rotate_matrix[np.random.randint(len(self.rotate_matrix))]
            # new chain after symmetry operator
            new_chain = np.apply_along_axis(v_dot(symtry_oprtr),1,temp_chain - self.state[pick_pivot]) + self.state[pick_pivot]

            # use cdist function of scipy package to calculate the pair-pair distance between old_chain and new_chain
            overlap = cdist(new_chain,old_chain)
            overlap = overlap.flatten()

            # determine whether the new state is accepted or rejected
            if len(np.nonzero(overlap)[0]) != len(overlap):
                continue
            else:
                if pick_side == 1:
                    self.state = np.concatenate((old_chain,new_chain),axis=0)
                elif pick_side == -1:
                    self.state = np.concatenate((new_chain,old_chain),axis=0)
                acpt += 1

        # place the center of mass of the chain on the origin
        self.state = self.l0*(self.state - np.int_(np.mean(self.state,axis=0)))

N = 100 # number of monomers(number of steps)
l0 = 1 # bond length(step length)
t = 1000 # number of pivot steps

chain = lattice_SAW(N,l0)

%timeit chain.walk(t)
1 loops, best of 3: 2.61 s per loop
The above code simulates a 100-monomer chain with 1000 successful pivot steps. However, even with numpy and the built-in function cdist of scipy, the code is still too slow for a large number of random walk steps.
When it comes to loops, Python can be very slow. In many complex situations, even numpy and scipy are not that helpful. For instance, in this case, in order to detect overlaps we need a nested loop over two sets of sites (monomers). In the above code, I use the built-in function cdist of scipy to do this, which is already highly optimized. But we don't actually have to complete the loops, because we can stop the search as soon as we encounter one overlap. However, I can't think of a natural numpy or scipy way to do this efficiently, because of the conditional break. This is where [Cython]^{[4]} can be extremely useful. Cython can translate your Python code to C, and it can wrap your C or C++ code as a Python module so that you can directly import your C/C++ code in Python. To do that, we first handwrite our pivot algorithm in plain C++ code.
#include <math.h>
using namespace std;
void c_lattice_SAW(double* chain, int N, double l0, int ve, int t){
... // pivot algorithm codes here
}
Name the file c_lattice_SAW.cpp. Here we define a function called c_lattice_SAW, where chain is the array storing the coordinates of the monomers, N is the number of monomers, l0 is the bond length, and t is the number of successful steps.
- The C++11 library random is used here in order to use the Mersenne twister RNG directly.
- The C++ code in this case is not a complete program. It doesn't have a main function.
The whole C++ code can be found here. Besides our plain C++ code, we also need a header file c_lattice_SAW.h.
void c_lattice_SAW(double* chain, int N, double l0, int ve, int t);
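The early-exit idea that motivates the handwritten C++ code can be illustrated in a few lines of plain Python. This is only a sketch of the idea, not the C++ implementation linked above: it stores the sites of the fixed part of the chain in a set and stops at the first rotated site that is already occupied. The function name chains_overlap is hypothetical, and it assumes that the coordinates are integer lattice positions (possibly stored as floats).

```python
def chains_overlap(new_chain, old_chain):
    """Return True as soon as any site of new_chain coincides with a site
    of old_chain. Coordinates are rounded to integer lattice positions."""
    occupied = {tuple(int(round(c)) for c in site) for site in old_chain}
    for site in new_chain:
        if tuple(int(round(c)) for c in site) in occupied:
            return True   # stop at the first overlap, no need to check the rest
    return False

old = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
print(chains_overlap([(2, 1, 0), (2, 2, 0)], old))  # no shared site
print(chains_overlap([(1, 1, 0), (1, 0, 0)], old))  # second site is occupied
```

In pure Python the per-site overhead is still high, which is why the same logic benefits from being written in C++ or Cython.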
If you don’t want to handwrite C code, another way to use Cython is to write a plain Cython program, whose syntax is very much Python-like. But then how to get high-quality random numbers efficiently becomes a problem. Usually there are several ways to get random numbers in Cython:
- Call a Python random number module from Cython. This method will be very slow if random numbers are generated in a big loop, because the generated C code must call a Python module every time, which is a lot of overhead.
- Use numpy to generate many random numbers in advance. This requires a large amount of memory, and in many cases the total number of random numbers needed is not known before the computation.
- Use rand() from the standard library stdlib in Cython. rand() is not a very good RNG. In a scientific computation like a Monte Carlo simulation, this is not a good way to get random numbers.
- Wrap a good C random number generator for use in Cython. This can be a good way. Currently I have found two ways to do this: 1. [Use numpy randomkit]^{[5]} 2. [Use GSL library]^{[6]}
- Handwrite C or C++ code using the random library or another external library and use Cython to wrap the code. This gives the optimal performance, but comes with more complicated and less readable code.
What I did in this post is the last method.
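For completeness, the memory concern with pre-generated numpy random numbers can be softened by drawing them in fixed-size chunks and refilling on demand, so the total count does not need to be known in advance. The class below is a hypothetical sketch of that idea in plain Python, not something used in this post. In Cython, the buffer would typically be a typed array so that reading a number does not go through Python calls.

```python
import numpy as np

class RandomBuffer:
    """Serve uniform random numbers in [0, 1) from a pre-generated chunk,
    refilling the chunk whenever it is exhausted."""

    def __init__(self, chunk_size=4096, seed=None):
        self._rng = np.random.default_rng(seed)
        self._chunk_size = chunk_size
        self._refill()

    def _refill(self):
        self._buf = self._rng.random(self._chunk_size)
        self._pos = 0

    def next(self):
        if self._pos >= self._chunk_size:
            self._refill()
        value = self._buf[self._pos]
        self._pos += 1
        return value

buf = RandomBuffer(chunk_size=4, seed=1)
values = [buf.next() for _ in range(10)]   # spans three chunks
```

Memory use is bounded by chunk_size regardless of how many numbers are eventually drawn.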
Now we need to make a .pyx file that will handle the C code in Cython and define a Python function that uses our C code. Give the .pyx file a different name, like lattice_SAW.pyx
import cython
import numpy as np
cimport numpy as np

cdef extern from "c_lattice_SAW.h":
    void c_lattice_SAW(double* chain, int N, double l0, int ve, int t)

@cython.boundscheck(False)
@cython.wraparound(False)
def lattice_SAW(int N, double l0, int ve, int t):
    cdef np.ndarray[double,ndim = 1,mode="c"] chain = np.zeros(N*3)
    c_lattice_SAW(&chain[0],N,l0,ve,t)
    return chain
Compile our C++ code to generate a shared library that can be imported into Python as a module. To do that, we use the Python distutils package. Make a file named setup.py.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(
    cmdclass = {'build_ext':build_ext},
    ext_modules = [Extension("lattice_SAW",
                             sources = ["lattice_SAW.pyx","c_lattice_SAW.cpp"],
                             extra_compile_args=['-std=c++11','-stdlib=libc++'],
                             language="c++",
                             include_dirs = [numpy.get_include()])],
)
In addition to the normal arguments, we also have extra_compile_args here. This is because the C++ code uses the library random, which is new in C++11. On Mac, -std=c++11 and -stdlib=libc++ need to be added to tell the compiler to support C++11 and to use libc++ as the standard library. On a Linux system, just -std=c++11 is enough.
If cimport numpy is used, then the setting include_dirs = [numpy.get_include()] needs to be added.
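Since the required flags differ between Mac and Linux, one option is to let setup.py choose them at build time. The helper below is a small sketch of that idea; the function name cxx11_compile_args is made up, and its return value would be passed as extra_compile_args in the Extension above.

```python
import sys

def cxx11_compile_args(platform=sys.platform):
    """Return compiler flags for C++11, adding libc++ only on macOS."""
    args = ["-std=c++11"]
    if platform == "darwin":          # sys.platform is 'darwin' on macOS
        args.append("-stdlib=libc++")
    return args

print(cxx11_compile_args("linux"))
print(cxx11_compile_args("darwin"))
```

This keeps a single setup.py usable on both systems without editing the flag list by hand.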
Then in the terminal we run, on Linux,
python setup.py build_ext --inplace
or on Mac OS
CC=clang++ python setup.py build_ext --inplace
Setting CC=clang++ tells Python to use the clang compiler rather than gcc, because apparently the version of gcc shipped with OS X doesn’t support C++11.
If the compilation succeeds, a .so library file is generated. Now we can import our module in Python from that working directory
import lattice_SAW
import numpy
%timeit lattice_SAW.lattice_SAW(100,1,1,1000)
100 loops, best of 3: 5.97 ms per loop
That is 437 times faster than our Numpy/Scipy way!