Guang Shi - Blog Posts. Guang Shi's personal site. Learn about my research projects, read my blog posts and see my photos. (2014-11-13, https://www.guangshi.io, Guang Shi)

# Optimizing python code for computations of pair-wise distances - Part III

Published 2019-10-17 at /posts/python-optimization-using-different-methods-part-3/

This is Part III of a series of three posts. In Parts I and II, I discussed pure Python and NumPy implementations, respectively, for computing pair-wise distances under a periodic boundary condition. In this post, I show how to use Numba and Cython to speed up the Python code further.

At some point, the optimized Python code is not strictly Python anymore. For instance, in this post we make our code very efficient using Cython. However, strictly speaking, Cython is not Python. It is a superset of Python, which means that any Python code can be compiled as Cython code but not vice versa. To see the performance boost, one needs to write Cython code. So what is stopping you from just writing C/C++ code instead and being done with it? I believe there is always a balance between the performance of the code and the effort you put into writing it. As I will show here, using Numba or writing Cython code is straightforward if you are familiar with Python. Hence, I prefer to optimize the Python code rather than rewrite it in C/C++, because it is more cost-effective for me.

## Background

Just to reiterate, the computation is to calculate the pair-wise distances between every pair of $N$ particles under a periodic boundary condition. The positions of the particles are stored in an array/list of the form [[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]]. The separation between two particles $i$ and $j$ along one dimension is calculated as follows,

$\Delta_{ij} = \sigma_{ij} - \left[ \sigma_{ij}/L \right] \cdot L$

where $\sigma_{ij}=x_i-x_j$, $L$ is the length of the simulation box edge, and $\left[\cdot\right]$ denotes the nearest integer. $x_i$ and $x_j$ are the positions of particles $i$ and $j$ along that dimension. For more information, please read Part I.

## Using Numba

Numba is an open-source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

Numba has existed for a few years. I remember trying it a few years ago without having a good experience. Now it is much more mature and very easy to use, as I will show in this post.

### Serial Numba Implementation

On their website, it is stated that Numba can make Python codes as fast as C or Fortran. Numba also provides a way to parallelize the for loop. First, let’s try to implement a serial version. Numba’s official documentation recommends using Numpy with Numba. Following the suggestion, using the Numpy code demonstrated in Part II, I have the Numba version,

import numpy as np
from numba import jit

@jit(nopython=True, fastmath=True)
def pdist_numba_serial(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.

    positions: a numpy array with form np.array([[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]])
    l: the length of edge of box (cubic/square box)
    return: a condensed 1D array
    """
    # determine the number of particles
    n = positions.shape[0]
    # create an empty array storing the computed distances
    pdistances = np.empty(int(n*(n-1)/2.0))
    for i in range(n-1):
        # separations between particle i and all particles after it
        D = positions[i] - positions[i+1:]
        # np.round is called in its three-argument form (array, decimals, out)
        out = np.empty_like(D)
        D = D - np.round(D / l, 0, out) * l
        distance = np.sqrt(np.sum(np.power(D, 2.0), axis=1))
        # start and end of the slice of the condensed array filled by row i
        idx_s = int((2 * n - i - 1) * i / 2.0)
        idx_e = int((2 * n - i - 2) * (i + 1) / 2.0)
        pdistances[idx_s:idx_e] = distance
    return pdistances


Using Numba is almost (see the note below) as simple as adding the decorator @jit(nopython=True, fastmath=True) to our function.

Note: inside the function pdist_numba_serial, we basically copied the NumPy code from Part II, except for the line D = D - np.round(D / l) * l in the original code. With Numba we instead need to use the three-argument form np.round(D / l, 0, out), which is pointed out here.
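One practical point when timing a jitted function is that the first call includes the compilation time. The sketch below is not the function benchmarked in this post: it is a hypothetical explicit-loop variant (pdist_loops is a name I made up), and it assumes a no-op fallback decorator when Numba is not installed so that it still runs. It shows a warm-up call followed by a timed call.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(fastmath=True)
def pdist_loops(positions, l):
    """Hypothetical explicit-loop variant; returns the condensed distance array."""
    n, ndim = positions.shape
    out = np.empty(n * (n - 1) // 2)
    idx = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dd = 0.0
            for k in range(ndim):
                d = positions[i, k] - positions[j, k]
                d -= np.rint(d / l) * l  # wrap the separation into the periodic box
                dd += d * d
            out[idx] = np.sqrt(dd)
            idx += 1
    return out


positions = np.random.rand(100, 3)
pdist_loops(positions, 1.0)           # warm-up call: triggers JIT compilation
start = time.perf_counter()
result = pdist_loops(positions, 1.0)  # timed call: runs the already compiled code
elapsed = time.perf_counter() - start
print(f"{result.size} distances in {elapsed * 1e3:.3f} ms")
```

Tools such as %timeit run the function many times, so the compilation cost is amortized there, but it matters when timing a single call by hand.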

### Parallel Numba Implementation

pdist_numba_serial is a serial implementation. The nature of the pair-wise distance computation allows us to parallelize the process by simply distributing the pairs to multiple cores/threads. Fortunately, Numba provides a very simple way to do that. The for loop in pdist_numba_serial can be parallelized by replacing range with prange and adding parallel=True to the decorator,

import numpy as np
from numba import jit, prange

# add parallel=True to the decorator
@jit(nopython=True, fastmath=True, parallel=True)
def pdist_numba_parallel(positions, l):
    # determine the number of particles
    n = positions.shape[0]
    # create an empty array storing the computed distances
    pdistances = np.empty(int(n*(n-1)/2.0))
    # use prange here instead of range
    for i in prange(n-1):
        D = positions[i] - positions[i+1:]
        out = np.empty_like(D)
        D = D - np.round(D / l, 0, out) * l
        distance = np.sqrt(np.sum(np.power(D, 2.0), axis=1))
        # each iteration writes to its own slice of pdistances
        idx_s = int((2 * n - i - 1) * i / 2.0)
        idx_e = int((2 * n - i - 2) * (i + 1) / 2.0)
        pdistances[idx_s:idx_e] = distance
    return pdistances


There are some caveats when using prange, because race conditions can occur. In our case, however, there is no race condition: the distance calculations for different pairs are independent of each other and each iteration writes to its own slice of pdistances, so no communication between cores/threads is needed. For more information on parallelizing using Numba, refer to their documentation.
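To see why the iterations do not step on each other, one can check that the slices [idx_s, idx_e) written by the different values of i tile the condensed array without overlap. A minimal sketch in plain Python, where row_slice is a helper name introduced only for this illustration:

```python
def row_slice(i, n):
    """Slice [start, end) of the condensed array written by outer-loop iteration i."""
    start = (2 * n - i - 1) * i // 2
    end = (2 * n - i - 2) * (i + 1) // 2
    return start, end


n = 5
slices = [row_slice(i, n) for i in range(n - 1)]
print(slices)  # [(0, 4), (4, 7), (7, 9), (9, 10)]
```

Each slice starts where the previous one ends, and the last one ends at n*(n-1)/2, so every thread writes to a disjoint part of the output.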

### Benchmark

Now let’s benchmark the two Numba implementations. The result is shown below,

Compared to the fastest Numpy implementation shown in Part II, the serial Numba implementation provides a speedup of more than three times. As one can see, the parallel version is about twice as fast as the serial version on my 2-core laptop. I didn’t test on machines with more cores, but I expect the speedup to scale roughly linearly with the number of cores.

I am sure there are some more advanced techniques to make the Numba version even faster (using CUDA for instance). I would argue that the implementations above are the most cost-effective.

## Using Cython

As demonstrated above, Numba provides a very simple way to speed up Python code with minimal effort. However, if we want to go further, it is probably better to use Cython.

Cython is basically a superset of Python. It allows Cython/Python code to be translated to C/C++ and then compiled to machine code using a C/C++ compiler. In the end, you have a C module you can import directly in Python.

Similar to the Numba versions, I show both serial and parallel versions of the Cython implementation.

### Serial Cython implementation

# load Cython in Jupyter Notebook
%load_ext Cython

# the rest goes in its own notebook cell, starting with the %%cython magic
%%cython --force --annotate

import cython
import numpy as np

from libc.math cimport sqrt
from libc.math cimport nearbyint

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True) # Do not check division, may lead to 20% performance speedup
def pdist_cython_serial(double [:,:] positions not None, double l):
    cdef Py_ssize_t n = positions.shape[0]
    cdef Py_ssize_t ndim = positions.shape[1]

    pdistances = np.zeros(n * (n-1) // 2, dtype = np.float64)
    cdef double [:] pdistances_view = pdistances

    cdef double d, dd
    cdef Py_ssize_t i, j, k
    for i in range(n-1):
        for j in range(i+1, n):
            dd = 0.0
            for k in range(ndim):
                d = positions[i,k] - positions[j,k]
                d = d - nearbyint(d / l) * l
                dd += d * d
            # index of the pair (i, j) in the condensed 1D array
            pdistances_view[j - 1 + (2 * n - 3 - i) * i // 2] = sqrt(dd)

    return pdistances


Some Remarks

• Declare static types for variables using cdef. For instance, cdef double d declares that the variable d has a double/float type.

• Import sqrt and nearbyint from the C library instead of using Python functions. The general rule is to use C functions directly whenever possible.

• positions is a Numpy array and is declared using Typed Memoryviews.

• Similar to positions, pdistances_view accesses the memory buffer of the numpy array pdistances. Values are assigned to pdistances through pdistances_view.

• It is useful to use %%cython --annotate to display the analysis of the Cython code. In this way, you can inspect potential slowdowns in the code. The analysis highlights lines where Python interaction occurs. In this particular example, it is very important to keep the core part, the nested loop, free of Python interaction. For instance, if we don’t use sqrt and nearbyint from libc.math but instead use Python’s math.sqrt and built-in round, we won’t see much speedup, since there is a lot of overhead in calling these Python functions inside the loop.
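The index expression j - 1 + (2 * n - 3 - i) * i // 2 used in the inner loop maps a pair (i, j) with i < j to its position in the condensed array. A quick way to convince oneself is to compare it with the order in which itertools.combinations enumerates the pairs. The helper name condensed_index is made up for this sketch:

```python
from itertools import combinations


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i < j, in the condensed 1D array."""
    return j - 1 + (2 * n - 3 - i) * i // 2


n = 6
pairs = list(combinations(range(n), 2))  # (0,1), (0,2), ..., (4,5)
indices = [condensed_index(i, j, n) for i, j in pairs]
print(indices == list(range(n * (n - 1) // 2)))  # True
```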

### Parallel Cython Implementation

Similar to Numba, Cython also allows parallelization. The parallelization is achieved using OpenMP. First, to use OpenMP with Cython, we need to import needed modules,

from cython.parallel import prange, parallel


Then, replace the for i in range(n-1) in the serial version with

with nogil, parallel():
    for i in prange(n-1, schedule='dynamic'):


Everything else remains the same. Here I follow the example in Cython’s official documentation.

schedule='dynamic' means that the iterations of the loop are handed out to threads as they request them. Other options include static, guided, etc. See here for full documentation.

I had some trouble compiling the parallel version directly in the Jupyter Notebook. Instead, it is compiled as a standalone module. The .pyx file and setup.py file can be found here.

### Benchmark

The result of benchmarking pdist_cython_serial and pdist_cython_parallel is shown in the figure below,

As expected, the serial version runs at about half the speed of the parallel version on my 2-core laptop. The serial version is more than two times faster than its Numba counterpart.

## Summing up

In this series of posts, using the computation of pair-wise distances under a periodic boundary condition as an example, I showed various ways to optimize Python code using built-in Python functions (Part I), NumPy (Part II), and Numba and Cython (this post). The benchmark results for all of the functions tested are summarized in the table below,

| Function | Average time (ms) | Speedup |
|---|---|---|
| pdist | 1270 | 1 |
| pdist_v2 | 906 | 1.4 |
| pdist_np_broadcasting | 160 | 7.9 |
| pdist_np_naive | 110 | 11.5 |
| pdist_numba_serial | 20.7 | 61 |
| pdist_numba_parallel | 12.6 | 101 |
| pdist_cython_serial | 5.84 | 217 |
| pdist_cython_parallel | 3.19 | 398 |

The time is measured for $N=1000$, and the speedup is relative to pdist. The parallel versions are tested on a 2-core machine.

# Optimizing python code for computations of pair-wise distances - Part II

Published 2019-10-08 at /posts/python-optimization-using-different-methods-part-2/

This is Part II of a series of three posts on optimizing Python code. Using the example of computing pair-wise distances under periodic boundary conditions, I explore several ways to optimize the code, including a pure Python implementation without any third-party libraries, a Numpy implementation, and implementations using Numba or Cython.

In this post, I show how to use Numpy to do the computation. I will demonstrate two different implementations.

## Background

Just to reiterate, the computation is to calculate the pair-wise distances between every pair of $N$ particles under a periodic boundary condition. The positions of the particles are stored in an array/list of the form [[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]]. The separation between two particles $i$ and $j$ along one dimension is calculated as follows,

$\Delta_{ij} = \sigma_{ij} - \left[ \sigma_{ij}/L \right] \cdot L$

where $\sigma_{ij}=x_i-x_j$, $L$ is the length of the simulation box edge, and $\left[\cdot\right]$ denotes the nearest integer. $x_i$ and $x_j$ are the positions of particles $i$ and $j$ along that dimension. For more information, please read Part I.

## Naive Numpy Implementation

By naive, I mean that we simply treat a numpy array like a normal Python list and use some basic numpy functions to compute quantities such as sums, means, powers, etc. To get to the point, the code is the following,

import numpy as np

def pdist_np_naive(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.

    positions: a numpy array with form np.array([[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]])
    l: the length of edge of box (cubic/square box)
    return: a condensed 1D array
    """
    # determine the number of particles
    n = positions.shape[0]
    # create an empty array storing the computed distances
    pdistances = np.empty(int(n*(n-1)/2.0))
    for i in range(n-1):
        # separations between particle i and all particles after it
        D = positions[i] - positions[i+1:]
        D = D - np.round(D / l) * l
        distance = np.sqrt(np.sum(np.power(D, 2.0), axis=1))
        # start and end of the slice of the condensed array filled by row i
        idx_s = int((2 * n - i - 1) * i / 2.0)
        idx_e = int((2 * n - i - 2) * (i + 1) / 2.0)
        pdistances[idx_s:idx_e] = distance
    return pdistances


### Benchmark

n = 100
positions = np.random.rand(n,2)
%timeit pdist_np_naive(positions,1.0)
2.7 ms ± 376 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The performance is not bad. This is roughly a 4 times speedup compared to the pure Python implementation shown in Part I (it might not be as large as one would expect, since the Python code shown in the previous post is already well optimized). Is there any way we can speed up the calculation further? We know that for loops can be very slow in Python. Hence, eliminating the for loop in the example above might be the right direction. It turns out that we can achieve this by fully utilizing the broadcasting feature of numpy.

To get rid of the loops in the codes above, we need to find some numpy native way to do the same thing. One typical method is to use the broadcasting. Consider the following example,

a = np.array([1,2,3])
b = 4
a + b
>>> array([5,6,7])


This is a simple example of broadcasting. The underlying operation, in this case, is a loop over the elements of a that adds the value of b to each of them. Instead of writing the loop ourselves, we can simply write a+b and numpy will do the rest. The term “broadcasting” refers to the fact that b is stretched to the same dimension as a and the arithmetic operation is then applied element by element. Because broadcasting is implemented in C under the hood, it is much faster than writing the for loop explicitly.

The nature of pair-wise distance computation requires double nested loops which iterate over every possible pair of particles. It turns out that such a task can also be done using broadcasting. Again, I recommend reading their official documentation on broadcasting. The example 4 on that page is a nested loop. Look at the example, shown below

import numpy as np
a = np.array([0.0,10.0,20.0,30.0])
b = np.array([1.0,2.0,3.0])
a[:, np.newaxis] + b
>>> array([[  1.,  2.,  3.],
           [ 11., 12., 13.],
           [ 21., 22., 23.],
           [ 31., 32., 33.]])


Notice that the + operation is applied to every possible pair of elements from a and b. It is equivalent to the code below,

a = np.array([0.0,10.0,20.0,30.0])
b = np.array([1.0,2.0,3.0])
c = np.empty((len(a), len(b)))
for i in range(len(a)):
    for j in range(len(b)):
        c[i,j] = a[i] + b[j]


The broadcasting is much simpler regarding the syntax and faster in many cases (but not all) compared to explicit loops. Let’s look at another example shown below,

a = np.array([[1,2,3],[-2,-3,-4],[3,4,5],[5,6,7],[7,6,5]])
diff = a[:, np.newaxis] - a
print('shape of array [a]:', a.shape)
print('shape of array [diff]:', diff.shape)
>>> shape of array [a]: (5, 3)
>>> shape of array [diff]: (5, 5, 3)


Array a, with shape (5,3), represents 5 particles with coordinates in three dimensions. If we want to compute the differences between every pair of particles along each dimension, a[:, np.newaxis] - a does the job. The quantity a[:, np.newaxis] - a has shape (5,5,3): the first and second dimensions are the particle indices and the third dimension is spatial.
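As a small sanity check of this indexing, the sketch below reuses the same array a and confirms that diff[i, j] holds a[i] - a[j], and that swapping the two particle indices flips the sign:

```python
import numpy as np

a = np.array([[1, 2, 3], [-2, -3, -4], [3, 4, 5], [5, 6, 7], [7, 6, 5]])
diff = a[:, np.newaxis] - a  # shape (5, 5, 3)

# diff[i, j] is the coordinate difference a[i] - a[j]
print(diff[0, 1])                               # [3 5 7]
print(np.array_equal(diff[0, 1], a[0] - a[1]))  # True
print(np.array_equal(diff[1, 0], -diff[0, 1]))  # True
```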

Following this path, we reach the final code to compute the pair-wise distances under periodic boundary condition,

def pdist_np_broadcasting(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.

    positions: numpy array storing the positions of each particle. Shape: (nxdim)
    l: edge size of simulation box
    return: nxn distance matrix
    """
    D = positions[:, np.newaxis] - positions # D is a nxnxdim matrix/array
    D = D - np.around(D / l) * l
    # unlike the pdist_np_naive above, pdistances here is a distance matrix with shape nxn
    pdistances = np.sqrt(np.sum(np.power(D, 2.0), axis=2))
    return pdistances


### Benchmark

n = 100
positions = np.random.rand(n,2)
%timeit pdist_np_broadcasting(positions, 1.0)
1.43 ms ± 649 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


This is about twice as fast as the naive numpy implementation.

pdist_np_broadcasting returns an array with shape (n,n), which can be considered a distance matrix whose element [i,j] is the distance between particles i and j. As you can see, this matrix is symmetric and hence contains duplicated information. There are probably better ways than the one shown here that compute only the upper triangle of the matrix instead of the full one.
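If only the unique pairs are needed, one simple option is to extract the upper triangle after the fact with np.triu_indices. This is just a sketch: it does not avoid computing the full matrix, it only removes the duplicates, and to_condensed is a helper name I made up. The resulting order matches the condensed array returned by pdist_np_naive.

```python
import numpy as np


def pdist_np_broadcasting(positions, l):
    D = positions[:, np.newaxis] - positions
    D = D - np.around(D / l) * l
    return np.sqrt(np.sum(np.power(D, 2.0), axis=2))


def to_condensed(dist_matrix):
    """Keep only the entries above the diagonal, in row-major order."""
    rows, cols = np.triu_indices(dist_matrix.shape[0], k=1)
    return dist_matrix[rows, cols]


positions = np.array([[0.1, 0.1], [0.9, 0.9], [0.5, 0.5]])
full = pdist_np_broadcasting(positions, 1.0)
condensed = to_condensed(full)  # pairs (0,1), (0,2), (1,2)
print(full.shape, condensed.shape)  # (3, 3) (3,)
```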

Now let’s make a final systematic comparison between pdist_np_naive and pdist_np_broadcasting. I benchmark the performance for different values of n and plot the speed as a function of n. The result is shown in the figure below,

The result is surprising. The broadcasting version is faster only when the data size is smaller than 200. For larger data sets, the naive implementation turns out to be faster. What is going on? After googling a little bit, I found these StackOverflow questions 1, 2, 3. It turns out that the problem may lie in memory usage and access. Using the memory-profiler, I can compare the memory usage of the two versions as a function of n (see the figure below). The result shows that pdist_np_broadcasting uses much more memory than pdist_np_naive, which could explain the difference in speed.

The origin of the difference in memory usage is that in the pdist_np_naive version the computation is split into the individual iterations of the for loop, whereas pdist_np_broadcasting performs the computation in one single batch. pdist_np_naive executes D = positions[i] - positions[i+1:] inside the loop, and every single iteration only creates an array D with fewer than n rows. On the other hand, D = positions[:, np.newaxis] - positions and D = D - np.around(D / l) * l in pdist_np_broadcasting create several temporary arrays of size n*n*dim.
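To get a feeling for the numbers, the sizes of these temporaries can be read off directly from nbytes. The values of n and dim below are just an illustration, not the exact setup of the memory-profiler measurement:

```python
import numpy as np

n, dim = 1000, 3
positions = np.random.rand(n, dim)

full = positions[:, np.newaxis] - positions  # one broadcasting temporary, shape (n, n, dim)
row = positions[0] - positions[1:]           # largest per-iteration array in the naive loop

print(full.nbytes)  # 24000000 bytes, about 24 MB for a single temporary
print(row.nbytes)   # 23976 bytes
```

The per-iteration array of the loop version is about a thousand times smaller here, so it is much more likely to stay in the CPU cache.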

## Summing up

First, both numpy implementations shown here lead to a speedup of several times compared to the pure Python implementation. When working with numerical computations, using Numpy will usually give better performance. One counterexample is appending to a list/array, where Python’s append is much faster than numpy’s append.
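The reason for that counterexample is that np.append always allocates a new array and copies the data, whereas list.append grows the list in place. A minimal illustration:

```python
import numpy as np

arr = np.array([1.0, 2.0])
new_arr = np.append(arr, 3.0)  # allocates a new array and copies arr into it
print(arr.size, new_arr.size)  # 2 3

lst = [1.0, 2.0]
lst.append(3.0)                # modifies the existing list in place
print(len(lst))                # 3
```

Calling np.append repeatedly inside a loop therefore copies the whole array at every step.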

Many online tutorials and posts recommend using numpy’s broadcasting feature whenever possible. This is largely correct. However, the example given here shows that the details of how broadcasting is used matter. Numpy’s official documentation states

There are also cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation

pdist_np_broadcasting is one example where broadcasting might hurt performance. I guess the take-home message is that one should not neglect space complexity (memory requirements) when trying to optimize code, and that numpy’s broadcasting is not always a good idea.

In the next post, I will show how to use Numba and Cython to boost the computation speed even more.

# Optimizing python code for computations of pair-wise distances - Part I

Published 2019-09-30 at /posts/python-optimization-using-different-methods/

In this series of posts, several different Python implementations are provided to compute the pair-wise distances under a periodic boundary condition. The performance of each method is benchmarked for comparison. I will investigate the following methods: pure Python, Numpy, Numba and Cython.


## Background

In molecular dynamics simulations or other simulations of a similar type, one of the core computations is computing the pair-wise distances between particles. Suppose we have $N$ particles in our system; the time complexity of computing their pair-wise distances is $O(N^2)$. This is the best we can do when the whole set of pair-wise distances is needed. The good thing is that in actual simulations, in most cases, we don’t care about a distance if it is larger than some threshold. In such a case, the complexity can be greatly reduced to $O(N)$ using a neighbor list algorithm.

In this post, I won’t implement the neighbor list algorithm. I will assume that we do need all the distances to be computed.

If there is no periodic boundary condition, the pair-wise distances can be calculated directly using the built-in Scipy function scipy.spatial.distance.pdist, which is pretty fast. However, with a periodic boundary condition, we need to roll our own implementation. For a simple demonstration without loss of generality, the simulation box is assumed to be cubic with its lower left forward corner at the origin. Such a setup simplifies the computation.

The basic algorithm for calculating the distance under a periodic boundary condition is the following,

$\Delta = \sigma - \left[\sigma/L\right] \cdot L$

where $\sigma = x_i - x_j$ and $L$ is the length of the simulation box edge. $\left[\cdot\right]$ denotes the nearest integer. $x_i$ and $x_j$ are the positions of particles $i$ and $j$ along one dimension. This computes the separation between two particles along one dimension. The full distance is the square root of the sum of $\Delta^2$ over all dimensions.
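As a concrete illustration of the formula, here is a minimal sketch using only the standard library. The function names are made up for this example, and Python's built-in round plays the role of the nearest-integer bracket:

```python
import math


def wrapped_separation(xi, xj, box_length):
    """Separation along one dimension, wrapped into the periodic box."""
    sigma = xi - xj
    return sigma - round(sigma / box_length) * box_length


def periodic_distance(p1, p2, box_length):
    """Square root of the sum of squared wrapped separations over all dimensions."""
    return math.sqrt(sum(wrapped_separation(a, b, box_length) ** 2
                         for a, b in zip(p1, p2)))


# In a box of length 1, particles at 0.9 and 0.1 are about 0.2 apart
# through the boundary, not 0.8 apart.
print(wrapped_separation(0.9, 0.1, 1.0))               # approximately -0.2
print(periodic_distance([0.1, 0.1], [0.9, 0.9], 1.0))  # approximately 0.283
```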

Basic setup:

• All codes shown are using Python version 3.7

• The number of particles is n

• The positions of all particles are stored in a list/array of the form [[x1,y1,z1],[x2,y2,z2],...,[xN,yN,zN]] where [xi,yi,zi] are the coordinates of particle i.

• The length of simulation box edge is l.

• We will use libraries and tools such as numpy, itertools, math, numba, cython.

## Pure Python Implementation

To clarify first, by pure, I mean that only built-in libraries of python are allowed. numpy, scipy or any other third-party libraries are not allowed. Let us first define a function to compute the distance between just two particles.

import math

def distance(p1, p2, l):
    """
    Computes the distance between two points, p1 and p2.

    p1/p2: python lists with form [x1, y1, z1] and [x2, y2, z2], holding the coordinate along each dimension
    l: the length of edge of box (cubic/square box)
    return: a number (distance)
    """
    dim = len(p1)
    # coordinate differences along each dimension
    D = [p1[i] - p2[i] for i in range(dim)]
    # wrap each difference into the periodic box, then sum the squares
    distance = math.sqrt(sum(map(lambda x: (x - round(x / l) * l) ** 2.0, D)))
    return distance


Now we can define the function to iterate over all possible pairs to give the full list of pair-wise distances,

def pdist(positions, l):
    """
    Compute the pair-wise distances between every possible pair of particles.

    positions: a python list in which each element is a list of coordinates
    l: the length of edge of box (cubic/square box)
    return: a condensed 1D list
    """
    n = len(positions)
    pdistances = []
    # loop over all unique pairs (i, j) with i < j
    for i in range(n-1):
        for j in range(i+1, n):
            pdistances.append(distance(positions[i], positions[j], l))
    return pdistances


The function pdist returns a list containing distances of all pairs. Let’s benchmark it!

import numpy as np
n = 100
positions = np.random.rand(n,3).tolist() # convert to python list

%timeit pdist(positions, 1.0)
14.8 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Such speed is sufficient if n is small. In the above example, we already use the built-in map function and a list comprehension to speed up the computation. Can we speed up our code further using only built-in libraries? It turns out that we can. Notice that in the function pdist, there is a nested loop. What that loop does is iterate over all the combinations of particles. Luckily, the built-in module itertools provides a function combinations to do just that. Given a list object lst or another iterable object, itertools.combinations(lst, r=2) generates an iterator over all unique pair-wise combinations of elements from lst without duplicates. For instance, list(itertools.combinations([1,2,3], r=2)) will return [(1,2),(1,3),(2,3)]. Using this function, we can rewrite the pdist function as follows,

import itertools
import math

def pdist_v2(positions, l):
    # itertools.combinations returns an iterator
    all_pairs = itertools.combinations(positions, r=2)
    return [math.sqrt(sum(map(lambda p1, p2: (p1 - p2 - round((p1 - p2) / l) * l) ** 2.0, *pair))) for pair in all_pairs]


Explanation:

• First, we use itertools.combinations() to return an iterator all_pairs over all possible combinations of particles. r=2 means that we only want pair-wise combinations (no triplets, etc.)

• We loop over all_pairs using a list comprehension of the form [do_something(pair) for pair in all_pairs].

• pair is a tuple of the coordinates of two particles, ([xi,yi,zi],[xj,yj,zj]).

• We use *pair to unpack the tuple object pair and then use map and a lambda function to compute the squared separation along each dimension. Inside the lambda, p1 and p2 are the coordinates of the two particles along one dimension.
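The combination of map with *pair is the least obvious part, so here is a tiny stand-alone illustration with made-up coordinates (and without the periodic wrapping):

```python
pair = ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])

# *pair passes the two coordinate lists to map as separate iterables, so the
# lambda receives one coordinate from each particle at a time.
diffs = list(map(lambda p1, p2: p1 - p2, *pair))
print(diffs)  # [-1.0, -2.0, -3.0]
```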

Strictly speaking, itertools.combinations takes an iterable object as an argument and returns an iterator. I recommend reading this article and the official documentation to understand the concepts of iterable/iterator/generator, which are very important for advanced Python programming.
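One practical consequence of getting an iterator rather than a list is that it can only be consumed once, as this short example shows:

```python
import itertools

lst = [1, 2, 3]
pairs = itertools.combinations(lst, r=2)  # an iterator, not a list

first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing is left the second time
print(first_pass)   # [(1, 2), (1, 3), (2, 3)]
print(second_pass)  # []
```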

Now let’s benchmark pdist_v2 and compare it to pdist. To make the comparison systematic, I benchmark the performance for different values of n and plot the speed as a function of n. The result is shown below,

If this is plotted on a log-log scale, one can readily see that both curves scale as $N^2$, which is expected.

## Conclusion

The pdist_v2 implementation is about 38% faster than the pdist version. I guess the take-home message from this result is that replacing explicit for loops with functions like map and itertools can boost performance. However, one needs to make a strategic decision here, as the pdist version with the explicit loop is much more readable and easier to understand, whereas pdist_v2 requires a more or less advanced understanding of Python. Sometimes the readability and maintainability of code are more important than its performance.

In the benchmark code above, we convert the numpy array of positions to a Python list. Since a numpy array can be treated just like a Python list (but not vice versa), we could instead provide the numpy array directly as the argument to both pdist and pdist_v2. However, one can experiment a little bit to see that using the numpy array directly actually slows down the computation a lot (about 5 times slower on my laptop). The message is that mixing numpy arrays with built-in functions such as map or itertools harms performance. Instead, one should always try to use numpy native functions if possible when working with numpy arrays.

In the next post, I will show how to use Numpy to do the same calculation faster than the pure Python implementation shown here.

# Run a random walker in the browser using Pyodide

Published 2019-09-23 at /posts/run-a-random-walker-in-your-website-using-pyodide/

Since I got this site running, I have been wanting to be able to embed some kind of interactive plot in my blog post. For instance, say I want the user to be able to perform some machine learning computations and then visualize the result. Currently, there are a few options to achieve this,

• Pure Javascript solution. Both computation and visualization are performed using javascript. This can happen either on the server or in the browser.

• Combination of python and javascript

• either the computation is done with python but happens on a server,
• or the computation is done with python directly in the browser.
• Any communication with DOM such as visualization is through javascript or API in python.

I have always been amazed by the things people can do with javascript, such as deep learning inside your browser. But I can’t imagine javascript overtaking python in scientific computing in the near future. I, personally, am much more comfortable with python. Besides, the language has a much more mature scientific library ecosystem. Being able to use python for the computation part is essential, which leaves us only with the second option: the python code runs either on a server or directly inside the browser.

Using a server to perform computations means communicating with the server. This can have some drawbacks,

• Depending on the usage, one may need to pay for the server.
• Communication overhead can cause delays in user interaction.

In my experience with Binder, the second point can be a dealbreaker. The solution would be simple: just eliminate the server step! However, for a long time javascript was the only programming language that the browser could interpret directly, so no server means that we need to find some way to run python code directly in the browser. There are quite a few options, such as PyPy.js. However, it was not possible to use Numpy, Pandas and many other scientific/data analysis libraries in the browser until the Pyodide project came out recently. Pyodide allows python code to run inside the browser through WebAssembly. The best thing is that it allows one to use some of the most popular scientific libraries, including Numpy, Matplotlib, Pandas, Scipy and even Scikit-learn, inside the browser! In fact, to my understanding, any python library can in principle be used through Pyodide. I am by no means an expert on how Pyodide works. I suggest reading their blog post and checking out the project’s github repository.

I have been experimenting with Pyodide for a few days. In this post, I would like to give a proof-of-concept demonstration. Since I deal with random walks a lot in my research, I would like to make a simple random walk animation demonstration which

• allows users to specify the number of steps
• calculates the random walk trajectory in the browser on the fly
• animates the generated trajectory of the random walk

In this example, I will use python code to generate the trajectory of a simple 2D random walker and use plotly.js to handle the visualization.

## Python code for 2D random walker

For demonstration purpose, the random walk in this example is simple,

• It is a two-dimensional walk
• At each step, the displacements along the $x$ and $y$ dimensions are independent and drawn from a gaussian distribution with zero mean and unit variance.
• The number of steps is specified beforehand

Here is the python code for generating such a random walk,

The following code can certainly be rewritten in javascript, but the simplicity of python’s syntax and its ecosystem of scientific libraries greatly lower the barrier to writing code for more complex computations compared to other languages (this is just my opinion).

# load numpy library
import numpy as np

# function for generating random walk
# it takes the number of steps as only parameter
def walk(n):
    # check if the number of steps is an integer
    if int(n) != n:
        print('number of steps should be an integer')
        return None
    # the initial position is (0,0)
    xy_0 = np.array([0.0, 0.0])
    # generate displacements of each step
    dxdy = np.random.randn(int(n), 2)
    # cumulative sum displacement to get positions at each step
    xy = xy_0 + np.cumsum(dxdy, axis=0)
    # insert the initial position at the head of the array
    xy = np.vstack((xy_0, xy))
    # since javascript has no 2D array, it is better to
    # return the x-position and y-position, separately
    return xy[:,0], xy[:,1]
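For testing outside the browser, it can be handy to have a reproducible version of the walk. The variant below is only a sketch and is not the code used in the demo: the name walk_seeded and the seed parameter are my own additions, and it uses NumPy's Generator API instead of np.random.randn.

```python
import numpy as np


def walk_seeded(n, seed=0):
    """Hypothetical seeded variant of walk(n), for reproducible trajectories."""
    rng = np.random.default_rng(seed)
    # gaussian displacements with zero mean and unit variance
    steps = rng.standard_normal((int(n), 2))
    # prepend the initial position (0, 0) to the cumulative sum of the steps
    xy = np.vstack((np.zeros(2), np.cumsum(steps, axis=0)))
    return xy[:, 0], xy[:, 1]


x, y = walk_seeded(100)
print(len(x), x[0], y[0])  # 101 0.0 0.0
```

A walk with n steps has n + 1 positions because the starting point is included.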


## Call our python function inside the browser

Now we would like to be able to call this python code in the browser on demand. The browser then does the calculation and gets two arrays which contain the $x$ coordinates and $y$ coordinates. Then we can use plotly.js to animate the trajectory.

For better maintainability, I suggest putting the python code in a github gist and fetching the content on the fly. This has the extra benefit that the python code can be modified without rebuilding the site.

Before I continue, I would like to point out that one of the biggest problems of Pyodide is that it is very large. To use it, the browser needs to download about 24 MB of code, and the Numpy library needs another 8 MB, which leads to a total download size of 32 MB. I want the user to download Pyodide only when they want to. To achieve this, I dynamically load the Pyodide script only when the initialization button is clicked (see the demonstration below).

The python code is called through

gistFetchPromise.then(res => {pyodide.runPython(res)})


where gistFetchPromise is the promise object of fetching the gist content. Note that the python code needs to be passed as a raw string. The pyodide.runPython() function is called to execute the python code. Once it is executed, all the python objects are available in the browser. The defined python function walk can be accessed through pyodide.globals.walk. Here is an example,

// Here is the javascript code
// we assign the python function [walk] to javascript [walk]
let walk = pyodide.globals.walk;
// we can call the function [walk] in javascript
let [x,y] = walk(1000);
// now x and y have values of positions of our random walker


The communication between python and javascript is two-way, meaning that we can access javascript variables/objects/functions in python as well. This notebook has some examples.

Once we get the calculated positions x and y, we can use plotly.js to plot the result. Fortunately, plotly.js provides a relatively simple API for animation. One can also use Bokeh, D3, or any other web visualization tool out there. It is even possible to do the visualization in python directly since Pyodide also works with Matplotlib. However, at this stage, I think it is more straightforward to use a javascript library to handle the visualization since it is designed to manipulate the DOM (HTML) after all.

I don’t want to make this post super long, so I won’t go into the details of the visualization part. The full javascript code we need to load in the page can be found here. The file includes the code for fetching the gist, visualization using plotly.js, Pyodide code and event handlers for buttons.

## Demonstration

Here is the end product! Click the button Initialize Pyodide to download Pyodide and load Numpy. Once the initialization is finished (it can take about 20 seconds or even longer on a slow network. Not good, I know …), the buttons Reset, Start and Pause will become clickable and green. Then enter a step number (or use the default number 100) and hit the Start button to watch the animation of a 2D random walker. Click Pause to pause the animation anytime and Start to resume. Click the Reset button to reset the random walker.

Every time you hit Reset and Start, a new random walk trajectory is generated directly inside your browser. There is no server involved whatsoever!

Since Pyodide uses WebAssembly, older browsers cannot run the demonstration. You can check whether your browser supports WebAssembly. I recommend using the latest version of desktop chrome or firefox for the best experience.

]]>
Speed 2019-09-04T00:00:00+00:00 /posts/speed/ A long time ago, I came upon hyper.js. Before then I had been using the native terminal shipped with macOS and iTerm for a long time. The aesthetics of hyper.js immediately hooked me. I downloaded it and started using it for my daily research tasks. I immediately found it slow. It very frequently gave no response to my input, and even using ls to list a directory could sometimes cause it to freeze for a few seconds. I tried to use it for probably a month and eventually gave up. I went back to iTerm and appreciated its speed, which I had never thought about before. Although I still couldn’t appreciate its design at that time, it is fast enough to not hinder my work in any way.

Two months ago, I noticed that hyper.js had released version 3, which they claimed to be much faster than the previous version. I gave it another try and the new version is indeed much faster than the version I used before. But after several days of usage, I could not ignore the noticeable lag (which is probably around 100 ms). One may argue such a tiny lag is not a big deal, but I find it unbearable if I intend to use it as my main terminal.

The same story goes for Atom. Again, it looks much better than Sublime Text or Vim (Neovim) and has a superb plugin ecosystem. But it is slow, an experience shared by many people. Its new version certainly feels much faster than the version I used one year ago. However, once I installed a few plugins, I started to notice a slowdown when opening a new file, in the response from the linter, etc. And due to its high memory usage, once you have several applications running, it becomes too slow to work on.

Furthermore, my research often requires me to open some large MD simulation trajectory files (200 MB on the small end, usually 1 GB and above), and neither Atom nor Sublime Text is able to handle them. I have to use Vim (actually Neovim) in such cases. I imagine many people who deal with large data files daily will find Vim/Neovim to be their only truly reliable text editor.

Neither Hyper.js nor Atom is a native application. Both are built with the Electron framework, so they are essentially web apps/sites running on your local computer. I do see the appeal of Electron, which gives developers the ability to write cross-platform software/applications using javascript, and maybe it is the future. A good example of an Electron-based application is Visual Studio Code, which I have been using for a while, and it seems to be an application written with performance in mind. I do hope more apps follow this path.

P.S. I just read the terminal latency benchmark by danluu. The data there suggests hyper.js is faster than iTerm (even back in 2017)! However, the benchmark is rather simple compared to the day-to-day use case. I do notice that the memory consumed by hyper.js is much higher than that of other terminals. It could be the reason why I find it frequently freezes (I usually have a bunch of applications - Jupyter Notebook, a bunch of tabs in chrome, PDF reader, VMD, Sublime Text, etc - going on at the same time).

]]>
Customize Netlify CMS preview with Markdown-it and Prism.js 2019-08-26T03:40:47.249+00:00 /posts/customize-netlify-cms-preview/ This site is hosted on Netlify and configured with Netlify CMS. I normally write my posts or other contents on this site using vim or other text editors. However, sometimes it is convenient to be able to edit contents online (in the browser) and a CMS allows me to do just that. I can just log in to https://www.guangshi.io/admin/ on any computer and start editing. In addition, a CMS provides a UI that makes editing easier. A post written and saved in the admin portal is directly committed to GitHub and triggers a rebuild on Netlify. This very post you are reading now is written and published using the Netlify CMS admin portal.

The Netlify CMS provides a preview pane which reflects any editing in real-time. However, the default preview pane does not provide some functionalities I need, such as the ability to render math expression and highlight syntax in code blocks. Fortunately, it provides ways to customize the preview pane. The API registerPreviewTemplate can be used to render customized preview templates. One can provide a React component and the API can use it to render the template. This functionality allows me to incorporate markdown-it and prismjs directly into the preview pane.

In this post, I will demonstrate,

• How to write a simple React component for the post.
• How to use markdown-it and prism.js in the template.
• How to pre-compile the template and use it.

## A simple React component for custom preview

I guess a simple preview template would render a title and the body of the markdown text. Using the variable entry provided by Netlify CMS, the template can be written as the following,

// Netlify CMS exposes two React methods "createClass" and "h"
import htm from 'https://unpkg.com/htm?module';
const html = htm.bind(h);

var Post = createClass({
  render() {
    const entry = this.props.entry;
    const title = entry.getIn(["data", "title"], null);
    const body = entry.getIn(["data", "body"], null);

    // html is a tagged template, so the markup goes between backticks
    return html`
      <body>
        <main>
          <article>
            <h1>${title}</h1>
            <div>${body}</div>
          </article>
        </main>
      </body>
    `;
  }
});


In the example shown above, I use the htm npm module to write JSX-like syntax without the need for compilation at build time. It is also possible to directly use the method h provided by Netlify CMS (an alias for React’s createElement) to write the render template, which is the method given in their official examples.

• this.props.entry is exposed by the CMS. It is an immutable collection containing the collection data defined in config.yml
• entry.getIn(["data", "title"]) and entry.getIn(["data", "body"]) access the collection fields title and body, respectively

## Use markdown-it and prism.js in the template

The problem with the template shown above is that the variable body is just a raw string in markdown syntax which is not processed to be rendered as HTML. Thus, we need a way to parse body and convert it into HTML. To do this, I choose to use markdown-it.

import markdownIt from "markdown-it";
import markdownItKatex from "@iktakahiro/markdown-it-katex";
import Prism from "prismjs";

// customize markdown-it
let options = {
  html: true,
  typographer: true,
  // wrap code blocks in Prism-compatible markup so the CSS theme can style them
  highlight: function (str, lang) {
    var languageString = "language-" + lang;
    if (Prism.languages[lang]) {
      return '<pre class="' + languageString + '"><code class="' + languageString + '">' + Prism.highlight(str, Prism.languages[lang], lang) + '</code></pre>';
    } else {
      return '<pre class="' + languageString + '"><code class="' + languageString + '">' + Prism.util.encode(str) + '</code></pre>';
    }
  }
};

var customMarkdownIt = new markdownIt(options);
// register the KaTeX plugin so that math expressions are rendered
customMarkdownIt.use(markdownItKatex);


The above code demonstrates how to import markdown-it as a module and how to configure it.

• I use markdown-it-katex to enable the rendering of math expressions.
• I use prism.js to perform the syntax highlighting. Note that the highlight part in the options lets prism.js add classes to code blocks, which are then used for CSS styling (hence highlighting)

I recommend using import to load the prism.js module so that babel-plugin-prismjs can bundle all the dependencies. I had trouble getting prism.js to work in the browser using require instead of import.

Now that we have loaded markdown-it, the body can be translated to HTML using,

const bodyRendered = customMarkdownIt.render(entry.getIn(["data", "body"]));


To render bodyRendered, we have to use dangerouslySetInnerHTML, which React provides for inserting a raw HTML string into the DOM. Finally, the code for the template is,

var Post = createClass({
  render() {
    const entry = this.props.entry;
    const title = entry.getIn(["data", "title"], null);
    const body = entry.getIn(["data", "body"], null);
    // convert the markdown body to an HTML string (empty string if body is null)
    const bodyRendered = customMarkdownIt.render(body || '');

    return html`
      <body>
        <main>
          <article>
            <h1>${title}</h1>
            <div dangerouslySetInnerHTML=${{__html: bodyRendered}}></div>
          </article>
        </main>
      </body>
    `;
  }
});

CMS.registerPreviewTemplate('posts', Post);


Note the new line at the end of the code. There, we use the method registerPreviewTemplate to register our template Post to be used for the CMS collection named posts.

## Pre-compile the template

So far, I have shown 1) how to write a simple template for the preview pane and 2) how to use markdown-it and prism.js in the template. However, the code shown above cannot be executed in the browser, since the browser has no access to markdown-it and prismjs, which live in your local node_modules directory. Here enters rollup.js, which looks into the node modules markdown-it and prismjs, takes all the necessary code and bundles it into one big file without any external dependency. In this way, the code can be executed directly inside the browser. To set up rollup.js, we need a config file,

// rollup.config.js
const builtins = require('rollup-plugin-node-builtins');
const commonjs = require('rollup-plugin-commonjs');
const nodeResolve = require('rollup-plugin-node-resolve');
const json = require('rollup-plugin-json');
const babel = require('rollup-plugin-babel');

export default {
  // entry point and output file, as described in the notes below
  input: 'src/admin/preview.js',
  output: {
    file: 'dist/admin/preview.js',
    format: 'esm',
  },
  plugins: [
    nodeResolve({browser:true}),
    commonjs({ignore: ["conditional-runtime-dependency"]}),
    builtins(),
    json(),
    babel({
      "plugins": [
        ["prismjs", {
          "languages": ["javascript", "css", "markup", "python", "clike"]
        }]
      ]
    })
  ]
};

• src/admin/preview.js is the path of the template code
• Setting the format to esm tells rollup.js to bundle the code as an ES module.
• I use babel-plugin-prismjs to handle the dependencies of prism.js.

To perform the bundling, one can either run rollup --config in the terminal if rollup.js is installed globally, or add it as an npm script. The config above tells rollup.js to generate the file dist/admin/preview.js.

To use the template, the final step is to include it as a <script type=module> tag. Add the following in the <head> section in your admin/index.html,

<body>
</body>


## It works!

See this screenshot

Notes on some useful VMD tips 2019-08-18T00:00:00+00:00 /posts/vmd-tips/ In this note, I will accumulate some VMD tips I find useful. The main purpose of this note is my own convenience, but I hope it can be useful for anyone who wanders onto this page.

#### How to generate VMD .psf file from LAMMPS data file

In the VMD console, use the command cd to navigate to the directory where the LAMMPS data file is located. Then, run the following commands

topo readlammpsdata your_data_file bond
animate write psf your_psf_file


If the commands run successfully, you should find your .psf file generated in the directory. To use the .psf file, first load the generated .psf file and then load the trajectory file. You should then be able to use functionalities such as the drawing method, coloring method, etc …

#### Keep a certain molecule fixed in the camera when viewing the trajectory file

Sometimes, we want to follow a certain molecule through the trajectory. However, the targeted molecule may diffuse in the simulation box, making the visualization difficult. We want the camera to stay focused on the molecule of interest. Here is a method to do this.

In Extensions - Analysis - RMSD Visualizer Tool, use the molecule you want to focus on as the Atom Selection. Then run ALIGN. When you watch the trajectory now, the selected molecule stays in the center of the camera.

#### Rendering for publication quality image

To render a publication quality image, follow the good practices below

• For each representation, select a material that is fairly diffuse such as the Diffuse material, or the AO-optimized AOShiny, AOChalky, or AOEdgy materials provided in VMD.
• Enable ambient occlusion lighting in the Display Settings window.
• Set the AO Ambient factor to 1.0, and the AO Direct factor to 0.8 as an initial starting point.
• Render the scene using File - Render - Tachyon or TachyonInternal, or use the render command to do the same.
• Due to the increased computational complexity of rendering the molecule with ambient occlusion lighting, it’s highly recommended to run VMD and Tachyon on a multi-processor or multi-core workstation for best performance.
]]>
Concept illustrations for express.js and axios 2019-07-24T00:00:00+00:00 /posts/express-axios-exercise/ This note is about an exercise in using express.js and axios. First, I create a simple express server, and second, I use axios to make an http call to the server just created.

## Express server

The following is the code for our little express server.

// require express
const express = require('express')
// create an express instance
const app = express()
// specify the port we want to listen to
const port = 3000
// define some data for illustration purposes
const mydata = {a:1,b:2,c:3}

// respond to GET requests on the root path with mydata as JSON
app.get('/', (req, res) => res.json(mydata))

app.listen(port, () => console.log(`Example app listening on port ${port}!`))


Save the above code to a file named myexpress-server.js. Now if you run node myexpress-server.js in your terminal and open http://localhost:3000 in your browser, you should see the values of mydata printed on the screen. We have successfully set up a small express server!

## Use axios to make http call

Now suppose we want to acquire mydata from some external place. We can use axios to make an API call to the express server we just built and get our mydata object. Let’s write our axios code,

// require axios
const axios = require('axios')

// define our axios.get function
const getData = async () => {
  try {
    const mydata = await axios.get('http://localhost:3000')
    console.log(mydata.data)
  } catch (error) {
    console.error(error)
  }
}

// call our function
getData()


Save the above code to a file named myaxios.js. Now if we a) start our express server by doing node myexpress-server.js in the terminal, and b) run our axios code in another terminal window using node myaxios.js, voilà, you can see the data of our mydata object printed on the terminal!

]]>
Use Tikhonov Regularization to Solve Fredholm Integral Equation 2019-04-26T00:00:00+00:00 /posts/fredholm-equation/

## Background

The Fredholm integral equation of the first kind is written as,

$f(x)=\int_{a}^{b}K(x,s)p(s)\mathrm{d}s.$

The problem is to find $p(s)$, given that $f(x)$ and $K(x,s)$ are known. This equation occurs quite often in many different areas.

## Discretization-based Method

Here we describe a discretization-based method to solve the Fredholm integral equation. The integral equation is approximately replaced by a Riemann summation over grids,

$f(x_i)=\sum_j \Delta_s K(x_i, s_j) p(s_j)$

where $\Delta_s$ is the grid size along the dimension $s$ and $x_i$, $s_j$ are the grid points with $i$ and $j$ indicating their indices. When the grid size $\Delta_s\to0$, the summation converges to the true integral.
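
The Riemann-sum discretization can be checked on a case where the integral is known in closed form. The sketch below is a dependency-free illustration, not part of the original implementation: it assumes an exponential kernel $K(x,s)=s e^{-sx}$ and a gamma density with shape $a$ for $p(s)$, for which the integral over $(0,\infty)$ evaluates to $a(x+1)^{-(a+1)}$. The function names, the grid spacing and the cutoff `s_max` are my own choices.

```python
import math

def lomax_pdf(x, a):
    """Closed-form result of the integral: f(x) = a (x + 1)^-(a + 1)."""
    return a * (x + 1.0) ** (-(a + 1.0))

def riemann_f(x, a, delta_s=0.01, s_max=40.0):
    """Approximate f(x) = integral of K(x, s) p(s) ds by a Riemann sum on a uniform grid."""
    m = int(s_max / delta_s)
    total = 0.0
    for j in range(1, m + 1):
        s = j * delta_s
        kernel = s * math.exp(-s * x)                        # K(x, s): exponential density with rate s
        p = s ** (a - 1.0) * math.exp(-s) / math.gamma(a)    # p(s): gamma density with shape a
        total += delta_s * kernel * p
    return total

approx = riemann_f(1.0, 2.0)
exact = lomax_pdf(1.0, 2.0)
print(approx, exact)
```

The upper integration limit is truncated at `s_max`, which is harmless here only because the integrand decays exponentially; for a slowly decaying kernel the cutoff would need more care.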
It is more convenient to write it in the matrix form,

$\boldsymbol{f} = \boldsymbol{\Delta}_s \boldsymbol{K} \boldsymbol{p}$

where

$\boldsymbol{f}=(f(x_1), f(x_2),\cdots,f(x_n))^{\mathrm{T}},$

$\boldsymbol{K}= \begin{pmatrix} K(x_1,s_1) & K(x_1,s_2) & \cdots & K(x_1,s_m) \\ K(x_2,s_1) & K(x_2,s_2) & \cdots & K(x_2,s_m) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_n,s_1) & K(x_n,s_2) & \cdots & K(x_n,s_m) \end{pmatrix}$

$\boldsymbol{p} = (p(s_1),p(s_2),\cdots,p(s_m))^{\mathrm{T}}$

and $\boldsymbol{\Delta}_s = \Delta_s \boldsymbol{I}$ with $\boldsymbol{I}$ being the identity matrix of dimension $n \times n$.

Now solving the Fredholm integral equation is equivalent to solving a system of linear equations. The standard approach is ordinary least squares linear regression, which is to find the vector $\boldsymbol{p}$ minimizing the norm $\vert\vert \boldsymbol{\Delta}_s \boldsymbol{K} \boldsymbol{p}-\boldsymbol{f}\vert\vert_2^2$. In principle, the Fredholm integral equation may have non-unique solutions, so the corresponding system of linear equations is also ill-posed. The most commonly used method for ill-posed problems is Tikhonov regularization, which is to minimize

$\vert\vert\boldsymbol{\Delta}_s \boldsymbol{K} \boldsymbol{p}-\boldsymbol{f}\vert\vert_2^2+\alpha^2\vert\vert\boldsymbol{p}\vert\vert_2^2$

Note that this is actually a special case of Tikhonov regularization (also called Ridge regularization), with $\alpha$ being a parameter.

## When $p(s)$ is a probability density function

In many cases, both $f(x)$ and $p(s)$ are probability density functions (PDF), and $K(x,s)$ is a conditional PDF, equivalent to $K(x\vert s)$. Thus, there are two constraints on the solution $p(s)$, that is $p(s)\geq 0$ and $\int p(s)\mathrm{d}s = 1$. These two constraints translate to $p(s_i)\geq 0$ for any $s_i$ and $\Delta_s\sum_i p(s_i)=1$. Hence, we need to solve the Tikhonov regularization problem subject to these two constraints.
In the following, I will show how to solve the Tikhonov regularization problem with both equality and inequality constraints. First, I will show that the Tikhonov regularization problem with the non-negativity constraint can be easily translated to a regular non-negative least squares problem (NNLS), which can be solved using an active set algorithm.

Let us construct the matrix,

$\boldsymbol{A}= \begin{pmatrix} \Delta_s \boldsymbol{K} \\ \alpha \boldsymbol{I} \end{pmatrix}$

and the vector,

$\boldsymbol{b}= \begin{pmatrix} \boldsymbol{f}\\ \boldsymbol{0} \end{pmatrix}$

where $\boldsymbol{I}$ is the $m\times m$ identity matrix and $\boldsymbol{0}$ is the zero vector of size $m$. It is easy to show that the Tikhonov regularization problem $\mathrm{min}(\vert\vert\boldsymbol{\Delta}_{s} \boldsymbol{K} \boldsymbol{p} - \boldsymbol{f}\vert\vert_{2}^{2}+\alpha^2\vert\vert\boldsymbol{p}\vert\vert_{2}^{2})$ subject to $\boldsymbol{p}\geq 0$ is equivalent to the regular NNLS problem,

$\mathrm{min}(\vert\vert\boldsymbol{A}\boldsymbol{p}-\boldsymbol{b}\vert\vert_2^2),\mathrm{\ subject\ to\ }\boldsymbol{p}\geq 0$

Now we add the equality constraint, $\Delta_s\sum_i p(s_i)=1$, or $\boldsymbol{1}\boldsymbol{p}=1/\Delta_s$ written in matrix form. My implementation of solving such a problem follows the algorithm described in Haskell and Hanson [1]. According to their method, the problem becomes another NNLS problem,

$\mathrm{min}(\vert\vert\boldsymbol{1}\boldsymbol{p}-1/\Delta_s\vert\vert_2^2+\epsilon^2\vert\vert\boldsymbol{A}\boldsymbol{p}-\boldsymbol{b}\vert\vert_2^2),\mathrm{\ subject\ to\ }\boldsymbol{p}\geq 0$

The solution to the above problem converges to the true solution when $\epsilon\to0^+$. I have now described the algorithm to solve the Fredholm equation of the first kind when $p(s)$ is a probability density function. I call the algorithm described above non-negative Tikhonov regularization with equality constraint (NNETR).
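
The claimed equivalence between the Tikhonov objective and the stacked least squares objective can be verified numerically for any trial vector. The following is a minimal sketch with a small made-up matrix (the numbers and the helper names `matvec` and `sq_norm` are purely illustrative assumptions), written with plain Python lists so that the stacking of $\Delta_s\boldsymbol{K}$ on top of $\alpha\boldsymbol{I}$ is explicit.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)

delta_s, alpha = 0.5, 0.3
K = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 3.0]]          # n = 2 rows (x grid), m = 3 columns (s grid)
f = [1.0, 2.0]
p = [0.2, 0.7, 1.1]            # an arbitrary trial solution
m = len(p)

# Tikhonov objective: ||Delta_s K p - f||^2 + alpha^2 ||p||^2
scaled_K = [[delta_s * k for k in row] for row in K]
residual = [r - f_i for r, f_i in zip(matvec(scaled_K, p), f)]
tikhonov = sq_norm(residual) + alpha ** 2 * sq_norm(p)

# Stacked objective: ||A p - b||^2 with A = [Delta_s K; alpha I] and b = [f; 0]
A = scaled_K + [[alpha if i == j else 0.0 for j in range(m)] for i in range(m)]
b = f + [0.0] * m
stacked = sq_norm([r - b_i for r, b_i in zip(matvec(A, p), b)])

print(tikhonov, stacked)
```

The extra $m$ rows of $\boldsymbol{A}$ contribute exactly $\alpha^2\vert\vert\boldsymbol{p}\vert\vert_2^2$ to the residual, which is why a plain NNLS solver can be reused for the regularized problem.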
## Code

Here I show the core code of the algorithm described above.

# core algorithm of non-negative equality Tikhonov regularization (NNETR)
import numpy as np
import scipy.optimize

def NNETR(K, f, Delta, epsilon, alpha):
    # the first step: stack the Tikhonov term below K
    A_nn = np.vstack((K, alpha * np.identity(K.shape[1])))
    b_nn = np.hstack((f, np.zeros(K.shape[1])))

    # the second step: append the equality constraint as an extra row
    A_nne = np.vstack((epsilon * A_nn, np.full(A_nn.shape[1], 1.0)))
    b_nne = np.hstack((epsilon * b_nn, 1.0))

    # Use NNLS solver provided by scipy
    sol, residue = scipy.optimize.nnls(A_nne, b_nne)

    # solution should be divided by Delta (grid size)
    sol = sol/Delta
    return sol, residue


## Examples

• Compounding an exponential distribution with its rate parameter distributed according to a gamma distribution yields a Lomax distribution $f(x)=a(x+1)^{-(a+1)}$, supported on $(0,\infty)$, with $a>0$. $k(x,\theta)=\theta e^{-\theta x}$ is an exponential density and $p(\theta) = \Gamma(a)^{-1}\theta^{a-1}e^{-\theta}$ is a gamma density.
• Compounding a Gaussian distribution with mean distributed according to another Gaussian distribution yields (again) a Gaussian distribution $f(x)=\mathcal{N}(a,b^2+\sigma^2)$. $k(x\vert\mu)=\mathcal{N}(\mu,\sigma^2)$ and $p(\mu)=\mathcal{N}(a,b^2)$
• Compounding an exponential distribution with its rate parameter distributed according to a mixture distribution of two gamma distributions. Similar to the first example, we use $k(x,\theta)=\theta e^{-\theta x}$. But here we use $p(\theta)=q p(\theta\vert a_1)+(1-q)p(\theta\vert a_2)$ where $q$, $a_1$ and $a_2$ are parameters. It is clear that $p(\theta)$ is a mixture of two different gamma distributions such that it is a bimodal distribution. Following the first example, we have $f(x)=qf(x\vert a_1)+(1-q)f(x\vert a_2)$ where $f(x\vert a)=a(x+1)^{-(a+1)}$
• Compounding an exponential distribution with its rate parameter distributed as a discrete distribution.

1. Haskell, Karen H., and Richard J. Hanson. “An algorithm for linear least squares problems with equality and nonnegativity constraints.” Mathematical Programming 21.1 (1981): 98-118.

]]>
Use Multidimensional LSTM Network to Learn Linear and Non-Linear Mapping 2018-02-18T00:00:00+00:00 /posts/mdrnn/ This note is about the effectiveness of using a multidimensional LSTM network to learn matrix operations, such as linear mapping as well as non-linear mapping.

Recently I have been trying to solve a research problem related to mapping between two matrices, and came up with the idea of applying a neural network to the problem. The Recurrent Neural Network (RNN) came to my attention not long after I started searching, since it seems to be able to capture the spatiotemporal dependence and correlation between different elements whereas the convolutional neural network is not able to (I am probably wrong, this is just my very simple understanding and I am not an expert on neural networks). Perhaps the most famous article about RNN online is this blog where Andrej Karpathy demonstrated the effectiveness of using RNN to generate meaningful text content. For whoever is interested in traditional Chinese culture, here is a github repo on using RNN to generate 古诗 (Classical Chinese poetry).

However, the above examples only focus on taking sequential input data and outputting sequential predictions. My problem is learning the mapping between two matrices, which is multidimensional. After researching a little bit, I found that the Multidimensional Recurrent Neural Network can be used here. If you google “Multidimensional Recurrent Neural Network”, the first entry would be this paper by Alex Graves, et al. However, I want to point out that almost exactly the same idea was proposed back in 2003 in the context of protein contact map prediction in this paper.

I have never had any experience using neural networks before. Instead of learning from scratch, I decided that it is probably more efficient to just find an available github repo and study the code from there.
Fortunately, I did find a very good exemplary code here.

The question is: can MDLSTM learn the mapping between two matrices? From basic linear algebra, we know there are two types of mapping: linear maps and non-linear maps. So it is natural to study the problem in two cases. Any linear mapping can be represented by a matrix. For simplicity, I use a random matrix $M$ to represent the linear mapping we want to learn, and apply it to a gaussian field matrix $I$ to produce a new transformed matrix $O$, i.e. $O = M\cdot I$. We feed $I$ and $O$ into our MDLSTM network as our inputs and targets. Since our goal is to predict $O$ given the input $I$, where the values of the elements in $O$ are continuous rather than categorical, we use a linear activation function and the mean square error as our loss function.

import numpy as np

def fft_ind_gen(n):
    a = list(range(0, int(n / 2 + 1)))
    b = list(range(1, int(n / 2)))
    b.reverse()
    b = [-i for i in b]
    return a + b

def gaussian_random_field(pk=lambda k: k ** -3.0, size1=100, size2=100, anisotropy=True):
    def pk2(kx_, ky_):
        if kx_ == 0 and ky_ == 0:
            return 0.0
        if anisotropy:
            if kx_ != 0 and ky_ != 0:
                return 0.0
        return np.sqrt(pk(np.sqrt(kx_ ** 2 + ky_ ** 2)))

    noise = np.fft.fft2(np.random.normal(size=(size1, size2)))
    amplitude = np.zeros((size1, size2))
    for i, kx in enumerate(fft_ind_gen(size1)):
        for j, ky in enumerate(fft_ind_gen(size2)):
            amplitude[i, j] = pk2(kx, ky)
    return np.fft.ifft2(noise * amplitude)

def next_batch_linear_map(bs, h, w, mapping, anisotropy=True):
    # inputs: a batch of gaussian random fields
    x = []
    for i in range(bs):
        o = gaussian_random_field(pk=lambda k: k ** -4.0, size1=h, size2=w, anisotropy=anisotropy).real
        x.append(o)
    x = np.array(x)

    # targets: the linear map applied to each input
    y = []
    for idx, item in enumerate(x):
        y.append(np.dot(mapping, item))
    y = np.array(y)

    # data normalization
    for idx, item in enumerate(x):
        x[idx] = (item - item.mean())/item.std()
    for idx, item in enumerate(y):
        y[idx] = (item - item.mean())/item.std()
    return x, y


Note that we normalize the matrix elements so that their mean equals zero and
variance equals 1. We can visualize the mapping by plotting the matrices

h, w = 10, 10
batch_size = 10
linear_map = np.random.rand(h, w)
batch_x, batch_y = next_batch_linear_map(batch_size, h, w, linear_map)

fig, ax = plt.subplots(1,3)
# show the first sample of the batch, the mapping matrix and the mapped sample
ax[0].imshow(batch_x[0], cmap='jet', interpolation='none')
ax[1].imshow(linear_map, cmap='jet', interpolation='none')
ax[2].imshow(batch_y[0], cmap='jet', interpolation='none')
ax[0].set_title(r'$\mathrm{Input\ Matrix\ }I$')
ax[1].set_title(r'$\mathrm{Linear\ Mapping\ Matrix\ }M$')
ax[2].set_title(r'$\mathrm{Output\ Matrix\ }O$')
ax[0].axis('off')
ax[1].axis('off')
ax[2].axis('off')
plt.tight_layout()
plt.show()

Mapping between matrices

As shown, the matrix $M$ maps $I$ to $O$. Such a transformation is called a linear mapping. I will show that MDLSTM can indeed learn this mapping up to reasonable accuracy. I use the codes here. The following code is the training part

# assumes TensorFlow 1.x, with tf, slim, time and multi_dimensional_rnn_while_loop
# imported as in the referenced repository
anisotropy = False
learning_rate = 0.005
batch_size = 200
h = 10
w = 10
channels = 1
x = tf.placeholder(tf.float32, [batch_size, h, w, channels])
y = tf.placeholder(tf.float32, [batch_size, h, w, channels])

linear_map = np.random.rand(h,w)

hidden_size = 100
rnn_out, _ = multi_dimensional_rnn_while_loop(rnn_size=hidden_size, input_data=x, sh=[1, 1])
# use linear activation function
model_out = slim.fully_connected(inputs=rnn_out, num_outputs=1, activation_fn=None)
# use a little different loss function from the original code
loss = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(y, model_out))))
grad_update = tf.train.AdamOptimizer(learning_rate).minimize(loss)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=False))
sess.run(tf.global_variables_initializer())

# Add tensorboard (Really useful)
train_writer = tf.summary.FileWriter('Tensorboard_out' + '/MDLSTM',sess.graph)

steps = 1000
mypredict_result = []
loss_series = []
for i in range(steps):
    batch = next_batch_linear_map(batch_size, h, w, linear_map, anisotropy)
    st = time()
    # batch is the tuple (inputs, targets); add the channel dimension to each
    batch_x = np.expand_dims(batch[0], axis=3)
    batch_y = np.expand_dims(batch[1], axis=3)

    mypredict, loss_val, _ = sess.run([model_out, loss, grad_update], feed_dict={x: batch_x, y: batch_y})
    mypredict_result.append([batch_x, batch_y, mypredict])
    print('steps = {0} | loss = {1:.3f} | time {2:.3f}'.format(str(i).zfill(3), loss_val, time() - st))
    loss_series.append([i+1, loss_val])


The loss as a function of steps is shown in the figure below. It seems the loss saturates around 70-75. Now let’s see how well our neural network learns. The following figures show five predictions on newly generated random input matrices. The results are pretty good for the purpose of illustration. I am sure there must be some room for improvement.

I choose the square of the matrix, $I^{2}$, as the test for nonlinear mapping.

def next_batch_nonlinear_map(bs, h, w, anisotropy=True):
    x = []
    for i in range(bs):
        o = gaussian_random_field(pk=lambda k: k ** -4.0, size1=h, size2=w, anisotropy=anisotropy).real
        x.append(o)
    x = np.array(x)

    y = []
    for idx, item in enumerate(x):
        y.append(np.dot(item, item)) # only changes here
    y = np.array(y)

    # data normalization
    for idx, item in enumerate(x):
        x[idx] = (item - item.mean())/item.std()
    for idx, item in enumerate(y):
        y[idx] = (item - item.mean())/item.std()
    return x, y


The following images are the loss function and results.

As you can see, the results are not great but very promising.

]]>
Generating Random Walk Using Normal Modes 2017-11-17T00:00:00+00:00 /posts/generate-gaussian/ A long time ago, I wrote about how to use the Pivot algorithm to generate equilibrium conformations of a random walk, either self-avoiding or not. The volume exclusion of a self-avoiding chain makes it non-trivial to generate conformations. A Gaussian chain, on the other hand, is very easy to generate. In addition to the efficient pivot algorithm, in this article, I will show another interesting but less straightforward method to generate gaussian chain conformations.
To illustrate this method, which is used to generate static conformations of a gaussian chain, we first need to consider the dynamics of such a system. It is well known that the dynamics of a gaussian/ideal chain can be modeled by the Brownian motion of beads connected along a chain, which is guaranteed to give the correct equilibrium ensemble. The model is called the “Rouse model”, and it is very well studied. I strongly suggest the book The Theory of Polymer Dynamics by Doi and Edwards to understand the method used here. I also found a useful material here. I will not go through the details of the derivation of the solution of the Rouse model. To make it short, the motion of a gaussian chain is just a linear combination of an infinite number of independent normal modes. Mathematically, that is,

$\mathbf{R}_{n}=\mathbf{X}_{0}+2\sum_{p=1}^{\infty}\mathbf{X}_{p}\cos\big(\frac{p\pi n}{N}\big)$

where $\mathbf{R}_{n}$ is the position of the $n^{th}$ bead and $\mathbf{X}_{p}$ are the normal modes. $\mathbf{X}_{p}$ is the solution of the Langevin equation $\xi_{p}\frac{\partial}{\partial t}\mathbf{X}_{p}=-k_{p}\mathbf{X}_{p}+\mathbf{f}_{p}$. This is a special case of the Ornstein-Uhlenbeck process, and the equilibrium solution of this equation is just a normal distribution with mean $0$ and variance $k_{\mathrm{B}}T/k_{p}$,

$X_{p,\alpha}\sim \mathcal{N}(0,k_{\mathrm{B}}T/k_{p})\quad, \quad\alpha=x,y,z$

where $k_{p}=\frac{6\pi^{2}k_{\mathrm{B}}T}{N b^{2}}p^{2}$, $N$ is the number of beads or number of steps, and $b$ is the Kuhn length.

This suggests that we can first generate the normal modes. Since the normal modes are independent of each other and are just gaussian random numbers, this is very easy and straightforward to do. Then we just transform them to the actual positions of the beads using the first equation, which gives us the conformations. This may seem nontrivial at first glance but should give us the correct result. To test this, let’s implement the algorithm in python.
def generate_gaussian_chain(N, b, pmax):
    # N = number of beads
    # b = kuhn length
    # pmax = maximum p modes used in the summation

    # compute normal modes xpx, xpy and xpz
    xpx = np.asarray([np.random.normal(scale = np.sqrt(N * b**2.0/(6 * np.pi**2.0 * p**2.0))) for p in range(1, pmax+1)])
    xpy = np.asarray([np.random.normal(scale = np.sqrt(N * b**2.0/(6 * np.pi**2.0 * p**2.0))) for p in range(1, pmax+1)])
    xpz = np.asarray([np.random.normal(scale = np.sqrt(N * b**2.0/(6 * np.pi**2.0 * p**2.0))) for p in range(1, pmax+1)])

    # compute cosine terms
    cosarray = np.asarray([np.cos(p * np.pi * np.arange(1, N+1)/N) for p in range(1, pmax+1)])

    # transform normal modes to actual position of beads
    x = 2.0 * np.sum(np.resize(xpx, (len(xpx),1)) * cosarray, axis=0)
    y = 2.0 * np.sum(np.resize(xpy, (len(xpy),1)) * cosarray, axis=0)
    z = 2.0 * np.sum(np.resize(xpz, (len(xpz),1)) * cosarray, axis=0)
    return np.dstack((x,y,z))

Note that there is a parameter called pmax. Although the actual position is a linear combination of an infinite number of normal modes, numerically we must truncate the summation. pmax sets the number of normal modes computed. The code above also uses numpy broadcasting to keep it concise and efficient. Let’s use this code to generate three conformations with different values of pmax and plot them.

# N = 300
# b = 1.0
conf1 = generate_gaussian_chain(300, 1.0, 10) # pmax = 10
conf2 = generate_gaussian_chain(300, 1.0, 100) # pmax = 100
conf3 = generate_gaussian_chain(300, 1.0, 1000) # pmax = 1000

fig = plt.figure(figsize=(15,5))
# matplotlib codes here
plt.show()

polymer conformations

The three plots show the conformations with $p_{\mathrm{max}}=10$, $p_{\mathrm{max}}=100$ and $p_{\mathrm{max}}=1000$, with $N=300$ and $b=1$. As clearly shown here, a larger number of modes gives a more accurate result.
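How much the truncation costs can also be estimated without any sampling. The following is a minimal sketch, assuming the chain ends sit at $n=0$ and $n=N$ in the mode expansion (the function above indexes beads from 1 to $N$, so its numbers will differ slightly). Under that assumption only odd modes contribute to the end-to-end vector, $\mathbf{R}_{N}-\mathbf{R}_{0}=-4\sum_{p\ \mathrm{odd}}\mathbf{X}_{p}$, and summing the mode variances gives $\frac{8Nb^{2}}{\pi^{2}}\sum_{p\ \mathrm{odd}}^{p_{\mathrm{max}}}p^{-2}$, which tends to $Nb^{2}$ as $p_{\mathrm{max}}\to\infty$. The helper name `truncated_ree2` is made up for this illustration.

```python
import numpy as np

def truncated_ree2(N, b, pmax):
    """Mean-square end-to-end distance when only modes p = 1..pmax are kept.

    Assumption: the chain ends are n = 0 and n = N, so that
    R_N - R_0 = -4 * (sum of X_p over odd p), and each Cartesian component
    of X_p has variance N * b**2 / (6 * pi**2 * p**2).
    """
    p = np.arange(1, pmax + 1, 2)                    # odd modes only
    var_component = N * b**2 / (6.0 * np.pi**2 * p**2)
    return 3 * 16 * np.sum(var_component)            # 3 components, (-4)**2 prefactor

# fraction of the exact value N * b**2 recovered for each truncation
for pmax in (10, 100, 1000):
    print(pmax, truncated_ree2(300, 1.0, pmax) / 300.0)
```

Under this assumption, even a small number of modes recovers most of the end-to-end distance; what is lost with a small pmax is mainly the short-wavelength structure of the chain.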
The normal modes with small p correspond to the low-frequency motion of the chain, so with a small pmax we are only considering the low-frequency modes. The conformation generated can be considered a somewhat coarse-grained representation of a Gaussian chain. The larger pmax is, the more high-frequency normal modes are included, leading to a more detailed structure. The coarse-graining can be seen vividly in the above figure going from right to left (large pmax to small pmax). To test that our conformations are indeed Gaussian chains, we compute the mean-square end-to-end distance and check whether we get the correct Flory scaling ($\langle R_{ee}^{2}\rangle = b^{2}N$).

The famous Flory scaling

As shown in the above plot, we indeed get the correct scaling result, $\langle R_{ee}^{2}\rangle = b^{2}N$. When using this method, care should be taken in setting the parameter pmax, which is the number of normal modes computed. This number should be large enough to ensure a correct result. The longer the chain is, the larger pmax should be.
]]>
Simulating Brownian Dynamics (overdamped Langevin Dynamics) using LAMMPS 2017-11-06T00:00:00+00:00 /posts/simulating-brownian/ This article was originally posted on my old Wordpress blog here. LAMMPS is a very powerful Molecular Dynamics simulation software I use in my daily research. In our research group, we mainly run Langevin Dynamics (LD) or Brownian Dynamics (BD) simulations. However, for some reason, LAMMPS doesn’t provide a way to do BD simulations. Both LD and BD can be used to sample the correct canonical ensemble, which is sometimes also called the NVT ensemble. BD is the large-friction limit of LD, where the inertia is neglected. Thus BD is also called overdamped Langevin Dynamics. It is very important to know the difference between LD and BD, since many people seem to use these two terms interchangeably, which is simply not correct.
The equation of motion of LD is,

$m \ddot{\mathbf{x}} = -\nabla U(\mathbf{x}) - m\gamma \dot{\mathbf{x}}+\mathbf{R}(t)$

where $m$ is the mass of the particle, $\mathbf{x}$ is its position and $\gamma$ is the damping constant. $\mathbf{R}(t)$ is the random force, which is subject to the fluctuation-dissipation theorem, $\langle \mathbf{R}(0)\cdot\mathbf{R}(t) \rangle = 2m\gamma\delta(t)/\beta$. Here $\gamma=\xi/m$, where $\xi$ is the drag coefficient. $\mathbf{R}(t)$ is nowhere differentiable, and its integral is called a Wiener process. Denote the Wiener process associated with $\mathbf{R}(t)$ as $\omega(t)$. It has the property $\omega(t+\Delta t)-\omega(t)=\sqrt{\Delta t}\theta$, where $\theta$ is a Gaussian random variable with zero mean and variance $2m\gamma/\beta$.

$\langle \theta \rangle = 0\quad\quad\langle \theta^{2}\rangle = 2m\gamma/\beta$

The fix langevin command provided in LAMMPS is the numerical simulation of the above equation. LAMMPS uses a very simple integration scheme: the Velocity-Verlet algorithm, where the force on a particle includes the friction drag term and the noise term. Since it is only a first-order algorithm in terms of the random noise, it cannot be used in the large-friction case. Thus the langevin fix in LAMMPS is mainly used as a thermostat, that is, a way to control the temperature in the simulation while sampling the conformation space. However, in many cases we want to study the dynamics of a system realistically, in a regime where friction is much larger than inertia. For that we need to do a BD simulation.

For an overdamped system, $\gamma=\xi/m$ is very large. Taking the limit $\gamma=\xi/m\to\infty$, the bath becomes infinitely dissipative (overdamped), and we can neglect the inertial term on the left side of the LD equation. Thus for BD, the equation of motion becomes

$\dot{\mathbf{x}}=-\frac{1}{\gamma m}\nabla U(\mathbf{x})+\frac{1}{\gamma m}\mathbf{R}(t)$

The first-order integration scheme for the above equation is called the Euler-Maruyama algorithm, given as

$\mathbf{x}(t+\Delta t)-\mathbf{x}(t)=-\frac{\Delta t}{m\gamma}\nabla U(\mathbf{x})+\sqrt{\frac{2\Delta t}{m\gamma\beta}}\omega(t)$

where $\omega(t)$ is a normal random variable with zero mean and unit variance. Since the velocities are no longer well defined in BD, only the positions are updated. The implementation of this scheme in LAMMPS is straightforward. Based on the source files fix_langevin.cpp and fix_langevin.h in LAMMPS, I wrote a custom fix for BD myself. The core part of the code is the following. The whole code is here.

void FixBD::initial_integrate(int vflag)
{
  double dtfm;
  double randf;

  // update x of atoms in group

  double **x = atom->x;
  double **f = atom->f;
  double *rmass = atom->rmass;
  double *mass = atom->mass;
  int *type = atom->type;
  int *mask = atom->mask;
  int nlocal = atom->nlocal;
  if (igroup == atom->firstgroup) nlocal = atom->nfirst;

  if (rmass) {
    for (int i = 0; i < nlocal; i++)
      if (mask[i] & groupbit) {
        dtfm = dtf / rmass[i];
        randf = sqrt(rmass[i]) * gfactor;
        x[i][0] += dtv * dtfm * (f[i][0]+randf*random->gaussian());
        x[i][1] += dtv * dtfm * (f[i][1]+randf*random->gaussian());
        x[i][2] += dtv * dtfm * (f[i][2]+randf*random->gaussian());
      }

  } else {
    for (int i = 0; i < nlocal; i++)
      if (mask[i] & groupbit) {
        dtfm = dtf / mass[type[i]];
        randf = sqrt(mass[type[i]]) * gfactor;
        x[i][0] += dtv * dtfm * (f[i][0]+randf*random->gaussian());
        x[i][1] += dtv * dtfm * (f[i][1]+randf*random->gaussian());
        x[i][2] += dtv * dtfm * (f[i][2]+randf*random->gaussian());
      }
  }
}


As the code above shows, the implementation of the integration scheme is simple. dtv is the time step $\Delta t$ used, dtfm is $1/(\gamma m)$ and randf is $\sqrt{2m\gamma/(\Delta t\beta)}$.
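For readers who want to experiment with the update rule outside LAMMPS, here is a minimal NumPy sketch of the same Euler-Maruyama step. It is not the LAMMPS fix: the function name `euler_maruyama`, its arguments and the harmonic-trap example are assumptions made for illustration, with `xi` standing for $m\gamma$ and `kT` for $1/\beta$.

```python
import numpy as np

def euler_maruyama(x0, force, dt, xi, kT, n_steps, rng):
    """Overdamped Langevin integration of positions only.

    x(t+dt) = x(t) + (dt/xi) * F(x) + sqrt(2*kT*dt/xi) * N(0, 1)
    where xi = m * gamma is the drag coefficient and F = -grad U.
    """
    x = np.array(x0, dtype=float)              # work on a float copy
    noise_amp = np.sqrt(2.0 * kT * dt / xi)
    for _ in range(n_steps):
        x += (dt / xi) * force(x) + noise_amp * rng.standard_normal(x.shape)
    return x

# Example (hypothetical parameters): one particle in a harmonic trap U = 0.5*k*x^2
k = 1.0
rng = np.random.default_rng(0)
x_end = euler_maruyama(np.zeros(3), lambda x: -k * x,
                       dt=0.01, xi=1.0, kT=1.0, n_steps=1000, rng=rng)
print(x_end)
```

Setting `kT=0` switches the noise off, which gives a quick sanity check: in the harmonic trap the position should then decay by a factor $(1-k\Delta t/\xi)$ per step.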

The Euler-Maruyama scheme is a simple first-order algorithm. Many studies have been done on higher-order integration schemes that allow a larger time step to be used. I also implemented a method shown in this paper. The integration scheme, called BAOAB, is very simple and is given as

$\mathbf{x}(t+\Delta t)-\mathbf{x}(t)=-\frac{\Delta t}{m\gamma}\nabla U(\mathbf{x})+\sqrt{\frac{\Delta t}{2m\gamma\beta}}(\omega(t+\Delta t)+\omega(t))$

The source code of this method can be downloaded here. In addition, feel free to fork my Github repository for fix bd and fix bd/baoab. I have done some tests and have been using this code in my research for a while without finding any problems. But please test the code yourself if you intend to use it, and any feedback is welcome if you find a problem.
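The only difference from Euler-Maruyama in the BAOAB equation above is that each step uses the average of two consecutive noise terms, so every random draw is used twice. A minimal NumPy sketch of that bookkeeping follows; it is not the code of fix bd/baoab, and the function name `bd_baoab` and the choice of passing the noise in as an array are assumptions made so that the example is easy to check.

```python
import numpy as np

def bd_baoab(x0, force, dt, xi, kT, noises):
    """Position update with noise averaged over consecutive steps.

    noises has shape (n_steps + 1, *x0.shape); noises[i] plays the role of
    omega(t_i), so each draw enters two successive updates.
    xi = m * gamma is the drag coefficient, kT = 1 / beta.
    """
    x = np.array(x0, dtype=float)
    amp = np.sqrt(kT * dt / (2.0 * xi))
    for w_old, w_new in zip(noises[:-1], noises[1:]):
        x += (dt / xi) * force(x) + amp * (w_new + w_old)
    return x

# Example (hypothetical parameters): free diffusion of one particle in 2D
rng = np.random.default_rng(0)
noises = rng.standard_normal((6, 2))           # 5 steps need 6 noise vectors
x_end = bd_baoab(np.zeros(2), lambda x: np.zeros_like(x),
                 dt=0.01, xi=1.0, kT=1.0, noises=noises)
print(x_end)
```

In a production integrator one would store only the previous noise vector per particle rather than the whole array.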

To decide whether to use LD or BD in a simulation, one needs to compare the relevant timescales. Consider a free particle governed by the Langevin equation. Solving for the velocity autocorrelation function leads to $\langle v(0)v(t)\rangle=(kT/m)e^{-\gamma t}$. This shows that the relaxation time for momentum is $\tau_m = 1/\gamma=m/\xi$. There is another timescale, called the Brownian timescale, given by $\tau_{BD}=\sigma^2\xi/kT$, where $\sigma$ is the size of the particle. $\tau_{BD}$ is the timescale on which the particle diffuses over about its own size. If $\tau_{BD}\gg \tau_m$ and you are not interested in the dynamics on the timescale $\tau_m$, then you can use BD, since the momentum can be safely integrated out. However, if these two timescales are comparable or $\tau_{BD}<\tau_m$, then only LD can be used, because the momentum cannot be neglected in this case. To complicate matters, most simulations involve more than just these two timescales, such as the relaxation time of bond vibrations. Fortunately, in practice, comparing these two timescales is good enough in many cases.
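The comparison is simple arithmetic, sketched below. The function name `ld_bd_timescales` and the default threshold `ratio=100` (how much larger $\tau_{BD}$ must be before it counts as "much larger") are assumptions for illustration; the text above does not fix a number.

```python
def ld_bd_timescales(m, xi, sigma, kT, ratio=100.0):
    """Return (tau_m, tau_BD, bd_ok) for a particle of mass m and size sigma.

    tau_m  = m / xi            momentum relaxation time
    tau_BD = sigma**2 * xi/kT  time to diffuse about one particle size
    bd_ok is True when tau_BD >= ratio * tau_m (ratio is an arbitrary choice).
    """
    tau_m = m / xi
    tau_bd = sigma**2 * xi / kT
    return tau_m, tau_bd, tau_bd >= ratio * tau_m

# Example in reduced units (hypothetical values)
print(ld_bd_timescales(m=1.0, xi=100.0, sigma=1.0, kT=1.0))   # strongly overdamped
print(ld_bd_timescales(m=1.0, xi=1.0, sigma=1.0, kT=1.0))     # comparable timescales
```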

]]>
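A Guide to Hiking and Camping in Big Bend 2017-09-26T00:00:00+00:00 /posts/big-bend/ May is not the best season for camping in Big Bend. At the hottest time of day the desert can reach 40°C, and water is scarce, so few people choose to visit Big Bend in summer. After moving to Austin, I came across a photo taken from the South Rim after sunset: a purple sky, with the Chihuahuan Desert stretching endlessly below. After seeing that photo I made up my mind to go there. Unfortunately there was never a suitable time, and campsites are hard to book in the best season for Big Bend (winter), so more than a year went by without us making the trip. When the semester ended in May, we decided to just go. Looking back, the two of us are glad we took this somewhat hastily prepared trip; otherwise who knows when we would have made it.

Big Bend NP lies on the border between Texas and Mexico, which is marked by a river called the Rio Grande, and the whole park is part of the Chihuahuan Desert. Because it is so remote (the drive from Austin takes 8 hours, and even San Antonio, the nearest big city, is 6 hours away), it is one of the least visited national parks in the US (it ranks 42nd in visitors among the 59 national parks). The most popular activities here are hiking and camping. The park’s trails total 180 miles (290 km), and because visitors are few, camping and hiking here make for a very memorable experience. Big Bend is also famous for its night sky: it has the darkest nights in the US (the surroundings are so empty that there is almost no light pollution). Sadly, we did not get to properly enjoy the Milky Way here. If you get the chance, I recommend staying up past midnight to see it. Here is the park’s official page on stargazing.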

## Accommodation

Accommodation inside Big Bend falls into two types: camping and lodging.

## Our gear:

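1. Shelter
• Sleeping bags: the coldest summer nights are about 17-18°C, so a low-temperature sleeping bag is not needed. We bought the cheapest model from REI.
• Tent: a two-person tent
• Tarp: placed under the sleeping pad as an extra layer of protection in case it rains
2. Food
• Energy bars, nuts, beef jerky: things to eat while walking, to replenish sugar and energy
• Freeze-dried food: highly recommended. It was originally designed for astronauts and later made its way into the outdoor market. We bought the Mountain House brand. First, it tastes much better than expected. Second, it is very convenient: you only need to soak it in boiling water for 15 minutes. It is not expensive either; a large pouch costs about $8 and is enough for one meal for two.
• 16 L of water in total: this is the most important item. Big Bend is short of water, and the park’s official recommendation is at least one gallon of water per person per day. Our plan was 3 days and 2 nights, so we prepared 4 gallons of water by the minimum standard. This turned out to be too little, as discussed later. If it has rained recently, there may be streams in the mountains. However, this water is precious, because wildlife depends on it to survive. It may also be very dirty and cannot be drunk directly; it needs to be filtered.
• Water filtration gear: we brought disinfection tablets and a physical filter (sold at any outdoor shop). We found several small pools next to our first-day campsite, but because the water was stagnant it had turned yellow, and filtering did not remove the color, so in the end we could not bring ourselves to drink it.
• Water bottles and water bladders: bring few bottles to reduce weight and bulk, and use 2 L or 3 L bladders instead
3. Other gear
• Flashlight: a headlamp also works
• Folding knife: generally not needed
• Insect repellent: strongly recommended. There are a great many flies in the mountains; they bite, and the bites itch. Of course this comes down to luck: my wife was completely fine, while I was bitten half to death…
• Portable camping stove: for boiling water and cooking
• First-aid kit: cold medicine, fever medicine, heatstroke remedies, antibiotics, band-aids, iodine, cotton balls, etc.
• Hat: strongly recommended. I felt it noticeably reduced the rate of water loss.
• Trekking poles: if your knees are not in great shape, it is best to bring them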

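## Things to note:

1. Be sure to bring enough water! Plan on at least one gallon per person per day. You can start drinking more water a day or two before setting off, so that your body stores a bit more.
2. Because of the previous point, the backpack will be very heavy, which is also the biggest difference between backpacking in Big Bend and elsewhere. My wife and I carried 16 L of water in total, which is already 16 kg; with everything else, the two of us carried about 40 kg. You need to pare things down: leave behind whatever you can, and bring less of whatever you can. The other approach is the gear itself: sleeping bags, tents and sleeping pads can be very light, though of course the price… So in the end you have to weigh your own fitness and budget to decide which gear to buy and what to bring.

## Day 3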

How to Access iPython Notebook On A Remote Server ? 2017-02-18T00:00:00+00:00 /posts/automated-ipython/ My research work involves a lot of IPython Notebook use. I usually work on an office Mac, but I also often need to access it from home. After a brief search, I found these three wonderful articles on this topic.

I have been doing this for a while, but it eventually occurred to me how nice it would be to automate it. So I wrote this Python script to carry out the procedures described in those three articles. I am sure there is a more elegant way to do this, but this is what I have so far and it works.

import paramiko
import sys
import subprocess
import socket
import argparse

# function to get available port
def get_free_port():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(('localhost',0))
    s.listen(1)
    port = s.getsockname()[1]  # getsockname() returns (address, port)
    s.close()
    return port

# print out the output from paramiko SSH connection
def print_output(output):
    for line in output:
        print(line)

parser = argparse.ArgumentParser(description='Locally open IPython Notebook on remote server\n')
parser.add_argument('--terminate', action='store_true',
                    help='terminate the IPython notebook on remote server')
args = parser.parse_args()

host="***" # host name

# write a temporary python script to upload to server to execute
# this python script will get available port number

def temp():
    with open('free_port_tmp.py', 'w') as f:
        f.write('import socket\nimport sys\n')
        f.write('def get_free_port():\n')
        f.write('    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n')
        f.write("    s.bind(('localhost', 0))\n")
        f.write('    s.listen(1)\n')
        f.write('    port = s.getsockname()[1]\n')
        f.write('    s.close()\n')
        f.write('    return port\n')
        f.write("sys.stdout.write('{}'.format(get_free_port()))\n")
        f.write('sys.stdout.flush()\n')

def connect():
    # create SSH client
    # (assumes key-based authentication is already set up for this host)
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.connect(host)

    # generate the temp file and upload to server
    temp()
    ftpClient = client.open_sftp()
    ftpClient.put('free_port_tmp.py', "/tmp/free_port_tmp.py")

    # execute python script on remote server to get available port id
    stdin, stdout, stderr = client.exec_command("python /tmp/free_port_tmp.py")
    stderr_lines = stderr.readlines()
    print_output(stderr_lines)

    # the remote script writes the port number to stdout
    port_remote = int(stdout.readlines()[0])
    print('REMOTE IPYTHON NOTEBOOK FORWARDING PORT: {}\n'.format(port_remote))

    ipython_remote_command = "source ~/.zshrc;tmux \
    new-session -d -s remote_ipython_session 'ipython notebook \
    --no-browser --port={}'".format(port_remote)

    stdin, stdout, stderr = client.exec_command(ipython_remote_command)
    stderr_lines = stderr.readlines()

    if len(stderr_lines) != 0:
        if any('duplicate session: remote_ipython_session' in line for line in stderr_lines):
            print("ERROR: \"duplicate session: remote_ipython_session already exists\"\n")
            sys.exit(0)

    print_output(stderr_lines)

    # delete the temp files on local machine and server
    subprocess.run('rm -rf free_port_tmp.py', shell=True)
    client.exec_command('rm -rf /tmp/free_port_tmp.py')

    client.close()

    port_local = int(get_free_port())
    print('LOCAL SSH TUNNELING PORT: {}\n'.format(port_local))

    ipython_local_command = "ssh -N -f -L localhost:{}:localhost:{} \
    gs27722@wel-145-31.cm.utexas.edu".format(port_local, port_remote)

    subprocess.run(ipython_local_command, shell=True)

def close():
    # create SSH client
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.connect(host)
    stdin, stdout, stderr = client.exec_command("source ~/.zshrc;tmux kill-session -t remote_ipython_session")
    stderr_lines = stderr.readlines()
    if len(stderr_lines) == 0:
        print('Successfully terminate the IPython notebook\n')
    else:
        print_output(stderr_lines)
    client.close()

if args.terminate:
    close()
else:
    connect()


This script does the following:

1. Connect to the server using the Python package paramiko.
2. Upload a temporary Python script and use paramiko to execute it. This script finds an available port on the remote machine.
3. Open IPython Notebook on the port obtained in the previous step. I used tmux to do this, and my shell is zsh. You can modify that part of the code based on your situation.
4. On the local machine, find an available port and create an SSH tunnel that forwards the port on the remote machine to the local machine.
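Steps 2 and 4 both rely on the same trick: binding a socket to port 0 makes the operating system pick an unused port, which can then be read back from the socket. A minimal standalone sketch of that helper is shown below. It uses a `with` block so the socket is always closed; the `host` parameter is an addition for illustration.

```python
import socket

def get_free_port(host="localhost"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        # getsockname() returns an (address, port) pair for AF_INET sockets
        return s.getsockname()[1]

print(get_free_port())
```

Note that the port is only known to be free at the moment of the call. Another process could take it before the notebook or the SSH tunnel binds to it, so this is a convenience rather than a guarantee.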

If the script runs successfully, you will see something like this.

If you want to check whether IPython Notebook really runs on the remote machine, use the command tmux ls there. A tmux session named remote_ipython_session should exist.

In a browser, open http://localhost:50979. You should be able to access your IPython notebook. To terminate the IPython notebook on the remote machine, simply run the script again with the terminate option.

Understanding LAMMPS Source Codes: A Study Note 2016-04-16T00:00:00+00:00 /posts/learn-lammps/ This is a note about learning the LAMMPS source code. It focuses on the compute style of LAMMPS, which is used to compute certain quantities during the simulation run. Of course you can also compute these quantities in post-processing, but it is usually faster to do it during the simulation, since you can take advantage of all the distances, forces, etc. generated during the simulation instead of computing them again afterwards. I will go through the LAMMPS source files compute_gyration.h and compute_gyration.cpp. I am not very familiar with C++, so I will also explain some language-related details that I learned while studying the code. I hope this article is helpful to anyone who wants to modify or write their own LAMMPS compute style.

## compute_gyration.h

#ifdef COMPUTE_CLASS

ComputeStyle(gyration,ComputeGyration)

#else

#ifndef LMP_COMPUTE_GYRATION_H
#define LMP_COMPUTE_GYRATION_H

#include "compute.h"

namespace LAMMPS_NS {

class ComputeGyration : public Compute {
public:
ComputeGyration(class LAMMPS *, int, char **);
~ComputeGyration();
void init();
double compute_scalar();
void compute_vector();

private:
double masstotal;
};

}

#endif
#endif


First part of this code

#ifdef COMPUTE_CLASS

ComputeStyle(gyration,ComputeGyration)

#else


is where this specific compute style is registered. If you want to write your own compute style, say for the intermediate scattering function, you would write

#ifdef COMPUTE_CLASS

ComputeStyle(isf,ComputeISF)  // ISF stands for intermediate scattering function

#else


Now move on to the rest. #include "compute.h" and namespace LAMMPS_NS include the base class and the namespace. Nothing special here; you need these in every compute style header file.

class ComputeGyration : public Compute {
public:
ComputeGyration(class LAMMPS *, int, char **);
~ComputeGyration();
void init();
double compute_scalar();
void compute_vector();

private:
double masstotal;


You can see there is an overall structure class A : public B in the above code. This basically means that our derived class A will inherit all the public and protected members of class B. More details can be found here, here and here

Next, we declare two types of members of our derived class, public and private. public members are those that other code can access, and private members are only used within the class itself. Now let’s look at the public class members. Note that there is no return type declared for ComputeGyration and ~ComputeGyration. They are the class constructor and the class destructor. The constructor is usually used to set up the initial values of certain member variables, as we will see later in compute_gyration.cpp. Note that for some compute styles such as compute_msd.h, the destructor is virtual, that is, virtual ~ComputeMSD instead of just ~ComputeMSD. This is because the class ComputeMSD is also inherited by the derived class ComputeMSDNonGauss, so the base destructor needs to be declared virtual. Look at this page for more details. Now let’s move forward.

  void init();
double compute_scalar();
void compute_vector();


Here the functions init, compute_scalar and compute_vector are all base class members which are already declared in compute.h. However, they are all virtual functions, which means that they can be overridden in the derived class, here ComputeGyration. This and this pages provide some basic explanations of the use of virtual functions. Here is a list shown in LAMMPS documentation of some examples of the virtual functions you can use in your derived class.

In our case, the gyration computation returns a scalar and a vector, so we need compute_scalar() and compute_vector(). The private member masstotal is a quantity calculated locally which is only used within the class and not needed by the rest of the code.

## compute_gyration.cpp

Now let’s look at the compute_gyration.cpp.

#include <math.h>
#include "compute_gyration.h"
#include "update.h"
#include "atom.h"
#include "group.h"
#include "domain.h"
#include "error.h"


Here the necessary header files are included. The names of these header files are self-explanatory. For instance, update.h declares the functions that update the timestep, among other things.

ComputeGyration::ComputeGyration(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg != 3) error->all(FLERR,"Illegal compute gyration command");

scalar_flag = vector_flag = 1;
size_vector = 6;
extscalar = 0;
extvector = 0;

vector = new double[6];
}


The above code defines what the constructor ComputeGyration actually does. :: is called the scope operator. It specifies that the function being defined is a member of the class ComputeGyration (in our case the constructor, which has the same name as its class) and not a regular non-member function. The structure ComputeGyration(...) : Compute(lmp, narg, arg) is called a member initializer list. It initializes the base class Compute with the arguments lmp, narg and arg. narg is the number of arguments provided. scalar_flag, vector_flag, size_vector, extscalar and extvector are all flags defined in compute.h. For instance, scalar_flag = 1/0 indicates that we will/won’t use the function compute_scalar() in our derived class. The meaning of each parameter is explained in compute.h. The line vector = new double[6] dynamically allocates the memory for an array of length 6. Normally the syntax of the new operator is

double *vector = NULL;
vector = new double[6];


Here the line double *vector = NULL is actually split between compute.h and compute.cpp: the pointer vector is declared in compute.h and its value is set to NULL in compute.cpp.

ComputeGyration::~ComputeGyration()
{
delete [] vector;
}


The above code specifies the destructor, that is, what will be executed when an object of class ComputeGyration goes out of scope or is deleted. In this case, it deletes the gyration tensor array vector allocated above. The syntax of the delete operator for arrays is delete [] vector. Details of new and delete can be found here.

void ComputeGyration::init()
{
masstotal = group->mass(igroup);
}


This part performs one-time setup such as initialization. The operator -> is just syntactic sugar: ptr->member is equivalent to (*ptr).member. group->mass(igroup) calls the member function mass() of the class group with the group ID and returns the total mass of this group. How the value of igroup is set can be examined in compute.cpp. It comes from the second argument of the compute command.

double ComputeGyration::compute_scalar()
{
invoked_scalar = update->ntimestep;

double xcm[3];
group->xcm(igroup,masstotal,xcm);
scalar = group->gyration(igroup,masstotal,xcm);
return scalar;
}


invoked_scalar is defined in the base class Compute. Its value is the last timestep on which compute_scalar() was invoked. ntimestep is a member of the class update and holds the current timestep. The xcm function of the class group calculates the center-of-mass coordinates and stores the result in xcm. The gyration function calculates the radius of gyration of a group given the total mass and center of mass of the group. The total mass is calculated in init(), and in order for it to be accessible here, it is declared as a private member in compute_gyration.h. Notice that there is no explicit code here to calculate the gyration scalar, because the member function which does this job is already defined in the class group, so we just need to call it. However, we also want to calculate the gyration tensor, and for that we need to write a function ourselves.

void ComputeGyration::compute_vector()
{
  invoked_vector = update->ntimestep;

  double xcm[3];
  group->xcm(igroup,masstotal,xcm);

  double **x = atom->x;
  int *mask = atom->mask;
  int *type = atom->type;
  imageint *image = atom->image;
  double *mass = atom->mass;
  double *rmass = atom->rmass;
  int nlocal = atom->nlocal;

  double dx,dy,dz,massone;
  double unwrap[3];

  double rg[6];
  rg[0] = rg[1] = rg[2] = rg[3] = rg[4] = rg[5] = 0.0;

  for (int i = 0; i < nlocal; i++)
    if (mask[i] & groupbit) {
      if (rmass) massone = rmass[i];
      else massone = mass[type[i]];

      domain->unmap(x[i],image[i],unwrap);
      dx = unwrap[0] - xcm[0];
      dy = unwrap[1] - xcm[1];
      dz = unwrap[2] - xcm[2];

      rg[0] += dx*dx * massone;
      rg[1] += dy*dy * massone;
      rg[2] += dz*dz * massone;
      rg[3] += dx*dy * massone;
      rg[4] += dx*dz * massone;
      rg[5] += dy*dz * massone;
    }
  MPI_Allreduce(rg,vector,6,MPI_DOUBLE,MPI_SUM,world);

  if (masstotal == 0.0) return;
  for (int i = 0; i < 6; i++) vector[i] = vector[i]/masstotal;
}


The above code does the actual computation of the gyration tensor.

Here is the meaning of each variable

• x: 2D array of the position of atoms.
• mask: array of group information of each atom. if (mask[i] & groupbit) check whether the atom is in the group on which we want to perform calculation.
• type: type of atom.
• image: image flags of atoms. For instance a value of 2 means add 2 box lengths to get the unwrapped coordinate.
• mass: mass of atoms.
• rmass: mass of atoms with finite-size (meaning that it can have rotational motion). Notice that mass of such particle is set by density and diameter, not directly by the mass. That’s why they set two variables rmass and mass. To extract mass of atom i, use rmass[i] or mass[type[i]].
• nlocal: number of atoms in one processor.

Look at the line domain->unmap(x[i],image[i],unwrap). domain.cpp tells us that the function unmap returns the unwrapped coordinates of an atom in unwrap. The following lines accumulate the gyration tensor. The MPI call MPI_Allreduce(rg,vector,6,MPI_DOUBLE,MPI_SUM,world) sums each of the six components of rg over all processors, stores the result in vector and distributes vector to all the processors. Refer to this article for details.
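To check the arithmetic of compute_vector() in post-processing, the same six components can be computed with NumPy. This is only a sketch under stated assumptions: it runs on a single process (no MPI reduction), it expects coordinates that are already unwrapped, and it treats every atom passed in as part of the group. The function name `gyration_tensor` is made up for illustration.

```python
import numpy as np

def gyration_tensor(x_unwrapped, masses):
    """Return (xx, yy, zz, xy, xz, yz) of the mass-weighted gyration tensor.

    x_unwrapped: (n, 3) array of unwrapped coordinates
    masses:      (n,) array of per-atom masses
    """
    x = np.asarray(x_unwrapped, dtype=float)
    m = np.asarray(masses, dtype=float)
    masstotal = m.sum()
    xcm = (m[:, None] * x).sum(axis=0) / masstotal   # center of mass
    dx, dy, dz = (x - xcm).T
    rg = np.array([(m * dx * dx).sum(), (m * dy * dy).sum(), (m * dz * dz).sum(),
                   (m * dx * dy).sum(), (m * dx * dz).sum(), (m * dy * dz).sum()])
    return rg / masstotal

# Example: two unit-mass atoms at x = -1 and x = +1
print(gyration_tensor([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0]))
```

The sum of the first three components is the squared radius of gyration, which can be compared against the scalar returned by the same compute.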

Here are two good articles about understanding and hacking LAMMPS code.

]]>
Pivot Algorithm For Generating Self-avoiding Chain, Using Python and Cython 2014-11-13T00:00:00+00:00 /posts/pivot-algorithm/ The pivot algorithm is the best Monte Carlo algorithm known so far for generating a canonical ensemble of self-avoiding random walks (fixed number of steps). Originally it was designed for random walks on a lattice, but it can also be modified for continuous random walks. Recently I encountered a problem where I needed to generate self-avoiding chain configurations.

Basically the simplest version of this algorithm breaks into the following steps:

• Prepare an initial configuration of an $N$-step walk on a lattice (equivalent to an $N$-monomer chain).
• Randomly pick a site along the chain as the pivot site.
• Randomly pick a side (right or left of the pivot site). The part of the chain on this side is used in the next step.
• Randomly apply a rotation operation to the part of the chain chosen in the previous step.
• After the rotation, check for overlaps between the rotated part of the chain and the rest of the chain. If there is no overlap, accept the new configuration; otherwise reject it. In both cases, continue from the second step.

For random walks on a 3D cubic lattice, this post uses 9 distinct rotation operations: rotations by 90°, 180° and 270° about each of the 3 axes.
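The nine rotation matrices do not have to be typed by hand. As a small sketch, they can be generated as powers of the three 90° rotations about the x, y and z axes, and then checked to be proper rotations (determinant +1) that are all different. The variable names `R90` and `rotations` are chosen for this example only.

```python
import numpy as np

# 90-degree rotations about the x, y and z axes
R90 = [np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]]),
       np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]]),
       np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])]

# powers 1, 2, 3 give the 90, 180 and 270 degree rotations about each axis
rotations = [np.linalg.matrix_power(R, k) for R in R90 for k in (1, 2, 3)]

# every matrix is a proper rotation and all nine are different
assert all(round(np.linalg.det(R)) == 1 for R in rotations)
assert len({R.tobytes() for R in rotations}) == 9
```

A matrix with determinant -1 would be a reflection rather than a rotation, so the determinant check is a cheap way to catch a sign typo in a hand-written table.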

Some references on Pivot algorithm

• [Lal(1969)]: The original paper of pivot algorithm.
• [Madras and Sokal(1988)]: The most cited paper in this field. They showed for the first time that the pivot algorithm is extremely efficient, contrary to intuition.
• [Clisby(2010)]: The author developed a new, more efficient implementation of the pivot algorithm and calculated the critical exponent $\nu$, which is also the Flory exponent for a polymer chain in a good solvent.

## Python Implementation Using Numpy and Scipy

The implementation of this algorithm in Python is very straightforward. The raw file can be found here

import numpy as np
import timeit
from scipy.spatial.distance import cdist

# define a dot product function used for the rotate operation
def v_dot(a):return lambda b: np.dot(a,b)

class lattice_SAW:
    def __init__(self,N,l0):
        self.N = N
        self.l0 = l0
        # initial configuration. Usually we just use a straight chain as initial configuration
        # dstack gives shape (1,N,3); take [0] to get an (N,3) array of sites
        self.init_state = np.dstack((np.arange(N),np.zeros(N),np.zeros(N)))[0]
        self.state = self.init_state.copy()

        # define a rotation matrix
        # 9 possible rotations: 3 axes * 3 possible rotate angles(90,180,270)
        self.rotate_matrix = np.array([[[1,0,0],[0,0,-1],[0,1,0]],[[1,0,0],[0,-1,0],[0,0,-1]]
        ,[[1,0,0],[0,0,1],[0,-1,0]],[[0,0,1],[0,1,0],[-1,0,0]]
        ,[[-1,0,0],[0,1,0],[0,0,-1]],[[0,0,-1],[0,1,0],[1,0,0]]
        ,[[0,-1,0],[1,0,0],[0,0,1]],[[-1,0,0],[0,-1,0],[0,0,1]]
        ,[[0,1,0],[-1,0,0],[0,0,1]]])

    # define pivot algorithm process where t is the number of successful steps
    def walk(self,t):
        acpt = 0
        # while loop until the number of successful step up to t
        while acpt <= t:
            pick_pivot = np.random.randint(1,self.N-1) # pick a pivot site
            pick_side = np.random.choice([-1,1]) # pick a side

            if pick_side == 1:
                old_chain = self.state[0:pick_pivot+1]
                temp_chain = self.state[pick_pivot+1:]
            else:
                old_chain = self.state[pick_pivot:]
                temp_chain = self.state[0:pick_pivot]

            # pick a symmetry operator
            symtry_oprtr = self.rotate_matrix[np.random.randint(len(self.rotate_matrix))]
            # new chain after symmetry operator
            new_chain = np.apply_along_axis(v_dot(symtry_oprtr),1,temp_chain - self.state[pick_pivot]) + self.state[pick_pivot]

            # use cdist function of scipy package to calculate the pair-pair distance between old_chain and new_chain
            overlap = cdist(new_chain,old_chain)
            overlap = overlap.flatten()

            # determine whether the new state is accepted or rejected
            # (any zero distance means two sites overlap)
            if len(np.nonzero(overlap)[0]) != len(overlap):
                continue
            else:
                if pick_side == 1:
                    self.state = np.concatenate((old_chain,new_chain),axis=0)
                elif pick_side == -1:
                    self.state = np.concatenate((new_chain,old_chain),axis=0)
                acpt += 1

        # place the center of mass of the chain on the origin
        self.state = self.l0*(self.state - np.int_(np.mean(self.state,axis=0)))

N = 100 # number of monomers(number of steps)
l0 = 1 # bond length(step length)
t = 1000 # number of pivot steps

chain = lattice_SAW(N,l0)

%timeit chain.walk(t)
1 loops, best of 3: 2.61 s per loop


The above code simulates a 100-monomer chain with 1000 successful pivot steps. However, even with numpy and the built-in function cdist of scipy, the code is still too slow for a large number of random walk steps.
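Most of the time in walk goes into the overlap test, which computes every pairwise distance even though a single coincidence is enough to reject a move. One pure-Python alternative, which is not what this post uses, is to hash the fixed sites into a set and stop at the first rotated site found in it. The sketch below assumes lattice coordinates are integer-valued so that equality is exact; the function name `has_overlap` is hypothetical.

```python
def has_overlap(new_chain, old_chain):
    """Return True as soon as one rotated site coincides with a fixed site.

    Both arguments are sequences of 3D lattice sites (lists, tuples or rows
    of a NumPy array) with integer-valued coordinates.
    """
    occupied = {tuple(site) for site in old_chain}       # O(len(old_chain)) to build
    # any() stops at the first overlapping site
    return any(tuple(site) in occupied for site in new_chain)

print(has_overlap([(0, 0, 0), (1, 0, 0)], [(1, 0, 0), (2, 0, 0)]))   # True
print(has_overlap([(0, 1, 0)], [(1, 0, 0), (2, 0, 0)]))              # False
```

Whether this beats cdist in practice depends on the chain length, since the loop itself still runs in the Python interpreter.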

## Cython Implementation

When it comes to loops, Python can be very slow. In many complex situations, even numpy and scipy are not that helpful. For instance in this case, in order to determine the overlaps, we need a nested loop over two sets of sites (monomers). In the above code, I use the built-in function cdist of scipy to do this, which is already highly optimized. But actually we don’t have to complete the loops, because we can stop the search as soon as we encounter one overlap. However, I can’t think of a natural numpy or scipy way to do this efficiently, due to the conditional break. This is where [Cython] can be extremely useful. Cython can translate your Python code to C, and it can wrap your C or C++ code as a Python module so that you can directly import your C/C++ code in Python. To do that, we first handwrite our pivot algorithm in plain C++.

#include <math.h>
using namespace std;
void c_lattice_SAW(double* chain, int N, double l0, int ve, int t){
... // pivot algorithm codes here
}


Name the file c_lattice_SAW.cpp. Here we define a function called c_lattice_SAW, where chain is the array storing the coordinates of the monomers, N is the number of monomers, l0 is the bond length and t is the number of successful steps.

• The C++11 library random is used here in order to use the Mersenne Twister RNG directly.
• The C++ code in this case is not a complete program. It doesn’t have a main function.

The whole C++ code can be found here. Besides the C++ source file, we also need a header file c_lattice_SAW.h.

void c_lattice_SAW(double* chain, int N, double l0, int ve, int t);


If you don’t want to handwrite a C code, another way to use Cython is to write plain Cython program whose syntax is very much Python-like. But in that way, how to get high quality random numbers efficiently is a problem. Usually there are several ways to get random numbers in Cython

• Use Python module random.

This method will be very slow if random number is generated in a big loop because generated C code must call a Python module everytime which is a lot of overhead time.

• Use numpy to generate many random numbers in advance.

This will require large amount of memory and also in many cases, the total number of random numbers needed is not known before the computation.

• Use the C function rand() from the standard library stdlib in Cython.

rand() is not a very good RNG. In scientific computations such as Monte Carlo simulations, this is not a good way to get random numbers.

• Wrap a good C-implemented RNG using Cython.

This can be a good way. Currently I have found two ways to do this: 1. [Use numpy randomkit] 2. [Use GSL library]

• Handwrite C or C++ code using the random library or another external library, and use Cython to wrap the code.

This gives the best performance, but at the cost of more complicated and less readable code.

The last method is the one I use in this post.
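For comparison, the two drawbacks of the numpy option (memory use and not knowing the total count in advance) can be softened by generating numbers in fixed-size blocks and refilling on demand. The sketch below is in plain Python with numpy. The class name RandomBuffer and its parameters are hypothetical, and it is not what the post’s C++ code does.

```python
import numpy as np


class RandomBuffer:
    """Hand out uniform random numbers in [0, 1) from a pre-generated block.

    Hypothetical helper: the block is regenerated whenever it runs out, so
    memory stays bounded by `size` and the total count need not be known.
    """

    def __init__(self, size=100_000, seed=None):
        self._rng = np.random.default_rng(seed)
        self._size = size
        self._refill()

    def _refill(self):
        # One vectorized call replaces `size` separate Python-level calls.
        self._block = self._rng.random(self._size)
        self._pos = 0

    def next(self):
        if self._pos == self._size:
            self._refill()
        value = self._block[self._pos]
        self._pos += 1
        return value


buf = RandomBuffer(size=4, seed=0)
samples = [buf.next() for _ in range(10)]  # crosses two refills
print(len(samples))  # 10
```

In a Cython version the block would presumably be held in a typed buffer so that next() avoids Python overhead, but that part is not shown here.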

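Now we need to make a .pyx file that tells Cython about the C++ function and defines a Python function that calls it. Give the .pyx file a different name, such as lattice_SAW.pyx, so that the C++ file Cython generates from it does not overwrite the handwritten c_lattice_SAW.cpp.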

import cython
import numpy as np
cimport numpy as np

cdef extern from "c_lattice_SAW.h":
    void c_lattice_SAW(double* chain, int N, double l0, int ve, int t)

@cython.boundscheck(False)
@cython.wraparound(False)
def lattice_SAW(int N, double l0, int ve, int t):
    # flat, C-contiguous array holding the 3*N coordinates
    cdef np.ndarray[double, ndim=1, mode="c"] chain = np.zeros(N*3)
    # pass a pointer to the first element of the array to the C++ function
    c_lattice_SAW(&chain[0], N, l0, ve, t)
    return chain


Next, we compile the code into a shared library that can be imported into Python as a module. To do that, we use Python’s distutils package. Make a file named setup.py.

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("lattice_SAW",
                             sources = ["lattice_SAW.pyx", "c_lattice_SAW.cpp"],
                             extra_compile_args = ['-std=c++11', '-stdlib=libc++'],
                             language = "c++",
                             include_dirs = [numpy.get_include()])],
)


In addition to the usual arguments, we also pass extra_compile_args here. This is because the C++ code uses the random library, which is new in C++11. On Mac, -std=c++11 and -stdlib=libc++ need to be added to tell the compiler to support C++11 and to use libc++ as the standard library. On Linux, -std=c++11 alone is enough.
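If the same setup.py has to work on both systems, the flag list can be chosen at build time. The following is a minimal sketch of that rule. The helper name cxx11_compile_args is hypothetical, and it only encodes the Mac/Linux distinction described above.

```python
import sys


def cxx11_compile_args(platform=None):
    """Return the extra compiler flags needed for C++11 on a given platform.

    `platform` defaults to sys.platform; macOS reports itself as "darwin".
    """
    platform = sys.platform if platform is None else platform
    args = ["-std=c++11"]
    if platform == "darwin":
        # On Mac, also select libc++ as the standard library.
        args.append("-stdlib=libc++")
    return args


print(cxx11_compile_args("darwin"))  # ['-std=c++11', '-stdlib=libc++']
print(cxx11_compile_args("linux"))   # ['-std=c++11']
```

The result could then be passed as extra_compile_args=cxx11_compile_args() in the Extension call.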

If cimport numpy is used, the setting include_dirs = [numpy.get_include()] needs to be added.

Then in terminal we do

Linux

python setup.py build_ext --inplace


or Mac OS

CC=clang++ python setup.py build_ext --inplace


Setting CC=clang++ tells the build to use the clang compiler rather than gcc, because apparently the version of gcc shipped with OS X doesn’t support C++11.

If the compilation succeeds, a .so library file is generated. Now we can import our module in Python from that working directory.

import lattice_SAW
import numpy

%timeit lattice_SAW.lattice_SAW(100,1,1,1000)
100 loops, best of 3: 5.97 ms per loop


That is 437 times faster than our Numpy/Scipy way!
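Since lattice_SAW returns a flat array of length 3N, a quick sanity check on the result is useful. The sketch below assumes the layout [x1, y1, z1, x2, y2, z2, ...] and a cubic lattice with spacing l0. Both are assumptions, and the helper names are hypothetical. A hand-made straight rod stands in for the output of the compiled module.

```python
import numpy as np


def as_coordinates(flat_chain):
    """View a flat [x1, y1, z1, x2, ...] array as an (N, 3) array."""
    return np.asarray(flat_chain, dtype=float).reshape(-1, 3)


def check_chain(flat_chain, l0, tol=1e-8):
    """Return (bonds_ok, self_avoiding) for a flat chain array."""
    r = as_coordinates(flat_chain)
    # Every consecutive pair of monomers should be one bond length apart.
    bond_lengths = np.linalg.norm(np.diff(r, axis=0), axis=1)
    bonds_ok = bool(np.allclose(bond_lengths, l0, atol=tol))
    # Convert to integer lattice units, then count distinct occupied sites.
    sites = np.round(r / l0).astype(int)
    self_avoiding = len(np.unique(sites, axis=0)) == len(sites)
    return bonds_ok, self_avoiding


# Stand-in for the output of lattice_SAW.lattice_SAW(4, 1, 1, 1000).
rod = np.array([0, 0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 0], dtype=float)
print(check_chain(rod, l0=1.0))  # (True, True)
```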

]]>