Monte Carlo Diffusion Simulator

1.  Overview

This tutorial describes the Monte Carlo diffusion simulator of Camino and how it can be used to generate synthetic MR data. The simulator allows the simulation of diffusion from simple to extremely complex diffusion environments, called "substrates". In this tutorial, we'll be covering how to run several different substrates such as

  • Cylinders with equal radii
  • Crossing fibres
  • Cylinders with gamma-distributed radii
  • Mesh-based substrates

A substrate is envisaged to sit inside a single voxel, with spins diffusing across it. The boundaries of the voxel are usually periodic so that the substrate defines an environment made up of an infinite, 3D array of whatever you specify. The measurement model in the simulation does not capture the trade-off between voxel size and SNR and hence simulation "voxels" can be quite a bit smaller than those in actual scans. The simulation is, and has always been, intended as a tool to simulate signals due to sub-voxel structure, rather than large spatially-extended structures.

2.  Running simulations and common options

Simulations are all run via the datasynth command, and the substrate is specified using the -substrate switch. Different substrates require different parameters to be specified, hence different additional switches will be required which will be discussed in the next section. Certain options, however, are common to all (or nearly all) simulations and we will discuss them as follows.

-walkers specifies how many spins to use in the simulations. This is an important consideration since more spins equates to greater accuracy, but it increases the amount of time a simulation takes to run. Simple simulations require around 10000 spins, but simulations on more complex substrates require more spins to capture the additional complexity, particularly at higher b-values. For the simulations covered in this tutorial we recommend AT LEAST 100000 spins, but more may well be required.

-tmax specifies the number of timesteps in to which the simulation is divided. This can be thought of as defining the temporal resolution of the simulation, since the total duration of a simulation's dynamics (i.e. the period of time that is simulated, and not how long it takes to run!) is defined by its schemefile. This duration will be split into a number of updates of equal length, dt. This is an important number, since together with the diffusivity, it defines the length of steps in each spin's random walk. It also defines how long a simulation will take to run. More timesteps equates to a longer execution. Investigations have shown that simple substrates require about 1000 updates, although very complex substrates and very high b-values may need more (5000 or more).

-diffusivity specifies the free diffusivity for spins in the simulation in SI units. The default value is 2.0E-9m^s/s but this can be changed as needed. All physical quantities in Camino (including b-value) use SI units (seconds, meters, teslas) so it may be necessary to convert to SI before specifying them.

-p specifies the probability that a spin will step through a barrier unaffected instead of interacting with it. This is usually set to 0, making barriers impermeable.

-voxels specifies the number of voxels of data to synthesise. It is important to specify this to be at least 1. By default, a simulation will run once and then fill remaining voxels with different noise-realisations of the synthesised data. To run a separate simulation in each voxel, use the -separateruns option.

-initial specifies the initial conditions of spins by a keyword. uniform distributes spins uniformly (but randomly) across the substrate. delta locates them all at the same point in the centre of the substrate, intra distributes them uniformly in the intracellular space and extra does the same but in the extracellular space. Note that some substrates do not have a reliable definition of intracellular and extracellular, and in this case the use of intra and extra will lead to unexpected results and is deprecated.

-seed specifies the seed for the random number generators to use. Since it is a Monte-Carlo simulation, it makes extensive use of random numbers when running. Specifying different seeds will cause different sequences of random numbers to be generated and hence different trajectories of spins to be developed as (slightly) different synthetic measurements to be synthesised. A seed can be any integer, although it is better to use large numbers (8 or 9 digits) as more complex seeds give lower correlations in random number sequences.

3.  Half a Command

Putting all this together, we can go a long way towards running a simulation, because we can specify everything in a simulation except the substrate.
datasynth -walkers 100000 -tmax 1000 -p 0.0 -voxels 1 -schemefile myscheme.scheme -initial uniform
Specifies a simulation with 100000 spins, 1000 timesteps, zero permeability and spins initially uniformly distributed across the substrate using myscheme.scheme to specify the pulse sequence.

In fact, although it usually the best thing to do, it is not always appropriate to specify a scheme file. The next section describes running simulations with and without scheme files.

4.  Scheme files and simulations

As previously mentioned, a scheme file defines a pulse sequence used by the simulation to generate synthetic data. There are several different formats of scheme file which describe different kinds of pulse sequences at different levels of complexity. The simulation can make use of any of them, although in the case of very simple scheme files, assumptions will be made about the specifics of the sequence.

-schemefile specifies the scheme file to use for data synthesis. The scheme file describes the pulse sequence used to generate diffusion-weighted data. These specify gradients, pulse durations, orientations, separations, etc. and are no different to those used by other parts of Camino.

The simulation needs to run at least one set of dynamics to generate a set of synthetic measurements. Simulations are complex and can take from minutes up to days, depending on complexity and hardware. The longer a schemefile is, the more memory will be required by the simulation, since each spin stores a phase shift for each entry in the scheme file. For this reason, it may not be advisable to put hundreds of measurements into a single scheme file but rather split them across different simulations.

5.  Trajectories files

In some applications, however, we might want to synthesise many different sets of measurements from the same simulation. For example, I might want to compare the sensitivity of different acquisitions to a particular parameter, or to investigate the behaviour of the diffusion signal as a function of one or more scan parameter. In this case, it may not be appropriate or even feasible to run a separate simulation for every protocol used and for this reason it is possible to separate the simulation dynamics from the data synthesis phase.

This is done using an intermediate file called a trajectories (or traj) file. A trajfile contains the complete trajectories of all spins in a given simulation and may be parsed into a set of measurements at a later date. In order to run a simulation in this way, use the -duration switch along with -trajfile.

-duration specifies the duration of the dynamics (in seconds) if no scheme file is used. You can find out what this should be by using the longest acquisition in your scheme file. The simulation should be long enough to run at least from the onset of the first gradient block to the end of the last.

-trajfile specifies the name of the trajfile generated.

An example for generating a trajfile is following reproduced.
datasynth -walkers 100000 -tmax 1000 -p 0.0 -voxels 1 -initial uniform -duration 0.2 -trajfile mytraj.traj
The duration is in seconds, and the trajfiles is mytraj.traj and it could be very large. A typical trajfile will run to several or tens of Gigabytes, so a considerable amount of disk space will be required.

Traj files are parsed into data using a separate command called scan.
scan -trajfile mytraj.traj -schemefile myscheme.scheme > output.bfloat
This command reads the trajectories file specified and synthesise diffusion weighted data using myscheme.scheme. Due to the size of the files involved this will take at least several seconds and may take several minutes for larger files.

Traj files are big-endian binary files. In order to access them directly you will need to read one variable at a time. The format is quite simple but you will need to be careful to read the correct variable type for each field. There is a small header, and then the bulk of the data, which is formatted like a long table.

Header:

dynamics duration (double)
number of spins (double)
tmax (double)

Data (one line per spin per update):

time (double)
spin index (int)
spin x (double)
spin y (double)
spin z (double)

double refers to a signed double-precision 8-byte floating point number. int refers to a signed, 32bit (4 byte) integer.

6.  Different Substrates

At its heart, the diffusion simulation is a model of diffusion in an environment that restricts the motion of spins. Data are generated by simulating the action of a pulse sequence on the spins, and hence data is a function of spin trajectories, which are themselves strongly influenced by the environment. The structure contained in the environment is called the "substrate" and it is important to choose a substrate that is appropriate for the study you are interested in. New substrates are being added at all times, but this tutorial covers only the most useful and flexible types provided by the simulator.

Since diffusion MRI has historically been concerned with imaging white matter, many of the substrates supported by the simulator are designed to model white matter structure and are hence composed of cylinders. However, more recently the limitations of cylinders have started to become important; consequently, a more general form of substrate was implemented. It describes the environment with arbitrary triangle meshes, making it possible to simulate diffusion in extremely detailed tissue environments.

6.1  Empty substrates

The absolute simplest simulation possible is in an empty environment without any restricting structures at all. This is called an Empty substrate, and it's the simulation's equivalent of putting a bottle of water in the scanner, i.e. it's a model of free diffusion. Empty substrates can be surprisingly useful for testing purposes, and they are also very fast to run because of the lack of intersection checking. To run and empty substrate use:
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate empty > freeDiffusion.bfloat
and the diffusivity can be controlled by adding a -diffusivity option and by specifying the diffusivity in SI units.

6.2  Regular-packed cylinders

The simplest non-trivial simulation that is supported is an environment containing regularly packed cylinders of constant radius. In this situation, the environment is composed of an infinite array of cylinders parallel to the z-axis. There are two ways of arranging cylinders in this way: square-packing, in which cylinders are placed at the intersections on a square grid, and hexagonal-packing in which they are placed on the intersections of a triangular grid.

From a data-synthesis perspective there is not a great deal of difference between the two of these packings but it is worth noting that a hexagonal packing is more efficient at filling space than square packing. The maximum intracellular volume fraction from non-overlapping square packed cylinders is around 79% whereas from hexagonal packing it is 91%.

The simulation supports both of these configurations, they are specified by the -packing switch followed by either square or hex. Cylinders in the simulation do not necessarily have to be abutting (i.e. close-packed, so that their edges touch), although this is certainly possible. The size and spacing of cylinders is controlled by the -cylinderrad and -cylindersep options respectively. -cylinderrad specifies the RADIUS of all cylinders and -cylindersep the separation between cylinders. The cylindersep variable is defined as the distance between the central axes of the cylinders and hence must be at least twice the cylinder radius. Separations closer than this will cause cylinders to overlap and intersect and may cause unexpected results. Both distances must be specified in meters (NOT in microns).

An example command to run a simulation with a regular-packed cylinder substrate is:
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate cylinder -packing hex -cylinderrad 1E-6 -cylindersep 2.1E-6 > regcylinders.bfloat
This will run a simulation with 100000 spins, 1000 datasteps and scan parameters specified by my.scheme on a hexagonally packed regular cylinder substrate with impermeable cylinders of radius 1E-6m and separation 2.1E-6m and send the results to the file regcylinders.bfloat. Changing hex to square will change the packing configuration accordingly.

6.3  Crossing Cylinders

A situation that is often of interest in diffusion MR research is where we have more than one principle fibre direction. The simulation is able to model crossing fibres with a specified crossing angle. This substrate contains two populations of fibres in interleaved planes. One population is parallel to the z-axis and another is rotated about the y-axis by a given angle with respect to the first.

Cylinders on this substrate are arranged in parallel in the xz-plane in parallel layers one cylinder thick. I.e. a plane of cylinders parallel to the z-axis, with a rotated with respect the first, then another parallel z-axis and so on. Cylinders are all of a constant radius.

An example command to use here is
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate crossing -crossangle 0.7854 -cylinderrad 1E-6 -cylindersep 2.1E-6 > crossingcyls45.bfloat
Here we've specified a crossing substrate. The crossing angle is specified in radians (NOT degrees) using the -crossangle 0.7854. This is approximately pi/4, or 45 degrees. The crossing angle can take on any value, just make sure you use radians!

6.4  Irregularly Packed, Distributed Radius Cylinders

Useful as they are, regularly packed cylinders of constant radius do not capture much of the complexity of white matter tissue, and in fact we can do quite a bit better. In this section we describe a more realistic type of substrate. Parallel cylinders with gamma-distributed radius and not symmetry in their arrangement. For brevity we refer to this substrate as gamma-cylinders. For historical reasons, it is referred to on the command line as an "inflammation" substrate (see Hall & Alexander IEEE TMI 2009).

The gamma cylinder substrate contains a specified number of cylinders whose radii are drawn from a gamma distribution. This distribution is specified by two parameters, which may also be given on the command line. Cylinders are placed randomly in a square region of a given size such that they no overlap and that the edges of the square are periodic: a cylinder that overlaps an edge is repeated on the opposite side of the square.

An example of command for simulating diffusion with gamma distributed cylinders is
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate inflammation -increments 1 -gamma 1.84037 7.8E-7 -substratesize 3.5E-5 -numcylinders 100 > gammacyls.bfloat
Here we've specified an inflammation substrate that contains 100 cylinders which will be placed in a cross-section region of 35x35um. Because there is no simple way to predict the intracellular volume fraction from the number of cylinders, substrate size and distribution parameters, some experimentation may be necessary to obtain realistic volume fractions. The intracellular volume fraction of the substrate is reported in the runtime log (and on the console output). If a substrate size is too small it means that the simulation will not be able to fit all cylinders in the space, and hence it will distort the dynamics. If is too large it means that there is too much space between cylinders, and hence too low an intracellular volume fraction.

A more complex substrate also brings other considerations. Firstly, it may be necessary to model a larger number of spins in order to probe its complexity. Here we have used 100000, but for more demanding simulations including high b-value measurements it may be necessary to increase this substantially. It is not uncommon to use 500000 spins and 2000 or more timesteps in order to obtain good convergence of results. Furthermore, a disordered substrate means that, in order to eliminate bias in simulations, it is necessary to run several simulations on different realisations of the same substrate. This means that different sets of cylinders with the same statistical properties are arranged in different ways in space at every simulation. The seed for the random number generator can be changed using the -seed option followed by any large integer. Any integer will do, but large (8-9 digits) numbers with complicated binary representations will work better than simple ones. Changing the seed will change the arrangement of cylinders and also the trajectories executed by spins during the simulation. Averaging over these is called an ensemble average, and is good practice when using disordered substrates.

6.5  Mesh substrates

The most flexible and realistic type of substrate currently supported by the simulation is the Mesh Substrate. Mesh substrates model the environment as a collection of triangles and as such can model almost any shape in three dimensions. Mesh substrates have been successfully used to simulate diffusion in environments constructed from micrograph images (Panagiotaki et al 2010), and validate assumptions on white matter models with orientation dispersion where analytic expressions are unavailable (Wang et al 2013). This page contains a brief description, detailed mesh tutorials can be find in Monte Carlo Mesh Simulation Tutorial.

Mesh substrates are constructed using a PLY file. This is a simple file format used for 3D geometry. They are simple to construct, and the format is described here. is an example of the format as parsed by the simulation. In common with everything else in Camino, all distances in the PLY file used will be assumed to be in meters.

A basic example command for running a simulation on a mesh substrate is
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate ply -plyfile mysubstrate.ply > meshsubstrate.bfloat
Here we're running the usual 100000 spins and 1000 timesteps to generate a single voxel of data using the my.scheme scheme file with spins initially uniformly distributed across the substrate. The substrate here is constructed of all the triangles in the PLY file. Spins diffuse around in the space containing the mesh and any spin that leaves the region occupied by the substrate will re-enter on the opposite side. This means that spins whose trajectories cross a periodic boundary experience a discontinuity in their environment and to avoid biases in the signal caused by these discontinuities, we minimise the contribution from boundary-crossing spins by only generating data from the central portion of the voxel. This is a cube-within-a-cube which shares its centre with the substrate. The size of this central region is by default 0.75 times the length of each side of the substrate. This proportion can be changed by using the -voxelsizefrac followed by a number between zero and 1. The larger this region the more spins will be contained in the central region, but the more the signal will be affected by spins that have crossed the boundaries of the substrate. The default value has provided a good compromise in previous work. For some substrates (such as those with periodic boundaries) it may make sense to generate data from the entire substrate. In this case use -voxelsizefrac 1.0.

Because meshes are very complicated, simulations with meshes can be very computationally demanding. Previous work (see Panagiotaki et al MICCAI 2010) has employed meshes with as many as 200000 triangles, simulating 80000 spins and 1250 timesteps required around 24hours on a 3.2GHz dual-core linux PC. It is important to run with a large number of spins and updates in order to achieve unbiased results with good convergence.

7.  Other Options

This section describe some other miscellaneous features of the simulation that may be useful.

7.1  Different pulse sequences

First of all, it's worth to mention that the simulation can work with a variety of scheme file formats. In addition to the standard version 1 scheme file that describes standard PGSE acquisitions, the simulation can work with twice refocussed spin-echo (TRSE) sequences using version 3 scheme files and completely general gradient waveforms using gradient waveform schemes (see tutorial on general waveform scheme files which can express, for example, double wave-vector or oscillating gradients). A general waveform scheme can be used to describe arbitrary, 3 dimensional pulse shapes and so are able to describe almost any conceivable sequence. The simulation will work seamlessly with any of these schemes, simply specify a scheme file of the appropriate format on the command line.

7.2  Permeability

In all commands given previously, membrane permeability was set to zero via the -p 0.0 switch. This specifies a probability that when a spin encounters a barrier, it will step through it rather than being reflected. Specifying a non-zero probability will cause membranes to become more permeable. Care should be taken with this probability since, in combination with the simulation time increment dt = duration/tmax, it defines a rate of spins through membranes. Changing dt will therefore change this rate, and if dt is changed, the membrane permeability must also be altered to account for this.

7.3  Stats files

In addition to generating synthetic MRI signals, the simulation is capable of generating statistics directly from the dynamics. These are generated in stats files. To generate a stats file, just specify its name from the command line using -statsfile myStatsFile.dat along with the rest of your simulation command. By default, a stats file will contain the mean squared-displacement of spins in the x, y and z directions for all time points in the simulation (MSD). This allows, for example, the time-dependent intrinsic diffusivity to be calculated. Stats files are binary files, formatted with each entry as a double precision floating point number (Java type double) as follows:
<duration of simulation (dt*tmax)>
<Number of spins>
<Number of timesteps in the simulation (also a double)>
<time1>
<MSDx>
<MSDy>
<MSDz>
<time2>
<MSDx>
<MSDy>
<MSDz>
...

Framework exists in the code for implementing other type of stats files with different type of data in them, although there is currently no command line mechanism for changing statsfile type.

That about covers things. Have fun in Monte-Carlo.