Technical tips for running a neuroscience (fMRI GLM) experiment
Having worked on a neuroscience experiment for the last 1.5 years I have learned a great deal along the way. I offer this post as what I would do if I were starting from scratch. I’m focused on the technical details rather than neuroscience aspects of experiment design. This focuses on a basic GLM with fMRI data, so some of the information may not be applicable to other experiment types.
Prep work
OS
For neuroscience purposes, I recommend a Unix-like operating system, meaning OS X or a Linux distribution. Getting a Mac would allow you to use any of the tools mentioned below, but it is obviously a costly purchase if you do not already own one.
Linux is a viable alternative, and NeuroDebian is perfect if you are picking an OS just for neuroscience. The main drawback of Linux is that you will likely have difficulty running MATLAB, and subsequently the Psychtoolbox. You can either: a) try to get MATLAB or GNU Octave running in Linux, or b) ignore MATLAB and use Python (see the section on data collection below).
Windows would be my last choice for operating systems. If you must, install a virtual machine copy of NeuroDebian using VirtualBox, and work out of the virtual machine. Note: The performance hit of virtualization will probably be detrimental to data collection and analysis, so you will have to work out your own solution for this based on your environment (e.g. set up design files in the VM and then process them on a shared server).
Version Control and Dropbox
Dropbox and a version control tool such as git will save you a great deal of frustration with backing up and saving your work at all stages of the experiment. The real value of these tools is that they provide a virtual and automated experiment log.
Dropbox
If you are not aware, Dropbox is a tool for automatically syncing and backing up data on your computer to the cloud. The tool is invaluable when it comes to backing up and sharing data with colleagues. Don’t use Dropbox to store neuroimages or sensitive subject data, because the data is stored offsite (i.e. on the Dropbox servers) and you have limited space. Visit the official website at www.dropbox.com to get started. The blog Lifehacker covers uses of Dropbox extensively.
Git
Git is a version control tool. If you are not familiar with programming, that means it allows for careful tracking of changes in code bases. More specifically, it keeps a very detailed record of changes in text files. This is invaluable when you are modifying your scripts and can also be leveraged in tracking your regressor files. Syntactically, the commands and concepts of git take a bit of getting used to. However, the investment in learning the system will pay off when you are scripting. I would recommend this guide to learn. If you would like to share your code, GitHub is the de facto standard. If you would like to keep your work private, Bitbucket allows you to create a private repository as long as there are fewer than 5 contributors.
Data Collection
When it comes to data collection, it boils down to stimuli presentation to the subject and often, the recording of subject response times. Even if you are not recording subject input, you will need to record scanner start time, and image acquisition times in relation to stimuli presentation. To this end, I know of two possible toolsets.
MATLAB combined with Psychtoolbox
MATLAB is a program developed by MathWorks, which is geared specifically towards scientific and heavy math computation. It has its own language. Psychtoolbox is an add-on for MATLAB that is geared towards running psychological experiments.
Pros:
- MATLAB and Psychtoolbox are widely used in the neuroscience community and the scientific community at large. Thus, it should be fairly easy to find help when you run into problems.
- Many analysis tools use MATLAB scripts to perform analysis tasks.
- The Psychtoolbox library works well and is fairly easy to learn, once you understand MATLAB scripting in general.
Cons:
- Cost, if your institution does not have a license.
- MATLAB is a nightmare of a language. It is an aggravating language to write in and debug.
Python/NumPy with PsychoPy
I did not use this combination myself, but I would have adopted it from the start had I known about it. NumPy is a Python library that replicates much of the functionality of MATLAB, and is geared towards engineering and scientific computing in the same way. If you have become attached to the MATLAB IDE and development environment, you can get a similar setup going with IPython. One of the main reasons I would advocate for this setup is that if you learn Python here, you will be prepared to use it in analysis, and you could stick to one language from start to finish. I also believe you would be more efficient during analysis, because Nipype will be open to you. To get the stimulus presentation suite, use PsychoPy.
Pros:
- Free
- Language is far easier to learn and syntactically superior to MATLAB
- Ability to work in one language from start to finish
Cons:
- Not as well adopted, so expect support to be slim
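Whichever toolset you pick, the timing bookkeeping described at the start of this section (scanner start time, plus stimulus and response times relative to it) can be sketched in a few lines. This is only a minimal illustration using the Python standard library: the EventLog class and its method names are hypothetical, not part of Psychtoolbox or PsychoPy, and in a real experiment scanner_started() would be called from whatever code detects the scanner's trigger pulse.

```python
import time


class EventLog:
    """Record event times relative to the scanner start (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, so the logic can be tested
        self._t0 = None       # clock reading taken at scanner start
        self.events = []      # list of (seconds_since_scanner_start, label)

    def scanner_started(self):
        """Call this when the scanner's first trigger pulse arrives."""
        self._t0 = self._clock()

    def record(self, label):
        """Store an event with its time in seconds since scanner start."""
        if self._t0 is None:
            raise RuntimeError("scanner_started() must be called first")
        elapsed = self._clock() - self._t0
        self.events.append((elapsed, label))
        return elapsed

    def save(self, path):
        """Write one tab-separated 'time<TAB>label' line per event."""
        with open(path, "w") as handle:
            for elapsed, label in self.events:
                handle.write(f"{elapsed:.4f}\t{label}\n")
```

Keeping every time relative to the scanner start means the saved log can later be turned into regressor files without any further clock arithmetic.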
Analysis
Analysis is likely the most computational and scripting intensive part of the process. There are many technical choices to be made here and a little forethought will save you time down the road.
Organization
Perhaps the most important part of the post-collection process is to select a sane directory structure and naming convention. There will be a large number of files generated by this process, and keeping to a standard will allow you to know what is what and, more importantly, keep scripting fairly easy. Whatever you decide, make sure that it works for your experiment design. You will likely have the following types of files:
- Small files
  - raw behavioral files (e.g. subject responses)
  - regressor files (for GLM analyses)
  - analysis design files
- Large files/folders
  - raw scanner 4D files
  - raw structure files
  - processed structure files
  - preprocess and analysis directories
A few tips:
- Place all of the small files under version control. This can be done fairly easily if you integrate version control into the scripts that generate these files. For example, suppose you are generating a regressor file for your GLM. Call git add on all of the generated files, and then git commit with a message explaining what they are. This way you will know what a file is later, when you go to publish.
  - At the very least, place as many of these files as possible in Dropbox. This is not as good as full version control, but it is better than nothing.
- Learn symbolic linking for your OS. This allows you to create paths and structures that link elsewhere, saving you space on the hard drive. It also makes for a clearer, more organized structure. For example, if you have raw neural files from the scanner and their naming scheme does not fit your pattern, you can keep the original files where they are and create symbolic links to them with names that do fit your pattern.
- Keep your raw data separate from any processing you do. Mirror it somewhere so you always have a pristine copy of your data if all else fails.
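The first tip, committing generated files from the script that creates them, might look something like the sketch below. It is not the code we used. The function names are made up for illustration, the regressor layout assumes FSL's three-column format (onset, duration, weight), and it assumes the target directory is already inside a git repository.

```python
import subprocess
from pathlib import Path


def write_regressor(path, events):
    """Write (onset, duration, weight) rows, one tab-separated row per event."""
    lines = [f"{onset:.3f}\t{duration:.3f}\t{weight:.3f}"
             for onset, duration, weight in events]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


def commit_files(repo_dir, paths, message, run=subprocess.run):
    """Stage the given files and commit them with an explanatory message.

    `run` is injectable so the function can be exercised without git.
    """
    run(["git", "add", "--"] + [str(p) for p in paths], cwd=repo_dir, check=True)
    run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

A call such as commit_files(root, [regressor_path], "Regressors for condition 1, onsets from behavioral logs") then leaves a dated explanation next to every generated file. Note that check=True makes the script stop if git reports an error, which includes the case where nothing has changed since the last commit.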
Here’s an example of our experiment structure:
- Experiment root
|
- Structures
| |
| + Subject#1
| |
| ...
|
- Condition#1
| |
| - Subject#1
| | |
| | - run#1
| | |
| | + analysis#1
| | + analysis#2
| | + behavioral regressors
| | * raw 4D
| |
| + Subject#2
| |
| ...
|
+ Condition#2
|
+ Higher Level Analysis Folders (Fixed Effects)
|
+ Higher Level Analysis Folders (Mixed Effects)
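A layout like this can be built by a script, with the raw scanner files symlinked into place rather than copied. The sketch below is one possible way to do it with Python's pathlib. The function name, the folder names passed in, and the default link name raw_4d.nii.gz are all assumptions for illustration, and on Windows creating symlinks may require extra privileges.

```python
from pathlib import Path


def link_raw_run(raw_file, experiment_root, condition, subject, run,
                 link_name="raw_4d.nii.gz"):
    """Create <root>/<condition>/<subject>/<run>/<link_name> as a symlink.

    The original scanner file stays where it is; only a link with a
    predictable name is added to the experiment tree.
    """
    raw_file = Path(raw_file).resolve()
    run_dir = Path(experiment_root) / condition / subject / run
    run_dir.mkdir(parents=True, exist_ok=True)
    link = run_dir / link_name
    if not link.is_symlink():
        link.symlink_to(raw_file)
    return link
```

Because every run then exposes its data under the same name, later scripts can loop over conditions, subjects and runs without knowing anything about the scanner's own naming scheme.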
Analysis suite
There are several suites for analysis, and each has its devotees. The two main suites are FSL and SPM. Personally I used FSL for my experiment, so I have more to say on it. As a rule of thumb, I have found that if you can do something in one suite, you can probably do it in the other. Google will help you figure out how. As a last resort, you can always use the other suite for a single task, since they share data formats.
FSL
FMRIB Software Library (FSL) is a suite of tools developed by Steve Smith’s group at Oxford. The tools are primarily command line tools with GUI frontends. The command line nature of the tools makes them easy to script. The design file system used by the suite also makes replication and scripting very easy. My major complaint after using the suite is that the GUI tools are difficult to use. This is due to their support across multiple OSes and their reliance on cross-platform toolkits (Tcl/Tk for most of the GUIs, Qt for FSLView) to provide this support. Native GUIs would be nice, but for free software this is not a deal killer. This problem essentially trains you to work more on the command line and in raw text files. The quicker you pick this up, the more efficient you will be in FSL.
Other thoughts/tips:
- The documentation is hit or miss for FSL. You can hover over an option in the GUI to get the same documentation as a tooltip. Similarly, you can man page the command line tools. However, if you are really stuck, I would turn to searching their mailing list. The project maintainers are very active and helpful when you have questions. You might even subscribe to the list just to stay current and see what others are doing.
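Because FSL design files are plain text, one common way to script many runs is to save a single design file as a template, swap in the per-run paths, and write out one file per run. The sketch below shows the idea only. The @OUTPUT_DIR@ and @INPUT_4D@ placeholder tokens and the function names are invented for this example, and a real FEAT design file contains many more settings than the two lines used here.

```python
from pathlib import Path

# Two lines in the style of a FEAT design file, with hypothetical placeholders.
TEMPLATE = (
    'set fmri(outputdir) "@OUTPUT_DIR@"\n'
    'set feat_files(1) "@INPUT_4D@"\n'
)


def fill_template(template_text, replacements):
    """Replace each placeholder token with its value; fail on missing tokens."""
    filled = template_text
    for token, value in replacements.items():
        if token not in filled:
            raise KeyError(f"placeholder {token!r} not found in template")
        filled = filled.replace(token, str(value))
    return filled


def write_design(template_text, replacements, out_path):
    """Write the filled-in design file and return its path."""
    out_path = Path(out_path)
    out_path.write_text(fill_template(template_text, replacements))
    return out_path
```

Each generated file can then be handed to FSL on the command line, and because the files are small text files they are good candidates for version control.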
SPM
Statistical Parametric Mapping (SPM) is a suite of MATLAB tools. The major problem here is that it requires MATLAB. So, if you have chosen a software path that does not use MATLAB, you might consider not using SPM. Like FSL, it is free.
BrainVoyager
BrainVoyager QX is a paid tool. I know little about it, as no one in my lab used it. This, combined with what I have seen in neuroscience communities online, leads me to believe it is not as widely adopted as the other two suites.
Other software worth knowing
FreeSurfer
FreeSurfer is an anatomical analysis tool suite. I believe the real value of this tool is that it registers based on cortical folding patterns, providing a better fit than traditional registration techniques. I still haven’t fully utilized this tool, and wish I had more time to learn it, as it seems to hold a lot of value.
Nipype
Nipype is the tool that actually caused me to write this article in the first place. Essentially, Nipype is an engine for creating analysis pipelines. Using Python, it provides a soup-to-nuts interface for scripting your analysis workflow, with interfaces to a wide range of software, including many of the packages I have mentioned here. Sadly, I stumbled upon Nipype when I was already deeply entrenched in my analysis procedures. Because I am not a PhD student in neuroscience and don’t plan on doing another experiment, it did not make sense for me to sink the time into it. However, if I were starting from scratch, I would spend a quiet weekend alone with Nipype and learn how to use it efficiently. This would have saved me tons of time in the post data collection period. Nipype belongs to the wider Nipy community, which shares neuroimaging tools and pipelines; theoretically you could just download a pipeline and not have to reinvent the wheel.
Scripting
Pick a scripting language early and try to use it exclusively. Sticking to one language will save you time because you won’t have to learn new syntax to accomplish things you already know how to do in your main language. Here are a few choices:
Python
Having finished my experiment, I wish I had worked exclusively in Python. The language is syntactically much simpler than the other two (Bash and MATLAB), and I believe it is more intuitive to pick up and debug. You could accomplish everything that you could in MATLAB through a combination of NumPy and IPython (see the section above). Paired with Nipype, I believe this would be the most powerful toolset of any discussed in this article. You can also accomplish most shell/Bash tasks in Python through code similar to this:
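# Here I am calling the FSL tool slicer.
# Popen, from the subprocess module, is the key to running shell commands.
from subprocess import Popen

def run_slicer(outname, color=None):
    slicer_command = ['/usr/local/fsl/bin/slicer', outname]
    # Only pass -l when a color lookup table was actually given
    if color is not None:
        slicer_command += ['-l', color]
    slicer_command += ['-L', '-A', '400', outname + '.png']
    p1 = Popen(slicer_command)
    # Block until slicer has finished writing the image
    p1.wait()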
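If you also need to know whether the command succeeded and what it printed, subprocess.run from the standard library returns both. The sketch below is a generic helper, with run_command being a name chosen for this example. It uses the Python interpreter itself as the command so that it runs anywhere, but the same call works with an FSL tool's path and arguments.

```python
import subprocess
import sys


def run_command(command):
    """Run a command (list of strings) and return (exit_status, output_text)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,      # capture what the command prints
        stderr=subprocess.STDOUT,    # fold error messages into the same text
        text=True,                   # decode bytes to str (Python 3.7+)
    )
    return result.returncode, result.stdout


# Example: run a tiny Python one-liner as the external command.
status, output = run_command([sys.executable, "-c", "print('hello')"])
```

Checking the exit status after every call is worth the extra line, since a failed preprocessing step is much easier to catch at this point than after the higher level analyses have run on incomplete data.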
MATLAB
If you already use SPM, sticking to MATLAB for all of your scripting needs might be the proper call. Originally, we used MATLAB for behavioral analysis and Bash to script the FSL tools for neural analysis. However, I have seen people accomplish the latter task using MATLAB exclusively, with code such as this:
[status,output]=system('ls')
Bash/Shell Scripting
Bash and shell scripting is a fickle mistress. In contrast to the other two, you get direct control of the CL tools. However, you will have to learn some of the basic tools such as awk and grep. Furthermore, the syntax of bash is harder to read and write. “If statements” in particular are prickly. I don’t think you could exclusively work in Bash, as doing behavioral analysis would be inefficient.