Opinions
Why should a biologist use R?
Although many biologists take no real pleasure in using statistics, statistics has a relevant, sometimes mandatory, place in their work. It is thus not surprising that they resort to whatever is available and seems to do the job. This is the key point: what seems to do the job may not be doing it correctly. First place among the tools that are used for statistics but should not be goes to MS-Excel (the story can be found in McCullough and Heiser, 2008). What are the good choices? Actually, there are many programs suitable for doing statistics correctly. So why should a biologist choose R?
We will group the elements of an answer under two labels: fashion and quality. Most of those reading this text are smartphone and tablet users, and many have Facebook and LinkedIn accounts. Without arguing about the advantages that such platforms and devices may provide, it is clear that social pressure, that is, fashion, is the main reason people /start/ to use them. Fashion is not always a bad thing. R is nowadays the most fashionable software for doing statistics, in biology and elsewhere. The web is teeming with R recipes. By itself this is not an argument in favour of R, but it shows that R is a living and growing (software) being.
The real reasons to use R relate to its quality. R is developed and maintained by a core group of highly respected statisticians spread around the world, but everyone is welcome to contribute (it is free software). Many recent developments in statistics first appear as R extensions, ready for everyone to use (R packages make R easily extensible); there are probably already a few packages that implement the analyses you are looking for. Although the basic way of interacting with R is the command line (“typing commands”), a number of graphical interfaces are available (such as RStudio) that make life easier for the newbie and for the occasional user. With R it is easy to make publication-quality graphics, and to change them too.
So far we have mentioned characteristics that testify to the quality of R as excellent software for doing statistics, but there is more to it. R encourages and facilitates quality research, that is, good scientific practices (usually described as reproducible research), by taking advantage of its basic text-based interface and a well-designed object system. In R you can repeat an analysis exactly, unlike with click-only interfaces. With R it is also very easy to automate an analysis, which not only saves you a lot of time but also avoids the errors typical of repetitive tasks.
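To make the point about automation concrete, here is a minimal sketch of our own (not taken from any particular package): the same two-group comparison is applied to four measurements in one go. It assumes only base R and the built-in iris data set; the helper name compare_groups is hypothetical and chosen purely for illustration.

```r
# Hypothetical helper (illustration only): compare a response between the
# two groups of a grouping variable with Welch's t-test (base R 't.test').
compare_groups <- function(data, response, group) {
  values <- split(data[[response]], data[[group]])  # one vector per group
  test <- t.test(values[[1]], values[[2]])
  data.frame(
    response  = response,
    mean_diff = mean(values[[1]]) - mean(values[[2]]),
    p_value   = test$p.value
  )
}

# Keep two species from the built-in 'iris' data and drop the unused level.
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Run the same analysis on every measurement, without any copy and paste.
measurements <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
results <- do.call(
  rbind,
  lapply(measurements, function(m) compare_groups(two_species, m, "Species"))
)
print(results)
```

Because the whole analysis lives in a text file, rerunning it on corrected or new data is a matter of executing the script again, and anyone can read exactly what was done.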
Well, you will be saying, you have convinced me that R is good, but is it for the biologist? The existence of the Bioconductor project should be enough of a definitive answer, but as not everyone does genetics or genomics, we should also mention that the use of R is becoming pervasive in domains such as Population Genetics, Ecology, Evolution & Speciation, Complex Systems, Conservation and others and, naturally, in large-scale, high-throughput bioinformatic technologies.
The advantages of R come at a cost: it is necessary to understand the R system and its objects, how it works, and to learn a few basics. How long this takes will depend on your previous experience, but because R is text based, it allows for an incremental way of learning. Learning R requires personal involvement: without practice, R is no easier than a foreign language; with practice, you can very quickly get to the point of taking advantage of the R recipes found on the web and, better still, you will be able to tell the good recipes from the bad.
sinceRely youRs, JM Nunes & Nicolas Hulo
PS: At the BioSC we are intensive and extensive users of R; we will gladly help you with your R.