Data sharing

8. BigBrain data sharing: good face in a bad play.

Interview statement:

“Today, the BigBrain model and the 3D maps are a prime example of shared big data. They can now be clicked on, rotated, zoomed in and marvelled at by anyone on the Internet. Is something similar to emerge from HIBALL in the end?”

“In principle, yes. When it comes to the 20-micrometre BigBrain model, we’re talking about one terabyte of data – even that’s not something that can “just quickly be downloaded”. That’s why we have developed web-based tools in which only the data you are looking at has to be transferred. This is similar to what has already proven successful in other areas. If we go to the 1-micrometre level, it’s going to be even more challenging. So we need to develop methods that allow data to be processed without having to transport it.”

My Question 1: What is the magnitude of the error in the borders of the maps shared so far by the BigBrain project?

My Answer: Here we see again how the fascination with the huge size of data files twists the mindset and points the discussion in a completely wrong direction. Analysis of the maps shared by the BigBrain project, however, gives important insight into the real problem with these maps: the mapping error.

BigBrain data are shared as area maps. Going to the EBRAIN Data Sharing site and entering BigBrain into the “search” field, we get 48 results (check it here). To “marvel at” the data we can use the publicly available browser-based viewer. The position of a border between areas was determined by a procedure described by K. Zilles, A. Schleicher et al. a long time ago. The procedure is based on the measurement of GLI profiles (you can look at my analysis of the 40-year history of this method here). In essence, the position of a border is determined by a peak of a Mahalanobis distance function, computed from the difference between the feature vectors of neighbouring blocks of GLI profiles (a minimal code sketch of this step is given after the table). The dependence of the parcellation pattern on the number of profiles per block b has been described by K. Amunts and colleagues many times (see, for example, [4]). Moreover, the block size varies for borders between different areas (see the table below).

Areas                      Block size b    Block width = b × 17 µm
Te1.0, Te1.1, Te1.3, Te3   10–20           170–340
V1, V2                     12              204
FG3, FG4                   22              374
hOc3D, hOc4d, hOc4ip       24              408
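
For readers who want to see the principle in code, below is a minimal Python sketch of the block-wise comparison described above. It assumes that each GLI profile has already been reduced to a feature vector (an array of shape n_profiles × n_features), and it adds a small ridge term to the pooled covariance purely for numerical stability; both the feature extraction and the ridge are my assumptions, not the published implementation.

```python
import numpy as np

def block_mahalanobis(block_a, block_b, ridge=1e-6):
    """Squared Mahalanobis distance between the mean feature vectors of two
    blocks of GLI profiles, using a pooled covariance matrix.
    block_a, block_b: arrays of shape (b, n_features)."""
    mean_a, mean_b = block_a.mean(axis=0), block_b.mean(axis=0)
    pooled = 0.5 * (np.cov(block_a, rowvar=False) + np.cov(block_b, rowvar=False))
    pooled += ridge * np.eye(pooled.shape[0])  # stabiliser (my assumption, not part of the method)
    diff = mean_a - mean_b
    return float(diff @ np.linalg.solve(pooled, diff))

def border_distance_function(features, b):
    """Slide two adjacent blocks of b profiles along the cortical ribbon and
    return the distance at every position; peaks mark candidate area borders.
    features: array of shape (n_profiles, n_features), one feature vector per GLI profile."""
    n = features.shape[0]
    d2 = np.full(n, np.nan)
    for i in range(b, n - b):
        d2[i] = block_mahalanobis(features[i - b:i], features[i:i + b])
    return d2
```

A peak of this distance function is then taken as the border position and, as noted above, the resulting parcellation depends on the chosen block size b.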

Given the measuring-field size of 17 × 17 micrometres, the uncertainty of the border position produced by this method (we can also call it the spatial error, or position error) can be roughly estimated as one half of the block width, and it therefore varies between 85 and 204 micrometres. Looking at the various errors clearly visible in the maps, we can only conclude that this is indeed an estimate of the minimal error (Fig. 7); the real error of these maps is much higher.
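
This back-of-the-envelope estimate follows directly from the table; here is a small sketch of the arithmetic (the block sizes are simply the table values, assuming a 17-micrometre measuring field):

```python
FIELD_UM = 17  # measuring-field size in micrometres

# block sizes b taken from the table above, as (min, max)
block_sizes = {
    "Te1.0, Te1.1, Te1.3, Te3": (10, 20),
    "V1, V2": (12, 12),
    "FG3, FG4": (22, 22),
    "hOc3D, hOc4d, hOc4ip": (24, 24),
}

for areas, (b_lo, b_hi) in block_sizes.items():
    w_lo, w_hi = b_lo * FIELD_UM, b_hi * FIELD_UM  # block width in µm
    print(f"{areas}: block width {w_lo}-{w_hi} µm, "
          f"border uncertainty ≈ {w_lo // 2}-{w_hi // 2} µm")
# across all listed borders the minimal uncertainty ranges from 85 to 204 µm
```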

Fig. 7: The magnitude and types of errors in the BigBrain maps. A – gaps, B – rough contours, C – systematic shifts, D – border misalignment, E – fragments lost in space, F – overlapping areas.

My Question 2: How can we estimate the magnitude of the mapping errors, and what does it tell us about the real resolution of the map?

My Answer: Comparing the size of the gaps and border defects with the available ruler, we can only conclude that we are dealing with errors on the order of several millimetres (1 mm = 1000 micrometres). So it definitely does not look like a 20-micrometre-resolution map.

Of course, the borders of cortical areas are arbitrary in nature and are the result of a convention. Still, we have to realise that the resolution of an area map cannot be better than the error in the estimated positions of its borders. Moreover, projecting the entire map onto a higher-resolution template (a histological template instead of an MRI template) may even increase this error because of the projection (scaling or registration) procedure, which is what we see in the picture above. Therefore, in the best-case scenario, the resolution of the map projected onto the template can be estimated as the minimal average mapping error. According to the calculations above, this is about one half of the average block width, which is roughly 150 micrometres. Claiming that the resolution of the map, and of the whole atlas, is 20 microns therefore clearly misrepresents the facts.
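
The 150-micrometre figure is simply one half of the average block width from the table; a minimal sketch of that calculation (using the midpoint of the 170–340 µm range, which is my simplification):

```python
# block widths in µm from the table; the 170-340 µm range is replaced by its midpoint
block_widths_um = [(170 + 340) / 2, 204, 374, 408]

best_case_resolution = sum(block_widths_um) / len(block_widths_um) / 2
print(f"Best-case map resolution ≈ {best_case_resolution:.0f} µm (claimed: 20 µm)")
# -> Best-case map resolution ≈ 155 µm
```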

My Question 3: Does it make sense to declare the goal of creating a 1-micrometre-resolution map if the error of the mapping procedure is at least 150 times larger?

My Answer: Whether on purpose or by mistake, BigBrain publications and discussions always mix up two different issues: the resolution of the map and the resolution of the microscopy template. An uninterrupted 1-micrometre-resolution microscopy data set makes a lot of sense. It can be used to study cytoarchitecture in 3D, which was never done before. However, sharing these data was technically possible even 10 years ago, and it is even simpler now.

Still, this is always mixed up with a 1-micrometre-resolution map, which apparently still does not exist today. Given the estimates of the mapping procedure and the level of error of the 20-micrometre-resolution map, it hardly makes sense to build a map at 1-micrometre resolution. It makes even less sense to make the sharing of the 1-micrometre microscopy data dependent on the creation of a 1-micrometre-resolution cytoarchitectonic map. It looks as if the only reason for this dependency is to have a good excuse for not sharing the microscopy data. As I said: a good face in a bad play.

The next several questions are more political than scientific, so I will not analyse them and will jump directly to the last question.
