The story of Grey Level Index: forty years of misleading consistency?

D. Problematic statistical testing

The third problem is the statistical testing of the significance of squared Mahalanobis distance (SMD) maxima. Estimates of the Mahalanobis distance are plagued by numerous sources of error, especially when the parameters are strongly correlated. It is known that the magnitude of these errors may be acceptable or even insignificant if the number of parameters (p) is an order of magnitude smaller than the number of observations (n) (see, for example, an impressive work on this problem by A.D. Ker, “Stability of Mahalanobis distance”, Oxford, 2010). Unfortunately, this is not the case when GLI profiles are compared, where p (the number of features) is 10 and the final value of n (the block size) varies from 10 to 24. The use of Hotelling’s statistic is problematic as well, because it is also known for its notorious lack of robustness for correlated data. Numerous robust alternatives based on PCA/LDA, SVM and other informatics approaches have been developed during the last 20 years. However, none of them was adopted in the later works (at least ten, published from 1990 to 2015) in which the GLI method was described again and again by essentially the same authors, with one exception: the addition of the Bonferroni correction, mentioned in 2015.
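The effect of a small n relative to p can be illustrated with a short simulation. The Python sketch below is not the procedure used in the GLI papers; it assumes Gaussian features with a common pairwise correlation of 0.9, two blocks of n profiles each drawn from the same distribution, and a pooled sample covariance. The function names and all parameter values are hypothetical. It compares the SMD computed with the estimated covariance against the SMD computed with the true covariance.

```python
import numpy as np

def pooled_smd(block_a, block_b):
    """SMD between block means, using the pooled sample covariance."""
    n_a, n_b = len(block_a), len(block_b)
    diff = block_a.mean(axis=0) - block_b.mean(axis=0)
    cov_a = np.cov(block_a, rowvar=False)
    cov_b = np.cov(block_b, rowvar=False)
    pooled = ((n_a - 1) * cov_a + (n_b - 1) * cov_b) / (n_a + n_b - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

def simulate(n, p=10, rho=0.9, trials=500, seed=0):
    """Mean estimated SMD and mean reference SMD for two blocks with no true difference."""
    rng = np.random.default_rng(seed)
    # Assumed covariance: unit variances, correlation rho between every pair of features.
    true_cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    true_inv = np.linalg.inv(true_cov)
    estimated, reference = [], []
    for _ in range(trials):
        a = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
        b = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
        diff = a.mean(axis=0) - b.mean(axis=0)
        estimated.append(pooled_smd(a, b))
        reference.append(float(diff @ true_inv @ diff))
    return float(np.mean(estimated)), float(np.mean(reference))

for n in (10, 24, 100):
    est, ref = simulate(n)
    print(f"n={n:3d}  estimated/reference SMD ratio: {est / ref:.2f}")
```

If the estimate were reliable, the ratio would stay close to 1. Under these assumptions it is expected to sit well above 1 for n = 10 and to approach 1 only as n grows far beyond p.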

E. Observer-independence of GLI mapping technology

The fourth and most problematic claim is the observer-independence of the method. This issue requires a more elaborate explanation. Even in one of the most recent papers relevant to the subject (“Two New Cytoarchitectonic Areas on the Human Mid-Fusiform Gyrus”, S. Lorenz et al., Cerebral Cortex, 2015, 1–13) we read that the curvilinear trajectories outlining the first and the last layer of the cortical plate in the selected region have to be “interactively drawn”. So here we have an admission that the procedure is not entirely observer-independent. Moreover, the beginning and the end of the curvilinear band selected for profile measurements (its “left” and “right” edges) have to be selected manually as well. The reason is simple: as was demonstrated above, when measuring profiles we have to deal with the section orientation problem.

The profile of a cortical area can be reproducibly registered only in limited zones of the section, where the orientation of the cortical columns is clearly visible and parallel to the section plane. By a rough estimate, such regions usually cover only 50% of the cortical section or less. At present, in the absence of reliable algorithms, these regions can be selected only by visual analysis of the section. Additional criteria for such selection are clearly visible apical dendrites and uninterrupted sections of micro-vessels, visible through a significant portion (two to three layers) of the cortical traverse. In all other regions, where the orientation of the section plane does not match the direction of the columns, the position of the border has to be determined by extrapolating the boundaries established in “correctly oriented” sections, in other words, by pure guessing. Additionally, subtle uncontrolled changes in the orientation of the cortical plate in the section under study might create conditions for the registration of additional “artificial” borders.

The selection of the appropriate distance measure, the squared Euclidean distance versus the squared Mahalanobis distance, was also a matter of visual evaluation of the observed boundaries. The Mahalanobis distance was selected because it produced a better set of peaks, which matched the observed boundaries more reasonably (1998). The next admission of observer-dependence is the acknowledged necessity to check and reject peaks caused by staining and cutting artifacts, folds, ruptures, vessels, or “untypical cell clusters”. In other words, the absence of image processing that could eliminate or minimize the influence of all these problems leads to an additional step that requires visual control.
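To see why the choice of distance measure can change the set of peaks, consider a minimal Python sketch with two strongly correlated features. The covariance matrix and the two difference vectors are illustrative assumptions, not values taken from GLI profiles.

```python
import numpy as np

# Hypothetical covariance of two strongly correlated features (not GLI data).
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

def squared_euclidean(diff):
    return float(diff @ diff)

def squared_mahalanobis(diff, cov):
    # Solve cov @ x = diff instead of inverting cov explicitly.
    return float(diff @ np.linalg.solve(cov, diff))

along = np.array([1.0, 1.0])    # difference along the direction of correlation
across = np.array([1.0, -1.0])  # difference against the direction of correlation

for name, diff in (("along", along), ("across", across)):
    print(name, squared_euclidean(diff), round(squared_mahalanobis(diff, cov), 3))
```

Both differences have the same squared Euclidean distance (2), while the squared Mahalanobis distance is about 1.05 for the first and 20 for the second, so the two measures need not agree on which positions stand out.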

The significance of a peak is determined by a statistical test, but the number of peaks and their heights depend on the value of the “block size”, which is actually the width of the area over which profiles are averaged. For example, compare the number of peaks for n = 8 and n = 12 in Fig. 4 of the 1998 paper or Fig. 5 of the 1999 paper, which show the same pictures. The authors claim that the block size selection algorithm converges automatically. However, as we can see, this value is selected differently for different areas: 12 for areas 17 and 18 (1999); 10 for areas TE1.0, TE1.1 and TE2 (Fig. 7) in the 1999 paper, but 20 (Fig. 1) for the same set of areas in the 2006 paper; 22 for areas FG3 and FG4 (2015); 24 for areas hOc3d, hOc4d and hOc4ip (2013); and so on. This detail was not explained, especially the discrepancy between the earlier and later results for the temporal areas. However, it would be reasonable to expect that, owing to the lack of robustness of Hotelling’s test, “automatic convergence of the algorithm” does not always happen. So, in order to determine what block width “makes sense”, one has to check the number and positions of the peaks against the expected, visually detectable signs of architectonic differences between a certain set of areas. The peak filtering procedure becomes even more complicated considering the additional step of “connecting” corresponding peaks on consecutive serial sections. And this is exactly what can be found in the 2013 paper by M. Kujovic et al. (“Cytoarchitectonic mapping of human dorsal extrastriate cortex”, Brain Struct Funct, 2013, 218:157-172): “The position of maxima were accepted as borders if found at comparable positions in neighboring sections and confirmed by visual inspection of cell-stained sections”. The same was reiterated in a later paper: “detection of a border region was reconfirmed by visual inspection” (S. Lorenz et al., “Two New Cytoarchitectonic Areas on the Human Mid-Fusiform Gyrus”, Cerebral Cortex, 2015, 1–13).
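The dependence of the peak set on the block size can be explored with a small simulation. The Python sketch below is not the authors' implementation. It assumes a synthetic ribbon of 200 profiles with 10 uncorrelated Gaussian features and a single true border in the middle, a pooled-covariance SMD between adjacent blocks, and a hypothetical fixed threshold in place of the significance test. All function names and parameter values are illustrative.

```python
import numpy as np

def distance_curve(features, block):
    """SMD between the means of adjacent blocks of `block` profiles, slid along the ribbon.

    Needs 2 * block - 2 >= number of features, otherwise the pooled covariance is singular.
    """
    curve = np.empty(len(features) - 2 * block + 1)
    for i in range(len(curve)):
        left = features[i:i + block]
        right = features[i + block:i + 2 * block]
        diff = left.mean(axis=0) - right.mean(axis=0)
        # Equal block sizes, so the pooled covariance is the plain average.
        pooled = (np.cov(left, rowvar=False) + np.cov(right, rowvar=False)) / 2
        curve[i] = diff @ np.linalg.solve(pooled, diff)
    return curve

def local_maxima(curve, threshold):
    """Indices of strict local maxima above a hypothetical fixed threshold."""
    inner = curve[1:-1]
    is_peak = (inner > curve[:-2]) & (inner > curve[2:]) & (inner > threshold)
    return np.flatnonzero(is_peak) + 1

def synthetic_ribbon(n_profiles=200, p=10, shift=3.0, seed=1):
    """Assumed test data: unit-variance noise with one step border at n_profiles // 2."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_profiles, p))
    features[n_profiles // 2:] += shift
    return features

ribbon = synthetic_ribbon()
for block in (8, 12, 24):
    curve = distance_curve(ribbon, block)
    peaks = local_maxima(curve, threshold=30.0)
    # Curve index i corresponds to a candidate border before profile i + block.
    print(f"block={block:2d}  maxima above threshold: {len(peaks):2d}  "
          f"highest maximum at profile {int(np.argmax(curve)) + block}")
```

Printing the number of retained maxima for several block sizes on the same data makes it easy to check how strongly the answer depends on that single parameter.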

Simply put, the observer dependence is shifted from the visual determination of the border position to the peak filtering procedure and, quite possibly, even to the selection of the “correct” size of the averaging block. Considering all these details, the procedure looks neither robust nor automatic. So, in view of the numerous steps that clearly require manual interaction and visual checking, the claim of observer-independence of this technology seems significantly exaggerated, to say the least.
