You may have heard of the 0-index vs 1-index debates. This is about that.
Suppose you have a sequence of things, like 1,2,3,4 and you want thing number three. Easy enough, in some programming languages – like R, MATLAB, or FORTRAN – you just do array.
In most other programming languages that begin indexing at 0, you would have to do array which strikes me as unintuitive and weird. Why would anyone design a language like that?
Well, the why goes that somehow it began because
The TL;DR is that Hoye says most programmers think C has zero-indexed arrays because the creator of C chose zero for the run-time efficiency of pointer and array arithmetic. Hoye says this is wrong; he claims that C uses zero-indexed arrays because its predecessor BCPL used them, and that BCPL chose zero-indexed arrays for compile-time efficiency. (I’m not convinced by his technical arguments, by the way, but that’s beside the point I’d like to make.)
But that’s just an historical argument. Are there theoretical reasons to prefer 0-indexing over 1-indexing?
Edsger Dijkstra, in 1982 wrote a small piece explaining why numbering should start at zero, which I suggest reading.
The issues with his article are manyfold:
To denote the subsequence of natural numbers 2, 3, …, 12 without the pernicious three dots, four conventions are open to us
a) 2 ≤ i < 13 b) 1 < i ≤ 12 c) 2 ≤ i ≤ 12 d) 1 < i < 13
Are there reasons to prefer one convention to the other? Yes, there are. The observation that conventions a) and b) have the advantage that the difference between the bounds as mentioned equals the length of the subsequence is valid. So is the observation that, as a consequence, in either convention two subsequences are adjacent means that the upper bound of the one equals the lower bound of the other. Valid as these observations are, they don’t enable us to choose between a) and b); so let us start afresh.
Yeah, that is fairly true, but there are two problems.
The first one is that there are arguments to prefer c) and d), namely that the numbers that form the array appear in its definition.
Second, that he is not considering all the options. The one I would favour is actually 2<=i<=12, a closed interval.
There is a smallest natural number. Exclusion of the lower bound —as in b) and d)— forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of the unnatural numbers. That is ugly, so for the lower bound we prefer the ≤ as in a) and c). Consider now the subsequences starting at the smallest natural number: inclusion of the upper bound would then force the latter to be unnatural by the time the sequence has shrunk to the empty one. That is ugly, so for the upper bound we prefer < as in a) and d). We conclude that convention a) is to be preferred.
What is the smallest natural number? Some people begin to count at 0, others at 1 (I’m in this camp). What he says here is that if you go with b) or d) and want to refer to the sequence 0,1, you would have to write -1<i . But if one takes the naturals to begin at 1, the argument has no force. For the case of the empty sequence, it is unclear what he means. If we take an empty sequence, then there is no need to refer to it. In an empty sequence you can’y do array[whatever] because there isn’t anything.
When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at zero: an element’s ordinal (subscript) equals the number of elements preceding it in the sequence. And the moral of the story is that we had better regard —after all those centuries!— zero as a most natural number.
And if you consider the closed interval, then element N is at position N.
So much for Dijkstra. But there are more arguments involved, revolving generally around the “silly 1s”. Imagine a variable like a=[1,2,3,4]
- Access. In accessing it, 0-indexing requires you to add one to your index, a[2+1]
- Pointers. To access the memory address of the first element, one would have to do *a in 0-based and *a in 1-based indexing. In C (0-indexed), if you want to get element x of an array, given that each element occupies M in memory, you would do *(a+x*M). With 1-indexing, you would do *(a+(x-1)*M). This is because the way C works. You could also have designed a pointer arithmetic system where the pointer doesn’t point to elements but to the container, or what surrounds them (the array). If elements are bricks inside a box, the pointer could point to the gaps between boxes. And so *(a+1) would again be a reference to the first element.
- Slicing. Imagine that you want to get the two elements in the middle. There are of course different conventions for slicing (and many of them are silly, inferior to closed intervals), so it is argued that with 0-index, if one does b= a[start:end], then b[x]=a[start+x], while one would need to substract 1 for 1-indexing, a[start+x-1]. But this is not the case. It is the case if the way slicing works in the programming language under consideration has slicing methods like Dijkstra suggests. In C (and for that matter, python), doing a[0:1] in our abovementioned array does not give us [1,2]. It gives us . If a programming language is sensible (R is a good example for this specific case) and a[1,2]=[1,2] then this objection fails.
- Matrix operations. If you have a matrix and flatten it to an array, you can get element i,j in 0-indexing by matrix[i][j]=arr[i*ncols+j]. In 1-indexing, it is matrix[i][j]=arr[i*(ncols-1)+j]. Not as straightforward, true but also not something you would usually need to do these days (Where you have libraries that do the abstraction for you!)
Does this really matter? Nah, not really. I use both Python and R at work and I don’t get my 1s confused. It’s more of a matter of aesthetic preference. In practice, if the language is well designed you won’t have to do any weird addition or substraction of 1, usually. As an example, in Python to get the last element of an array you have to do arr[len(arr)-1], but you could use the notation arr[-1].
You can also get deep and get the argument to the philosophy of mathematics and set theory, it all boils down to your relative sympathy for 0 vs 1 as a starter for the naturals.