mirror of git://git.code.sf.net/p/zsh/code synced 2025-09-30 19:20:53 +02:00

23562: add KSH_ZERO_SUBSCRIPT option and leave off by default

Peter Stephenson 2007-06-18 13:25:03 +00:00
parent 5c44b0a472
commit abae4fe16e
17 changed files with 225 additions and 75 deletions

@ -1,3 +1,17 @@
2007-06-18 Peter Stephenson <pws@csr.com>
* 23562: README, Doc/Zsh/options.yo, Doc/Zsh/params.yo,
Src/exec.c, Src/glob.c, Src/options.c, Src/params.c, Src/subst.c,
Src/zsh.h, Src/Modules/mapfile.c, Src/Modules/parameter.c,
Src/Zle/complete.c, Test/C01arith.ztst, Test/D05array.ztst,
Test/D06subscript.ztst, Test/D07multibyte.ztst,
Test/E01options.ztst: add KSH_ZERO_SUBSCRIPT option to handle
the currently default behaviour that $array[0] or $string[0]
is mapped to $array[1] or $string[1] if KSH_ARRAYS is not
in effect. Now off by default; returns empty element or
causes error if used for setting. Valid ranges that happen
to include zero are not affected.
2007-06-16 Peter Stephenson <p.w.stephenson@ntlworld.com>
* unposted: Test/D07multibyte.ztst: stop test files from

@ -1244,6 +1244,31 @@ tt(readonly), are processed. Without this option, zsh will perform normal
word splitting after command and parameter expansion in arguments of an
assignment; with it, word splitting does not take place in those cases.
)
pindex(KSH_ZERO_SUBSCRIPT)
cindex(arrays, behaviour of index zero)
item(tt(KSH_ZERO_SUBSCRIPT))(
Treat use of a subscript of value zero in array or string expressions as a
reference to the first element, i.e. the element that usually has the
subscript 1. Ignored if tt(KSH_ARRAYS) is also set.
If neither this option nor tt(KSH_ARRAYS) is set, accesses to an element of
an array or string with subscript zero return an empty element or string,
while attempts to set element zero of an array or string are treated as an
error. However, attempts to set an otherwise valid subscript range that
includes zero will succeed. For example, if tt(KSH_ZERO_SUBSCRIPT) is not
set,
example(array[0]=(element))
is an error, while
example(array[0,1]=(element))
is not and will replace the first element of the array.
This option is for compatibility with older versions of the shell and
is not recommended in new code.
)
pindex(POSIX_BUILTINS)
item(tt(POSIX_BUILTINS) <K> <S>)(
When this option is set the tt(command) builtin can be used to execute

@ -96,6 +96,14 @@ Subscripts may be used inside braces used to delimit a parameter name, thus
option is set, the braced form is the only one that works, as bracketed
expressions otherwise are not treated as subscripts.
If the tt(KSH_ARRAYS) option is not set, then by default accesses to
an array element with a subscript that evaluates to zero return an
empty string, while an attempt to write such an element is treated as
an error. For backward compatibility the tt(KSH_ZERO_SUBSCRIPT)
option can be set to cause subscript values 0 and 1 to be equivalent; see
the description of the option in ifzman(zmanref(zshoptions))\
ifnzman(noderef(Description of Options)).
The same subscripting syntax is used for associative arrays, except that
no arithmetic expansion is applied to var(exp). However, the parsing
rules for arithmetic expressions still apply, which affects the way that
@ -233,26 +241,22 @@ print ${array[(R)$key2]})
item(tt(R))(
Like `tt(r)', but gives the last match. For associative arrays, gives
all possible matches. May be used for assigning to ordinary array
elements, but not for assigning to associative arrays.
On failure the empty string is returned.
elements, but not for assigning to associative arrays. On failure, for
normal arrays this has the effect of returning the element corresponding to
subscript 0; this is empty unless one of the options tt(KSH_ARRAYS) or
tt(KSH_ZERO_SUBSCRIPT) is in effect.
)
item(tt(i))(
Like `tt(r)', but gives the index of the match instead; this may not be
combined with a second argument. On the left side of an assignment,
behaves like `tt(r)'. For associative arrays, the key part of each pair
is compared to the pattern, and the first matching key found is the
result.
On failure, a value one past the end of the array or string is returned.
result. On failure substitutes one more than the last currently
valid index, as discussed under the description of `tt(r)'.
)
item(tt(I))(
Like `tt(i)', but gives the index of the last match, or all possible
matching keys in an associative array.
On failure the value 0 is returned. If the option tt(KSH_ARRAYS) is in
effect, the subscript is still 0 for a failed match; this cannot be
distinguished from a successful match without testing tt(${array[0]})
against the pattern.
matching keys in an associative array. On failure substitutes 0.
)
item(tt(k))(
If used in a subscript on an associative array, this flag causes the keys

README

@ -45,6 +45,31 @@ behaviour.) Now it is treated identically to "$@". The same change
applies to expressions with forced splitting such as ${=1+"$@"}, but
otherwise the case where SH_WORD_SPLIT is not set is unaffected.
In previous versions of the shell it was possible to use index 0 in an
array or string subscript to refer to the same element as index 1 if the
option KSH_ARRAYS was not in effect. This was a limited approximation to
the full KSH_ARRAYS handling and so was not very useful. In this version
of the shell, this behaviour is only provided when the option
KSH_ZERO_SUBSCRIPT is set. Note that despite the name this does not provide
true compatibility with ksh or other shells and KSH_ARRAYS should still be
used for that purpose. By default, the option is not set; an array
subscript that evaluates to 0 returns an empty string or array element and
attempts to write to an array or string range including only a zero
subscript are treated as an error. Writes to otherwise valid ranges that
also include index zero are allowed; hence for example the assignment
array[(R)notfound,(r)notfound]=()
(where the string "notfound" does not match an element in $array) sets the
entire array to be empty, as in previous versions of the shell.
KSH_ZERO_SUBSCRIPT is irrelevant when KSH_ARRAYS is set. Also as in previous
versions, attempts to write to non-existent elements at the end of an array
cause the array to be suitably extended. This difference means that, for
example
array[(R)notfound]=(replacement)
is an error if KSH_ZERO_SUBSCRIPT is not set (new behaviour), while
array[(r)notfound]=(replacement)
causes the given value to be appended to the array (same behaviour as
previous versions).
The "exec" precommand modifier now takes various options for compatibility
with other shells. This means that whereas "exec -prog" previously
tried to execute a command name "-prog", it will now report an error
@ -77,10 +102,6 @@ of the value. The form ${param//#$search/replace} where the value
$search starts with "%" considers the "%" to be part of the search
string as before.
Parameter subscripts of the form ${array[(R)test]} now return the
empty string if they fail to match. The previous longstanding behaviour
was confusing and useless.
The MULTIBYTE option is on by default where it is available; this
causes many operations to recognise characters as in the current locale.
Older versions of the shell always assumed a character was one byte.

@ -149,7 +149,7 @@ setpmmapfiles(Param pm, HashTable ht)
for (hn = ht->nodes[i]; hn; hn = hn->next) {
struct value v;
v.isarr = v.inv = v.start = 0;
v.isarr = v.flags = v.start = 0;
v.end = -1;
v.arr = NULL;
v.pm = (Param) hn;

@ -180,7 +180,7 @@ setpmcommands(UNUSED(Param pm), HashTable ht)
Cmdnam cn = zshcalloc(sizeof(*cn));
struct value v;
v.isarr = v.inv = v.start = 0;
v.isarr = v.flags = v.start = 0;
v.end = -1;
v.arr = NULL;
v.pm = (Param) hn;
@ -341,7 +341,7 @@ setfunctions(UNUSED(Param pm), HashTable ht, int dis)
for (hn = ht->nodes[i]; hn; hn = hn->next) {
struct value v;
v.isarr = v.inv = v.start = 0;
v.isarr = v.flags = v.start = 0;
v.end = -1;
v.arr = NULL;
v.pm = (Param) hn;
@ -701,7 +701,7 @@ setpmoptions(UNUSED(Param pm), HashTable ht)
struct value v;
char *val;
v.isarr = v.inv = v.start = 0;
v.isarr = v.flags = v.start = 0;
v.end = -1;
v.arr = NULL;
v.pm = (Param) hn;
@ -1325,7 +1325,7 @@ setpmnameddirs(UNUSED(Param pm), HashTable ht)
struct value v;
char *val;
v.isarr = v.inv = v.start = 0;
v.isarr = v.flags = v.start = 0;
v.end = -1;
v.arr = NULL;
v.pm = (Param) hn;
@ -1554,7 +1554,7 @@ setaliases(HashTable alht, UNUSED(Param pm), HashTable ht, int flags)
struct value v;
char *val;
v.isarr = v.inv = v.start = 0;
v.isarr = v.flags = v.start = 0;
v.end = -1;
v.arr = NULL;
v.pm = (Param) hn;

@ -1139,7 +1139,7 @@ set_compstate(UNUSED(Param pm), HashTable ht)
for (cp = compkparams,
pp = compkpms; cp->name; cp++, pp++)
if (!strcmp(hn->nam, cp->name)) {
v.isarr = v.inv = v.start = 0;
v.isarr = v.flags = v.start = 0;
v.end = -1;
v.arr = NULL;
v.pm = (Param) hn;

@ -1470,7 +1470,7 @@ zglob(LinkList list, LinkNode np, int nountok)
v.isarr = SCANPM_WANTVALS;
v.pm = NULL;
v.end = -1;
v.inv = 0;
v.flags = 0;
if (getindex(&s, &v, 0) || s == os) {
zerr("invalid subscript");
restore_globstate(saved);

@ -153,9 +153,10 @@ static struct optname optns[] = {
{{NULL, "interactivecomments",OPT_BOURNE}, INTERACTIVECOMMENTS},
{{NULL, "ksharrays", OPT_EMULATE|OPT_BOURNE}, KSHARRAYS},
{{NULL, "kshautoload", OPT_EMULATE|OPT_BOURNE}, KSHAUTOLOAD},
{{NULL, "kshglob", OPT_EMULATE|OPT_KSH}, KSHGLOB},
{{NULL, "kshglob", OPT_EMULATE|OPT_KSH}, KSHGLOB},
{{NULL, "kshoptionprint", OPT_EMULATE|OPT_KSH}, KSHOPTIONPRINT},
{{NULL, "kshtypeset", OPT_EMULATE|OPT_KSH}, KSHTYPESET},
{{NULL, "kshtypeset", OPT_EMULATE|OPT_KSH}, KSHTYPESET},
{{NULL, "kshzerosubscript", 0}, KSHZEROSUBSCRIPT},
{{NULL, "listambiguous", OPT_ALL}, LISTAMBIGUOUS},
{{NULL, "listbeep", OPT_ALL}, LISTBEEP},
{{NULL, "listpacked", 0}, LISTPACKED},

@ -520,7 +520,7 @@ scanparamvals(HashNode hn, int flags)
return;
}
v.isarr = (PM_TYPE(v.pm->node.flags) & (PM_ARRAY|PM_HASHED));
v.inv = 0;
v.flags = 0;
v.start = 0;
v.end = -1;
paramvals[numparamvals] = getstrvalue(&v);
@ -1298,7 +1298,7 @@ getarg(char **str, int *inv, Value v, int a2, zlong *w,
(*ta || ((v->isarr & SCANPM_MATCHMANY) &&
(v->isarr & (SCANPM_MATCHKEY | SCANPM_MATCHVAL |
SCANPM_KEYMATCH))))) {
*inv = v->inv;
*inv = (v->flags & VALFLAG_INV) ? 1 : 0;
*w = v->end;
return 1;
}
@ -1317,19 +1317,6 @@ getarg(char **str, int *inv, Value v, int a2, zlong *w,
if (pprog && pattry(pprog, *p) && !--num)
return r;
}
/*
* Failed to match.
* If we're returning an index, return 0 to show
* we've gone off the start. Unfortunately this
* is ambiguous with KSH_ARRAYS set, but we're
* stuck with that now.
*
* If the index is to be turned into an element,
* return an index that does not point to a valid
* element (since 0 is treated the same as 1).
*/
if (!ind)
r = len + 1;
} else
for (r = 1 + beg, p = ta + beg; *p; r++, p++)
if (pprog && pattry(pprog, *p) && !--num)
@ -1549,13 +1536,7 @@ getarg(char **str, int *inv, Value v, int a2, zlong *w,
}
}
}
/*
* Failed to match.
* If the argument selects an element rather than
* its index, ensure the element is empty.
* See comments on the array case above.
*/
return (down && ind) ? 0 : slen + 1;
return down ? 0 : slen + 1;
}
}
return r;
@ -1563,13 +1544,14 @@ getarg(char **str, int *inv, Value v, int a2, zlong *w,
/**/
int
getindex(char **pptr, Value v, int dq)
getindex(char **pptr, Value v, int flags)
{
int start, end, inv = 0;
char *s = *pptr, *tbrack;
*s++ = '[';
s = parse_subscript(s, dq); /* Error handled after untokenizing */
/* Error handled after untokenizing */
s = parse_subscript(s, flags & SCANPM_DQUOTED);
/* Now we untokenize everything except inull() markers so we can check *
* for the '*' and '@' special subscripts. The inull()s are removed *
* in getarg() after we know whether we're doing reverse indexing. */
@ -1654,7 +1636,7 @@ getindex(char **pptr, Value v, int dq)
if (start > 0 && (isset(KSHARRAYS) || (v->pm->node.flags & PM_HASHED)))
start--;
if (v->isarr != SCANPM_WANTINDEX) {
v->inv = 1;
v->flags |= VALFLAG_INV;
v->isarr = 0;
v->start = start;
v->end = start + 1;
@ -1686,7 +1668,32 @@ getindex(char **pptr, Value v, int dq)
if (start > 0)
start -= startprevlen;
else if (start == 0 && end == 0)
end = startnextlen;
{
/*
* Strictly, this range is entirely off the
* start of the available index range.
* This can't happen with KSH_ARRAYS; we already
* altered the start index in getarg().
* Are we being strict?
*/
if (isset(KSHZEROSUBSCRIPT)) {
/*
* We're not.
* Treat this as accessing the first element of the
* array.
*/
end = startnextlen;
} else {
/*
* We are. Flag that this range is invalid
* for setting elements. Set the indexes
* to a range that returns empty for other accesses.
*/
v->flags |= VALFLAG_EMPTY;
start = -1;
com = 1;
}
}
if (s == tbrack) {
s++;
if (v->isarr && !com &&
@ -1755,7 +1762,7 @@ fetchvalue(Value v, char **pptr, int bracks, int flags)
else
v = (Value) hcalloc(sizeof *v);
v->pm = argvparam;
v->inv = 0;
v->flags = 0;
v->start = ppar - 1;
v->end = ppar;
if (sav)
@ -1786,11 +1793,11 @@ fetchvalue(Value v, char **pptr, int bracks, int flags)
v->isarr = SCANPM_MATCHMANY;
}
v->pm = pm;
v->inv = 0;
v->flags = 0;
v->start = 0;
v->end = -1;
if (bracks > 0 && (*s == '[' || *s == Inbrack)) {
if (getindex(&s, v, (flags & SCANPM_DQUOTED))) {
if (getindex(&s, v, flags)) {
*pptr = s;
return v;
}
@ -1830,7 +1837,7 @@ getstrvalue(Value v)
if (!v)
return hcalloc(1);
if (v->inv && !(v->pm->node.flags & PM_HASHED)) {
if ((v->flags & VALFLAG_INV) && !(v->pm->node.flags & PM_HASHED)) {
sprintf(buf, "%d", v->start);
s = dupstring(buf);
return s;
@ -1911,7 +1918,7 @@ getarrvalue(Value v)
return arrdup(nular);
else if (IS_UNSET_VALUE(v))
return arrdup(&nular[1]);
if (v->inv) {
if (v->flags & VALFLAG_INV) {
char buf[DIGBUFSIZE];
s = arrdup(nular);
@ -1943,7 +1950,7 @@ getintvalue(Value v)
{
if (!v)
return 0;
if (v->inv)
if (v->flags & VALFLAG_INV)
return v->start;
if (v->isarr) {
char **arr = getarrvalue(v);
@ -1970,7 +1977,7 @@ getnumvalue(Value v)
if (!v) {
mn.u.l = 0;
} else if (v->inv) {
} else if (v->flags & VALFLAG_INV) {
mn.u.l = v->start;
} else if (v->isarr) {
char **arr = getarrvalue(v);
@ -2000,7 +2007,7 @@ export_param(Param pm)
if (emulation == EMULATE_KSH /* isset(KSHARRAYS) */) {
struct value v;
v.isarr = 1;
v.inv = 0;
v.flags = 0;
v.start = 0;
v.end = -1;
val = getstrvalue(&v);
@ -2037,6 +2044,11 @@ setstrvalue(Value v, char *val)
zsfree(val);
return;
}
if (v->flags & VALFLAG_EMPTY) {
zerr("%s: assignment to invalid subscript range", v->pm->node.nam);
zsfree(val);
return;
}
v->pm->node.flags &= ~PM_UNSET;
switch (PM_TYPE(v->pm->node.flags)) {
case PM_SCALAR:
@ -2051,7 +2063,7 @@ setstrvalue(Value v, char *val)
z = dupstring(v->pm->gsu.s->getfn(v->pm));
zlen = strlen(z);
if (v->inv && unset(KSHARRAYS))
if ((v->flags & VALFLAG_INV) && unset(KSHARRAYS))
v->start--, v->end--;
if (v->start < 0) {
v->start += zlen;
@ -2176,6 +2188,11 @@ setarrvalue(Value v, char **val)
v->pm->node.nam);
return;
}
if (v->flags & VALFLAG_EMPTY) {
zerr("%s: assignment to invalid subscript range", v->pm->node.nam);
freearray(val);
return;
}
if (v->start == 0 && v->end == -1) {
if (PM_TYPE(v->pm->node.flags) == PM_HASHED)
arrhashsetfn(v->pm, val, 0);
@ -2194,7 +2211,7 @@ setarrvalue(Value v, char **val)
v->pm->node.nam);
return;
}
if (v->inv && unset(KSHARRAYS)) {
if ((v->flags & VALFLAG_INV) && unset(KSHARRAYS)) {
if (v->start > 0)
v->start--;
v->end--;

@ -2008,7 +2008,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int ssub)
v->isarr = isarr;
v->pm = pm;
v->end = -1;
if (getindex(&s, v, qt) || s == os)
if (getindex(&s, v, qt ? SCANPM_DQUOTED : 0) || s == os)
break;
}
/*
@ -2025,8 +2025,11 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int ssub)
* in the subexp stuff or immediately above.
*/
if ((isarr = v->isarr)) {
/* No way to get here with v->inv != 0, so getvaluearr() *
* is called by getarrvalue(); needn't test PM_HASHED. */
/*
* No way to get here with v->flags & VALFLAG_INV, so
* getvaluearr() is called by getarrvalue(); needn't test
* PM_HASHED.
*/
if (v->isarr == SCANPM_WANTINDEX) {
isarr = v->isarr = 0;
val = dupstring(v->pm->node.nam);
@ -2048,8 +2051,9 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int ssub)
int tmplen = arrlen(v->pm->gsu.a->getfn(v->pm));
if (v->start < 0)
v->start += tmplen + v->inv;
if (!v->inv && (v->start >= tmplen || v->start < 0))
v->start += tmplen + ((v->flags & VALFLAG_INV) ? 1 : 0);
if (!(v->flags & VALFLAG_INV) &&
(v->start >= tmplen || v->start < 0))
vunset = 1;
}
if (!vunset) {

@ -585,12 +585,17 @@ struct asgment {
struct value {
int isarr;
Param pm; /* parameter node */
int inv; /* should we return the index ? */
int flags; /* flags defined below */
int start; /* first element of array slice, or -1 */
int end; /* 1-rel last element of array slice, or -1 */
char **arr; /* cache for hash turned into array */
};
enum {
VALFLAG_INV = 0x0001, /* We are performing inverse subscripting */
VALFLAG_EMPTY = 0x0002 /* Subscripted range is empty */
};
#define MAX_ARRLEN 262144
/********************************************/
@ -1725,6 +1730,7 @@ enum {
KSHGLOB,
KSHOPTIONPRINT,
KSHTYPESET,
KSHZEROSUBSCRIPT,
LISTAMBIGUOUS,
LISTBEEP,
LISTPACKED,

@ -91,8 +91,13 @@
>3.5
>4
(( newarray[unsetvar]++ ))
(( newarray[unsetvar]++ ))
(( newarray[unsetvar] = 1 ))
2:error using unset variable as index
?(eval):1: assignment to invalid subscript range
integer setvar=1
(( newarray[setvar]++ ))
(( newarray[setvar]++ ))
print ${(t)newarray} ${#newarray} ${newarray[1]}
0:setting array elements in math context
>array 1 2

@ -30,12 +30,12 @@
>..
echo .$foo[0].
0:Treat 0 like 1
>.a.
0:Treat 0 as empty
>..
echo .$foo[0,0].
0:Treat 0,0 like 1,1.
>.a.
0:Treat 0,0 as empty
>..
echo .$foo[0,1].
0:Another weird way to access the first element

@ -182,3 +182,56 @@
echo X${${l##*}[-1]}X
0:Negative index applied to substition result from empty array
>XX
array=(one two three four)
print X$array[0]X
0:Element zero is empty if KSH_ZERO_SUBSCRIPT is off.
>XX
array[0]=fumble
1:Can't set element zero if KSH_ZERO_SUBSCRIPT is off.
?(eval):1: array: assignment to invalid subscript range
print X$array[(R)notfound]X
0:(R) returns empty if not found if KSH_ZERO_SUBSCRIPT is off.
>XX
setopt KSH_ZERO_SUBSCRIPT
print X$array[0]X
0:Element zero is element one if KSH_ZERO_SUBSCRIPT is on.
>XoneX
array[0]=fimble
print $array
0:Can set element zero if KSH_ZERO_SUBSCRIPT is on.
>fimble two three four
print X$array[(R)notfound]X
0:(R) yuckily returns the first element on failure with KSH_ZERO_SUBSCRIPT
>XfimbleX
unsetopt KSH_ZERO_SUBSCRIPT
array[(R)notfound,(r)notfound]=(help help here come the seventies retreads)
print $array
0:[(R)notfound,(r)notfound] replaces the whole array
>help help here come the seventies retreads
string="Why, if it isn't Officer Dibble"
print "[${string[0]}][${string[1]}][${string[0,3]}]"
0:String subscripts with KSH_ZERO_SUBSCRIPT unset
>[][W][Why]
setopt KSH_ZERO_SUBSCRIPT
print "[${string[0]}][${string[1]}][${string[0,3]}]"
0:String subscripts with KSH_ZERO_SUBSCRIPT set
>[W][W][Why]
unsetopt KSH_ZERO_SUBSCRIPT
string[0,3]="Goodness"
print $string
0:Assignment to chunk of string ignores element 0
>Goodness, if it isn't Officer Dibble
string[0]=!
1:Can't set only element zero of string
?(eval):1: string: assignment to invalid subscript range

@ -88,7 +88,7 @@
s=é
print A${s[-2]}A B${s[-1]}B C${s[0]}C D${s[1]}D E${s[2]}E
0:Out of range subscripts with multibyte characters
>AA BéB CéC DéD EE
>AA BéB CC DéD EE
print ${a[(i)é]} ${a[(I)é]} ${a[${a[(i)é]},${a[(I)é]}]}
0:Reverse indexing with multibyte characters

@ -521,7 +521,7 @@
>one one[2]
>one two three
>one two three two
>one one two three
>one two three
fpath=(.)
echo >foo 'echo foo loaded; foo() { echo foo run; }'