Structure and general syntax
A NetRexx program is built up out of a series of clauses that are composed of: zero or more blanks (which are ignored); a sequence of tokens (described in this section); zero or more blanks (again ignored); and the delimiter ';' (semicolon) which may be implied by line-ends or certain keywords. Conceptually, each clause is scanned from left to right before execution and the tokens composing it are resolved.
Identifiers (known as symbols) and numbers are recognized at this stage, comments (described below) are removed, and multiple blanks (except within literal strings) are reduced to single blanks. Blanks adjacent to operator characters and special characters are also removed.
Blanks and White Space
Blanks (spaces) may be freely used in a program to improve appearance and layout, and most are ignored. Blanks, however, are usually significant:
- within literal strings (see below)
- between two tokens that are not special characters (for example, between two symbols or keywords)
- between the two characters forming a comment delimiter
- immediately outside parentheses ('(' and ')') or brackets ('[' and ']').
For implementations that support tabulation (tab) and form feed characters, these characters outside of literal strings are treated as if they were a single blank; similarly, if the last character in a NetRexx program is the End-of-file character (EOF, encoded in ASCII as decimal 26), that character is ignored.
Comments
Commentary is included in a NetRexx program by means of comments. Two forms of comment notation are provided; line comments are ended by the end of the line on which they start, and block comments are typically used for more extensive commentary.
- Line comments
A line comment is started by a sequence of two adjacent hyphens ('--'); all characters following that sequence up to the end of the line are then ignored by the NetRexx language processor.
Example:
i=j+7 -- this line comment follows an assignment
- Block comments
A block comment is started by the sequence of characters '/*', and is ended by the same sequence reversed, '*/'. Within these delimiters any characters are allowed (including quotes, which need not be paired). Block comments may be nested, which is to say that '/*' and '*/' must pair correctly. Block comments may be anywhere, and may be of any length. When a block comment is found, it is treated as though it were a blank (which may then be removed, if adjacent to a special character).
Example:
/* This is a valid block comment */
The two characters forming a comment delimiter ('/*' or '*/') must be adjacent (that is, they may not be separated by blanks or a line-end).
Note: It is recommended that NetRexx programs start with a block comment that describes the program. Not only is this good programming practice, but some implementations may use this to distinguish NetRexx programs from other languages.
Implementation minimum: Implementations should support nested block comments to a depth of at least 10. The length of a comment should not be restricted, in that it should be possible to 'comment out' an entire program.
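The comment rules above can be modeled with a small scanner. The following Python sketch is illustrative only and is not part of any NetRexx implementation; the function name strip_comments is hypothetical. It assumes that comment delimiters inside literal strings are not treated as comments, and that a backslash inside a literal string escapes the next character.

```python
def strip_comments(source):
    """Remove NetRexx-style comments from source text (illustrative model).

    Line comments ('--' to end of line) are dropped; each outermost block
    comment is replaced by a single blank. Block comments may nest.
    """
    out = []
    i = 0
    depth = 0      # current block-comment nesting depth
    quote = None   # the active literal-string delimiter, if any
    n = len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if depth > 0:
            # Inside a block comment: only the delimiters matter.
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
                if depth == 0:
                    out.append(" ")  # the whole comment acts as one blank
            else:
                i += 1
        elif quote is not None:
            # Inside a literal string: copy characters unchanged.
            if ch == "\\" and i + 1 < n:
                out.append(two)      # keep an escape sequence intact
                i += 2
                continue
            out.append(ch)
            if ch == quote:
                quote = None
            i += 1
        elif ch in "'\"":
            quote = ch
            out.append(ch)
            i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "--":
            # Line comment: skip up to, but not including, the line end.
            while i < n and source[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    if depth:
        raise ValueError("unterminated block comment")
    return "".join(out)
```

For example, strip_comments("a/* outer /* inner */ still outer */b") yields "a b", because the nested pair is consumed before the outer comment closes.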
Tokens
The essential components of clauses are called tokens. These may be of any length, unless limited by implementation restrictions,[1] and are separated by blanks, comments, ends of lines, or by the nature of the tokens themselves.
The tokens are:
- Literal strings
A sequence including any characters that can safely be represented in an implementation[2] and delimited by the single quote character (') or the double-quote ("). Use "" to include a " in a literal string delimited by ", and similarly use two single quotes to include a single quote in a literal string delimited by single quotes. A literal string is a constant and its contents will never be modified by NetRexx. Literal strings must be complete on a single line (this means that unmatched quotes may be detected on the line that they occur).
Any string with no characters (i.e., a string of length 0) is called a null string.
Examples:
'Fred'
'Aÿ'
"Don't Panic!"
":x"
'You shouldn''t' /* Same as "You shouldn't" */
'' /* A null string */
Within literal strings, characters that cannot safely or easily be represented (for example 'control characters') may be introduced using an escape sequence. An escape sequence starts with a backslash ('\'), which must then be followed immediately by one of the following (letters may be in either uppercase or lowercase):
- t
- the escape sequence represents a tabulation (tab) character
- n
- the escape sequence represents a new-line (line feed) character
- r
- the escape sequence represents a return (carriage return) character
- f
- the escape sequence represents a form-feed character
- "
- the escape sequence represents a double-quote character
- '
- the escape sequence represents a single-quote character
- \
- the escape sequence represents a backslash character
- -
- the escape sequence represents a 'null' character (the character whose encoding equals zero), used to indicate continuation in a say instruction
- 0
- (zero) the escape sequence represents a 'null' character (the character whose encoding equals zero); an alternative to \-
- xhh
- the escape sequence represents a character whose encoding is given by the two hexadecimal digits ('hh') following the 'x'. If the character encoding for the implementation requires more than two hexadecimal digits, they are padded with zero digits on the left.
- uhhhh
- the escape sequence represents a character whose encoding is given by the four hexadecimal digits ('hhhh') following the 'u'. It is an error to use this escape if the character encoding for the implementation requires fewer than four hexadecimal digits.
Hexadecimal digits for use in the escape sequences above may be any decimal digit (0-9) or any of the first six alphabetic characters (a-f), in either lowercase or uppercase.
Examples:
'You shouldn\'t' /* Same as "You shouldn't" */
'\x6d\u0066\x63' /* In Unicode: 'mfc' */
'\\\u005C' /* In Unicode, two backslashes */
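The escape sequences listed above can be decoded mechanically. The following Python sketch is a model of the rules as written here, not the reference implementation; decode_escapes and SIMPLE_ESCAPES are hypothetical names, and the input is assumed to be the body of a literal string with its enclosing quotes already removed.

```python
import string

# Single-character escapes from the list above (letters in either case).
SIMPLE_ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
    "-": "\0", "0": "\0",
}


def decode_escapes(body):
    """Expand escape sequences in the body of a literal string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of string")
        key = body[i + 1].lower()
        if key in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[key])
            i += 2
        elif key in ("x", "u"):
            # \xhh takes two hexadecimal digits, \uhhhh takes four.
            width = 2 if key == "x" else 4
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or any(d not in string.hexdigits for d in digits):
                raise ValueError("bad hexadecimal escape")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError("unknown escape sequence")
    return "".join(out)
```

Applied to the second example above, decode_escapes(r"\x6d\u0066\x63") gives 'mfc'.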
Implementation minimum: Implementations should support literal strings of at least 100 characters. (But note that the length of string expression results, etc., should have a much larger minimum, normally only limited by the amount of storage available.)
- Symbols
Symbols are groups of characters selected from the Roman alphabet in uppercase or lowercase (A-Z, a-z), the Arabic numerals (0-9), and underscore. Implementations may also allow other alphabetic and numeric characters in symbols to improve the readability of programs in languages other than English. These additional characters are known as extra letters and extra digits.[3]
Examples:
DanYrOgof minx Élan Virtual3D
A symbol may include other characters only when the first character of the symbol is a digit (0-9 or an extra digit). In this case, it is a numeric symbol -- it may include a period ('.') and it must have the syntax of a number. This may be a simple number, which is a sequence of digits with at most one period (which may not be the final character of the sequence), or it may be a number expressed in exponential notation. A number in exponential notation is a simple number followed immediately by the sequence 'E' (or 'e'), followed immediately by a sign ('+' or '-'),[4] followed immediately by one or more digits (which may not be followed by any other symbol characters).
Examples:
1
1.3
12.007
17.3E-12
3e+12
0.03E+9
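The syntax of a numeric symbol described above can be summarized as a regular expression. The following Python sketch is illustrative only; the names NUMERIC_SYMBOL and is_numeric_symbol are hypothetical. It assumes the digits 0-9 only (extra digits are not modeled) and treats the exponent sign as mandatory, as the text above states.

```python
import re

# A simple number: digits, optionally a period followed by more digits
# (so the period is never the final character). An optional exponent is
# 'E' or 'e', then a sign, then one or more digits.
NUMERIC_SYMBOL = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:[Ee][+-][0-9]+)?")


def is_numeric_symbol(text):
    """Return True if text has the number syntax described above."""
    return NUMERIC_SYMBOL.fullmatch(text) is not None
```

Under these assumptions every example above is accepted, while '1.' and '1.2.3' are rejected.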
When extra digits are used in numeric symbols, they must represent values in the range zero through nine. When numeric symbols are used as numbers, any extra digits are first converted to the corresponding character in the range 0-9, and then the symbol follows the usual rules for numbers in NetRexx (that is, the most significant digit is on the left, etc.).
In the reference implementation, the extra letters are those characters (excluding A-Z, a-z, and underscore) that result in 1 when tested with java.lang.Character.isLetter. Similarly, the extra digits are those characters (excluding 0-9) that result in 1 when tested with java.lang.Character.isDigit.
The meaning of a symbol depends on the context in which it is used. For example, a symbol may be a constant (if a number), a keyword, the name of a variable, or identify some algorithm.
Implementation minimum: Implementations should support symbols of at least 50 characters. (But note that the length of its value, if it is a string variable, should have a much larger limit.)
- Operator characters
The characters + - * / % | & = \ > < are used (sometimes in combination) to indicate operations in expressions. A few of these are also used in parsing templates, and the equals sign is also used to indicate assignment. Blanks adjacent to operator characters are removed, so, for example, the sequences:
345>=123
345 >=123
345 >= 123
345 > = 123
are identical in meaning. Some of these characters may not be available in all character sets, and if this is the case appropriate translations may be used.
Note: The sequences '--', '/*', and '*/' are comment delimiters, as described earlier. The sequences '++' and '\\' are not valid in NetRexx programs.
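The removal of blanks adjacent to operator characters can be modeled with a single substitution. The following Python sketch is illustrative only; OPERATOR_CHARS and remove_operator_blanks are hypothetical names. It assumes the clause contains no literal strings or comments, which a real language processor would have to skip.

```python
import re

# The operator characters listed above.
OPERATOR_CHARS = "+-*/%|&=\\><"

# Any run of white space on either side of an operator character.
_OPERATOR_WITH_BLANKS = re.compile(r"\s*([" + re.escape(OPERATOR_CHARS) + r"])\s*")


def remove_operator_blanks(clause):
    """Delete blanks adjacent to operator characters in a clause."""
    return _OPERATOR_WITH_BLANKS.sub(r"\1", clause)
```

With this model, all four sequences shown above reduce to the same text, 345>=123.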
- Special characters
The characters . , ; ) ( ] [ together with the operator characters have special significance when found outside of literal strings, and constitute the set of special characters. They all act as token delimiters, and blanks adjacent to any of these (except the period) are removed, except that a blank adjacent to the outside of a parenthesis or bracket is only deleted if it is also adjacent to another special character (unless this is a parenthesis or bracket and the blank is outside it, too).
Some of these characters may not be available in all character sets, and if this is the case appropriate translations may be used.
To illustrate how a clause is composed out of tokens, consider this example:
'REPEAT' B + 3;
This is composed of six tokens: a literal string, a blank operator (described later), a symbol (which is probably the name of a variable), an operator, a second symbol (a number), and a semicolon. The blanks between the 'B' and the '+' and between the '+' and the '3' are removed. However, one of the blanks between the 'REPEAT' and the 'B' remains as an operator. Thus the clause is treated as though written:
'REPEAT' B+3;
Implied semicolons and continuations
A semicolon (clause end) is implied at the end of each line, except if:
- The line ends in the middle of a block comment, in which case the clause continues at the end of the block comment.
- The last token was a hyphen. In this case the hyphen is functionally replaced by a blank, and hence acts as a continuation character.
This means that semicolons need only be included to separate multiple clauses on a single line.
Notes:
- A comment is not a token, so a comment may follow the continuation character on a line.
- Semicolons are added automatically by NetRexx after certain instruction keywords when in the correct context. The keywords that may have this effect are else, finally, otherwise, then; they become complete clauses in their own right when this occurs. These special cases reduce program entry errors significantly.
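The line-end and continuation rules above can be sketched as a function that turns source lines into clauses. This Python model is illustrative only, and split_clauses is a hypothetical name. It assumes there are no block comments, that literal strings contain neither '--' nor ';', and it does not model the semicolons added after keywords such as then and else.

```python
def split_clauses(lines):
    """Split source lines into clauses (simplified model).

    A trailing hyphen continues the clause onto the next line and is
    replaced by a single blank; otherwise a semicolon is implied at the
    end of the line. Explicit semicolons separate clauses on one line.
    """
    clauses = []
    pending = ""
    for line in lines:
        # A line comment may follow the continuation character.
        code = line.split("--", 1)[0].strip()
        if code.endswith("-"):
            pending += code[:-1].rstrip() + " "
            continue
        pending += code
        clauses.extend(part.strip() for part in pending.split(";") if part.strip())
        pending = ""
    if pending.strip():
        clauses.append(pending.strip())
    return clauses
```

For example, the two lines "say 'Hello' -  -- continued" and "'world'" form the single clause say 'Hello' 'world', while the line "a = 1; b = 2" forms two clauses.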
The case of names and symbols
In general, NetRexx is a case-insensitive language. That is, the names of keywords, variables, and so on, will be recognized independently of the case used for each letter in a name; the name 'Swildon' would match the name 'swilDon'.
NetRexx, however, uses names that may be visible outside the NetRexx program, and these may well be referenced by case-sensitive languages. Therefore, any name that has an external use (such as the name of a property, method, constructor, or class) has a defined spelling, in which each letter of the name has the case used for that letter when the name was first defined or used.
Similarly, the lookup of external names is both case-preserving and case-insensitive. If a class, method, or property is referenced by the name 'Foo', for example, an exact-case match will first be tried at each point that a search is made. If this succeeds, the search for a matching name is complete. If it does not succeed, a case-insensitive search in the same context is carried out, and if one item is found, then the search is complete. If more than one item matches then the reference is ambiguous, and an error is reported.
Implementations are encouraged to offer an option that requires that all name matches are exact (case-sensitive), for programmers or house-styles that prefer that approach to name matching.
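The two-stage lookup of external names can be sketched as follows. This Python model is illustrative only; resolve_name is a hypothetical name, and the list of candidate names stands in for whatever names are visible in one search context.

```python
def resolve_name(wanted, candidates):
    """Resolve a reference against the names visible in one context.

    An exact-case match wins. Otherwise a case-insensitive search is
    made: one match is accepted, several matches are ambiguous.
    """
    if wanted in candidates:
        return wanted
    matches = [name for name in candidates if name.lower() == wanted.lower()]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError("no match for " + repr(wanted))
    raise LookupError("ambiguous reference " + repr(wanted))
```

Given the names Foo, foo, and Bar, a reference to 'Foo' resolves exactly, a reference to 'bar' resolves to Bar, and a reference to 'FOO' is reported as ambiguous. The optional exact-match mode mentioned above would correspond to stopping after the first test.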