CMDSyntax Usage and Reference Guide

Author:	David Boddie <david@boddie.org.uk>
Last modified:	23rd September 2003

Abstract

CMDSyntax is a library for parsing command line arguments according to a syntax definition, returning dictionaries of values where the command line complies with the required syntax.

A typical syntax string is made up of labels, switches (also known as options), commands, brackets and operators. Combinations of these objects allow developers to specify the valid structure of a variety of command lines. The use of a human readable format for syntax descriptions enables the developer to use the same syntax string for presentation to the user, either in the form of a textual message or as a form to fill in.

The style of the syntax used can be customised for the convenience of the developer. The style of the arguments passed from the command line can also be specified to support platform-specific conventions for command line tools.

Quick example

  # Import the syntax library alongside the system and pretty printing
  # libraries.
  
  import sys, cmdsyntax, pprint
  
  # Create a syntax object, describing the desired form of input.
  
  s = cmdsyntax.Syntax("[-v] infile [-o outfile]")
  
  # Find any possible matches between the command line given
  # and the syntax definition.
  
  matches = s.get_args(sys.argv[1:])
  
  # Display the matches.
  
  pprint.pprint(matches)

Introduction
Using the library
Summary
Syntax definitions
Styles
- Syntax styles
- Command line styles

Introduction

The CMDSyntax library takes an approach to the problem of interpreting command line arguments which is different to that used by many other Python solutions. Many libraries enable the developer to specify the arguments which can be passed to a program from the command line and provide a convenient way of retrieving the values associated with those arguments. This approach works well as long as combinations of switches and commands are not used to specify input which may be interpreted as being contradictory or ambiguous. Programs which present a complex interface offering large numbers of command line switches are more likely to be presented with invalid input; the developer therefore has to implement a substantial syntax-checking infrastructure in addition to describing the interface to the user and validating correct input.

The central concept of this library is the idea that arguments passed to a program from the command line should be checked against a simple syntax definition, producing a list of possible matches. For a well-defined syntax, this validation process removes the need for the developer to check whether the input is self-consistent. This approach releases development time which can be spent on checking the quality of the input and implementing features. The form of the syntax definition and its components are given in the Syntax definitions section.

A description of the library's main features is given in the section on usage which discusses simple verification of command line arguments with suitable examples. More advanced uses, such as user input correction and graphical user interface generation, are also discussed.

Differing styles of syntax specification and user input are supported through the use of a class which encapsulates the form of the common command line features such as labels and switches. This allows the developer to produce programs which mimic the style of some legacy tools. For example, short switches with long names such as -quiet may be enabled and interpreted as a single switch in the same manner as --quiet. A section describing the usage of styles covers their most common uses; a more detailed description is given in the Styles section.

Conventions

Since we will be discussing various forms of textual input, both specified by the developer and given by the user at the command line, we need to adopt typographic conventions to clearly indicate the context in which each piece of input is given. These are defined in the following paragraphs.

An excerpt from a program is displayed as preformatted text separately from the body of the text. For example, an import statement will be displayed as:

  import cmdsyntax, sys

Rare instances of Python code which fails to compile, which causes an exception to be raised at runtime, or which leads to unexpected results are displayed in a similar manner to normal program excerpts. The background colour indicates that there is something wrong with the code:

  import cmdsyntax, sys, # <- trailing comma

Names, declarations and values of Python variables and classes are displayed inline using a teletype font. For example, a dictionary may appear as {'infile': 'myfile.txt'}

Excerpts from example syntax definitions are displayed inline using a teletype font but with a different background colour to that used for Python objects. For example, a syntax definition may be given as [-o outfile] infile

Excerpts from text submitted at the command line are displayed inline using a teletype font with a background colour and a dotted border. For example, input given for the syntax definition given above may appear as -o myfile2.txt myfile.txt

Terminology

It is possible to encounter a number of different conventions when discussing the methods used to implement a solution for interpreting command line input. In this document, to avoid confusion, we define a set of terms and use them consistently rather than assume knowledge of a particular vocabulary for the subject.

Since the specification of elements in a syntax definition is distinct from the whitespace-separated sequence of items entered on the command line, we necessarily provide different terms for each. Generic items on the command line are referred to as arguments where their actual content is arbitrary or not important. In a syntax definition, elements which correspond to arguments on the command line are referred to as objects; generally speaking, these act as placeholders for the values supplied by the user as arguments. These are discussed in more detail in the Syntax definitions section but the terms used are defined as follows:

Where a given object is intended to represent a user-specified value and is represented using only alphanumeric characters and underscores, we refer to the object as a label.
If the string representing an object begins with one or two leading dashes, the user is expected to supply an argument matching or containing the same string. We refer to this object as a switch. Examples of such an object are -q, --quiet and --output-file=logfile.
If the string representing an object begins and ends with quotation marks, the user is expected to supply the string given but without the quotes. We refer to this object as a command. An example of this is "commit".
Where a given object is intended to represent a user-specified value and is represented using only alphanumeric characters, underscores and spaces within angled brackets, we refer to the object as an extended label. An example of this is <input file>.

Limitations

The library currently lacks a number of useful features, and their absence makes it unsuitable for certain types of command line program. The most notable omissions are:

Lack of support for any automatic type coercion of arguments collected from command line input.
An inability to interpret command lines in which switches and arguments have been concatenated. For example, it might be expected that the input -vfmyfile.txt will be interpreted as -v -f myfile.txt. This form of interpretation is not supported.
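
Because values collected from the command line are not coerced, any conversion is left to the program. The following is a minimal sketch of one way to convert values after matching. The coerce_match helper and its converters mapping are hypothetical names introduced here for illustration; they are not part of CMDSyntax.

```python
def coerce_match(match, converters):
    """Return a copy of a match dictionary with selected values converted."""
    typed = dict(match)
    for key, convert in converters.items():
        # Only convert entries that were actually matched.
        if key in typed:
            typed[key] = convert(typed[key])
    return typed

# A dictionary of string values of the kind produced by matching.
match = {"infile": "myfile.txt", "count": "12"}
typed = coerce_match(match, {"count": int})
```

The original dictionary is left untouched, and a value which cannot be converted raises whatever exception the converter raises (ValueError in the case of int).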

Using the library

The quick example is a useful summary for impatient developers but it does not provide much in the way of a description or explanation of the class framework used in the library. However, it does show the most common use of the main Syntax class: instantiation followed by matching. Basic use of the library can be simple because the underlying model is so powerful, and because the library performs much of the work required to interpret the user's input correctly. Such simple usage of the library is discussed further in the following section. Although the syntax used in the syntax definitions may be familiar, the reader may find the Syntax definitions section a useful reference.

The method used by the library to validate user input allows for a more forgiving approach to invalid input in which failed attempts to match the user's input against the syntax definition are retained for further inspection. The treatment of these failed matches is the subject of the Correcting user input section.

The use of syntax definitions as templates for a graphical user interface (GUI) allows us to either augment or replace the command line interface. The use of a GUI to display a program's input requirements is discussed in the GUI form generation section.

Finally, we discuss the use of styles to modify the way in which the library interprets syntax definitions and matches input against them.

Simple usage

Summary
Creation of a syntax object:
`syntax_object = Syntax(syntax_definition)`
Matching user input against a syntax definition:
`matches = syntax_object.get_args(sys.argv[1:])`

When a Python program is executed, the command line used to invoke the program is available to the program through the argv list in the sys module. The first of the list items is typically the name of the program; all other items are the arguments passed by the caller. We gain access to these arguments by importing the sys module. It is also convenient to import the cmdsyntax module at this time:

  import sys, cmdsyntax

For short syntax definitions, such as that used in the quick example, we may use the syntax string directly when we instantiate a Syntax object. When using longer definitions, or if we wish to reuse the same string later, it is more convenient to define the string separately. For example, the following syntax string is arguably too long to fit on one line:

  syntax = "<Input file> <Output directory> <File extension>" + \
           " [-s <Starting page>] [-f <Finishing page>]"

In its simplest form, an instance of the Syntax class can be created in the following manner:

  syntax_obj = cmdsyntax.Syntax(syntax)

If this instantiation was successful, we now have a Syntax object which contains all the information about the syntax we expect from the user. However, where the syntax is poorly or incompletely specified, the instantiation will fail and a cmdsyntax_error exception will be raised by the Syntax class:

  syntax_obj = cmdsyntax.Syntax("infile [-o outfile")

In the above case, a closing square bracket ] was missing from the syntax definition.
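
The kind of mistake that causes this failure can be illustrated without the library. The following sketch checks that round and square brackets in a definition are balanced. The check_brackets function is a hypothetical helper written for this guide; it does not reproduce the checks performed by the Syntax class and it makes no attempt to handle brackets inside quoted commands.

```python
def check_brackets(definition):
    """Return None if brackets balance, else a description of the problem."""
    pairs = {")": "(", "]": "["}
    stack = []
    for position, char in enumerate(definition):
        if char in "([":
            # Remember each opening bracket and where it was found.
            stack.append((char, position))
        elif char in pairs:
            if not stack or stack[-1][0] != pairs[char]:
                return "unexpected %r at position %d" % (char, position)
            stack.pop()
    if stack:
        char, position = stack[-1]
        return "%r opened at position %d is never closed" % (char, position)
    return None

problem = check_brackets("infile [-o outfile")
```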

The user's input is matched against the definition through the use of the get_args method of the Syntax object created (syntax_obj in our example). In the simplest case, this method returns a list of match dictionaries, each of which corresponds to a valid match of the input to the syntax definition. We obtain a list of match dictionaries in the following manner:

  matches = syntax_obj.get_args(sys.argv[1:])

Note that the first item in the sys.argv list is not passed to the method; this item contains the name of the program being executed and can be discarded. Note also that, although lists are mutable objects in Python and can therefore be changed, the sys.argv list passed to the get_args method will be left unaltered.

If no successful matches were found the matches list will be empty. However, if a valid command line was supplied then we will find at least one match dictionary in the list returned. A match dictionary will contain entries corresponding to the labels, switches and commands given in the syntax definition using the following scheme:

Arguments corresponding to labels in the syntax definition are stored in entries where the key is simply the label text used in the syntax definition. Where a label is not matched because it was optional, no corresponding dictionary entry is created for it. For example, supplying the label infile with the value myfile.txt will create an entry in the match dictionary of the form {'infile': 'myfile.txt'}
Switches, when found on the command line, are allocated entries where the key used is the switch text given in the syntax definition but without any preceding dashes. Switches which were not matched do not have corresponding entries in the match dictionary. The associated value for each entry depends on the type of switch found:
- Standalone switches such as -a, -help and --quiet result in entries where the value stored is the integer 1. For example, the switch --quiet if matched will cause the entry {'quiet': 1} to be created in the match dictionary.
- Switches which involve the specification of an accompanying quantity such as --file=name result in entries where the value stored is the string assigned to the switch on the command line; the placeholder for the value in the syntax definition is discarded. For example, the switch --file=name if matched by the input --file=myfile.txt will cause the entry {'file': 'myfile.txt'} to be created in the match dictionary.
Arguments corresponding to commands in the syntax definition are given entries where the key is the command text used on the command line and the value is 1. The key value will be equivalent to the text used in the syntax definition but without any surrounding quotation marks. As with switches, no dictionary entries are created for commands which are not matched. For example, the command "commit" when matched by the argument commit will cause an entry of {'commit': 1} to be created in the match dictionary.
Extended labels behave in the same way as ordinary labels except that the dictionary key used is the relevant label text from the syntax definition but without the surrounding angled brackets. For example, supplying <input file> with the value myfile.txt will create an entry in the match dictionary of the form {'input file': 'myfile.txt'}
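
The naming scheme above can be summarised in a few lines of code. This is a sketch of the rules as described for the default style, not the library's own implementation; key_for is a hypothetical name used only for illustration.

```python
def key_for(object_text):
    """Return the match dictionary key implied by an object's text."""
    if object_text.startswith('"') and object_text.endswith('"'):
        # Command: the surrounding quotation marks are dropped.
        return object_text[1:-1]
    if object_text.startswith("<") and object_text.endswith(">"):
        # Extended label: the angled brackets are dropped.
        return object_text[1:-1]
    if object_text.startswith("-"):
        # Switch: leading dashes and any "=value" placeholder are dropped.
        return object_text.lstrip("-").split("=", 1)[0]
    # Ordinary label: the text is used unchanged.
    return object_text

keys = [key_for(text) for text in
        ["infile", "--quiet", "--file=name", '"commit"', "<input file>"]]
```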

Warning: It is strongly advised that the names given to each label, switch and command are unique for each possible match. The matching process assumes uniqueness of named objects in the syntax definition and will overwrite existing match dictionary entries with new values, regardless of any previous contents. An example of a poor choice of object names can be found in the following definition, in which both --pattern=name and --pattern would be stored under the key 'pattern':

--pattern=name (--solid | --empty | --pattern)

It is possible to re-use object names as long as care is taken to ensure that such objects are located in different possible interpretations of the syntax definition. For example, the filename label cannot be encountered more than once in a valid interpretation of this syntax definition:

("read" filename) | ("delete" filename)

Ideally, the method will return a list containing a single dictionary which corresponds to a unique match between the user input and the syntax definition. If no matches are found then the program may take any appropriate action. However, the situation in which many matches are returned highlights the problem of using an imprecise syntax definition. We define such an imprecise syntax to illustrate this point:

  syntax = "[infile] [outfile]"

In the above example, the syntax involving optional objects will allow more than one valid match to be returned when the user only specifies a single argument; it may be assigned to either 'infile' or 'outfile' in the match dictionary.

The problem of an insufficiently precise syntax definition is an example of a design issue which should be resolved when the program is written. However, the developer may decide to handle the ambiguity of the situation at runtime by simply selecting the first match in the list, or by looking for the most appropriate match. Other ways to resolve this issue may involve the use of a graphical user interface or a change to the syntax style used.
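
As a rough sketch of the runtime approach, the following function prefers a match containing particular keys and otherwise falls back to the first match in the list. The choose_match helper is hypothetical, and the order of the dictionaries in the example list is an assumption made for illustration rather than a statement about the order in which the library returns matches.

```python
def choose_match(matches, preferred_keys):
    """Pick one match from an ambiguous list, or None if there are none."""
    if not matches:
        return None
    for match in matches:
        # Accept the first match which contains every preferred key.
        if all(key in match for key in preferred_keys):
            return match
    # Fall back to the first match in the list.
    return matches[0]

# Two interpretations of a single argument under "[infile] [outfile]".
matches = [{"outfile": "data.txt"}, {"infile": "data.txt"}]
chosen = choose_match(matches, ["infile"])
```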

Correcting user input

Summary
Obtaining failed matches as well as successful matches:
`matches, failed = syntax_object.get_args(sys.argv[1:], return_failed = 1)`
Strict matching of user input with a syntax definition:
`matches, failed = syntax_object.get_args(sys.argv[1:], return_failed = 1, strict = 1)`

The CMDSyntax library was initially designed to provide the developer with a convenient way of checking the consistency of the user's command line input. Such a facility aims to give the developer confidence that incomplete or inconsistent input will always be rejected, leaving only valid input which can be relied upon. However, the identification of invalid input raises an interesting possibility: such failed matches between user input and a syntax definition can be collected and used to inform the user where any errors occurred in their input.

Consider the following situation involving a simple syntax definition:

  syntax = "<LaTeX file> (--Output-PostScript | --Output-PDF)"
  syntax_obj = cmdsyntax.Syntax(syntax)
  
  matches, failed = syntax_obj.get_args(sys.argv[1:], return_failed = 1)

In this definition the user is required to supply a file name, preferably referring to a LaTeX file, and either --Output-PostScript or --Output-PDF. If only a file name is supplied, so that sys.argv[1:] is a list containing only one item, then an empty list of matches will be returned. However, since the keyword argument return_failed is set to 1 rather than its default value of zero, the get_args method returns a second value: a list of incomplete matches between the input and the syntax definition. For example, if the user input was merely myfile.tex then the list of failed matches will be equivalent to the assignment

  failed = [ {'LaTeX file': 'myfile.tex'} ]

In our example, the user presumably forgot to specify the desired output format for their processed file. The developer can optionally prepare the program to recognise common situations such as this and either provide a method for the user to correct their mistake or generate a contextual error message. In situations where many failed matches were returned, they are listed in descending order of "completeness" (using the number of matched arguments as a measure of quality). Hence, the first element in the list may be the failed match which is closest to being "complete". To illustrate this, let us consider an attempt to match a command line to a more involved syntax definition:

  import cmdsyntax, sys
  
  syntax = "[-s <Starting page>] [-f <Finishing page>] " + \
           "<Input file> <Output directory> <File extension>"
  
  syntax_obj = cmdsyntax.Syntax(syntax)
  
  matches, failed = syntax_obj.get_args(sys.argv[1:], return_failed = 1)

If the user supplied a command line such as -s 12 infile outdir this would result in a list of failed matches equivalent to the following list:

  [
    {'Starting page': '12', 's': 1, 'Input file': 'infile', 'Output directory': 'outdir'},
    {'Output directory': '12', 'Input file': '-s', 'File extension': 'infile'},
    {'Starting page': '12', 's': 1},
    {}
  ]

We can see from the above list that the first match dictionary is clearly the most complete. However, although the second match appears to contain more entries than the third match, it is clear that there has been some confusion: the -s 12 input has not been matched to the relevant switch and its following argument, and this has led to some interesting assignments. Such confusion can be avoided, but at the cost of some flexibility when finding matches. We try to match the input again but use a slightly different invocation of the get_args method:

  matches, failed = syntax_obj.get_args(sys.argv[1:], return_failed = 1, strict = 1)

This time, the list of failed matches is equivalent to

  [
    {'Starting page': '12', 's': 1, 'Input file': 'infile', 'Output directory': 'outdir'},
    {'Starting page': '12', 's': 1},
    {}
  ]

The use of the strict keyword argument prevents text which looks like a switch from being interpreted as a simple label. Most of the time, this extra strictness is not required and therefore it is not enabled by default. For some situations it may be necessary to apply such strict checking to avoid certain common mistakes in user input. However, it should not be used indiscriminately as it will prevent "valid" matches from being found, such as when the user actually wishes to refer to a file called "-s".
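
A program can also inspect failed matches itself. The sketch below filters out failed matches in which a value resembles a switch, then reports which required entries the closest remaining match lacks. It operates on the non-strict list shown earlier and only approximates the effect of strict matching after the fact; the helper names and the required list are assumptions introduced for this example.

```python
def looks_like_switch(value):
    """Return True for string values that begin with a dash."""
    return isinstance(value, str) and value.startswith("-")

def plausible(failed):
    """Drop failed matches in which a value resembles a switch."""
    return [match for match in failed
            if not any(looks_like_switch(v) for v in match.values())]

def missing(match, required):
    """List the required keys that a failed match did not fill in."""
    return [key for key in required if key not in match]

# The failed matches listed above for the input: -s 12 infile outdir
failed = [
    {"Starting page": "12", "s": 1, "Input file": "infile",
     "Output directory": "outdir"},
    {"Output directory": "12", "Input file": "-s", "File extension": "infile"},
    {"Starting page": "12", "s": 1},
    {},
]
required = ["Input file", "Output directory", "File extension"]

closest = plausible(failed)[0]
absent = missing(closest, required)
```

The contents of absent could then be used to build a contextual error message naming the argument the user omitted.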

GUI form generation

Summary
Constructing a graphical form for data entry:
`if cmdsyntax.use_GUI() != None:`
`form = cmdsyntax.Form(caption, syntax_object)`
Matching the form's contents against a syntax definition:
`matches = form.get_args()`
Using failed matches to partially fill in a graphical form:
`closest = failed[0]`
`form = cmdsyntax.Form(caption, syntax_obj, closest)`

Because a syntax definition is a structured description of the input required from the user, it can be interpreted not only as a template for valid input but also as a description of input parameters to be conveyed to the user. The definitions are designed to be readable by users, following similar conventions to the usage information displayed by many command line tools. However, by treating these descriptions as more than just convenient summaries for the user, this library can be used to construct graphical user interfaces for data entry.

The creation of a GUI form need not take into account any command line input. Arguably, the simplest method of creating a form is along the lines of the following example:

  import cmdsyntax
  
  syntax = "<LaTeX file> (--Output-PostScript | --Output-PDF)"
  syntax_obj = cmdsyntax.Syntax(syntax)

As normal, a syntax definition is specified and used to create a Syntax object. We now ignore any user input and call the use_GUI function to check whether a GUI library is available. This function, which can take a single string argument specifying the GUI toolkit to use, returns either the name of the toolkit which will be used or None if no toolkit is available. If unspecified, the Tkinter toolkit will be used, if available.

  if cmdsyntax.use_GUI() != None:

If a toolkit is available then a form with the supplied caption is created using the Form class and the syntax_obj object. The input is read using a method of the form object, analogous to the Syntax.get_args method.

    form = cmdsyntax.Form("Specify output options for the LaTeX file", syntax_obj)
    matches = form.get_args()

Warning: If the use_GUI function is not called before an attempt is made to create a form then the Form class will raise a form_error exception.

The form generated by the above example appears in the following figure. Note that the <LaTeX file> label in the definition is used to create a label and associated input field whereas the exclusive switches become checkboxes.

Figure: A simple form constructed from a syntax definition.

Although such forms would ideally allow only unambiguous user input, checks on the validity of the information provided by the user are still performed. Hence, in the previous example, if the user selects both checkboxes or presses "Cancel" an empty list of matches will be returned by the form.get_args method.

While a useful tool in its own right, the form construction facilities of the library are possibly best used in conjunction with the syntax matching facilities discussed earlier. Typically, we might expect to receive user input from the command line which we will first check against the syntax definition. Only if we fail to find a unique match do we then construct a form for the user to fill in. This is where the previous section becomes relevant once more: we may use the failed matches to save the user some of the effort needed to fill in the form.

We revisit an earlier example to demonstrate this strategy:

  import cmdsyntax, sys
  
  syntax = "[-s <Starting page>] [-f <Finishing page>] " + \
           "<Input file> <Output directory> <File extension>"
  
  syntax_obj = cmdsyntax.Syntax(syntax)
  
  matches, failed = syntax_obj.get_args(sys.argv[1:], return_failed = 1)
  
  if matches == [] and cmdsyntax.use_GUI() != None:

At this point, we have checked the input and failed to find any valid matches. However, there is a list of failed matches to examine. We assume that the first failed match is the closest to being a valid match and supply this closest match when we use the Form class to create a form. Any valid matches are returned as before:

    closest = failed[0]
    
    form = cmdsyntax.Form("Select your options", syntax_obj, closest)
    
    matches = form.get_args()

When supplied with a command line of -s 12 myfile pages the above example generates a form which appears in the following figure. Note that, although the input was incomplete, the command line arguments which were matched against elements of the syntax definition were used to fill in various fields in the form.

Figure: A partially completed form constructed from a syntax definition and incomplete user input.

The use of a failed match dictionary to partially complete the required fields in the form may be used more creatively. For example, the developer may wish to provide a default set of values which will appear in the form and possibly augment these with values from the user's input.
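
One way to prepare such a dictionary is to start from the defaults and let the user's values take precedence. This is a minimal sketch using ordinary dictionary operations; the initial_values helper and the default values shown are hypothetical.

```python
def initial_values(defaults, closest):
    """Combine developer defaults with values taken from a failed match."""
    values = dict(defaults)
    values.update(closest)  # entries from the user's input take precedence
    return values

defaults = {"File extension": "png", "Output directory": "out"}
closest = {"Starting page": "12", "s": 1, "Input file": "myfile",
           "Output directory": "pages"}
values = initial_values(defaults, closest)
```

The resulting dictionary could then be passed to the Form class in place of the closest failed match.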

Using styles

Summary
Creating a new style:
`style = cmdsyntax.Style()`
Verifying that a style is ready for use:
`style.verify()`

Styles are containers for the pieces of information which describe the contents of syntax definitions and user input. If no style information is used in the construction of Syntax objects, or in the argument matching process initiated by a call to the get_args method, the default style is used. This style follows the conventions given in the Syntax definitions section and resembles the sort of input usually presented to command line tools.

Styles are created by instantiating the Style class, which effectively creates an instance of the default style:

  import cmdsyntax, sys
  
  my_style = cmdsyntax.Style()

The construction of the style object does not require any information specific to this style. This feature of the design reflects the large amount of arbitrary information used to describe styles.

The above definitions have not produced a style which is in any way different to the default style. However, for a style which has been modified in some way, we should check its consistency before it is used. This is achieved for our style object in the following way:

  my_style.verify()

The verify method will return 1 if the style is superficially correct and 0 if there are obvious problems with it. Methods of Syntax objects which allow a style to be specified will verify that it is self-consistent, raising a cmdsyntax_error exception if it is not. Although not necessary, it is useful to call this method before the style is used because it will change various internal definitions as a result of any changes made since its instantiation. Following this check, the style's attributes may be inspected to determine where any inconsistencies have arisen.

In practice, after creating a style, we would wish to modify the style in some way before using it. Let us consider the situation in which we would like to change the character used to denote switches: we will change it to an underscore. Referring to the Syntax styles section, we redefine the relevant attributes of our Style object:

  # Use an underscore, rather than a dash, to introduce switches.
  my_style.switches = "_"
  my_style.in_switch = my_style.switches + my_style.in_string + "="
  
  if my_style.verify() == 0:
  
    print "Internal error - exiting."
    sys.exit()

Having successfully verified that the style is self-consistent, we may now write a syntax definition using the new convention. We create a syntax object using the syntax definition and check the user's input:

  syntax = "[_o output] __input=file"
  syntax_obj = cmdsyntax.Syntax(syntax, style = my_style)
  
  matches = syntax_obj.get_args(sys.argv[1:])

We might suppose that a command line containing _o results.txt __input=source.txt will satisfy the syntax definition and produce a valid match dictionary. However, we only used the style to modify the appearance of the syntax definition and not the expected form of the user input. Hence, the appropriate input would have been -o results.txt --input=source.txt in this case. To enable the user to supply switches in our new style, we must use the following form of the get_args method:

  matches = syntax_obj.get_args(sys.argv[1:], style = my_style)

Use of the style keyword argument in this method is conceptually similar to its use in customising syntax definitions but controls some aspects of the matching process. This is explained in the Command line styles section.

By default, Syntax objects will employ a style which interprets switches like -help and -quit as collections of single character switches. However, many classic command line tools use the single dash, single word form to represent individual switches, so it was recognised that it would be useful to provide compatibility with this style of syntax. To enable this form of switch we use the following sequence of operations:

  long_style = cmdsyntax.Style()
  long_style.allow_single_long = 1
  long_style.expand_single = 0
  if long_style.verify() == 0:
  
    print "Internal error - exiting."
    sys.exit()

We may wish to use two different forms of notation for switches: a strict, verbose form for syntax definitions and a lenient form for the user to use. An example of this is the common practice of allowing the user to concatenate single character switches rather than requiring that each switch is given separately. We achieve this by defining a style for syntax definitions which prohibits single options from being concatenated, although the developer may instead wish to impose self-restraint when writing their syntax definitions:

  defn_style = cmdsyntax.Style()
  defn_style.expand_single = 0

Note that switches like -help are now completely forbidden by the settings in the defn_style object; attempting to use this form of switch when creating a Syntax object with this style will cause a cmdsyntax_error exception to be raised. Although it is already enabled in the default style, we define a style that permits concatenation of single character switches:

  command_style = cmdsyntax.Style()
  command_style.expand_single = 1

To illustrate the differences between the two styles, consider the following syntax definition written in the style described by the defn_style object:

[-q | -v] [-s] [-f input_file]

This definition can be satisfied by any of the following command lines in the style described by the command_style object:

-q -s
-qs
-v -f myfile.txt
-sf input.txt
-qsf input.txt

Note that although switches can be concatenated in such a manner, the trailing input_file argument may not be appended to this list of switches; the command line -qsfinput.txt will therefore be rejected by the matching system. This is a recognised limitation of this library.
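
The idea behind the expansion of concatenated switches can be sketched as follows. This is an illustration of the concept only and is not taken from the library's matching code; expand_single here is a hypothetical function that merely shares its name with the style attribute.

```python
def expand_single(args):
    """Split arguments such as -qs into separate single character switches."""
    expanded = []
    for arg in args:
        # Only single-dash arguments with more than one character are split.
        if arg.startswith("-") and not arg.startswith("--") and len(arg) > 2:
            expanded.extend("-" + char for char in arg[1:])
        else:
            expanded.append(arg)
    return expanded

expanded = expand_single(["-qsf", "input.txt"])
```

Applied to -qsfinput.txt, this naive splitting would treat every character of the file name as a switch, which shows why a trailing value cannot simply be appended to a group of switches.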

Other uses

Summary
Obtaining a list of arguments from a string:
`args = syntax_obj.create_argv(input_string)`

Command line parsing is a specific case from a group of problems addressed by general text parsing solutions. Although this library is too specialised to implement such generalised parsing, it may be adapted to cope in situations which are similar to the command line case. Typically, such situations also involve lists of arguments, but present their input in the form of a string rather than a pre-scanned list of arguments.

The Syntax object provides the create_argv method to process such strings in a manner which approximates to that used by common command line environments. Presented with a string, the method returns a list of strings which can be used with a Syntax object:

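  # Read a line of text from the user at an interactive prompt.
  line = raw_input(">")
  
  # Split the line into a list of arguments, then match it as usual.
  args = syntax_obj.create_argv(line)
  
  matches = syntax_obj.get_args(args)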

Thus, the library is able to use and match input from a variety of sources. Checking interactive input at a command prompt is an eminently suitable application of this feature, but it may also be applied to situations such as web page templating.
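
For comparison, Python's standard library offers a similar facility in the shlex module, which splits a string using shell-like rules. The sketch below uses shlex.split rather than create_argv; whether the two treat quoting identically is not established here.

```python
import shlex

# A line of text such as might be typed at an interactive prompt.
line = 'commit --message="first draft" "my file.txt"'

# Quoted sections are kept together and the quotation marks are removed.
args = shlex.split(line)
```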

Summary

The CMDSyntax library provides a reasonably flexible way of simply specifying command line input requirements for a wide range of tools. Where the default style is insufficient or inflexible, it may be customised to suit the requirements of users. Support for features such as graphical form generation and forgiving validation of user input enables the developer to deploy tools outside the typical domain of the command line.

Syntax definitions

Syntax definitions are typically composed of a sequence of objects, logically connected by operators and grouped together by various forms of brackets. When the Syntax class is instantiated, the syntax definition is transformed into a tree structure describing all possible forms of input allowed by the definition.

The use of a human readable format for syntax definitions can remove many of the synchronisation problems which may arise when a command line interface is developed over time. Since the syntax definitions can be presented to the user without modification, the documented command line interface can be automatically kept up to date with changes made to the implementation of the interface.

Objects are placeholders for the various forms of input required by the syntax definition. These include:

labels and extended labels which represent values to be specified by the user;
switches which enable or disable certain aspects of a program's behaviour;
commands which the user must supply exactly as declared in the syntax definition.

Operators impose logical constraints on the input expected from the user. There are only two operators which can be used to describe input: the sequence (AND) operator and the exclusive OR operator. A logical NOT operator is not defined.

Objects may be grouped together to avoid ambiguity by using the appropriate form of brackets. Typically, this task is performed by ordinary brackets to indicate order within a collection of objects. Other forms of collections include groups of optional objects and selections of objects.

Note: The characters used to describe each element are merely conventions which may be modified by employing a syntax style when specifying a syntax definition.

Objects

Labels (alphanumeric and _ characters)

In the default style, objects made up entirely of alphanumeric characters and underscores represent arbitrary values on the command line. When the user's input is read, the corresponding command line values are stored as entries in a dictionary under the labels defined in the syntax definition.

Example: The definition infile outfile when matched to the command line input myfile.txt print.ps causes a match dictionary to be created which is equivalent to {"infile": "myfile.txt", "outfile": "print.ps"}.
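
The pairing of labels with positional arguments can be sketched in plain Python. The function match_labels below is hypothetical and ignores every other kind of object; it only shows how a dictionary of the above form could arise.

```python
def match_labels(labels, argv):
    """Pair each label with the argument at the same position.

    Stand-alone illustration; returns None when the counts differ.
    """
    if len(labels) != len(argv):
        return None
    return dict(zip(labels, argv))


print(match_labels(["infile", "outfile"], ["myfile.txt", "print.ps"]))
```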

Switches (- or -- followed by a string)

Switches are strings which begin with a single dash or a double dash and are followed by a string of characters. By default, the allowed characters are alphanumeric characters and underscores; dashes may also be used but must not immediately follow the leading dashes. This syntax may be changed by customising the style used to describe the syntax definition.

Switches which contain a single dash followed by a string of only one character must be matched exactly by the corresponding argument on the command line.

Example: -a

Those that begin with a single dash followed by more than one character represent a set of possible single character switches which can be used in any order. At least one of these switches must be found at the corresponding position on the command line.

Example: -abcde indicates that at least one of the five single character switches must be specified.

Those that begin with a double dash may take two forms: the first is equivalent to the single dash, single character switch; the second contains an equals sign, allowing the switch to be assigned a value.

Example: --output-dir=dir assigns the value found to "output-dir" in any match dictionaries produced.
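
As a rough illustration of how an argument of this form separates into a name and a value, consider the following sketch. The helper split_double_switch is hypothetical and is not taken from the library.

```python
def split_double_switch(arg):
    """Return (name, value) for --name=value, or (name, None) for --name.

    Hypothetical helper; returns None if arg is not a double dash switch.
    """
    if not arg.startswith("--") or len(arg) == 2:
        return None
    # partition splits at the first equals sign only, so the value
    # itself may contain further equals signs.
    name, separator, value = arg[2:].partition("=")
    return (name, value if separator else None)


print(split_double_switch("--output-dir=/tmp/out"))
# prints ('output-dir', '/tmp/out')
```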

Commands (a string enclosed in double quotes)

To specify commands that must be supplied by the user, we use strings enclosed in double quotes. These behave like switches in their simplest form, but are less restrictive in the range of characters which may be used.

Example: "add"|"remove" file will require that the user explicitly writes either add or remove before the name of the file that they specify. The user should not specify the command using quotation marks.

Commands are relevant to the type of utility which performs named operations on files, such as the cvs tool. Alternatively, they can be used in syntax definitions for input-checking outside the domain of the command line.

Example: "get"|"put" file might be used in a file transfer utility.

Extended labels (a string enclosed in a < > pair)

To allow more verbose labels, the syntax definition can contain strings enclosed in angled brackets. Such objects are treated exactly like ordinary labels but allow a greater range of characters.

Example: <your input file> <my output file> will produce entries in the match dictionary called "your input file" and "my output file".

Operators

The sequence (AND) operator [space]

A pair of objects separated by a space must both be supplied as arguments on the command line in the order given in the syntax definition.

Example: infile outfile requires that two arguments are supplied from the command line. The arguments will be stored in the match dictionary under the names "infile" and "outfile" respectively.

By default, the newline and carriage return characters also act as AND operators. Multiple AND operators are interpreted as a single operator.

Example: The syntax definition

infile outfile

is equivalent to that used in the previous example.

Generally, the AND operator respects the order of its operands, requiring that the user supply all required arguments in the order in which they were specified in the syntax definition. However, there is an exception to this rule, created as a concession to usability, which is used when processing lists of optional objects.

The exclusive OR operator |

This is used to specify that only one of a list of objects should be selected. This is typically used with switches.

Example: -a|-b|-c allows only one of the three single character switches to be specified.

Although whitespace characters are usually interpreted as AND operators, the syntax parser will recognise when they are being used to pad strings for clarity. This is especially useful when using the exclusive OR operator.

Example: -a | -b | -c is equivalent to -a|-b|-c since the spaces are interpreted as padding.
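
The constraint imposed by the exclusive OR operator can be expressed as a simple check. The function exactly_one is a hypothetical helper written for this guide and compares arguments literally.

```python
def exactly_one(choices, argv):
    """Return the single choice present in argv, or None otherwise."""
    present = [choice for choice in choices if choice in argv]
    if len(present) == 1:
        return present[0]
    # Either none of the choices or more than one of them was supplied.
    return None


print(exactly_one(["-a", "-b", "-c"], ["-b"]))
print(exactly_one(["-a", "-b", "-c"], ["-a", "-b"]))
```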

Grouping objects

Grouping (objects enclosed within a ( ) pair)

This is useful when using the exclusive OR operator to select between a series of multiple object options.

Example: (-i infile)|(-o outfile) requires that either the -i switch and its associated argument or -o and its argument are supplied, but not both.

Groups of objects can be nested to allow complex specifications to be written. It should be noted that usage definitions involving deep nesting of groups may be difficult to read and maintain.

Optional objects (objects enclosed within a [ ] pair)

To indicate that objects are optional, they must be enclosed within square brackets.

Example: infile [-o outfile] allows the user to specify only the required infile argument, if desired. Note that the AND operator is being used within the optional group so, if the -o switch is given by the user, the outfile argument must also be given immediately after it.

Although the order of the arguments corresponding to infile and -o outfile was important in the previous example, this is not the case where a syntax specifies a sequence of optional object groups.

Example: [-h host] [-p port] [-u user] allows the user to specify each of these optional groups of switches and arguments in any order, although the internal order of each group must be maintained. Hence, command line input such as -h machine -p 12345 -u david will be matched to give the same dictionary as -u david -h machine -p 12345 or any other permutation of these groups of objects.
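
The order independence of such groups can be demonstrated with a stand-alone sketch. The function match_optional_groups is hypothetical and is not the library's algorithm. It records only the label values, whereas real match dictionaries may contain further entries.

```python
def match_optional_groups(groups, argv):
    """Match switch/value pairs such as -h host in any order.

    groups maps each switch to the label of the value that follows it.
    Returns a dictionary of label values, or None if argv does not fit.
    """
    result = {}
    position = 0
    while position < len(argv):
        label = groups.get(argv[position])
        # Reject unknown switches, repeated groups and missing values.
        if label is None or label in result or position + 1 >= len(argv):
            return None
        result[label] = argv[position + 1]
        position += 2
    return result


groups = {"-h": "host", "-p": "port", "-u": "user"}
first = match_optional_groups(groups, "-h machine -p 12345 -u david".split())
second = match_optional_groups(groups, "-u david -h machine -p 12345".split())
print(first == second)
# prints True
```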

Since the order of optional object groups is ignored, it is important to provide key tokens which can be unambiguously matched.

Example: [host] [port] [user] will not always allow input to be matched unambiguously. An example of this is where the user supplies the string machine david for matching. Even without reordering the optional objects, the input cannot be matched unambiguously against the syntax definition. The previous example used switches to "anchor" the matching process and indicate some sort of order for the arguments supplied by the user.
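
The ambiguity can be made concrete by listing every assignment of the input to the labels which preserves their order. The sketch below uses itertools.combinations for this purpose; the function in_order_assignments is hypothetical and only approximates what a matcher would have to consider.

```python
from itertools import combinations


def in_order_assignments(labels, argv):
    """List every way of assigning argv to labels while keeping order."""
    # combinations yields the labels in their original order, so each
    # selection of len(argv) labels is one order-preserving assignment.
    return [dict(zip(chosen, argv))
            for chosen in combinations(labels, len(argv))]


candidates = in_order_assignments(["host", "port", "user"],
                                  ["machine", "david"])
for candidate in candidates:
    print(candidate)
```

Three candidates are produced for the input machine david, so no single match can be chosen without further information.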

Selections of objects (objects enclosed within a { } pair)

Selections are used to require that one or more of a group of objects are specified on the command line.

Example: {--send-e-mail --send-postcard --telephone-call} will require the user to specify at least one of the options given.

Example: -abcde from an example in the section describing switches can also be represented as {-a -b -c -d -e}

An enhanced form of the selection notation allows the range of required items to be specified explicitly. This is achieved by specifying a range of non-negative integers of the form #a-b which must occur immediately after the opening brace and must be separated from the items in the selection by whitespace. Such ranges specify that at least a but no more than b items may be supplied by the user. Note that both a and b must be greater than or equal to 0 and that a should not exceed b.

Example: {#2-2 --hold-sword --hold-shield --hold-axe} will only allow two of these options to be given by the user, but they may be given in any order.

Example: {#0-3 (-h host) (-p port) (-u user)} is equivalent to [-h host] [-p port] [-u user] from a previous example illustrating the use of optional objects.

Example: {#1-1 -a -b -c} is equivalent to -a|-b|-c from a previous example showing the use of the exclusive OR operator.

Where only a single number is given in a selection of objects, this indicates the minimum number of objects which must be supplied as arguments on the command line. In such a case, the maximum number of objects allowed is limited to the number of objects given in the selection list.

Example: {#4 --name=string --red=number --green=number --blue=number} will require all four options to be given, but allows them to be specified in any order.
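
The rules for quantities can be summarised in a short sketch. Both functions are hypothetical helpers written for this guide; they compare items literally and take no account of switches which carry values.

```python
def parse_quantity(spec, item_count):
    """Turn '#a-b' or '#a' into a (minimum, maximum) pair.

    With a single number the maximum defaults to the number of items.
    """
    if not spec.startswith("#"):
        raise ValueError("quantity must start with '#'")
    low, separator, high = spec[1:].partition("-")
    minimum = int(low)
    maximum = int(high) if separator else item_count
    if minimum > maximum:
        raise ValueError("minimum exceeds maximum")
    return minimum, maximum


def selection_satisfied(spec, items, argv):
    """Check that the number of items found in argv lies in the range."""
    minimum, maximum = parse_quantity(spec, len(items))
    supplied = sum(1 for item in items if item in argv)
    return minimum <= supplied <= maximum


items = ["--hold-sword", "--hold-shield", "--hold-axe"]
print(selection_satisfied("#2-2", items, ["--hold-axe", "--hold-sword"]))
print(selection_satisfied("#2-2", items, items))
```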

Multiple objects (objects or groups followed by ...)

To indicate that a particular object or group of objects may be specified repeatedly by the user, we follow an object or group with an ellipsis "..." in the syntax definition. This allows homogeneous lists of arguments to be specified.

Example: [-o output_file] input_file ... will allow multiple input files to be specified.

Repeated arguments given by the user are stored in appropriate entries in the match dictionary. However, to accommodate multiple values, each entry corresponding to a repeated object contains a list of values of the appropriate type. This convention is used even if only one value was supplied.

More complicated definitions can be constructed involving groups of objects.

Example: (--name=s --red=n --green=n --blue=n) ... requires that one or more complete sets of switches must be supplied by the user.

Warning: The use of this feature with single labels can be problematic in a manner similar to that mentioned in the discussion of failed matches and strict matching.

Example: input_file ... [-o output_file ...]

In this example, multiple input and output files can be specified. However, when attempting to match the command line firstfile secondfile -o we would expect to find no successful matches, but instead we find a list of matches equivalent to the following:

  matches = [ {'input_file': ['firstfile', 'secondfile', '-o']} ]

The above situation can be resolved through the use of the strict keyword argument to the get_args method of our Syntax object, but such a possibility needs to be considered before runtime. It is possible to design a syntax definition to avoid this sort of issue using switches as in the second example, but it is not a particularly user friendly interface. It is possibly a good idea to restrict use of this feature to well understood cases, such as the form used in the first example.
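
To show the effect that strict matching is intended to have, the following sketch discards any match in which a label has been given a value that looks like a switch. It is a post-processing illustration only: the function names are hypothetical, and it is an assumption that testing for a leading dash is an adequate stand-in for the library's own rules.

```python
def looks_like_switch(value):
    """Return True for strings such as -o or --output."""
    return isinstance(value, str) and len(value) > 1 and value.startswith("-")


def reject_switch_like_values(matches):
    """Drop matches in which a label value looks like a switch."""
    kept = []
    for match in matches:
        values = []
        for value in match.values():
            # Repeated objects hold lists; single objects hold one value.
            values.extend(value if isinstance(value, list) else [value])
        if not any(looks_like_switch(value) for value in values):
            kept.append(match)
    return kept


matches = [{"input_file": ["firstfile", "secondfile", "-o"]}]
print(reject_switch_like_values(matches))
# prints []
```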

Styles

Styles are used to allow the appearance of the syntax definition to be customised while still retaining the same meaning. They are also used to define the style of user input expected when matching command line input to a syntax definition. Syntax styles and command line styles are usually the same, but there may be cases where it is necessary to use two distinct styles.

Although the same class is used to apply a style to both syntax definitions and user input, each is affected differently by the information which resides within the class. For example, the descriptions of allowed characters for labels, switches and commands are used in the context of a syntax style to help determine the validity of syntax definitions. However, when used in the context of command line input, these descriptions merely determine which arguments may be interpreted as switches.

It has to be admitted that styles are something of an afterthought in the design of the CMDSyntax library and, consequently, their use can be somewhat awkward and error-prone. However, for simple changes to the default syntax construction and matching behaviour of the Syntax class, use of the style system should be straightforward, if regarded as an "advanced" topic.

Syntax styles

For syntax definitions, the style defines the characters which may be used to specify objects, operators and groups in the definition. The attributes which contain this information are defined in the declaration of the Style class and can be overridden in instances or derived classes. Care must be taken to ensure that changes to the default syntax style do not result in badly formed or conflicting definitions.

The collect_start and collect_end attributes refer to the characters used to delimit normal groups of objects:

    collect_start  = '('
    collect_end    = ')'

Optional objects are delimited using the optional_start and optional_end characters:

    optional_start = '['
    optional_end   = ']'

Selections of objects are delimited by the select_start and select_end characters:

    select_start   = '{'
    select_end     = '}'

The quantity attribute defines the character used to indicate that a range of objects must be selected in a selection group; such ranges are described using the quantity_separator and in_quantity definitions:

    quantity       = '#'
    quantity_separator = "-"
    in_quantity = string.digits + quantity_separator

The characters used to delimit extended labels are defined by the following assignments:

    ext_start      = '<'
    ext_end        = '>'

The characters used to delimit command objects in a syntax definition are both defined in the default style to use the same character:

    string_start   = '"'
    string_end     = '"'

The definition of the multiple object specifier allows for a multi-character string to be given:

    multiple       = "..."

This object is identified by the first character in the string, allowing characters defined for other purposes to be used within the rest of the string. However, such uses may make syntax definitions confusing to write and, if presented to the user in the form of a usage declaration, may make them difficult to read.

The AND and exclusive OR operators are defined as single characters:

    and_operator   = ' '
    eor_operator   = '|'

Although, by default, the AND operator is only defined to be a space character, the Syntax class will interpret this as a general declaration that the AND operator can be specified by any whitespace character. This behaviour is also implemented for cases where the exclusive OR operator is defined to be a space character. The developer should avoid selecting whitespace characters for both of these operators; such definitions will cause Syntax objects to raise the cmdsyntax_error exception:

    and_operator   = ' '
    eor_operator   = '\n'

Sets of characters common to labels, commands and switches are defined by the in_string attribute:

    in_string   = string.letters + string.digits + "_"

The leading character used to specify switches is initially set to the commonly used dash character. This is itself incorporated in the definition of allowed characters in switch definitions:

    switches    = '-'
    in_switch   = switches + in_string + '='

The above arrangement allows switches such as --outfile-file to be used in the default style and enables assignments involving switches such as --directory=dirname. Note that changes to the switches attribute do not automatically affect the contents of the in_switch attribute when the verify method is called.
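
The reason for this is a general property of Python class definitions: an expression in a class body is evaluated once, when the class is created. The sketch below uses a stand-in class, StyleSketch, rather than the real Style class, and string.ascii_letters in place of string.letters so that it also runs on Python 3.

```python
import string


class StyleSketch:
    """Stand-in with attributes laid out like those shown above."""
    in_string = string.ascii_letters + string.digits + "_"
    switches = "-"
    # Evaluated once, when the class body runs.
    in_switch = switches + in_string + "="


class SlashStyle(StyleSketch):
    # Overriding switches alone leaves the inherited in_switch untouched.
    switches = "/"


class FixedSlashStyle(StyleSketch):
    switches = "/"
    # Rebuild in_switch explicitly so that it includes the new character.
    in_switch = switches + StyleSketch.in_string + "="


print("/" in SlashStyle.in_switch)
# prints False
print("/" in FixedSlashStyle.in_switch)
# prints True
```

A derived style which changes the switch character should therefore redefine in_switch as well.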

Styles contain a number of settings controlling the form in which switches may appear in a syntax definition. These are summarised in the following excerpt from the Style class:

    allow_double       = 1   # e.g. --option
    allow_double_value = 1   # e.g. --option=value
    allow_double_short = 0   # e.g. --v
    allow_single_value = 0   # e.g. -o=value
    allow_single_long  = 0   # e.g. -name
    expand_single      = 1   # e.g. -abc -> -a -b -c

The allowed forms of switches used in the syntax definition are defined by the above attributes being set to either 0 or 1. Certain combinations of these are regarded as contradictory. For example, the allow_single_long and expand_single attributes represent features which cannot both be enabled simultaneously since they provide two different interpretations of items such as -abc (equivalent to a {-a -b -c} selection) and -quit (an individual switch).
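
A check for this particular contradiction might look like the following sketch. FlagsSketch and check_switch_flags are hypothetical names, and ValueError is used as a stand-in for whatever exception the library itself raises.

```python
class FlagsSketch:
    """Stand-in holding the two flags discussed above."""
    allow_single_long = 0   # e.g. -name
    expand_single = 1       # e.g. -abc -> -a -b -c


def check_switch_flags(style):
    """Raise ValueError if both interpretations of -abc are enabled."""
    if style.allow_single_long and style.expand_single:
        raise ValueError("allow_single_long and expand_single conflict")
    return True


default_flags = FlagsSketch()
print(check_switch_flags(default_flags))
# prints True
```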

Many of the above definitions are relevant only to the contents of syntax definitions. However, the characters defining the composition of switches and the flags which control their interpretation are also relevant when applied to the process of matching command line arguments to a syntax definition. This is discussed in the next section.

Command line styles

Styles are applied to command line input in a manner similar to the way in which they are applied to syntax definitions. However, of all the settings describing the composition of syntax objects, only those characters which control the appearance of switches are used directly to control the interpretation of input from the command line. For convenience, these are repeated here:

    switches    = '-'
    in_switch   = switches + in_string + '='

The interpretation of switches is also informed by the values of the flags given in the previous section; again repeated here:

    allow_double       = 1   # e.g. --option
    allow_double_value = 1   # e.g. --option=value
    allow_double_short = 0   # e.g. --v
    allow_single_value = 0   # e.g. -o=value
    allow_single_long  = 0   # e.g. -name
    expand_single      = 1   # e.g. -abc -> -a -b -c

There are two other attributes which control the matching process but which have no effect on the way in which syntax definitions are interpreted:

    forbid_option_labels = 1
    optional_in_order = 0

The first of these is used to determine whether arguments which look like switches may be interpreted as labels. By default this is forbidden, but the Syntax.get_args method will relax this restriction if it fails to find any matches, unless the keyword argument strict = 1 is given. See Correcting user input for a short discussion of this feature.

The second attribute determines whether the order of optional arguments is important. By default, as discussed in the Simple usage section, the order is ignored. However, it may be useful in some circumstances to enforce that such optional elements are matched in the order in which they were specified; one benefit of this is that some poorly specified syntax definitions will yield unique matches with this option enabled.