Impression Documents and Tools

Updated: 2010-10-27

Note: This is a work-in-progress. Many of the links given in this document lead to locations in which their targets have yet to be written.

A basic package is now available for download: Impression-0.13.zip. This archive contains the Impression package and some tools for (incompletely) converting Impression documents into some other formats. To install the package, open a terminal in a suitable working directory, unpack the archive, and enter the Impression-0.13 directory. As root, if necessary, type the following at the command line:

  python setup.py install

You should now be able to run the tools supplied in the Tools directory from the command line. Good luck!

Introduction

The Computer Concepts Impression series of applications were, and possibly still are, amongst the most capable desktop publishing (DTP) packages available for the RISC OS marketplace. The original Impression application was one of the first DTP packages available for the Archimedes range of desktop computers and this presumably helped its maker gain a large proportion of the potential userbase. This led to ongoing development of the package in the 1990s, with various versions appearing for different niches in the market, such as Impression Junior and Impression Style. The level of support for these packages could be described as "varied": some packages were superseded by later incarnations, some features appear to have been planned for but never implemented, and other features required the user to purchase plugins or support tools.

This document aims to describe the basic format used by Impression to store documents on disc, discussing the structures encountered when reading these documents and techniques used to extract the user's information. Some of this is written from memory, with reference to the only "definitive" reference available to the author: a library written for the purpose of retrieving such information. At no point in the development of the library was reverse engineering employed on the executables of any of the Impression packages.

The document formats

Early versions of Impression stored documents on disc in application directories, using directories to hold separate chapters and a common file to describe styles, page layouts, frame borders and so on. The limits of the ADFS disc format used in the early 1990s, in particular the limit of 77 files per directory, may have been too restrictive for some users. Indeed, the developers may have found this to be the case because a newer format was subsequently introduced which was effectively an amalgamation of the files usually found within an Impression document directory. Whether this new format was entirely successful in meeting the needs of advanced users of the software is unknown, although it seems that the publishers of a certain magazine preferred the old document format, presumably because it allowed chapters to be swapped out to disc when not needed.

When describing the document format, it is possibly useful to begin with a description of the single file version to outline the main components of a document before explaining how a new format document file could be constructed from a document directory. This approach enables us to leave awkward and non-obvious details such as the mapping table until later. For old format documents, the part of the document describing its structure resides in a file usually called "!DocData" inside the document directory.

The document header

The first 256 bytes or so of the document contain two main items of interest: the document version number and the main table. Additionally, in newer versions of Impression, the default scale factor for viewing the document is also given. Earlier versions of Impression stored the main table in a different place to later versions, so knowledge of the document format version number is important if we are to extract data reliably. The version numbers which have been encountered are given in Table 1; contributions to this table are welcome!

Table 1: Document version numbers encountered with their inferred creators.
Document version | Possible creator
0x1D (29) | Impression Publisher Plus
0x1C (28) | Impression Style/Publisher
0x16 (22) | Impression Junior
0x14 (20) | Impression II
0x0E (14) | Impression

The document's main table is the starting point for obtaining any kind of data about the document's content. The format of the table appears to allow for future expansion and appears to vary slightly between versions of the document format. Without knowledge of the version number it would be necessary to search for the table using some sort of matching algorithm. A simple outline of the format is shown in Figure 1.

[Figure: The Impression document header]
Figure 1: A simple representation of the beginning of an Impression document indicating the byte offsets of words of interest. A word is taken to be four bytes in length.

The words contained in the table are addresses corresponding to further tables and areas within the document. For example, newer documents produced by Impression Publisher or Impression Publisher Plus may have a main table which can be categorised in the form used in Table 2 whereas the main tables in older documents have been found to take the form used in Table 3.

Table 2: The form of the document's main table in new style documents.
Address | 0x0 | 0x4 | 0x8 | 0xC
0x110 | zero | unknown | unknown | unknown
0x120 | unknown | styles | borders | borders
0x130 | borders | borders | borders | borders
0x140 | borders | borders | mapping | mapping
0x150 | content table | content table | master pages | master pages
0x160 | end master pages | chapter pages | content | content

Table 3: The form of the document's main table in old style documents.
Address | 0x0 | 0x4 | 0x8 | 0xC
0x0B0 | styles | styles | styles | styles
0x0C0 | styles | borders | borders | borders
0x0D0 | borders | borders | borders | borders
0x0E0 | borders | mapping | mapping | content table
0x0F0 | content table | master pages | master pages | end master pages
0x100 | chapter pages | end chapter pages | content
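
As a rough illustration of how the two layouts might be read, the following Python sketch labels each word of the main table using the row addresses and cell names transcribed from Tables 2 and 3. The function name read_main_table and the new_style parameter are inventions for this example, and the code assumes that words are stored in little-endian order, as is usual on the ARM machines that ran RISC OS; the text above does not state the byte order explicitly.

```python
import struct

# Start address and cell labels transcribed from Table 2 (new style).
NEW_STYLE_TABLE = (0x110, [
    "zero", "unknown", "unknown", "unknown",
    "unknown", "styles", "borders", "borders",
    "borders", "borders", "borders", "borders",
    "borders", "borders", "mapping", "mapping",
    "content table", "content table", "master pages", "master pages",
    "end master pages", "chapter pages", "content", "content",
])

# Start address and cell labels transcribed from Table 3 (old style).
OLD_STYLE_TABLE = (0x0B0, [
    "styles", "styles", "styles", "styles",
    "styles", "borders", "borders", "borders",
    "borders", "borders", "borders", "borders",
    "borders", "mapping", "mapping", "content table",
    "content table", "master pages", "master pages", "end master pages",
    "chapter pages", "end chapter pages", "content",
])


def read_main_table(data, new_style):
    """Return a list of (address, label, value) for each main table word."""
    start, labels = NEW_STYLE_TABLE if new_style else OLD_STYLE_TABLE
    # Assumption: four-byte words, little-endian, unsigned.
    values = struct.unpack_from("<%dI" % len(labels), data, start)
    return [(start + 4 * i, label, value)
            for i, (label, value) in enumerate(zip(labels, values))]
```

Choosing between the two layouts still requires the document version number, so a caller would need to decide the value of new_style from Table 1 before calling this function.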

Immediately following the main table is a twelve byte, whitespace-terminated string containing the name of the file from which the document originated. Impression does not appear to update this name when saving a document.
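
A minimal sketch of extracting this string is given below. The function name read_source_name is hypothetical, the caller must supply the offset at which the string starts, and decoding the bytes as Latin-1 is an assumption rather than something confirmed by the format.

```python
WHITESPACE = b" \t\n\r\x0b\x0c"


def read_source_name(data, offset, length=12):
    """Read a whitespace-terminated name from a fixed-length field."""
    field = data[offset:offset + length]
    end = len(field)
    for i, byte in enumerate(field):
        if byte in WHITESPACE:
            end = i  # the name stops at the first whitespace byte
            break
    # Assumption: the name uses a single-byte encoding such as Latin-1.
    return field[:end].decode("latin-1")
```

If the main tables above have been transcribed correctly, the string would begin at 0x170 in new style documents and at 0x10C in old style documents, but these offsets are inferred from the table sizes rather than stated anywhere.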

As is evident from a quick comparison of the layouts of each table, there is a standard order in which further information is stored in the document: styles, borders, mapping information, content references, master pages, chapter pages followed by the content itself which, for the old document format, is stored separately. Each of the items referred to in the main table will be examined in the order in which it appears in the table.

Text styles

The text style system used by Impression allows the user to apply styles to paragraphs and running text, allowing the user to layer them in order to produce documents with a consistent style. Although Impression provides default styles for headings and paragraphs, the document model it uses does not employ high level concepts for the layout of textual structures unlike, for example, packages such as TechWriter and LaTeX. However, the wide range of options governing the appearance of the text and its formatting behaviour allow a great deal of flexibility in text presentation.

The style table immediately follows the filename string at the end of the main table and contains offsets to the styles defined in the document. Unlike the main table, the offsets are calculated from the beginning of the style table rather than the beginning of the document. The style definitions follow the style table in the document.
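
The relative addressing can be sketched as follows. The function name style_addresses and the count parameter are hypothetical: how the number of entries in the style table is determined is not described here, so the caller has to supply it. Little-endian four-byte words are assumed.

```python
import struct


def style_addresses(data, style_table_start, count):
    """Convert style table offsets into offsets from the document start."""
    # Assumption: each entry is a little-endian, unsigned four-byte word.
    offsets = struct.unpack_from("<%dI" % count, data, style_table_start)
    # Offsets are relative to the style table, not to the document.
    return [style_table_start + offset for offset in offsets]
```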

Style definitions

Many styles are defined to take advantage of the ability of the system to layer styles, only declaring attributes of the text formatting and appearance which require changing to obtain a particular effect. The attributes defined by an underlying style are exposed where a style above does not override attributes which are common to both styles. For this system to work there must be a base style which provides defaults for these attributes: typically "Normal" or "BaseStyle".
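
The layering behaviour can be illustrated with a small sketch in which each style is a dictionary holding only the attributes it declares. The function name effective_attributes and the attribute names are invented for this example and do not correspond to Impression's internal representation.

```python
def effective_attributes(layers):
    """Combine styles ordered from the base style to the topmost style."""
    result = {}
    for style in layers:
        # Later (upper) styles override attributes they declare;
        # anything they leave out shows through from the styles below.
        result.update(style)
    return result


base = {"font size": 12, "bold": False, "alignment": "left"}
heading = {"font size": 18, "bold": True}
combined = effective_attributes([base, heading])
```

Here combined takes its font size and bold setting from the heading style, while the alignment is exposed from the base style because the heading does not declare it.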

Therefore, Impression's styles may conserve memory by declaring only the attributes which are necessary to encapsulate their behaviour. This is achieved by the use of a number of flags words (see Figure 2), specifying which attributes of the style will be defined. For most styles, the values of these flags have meanings which can be determined simply by changing the nature of styles using Impression's user interface. However, the base style appears to contain flags and data which have no corresponding visible or documented effect on the style. This makes interpreting the styles used in some documents particularly difficult for a conversion tool.

[Figure: The locations of flags words in a style definition]
Figure 2: The locations of flags words and the style within a style definition. Note that for older documents the gap at offset 0x0C may not appear, causing the word at 0x14 and following information to appear at 0x0C instead.

The meaning of each of the flags in the first flags word encountered is shown in Figure 3. This word specifies the presence of a mixture of formatting and appearance attributes, hints for the application which is to edit the document and Impression-specific information such as whether the style is to be displayed in a menu. Not all of these attributes are relevant to those who only need enough information to display Impression documents; much of the paragraph positioning information is unnecessary, for example. However, for editors, almost all of the information presented will be required if the document is to maintain its overall consistent appearance after it has been modified by the user.

[Figure: Relation between the style flags and declarations]
Figure 3: The meanings of the bits when set in the first flags word. Note that the flags word is labelled according to its position in the order of appearance of flags words within the style definition.

Figure 4 presents more of the flags associated with the style; in this case from the second flags word encountered. Here, it is evident that, without an available document format specification, it is difficult to understand the nature of the flags which apparently duplicate those in the previous word. These duplicates require checking carefully in order to determine the actual presence or absence of the relevant style attributes. Some flags have unknown meanings but still correspond to bytes or words in the following data. Many of the flags have meanings which are difficult to categorise using the key provided, especially where they have not been fully explored. For example, the leadering string may be explicitly written into the text by Impression, otherwise it will be up to the renderer to fill in the text when required. The script size, however, is always specified through style changes in the text content, so its value in the flags word is mainly useful for editing applications.

[Figure: Relation between the style flags and declarations]
Figure 4: The meanings of the bits when set in the second flags word. Note that the flags word is labelled according to its position in the order of appearance of flags words within the style definition.

The third flags word defines the number of tabs used by the style; this number being equal to the number of consecutive bits set in the word, counting from the least significant bit. It is important to read this number of bits correctly, as the corresponding number of values will be included in the style definition. The fourth flags word declares information which is used by Impression to manage styles and generate a document contents listing and index (Figure 5). These attributes are not particularly useful to a rendering application but may be useful for those who wish to create tools for automating the modification of styles in documents.
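
Counting the tabs declared by the third flags word can be sketched as below. The function name count_tabs is hypothetical; the logic simply counts consecutive set bits starting at the least significant bit, as described above.

```python
def count_tabs(flags_word):
    """Count consecutive set bits, starting from the least significant bit."""
    count = 0
    while flags_word & 1:
        count += 1
        flags_word >>= 1  # move on to the next bit up
    return count
```

Note that counting stops at the first clear bit, so any set bits above a gap are ignored by this sketch; whether such values occur in real documents is not known.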

[Figure: Relation between the style flags and declarations]
Figure 5: The meanings of the bits when set in the fourth flags word. Note that the flags word is labelled according to its position in the order of appearance of flags words within the style definition.

The order of style attribute declarations within a style definition is not the same as the order in which the flags are allocated in the flags words. Since the attributes are written as a sequence of optional values, each appearing only if the relevant bit is set in a flags word, errors in decoding flags lead to values in the sequence being missed or read by mistake. The consequence of this is a style in which many of the attributes are wildly incorrect. While such errors are easy to correct for a human reading the resulting style summary, such badly interpreted styles can cause severe problems for an application relying on correct style information.
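
The general technique of reading a sequence of optional values can be sketched as follows. Everything named here is hypothetical: EXAMPLE_LAYOUT uses placeholder bit masks that are not Impression's real flag assignments, and the function read_optional_values is an illustration rather than the library's code. Word values are assumed to be little-endian and to start on a word boundary.

```python
import struct

# Hypothetical layout in storage order:
# (name, index of flags word, bit mask, size in bytes).
# The masks are placeholders, NOT the real bit assignments.
EXAMPLE_LAYOUT = [
    ("underline", 0, 0x01, 1),
    ("alignment", 0, 0x02, 1),
    ("left margin", 1, 0x01, 4),
    ("font size", 1, 0x02, 4),
]


def read_optional_values(data, offset, flags_words, layout):
    """Read only the values whose flag bit is set, in storage order."""
    values = {}
    for name, word_index, mask, size in layout:
        if not flags_words[word_index] & mask:
            continue  # flag clear: no value was stored for this attribute
        if size == 4:
            offset = (offset + 3) & ~3  # move to the next word boundary
            # Signed, because some values such as margins may be negative.
            values[name] = struct.unpack_from("<i", data, offset)[0]
        else:
            values[name] = data[offset]
        offset += size
    return values, offset
```

Because each value's position depends on every flag before it, a single misread bit shifts all of the values that follow, which is the failure described above.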

Table 4: The order in which attribute values appear within a style definition.
Corresponding flag or value | Size of value when enabled | Description
[After name]
[Base style: skip word]
Auto indent | byte | Bit 0 is auto indent state.
[Base style: jump to the beginning of the next word]
[Base style: skip three bytes, to the last byte in the word]
Font name flag 1 | byte | Unknown value - skip byte.
Underline | byte | Value one of 0 (none), 1, 2.
Script | byte | Value one of 0 (none), 1 (subscript), 2 (superscript).
Strikeout | byte | Value either 0 (off) or 1 (on).
Alignment | byte | Value one of 0 (left), 1 (centre), 2 (right), 3 (full).
Keep single paragraphs together | byte | Value either 0 (no) or 1 (yes).
Keep multiple paragraphs together | byte | Value either 0 (no) or 1 (yes).
Hyphenation | byte | Value either 0 (off) or 1 (on).
[Base style: skip byte]
Decimal tab | byte | ASCII code for the character used.
Keep with next paragraph | byte | Value either 0 (off) or 1 (on).
Foreground colour | byte | Unknown value - skip byte.
Unknown flag | byte | Unknown flag at 0x080000 in second flags word - skip byte.
[Base style: skip four bytes]
Page grid lock | byte | Value either 0 (off) or 1 (on).
Rule-offs | byte | Rule-off above determined by bit 1 (0x2): 0 (off), 1 (on). Rule-off below determined by bit 2 (0x4): 0 (off), 1 (on).
Italic style | byte | Value either 0 (off) or 1 (on).
Bold style | byte | Value either 0 (off) or 1 (on).
[Any style: if not word aligned then move to the beginning of the next word]
[Base style: skip word]
Left margin | word | Rightwards displacement in millipoints from the left hand frame edge.
Right margin | word | Rightwards displacement in millipoints from the right hand frame edge. [The value is typically negative, indicating a leftwards displacement.]
First line left margin | word | Rightwards displacement in millipoints from the left hand frame edge for the first line in a paragraph.
Script offsets (always for base styles) | word | The "offset" for subscript text is given in the lower half of the word and is expressed in the form of a percentage:
    value = ((word & 0xffff) * 100.0) / 0x8000
  The "offset" for superscript text is given in the upper half of the word and is expressed in the form of a percentage:
    value = (((word >> 16) & 0xffff) * 100.0) / 0x8000
Script size (always for base styles) | word | The size of script text, expressed as a percentage of the text size:
    value = (word * 100.0) / 0x8000
Line spacing | word | Whether line spacing is expressed as a percentage of text size is determined by the top bit of this word. If the top bit is set then a percentage is stored in the remaining bits in the word:
    value = ((word & 0x7fffffff) * 100.0) / 0x10000
  If the top bit is not set then a length in millipoints is stored in the remaining bits in the word.
Space below paragraphs | word | Value in millipoints.
Space above paragraphs | word | Value in millipoints.
Underline offset (always for base styles) | two words | The offset and size of underlined text, expressed as a percentage, presumably of the text size, given in the form used to express the sizes and offsets of script text. For the first word, the underline offset is given by the expression:
    value = (word * 100.0) / 0x8000
  For the second word, the underline size is given by the expression:
    value = (word * 100.0) / 0x8000
Rule-offs (always for base styles) | word | Rule-off thickness in millipoints.
Left rule-off margin (always for base styles) | word | Rightwards displacement in millipoints from the left hand frame edge.
Right rule-off margin (always for base styles) | word | Rightwards displacement in millipoints from the left hand frame edge.
Font size | word | The font size in sixteenths of a point.
Text aspect ratio | word | Expressed in the form of a percentage; the top two bytes of the word give the number of hundreds and the bottom two give the remainder as a fraction of one hundred:
    value = (word * 100.0) / 0x10000
Rule-offs | word | The offset of the rule-off below the text in millipoints.
Keep paragraphs together (general policy) | word | Keep together within this length given in millipoints.
Leadering | word | The leadering string, either zero terminated or a maximum of four characters.
Vertical rule-off width | word | Width of vertical rules in millipoints.
Rule-offs | word | The offset of the rule-off above the text in millipoints.
[Base style: skip two words]
Tracking | word | Tracking (specified in thousandths of an em), in the following form:
    value = (word * 1000.0) / 0x10000
[Base style: skip six words]
Foreground flag 2 | word | Unknown value - skip word.
Background flag 2 | word | Unknown value - skip word.
Font name flag 2 | word | Unknown value, possibly the font handle.
Number of ruler tabs | Corresponding number of words | Each word contains the tab position and its type; the position is in the top three bytes with the type in the lowest byte. The tab position in millipoints is given by:
    value = (word & 0xffffff00) >> 8
  The tab type is given by:
    value = word & 0xff
  where value is one of 0 (left), 1 (centre), 2 (right), 3 (decimal), 4 (vertical rule).
[Base style: skip words until the tab definitions fill a block of memory 0x80 bytes in length.]
Font name | 0x28 bytes | Null terminated, possibly limited to 0x28 characters.
Foreground colour (always for base styles) | word | The text foreground colour in the form 0xBBGGRRTT, where each colour channel contains a value in the range [0, 255]; the transparency channel indicating opacity if TT equals zero.
Background colour (always for base styles) | word | The text background colour in the form 0xBBGGRRTT, where each colour channel contains a value in the range [0, 255]; the transparency channel indicating opacity if TT equals zero.
Underline/strikeout colour | word (although this word may be reused) | The underline/strikeout colour in 0xBBGGRRTT format.
[Base style: skip word]
Rule-off colour | word | The colour of horizontal and vertical rule-offs in 0xBBGGRRTT format.
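
The value encodings used in Table 4 can be gathered into a few small helper functions, sketched below. The function names are inventions for this example, and each function assumes that the word has already been read from the document as an unsigned 32-bit integer; the expressions themselves follow those given in the table.

```python
TAB_TYPES = {0: "left", 1: "centre", 2: "right", 3: "decimal",
             4: "vertical rule"}


def script_offsets(word):
    """Return (subscript, superscript) offsets as percentages."""
    subscript = ((word & 0xffff) * 100.0) / 0x8000
    superscript = (((word >> 16) & 0xffff) * 100.0) / 0x8000
    return subscript, superscript


def line_spacing(word):
    """Return ("percent", value) or ("millipoints", value)."""
    if word & 0x80000000:
        # Top bit set: the remaining bits hold a percentage of text size.
        return ("percent", ((word & 0x7fffffff) * 100.0) / 0x10000)
    return ("millipoints", word)


def ruler_tab(word):
    """Return (position in millipoints, tab type name)."""
    position = (word & 0xffffff00) >> 8
    # Types outside the documented range are reported as "unknown".
    return position, TAB_TYPES.get(word & 0xff, "unknown")


def colour(word):
    """Split a 0xBBGGRRTT word into its four channels."""
    return {
        "blue": (word >> 24) & 0xff,
        "green": (word >> 16) & 0xff,
        "red": (word >> 8) & 0xff,
        "transparency": word & 0xff,
    }
```

For example, a line spacing word of 0x80018000 would be interpreted by this sketch as 150 percent of the text size, while a word of 14400 would be taken as a length of 14400 millipoints.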