Audience Dialogue

Software for building a knowledgebase

This is a specification for "Foobase" - software which we commissioned in 2001, but which was never finished in a form that we could use easily. However, the idea is still good, and as far as we can find no other software does this, so there should be a market for it. The specs are therefore on this website in the hope that somebody else will take up the idea and produce the software. The work would probably involve taking the code for an email program, and modifying it to perform the functions described below. If any reader of this page decides to produce the software (whether open source or commercially) please let us know - we're willing to be a test user.

Specification for version 1

This program will enable people (either alone or as a workgroup) to make sense of a lot of small fragments of information. The purpose of the program is to make it easy to assemble fragments of text into a coherent and meaningful structure: a "knowledgebase".

The model

In many ways, the Foobase program will work like an email program. An email database (such as Eudora or Outlook) is made up of a number of email messages. Each message has certain properties: the text of the message, the subject, the sender, the date and time, etc. The messages are divided among a number of top-level folders (such as inbox, outbox, and unsent drafts). Each top-level folder can be divided into a number of other folders, e.g. relating to particular topics.

Now imagine an email program that doesn't actually send or receive messages. All that's left is the database, as described above. Instead of an email message, there's what we call an "observation" - this might be one person's responses to a questionnaire, one person's comments after attending a seminar, a book review, an abstract of a paper, a news item, or an idea jotted down in the middle of the night.

Just as an email message has a subject, a sender, and a date and time, an observation has a subject, a source, and a date and time. Just as similar emails can be moved to a folder on that topic, so can similar observations. Just as email folders are divided into larger groupings (e.g. inbox and outbox), so are observations - except that the larger groupings are based on meaning.

Just as emails can have attachments and web links, so will Foobase.

After this point, Foobase and email software are different. Foobase won't receive or send messages, but it will have other features, giving it the ability to quickly and accurately classify the observations into groups.

The feature that most clearly separates Foobase from email will be Foobase's ability to make higher-level observations, summarizing a set of lower-level ones. In email terms, that's as if each folder had its own email message built into its header.

Another difference between Foobase and email is that each user has only one set of email. But one user can have any number of Foobase knowledgebases on a single computer - each one could cover a single project.

Software format

This should be a Java program which runs on a user's computer - either inside a browser or as a stand-alone program. The data will be in an online database, running on a server (which could be the user's own computer).

File format

This is up to the programmer, to some extent, but in general we prefer a few large files to many small ones. An XML-type format is suggested. The file extension can be .kgb (for knowledgebase).

Observations: structure and types

The knowledgebase is made up of observations, which can also be thought of as notes, comments, or memos. Observations can be structured (with variables) or unstructured (all text).

There are 3 levels of observations:
Level 1 - base data - specific comments by any participant or stakeholder.
Level 2 - summaries of base data. Each observation in level 2 is based on a number of observations at level 1.
Level 3 - overall summaries. Each observation in level 3 is based on a number of observations at level 2.

Structure of observations

The basic fields in each observation are
1. Body of the observation: either one text field, or divided into more specific fields.
2. Topic of this observation (particular seminar, website, newsletter, etc)
3. Class of observation (if more than one defined for this knowledgebase)
4. Keywords for this observation.
5. The person who made the observation (or type of person - e.g. course participant, website visitor).
6. Date and time of the observation.
7. Relevance rating (0 to 100)
8. Links to observations at a lower level.
9. Links at same level [group/s that this observation belongs to]
10. Links to observations at a higher level.
11. A sequence number for this observation.
12. Attached file/s and web links, if any.
13-. User-defined fields, within body of observation.

These observations may be newly made, or may be from existing data, e.g. extracts from reports, Endnote references, emails, etc.
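The field list above could map onto a plain Java class along the following lines. This is only a sketch: the class name Observation, the field names, and the idea of storing links as lists of sequence numbers are all our assumptions for illustration, not part of the specification.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory form of one observation; all names are illustrative. */
final class Observation {
    final int sequenceNumber;                                // field 11
    final int level;                                         // 1 = base data, 2 = summary, 3 = overall summary
    String body = "";                                        // field 1, when not divided into fields
    String topic = "";                                       // field 2
    String observationClass = "text";                        // field 3
    final List<String> keywords = new ArrayList<>();         // field 4
    String source = "";                                      // field 5: person or type of person
    LocalDateTime madeAt;                                    // field 6
    private int relevance = 0;                               // field 7, 0 to 100
    final List<Integer> downlinks = new ArrayList<>();       // field 8
    final List<Integer> crosslinks = new ArrayList<>();      // field 9
    final List<Integer> uplinks = new ArrayList<>();         // field 10
    final List<String> attachments = new ArrayList<>();      // field 12: files and web links
    final Map<String, String> userFields = new LinkedHashMap<>(); // fields 13 onward

    Observation(int sequenceNumber, int level) {
        if (level < 1 || level > 3) {
            throw new IllegalArgumentException("level must be 1, 2 or 3");
        }
        this.sequenceNumber = sequenceNumber;
        this.level = level;
    }

    int relevance() {
        return relevance;
    }

    void setRelevance(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("relevance must be 0 to 100");
        }
        relevance = value;
    }

    /** Links this observation to a lower-level one, recording both directions. */
    void addDownlink(Observation lower) {
        if (lower.level >= level) {
            throw new IllegalArgumentException("a downlink must point to a lower level");
        }
        downlinks.add(lower.sequenceNumber);
        lower.uplinks.add(sequenceNumber);
    }
}
```

Recording each link at both ends is one possible design choice; it makes "show uplinks" and "show downlinks" equally cheap, at the cost of keeping the two lists in step.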

Level 1 only - various inputs from individual cases

Examples of some classes of observations:

(a) Example 1: data from forms, where each observation is one case (person), and each case answers questions on a number of variables. Variables are a mixture of numeric, category, date/time, and text data. Most variables have one value per case, but some may have multiple values for the same case.

(b) Example 2: descriptions of seminar and similar events, with additional fields:
- town or city
- venue
- date and maybe time
- presenter/s
- general topic area
- specific topic
- audience size
- any unusual factors

(c) Example 3: transcripts of discussions. Each observation consists of a sequence of statements on a single topic, made by one or more people. The full discussion would be a set of observations.

(d) Example 4: bibliographic references (e.g. from Endnote), perhaps with some text.

It should be possible to add new types of observation formats (i.e. different sets of variables) without further programming - perhaps by editing an XML file with a text editor. For each field this file could specify properties such as broad type (e.g. text, number, date/time), field label, whether something must be entered, range of entries (1 to N repeats), possible values, etc. The principle is to keep it all fairly loose, and not to be too worried about data types etc.
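As a rough illustration of such an editable definition file, the sketch below reads field definitions from XML using the standard Java DOM parser. The element and attribute names (field, name, type, required, maxRepeats) are invented for this example; the specification does not fix any of them. In keeping with the "fairly loose" principle, a missing type defaults to text and a missing repeat count to 1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Reads a hypothetical XML definition of one class of observation. */
final class ClassDefinitionReader {

    /** One field of an observation class; the property names are assumptions. */
    static final class FieldDef {
        final String name;
        final String type;       // e.g. text, number, date-time
        final boolean required;  // must something be entered?
        final int maxRepeats;    // 1 to N repeats

        FieldDef(String name, String type, boolean required, int maxRepeats) {
            this.name = name;
            this.type = type;
            this.required = required;
            this.maxRepeats = maxRepeats;
        }
    }

    static List<FieldDef> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("field");
            List<FieldDef> fields = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // Loose defaults: anything unspecified is a single optional text field.
                String type = e.hasAttribute("type") ? e.getAttribute("type") : "text";
                boolean required = "yes".equals(e.getAttribute("required"));
                int maxRepeats = e.hasAttribute("maxRepeats")
                        ? Integer.parseInt(e.getAttribute("maxRepeats")) : 1;
                fields.add(new FieldDef(e.getAttribute("name"), type, required, maxRepeats));
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not read class definition", ex);
        }
    }
}
```

A definition for the seminar example might then contain lines such as <field name="venue"/> and <field name="presenter" maxRepeats="5"/> inside a single enclosing element.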

Views

Foobase has three kinds of view:

1. Data view - see a full observation record. This is the view used when creating and editing an observation. An automatically generated summary also appears in data view. It might be about full-screen size, and resemble an HTML form - like an email message with a little structure.

2. Table view - squeeze each observation onto one line of an on-screen table. This view is used when comparing and summarizing observations. It would resemble a spreadsheet, or a table of email messages.

3. Search view - a search form, the output of which is a table view.

Program functions

The Foobase software will do three things:

1. Creating and editing observations.
2. Selecting a set of observations.
3. Summarizing the selected set into a higher-level set.

The emphasis of the program will be on ease of use. All the above functions can be done with a word processor, spreadsheet, database, or (almost) email program - but not easily. The software has to make it really easy for the user to summarize the observations, by grouping them, sorting them, etc.

Menu structure

File
- create new knowledgebase
- open an existing knowledgebase [default = last one used]
- open new window to current knowledgebase
- save / as
- export data in current selection [as table]
- print current selection
- quit

Edit
- Edit current observation
- Add new observation, of class... [default: same as current / standard format]
- Create new class of observation [define fields: choose field name, and type (not size). Field types can be: text, keyword, category (single or multiple value), number, date-time]

Select
- make new selection [by defining criteria in search view]
- select all
- add current observation/s to selection
- remove current observation/s from selection
- save selection as new group [prompt for name of group]
- select group [choose name]
- cancel current selection

Display
- as spreadsheet [one observation per line]
- as form [one observation per form]
- as tree view [one observation per node]

Link
- show uplinks (from higher level)
- show downlinks (to lower level)
- show crosslinks (same level)
- go back [after visiting a link]
- add uplink [to ...]
- add downlink [to ...]
- add crosslink [to ...]
- remove this link [when links displayed]

Summarize
- Show summary for currently selected field/s

Fields - [list of fields - in table view, clicking on a field in this menu will alternately hide and show the field in the table]

Help
- selecting [basic summary, about one page]
- displaying [basic summary, about one page]
- editing [basic summary, about one page]
- manual [this will itself be a knowledgebase]

1. Creating and editing observations

In data view, any currently visible observation can be edited. From table view, the user must click on the observation's line, and choose Edit > Current observation (or control E).

The user can manually summarize visible observations, by beginning a new observation, usually at a higher level. This is most easily done by opening an additional window.

To enter a new item, the user must choose the level, and the class. The Edit menu would contain

Edit
- Current observation
- New level 3 observation
- New level 2 observation
- New level 1 observation
- New level 1 questionnaire
- New level 1 event
- New level 1 bibliography
[the last three entries vary, depending on the classes of item available]

A new observation can be produced by either typing in the observation, or copying and pasting text from an external file (or another observation).

When a new class of observation is added, fields may first be defined - the default is one big text field. If this is pasted in, it should later be possible to divide that field into several others. If these were in a fixed order, this could easily be done by inserting delimiter characters.

When fields have been defined, each field is separated by a thin line. The user moves between fields by pressing Tab, Page Up or Page Down, or using the mouse. Fields may be omitted, in which case extra lines are inserted, and Tab (etc) must be pressed twice.

The importance of a text field in a particular observation can be indicated by beginning the entry with up to 3 asterisks: the more asterisks, the more important. The easiest way to do this would be to make Importance a new field type.
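Reading the asterisk convention back out of a text field is simple. The following is a minimal sketch; the class and method names are ours, and it assumes that leading white space is ignored and that any asterisks beyond the third are treated as ordinary text.

```java
/** Illustrative helper for the "up to 3 asterisks" importance convention. */
final class Importance {

    /** Returns 0 to 3: the number of leading asterisks, capped at three. */
    static int level(String fieldText) {
        String t = fieldText.trim();
        int n = 0;
        while (n < t.length() && n < 3 && t.charAt(n) == '*') {
            n++;
        }
        return n;
    }

    /** Returns the entry with its importance marker (and surrounding spaces) removed. */
    static String withoutMarker(String fieldText) {
        String t = fieldText.trim();
        return t.substring(level(t)).trim();
    }
}
```

For example, Importance.level("** budget overrun") gives 2, and Importance.withoutMarker("*** urgent") gives "urgent".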

Importing multiple observations

A set of observations - such as data from a survey - can be generated by copying an external file, in CSV or tab-delimited or record-delimited format. Each record (in the first two cases) or set of records (if record-delimited) will produce an observation. Field names will be picked up if the CSV or tab-delimited file has an initial record containing these. A record-delimited file will need a case delimiter, and field names can be picked up if the file is one of name-value pairs.
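The simplest case above, a delimited file whose first record holds the field names, might be handled as in the sketch below. The names are hypothetical, and the code deliberately ignores quoted fields (which real CSV files often contain), so it is best suited to tab-delimited data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative import of delimited text: one record becomes one observation. */
final class DelimitedImport {

    /**
     * The first non-empty record supplies the field names. Every later record
     * becomes a map from field name to value. Quoted fields are NOT handled.
     */
    static List<Map<String, String>> parse(String text, char delimiter) {
        String separator = Pattern.quote(String.valueOf(delimiter));
        List<Map<String, String>> observations = new ArrayList<>();
        String[] names = null;
        for (String line : text.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // skip blank records
            }
            String[] cells = line.split(separator, -1); // -1 keeps trailing empty cells
            if (names == null) {
                names = cells;
                continue;
            }
            Map<String, String> observation = new LinkedHashMap<>();
            for (int i = 0; i < names.length; i++) {
                // Short records are padded with empty values rather than rejected.
                observation.put(names[i], i < cells.length ? cells[i] : "");
            }
            observations.add(observation);
        }
        return observations;
    }
}
```

Padding short records instead of rejecting them follows the general principle of keeping the data handling loose; a stricter importer could report them instead.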

Making links

An observation, when selected, can be linked to another observation using the Link menu. Relevant options are
- add uplink (i.e. to a higher-level observation)
- add downlink (i.e. to a lower-level observation)
- add crosslink (to an observation at the same level)
- remove this link

2. Selecting observations

The selection process is the heart of the software, and must be really quick to use. The principle is to begin with all or no observations, then to add or subtract more observations from that selection depending on the characteristics or contents of those observations.
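One way to picture this principle is as a set of sequence numbers that grows and shrinks. The sketch below is illustrative only: the class name and methods are assumptions, and a real implementation would tie the set to the display.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative selection: a subset of the sequence numbers in a knowledgebase. */
final class Selection {
    private final Set<Integer> all;
    private final Set<Integer> selected = new LinkedHashSet<>();

    Selection(Collection<Integer> allSequenceNumbers) {
        all = new LinkedHashSet<>(allSequenceNumbers);
    }

    /** Begin with all observations. */
    void selectAll() {
        selected.addAll(all);
    }

    /** Begin with no observations (cancel the current selection). */
    void cancel() {
        selected.clear();
    }

    /** Add observations; numbers not in the knowledgebase are ignored. */
    void add(Collection<Integer> sequenceNumbers) {
        for (Integer n : sequenceNumbers) {
            if (all.contains(n)) {
                selected.add(n);
            }
        }
    }

    /** Remove observations from the selection, not from the knowledgebase. */
    void remove(Collection<Integer> sequenceNumbers) {
        selected.removeAll(sequenceNumbers);
    }

    Set<Integer> selected() {
        return new LinkedHashSet<>(selected);
    }

    Set<Integer> unselected() {
        Set<Integer> rest = new LinkedHashSet<>(all);
        rest.removeAll(selected);
        return rest;
    }
}
```

Keeping the unselected set available as well makes it cheap to offer "currently unselected" as a starting point for the next selection.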

Options in Select menu:
- select from: all / currently selected / currently unselected
- select level/s (1, 2, 3)
- select class of observation (set of case data, transcript, description, etc)
Fields for the selected type are then displayed, generally one per line
Selecting within each field
- numeric or date/time fields: choose range
- category fields: choose category/s (checkboxes)
- text fields: enter text to be found, with AND logic on each line, OR between lines. Search is for individual words unless enclosed in quotes, when the search is for that phrase (like most search engines).

AND logic applies between (but not within) the above items; if OR is wanted, choose "add to previous selection" and make a new selection to add.
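The text-field rule (AND within a line, OR between lines, quotes for phrases) could be sketched as follows. The names are hypothetical. Two simplifications are assumed here and are not in the specification: matching ignores case, and a term matches anywhere in the text, so "art" would also match "party". A stricter version would compare whole words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative text search: AND within a query line, OR between lines. */
final class TextQuery {
    // A term is either a "quoted phrase" or a run of non-space characters.
    private static final Pattern TERM = Pattern.compile("\"([^\"]+)\"|(\\S+)");

    /** Splits one query line into lower-case terms, keeping quoted phrases whole. */
    static List<String> terms(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(line);
        while (m.find()) {
            String term = m.group(1) != null ? m.group(1) : m.group(2);
            out.add(term.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** True if every term on at least one query line occurs in the text. */
    static boolean matches(String text, List<String> queryLines) {
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String line : queryLines) {
            List<String> terms = terms(line);
            if (terms.isEmpty()) {
                continue; // an empty line selects nothing
            }
            boolean allFound = true;
            for (String term : terms) {
                if (!haystack.contains(term)) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return true; // OR between lines: one matching line is enough
            }
        }
        return false;
    }
}
```

With the text "The venue was too cold and noisy", the query lines "venue cold" and "parking" together match (the first line succeeds), while the single line "venue warm" does not.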

Selection is made by clicking visible fields - a SQL-like statement also appears on line 2 (below the menus) - this is editable, for mixing AND and OR etc.

User presses GO button at right of Line 2.
Progress indicator shows "selected n of nnn"
If no observations are selected, a window appears: "None found. Try again? [yes] [no]". If Yes is chosen, the user needs to re-select the criteria.

By clicking the checkmark at the right of any line (in spreadsheet format) or using the Select menu (form format), or the + or - key on the numeric keypad, the user can add an observation to (or remove it from) the selection.

When observations are selected, they are automatically displayed in spreadsheet view: one observation per line. The last observation selected appears at the foot of the screen. Earlier ones scroll up and disappear (unless brought back with the scroll bar).

Displaying the selection

A selection is shown by default in spreadsheet list format (one line per observation). A status line at the foot of the screen shows the class of observation, the number of observations selected and the total number available at that level, of that class.
[There should be no need to mix classes in one selection, but maybe it should be possible (later?) to change the class of any observation to the default of a single text field.]

With the Display menu, the user can toggle to the Form format and back:
- display observation as single form (control F)
- display in spreadsheet format (control S)

The user can also toggle between selection and all cases with
- display all
- display selection
This could be done using an F key or control-key sequence (e.g. F12, or control T for toggle).

When a text field appears in list format, the whole field may not be visible. Moving the mouse over this area and leaving it there for one second will display the whole text in a yellow box (as in MS Office "mousetips"). The yellow box will overflow neighbouring lines and/or fields.

In spreadsheet view, the top line (as in Outlook etc.) shows field names. Clicking on any field in this top line will select that field, which is then highlighted. The 2nd line has an upward-pointing and a downward-pointing triangle for each field. Clicking a triangle will sort the selection on that field in the indicated order. This sorting affects the display only, not the files.

When a multi-value field (e.g. keywords) is sorted, each entry will appear once for each keyword, but in a different background colour from usual.

The order of lines on the screen can be changed by seizing a line (or highlighted group of lines) in the left margin and dragging it up or down. This overrides the sorting.

Highlighting one or more lines then pressing the DELETE key will remove those lines from the selection (but not from the file).

The Fields menu lists the fields in the current type of observation. A field irrelevant to the user's current purpose can be hidden by selecting it and choosing Hide This Field from the Fields menu. If multiple fields are selected, all will be hidden. To show a field again, the user must choose the field (which will not be ticked) from the Fields menu. All visible fields are ticked. A thicker-than-usual vertical bar (2 pixels instead of 1) between fields in line 2 will show where fields are hidden.

The displayed width of a field can be changed by dragging its label on the top line - as with Excel etc. The minimum width visible this way is 2 characters. If the cursor is placed on the top line of a column, the field name appears in a pale blue box, after half a second.

When a selection has been made and Show All is chosen in spreadsheet view, the selected observations are shown on a white background, and the unselected observations on a grey background.

Another display is the Tree View - like Windows Explorer, but with + and - signs on groups of observations (instead of directories). A group of observations can be selected by highlighting it, then choosing Add Current Observations from the Select menu.

Displaying links

More items in the Display menu:

- display links to this observation (from lower level)
- display links from this observation (to higher level)
- display links at same level

If there are no links, a message appears on the bottom (status) line: "No links of [this type]" - and the screen doesn't change.

If there are links, a new version of the Display screen appears, with one linked observation shown on each line.

3. What can be done with selections

A selection can be summarized or exported...

Summarize

When one field is selected, and Summarize chosen from the menu, a new window opens, summarizing that field for the current selection. The first item in the window is the name of the field.

When the summarized field is a text variable, the summary shows:
(1) the number of observations at each importance level - i.e. beginning with 0, 1, 2, and 3 asterisks;
(2) the first line of each observation beginning with ***.

If the summarized field is a category variable, the number of observations with each different value is shown, together with a total and a percentage, e.g.

Right       26    40.0%
Wrong       39    60.0%
No data      0     0.0%
Total       65   100.0%

Multiple-answer variables (e.g. keywords) have two percentage columns: % of observations, and % of mentions.
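The two percentage bases can be computed from the same counts, as in this sketch (class and method names are ours). Each observation contributes a list of values: one value for a single-answer category, several for a multiple-answer field such as keywords. "% of observations" divides by the number of observations; "% of mentions" divides by the total number of values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative category summary: counts and the two percentage bases. */
final class CategorySummary {

    /** Counts each value; the outer list has one inner list per observation. */
    static Map<String, Integer> counts(List<List<String>> valuesPerObservation) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> values : valuesPerObservation) {
            for (String value : values) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Total number of values across all observations. */
    static int mentions(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    /** Formats count/base as a percentage with one decimal place. */
    static String percent(int count, int base) {
        if (base == 0) {
            return "0.0%";
        }
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * count / base);
    }
}
```

For the table above, percent(26, 65) gives "40.0%". For a keyword that appears in 2 of 3 observations, where those 3 observations carry 5 keywords in all, the two columns would be percent(2, 3) and percent(2, 5).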

For date variables, dates are tabulated like categories, but with cumulative percentages added in a new column to the right.

Numeric variables are summarized like date variables, but with mean and standard deviation added.
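The extra figures for date and numeric variables are straightforward to compute. In the sketch below the names are hypothetical, and the standard deviation uses the sample formula (dividing by n - 1); the specification does not say whether the sample or the population form is wanted, so that choice is an assumption.

```java
/** Illustrative summary figures for date and numeric variables. */
final class NumericSummary {

    /** Arithmetic mean; NaN for an empty array. */
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (n - 1 divisor); NaN for fewer than 2 values. */
    static double sampleSd(double[] values) {
        if (values.length < 2) {
            return Double.NaN;
        }
        double m = mean(values);
        double sumOfSquares = 0;
        for (double v : values) {
            sumOfSquares += (v - m) * (v - m);
        }
        return Math.sqrt(sumOfSquares / (values.length - 1));
    }

    /** Cumulative percentages for counts already sorted by date or value. */
    static double[] cumulativePercent(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] out = new double[counts.length];
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            out[i] = total == 0 ? 0.0 : 100.0 * running / total;
        }
        return out;
    }
}
```

For counts of 1, 3 and 1 on three successive dates, the cumulative column would read 20%, 80% and 100%.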

When summary data are shown, an option appears: "Paste into new observation?" A user who accepts this option is prompted for the level - defaulting to 1 level higher than the current data.

Each of these new observations has links to the underlying observations. (As this could be a huge number of links, a list of the serial numbers of all items in the current selection is automatically generated, as another kind of observation, and a single link is made to that list.)

Export

The text of the level 3 observations can form the basis of a formal report. Thus an export facility will save much time.

To export, the wanted fields from the current selection are chosen - often only the main text field will be needed, perhaps also the Topic field that defines the text.

When Export is chosen, the standard "save file" dialogue box appears. If no variables are selected, a message appears: "No variables selected. [Body only] [Try again]"

Exported data is saved as an ASCII text file, one record per field, plus a totally blank record after the observation. Thus if several observations were exported, with topic and text fields, the export file would list each observation's heading in one record, its text in the next, and a blank delimiting record. When pasted into a word processor, each record would be a paragraph. Within the word processor, these paragraphs could be converted into a table, with one row per observation, and one column per variable.
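The export layout described above can be sketched in a few lines. The names are ours, and two details are assumptions: records end with a plain newline character, and any line breaks inside a field are replaced by spaces so that each field stays in one record (and so becomes one paragraph).

```java
import java.util.List;

/** Illustrative export: one record per field, then a blank record. */
final class TextExport {

    /** Each inner list holds the chosen fields of one observation, in order. */
    static String export(List<List<String>> observations) {
        StringBuilder out = new StringBuilder();
        for (List<String> fields : observations) {
            for (String field : fields) {
                // Collapse internal line breaks so that one field = one record.
                out.append(field.replaceAll("[\\r\\n]+", " ")).append('\n');
            }
            out.append('\n'); // totally blank record after the observation
        }
        return out.toString();
    }
}
```

Exporting two observations, each with a topic and a text field, would therefore produce six records: topic, text, blank, topic, text, blank.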

Expected size of knowledgebase

As a guide to writing the software, the approximate number of observations expected in a large knowledgebase would be:

Level 1 - 10,000? 
Level 2 - 500?
Level 3 - 20?

Average size of observation 1,000 characters
So total size of that large knowledgebase will be about 10 MB. Best to hold entire knowledgebase in RAM, for speed. Give warning if this is not possible, or suggest closing other apps.
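The estimate works out as (10,000 + 500 + 20) x 1,000 = 10,520,000 characters, or about 10 MB at one byte per character. A program could make the same check at start-up, as in this sketch. The names, the bytes-per-character figure and the safety factor are all assumptions; Java strings may use two bytes per character, and links and indexes add overhead.

```java
/** Illustrative size estimate and memory check for a knowledgebase. */
final class SizeEstimate {

    /** Total characters, given observation counts per level and average size. */
    static long estimatedChars(long level1, long level2, long level3, long averageChars) {
        return (level1 + level2 + level3) * averageChars;
    }

    /**
     * True if the estimate, multiplied by a safety factor for overheads,
     * fits within the maximum memory the Java runtime will try to use.
     */
    static boolean fitsInMemory(long chars, int bytesPerChar, int safetyFactor) {
        long needed = chars * bytesPerChar * safetyFactor;
        return needed <= Runtime.getRuntime().maxMemory();
    }
}
```

If fitsInMemory returns false, the program would show the warning suggested above, or suggest closing other applications.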

Undo

Either: undo last action (control Z)
Or (better)
Keep a log of all/recent actions. Control Z brings up this list, one line per action. Undo an action by selecting its line and pressing Delete. When this is not possible, grey out the un-undoable action - e.g. trying to delete something that's already deleted.

Warning: getting this perfect might be very tedious, and hardly worth the trouble.
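The second option, a log from which any action can be undone, might start from something like the sketch below. It is deliberately simple and all names are hypothetical: each logged action carries its own undo step, and an action that has already been undone reports that it can no longer be undone, which is when its line would be greyed out. It does not attempt the hard part, which is detecting that a later action has made an earlier one impossible to undo.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative action log: one entry per action, each with its own undo step. */
final class ActionLog {

    static final class Entry {
        final String description; // the one-line text shown in the list
        private final Runnable undoStep;
        private boolean undone;

        Entry(String description, Runnable undoStep) {
            this.description = description;
            this.undoStep = undoStep;
        }

        /** False once the action has been undone: show the line greyed out. */
        boolean canUndo() {
            return !undone;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void record(String description, Runnable undoStep) {
        entries.add(new Entry(description, undoStep));
    }

    /** The list that Control Z would display. */
    List<Entry> entries() {
        return Collections.unmodifiableList(entries);
    }

    /** Undoes the chosen entry; returns false if it cannot be undone. */
    boolean undo(int index) {
        Entry entry = entries.get(index);
        if (!entry.canUndo()) {
            return false;
        }
        entry.undoStep.run();
        entry.undone = true;
        return true;
    }
}
```

Storing an undo step with each action keeps the log itself simple, but every editing function then has to supply its own reverse operation, which is where the tedium mentioned above comes in.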

Still to decide (Feb 2001)

Allow for more than one user at a time on the same knowledgebase? Probably not, because the analysis is PC-based. But then what about multiple remote data entry? Need some way to merge/append separately created parts of same knowledgebase?

Should there be restrictions on editing other people's observations? Maybe unable to delete, but to add comments in a different colour. OK to edit an observation made by the same user, but software should keep a log of edits, showing username, date, time, observation number, and change made. Or is this unnecessary?

How to fit in data dictionaries, e.g. possible values of category variables. Another type of observation? Another level? Through XML DTD?

Could all this be done inside a browser, or does it need a separate program?
If the latter, should the data entry use XML?
Have a tab-delimited option as well?