The Commerce of Content
Management Report:
First Draft: "What We Can Learn about Productivity and Effectiveness Metrics from Software Design and Development, Training, and Marketing Communications"


by Saul Carliner, Helen Constantinides, Kirk St.Amant, and Catherine Walstad.

Note: This is a long report. You might consider printing it for easier reading.

In this Literature Review
Effectiveness and Productivity Metrics for Technical Communication
Effectiveness and Productivity Metrics for Software Design and Development
Effectiveness and Productivity Metrics for Training
Effectiveness and Productivity Metrics for Marketing Communications and Public Relations
Intellectual Capital
Lessons Learned
References

In 1859, the U.S. Congress heard a report that, as a result of the publication of a guide for lighthouse operators, no preventable maritime disaster occurred in the previous year.

While it is not fair to give the written instructions the entire credit for the improvement [technical advancements were introduced]…[These] comments regarding how the lights were kept is positive testimony to the value of the documentation (Loges, 1998, p.452)

At least since the middle of the nineteenth century, technical communicators have been concerned about the value of our work.

What's different almost 150 years later is the extent and nature of the concern. If this quote provides insight into the sentiment of the time, the concern centered on the difference that technical communication products made in their environments. In our time, the concern focuses on justifying the need for organizations to continue employing technical communicators.

During the global economic slowdown of the early 1990s, technical communicators worried about the future of our jobs as organizations downsized and contracted services to outside agencies. As economies recovered, organizations retained a skepticism about expenses. Mead commented, "If technical documentation cannot be shown to contribute to the bottom line, it has no reason for being" (353).

This attitude in industry has spurred a quest for precise ways of measuring how we contribute to the bottom line of the organizations that employ us (such as Redish, 1995; Hosier, 1992; See, 1995). Nervous technical communicators seek these calculations to preserve a job in danger or to prevent a department from being outsourced.

Because the concept of calculating the value added by technical communication is relatively new and the methodologies are not yet in place, we also seek other statistics to make our case. In his 1998 review of the literature on the value added by technical communication, Mead concludes that we do not yet have such standardized metrics; he notes that we need to conduct ongoing, systematic measurements to identify them.

Specifically, what should be measured? Neal (1994) comments that measuring a process is worthless unless the product is also measured. That suggests that measurements might address both effectiveness of the product (that is, how good is it?) and the productivity of the process that produced it (how much did we make and what resources were needed to do that?).

Technical communicators aren't the only professionals to face downsizing and outsourcing. While organizations were downsizing their technical communication teams, they also reduced the size of their software development and support, training, and marketing communications departments, and hired outside agencies to assume some of the responsibilities.

If they reacted like technical communicators, these three groups of people (among others) must have also sought methods of calculating the value they add to the organizations they serve. And if technical communication is truly an interdisciplinary field that draws on expertise, research, and literature from a variety of other fields, perhaps we should extend our search for useful metrics to these related disciplines. Our work overlaps with each of these fields. Perhaps they have metrics for assessing our work that we can adopt or adapt now.

This article reviews the models and metrics used to assess the work of software designers and developers, trainers, and marketing communicators. First, it considers the state of the art in technical communication. Then, it presents models and metrics used to assess effectiveness and productivity of software designers and developers, trainers, and marketing communicators. Last, it suggests some common lessons that might be learned from these three disciplines.


Effectiveness and Productivity Metrics for Technical Communication

Holistic Measures

Although effectiveness and productivity are separate metrics, some approaches to measuring them note that the two are tightly linked.

Observing that the quality of a product is directly linked to the process that develops it, and that the effective use of resources during the process is tightly linked to the extensiveness (or maturity) of that process, JoAnn Hackos proposed the Process Maturity Model in her 1994 book on project management. Hackos' model is an adaptation of the Capabilities or Process Maturity Model developed for software development, and described later in this article.

Effectiveness Metrics

According to the literature, technical communicators use the following methods to assess the effectiveness of their work: meeting criteria, usability testing, assessment of communication techniques, assessing the value-added by technical communication, and awards.

Usability testing emerged from the field of ergonomics and human factors and assesses the ease with which users can perform an assigned task with a communication product (or with the software or hardware that the communication product describes) (Dumas & Redish, 1999). The literature primarily recommends that technical communicators conduct usability testing to take draft communication products for a "test drive" before their release to the public (Duin, 1993). The problems found in the test should be corrected before the communication product is released to users.

Technical communicators might also conduct usability tests after publication, but primarily to suggest future improvements, rather than assess the ultimate effectiveness of the communication product.

Another approach that has been widely described in the literature is the assessment of communication features, such as the U-metric assessment. Such assessments measure the extent to which specific communication features are used in a given communication product, such as white space, type size, index entries, visuals, and headings. Specifically, the assessment looks for those features that other research has shown to be correlated with improved readability or comprehension. Because this type of approach involves extensive counting and the presence of such features does not guarantee quality, use has been limited.

More recently, the literature has focused on calculating the value added by technical communication. The approach, suggested by Redish and Ramey (1995), focuses on case-study research and involves calculating a performance improvement to another part of the organization that results from the publication of a technical communication product. In some cases, the benefit might be reduced errors, such as the reduction in errors resulting from the U.S. Veterans Administration redesigning a form (Daniel, 1995). Similarly, Spencer and Yates found that customers who used the documentation produced by the software publisher called the company's help desk less frequently than those who did not (1995).

Another means of evaluating the effectiveness of technical communication is through awards. The Society for Technical Communication offers a two-tiered awards program (local or regional, and international) that evaluates the effectiveness of technical publications, art, videos, and online communication. The competitions use an external set of criteria to assess each entry. Some of the criteria are similar to those in the assessment of communication features. Rather than counting the extent to which features are used, judges make a value judgement. Judges are also asked to make value judgements on the overall effectiveness of the communication product. (STC Judging Criteria, 1999.)

Although none are in wide use, some models have been proposed for evaluating the effectiveness of technical communication products. For example, Carliner proposed a four-level approach that assesses user satisfaction and ability to perform intended tasks, the extent to which clients were able to achieve their intended business goals, and client satisfaction (1997).

For technical communicators, the key challenge to developing effectiveness metrics is defining the metrics themselves. Usually, this results from the problem of defining quality. Bandes suggests that quality is conformance to requirements (1986). But the practice of technical communication typically establishes requirements only for editorial consistency and specifications for printing, rather than requirements for the content (such as "users should be able to install the software in 8 minutes or less with no more than 1 error in the process"). Since Bandes addressed the issue, several others have tried to define quality, with definitions ranging from characteristics of the text to the service provided by the technical communicator (Fredrickson, 1992).

Productivity Metrics

We technical communicators use a variety of methods to assess our productivity. The most common is the pages-per-day rate: the number of pages produced per day by a given technical communicator (Hackos, 1994). When considering the rate, organizations recognize the difference between new pages and revised pages. New pages are assumed to require more work than revised ones. In some cases, revisions only require changing one or two words on a page. Differences in page sizes might also affect a communicator's rate.

Organizations also assess the productivity of their technical communication groups by calculating their page rates: the total number of pages published and maintained by a technical communication team during a given time period (such as 1.5 finished pages per day).
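As a rough illustration, the page-rate calculation described above can be sketched as follows. The function name, the idea of weighting a revised page as a fraction of a new page, the weight of 0.25, and the sample figures are all assumptions made for this sketch; the sources cited above do not prescribe a weighting.

```python
def pages_per_day(new_pages, revised_pages, working_days, revised_weight=0.25):
    """Return a weighted pages-per-day rate.

    revised_weight is a hypothetical factor expressing the assumption that
    a revised page takes less effort than a new one.
    """
    if working_days <= 0:
        raise ValueError("working_days must be positive")
    weighted_pages = new_pages + revised_weight * revised_pages
    return weighted_pages / working_days


# Hypothetical output of one communicator over a 20-day month.
rate = pages_per_day(new_pages=20, revised_pages=40, working_days=20)
print(f"{rate:.2f} weighted pages per day")
```

Setting revised_weight to 1.0 reproduces an unweighted count, which makes it easy to see how much the new/revised distinction changes the reported rate.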

As the number of technical communication products published online increases, the pages-per-day rate becomes a less valuable calculation. Some organizations also calculate the number of online topics or screens published. Because the lengths of topics vary, and the number of screens required by topics on differently-sized computers also varies, concerns arise with this metric. In response, other organizations track the amount of disk storage used, but the number and size of graphics and programming instructions can have an impact on this metric, too.

Some organizations also track specific types of costs and schedules. For example, Murphy (1992) benchmarked the cost of publishing different types of communication products, and also calculated the time needed to produce different types of communication products. Similarly, Cover, Cooke and Hunt (1995) calculated the costs of making changes at different points in the publications process.

One of the problems underlying the calculation of productivity rates is that most corporations consider this information to be proprietary, and do not let their workers publish the information so comparisons can be made across organizations and industries.

As a result, technical communicators rely heavily on rules of thumb that are widely used but based on experience and not substantiated through research. For example, Thomson suggests that a quick rule of thumb for estimating the cost of a technical communication project is as a percentage of the total cost of the software development project it supports (1998).

Furthermore, dogging all efforts to devise both effectiveness and productivity metrics for technical communication is the problem that much of the information available is "largely anecdotal and of varying quality." No doctoral dissertations have been published on the subject, and there is no standard textbook or handbook on it, though several important works, such as those by Schriver (1997) and Hackos (1994), discuss the "value-added problem" (Mead, 1998).


Effectiveness and Productivity Metrics for Software Design and Development

Probably before, but certainly since, Frederick Brooks first named The Mythical Man Month in 1972, software project managers have been sensitive to the time and staffing needed for a software project, as well as the quality of the software produced. Changes in software development technology -- new languages, tools, and methodologies -- have only complicated efforts to develop meaningful measures.

Holistic Measures of Productivity and Effectiveness

Some models for assessing the productivity and effectiveness of software development tightly link the two, and consider both as part of a single measure. These holistic approaches include the Process Maturity Framework model, the Goal-Question-Metric model, the risk management model, and object-oriented design principles.

The Process Maturity Framework Model links the effectiveness of a software product and the productivity of resources to the quality of the process used to develop it (Pfleeger and McGowan, 1997). The framework was developed by the Software Engineering Institute of Carnegie Mellon University. It considers the development process along a continuum of maturity and notes that as a process becomes more mature, the measures that can be collected become more meaningful. The following figure shows this continuum.

Continuum from less mature (level 1) to more mature (level 5):

1. Initial (less mature)
Characteristics: Development is ad hoc.
Metrics to Use: Baseline

2. Repeatable
Characteristics: Process dependent on certain individuals
Metrics to Use: Project

3. Defined
Characteristics: The process is defined, and institutionalized. Most likely, the process is recorded in a procedures or process guide
Metrics to Use: Product

4. Managed
Characteristics: The process is measured with a variety of quantitative measures.
Metrics to Use: Process + feedback for control

5. Optimizing (more mature)
Characteristics: The process becomes a feedback loop, through which feedback from one part is acted upon in others.
Metrics to Use: Process + feedback for changing the process

In the first level, Initial Process, software development is pursued in an ad hoc manner. Because the development process at this level lacks both adequate structure and control, only preliminary "baseline" metrics should be collected, which can be subsequently used for comparison. These initial measures can include product size and staff effort.

In the second level, Repeatable Process, the development process has identified inputs (requirements), outputs (code), and constraints (budget and schedule limits). However, the transition from inputs to outputs is not clearly visible and therefore only project metrics can be collected. Possible metrics include amount of effort, duration of the project, size and volatility of requirements, overall project cost, and size of code (110). Pfleeger and McGowan also recommend adding metrics that address the experience and turnover rate of personnel, since personnel can have a significant effect on project cost.

In the third level, Defined Process, the process activities are clearly defined with both entry and exit conditions (111). Due to this more detailed definition, the inputs and outputs to each functional activity can be examined and the intermediate products measured. At this level, project managers can start measuring the complexity of and assessing the quality of requirements, design, code, and test plans. Pfleeger and McGowan thus propose the following metrics at this level: requirements complexity, design complexity, code complexity, and test complexity (112). The number of faults per product and the overall density of defects can also be measured (112). Lastly, Pfleeger and McGowan recommend measuring pages of documentation.

In level 4, Managed Process, priorities for project activities can be set by feedback from early project activities. In other words, feedback determines how resources should be applied. This level enables measurement and control of the process itself by examining reuse, defect-driven testing, and configuration management (112). Using metrics, project managers can correct the course of the development process. Pfleeger and McGowan recommend that the following metrics be collected: process type, amount of producer reuse, amount of consumer reuse, defect identification, use of defect density model for testing, use of configuration management, and module completion over time.

In level 5, Optimizing Process, measures are used to change and improve the process, or to tailor the development process to the situation. Projects at this level of development are rare, and therefore Pfleeger and McGowan offer no metrics.
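The five levels and their associated metric types can be captured in a small lookup structure, sketched below. The class and function names are hypothetical, and the treatment of metric types as cumulative (a level-3 organization can still collect baseline and project metrics) is an assumption of this sketch rather than a claim made by Pfleeger and McGowan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaturityLevel:
    number: int
    name: str
    metrics_focus: str


# Levels and metric types as summarized in the text above.
LEVELS = [
    MaturityLevel(1, "Initial", "Baseline"),
    MaturityLevel(2, "Repeatable", "Project"),
    MaturityLevel(3, "Defined", "Product"),
    MaturityLevel(4, "Managed", "Process + feedback for control"),
    MaturityLevel(5, "Optimizing", "Process + feedback for changing the process"),
]


def metrics_available(level_number):
    """Return the metric types for a level and every level below it.

    Assumption: metric types accumulate as the process matures.
    """
    if not 1 <= level_number <= len(LEVELS):
        raise ValueError("level_number must be between 1 and 5")
    return [lvl.metrics_focus for lvl in LEVELS if lvl.number <= level_number]


print(metrics_available(3))
```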

The Goal-Question-Metric (GQM) Model is another model commonly used to measure productivity and effectiveness. It provides a systematic approach for setting goals and defining them in an operational manner (Basili and Rombach, 1997). Basili and Rombach have developed a set of templates for setting goals and a set of guidelines for deriving questions and metrics.

Goals are defined using a goal-generation template: the object of interest (product or process), the aspect of interest (cost or ability to detect defects), the purpose of the study (assessment or prediction), the point of view from which the study is performed (customer's or manager's), and the context in which the study is performed (people-oriented or problem-oriented factors) (Basili and Green, 1997). These goals are then refined into a set of questions, which in turn determine the metrics. Note that the object of interest can be either a product or a process, and therefore that the GQM model allows for the definition of goals concerning both effectiveness and productivity.

Stark and Durst (1997) have used the GQM model to define software development and maintenance metrics (61). Their model asks two questions. The first is “How maintainable is the system?” Metrics that answer this question include computer resource utilization, software size, fault density, software volatility (changes), and design complexity. The second question is “Where are the resources going?” Metrics used to assess this include software staffing requirements, service request scheduling, and the distribution of types of faults (problems). Although Stark and Durst focus on process, the GQM model can also be used to define metrics for measuring product effectiveness.
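The goal, question, and metric hierarchy described above lends itself to a simple nested data structure. The sketch below is one possible representation, not a format defined by Basili and Rombach. The class names are hypothetical, and the values filled into the goal template are placeholders, because the text does not state the goal that Stark and Durst started from. The two questions and their metrics are taken from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)


@dataclass
class Goal:
    # The five slots of the goal-generation template.
    object_of_interest: str
    aspect: str
    purpose: str
    viewpoint: str
    context: str
    questions: list = field(default_factory=list)

    def all_metrics(self):
        """Flatten the metrics attached to every question under this goal."""
        return [m for q in self.questions for m in q.metrics]


goal = Goal(
    object_of_interest="maintenance process",  # placeholder value
    aspect="cost",                             # placeholder value
    purpose="assessment",                      # placeholder value
    viewpoint="manager",                       # placeholder value
    context="hypothetical maintenance team",   # placeholder value
    questions=[
        Question(
            "How maintainable is the system?",
            ["computer resource utilization", "software size", "fault density",
             "software volatility", "design complexity"],
        ),
        Question(
            "Where are the resources going?",
            ["software staffing requirements", "service request scheduling",
             "distribution of fault types"],
        ),
    ],
)

for question in goal.questions:
    print(question.text, "->", ", ".join(question.metrics))
```

Keeping each metric attached to the question it answers makes it harder to collect a measurement that serves no stated goal.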

The Risk Management Model is another approach that recognizes the interdependence of effectiveness and productivity. Boehm focuses on identifying and addressing high-risk elements early in order to avoid project disasters. He notes two aspects of the software process that lead to these disasters: the sequential, document-driven waterfall process model, which binds people to requirements, and the code-driven, evolutionary development process model, which relies on working out problems later. Boehm notes that all good project managers are good risk managers who use a "general concept of risk exposure (potential loss times the probability of loss) to guide their priorities and actions" (28). He defines risk exposure as

RE = P(UO) * L(UO)

where

RE = risk exposure
P(UO) = probability of unsatisfactory outcome
L(UO) = loss to affected parties

He further defines unsatisfactory outcome as multi-dimensional: for customers and developers, budget overruns and schedule slips are unsatisfactory; for users, products with the wrong functionality, user-interface shortfalls, performance shortfalls, or reliability shortfalls are unsatisfactory; and for maintainers, poor-quality software is unsatisfactory (28). Therefore, like the process maturity framework and the GQM model, this model recognizes that risk management affects both product and process, or both effectiveness and productivity.

Boehm states that risk management consists of two primary steps: risk assessment and risk control. Risk assessment involves identifying potential risks, analyzing them, and prioritizing them for the purpose of risk control. Risk control involves planning to address known risks, developing action plans for resolving the risks, and monitoring the action plans.
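The risk exposure formula and the prioritization step of risk assessment can be sketched in a few lines. The class name, the function name, and every probability and loss figure below are made up for illustration; only the formula RE = P(UO) * L(UO) comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # P(UO), between 0 and 1
    loss: float         # L(UO), in any consistent unit (here, dollars)

    @property
    def exposure(self):
        """RE = P(UO) * L(UO)."""
        return self.probability * self.loss


def prioritize(risks):
    """Order risks from highest to lowest risk exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical risks; the probabilities and losses are invented.
risks = [
    Risk("schedule slip", 0.5, 40_000),
    Risk("wrong functionality", 0.25, 200_000),
    Risk("performance shortfall", 0.1, 100_000),
]

for risk in prioritize(risks):
    print(f"{risk.name}: {risk.exposure:,.0f}")
```

In this invented example the less likely risk (wrong functionality) ranks first because its potential loss is so much larger, which is the point of weighting probability by loss.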

Object Oriented Design Principles. Erni and Lewerentz (1996) recommend the use of design metrics rather than either product or process metrics. According to them, metrics used as "an integral part of the design process" to improve product quality are design metrics (64). Like the previous three models, then, Erni and Lewerentz note the interdependence between effectiveness and productivity.

Erni and Lewerentz focus on object oriented frameworks (not software), which are meant to be reused in as many applications as possible. Erni and Lewerentz describe three common object-oriented design principles: modularity, abstraction, and simplicity. They claim that applying these object oriented design principles leads to both reusability and maintainability (65). Modularity refers to encapsulation, or the self-referential quality of a software module. Abstraction refers to the degree of generalizability, or the cohesion, completeness, and relative independence of a specific software module with respect to other modules. Lastly, simplicity refers to the size of the software module (the smaller, the better) and the number of interdependencies or connections between software modules (the fewer, the better).

Although the models discussed above all integrate effectiveness and productivity, software managers also use approaches that are specific to either product or process.

Effectiveness Metrics

The primary method for measuring the effectiveness of software is testing, which is performed throughout the software development process. Tests fall into two general categories: functional testing and usability testing. Types of functional tests include software inspections, unit tests, integration tests, and system tests (Weller, 1997). Software inspections involve the examination of code in order to discover faults, a method that has been found to uncover from 3 to 5 times more errors per hour than other, more complicated tests (Fenton and Pfleeger, 1997).

Unit, integration, and system tests involve the actual execution of specific software modules and isolate not only errors in coding but also problems with speed or system interaction. Unit tests verify the functionality of isolated software modules as specified in program requirements, whereas integration tests verify the interaction of modules. Lastly, system tests verify the functionality of the software system as a whole. Unit, integration, and system tests are typically performed at specific stages within the development project. For instance, programmers perform unit tests of software modules as they are completed, integration tests as soon as a related suite of modules is completed, and system tests upon completion of the entire software application.
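The difference between a unit test and an integration test is easier to see in a small example. The sketch below uses Python's standard unittest module. The two "modules" (parse_price and apply_discount) and the function that combines them are hypothetical stand-ins invented for this illustration.

```python
import unittest


def parse_price(text):
    """Module 1 (hypothetical): convert a string such as "$10.50" to a float."""
    return float(text.strip().lstrip("$"))


def apply_discount(price, percent):
    """Module 2 (hypothetical): reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def discounted_price(text, percent):
    """Combine both modules, as a larger program would."""
    return apply_discount(parse_price(text), percent)


class UnitTests(unittest.TestCase):
    """Each test exercises one module in isolation."""

    def test_parse_price(self):
        self.assertEqual(parse_price(" $10.50 "), 10.5)

    def test_apply_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_apply_discount_rejects_bad_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)


class IntegrationTests(unittest.TestCase):
    """This test exercises the interaction between the two modules."""

    def test_discounted_price(self):
        self.assertEqual(discounted_price("$80.00", 50), 40.0)


if __name__ == "__main__":
    unittest.main(argv=["example"], exit=False)
```

A system test would go one step further and exercise the finished application through its real interface, which is beyond what a short sketch can show.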

In addition to functional testing of requirements, programs are tested for usability. Software usability tests resemble the usability tests performed on technical communication products. But rather than focusing solely on communication products, software usability tests explore all aspects of usage, from the speed of using the software to the quality of the support received from the technical support group.

In addition to assessing the effectiveness of a given project, programmers can seek certification of the effectiveness of their skills. Some certifications attest to general skills in programming or in managing programming. One such certification comes from the Institute for Certification of Computer Professionals and was developed by an independent organization. Another popular certification is in project management skills from the Project Management Institute. Other certification exams assess technical skills with particular software applications, such as Microsoft products. Most of these certifications are developed by the publishers of the particular piece of software.

Productivity Metrics

The crudest metric of the productivity of software designers and developers is software lines of code (SLOC), or the number of lines of code that make up a given program. Like the page count used for technical communicators, it is considered a crude measure because it only measures how much was written. Lo, Webby, and Jeffery note that the SLOC approach has been criticized as follows: (1) there is no universally accepted definition for a SLOC (does a comment count as a line of code?); (2) the number of SLOC is language and programming style dependent; (3) the number of SLOC is difficult to estimate prior to system implementation; and (4) the SLOC approach overemphasizes the coding stage (166).
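The first criticism, the lack of an agreed definition, shows up as soon as one tries to write a counter. The sketch below is a deliberately simple counter for Python-style source text. The function name and the sample program are invented for illustration, and the counter recognizes only whole-line comments that begin with "#".

```python
def count_sloc(source, count_comments=False):
    """Count lines of code in Python-style source text.

    Blank lines are never counted. Lines holding only a '#' comment are
    counted only when count_comments is True.
    """
    total = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#") and not count_comments:
            continue
        total += 1
    return total


sample = """\
# Compute a total.
total = 0

for n in (1, 2, 3):
    # Accumulate.
    total += n
"""

print(count_sloc(sample))                       # comments excluded
print(count_sloc(sample, count_comments=True))  # comments included
```

The same six-line file yields two different counts depending on a single definitional choice, before language or programming style is even considered.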

Lo, Webby, and Jeffery (1996) propose an effort estimation model, or productivity measurement, for the development of Graphical User Interfaces (GUIs). They propose a system that decomposes a GUI into windows and identifies the relationship between the different types of widgets on the window and the effort required to code and test the window. By using this model, developers are able to estimate the amount of time required to implement a window based on a detailed design of that window.

Lo et al. classify widgets into three categories: action, data, and static widgets (167-68). Action widgets provide access to commands and functions, e.g. buttons. This category of widgets is further subdivided based on whether or not a widget requires database access. Data widgets display and accept data. This category is also further subdivided based on whether or not a widget involves a list. Static widgets are widgets that label or group other widgets. Data widgets are most highly correlated with development effort, although action widgets also contribute. Static widgets require the fewest resources, because they involve no additional programming.
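An estimate of this kind can be sketched as a weighted sum over the widget categories. The linear form and every weight below are assumptions made for illustration. They are not the coefficients reported by Lo, Webby, and Jeffery, and the category keys are hypothetical names for the subdivisions described above.

```python
# Hypothetical effort weights in hours per widget. They are NOT the
# coefficients reported by Lo, Webby, and Jeffery.
HOURS_PER_WIDGET = {
    "action_db": 2.0,      # action widget that requires database access
    "action_no_db": 1.0,   # action widget without database access
    "data_list": 3.0,      # data widget that involves a list
    "data_no_list": 1.5,   # data widget without a list
    "static": 0.0,         # labels and grouping; no additional programming
}


def estimate_window_hours(widget_counts):
    """Estimate the effort to code and test one window from its widget counts."""
    unknown = set(widget_counts) - set(HOURS_PER_WIDGET)
    if unknown:
        raise KeyError(f"unknown widget categories: {sorted(unknown)}")
    return sum(HOURS_PER_WIDGET[kind] * count
               for kind, count in widget_counts.items())


# Hypothetical window taken from a detailed design.
window = {"action_db": 1, "action_no_db": 2, "data_list": 1,
          "data_no_list": 4, "static": 6}
print(estimate_window_hours(window))
```

In practice the weights would be fitted from an organization's own project history rather than chosen by hand.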


Effectiveness and Productivity Metrics for Training

Effectiveness Metrics

Three widely followed models underlie the efforts of training groups to assess their effectiveness. The first is the instructional systems development (ISD) process. The ISD process was developed by the American Institutes for Research during World War II as a means of hastening development of training for new military technologies so the military could speed the deployment of those technologies on the battlefield (Deutsch, 1992).

The ISD process consists of a sequence of steps that guide an instructional designer through the process of analyzing, designing, developing, implementing, and evaluating a training program. It's sometimes called the ADDIE model. Since World War II, industry has adopted the model, and its application in a variety of organizations has resulted in over 31 published variations of the model (Gustafson, 1991) and numerous proprietary variations.

In 1987 and 1993, researchers studied the extent to which instructional designers actually used the model and found that it is used in part or whole by nearly all instructional designers (Zemke & Lee, Wedman & Tesmer). Both studies found that one of the most widely performed tasks in the ISD process is writing observable and measurable learning objectives before design and development begin.

The second model that underlies efforts by trainers to assess their effectiveness is the human performance technology model first proposed by Thomas Gilbert in Human Competence: Engineering Worthy Performance. The model proposes that the value of training to the organizations that sponsor it is the resulting improvement in human performance. Improving human performance involves more than imparting skills and knowledge. It also involves motivating performers to successfully complete assigned tasks and providing performers with the resources needed for those tasks.

The International Society for Performance Improvement and the American Society for Training and Development, two of the largest organizations serving the training profession, officially endorse the performance model as the recommended approach to human resource development.

Most significantly, what matters is the end performance. Specific means of achieving performance (such as discovery learning, advance organizers, simulations, and ratios of text to photographs on the page) are all considered in terms of their effect on resulting performance, not as measures of quality in themselves.

The third model that underlies efforts by trainers to assess their effectiveness is a widely accepted model for evaluating the effectiveness of training. Ask a trainer what they're doing for Level II evaluation and chances are good that they'll respond with a description of specific strategies for measuring the extent of learning.

This model, called the Kirkpatrick model after its author Donald Kirkpatrick, was first published in 1959 as his doctoral dissertation and has gained increasingly wide acceptance in the field. Kirkpatrick proposes that, rather than a single measurement, the effectiveness of training can be evaluated according to different scales and for different purposes. Specifically, he proposes these four types of evaluation (called levels) (1998):

Level I: Reaction
Issues Assessed at this Level: Assesses participants’ initial reactions to a course. This, in turn, offers insights into participants’ satisfaction with a course, a perception of value. Trainers usually assess this through a survey, often called a “smiley sheet.” Occasionally, trainers use focus groups and similar methods to receive more specific comments (called qualitative feedback) on the courses. According to the TRAINING magazine annual industry survey, almost 100 percent of all trainers perform “Level I” evaluation.

Level II: Learning
Issues Assessed at this Level: Assesses the amount of information that participants learned. Trainers usually assess this with a criterion-referenced test. The criteria are objectives for the course: statements developed before a course is developed that explicitly state the skills that participants should be able to perform after taking a course. Because the objectives are the requirements for the course, a Level II evaluation assesses conformance to requirements, or quality. To assess learning, learners must be tested before and after a training program. The pre-testing assesses what learners already know before taking the course.

Level III: Behavior (or transfer)
Issues Assessed at this Level: Assesses the amount of material that participants actually use in everyday work 6 weeks to 6 months (perhaps longer) after taking the course. This assessment is based on the objectives of the course and assessed through tests, observations, surveys, and interviews with co-workers and supervisors. Like the Level II evaluation, Level III assesses the requirements of the course and can be viewed as a follow-on assessment of quality.

Level IV: Impact
Issues Assessed at this Level: Assesses the financial impact of the training course on the bottom line of the organization 6 months to 2 years after the course (the actual time varies depending on the context of the course).

For many reasons, Level IV is the most difficult level to measure. First, most training courses do not have explicitly written business objectives, such as “this course should reduce support expenses by 20 percent.” Second, the methodology for assessing business impact is not yet refined. Jack Phillips actually separates the financial measure from other measures of impact, calling the financial measure Return on Investment (ROI). He suggests a variety of methods for tracking impact and ROI, including business measurements, observations, surveys, and qualitative measures. Last, after six months or more, evaluators have difficulty solely attributing changed business results to training when changes in personnel, systems, and other factors might also have contributed to business performance.
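A financial measure of this kind reduces to simple arithmetic once benefits and costs have been estimated. The sketch below uses the common definition of ROI as net benefits divided by costs, expressed as a percentage. The text above does not give a formula, so treat the definition as an assumption of this sketch. The function name and the dollar figures are also invented.

```python
def roi_percent(benefits, costs):
    """Return ROI as a percentage: net benefits divided by costs, times 100."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) * 100 / costs


# Hypothetical course: $50,000 in costs and $80,000 in measured benefits.
print(roi_percent(benefits=80_000, costs=50_000))
```

The arithmetic is the easy part. As the paragraph above notes, the hard part is deciding how much of the benefit figure can fairly be attributed to the training.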

Despite these difficulties in obtaining a measure, over 50 percent of organizations perform this type of evaluation on 50 percent of their courses (TRAINING, 1995).

Level IV evaluation is also an assessment of quality. It assesses quality in financial terms, a perspective different from that of the evaluations at Level II and Level III.

Robinson and Robinson add that an effective evaluation program must be sustained in the long run. That is, attempts to evaluate isolated projects on an occasional basis will be less fruitful than an ongoing effort to evaluate most projects. This view seems to be widely shared. Most organizations conduct at least Level I and Level II evaluations for the majority of their projects (TRAINING, 1995).

To better predict the likelihood that a training program under development will produce measurable learning, many trainers conduct pilot tests of their courses, just as technical communicators rely on usability testing to assess the ease of use of a draft communication product before final publication. Trainers consider pilot tests (also called formative evaluation because it evaluates a training program in formation) to be tests of the course, not the learners, so the results are not used to assess the effectiveness of training programs. Trainers only use the results of summative evaluation, evaluation conducted on an already published course, when assessing their effectiveness (Dick & Carey).

Certainly challenges have arisen to the Kirkpatrick model. For example, Phillips and Kaufman have both proposed additional levels. Phillips proposes that a financial calculation of impact differs from other assessments of it. Kaufman suggests that, in addition to learning impact, trainers should evaluate the social impact of their courses--that is, the benefit to society.

Trainers also recognize that evaluating the transfer and impact of technical skills that pertain directly to a job task, such as computer skills and manufacturing processes, (often called hard skills) is easier than evaluating the transfer and impact of skills that pertain to the more general task of working, such as negotiating skills and demonstrating empathy (often called soft skills) (Phillips).

To provide standardized, benchmarkable measures of learning, the American Society for Training and Development (ASTD) recently launched a benchmarking service. Using a series of standardized surveys, it asks learners to report on their learning immediately after the course, then again three to six months later. The second survey also asks learners how well they can apply the skills taught to their jobs. In conjunction with the second survey, a third one is sent to supervisors, asking how well learners are demonstrating the skills on the job. Organizations submit the results to the ASTD, which tracks courses by category (such as customer relations training and new employee orientation training) and industry. Trainers can use this information to assess whether their courses are meeting industry benchmarks for effectiveness. The benchmarking service follows a several-year study that the organization conducted of world-class practices in training and development (Bassi & Ahlstrand).

In addition to assessing the effectiveness of a given project, classroom trainers can certify the effectiveness of their teaching skills. The certification tool was developed over the course of a decade in a three-stage process. First, several organizations for training professionals chartered an International Board of Standards and Practices (IBSTPI) to identify the competencies needed in a variety of training positions. Second, at the request of organizations in the health care and computer industry, a certification examination based on these competencies was developed by Chauncy Associates, the for-profit arm of the Educational Testing Service. Last, the certification test was validated. Two forms of the certification are available: one for technical trainers, the other for trainers of professional development and related skills. It is anticipated that passing this certification will become a basic entry requirement for certain jobs as classroom trainers (Fields, 1997).

Productivity Metrics

Although specific measures differ among organizations, training organizations use a few industry-standard measures or methodologies to assess their productivity: the number of students trained, process measurements, return on investment (ROI), percentage of payroll, and industry measurements.

The student-day has been the primary means of measuring productivity in training departments. Student-days measure the number of students in a classroom on a given day. For example, if 21 students attend a 3-day class, the productivity is 63 student-days. Organizations have used this measure to assess the productivity of instructors and entire departments. Using the student-day measure in aggregate also allows organizations to assess the productivity of support personnel, like enrollment administrators and classroom coordinators, whose work is essential to the success of a classroom course but who do not have a direct presence in the classroom.
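
The student-day arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the department schedule are hypothetical, and the sketch assumes every enrolled student attends every day of the class.

```python
def student_days(students: int, class_days: int) -> int:
    """Student-days for one class: students attending times days of class."""
    return students * class_days


# The example from the text: 21 students in a 3-day class.
single_class = student_days(21, 3)

# Hypothetical department schedule as (students, class_days) pairs,
# used to show the measure in aggregate.
schedule = [(21, 3), (12, 5), (30, 1)]
department_total = sum(student_days(s, d) for s, d in schedule)

print(single_class)       # 63
print(department_total)   # 153
```

Summing across classes is what lets the measure stand in for the output of a whole department, including support staff who never appear in the classroom.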

Trainers acknowledge student-days as a measure of productivity, not effectiveness, calling it a measure of "butts-in-seats." As long as the primary source of training was the classroom, student-days were an effective measure of productivity. But as organizations aggressively move their training online (an effort expected to grow 10 to 20 percent (Moe, 1998) between 1998 and 2002) and the number of students in traditional classrooms declines, training organizations are scrambling to find new measures of productivity.

The three models that underlie the assessment of training effectiveness also influence in part the assessment of training productivity.

Another method used to assess the productivity of training is the return on investment. Measuring the return on investment involves a variety of efforts. At the least, it compares the cost of resources invested with the benefits received. The Kirkpatrick model considers these benefits in terms of performance on the job. Other models consider these benefits in terms of revenues generated.
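
One common way to express this comparison is as net benefits divided by costs, stated as a percentage. The sketch below assumes that formulation; the report does not prescribe a formula, and the function name and dollar figures are hypothetical.

```python
def roi_percent(benefits: float, costs: float) -> float:
    """Return on investment as a percentage: net benefits over costs.

    Assumes benefits and costs are already expressed in the same currency,
    which is usually the hard part in practice.
    """
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) * 100.0 / costs


# Hypothetical course: 100,000 invested, 150,000 in measured benefits.
print(roi_percent(150_000, 100_000))  # 50.0
```

The calculation itself is trivial. What differs among models is what counts as a benefit: performance on the job, revenues generated, or some other outcome.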

ROI calculations also assess the productivity of the training process itself. Most are used to track the number of resources used for a given type of project to predict the amount needed for future projects. For example, Eugenio found that multimedia projects had four levels of complexity and the time, staff, and budget needed could be predictably calculated by this level of complexity. Foshay studied the accuracy of project budget and schedule estimates made before and after design was complete; he found that estimates made after design was complete were within 3 percent of proposed budgets, while estimates made before design had started were usually off by as much as 20 or 25 percent (1997). Others link the results of training with the economic performance of the organization (Bassi & Ahlstrand).

Another measure of productivity considers the relationship between expenditures on training and total company expenditures. The measure results from an economic study commissioned by the American Society for Training and Development (ASTD), which compared organizations' total expenditure on training with their total payroll expense (Bassi & Ahlstrand). ASTD found that the average organization spends 1.5 percent of its payroll on training, though some organizations classified as innovative might spend as much as 6 percent. Researchers also correlated the level of expenditures on training with corporate culture, to characterize levels of spending in different industries.
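
The percentage-of-payroll measure is a simple ratio. The sketch below computes it for a hypothetical organization and compares the result with the two figures reported above (1.5 percent on average, as much as 6 percent for some innovative organizations). The function name and spending figures are assumptions made for illustration.

```python
def training_percent_of_payroll(training_spend: float, payroll: float) -> float:
    """Training expenditure expressed as a percentage of total payroll."""
    if payroll <= 0:
        raise ValueError("payroll must be positive")
    return training_spend * 100.0 / payroll


AVERAGE_PERCENT = 1.5      # average reported in the ASTD study
INNOVATIVE_PERCENT = 6.0   # upper figure reported for innovative organizations

# Hypothetical organization: 450,000 spent on training, 20,000,000 payroll.
share = training_percent_of_payroll(450_000, 20_000_000)
print(share)                                        # 2.25
print(share > AVERAGE_PERCENT)                      # True
print(share >= INNOVATIVE_PERCENT)                  # False
```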

Finally, annual industry surveys help trainers track the size of the industry, the extent of outsourcing, the types of services purchased, the size of organizations, trends in hiring, types of training offered, and salaries. VNU Business Media, publisher of several trade magazines for the training industry, conducts one annual survey. The American Society for Training and Development conducts a panel study, similar to the Dow Jones Industrial Average, that tracks the diffusion of industry trends into organizations.

Effectiveness and Productivity Metrics for Marketing Communications and Public Relations

Effectiveness Metrics

Certain measures of effectiveness are almost uniformly consistent among marketing communications organizations, especially those focusing on sales-promotion (promotion of a specific product or short-term sale) and direct marketing (marketing communications efforts aimed at selling products and services, such as catalogs and websites that permit electronic commerce).

As in training and software development, some measures of effectiveness are linked to the process. Like trainers, most marketing communicators establish a set of measurable sales goals for a project before initiating "creative" work on it (what technical communicators would term design and development). A goal might be to increase sales by 1.5 percent or to generate USD 15 million in revenues. The sales objectives are usually developed in conjunction with the sponsor.

As trainers rely on formative evaluation to assess the likely effectiveness of courses under development, marketing communicators use pilot testing to assess the likely effectiveness of proposed advertising campaigns. Pilot testing usually occurs in a couple of phases. Marketing communicators often present drafts of ads, catalogs, and other materials to focus groups to assess audience response. After choosing one response, marketing communicators might conduct a pilot campaign in a few carefully selected geographic regions to assess the results. Marketing communicators use pilot campaigns for large national campaigns, when the cost of the campaign is in the millions of dollars and a pilot campaign can provide data on whether the investment in the full campaign would be worthwhile (Dillon, Madden, and Firtle, 1991).

Direct-response is the means most commonly used to assess the success of the ad. Direct-response refers to a system that measures the number of responses to a marketing communication product, and the nature of that response (such as a request for more information or a purchase). Some direct-response systems involve printing a code on the marketing communication product and requesting that code when a customer calls. Other direct-response systems involve establishing separate phone extensions or numbers for each ad placed. Whatever the system, it involves tracking the response at the time the customer contacts the company.

In addition, some organizations also want to track the source of the contact with the customer, such as a magazine advertisement or a flyer sent to everyone on a mailing list rented from a nonprofit association. By tracking the source, too, an organization can assess whether a particular source of contacts is good or not. If the percentage of respondents from a given list matches or exceeds the overall response to the campaign, the source is probably a good one. If not, the source is probably not good, and not likely to be used again in the near future.
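
The source comparison described above can be sketched as follows. The source names, contact counts, and response counts are hypothetical. The rule applied is the one stated in the text: a source is probably good if its response rate matches or exceeds the overall response rate for the campaign.

```python
def response_rate(responses: int, contacts: int) -> float:
    """Fraction of contacted people who responded."""
    return responses / contacts


# Hypothetical campaign data: source -> (contacts, responses).
sources = {
    "magazine_ad": (50_000, 600),
    "rented_list": (20_000, 500),
    "flyer": (30_000, 150),
}

total_contacts = sum(contacts for contacts, _ in sources.values())
total_responses = sum(responses for _, responses in sources.values())
overall_rate = response_rate(total_responses, total_contacts)

# Keep the sources whose rate matches or exceeds the overall rate.
good_sources = [
    name
    for name, (contacts, responses) in sources.items()
    if response_rate(responses, contacts) >= overall_rate
]

print(overall_rate)   # 0.0125
print(good_sources)   # ['rented_list']
```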

Because direct-response marketing is so widely used within the marketing communications field, and because trade publications like Advertising Age report results of individual campaigns, average performance for different types of marketing campaigns, such as the average response to a direct mailing, is widely known.

Although the effectiveness of marketing communications is reported to upper management in terms of sales generated, professional organizations sponsor competitions that assess the effectiveness of marketing communications products for their creative value. For example, the Addys (Advertising Awards) is a U.S.-wide competition that operates at the local and national level. The primary value of awards like the Addys is in their value as a credential on a resume, but not as a measure of effectiveness of a given campaign. In fact, some Addy competitions have let participants enter materials that clients rejected as inappropriate to their business.

Other than the Addys, the measures of the effectiveness of marketing communications products are almost exclusively financial and numerical outcomes: how many sales resulted? How many responses? These are business measures rather than communication measures. (Examples of communication measures are measurements of the use of white space or the impact of color). Communication measures are assessed during pilot testing to assess consumer responses to different communication techniques.

Many of these techniques are adapted for public relations, which is concerned with the quality of an organization's relationships in the community. Like their colleagues in marketing communications, public relations specialists are concerned with collecting numerical data to assess their effectiveness. Because no formula exists for calculating the direct financial benefit of a community relationship, public relations specialists measure, instead, less precise numbers.

On a more basic level, measuring public opinion is a challenging effort. At the least, no ideal scale exists for measuring opinion. Even strength of opinion varies among people, so even though two people might report that they "strongly agree" with a statement, "strong" might actually vary among individuals. Another challenge of measuring public opinion is that it changes. Consider that during an election campaign, polls are taken weekly in anticipation of changes in public opinion. Consider that factors outside of the campaign frequently influence it. For example, if an agency is trying to raise awareness of breast cancer at the same time that a beloved actress announces that she, too, has the disease, this outside factor has a positive impact on the campaign and is reported as an influencing factor.

Within these restrictions, some in public relations organizations have proposed a four-part methodology for assessing the effectiveness of campaigns. According to Saffir and Turrant, the methodology includes:

  1. Volume of placements (that is, how many media outlets mentioned the story or issue that is being promoted)
  2. Gross impressions, as measured by people who saw the campaign
  3. How closely was the press coverage "on-strategy" (that is, did the coverage directly support the overall strategy to build opinions, or was it off)
  4. Quality of placement, such as location and length of a story in a print publication or the timing in the broadcast and length of a story on the radio or television (1994).

They also suggest that organizations assess the effectiveness of a campaign by its boldness. They observe that many organizations like to play safe and avoid offending parts of the public, but failing to take risks will likely result in a drab campaign that will fail to yield the intended results.

Responding to the lack of industry-standard metrics for the effectiveness of public relations campaigns, The Institute for Public Relations and Council of Public Relations Firms announced in April 2000 that they are collaborating to develop the first comprehensive model to measure the effectiveness of public relations programs. The goal is to design "a series of PR Outcome Models" that meet these objectives:

Productivity Metrics

Some measures of effectiveness for marketing communicators double as measures of productivity. For example, revenue generated not only serves as a measure of effectiveness for a marketing communications campaign, but also as a measure of productivity for an internally based marketing communications group. By stating the level of revenue generated, marketing communications management also states how the department contributed to the corporation's financial statement.

Because most marketing communications products are developed and placed in the media by outside agencies (American Advertising Museum), these agencies use other metrics to assess their productivity.

The most common metric is the size of "billings," the size of contracts with client companies. Billings have two components: the cost of services provided (usually the smaller of the two) and the cost of placement (such as the cost of purchasing advertising on a television network and in magazines).

Advertising, too, has a common metric: the cost of reaching 1,000 readers (abbreviated cpm). Newspapers, magazines, radio and television stations, and websites track the number of viewers, readers, or visitors. Rates are set based on the average number of people reached, and the quality of demographics. Quality of the demographics refers to the demographic qualities of the viewers, readers, and visitors, and how much advertisers seek those demographics. For example, in the United States, television advertisers prefer viewers between the ages of 18 and 49. If a television show draws a high number of viewers between 18 and 49, its advertising rates will be higher than a show with many more viewers, but primarily in other demographic groups (Dillon, Madden, & Firtle).
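
The cpm calculation can be sketched as below. The function name and all figures are hypothetical. The second calculation, which counts only viewers in the sought-after demographic, is an illustrative assumption about how demographic quality might be factored in, not a method described in the source.

```python
def cpm(ad_cost: float, audience: int) -> float:
    """Cost of reaching 1,000 readers, viewers, or visitors."""
    if audience <= 0:
        raise ValueError("audience must be positive")
    return ad_cost * 1000 / audience


# Hypothetical placement: an ad costing 25,000 that reaches 500,000 viewers.
overall_cpm = cpm(25_000, 500_000)

# Assumption for illustration: only 200,000 of those viewers are in the
# demographic group the advertiser wants to reach.
target_cpm = cpm(25_000, 200_000)

print(overall_cpm)  # 50.0
print(target_cpm)   # 125.0
```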

As with measurements of effectiveness, measurements of productivity in public relations present more of a challenge than those of marketing communications. One measure of productivity is output produced, such as the number of press releases. The number of placements for a single press release serves not only as a measure of effectiveness, but also of productivity.

Saffir and Turrant also suggest that another productivity measure is the consideration of alternative uses for the resources expended. If used elsewhere in the organization, could the organization have received a stronger return? Moore (1998) considers it differently, suggesting that organizations compare the time-intensive cost of producing and placing press releases with the cost and effectiveness of advertising.

Intellectual Capital

Like technical communicators, software designers and developers, trainers, and marketing communicators, the financial community is interested in operationally defining the value that information, like technical communication products, adds to corporations. Specifically, intellectual capital represents a move by the financial community to place a financial value on the knowledge or intellectual assets of an organization (Stewart, 1997).

Intellectual assets include:

Unfortunately, little of this capital has a stated value in financial terms and businesses calculate their net worth based on financial valuations. As the global economy becomes increasingly based on generating and selling knowledge, the financial community is challenged with devising these calculations.

Edvinsson suggests that devising meaningful measures will require time and effort, but has suggested 128 possible measures of intellectual capital. Most involve the calculations of relationships of intellectual capital to business performance rather than a direct financial calculation. For example, one proposed measure explores the relationship between training in one year and corporate financial performance in a later one.

Lessons Learned

A few common themes run through the choice of metrics used to assess the effectiveness and productivity of software design and development, training, marketing communications, and public relations. Following are the common themes.

Measuring effectiveness and productivity after the fact requires up-front planning before the fact. Metrics that show changes in performance usually involve a comparison: how users performed before versus how they performed afterwards with a new communication product, or how the productivity of technical communicators improved over time. To be credible, results cannot be measured after the fact without comparing the data to data collected before the project began. The next several lessons follow from this one.

Following a well-defined process. Software design and development is promoting the process-maturity model and training follows the ISD process. Most marketing communications firms follow a three-part process of analysis, design, and development, although that process does not have a name. The ISD process and the one followed by marketing communicators both include evaluation as a key step, as significant as conducting a needs analysis or preparing a first draft.

Processes are not static, however. For example, as learning moves online and instructional designers increasingly develop materials that do not meet traditional definitions of training, limitations of the instructional systems design process become painfully clear. The process must be adapted to remain relevant (Gordon & Zemke, 2000).

Beginning projects with observable and measurable objectives and assessing effectiveness by the extent to which those objectives have been achieved. Training and marketing communications projects begin with observable, measurable objectives; projects are ultimately assessed against those objectives. In fact, the processes followed in each discipline delay design and development until after the objectives have been written and approved.

Both technical communication and software development make attempts at starting with a goal. Software development efforts usually begin with a requirements document, but the requirements usually focus on features and functions of the product, rather than the end performance of users. Similarly, technical communication projects usually begin with a list of tasks to be documented. The tasks do not need to be written in observable and measurable terms, and are often expressed as the ability to use a function or feature. Although the literature recommends that requirements and tasks be prepared before design and development begin, design and development often begin without approval of the requirements and tasks.

One of the challenges of establishing goals for technical communication and software development projects is the flexible use of these products. Many are intended for use in several situations; identifying a few formal goals often seems limiting. The challenge, then, is anticipating the range of goals users might have in these situations, and using these goals to guide the development of more tangible objectives (Carliner, 2000).

A widely accepted methodology for assessing effectiveness. Training and marketing communications have also developed extensive models and methodologies for assessing effectiveness of their work. These methodologies primarily assess the extent to which the objectives were achieved. Training uses the Kirkpatrick model; marketing communications uses the direct-response approach.

Usability testing performed by technical communication and software development groups has potential for assessing the effectiveness of the respective groups' work. But both only employ usability tests in formative evaluation; they rarely use them for summative evaluation or offer an alternative methodology for doing so.

Financially based measures of productivity. Through the measurement of sales generated and billings, marketing communications groups use a financial standard to measure their productivity as an organization. The percentage of payroll model emerging in the training field offers a similar measure for internal departments.

Measurements of technique do not assess effectiveness or productivity. Although these characteristics are known from research to improve the effectiveness of training, trainers do not consider interactivity, learning model, test question format, number of visuals, and similar types of techniques when assessing the ultimate effectiveness of their training programs. Similarly, although the use of color, graphics, white space, and readable text are correlated with quality advertising, marketing communicators do not measure these techniques when assessing their effectiveness.

Certainly trainers and marketing communicators care about the effectiveness of teaching and communication techniques, but they consider them during formative evaluation. The presence or absence of these techniques is not considered an effectiveness metric. Only the larger issue--ability to achieve the stated goal--is.

Measurements of intangibles. Saffir and Turrant observe that "quantitative evaluation is not the 'be-all and end-all' but a yardstick" by which organizations can assess their work (212). Although most metrics involve calculation of a hard number, training and marketing communications also include measurements of perceptions and feelings. For example, Level I evaluation in training assesses users' emotional reaction to a training course, and enjoys wide acceptance among training management. Similarly, because they cannot measure financial results, public relations specialists measure perceptions and name recognition.

Ongoing collection of data. The most persuasive data about effectiveness is that collected on an ongoing basis about most projects, not on an isolated basis for some projects.

Industry surveys. Metrics represent a comparison. Certainly those taken within an organization help participants track performance among projects. But many organizations want to compare their performance with that in other organizations. Industry surveys in training and marketing communications allow professionals to compare some aspects of their work across organizations. In many fields (not just these), professional organizations like the American Society for Training and Development conduct these surveys. Similarly, publishers serving the trade organization, like IDG (which publishes ComputerWorld and other trade publications), also conduct industry surveys on software development.

Simple measures don't. Although they provide quick, easily calculated measures of productivity, page counts, lines of code written, and student days fail to effectively and meaningfully measure either the productivity or effectiveness of specialists in the respective disciplines.

No single measure does the job. Software design and development, training, and marketing communications each track several metrics for effectiveness and productivity, and use each to evaluate a different aspect of the work. Spilka observes that, even within technical communication, "most industry authors advocate a holistic perspective leading to multiple definitions of quality" (210). She adds that "every measure has its unique weaknesses and cannot by itself indicate whether a product or process is high in quality" (212). Furthermore, "uniformly giving primacy of some ingredients of high quality documentation over others can be a risk, because how much these ingredients matter will vary from one context to the next" (211).

In fact, this issue of context also underlies all measurement systems. Although people can collect most metrics in most situations, their importance varies depending on the priorities established.

Results build perceptions, they do not prove value. Although practitioners of software design and development, training, and marketing communications each rigorously collect data about their effectiveness and productivity, these practitioners have not shown that the data alone can prove the value of a service. Instead, it is used as part of a larger system of building and maintaining relationships with internal and external sponsors (Robinson & Robinson).

References

_________. (1999.) Judging Guidelines for the Society for Technical Communication's International Online Communication Competition. Arlington, Virginia: Society for Technical Communication.

________. (1995.) Training Annual Industry Survey. TRAINING. 31(10).

American Advertising Museum. (1992.) Text label for permanent exhibition. Portland, OR.

Bandes, H. (1986.) Defining and controlling documentation quality--part I. Technical communication. 33(1). 6-9.

Bandes, H. (1986.) Defining and controlling documentation quality--part II. Technical communication. 33(2). 69-71.

Basili, Victor R. and H. Dieter Rombach. (1997). The TAME Project: Towards Improvement-Oriented Software Environments. In Paul Oman and Shari Lawrence Pfleeger, eds. Applying software metrics, Los Alamitos, CA: IEEE Computer Society Press. 94-108.

Basili, Victor R., and Scott Green. (1997). Software process evolution at the SEL. In Paul Oman and Shari Lawrence Pfleeger, eds. Applying software metrics, Los Alamitos, CA: IEEE Computer Society Press. 128-36.

Bassi, Laurie and Amanda Ahlstrand. The 2000 learning outcomes report. Alexandria, VA: American Society for Training and Development.

Boehm, Barry W. (1997). Software Risk Management: Principles and Practices. In Paul Oman and Shari Lawrence Pfleeger, eds. Applying software metrics, Los Alamitos, CA: IEEE Computer Society Press. 27-36.

Brooks, Frederick. (1972.) The Mythical man month: essays on software engineering. Reading, MA.

Carliner, Saul. (2000.) Eight things that training and performance improvement professionals must know about knowledge management. Minneapolis: Bill Communications. www.lakewoodconferences.com/kmwp.

Carliner, Saul. (1997) Demonstrating the effectiveness and value of technical communication products and services: a four-level process. Technical communication. 44(3).

Chambers, Marlene. (1999.) Critiquing exhibition criticism. Museum news. 78(5). 31-37, 65.

Cover, M., Cooke, D., and Hunt, M. (1995.) Estimating the cost of high-quality documentation. Technical communication. 42(1) 76-83.

Daniel, Reva. (1995.) Revising letters to veterans. Technical communication. 42(1). 69-75.

Deutsch, W. (1992.) Teaching machines, programming, computers, and instructional technology: the roots of performance technology. Performance & instruction. 31(2). pp. 14-20.

Dick, W. & Carey, L. (1990.) The systematic design of instruction. (3rd ed.) New York: HarperCollins.

Dillon, W. R., Madden, T. J., & Firtle, N. H. (1990.) Marketing research in a marketing environment. Homewood, IL: Irwin.

Direct Marketing Association. (2000.) Website. http://www.the-dma.org/aboutdma.

Duin, Ann Hill. (1993.) Test drive. In Barnum, Carol and Carliner, Saul (eds.) Techniques for technical communicators. New York: Allyn & Bacon.

Dumas, Joseph S. and Redish, Janice C. (1999). A Practical Guide to Usability Testing. Intellect.

Erni, Karin, and Claus Lewerentz. (1996). Applying design-metrics to object-oriented frameworks. Proceedings of the third international software metrics symposium. Berlin, Germany, March 25-26, 1996. Los Alamitos, CA: IEEE Computer Society Press. 64-74.

Edvinsson, L. and Malone, M. (1997.) Intellectual Capital: Realizing Your Company's True Value by Finding Its Hidden Brainpower. New York: HarperBusiness.

Fenton, Norman, and Shari Lawrence Pfleeger. (1997). Science and Substance: A Challenge to Software Engineers. In Paul Oman and Shari Lawrence Pfleeger, eds. Applying software metrics, Los Alamitos, CA: IEEE Computer Society Press. 6-15.

Fields, Dennis. Development of the new trainer certification exam. Presented to the Minnesota chapter of the International Society for Performance Improvement. Minneapolis, MN: July 17, 2000.

Foshay, R. Wellesley (1997.) Fourth Generation Instructional Design. International Society for Performance Improvement Conference. Anaheim: April 18, 1997.

Fredrickson, Lola. (1992.) Quality in technical communication: a definition for the 1990s. Technical communication. 39(3). p.394-399.

Gilbert, Thomas. (1978.) Human competence: engineering worthy performance. Washington, DC: International Society for Performance Improvement.

Gordon, Jack and Ron Zemke. (2000.) The attack on ISD. TRAINING. 37(4).

Gustafson, K. L. (1991.) Survey of instructional development models. Syracuse, NY: ERIC Clearinghouse on Information Resources.

Hackos, J.T. (1995.) Process-maturity model for publications organizations. Proceedings of the 42nd Society for Technical Communication Annual Conference. Arlington, VA: Society for Technical Communication.

Hackos, J. T. (1994.) Managing your documentation projects. New York: John Wiley & Sons.

Hosier, William J., Rubens, Philip M., Krull, Robert, and Velotta, Chris. (1992.) Basing documentation quality standards on research. In Proceedings of the 39th Society for Technical Communication Annual Conference. Arlington, Virginia: Society for Technical Communication. p.428-431.

King, M. (1992.) Presentation to IBM marketing force. Atlanta, Georgia.

Kirkpatrick, D. L. (1998.) Evaluating training programs: the four levels. 2nd edition. San Francisco, CA: Berrett-Koehler.

Lo, R., R. Webby, and R. Jeffery. (1996). Sizing and Estimating the Coding and Unit Testing Effort for GUI Systems. Proceedings of the third international software metrics symposium. Berlin, Germany, March 25-26, 1996. Los Alamitos, CA: IEEE Computer Society Press. 166-73.

Loges, Max. (1998.) The value of technical documentation as an aid in training: the case of the U.S. Lighthouse board. Journal of business and technical communication 12(4). 437-453.

Mead, Jay. (1998.) Measuring the value added by technical documentation: a review of research and practice. Technical communication. 45(3). p.353-379.

Moe, Michael. (1998.) The book of knowledge. San Francisco: Merrill Lynch.

Murphy, S. (1992.) Researching benchmarks for a corporate documentation group. In Proceedings of the 39th Society for Technical Communication Annual Conference. Arlington, Virginia: Society for Technical Communication. p.723-725.

Neal, Ralph D. (1994). The validation by measurement theory of proposed object-oriented software metrics. Washington, D. C.: National Aeronautics and Space Administration.

Pfleeger, Shari Lawrence, and Clement McGowan. (1997). Software Metrics in the Process Maturity Framework. In Paul Oman and Shari Lawrence Pfleeger, eds. Applying software metrics, Los Alamitos, CA: IEEE Computer Society Press. 109-15.

Phillips, Jack. (1997.) Handbook of training evaluation and measurement methods. Gulf Publishing Company.

Phillips, Jack. (1999.) Return-on-investment in training. Lakewood Conferences. Phoenix, Arizona: Training Director's Forum. June 7, 1999.

Redish, Janice. (1995.) Adding value as a technical communicator. Technical communication. 42(1) p.26-39.

Robinson, D. and Robinson, J. (1989.) Training for impact. San Francisco, CA: Jossey Bass.

Saffir, Leonard and John Tarrant. (1994.) Power public relations: how to get PR to work for you. Lincolnwood (Chicago), IL: NTC Business Books.

See, Edward. (1995.) Moving toward an entrepreneurial model: providing technical information services within a large corporation. Technical communication. 42(4). 421-425.

Spencer, C. J. and Yates, D. K. (1995.) A good user's guide means fewer support calls and lower support costs. Technical communication. 42(1). 52-55.

Spilka, Rachel. (2000.) The issue of quality in professional documentation: how can academia make more of a difference. Technical communication quarterly. 9(2). 207-220.

Stark, George, and Robert C. Durst. (1997). Using Metrics in Management Decision Making. In Paul Oman and Shari Lawrence Pfleeger, eds. Applying software metrics, Los Alamitos, CA: IEEE Computer Society Press. 60-66.

Stewart, Thomas. (1997.) Intellectual Capital: the New Wealth of Organizations. New York: Currency/Doubleday.

Thomson, Terry. (1998.) Forecasting development costs for training and documentation. Fredrickson Communications website. www.fredcomm.com.

Wedman, J. & Tessmer, M. (1993.) Instructional designer's decisions and priorities: a survey of design practice. Performance improvement quarterly. 6(2). pp. 43-57.

Weller, Edward F. (1997). Using Metrics to Manage Software Projects. In Paul Oman and Shari Lawrence Pfleeger, eds. Applying software metrics, Los Alamitos, CA: IEEE Computer Society Press. 53-59.

Zemke, Ron and Lee, Chris. (1987.) How long does it take? TRAINING. 24(6). pp. 75-80.

(c) Copyright. 2002. All rights reserved.