Application Data Exchange Assessment Framework and Functional Requirements for Mobile Health
This section describes how an Actor can be assessed for conformance to this guide.
Conformity assessment must be readily reproducible across different assessors. It should be measurable against each requirement, as well as through groups of requirements in a functional area or category. However, pass/fail conformity assessment (e.g., as is done for UL Listing or CE Mark) is generally insufficient when there is great diversity in end-user requirements. Thus, this guide takes the approach that there are multiple levels of “conformance”, as has been done with other specifications.
Some IETF specifications (known as Requests for Comments, or RFCs) have distinguished between conforming and fully conforming to the specification. A conforming system implements all SHALL requirements, while a fully conforming system implements all SHALL and SHOULD requirements. The distinction is useful, but the terminology is subtle and may not be easily understood by readers unfamiliar with this usage. It appears in a few commonly used RFCs, but is not widespread, and the distinction would be lost on those not familiar with standards conformity assessment in general.
This guide instead describes a more user-friendly star rating system on an ordinal scale of 1 to 5. However, unlike star rating systems that are simply based on aggregated ratings from subjective assessors, this guide defines a rating system based on objective criteria, which should be more readily reproducible across different assessors.
The following demonstrates the scoring:

- 0 stars: fully non-conforming
- 1 star: meets some of the requirements
- 2 stars: meets most of the requirements
- 3 stars: meets all required criteria
- 4 stars: meets all required criteria and some of the recommended criteria
- 5 stars: meets all required criteria and most or all of the recommended criteria
For the purposes of this guide, most is defined as 50% or more. [should this be higher? –KWB]
Each requirement criterion is individually measured on a pass/fail basis, and includes one or more scenarios containing GIVEN/WHEN/THEN statements describing how to perform the test. Only SHALL and SHOULD criteria are used in this assessment. Criteria using MAY or NEED NOT are present merely to clarify allowed behavior and describe optional behaviors for which there is no value judgment. These criteria are often provided to clarify for implementers nuances that might be confusing without further explanation.
For any given requirement, if any test scenario fails, the system under test (SUT) fails to conform to the requirement. Otherwise, the SUT conforms to the requirement.
The results of conformance testing for a given SUT include:

- Passed: all test scenarios for the requirement passed.
- Failed: at least one test scenario for the requirement failed.
- Not Applicable: the requirement does not apply to the SUT.
- Not Tested: the test was not performed on the SUT.
A given requirement may not be applicable to a given SUT, or the assessor may not have performed a specified test on the SUT (e.g., because the test sponsor did not require that those tests be performed).
For example, there is a requirement that the results of a blood pressure measurement be displayed to the user. This is a requirement of an App, but is NOT a requirement of a Device. Thus, the results for this test would be reported as Not Applicable. In another case, there are requirements on devices to be able to report blood pressure, heart rate, and respiration rate. However, these three observations are not reported by all devices, and aren't always needed for every use case. Thus, a clinic evaluating devices for use in blood pressure monitoring may elect to Not Test a device against the sub-category containing requirements on respiration rate. In this case, the test was not performed.
For more clarity: The inability of a device to record respiration rate does not make the test “not applicable” when the category for basic device operations is chosen. This is a failure of the device to support that function. This does NOT indicate a flaw in the device. It merely reports the device’s inability to support that requirement.
As in laboratory diagnostics, tests provide objective evidence, and assessments perform computations on those results to enable interpretation. The 0-5 star rating of a system is the assessment; the pass/fail results for each requirement are the tests.
The procedure for computing assessments is as follows. For a group of requirements (e.g., a category or sub-category):

1. Count the SHALL criteria that passed and the total number of SHALL criteria assessed.
2. Count the SHOULD criteria that passed and the total number of SHOULD criteria assessed.
3. Compute the star ranking from these counts using the scale described above.
To report the assessment, the following values should be provided:

- the number of SHALL criteria passed and the total number of SHALL criteria,
- the number of SHOULD criteria passed and the total number of SHOULD criteria,
- the resulting star ranking, and
- whether testing of the group was complete.
These values provide the interpretation of the result (the star ranking), and allow sub-category results to be aggregated upwards into category results. Assessments can be computed at the category level even when some of the sub-category tests have not been performed. In this case, the testing for the category must be recorded as incomplete (and this status will propagate upwards to the next category, and so on).
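As a concrete illustration, the sketch below shows one way the star ranking and upward aggregation described above could be computed. It is not normative: the function names (star_rating, aggregate), the assumption that "some" and "most" at the 1- and 2-star levels are measured against the required (SHALL) criteria, and the treatment of groups with no SHOULD criteria are all assumptions made for illustration.

```python
def star_rating(shall_passed, shall_total, should_passed, should_total):
    """Map pass/fail counts for a requirement group onto the 0-5 star scale,
    taking "most" to mean 50% or more, as defined above."""
    if shall_passed < shall_total:                  # not all required criteria met
        if shall_passed == 0:
            return 0                                # fully non-conforming
        return 2 if shall_passed * 2 >= shall_total else 1
    # All required (SHALL) criteria are met: 3, 4, or 5 stars.
    if should_total == 0:
        return 5                                    # assumption: no recommended criteria to miss
    if should_passed * 2 >= should_total:
        return 5                                    # most or all recommended criteria met
    return 4 if should_passed > 0 else 3            # some recommended criteria met, or none


def aggregate(subgroups):
    """Sum sub-category counts upward into their containing category.
    Each subgroup is (shall_passed, shall_total, should_passed, should_total,
    complete); the category is incomplete if any subgroup is incomplete."""
    counts = [sum(column) for column in zip(*(g[:4] for g in subgroups))]
    complete = all(g[4] for g in subgroups)
    return (*counts, complete)
```

Under this sketch, for example, a sub-category with 3 of 3 SHALL and 1 of 3 SHOULD criteria passed would rate 4 stars.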
Technical specifications such as FHIR or CDA implementation guides can also be tested using this guide; regulatory and other requirements, and even clinical guidelines, can be assessed against it as well. However, because these are specifications rather than systems, they require additional considerations when computing and reporting assessments and results. What is important for implementers of a specification is to be able to determine the difference between what is minimally required, what can be supported with additional effort, and what simply cannot be done.
Two sets of results must be computed and assessed. The first set of results report what the specification under test requires (the minimal results). The second set of results report what the specification allows (the maximal results). An assessment is reported for each of these possible results. The minimal results show how the assessment would perform against an actual implementation conforming to the specification without any additional work. The maximal results show how the assessment would perform against an actual implementation that started off by conforming to the specification being tested, but was augmented to ensure conformance to this guide.
Consider evaluating the FHIR Observation Vital Signs and the AMA IHMI Observation Blood Pressure requirements against both the Basic and Clinical Blood Pressure Observation requirements of this guide.
A FHIR Observation resource conforming to either of these two profiles will get 5 stars when evaluated against the Basic Blood Pressure Observation requirements. One can safely choose an implementation meeting either one and expect that it will meet those requirements.
However, a FHIR Observation resource meeting only the minimum necessary requirements of the FHIR Observation Vital Signs profile will NOT meet the Clinical Blood Pressure requirements of this guide; it will likely get two stars. One conforming to the AMA IHMI Observation Blood Pressure 3 requirements, however, will meet the more stringent Clinical Blood Pressure requirements (it will receive 3 or more stars). Thus, if your application needs to meet the clinical blood pressure requirements of this guide, and you have access to an implementation meeting the AMA IHMI Observation Blood Pressure 3 requirements, then that should be the way to go (when other considerations are not relevant).
Even so, the FHIR Observation Vital Signs profile does not prohibit one from creating a FHIR Observation resource that also conforms to the Clinical Blood Pressure Observation requirements of this guide. The distance between these two represents a certain amount of effort that must be expended to go that final step. That effort may need to be weighed against the cost of acquiring an implementation supporting the more demanding profile.
Given two requirements S and G, where S describes the implementation guide being tested, and G describes a scenario in this guide:

- If S requires the behavior that G tests, G passes in both the minimal and the maximal results.
- If S allows, but does not require, the behavior that G tests, G fails in the minimal results but passes in the maximal results.
- If S prohibits (or cannot support) the behavior that G tests, G fails in both the minimal and the maximal results.
The assessment results for the maximal case will always be at least as good as the assessment results for the minimal case. Reporting both assessments enables users who have access to an implementation supporting S to determine whether it can also support G, and to assess how much additional work would be needed to make the output of that implementation support G.
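The following sketch, again only illustrative, shows one way the minimal and maximal results could be derived. It assumes each criterion of this guide has first been classified by how the specification under test treats the tested behavior; the three-way classification ('required', 'allowed', 'prohibited') and the function name are assumptions, and the star_rating function is the sketch given earlier.

```python
def spec_assessment(criteria):
    """criteria is a list of (keyword, status) pairs: keyword is the
    conformance verb from this guide ('SHALL' or 'SHOULD'), and status is
    how the specification under test treats the tested behavior:
    'required', 'allowed' (permitted, but needing extra work), or 'prohibited'.
    Returns the (minimal, maximal) star ratings."""
    shall_total = sum(1 for k, _ in criteria if k == 'SHALL')
    should_total = sum(1 for k, _ in criteria if k == 'SHOULD')

    def tally(accepted):
        shall = sum(1 for k, s in criteria if k == 'SHALL' and s in accepted)
        should = sum(1 for k, s in criteria if k == 'SHOULD' and s in accepted)
        return shall, should

    min_shall, min_should = tally({'required'})              # what S requires
    max_shall, max_should = tally({'required', 'allowed'})   # what S allows

    return (star_rating(min_shall, shall_total, min_should, should_total),
            star_rating(max_shall, shall_total, max_should, should_total))
```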
For a given criteria group in this guide, the reporting recommendations are as follows:
Display the category name, followed by a number of filled stars, given by the assessment ranking, in green (or black for B&W images). Follow that with a number of open stars outlined in red (or black for B&W images), as needed to ensure that five stars are always displayed. After the stars, include three to four numbers reported in the following form: {SHALL-passed}+{SHOULD-passed}/{SHALL-total}+{SHOULD-total}
Indent sub-categories under their categories if they are displayed in the same area.
An example report is given below:
Physical Activity and Sleep 1+2/6+6
  Physical Activity 0+1/3+3
  Sleep 1+1/3+3
Basic Vital Signs 10+3/11+11
  Basic SPO2 2+1/3+3
  Basic Respiration and Pulse 3+0/3+3
  Basic Blood Pressure 3+1/3+3
  Basic Height and Weight 2+1/2+2
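For illustration, the hypothetical report_line function below shows how a single line of such a report might be rendered in plain text, reusing the star_rating sketch from earlier; the star glyphs, the indentation scheme, and the function name are assumptions, and color rendering is not shown.

```python
def report_line(name, shall_passed, shall_total, should_passed, should_total, indent=0):
    """Render one report line: the name, five stars (filled for the assessment
    ranking, open for the remainder), and the pass/total summary."""
    stars = star_rating(shall_passed, shall_total, should_passed, should_total)
    glyphs = '★' * stars + '☆' * (5 - stars)
    summary = f"{shall_passed}+{should_passed}/{shall_total}+{should_total}"
    return '  ' * indent + f"{name} {glyphs} {summary}"

print(report_line("Basic Blood Pressure", 3, 3, 1, 3, indent=1))
# "  Basic Blood Pressure ★★★★☆ 3+1/3+3" under the scale sketched earlier
```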
Display the category name, followed by a number of stars filled in red (or black for B&W images), given by the minimal assessment ranking. Follow that with a number of stars filled in yellow (or light gray for B&W images), as needed to bring the total up to the maximal assessment ranking (this may be 0). Follow that with a number of open stars outlined in red (or black for B&W images), as needed to ensure that five stars are always displayed. After the stars, include three to six numbers reported in the form: {SHALL-min-passed}-{SHALL-max-passed}+{SHOULD-min-passed}-{SHOULD-max-passed}/{SHALL-total}+{SHOULD-total}
When min-passed and max-passed are the same value, they should be reported as a single number.
Indent sub-categories under their categories if they are displayed in the same area.
An example report is given below:
Physical Activity and Sleep 1+2/6+6
  Physical Activity 0+1/3+3
  Sleep 1+1/3+3
Basic Vital Signs 10-11+4-9/11+11
  Basic SPO2 2-3+1-3/3+3
  Basic Respiration and Pulse 3+0-2/3+3
  Basic Blood Pressure 3+1-3/3+3
  Basic Height and Weight 2+1-2/2+2
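The collapsing rule for equal minimum and maximum counts can be expressed compactly. The sketch below (illustrative only, with hypothetical names) produces the numeric portion of the minimal/maximal report lines shown above.

```python
def span(lo, hi):
    """Report a min-max pair, collapsing it to a single number when the two are equal."""
    return str(lo) if lo == hi else f"{lo}-{hi}"

def minmax_summary(shall_min, shall_max, should_min, should_max, shall_total, should_total):
    """Format the numeric summary for a minimal/maximal report line."""
    return f"{span(shall_min, shall_max)}+{span(should_min, should_max)}/{shall_total}+{should_total}"

print(minmax_summary(2, 3, 1, 3, 3, 3))  # "2-3+1-3/3+3"  (Basic SPO2 above)
print(minmax_summary(3, 3, 0, 2, 3, 3))  # "3+0-2/3+3"    (Basic Respiration and Pulse above)
```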
NOTE: The red, green, and yellow colors should be selected from a color-blind-safe palette.
Gherkin is a language used for testing applications. The core of this language is made up of three keywords, GIVEN, WHEN, and THEN, plus the conjunction AND, structured into scenarios (or examples) that test a specific feature.
This guide describes a requirement using the Feature: keyword in Gherkin.
Requirements can further be broken down into specific business rules using the Rule: keyword to describe a business rule under test. When rules are used, if any rule fails, the entire feature fails.
Each test is provided using the Scenario: keyword to describe how the feature can be tested.
Descriptive text will follow the Feature, Rule, or Scenario to provide more detailed information. This guide uses Rationale: as a keyword to explain the reason why a particular feature is important. Requirements are not always obvious, and including the rationale for a particular requirement aids in communicating the need for a feature.
The keywords have the following meanings:

- GIVEN describes the preconditions or context for the test.
- WHEN describes the action or event that triggers the behavior being tested.
- THEN describes the expected outcome.
- AND continues the preceding condition at the start of the next line (and subsequent lines).
- BUT also continues the preceding condition, but reverses the logic of the condition.

This guide uses tags to identify the actors to which a feature is applicable. The form of these tags is @{Actor-Name}[-{Shall|Should}]
When only the actor name is given, the feature represents a Shall requirement, as in @App below. This can be made explicit by using @App-Shall instead.
Specific rules or scenarios can also be marked with these tags to create rules specific to the actors to which they are applied.
An example requirement is provided below.
@App @Device-Should
Feature: User data SHALL be hidden after a period of inactivity.
This example illustrates the form of a Functional Requirement. Each requirement will be recorded as a feature in Gherkin. The heading preceding the requirement will provide the requirement identifier and name. Requirement identifiers are numbered in sequence, and preceded by a short mnemonic that identifies the requirement category. This example requirement has the identifier EX-1 and the name "Example Feature".

Rationale: User data should not be exposed when a user is not interacting with the device or application. Hiding the screen prevents user data from being exposed.
Rule: A screen saver must be present that SHALL hide the user's data after a configurable period of time has elapsed.

  Scenario: Example Scenario
    GIVEN the screen saver is configured to hide the screen after 5 minutes
    WHEN the time period has elapsed
    THEN the screen is hidden,
    AND the user must reenter their password to resume work,
    BUT the user is not logged out.
Rule: If the user does not configure the screen saver, then the default timeout period SHALL be used.

  Scenario: Default Scenario
    GIVEN the screen saver has NOT been configured by the user
    WHEN the default timeout period has elapsed
    THEN the screen is hidden,
    AND the user must reenter their password to resume work,
    BUT the user is not logged out.