Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  
Some of that is contentious, but it's just talking points at this stage. We've tiptoed around the issue of extant conventions, and the ongoing result is a lot of attributes that nobody's really sure how to use, and which don't result in better logs. In Casablanca it's time to be less conservative. 

Examples

Markers

Markers unambiguously assign semantics to individual log messages. They allow messages that have a specific meaning to be cheaply and easily identified in logger output, without inherently unreliable (and more costly, and less easily enforced) schemes like scanning for magic strings in the text of each log message. 

ONAP logging requires the emission of Markers reporting entry, exit and invocation as the execution of requests pass between ONAP components. This information is used to generate a call graph.

ONAP components are also free to use Markers for their own purposes. Any Markers that are logged will be automatically indexed by Logstash. 

Markers differ from MDCs in two important ways:

  1. They have a name, but no value. They are a tag - like a label. 
  2. They are specified explicitly on invocation. They are not ThreadLocal, and they do not propagate.

EELF's implementation can be modified to emit Markers, but its public APIs do not allow them to be passed in by callers.

Log4J docs, SLF4J docs

see code on reference folder in git clone ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics

Examples

Marker formatting is using tabs - 

Jira
serverONAP JIRA
serverId425b2b0a-557c-3c0c-b515-579789cceedb
keyLOG-553
 - see https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/test/resources/logback.xml#n9

Note there are 3 tabs (see p_mak in logback.xml) delimiting the MARKERS (ENTRY and EXIT) at the end of each line
<property name="p_mak" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>
Code Block
languagejava
themeMidnight
2018-07-05T20:21:34.794Z	http-nio-8080-exec-2	INFO	org.onap.demo.logging.ApplicationService	InstanceID=ede7dd52-91e8-45ce-9406-fbafd17a7d4c, RequestID=f9d8bb0f-4b4b-4700-9853-d3b79d861c5b, ServiceName=/logging-demo/rest/health/health, InvocationID=8f4c1f1d-5b32-4981-b658-e5992f28e6c8, InvokeTimestamp=2018-07-05T20:21:26.617Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost			ENTRY	
2018-07-05T20:22:09.268Z	http-nio-8080-exec-2	INFO	org.onap.demo.logging.ApplicationService	ResponseCode=, InstanceID=ede7dd52-91e8-45ce-9406-fbafd17a7d4c, RequestID=f9d8bb0f-4b4b-4700-9853-d3b79d861c5b, ServiceName=/logging-demo/rest/health/health, ResponseDescription=, InvocationID=8f4c1f1d-5b32-4981-b658-e5992f28e6c8, Severity=, InvokeTimestamp=2018-07-05T20:21:26.617Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost, StatusCode=			EXIT	

Logging

Via SLF4J:

Code Block
languagejava
linenumberstrue
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;
// ...
final Logger logger = LoggerFactory.getLogger(this.getClass());
final Marker marker = MarkerFactory.getMarker("MY_MARKER");
logger.warn(marker, "This warning has a 'MY_MARKER' annotation.");

EELF does not allow Markers to be set directly. See notes on the InvocationID MDC.

Serializing

Marker names also need to be escaped, though they're much less likely to contain problematic characters than MDC values.

Escaping in Logback configuration can be achieved with:

Code Block
languagetext
linenumberstrue
%replace(%replace(%marker){'\t','\\\\t'}){'\n','\\\\n'}

...

TODO: add table detailing which log files each marker is a part of - from a use case perspective

This should be reported as early in invocation as possible, immediately after setting the RequestID and InvocationID MDCs.

It can be automatically set by EELF, and written to the AUDIT log. 

It must be manually set otherwise. Candidate for framework

EELF:

Code Block
languagejava
titleEELF
linenumberstrue
final EELFLogger logger = EELFManager.getAuditLogger();
logger.auditEvent("Entering.");

SLF4J:

Code Block
languagejava
titleSLF4J
linenumberstrue
public static final Marker ENTRY = MarkerFactory.getMarker("ENTRY");
// ... 
final Logger logger = LoggerFactory.getLogger(this.getClass());
logger.debug(ENTRY, "Entering.");

Marker - EXIT

This should be reported as late in invocation as possible, immediately before unsetting the RequestID and InvocationID MDCs.

It can be automatically reported by EELF, and written to the METRICS log. 

It must be manually set otherwise. 

EELF:

Code Block
languagejava
titleEELF
linenumberstrue
final EELFLogger logger = EELFManager.getMetricsLogger();
logger.metricsEvent("Exiting.");

SLF4J:

Code Block
languagejava
titleSLF4J
linenumberstrue
public static final Marker EXIT = MarkerFactory.getMarker("EXIT");
// ... 
final Logger logger = LoggerFactory.getLogger(this.getClass());
logger.debug(EXIT, "Exiting.");

Marker - INVOKE

This should be reported by the caller of another ONAP component via REST, including a newly allocated InvocationID, which will be passed to the caller. 

SLF4J:

Code Block
languagejava
titleSLF4J
linenumberstrue
public static final Marker INVOKE = MarkerFactory.getMarker("INVOKE");
// ...

// Generate and report invocation ID. 

final String invocationID = UUID.randomUUID().toString();
MDC.put(MDC_INVOCATION_ID, invocationID);
try {
    logger.debug(INVOKE_SYNCHRONOUS, "Invoking synchronously ... ");
}
finally {
    MDC.remove(MDC_INVOCATION_ID);
}

// Pass invocationID as HTTP X-InvocationID header.

callDownstreamSystem(invocationID, ... );

EELF examples of INVOCATION_ID reporting, without changing published APIs.

Marker - INVOKE-RETURN

This should be reported by the caller of another ONAP component via REST on return.

InvokeTimestamp context dependent MDC will be reported here.

SLF4J:
Code Block
languagejava
titleSLF4J
linenumberstrue
TBD

Marker - INVOKE-SYNCHRONOUS

This should accompany INVOKE when the invocation is synchronous.

SLF4J:
Code Block
languagejava
titleSLF4J
linenumberstrue
public static final Marker INVOKE_SYNCHRONOUS;
static {
    INVOKE_SYNCHRONOUS = MarkerFactory.getMarker("INVOKE");
    INVOKE_SYNCHRONOUS.add(MarkerFactory.getMarker("SYNCHRONOUS"));
}
// ...

// Generate and report invocation ID. 

final String invocationID = UUID.randomUUID().toString();
MDC.put(MDC_INVOCATION_ID, invocationID);
try {
    logger.debug(INVOKE_SYNCHRONOUS, "Invoking synchronously ... ");
}
finally {
    MDC.remove(MDC_INVOCATION_ID);
}

// Pass invocationID as HTTP X-InvocationID header.

callDownstreamSystem(invocationID, ... );

EELF example of SYNCHRONOUS reporting, without changing published APIs. 

Errorcodes

Errorcodes are reported as MDCs. 

TODO: add to table

Exceptions should be accompanied by an errrorcode. Typically this is achieved by incorporating errorcodes into your exception hierarchy and error handling. ONAP components generally do not share this kind of code, though EELF defines a marker interface (meaning it has no methods) EELFResolvableErrorEnum.

A common convention is for errorcodes to have two components:

  1. A prefix, which identifies the origin of the error. 
  2. A suffix, which identifies the kind of error.

Suffixes may be numeric or text. They may also be common to more than one component.

For example:

Code Block
languagetext
linenumberstrue
COMPONENT_X.STORAGE_ERROR

Output Format

Several considerations:

  1. Logs should be human-readable (within reason). 
  2. Shipper and indexing performance and durability depends on logs that can be parsed quickly and reliably.
  3. Consistency means fewer shipping and indexing rules are required.

Text Output

TODO: 20190115 - do not take the example in this section until I reverify it in terms of the reworked spec example in 

https://gerrit.onap.org/r/#/c/62405/20/reference/logging-kubernetes/logdemonode/charts/logdemonode/resources/config/logback.xml

Jira
serverONAP JIRA
serverId425b2b0a-557c-3c0c-b515-579789cceedb
keyLOG-630

ONAP needs to strike a balance between human-readable and machine-readable logs. This means:

  • The use of PIPE (|) as a delimiter. (Previously tab, and before that ... pipe).
  • Escaping all messages, exceptions, MDC values, Markers, etc. to replace tabs and pipes in their content.
  • Escaping all newlines with \n so that each entry is on one line

In logback, this looks like:

Code Block
languagexml
linenumberstrue
<property name="defaultPattern" value="%nopexception%logger
|%date{yyyy-MM-dd'T'HH:mm:ss.SSSXXX,UTC}
|%level
|%replace(%replace(%replace(%message){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%replace(%replace(%replace(%mdc){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%replace(%replace(%replace(%rootException){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%replace(%replace(%replace(%marker){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%thread
|%n"/>

The output of which, with MDCs, a Marker and a nested exception, with newlines added for readability, looks like:

Code Block
languagetext
linenumberstrue
org.onap.example.component1.subcomponent1.LogbackTest
|2017-08-06T16:09:03.594Z
|ERROR
|Here's an error, that's usually bad
|key1=value1, key2=value2 with space, key5=value5"with"quotes, key3=value3\nwith\nnewlines, key4=value4\twith\ttabs
|java.lang.RuntimeException: Here's Johnny
\n\tat org.onap.example.component1.subcomponent1.LogbackTest.main(LogbackTest.java:24)
\nWrapped by: java.lang.RuntimeException: Little pigs, little pigs, let me come in
\n\tat org.onap.example.component1.subcomponent1.LogbackTest.main(LogbackTest.java:27)
|AMarker1
|main     

Default Logstash indexing rules understand output in this format.

XML Output

For Log4j 1.X output, since escaping is not supported, the best alternative is to emit logs in XML format, we will expand on JSON support

There may be other instances where XML (or JSON) output may be desirable. Default indexing rules support 

Default Logstash indexing rules understand the XML output of Log4J's XMLLayout.

Note that we're hoping that support for indexing of XML output can be deprecated during Beijing. This relies on the adoption of ODL Carbon, which should eliminate any remnant of Log4J1.X.

Output Location

Standardization of output locations makes logs easier to locate and ship for indexing. 

Expand on out-of-container locations off /dockerdata-nfs

Logfiles should default to beneath /var/log, and beneath /var/log/onap in the case of core ONAP components:

Code Block
languagetext
linenumberstrue
/var/log/onap/<component>[/<subcomponent>]/*.log

(warning)Anything else written into /var/log/ONAP_EELF, if so, is it /var/log/ONAP_EELF or /var/log/onap_EELF?

For the duration of Beijing, logs  will be written to a separate directory, /var/log/ONAP_EELF:

expand on Casablanca differences, and adding as a config setting in OOM

Code Block
languagetext
linenumberstrue
/var/log/ONAP_EELF/<component>[/<subcomponent>]/*.log

Configuration

Logging providers should be configured by file. Files should be at a predictable, static location, so that they can be written by deployment automation. Ideally this should be under /etc/ONAP, but compliance is low.

Locations

All logger provider configuration document locations namespaced by component and (if applicable) subcomponent by default:

Code Block
languagetext
linenumberstrue
/etc/ONAP/<component>[/<subcomponent>]/<provider>.xml

Where <provider>.xml, will typically be one of:

  1. logback.xml
  2. log4j.xml
  3. log4j.properties

Reconfiguration

Logger providers should reconfigure themselves automatically when their configuration file is rewritten. All major providers should support this. 

The default interval is 10s. 

Overrides

The location of the configuration file MAY be overrideable, for example by an environment variable, but this is left for individual components to decide. 

Archetypes

...

Output Format

Output Location

Configuration

Logging providers should be configured by file. Files should be at a predictable, static location, so that they can be written by deployment automation. Ideally this should be under /etc/ONAP, but compliance is low.

Locations

All logger provider configuration document locations namespaced by component and (if applicable) subcomponent by default:

Code Block
languagetext
linenumberstrue
/etc/ONAP/<component>[/<subcomponent>]/<provider>.xml

Where <provider>.xml, will typically be one of:

  1. logback.xml
  2. log4j.xml
  3. log4j.properties

Reconfiguration

Logger providers should reconfigure themselves automatically when their configuration file is rewritten. All major providers should support this. 

The default interval is 10s. 

Overrides

The location of the configuration file MAY be overrideable, for example by an environment variable, but this is left for individual components to decide. 

Archetypes

Configuration archetypes can be found in the ONAP codebase https://git.onap.org/logging-analytics/tree/. Choose according to your provider, and whether you're logging via EELF. Efforts to standardize them are underwayso the ones you should be looking for are where pipe (|) is used as a separator. (Previously it was "|").

Retention

Types of EELF Logs

Retention

Logfiles are often large. Logging providers allow retention policies to be configured. 

Retention has to balance:

  • The need to index logs before they're removed.
  • The need to retain logs for other (including regulatory) purposes. 

Defaults are subject to change. Currently they are:

  1. Files <= 50MB before rollover. 
  2. Files retain for 30 days. 
  3. Total files capped at 10GB. 

In Logback configuration XML:

Code Block
languagexml
linenumberstrue
<appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${outputDirectory}/${outputFilename}.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${outputDirectory}/${outputFilename}.%d{yyyy-MM-dd}.%i.log.zip</fileNamePattern>
        <maxFileSize>50MB</maxFileSize>
        <maxHistory>30</maxHistory>
        <totalSizeCap>10GB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <!-- ... -->
    </encoder>
</appender>

Types of EELF Logs

EELF guidelines stipulate that an application should output log records to four separate files:

  1. audit
  2. metrics
  3. error
  4. debug

This applies only to EELF logging. Components which log directly to a provider may choose to emit the same set of logs, but most do not. 

Audit Log

An audit log is required for EELF-enabled components, and provides a summary view of the processing of a (e.g., transaction) request within an application. It captures activity requests that are received by an ONAP component, and includes such information as the time the activity is initiated, then it finishes, and the API that is invoked at the component.

Audit log records are intended to capture the high level view of activity within an ONAP component. Specifically, an API request handled by an ONAP component is reflected in a single Audit log record that captures the time the request was received, the time that processing was completed, as well as other information about the API request (e.g., API name, on whose behalf it was invoked, etc).

Metrics Log

A metrics log is required for EELF-enabled components, and provides a more detailed view into the processing of a transaction within an application. It captures the beginning and ending of activities needed to complete it. These can include calls to or interactions with other ONAP or non-ONAP entities.

Suboperations invoked as part of the processing of the API request are logged in the Metrics log. For example, when a call is made to another ONAP component or external (i.e., non-ONAP) entity, a Metrics log record captures that call. In such a case, the Metrics log record indicates (among other things) the time the call is made, when it returns, the entity that is called, and the API invoked on that entity. The Metrics log record contain the same RequestID as the Audit log record so the two can be correlated.

Note that a single request may result in multiple Audit log records at an ONAP component and may result in multiple Metrics log records generated by the component when multiple suboperations are required to satisfy the API request captured in the Audit log record.

Error Log

An error log is required for EELF-enabled components, and is intended to capture info, warn, error and fatal conditions sensed (“exception handled”) by the software components.

This includes previous logs that went to application.log

Debug Log

A debug log is optional for EELF-enabled components, and is intended to capture whatever data may be needed to debug and correct abnormal conditions of the application.

Engine.out

Console logging may also be present, and is intended to capture “system/infrastructure” records. That is stdout and stderr assigned to a single “engine.out” file in a directory configurable (e.g. as an environment/shell variable) by operations personnel.

Application Log (deprecated)

see example in https://git.onap.org/oom/tree/kubernetes/portal/charts/portal-sdk/resources/config/deliveries/properties/ONAPPORTALSDK/logback.xml

We no longer support this 5th log file - see error.log 

Log File Locations

There are several locations where logs are available on the host, on the nfs share and in each application and filebeat container

Logs on each Kubernetes host VM - docker empty.dir shares

Logs on the /dockerdata-nfs share across hosts

Logs on each microservice docker container - /var/log/onap

...

New ONAP Component Checklist

...

 They are unordered. 

What's New

TBD

(Including what WILL be new in v1.2  / R2). 

  1. Field separator reverted to pipe. 
  2. Dual appenders in Logback and Log4j reference configurations:
    1. Indexable, for shipping and indexing. 
    2. EELF, for backward compatibility. 
    3. Minor changes to path conventions.
  3. XML output deprecated (required only for Log4j1.2, which is also expected to go).
  4. Improved documentation of semantics and usage (including initialization and propagation via ThreadLocaland HTTP headers) for existing MDCs and attributes. 
  5. Add MDCs/Markers + usage for invocation IDs, allowing call graphs to be built without reliance on heuristics.
  6. Revisiting persistence (a clear requirement) and rollover settings, based on feedback from operations. 
  7. More discussion of How to Log. (Where previously guidelines were largely concerned with architecture and mechanics).
  8. Locking in other changes proposed in R1, including MDC serialization, escaping, etc. These can be treated as accepted. (Note that they only affect indexable output).

In addition, we expect to provide (as a Beijing deliverable) a minimal, synthetic component as an example of best-practices, and this will provide all code examples for this guide. (Does that mean the example will log via EELF, or will we end up with two variants?)

Pending Specification Work

...