About This Document

Official R1 documentation snapshot in  https://onap.readthedocs.io/en/latest/submodules/logging-analytics.git/docs/

This document specifies logging conventions to be followed by ONAP component applications.  

ONAP logging is intended to support operability, debugging and reporting on ONAP. These guidelines address:

  • Events that are written by ONAP components.
  • Propagation of transaction and invocation information between components.
  • MDCs, Markers and other information that should be attached to log messages.
    • MDC = Mapped Diagnostic Context
  • Human and machine-readable output format(s).
  • Files, locations and other conventions. 

Original AT&T ONAP Logging guidelines (pre amsterdam release) - for historical reference only: https://wiki.onap.org/download/attachments/1015849/ONAP%20application%20logging%20guidelines.pdf?api=v2

The Acumos logging specification follows this document at https://wiki.acumos.org/display/OAM/Log+Standards 

Logback reference: Logging Developer Guide#Logback.xml based on https://gerrit.onap.org/r/#/c/62405

Introduction

The purpose of ONAP logging is to capture information needed to operate, troubleshoot and report on the performance of the ONAP platform and its constituent components. Log records may be viewed and consumed directly by users and systems, indexed and loaded into a datastore, and used to compute metrics and generate reports. 

The processing of a single client request will often involve multiple ONAP components and/or subcomponents (interchangeably referred to as ‘application’ in this document). The ability to track flows across components is critical to understanding ONAP’s behavior and performance. ONAP logging uses a universally unique RequestID value in log records to track the processing of every client request through all the ONAP components involved in its processing.

A reference configuration of Elastic Stack is being deployed using ONAP Operations Manager since the amsterdam release - see usage in Logging Analytics Dashboards (Kibana)

This document proposes conventions you can follow to generate conformant, indexable logging output from your component.

Supported Languages

Java (including Scala and Clojure - because they compile to the same bytecode) and Python are supported, but conventions may be implemented by other technologies like GO. 

Java

The library and aop wrapper are written in Java and will work for Clojure (thanks to a discussion with Shwetank Dave that reminded me of languages that compile to bytecode) and Scala as well. see: https://git.onap.org/logging-analytics/tree/reference

Clojure

See the java libraries

Scala

see the java libraries

Python

https://lists.onap.org/g/onap-discuss/topic/logging_python_logging/25307286?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,25307286

How to Log

ONAP prescribes conventions. The use of certain APIs and providers is recommended, but they are not mandatory. Most components log via EELF or SLF4J to a provider like Logback or Log4j.

Logging Specification Compliance

For the casablanca release the logging specification has been updated and finalized as of June 2018.  Implementation of this specification is required but the method of implementation is optional based on each team's level of possible engagement.

High Level the changes are introduction of MDC's (key/value pairs like requestID=...) and Markers (labels like ENTRY/EXIT)

The ELK stack (indexes and dashboards) is focused on the new specification in Casablanca - if time permits we will have a migration path and/or support for both the older Amsterdam/Beijing release and Casablanca -  LOG-584 - Getting issue details... STATUS

Level 0: Log specification unchanged

Just keep your code as-is and commit to migrating to the the MDC and Marker focused specification for Dublin - logs will be the older format for now

Level 1: Log specification compliance via local project changes, library or wrapper

Continue to use your own library/wrapper if you have one like in SDC and AAI, change individual source files.

Level 2: Log specification compliance via SLF4J supplied logging-analytics library

Use Luke's SLF4J library directly as wrapper for MDCs and Markers by calling the library from within each function.

Level 3: Log specification compliance via AOP library over SLF4J supplied logging-analytics library

Use a spring based Aspect library that emits Markers automatically around function calls and retrofit your code to log via Luke's SLF4J for internal log messages.

ONAP Application Logging Specification v1.2 (Casablanca)#LoggingWithAOP

Logging Library Location and Use

see https://git.onap.org/logging-analytics/tree/reference/logging-slf4j

and usage ONAP Development#DeveloperUseoftheLoggingLibrary  ONAP Development#KubernetesDevOps and Logging User Guide#LoggingDevOps

see Spring AOP example (minimal changes to existing code base) WIP in  LOG-135 - Getting issue details... STATUS  documented at ONAP Development#LoggingWithAOP

EELF

EELF is the Event and Error Logging Framework, described at https://github.com/att/EELF.

EELF abstracts your choice of logging provider, and decorates the familiar Logger contracts with features like:

  • Localization. 
  • Error codes. 
  • Generated wiki documentation.
  • Separate audit, metrics, security error and debug logs. 

EELF is a facade, so logging output is configured in two ways:

  1. By selection of a logging provider such as Logback or Log4j, typically via the classpath. 
  2. By way of a provider configuration document, typically logback.xml or log4j.xml. See ONAP Application Logging Specification v1.2 (Casablanca)#Providers.

SLF4J

SLF4J is a logging facade, and a humble masterpiece. It combines what's common to all major, modern Java logging providers into a single interface. This decouples the caller from the provider, and encourages the use of what's universal, familiar and proven. 

EELF also logs via SLF4J's abstractions as the default provider.

Providers

Logging providers are normally enabled by their presence in the classpath. This means the decision may have been made for you, in some cases implicitly by dependencies. If you have a strong preference then you can change providers, but since the implementation is typically abstracted behind EELF or SLF4J, it may not be worth the effort.

Logback

Logback is the most commonly used provider. It is generally configured by an XML document named logback.xml. See ONAP Application Logging Specification v1.2 (Casablanca)#Configuration.

See HELM template https://git.onap.org/logging-analytics/tree/reference/provider/helm/logback

Log4j 2.X

Log4j 2.X is somewhat less common than Logback, but equivalent. It is generally configured by an XML document named log4j.xml. See ONAP Application Logging Specification v1.2 (Casablanca)#Configuration.

Log4j 1.X

Strongly discouraged from Beijing onwards, since 1.X is EOL, and since it does not support escaping, so its output may not be machine-readable. See https://logging.apache.org/log4j/1.2/.

This affects OpenDaylight-based components like SDNC and APPC, since ODL releases prior to Carbon bundled Log4j 1.X, and make it difficult to replace. The Common Controller SDK Project project targets ODL Carbon, so remaining instances of Log4j 1.X should disappear by the time of the Casablanca release.

What to Log

The purpose of logging is to capture diagnostic information.

An important aspect of this is analytics, which requires tracing of requests between components. In a large, distributed and scalable system such as ONAP this is critical to understanding behavior and performance. 

General 

It isn't the aim of this document to reiterate Best Practices, so advice here is general: 

  • Use a logging facade such as SLF4J or EELF. 
  • Write log messages in English.
  • Write meaningful messages. Consider what will be useful to consumers of logger output. 
  • Log at the appropriate level. Be aware of the volume of logs that will be produced.
  • Safeguard the information in exceptions, and ensure it is never lost.
  • Use error codes to characterize exceptions. 
  • Log in a machine-readable format. See Conventions.
  • Log for analytics as well as troubleshooting.

Others have written extensively on this: 

Standard Attributes

These are attributes common to all log messages. They are either:

  • Explicitly required by logging APIs:
    • Logger
    • Level
    • Message
    • Exception (note that exception is the only standard attribute that may routinely be empty).
  • Implicitly derived by the logging provider:
    • Timestamp
    • Thread.

Which means you normally can't help but report them. 

See https://www.slf4j.org/api/org/slf4j/Logger.html and https://logback.qos.ch/manual/layouts.html#ClassicPatternLayout for their origins and use.

Logger Name

This indicates the name of the logger that logged the message.

In Java it is convention to name the logger after the class or package using that logger.

  • In Java, report the class or package name.
  • In Python, the class or source filename.

Most other languages will fit one of those patterns. 

Level

Severity, typically drawn from the enumeration {TRACE, DEBUG, INFO, WARN, ERROR}.

Think carefully about the information you report at each log level. The default log level is INFO.

Some loggers define non-standard levels, like FINE, FINER, WARNING, SEVERE, FATAL or CRITICAL. Use these judiciously, or avoid them.

Message

The free text payload of a log event. 

This is the most important item of information in most log messages. See ONAP Application Logging Specification v1.2 (Casablanca)#General guidelines.

Internationalization

Diagnostic log messages generally do not need to be internationalized.

Parameterization

Parameterized messages allow serialization to be deferred until AFTER level threshold checks. This means the cost is never incurred for messages that won't be written. 

  • Favor parameterized messages, especially for INFO and DEBUG logging.
  • Perform expensive serialization in the #toString method of wrapper classes.

For example:

logger.debug("Relax - this won't hurt: {}", new ToStringWrapper(costlyToSerialize));

Exception

The error stacktrace, where applicable.

Log unabridged stacktraces upon error.

When rethrowing, ensure that frame information is not lost:

  • By logging the original exception at the point where it was caught. 
  • By setting the original exception as the cause when rethrowing.

Timestamp

Logged as an ISO8601 UTC datetime. Millisecond (or greater) precision is preferable.

For example:

2018-07-05T20:21:34.794Z

Offset timestamps are OK provided the offset is included. (In the above example, the "Z" is a shorthand indicating an offset of zero – UTC).

Thread

The name of the thread from which the log message was emitted.

Thread names don't necessarily convey useful information, and their reliability depends on the thread model implemented by different runtimes, but they are sometimes used in heuristic analysis.

Efficiency

There is tension between utility and efficiency. IO bandwidth is finite, and the cost of serialization can be significant, especially at higher diagnostic levels.

Methods and Line Numbers

Many loggers can use reflection to emit the originating (Java) method, and even individual line numbers.

This information is certainly useful, but very expensive. Most logging implementations recommend that this not be enabled in production.

Level Thresholds

Level indicates severity.

Logger output is typically filtered by logger and level. The default logging level is INFO, so particular consideration should be given to the efficiency of INFO-level logging.

When DEBUG-level logging is configured, it's probably for good reason, and a greater overhead is expected. Be aware that it's not unusual for DEBUG logging to be left enabled inadvertently, however.

WARN and ERROR-level messages are of higher value, and comparatively rare, so their cost is less of a concern.

Conditionals

A common pattern is to place conditionals around (expensive) serialization.

For example:

if (logger.isDebugEnabled()) {
    logger.debug("But this WILL hurt: " + costlyToSerialize);
}

Parameterized logging is preferable.

Context

MDCs

A Mapped Diagnostic Context (MDC) allows an arbitrary string-valued attribute to be attached to a Java thread via a ThreadLocal variable. The MDC's value is then emitted with each message logged by that thread. The set of MDCs associated with a log message is serialized as unordered name-value pairs (see ONAP Application Logging Specification v1.2 (Casablanca)#Text Output).

A good discussion of MDCs can be found at https://logback.qos.ch/manual/mdc.html

Example

From Luke Parker's call graph work in https://git.onap.org/logging-analytics/tree/reference/logging-slf4j-demo

LogEntry(markers=ENTRY, logger=ComponentAlpha, requestID=eb3e0dc2-6c3c-4bb7-8ed6-e5cc4ec7aad2, invocationID=06c815ef-5969-45cc-b319-d0dbcde89329, timestamp=Tue May 08 04:23:27 AEST 2018)


Mapped Diagnostic Context Table

Pipe OrderNameTypeGroupDescription

Applicable

(per log file)

Marker Associations

Moved

MDC

to


standard

attribute


Removed

(was in

older

spec)

Required?

Y/N/C

(C= context dependent)

N = not required

L=Library provided


DerivedHistoricalAcumos
ref
Use Cases

Code References

1LogTimestamplog system
use %d field - see %d{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",UTC}



L




2EntryTimestampMDC
if part of an ENTRY marker log



C




3InvokeTimestampMDC
if part of an INVOKE marker log



C




4

RequestID

MDC

UUID to track the processing of each client request across all the ONAP components involved in its processing





Y



In general

https://git.onap.org/logging-analytics/tree/reference/logging-slf4j-demo/src/test/java/org/onap/logging/ref/slf4j/demo/component/AbstractComponentTest.java

5InvocationIDMDC

UUID correlates log entries relating to a single invocation of a single component
In the case of an asynchronous request, the InvocationID should come from the original request 





Y




6InstanceIDMDC

UUID to differentiate between multiple instances of the same (named) log writing service/application





Y
was InstanceUUID


7ServiceInstanceIDMDC





C




8threadlog system
use %thread field



L




9ServiceName

The service inside the partner doing the call - includes API name





Y




10PartnerName

unauthenticated = The part of the URI specifying the agent that the caller used to make the call to the component that is logging the message.

authenticated = userid

  1. If an authenticated API, then log the userid
  2. Otherwise, if the HTTP header "X-ONAP-PartnerName" was provided, then log that (note: this was a direction that we seemed to be going but never completed)
  3. Otherwise, if the HTTP header "User-Agent" was provided, then log that
  4. Otherwise, log "UNKNOWN" (since the field is currently required, something must be in it)




Y

user

11StatusCode

This field indicates the high level status of the request - one of (COMPLETE, ERROR, INPROGRESS)






Y



20180807: expand from 2 fields to add "INPROGRESS"

addresses Chris Lott question on https://wiki.acumos.org/display/OAM/Log+Standards

12ResponseCode

This field contains application-specific error codes.



Y






13ResponseDesc

This field contains a human readable description of the ResponseCode



Y






14level

%level



L




15Severity

Logging level by default aligned with the reported log level - one of INFO/TRACE/DEBUG/WARN/ERROR/FATAL





Y

level (but numbers)

16ServerIPAddress






C




17ElapsedTime






C




18ServerFQDN

The VM FQDN if the server is virtualized. Otherwise the host name of the logging component.





Y






19ClientIPAddress

This field contains the requesting remote client application’s IP address if known. Otherwise empty.





Y






20VirtualServerName






C




21ContextName






C




22TargetEntity

The name of the ONAP component or sub-component, or external entity, at which the operation activities captured in this metrics log record is invoked.





C




23TargetServiceName

The name  of the API or operation activities invoked (name on the remote/target application) at the TargetEntity.  





C




24TargetElement

VNF/PNF context dependent - on CRUD operations of VNF/PNFs

The IDs that need to be covered with the above Attributes are

       -        VNF_ID OR VNFC_ID : (Unique identifier for a VNF asset that is being instantiated or that would generate an alarms)

       -        VSERVER_ID OR VM_ID (or vmid): (Unique identified for a virtual server or virtual machine on which a Control Loop action is usually taken on, or that is installed  as part of instantiation flow)

       -        PNF : (What is the Unique identifier used within ONAP)





C




25UserMDC
User - used for %X{user}  



C




26p_loggerlog system
The name of the class doing the logging (in my case the ApplicationController – close to the targetservicename but at the class granular level - this field is %logger



L




27p_mdclog system

allows forward compatability with ELK indexers that read all MDCs in a single field - while maintaining separate MDCs above.


The key/value pairs all in one pipe field (will have some duplications currently with MDC’s that are in their own pipe – but allows us to expand the MDC list – replaces customvalue1-3 older fields - this field is %mdc





L




28p_messagelog system
The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker



L





RootExceptionlog system
%rootException - Dublin spec only



L




29p_markerlog system
The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker



L





Logging

Via SLF4J:

import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
// ...
final Logger logger = LoggerFactory.getLogger(this.getClass());
MDC.put("SomeUUID", UUID.randomUUID().toString());
try {
    logger.info("This message will have a UUID-valued 'SomeUUID' MDC attached.");
    // ...
}
finally {
    MDC.clear();
}

EELF doesn't directly support MDCs, but its default provider (where com.att.eelf.configuration.SLF4jWrapper is the configured EELF provider)normally logs via SLF4J, and SLF4J will receive any MDC that is set:

import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import com.att.eelf.configuration.EELFLogger;
import com.att.eelf.configuration.EELFManager;
// ...
final EELFLogger logger = EELFManager.getInstance().getLogger(this.getClass());
MDC.put("SomeUUID", UUID.randomUUID().toString());
try {
    logger.info("This message will have a UUID-valued 'SomeUUID' MDC attached.");
    // ...
}
finally {
    MDC.clear();
}

Serializing

Output of MDCs must ensure that:

  • All reported MDCs are logged with both name AND value. Logging output should not treat any MDCs as special.
  • All MDC names and values are escaped.

Escaping in Logback configuration can be achieved with:

%replace(%replace(%mdc){'\t','\\\\t'}){'\n','\\\\n'}

MDC - RequestID

This is often referred to by other names, including "Transaction ID", and one of several (pre-standardization) REST header names including X-TransactionIDX-ECOMP-TransactionIDX-ECOMP-RequestID and X-ONAP-RequestID.

ONAP logging uses a universally unique "RequestID" value in log records to track the processing of each client request across all the ONAP components involved in its processing. RequestID be propagated across all interfaces, not just REST Interfaces.

This value:

  • Is logged as a RequestID MDC. 
  • Is propagated between components in REST calls as an X-ONAP-RequestID HTTP header.

Receiving the X-ONAP-RequestID will vary by component according to APIs and frameworks. In general:

import javax.ws.rs.core.HttpHeaders;
// ...
final HttpHeaders headers = ...;
// ...
String txId = headers.getRequestHeaders().getFirst("X-ONAP-RequestID");
if (StringUtils.isBlank(txId)) {
	txId = UUID.randomUUID().toString();
}
MDC.put("RequestID", txID);

Setting the X-ONAP-RequestID likewise will vary. For example:

final String txID = MDC.get("RequestID");
HttpURLConnection cx = ...;
// ...
cx.setRequestProperty("X-ONAP-RequestID", txID);

Note that it's been suggested that for the duration of Casablanca we report the request ID using all three headers:

  1. X-ONAP-RequestID (canonical)
  2. X-RequestID (deprecated)
  3. X-TransactionID (deprecated)

MDC - InvocationID

InvocationID is similar to RequestID, but where RequestID correlates records relating a single, top-level invocation of ONAP as it traverses many systems, InvocationID correlates log entries relating to a single invocation of a single component. Typically this means via REST, but in certain cases an InvocationID may be allocated without a new invocation, e.g. when a request is retried.

RequestID and InvocationID allow an execution graph to be derived. This requires that:

  • The relationship between RequestID and InvocationID is reported. 
  • The relationship between caller and recipient is reported for each invocation.

The proposed approach is that:

  • Callers:
    • Issue a new, unique InvocationID UUID for each downstream call they make. 
    • Log the new InvocationID, indicating the intent to invoke:
      • With Markers INVOKE, and SYNCHRONOUS if the invocation is synchronous.
      • With their own InvocationID still set as an MDC.
    • Pass the InvocationID as an X-InvocationID REST header.
  • Invoked components:
    • Retrieve the InvocationID from REST headers upon invocation, or generate a UUID default. 
    • Set the InvocationID MDC.
    • Write a log entry with the Marker ENTRY. (In EELF this will be to the AUDIT log).
    • Act as per Callers in all downstream requests. 
    • Write a log entry with the Marker EXIT upon return. (In EELF this will be to the METRICS log).
    • Unset all MDCs on exit.

That seems onerous, but:

  • It's only a few calls. 
  • It can be largely abstracted in the case of EELF logging.

MDC - InstanceID

(formerly InstanceUUID)

If known, this field contains a universally unique identifier used to differentiate between multiple instances of the same (named) log writing service/application. Its value is set at instance creation time (and read by it, e.g., at start/initialization time from the environment). This value should be picked up by the component instance from its configuration file and subsequently used to enable differentiation of log records created by multiple, locally load balanced ONAP component or subcomponent instances that are otherwise identically configured.

Handles parallel threads or running across a load balanced set of microservices - for identification.

MDC - PartnerName

This field should contain the name of the client application user agent or user invoking the API. The identification of the entity that made the request being served. For a serving API that is authenticating the request, this should be the authenticated username or equivalent (e.g. a userid or a mechid). 

For example SDC-BE instead of just SDC for the overall pods

This is often used for heuristic analysis to identify invocations between ONAP individual ONAP components. Its value has never been clearly stipulated, so a common problem has been a lack of consistency. 

There is no clear consensus, but:

  • Use the short name of your component, e.g. xyzdriver. (try to incorporate both levels - the container name and the pod the container is in within the kubernetes deployment)
  • Values should be human-readable. 
  • Values should be fine-grained enough to disambiguate subcomponents where it's likely to matter. This is subjective. 
  • Be consistent: your component should ALWAYS report same value. 

Real-life examples include MSO, bpmnclient, BPELClient, (all of which are reported by SO), openECOMP (SDNC), vid (VID!) etc. (See the problem?)

Usage overlaps with InvocationID, which doesn't mean PartnerName gets retired, but which might mean it serves a more descriptive purpose. (Since it hasn't proven to be a great way of generating a call graph).

MDC - ServiceName

The URI that the caller used to make the call to the component that is logging the message.

For EELF Audit log records that capture API requests, this field contains the name of the API invoked at the component creating the record (e.g., Layer3ServiceActivateRequest).

For EELF Audit log records that capture processing as a result of receipt of a message, this field should contain the name of the module that processes the message.

Usage is the same for indexable logs. 

MDC - StatusCode

This field indicates the high level status of the request. It must have the value COMPLETE when the request is successful and ERROR when there is a failure.  And INPROGRESS for states between the two.

Discussion: status/response/severity relationship

status = global, response below is app specific

Ability to render severity-like line in a non-debug log

MDC - ResponseCode

This field contains application-specific error codes. For consistency, common error categorizations should be used.

MDC - Severity

OPS specific

Use/Map existing https://www.slf4j.org/api/org/apache/commons/logging/Log.html

ENUM is INFO/TRACE/DEBUG/WARN/ERROR/FATAL

By default - align this severity with the reported log level

(optionally a way to map actual level from reported level if required)

MDC - ServerFQDN

This field contains the Virtual Machine (VM) Fully Qualified Domain Name (FQDN) if the server is virtualized. Otherwise, it contains the host name of the logging component.

Best effort (ip, fqdn)

(previously covered by removed "Server" field)

redundancy between clientIP, server, virtualServer name is OK - and helpfull for runtime OPS/Hybrid envs

supercedes virtualServerName

Report what is in the http header

Discussion: roll all 3 fqdn, hostname or ip into one field - do we ever need two of the 3 fields concurrently?

TODO: Verify what is also available from a filebeat agent when it exists

MDC - ClientIPAddress

This field contains the requesting remote client application’s IP address if known. Otherwise this field can be empty.

We don't differentiate between inside/outside ONAP for the IP - this supports hybrid environments

Derived from the system

redundancy between clientIP, server, virtualServer name is OK - and helpfull for runtime OPS/Hybrid envs

Discussion: do we need both ip and fqdn fields?

Report what is in the http header

MDC - EntryTimestamp

Date-time that processing activities being logged begins. The value should be represented in UTC and formatted per ISO 8601, such as “2015-06-03T13:21:58+00:00”. The time should be shown with the maximum resolution available to the logging component (e.g., milliseconds, microseconds) by including the appropriate number of decimal digits. For example, when millisecond precision is available, the date-time value would be presented as, as “2015-06-03T13:21:58.340+00:00”.

Context dependent on whether part of an ENTRY marker

Audit requires this field

MDC - InvokeTimestamp

Timestamp on invocation start.

Context dependent on whether part of an INVOKE marker

metrics needs this field.

MDC - TargetEntity

It contains the name of the ONAP component or sub-component, or external entity, at which the operation activities captured in this metrics log record is invoked.

Example: SDC-BE

MDC - TargetServiceName

It contains the name  of the API or operation activities invoked (name on the remote/target application) at the TargetEntity.  

Example: Class name of rest endpoint

Discussion: on building call graph vs human readable single line - keep for human readable

Used as valuable URI - to annnote invoke marker

Review in terms of ONAP Application Logging Specification v1.2 (Casablanca)#Marker-INVOKE - possiblly add INVOKE-return - to filter reporting

TBD: Coverage by log file type (debug, trace, ...)

TBD: cover off discussion on reducing log files to two (DEBUG/rest) for C* release

MDC - TargetElement

VNF/PNF context dependent - on CRUD operations of VNF/PNFs

The IDs that need to be covered with the above Attributes are

       -        VNF_ID OR VNFC_ID : (Unique identifier for a VNF asset that is being instantiated or that would generate an alarms)

       -        VSERVER_ID OR VM_ID (or vmid): (Unique identified for a virtual server or virtual machine on which a Control Loop action is usually taken on, or that is installed  as part of instantiation flow)

       -        PNF : (What is the Unique identifier used within ONAP)

MDCs - the Rest

Other MDCs are logged in a wide range of contexts.

Certain MDCs and their semantics may be specific to EELF log types.

Deprecation

Indexing makes many of the remaining attributes redundant. So for example:

  • There is considerable duplication:
    • BeginTimestamp, EndTimestamp, ElapsedTime. These are all captured elsewhere (and ElapsedTime is even redundant within that triplet).
    • Server, ServerIPAddress, ServerFQDN, VirtualServiceName. Overkill. Should be one, plus optionally ClientIPAddress (or some variant thereof).
    • TargetEntity, TargetServiceName, not obviously different to similar attributes.
  • There is junk:
    • Severity? Nagios codes?
    • ProcessKey?
    • All the stuff that's already grayed out in the table above.
    • People may defend these individually, maybe vigorously, but they're domain-specific:
      • That absolutely doesn't mean they can't be used.
      • Beats configuration allows ad hoc contexts to be indexed.
      • But perhaps they don't belong in this kind of spec.
  • Redundant attributes *do* matter, because:
    • Populating and propagating everything prescribed by the guide approaches being prohibitive. People won't do it, and people *don't* do it. 
    • If something might be in one of several attributes then that's worse than it being in just one.
  • That means:
    • We're left with only two MANDATORY attributes, necessary to build invocations graphs:
      • RequestID - top-level transactions. 
      • InvocationID - inter-component invocations.
    • And a minimal number of OPTIONAL descriptive attributes: ServiceInstanceID, InstanceID, Server, StatusCode, ResponseCode, ResponseDescription.
    • Those are the ones we need to document clearly, support in APIs, etc.
    • That's <=10, a manageable number. 
    • And again, that matters because if the number isn't manageable, people won't (and don't) comply. 

  
Some of that is contentious, but it's just talking points at this stage. We've tiptoed around the issue of extant conventions, and the ongoing result is a lot of attributes that nobody's really sure how to use, and which don't result in better logs. In Casablanca it's time to be less conservative. 

Examples


Markers

Markers unambiguously assign semantics to individual log messages. They allow messages that have a specific meaning to be cheaply and easily identified in logger output, without inherently unreliable (and more costly, and less easily enforced) schemes like scanning for magic strings in the text of each log message. 

ONAP logging requires the emission of Markers reporting entry, exit and invocation as the execution of requests pass between ONAP components. This information is used to generate a call graph.

ONAP components are also free to use Markers for their own purposes. Any Markers that are logged will be automatically indexed by Logstash. 

Markers differ from MDCs in two important ways:

  1. They have a name, but no value. They are a tag - like a label. 
  2. They are specified explicitly on invocation. They are not ThreadLocal, and they do not propagate.

EELF's implementation can be modified to emit Markers, but its public APIs do not allow them to be passed in by callers.

Log4J docs, SLF4J docs

see code on reference folder in git clone ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics


Examples

Marker formatting is using tabs -  LOG-553 - Getting issue details... STATUS  - see https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/test/resources/logback.xml#n9

Note there are 3 tabs (see p_mak in logback.xml) delimiting the MARKERS (ENTRY and EXIT) at the end of each line
<property name="p_mak" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>


2018-07-05T20:21:34.794Z	http-nio-8080-exec-2	INFO	org.onap.demo.logging.ApplicationService	InstanceID=ede7dd52-91e8-45ce-9406-fbafd17a7d4c, RequestID=f9d8bb0f-4b4b-4700-9853-d3b79d861c5b, ServiceName=/logging-demo/rest/health/health, InvocationID=8f4c1f1d-5b32-4981-b658-e5992f28e6c8, InvokeTimestamp=2018-07-05T20:21:26.617Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost			ENTRY	
2018-07-05T20:22:09.268Z	http-nio-8080-exec-2	INFO	org.onap.demo.logging.ApplicationService	ResponseCode=, InstanceID=ede7dd52-91e8-45ce-9406-fbafd17a7d4c, RequestID=f9d8bb0f-4b4b-4700-9853-d3b79d861c5b, ServiceName=/logging-demo/rest/health/health, ResponseDescription=, InvocationID=8f4c1f1d-5b32-4981-b658-e5992f28e6c8, Severity=, InvokeTimestamp=2018-07-05T20:21:26.617Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost, StatusCode=			EXIT	


Logging

Via SLF4J:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;
// ...
final Logger logger = LoggerFactory.getLogger(this.getClass());
final Marker marker = MarkerFactory.getMarker("MY_MARKER");
logger.warn(marker, "This warning has a 'MY_MARKER' annotation.");

EELF does not allow Markers to be set directly. See notes on the InvocationID MDC.

Serializing

Marker names also need to be escaped, though they're much less likely to contain problematic characters than MDC values.

Escaping in Logback configuration can be achieved with:

%replace(%replace(%marker){'\t','\\\\t'}){'\n','\\\\n'}


Marker - ENTRY

TODO: add table detailing which log files each marker is a part of - from a use case perspective

This should be reported as early in invocation as possible, immediately after setting the RequestID and InvocationID MDCs.

It can be automatically set by EELF, and written to the AUDIT log. 

It must be manually set otherwise. Candidate for framework

EELF:

EELF
final EELFLogger logger = EELFManager.getAuditLogger();
logger.auditEvent("Entering.");

SLF4J:

SLF4J
public static final Marker ENTRY = MarkerFactory.getMarker("ENTRY");
// ... 
final Logger logger = LoggerFactory.getLogger(this.getClass());
logger.debug(ENTRY, "Entering.");

Marker - EXIT

This should be reported as late in invocation as possible, immediately before unsetting the RequestID and InvocationID MDCs.

It can be automatically reported by EELF, and written to the METRICS log. 

It must be manually set otherwise. 

EELF:

EELF
final EELFLogger logger = EELFManager.getMetricsLogger();
logger.metricsEvent("Exiting.");

SLF4J:

SLF4J
public static final Marker EXIT = MarkerFactory.getMarker("EXIT");
// ... 
final Logger logger = LoggerFactory.getLogger(this.getClass());
logger.debug(EXIT, "Exiting.");

Marker - INVOKE

This should be reported by the caller of another ONAP component via REST, including a newly allocated InvocationID, which will be passed to the caller. 

SLF4J:

SLF4J
public static final Marker INVOKE = MarkerFactory.getMarker("INVOKE");
// ...

// Generate and report invocation ID. 

final String invocationID = UUID.randomUUID().toString();
MDC.put(MDC_INVOCATION_ID, invocationID);
try {
    logger.debug(INVOKE_SYNCHRONOUS, "Invoking synchronously ... ");
}
finally {
    MDC.remove(MDC_INVOCATION_ID);
}

// Pass invocationID as HTTP X-InvocationID header.

callDownstreamSystem(invocationID, ... );

EELF examples of INVOCATION_ID reporting, without changing published APIs.

Marker - INVOKE-RETURN

This should be reported by the caller of another ONAP component via REST on return.

InvokeTimestamp context dependent MDC will be reported here.

SLF4J:
SLF4J
TBD


Marker - INVOKE-SYNCHRONOUS

This should accompany INVOKE when the invocation is synchronous.

SLF4J:
SLF4J
public static final Marker INVOKE_SYNCHRONOUS;
static {
    INVOKE_SYNCHRONOUS = MarkerFactory.getMarker("INVOKE");
    INVOKE_SYNCHRONOUS.add(MarkerFactory.getMarker("SYNCHRONOUS"));
}
// ...

// Generate and report invocation ID. 

final String invocationID = UUID.randomUUID().toString();
MDC.put(MDC_INVOCATION_ID, invocationID);
try {
    logger.debug(INVOKE_SYNCHRONOUS, "Invoking synchronously ... ");
}
finally {
    MDC.remove(MDC_INVOCATION_ID);
}

// Pass invocationID as HTTP X-InvocationID header.

callDownstreamSystem(invocationID, ... );

EELF example of SYNCHRONOUS reporting, without changing published APIs. 

Error Codes

Error codes are reported as MDCs. 

TODO: add to table

Exceptions should be accompanied by an error code. Typically this is achieved by incorporating error codes into your exception hierarchy and error handling. ONAP components generally do not share this kind of code, though EELF defines a marker interface (meaning it has no methods) EELFResolvableErrorEnum.

A common convention is for error codes to have two components:

  1. A prefix, which identifies the origin of the error. 
  2. A suffix, which identifies the kind of error.

Suffixes may be numeric or text. They may also be common to more than one component.

For example:

COMPONENT_X.STORAGE_ERROR

Output Format

Several considerations:

  1. Logs should be human-readable (within reason). 
  2. Shipper and indexing performance and durability depends on logs that can be parsed quickly and reliably.
  3. Consistency means fewer shipping and indexing rules are required.

Text Output

TODO: 20190115 - do not take the example in this section until I reverify it in terms of the reworked spec example in 

https://gerrit.onap.org/r/#/c/62405/20/reference/logging-kubernetes/logdemonode/charts/logdemonode/resources/config/logback.xml

LOG-630 - Getting issue details... STATUS

ONAP needs to strike a balance between human-readable and machine-readable logs. This means:

  • The use of PIPE (|) as a delimiter. (Previously tab, and before that ... pipe).
  • Escaping all messages, exceptions, MDC values, Markers, etc. to replace tabs and pipes in their content.
  • Escaping all newlines with \n so that each entry is on one line

In logback, this looks like:

<property name="defaultPattern" value="%nopexception%logger
|%date{yyyy-MM-dd'T'HH:mm:ss.SSSXXX,UTC}
|%level
|%replace(%replace(%replace(%message){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%replace(%replace(%replace(%mdc){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%replace(%replace(%replace(%rootException){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%replace(%replace(%replace(%marker){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'}
|%thread
|%n"/>

The output of which, with MDCs, a Marker and a nested exception, with newlines added for readability, looks like:

org.onap.example.component1.subcomponent1.LogbackTest
|2017-08-06T16:09:03.594Z
|ERROR
|Here's an error, that's usually bad
|key1=value1, key2=value2 with space, key5=value5"with"quotes, key3=value3\nwith\nnewlines, key4=value4\twith\ttabs
|java.lang.RuntimeException: Here's Johnny
\n\tat org.onap.example.component1.subcomponent1.LogbackTest.main(LogbackTest.java:24)
\nWrapped by: java.lang.RuntimeException: Little pigs, little pigs, let me come in
\n\tat org.onap.example.component1.subcomponent1.LogbackTest.main(LogbackTest.java:27)
|AMarker1
|main     

Default Logstash indexing rules understand output in this format.

XML Output

For Log4j 1.X output, since escaping is not supported, the best alternative is to emit logs in XML format, we will expand on JSON support

There may be other instances where XML (or JSON) output may be desirable. Default indexing rules support 

Default Logstash indexing rules understand the XML output of Log4J's XMLLayout.

Note that we're hoping that support for indexing of XML output can be deprecated during Beijing. This relies on the adoption of ODL Carbon, which should eliminate any remnant of Log4J1.X.

Output Location

Standardization of output locations makes logs easier to locate and ship for indexing. 

Expand on out-of-container locations off /dockerdata-nfs

Logfiles should default to beneath /var/log, and beneath /var/log/ONAP in the case of core ONAP components:

/var/log/ONAP/<component>[/<subcomponent>]/*.log

For the duration of Beijing, logs  will be written to a separate directory, /var/log/ONAP_EELF:

expand on Casablanca differences, and adding as a config setting in OOM

/var/log/ONAP_EELF/<component>[/<subcomponent>]/*.log

Configuration

Logging providers should be configured by file. Files should be at a predictable, static location, so that they can be written by deployment automation. Ideally this should be under /etc/ONAP, but compliance is low.

Locations

All logger provider configuration document locations namespaced by component and (if applicable) subcomponent by default:

/etc/ONAP/<component>[/<subcomponent>]/<provider>.xml

Where <provider>.xml, will typically be one of:

  1. logback.xml
  2. log4j.xml
  3. log4j.properties

Reconfiguration

Logger providers should reconfigure themselves automatically when their configuration file is rewritten. All major providers should support this. 

The default interval is 10s. 

Overrides

The location of the configuration file MAY be overrideable, for example by an environment variable, but this is left for individual components to decide. 

Archetypes

Configuration archetypes can be found in the ONAP codebase https://git.onap.org/logging-analytics/tree/. Choose according to your provider, and whether you're logging via EELF. Efforts to standardize them are underway so the ones you should be looking for are where pipe (|) is used as a separator. (Previously it was "|").

Retention

Logfiles are often large. Logging providers allow retention policies to be configured. 

Retention has to balance:

  • The need to index logs before they're removed.
  • The need to retain logs for other (including regulatory) purposes. 

Defaults are subject to change. Currently they are:

  1. Files <= 50MB before rollover. 
  2. Files retain for 30 days. 
  3. Total files capped at 10GB. 

In Logback configuration XML:

<appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${outputDirectory}/${outputFilename}.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${outputDirectory}/${outputFilename}.%d{yyyy-MM-dd}.%i.log.zip</fileNamePattern>
        <maxFileSize>50MB</maxFileSize>
        <maxHistory>30</maxHistory>
        <totalSizeCap>10GB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <!-- ... -->
    </encoder>
</appender>

Types of EELF Logs

EELF guidelines stipulate that an application should output log records to four separate files:

  1. audit
  2. metrics
  3. error
  4. debug

This applies only to EELF logging. Components which log directly to a provider may choose to emit the same set of logs, but most do not. 

Audit Log

An audit log is required for EELF-enabled components, and provides a summary view of the processing of a (e.g., transaction) request within an application. It captures activity requests that are received by an ONAP component, and includes such information as the time the activity is initiated, then it finishes, and the API that is invoked at the component.

Audit log records are intended to capture the high level view of activity within an ONAP component. Specifically, an API request handled by an ONAP component is reflected in a single Audit log record that captures the time the request was received, the time that processing was completed, as well as other information about the API request (e.g., API name, on whose behalf it was invoked, etc).

Metrics Log

A metrics log is required for EELF-enabled components, and provides a more detailed view into the processing of a transaction within an application. It captures the beginning and ending of activities needed to complete it. These can include calls to or interactions with other ONAP or non-ONAP entities.

Suboperations invoked as part of the processing of the API request are logged in the Metrics log. For example, when a call is made to another ONAP component or external (i.e., non-ONAP) entity, a Metrics log record captures that call. In such a case, the Metrics log record indicates (among other things) the time the call is made, when it returns, the entity that is called, and the API invoked on that entity. The Metrics log record contain the same RequestID as the Audit log record so the two can be correlated.

Note that a single request may result in multiple Audit log records at an ONAP component and may result in multiple Metrics log records generated by the component when multiple suboperations are required to satisfy the API request captured in the Audit log record.

Error Log

An error log is required for EELF-enabled components, and is intended to capture info, warn, error and fatal conditions sensed (“exception handled”) by the software components.

This includes previous logs that went to application.log

Debug Log

A debug log is optional for EELF-enabled components, and is intended to capture whatever data may be needed to debug and correct abnormal conditions of the application.

Engine.out

Console logging may also be present, and is intended to capture “system/infrastructure” records. That is stdout and stderr assigned to a single “engine.out” file in a directory configurable (e.g. as an environment/shell variable) by operations personnel.

Application Log (deprecated)

see example in https://git.onap.org/oom/tree/kubernetes/portal/charts/portal-sdk/resources/config/deliveries/properties/ONAPPORTALSDK/logback.xml

We no longer support this 5th log file - see error.log 

Log File Locations

There are several locations where logs are available on the host, on the nfs share and in each application and filebeat container

Logs on each Kubernetes host VM - docker empty.dir shares

Logs on the /dockerdata-nfs share across hosts

Logs on each microservice docker container - /var/log/onap

Logs on each filebeat docker container sidecar - /var/log/onap

New ONAP Component Checklist

Add this procedure to the Project Proposal Template

By following a few simple rules:

  • Your component's output will be indexed automatically. 
  • Analytics will be able to trace invocation through your component.

Obligations fall into two categories:

  1. Conventions regarding configuration, line format and output. 
  2. Ensuring the propagation of contextual information. 

You must:

  1. Choose a Logging provider and/or EELF. Decisions, decisions.
  2. Create a configuration file based on an existing archetype. See ONAP Application Logging Specification v1.2 (Casablanca)#Configuration.
  3. Read your configuration file when your components initialize logging.
  4. Write logs to a standard location so that they can be shipped by Filebeat for indexing. See ONAP Application Logging Specification v1.2 (Casablanca)#Output Location.
  5. Report transaction state:
    1. Retrieve, default and propagate RequestID. See ONAP Application Logging Specification v1.2 (Casablanca)#MDC - RequestID.
    2. At each invocation of one ONAP component by another:
      1. Initialize and propagate InvocationID. See ONAP Application Logging Specification v1.2 (Casablanca)#MDC - Invocation ID.
      2. Report INVOKE and SYNCHRONOUS markers in caller. 
      3. Report ENTRY and EXIT markers in recipient. 
  6. Write useful logs!

 They are unordered. 

What's New

(Including what WILL be new in v1.2  / R2). 

  1. Field separator reverted to pipe. 
  2. Dual appenders in Logback and Log4j reference configurations:
    1. Indexable, for shipping and indexing. 
    2. EELF, for backward compatibility. 
    3. Minor changes to path conventions.
  3. XML output deprecated (required only for Log4j1.2, which is also expected to go).
  4. Improved documentation of semantics and usage (including initialization and propagation via ThreadLocal and HTTP headers) for existing MDCs and attributes. 
  5. Add MDCs/Markers + usage for invocation IDs, allowing call graphs to be built without reliance on heuristics.
  6. Revisiting persistence (a clear requirement) and rollover settings, based on feedback from operations. 
  7. More discussion of How to Log. (Where previously guidelines were largely concerned with architecture and mechanics).
  8. Locking in other changes proposed in R1, including MDC serialization, escaping, etc. These can be treated as accepted. (Note that they only affect indexable output).

In addition, we expect to provide (as a Beijing deliverable) a minimal, synthetic component as an example of best-practices, and this will provide all code examples for this guide. (Does that mean the example will log via EELF, or will we end up with two variants?)

Pending Specification Work

iddateitemdetailsstatus

20180614MDC ClientIPAddressAsk question of OPS to remove this field - 20180419todo

20180614MDC ResponseCode / ResponseDescriptionexpand/find note 1*todo


Developer Guide

see separate page (cross releases) in Logging Developer Guide

  • No labels

142 Comments

  1. 20180412 - doing group edit to lockdown fields

    https://lists.onap.org/pipermail/onap-discuss/2018-April/009020.html

    Requirements (formalize into the wiki later)

    MDC's

    Note: for required response fields - these can be optional for logs that do not log a response


    Changes

    RequestID changes to TransactionID - proposal to finalize by 20180419 - involves significant code changes

    partnerName manditory? 5-10% of teams currently support - need it to differentiate between 3rd party calls inside components - like a JPA/REST library for example

    Verify that partnerName represents the granular microservice name so the ops team can target an individual container

    Review with OPS for sign off on removal of Server, ServerIP in leu of keeping ServerFQDN

    provide spec for onap team for what to write to the pair serverFQDN/clientIPAddress

    advise teams on duplication of ServerFQDN and VirtualServerName - these are or hybrid deployements


  2. TODO: for 15 TargetServiceName  -  look at INVOKE-return marker -

    Add scenarios for UCs for marker and MDC logs - per log type

    Determine if for C* release dual DEBUG/the-other-3 log files could be reduced from 4 to 2 if the Marker/MDC reporting makes the log type not required except for debug logs

    To be clear - 2 files (debug and everything else in a separate one)


    Discuss in terms of B/C upgrade path

    LOG-231 - Getting issue details... STATUS

    Epic for requestID to TransactionID - start discussion for cassablanca  LOG-232 - Getting issue details... STATUS


    ExternalID field discussion for hybrid environments:

    • use custom fields that the BSS can use for request/response tracking
    • David: embedded field

    meet 20180419 discussion

  3. We should include enough information in this specification so that log records that conform to it can be  translated into log records produced by components that are using the separate Audit and Metric log records in the original specification.   This is necessary for deployments that will include some components supporting each format. 

    We should required logs records to be written as follows (which is largely consistent with the discussion above):

    1. ENTRY: written as early in the invocation of an API as possible
    2. EXIT: written as late as in the invocation as possible (and MUST include the BeginTimestamp MDC)
    3. INVOKE: written just before an API on another component is called
    4. RETURN (or another name?) called just after the API in #3 returns.  (NOTE:  we need another timestamp here but can't use BeginTimestamp).

    1 and 2 can be used to produce the equivalent of an Audit log record.  3 and 4 can produce a metric log record. 





    1. Along these lines, when a combined logfile is used, should each record include a field specifying which log it belongs to (audit, debug, error, or metrics) so that the translation Lee mentions can be further facilitated?

    2. Clear examples of what an audit, debug, error, and metrics entry would look like (whether being logged in 4 separate files or in a combined file) would also be helpful.

      1. The need for a clear example of the log content keeps coming up.  Let's please try to get that assigned to someone in today's meeting.

  4. 20180426: Notes on spec meet

    Agenda:

    •  5+ files into 1-  Agreed 20180426  LOG-231 - Getting issue details... STATUS
      • pending - Need KPI metrics for before/after moving to 1 single log file - epic TBD
    •  Finalize MDC's - done except for the new INVOKE-RETURN
    • reorder the columns by priority
    •  Can we target specific logs like debug logs separate from say audit logs using markers/mdcs
    • Anything left on "derived" discussion
    • JIRA - Need example or definitions of each of the fields like PartnerName for example - either through a running test or a set of examples
    • JIRA - Need example use-case for Lee's comments on the INVOKE return (sync and async cases)
      • case is 
        • INVOKE
          • may need another MDC to keep the INVOKE time
          • Called function processes
        • INVOKE-return (time duration from (INVOKE-return - INVOKE) timestamps - report only - the ELK stack will do the diff 
        •  
    • and Lee's suggestion to add a duration timestamp in same comment
    • Finalize RequestID rename -  LOG-232 - Getting issue details... STATUS
    • Rename SYNCHRONOUS to INVOKE-S* and add INVOKE-RETURN
      • TODO: verify assumption that no upgrade path required because MARKERs are not in use yet
      • Discuss value of invoke-return for async calls (immediate return - but useful for tx tracking)
      • Verify async callback frameworks (comet, AJAX based, websocket) and how they use these return markers
      • Luke: finalize markers/mdcs for invocation including async scenarios 
    • Migration plan - Beijing to Casablanca upgrade epic - TBD
    • Timeline for Arch Subcommittee review - for Casablanca (Post RC1 - next thu?)
    • Lee Breslau ONAP operations CRUD  logs on VNFs - do we need a context sensitive MDC
      • Expressed as a list for 1:N multiple VNFS
      • added TargetElement - TODO add section to wiki, adjust beijing/Cass diff



  5. TargetEntiry and TargetServiceName fields are relevant to the INVOKE and INVOKE-return, so these fields could be tagged as "C" (context based), instead of "Y"

    1. editing now - thanks

      1. i quit the editing mode, i didn't make any changes yet

  6. notes: 20180503 review

    (10 participants) - screencap

    • dig up logback spec file that covered off some of our infrastructure/volumetrics...?
    • investigate values.yaml helm override we can suggest for teams
    • plan for LOG-231 mitigation 5 to 1 log files for Arch meet tue
    • order of fields changes green, yellow
    • David Bainbridge: Discussion on DB transaction confusion with the rename requestID to transactionID 
      • also need to nail down justification for change in LOG-232
      • AT&T rest standards ref: conversation id ? - Dave W.
      • Lee: need to cover off internal invocations
      • Correlation id
      • unless we have a very good reason continue to use request - Lee - defer - deferred for now until we document the change requirements
      • Sanjay -  in terms on intent along with the combined use with invocationID
    • verify epic on upgrade points
    • ops coverage - need more clear definition - example/in-use
    • take out discussion in description field into * note item at end for reference - description one liner very concise - Michael O'Brien
    • Q) Shishir - question about Arch subcommittee some sort of governance check  (like the license check) - we will need infrastructure in our project first - to be ab le to show an example.
    • onap-discuss/onap-tsc groups, presentation to arch, tsc, 80 ptls know about the new spec, how to use, impact, some migration to do in parts - 
    • service mesh proposal - logging component - part of general tracking, OPNFV has a proposal for CD framework that has logging - use as examples in how to approach the tsc.
    • know the spec exists
    • enforcement of the spec - by 5% example patch to project
  7. I've just put together a SLFJ reference example. The original purpose was to capture some representative call graph data for use in a diagram, but in order to do that, I've put all (or most, as we'll see) of the attributes through their paces. I'll commit the implementation to the logging-analytics project shortly (to logging-analytics/reference/slf4j-reference), but in the meantime, here's some feedback. (Brace yourselves.) 

    • There are still a LOT of attributes, even after the cull: 
      • The contract is burdensome.
      • Redundancy is more apparent when writing code than it was to all of us when editing a wiki, I fear.
    • There are groupings of attributes which aren't sufficiently evident from their naming:
      • Service:
        • ServiceName is OK, but its semantics are vague, and that means we'll continue to get wayward values.
        • InstanceUUID refers to the same "Service" as "ServiceName", but has no "Service" prefix.
        • ServicePath (i.e. something derivable from HttpServletRequest) would be more use than "ServiceName", and ServiceMethod wouldn't hurt, either. (But I'd rather fewer mandatory attributes, if I had to choose).
      • Responses:
        • There's ResponseCode, ResponseDescription and Severity.
        • Severity refers to the same response as "ResponseCode", but has no prefix.
        • StatusCode ditto.
      • Addresses:
        • ServerFQDN is quite specific – why an FQDN exactly? – and quite difficult to derive reliably.
        • ClientIPAddress is also quite specific, but different, and also quite difficult to derive.
        • Should be ServerAddress and ClientAddress.
    • The "Partner" in PartnerName is unintuitive:
      • It really means "Caller" or "Invoker", with the semantics of (I think) ServiceName.
      • But it's redundant in any case, since it's derivable from ServiceName + InvocationID.
    • Some attributes are very tempting to disregard:
      • Often this is the result of deliberately preserving redundancy:
        • For historical reasons. Maybe backward-compatibility. Maybe something more prosaic.
        • To assist with analysis in the absence of Elastic Stack.
        • But the cutoff is arbitrary, IMO. Some redundant attributes were retained. Some weren't.
      • But it makes for a lot of extra work, and that means they will be inconsistently set, resulting in noise.
      • So:
        • Drop INVOKE-RESPONSE. We already log EXITING, and Invocation* and Response* attributes.
        • Drop TargetEntity, derivable.
        • Drop TargetServiceName: derivable.
        • Drop one of EntryTimestamp and InvokeTimestamp: derivable.


    1. Luke,

      Two comments:

      1.  I believe some of the fields you propose eliminating are needed to support the operational requirements we have.   We should discuss further to see if we are disagreeing about what's required or how to achieve it.
      2. Related comment: even if some information is derivable from an associated log record using Invocation or Request ID, there still might be a case for including it.  Specifically, from an operational standpoint (and in our experience) we can't always assume that the corresponding log record with the derivable information is available.  What I mean by this is that to some extent (and we can discuss to what extent) log records ought to be "standalone", or at least ought to have some minimal information as to make them meaningful when they do stand alone.

      Lee

      1. Hi Lee,

        Quite right, and I do understand, and didn't mean to come on so strong. We've been focused on the specification, and I was giving feedback from the perspective of a developer going about implementing it from the bottom up.
           
        Distilling those points down a little:

        • Each attribute incurs a cost, imposes constraints, with implications for compliance. Most are easy. A few are quite hard.
        • Lots of attributes, tricky semantics, so if we're making changes already then it couldn't hurt to classify them into groups and name them accordingly. It's easier to think and organize code around groups of related attributes, and they do sort quite nicely into IDs, service, addresses, response.

        Thanks,
        Luke

  8. From 20180515 meet

    discussion on async invoke - as in providing a mechanism to provide for async callbacks (websocket like...)

    relax requirements on caller - defer to responder to mark the return but not necessarily via callback to originator

    In terms of tracking the invocationId



    TODO: merge duplicate MDC tables

    TODO: group based on luke's: IDs, service, addresses, response

    Review service mesh email on logging overlap


  9. This page suggests, "Consider using EELF."  Please explain the case for EELF.

    I see the current initiative to use just a couple of log files, not segregated into audit/metrics/error/application files as per EELF.  I also see the current initiative to use markers, which are not supported by EELF.

    I'm trying to say, these facts seem like a strong argument against using EELF. Using SLF4J looks like the most straightforward approach to meeting these logging specifications.

    Thanks in advance for comments & guidance.

    1. I agree with going straight SL4J approach, overall I find EELF overbearing and confusing.  It is mostly just a limited implementation.

      1. My preference, too. The "Consider using EELF" was carefully worded. EELF is idiosyncratic, but many existing components use it, and that won't change any time soon.

        The spec is behavioral, and the aim is to be compatible with any library or language. There are still a few gray areas, like EELF's multiple logfiles and hardwired (e.g. com.att.eelf.audit) logger names, and these take time to resolve because nobody's entirely sure who or what depends on them. They can largely be spun out into provider configuration, though, which in theory means you can safely multiplex output into a single file, since the provider configuration can be overridden to restore EELF-like behavior where required. (Like I say though, it's still a gray area).

        There's a nascent reference implementation for SLF4J in logging-analytics/reference/logging-slf4j (logging-analytics/reference/slf4j-reference pending https://gerrit.onap.org/r/#/c/48433/) which might be of some interest. It has few dependencies and aims to be a very light touch, but you can also use it as a (Casablanca) example of SLF4J logging without incurring anything.

  10. I added a detailed toy example on the markers - EELF (audit/metric) equivalent implementation for discussion. 

    MarkersEELF Equivalent
    Entry | ExitAudit
    Invoke | Invoke-ReturnMetric

    A quick description on the following example:

    Assume a request is going through three components: A, B and C,  where C calls A to invoke B to do something. With EELF log implementation, A will generate one audit log entry and one metric log entry, and B will generate one audit log entry corresponding to A's metric log. The requestID (r1) will be carried through in the whole transaction. 

    With Marks/MDCs implementation, we are showing the EELF equivalent below, with invocationID and markers. 

    Thanks,

    Xuan

    1. Xuan, thanks for putting together and posting this example.   I think it embodies several of the points we've discussed.  Key among them: this shows why we need both the EntryTimestamp and InvokeTimestamp MDCs.

      Another point that may be implicit in your example but could be with calling out:  t1 < t2 < t3 . . . . .   which shows the sequence of events that generate these particular log records.

  11. Thanks Xuan, I concur with Lee's comments.

    For marker,  let’s have two phase approach.

    Phase 1: Casablanca release – Introduce marker concept in Casablanca release. All ONAP components comply with Casablanca logs specs (MCDs) and use marker concept in logs. We will still writes logs in 4 separate files (audit, metrics, error, debug). At end of Casablanca release we will evaluate marker concepts pros/cons.

    Phase 2: Based on Casablanca marker concept evaluation, we will execute the file consolidation in next release.

  12. This page still refers to X-TransactionID as the HTTP header that will carry the RequestID in REST requests.  I thought we had decided a while back that we would use X-ONAP-RequestID as the HTTP header.  Has that been re-thought or was the change never made to this page?

  13. We had an internal AT&T meeting today and discussed the need to clearly state the expectation that RequestID be propagated across all interfaces, not just REST Interfaces.  We may also wish to consider specifying how that should be done for currently known non-REST interfaces, such as via the DMaaP Message Router and DMaaP Data (File) Router.

  14. We have another internal AT&T discussion  on PartnerName and ServiceName definitions. Ops and development teams recommendation on definitions.

    • PartnerName – The identification of the entity that made the request being served.  For a serving API that is authenticating the request, this should be the authenticated username or equivalent (e.g. an userid or a mechid).  For a non-authenticating API, this should be the component or subcomponent name (the existing Casablanca spec provides a description for this case).
    • ServiceName – The URI that the caller used to make the call to the component that is logging the message.
  15. Here are updated definitions based on Ops and development team input

    • TargetEntity - It contains the name      of the ONAP component or sub-component, or external entity, at which the      operation activities captured in this metrics log record is invoked.
    • TargetServiceName - It contains the name  of the API or operation activities invoked (name on the remote/target application) at the TargetEntity.  Example:  Class name of rest endpoint Discussion: on building call graph vs human      readable single line - keep for human readable. Used as valuable URI - to      annnote invoke marker. Review in terms of Marker-INVOKE - possibly add      INVOKE-return - to filter reporting.
    • TargetElement - VNF/PNF      context dependent - on CRUD operations of VNF/PNFs.

           The IDs that need to be covered with the above Attributes are

           -        VNF_ID OR VNFC_ID : (Unique identifier for a VNF asset that is being instantiated or that would generate an alarms)

           -        VSERVER_ID OR VM_ID (or vmid): (Unique identified for a virtual server or virtual machine on which a Control Loop action is usually taken on, or that                   is installed  as part of instantiation flow)

           -        PNF : (What is the Unique identifier used within ECOMP that we would like used in ONAP)

     

    • PartnerName – The identification of the entity that made the      request being served.  For a serving API that is authenticating the      request, this should be the authenticated username or equivalent (e.g. an      attuid or a mechid).  For a non-authenticating API, this should be      the component or subcomponent name (the existing Casablanca spec provides a description      for this case).
    • ServiceName – The URI that the caller used to make the call to the      component that is logging the message.
    • RequestID  See comments from David Williamson above - “an internal AT&T meeting discussed      the need to clearly state the expectation that RequestID be propagated      across all interfaces, not just REST Interfaces.”

     

  16. Example output from Luke Parker 's Marker example junut tests in logging-slf4j-demo

    -------------------------------------------------------
     T E S T S
    -------------------------------------------------------
    Running TestSuite
    2018-06-07T13:54:01.308Z	main	INFO	org.onap.logging.ref.slf4j.CallGraphTest		Starting.			
    {
        "severity": "severity0",
        "code": "code0",
        "service": "service0",
        "requests": [
            {
                "severity": "severityA",
                "code": "codeA",
                "service": "serviceA",
                "requests": []
            },
            {
                "severity": "severityB",
                "code": "codeB",
                "service": "serviceB",
                "requests": []
            }
        ]
    }
    {"severity":"severity0","code":"code0","service":"service0","requests":[{"severity":"severityA","code":"codeA","service":"serviceA","requests":[]},{"severity":"severityB","code":"codeB","service":"serviceB","requests":[]}]}
    {
        "severity": "severity0",
        "code": "code0",
        "responses": [
            {
                "severity": "severityA",
                "code": "codeA",
                "responses": []
            },
            {
                "severity": "severityB",
                "code": "codeB",
                "responses": []
            }
        ]
    }
    {"severity":"severity0","code":"code0","responses":[{"severity":"severityA","code":"codeA","responses":[]},{"severity":"severityB","code":"codeB","responses":[]}]}
    2018-06-07T13:54:01.485Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.486Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "beta",\n    "requests": [\n        {\n            "service": "gamma",\n            "requests": []\n        },\n        {\n            "service": "gamma",\n            "requests": []\n        }\n    ]\n}			
    2018-06-07T13:54:01.489Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta].			
    2018-06-07T13:54:01.491Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	0f1ef603-23fd-4208-830b-493d2738e0b1		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.492Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.493Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "gamma",\n    "requests": []\n}			
    2018-06-07T13:54:01.494Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma].			
    2018-06-07T13:54:01.495Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	17db5764-d971-4866-91a0-2b847ad5c5c3		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.496Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=17db5764-d971-4866-91a0-2b847ad5c5c3, InvokeTimestamp=2018-06-07T09:54:01.496, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.497Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=17db5764-d971-4866-91a0-2b847ad5c5c3, Severity=, InvokeTimestamp=2018-06-07T09:54:01.496, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.498Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "gamma",\n    "requests": []\n}			
    2018-06-07T13:54:01.499Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma].			
    2018-06-07T13:54:01.500Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	2756834d-e7e4-46ed-ac33-c7215f7f6646		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.500Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=2756834d-e7e4-46ed-ac33-c7215f7f6646, InvokeTimestamp=2018-06-07T09:54:01.500, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.501Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=2756834d-e7e4-46ed-ac33-c7215f7f6646, Severity=, InvokeTimestamp=2018-06-07T09:54:01.500, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.501Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=0f1ef603-23fd-4208-830b-493d2738e0b1, Severity=, InvokeTimestamp=2018-06-07T09:54:01.492, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.501Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "gamma",\n    "requests": []\n}			
    2018-06-07T13:54:01.502Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma].			
    2018-06-07T13:54:01.502Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	df450393-d359-43e3-b23f-95d3e0f8ebad		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.503Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=df450393-d359-43e3-b23f-95d3e0f8ebad, InvokeTimestamp=2018-06-07T09:54:01.503, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.504Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=df450393-d359-43e3-b23f-95d3e0f8ebad, Severity=, InvokeTimestamp=2018-06-07T09:54:01.503, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.504Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "delta",\n    "requests": []\n}			
    2018-06-07T13:54:01.505Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.delta.ComponentDelta].			
    2018-06-07T13:54:01.507Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	dfb525dd-33a0-45db-925d-440256589570		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.507Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.delta.ComponentDelta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=dfb525dd-33a0-45db-925d-440256589570, InvokeTimestamp=2018-06-07T09:54:01.507, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.508Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.delta.ComponentDelta	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=dfb525dd-33a0-45db-925d-440256589570, Severity=, InvokeTimestamp=2018-06-07T09:54:01.507, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.508Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "beta",\n    "requests": [{\n        "service": "gamma",\n        "requests": [{\n            "service": "delta",\n            "requests": []\n        }]\n    }]\n}			
    2018-06-07T13:54:01.509Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta].			
    2018-06-07T13:54:01.510Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	dc07b2de-1aba-4f2d-b490-8ff3b41c4a5e		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.510Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=dc07b2de-1aba-4f2d-b490-8ff3b41c4a5e, InvokeTimestamp=2018-06-07T09:54:01.510, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.511Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=dc07b2de-1aba-4f2d-b490-8ff3b41c4a5e, InvokeTimestamp=2018-06-07T09:54:01.510, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "gamma",\n    "requests": [{\n        "service": "delta",\n        "requests": []\n    }]\n}			
    2018-06-07T13:54:01.511Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=dc07b2de-1aba-4f2d-b490-8ff3b41c4a5e, InvokeTimestamp=2018-06-07T09:54:01.510, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma].			
    2018-06-07T13:54:01.512Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=dc07b2de-1aba-4f2d-b490-8ff3b41c4a5e, InvokeTimestamp=2018-06-07T09:54:01.510, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	5bc2370b-e8a9-4bfa-b707-cd65437d25ee		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.512Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=5bc2370b-e8a9-4bfa-b707-cd65437d25ee, InvokeTimestamp=2018-06-07T09:54:01.512, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.512Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=5bc2370b-e8a9-4bfa-b707-cd65437d25ee, InvokeTimestamp=2018-06-07T09:54:01.512, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Executing in-process:\n{\n    "service": "delta",\n    "requests": []\n}			
    2018-06-07T13:54:01.513Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=5bc2370b-e8a9-4bfa-b707-cd65437d25ee, InvokeTimestamp=2018-06-07T09:54:01.512, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	Invoking in-process [org.onap.logging.ref.slf4j.demo.component.delta.ComponentDelta].			
    2018-06-07T13:54:01.514Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=5bc2370b-e8a9-4bfa-b707-cd65437d25ee, InvokeTimestamp=2018-06-07T09:54:01.512, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost	f7d6d048-9c94-4887-878d-cd2edc8a74a1		INVOKE [ SYNCHRONOUS ]	
    2018-06-07T13:54:01.515Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.delta.ComponentDelta	InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, InvocationID=f7d6d048-9c94-4887-878d-cd2edc8a74a1, InvokeTimestamp=2018-06-07T09:54:01.515, PartnerName=service.gamma, ClientIPAddress=127.0.0.1, ServerFQDN=localhost			ENTRY	
    2018-06-07T13:54:01.516Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.delta.ComponentDelta	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=f7d6d048-9c94-4887-878d-cd2edc8a74a1, Severity=, InvokeTimestamp=2018-06-07T09:54:01.515, PartnerName=service.gamma, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.516Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.gamma.ComponentGamma	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=5bc2370b-e8a9-4bfa-b707-cd65437d25ee, Severity=, InvokeTimestamp=2018-06-07T09:54:01.512, PartnerName=service.beta, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.516Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.beta.ComponentBeta	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=dc07b2de-1aba-4f2d-b490-8ff3b41c4a5e, Severity=, InvokeTimestamp=2018-06-07T09:54:01.510, PartnerName=service.alpha, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    2018-06-07T13:54:01.517Z	main	INFO	org.onap.logging.ref.slf4j.demo.component.alpha.ComponentAlpha	ResponseCode=, InstanceUUID=5586d8ef-b5ee-4947-b0c6-d839c268f153, RequestID=35b02a31-6d41-4483-8cf7-6e0325419419, ServiceName=, ResponseDescription=, InvocationID=21c16aab-fd0c-4eb0-9b73-837349d20c4a, Severity=, InvokeTimestamp=2018-06-07T09:54:01.473, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=localhost, StatusCode=			EXIT	
    {
        "severity": "INFO",
        "code": "alpha.OK",
        "service": "alpha",
        "responses": [
            {
                "severity": "INFO",
                "code": "beta.OK",
                "service": "beta",
                "responses": [
                    {
                        "severity": "INFO",
                        "code": "gamma.OK",
                        "service": "gamma",
                        "responses": []
                    },
                    {
                        "severity": "INFO",
                        "code": "gamma.OK",
                        "service": "gamma",
                        "responses": []
                    }
                ]
            },
            {
                "severity": "INFO",
                "code": "gamma.OK",
                "service": "gamma",
                "responses": []
            },
            {
                "severity": "INFO",
                "code": "delta.OK",
                "service": "delta",
                "responses": []
            },
            {
                "severity": "INFO",
                "code": "beta.OK",
                "service": "beta",
                "responses": [{
                    "severity": "INFO",
                    "code": "gamma.OK",
                    "service": "gamma",
                    "responses": [{
                        "severity": "INFO",
                        "code": "delta.OK",
                        "service": "delta",
                        "responses": []
                    }]
                }]
            }
        ]
    }
    READING LOGFILE: /var/folders/mk/90_85fd56s1_qmrlx_cp_ds80000gn/T/org.onap.logging.ref.slf4j.CallGraphTest8668247085911281802/output.log
    ----
    Graph:
    
    LogEntry(markers=ENTRY, logger=ComponentAlpha, requestID=[...]25419419, invocationID=[...]49d20c4a, timestamp=2018-06-07T09:54:01.485-04:00)
        LogEntry(markers=ENTRY, logger=ComponentBeta, requestID=[...]25419419, invocationID=[...]2738e0b1, timestamp=2018-06-07T09:54:01.492-04:00)
            LogEntry(markers=ENTRY, logger=ComponentGamma, requestID=[...]25419419, invocationID=[...]7ad5c5c3, timestamp=2018-06-07T09:54:01.496-04:00)
            LogEntry(markers=ENTRY, logger=ComponentGamma, requestID=[...]25419419, invocationID=[...]5f7f6646, timestamp=2018-06-07T09:54:01.500-04:00)
        LogEntry(markers=ENTRY, logger=ComponentGamma, requestID=[...]25419419, invocationID=[...]e0f8ebad, timestamp=2018-06-07T09:54:01.503-04:00)
        LogEntry(markers=ENTRY, logger=ComponentDelta, requestID=[...]25419419, invocationID=[...]56589570, timestamp=2018-06-07T09:54:01.507-04:00)
        LogEntry(markers=ENTRY, logger=ComponentBeta, requestID=[...]25419419, invocationID=[...]b41c4a5e, timestamp=2018-06-07T09:54:01.510-04:00)
            LogEntry(markers=ENTRY, logger=ComponentGamma, requestID=[...]25419419, invocationID=[...]437d25ee, timestamp=2018-06-07T09:54:01.512-04:00)
                LogEntry(markers=ENTRY, logger=ComponentDelta, requestID=[...]25419419, invocationID=[...]dc8a74a1, timestamp=2018-06-07T09:54:01.515-04:00)
    
    ----
    2018-06-07T13:54:03.552Z	main	INFO	org.onap.logging.ref.slf4j.CallGraphTest		Ending.			
    Should be deleting [/var/folders/mk/90_85fd56s1_qmrlx_cp_ds80000gn/T/org.onap.logging.ref.slf4j.CallGraphTest8668247085911281802]...
    Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.92 sec - in TestSuite
    
    
    
  17. Acumos logging meet

    • Definitely close MDC spec table for the next log meet
    • fill out marker tables
    • Look more at ELK as a Service - start with common ELK charts (with Clamp)
    • MDC sync (core section + extensions) - ident duplication
      • onap, acumos specific MDC's
      • finish off optional/derived MDC's for next log meet - 
    • fill out dev system instructions
    • work with clamp, techmahindra
    • Use case driven - drive onap side with UC like Acumos side to derive MDC cross coverage
    • Address use case and code links from required MDC table entries - 
    • Discuss with SDC on logback issue - Michael O'Brien to discuss with Michael Lando
    1. Clamp meeting: they don't need ELK as a Service but will collaborate with common charts

  18. Thanks for the example.

    Regarding INVOKE records that contains 2 invocation ID's (The old one is in the MDC, the new one follows INVOKE), would it be more consistent if both invocation IDs are in the MDCs (the new one as a conditional field)?

    1. this makes sense - I will prototype this

    2. Horace, 

      Thanks for pointing out the two invocation IDs in the example. I updated my previous example with a new invocationID added to the Invoke/Invoke-return log record. So the question is whether we will define a new attribute in MDC "InvocationIDOut" (or some other name making more sense) to the spec, and this new field only applies for the logs with Invoke/Invoke-return markers. 


  19. 20180614: notes

    • in terms of 1 log file direction - mark mdcs as not applicable - required but based on which log writing to
    • expand on using marker label (DEBUG ie) to differentiate logs when in 1 file
  20. 220180621: Notes

    add links to actual library code (Java is more mature for now) https://git.onap.org/logging-analytics/tree/reference/logging-slf4j

    https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogAdapter.java

    via

    https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/test/java/org/onap/logging/ref/slf4j/ONAPLogAdapterTest.java

    and flow related

    https://git.onap.org/logging-analytics/tree/reference/logging-slf4j-demo/src/test/java/org/onap/logging/ref/slf4j/demo/component/AbstractComponentTest.java


    add links to devops page Logging User Guide#LoggingDevOps and ONAP Development#KubernetesDevOps

    q) other wrappers in additions to slf4j - yes we can add other wrappers

    Q) on phased (two logstash configs ) -  LOG-493 - Getting issue details... STATUS

    todo: remember Acumos operations use case refs and marker scenarios including when applicable in devops section

    There will be updates to public acumos use cases - do a 2nd pass

    Q) on multiple logback.xml in each project

    • try to get them working first - see JIRA
    • but then do we want to generate them based on templates?
    • maybe as part of common charts for elk - we have a pluggable xml per component - based on if we change the sidecar/deamonset
    • Note: follow ciman yaml plugin pattern for jjb
    • Add EPIC/story maybe with pluggable - 

    TODO email Shishir Thakore the security meet and aaf meet

    add code references to the right side of each table entry into the junit test for now.


    Work items

    • use the log library -  LOG-115 - Getting issue details... STATUS  
    • Deploy/test/debug the library in use -  LOG-137 - Getting issue details... STATUS
    • expand on security use cases -  LOG-409 - Getting issue details... STATUS  with techmindra and clamp
    • look at  LOG-494 - Getting issue details... STATUS
    • raise oom shadow jiras for 481/483 and expand with epic for all components not just portal - for filebeat/elasticsearch yamls - verify by attempting to modify each on in the ubuntu user

    • Document use case column in the table from acumos
    • attendees
  21. Some comments:

    • Section numbers or # tags for href would help me refer to specific sections of the requirements from User Stories
    • In its current format, the risk of skimming and coming away with the wrong requirements is a bit high
      • I would consider separating out normative from informative content
      • Make clear in tables, etc current fields and what has been obsoleted or deprecated. Note that folks can refer to the old document if they really want more history
    • It might be useful to separate out
      • Requirements and here is a library that will help
      • Information you need to ensure people enter (REST APIs, etc) so you can log correctly.
    1. There is an evolving section of using/deploying/debugging the library - parts of this as it is finalized/tested/verified will be retrofitted here

      ONAP Development#DeveloperUseoftheLoggingLibrary

      to

      LoggingLibraryLocationandUse

      off 

      LOG-115 - Getting issue details... STATUS

      LOG-137 - Getting issue details... STATUS




    2. I think # tags are being used: MDC-RequestID

      1. Also spring AOP example in progress - ONAP Development#LoggingWithAOP under  LOG-135 - Getting issue details... STATUS

  22. One dumb question, according to this page, slf4j seems to be the best choice to support all these MDCs and Markers in order to fulfill our requirements . However, our code is written in python, is the modules of slf4j supported by python ? Or is there any replaceable ones?  Thanks in advance.

    Best Regards,

    Ruoyu

    1. Ruoyu, Hi we are still finalizing the specification adherence in the python library - submitted by the Multicloud team at https://git.onap.org/logging-analytics/tree/pylog

      1. Hi Michael, do you happen to know when the Python library is expected to be finalized?

        1. I need to learn more python first and retrofit changes after the java work is completed

          1. What's the expectation from Python based applications for logging in Casablanca release?

            1. Michael O'Brien, OOF was planning to make some progress along the adherence to logging 1.2 spec for Casablanca. Since we are a python project, the readiness of the Python library is a key turning point for us. We could still do Level 1 and implement a local library/wrapper, but I'm worried it may be wasted efforts if the python library is close to being rolled out (in Casablanca or Dublin). Given this, I'd be grateful for your recommendation for our project, given the time left in Casablanca.

  23. The spec is inconsistent in its references to the metric/metrics log.  In some places, metric is used; in other places, metrics.  Which is it?  I do not think it matters much which we use, rather we need to be completely consistent in the spec so that we at least have a chance of consistency in implementation.


    1. https://github.com/att/EELF/blob/master/EELF/src/main/java/com/att/eelf/configuration/EELFLogger.java and most provider config calls it "metrics", so I guess that's what we should standardize on. 

  24. TODO: eclEmma coverage details and howto, using "recheck" to rerun oom job builder timing errors

  25. Acumos meeting minutes: 20180719: 

    • expand out the technologies in deployment, dev, testing
    • expand on what is spec and what is optional (example tabs in marker logback.xml)
    • expand on onap specific - vs acumos/onap agnostic
    • optional/required API specification (should match log spec) - expand on 2 levels of required (derivable)
    • expand on conditions where setting an mdc (if unset) will vary
    • summary: 3 points - to assist with Acumus TSC meet
      • consumer of API
      • provider of API
      • spec of the API
  26. Acumos meet 20180802

    • cover off application.log (merged)
    • mdcs/kvp are required but markers are not - as a minimum compliance - this would make the AOP library optional (it mostly helps with ENTRY/EXIT markers) - as was mentioned - we are not consuming a library that we are designing on the fly
    • need granular m:n AOP pointcuts - currently only one exists on "org.onap.*" which is configurable via parameter
  27. The following fields appear to be missing from the table:

    1. The "Log Detailed Message" itself
    2. Thread ID (I see it in the config examples, so it seems to still be expected)
    3. Logger Name (or class name)

    For clarity, if these are indeed expected, then they should probably be in the table.

    1. We will need to discuss this in both the logging spec meeting for onap at 11am tue and the acumos ptl and log meetings (9am tue, 1130 thu) - since they follow the onap spec with modifications

      https://wiki.acumos.org/display/OAM/Log+Standards

      added "Message" MDC

      did not add "ThreadID" - because it should be defined in the logback.xml as a regular field

      TODO: Michael O'Brien add section on common attributes (not MDCs) but manditory via provider api's (logback.xml)  - timestamp, logger, level, threadid

      example where we have timestamp, thread, level and logger (based on classname in this example)

      2018-07-09T14:48:01.014Z    http-nio-8080-exec-8    INFO    org.onap.demo.logging.ApplicationService 

      via

      LoggerFactory.getLogger(this.getClass()).info("Running /health");


      1. I was just talking to the ASDC team and they want to know if there is a reason why "message" should be an MDC.  Does it really need to be saved for later?  Not sure if perhaps there are scenarios where this makes sense and thus it should be an MDC.

        1. Per today's meeting, I understand that "Message" will be added to the table of common attributes and will not be added as an MDC.

  28. With regard to ServerFQDN, this may be difficult to obtain inside a container.  For example, within a pod, calls to hostname return the name of the pod, not the name of the VM or server on which the pod is running.  Within a Docker container, hostname returns the id of the container.  Thus, the VM or server name is essentially hidden from inside the pod and the container.

    This spec currently defines the content of ServerFQDN as follows:

    "This field contains the Virtual Machine (VM) Fully Qualified Domain Name (FQDN) if the server is virtualized. Otherwise, it contains the host name of the logging component."

    What about inside a Pod or a container?

    Two possibilities I can think of:

    1. Log as ServerFQDN whatever hostname returns; but in the case of containers it is not clear to me if these likely ephemeral names are going to have any meaning, especially after the pod/container that has that name is terminated
    2. Require that pods and containers be started with an environment variable that contains the VM or server FQDN and then have that picked up and logged; this would, unfortunately, require changes outside of logging to accommodate

    Other ideas beyond these two are certainly welcome.

    What I think is missing from my understanding is what an operations team for an ONAP deployment might be expecting to do with this information.  In particular, #1 above seems the easiest from the perspective of getting the information and not impacting anything outside of logging, but is it sufficient?  If it is, then I would say we should go with #1.

    1. expanding on container level for ServerFQDN

      vm (no docker)

      vm<delimeter>,dockerid - (docker-compose)

      vm<delimeter>,pod<>containerID (kubernetes)  - what about namespace

      example

      onap          onap-log-elasticsearch-7557486bc4-9h7gf    1/1       Running   0          7d

       - only 9h7gf varies on restarts

      expand the field's description - likely just return whatever the is output from inside the container

      Need to show production level vm name (plus) - currently running in dev only getting "localhost"


      2018-07-09T14:48:01.014Z    http-nio-8080-exec-8    INFO    org.onap.demo.logging.ApplicationService   
       InstanceUUID=67bc4b12-56a1-4b62-ab78-0c8ca8834383,
       RequestID=023af35a-b281-49c4-bf13-5167b1453780,
       ServiceName=/logging-demo/rest/health/health,
       InvocationID=94dc9e24-3722-43e5-8995-12f95e153ca3,
       InvokeTimestamp=2018-07-09T14:48:01.008Z,
       PartnerName=,
       ClientIPAddress=0:0:0:0:0:0:0:1,
       ServerFQDN=localhost   Running /health
      1. Based on the investigation Michael has done here, I think I feel comfortable with the idea of logging whatever hostname returns.  This is simple and appears to be reasonably useful.  I would like to see us finalize this during the ONAP logging call tomorrow if at all possible.

        1. Updating based on testing on a single node k8s server and a clustered k8s system to see if serverFQDN returns the vm name

          currently I am getting the node that the pod is running on - but I should get dev0, dev1 for clustered nodes

          ServerFQDN=dev.onap.info

          note this is just the k8s service itself - not the pod

          ClientIPAddress=10.42.0.1

          ubuntu@ip-172-31-48-194:~$ kubectl get services --all-namespaces

          default       kubernetes             ClusterIP   10.43.0.1       <none>        443/TCP          61d



  29. With regard to InstanceUUID, the SDC team has raised some interesting points with regarding to InstanceUUID inside a container:

    1. Why does it have to be a UUID?  The Pod ID is already unique (for at least a reasonable period of time, perhaps a year) and so could the Pod ID be used?
    2. If this really must be a UUID, requiring each component to write their own code to generate an ID on Pod startup, get it propagated so as to be usable within the Pod, and then picked up so that it will appear in each log message seems like something that would be better done as part of the framework.

    Note that if #1 is pursued, this could possibly duplicate what is reported in ServerFQDN if the other question I just posted is resolved to provide the Pod ID in ServerFQDN.

    1. good point the pod id is unique enough

      onap-log-elasticsearch-7557486bc4-9h7gf 

      plus the actual container - if there are sidecars

      choose one of: [logdemonode-filebeat-onap logdemonode]

      verify

       node / pod / container - levels


      should we make UUID optional and include pod ID for InstanceUUID - yes (pass in kubernetes id if available

      Michael O'Brien todo - fix the table and description for InstanceUUID - keep the name



  30. Will the ONAP reference implementation library https://git.onap.org/logging-analytics be released anytime soon?  I see version 1.2.2 in the ONAP staging repository but not in the releases repository.

    1. I would expect we will run the release artifacts when we pass RC2 for casablanca - for now you can use the lastest snapshot jar in the staging repo.

      1. I can use the artifact locally sure, but in the Acumos project at LF, the Jenkins settings are configured to ignore staging repos. Acumos doesn't have the special one-off fix to the nexus server :/ so having a bona fide released library is important.

        How about releasing v 1.2.2, and when you need to change it, bump up the version and release again?  Doesn't cost any extra ..

  31. How do you log ClientIpAddress for scheduled tasks?  Do you just want something like 127.0.0.1? Since it is an internally started item?

    1. Yes, IP address may not really resolve to anything usable inside a k8s cluster - it is more for hybrid deployments where the client is outside the cluster - in the case of a k8s deployment it currently resolves to 10.42.0.1

  32. Partner name description seems off unless I am reading it wrong?  Is it really the target uri?

    1. Right, thanks for the catch

      in unauthenticated calls - the service/agent (part of the url) making the call

      in authenticated calls - the userid

      adjusted table text


  33. Is there an obsolescence chart, where it kind of describes what was removed, or has been replaced by a new MDC value?  I find it difficult to look at what I was previously logging, then the new format and trying to align it all.

    1. good idea, was using the diff in wiki - I will put back the deleted fields - more so since the last 2 weeks where we were talking about implementing a phased approach (no all or nothing)

  34. Adding larger logging committers and onap to acumos mail.



    From: LOTT, CHRISTOPHER M (CHRISTOPHER) <clott@research.att.com>
    Sent: Friday, August 17, 2018 9:51 AM
    To: MANTHAN, MUKESH <MM00542237@techmahindra.com>; GUPTA, PARICHAY <PG0057291@techmahindra.com>
    Cc: SULLIVAN, BRYAN L (BRYAN L) <bryan.sullivan@research.att.com>; Michael O'Brien <Frank.Obrien@amdocs.com>; SUBRAMANIAN, NATARAJAN <Natarajan.Subramanian@techmahindra.com>; TALASILA, MANOOP (MANOOP) <talasila@research.att.com>; Dooze Philippe IMT/ Oln <philippe.dooze@orange.com>; DEY, SPONDON <sd4351@att.com>; JACOBSON, GUY J (GUY J) <guy@research.att.com>; WELCH, LORRAINE A <lb2391@att.com>; SARKAR, ANANDARUP (ANANDARUP) <asarkar@research.att.com>; UKASICK, AIMEE L (AIMEE L) <aimeeu@research.att.com>
    Subject: Sample logging code and logback.xml for possible Acumos component use
    Importance: High

     

    Good morning, I didn't have time to attend all your committee's meetings, but based on your wiki page

     

    https://wiki.acumos.org/display/OAM/Logging

     

    I started working with ONAP-style logging in the CDS server component.  I reused code and config from the ONAP log-analytics project 

     

       https://git.onap.org/logging-analytics/tree/reference/logging-slf4j

     

    A digression: I tweaked this some, I didn't feel this was 100% drop-in usable because of the "ONAP" prefix in several of their HTTP request header names, which probably isn't right for Acumos.

     

    Anyhow for Spring-Boot users, I can offer sample code to add some mapped diagnostic context (MDC) values on each REST request.  Those MDC key-value pairs then become available to the log output pattern. Requires no change to individual controllers, which may be an accelerator for some projects. The key class is below. You can fetch the rest from the CDS project gerrit repository.

     

    I also can offer a logback.xml config file, see below.  Now, bear with me, mine uses the tab-separated MDC-key-value format that your committee seems to have decided should be postponed to release 2/B.  I happen to like that format, think it's a lot prettier :) than the 30+ column pipe-delimited fixed format that's proposed for release A, so maybe this is not of value to you.  But please note that the output format is ENTIRELY determined by the logback.xml file, so log-generating components can choose fixed or variable columns trivially, just drop in the right file in one place.  But I don't know the OAM impact - Parichay, please tell me how difficult it is for you to accommodate multiple log formats in OAM.

     

    Just wanted to toss in my $0.02 and code samples, HTH

     

    ---

     

    /**

     * Adds request details to the mapped diagnostic context (MDC) so they can be

     * logged. <BR>

     * http://www.devgrok.com/2017/04/adding-mdc-headers-to-every-spring-mvc.html

     */

    public class LoggingHandlerInterceptor extends HandlerInterceptorAdapter {

     

    /**

    * Tracks the set of keys added to MDC.

    */

    private ThreadLocal<Set<String>> storedKeys = ThreadLocal.withInitial(() -> new HashSet<>());

     

    /**

    * Copies key-value pairs from HTTP request to MDC context. Unfortunately they

    * use different conventions for key naming.

    */

    @Override

    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)

    throws Exception {

     

    addKey(ONAPLogConstants.MDCs.SERVER_FQDN, InetAddress.getLocalHost().getCanonicalHostName());

    addKey(ONAPLogConstants.MDCs.CLIENT_IP_ADDRESS, request.getRemoteAddr());

    final String requestId = request.getHeader("X-Request-ID");

    if (requestId != null)

    addKey(ONAPLogConstants.MDCs.REQUEST_ID, requestId);

    return true;

    }

     

    private void addKey(String key, String value) {

    MDC.put(key, value);

    storedKeys.get().add(key);

    }

     

    // request ended on current thread remove properties

    @Override

    public void afterConcurrentHandlingStarted(HttpServletRequest request, HttpServletResponse response, Object handler)

    throws Exception {

    removeKeys();

    }

     

    @Override

    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex)

    throws Exception {

    removeKeys();

    }

     

    private void removeKeys() {

    for (String key : storedKeys.get())

    MDC.remove(key);

     

    storedKeys.remove();

    }

    }

     

    ---

     

    <?xml version="1.0" encoding="UTF-8"?>

    <!--

      ===============LICENSE_START=======================================================

      Acumos

      ===================================================================================

      Copyright (C) 2017 AT&T Intellectual Property & Tech Mahindra. All rights reserved.

      ===================================================================================

      This Acumos software file is distributed by AT&T and Tech Mahindra

      under the Apache License, Version 2.0 (the "License");

      you may not use this file except in compliance with the License.

      You may obtain a copy of the License at

       

      http://www.apache.org/licenses/LICENSE-2.0

       

      This file is distributed on an "AS IS" BASIS,

      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

      See the License for the specific language governing permissions and

      limitations under the License.

      ===============LICENSE_END=========================================================

      -->

     

    <configuration scan="true" scanPeriod="3 seconds" debug="false">

     

    <!-- component name is log file basename -->

    <property name="componentName" value="cmn-data-svc"></property>

     

    <!-- Use a subdirectory for ease of mounting a volume in docker -->

    <property name="logDirectory" value="logs" />

     

    <property name="p_tim" value="%d{&quot;yyyy-MM-dd'T'HH:mm:ss.SSSXXX&quot;, UTC}"/>

    <property name="p_thr" value="%thread"/>

    <property name="p_lvl" value="%level"/>

    <property name="p_log" value="%logger"/>

    <property name="p_mdc" value="%replace(%replace(%mdc){'\t','\\\\t'}){'\n', '\\\\n'}"/>

    <property name="p_msg" value="%replace(%replace(%msg){'\t', '\\\\t'}){'\n','\\\\n'}"/>

    <property name="p_exc" value="%replace(%replace(%rootException){'\t', '\\\\t'}){'\n','\\\\n'}"/>

    <property name="p_mrk" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>

    <property name="singleLinePattern" value="%nopexception${p_tim}\t${p_thr}\t${p_lvl}\t${p_log}\t${p_mdc}\t${p_msg}\t${p_exc}\t${p_mrk}\t%n"/>

    <property name="multiLinePattern" value="%nopexception${p_tim}\t${p_thr}\t${p_lvl}\t${p_log}\t%mdc\t%msg\t%rootException\t%marker\t%n"/>

     

    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">

    <encoder>

    <pattern>${multiLinePattern}</pattern>

    </encoder>

    </appender>

     

    <appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">

    <file>${logDirectory}/${componentName}.log</file>

    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">

    <fileNamePattern>${logDirectory}/${componentName}.%i.log.zip</fileNamePattern>

    <minIndex>1</minIndex>

    <maxIndex>9</maxIndex>

    </rollingPolicy>

    <triggeringPolicy

    class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">

    <maxFileSize>5MB</maxFileSize>

    </triggeringPolicy>

    <encoder>

    <pattern>${singleLinePattern}</pattern>

    </encoder>

    </appender>

     

    <appender name="asyncFile" class="ch.qos.logback.classic.AsyncAppender">

    <queueSize>256</queueSize>

    <appender-ref ref="file" />

    </appender>

     

    <!-- using level debug yields enormous output from spring and friends -->

    <root level="info">

    <appender-ref ref="console" />

    <appender-ref ref="asyncFile" />

    </root>

     

    <!-- Apache HttpClient has verbose debug output; silence it -->

    <logger name="org.apache.http" level="INFO" />

     

    <!-- Spring client has verbose debug output; silence it -->

    <logger name="org.springframework.web.client" level="INFO" />

     

    </configuration>

     

     

  35. Just updated the acumos table with Lorraine Welch - retrofitting ONAP here based on the pipe delimited 25 column spec for audit.log

    https://wiki.acumos.org/pages/viewpage.action?pageId=10028842

    based on the onap/portal/sdk logback.xml pattern for audit  - use this pattern for all 4 logs (audit, error, metrics, debug)

    https://git.onap.org/oom/tree/kubernetes/portal/charts/portal-sdk/resources/config/deliveries/properties/ONAPPORTALSDK/logback.xml#n59


     original onap portal/sdk  20180806
      <property name="auditPattern" value="%X{BeginTimestamp}|%X{EndTimestamp}|%X{RequestId}|%X{ServiceInstanceId}|%thread||%X{ServiceName}|%X{PartnerName}|%X{StatusCode}|%X{ResponseCode}|%X{ResponseDesc}|%X{InstanceUUID}|%.-5level|%X{AlertSeverity}|%X{ServerIPAddress}|%X{Timer}|%X{ServerFQDN}|%X{RemoteHost}||||||||%msg%n" />
    
    modified acumos - 25 fields
      <property name="auditPattern" value="%X{EntryTimestamp}|%X{InvokeTimestamp}|%X{RequestId}|%X{InvocationId}|%X{InstanceUUID}|%X{ServiceInstanceId}|%thread|%X{ServiceName}|%X{PartnerName}|%X{StatusCode}|%X{ResponseCode}|%X{ResponseDesc}|%.-5level|%X{Severity}|%X{ServerIPAddress}|%X{ElapsedTime}|%X{ServerFQDN}|%X{ClientIPAddress}|%X{VirtualServerName}|%X{ContextName}|%X{TargetEntity}|%X{TargetServiceName}|%X{TargetElement}|%X{User}|%msg%n" />
  36. Why are we requiring the usage of the logging library to dictate levels?  Whom decided this?  I have no intentions of ever using the library as I feel it does not provide sufficient value, so I assume we will never reach above level 1?

    1. Sounds good we are planning on using the portal/sdk library.

      Levels can be set independently via the 4 log appenders in logback.xml outside of a library

        <logger name="org.onap.eelf.audit" level="info" additivity="false">

          <appender-ref ref="asyncEELFAudit" />

        </logger>

        <logger name="org.onap.eelf.debug" level="debug" additivity="false">

          <appender-ref ref="asyncEELFDebug" />

        </logger>

        <logger name="org.onap.eelf.error" level="info" additivity="false">

          <appender-ref ref="asyncEELFError" />

        </logger>

        <logger name="org.onap.eelf.metrics" level="info" additivity="false">

          <appender-ref ref="asyncEELFMetrics" />

        </logger>

      LOG-631 - Getting issue details... STATUS

      1. I meant more like measuring levels of compliance?  It appears in order to be Level 3 compliant I am forced to use your library?  Why can I not just use SL4J and meet your logging spec?


        IE this


        Level 2: Log specification compliance via SLF4J supplied logging-analytics library

        Use Luke's SLF4J library directly as wrapper for MDCs and Markers by calling the library from within each function.

        Level 3: Log specification compliance via AOP library over SLF4J supplied logging-analytics library

        Use a spring based Aspect library that emits Markers automatically around function calls and retrofit your code to log via Luke's SLF4J for internal log messages.

        1. right, you can do level 1 and continue to use your own code to make the request headers.  For logback.xml eventually a single file and pattern will be communicated and phased in - for now you are on your own in making sure the 29 fields are in where available.

          /michael

          1. Do you have a draft of the logback.xml? or some place i can pull it from?

            1. went through a refactor to align with Acumos last week (same people in ONAP)

              Logging Developer Guide#Logback.xml

              in progress at

              https://gerrit.onap.org/r/#/c/62405

              focus is on consolidating all the oom logback.xml files into 1 to fix the configMap size issue

  37. For the p_log or logger field missing from acumos that I added back to onap - I was wrong this is not for markers - this is the logger itself on the class - usually the same as the class name.

    notice new field p_logger at pos 25 - after user

    notice new field p_mdc at pos 26 and p_marker at 28 (after Message at 27)


      <property name="p_log" value="%logger"/>

    The marker field I am adding back a p_mak so it displays what lukes slf4j library already emits

      <property name="p_mak" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>

    For the msg to Message field - need to adjust the code because

    -<property name="p_message" value="%replace(%replace(%msg){'\t', '\\\\t'}){'\n','\\\\n'}"/>

    +<property name="p_message" value="%replace(%replace(%Message){'\t', '\\\\t'}){'\n','\\\\n'}"/>

    breaks

    |%PARSER_ERROR[Message]

    for the serverFQDN that outputs the dns entry of the host vm for the k8s cluster - i am retesting on a cluster so we would expect dev2.onap.info instead of dev.onap.info

    merging

      <property name="p_tim" value="%d{&quot;yyyy-MM-dd'T'HH:mm:ss.SSSXXX&quot;, UTC}"/>

      <property name="p_lvl" value="%level"/>

      <property name="p_logger" value="%logger"/>

      <property name="p_mdc" value="%replace(%replace(%mdc){'\t','\\\\t'}){'\n', '\\\\n'}"/>

      <property name="p_message" value="%replace(%replace(%msg){'\t', '\\\\t'}){'\n','\\\\n'}"/>

      <property name="p_exc" value="%replace(%replace(%rootException){'\t', '\\\\t'}){'\n','\\\\n'}"/>

      <property name="p_marker" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>

      <property name="p_thr" value="%thread"/>

      <!--  property name="pattern" value="%nopexception${p_tim}\t${p_thr}\t${p_lvl}\t${p_log}\t${p_mdc}\t${p_msg}\t${p_exc}\t${p_mak}\t%n"/-->

      <property name="pattern" value="%nopexception${p_tim}|%X{EntryTimestamp}|%X{InvokeTimestamp}|%X{RequestId}|%X{InvocationId}|%X{InstanceUUID}|%X{ServiceInstanceId}|%thread|%X{ServiceName}|%X{PartnerName}|%X{StatusCode}|%X{ResponseCode}|%X{ResponseDesc}|%.-5level|%X{Severity}|%X{ServerIPAddress}|%X{ElapsedTime}|%X{ServerFQDN}|%X{ClientIPAddress}|%X{VirtualServerName}|%X{ContextName}|%X{TargetEntity}|%X{TargetServiceName}|%X{TargetElement}|%X{User}|${p_logger}|${p_mdc}|${p_message}|${p_marker}%n" />

      

    for

    2018-08-28T16:26:57.887Z||2018-08-28T16:26:57.883Z|||7741c447-cd4f-4f0a-9e7a-9abc3eebe8c1||http-nio-8080-exec-30|/logging-demo/rest/health/health|||||INFO||||localhost|0:0:0:0:0:0:0:1|||||||org.onap.demo.logging.ApplicationService|ResponseCode=, InstanceUUID=7741c447-cd4f-4f0a-9e7a-9abc3eebe8c1, RequestID=c897a12f-fa51-4450-8645-3ee907946695, ServiceName=/logging-demo/rest/health/health, ResponseDescription=, InvocationID=067ba289-9017-462a-8d68-aa2670a04205, Severity=, InvokeTimestamp=2018-08-28T16:26:57.883Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost, StatusCode=||EXIT


    ${p_logger}| 26

    The name of the class doing the logging (in my case the ApplicationController – close to the targetservicename but at the class granular level

    ${p_mdc}|27

    The key/value pairs all in one pipe field (will have some duplications currently whit MDC’s that are in their own pipe – but allows us to expand the MDC list – replaces customvalue1-3 older fields)

    ${p_marker}%n" 29

    The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file


    1. Talking with Lorraine - follow her classifications to handle the mdc and marker fields

      log system type = marker

      https://wiki.acumos.org/display/OAM/Log+Standards?focusedCommentId=10028838&refresh=1535137583955#comment-10028838


  38. For Casablanca log file locations should be /var/log/onap - not /var/log/ONAP - the spec will be adjusted

    /michael

    1. Frankfurt spec has the location updated.

  39. LOG-634 - Getting issue details... STATUS

    see
    https://wiki.onap.org/display/DW/LOG+Meeting+Minutes+2018-06-26
    and the spec
    https://wiki.onap.org/pages/viewpage.action?pageId=28378955#ONAPApplicationLoggingSpecificationv1.2(Casablanca)-MDC-RequestID

    we need a way to support client/server calls between pods running the old or new versions of the http headers

    [~pau2882] is running into this during communication from pomba aggregator to context builder (based on AAI which is using the pre-casablanca headers)

    Larger issue is teams are not aware of the spec change - because we not communicated it yet.

    Plan is to support non-casablanca spec in casablanca and communicate the spec in dublin


  40. TODO: 

    LOG-635 - Getting issue details... STATUS

    [~pau2882] and [~bbenesch] found this discrepancy in the spec when they used the X-ONAP-PartnerName header attribute

    We already expose support for the header
    public static final String PARTNER_NAME = "X-ONAP-PartnerName";

    https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogAdapter.java#n313

    add to the spec under
    https://wiki.onap.org/pages/viewpage.action?pageId=28378955#ONAPApplicationLoggingSpecificationv1.2(Casablanca)-MDC-PartnerName

  41. Response Code and Response Description, seem to indicate they are conditional and only populated on error scenarios based on the text description.  however they are marked as mandatory.  Do you expect us to always return these values?

  42. Invoke and Invoke-return seem semi confusing, are you asking for the invoke marker on the initial request?, then invoke-return on the response from that component?  The current sample implementation does not even have an invoke-return marker.

    1. Yes, invoke marker is for the initial request.

      Yes, invoke-return marker is for the response.

      Added  LOG-1174 - Getting issue details... STATUS  to track adding a sample implementation of invoke-return marker.

  43. I use defaultPattern as it is mentioned above. The problem is that my output looks like:

    org.onap.dcaegen2.services.prh.MainApp     |2018-09-06T11:38:06.527Z     |INFO     |\|T\|h\|e\| \|f\|o\|l\|l\|o\|w\|i\|n\|g\| \|p\|r\|o\|f\|i\|l\|e\|s\| \|a\|r\|e\| \|a\|c\|t\|i\|v\|e\|:\| \|d\|e\|v\|     |\|     |\|     |\|     |main     |

    removing "%replace(... ){'|','\\\\|'}" makes it look better:

    org.onap.dcaegen2.services.prh.MainApp     |2018-09-06T11:49:12.722Z     |INFO     |The following profiles are active: dev     |     |     |     |main     |

    However all is in single line. Is it good pattern or I'm doing something wrong? https://gerrit.onap.org/r/#/c/64993/

    Michael O'Brien

    1. I noticed something similar as well, when using the newer format

    2. I am seeing the following with the WIP https://gerrit.onap.org/r/#/c/62405/ however I am still missing some field values - specifically the empty tabs after "INFO" so I may have the same issue - investigating....

      2018-09-11T17:40:52.370Z||2018-09-11T17:40:52.364Z|||c36d0a58-2732-4b51-ac2f-8fbcd7129b57||http-nio-8080-exec-3|/logging-demo/rest/health/health|||||INFO||||127.0.0.1|127.0.0.1|||||||org.onap.demo.logging.ApplicationService|InstanceUUID=c36d0a58-2732-4b51-ac2f-8fbcd7129b57, RequestID=11aec6d7-cf18-4736-96ff-2ae541a9df6a, ServiceName=/logging-demo/rest/health/health, InvocationID=5120f68e-20e7-49cc-9ffd-a4da8c0abfda, InvokeTimestamp=2018-09-11T17:40:52.364Z, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=127.0.0.1||ENTRY
      
      2018-09-11T17:40:52.370Z||2018-09-11T17:40:52.364Z|||c36d0a58-2732-4b51-ac2f-8fbcd7129b57||http-nio-8080-exec-3|/logging-demo/rest/health/health|||||INFO||||127.0.0.1|127.0.0.1|||||||org.onap.demo.logging.ApplicationService|InstanceUUID=c36d0a58-2732-4b51-ac2f-8fbcd7129b57, RequestID=11aec6d7-cf18-4736-96ff-2ae541a9df6a, ServiceName=/logging-demo/rest/health/health, InvocationID=5120f68e-20e7-49cc-9ffd-a4da8c0abfda, InvokeTimestamp=2018-09-11T17:40:52.364Z, PartnerName=, ClientIPAddress=127.0.0.1, ServerFQDN=127.0.0.1|default appender - info level - Running /health|
  44. Acumos meet min - some more alignment on the logback spec - we released the final version for Casablanca (ONAP) and Athena (Acumos)

    the Dublin version ONAP Application Logging Specification - Post Dublin will contain changes post M4 code freeze (20180920) and athena for Acumos (20180914)

    synced

    version 30 of

    https://wiki.acumos.org/display/OAM/Log+Standards

    with version 111 of

    ONAP Application Logging Specification v1.2 (Casablanca)

    details on 

    LOG Meeting Minutes 2018-09-20 - M4 Code Freeze

  45. TODO: correlate "X-FromAppId" existing in code like SO and AAI to a field here in the new spec like TargetEntity for values like "SDC-BE"

    in VFC it is at the container not pod level, in SO it is just the pod like "MSO" in AAI it is required - or an exception will be thrown

    X-FromAppID is not generated like we do for requestID via X-Transaction...

  46. We are seeing an issue with multiple concurrent end-to-end test environments running multiple releases.  Let's suppose we have two end-to-end test environments T1 and T2, each of which is a complete ONAP deployment, but we only have one instance of a log repository (e.g. ELK) in which we are keeping the logs.  I am looking for a solution for how to tell the two environments apart in the logs.  Granted I could perhaps look at things like host/VM names, but what if I have containers that are not from the same environment or release on the same VM?

    it is starting to seem that the only way to really solve this once and for all is to put an environment and release indicator into every log message.  This could work as follows:

    1. Create a HELM chart that would require these two attributes be provided
    2. Provide the values at container deployment time
    3. Within the container, pick up the values (in the standard logging library, where applicable) and place the values into each log message

    Does anyone have any other ideas for how to distinguish between environments and releases or how to get environment and release indicators into the log?

    Note this is probably something for Dublin if not for the "E" release.

    1. Good point Dave, I will be landing soon and will address this issue of differentiation - likely with a global k8s tag in the root values.yaml  common chart that is injected as an env variable - so deployments are self aware.  Definitely for Dublin - and likely for other teams.

      LOG-707 - Getting issue details... STATUS

      nice

      /michael

  47. Hello  Michael O'Brien and #logging team.   These are some comments for your consideration, as we are in the initial phases of scoping out the Dublin work::

    1. L0-L3 compliance

    L0-L3 logging compliance levels seem more oriented to java projects and a progressive move to use the logging reference library. It seems to me though that compliance of the actual "logging entries content and format", "file location", etc .. is not 100% guaranteed at higher levels, while a project at a lower level may be compliant in all those aspects. A thought is that it may be a better measure to define the levels terms of adherence to the spec as a blackbox. For example, and just to say: L0: conformance to spec (measurable by some tooling that checks the format), L1: configuration at /var/log/onap/<component>/<subcomponent>, .. 

    2. Logging reference implementation enhancements.

    I realize this may be difficult, but it would be great to have a logging reference implementation that embeds natively as a provider under the slf4j umbrella (as the logback and log4j libraries do), preserving the universally accepted logging APIs provided by slf4j, and not to lose precision in some formatting flags. For example, the logging reference implementation, I think, since it is wrapping on top slf4j, would lose functionality with regards to the %C, %L, and %M, as logging entries would point to the wrapper library, instead of the real location at where the emission of the logging statement took place. These are preserved by a native implementation, such as logback and log4j do as direct providers under slf4j.

    3. Semantics of audit/error/metrics files?

    It was not clear if the audit/error/metrics/debug types of files are preserved, and some references to an unified log file seem to be suggested in the comments. It seems to me that the semantics attached to the audit/error/metrics (but not debug) as they were originally defined is what is mostly of interest for the analytics team and the spec.

    4. Scope of the spec

    I gather the impression, although not 100% clear, that the spec would apply to all logging activities for any given component. I think there should be a clear partition into 2 categories of logging files in the spec for individual ONAP components to produce:

    a) the first category would apply to those logs that are relevant from a global view of the system, and are subject to analytics processing (the consumer would be an ONAP operator), and

    b) the second category for those that are only of relevance to particular development teams, that can only make sense of by development teams with intimate domain specific knowledge, and not particularly useful to everybody else. This b) category will also include those files that individual component teams consider of little value to be fit in the a) category plus those that cannot conform with the spec from 3rd party software (such as databases, middleware, and libraries).

    Those in category a) are the ones that should be suitable to global analytics processing, and should be required strict compliance to the logging spec. This would also alleviate performance issues, and will prevent the cluttering of "global" records with unnecessary information. It is well known of the significant performance overhead that verbose logging can imposed in a system, and more importantly seems to me this will greatly impact system performance with the post-mortem analytics processing (for example some libraries may log any partial socket read or write from/into the wire when enabled in debug or trace mode).

    For those in b) category, I suggest to explicitly be given some degree of leeway in compliance in the spec. The leeway may be in terms of formatting, the current spec dictates very verbose logging records, and don't seem very effective when involved in debugging and troubleshooting activities with multiple log files. It also requires special crafting when dealing with async systems where the MDC structure is not straightforward to maintain.   On the other hand, I think some aspects from the spec should be preserved such as 30 day retention period.

  48. I find it impossible to implement this ONAP Logging specification from python because the end goal – the log record format to be output – is not finished.  Please complete the Text Output section.

    I tried to interpret the reworked spec example from the following link but could not be certain what text would be output for all fields:

    https://gerrit.onap.org/r/#/c/62405/20/reference/logging-kubernetes/logdemonode/charts/logdemonode/resources/config/logback.xml

    A few examples:

    1. It has at least three timestamp fields but only one of them is shown with a format.

    2. The ElapsedTime field format is not described.

    3. The MDC field 27 is shown in an example in the Text Output section we are told to ignore for now, but as currently described (key=value, delimited by commas, newlines and tabs escaped) it won't be parseable if any of the MDCs contains a comma. (Also for ACUMOS that uses pipes instead of tabs as a delimiter, it won't be parseable if any of the MDCs contains a pipe.)  Instead of trying to invent a new serialization, why not use a proven method like converting MDCs to a JSON string?  Then escape the delimiter when putting it into a delimited field of your output log record?


  49. It sounds like EELF is a facade of a facade (slf4j). The link to EELF is broken what is the purpose of EELF if to use MDCs slf4j wrapper has to be used? Slf4j is known by most java developers their website and code is easily found and working...

  50. I am a little confused on the difference between what the Audit and Metrics log actually log. Reading the description I assumed the Audit log had the API call and the successful response (where relevant). Conversely, I picked up that the Metrics log could log every step from the call to the return. Unless I am misreading, the above example https://gerrit.onap.org/r/#/c/62405/20/reference/logging-kubernetes/logdemonode/charts/logdemonode/resources/config/logback.xml has the two files logging the exact same information.

    1. So Audit ideally would be incoming calls to your server and its final response.  Metric would be outgoing calls from your server to another Server and its response.  I agree overall the split is rather confusing.


      In SO we use the following configurations

      For Audit we filter like this.

      <filter class="ch.qos.logback.core.filter.EvaluatorFilter">

      <evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">

      <marker>EXIT</marker>

      </evaluator>

      <onMismatch>DENY</onMismatch>

      <onMatch>ACCEPT</onMatch>

      </filter>


      On Metric we filter like so

      <filter class="ch.qos.logback.core.filter.EvaluatorFilter">

      <evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">

      <marker>INVOKE_RETURN</marker>

      </evaluator>

      <onMismatch>DENY</onMismatch>

      <onMatch>ACCEPT</onMatch>

      </filter>

  51. It looks like p_message and p_marker are defined as the same thing.  I suspect p_message was intended to be something else.  Does anybody know what happened here?

    1. In looking at this a little further, I believe the description of p_marker was accidentally pasted over the description of p_message.  Since there is no other free-format field anywhere in the list, I believe p_message was intended to be that.

      1. p_message

        log system


        The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker

        p_marker

        log system


        The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker




        Good catch -  p_message was intended to be the text put %msg; it is not the same as p_marker


        Message

        log system

        standard attribute - defined in logback.xml - Message - used for %msg%



        I went back to many previous version of this wiki page and could not find one with p_message properly filled out.

        Message

        The free text payload of a log event. 

        This is the most important item of information in most log messages. See General guidelines.


        Please refer to https://gerrit.onap.org/r/#/c/logging-analytics/+/62405/20/reference/logging-demo/src/main/resources/logback.xml: Lines 40,79

  52. There is no mention of what ServiceInstanceID is supposed to contain.  This should probably be spelled out.

  53. There is mention in some of the discussion about ServerFQDN that it supercedes VirtualServerName, yet VirtualServerName is still in the table of MDCs.  Should VirtualServerName be removed from the table of MDCs or is there some reason why we are leaving it there for a release or two?

    1. Per discussion at the ONAP Logging project meeting on 7/23/2019, we clarified that VirtualServerName is being retained in the Casablanca spec (and through at least El Alto) for backward compatibility.  In may be removed in Frankfurt or later.

  54. There is no mention of what ContextName is supposed to contain.  This should probably be spelled out.

    1. Per discussion at the ONAP Logging project meeting on 7/23/2019, we were not able to find any definition for ContextName; therefore, it does not have to be provided.

  55. Can we please clarify what we expect to happen with InvocationID in the case of an asynchronous response?  I could see two possibilities:

    1) The asynchronous response contains the InvocationID of the original request

    2) The asynchronous response, itself being a request, contains a new InvocationID

    1. Per discussion at the ONAP Logging project meeting on 7/23/2019, we decided to go with option #1 (asynchronous response contains the InvocationID of the original request) to assist in tying the async response to the original request.

  56. Regarding PartnerName, the following language is starting to make less and less sense every time I read it:

    unauthenticated = The part of the URI specifying the agent that the caller used to make the call to the component that is logging the message.

    I had been interpreting "agent" as the API handler on the called side that was interpreting the request.  Thus, the applicable part of the URI would be the initial elements that identify that API handler.  Any parameters to that API handler (provided in the form of path info, query string, or fragment) would be omitted since that is not necessary to identify the API handler.

    But I am now starting to wonder if that really makes sense here.  "The agent that the caller used to make the call" is starting to sound more to me like a browser, curl, or whatever the caller happened to use to initiate the HTTP request.  This would make sense except I don't think that is part of the URI.  It is; however, probably contained in one of the headers, likely "User-Agent".  I am wondering if this would be more correctly worded as follows:

    unauthenticated = The HTTP header specifying the agent that the caller used to make the call to the component that is logging the message (e.g. "User-Agent").

    1. That first paragraph looks to have evolved from a similarly puzzling description in the original guidelines at  https://wiki.onap.org/download/attachments/1015849/ONAP%20application%20logging%20guidelines.pdf?api=v2, and my guess is that nobody dared touch the bit they didn't understand. 

      After the opening paragraph, the description starts to make more sense. It identifies the caller. It's derived and populated inconsistently, but I think it's the short name of the calling component.  

      1. Thanks, Luke.  At least the original guidelines you referenced do not talk about taking a part of the URI; any part of the URI would seem to have little hope of providing information about the calling component.  For the unauthenticated case, do you think grabbing something like the "User-Agent" HTTP header makes sense?

        1. We discussed this at the 8/6/2019 ONAP logging meeting and concluded that using the "User-Agent" HTTP header as the fallback makes sense.  Thus, the current strategy is as follows:

          1. If an authenticated API, then log the userid
          2. Otherwise, if the HTTP header "X-ONAP-PartnerName" was provided, then log that (note: this was a direction that we seemed to be going but never completed)
          3. Otherwise, if the HTTP header "User-Agent" was provided, then log that
          4. Otherwise, log "UNKNOWN" (since the field is currently required, something must be in it)
          1. Quoting https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

            The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.

            Example value

            Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0


            To me this seems more confusing than helpful. I couldn't find any application code that sets this to a meaningful value, can someone link me to source code that suggests otherwise?

  57. ResponseCode should this be the HTTP status code such as 200 or 403?

    ResponseDesc should this be the text description of the HTTP status code? for example HTTP 200 it would be OK HTTP 403 would be Forbidden and 401 would be Unauthorized

    1. By my understanding, ResponseCode is supposed to contain application-specific error codes.  That does not sound like HTTP status codes to me.  For context, a recommended (required?) categorization was provided in the initial logging specification as follows:  0=success, 100=permission errors, 200=availability errors/timeouts, 300=data errors, 400=schema errors, 500=business process errors, 900=unknown errors.  That is definitely not HTTP status codes.  That table was removed at some point and the work of expanding the definition is listed in this spec as pending work that was never done.  We should probably work this in El Alto or Frankfurt, but I am not sure what we should be telling developers to do in the meantime.  Any thoughts on that from others on the team?

      1. Over the past few days, we have had some further email discussion as well as in today's ONAP logging project meeting.  I think we have settled on the idea that a more expansive understanding of what an application-specific error code could be is appropriate.  In particular, there seems to be little trouble in allowing an application team to decide that their HTTP-based application will log HTTP status codes as the ResponseCode.

        However, that does not mean this is encouraged.  The HTTP status codes are rather broad and generic (by design) and do not provide as much detail about what went wrong compared to what a well chosen list of application-specific codes could.

        And, of course, since not all applications are HTTP-based, it would be inappropriate to define in the spec that HTTP status codes should be reported in ResponseCode.  Thus, the spec seems fine as written.

  58. In an environment where docker & k8s is being used what will the value of ServerIPAddress be?

  59. Is there as standard set of values for TargetEntity? You wouldn't want different values represent the same entity IE "AAI", "A&AI", "A and AI"

    1. This was not spelled out in this spec but is certainly needed for the exact reason you point out.  In the El Alto time frame, AT&T will be proposing to add standardized ONAP component names to the ONAPLogConstants class.  In the meantime within AT&T, please use the updated version of this class that is being delivered as part of the 2019.11 logging project 319912.

  60. Is there any list of valid values for partner name? At least for the major components?

    1. Note:  I had originally posted the following reply as a separate comment on 8/6.  I am re-posting as a reply.

      This was not spelled out in this spec. In the El Alto time frame, AT&T will be proposing to add standardized ONAP component names to the ONAPLogConstants class. In the meantime within AT&T, please use the updated version of this class that is being delivered as part of the 2019.11 logging project 319912.

  61. Should PartnerName and TargetEntity be restricted to the same set of values? I saw the example value of SDC-BE given for both.

    1. The way I look at this, the short answer is "no" but there is a bit more to it than that.

      With regard to PartnerName, AT&T will be proposing in the El Alto time frame that, for non-authenticated APIs, it be logged as "Component.Microservice" with Component being defined as the standardized name of the relevant component (this will be a proposed addition to ONAPLogConstants) and Microservice being consistent within any given microservice.  Of course, this depends on X-ONAP-PartnerName being provided in HTTP headers (another proposal–all of this ties together) at least within ONAP.  External entities that are not authenticated would get logged as either the User-Agent header or simply "UNKNOWN".

      On the other hand, TargetEntity is defined as the ONAP component or sub-component name or the name of an external entity.  For a TargetEntity within ONAP, using the same standard component names as for PartnerName makes sense to me, but probably not a microservice name (how do I as the caller know how you the API provider names your microservices).  The fact that TargetServiceName would specify the specific service being called, I do not see the need to provide anything more in TargetEntity.  How to name external entities seems like it could also be standardized but that is likely outside the scope of an ONAP spec.

      1. For a TargetEntity within ONAP, using the same standard component names as for PartnerName makes sense to me

        If we could standardize even this that would be good.

        how do I as the caller know how you the API provider names your microservices?

        Is the caller directly calling the microservice?

  62. The logging constants have COMPLETED, but the spec says COMPLETE

    See https://gerrit.onap.org/r/gitweb?p=logging-analytics.git;a=blob;f=reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogConstants.java;h=52aeac5ce33bc2cb992c653e89342412a367e174;hb=refs/heads/casablanca#l207

    Quoting spec from above

    MDC - StatusCode

    This field indicates the high level status of the request. It must have the value COMPLETE when the request is successful and ERROR when there is a failure.  And INPROGRESS for states between the two.

    1. As far as I can tell, the code is wrong.  The value should be "COMPLETE" not "COMPLETED".

      1. Created  LOG-1164 - Getting issue details... STATUS  to track this.

    1. Similar to InstanceUUID/InstanceID, this MDC was known as ResponseDescription in Beijing and prior but was changed to ResponseDesc in Casablanca.  At least in this case some of the samples/examples were updated, but not all of them were and the code was never updated (and even some places in the spec still refer to the prior name).  As far as I can tell, all of these should have been updated to consistently refer to this as ResponseDesc.

      1. Created  LOG-1165 - Getting issue details... STATUS  to track this.

  63. If you use a logging conversion pattern like

    %d{&quot;yyyy-MM-dd'T'HH:mm:ss.SSSXXX&quot;,UTC}

     can you still easily calculate elapsed time? Elapsed time is not described above, an older spec states

    Elapsed time to complete processing of an API or request at the granularity available to the component system.  This value should be the difference between BeginTimestamp and EndTimestamp fields.

    EndTimestamp is not present in casablanca logging spec so I was assuming elapsed time = LogTimestamp - BeginTimeStamp


    1. You would just calculate elapsed time in the filter as StartTime - now(), and write this to elapsed time.

    2. You would just calculate elapsed time in the filter as StartTime - now(), and write this to elapsed time.

    3. You would just calculate elapsed time in the filter as StartTime - now(), and write this to elapsed time.

    1. This MDC was called InstanceUUID in Beijing and prior but was changed to InstanceID starting with Casablanca.  As I recall, the logic was having a simpler name that lined up better with RequestID, InvocationID, etc.  Unfortunately, none of the code, example log messages, example Logback configs, and so on were updated.  They should have been as far as I can tell.

      1. Created  LOG-1166 - Getting issue details... STATUS  to track this.

  64. When a loadbalancer is present and the X-Forwarded-For header has been set, should the value of that header be used for the ClientIPAddress? That was my understanding, I want to make sure others agree.

  65. The original ONAP logging spec (presumably from Amsterdam though I cannot find that exact spec to verify) had a concept of a composite RequestID.  This was intended to be used in cases where a component must, for the same given transaction (e.g. RequestID), make many related API sub-requests to a second component which could elect to interpret the HTTP headers as a nonce value for purposes of detecting duplicate requests.  The idea was to add a suffix to the RequestID in the form of "UUID:suffix" where the suffix could be either an integer or another UUID.

    I have a suspicion that the composite RequestID was probably dropped when InvocationID was added since it could serve the same purpose.  Sanjay Agraharam shares the same thinking.  But I wanted to throw it out there to see if others agree at least that we got rid of composite RequestID for a good reason and that we don't want it back (as opposed to having unintentionally dropped it and thinking we need it back), if not also believing that this is why we made that decision.

  66. Are error codes always numeric? Above I saw "COMPONENT_X.STORAGE_ERROR", would that literally appear in the log as an error code or is this the name of constant that points to a numeric value?