About This Document
Official R1 documentation snapshot in https://onap.readthedocs.io/en/latest/submodules/logging-analytics.git/docs/
This document specifies logging conventions to be followed by ONAP component applications.
ONAP logging is intended to support operability, debugging and reporting on ONAP. These guidelines address:
- Events that are written by ONAP components.
- Propagation of transaction and invocation information between components.
- MDCs, Markers and other information that should be attached to log messages.
- MDC = Mapped Diagnostic Context
- Human and machine-readable output format(s).
- Files, locations and other conventions.
Original AT&T ONAP Logging guidelines (pre amsterdam release) - for historical reference only: https://wiki.onap.org/download/attachments/1015849/ONAP%20application%20logging%20guidelines.pdf?api=v2
The Acumos logging specification follows this document at https://wiki.acumos.org/display/OAM/Log+Standards
Logback reference: Logging Developer Guide#Logback.xml based on https://gerrit.onap.org/r/#/c/62405
Introduction
The purpose of ONAP logging is to capture information needed to operate, troubleshoot and report on the performance of the ONAP platform and its constituent components. Log records may be viewed and consumed directly by users and systems, indexed and loaded into a datastore, and used to compute metrics and generate reports.
The processing of a single client request will often involve multiple ONAP components and/or subcomponents (interchangeably referred to as ‘application’ in this document). The ability to track flows across components is critical to understanding ONAP’s behavior and performance. ONAP logging uses a universally unique RequestID value in log records to track the processing of every client request through all the ONAP components involved in its processing.
A reference configuration of Elastic Stack is being deployed using ONAP Operations Manager since the amsterdam release - see usage in Logging Analytics Dashboards (Kibana)
This document proposes conventions you can follow to generate conformant, indexable logging output from your component.
Supported Languages
Java (including Scala and Clojure - because they compile to the same bytecode) and Python are supported, but conventions may be implemented by other technologies like GO.
Java
The library and aop wrapper are written in Java and will work for Clojure (thanks to a discussion with Shwetank Dave that reminded me of languages that compile to bytecode) and Scala as well. see: https://git.onap.org/logging-analytics/tree/reference
Clojure
See the java libraries
Scala
see the java libraries
Python
How to Log
ONAP prescribes conventions. The use of certain APIs and providers is recommended, but they are not mandatory. Most components log via EELF or SLF4J to a provider like Logback or Log4j.
Logging Specification Compliance
For the casablanca release the logging specification has been updated and finalized as of June 2018. Implementation of this specification is required but the method of implementation is optional based on each team's level of possible engagement.
High Level the changes are introduction of MDC's (key/value pairs like requestID=...) and Markers (labels like ENTRY/EXIT)
The ELK stack (indexes and dashboards) is focused on the new specification in Casablanca - if time permits we will have a migration path and/or support for both the older Amsterdam/Beijing release and Casablanca - - LOG-584Getting issue details... STATUS
Level 0: Log specification unchanged
Just keep your code as-is and commit to migrating to the the MDC and Marker focused specification for Dublin - logs will be the older format for now
Level 1: Log specification compliance via local project changes, library or wrapper
Continue to use your own library/wrapper if you have one like in SDC and AAI, change individual source files.
Level 2: Log specification compliance via SLF4J supplied logging-analytics library
Use Luke's SLF4J library directly as wrapper for MDCs and Markers by calling the library from within each function.
Level 3: Log specification compliance via AOP library over SLF4J supplied logging-analytics library
Use a spring based Aspect library that emits Markers automatically around function calls and retrofit your code to log via Luke's SLF4J for internal log messages.
ONAP Application Logging Specification v1.2 (Casablanca)#LoggingWithAOP
Logging Library Location and Use
see https://git.onap.org/logging-analytics/tree/reference/logging-slf4j
and usage ONAP Development#DeveloperUseoftheLoggingLibrary ONAP Development#KubernetesDevOps and Logging User Guide#LoggingDevOps
see Spring AOP example (minimal changes to existing code base) WIP in - LOG-135Getting issue details... STATUS documented at ONAP Development#LoggingWithAOP
EELF
EELF is the Event and Error Logging Framework, described at https://github.com/att/EELF.
EELF abstracts your choice of logging provider, and decorates the familiar Logger contracts with features like:
- Localization.
- Error codes.
- Generated wiki documentation.
- Separate audit, metrics,
securityerror and debug logs.
EELF is a facade, so logging output is configured in two ways:
- By selection of a logging provider such as Logback or Log4j, typically via the classpath.
- By way of a provider configuration document, typically logback.xml or log4j.xml. See ONAP Application Logging Specification v1.2 (Casablanca)#Providers.
SLF4J
SLF4J is a logging facade, and a humble masterpiece. It combines what's common to all major, modern Java logging providers into a single interface. This decouples the caller from the provider, and encourages the use of what's universal, familiar and proven.
EELF also logs via SLF4J's abstractions as the default provider.
Providers
Logging providers are normally enabled by their presence in the classpath. This means the decision may have been made for you, in some cases implicitly by dependencies. If you have a strong preference then you can change providers, but since the implementation is typically abstracted behind EELF or SLF4J, it may not be worth the effort.
Logback
Logback is the most commonly used provider. It is generally configured by an XML document named logback.xml. See ONAP Application Logging Specification v1.2 (Casablanca)#Configuration.
See HELM template https://git.onap.org/logging-analytics/tree/reference/provider/helm/logback
Log4j 2.X
Log4j 2.X is somewhat less common than Logback, but equivalent. It is generally configured by an XML document named log4j.xml. See ONAP Application Logging Specification v1.2 (Casablanca)#Configuration.
Log4j 1.X
Strongly discouraged from Beijing onwards, since 1.X is EOL, and since it does not support escaping, so its output may not be machine-readable. See https://logging.apache.org/log4j/1.2/.
This affects OpenDaylight-based components like SDNC and APPC, since ODL releases prior to Carbon bundled Log4j 1.X, and make it difficult to replace. The Common Controller SDK Project project targets ODL Carbon, so remaining instances of Log4j 1.X should disappear by the time of the Casablanca release.
What to Log
The purpose of logging is to capture diagnostic information.
An important aspect of this is analytics, which requires tracing of requests between components. In a large, distributed and scalable system such as ONAP this is critical to understanding behavior and performance.
General
It isn't the aim of this document to reiterate Best Practices, so advice here is general:
- Use a logging facade such as SLF4J or EELF.
- Write log messages in English.
- Write meaningful messages. Consider what will be useful to consumers of logger output.
- Log at the appropriate level. Be aware of the volume of logs that will be produced.
- Safeguard the information in exceptions, and ensure it is never lost.
- Use error codes to characterize exceptions.
- Log in a machine-readable format. See Conventions.
- Log for analytics as well as troubleshooting.
Others have written extensively on this:
- http://www.masterzen.fr/2013/01/13/the-10-commandments-of-logging/
- https://www.loggly.com/blog/how-to-write-effective-logs-for-remote-logging/
- And so on.
Standard Attributes
These are attributes common to all log messages. They are either:
- Explicitly required by logging APIs:
- Logger
- Level
- Message
- Exception (note that exception is the only standard attribute that may routinely be empty).
- Implicitly derived by the logging provider:
- Timestamp
- Thread.
Which means you normally can't help but report them.
See https://www.slf4j.org/api/org/slf4j/Logger.html and https://logback.qos.ch/manual/layouts.html#ClassicPatternLayout for their origins and use.
Logger Name
This indicates the name of the logger that logged the message.
In Java it is convention to name the logger after the class or package using that logger.
- In Java, report the class or package name.
- In Python, the class or source filename.
Most other languages will fit one of those patterns.
Level
Severity, typically drawn from the enumeration {TRACE, DEBUG, INFO, WARN, ERROR}.
Think carefully about the information you report at each log level. The default log level is INFO.
Some loggers define non-standard levels, like FINE, FINER, WARNING, SEVERE, FATAL or CRITICAL. Use these judiciously, or avoid them.
Message
The free text payload of a log event.
This is the most important item of information in most log messages. See ONAP Application Logging Specification v1.2 (Casablanca)#General guidelines.
Internationalization
Diagnostic log messages generally do not need to be internationalized.
Parameterization
Parameterized messages allow serialization to be deferred until AFTER level threshold checks. This means the cost is never incurred for messages that won't be written.
- Favor parameterized messages, especially for INFO and DEBUG logging.
- Perform expensive serialization in the #toString method of wrapper classes.
For example:
logger.debug("Relax - this won't hurt: {}", new ToStringWrapper(costlyToSerialize));
Exception
The error stacktrace, where applicable.
Log unabridged stacktraces upon error.
When rethrowing, ensure that frame information is not lost:
- By logging the original exception at the point where it was caught.
- By setting the original exception as the cause when rethrowing.
Timestamp
Logged as an ISO8601 UTC datetime. Millisecond (or greater) precision is preferable.
For example:
2018-07-05T20:21:34.794Z
Offset timestamps are OK provided the offset is included. (In the above example, the "Z" is a shorthand indicating an offset of zero – UTC).
Thread
The name of the thread from which the log message was emitted.
Thread names don't necessarily convey useful information, and their reliability depends on the thread model implemented by different runtimes, but they are sometimes used in heuristic analysis.
Efficiency
There is tension between utility and efficiency. IO bandwidth is finite, and the cost of serialization can be significant, especially at higher diagnostic levels.
Methods and Line Numbers
Many loggers can use reflection to emit the originating (Java) method, and even individual line numbers.
This information is certainly useful, but very expensive. Most logging implementations recommend that this not be enabled in production.
Level Thresholds
Level indicates severity.
Logger output is typically filtered by logger and level. The default logging level is INFO, so particular consideration should be given to the efficiency of INFO-level logging.
When DEBUG-level logging is configured, it's probably for good reason, and a greater overhead is expected. Be aware that it's not unusual for DEBUG logging to be left enabled inadvertently, however.
WARN and ERROR-level messages are of higher value, and comparatively rare, so their cost is less of a concern.
Conditionals
A common pattern is to place conditionals around (expensive) serialization.
For example:
if (logger.isDebugEnabled()) { logger.debug("But this WILL hurt: " + costlyToSerialize); }
Parameterized logging is preferable.
Context
MDCs
A Mapped Diagnostic Context (MDC) allows an arbitrary string-valued attribute to be attached to a Java thread via a ThreadLocal variable. The MDC's value is then emitted with each message logged by that thread. The set of MDCs associated with a log message is serialized as unordered name-value pairs (see ONAP Application Logging Specification v1.2 (Casablanca)#Text Output).
A good discussion of MDCs can be found at https://logback.qos.ch/manual/mdc.html.
Example
From Luke Parker's call graph work in https://git.onap.org/logging-analytics/tree/reference/logging-slf4j-demo
LogEntry(markers=ENTRY, logger=ComponentAlpha, requestID=eb3e0dc2-6c3c-4bb7-8ed6-e5cc4ec7aad2, invocationID=06c815ef-5969-45cc-b319-d0dbcde89329, timestamp=Tue May 08 04:23:27 AEST 2018)
Mapped Diagnostic Context Table
- Must be set as early in invocation as possible.
- Must be unset on exit.
- keep in sync with https://wiki.acumos.org/display/OAM/Log+Standards
Pipe Order | Name | Type | Group | Description | Applicable (per log file) | Marker Associations | Moved MDC to standard attribute | Removed (was in older spec) | Required? Y/N/C (C= context dependent) N = not required L=Library provided | Derived | Historical | Acumos ref | Use Cases | Code References |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | LogTimestamp | log system | use %d field - see %d{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",UTC} | L | ||||||||||
2 | EntryTimestamp | MDC | if part of an ENTRY marker log | C | ||||||||||
3 | InvokeTimestamp | MDC | if part of an INVOKE marker log | C | ||||||||||
4 | MDC | UUID to track the processing of each client request across all the ONAP components involved in its processing | Y | In general | ||||||||||
5 | InvocationID | MDC | UUID correlates log entries relating to a single invocation of a single component | Y | ||||||||||
6 | InstanceID | MDC | UUID to differentiate between multiple instances of the same (named) log writing service/application | Y | was InstanceUUID | |||||||||
7 | ServiceInstanceID | MDC | C | |||||||||||
8 | thread | log system | use %thread field | L | ||||||||||
9 | ServiceName | The service inside the partner doing the call - includes API name | Y | |||||||||||
10 | PartnerName | unauthenticated = The part of the URI specifying the agent that the caller used to make the call to the component that is logging the message. authenticated = userid
| Y | user | ||||||||||
11 | StatusCode | This field indicates the high level status of the request - one of (COMPLETE, ERROR, INPROGRESS) | Y | 20180807: expand from 2 fields to add "INPROGRESS" addresses Chris Lott question on https://wiki.acumos.org/display/OAM/Log+Standards | ||||||||||
12 | ResponseCode | This field contains application-specific error codes. | Y | |||||||||||
13 | ResponseDesc | This field contains a human readable description of the ResponseCode | Y | |||||||||||
14 | level | %level | L | |||||||||||
15 | Severity | Logging level by default aligned with the reported log level - one of INFO/TRACE/DEBUG/WARN/ERROR/FATAL | Y | level (but numbers) | ||||||||||
16 | ServerIPAddress | C | ||||||||||||
17 | ElapsedTime | C | ||||||||||||
18 | ServerFQDN | The VM FQDN if the server is virtualized. Otherwise the host name of the logging component. | Y | |||||||||||
19 | ClientIPAddress | This field contains the requesting remote client application’s IP address if known. Otherwise empty. | Y | |||||||||||
20 | VirtualServerName | C | ||||||||||||
21 | ContextName | C | ||||||||||||
22 | TargetEntity | The name of the ONAP component or sub-component, or external entity, at which the operation activities captured in this metrics log record is invoked. | C | |||||||||||
23 | TargetServiceName | The name of the API or operation activities invoked (name on the remote/target application) at the TargetEntity. | C | |||||||||||
24 | TargetElement | VNF/PNF context dependent - on CRUD operations of VNF/PNFs The IDs that need to be covered with the above Attributes are - VNF_ID OR VNFC_ID : (Unique identifier for a VNF asset that is being instantiated or that would generate an alarms) - VSERVER_ID OR VM_ID (or vmid): (Unique identified for a virtual server or virtual machine on which a Control Loop action is usually taken on, or that is installed as part of instantiation flow) - PNF : (What is the Unique identifier used within ONAP) | C | |||||||||||
25 | User | MDC | User - used for %X{user} | C | ||||||||||
26 | p_logger | log system | The name of the class doing the logging (in my case the ApplicationController – close to the targetservicename but at the class granular level - this field is %logger | L | ||||||||||
27 | p_mdc | log system | allows forward compatability with ELK indexers that read all MDCs in a single field - while maintaining separate MDCs above. The key/value pairs all in one pipe field (will have some duplications currently with MDC’s that are in their own pipe – but allows us to expand the MDC list – replaces customvalue1-3 older fields - this field is %mdc | L | ||||||||||
28 | p_message | log system | The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker | L | ||||||||||
29 | p_marker | log system | The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker | L |
Logging
Via SLF4J:
import java.util.UUID; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.slf4j.MDC; // ... final Logger logger = LoggerFactory.getLogger(this.getClass()); MDC.put("SomeUUID", UUID.randomUUID().toString()); try { logger.info("This message will have a UUID-valued 'SomeUUID' MDC attached."); // ... } finally { MDC.clear(); }
EELF doesn't directly support MDCs, but its default provider (where com.att.eelf.configuration.SLF4jWrapper is the configured EELF provider)normally logs via SLF4J, and SLF4J will receive any MDC that is set:
import java.util.UUID; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.slf4j.MDC; import com.att.eelf.configuration.EELFLogger; import com.att.eelf.configuration.EELFManager; // ... final EELFLogger logger = EELFManager.getInstance().getLogger(this.getClass()); MDC.put("SomeUUID", UUID.randomUUID().toString()); try { logger.info("This message will have a UUID-valued 'SomeUUID' MDC attached."); // ... } finally { MDC.clear(); }
Serializing
Output of MDCs must ensure that:
- All reported MDCs are logged with both name AND value. Logging output should not treat any MDCs as special.
- All MDC names and values are escaped.
Escaping in Logback configuration can be achieved with:
%replace(%replace(%mdc){'\t','\\\\t'}){'\n','\\\\n'}
MDC - RequestID
This is often referred to by other names, including "Transaction ID", and one of several (pre-standardization) REST header names including X-TransactionID, X-ECOMP-TransactionID, X-ECOMP-RequestID and X-ONAP-RequestID.
ONAP logging uses a universally unique "RequestID" value in log records to track the processing of each client request across all the ONAP components involved in its processing. RequestID be propagated across all interfaces, not just REST Interfaces.
This value:
- Is logged as a RequestID MDC.
- Is propagated between components in REST calls as an X-ONAP-RequestID HTTP header.
Receiving the X-ONAP-RequestID will vary by component according to APIs and frameworks. In general:
import javax.ws.rs.core.HttpHeaders; // ... final HttpHeaders headers = ...; // ... String txId = headers.getRequestHeaders().getFirst("X-ONAP-RequestID"); if (StringUtils.isBlank(txId)) { txId = UUID.randomUUID().toString(); } MDC.put("RequestID", txID);
Setting the X-ONAP-RequestID likewise will vary. For example:
final String txID = MDC.get("RequestID"); HttpURLConnection cx = ...; // ... cx.setRequestProperty("X-ONAP-RequestID", txID);
Note that it's been suggested that for the duration of Casablanca we report the request ID using all three headers:
- X-ONAP-RequestID (canonical)
- X-RequestID (deprecated)
- X-TransactionID (deprecated)
MDC - InvocationID
InvocationID is similar to RequestID, but where RequestID correlates records relating a single, top-level invocation of ONAP as it traverses many systems, InvocationID correlates log entries relating to a single invocation of a single component. Typically this means via REST, but in certain cases an InvocationID may be allocated without a new invocation, e.g. when a request is retried.
RequestID and InvocationID allow an execution graph to be derived. This requires that:
- The relationship between RequestID and InvocationID is reported.
- The relationship between caller and recipient is reported for each invocation.
The proposed approach is that:
- Callers:
- Issue a new, unique InvocationID UUID for each downstream call they make.
- Log the new InvocationID, indicating the intent to invoke:
- With Markers INVOKE, and SYNCHRONOUS if the invocation is synchronous.
- With their own InvocationID still set as an MDC.
- Pass the InvocationID as an X-InvocationID REST header.
- Invoked components:
- Retrieve the InvocationID from REST headers upon invocation, or generate a UUID default.
- Set the InvocationID MDC.
- Write a log entry with the Marker ENTRY. (In EELF this will be to the AUDIT log).
- Act as per Callers in all downstream requests.
- Write a log entry with the Marker EXIT upon return. (In EELF this will be to the METRICS log).
- Unset all MDCs on exit.
That seems onerous, but:
- It's only a few calls.
- It can be largely abstracted in the case of EELF logging.
MDC - InstanceID
(formerly InstanceUUID)
If known, this field contains a universally unique identifier used to differentiate between multiple instances of the same (named) log writing service/application. Its value is set at instance creation time (and read by it, e.g., at start/initialization time from the environment). This value should be picked up by the component instance from its configuration file and subsequently used to enable differentiation of log records created by multiple, locally load balanced ONAP component or subcomponent instances that are otherwise identically configured.
Handles parallel threads or running across a load balanced set of microservices - for identification.
MDC - PartnerName
This field should contain the name of the client application user agent or user invoking the API. The identification of the entity that made the request being served. For a serving API that is authenticating the request, this should be the authenticated username or equivalent (e.g. a userid or a mechid).
For example SDC-BE instead of just SDC for the overall pods
This is often used for heuristic analysis to identify invocations between ONAP individual ONAP components. Its value has never been clearly stipulated, so a common problem has been a lack of consistency.
There is no clear consensus, but:
- Use the short name of your component, e.g. xyzdriver. (try to incorporate both levels - the container name and the pod the container is in within the kubernetes deployment)
- Values should be human-readable.
- Values should be fine-grained enough to disambiguate subcomponents where it's likely to matter. This is subjective.
- Be consistent: your component should ALWAYS report same value.
Real-life examples include MSO, bpmnclient, BPELClient, (all of which are reported by SO), openECOMP (SDNC), vid (VID!) etc. (See the problem?)
Usage overlaps with InvocationID, which doesn't mean PartnerName gets retired, but which might mean it serves a more descriptive purpose. (Since it hasn't proven to be a great way of generating a call graph).
MDC - ServiceName
The URI that the caller used to make the call to the component that is logging the message.
For EELF Audit log records that capture API requests, this field contains the name of the API invoked at the component creating the record (e.g., Layer3ServiceActivateRequest).
For EELF Audit log records that capture processing as a result of receipt of a message, this field should contain the name of the module that processes the message.
Usage is the same for indexable logs.
MDC - StatusCode
This field indicates the high level status of the request. It must have the value COMPLETE when the request is successful and ERROR when there is a failure. And INPROGRESS for states between the two.
Discussion: status/response/severity relationship
status = global, response below is app specific
Ability to render severity-like line in a non-debug log
MDC - ResponseCode
This field contains application-specific error codes. For consistency, common error categorizations should be used.
MDC - Severity
OPS specific
Use/Map existing https://www.slf4j.org/api/org/apache/commons/logging/Log.html
ENUM is INFO/TRACE/DEBUG/WARN/ERROR/FATAL
By default - align this severity with the reported log level
(optionally a way to map actual level from reported level if required)
MDC - ServerFQDN
This field contains the Virtual Machine (VM) Fully Qualified Domain Name (FQDN) if the server is virtualized. Otherwise, it contains the host name of the logging component.
Best effort (ip, fqdn)
(previously covered by removed "Server" field)
redundancy between clientIP, server, virtualServer name is OK - and helpfull for runtime OPS/Hybrid envs
supercedes virtualServerName
Report what is in the http header
Discussion: roll all 3 fqdn, hostname or ip into one field - do we ever need two of the 3 fields concurrently?
TODO: Verify what is also available from a filebeat agent when it exists
MDC - ClientIPAddress
This field contains the requesting remote client application’s IP address if known. Otherwise this field can be empty.
We don't differentiate between inside/outside ONAP for the IP - this supports hybrid environments
Derived from the system
redundancy between clientIP, server, virtualServer name is OK - and helpfull for runtime OPS/Hybrid envs
Discussion: do we need both ip and fqdn fields?
Report what is in the http header
MDC - EntryTimestamp
Date-time that processing activities being logged begins. The value should be represented in UTC and formatted per ISO 8601, such as “2015-06-03T13:21:58+00:00”. The time should be shown with the maximum resolution available to the logging component (e.g., milliseconds, microseconds) by including the appropriate number of decimal digits. For example, when millisecond precision is available, the date-time value would be presented as, as “2015-06-03T13:21:58.340+00:00”.
Context dependent on whether part of an ENTRY marker
Audit requires this field
MDC - InvokeTimestamp
Timestamp on invocation start.
Context dependent on whether part of an INVOKE marker
metrics needs this field.
MDC - TargetEntity
It contains the name of the ONAP component or sub-component, or external entity, at which the operation activities captured in this metrics log record is invoked.
Example: SDC-BE
MDC - TargetServiceName
It contains the name of the API or operation activities invoked (name on the remote/target application) at the TargetEntity.
Example: Class name of rest endpoint
Discussion: on building call graph vs human readable single line - keep for human readable
Used as valuable URI - to annnote invoke marker
Review in terms of ONAP Application Logging Specification v1.2 (Casablanca)#Marker-INVOKE - possiblly add INVOKE-return - to filter reporting
TBD: Coverage by log file type (debug, trace, ...)
TBD: cover off discussion on reducing log files to two (DEBUG/rest) for C* release
MDC - TargetElement
VNF/PNF context dependent - on CRUD operations of VNF/PNFs
The IDs that need to be covered with the above Attributes are
- VNF_ID OR VNFC_ID : (Unique identifier for a VNF asset that is being instantiated or that would generate an alarms)
- VSERVER_ID OR VM_ID (or vmid): (Unique identified for a virtual server or virtual machine on which a Control Loop action is usually taken on, or that is installed as part of instantiation flow)
- PNF : (What is the Unique identifier used within ONAP)
MDCs - the Rest
Other MDCs are logged in a wide range of contexts.
Certain MDCs and their semantics may be specific to EELF log types.
Deprecation
Indexing makes many of the remaining attributes redundant. So for example:
- There is considerable duplication:
- BeginTimestamp, EndTimestamp, ElapsedTime. These are all captured elsewhere (and ElapsedTime is even redundant within that triplet).
- Server, ServerIPAddress, ServerFQDN, VirtualServiceName. Overkill. Should be one, plus optionally ClientIPAddress (or some variant thereof).
- TargetEntity, TargetServiceName, not obviously different to similar attributes.
- There is junk:
- Severity? Nagios codes?
- ProcessKey?
- All the stuff that's already grayed out in the table above.
- People may defend these individually, maybe vigorously, but they're domain-specific:
- That absolutely doesn't mean they can't be used.
- Beats configuration allows ad hoc contexts to be indexed.
- But perhaps they don't belong in this kind of spec.
- Redundant attributes *do* matter, because:
- Populating and propagating everything prescribed by the guide approaches being prohibitive. People won't do it, and people *don't* do it.
- If something might be in one of several attributes then that's worse than it being in just one.
- That means:
- We're left with only two MANDATORY attributes, necessary to build invocations graphs:
- RequestID - top-level transactions.
- InvocationID - inter-component invocations.
- And a minimal number of OPTIONAL descriptive attributes: ServiceInstanceID, InstanceID, Server, StatusCode, ResponseCode, ResponseDescription.
- Those are the ones we need to document clearly, support in APIs, etc.
- That's <=10, a manageable number.
- And again, that matters because if the number isn't manageable, people won't (and don't) comply.
Some of that is contentious, but it's just talking points at this stage. We've tiptoed around the issue of extant conventions, and the ongoing result is a lot of attributes that nobody's really sure how to use, and which don't result in better logs. In Casablanca it's time to be less conservative.
Examples
Markers
Markers unambiguously assign semantics to individual log messages. They allow messages that have a specific meaning to be cheaply and easily identified in logger output, without inherently unreliable (and more costly, and less easily enforced) schemes like scanning for magic strings in the text of each log message.
ONAP logging requires the emission of Markers reporting entry, exit and invocation as the execution of requests pass between ONAP components. This information is used to generate a call graph.
ONAP components are also free to use Markers for their own purposes. Any Markers that are logged will be automatically indexed by Logstash.
Markers differ from MDCs in two important ways:
- They have a name, but no value. They are a tag - like a label.
- They are specified explicitly on invocation. They are not ThreadLocal, and they do not propagate.
EELF's implementation can be modified to emit Markers, but its public APIs do not allow them to be passed in by callers.
see code on reference folder in git clone ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics
Examples
Marker formatting is using tabs - - LOG-553Getting issue details... STATUS - see https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/test/resources/logback.xml#n9
Note there are 3 tabs (see p_mak in logback.xml) delimiting the MARKERS (ENTRY and EXIT) at the end of each line
<property name="p_mak" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>
2018-07-05T20:21:34.794Z http-nio-8080-exec-2 INFO org.onap.demo.logging.ApplicationService InstanceID=ede7dd52-91e8-45ce-9406-fbafd17a7d4c, RequestID=f9d8bb0f-4b4b-4700-9853-d3b79d861c5b, ServiceName=/logging-demo/rest/health/health, InvocationID=8f4c1f1d-5b32-4981-b658-e5992f28e6c8, InvokeTimestamp=2018-07-05T20:21:26.617Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost ENTRY 2018-07-05T20:22:09.268Z http-nio-8080-exec-2 INFO org.onap.demo.logging.ApplicationService ResponseCode=, InstanceID=ede7dd52-91e8-45ce-9406-fbafd17a7d4c, RequestID=f9d8bb0f-4b4b-4700-9853-d3b79d861c5b, ServiceName=/logging-demo/rest/health/health, ResponseDescription=, InvocationID=8f4c1f1d-5b32-4981-b658-e5992f28e6c8, Severity=, InvokeTimestamp=2018-07-05T20:21:26.617Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost, StatusCode= EXIT
Logging
Via SLF4J:
import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.slf4j.Marker; import org.slf4j.MarkerFactory; // ... final Logger logger = LoggerFactory.getLogger(this.getClass()); final Marker marker = MarkerFactory.getMarker("MY_MARKER"); logger.warn(marker, "This warning has a 'MY_MARKER' annotation.");
EELF does not allow Markers to be set directly. See notes on the InvocationID MDC.
Serializing
Marker names also need to be escaped, though they're much less likely to contain problematic characters than MDC values.
Escaping in Logback configuration can be achieved with:
%replace(%replace(%marker){'\t','\\\\t'}){'\n','\\\\n'}
Marker - ENTRY
TODO: add table detailing which log files each marker is a part of - from a use case perspective
This should be reported as early in invocation as possible, immediately after setting the RequestID and InvocationID MDCs.
It can be automatically set by EELF, and written to the AUDIT log.
It must be manually set otherwise. Candidate for framework
EELF:
final EELFLogger logger = EELFManager.getAuditLogger(); logger.auditEvent("Entering.");
SLF4J:
public static final Marker ENTRY = MarkerFactory.getMarker("ENTRY"); // ... final Logger logger = LoggerFactory.getLogger(this.getClass()); logger.debug(ENTRY, "Entering.");
Marker - EXIT
This should be reported as late in invocation as possible, immediately before unsetting the RequestID and InvocationID MDCs.
It can be automatically reported by EELF, and written to the METRICS log.
It must be manually set otherwise.
EELF:
final EELFLogger logger = EELFManager.getMetricsLogger(); logger.metricsEvent("Exiting.");
SLF4J:
public static final Marker EXIT = MarkerFactory.getMarker("EXIT"); // ... final Logger logger = LoggerFactory.getLogger(this.getClass()); logger.debug(EXIT, "Exiting.");
Marker - INVOKE
This should be reported by the caller of another ONAP component via REST, including a newly allocated InvocationID, which will be passed to the caller.
SLF4J:
public static final Marker INVOKE = MarkerFactory.getMarker("INVOKE"); // ... // Generate and report invocation ID. final String invocationID = UUID.randomUUID().toString(); MDC.put(MDC_INVOCATION_ID, invocationID); try { logger.debug(INVOKE_SYNCHRONOUS, "Invoking synchronously ... "); } finally { MDC.remove(MDC_INVOCATION_ID); } // Pass invocationID as HTTP X-InvocationID header. callDownstreamSystem(invocationID, ... );
EELF examples of INVOCATION_ID reporting, without changing published APIs.
Marker - INVOKE-RETURN
This should be reported by the caller of another ONAP component via REST on return.
InvokeTimestamp context dependent MDC will be reported here.
SLF4J:
TBD
Marker - INVOKE-SYNCHRONOUS
This should accompany INVOKE when the invocation is synchronous.
SLF4J:
public static final Marker INVOKE_SYNCHRONOUS; static { INVOKE_SYNCHRONOUS = MarkerFactory.getMarker("INVOKE"); INVOKE_SYNCHRONOUS.add(MarkerFactory.getMarker("SYNCHRONOUS")); } // ... // Generate and report invocation ID. final String invocationID = UUID.randomUUID().toString(); MDC.put(MDC_INVOCATION_ID, invocationID); try { logger.debug(INVOKE_SYNCHRONOUS, "Invoking synchronously ... "); } finally { MDC.remove(MDC_INVOCATION_ID); } // Pass invocationID as HTTP X-InvocationID header. callDownstreamSystem(invocationID, ... );
EELF example of SYNCHRONOUS reporting, without changing published APIs.
Error Codes
Error codes are reported as MDCs.
TODO: add to table
Exceptions should be accompanied by an error code. Typically this is achieved by incorporating error codes into your exception hierarchy and error handling. ONAP components generally do not share this kind of code, though EELF defines a marker interface (meaning it has no methods) EELFResolvableErrorEnum.
A common convention is for error codes to have two components:
- A prefix, which identifies the origin of the error.
- A suffix, which identifies the kind of error.
Suffixes may be numeric or text. They may also be common to more than one component.
For example:
COMPONENT_X.STORAGE_ERROR
Output Format
Several considerations:
- Logs should be human-readable (within reason).
- Shipper and indexing performance and durability depends on logs that can be parsed quickly and reliably.
- Consistency means fewer shipping and indexing rules are required.
Text Output
TODO: 20190115 - do not take the example in this section until I reverify it in terms of the reworked spec example in
- LOG-630Getting issue details... STATUS
ONAP needs to strike a balance between human-readable and machine-readable logs. This means:
- The use of PIPE (|) as a delimiter. (Previously tab, and before that ... pipe).
- Escaping all messages, exceptions, MDC values, Markers, etc. to replace tabs and pipes in their content.
- Escaping all newlines with \n so that each entry is on one line.
In logback, this looks like:
<property name="defaultPattern" value="%nopexception%logger |%date{yyyy-MM-dd'T'HH:mm:ss.SSSXXX,UTC} |%level |%replace(%replace(%replace(%message){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'} |%replace(%replace(%replace(%mdc){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'} |%replace(%replace(%replace(%rootException){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'} |%replace(%replace(%replace(%marker){'\t','\\\\t'}){'\n','\\\\n'}){'|','\\\\|'} |%thread |%n"/>
The output of which, with MDCs, a Marker and a nested exception, with newlines added for readability, looks like:
org.onap.example.component1.subcomponent1.LogbackTest |2017-08-06T16:09:03.594Z |ERROR |Here's an error, that's usually bad |key1=value1, key2=value2 with space, key5=value5"with"quotes, key3=value3\nwith\nnewlines, key4=value4\twith\ttabs |java.lang.RuntimeException: Here's Johnny \n\tat org.onap.example.component1.subcomponent1.LogbackTest.main(LogbackTest.java:24) \nWrapped by: java.lang.RuntimeException: Little pigs, little pigs, let me come in \n\tat org.onap.example.component1.subcomponent1.LogbackTest.main(LogbackTest.java:27) |AMarker1 |main
Default Logstash indexing rules understand output in this format.
XML Output
For Log4j 1.X output, since escaping is not supported, the best alternative is to emit logs in XML format, we will expand on JSON support
There may be other instances where XML (or JSON) output may be desirable. Default indexing rules support
Default Logstash indexing rules understand the XML output of Log4J's XMLLayout.
Note that we're hoping that support for indexing of XML output can be deprecated during Beijing. This relies on the adoption of ODL Carbon, which should eliminate any remnant of Log4J1.X.
Output Location
Standardization of output locations makes logs easier to locate and ship for indexing.
Expand on out-of-container locations off /dockerdata-nfs
Logfiles should default to beneath /var/log, and beneath /var/log/ONAP in the case of core ONAP components:
/var/log/ONAP/<component>[/<subcomponent>]/*.log
For the duration of Beijing, logs will be written to a separate directory, /var/log/ONAP_EELF:
expand on Casablanca differences, and adding as a config setting in OOM
/var/log/ONAP_EELF/<component>[/<subcomponent>]/*.log
Configuration
Logging providers should be configured by file. Files should be at a predictable, static location, so that they can be written by deployment automation. Ideally this should be under /etc/ONAP, but compliance is low.
Locations
All logger provider configuration document locations namespaced by component and (if applicable) subcomponent by default:
/etc/ONAP/<component>[/<subcomponent>]/<provider>.xml
Where <provider>.xml, will typically be one of:
- logback.xml
- log4j.xml
- log4j.properties
Reconfiguration
Logger providers should reconfigure themselves automatically when their configuration file is rewritten. All major providers should support this.
The default interval is 10s.
Overrides
The location of the configuration file MAY be overrideable, for example by an environment variable, but this is left for individual components to decide.
Archetypes
Configuration archetypes can be found in the ONAP codebase https://git.onap.org/logging-analytics/tree/. Choose according to your provider, and whether you're logging via EELF. Efforts to standardize them are underway so the ones you should be looking for are where pipe (|) is used as a separator. (Previously it was "|").
Retention
Logfiles are often large. Logging providers allow retention policies to be configured.
Retention has to balance:
- The need to index logs before they're removed.
- The need to retain logs for other (including regulatory) purposes.
Defaults are subject to change. Currently they are:
- Files <= 50MB before rollover.
- Files retain for 30 days.
- Total files capped at 10GB.
In Logback configuration XML:
<appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender"> <file>${outputDirectory}/${outputFilename}.log</file> <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy"> <fileNamePattern>${outputDirectory}/${outputFilename}.%d{yyyy-MM-dd}.%i.log.zip</fileNamePattern> <maxFileSize>50MB</maxFileSize> <maxHistory>30</maxHistory> <totalSizeCap>10GB</totalSizeCap> </rollingPolicy> <encoder> <!-- ... --> </encoder> </appender>
Types of EELF Logs
EELF guidelines stipulate that an application should output log records to four separate files:
- audit
- metrics
- error
- debug
This applies only to EELF logging. Components which log directly to a provider may choose to emit the same set of logs, but most do not.
Audit Log
An audit log is required for EELF-enabled components, and provides a summary view of the processing of a (e.g., transaction) request within an application. It captures activity requests that are received by an ONAP component, and includes such information as the time the activity is initiated, then it finishes, and the API that is invoked at the component.
Audit log records are intended to capture the high level view of activity within an ONAP component. Specifically, an API request handled by an ONAP component is reflected in a single Audit log record that captures the time the request was received, the time that processing was completed, as well as other information about the API request (e.g., API name, on whose behalf it was invoked, etc).
Metrics Log
A metrics log is required for EELF-enabled components, and provides a more detailed view into the processing of a transaction within an application. It captures the beginning and ending of activities needed to complete it. These can include calls to or interactions with other ONAP or non-ONAP entities.
Suboperations invoked as part of the processing of the API request are logged in the Metrics log. For example, when a call is made to another ONAP component or external (i.e., non-ONAP) entity, a Metrics log record captures that call. In such a case, the Metrics log record indicates (among other things) the time the call is made, when it returns, the entity that is called, and the API invoked on that entity. The Metrics log record contain the same RequestID as the Audit log record so the two can be correlated.
Note that a single request may result in multiple Audit log records at an ONAP component and may result in multiple Metrics log records generated by the component when multiple suboperations are required to satisfy the API request captured in the Audit log record.
Error Log
An error log is required for EELF-enabled components, and is intended to capture info, warn, error and fatal conditions sensed (“exception handled”) by the software components.
This includes previous logs that went to application.log
Debug Log
A debug log is optional for EELF-enabled components, and is intended to capture whatever data may be needed to debug and correct abnormal conditions of the application.
Engine.out
Console logging may also be present, and is intended to capture “system/infrastructure” records. That is stdout and stderr assigned to a single “engine.out” file in a directory configurable (e.g. as an environment/shell variable) by operations personnel.
Application Log (deprecated)
see example in https://git.onap.org/oom/tree/kubernetes/portal/charts/portal-sdk/resources/config/deliveries/properties/ONAPPORTALSDK/logback.xml
We no longer support this 5th log file - see error.log
Log File Locations
There are several locations where logs are available on the host, on the nfs share and in each application and filebeat container
Logs on each Kubernetes host VM - docker empty.dir shares
Logs on the /dockerdata-nfs share across hosts
Logs on each microservice docker container - /var/log/onap
Logs on each filebeat docker container sidecar - /var/log/onap
Add this procedure to the Project Proposal Template By following a few simple rules: Obligations fall into two categories: You must: They are unordered. New ONAP Component Checklist
What's New
(Including what WILL be new in v1.2 / R2).
- Field separator reverted to pipe.
- Dual appenders in Logback and Log4j reference configurations:
- Indexable, for shipping and indexing.
- EELF, for backward compatibility.
- Minor changes to path conventions.
- XML output deprecated (required only for Log4j1.2, which is also expected to go).
- Improved documentation of semantics and usage (including initialization and propagation via ThreadLocal and HTTP headers) for existing MDCs and attributes.
- Add MDCs/Markers + usage for invocation IDs, allowing call graphs to be built without reliance on heuristics.
- Revisiting persistence (a clear requirement) and rollover settings, based on feedback from operations.
- More discussion of How to Log. (Where previously guidelines were largely concerned with architecture and mechanics).
- Locking in other changes proposed in R1, including MDC serialization, escaping, etc. These can be treated as accepted. (Note that they only affect indexable output).
In addition, we expect to provide (as a Beijing deliverable) a minimal, synthetic component as an example of best-practices, and this will provide all code examples for this guide. (Does that mean the example will log via EELF, or will we end up with two variants?)
Pending Specification Work
id | date | item | details | status |
---|---|---|---|---|
20180614 | MDC ClientIPAddress | Ask question of OPS to remove this field - 20180419 | todo | |
20180614 | MDC ResponseCode / ResponseDescription | expand/find note 1* | todo |
Developer Guide
see separate page (cross releases) in Logging Developer Guide
142 Comments
Michael O'Brien
20180412 - doing group edit to lockdown fields
https://lists.onap.org/pipermail/onap-discuss/2018-April/009020.html
Requirements (formalize into the wiki later)
MDC's
Note: for required response fields - these can be optional for logs that do not log a response
Changes
RequestID changes to TransactionID - proposal to finalize by 20180419 - involves significant code changes
partnerName manditory? 5-10% of teams currently support - need it to differentiate between 3rd party calls inside components - like a JPA/REST library for example
Verify that partnerName represents the granular microservice name so the ops team can target an individual container
Review with OPS for sign off on removal of Server, ServerIP in leu of keeping ServerFQDN
provide spec for onap team for what to write to the pair serverFQDN/clientIPAddress
advise teams on duplication of ServerFQDN and VirtualServerName - these are or hybrid deployements
Michael O'Brien
TODO: for 15 TargetServiceName - look at INVOKE-return marker -
Add scenarios for UCs for marker and MDC logs - per log type
Determine if for C* release dual DEBUG/the-other-3 log files could be reduced from 4 to 2 if the Marker/MDC reporting makes the log type not required except for debug logs
To be clear - 2 files (debug and everything else in a separate one)
Discuss in terms of B/C upgrade path
LOG-231 - Getting issue details... STATUS
Epic for requestID to TransactionID - start discussion for cassablanca LOG-232 - Getting issue details... STATUS
ExternalID field discussion for hybrid environments:
meet 20180419 discussion
Lee Breslau
We should include enough information in this specification so that log records that conform to it can be translated into log records produced by components that are using the separate Audit and Metric log records in the original specification. This is necessary for deployments that will include some components supporting each format.
We should required logs records to be written as follows (which is largely consistent with the discussion above):
1 and 2 can be used to produce the equivalent of an Audit log record. 3 and 4 can produce a metric log record.
Dave Williamson
Along these lines, when a combined logfile is used, should each record include a field specifying which log it belongs to (audit, debug, error, or metrics) so that the translation Lee mentions can be further facilitated?
Dave Williamson
Clear examples of what an audit, debug, error, and metrics entry would look like (whether being logged in 4 separate files or in a combined file) would also be helpful.
Dave Williamson
The need for a clear example of the log content keeps coming up. Let's please try to get that assigned to someone in today's meeting.
Michael O'Brien
20180426: Notes on spec meet
Agenda:
Xuan Liu
TargetEntiry and TargetServiceName fields are relevant to the INVOKE and INVOKE-return, so these fields could be tagged as "C" (context based), instead of "Y"
Michael O'Brien
editing now - thanks
Xuan Liu
i quit the editing mode, i didn't make any changes yet
Michael O'Brien
notes: 20180503 review
(10 participants) - screencap
Luke Parker
I've just put together a SLFJ reference example. The original purpose was to capture some representative call graph data for use in a diagram, but in order to do that, I've put all (or most, as we'll see) of the attributes through their paces. I'll commit the implementation to the logging-analytics project shortly (to logging-analytics/reference/slf4j-reference), but in the meantime, here's some feedback. (Brace yourselves.)
Lee Breslau
Luke,
Two comments:
Lee
Luke Parker
Hi Lee,
Quite right, and I do understand, and didn't mean to come on so strong. We've been focused on the specification, and I was giving feedback from the perspective of a developer going about implementing it from the bottom up.
Distilling those points down a little:
Thanks,
Luke
Michael O'Brien
From 20180515 meet
discussion on async invoke - as in providing a mechanism to provide for async callbacks (websocket like...)
relax requirements on caller - defer to responder to mark the return but not necessarily via callback to originator
In terms of tracking the invocationId
TODO: merge duplicate MDC tables
TODO: group based on luke's: IDs, service, addresses, response
Review service mesh email on logging overlap
Chris Lott
This page suggests, "Consider using EELF." Please explain the case for EELF.
I see the current initiative to use just a couple of log files, not segregated into audit/metrics/error/application files as per EELF. I also see the current initiative to use markers, which are not supported by EELF.
I'm trying to say, these facts seem like a strong argument against using EELF. Using SLF4J looks like the most straightforward approach to meeting these logging specifications.
Thanks in advance for comments & guidance.
Steve Smokowski
I agree with going straight SL4J approach, overall I find EELF overbearing and confusing. It is mostly just a limited implementation.
Luke Parker
My preference, too. The "Consider using EELF" was carefully worded. EELF is idiosyncratic, but many existing components use it, and that won't change any time soon.
The spec is behavioral, and the aim is to be compatible with any library or language. There are still a few gray areas, like EELF's multiple logfiles and hardwired (e.g. com.att.eelf.audit) logger names, and these take time to resolve because nobody's entirely sure who or what depends on them. They can largely be spun out into provider configuration, though, which in theory means you can safely multiplex output into a single file, since the provider configuration can be overridden to restore EELF-like behavior where required. (Like I say though, it's still a gray area).
There's a nascent reference implementation for SLF4J in logging-analytics/reference/logging-slf4j (logging-analytics/reference/slf4j-reference pending https://gerrit.onap.org/r/#/c/48433/) which might be of some interest. It has few dependencies and aims to be a very light touch, but you can also use it as a (Casablanca) example of SLF4J logging without incurring anything.
Xuan Liu
I added a detailed toy example on the markers - EELF (audit/metric) equivalent implementation for discussion.
A quick description on the following example:
Assume a request is going through three components: A, B and C, where C calls A to invoke B to do something. With EELF log implementation, A will generate one audit log entry and one metric log entry, and B will generate one audit log entry corresponding to A's metric log. The requestID (r1) will be carried through in the whole transaction.
With Marks/MDCs implementation, we are showing the EELF equivalent below, with invocationID and markers.
Thanks,
Xuan
Lee Breslau
Xuan, thanks for putting together and posting this example. I think it embodies several of the points we've discussed. Key among them: this shows why we need both the EntryTimestamp and InvokeTimestamp MDCs.
Another point that may be implicit in your example but could be with calling out: t1 < t2 < t3 . . . . . which shows the sequence of events that generate these particular log records.
Shishir Thakore
Thanks Xuan, I concur with Lee's comments.
For marker, let’s have two phase approach.
Phase 1: Casablanca release – Introduce marker concept in Casablanca release. All ONAP components comply with Casablanca logs specs (MCDs) and use marker concept in logs. We will still writes logs in 4 separate files (audit, metrics, error, debug). At end of Casablanca release we will evaluate marker concepts pros/cons.
Phase 2: Based on Casablanca marker concept evaluation, we will execute the file consolidation in next release.
Dave Williamson
This page still refers to X-TransactionID as the HTTP header that will carry the RequestID in REST requests. I thought we had decided a while back that we would use X-ONAP-RequestID as the HTTP header. Has that been re-thought or was the change never made to this page?
Dave Williamson
We had an internal AT&T meeting today and discussed the need to clearly state the expectation that RequestID be propagated across all interfaces, not just REST Interfaces. We may also wish to consider specifying how that should be done for currently known non-REST interfaces, such as via the DMaaP Message Router and DMaaP Data (File) Router.
Shishir Thakore
We have another internal AT&T discussion on PartnerName and ServiceName definitions. Ops and development teams recommendation on definitions.
Shishir Thakore
Here are updated definitions based on Ops and development team input
The IDs that need to be covered with the above Attributes are
- VNF_ID OR VNFC_ID : (Unique identifier for a VNF asset that is being instantiated or that would generate an alarms)
- VSERVER_ID OR VM_ID (or vmid): (Unique identified for a virtual server or virtual machine on which a Control Loop action is usually taken on, or that is installed as part of instantiation flow)
- PNF : (What is the Unique identifier used within ECOMP that we would like used in ONAP)
Michael O'Brien
Example output from Luke Parker 's Marker example junut tests in logging-slf4j-demo
Michael O'Brien
Acumos logging meet
Michael O'Brien
mike talk to mike
Michael O'Brien
Clamp meeting: they don't need ELK as a Service but will collaborate with common charts
Cheuk Yiu Horace Ip
Thanks for the example.
Regarding INVOKE records that contains 2 invocation ID's (The old one is in the MDC, the new one follows INVOKE), would it be more consistent if both invocation IDs are in the MDCs (the new one as a conditional field)?
Michael O'Brien
this makes sense - I will prototype this
Xuan Liu
Horace,
Thanks for pointing out the two invocation IDs in the example. I updated my previous example with a new invocationID added to the Invoke/Invoke-return log record. So the question is whether we will define a new attribute in MDC "InvocationIDOut" (or some other name making more sense) to the spec, and this new field only applies for the logs with Invoke/Invoke-return markers.
Michael O'Brien
20180614: notes
Michael O'Brien
220180621: Notes
add links to actual library code (Java is more mature for now) https://git.onap.org/logging-analytics/tree/reference/logging-slf4j
https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogAdapter.java
via
https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/test/java/org/onap/logging/ref/slf4j/ONAPLogAdapterTest.java
and flow related
https://git.onap.org/logging-analytics/tree/reference/logging-slf4j-demo/src/test/java/org/onap/logging/ref/slf4j/demo/component/AbstractComponentTest.java
add links to devops page Logging User Guide#LoggingDevOps and ONAP Development#KubernetesDevOps
q) other wrappers in additions to slf4j - yes we can add other wrappers
Q) on phased (two logstash configs ) - LOG-493 - Getting issue details... STATUS
todo: remember Acumos operations use case refs and marker scenarios including when applicable in devops section
There will be updates to public acumos use cases - do a 2nd pass
Q) on multiple logback.xml in each project
TODO email Shishir Thakore the security meet and aaf meet
add code references to the right side of each table entry into the junit test for now.
Work items
Sharon Chisholm
Some comments:
Michael O'Brien
There is an evolving section of using/deploying/debugging the library - parts of this as it is finalized/tested/verified will be retrofitted here
ONAP Development#DeveloperUseoftheLoggingLibrary
to
LoggingLibraryLocationandUse
off
LOG-115 - Getting issue details... STATUS
LOG-137 - Getting issue details... STATUS
Prudence Au
I think # tags are being used: MDC-RequestID
Michael O'Brien
Also spring AOP example in progress - ONAP Development#LoggingWithAOP under LOG-135 - Getting issue details... STATUS
Ruoyu Ying
One dumb question, according to this page, slf4j seems to be the best choice to support all these MDCs and Markers in order to fulfill our requirements . However, our code is written in python, is the modules of slf4j supported by python ? Or is there any replaceable ones? Thanks in advance.
Best Regards,
Ruoyu
Michael O'Brien
Ruoyu, Hi we are still finalizing the specification adherence in the python library - submitted by the Multicloud team at https://git.onap.org/logging-analytics/tree/pylog
Dave Williamson
Hi Michael, do you happen to know when the Python library is expected to be finalized?
Michael O'Brien
I need to learn more python first and retrofit changes after the java work is completed
Ikram Ikramullah
What's the expectation from Python based applications for logging in Casablanca release?
Shankaranarayanan Puzhavakath Narayanan
Michael O'Brien, OOF was planning to make some progress along the adherence to logging 1.2 spec for Casablanca. Since we are a python project, the readiness of the Python library is a key turning point for us. We could still do Level 1 and implement a local library/wrapper, but I'm worried it may be wasted efforts if the python library is close to being rolled out (in Casablanca or Dublin). Given this, I'd be grateful for your recommendation for our project, given the time left in Casablanca.
Dave Williamson
The spec is inconsistent in its references to the metric/metrics log. In some places, metric is used; in other places, metrics. Which is it? I do not think it matters much which we use, rather we need to be completely consistent in the spec so that we at least have a chance of consistency in implementation.
Luke Parker
https://github.com/att/EELF/blob/master/EELF/src/main/java/com/att/eelf/configuration/EELFLogger.java and most provider config calls it "metrics", so I guess that's what we should standardize on.
Michael O'Brien
updated page
Michael O'Brien
TODO: eclEmma coverage details and howto, using "recheck" to rerun oom job builder timing errors
Michael O'Brien
Acumos meeting minutes: 20180719:
Michael O'Brien
Acumos meet 20180802
Dave Williamson
The following fields appear to be missing from the table:
For clarity, if these are indeed expected, then they should probably be in the table.
Michael O'Brien
We will need to discuss this in both the logging spec meeting for onap at 11am tue and the acumos ptl and log meetings (9am tue, 1130 thu) - since they follow the onap spec with modifications
https://wiki.acumos.org/display/OAM/Log+Standards
added "Message" MDC
did not add "ThreadID" - because it should be defined in the logback.xml as a regular field
TODO: Michael O'Brien add section on common attributes (not MDCs) but manditory via provider api's (logback.xml) - timestamp, logger, level, threadid
example where we have timestamp, thread, level and logger (based on classname in this example)
2018-07-09T14:48:01.014Z http-nio-8080-exec-8 INFO org.onap.demo.logging.ApplicationService
via
LoggerFactory.getLogger(this.getClass()).info("Running /health");
Dave Williamson
I was just talking to the ASDC team and they want to know if there is a reason why "message" should be an MDC. Does it really need to be saved for later? Not sure if perhaps there are scenarios where this makes sense and thus it should be an MDC.
Dave Williamson
Per today's meeting, I understand that "Message" will be added to the table of common attributes and will not be added as an MDC.
Dave Williamson
With regard to ServerFQDN, this may be difficult to obtain inside a container. For example, within a pod, calls to hostname return the name of the pod, not the name of the VM or server on which the pod is running. Within a Docker container, hostname returns the id of the container. Thus, the VM or server name is essentially hidden from inside the pod and the container.
This spec currently defines the content of ServerFQDN as follows:
"This field contains the Virtual Machine (VM) Fully Qualified Domain Name (FQDN) if the server is virtualized. Otherwise, it contains the host name of the logging component."
What about inside a Pod or a container?
Two possibilities I can think of:
Other ideas beyond these two are certainly welcome.
What I think is missing from my understanding is what an operations team for an ONAP deployment might be expecting to do with this information. In particular, #1 above seems the easiest from the perspective of getting the information and not impacting anything outside of logging, but is it sufficient? If it is, then I would say we should go with #1.
Michael O'Brien
expanding on container level for ServerFQDN
vm (no docker)
vm<delimeter>,dockerid - (docker-compose)
vm<delimeter>,pod<>containerID (kubernetes) - what about namespace
example
onap onap-log-elasticsearch-7557486bc4-9h7gf 1/1 Running 0 7d
- only 9h7gf varies on restarts
expand the field's description - likely just return whatever the is output from inside the container
Need to show production level vm name (plus) - currently running in dev only getting "localhost"
Dave Williamson
Based on the investigation Michael has done here, I think I feel comfortable with the idea of logging whatever hostname returns. This is simple and appears to be reasonably useful. I would like to see us finalize this during the ONAP logging call tomorrow if at all possible.
Michael O'Brien
Updating based on testing on a single node k8s server and a clustered k8s system to see if serverFQDN returns the vm name
currently I am getting the node that the pod is running on - but I should get dev0, dev1 for clustered nodes
ServerFQDN=dev.onap.info
note this is just the k8s service itself - not the pod
ClientIPAddress=10.42.0.1
ubuntu@ip-172-31-48-194:~$ kubectl get services --all-namespaces
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 61d
Dave Williamson
With regard to InstanceUUID, the SDC team has raised some interesting points with regarding to InstanceUUID inside a container:
Note that if #1 is pursued, this could possibly duplicate what is reported in ServerFQDN if the other question I just posted is resolved to provide the Pod ID in ServerFQDN.
Michael O'Brien
good point the pod id is unique enough
onap-log-elasticsearch-7557486bc4-9h7gf
plus the actual container - if there are sidecars
choose one of: [logdemonode-filebeat-onap logdemonode]
verify
node / pod / container - levels
should we make UUID optional and include pod ID for InstanceUUID - yes (pass in kubernetes id if available
Michael O'Brien todo - fix the table and description for InstanceUUID - keep the name
Chris Lott
Will the ONAP reference implementation library https://git.onap.org/logging-analytics be released anytime soon? I see version 1.2.2 in the ONAP staging repository but not in the releases repository.
Michael O'Brien
I would expect we will run the release artifacts when we pass RC2 for casablanca - for now you can use the lastest snapshot jar in the staging repo.
Chris Lott
I can use the artifact locally sure, but in the Acumos project at LF, the Jenkins settings are configured to ignore staging repos. Acumos doesn't have the special one-off fix to the nexus server :/ so having a bona fide released library is important.
How about releasing v 1.2.2, and when you need to change it, bump up the version and release again? Doesn't cost any extra ..
Michael O'Brien
Questions/answers for parichay gupta answered on https://lists.onap.org/g/onap-discuss/topic/logging_standards/24231150?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,24231150
Steve Smokowski
How do you log ClientIpAddress for scheduled tasks? Do you just want something like 127.0.0.1? Since it is an internally started item?
Michael O'Brien
Yes, IP address may not really resolve to anything usable inside a k8s cluster - it is more for hybrid deployments where the client is outside the cluster - in the case of a k8s deployment it currently resolves to 10.42.0.1
Steve Smokowski
Partner name description seems off unless I am reading it wrong? Is it really the target uri?
Michael O'Brien
Right, thanks for the catch
in unauthenticated calls - the service/agent (part of the url) making the call
in authenticated calls - the userid
adjusted table text
Steve Smokowski
Is there an obsolescence chart, where it kind of describes what was removed, or has been replaced by a new MDC value? I find it difficult to look at what I was previously logging, then the new format and trying to align it all.
Michael O'Brien
good idea, was using the diff in wiki - I will put back the deleted fields - more so since the last 2 weeks where we were talking about implementing a phased approach (no all or nothing)
Michael O'Brien
Adding larger logging committers and onap to acumos mail.
From: LOTT, CHRISTOPHER M (CHRISTOPHER) <clott@research.att.com>
Sent: Friday, August 17, 2018 9:51 AM
To: MANTHAN, MUKESH <MM00542237@techmahindra.com>; GUPTA, PARICHAY <PG0057291@techmahindra.com>
Cc: SULLIVAN, BRYAN L (BRYAN L) <bryan.sullivan@research.att.com>; Michael O'Brien <Frank.Obrien@amdocs.com>; SUBRAMANIAN, NATARAJAN <Natarajan.Subramanian@techmahindra.com>; TALASILA, MANOOP (MANOOP) <talasila@research.att.com>; Dooze Philippe IMT/ Oln <philippe.dooze@orange.com>; DEY, SPONDON <sd4351@att.com>; JACOBSON, GUY J (GUY J) <guy@research.att.com>; WELCH, LORRAINE A <lb2391@att.com>; SARKAR, ANANDARUP (ANANDARUP) <asarkar@research.att.com>; UKASICK, AIMEE L (AIMEE L) <aimeeu@research.att.com>
Subject: Sample logging code and logback.xml for possible Acumos component use
Importance: High
Good morning, I didn't have time to attend all your committee's meetings, but based on your wiki page
https://wiki.acumos.org/display/OAM/Logging
I started working with ONAP-style logging in the CDS server component. I reused code and config from the ONAP log-analytics project
https://git.onap.org/logging-analytics/tree/reference/logging-slf4j
A digression: I tweaked this some, I didn't feel this was 100% drop-in usable because of the "ONAP" prefix in several of their HTTP request header names, which probably isn't right for Acumos.
Anyhow for Spring-Boot users, I can offer sample code to add some mapped diagnostic context (MDC) values on each REST request. Those MDC key-value pairs then become available to the log output pattern. Requires no change to individual controllers, which may be an accelerator for some projects. The key class is below. You can fetch the rest from the CDS project gerrit repository.
I also can offer a logback.xml config file, see below. Now, bear with me, mine uses the tab-separated MDC-key-value format that your committee seems to have decided should be postponed to release 2/B. I happen to like that format, think it's a lot prettier :) than the 30+ column pipe-delimited fixed format that's proposed for release A, so maybe this is not of value to you. But please note that the output format is ENTIRELY determined by the logback.xml file, so log-generating components can choose fixed or variable columns trivially, just drop in the right file in one place. But I don't know the OAM impact - Parichay, please tell me how difficult it is for you to accommodate multiple log formats in OAM.
Just wanted to toss in my $0.02 and code samples, HTH
---
/**
* Adds request details to the mapped diagnostic context (MDC) so they can be
* logged. <BR>
* http://www.devgrok.com/2017/04/adding-mdc-headers-to-every-spring-mvc.html
*/
public class LoggingHandlerInterceptor extends HandlerInterceptorAdapter {
/**
* Tracks the set of keys added to MDC.
*/
private ThreadLocal<Set<String>> storedKeys = ThreadLocal.withInitial(() -> new HashSet<>());
/**
* Copies key-value pairs from HTTP request to MDC context. Unfortunately they
* use different conventions for key naming.
*/
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
throws Exception {
addKey(ONAPLogConstants.MDCs.SERVER_FQDN, InetAddress.getLocalHost().getCanonicalHostName());
addKey(ONAPLogConstants.MDCs.CLIENT_IP_ADDRESS, request.getRemoteAddr());
final String requestId = request.getHeader("X-Request-ID");
if (requestId != null)
addKey(ONAPLogConstants.MDCs.REQUEST_ID, requestId);
return true;
}
private void addKey(String key, String value) {
MDC.put(key, value);
storedKeys.get().add(key);
}
// request ended on current thread remove properties
@Override
public void afterConcurrentHandlingStarted(HttpServletRequest request, HttpServletResponse response, Object handler)
throws Exception {
removeKeys();
}
@Override
public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex)
throws Exception {
removeKeys();
}
private void removeKeys() {
for (String key : storedKeys.get())
MDC.remove(key);
storedKeys.remove();
}
}
---
<?xml version="1.0" encoding="UTF-8"?>
<!--
===============LICENSE_START=======================================================
Acumos
===================================================================================
Copyright (C) 2017 AT&T Intellectual Property & Tech Mahindra. All rights reserved.
===================================================================================
This Acumos software file is distributed by AT&T and Tech Mahindra
under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
This file is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
===============LICENSE_END=========================================================
-->
<configuration scan="true" scanPeriod="3 seconds" debug="false">
<!-- component name is log file basename -->
<property name="componentName" value="cmn-data-svc"></property>
<!-- Use a subdirectory for ease of mounting a volume in docker -->
<property name="logDirectory" value="logs" />
<property name="p_tim" value="%d{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC}"/>
<property name="p_thr" value="%thread"/>
<property name="p_lvl" value="%level"/>
<property name="p_log" value="%logger"/>
<property name="p_mdc" value="%replace(%replace(%mdc){'\t','\\\\t'}){'\n', '\\\\n'}"/>
<property name="p_msg" value="%replace(%replace(%msg){'\t', '\\\\t'}){'\n','\\\\n'}"/>
<property name="p_exc" value="%replace(%replace(%rootException){'\t', '\\\\t'}){'\n','\\\\n'}"/>
<property name="p_mrk" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>
<property name="singleLinePattern" value="%nopexception${p_tim}\t${p_thr}\t${p_lvl}\t${p_log}\t${p_mdc}\t${p_msg}\t${p_exc}\t${p_mrk}\t%n"/>
<property name="multiLinePattern" value="%nopexception${p_tim}\t${p_thr}\t${p_lvl}\t${p_log}\t%mdc\t%msg\t%rootException\t%marker\t%n"/>
<appender name="console" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>${multiLinePattern}</pattern>
</encoder>
</appender>
<appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${logDirectory}/${componentName}.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${logDirectory}/${componentName}.%i.log.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>9</maxIndex>
</rollingPolicy>
<triggeringPolicy
class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>5MB</maxFileSize>
</triggeringPolicy>
<encoder>
<pattern>${singleLinePattern}</pattern>
</encoder>
</appender>
<appender name="asyncFile" class="ch.qos.logback.classic.AsyncAppender">
<queueSize>256</queueSize>
<appender-ref ref="file" />
</appender>
<!-- using level debug yields enormous output from spring and friends -->
<root level="info">
<appender-ref ref="console" />
<appender-ref ref="asyncFile" />
</root>
<!-- Apache HttpClient has verbose debug output; silence it -->
<logger name="org.apache.http" level="INFO" />
<!-- Spring client has verbose debug output; silence it -->
<logger name="org.springframework.web.client" level="INFO" />
</configuration>
Michael O'Brien
Just updated the acumos table with Lorraine Welch - retrofitting ONAP here based on the pipe delimited 25 column spec for audit.log
https://wiki.acumos.org/pages/viewpage.action?pageId=10028842
based on the onap/portal/sdk logback.xml pattern for audit - use this pattern for all 4 logs (audit, error, metrics, debug)
https://git.onap.org/oom/tree/kubernetes/portal/charts/portal-sdk/resources/config/deliveries/properties/ONAPPORTALSDK/logback.xml#n59
Steve Smokowski
Why are we requiring the usage of the logging library to dictate levels? Whom decided this? I have no intentions of ever using the library as I feel it does not provide sufficient value, so I assume we will never reach above level 1?
Michael O'Brien
Sounds good we are planning on using the portal/sdk library.
Levels can be set independently via the 4 log appenders in logback.xml outside of a library
<logger name="org.onap.eelf.audit" level="info" additivity="false">
<appender-ref ref="asyncEELFAudit" />
</logger>
<logger name="org.onap.eelf.debug" level="debug" additivity="false">
<appender-ref ref="asyncEELFDebug" />
</logger>
<logger name="org.onap.eelf.error" level="info" additivity="false">
<appender-ref ref="asyncEELFError" />
</logger>
<logger name="org.onap.eelf.metrics" level="info" additivity="false">
<appender-ref ref="asyncEELFMetrics" />
</logger>
LOG-631 - Getting issue details... STATUS
Steve Smokowski
I meant more like measuring levels of compliance? It appears in order to be Level 3 compliant I am forced to use your library? Why can I not just use SL4J and meet your logging spec?
IE this
Level 2: Log specification compliance via SLF4J supplied logging-analytics library
Use Luke's SLF4J library directly as wrapper for MDCs and Markers by calling the library from within each function.
Level 3: Log specification compliance via AOP library over SLF4J supplied logging-analytics library
Use a spring based Aspect library that emits Markers automatically around function calls and retrofit your code to log via Luke's SLF4J for internal log messages.
Michael O'Brien
right, you can do level 1 and continue to use your own code to make the request headers. For logback.xml eventually a single file and pattern will be communicated and phased in - for now you are on your own in making sure the 29 fields are in where available.
/michael
Steve Smokowski
Do you have a draft of the logback.xml? or some place i can pull it from?
Michael O'Brien
went through a refactor to align with Acumos last week (same people in ONAP)
Logging Developer Guide#Logback.xml
in progress at
https://gerrit.onap.org/r/#/c/62405
focus is on consolidating all the oom logback.xml files into 1 to fix the configMap size issue
Michael O'Brien
For the p_log or logger field missing from acumos that I added back to onap - I was wrong this is not for markers - this is the logger itself on the class - usually the same as the class name.
notice new field p_logger at pos 25 - after user
notice new field p_mdc at pos 26 and p_marker at 28 (after Message at 27)
<property name="p_log" value="%logger"/>
The marker field I am adding back a p_mak so it displays what lukes slf4j library already emits
<property name="p_mak" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>
For the msg to Message field - need to adjust the code because
-<property name="p_message" value="%replace(%replace(%msg){'\t', '\\\\t'}){'\n','\\\\n'}"/>
+<property name="p_message" value="%replace(%replace(%Message){'\t', '\\\\t'}){'\n','\\\\n'}"/>
breaks
|%PARSER_ERROR[Message]
for the serverFQDN that outputs the dns entry of the host vm for the k8s cluster - i am retesting on a cluster so we would expect dev2.onap.info instead of dev.onap.info
merging
<property name="p_tim" value="%d{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UTC}"/>
<property name="p_lvl" value="%level"/>
<property name="p_logger" value="%logger"/>
<property name="p_mdc" value="%replace(%replace(%mdc){'\t','\\\\t'}){'\n', '\\\\n'}"/>
<property name="p_message" value="%replace(%replace(%msg){'\t', '\\\\t'}){'\n','\\\\n'}"/>
<property name="p_exc" value="%replace(%replace(%rootException){'\t', '\\\\t'}){'\n','\\\\n'}"/>
<property name="p_marker" value="%replace(%replace(%marker){'\t', '\\\\t'}){'\n','\\\\n'}"/>
<property name="p_thr" value="%thread"/>
<!-- property name="pattern" value="%nopexception${p_tim}\t${p_thr}\t${p_lvl}\t${p_log}\t${p_mdc}\t${p_msg}\t${p_exc}\t${p_mak}\t%n"/-->
<property name="pattern" value="%nopexception${p_tim}|%X{EntryTimestamp}|%X{InvokeTimestamp}|%X{RequestId}|%X{InvocationId}|%X{InstanceUUID}|%X{ServiceInstanceId}|%thread|%X{ServiceName}|%X{PartnerName}|%X{StatusCode}|%X{ResponseCode}|%X{ResponseDesc}|%.-5level|%X{Severity}|%X{ServerIPAddress}|%X{ElapsedTime}|%X{ServerFQDN}|%X{ClientIPAddress}|%X{VirtualServerName}|%X{ContextName}|%X{TargetEntity}|%X{TargetServiceName}|%X{TargetElement}|%X{User}|${p_logger}|${p_mdc}|${p_message}|${p_marker}%n" />
for
2018-08-28T16:26:57.887Z||2018-08-28T16:26:57.883Z|||7741c447-cd4f-4f0a-9e7a-9abc3eebe8c1||http-nio-8080-exec-30|/logging-demo/rest/health/health|||||INFO||||localhost|0:0:0:0:0:0:0:1|||||||org.onap.demo.logging.ApplicationService|ResponseCode=, InstanceUUID=7741c447-cd4f-4f0a-9e7a-9abc3eebe8c1, RequestID=c897a12f-fa51-4450-8645-3ee907946695, ServiceName=/logging-demo/rest/health/health, ResponseDescription=, InvocationID=067ba289-9017-462a-8d68-aa2670a04205, Severity=, InvokeTimestamp=2018-08-28T16:26:57.883Z, PartnerName=, ClientIPAddress=0:0:0:0:0:0:0:1, ServerFQDN=localhost, StatusCode=||EXIT
${p_logger}| 26
The name of the class doing the logging (in my case the ApplicationController – close to the targetservicename but at the class granular level
${p_mdc}|27
The key/value pairs all in one pipe field (will have some duplications currently whit MDC’s that are in their own pipe – but allows us to expand the MDC list – replaces customvalue1-3 older fields)
${p_marker}%n" 29
The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file
Michael O'Brien
Talking with Lorraine - follow her classifications to handle the mdc and marker fields
log system type = marker
https://wiki.acumos.org/display/OAM/Log+Standards?focusedCommentId=10028838&refresh=1535137583955#comment-10028838
Michael O'Brien
For Casablanca log file locations should be /var/log/onap - not /var/log/ONAP - the spec will be adjusted
/michael
Prudence Au
Frankfurt spec has the location updated.
Michael O'Brien
LOG-634 - Getting issue details... STATUS
see
https://wiki.onap.org/display/DW/LOG+Meeting+Minutes+2018-06-26
and the spec
https://wiki.onap.org/pages/viewpage.action?pageId=28378955#ONAPApplicationLoggingSpecificationv1.2(Casablanca)-MDC-RequestID
we need a way to support client/server calls between pods running the old or new versions of the http headers
[~pau2882] is running into this during communication from pomba aggregator to context builder (based on AAI which is using the pre-casablanca headers)
Larger issue is teams are not aware of the spec change - because we not communicated it yet.
Plan is to support non-casablanca spec in casablanca and communicate the spec in dublin
Michael O'Brien
TODO:
LOG-635 - Getting issue details... STATUS
[~pau2882] and [~bbenesch] found this discrepancy in the spec when they used the X-ONAP-PartnerName header attribute
We already expose support for the header
public static final String PARTNER_NAME = "X-ONAP-PartnerName";
https://git.onap.org/logging-analytics/tree/reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogAdapter.java#n313
add to the spec under
https://wiki.onap.org/pages/viewpage.action?pageId=28378955#ONAPApplicationLoggingSpecificationv1.2(Casablanca)-MDC-PartnerName
Steve Smokowski
Response Code and Response Description, seem to indicate they are conditional and only populated on error scenarios based on the text description. however they are marked as mandatory. Do you expect us to always return these values?
Steve Smokowski
Invoke and Invoke-return seem semi confusing, are you asking for the invoke marker on the initial request?, then invoke-return on the response from that component? The current sample implementation does not even have an invoke-return marker.
Prudence Au
Yes, invoke marker is for the initial request.
Yes, invoke-return marker is for the response.
Added LOG-1174 - Getting issue details... STATUS to track adding a sample implementation of invoke-return marker.
Michal Dziedzic
I use
defaultPattern
as it is mentioned above. The problem is that my output looks like:org.onap.dcaegen2.services.prh.MainApp |2018-09-06T11:38:06.527Z |INFO |\|T\|h\|e\| \|f\|o\|l\|l\|o\|w\|i\|n\|g\| \|p\|r\|o\|f\|i\|l\|e\|s\| \|a\|r\|e\| \|a\|c\|t\|i\|v\|e\|:\| \|d\|e\|v\| |\| |\| |\| |main |
removing "%replace(... ){'|','\\\\|'}" makes it look better:
org.onap.dcaegen2.services.prh.MainApp |2018-09-06T11:49:12.722Z |INFO |The following profiles are active: dev | | | |main |
However all is in single line. Is it good pattern or I'm doing something wrong? https://gerrit.onap.org/r/#/c/64993/
Michael O'Brien
Steve Smokowski
I noticed something similar as well, when using the newer format
Michael O'Brien
I am seeing the following with the WIP https://gerrit.onap.org/r/#/c/62405/ however I am still missing some field values - specifically the empty tabs after "INFO" so I may have the same issue - investigating....
Michael O'Brien
Acumos meet min - some more alignment on the logback spec - we released the final version for Casablanca (ONAP) and Athena (Acumos)
the Dublin version ONAP Application Logging Specification - Post Dublin will contain changes post M4 code freeze (20180920) and athena for Acumos (20180914)
synced
version 30 of
https://wiki.acumos.org/display/OAM/Log+Standards
with version 111 of
ONAP Application Logging Specification v1.2 (Casablanca)
details on
LOG Meeting Minutes 2018-09-20 - M4 Code Freeze
Michael O'Brien
TODO: correlate "X-FromAppId" existing in code like SO and AAI to a field here in the new spec like TargetEntity for values like "SDC-BE"
in VFC it is at the container not pod level, in SO it is just the pod like "MSO" in AAI it is required - or an exception will be thrown
X-FromAppID is not generated like we do for requestID via X-Transaction...
Dave Williamson
We are seeing an issue with multiple concurrent end-to-end test environments running multiple releases. Let's suppose we have two end-to-end test environments T1 and T2, each of which is a complete ONAP deployment, but we only have one instance of a log repository (e.g. ELK) in which we are keeping the logs. I am looking for a solution for how to tell the two environments apart in the logs. Granted I could perhaps look at things like host/VM names, but what if I have containers that are not from the same environment or release on the same VM?
it is starting to seem that the only way to really solve this once and for all is to put an environment and release indicator into every log message. This could work as follows:
Does anyone have any other ideas for how to distinguish between environments and releases or how to get environment and release indicators into the log?
Note this is probably something for Dublin if not for the "E" release.
Michael O'Brien
Good point Dave, I will be landing soon and will address this issue of differentiation - likely with a global k8s tag in the root values.yaml common chart that is injected as an env variable - so deployments are self aware. Definitely for Dublin - and likely for other teams.
LOG-707 - Getting issue details... STATUS
nice
/michael
Jorge Hernandez
Hello Michael O'Brien and #logging team. These are some comments for your consideration, as we are in the initial phases of scoping out the Dublin work::
1. L0-L3 compliance
L0-L3 logging compliance levels seem more oriented to java projects and a progressive move to use the logging reference library. It seems to me though that compliance of the actual "logging entries content and format", "file location", etc .. is not 100% guaranteed at higher levels, while a project at a lower level may be compliant in all those aspects. A thought is that it may be a better measure to define the levels terms of adherence to the spec as a blackbox. For example, and just to say: L0: conformance to spec (measurable by some tooling that checks the format), L1: configuration at /var/log/onap/<component>/<subcomponent>, ..
2. Logging reference implementation enhancements.
I realize this may be difficult, but it would be great to have a logging reference implementation that embeds natively as a provider under the slf4j umbrella (as the logback and log4j libraries do), preserving the universally accepted logging APIs provided by slf4j, and not to lose precision in some formatting flags. For example, the logging reference implementation, I think, since it is wrapping on top slf4j, would lose functionality with regards to the %C, %L, and %M, as logging entries would point to the wrapper library, instead of the real location at where the emission of the logging statement took place. These are preserved by a native implementation, such as logback and log4j do as direct providers under slf4j.
3. Semantics of audit/error/metrics files?
It was not clear if the audit/error/metrics/debug types of files are preserved, and some references to an unified log file seem to be suggested in the comments. It seems to me that the semantics attached to the audit/error/metrics (but not debug) as they were originally defined is what is mostly of interest for the analytics team and the spec.
4. Scope of the spec
I gather the impression, although not 100% clear, that the spec would apply to all logging activities for any given component. I think there should be a clear partition into 2 categories of logging files in the spec for individual ONAP components to produce:
a) the first category would apply to those logs that are relevant from a global view of the system, and are subject to analytics processing (the consumer would be an ONAP operator), and
b) the second category for those that are only of relevance to particular development teams, that can only make sense of by development teams with intimate domain specific knowledge, and not particularly useful to everybody else. This b) category will also include those files that individual component teams consider of little value to be fit in the a) category plus those that cannot conform with the spec from 3rd party software (such as databases, middleware, and libraries).
Those in category a) are the ones that should be suitable to global analytics processing, and should be required strict compliance to the logging spec. This would also alleviate performance issues, and will prevent the cluttering of "global" records with unnecessary information. It is well known of the significant performance overhead that verbose logging can imposed in a system, and more importantly seems to me this will greatly impact system performance with the post-mortem analytics processing (for example some libraries may log any partial socket read or write from/into the wire when enabled in debug or trace mode).
For those in b) category, I suggest to explicitly be given some degree of leeway in compliance in the spec. The leeway may be in terms of formatting, the current spec dictates very verbose logging records, and don't seem very effective when involved in debugging and troubleshooting activities with multiple log files. It also requires special crafting when dealing with async systems where the MDC structure is not straightforward to maintain. On the other hand, I think some aspects from the spec should be preserved such as 30 day retention period.
Terry Schmalzried
I find it impossible to implement this ONAP Logging specification from python because the end goal – the log record format to be output – is not finished. Please complete the Text Output section.
I tried to interpret the reworked spec example from the following link but could not be certain what text would be output for all fields:
https://gerrit.onap.org/r/#/c/62405/20/reference/logging-kubernetes/logdemonode/charts/logdemonode/resources/config/logback.xml
A few examples:
1. It has at least three timestamp fields but only one of them is shown with a format.
2. The ElapsedTime field format is not described.
3. The MDC field 27 is shown in an example in the Text Output section we are told to ignore for now, but as currently described (key=value, delimited by commas, newlines and tabs escaped) it won't be parseable if any of the MDCs contains a comma. (Also for ACUMOS that uses pipes instead of tabs as a delimiter, it won't be parseable if any of the MDCs contains a pipe.) Instead of trying to invent a new serialization, why not use a proven method like converting MDCs to a JSON string? Then escape the delimiter when putting it into a delimited field of your output log record?
Kevin Smokowski
https://github.com/att/EELF gives a 404 for me.
Kevin Smokowski
It sounds like EELF is a facade of a facade (slf4j). The link to EELF is broken what is the purpose of EELF if to use MDCs slf4j wrapper has to be used? Slf4j is known by most java developers their website and code is easily found and working...
Gerard Nugent
I am a little confused on the difference between what the Audit and Metrics log actually log. Reading the description I assumed the Audit log had the API call and the successful response (where relevant). Conversely, I picked up that the Metrics log could log every step from the call to the return. Unless I am misreading, the above example https://gerrit.onap.org/r/#/c/62405/20/reference/logging-kubernetes/logdemonode/charts/logdemonode/resources/config/logback.xml has the two files logging the exact same information.
Steve Smokowski
So Audit ideally would be incoming calls to your server and its final response. Metric would be outgoing calls from your server to another Server and its response. I agree overall the split is rather confusing.
In SO we use the following configurations
For Audit we filter like this.
<filter class="ch.qos.logback.core.filter.EvaluatorFilter">
<evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
<marker>EXIT</marker>
</evaluator>
<onMismatch>DENY</onMismatch>
<onMatch>ACCEPT</onMatch>
</filter>
On Metric we filter like so
<filter class="ch.qos.logback.core.filter.EvaluatorFilter">
<evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
<marker>INVOKE_RETURN</marker>
</evaluator>
<onMismatch>DENY</onMismatch>
<onMatch>ACCEPT</onMatch>
</filter>
Dave Williamson
It looks like p_message and p_marker are defined as the same thing. I suspect p_message was intended to be something else. Does anybody know what happened here?
Dave Williamson
In looking at this a little further, I believe the description of p_marker was accidentally pasted over the description of p_message. Since there is no other free-format field anywhere in the list, I believe p_message was intended to be that.
Lorraine Welch
p_message
log system
The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker
p_marker
log system
The marker labels INVOKE, ENTRY, EXIT – and later will also include DEBUG, AUDIT, METRICS, ERROR when we go to 1 log file - this field is %marker
Good catch - p_message was intended to be the text put %msg; it is not the same as p_marker
Message
log system
standard attribute - defined in logback.xml - Message - used for %msg%
I went back to many previous version of this wiki page and could not find one with p_message properly filled out.
Message
The free text payload of a log event.
This is the most important item of information in most log messages. See General guidelines.
Please refer to https://gerrit.onap.org/r/#/c/logging-analytics/+/62405/20/reference/logging-demo/src/main/resources/logback.xml: Lines 40,79
Dave Williamson
There is no mention of what ServiceInstanceID is supposed to contain. This should probably be spelled out.
Dave Williamson
There is mention in some of the discussion about ServerFQDN that it supercedes VirtualServerName, yet VirtualServerName is still in the table of MDCs. Should VirtualServerName be removed from the table of MDCs or is there some reason why we are leaving it there for a release or two?
Dave Williamson
Per discussion at the ONAP Logging project meeting on 7/23/2019, we clarified that VirtualServerName is being retained in the Casablanca spec (and through at least El Alto) for backward compatibility. In may be removed in Frankfurt or later.
Dave Williamson
There is no mention of what ContextName is supposed to contain. This should probably be spelled out.
Dave Williamson
Per discussion at the ONAP Logging project meeting on 7/23/2019, we were not able to find any definition for ContextName; therefore, it does not have to be provided.
Dave Williamson
Can we please clarify what we expect to happen with InvocationID in the case of an asynchronous response? I could see two possibilities:
1) The asynchronous response contains the InvocationID of the original request
2) The asynchronous response, itself being a request, contains a new InvocationID
Dave Williamson
Per discussion at the ONAP Logging project meeting on 7/23/2019, we decided to go with option #1 (asynchronous response contains the InvocationID of the original request) to assist in tying the async response to the original request.
Dave Williamson
Regarding PartnerName, the following language is starting to make less and less sense every time I read it:
unauthenticated = The part of the URI specifying the agent that the caller used to make the call to the component that is logging the message.
I had been interpreting "agent" as the API handler on the called side that was interpreting the request. Thus, the applicable part of the URI would be the initial elements that identify that API handler. Any parameters to that API handler (provided in the form of path info, query string, or fragment) would be omitted since that is not necessary to identify the API handler.
But I am now starting to wonder if that really makes sense here. "The agent that the caller used to make the call" is starting to sound more to me like a browser, curl, or whatever the caller happened to use to initiate the HTTP request. This would make sense except I don't think that is part of the URI. It is; however, probably contained in one of the headers, likely "User-Agent". I am wondering if this would be more correctly worded as follows:
unauthenticated = The HTTP header specifying the agent that the caller used to make the call to the component that is logging the message (e.g. "User-Agent").
Luke Parker
That first paragraph looks to have evolved from a similarly puzzling description in the original guidelines at https://wiki.onap.org/download/attachments/1015849/ONAP%20application%20logging%20guidelines.pdf?api=v2, and my guess is that nobody dared touch the bit they didn't understand.
After the opening paragraph, the description starts to make more sense. It identifies the caller. It's derived and populated inconsistently, but I think it's the short name of the calling component.
Dave Williamson
Thanks, Luke. At least the original guidelines you referenced do not talk about taking a part of the URI; any part of the URI would seem to have little hope of providing information about the calling component. For the unauthenticated case, do you think grabbing something like the "User-Agent" HTTP header makes sense?
Dave Williamson
We discussed this at the 8/6/2019 ONAP logging meeting and concluded that using the "User-Agent" HTTP header as the fallback makes sense. Thus, the current strategy is as follows:
Kevin Smokowski
Quoting https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.
Example value
To me this seems more confusing than helpful. I couldn't find any application code that sets this to a meaningful value, can someone link me to source code that suggests otherwise?
Kevin Smokowski
ResponseCode should this be the HTTP status code such as 200 or 403?
ResponseDesc should this be the text description of the HTTP status code? for example HTTP 200 it would be OK HTTP 403 would be Forbidden and 401 would be Unauthorized
Dave Williamson
By my understanding, ResponseCode is supposed to contain application-specific error codes. That does not sound like HTTP status codes to me. For context, a recommended (required?) categorization was provided in the initial logging specification as follows: 0=success, 100=permission errors, 200=availability errors/timeouts, 300=data errors, 400=schema errors, 500=business process errors, 900=unknown errors. That is definitely not HTTP status codes. That table was removed at some point and the work of expanding the definition is listed in this spec as pending work that was never done. We should probably work this in El Alto or Frankfurt, but I am not sure what we should be telling developers to do in the meantime. Any thoughts on that from others on the team?
Dave Williamson
Over the past few days, we have had some further email discussion as well as in today's ONAP logging project meeting. I think we have settled on the idea that a more expansive understanding of what an application-specific error code could be is appropriate. In particular, there seems to be little trouble in allowing an application team to decide that their HTTP-based application will log HTTP status codes as the ResponseCode.
However, that does not mean this is encouraged. The HTTP status codes are rather broad and generic (by design) and do not provide as much detail about what went wrong compared to what a well chosen list of application-specific codes could.
And, of course, since not all applications are HTTP-based, it would be inappropriate to define in the spec that HTTP status codes should be reported in ResponseCode. Thus, the spec seems fine as written.
Kevin Smokowski
In an environment where docker & k8s is being used what will the value of ServerIPAddress be?
Kevin Smokowski
Is there as standard set of values for TargetEntity? You wouldn't want different values represent the same entity IE "AAI", "A&AI", "A and AI"
Dave Williamson
This was not spelled out in this spec but is certainly needed for the exact reason you point out. In the El Alto time frame, AT&T will be proposing to add standardized ONAP component names to the ONAPLogConstants class. In the meantime within AT&T, please use the updated version of this class that is being delivered as part of the 2019.11 logging project 319912.
Kevin Smokowski
Is there any list of valid values for partner name? At least for the major components?
Dave Williamson
Note: I had originally posted the following reply as a separate comment on 8/6. I am re-posting as a reply.
This was not spelled out in this spec. In the El Alto time frame, AT&T will be proposing to add standardized ONAP component names to the ONAPLogConstants class. In the meantime within AT&T, please use the updated version of this class that is being delivered as part of the 2019.11 logging project 319912.
Kevin Smokowski
Should PartnerName and TargetEntity be restricted to the same set of values? I saw the example value of SDC-BE given for both.
Dave Williamson
The way I look at this, the short answer is "no" but there is a bit more to it than that.
With regard to PartnerName, AT&T will be proposing in the El Alto time frame that, for non-authenticated APIs, it be logged as "Component.Microservice" with Component being defined as the standardized name of the relevant component (this will be a proposed addition to ONAPLogConstants) and Microservice being consistent within any given microservice. Of course, this depends on X-ONAP-PartnerName being provided in HTTP headers (another proposal–all of this ties together) at least within ONAP. External entities that are not authenticated would get logged as either the User-Agent header or simply "UNKNOWN".
On the other hand, TargetEntity is defined as the ONAP component or sub-component name or the name of an external entity. For a TargetEntity within ONAP, using the same standard component names as for PartnerName makes sense to me, but probably not a microservice name (how do I as the caller know how you the API provider names your microservices). The fact that TargetServiceName would specify the specific service being called, I do not see the need to provide anything more in TargetEntity. How to name external entities seems like it could also be standardized but that is likely outside the scope of an ONAP spec.
Kevin Smokowski
For a TargetEntity within ONAP, using the same standard component names as for PartnerName makes sense to me
If we could standardize even this that would be good.
how do I as the caller know how you the API provider names your microservices?
Is the caller directly calling the microservice?
Kevin Smokowski
The logging constants have COMPLETED, but the spec says COMPLETE
See https://gerrit.onap.org/r/gitweb?p=logging-analytics.git;a=blob;f=reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogConstants.java;h=52aeac5ce33bc2cb992c653e89342412a367e174;hb=refs/heads/casablanca#l207
Quoting spec from above
MDC - StatusCode
This field indicates the high level status of the request. It must have the value COMPLETE when the request is successful and ERROR when there is a failure. And INPROGRESS for states between the two.
Dave Williamson
As far as I can tell, the code is wrong. The value should be "COMPLETE" not "COMPLETED".
Prudence Au
Created LOG-1164 - Getting issue details... STATUS to track this.
Kevin Smokowski
Above the MDC key is "ResponseDesc" but the constant in the code is "ResponseDescription" https://gerrit.onap.org/r/gitweb?p=logging-analytics.git;a=blob;f=reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogConstants.java;h=52aeac5ce33bc2cb992c653e89342412a367e174;hb=refs/heads/casablanca#l155
Dave Williamson
Similar to InstanceUUID/InstanceID, this MDC was known as ResponseDescription in Beijing and prior but was changed to ResponseDesc in Casablanca. At least in this case some of the samples/examples were updated, but not all of them were and the code was never updated (and even some places in the spec still refer to the prior name). As far as I can tell, all of these should have been updated to consistently refer to this as ResponseDesc.
Prudence Au
Created LOG-1165 - Getting issue details... STATUS to track this.
Kevin Smokowski
If you use a logging conversion pattern like
%d{"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",UTC}
can you still easily calculate elapsed time? Elapsed time is not described above, an older spec states
Elapsed time to complete processing of an API or request at the granularity available to the component system. This value should be the difference between BeginTimestamp and EndTimestamp fields.
EndTimestamp is not present in casablanca logging spec so I was assuming elapsed time = LogTimestamp - BeginTimeStamp
Steve Smokowski
You would just calculate elapsed time in the filter as StartTime - now(), and write this to elapsed time.
Steve Smokowski
You would just calculate elapsed time in the filter as StartTime - now(), and write this to elapsed time.
Steve Smokowski
You would just calculate elapsed time in the filter as StartTime - now(), and write this to elapsed time.
Kevin Smokowski
Above the MDC key is InstanceID, but gerrit has InstanceUUID https://gerrit.onap.org/r/gitweb?p=logging-analytics.git;a=blob;f=reference/logging-slf4j/src/main/java/org/onap/logging/ref/slf4j/ONAPLogConstants.java;h=52aeac5ce33bc2cb992c653e89342412a367e174;hb=refs/heads/casablanca#l122
Dave Williamson
This MDC was called InstanceUUID in Beijing and prior but was changed to InstanceID starting with Casablanca. As I recall, the logic was having a simpler name that lined up better with RequestID, InvocationID, etc. Unfortunately, none of the code, example log messages, example Logback configs, and so on were updated. They should have been as far as I can tell.
Prudence Au
Created LOG-1166 - Getting issue details... STATUS to track this.
Kevin Smokowski
When a loadbalancer is present and the X-Forwarded-For header has been set, should the value of that header be used for the ClientIPAddress? That was my understanding, I want to make sure others agree.
Dave Williamson
The original ONAP logging spec (presumably from Amsterdam though I cannot find that exact spec to verify) had a concept of a composite RequestID. This was intended to be used in cases where a component must, for the same given transaction (e.g. RequestID), make many related API sub-requests to a second component which could elect to interpret the HTTP headers as a nonce value for purposes of detecting duplicate requests. The idea was to add a suffix to the RequestID in the form of "UUID:suffix" where the suffix could be either an integer or another UUID.
I have a suspicion that the composite RequestID was probably dropped when InvocationID was added since it could serve the same purpose. Sanjay Agraharam shares the same thinking. But I wanted to throw it out there to see if others agree at least that we got rid of composite RequestID for a good reason and that we don't want it back (as opposed to having unintentionally dropped it and thinking we need it back), if not also believing that this is why we made that decision.
Kevin Smokowski
Are error codes always numeric? Above I saw "
COMPONENT_X.STORAGE_ERROR
", would that literally appear in the log as an error code or is this the name of constant that points to a numeric value?