Anyway, if y’all have ever taken a look at the Server Log, it is something else.
I don’t have a lot of data to go on, but I have begun parsing this out into a dashboard format.
So far I have identified these (Top Level) fields:
ServerLog
DatabaseNotification
EcfChangeNotification
GlobalLicensing
RESTApi
Op
And these fields & attributes of “Op” (where the good stuff is):
Sql
License
RESTApi
BOReader
BpmCustomization
Exception
BAQ
What I need help with is identifying any other fields in the Top Level node or, more importantly, in the “Op”
node (anything I don’t have listed above), along with sample data, so I can write parsers for those.
I have parsers written for all the ones listed already.
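If anyone wants to check their own logs for fields I am missing, here is a rough Python sketch of how that could be done. It assumes the log is a series of XML elements with no single root (so it wraps the text in a synthetic `<root>`), and that the names listed above are element tag names. The sample data and the `MadeUp...` names are invented for illustration and are not real Server Log output.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Names already covered by parsers (taken from the lists above).
KNOWN_TOP_LEVEL = {"ServerLog", "DatabaseNotification", "EcfChangeNotification",
                   "GlobalLicensing", "RESTApi", "Op"}
KNOWN_OP_CHILDREN = {"Sql", "License", "RESTApi", "BOReader",
                     "BpmCustomization", "Exception", "BAQ"}


def find_unknown_fields(log_text):
    """Count top-level tags and direct children of Op that have no parser yet."""
    # Assumption: the log is a run of sibling elements, so add a synthetic root.
    root = ET.fromstring(f"<root>{log_text}</root>")
    unknown_top = Counter()
    unknown_op = Counter()
    for node in root:
        if node.tag not in KNOWN_TOP_LEVEL:
            unknown_top[node.tag] += 1
        if node.tag == "Op":
            for child in node:
                if child.tag not in KNOWN_OP_CHILDREN:
                    unknown_op[child.tag] += 1
    return unknown_top, unknown_op


# Invented sample, only to show the shape of the result.
SAMPLE = """
<Op name="example"><Sql>select 1</Sql><MadeUpField /></Op>
<MadeUpTopLevel />
<Op name="example2"><BAQ /><MadeUpField /></Op>
"""

unknown_top, unknown_op = find_unknown_fields(SAMPLE)
print("Unknown top-level:", dict(unknown_top))
print("Unknown under Op:", dict(unknown_op))
```

Anything that shows up in either counter is a candidate for a new parser. A real log may need cleanup before it parses as XML.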
Use metrics to track the occurrence of an event, a count of items, the time taken to perform an action, or the current value of a resource (CPU, memory, etc.).
Use logs to track detailed information about an event also monitored by a metric, particularly errors, warnings or other exceptional situations.
A trace provides visibility into how a request is processed across multiple services in a microservices environment. Every trace needs to have a unique identifier associated with it.
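To make the three ideas concrete, here is a minimal sketch using only the Python standard library. The names `handle_request`, `metrics`, and `durations` are made up for this example. A real setup would send these to an observability backend rather than keep them in memory.

```python
import logging
import time
import uuid
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("observability_sketch")

metrics = Counter()   # metric: how often something happened
durations = []        # metric: seconds taken per request


def handle_request(work, trace_id=None):
    """Run `work()` while recording a metric, a log on failure, and a trace id."""
    # Trace: reuse the caller's id so one request can be followed across services.
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    metrics["requests"] += 1
    try:
        return work()
    except Exception:
        metrics["errors"] += 1
        # Log: the detail behind the error metric, tagged with the trace id.
        log.exception("request failed trace_id=%s", trace_id)
        raise
    finally:
        durations.append(time.perf_counter() - start)
```

The metric answers "how many and how long", the log carries the detail for the failures, and the trace id ties the log lines from different services back to one request.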
From a DevOps perspective, we monitor to ensure software quality. When we make changes to the software, did we reduce errors and/or make it more performant? Creating metrics, logs, and traces is easy. Creating ACTIONABLE metrics, logs, and traces requires some planning, which requires thinking about automation. It’s very inefficient to have humans process logs: we don’t have the time, and we will only do it during a post-mortem. With observability tools, we can be proactive and have the system notify us or even react to observability data that answers questions like:
How many errors in MRP? More or fewer than the last run? How long did it run? How does that compare to the last ten runs?
How many overall errors after a patch installation? Speed improvement or regression?
What’s the current Session Count?
How long has that job had no activity?
What were the CPU, memory, network, and disk utilization during events?
How long since the last successful SQL backup?
All this is dumped into a system that can then alert or even perform actions like:
reboot a VM
restart a service
kill a container that’s unresponsive
add more containers (scale out)
remove containers (scale in)
add an Issue to a GitHub repository
message an Admin
send shocks to the developer’s collar
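As a rough illustration of going from a question ("more errors than the last run? slower than the last ten?") to an action, here is a small Python sketch. The run records, the 1.5x slowdown threshold, and the action names are all assumptions made up for this example. An alerting system would supply its own rules and integrations.

```python
from statistics import mean

SLOWDOWN_FACTOR = 1.5  # assumed threshold: 50% slower than the recent average


def evaluate_run(current, history):
    """Compare one run with earlier runs (dicts with 'errors' and 'minutes')."""
    findings = []
    if not history:
        return findings  # nothing to compare against yet
    if current["errors"] > history[-1]["errors"]:
        findings.append("more_errors")
    # Baseline is the average duration of up to the last ten runs.
    baseline = mean(run["minutes"] for run in history[-10:])
    if current["minutes"] > SLOWDOWN_FACTOR * baseline:
        findings.append("slower")
    return findings


def react(findings, actions):
    """Call the action registered for each finding and collect the results."""
    return [actions[name]() for name in findings if name in actions]


# Hypothetical data and actions; real actions would call an API or a webhook.
history = [{"errors": 2, "minutes": 30}, {"errors": 1, "minutes": 32}]
current = {"errors": 4, "minutes": 60}
actions = {
    "more_errors": lambda: "message an Admin",
    "slower": lambda: "add an Issue to a GitHub repository",
}

for outcome in react(evaluate_run(current, history), actions):
    print(outcome)
```

The point is that the comparison and the reaction are both automated, so nobody has to read the log to find out that the last run got worse.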
There are many tools, mostly cloud-based (see below), but I might as well mention Azure Monitor, which works for both on-prem and cloud workloads. Click the link above to learn more.