Cooked audit log format
Matthew Booth
2008-05-11 21:40:48 UTC
As recently mentioned, Linux audit logs[1] are fairly hideous, and
although machine readability may have been a design goal, I'd argue
they're not too friendly in that regard either. I suspect, in fact, that
the principal driver has been machine producibility ;)

I've noticed that a number of utilities cook the logs slightly. I've
shied away from this to date because I want to be able to leverage
existing tools. However, if some standard emerged (or has emerged and I
missed it) for cooked logs, I'd be extremely interested in implementing
that.

Simple starters would include:
* Translating the architecture and syscall names into human.
* Jumping one way or the other with the hex strings business.
* Translating socket addresses into human.
* Translating timestamps into human.
* Ditching uninteresting records, such as PATH with no name for the
dynamic linker, and 2 PATH records when execing a script.

with an ultimate goal of:
* Defining an expected set of data for every system call and putting
them all on a single line in a well defined format.
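
To make the timestamp item in the list of simple starters concrete, here is a minimal Python sketch, not an existing tool. It assumes raw records carry their event ID in the form msg=audit(<epoch seconds>.<milliseconds>:<serial>). The function name cook_timestamp and the ISO 8601 UTC output are illustrative choices only, not an agreed cooked format.

```python
import re
from datetime import datetime, timezone

# Event ID as it appears in raw records, e.g. msg=audit(1210542048.123:456)
_AUDIT_ID = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")


def cook_timestamp(record: str) -> str:
    """Replace the epoch timestamp in a raw record with an ISO 8601 UTC time.

    cook_timestamp is a hypothetical helper; the serial number and the
    rest of the record are passed through unchanged.
    """
    def _sub(match):
        secs, millis, serial = int(match.group(1)), match.group(2), match.group(3)
        when = datetime.fromtimestamp(secs, tz=timezone.utc)
        return f"msg=audit({when:%Y-%m-%dT%H:%M:%S}.{millis}Z:{serial})"

    return _AUDIT_ID.sub(_sub, record)


if __name__ == "__main__":
    print(cook_timestamp("type=SYSCALL msg=audit(1210542048.123:456): arch=c000003e"))
```

Keeping the serial number next to the translated time means the records of one event can still be grouped after cooking.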

Is anybody doing any work in this direction?

Matt

[1] Of course, they're really accounting logs produced by the accounting
daemon. If you actually audit your accounting logs, this seemingly
pedantic point can become quite confusing.
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
Steve Grubb
2008-05-12 14:43:17 UTC
Post by Matthew Booth
I've noticed that a number of utilities cook the logs slightly. I've
shied away from this to date because I want to be able to leverage
existing tools. However, if some standard emerged (or has emerged and I
missed it) for cooked logs, I'd be extremely interested in implementing
that.
* Translating the architecture and syscall names into human.
libauparse, ausearch, & ausyscall can do this.
Post by Matthew Booth
* Jumping one way or the other with the hex strings business
not sure what you mean by this. ausearch, aureport, & libauparse can handle
them.
Post by Matthew Booth
* Translating socket addresses into human.
libauparse, ausearch, and aureport all do this.
Post by Matthew Booth
* Translating timestamps into human.
libauparse, ausearch, and aureport all do this.
Post by Matthew Booth
* Ditching uninteresting records, such as PATH with no name for the
dynamic linker, and 2 PATH records when execing a script.
* Defining an expected set of data for every system call and putting
them all on a single line in a well defined format.
I have a feeling that too will become an abomination. aureport tries to get
the audit events down to the bare essentials. But what you wind up with is
something that makes you want more details. When you add more details you
feel like you want less.
Post by Matthew Booth
Is anybody doing any work in this direction?
Not really. Part of the problem is that I occasionally hear complaints about
the audit format, but then no one that is actually /using/ the audit output
is willing to help define what an auditor needs. I'd really like this to come
from people who do this as their job.

I can take a guess at what's needed. But I really want to hear it from the
Security Officer's perspective.

One thing that is on the TODO list is to make an output format that is like
strace for syscall records. At least people have experience reading strace
output and it might help make one class of record easier to understand. Doing
this will be a big job, so I want to get some important things like remote
logging finished before jumping into it.

-Steve
Matthew Booth
2008-05-12 15:02:36 UTC
Post by Steve Grubb
Post by Matthew Booth
* Translating the architecture and syscall names into human.
libauparse, ausearch, & ausyscall can do this.
Post by Matthew Booth
* Jumping one way or the other with the hex strings business
not sure what you mean by this. ausearch, aureport, & libauparse can handle
them.
Strings should be either always hex encoded, or always escaped
(preferably the latter).
Post by Steve Grubb
Post by Matthew Booth
* Translating socket addresses into human.
libauparse, ausearch, and aureport all do this.
Post by Matthew Booth
* Translating timestamps into human.
libauparse, ausearch, and aureport all do this.
No doubt, but I'm interested in a general agreement around the output,
not which tools can generate it. My customer is using a third party
audit tool to collate logs from a large number of sources including
Linux accounting logs, but also including HP-UX, Solaris, Windows, AIX,
door sensors, etc... There is currently no good steer for third party
tool vendors about what log format they should support, hence I have
recommended uncooked. However, the problem with uncooked logs is that
they are offensive to the human eye ;) This makes life difficult for an
operator presented with a bunch of logs to look at, which together form
some interesting event.
Post by Steve Grubb
Post by Matthew Booth
* Ditching uninteresting records, such as PATH with no name for the
dynamic linker, and 2 PATH records when execing a script.
Oh, also:

* Ditching CWD and making all PATH records absolute.
Post by Steve Grubb
Post by Matthew Booth
* Defining an expected set of data for every system call and putting
them all on a single line in a well defined format.
I have a feeling that too will become an abomination. aureport tries to get
the audit events down to the bare essentials. But what you wind up with is
something that makes you want more details. When you add more details you
feel like you want less.
The goal is semi human-readability in a standard, machine-readable
format. So include all the abominable details, but at the end of the
line ;) And put everything on 1 line. And define exactly what will be on
that line, every time.
Post by Steve Grubb
Post by Matthew Booth
Is anybody doing any work in this direction?
Not really. Part of the problem is that I occasionally hear complaints about
the audit format, but then no one that is actually /using/ the audit output
is willing to help define what an auditor needs. I'd really like this to come
from people who do this as their job.
I can take a guess at what's needed. But I really want to hear it from the
Security Officer's perspective.
One thing that is on the TODO list is to make an output format that is like
strace for syscall records. At least people have experience reading strace
output and it might help make one class of record easier to understand. Doing
this will be a big job, so I want to get some important things like remote
logging finished before jumping into it.
I don't underestimate the size of the task: it's a huge mountain of
donkey work, but it really has to be done. And maintained...

Matt
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
Steve Grubb
2008-05-12 15:19:46 UTC
Post by Matthew Booth
Post by Steve Grubb
Post by Matthew Booth
* Translating the architecture and syscall names into human.
libauparse, ausearch, & ausyscall can do this.
Post by Matthew Booth
* Jumping one way or the other with the hex strings business
not sure what you mean by this. ausearch, aureport, & libauparse can
handle them.
Strings should be either always hex encoded, or always escaped
(preferably the latter).
The issue that always dominates any thinking about the audit system is how to
save diskspace. So, whenever a string has no naughty characters, we let it go
as is. If the string contains something that will confuse the parser or do
other bad things, we encode the string such that the parser cannot be
confused. But we only do that on demand because the majority of strings are
well-behaved.
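
The on-demand rule described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the kernel's code: it assumes a "naughty" byte is a double quote or anything outside printable, non-space ASCII, that well-behaved strings are written in double quotes, and that everything else is written as unquoted uppercase hex. The name encode_untrusted is hypothetical.

```python
def _needs_encoding(value: bytes) -> bool:
    # Assumption: a double quote, a space, a control character or any
    # non-ASCII byte could confuse a parser that splits on spaces.
    return any(b < 0x21 or b > 0x7E or b == 0x22 for b in value)


def encode_untrusted(value: bytes) -> str:
    """Emit a field value as-is (quoted) when safe, hex-encoded otherwise."""
    if _needs_encoding(value):
        # The whole string is hex-encoded, doubling its length.
        return value.hex().upper()
    # Well-behaved strings only cost two extra bytes for the quotes.
    return '"' + value.decode("ascii") + '"'


if __name__ == "__main__":
    print("name=" + encode_untrusted(b"/bin/ls"))
    print("name=" + encode_untrusted(b"/tmp/my file"))
```

Under these assumptions a reader can tell the two forms apart by the leading quote, but only for fields it already knows follow this convention.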
Post by Matthew Booth
Post by Steve Grubb
Post by Matthew Booth
* Translating timestamps into human.
libauparse, ausearch, and aureport all do this.
No doubt, but I'm interested in a general agreement around the output,
Sure, if someone that does auditing steps forward and wants to help define a
standard, we can code something up. That has been the whole issue all this
time.

-Steve
LC Bruzenak
2008-05-12 15:50:35 UTC
Q: Will the (hopefully) soon-to-be released visualization tool have any
influence on this discussion? Also aggregation?

My hope is that I'd only look at human-readable audit data which is
aggregated on one central repository. For me that means the transfer
sizes are important. Ideally to me, the data would be raw/compressed and
sent to a common place with guaranteed delivery.

It would be at that point where the visualization, cooking, translating,
etc. occurs. The more the better. :)

Regardless, my original question was would more cooking find its way
into the visualization tool? And any idea of when that may be released?

LCB.
--
LC (Lenny) Bruzenak
***@magitekltd.com
Miloslav Trmač
2008-05-12 16:09:40 UTC
Steve Grubb
2008-05-12 16:34:56 UTC
Hello,
Post by LC Bruzenak
Q: Will the (hopefully) soon-to-be released visualization tool have any
influence on this discussion?
Regardless, my original question was would more cooking find its way
into the visualization tool? And any idea of when that may be released?
A preliminary version will be easily installable in a few days; you can
download a tarball from https://fedorahosted.org/audit-viewer right now,
but building the required python-gtkextra bindings requires some
effort.[1]
It can also be pulled from koji for FC-9 packages:

http://koji.fedoraproject.org/koji/buildinfo?buildID=47990

-Steve
LC Bruzenak
2008-05-12 16:44:29 UTC
...
Post by Steve Grubb
A preliminary version will be easily installable in a few days; you can
download a tarball from https://fedorahosted.org/audit-viewer right now,
but building the required python-gtkextra bindings requires some
effort.[1]
http://koji.fedoraproject.org/koji/buildinfo?buildID=47990
-Steve
--
Awesome; I have been really looking forward to this!
Thanks to you both.

LCB.
--
LC (Lenny) Bruzenak
***@magitekltd.com
Matthew Booth
2008-05-12 16:53:20 UTC
Post by LC Bruzenak
Q: Will the (hopefully) soon-to-be released visualization tool have any
influence on this discussion? Also aggregation?
Not really. I'm looking for accounting logs to be parsable by both
humans and arbitrary third party products. What's required is a usable,
well defined format. (Think: why should MS publish SMB when Windows does
it just fine?)

Matt
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
John Dennis
2008-05-12 16:12:34 UTC
Post by Steve Grubb
Post by Matthew Booth
Strings should be either always hex encoded, or always escaped
(preferably the latter).
The issue that always dominates any thinking about the audit system is how to
save diskspace. So, whenever a string has no naughty characters, we let it go
as is. If the string contains something that will confuse the parser or do
other bad things, we encode the string such that the parser cannot be
confused. But we only do that on demand because the majority of strings are
well-behaved.
This is not a true statement. Unless the kernel has been patched
recently, the handling of strings is seriously broken, a fact which has
been pointed out numerous times. It is also not true that the parser
cannot be confused by the string format, which has also been pointed out
several times. It should also be a goal that libraries other than
auparse be capable of parsing audit strings. It should also be a goal
that correct parsing of audit logs not be dependent on specific kernel
versions.

The extra bytes in question would likely never exceed 0.01% of total file
size, so concerns about the extra bytes needed to properly escape a
string hogging disk space should not be advanced in 2008, with large
disks and high-bandwidth networks. Reliable parsing trumps 1970s
optimization concerns.
--
John Dennis <***@redhat.com>
Eric Paris
2008-05-12 20:56:23 UTC
Post by John Dennis
It should also be a goal that correct parsing of audit logs not be
dependent on specific kernel versions.
I've heard you say this a number of times. Can you let me know specific
examples of what you had to code around due to specific kernel versions
since the audit system started to settle down?

-Eric
John Dennis
2008-05-13 12:30:05 UTC
Post by Eric Paris
It should also be a goal that correct parsing of audit logs not be
dependent on specific kernel versions.
I've heard you say this a number of times. Can you let me know specific
examples of what you had to code around due to specific kernel versions
since the audit system started to settle down?
The set of strings which are encoded in hex varies depending on kernel
version. That set of special audit fields currently must be hard coded
into any code which attempts to parse audit output or the parsing will
fail, sometimes catastrophically. A list of these special field values
differing between two popular kernel versions has been previously posted
to this list. That list is not complete. If the rules for string
encoding were proper and regular this issue would not exist.
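
A small Python sketch of why the field list matters. It assumes that hex-encoded values are written unquoted, so nothing in the value itself distinguishes them from an ordinary number. The field set below is purely illustrative and is not the list that was posted to this mailing list; parse_record is a hypothetical helper.

```python
import re

# Illustrative only: the real set differs between kernel versions.
ENCODED_FIELDS = frozenset({"name", "comm", "exe", "cwd"})

# key=value pairs; a value is either "quoted" or a run of non-space characters
_PAIR = re.compile(r'([\w-]+)=("[^"]*"|\S+)')


def parse_record(line, encoded_fields=ENCODED_FIELDS):
    """Split a record into fields, decoding only fields named in encoded_fields."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if key in encoded_fields:
            if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
                raw = raw[1:-1]  # quoted: take the contents literally
            else:
                try:
                    raw = bytes.fromhex(raw).decode("utf-8", errors="replace")
                except ValueError:
                    pass  # not valid hex, keep the raw text
        fields[key] = raw
    return fields


if __name__ == "__main__":
    line = 'pid=3132 comm="ls" name=2F746D702F6D792066696C65'
    print(parse_record(line))
    # Treating pid as an encoded field silently corrupts it:
    print(parse_record(line, encoded_fields={"pid"})["pid"])
```

With a wrong field set the parser either leaves a path as a hex blob or, as in the last call, turns a numeric value into unrelated text without raising any error.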
--
John Dennis <***@redhat.com>
Tony Jones
2008-05-15 10:28:15 UTC
Post by Steve Grubb
Post by Matthew Booth
Strings should be either always hex encoded, or always escaped
(preferably the latter).
The issue that always dominates any thinking about the audit system is how to
save diskspace. So, whenever a string has no naughty characters, we let it go
as is. If the string contains something that will confuse the parser or do
other bad things, we encode the string such that the parser cannot be
confused. But we only do that on demand because the majority of strings are
well-behaved.
Are you talking here about the escaping that is performed inside of auditd? If
so, IMO, this seriously needs to be reworked. The way it works (encoding the
entire string rather than just escaping the offending characters) doesn't
make sense plus it's very inefficient in terms of implementation. I mentioned
this to you in private mail at the time of the buffer overflow advisory. I'm
happy to work on a patch but it's always possible I'm missing some design
subtlety ;-)
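
For comparison, a rough Python sketch of the two approaches. The \xNN escape syntax and the definition of an offending byte are assumptions made for illustration; this is not a proposed patch and not the existing implementation.

```python
def _is_offending(b: int) -> bool:
    # Assumption: space, control and non-ASCII bytes, the double quote and
    # the backslash (so that every backslash in the output starts an escape).
    return b < 0x21 or b > 0x7E or b in (0x22, 0x5C)


def hex_encode(value: bytes) -> str:
    """Whole-string encoding: every byte becomes two hex digits."""
    return value.hex().upper()


def escape(value: bytes) -> str:
    """Per-character escaping: only offending bytes become \\xNN."""
    body = "".join(f"\\x{b:02X}" if _is_offending(b) else chr(b) for b in value)
    return '"' + body + '"'


def unescape(text: str) -> bytes:
    """Inverse of escape() for well-formed input."""
    body, out, i = text[1:-1], bytearray(), 0
    while i < len(body):
        if body.startswith("\\x", i):
            out.append(int(body[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    path = b"/tmp/my file"
    print(len(hex_encode(path)), hex_encode(path))
    print(len(escape(path)), escape(path))
```

With per-character escaping the value is always quoted, so a parser would not need a per-field list to know how to read it, and a path with one space stays mostly readable.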

thanks!

Tony
Steve Grubb
2008-05-15 12:44:37 UTC
Post by Tony Jones
Post by Steve Grubb
Post by Matthew Booth
Strings should be either always hex encoded, or always escaped
(preferably the latter).
The issue that always dominates any thinking about the audit system is
how to save diskspace. So, whenever a string has no naughty characters,
we let it go as is. If the string contains something that will confuse
the parser or do other bad things, we encode the string such that the
parser cannot be confused. But we only do that on demand because the
majority of strings are well-behaved.
Are you talking here about the escaping that is performed inside of auditd?
If so, IMO, this seriously needs to be reworked. The way it works (encoding
the entire string rather than just escaping the offending characters)
doesn't make sense plus it's very inefficient in terms of implementation. I
mentioned this to you in private mail at the time of the buffer overflow
advisory. I'm happy to work on a patch but it's always possible I'm missing
some design subtlety ;-)
Before you send a patch: it has to be backwards compatible. IOW, there is no
guarantee that someone won't run updated user space tools on an old kernel, or
a new kernel with old tools. There's no way to enforce this and people
will expect their tools to work.

Also note that the hex string encoding is used to encode some data structures,
so you would need to be judicious in which fields use whatever encoding.

About the scheduling of such a patch, I wouldn't want to merge the patch until
the remote logging is complete. (It's the last scheduled feature of the
current development series.) After that point, I think we are at the
branching point for big changes again.

-Steve
John Dennis
2008-05-15 15:59:33 UTC
Post by Steve Grubb
Post by Tony Jones
Post by Steve Grubb
Post by Matthew Booth
Strings should be either always hex encoded, or always escaped
(preferably the latter).
The issue that always dominates any thinking about the audit system is
how to save diskspace. So, whenever a string has no naughty characters,
we let it go as is. If the string contains something that will confuse
the parser or do other bad things, we encode the string such that the
parser cannot be confused. But we only do that on demand because the
majority of strings are well-behaved.
Are you talking here about the escaping that is performed inside of auditd?
If so, IMO, this seriously needs to be reworked. The way it works (encoding
the entire string rather than just escaping the offending characters)
doesn't make sense plus it's very inefficient in terms of implementation. I
mentioned this to you in private mail at the time of the buffer overflow
advisory. I'm happy to work on a patch but it's always possible I'm missing
some design subtlety ;-)
Before you send a patch: it has to be backwards compatible. IOW, there is no
guarantee that someone won't run updated user space tools on an old kernel, or
a new kernel with old tools. There's no way to enforce this and people
will expect their tools to work.
Also note that the hex string encoding is used to encode some data structures,
so you would need to be judicious in which fields use whatever encoding.
About the scheduling of such a patch, I wouldn't want to merge the patch until
the remote logging is complete. (It's the last scheduled feature of the
current development series.) After that point, I think we are at the
branching point for big changes again.
String encoding is broken. Preserving backward compatibility with an
unusable format does not make much sense. The sooner it gets fixed the
better and the less the overall pain will be.

It's broken, it needs to be fixed, end of story.
--
John Dennis <***@redhat.com>