Discussion: Audit for live supervision
Kay Hayen
2008-08-14 07:14:07 UTC
Hello,

I would like to briefly present our plan for using audit. We have made a
prototype implementation and discovered some things along the way.

We are making a middleware for ATC systems. It is written in Ada and
partially in Python; we do the prototypes mostly in Python, so the prototype
code is in Python.

For that, we have one problem: uniquely identifying a process that
communicated with the outside world. We have settled on the process start
date. That date can be determined in a stable way (using the btime field
of /proc/stat, the Hertz value from the ELF aux note, and then translating
the tick count from /proc/<pid>/stat into a date) and is reproducible from
outside the process. Given the pid and start_date, we can reliably check
whether a process is still alive. The method is notably different from what
ps does, which may (or so I propose after looking at the source) output
different start times in different runs.
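For illustration, that computation looks roughly like this in Python (a sketch; the field positions are from proc(5), and the helper names are ours):

```python
import os

def start_date(btime, start_ticks, hz):
    # Boot time (seconds since the epoch) plus the process start
    # offset (clock ticks after boot) gives a stable start date.
    return btime + float(start_ticks) / hz

def read_start_date(pid):
    # "btime <seconds>" is one line of /proc/stat; field 22 of
    # /proc/<pid>/stat is the start time in clock ticks after boot.
    btime = None
    for line in open("/proc/stat"):
        if line.startswith("btime "):
            btime = int(line.split()[1])
            break
    stat = open("/proc/%d/stat" % pid).read()
    # comm (field 2) may contain spaces, so split after the ")".
    fields = stat[stat.rindex(")") + 2:].split()
    start_ticks = int(fields[19])  # field 22, counting from 1
    return start_date(btime, start_ticks, os.sysconf("SC_CLK_TCK"))
```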

We have a daemon running that may or may not fork the processes that it
monitors. For the communicating ones, we want to be able to tell everybody in
the system (spanning several nodes) that a communication partner is no more;
for non-communicating ones we simply want to observe and report whether e.g.
ntpd or some monitoring/working shell script is running.

The identifier hostname/pid/start_date is therefore what we call a "life"
of a process. A process may restart, but the pid won't wrap around within one
tick; that is at least the limiting restriction.

Now one issue I see is that the times that we get from auditd through the
socket from its child daemon may not match the start_date exactly. I think
they could. Actually we would prefer to receive the tick at which a process
started, instead of an absolutely dated fork event, because then we could
apply our code to calculate the stable time. Alternatively it would be nice
to know how the time value from auditd comes into existence. In principle,
for every event we should actually get the tick rather than a date, or at
least both. Ticks are the real kernel time, aren't they?

Currently we feel we should apply a delta around the times to match them, and
that seems somewhat unstable, methinks. We would prefer the delta to be 0.
Otherwise we may e.g. run into pid number overruns much more easily.

The other thing is sequence numbers. We see sequence numbers in the output
for each audit event. Very nice. But can you confirm where these sequence
numbers are created? Are they generated in the kernel, in auditd, or in its
child daemon?

The underlying question is: how safe can we be that we didn't miss anything
when the sequence numbers don't suggest so? We would like to use the lossless
mode of auditd. Does that simply mean that auditd may get behind in the worst
case?
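As a sketch of the kind of check we have in mind on our side (the class is ours, nothing from the audit API), a consumer can at least detect gaps in the serial numbers it receives:

```python
class SerialWatcher:
    """Detect gaps in audit event serial numbers (a sketch)."""

    def __init__(self):
        self.last = None

    def missed(self, serial):
        # Number of serials apparently skipped since the last event.
        gap = 0
        if self.last is not None and serial > self.last + 1:
            gap = serial - self.last - 1
        self.last = serial
        return gap
```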

Then, we have looked at auditd 1.2 (RHEL3), auditd 1.6 (RHEL5/Ubuntu)
and auditd 1.7 (Debian and self-compiled for RHEL 5.2). The format did
undergo important changes, and it seems that 1.7 is much more friendly to
parse. Can you confirm that a type=EOE record delimits every event (is that
even the correct term to use, "audit trace"? what is it called)?

We can't build the rpm due to dependency problems, so I went the hard
way: ./configure --prefix=/opt/auditd-1.7, and that seems to work fine on our
RHEL 5.2. What's not so clear to me is which kernel dependency there really
is. Were there interface changes at all? The changelog didn't suggest so.

BTW: Release-wise, will RHEL 5.3 include the latest auditd? That is our target
platform for a release next year, and it sure would be nice not to have to
fix up the audit installation.

One thing I observed with 1.7.4-1 from Debian Testing amd64 is that we never
see any clone events on the socket (and no forks, but we only know of cron
doing those anyway), yet all execs and exit_groups.

The rules we use are:

# First rule - delete all
-D

# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 320

# Feel free to add below this line. See auditctl man page

-a entry,always -S clone -S fork -S vfork
-a entry,always -S execve
-a entry,always -S exit_group -S exit


Very strange. It works fine with the self-compiled RHEL 5.2 build. I
understand that you are not Debian guys; I just wanted to ask you briefly if
you were aware of anything that could cause that. Otherwise I am going to
report it as a bug (to them).

With our rules file, we have grouped only similar purpose syscalls that we
care about. The goal we have is to track all newly created processes, their
exits and the code they run. If you are aware of anything we miss, please
point it out.

Also, is it true (I read that yesterday) that every syscall is slowed down by
every new rule? That would mean we are making a mistake by not having only
one line? And is open() performance really affected by this? Does audit not
(yet?) use other tracing interfaces like SystemTap, etc., where people try to
have zero cost for inactive traces?

Also on a general basis. Do you recommend using the sub-daemon for the job or
should we rather use libaudit for the task instead? Any insight is welcome
here.

What we would like to achieve is:

1. Monitor every created process if it (was) relevant to something. We don't
want to miss a process, however briefly it ran.
2. We don't want to poll periodically, but rather only wake up (and then with
minimal latency) when something interesting happened. We would, however, want
a periodic check that forks are still being reported, so we would detect a
loss of service from audit.
3. We don't want to possibly lose or miss anything, even if load gets higher,
although we don't require surviving a fork bomb.

Sorry for the overlong email. We just hope you can help us identify how to
make best use of audit for our project.

Best regards,
Kay Hayen
Steve Grubb
2008-08-14 14:04:08 UTC
Post by Kay Hayen
I would like to present our plan for using audit briefly. We have made a
prototype implementation, and discovered some things along the way.
Nice. I'll skip straight to the parts that I think I can comment on.
Post by Kay Hayen
Now one issue, I see is that the times that we get from auditd through the
socket from its child daemon may not match the start_date exactly.
All time hacks in the audit logs come from the kernel at the instant the
record is created. They all start by calling audit_log_start, and right here
is where time is written:

http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1194

The source that is used is current_kernel_time();
Post by Kay Hayen
I think they could. Actually we would prefer to receive the tick at which a
process started,
The audit system has millisecond resolution. This was considered adequate due
to system ticks being < 1000 Hz. The current_kernel_time() result is a broken
down time struct similar to pselect's. This is how it's used:

audit_log_format(ab, "audit(%lu.%03lu:%u): ",
        t.tv_sec, t.tv_nsec/1000000, serial);
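Given that format, the seconds, milliseconds and serial can be recovered on the consumer side with a parse along these lines (a sketch, not part of any audit API):

```python
import re

_AUDIT_STAMP = re.compile(r"audit\((\d+)\.(\d{3}):(\d+)\)")

def parse_stamp(msg):
    # "audit(1218773075.500:118620)" -> (seconds, milliseconds, serial)
    m = _AUDIT_STAMP.search(msg)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3))
```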
Post by Kay Hayen
Currently we feel we should apply a delta around the times to match them,
and that's somehow unstable methinks. We would prefer delta to be 0.
Otherwise we may e.g. run into pid number overruns much easier.
I'm thinking the audit resolution is higher than the scheduler's ticks. If you
take the absolute ticks and turn them into a struct timespec,

struct timespec {
        long tv_sec;    /* seconds */
        long tv_nsec;   /* nanoseconds */
};

Would they match?
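That comparison might be sketched as follows (a hypothetical helper; hz would come from the ELF aux note as Kay describes):

```python
def tick_to_msec(ticks, hz):
    # Split an absolute tick count into (seconds, milliseconds),
    # i.e. the same resolution the audit stamp carries.
    nsec = (ticks % hz) * (1000000000 // hz)
    return ticks // hz, nsec // 1000000
```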
Post by Kay Hayen
The other thing is sequence numbers. We see in the output sequence numbers
for each audit event. Very nice. But can you confirm where these sequence
numbers are created? Are they done in the kernel, in auditd or in its child
daemon?
They are done in the kernel and are incremented for each audit_log_start so
that no 2 audit events within the same millisecond have the same serial
number. Their source is here:

http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1085
Post by Kay Hayen
The underlying question is, how safe can we be that we didn't miss anything
when sequence numbers don't suggest so. We would like to use the lossless
mode of auditd. Does that simply mean that auditd may get behind in worst
case?
Yes. You would want to do a couple things. Increase the kernel backlog,
increase auditd priority, & increase audispd's internal queue.
Post by Kay Hayen
Can you confirm that a type=EOE delimits every event (is that even
the correct term to use, audit trace, how is it called).
It delimits every multipart event. You can use something like this to
determine if you have a complete event:

if (r->type == AUDIT_EOE || r->type < AUDIT_FIRST_EVENT ||
    r->type >= AUDIT_FIRST_ANOM_MSG) {
        /* have full event... */
}
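In Python terms the same test reads as follows; the numeric constants below are what the audit userspace headers of that era define, and are assumptions worth double-checking against your libaudit.h:

```python
# Assumed values from the audit userspace headers; verify against
# libaudit.h before relying on them.
AUDIT_FIRST_EVENT = 1300
AUDIT_EOE = 1320
AUDIT_FIRST_ANOM_MSG = 2100

def is_event_complete(rec_type):
    # Mirrors the C test above: EOE ends a multipart event, and
    # record types outside the syscall range are single-part.
    return (rec_type == AUDIT_EOE or
            rec_type < AUDIT_FIRST_EVENT or
            rec_type >= AUDIT_FIRST_ANOM_MSG)
```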
Post by Kay Hayen
We can't build the rpm due to dependency problems
If you are on RHEL 5, just edit the spec file to remove --with-prelude. And
delete any packaging of egginfo files.
Post by Kay Hayen
, so I was using the hard way, ./configure --prefix=/opt/auditd-1.7 and that
works fine on our RHEL 5.2 it seems. What's not so clear to (me) is which
kernel dependency there really is. Were there interface changes at all?
The best bet is to take the last RHEL5 audit srpm and install it. Modify that
to use the new tar file. Then remove some of the patches. I have not built
current for RHEL5, so I can't say much except: remove one patch, rpmbuild -bp,
and see if that is OK; if so, delete another. You do not need to do a full
rpmbuild -ba.
Post by Kay Hayen
The changelog didn't suggest so.
There are likely dependency issues for the selinux policy used for the
zos-remote plugin.
Post by Kay Hayen
BTW: Release-wise, will RHEL 5.3 include the latest auditd?
That is the plan. But there will be a point where audit development continues
and bugfixes are backported rather than new version. At a minimum,
audit-1.7.5 will be in RHEL5.3. Maybe 1.7.6 if we have another quick release.
Post by Kay Hayen
One thing I observed with 1.7.4-1 from Debian Testing amd64 that we won't
ever see any clone events on the socket (and no forks, but we only know of
cron doing these anyway), but all execs and exit_groups.
That may be distro dependent. And you should use strace to confirm what you
are looking for. On x86_64, note there are 2 clone syscalls, and you should
have both -F arch=b64 and -F arch=b32 for each rule.
Post by Kay Hayen
# First rule - delete all
-D
# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 320
Bump this up, maybe to 8192. That's what we use for CAPP.
Post by Kay Hayen
# Feel free to add below this line. See auditctl man page
-a entry,always -S clone -S fork -S vfork
If you are on amd64, I would suggest:

-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork

and similar for other syscall rules.
Post by Kay Hayen
-a entry,always -S execve
-a entry,always -S exit_group -S exit
Very strange. Works fine with self-compile RHEL 5.2, I understand that you
are not Debian guys, I just wanted to ask you briefly if you were aware of
anything that could cause that. I am going to report that as a bug (to
them) otherwise.
There might be tunables that different distros use with glibc. strace is
your friend... as is having both 32/64-bit rules if amd64 is the target
platform.
Post by Kay Hayen
With our rules file, we have grouped only similar purpose syscalls that we
care about. The goal we have is to track all newly created processes, their
exits and the code they run. If you are aware of anything we miss, please
point it out.
This is a really tricky area. They could mmap a file and execute it. They can
pass file descriptors between processes and execve /proc/<pid>/fd/4. Or maybe
take advantage of a hole in a program and overlay memory with another program,
so that /proc shows one thing but it's really another. It's really hard to
make airtight. SELinux is your best bet to make sure people stay within the
bounds that you intend - which means that the real processes are auditable.
Post by Kay Hayen
Also, it is true (I read that yesterday) that every syscall is slowed down
for every new rule?
Yes, if they are syscall rules. It's best to group as many together as
possible.
Post by Kay Hayen
That means, we are making a mistake by not having only
one line?
I wouldn't say a mistake. It's that there will be a performance difference,
and it may not be enough to worry about. You would have to benchmark it.
Post by Kay Hayen
And is open() performance really affected by this?
Yes.
Post by Kay Hayen
Does audit not (yet?) use other tracing interface like SystemTap, etc.
where people try to have 0 cost for inactive traces.
They have a cost. :) Also, SystemTap, while good for some things, is not good
for auditing. For one, SystemTap compiles new kernel modules on the fly. You
may not want that in your environment. It also has not been tested for
CAPP/LSPP compliance.
Post by Kay Hayen
Also on a general basis. Do you recommend using the sub-daemon for the job
or should we rather use libaudit for the task instead? Any insight is
welcome here.
It really depends on what your environment allows. Do you need an audit trail?
With search tools? And reporting tools? Do you need the system to halt if
auditing problems occur? Do you need any certifications?
Post by Kay Hayen
1. Monitor every created process if it (was) relevant to something. We
don't want to miss a process however briefly it ran.
This is hard, but can be achieved with help from SE Linux.
Post by Kay Hayen
2. We don't want to poll periodically, but rather only wake up (and then
with minimal latency) when something interesting happened. We would want to
poll a periodic check that forks are still reported, so we would detect a
loss of service from audit.
You might write an audispd plugin for this.

-Steve
Kay Hayen
2008-08-15 06:43:49 UTC
Hello Steve,

thanks for your reply, very helpful. :-)
Post by Steve Grubb
Post by Kay Hayen
Now one issue, I see is that the times that we get from auditd through
the socket from its child daemon may not match the start_date exactly.
All time hacks in the audit logs come from the kernel at the instant the
record is created. They all start by calling audit_log_start, and right
http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1194
The source that is used is current_kernel_time();
[...]
Post by Steve Grubb
The audit system has millisecond resolution. This was considered adequate
due to system ticks being < 1000 Hz. The current_kernel_time() is a broken
audit_log_format(ab, "audit(%lu.%03lu:%u): ",
t.tv_sec, t.tv_nsec/1000000, serial);
Steve Grubb
2008-08-15 12:54:35 UTC
More importantly, and somewhat blocking my tests: With the improved rules I
type=SYSCALL msg=audit(1218773075.500:118620): arch=c000003e syscall=59
success=yes exit=0 a0=7fff6f78cf90 a1=7fff6f78cf40 a2=7fff6f78f068 a3=0
items=2 ppid=11412 pid=11421 auid=4294967295 uid=1000 gid=1000 euid=1000
suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3
ses=4294967295 comm="gcc-4.3" exe="/usr/bin/gcc-4.3" key=(null)
[...]
type=SYSCALL msg=audit(1218773075.496:118624): arch=c000003e syscall=56
success=yes exit=11421 a0=1200011 a1=0 a2=0 a3=7fc067776770 items=0
ppid=11407 pid=11412 auid=4294967295 uid=1000 gid=1000 euid=1000
suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3
ses=4294967295 comm="gnatchop" exe="/usr/bin/gnatchop" key=(null)
Please note the _ascending_ sequence number but _descending_ time.
What this indicates is that there was some recursion before the syscall
triggered an event. The syscall context exists from syscall entry to exit. If
during the middle a signal is delivered, the syscall is not finished. Instead
it runs the signal handler associated with the signal. The signal handler
might make syscalls, which are then handled using the existing syscall context
via a linked list. When that occurs, the timestamp is not updated. Not
sure that is appropriate, or why the original time really mattered. But that
is what you are observing. My guess is SIGTERM is being delivered during
another syscall.
Seems like a bug? Can you have a look at it?
I'll check on why we don't update the time stamp during syscall recursion.
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
Plus I still didn't fully grasp why that arch filter was necessary in the
first place. I mean, after all, I was simply expecting that by default no
filter should give all arches. Is that filter actually a selector?
The -F arch is a selector for the syscall table. The kernel works off of
numbers, not strings. So "clone" doesn't mean anything to the kernel, but 56
has meaning. 56 doesn't mean much to people. So auditctl does you the favor of
converting text to numbers. It needs to know which table to choose from, the
32-bit or the 64-bit table, as both or only one could be valid. It's possible
to compile the kernel to use only the 64-bit table. There is no way to detect
this from user space except by failure... in which case all you know is the
failure, but not why.

There is also not a direct mapping between x86_64 and i386. There are syscalls
that exist on one arch but not the other. There are syscalls that change
names between arches. The problem is that I could maintain a table of all
these cross references for x86_64 and i386, but I don't have a good idea
about ppc and s390, which are also biarch. Then the table would be a snapshot
in time. A syscall could get added in a later kernel, but you won't get the
right results because you were trusting the tool and not suspicious enough to
do your own review.

Then there is a problem of correlation. If I have 1 rule that expands to 2,
then how can I do a compare of what's in memory vs what rules are on disk?
IOW, how do I tell that someone typed:

-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork

or just

-a entry,always -S clone -S fork -S vfork

because auditctl would make 2 from 1. This is a really tricky issue and if we
didn't care about correlation...or about outdated tools we trust too
much...we could do this.
Does it have to do with the fact that syscall numbers are arch dependent?
Yes.

ausyscall x86_64 clone
56

ausyscall i386 clone
120
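In a script, those lookups amount to a per-arch table; a minimal sketch with just the two values shown above (hypothetical; a real table would be generated from ausyscall or the headers rather than typed in):

```python
# Per-arch syscall numbers; only the two values quoted above, to
# illustrate that the name-to-number mapping is arch dependent.
SYSCALL_NR = {
    ("x86_64", "clone"): 56,
    ("i386", "clone"): 120,
}

def syscall_nr(arch, name):
    return SYSCALL_NR[(arch, name)]
```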
Post by Steve Grubb
Post by Kay Hayen
Can you confirm that a type=EOE delimits every event (is that even
the correct term to use, audit trace, how is it called).
It delimits every multipart event. You can use something like this:

if (r->type == AUDIT_EOE || r->type < AUDIT_FIRST_EVENT ||
    r->type >= AUDIT_FIRST_ANOM_MSG) {
        /* have full event... */
}
I will have to check if this affects our intended process tracing. The
parsing is certainly not simplified by it, for a possibly unrelated reason.
We have an audit parsing library. It takes this into account. The one and only
bug that I know of in it is when event records are interleaved. This is a
problem you'll find at some point. Audit events and their records are not
serialized in the kernel. So, you could have:

syscall a
path a
syscall b
user msg c
cwd a
avc b
Without a very stateful message parser, one that e.g. knows how many lines
are to follow an EXECVE, we don't know when to forward it to the part that
should process it.
time->Thu Aug 14 08:21:34 2008
node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=1
name="/home/sgrubb/.kde/share/config/kmailrc.lock3U3ZZa.tmp" inode=11304982
dev=08:03 mode=0100644 ouid=4325 ogid=4325 rdev=00:00
obj=unconfined_u:object_r:user_home_t:s0

node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=0
name="/home/sgrubb/.kde/share/config/" inode=12550361 dev=08:03 mode=040700
ouid=4325 ogid=4325 rdev=00:00 obj=unconfined_u:object_r:user_home_t:s0
node=127.0.0.1 type=CWD msg=audit(1218716494.667:677): cwd="/home/sgrubb"

node=127.0.0.1 type=SYSCALL msg=audit(1218716494.667:677): arch=c000003e
syscall=87 success=yes exit=0 a0=15f06b0 a1=39609389d0 a2=1340ac0
a3=3960b67a70 items=2 ppid=1 pid=3432 auid=4325 uid=4325 gid=4325 euid=4325
suid=4325 fsuid=4325 egid=4325 sgid=4325 fsgid=4325 tty=(none) ses=1
comm="kontact" exe="/usr/bin/kontact"
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="delete"

Look at the syscall record. It is always emitted with multi-line records. It
has an items count. Each auxiliary (path in this case) record has an item
number. You can tell when you have everything. Single-line entries do not
have an items field. Also note that the records comprising an event come out
of the kernel in backwards order.
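The bookkeeping this implies might be sketched like so (a hypothetical helper, not auparse itself): group records by serial, and declare a syscall event complete once the announced items count is satisfied, which works even with backwards, interleaved delivery:

```python
import re

_STAMP = re.compile(r"audit\(\d+\.\d+:(\d+)\)")
_ITEMS = re.compile(r" items=(\d+)")
_ITEM = re.compile(r" item=\d+")

class EventCollector:
    """Group interleaved audit records by serial number (a sketch;
    EOE handling for single-part events is omitted)."""

    def __init__(self):
        self.pending = {}

    def feed(self, record):
        # Returns the list of records of a complete event, or None.
        serial = int(_STAMP.search(record).group(1))
        recs = self.pending.setdefault(serial, [])
        recs.append(record)
        expected = None
        seen = 0
        for r in recs:
            m = _ITEMS.search(r)
            if m:
                expected = int(m.group(1))  # from the SYSCALL record
            if _ITEM.search(r):
                seen += 1                   # an auxiliary PATH record
        if expected is not None and seen >= expected:
            return self.pending.pop(serial)
        return None
```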
# 1. Some records are split across multiple lines. The good thing is
# that the continuation lines never start with whitespace, and so we
# can make them back into single lines. This makes the next part
# easier.
lines = []
for line in data.splitlines():
    if line.startswith( ' ' ):
        lines.append( line )
    else:
        assert line[0] != ' '
        lines[-1] = lines[-1] + ' ' + line
Did you know about the audit parsing library?
This is in the hope that indeed continued lines always start with a non-space
and type lines always start with a space. Would you consider this format
worth changing, and possible to change?
Don't like changing formats as that affects test suites.
I have no idea how much of it represents an existing external interface, but
I can imagine you can't change it (easily). Probably the end of a type= record
must be detected by a terminating empty line, in the case of those that can be
continued. But it would be very ugly to have to know the event types that
have this so early in the decoding process.
We have a parsing library, auparse, that handles the rules of audit parsing.
Look for auparse.h for the API.
Post by Steve Grubb
There might be tunables that different distros can used with glibc.
strace is your friend...and having both 32/64 bit rules if amd64 is the
target platform.
We did that of course. And what was confusing us was that the audit.log did
actually seem to show the calls. Can that even be?
Yes, as explained above.
Post by Steve Grubb
Post by Kay Hayen
Does audit not (yet?) use other tracing interface like SystemTap, etc.
where people try to have 0 cost for inactive traces.
They have a cost. :) Also, systemtap while good for some things not good
for auditing. For one, systemtap recompiles the kernel to make new
modules. You may not want that in your environment. It also has not been
tested for CAPP/LSPP compilance.
Post by Kay Hayen
Also on a general basis. Do you recommend using the sub-daemon for the
job or should we rather use libaudit for the task instead? Any insight
is welcome here.
It really depends on what your environment allows. Do you need an audit
trail? With search tools? And reporting tools? Do you need the system to
halt if auditing problems occur? Do you need any certifications?
I see. Luckily we are not into security, but only "safety". I can't find
anything on Wikipedia about it, so I will try to explain it briefly, please
forgive my limited understanding of it. :-)
At one point, I worked on Space Shuttle software. I know a little on how they
think about this.
It certainly will be very helpful to have the audit log, and searchable,
and I understand we get that automatically by leaving audit enabled, but
configured correctly. In the past we have disabled it, because it caused a
full disk and boot failure on RHEL 3 after only a month or so. I think it
complained about the UDP echo packets that we use to check our internal LAN
operations, but it could have been SELinux too.
RHEL3's audit system is completely different than RHEL5's.
Post by Steve Grubb
Post by Kay Hayen
2. We don't want to poll periodically, but rather only wake up (and
then with minimal latency) when something interesting happened. We
would want to poll a periodic check that forks are still reported, so
we would detect a loss of service from audit.
You might write a audispd plugin for this.
Did you mean for the periodic check,
There is a realtime interface for the audit stream. You can write either a new
event dispatcher or a plugin to the existing one. Seeing as you are more
concerned with assurance, I'd just replace the current dispatcher with your
own. I have a description of this here:

http://people.redhat.com/sgrubb/audit/audit-rt-events.txt
or for the whole job, that means our supervision process?
The supervision process. Then again, maybe you want to replace the audit
daemon and handle events your own way. libaudit has all the primitives for
that. So, I guess that brings up the question of how you are accessing the
audit event stream. Are you reading straight from netlink or the disk?
Regarding performance, I would like to say you are likely right in that
it's a non-issue. It has something of a bike-shed feel to me, though. :-) I
think I still have http://lwn.net/Articles/290428/ on my mind, where I had
the impression that kernel markers would only require a few noop instructions
as placeholders for jumps that would cause audit code to run.
You can go that way if you want. But I don't know of anyone else that has.
I was wondering why audit doesn't use that. Is that historic (it didn't
exist, nobody made a patch for it) or a conscious decision (too difficult,
not worth it)? Just curious here, and of course the comment could be read as
a bit scary, because it actually means we will have to benchmark the impact...
systemtap came after audit. They have 2 different purposes. One is
debugging/profiling, the other is regulatory compliance and security. The
systemtap people have no guarantees about what kinds of data are contained in
the stream or the reliability of delivery. There was some talk about
combining hooks, and in the end it was decided that we should leave them
disconnected as they serve entirely different purposes.

-Steve
Kay Hayen
2008-08-16 11:19:27 UTC
Hello Steve,

[ time descending, sequence number ascending problem ]
Post by Steve Grubb
What this indicates is that there was some recursion before the syscall
triggered an event. The syscall context exists from syscall entry to exit.
If during the middle a signal is delivered, the syscall is not finished.
Instead it runs the signal handler associated with the signal. The signal
handler might make syscalls which are then handled using the existing
syscall context via linked list. When that occurs, the timestamp is not
being updated. Not sure that is appropriate or why the original time really
mattered. But that is what you are observing. My guess is SIGTERM is being
delivered during another syscall.
That raised the following question for me: we have "entry" rules defined. When
we saw that we get exit codes anyway, the conclusion was that "entry" and
"exit" rules do not differ for every syscall. Can you confirm?

And assuming that, indeed the fork may complete only after it has completed
its signal handlers. The expectation would be that this is not an issue,
though, because the new process is inactive. Wrong assumption: the new
process seemingly can already EXECVE with its fresh life if the call of FORK
is interrupted after the process exists.

Now that poses an interesting problem. I guess what we are missing is that
FORK actually enters once and returns up to _two_ times, and we need to
handle and see traces of both. That way we would simply see FORK return with
0 for the child before it does EXECVE, and so we would know who created it
(ppid) and know that it exists.

I tried to change our rules to "exit,always" from "entry,always", but it
didn't make a difference. Can you confirm that only one exit is traced, and
do you think audit can be enhanced to trace these extra exits of syscalls
like FORK?

For any workaround, a SIGKILL towards the FORK-making process would probably
leave us without a trace, possibly forever.

As for the signal at hand here, I think we have ant (gcj-based Java) doing
multi-threading (parallel building). That would be a series of FORKs with
probably a SIGALRM or whatever they use to switch target execution without
resorting to threading.
Post by Steve Grubb
Seems like a bug? Can you have a look at it?
I'll check on why we don't update the time stamp during syscall recursion.
Thanks a lot. I guess the expectation would be that "exit" traces carry the
date of the "exit" and not the "entry".
Post by Steve Grubb
Then there is a problem of correlation. If I have 1 rule that expands to 2,
then how can I do a compare of what's in memory vs what rules are on disk?
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
or just
-a entry,always -S clone -S fork -S vfork
because auditctl would make 2 from 1. This is a really tricky issue and if
we didn't care about correlation...or about outdated tools we trust too
much...we could do this.
That's understood. And a typical danger of user-friendly abstraction is that
it causes confusion. As I said, -F was bound to "filter" in my mind.
For "arch" it's suddenly a selector. I would find something like this more
logical:

-a entry,always,any -S clone -S fork -S vfork

and if I really only wanted a certain arch, make me say so:

-a entry,always,b64 -S clone -S fork -S vfork
Post by Steve Grubb
ausyscall x86_64 clone
56
ausyscall i386 clone
120
Very good. We have initially defined a hash in Python manually with what we
encounter, but we can rather use that tool to create them. We specifically
have the problem of visiting an s390 site, where it will be handy to have
these already in place. There is no such function in libaudit, is there?
Post by Steve Grubb
We have an audit parsing library. It takes this into account.
I have looked at it, and auparse_init doesn't seem to support reading from the
socket itself, does it? It could be AUSOURCE_FILE. And there is the issue
that the logs on disk seem to differ from the format we get on the socket:
the node= is not on disk, newlines, empty lines, etc.; see below about that.
about that.

In an ideal world, we would like to note that the audit socket is readable,
hand the data (or an arbitrarily truncated chunk of it) to libaudit, and ask
it for events until it says there are no more. That would leave the
truncated line/event issue to libaudit. Is that part of the code?
Post by Steve Grubb
Without a very stateful message parser, one that e.g. knows how many
lines are to follow an EXECVE, we don't know when to forward it the part
that should process it.
[Example deleted]
Post by Steve Grubb
Look at the syscall record. It is always emitted with multi-line records.
It has an items count. Each auxiliary (path in this case) record has an
item number. You can tell when you have everything. Single line entries do
not have an items field. Also note that the record comprising an event
comes out of the kernel in a backwards order.
Ah, we simply ignored the type=PATH etc. elements. But what I mean is that
for the syscall itself, the arguments seemed to be on new lines:

This is from Python code:

data = _audit_socket.recv( 32*1024 )
print data

node=Annuitka type=SYSCALL msg=audit(1218880198.814:42205): arch=c000003e
syscall=59 success=yes exit=0 a0=16cc168 a1=1464c08 a2=1588008 a3=0 items=2
ppid=3864 pid=19928 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000
fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295 comm="ls"
exe="/bin/ls" key=(null)
node=Annuitka type=EXECVE msg=audit(1218880198.814:42205): argc=2 a0="ls"
a1="--color=auto"

node=Annuitka type=CWD msg=audit(1218880198.814:42205):
cwd="/data/home/anna/comsoft/v7a1-ps2-acs/src/acs"
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=0 name="/bin/ls"
inode=1651626 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=1 name=(null)
inode=779612 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
node=Annuitka type=EOE msg=audit(1218880198.814:42205):

Note that we get a SYSCALL with 2 items, and then in order the items - from
the socket. But in between we get the type=EXECVE record; it doesn't have an
item number, and worse, the newline before 'a1="--color=auto"' is real, and
so is the empty line after it. I have another example of a "gnash" call from
Konqueror with no fewer than 29 arguments.

That means, in order to parse the socket, we should check argc, right? I think
we would prefer very long lines like they are in /var/log/audit instead,
making these kinds of steps optional.

Actually I don't understand the differences in format. I assume they serve the
purpose of making things readable?
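For illustration, the stateful assembly being discussed can be sketched in Python: group records by the serial number inside msg=audit(ts:serial) and close the event when EOE arrives. This is only a sketch of the socket stream format shown above, not the auparse library; continuation-line handling is omitted:

```python
import re

ID_RE = re.compile(r'msg=audit\([\d.]+:(\d+)\)')

def assemble(lines):
    """Yield (serial, records) once an EOE record closes the event."""
    pending = {}
    for line in lines:
        m = ID_RE.search(line)
        if not m:
            continue  # blank or continuation line; real code must buffer it
        serial = int(m.group(1))
        if 'type=EOE' in line:
            yield serial, pending.pop(serial, [])
        else:
            pending.setdefault(serial, []).append(line)

stream = [
    'node=A type=SYSCALL msg=audit(1.0:7): syscall=59 items=2',
    'node=A type=EXECVE msg=audit(1.0:7): argc=2 a0="ls"',
    'node=A type=PATH msg=audit(1.0:7): item=0 name="/bin/ls"',
    'node=A type=PATH msg=audit(1.0:7): item=1 name=(null)',
    'node=A type=EOE msg=audit(1.0:7):',
]
for serial, records in assemble(stream):
    print(serial, len(records))
```

Keying on the serial rather than line position is what lets the interleaved EXECVE record be tolerated.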
Post by Steve Grubb
Did you know about the audit parsing library?
Our assumption was also that it should be easy enough to parse the text. Well
you know assumptions. Rarely ever true. :-)
Post by Steve Grubb
This is in hope that indeed continued lines always start with a non-space
and type lines always start with a space. Would you consider this format
worthy and possible to change?
Don't like changing formats as that affects test suites.
That leading space before " type=" was a self-inflicted confusion of ours.
Starting with 1.6 the node= part was added, and some hack was still in place
that removes "node=hostname" and leaves the space there. Sorry about that.
Post by Steve Grubb
I have no idea how much it represents an existing external interface,
but I can imagine you can't change it (easily). Probably the end of a type=
record must be detected by a terminating empty line for those that can be
continued. But it would be very ugly to have to know, that early in the
decoding process, which event types behave this way.
We have a parsing library, auparse, that handles the rules of audit
parsing. Look for auparse.h for the API.
If you confirm that it can handle the parsing from the socket, as suggested
above, we may pursue that path and can ignore the strangeness of the format
once it's handled by the library.
Post by Steve Grubb
Post by Steve Grubb
There might be tunables that different distros can use with glibc.
strace is your friend...and having both 32/64 bit rules if amd64 is the
target platform.
We did that of course. And what was confusing us was that the audit.log
did actually seem to show the calls. Can that even be?
Yes, as explained above.
Sorry, I am still confused. Can you explain why the socket and the audit.log
can have different contents? I was blaming my (usually bad) memory.
Post by Steve Grubb
I see. Luckily we are not into security, but only "safety". I can't find
anything on Wikipedia about it, so I will try to explain it briefly,
please forgive my limited understanding of it. :-)
At one point, I worked on Space Shuttle software. I know a little on how
they think about this.
Well, that's perfect. :-)
Post by Steve Grubb
Post by Steve Grubb
Post by Kay Hayen
2. We don't want to poll periodically, but rather only wake up (and
then with minimal latency) when something interesting happened. We
would, however, want a periodic check that forks are still being
reported, so we would detect a loss of service from audit.
You might write a audispd plugin for this.
Did you mean for the periodic check,
There is a realtime interface for the audit stream. You can write either a
new event dispatcher or a plugin to the existing one. Seeing as you are
more concerned with assurance, I'd just replace the current dispatcher with
http://people.redhat.com/sgrubb/audit/audit-rt-events.txt
I saw that too, but I thought it would be better to build upon the existing
effort. I think that's a viable alternative and potentially could be more
robust for us. At least it seems that audispd tries to solve problems we
don't have or want.

Looking at the source, I saw that audispd is indeed what adds the node
name, and that auditd doesn't log EOE records, which explains some of the
differences. I didn't find out why audispd emits extra newlines, or whether
auditd removes them.

I think we will make a prototype for the RT interface and see what it gives
us.
Post by Steve Grubb
or for the whole job, that means our supervision process?
The supervision process. Then again, maybe you want to replace the audit
daemon and handle events your own way. libaudit has all the primitives for
that. So, I guess that brings up the question of how you are accessing the
audit event stream. Are you reading straight from netlink or the disk?
Steve Grubb
2008-08-18 15:10:24 UTC
Permalink
Post by Kay Hayen
[ time descending, sequence number ascending problem ]
Post by Steve Grubb
What this indicates is that there was some recursion before the syscall
triggered an event. The syscall context exists from syscall entry to exit.
If during the middle a signal is delivered, the syscall is not finished.
Instead it runs the signal handler associated with the signal. The signal
handler might make syscalls which are then handled using the existing
syscall context via linked list. When that occurs, the timestamp is not
being updated. Not sure that is appropriate or why the original time
really mattered. But that is what you are observing. My guess is SIGTERM
is being delivered during another syscall.
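A small consequence worth noting from this explanation: when ordering the stream, sort by the event serial number, which keeps increasing, rather than by the timestamp, which may jump backwards. A tiny sketch with made-up data:

```python
# Serials increase monotonically even when a signal handler's syscalls
# reuse the interrupted syscall's (older) timestamp, so order by serial.
events = [
    {"ts": 1218880198.814, "serial": 42205},
    {"ts": 1218880190.100, "serial": 42206},  # older ts, newer serial
    {"ts": 1218880199.000, "serial": 42207},
]
ordered = sorted(events, key=lambda e: e["serial"])
print([e["serial"] for e in ordered])
```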
That raised the following question to me: We have "entry" rules defined.
When we saw that we get exit codes, the conclusion was that "entry" and
"exit" are not different for every syscall. Can you confirm?
They are different in that some things are not defined at entry while all
things are defined at exit. I believe you can write all audit rules as exit
rules without any noticeable differences. If you have an entry rule that
evaluates to never, then it does speed things up since it no longer needs to
collect aux records. With respect to the time, it's set when audit_log_start
is called, which is always on exit for any rules that involve syscalls (that
is when the exit code is valid).
Post by Kay Hayen
I tried to change our rules to "exit,always" from "entry,always", but it
didn't make a difference. Can you confirm that only one exit is traced, and
do you think audit can be enhanced to trace these extra exits of syscalls
like FORK?
Yes, I think the kernel could be updated to return twice. This would need to
be sent upstream and I think 2.6.28 is the next chance.
Post by Kay Hayen
Post by Steve Grubb
Seems like a bug? Can you have a look at it?
I'll check on why we don't update the time stamp during syscall recursion.
Thanks a lot. I guess the expectation would be that "exit" traces carry the
timestamp of the "exit" and not the "entry".
See above about timestamp generation.
Post by Kay Hayen
Post by Steve Grubb
Then there is a problem of correlation. If I have 1 rule that expands to
2, then how can I do a compare of what's in memory vs what rules are on
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
or just
-a entry,always -S clone -S fork -S vfork
because auditctl would make 2 from 1. This is a really tricky issue and
if we didn't care about correlation...or about outdated tools we trust
too much...we could do this.
That's understood. And a typical danger of user-friendly abstraction is
that it causes confusion. As I said, -F was bound to "filter" in my mind.
-F means field. In this case, the filter does use the arch field to select
which syscalls become events. But we put it before the syscall so that
auditctl looks it up in the right table. It might possibly be more correct to
introduce a selector for -S, but then people won't like giving it twice.
Post by Kay Hayen
Post by Steve Grubb
ausyscall x86_64 clone
56
ausyscall i386 clone
120
Very good. We have initially defined a hash in Python manually with what we
encounter, but we could rather use that to create them. We specifically have
the problem of visiting an s390 site, where it will be handy to have these
already in place. There is no such function in libaudit, is there?
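Such a hand-maintained Python hash might look like the following. The clone values come from the ausyscall output quoted above, and execve=59 matches the SYSCALL record earlier in the thread; the fork/vfork numbers are well-known ABI values. Generating the table (e.g. from ausyscall or libaudit) is clearly preferable to hardcoding it like this:

```python
# Hand-maintained per-arch syscall tables of the kind described above.
# Better generated from ausyscall / audit_syscall_to_name() than hardcoded.
SYSCALLS = {
    "x86_64": {56: "clone", 57: "fork", 58: "vfork", 59: "execve"},
    "i386":   {120: "clone", 2: "fork", 190: "vfork", 11: "execve"},
}

def syscall_name(arch, nr):
    return SYSCALLS.get(arch, {}).get(nr, "unknown-%d" % nr)

print(syscall_name("x86_64", 59), syscall_name("i386", 120))
```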
For what?
Post by Kay Hayen
Post by Steve Grubb
We have an audit parsing library. It takes this into account.
I have looked at it, and auparse_init doesn't seem to support reading from
the socket itself, does it?
You mean the netlink socket?
Post by Kay Hayen
It could be AUSOURCE_FILE. And there is the issue that the logs on
disk seem to differ from the format we get on the socket.
yes it is.
Post by Kay Hayen
In an ideal world, we would like to note that the audit socket is readable,
hand it (or an arbitrarily truncated chunk of data) from it to libaudit,
ask it for events until it says there are no more. That would leave the
truncated line/event issue to libaudit. Is that part of the code?
libaudit should pull complete events from the kernel unless an execve has an
excessive number of arguments or large sized arguments.
Post by Kay Hayen
Post by Steve Grubb
Look at the syscall record. It is always emitted with multi-line records.
It has an items count. Each auxiliary (path in this case) record has an
item number. You can tell when you have everything. Single line entries
do not have an items field. Also note that the record comprising an event
comes out of the kernel in a backwards order.
Ah, we simply ignored the type=PATH etc. elements. But what I mean is that
data = _audit_socket.recv( 32*1024 )
print data
node=Annuitka type=SYSCALL msg=audit(1218880198.814:42205): arch=c000003e
syscall=59 success=yes exit=0 a0=16cc168 a1=1464c08 a2=1588008 a3=0 items=2
ppid=3864 pid=19928 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000
fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295 comm="ls"
exe="/bin/ls" key=(null)
node=Annuitka type=EXECVE msg=audit(1218880198.814:42205): argc=2 a0="ls"
a1="--color=auto"
cwd="/data/home/anna/comsoft/v7a1-ps2-acs/src/acs"
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=0
name="/bin/ls" inode=1651626 dev=08:12 mode=0100755 ouid=0 ogid=0
rdev=00:00
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=1 name=(null)
inode=779612 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
Note that we get a SYSCALL with 2 items, and then the items, in order, from
the socket. But in between we get type=EXECVE, which doesn't have an item
number,
I suppose that could be fixed.
Post by Kay Hayen
and worse, the newline before 'a1="--color=auto"' is real and so is
the empty line after it. I have another example of a "gnash" call from
Konqueror with no less than 29 arguments.
That is coming from here, and I think a patch was submitted fixing it.

http://lxr.linux.no/linux+v2.6.26.2/kernel/auditsc.c#L1114
Post by Kay Hayen
That means, in order to parse the socket, we should check argc, right? I
think we would prefer very long lines like they are in /var/log/audit
instead, making these kinds of steps optional.
Actually I don't understand the differences in format. I assume they serve
the purpose of making things readable?
Yes.
Kay Hayen
2008-08-19 06:45:00 UTC
Permalink
Hello Steve,
Post by Steve Grubb
Post by Kay Hayen
I tried to change our rules to "exit,always" from "entry,always", but it
didn't make a difference. Can you confirm that only one exit is traced
and do you think audit can be enhanced to trace these extra exits of
syscalls like FORK.
Yes, I think the kernel could be updated to return twice. This would need
to be sent upstream and I think 2.6.28 is the next chance.
The missing FORK return is incidentally the one we care about. We have no
concern if a process forks or not, we only want to know and see the new
process. It doesn't matter if the parent ever got to notice it.

Is there any hope such a patch could be part of RHEL 5.3, given that Redhat
has its own kernel release process? I am not that much into security, but I
could imagine that it's possible to carefully craft a process that escapes
the audit trail with SIGKILL to a forker.

All you have to do is fork a process that will fork another, and with
increasingly longer delays you SIGKILL the parent until its child
secretly survives.
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
ausyscall x86_64 clone
56
ausyscall i386 clone
120
Very good. We have initially defined a hash in Python manually with what
we encounter, but we could rather use that to create them. We specifically
have the problem of visiting an s390 site, where it will be handy to have
these already in place. There is no such function in libaudit, is there?
For what?
Well, for the functionality of ausyscall. If we could query the current arch,
or rather its b32/b64 arches, then we could build that table at run time,
couldn't we?

That would be a whole lot nicer than hardcoded values, even if they are
generated using ausyscall.
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
We have an audit parsing library. It takes this into account.
I have looked at it, and auparse_init doesn't seem to support reading
from the socket itself, does it?
You mean the netlink socket?
No, when opening the socket to the sub-daemon audispd. I couldn't convince
myself how the API would work with a socket. Does it?
Post by Steve Grubb
Post by Kay Hayen
In an ideal world, we would like to note that the audit socket is
readable, hand it (or an arbitrarily truncated chunk of data) from it to
libaudit, ask it for events until it says there are no more. That would
leave the truncated line/event issue to libaudit. Is that part of the
code?
libaudit should pull complete events from the kernel unless an execve has
an excessive number of arguments or large sized arguments.
I read that as meaning we can use the netlink socket with libaudit directly,
which could be exactly what we want. That would mean we wouldn't use the
audit user space (processes) at all, right?
Post by Steve Grubb
Post by Kay Hayen
Note that we get a SYSCALL with 2 items, and then the items, in order,
from the socket. But in between we get type=EXECVE, which doesn't have an
item number,
I suppose that could be fixed.
Post by Kay Hayen
and worse, the newline before 'a1="--color=auto"' is real and so is
the empty line after it. I have another example of a "gnash" call from
Konqueror with no less than 29 arguments.
That is coming from here, and I think a patch was submitted fixing it.
http://lxr.linux.no/linux+v2.6.26.2/kernel/auditsc.c#L1114
I see. Strange to see line formatting like that in the kernel in the first
place. But libaudit doesn't care about them anyway I suppose.
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
I have no idea how much it represents an existing external
interface, but I can imagine you can't change it (easily). Probably
the end of a type= record must be detected by a terminating empty line
for those that can be continued. But it would be very ugly to have to
know, that early in the decoding process, which event types behave this way.
We have a parsing library, auparse, that handles the rules of audit
parsing. Look for auparse.h for the API.
If you confirm that it can handle the parsing from the socket, as suggested
above, we may pursue that path and can ignore the strangeness of the format
once it's handled by the library.
The audit parsing library wants to read text strings as you would find them
on disk. The kernel keeps the type separate as an integer so that decisions
can be made about what the record means without having to do a text-to-int
conversion. So, the audit daemon does the reformatting after it decides
that it is a record type that we are interested in.
And I read that as the libaudit library being unable to use the netlink socket
directly.

[ Options for listening]
Post by Steve Grubb
You have 4 points to get the audit stream, in order of distance from the
event generation: the audit netlink socket, auditd realtime interface,
audisp plugin interface, and the af_unix socket created by the af_unix
plugin from audispd. For higher reliability, where you don't want or need
any other audit processing interfering, I would say use either of the first
2.
The latency gets higher with each step. For optimal performance we would
listen to the netlink socket and duplicate only the code essential to process
what we are interested in.

For extra points and hurt, we would do it in Ada and inside the target
process, really achieving the low latency. It may be the only realistic
option, but it also feels like duplication of effort. We have done netlink
interfaces in Ada before, but we also have in mind that the netlink
interface was said (not by you) to be still in flux. Is that still true?

It certainly would be nice if audispd had some form of output that can be
fed directly into libaudit parsing as it comes in. But that may be an
unrealistic expectation, isn't it?
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
or for the whole job, that means our supervision process?
The supervision process. Then again, maybe you want to replace the
audit daemon and handle events your own way. libaudit has all the
primitives for that. So, I guess that brings up the question of how you
are accessing the audit event stream. Are you reading straight from
netlink or the disk?
John Dennis
2008-08-19 14:14:54 UTC
Permalink
Post by Kay Hayen
No, when opening the socket to the sub-daemon audispd. I couldn't convince
myself how the API would work with a socket. Does it?
The auparse library can read a stream by opening the parser with
AUSOURCE_FEED, you set a callback, then feed arbitrary number of bytes
into the parser by calling auparse_feed(), you'll be called back when a
complete event is found, at that point just use the normal auparse
functions.

You can read off of this unix socket (/var/run/audispd_events) but this
is deprecated. It is now preferred to use an audispd plugin and
read from stdin. See the audit src package and look in audisp/plugins
for examples. FWIW I noticed that code was calling fgets to get data to
feed to auparse_feed(), but it seems inefficient to buffer lines twice;
auparse_feed will do the line protocol.
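The feed pattern John describes can be illustrated with a toy stand-in in plain Python (not the real auparse API; the blank-line event terminator is an assumption for the demo). The point is that the chunk boundaries fed in are arbitrary:

```python
# Toy feed parser: accumulate arbitrary chunks and invoke the callback
# once a complete event (terminated here by a blank line) has been seen.
# auparse's AUSOURCE_FEED/auparse_feed() interface works analogously.
class Feed:
    def __init__(self, callback):
        self.callback = callback
        self.buffer = ""

    def feed(self, chunk):
        self.buffer += chunk
        while "\n\n" in self.buffer:  # assumed event terminator
            event, self.buffer = self.buffer.split("\n\n", 1)
            self.callback(event)

events = []
f = Feed(events.append)
f.feed("type=SYSCALL msg=audit(1.0:7): syscall=59\ntype=EOE ")
f.feed("msg=audit(1.0:7):\n\ntype=SYS")  # chunk boundaries are arbitrary
print(len(events))
```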
Post by Kay Hayen
I read that as that we can use the netlink socket with the libaudit directly,
which sort of could be exactly what we want. That would mean we wouldn't use
audit user space (processes) at all, right?
No, you really want to use the user space interface (see above).
Post by Kay Hayen
Post by Steve Grubb
You have 4 points to get the audit stream, in order of distance from the
event generation: the audit netlink socket, auditd realtime interface,
audisp plugin interface, and the af_unix socket created by the af_unix
plugin from audispd. For higher reliability, where you don't want or need
any other audit processing interfering, I would say use either of the first
2.
The latency gets higher with each step. For optimal performance we would
listen to the netlink socket and duplicate only the code essential to process
what we are interested in.
For extra points and hurt, we would do it in Ada and inside the target
process, really achieving the low latency. It may be the only realistic
option, but it also feels like duplication of effort. We have done netlink
interfaces in Ada before, but we also have in mind that the netlink
interface was said (not by you) to be still in flux. Is that still
true?
It certainly would be nice if audispd had some form of output that can be
fed directly into libaudit parsing as it comes in. But that may be an
unrealistic expectation, isn't it?
It does, see above comment.
--
John Dennis <***@redhat.com>
Kay Hayen
2008-08-19 17:46:14 UTC
Permalink
Hello John,
Post by John Dennis
Post by Kay Hayen
No, when opening the socket to the sub-daemon audispd. I couldn't
convince myself how the API would work with a socket. Does it?
The auparse library can read a stream by opening the parser with
AUSOURCE_FEED, you set a callback, then feed arbitrary number of bytes
into the parser by calling auparse_feed(), you'll be called back when a
complete event is found, at that point just use the normal auparse
functions.
You can read off of this unix socket (/var/run/audispd_events) but this
is deprecated. It is now preferred is now to use a audispd plugin and
read from stdin. See the audit src package and look in audisp/plugins
for examples. FWIW I noticed that code was calling fgets to get data to
feed to auparse_feed() but it seems inefficient to buffer lines twice,
auparse_feed will do the line protocol.
That's great. We can use the first approach initially (unix socket), a plugin
is not so good for us, because our supervision process would need to receive
from it anyway.

The next best step would be to use the netlink socket directly.
Steve Grubb
2008-08-19 18:18:46 UTC
Permalink
Post by John Dennis
No, you really want to use the user space interface (see above).
Well, for lowest latency possible (note the "live" in subject), it would be
ideal to avoid context switches auditd -> audisp -> our supervisor and
instead simply run an additional netlink socket in addition to auditd (if
that is allowed). That way we would have a lot less latency, at least in
theory.
Only 1 netlink socket connection is allowed. The code you want to write for
low latency would either need to take the place of the audit daemon (meaning
you need to make your own trail if you need it), or be an audispd that is
run from auditd. There is some sample code in contrib/skeleton.c for
starting your own audispd.

-Steve
Steve Grubb
2008-08-19 14:47:27 UTC
Permalink
Post by Kay Hayen
Hello Steve,
Post by Steve Grubb
Post by Kay Hayen
I tried to change our rules to "exit,always" from "entry,always", but
it didn't make a difference. Can you confirm that only one exit is
traced and do you think audit can be enhanced to trace these extra
exits of syscalls like FORK.
Yes, I think the kernel could be updated to return twice. This would need
to be sent upstream and I think 2.6.28 is the next chance.
<snip>
Post by Kay Hayen
Is there any hope such a patch could be part of RHEL 5.3, given that Redhat
has its own kernel release process? I am not that much into security, but I
could imagine that it's possible to carefully craft a process that escapes
the audit trail with SIGKILL to a forker.
Just because a process exists does not make it a security concern. The
process actually has to do something related to security, e.g. access
resources. At that point we will pick it up. I can see that we should
probably have 2 records on the clone syscall if that is being audited.
Post by Kay Hayen
All you have to do is fork a process that will fork another, and with
increasingly longer delays you SIGKILL the parent until its child
secretly survives.
Sure, but as soon as it touches something or makes any syscall, its
potentially auditable.
Post by Kay Hayen
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
ausyscall x86_64 clone
56
ausyscall i386 clone
120
Very good. We have initially defined a hash in Python manually with
what we encounter, but we could rather use that to create them. We
specifically have the problem of visiting an s390 site, where it will
be handy to have these already in place. There is no such function in
libaudit, is there?
For what?
Well, for the functionality of ausyscall. If we could query the current
arch, or rather its b32/b64 arches, then we could build that table at run
time, couldn't we?
Sure. If you look at the code for ausyscall, it simply calls
audit_syscall_to_name() in libaudit. For number to string it's a straight
lookup; for string to number, we have to brute-force search for it.
Post by Kay Hayen
That would be a whole lot nicer than hardcoded values, even if they are
generated using ausyscall.
Sure. Occasionally syscalls get added to the upstream kernel, and very rarely
to a RHEL kernel. So, using libaudit would future-proof the code.
Post by Kay Hayen
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
We have an audit parsing library. It takes this into account.
I have looked at it, and auparse_init doesn't seem to support reading
from the socket itself, does it?
You mean the netlink socket?
No, when opening the socket to the sub-daemon audispd. I couldn't
convince myself how the API would work with a socket. Does it?
Not directly, because the audit internal API keeps the type as an integer
separate from the text of the event. It's really simple to create a string
that auparse can use and then use the feed interface. A working example of
the feed interface can be found in audisp/plugins/prelude/audisp-prelude.c.
Post by Kay Hayen
Post by Steve Grubb
Post by Kay Hayen
In an ideal world, we would like to note that the audit socket is
readable, hand it (or an arbitrarily truncated chunk of data) from it
to libaudit, ask it for events until it says there are no more. That
would leave the truncated line/event issue to libaudit. Is that part of
the code?
libaudit should pull complete events from the kernel unless an execve has
an excessive number of arguments or large sized arguments.
I read that as that we can use the netlink socket with the libaudit
directly, which sort of could be exactly what we want. That would mean we
wouldn't use audit user space (processes) at all, right?
True. You would have to load your own rules since that is done by the audit
user space.
Post by Kay Hayen
Post by Steve Grubb
Post by Kay Hayen
Note that we get a SYSCALL with 2 items, and then the items, in order,
from the socket. But in between we get type=EXECVE, which doesn't have an
item number,
I suppose that could be fixed.
Post by Kay Hayen
and worse, the newline before 'a1="--color=auto"' is real and so is
the empty line after it. I have another example of a "gnash" call from
Konqueror with no less than 29 arguments.
That is coming from here, and I think a patch was submitted fixing it.
http://lxr.linux.no/linux+v2.6.26.2/kernel/auditsc.c#L1114
I see. Strange to see line formatting like that in the kernel in the first
place. But libaudit doesn't care about them anyway I suppose.
No it doesn't. Things down stream from it might, but its stripped going to
disk.
Post by Kay Hayen
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
I have no idea how much it represents an existing external
interface, but I can imagine you can't change it (easily). Probably
the end of a type= record must be detected by a terminating empty
line for those that can be continued. But it would be very ugly to
have to know, that early in the decoding process, which event types
behave this way.
We have a parsing library, auparse, that handles the rules of audit
parsing. Look for auparse.h for the API.
If you confirm that it can handle the parsing from the socket, as
suggested above, we may pursue that path and can ignore the strangeness of
the format once it's handled by the library.
The audit parsing library wants to read text strings as you would find
them on disk. The kernel keeps the type separate as an integer so that
decisions can be made about what the record means without having to do a
text-to-int conversion. So, the audit daemon does the reformatting after
it decides that it is a record type that we are interested in.
And I read that as the libaudit library being unable to use the netlink
socket directly.
No, libaudit is the I/O interface with the kernel. The audit daemon and
auditctl make extensive use of it when talking to the kernel about audit
events. Wrt auparse (if that's what you meant), you just run the data through:

asprintf(&v, "type=%s msg=%.*s\n", type, e->hdr.size, e->data);

and "v" has the string ready for auparse use. asprintf() allocates memory, so
watch that it doesn't create a memory leak.
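In a Python prototype, the reformatting done by that asprintf() call is a one-liner (the function name here is ours, for illustration only):

```python
# Rebuild the textual form auparse expects from the kernel's separate
# (type, data) pair, mirroring the asprintf() format string above.
def to_auparse_string(type_name, data):
    return "type=%s msg=%s\n" % (type_name, data)

print(to_auparse_string("SYSCALL", "audit(1218880198.814:42205): syscall=59"))
```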
Post by Kay Hayen
[ Options for listening]
Post by Steve Grubb
You have 4 points to get the audit stream, in order of distance from the
event generation: the audit netlink socket, auditd realtime interface,
audisp plugin interface, and the af_unix socket created by the af_unix
plugin from audispd. For higher reliability, where you don't want or need
any other audit processing interfering, I would say use either of the
first 2.
The latency gets higher with each step. For optimal performance we
would listen to the netlink socket and duplicate only the code essential to
process what we are interested in.
Sure
Post by Kay Hayen
For extra points and hurt, we would do it in Ada and inside the target
process, really achieving the low latency. It may be the only realistic
option, but it also feels like duplication of effort. We have done netlink
interfaces in Ada before, but we also have in mind that the netlink
interface was said (not by you) to be still in flux. Is that
still true?
We are in the process of migrating from the old rules API to the new one.
Kernels from 2.6.6 to around 2.6.16 had one API (audit_add_rule), which was
replaced with a new and improved API (audit_add_rule_data) in kernels after
that. The deprecated functions should be removed from libaudit.h so that
there is binary compatibility for prebuilt apps while newly built apps won't
be able to use the old functions.
Post by Kay Hayen
It certainly would be nice if audispd had some form of output that can
be fed directly into libaudit parsing as it comes in. But that may be an
unrealistic expectation, isn't it?
One note... libaudit is an I/O library; libauparse is the library that parses
audit events. Assuming you meant the latter... they are built for one another.
The audispd feeds data to siblings. In the configuration file, you just
specify that the child app wants string data and it takes care of the
conversion. The prelude plugin is a good example. However, audispd plugins
are probably too high latency for you. Converting the kernel's data into a
string is as simple as the code snippet above shows.

-Steve
Kay Hayen
2008-08-19 18:23:21 UTC
Permalink
Hello Steve,
Post by Steve Grubb
Post by Kay Hayen
Well, for the functionality of ausyscall. If we could query the current
arch, or rather its b32/b64 arches, then we could build that table at run
time, couldn't we?
Sure. If you look at the code for ausyscall, it simply calls
audit_syscall_to_name() in libaudit. For number to string it's a straight
lookup; for string to number, we have to brute-force search for it.
Perfect, we are going to use that instead then.
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
We have an audit parsing library. It takes this into account.
I have looked at it, and auparse_init doesn't seem to support reading
from the socket itself, does it?
You mean the netlink socket?
No, when opening the socket to the sub-daemon audispd. I couldn't
convince myself how the API would work with a socket. Does it?
Not directly, because the audit internal API keeps the type as an integer
separate from the text of the event. It's really simple to create a string
that auparse can use and then use the feed interface. A working example of
the feed interface can be found in audisp/plugins/prelude/audisp-prelude.c.
Nice, as I wrote in that other email of mine shortly ago, this could well be
the way forward for now and later we can switch to pure netlink socket.
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
libaudit should pull complete events from the kernel unless an execve
has an excessive number of arguments or large sized arguments.
I read that as that we can use the netlink socket with the libaudit
directly, which sort of could be exactly what we want. That would mean we
wouldn't use audit user space (processes) at all, right?
True. You would have to load your own rules since that is done by the audit
user space.
Can you confirm that two processes opening netlink sockets for audit
information get the same messages? I am under the impression that the kernel
doesn't maintain per-socket configuration, does it?

If that were the case, we would simply co-exist with auditd and let it do its
logging, etc. and benefit from it, and its ability to load the rules (which
Post by Steve Grubb
asprintf(&v, "type=%s msg=%.*s\n", type, e->hdr.size, e->data);
and "v" has the string ready for auparse use. asprintf() allocates memory,
so watch that it doesn't create a memory leak.
That's very sweet. Where would you expect the pitfalls? I mean, it can't be so
easy. :-)
Post by Steve Grubb
The audispd feeds data to siblings. In the configuration file, you
just specify that the child app wants string data and it takes care of the
conversion. The prelude plugin is a good example. However, audispd plugins
are probably too high of latency for you. Converting the kernel's data into
a string is simple as code snippet above shows.
Given that all we would have to do is to open the socket and listen, feed
what's received on it to asprintf and get our callbacks called for events, it
sounds very simple indeed.

We will have a look at this too; it seems that in terms of code complexity
there would barely be a difference, so taking the full jump immediately might
be an option as well. I will report on that too.

Best regards,
Kay Hayen
Steve Grubb
2008-08-19 18:39:50 UTC
Permalink
Post by Kay Hayen
Post by Steve Grubb
Post by Kay Hayen
Post by Steve Grubb
libaudit should pull complete events from the kernel unless an execve
has an excessive number of arguments or large sized arguments.
I read that as that we can use the netlink socket with the libaudit
directly, which sort of could be exactly what we want. That would mean
we wouldn't use audit user space (processes) at all, right?
True. You would have to load your own rules since that is done by the
audit user space.
Can you confirm that two processes opening netlink sockets for audit
information get the same messages?
Only one audit pid is allowed for security purposes.
Post by Kay Hayen
I am under the impression that the kernel doesn't maintain per socket
configuration, does it?
Nope, it only allows one.
Post by Kay Hayen
If that were the case, we would simply co-exist with auditd and let it do
its logging, etc. and benefit from it, and its ability to load the rules
If you want to co-exist with auditd, then you want to write your own audispd.
I pointed you to the skeleton.c code in the other email.
Post by Kay Hayen
Post by Steve Grubb
asprintf(&v, "type=%s msg=%.*s\n", type, e->hdr.size, e->data);
and "v" has the string ready for auparse use. asprintf() allocates memory,
so watch that it doesn't create a memory leak.
That's very sweet. Where would you expect the pitfalls? I mean, it can't be
so easy. :-)
No pitfalls except watching for memory leaks. Audispd used the same code.

-Steve
Kay Hayen
2008-08-19 20:33:58 UTC
Permalink
Hello Steve,
Post by Steve Grubb
Post by Kay Hayen
Can you confirm that two processes opening netlink sockets for audit
information get the same messages?
Only one audit pid is allowed for security purposes.
Damn security. I saw that patch while googling, and hoped it wasn't merged,
but it seems it was.

I don't really understand why it is helping security, if I need to kill auditd
before I can open the netlink socket. For both I need root rights.

There isn't any SELinux in the play, is there?

Because if that were the case, we could e.g. allow opening the netlink socket
only from the auditd binary. That would be effective, and we could then
change the configuration.

But it's probably pointless to waste your time on this, given how little I
understand security. I just can't resist; it feels like a bike-shed, and a
really annoying limitation for our non-security-oriented system. :-)

Best regards,
Kay Hayen
Steve Grubb
2008-08-19 20:47:16 UTC
Permalink
Post by Kay Hayen
Post by Steve Grubb
Only one audit pid is allowed for security purposes.
Damn security. I saw that patch while googling, and hoped it wasn't merged,
but seems it was.
It's been there since the 2.6.6 kernel. IOW - day 1.
Post by Kay Hayen
I don't really understand why it is helping security, if I need to kill
auditd before I can open the netlink socket. For both I need root rights.
The queueing is complicated, and if you have a group of processes it gets
really messy. The audit queue tries hard for guaranteed delivery, or takes
the system down if the flow is not working right. It's not like syslog or
iptables logging.
Post by Kay Hayen
There isn't any SELinux in the play, is there?
SE Linux helps for MLS systems, but for CAPP, it doesn't come into play. The
data flowing through the audit system could be very sensitive. Anyone needing
access to the stream either needs to replace auditd, write their own
dispatcher, or write a plugin to the shipped dispatcher. This way the admin
knows exactly what processes have access to the data.

-Steve
Kay Hayen
2008-08-19 21:35:14 UTC
Permalink
Hello Steve,
Post by Steve Grubb
Post by Kay Hayen
I don't really understand why it is helping security, if I need to kill
auditd before I can open the netlink socket. For both I need root rights.
The queueing is complicated and if you have a group of processes it gets
real messy. The audit queue tries hard for guaranteed delivery or take the
system down if the flow is not working right. Its not like syslog or
iptables logging.
Ah I see! So I misread "security" to mean "prevent access" where it's
actually "security" as in "not possibly corrupted data", and that's very
welcome. Sorry about the confusion.

BTW: I looked at the auditctl source and did some tests, and it seems the
rules can be set using auditctl even without auditd running. So that means we
don't have to do that ourselves.

Best regards,
Kay Hayen
Steve Grubb
2008-08-19 21:47:35 UTC
Permalink
Post by Kay Hayen
BTW: I looked at the auditctl source and did some tests, and it seems the
rules can be set using auditctl even without auditd running. So that means
we don't have to do that ourselves.
Sort of. The initscripts of auditd load the rules using
auditctl -R /etc/audit/audit.rules. So, you'd want to do that in your
initscript if you decide to replace auditd.
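For illustration, such an initscript fragment might be sketched as below.
This is an assumption-laden sketch, not a shipped script: it follows the
stock rules location Steve mentions and omits everything except rule loading.

```shell
#!/bin/sh
# Sketch of an initscript "start" branch for a daemon replacing auditd.
# Assumes the stock rules file /etc/audit/audit.rules; needs root.
case "$1" in
start)
    # Load the audit rules ourselves, since auditd is not around to do it.
    /sbin/auditctl -R /etc/audit/audit.rules || exit 1
    # ... then start our own netlink listener / dispatcher here ...
    ;;
esac
```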

-Steve
