Discussion:
linux-audit: reconstruct path names from syscall events?
John Feuerstein
2011-09-17 00:12:15 UTC
Hi,

I would like to audit all changes to a directory tree using the linux
auditing system[1].

# auditctl -a exit,always -F dir=/etc/ -F perm=wa

It seems like the GNU coreutils are enough to break the audit trail.

The resulting SYSCALL events provide CWD and multiple PATH records,
depending on the syscall. If one of the PATH records is relative, I can
reconstruct the absolute path using the CWD record.
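
The CWD-based reconstruction described above can be sketched as follows. This is a minimal illustration, not part of the audit tools: the helper name audit_join_path is hypothetical, and it does no "." / ".." or symlink normalization.

```c
#include <stdio.h>
#include <string.h>

/*
 * Join the cwd= value of a CWD record with the name= value of a PATH record.
 * An absolute name is copied unchanged. Returns 0 on success, -1 if the
 * result does not fit into out. (Hypothetical helper for illustration.)
 */
int audit_join_path(const char *cwd, const char *name, char *out, size_t outlen)
{
    int n;

    if (name[0] == '/')
        n = snprintf(out, outlen, "%s", name);
    else if (cwd[0] != '\0' && cwd[strlen(cwd) - 1] == '/')
        n = snprintf(out, outlen, "%s%s", cwd, name);   /* cwd is "/" or ends in '/' */
    else
        n = snprintf(out, outlen, "%s/%s", cwd, name);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For an *at syscall this join is exactly what goes wrong: with cwd="/tmp" and name="file" it produces /tmp/file, although the object removed in the example further down is /tmp/dir/file.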

However, that does not work for the whole *at syscall family
(unlinkat(2), renameat(2), linkat(2), ...), which accepts paths relative to
a given directory file descriptor. GNU coreutils are prominent users;
"rm -r", for example, uses unlinkat(2) to prevent races.

Things like dup(2) and fd passing via unix domain sockets come to mind.
It's the same old story again: mapping fds to path names is ambiguous at
best, if not impossible.

I wonder why such incomplete file system auditing rules are considered
sufficient in the CAPP/LSPP/NISPOM/STIG rulesets?

Here's a simplified example:

$ cd /tmp
$ mkdir dir
$ touch dir/file
$ ls -ldi /tmp /tmp/dir /tmp/dir/file
2057 drwxrwxrwt 9 root root 380 Sep 17 00:02 /tmp
58781 drwxr-xr-x 2 john john 40 Sep 17 00:02 /tmp/dir
56228 -rw-r--r-- 1 john john 0 Sep 17 00:02 /tmp/dir/file
$ cat > unlinkat.c
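#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    /* "dir" is resolved relative to the cwd (/tmp). */
    int dirfd = open("dir", O_RDONLY);
    if (dirfd < 0)
        return 1;
    /* "file" is resolved relative to dirfd, not relative to the cwd. */
    if (unlinkat(dirfd, "file", 0) < 0)
        return 1;
    return 0;
}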
^D
$ make unlinkat
cc unlinkat.c -o unlinkat
$ sudo autrace ./unlinkat
Waiting to execute: ./unlinkat
Cleaning up...
Trace complete. You can locate the records with 'ausearch -i -p 32121'
$ ls -li dir
total 0

Now, looking at the resulting raw SYSCALL event for unlinkat(2):

type=SYSCALL msg=audit(1316210542.899:779): arch=c000003e syscall=263 success=yes exit=0 a0=3 a1=400690 a2=0 a3=0 items=2 ppid=32106 pid=32121 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts12 ses=36 comm="unlinkat" exe="/tmp/unlinkat" key=(null)
type=CWD msg=audit(1316210542.899:779): cwd="/tmp"
type=PATH msg=audit(1316210542.899:779): item=0 name="/tmp" inode=58781 dev=00:0e mode=040755 ouid=1000 ogid=1000 rdev=00:00
type=PATH msg=audit(1316210542.899:779): item=1 name="file" inode=56228 dev=00:0e mode=0100644 ouid=1000 ogid=1000 rdev=00:00
type=EOE msg=audit(1316210542.899:779):
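
To compare these records against the ls output, the individual fields have to be pulled out of each line. A minimal sketch of such an extraction follows; the function name audit_field is hypothetical, and the sketch assumes plain or double-quoted values only (it does not decode names that the audit system writes hex-encoded).

```c
#include <string.h>

/*
 * Copy the value of "key=" from one raw audit record into out.
 * Quoted values are returned without the quotes. Returns 0 on success,
 * -1 if the key is missing or the value does not fit.
 * (Hypothetical helper for illustration; no hex decoding is attempted.)
 */
int audit_field(const char *rec, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key), n;
    const char *p = rec, *end;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept only a whole key: at record start or after a space, then '='. */
        if ((p == rec || p[-1] == ' ') && p[klen] == '=')
            break;
        p++;
    }
    if (p == NULL)
        return -1;

    p += klen + 1;                      /* skip "key=" */
    if (*p == '"') {
        end = strchr(++p, '"');         /* quoted value: up to the closing quote */
        if (end == NULL)
            return -1;
    } else {
        end = p + strcspn(p, " \n");    /* bare value: up to the next space */
    }

    n = (size_t)(end - p);
    if (n >= outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Applied to the item=0 record above, name yields /tmp and inode yields 58781, yet the ls -ldi output lists inode 58781 as /tmp/dir. The record carries the cwd as its name, not the directory behind the fd.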

-
Steve Grubb
2011-10-01 12:31:57 UTC
Post by John Feuerstein
I would like to audit all changes to a directory tree using the linux
auditing system[1].
# auditctl -a exit,always -F dir=/etc/ -F perm=wa
It seems like the GNU coreutils are enough to break the audit trail.
I was hoping one of the kernel developers would have got involved with this question.
I pointed out the same problem as you maybe 5 years ago. The people working on it at
the time said that if you really want to know, just add events for opens and then you
can piece it together. In my opinion, that is avoiding the problem and not solving it.
There are way too many opens to put into an audit trail on the odd chance that you
might have needed one. In 5 years, the kernel has changed and so have the people
working on the code. Maybe this problem should be revisited.

-Steve
Casey Schaufler
2011-10-03 19:42:25 UTC
Post by Steve Grubb
Post by John Feuerstein
I would like to audit all changes to a directory tree using the linux
auditing system[1].
# auditctl -a exit,always -F dir=/etc/ -F perm=wa
It seems like the GNU coreutils are enough to break the audit trail.
I was hoping one of the kernel developers would have got involved with this question.
I pointed out the same problem as you maybe 5 years ago. The people working on it at
the time said that if you really want to know, just add events for opens and then you
can piece it together. In my opinion, that is avoiding the problem and not solving it.
There are way too many opens to put into an audit trail on the odd chance that you
might have needed one. In 5 years, the kernel has changed and so have the people
working on the code. Maybe this problem should be revisited.
Howdy. Kernel developer here.

The problem goes way back. Way, way back. I will do my
best to describe what is going on and why the kernel has
such a problem with pathnames and audit. I am afraid that
you may not be happy with the explanation, but I also
think that you should understand what is going on and why
it has been so difficult to get a satisfactory resolution.

The Linux (and UNIX before it) kernel does not have an
internal concept of a path. Pathname resolution is provided
for the convenience of user space code.

The kernel has a simple view of filesystem objects. They
are inodes and datablocks. So long as there is a name for
the inode somewhere on the system the object is retained,
and once all the names are gone it is expunged. There are
two kinds of names; open file descriptors and directory
entries. A directory entry contains exactly one component
of a pathname. You are not allowed to remove directories
unless they are empty because that would leave objects with
names in an inaccessible state.

The Linux filesystem semantics, inherited in all their
glory from UNIX, permit multiple directory entries to
refer to the same inode. That means that there can be
multiple names for the same object in the filesystem
name space. These names are all peers. None is the "real"
name of the object. The only possible real name for the
object is the inode number (combined with an identification
of the containing filesystem). This identifies the object
even when all entries in the filesystem namespace are
gone but the file is open. Auditable events can occur on
files that are open but have no filesystem entries.
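
The "names are peers" point can be checked from user space. The following is a minimal sketch, assuming a writable /tmp on a filesystem that supports hard links; the function names are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if both paths name the same object (same device and inode), 0 if not, -1 on error. */
int same_object(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Demo in a scratch directory: give one file two names, check that they are
 * peers, then drop the first name and check that the object is untouched.
 * Returns 0 if everything behaved as described, -1 otherwise.
 */
int hard_link_demo(void)
{
    char dir[] = "/tmp/audit-link-XXXXXX";
    char a[64], b[64];
    struct stat before, after;
    int fd, ok = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    fd = open(a, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        ok = link(a, b) == 0 &&
             same_object(a, b) == 1 &&                    /* two names, one object */
             stat(a, &before) == 0 && before.st_nlink == 2 &&
             unlink(a) == 0 &&                            /* the "first" name goes away */
             stat(b, &after) == 0 && after.st_nlink == 1 &&
             after.st_ino == before.st_ino;               /* same object, other name */
    }
    unlink(a);   /* cleanup; failures here are ignored */
    unlink(b);
    rmdir(dir);
    return ok ? 0 : -1;
}
```

Neither a nor b is privileged here: the only stable identifier across the whole demo is the device/inode pair.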

It's a big mess because the auditor obviously wants to
know the name of the file, but it is entirely possible
that there are hundreds of names in the filesystem space
for the object and that there are hundreds of open file
descriptors for the object, none of which were created
by opening pathnames that refer to that object any longer.

The kernel can keep track of the path used to reach an
inode, but with hard links, symlinks, mount points and
namespaces the reality is that you can't identify the
object involved using that information. The best that
can be done is to record the pathname requested, the
pathname resolved, and the inode number. It is impossible
to track objects by pathname because the pathname is
not a kernel concept.

It's been this way forever. UNIX audit systems had/have
the exact same problem. This is why we have AppArmor and
TOMOYO. Unless someone smarter than I am has an outstanding
insight we aren't going to make you happy any time soon.
Steve Grubb
2011-10-04 17:02:03 UTC
Post by Casey Schaufler
Post by Steve Grubb
Post by John Feuerstein
I would like to audit all changes to a directory tree using the linux
auditing system[1].
# auditctl -a exit,always -F dir=/etc/ -F perm=wa
It seems like the GNU coreutils are enough to break the audit trail.
I was hoping one of the kernel developers would have got involved with
this question. I pointed out the same problem as you maybe 5 years ago.
The people working on it at the time said that if you really want to
know, just add events for opens and then you can piece it together. In
my opinion, that is avoiding the problem and not solving it. There are
way too many opens to put into an audit trail on the odd chance that you
might have needed one. In 5 years, the kernel has changed and so have
the people working on the code. Maybe this problem should be revisited.
Howdy. Kernel developer here.
Hi Casey,

Glad someone jumped in to this. :)
Post by Casey Schaufler
The problem goes way back. Way, way back. I will do my
best to describe what is going on and why the kernel has
such a problem with pathnames and audit. I am afraid that
you may not be happy with the explanation, but I also
think that you should understand what is going on and why
it has been so difficult to get a satisfactory resolution.
The Linux (and UNIX before it) kernel does not have an
internal concept of a path. Pathname resolution is provided
for the convenience of user space code.
The kernel has a simple view of filesystem objects. They
are inodes and datablocks. So long as there is a name for
the inode somewhere on the system the object is retained,
and once all the names are gone it is expunged. There are
two kinds of names; open file descriptors and directory
entries. A directory entry contains exactly one component
of a pathname. You are not allowed to remove directories
unless they are empty because that would leave objects with
names in an inaccessible state.
The Linux filesystem semantics, inherited in all their
glory from UNIX, permit multiple directory entries to
refer to the same inode. That means that there can be
multiple names for the same object in the filesystem
name space. These names are all peers. None is the "real"
name of the object. The only possible real name for the
object is the inode number (combined with an identification
of the containing filesystem). This identifies the object
even when all entries in the filesystem namespace are
gone but the file is open. Auditable events can occur on
files that are open but have no filesystem entries.
It's a big mess because the auditor obviously wants to
know the name of the file, but it is entirely possible
that there are hundreds of names in the filesystem space
for the object and that there are hundreds of open file
descriptors for the object, none of which were created
by opening pathnames that refer to that object any longer.
The kernel can keep track of the path used to reach an
inode, but with hard links, symlinks, mount points and
namespaces the reality is that you can't identify the
object involved using that information. The best that
can be done is to record the pathname requested, the
pathname resolved, and the inode number. It is impossible
to track objects by pathname because the pathname is
not a kernel concept.
It's been this way forever. UNIX audit systems had/have
the exact same problem. This is why we have AppArmor and
TOMOYO. Unless someone smarter than I am has an outstanding
insight we aren't going to make you happy any time soon.
What I was wondering about is this. Assumption: the openat audit event should not
happen all the time. If it does, then the system is going to perform poorly due to
load. So, if an event fires, we really aren't on the hot path anymore. Proposal: if an
*at syscall event is triggered, why can't we take the fd being passed in and look it
up in the same place where the kernel keeps the path shown for /proc/<pid>/fd/<n>?
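
The proposal concerns an in-kernel lookup, but the information in question can be inspected from user space. A minimal sketch, assuming a Linux system with /proc mounted; fd_to_path and fd_to_path_demo are hypothetical names, and the result is only a snapshot relative to the caller's root and mount namespace.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the kernel's current idea of the path behind fd by following the
 * /proc/self/fd/<fd> symlink. Returns the path length on success, -1 on
 * error or possible truncation. The text may carry a " (deleted)" suffix
 * or be something like "pipe:[...]" for objects without a path.
 */
ssize_t fd_to_path(int fd, char *out, size_t outlen)
{
    char link[64];
    ssize_t n;

    if (outlen < 2)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, out, outlen - 1);
    if (n < 0 || (size_t)n == outlen - 1)   /* error, or possibly truncated */
        return -1;
    out[n] = '\0';                          /* readlink() does not terminate */
    return n;
}

/* Usage sketch: what would an *at() call on this dirfd be relative to? */
int fd_to_path_demo(void)
{
    char buf[4096];
    int dirfd = open("/", O_RDONLY | O_DIRECTORY);
    int ok;

    if (dirfd < 0)
        return -1;
    ok = fd_to_path(dirfd, buf, sizeof buf) == 1 && strcmp(buf, "/") == 0;
    close(dirfd);
    return ok ? 0 : -1;
}
```

In the unlinkat example from the start of the thread, the same lookup on the dirfd argument (a0=3) would be expected to report /tmp/dir, which is the piece the PATH records do not contain.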

-Steve
John Feuerstein
2011-10-04 22:09:50 UTC
Casey,

thanks for your explanation.
Post by Casey Schaufler
The Linux filesystem semantics, inherited in all their
glory from UNIX, permit multiple directory entries to
refer to the same inode. That means that there can be
multiple names for the same object in the filesystem
name space. These names are all peers. None is the "real"
name of the object. The only possible real name for the
object is the inode number (combined with an identification
of the containing filesystem). This identifies the object
even when all entries in the filesystem namespace are
gone but the file is open. Auditable events can occur on
files that are open but have no filesystem entries.
Linux does not support multiple hardlinks to a directory[1] though?

Since the first argument to *at(2) syscalls is a dirfd, would it not be
possible to do something similar to getcwd(2)?

1. identify the root directory -> root
2. identify the given directory using the dirfd -> dir
3. until we reach root:
- open ".." -> parent
- scan for a dentry that matches dir
- dir = parent
4. reconstruct path from dentry components

d_path() in fs/dcache.c[2] seems to implement that.
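
The numbered steps above can be sketched in user space as follows. This is not how d_path() works internally (it walks dentries in the kernel); it is only a best-effort illustration of the getcwd-style walk, with hypothetical function names. It needs read permission on every ancestor directory and picks the first matching entry if a directory is reachable under several names.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int same_inode(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* Step 3: scan directory `parent` for the entry that refers to `child`. */
static int find_name(int parent, const struct stat *child, char *name, size_t len)
{
    struct dirent *de;
    struct stat st;
    int found = -1;
    /* fdopendir() takes ownership of its fd, so scan through a second one. */
    int scanfd = openat(parent, ".", O_RDONLY | O_DIRECTORY);
    DIR *d = scanfd < 0 ? NULL : fdopendir(scanfd);

    if (d == NULL) {
        if (scanfd >= 0)
            close(scanfd);
        return -1;
    }
    while (found < 0 && (de = readdir(d)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        if (fstatat(parent, de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
            same_inode(&st, child) && strlen(de->d_name) < len) {
            strcpy(name, de->d_name);
            found = 0;
        }
    }
    closedir(d);
    return found;
}

/* Best-effort path of the directory behind dirfd; 0 on success, -1 on failure. */
int dirfd_to_path(int dirfd, char *out, size_t outlen)
{
    char name[NAME_MAX + 1];
    struct stat root, cur_st, par_st;
    size_t pos, len;
    int cur, parent;

    if (outlen < 2 || stat("/", &root) < 0)             /* step 1: the root */
        return -1;
    pos = outlen - 1;                                   /* path is built right to left */
    out[pos] = '\0';
    cur = openat(dirfd, ".", O_RDONLY | O_DIRECTORY);   /* step 2: the directory */
    if (cur < 0)
        return -1;
    if (fstat(cur, &cur_st) < 0)
        goto fail;

    while (!same_inode(&cur_st, &root)) {               /* step 3: walk upwards */
        parent = openat(cur, "..", O_RDONLY | O_DIRECTORY);
        if (parent < 0)
            goto fail;
        if (fstat(parent, &par_st) < 0 ||
            same_inode(&par_st, &cur_st) ||             /* ".." loops: not under our root */
            find_name(parent, &cur_st, name, sizeof name) < 0 ||
            (len = strlen(name)) + 1 > pos) {           /* component does not fit */
            close(parent);
            goto fail;
        }
        pos -= len;                                     /* step 4: prepend "/name" */
        memcpy(out + pos, name, len);
        out[--pos] = '/';
        close(cur);
        cur = parent;
        cur_st = par_st;
    }
    close(cur);
    if (out[pos] == '\0')
        out[--pos] = '/';                               /* dirfd is the root itself */
    memmove(out, out + pos, outlen - pos);
    return 0;
fail:
    close(cur);
    return -1;
}
```

The cost is one directory scan per path component, which gives a feel for the performance penalty discussed next.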

I understand that this is ambiguous because of directory symlinks, but
it's better than the current situation. It would work out fine on
filesystems without symlinks (AFAIK this is only possible using FUSE on
Linux as of now, FreeBSD has had mount -o nosymlink for ages).

However, I'm not sure if it's worth the performance penalty. What about
making this configurable with sysctl? If enabled, PATH records for
syscall arguments consisting of a directory file descriptor will get
their name field reconstructed (best-effort/ambiguous). If disabled, the
name field will simply remain empty, instead of falling back to the cwd.


[1] http://lwn.net/Articles/249607/
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=fs/dcache.c;hb=v3.0#l2610
Eric Paris
2011-10-07 13:50:58 UTC
Casey only talked about the easy part of the reason the pathnames are
useless. He forgot to mention that the linux kernel has mount
namespaces. There is absolutely no reason why one could not mount a FS
in the init namespace, launch a whole 'virtual machine' in that new FS,
and then unmount the FS from the initial namespace. Now we have 2
COMPLETELY disjoint 'filesystems'.

The audit logs, and things like /proc/pid/fd or the d_path() functions, are all
going to be relative to the local FS namespace. Sometimes the path quite
simply can't be resolved. So now, inside the virtual machine's namespace, they
might read/modify /etc/shadow and that file IS /etc/shadow. There is no
other 'path' for that file. True, it's not the same /etc/shadow as the
one in the init fs namespace. And at some point there may have existed
a path in the init namespace, /mnt/virt1/etc/shadow, which also
represented that inode, but at this point in time the ONLY path which
represents this file is /etc/shadow.

Audit logs based on name are wrong and misleading. There's a reason the
auditable object is the inode and fs details Casey mentioned. We might
usually be able to give more information, but that information cannot EVER
be used for anything useful. It's unreliable. Exposing it only leads
one to believe they have knowledge they don't.

-Eric
Steve Grubb
2011-10-07 14:04:06 UTC
Post by Eric Paris
Audit logs based on name are wrong and misleading. There's a reason the
auditable object is the inode and fs details Casey mentioned. We might
usually be able to give more information, but that information cannot EVER
be used for anything useful. It's unreliable. Exposing it only leads
one to believe they have knowledge they don't.
But depending on an inode also leads you to believe you have knowledge when you don't.
Inodes get reassigned. Depending on when you look at your logs, that inode could have been
used for 5 different file names between the event and now. So, it is important to snapshot
what the inode was associated with at a given time.

My thinking is that we can make the system more useful even with name spaces. By
recording the current association we are more certain about what was being accessed.
We should be able to record the mount command that caused the new namespace so that we
can reconstruct a meaningful path. Maybe we need to change the session id on that
occurrence so that child processes can be properly identified.

But you raise a new point that I think we eventually have to address, and that is
containers. I think we will have to record container IDs and the like so that an audit
event can have a proper context for what it means. We at some point could have
duplicate uids and pids that are disjoint, not just files.

We have to address this some time.

-Steve
Casey Schaufler
2011-10-07 17:20:23 UTC
Post by Eric Paris
Casey only talked about the easy part of the reason the pathnames are
useless. He forgot to mention
I didn't forget to mention the whole mount point thingy.
People always get hung up in coming up with ways to explain
around the problem, and having already identified the root
cause of the problem I figured we might avoid yet another
round of clever and convoluted arguments around identifying
mount points.
Post by Eric Paris
that the linux kernel has mount
namespaces. There is absolutely no reason why one could not mount a FS
in the init namespace, launch a whole 'virtual machine' in that new FS,
and then unmount the FS from the initial namespace. Now we have 2
COMPLETELY disjoint 'filesystems'.
You don't even need all this to demonstrate the problem. A simple
chroot (or worse yet, fchroot (do we have that? I forget)) gives
most audit tools the willies.
Post by Eric Paris
The audit logs, and things like /proc/pid/fd or the d_path() functions, are all
going to be relative to the local FS namespace. Sometimes the path quite
simply can't be resolved. So now, inside the virtual machine's namespace, they
might read/modify /etc/shadow and that file IS /etc/shadow. There is no
other 'path' for that file. True, it's not the same /etc/shadow as the
one in the init fs namespace. And at some point there may have existed
a path in the init namespace, /mnt/virt1/etc/shadow, which also
represented that inode, but at this point in time the ONLY path which
represents this file is /etc/shadow.
Audit logs based on name are wrong and misleading. There's a reason the
auditable object is the inode and fs details Casey mentioned. We might
usually be able to give more information, but that information cannot EVER
be used for anything useful. It's unreliable. Exposing it only leads
one to believe they have knowledge they don't.
The unfortunate reality is that audit requirements dating back to
the 1985 Orange Book clearly require audit be done by name. The
intent has always been clear that the name should be the name by
which people reference the object. We have been arguing that
providing the dev/inode pair meets the requirements for decades
now and only getting by with it by promising to provide mappings
to real pathnames.

The best you can do, and Irix(tm) did (does in some remote backwaters)
is report the dev/inode, the pathname requested and the pathname
resolved with indications of components that are symlinks and mount
points.

This is why there is value in pathname based access controls.
In many cases you don't care so much which object was accessed
as you do that the object is accessible using the name /etc/shadow.
Of course, once you say that, you really care about the object
because it may be accessed by other names as well and if you
care about the object named /etc/shadow you care about it regardless
of the name used to access it.

I would be delighted if someone came up with the fiendishly
clever solution to the issue. I am not going to bet on one
in my lifetime.
Steve Grubb
2011-10-07 18:02:41 UTC
Post by Casey Schaufler
I would be delighted if someone came up with the fiendishly
clever solution to the issue. I am not going to bet on one
in my lifetime.
It doesn't even need to be fiendishly clever to be useful. Using the /etc/shadow
example, what we get now is just shadow. Shadow where? /etc? /var/chroot/bind/etc?
/backup/etc? Any clue would be helpful. Bind mounts, chroot, and namespaces all make
it interesting, but just adding the dir as an aux record would make things so much
better. We can solve the other problem another day.

-Steve
Eric Paris
2011-10-07 18:27:53 UTC
Post by Casey Schaufler
Post by Eric Paris
Casey only talked about the easy part of the reason the pathnames are
useless. He forgot to mention
I didn't forget to mention the whole mount point thingy.
People always get hung up in coming up with ways to explain
around the problem, and having already identified the root
cause of the problem
OK, fair enough. I guess I just saw two root problems, not just one. You
mentioned there existing multiple names for the same object. I was
thinking of there not existing any name for an object which makes
sense at a 'system wide' level. In any case, we might be able to get
some more pathname-like info, but it's never (like Casey so sagely said)
going to be truly useful....

-Eric
Casey Schaufler
2011-10-07 21:38:41 UTC
Post by Eric Paris
Post by Casey Schaufler
Post by Eric Paris
Casey only talked about the easy part of the reason the pathnames are
useless. He forgot to mention
I didn't forget to mention the whole mount point thingy.
People always get hung up in coming up with ways to explain
around the problem, and having already identified the root
cause of the problem
OK, fair enough. I guess I just saw two root problems, not just one. You
mentioned there existing multiple names for the same object. I was
thinking of there not existing any name for an object which makes
sense at a 'system wide' level. In any case, we might be able to get
some more pathname-like info, but it's never (like Casey so sagely said)
going to be truly useful....
The worst case is 4000 processes that opened the file under 4000
different pathnames, all of which have since been unlinked, doing
fchmod. At the time of fchmod there is no pathname that refers to
the file, although 4000 pathnames are associated with the object
whose mode is getting changed. The dev/inode pair is the only
externally visible identifier that could possibly be used to
name the file in the log, and as you point out, the dev is not
reliable.
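
A single-process version of this worst case is easy to reproduce. The sketch below, assuming a writable /tmp, removes the only name of a file and then changes its mode through the still-open descriptor; the function name is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Change the mode of an object that no longer has any directory entry.
 * On success fills *st with the metadata after the change and returns 0;
 * returns -1 on any failure.
 */
int fchmod_without_name(mode_t new_mode, struct stat *st)
{
    char path[] = "/tmp/audit-fchmod-XXXXXX";
    int fd = mkstemp(path);             /* creates the file with mode 0600 */
    int rc = -1;

    if (fd < 0)
        return -1;
    if (unlink(path) == 0 &&            /* the last name is gone ... */
        fchmod(fd, new_mode) == 0 &&    /* ... yet the mode change still happens */
        fstat(fd, st) == 0)
        rc = 0;
    if (rc < 0)
        unlink(path);                   /* cleanup if unlink was never reached */
    close(fd);
    return rc;
}
```

After a successful call, st->st_nlink is 0 while the mode has changed: the event is real, but only st_dev and st_ino are left to describe its target.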

Now even with that, a path name could be useful. It just can't
be considered definitive or unique. As for audit tracking, you
really ought to be able to say things like "show me everything
that happens to the file that is currently called /etc/shadow"
and "show me everything that happens to any file called /etc/shadow",
even though the two statements are radically different underneath.
The problem is that 99 44/100% of the people looking at or setting
up audit trails are going to be disinterested in or possibly
incapable of making the distinction. Let's face it, most people
shouldn't be using computers capable of running anything except
AngryBirds.
Steve Grubb
2011-10-10 12:54:10 UTC
Post by Casey Schaufler
Post by Eric Paris
Post by Casey Schaufler
Post by Eric Paris
Casey only talked about the easy part of the reason the pathnames are
useless. He forgot to mention
I didn't forget to mention the whole mount point thingy.
People always get hung up in coming up with ways to explain
around the problem, and having already identified the root
cause of the problem
OK, fair enough. I guess I just saw two root problems, not just one. You
mentioned there existing multiple names for the same object. I was
thinking of there not existing any name for an object which makes
sense at a 'system wide' level. In any case, we might be able to get
some more pathname-like info, but it's never (like Casey so sagely said)
going to be truly useful....
The worst case is 4000 processes that opened the file under 4000
different pathnames, all of which have since been unlinked, doing
fchmod. At the time of fchmod there is no pathname that refers to
the file, although 4000 pathnames are associated with the object
whose mode is getting changed. The dev/inode pair is the only
externally visible identifier that could possibly be used to
name the file in the log, and as you point out, the dev is not
reliable.
Now even with that, a path name could be useful. It just can't
be considered definitive or unique. As for audit tracking, you
really ought to be able to say things like "show me everything
that happens to the file that is currently called /etc/shadow"
and "show me everything that happens to any file called /etc/shadow",
even though the two statements are radically different underneath.
The problem is that 99 44/100% of the people looking at or setting
up audit trails are going to be disinterested in or possibly
incapable of making the distinction. Let's face it, most people
shouldn't be using computers capable of running anything except
AngryBirds.
I think that you missed the point of this problem report. There is a problem in path
name resolution, as has been discussed. But the problem here is that we only get partial
path information for some syscalls. The *at syscalls take two parameters of interest: a
previously opened fd and a path relative to that fd. The audit system is recording the
relative path fine. What we don't have is what the fd points to.

It would be unreasonable to expect a watch on /etc to be added on the off chance that you
might need one of those million opens to reconstruct an event. Right now we don't get
any information about the parent directory. Neither the directory's device nor its inode
gets recorded. Everything you point out is another problem.
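
The missing piece is cheap to obtain wherever the fd is at hand. The sketch below formats it from user space with fstat(2). The record layout and its field names (dir_inode, dir_dev) are invented for this illustration and are not an existing audit record type; only the dev=MAJ:MIN notation mimics the PATH records.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/*
 * Describe the directory that an *at() call would resolve `name` against.
 * Returns 0 on success, -1 if dirfd is not a directory or out is too small.
 * (Hypothetical record format, for illustration only.)
 */
int format_dirfd_record(int dirfd, const char *name, char *out, size_t outlen)
{
    struct stat st;
    int n;

    if (fstat(dirfd, &st) < 0 || !S_ISDIR(st.st_mode))
        return -1;
    n = snprintf(out, outlen, "dir_inode=%lu dir_dev=%02x:%02x name=\"%s\"",
                 (unsigned long)st.st_ino,
                 (unsigned int)major(st.st_dev),
                 (unsigned int)minor(st.st_dev), name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Usage sketch: describe the root directory as the base for "file". */
int dirfd_record_demo(char *out, size_t outlen)
{
    int dirfd = open("/", O_RDONLY | O_DIRECTORY);
    int rc;

    if (dirfd < 0)
        return -1;
    rc = format_dirfd_record(dirfd, "file", out, outlen);
    close(dirfd);
    return rc;
}
```

With the directory's device and inode next to the relative name, an analyst would at least know which directory "file" was removed from, even without a full path.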

-Steve
Mark Moseley
2012-10-09 23:09:18 UTC
Post by John Feuerstein
Hi,
I would like to audit all changes to a directory tree using the linux
auditing system[1].
# auditctl -a exit,always -F dir=/etc/ -F perm=wa
It seems like the GNU coreutils are enough to break the audit trail.
The resulting SYSCALL events provide CWD and multiple PATH records,
depending on the syscall. If one of the PATH records is relative, I can
reconstruct the absolute path using the CWD record.
However, that does not work for the whole *at syscall family
(unlinkat(2), renameat(2), linkat(2), ...); accepting paths relative to
a given directory file descriptor. GNU coreutils are prominent users,
for example "rm -r" making use of unlinkat(2) to prevent races.
Things like dup(2) and fd passing via unix domain sockets come to mind.
It's the same old story again: mapping fds to path names is ambiguous at
best, if not impossible.
I wonder why such incomplete file system auditing rules are considered
sufficient in the CAPP/LSPP/NISPOM/STIG rulesets?
$ cd /tmp
$ mkdir dir
$ touch dir/file
$ ls -ldi /tmp /tmp/dir /tmp/dir/file
2057 drwxrwxrwt 9 root root 380 Sep 17 00:02 /tmp
58781 drwxr-xr-x 2 john john 40 Sep 17 00:02 /tmp/dir
56228 -rw-r--r-- 1 john john 0 Sep 17 00:02 /tmp/dir/file
$ cat > unlinkat.c
#include <unistd.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
int dirfd = open("dir", O_RDONLY);
unlinkat(dirfd, "file", 0);
return 0;
}
^D
$ make unlinkat
cc unlinkat.c -o unlinkat
$ sudo autrace ./unlinkat
Waiting to execute: ./unlinkat
Cleaning up...
Trace complete. You can locate the records with 'ausearch -i -p 32121'
$ ls -li dir
total 0
type=SYSCALL msg=audit(1316210542.899:779): arch=c000003e syscall=263 success=yes exit=0 a0=3 a1=400690 a2=0 a3=0 items=2 ppid=32106 pid=32121 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts12 ses=36 comm="unlinkat" exe="/tmp/unlinkat" key=(null)
type=CWD msg=audit(1316210542.899:779): cwd="/tmp"
type=PATH msg=audit(1316210542.899:779): item=0 name="/tmp" inode=58781 dev=00:0e mode=040755 ouid=1000 ogid=1000 rdev=00:00
type=PATH msg=audit(1316210542.899:779): item=1 name="file" inode=56228 dev=00:0e mode=0100644 ouid=1000 ogid=1000 rdev=00:00
-
Al Viro
2012-10-09 23:29:07 UTC
Post by John Feuerstein
Hi,
I would like to audit all changes to a directory tree using the linux
auditing system[1].
# auditctl -a exit,always -F dir=/etc/ -F perm=wa
It seems like the GNU coreutils are enough to break the audit trail.
The resulting SYSCALL events provide CWD and multiple PATH records,
depending on the syscall. If one of the PATH records is relative, I can
reconstruct the absolute path using the CWD record.
However, that does not work for the whole *at syscall family
(unlinkat(2), renameat(2), linkat(2), ...); accepting paths relative to
a given directory file descriptor. GNU coreutils are prominent users,
for example "rm -r" making use of unlinkat(2) to prevent races.
Things like dup(2) and fd passing via unix domain sockets come to mind.
It's the same old story again: mapping fds to path names is ambiguous at
best, if not impossible.
Your point being? Even if you do get all pathnames, you *can't* reconstruct
the changes of filesystem tree, period. Pathname resolution is not atomic.
Can't be made such, either - not without serializing all system calls, which
will hurt too damn much.

You can tell when something happens to a filesystem *object*, which audit,
lousy as it is, lets you do. Anything that hopes to reconstruct the
history of changes based on a fully timestamped history of syscalls is
inherently unreliable.

Again, pathname resolution is not atomic at all and neither is reconstructing
pathname by object (i.e. by vfsmount/dentry pair).
Al Viro
2012-10-09 23:39:27 UTC
If you see my recent linux-audit posting, another related thing (at
least as far as missing relevant information in the logs) is that the
audit logs are logging pathnames relative to the chroot, instead of
the pathnames relative to the root of the OS itself. You'd expect a
process chroot'd to /chroot, accessing (from the perspective of the
OS) /chroot/etc/password would get logged as /chroot/etc/password but
is rather logged as /etc/password.
I don't have a working LXC install handy, but I'd imagine the audit
subsystem would log relative to the container's / instead of the
host's / too.
BTW, what makes you think that container's root is even reachable from
"the host's /"? There is no such thing as "root of the OS itself"; different
processes can (and in case of containers definitely do) run in different
namespaces. With entirely different filesystems mounted in those, and
no promise whatsoever that any specific namespace happens to have all
filesystems mounted somewhere in it...
Mark Moseley
2012-10-09 23:47:17 UTC
Post by Al Viro
If you see my recent linux-audit posting, another related thing (at
least as far as missing relevant information in the logs) is that the
audit logs are logging pathnames relative to the chroot, instead of
the pathnames relative to the root of the OS itself. You'd expect a
process chroot'd to /chroot, accessing (from the perspective of the
OS) /chroot/etc/password would get logged as /chroot/etc/password but
is rather logged as /etc/password.
I don't have a working LXC install handy, but I'd imagine the audit
subsystem would log relative to the container's / instead of the
host's / too.
BTW, what makes you think that container's root is even reachable from
"the host's /"? There is no such thing as "root of the OS itself"; different
processes can (and in case of containers definitely do) run in different
namespaces. With entirely different filesystems mounted in those, and
no promise whatsoever that any specific namespace happens to have all
filesystems mounted somewhere in it...
Nothing beyond guesswork, since it's been a while since I've played
with LXC. In any case, I was struggling a bit for the correct
terminology.

Am I similarly off-base with regards to the chroot'd scenario?
Al Viro
2012-10-09 23:54:47 UTC
Post by Mark Moseley
Post by Al Viro
BTW, what makes you think that container's root is even reachable from
"the host's /"? There is no such thing as "root of the OS itself"; different
processes can (and in case of containers definitely do) run in different
namespaces. With entirely different filesystems mounted in those, and
no promise whatsoever that any specific namespace happens to have all
filesystems mounted somewhere in it...
Nothing beyond guesswork, since it's been a while since I've played
with LXC. In any case, I was struggling a bit for the correct
terminology.
Am I similarly off-base with regards to the chroot'd scenario?
chroot case is going to be reachable from namespace root, but I seriously
doubt that pathname relative to that will be more useful...

Again, relying on pathnames for forensics (or security in general) is
a serious mistake (cue unprintable comments about apparmor and similar
varieties of snake oil). And using audit as poor man's ktrace analog
is... misguided, to put it very mildly.
Mark Moseley
2012-10-10 22:45:08 UTC
Post by Al Viro
Post by Mark Moseley
Post by Al Viro
BTW, what makes you think that container's root is even reachable from
"the host's /"? There is no such thing as "root of the OS itself"; different
processes can (and in case of containers definitely do) run in different
namespaces. With entirely different filesystems mounted in those, and
no promise whatsoever that any specific namespace happens to have all
filesystems mounted somewhere in it...
Nothing beyond guesswork, since it's been a while since I've played
with LXC. In any case, I was struggling a bit for the correct
terminology.
Am I similarly off-base with regards to the chroot'd scenario?
chroot case is going to be reachable from namespace root, but I seriously
doubt that pathname relative to that will be more useful...
Possibly not, but it'd still be good to have some sort of indicator
that this entry is being logged relative to the chroot, like an
additional item in the audit entry or even some kind of flag. But in
this case, and far more so in the unlinkat/fchmodat/fchownat case, I'd
think the least surprising thing (to me, at least) would be for the
directory item in the audit entry to have a pathname relative to
namespace root.
Post by Al Viro
Again, relying on pathnames for forensics (or security in general) is
a serious mistake (cue unprintable comments about apparmor and similar
varieties of snake oil). And using audit as poor man's ktrace analog
is... misguided, to put it very mildly.
Caveat: I'm just a sysadmin, so this stuff is as darn near "magic" as
I get to see on a regular basis, so it's safe to expect some naivety
and/or misguidedness on my part :)

I'm just using it as a log of files that have been written/changed on
moderately- to heavily-used systems. If there's another in-kernel
mechanism that'd be better suited for that sort of thing (at least
without adding a lot of overhead), I'd be definitely eager to know
about it. It's a web hosting environment, with customer files all
solely on NFS, so writes to the same directory can come from an
arbitrary number of servers. When they get swamped with write
requests, the amount of per-client stats exposed by our Netapp and
Oracle NFS servers is often only enough to point us at a client server
with an abusive user on it (but not much more, without turning on
debugging). Having logs of who's doing writes would be quite useful,
esp when writes aren't happening at that exact moment and wouldn't
show up in tools like iotop. The audit subsystem seemed like the best
fit for this kind of thing, but I'm more than open to whatever works.
Steve Grubb
2012-10-10 23:00:40 UTC
Post by Mark Moseley
Post by Al Viro
Again, relying on pathnames for forensics (or security in general) is
a serious mistake (cue unprintable comments about apparmor and similar
varieties of snake oil). And using audit as poor man's ktrace analog
is... misguided, to put it very mildly.
Caveat: I'm just a sysadmin, so this stuff is as darn near "magic" as
I get to see on a regular basis, so it's safe to expect some naivety
and/or misguidedness on my part :)
I'm just using it as a log of files that have been written/changed on
moderately- to heavily-used systems. If there's another in-kernel
mechanism that'd be better suited for that sort of thing (at least
without adding a lot of overhead), I'd be definitely eager to know
about it. It's a web hosting environment, with customer files all
solely on NFS, so writes to the same directory can come from an
arbitrary number of servers. When they get swamped with write
requests, the amount of per-client stats exposed by our Netapp and
Oracle NFS servers is often only enough to point us at a client server
with an abusive user on it (but not much more, without turning on
debugging). Having logs of who's doing writes would be quite useful,
esp when writes aren't happening at that exact moment and wouldn't
show up in tools like iotop. The audit subsystem seemed like the best
fit for this kind of thing, but I'm more than open to whatever works.
The audit system is the best fit. But I think Al is saying there are some
limitations. I know that Eric pushed some patches a while back that make a
stronger effort at collecting some of this information. What kernel are you
using?

-Steve
Mark Moseley
2012-10-10 23:07:49 UTC
Post by Steve Grubb
Post by Mark Moseley
Post by Al Viro
Again, relying on pathnames for forensics (or security in general) is
a serious mistake (cue unprintable comments about apparmor and similar
varieties of snake oil). And using audit as poor man's ktrace analog
is... misguided, to put it very mildly.
Caveat: I'm just a sysadmin, so this stuff is as darn near "magic" as
I get to see on a regular basis, so it's safe to expect some naivety
and/or misguidedness on my part :)
I'm just using it as a log of files that have been written/changed on
moderately- to heavily-used systems. If there's another in-kernel
mechanism that'd be better suited for that sort of thing (at least
without adding a lot of overhead), I'd be definitely eager to know
about it. It's a web hosting environment, with customer files all
solely on NFS, so writes to the same directory can come from an
arbitrary number of servers. When they get swamped with write
requests, the amount of per-client stats exposed by our Netapp and
Oracle NFS servers is often only enough to point us at a client server
with an abusive user on it (but not much more, without turning on
debugging). Having logs of who's doing writes would be quite useful,
esp when writes aren't happening at that exact moment and wouldn't
show up in tools like iotop. The audit subsystem seemed like the best
fit for this kind of thing, but I'm more than open to whatever works.
The audit system is the best fit. But I think Al is saying there are some
limitations. I know that Eric pushed some patches a while back that make a
stronger effort at collecting some of this information. What kernel are you
using?
Yup, understood. I've been playing with a variety of boxes, but mostly
within the 3.0.x and 3.2.x series. I'll drop 3.5.6 on some of these
boxes and see if my issues are already fixed (and proceed directly to
foot-in-mouth chagrined stage -- usually takes slightly longer to get
to that stage).
Mark Moseley
2012-10-11 17:27:53 UTC
Post by Mark Moseley
Post by Steve Grubb
Post by Mark Moseley
Post by Al Viro
Again, relying on pathnames for forensics (or security in general) is
a serious mistake (cue unprintable comments about apparmor and similar
varieties of snake oil). And using audit as poor man's ktrace analog
is... misguided, to put it very mildly.
Caveat: I'm just a sysadmin, so this stuff is as darn near "magic" as
I get to see on a regular basis, so it's safe to expect some naivety
and/or misguidedness on my part :)
I'm just using it as a log of files that have been written/changed on
moderately- to heavily-used systems. If there's another in-kernel
mechanism that'd be better suited for that sort of thing (at least
without adding a lot of overhead), I'd be definitely eager to know
about it. It's a web hosting environment, with customer files all
solely on NFS, so writes to the same directory can come from an
arbitrary number of servers. When they get swamped with write
requests, the amount of per-client stats exposed by our Netapp and
Oracle NFS servers is often only enough to point us at a client server
with an abusive user on it (but not much more, without turning on
debugging). Having logs of who's doing writes would be quite useful,
esp when writes aren't happening at that exact moment and wouldn't
show up in tools like iotop. The audit subsystem seemed like the best
fit for this kind of thing, but I'm more than open to whatever works.
The audit system is the best fit. But I think Al is saying there are some
limitations. I know that Eric pushed some patches a while back that make a
stronger effort at collecting some of this information. What kernel are you
using?
Yup, understood. I've been playing with a variety of boxes, but mostly
within the 3.0.x and 3.2.x series. I'll drop 3.5.6 on some of these
boxes and see if my issues are already fixed (and proceed directly to
foot-in-mouth chagrined stage -- usually takes slightly longer to get
to that stage).
Just gave 3.5.6 a shot and in these two particular cases, the result
is the same: chroot'd actions are logged in the audit entry relative
to the chroot, and the unlinkat/fchmodat/fchownat audit log entries only
have one item with the bare filename and no indication of directory.
Mark Moseley
2012-10-30 01:12:53 UTC
Post by Mark Moseley
Post by Mark Moseley
Post by Steve Grubb
Post by Mark Moseley
Post by Al Viro
Again, relying on pathnames for forensics (or security in general) is
a serious mistake (cue unprintable comments about apparmor and similar
varieties of snake oil). And using audit as poor man's ktrace analog
is... misguided, to put it very mildly.
Caveat: I'm just a sysadmin, so this stuff is as darn near "magic" as
I get to see on a regular basis, so it's safe to expect some naivety
and/or misguidedness on my part :)
I'm just using it as a log of files that have been written/changed on
moderately- to heavily-used systems. If there's another in-kernel
mechanism that'd be better suited for that sort of thing (at least
without adding a lot of overhead), I'd be definitely eager to know
about it. It's a web hosting environment, with customer files all
solely on NFS, so writes to the same directory can come from an
arbitrary number of servers. When they get swamped with write
requests, the amount of per-client stats exposed by our Netapp and
Oracle NFS servers is often only enough to point us at a client server
with an abusive user on it (but not much more, without turning on
debugging). Having logs of who's doing writes would be quite useful,
esp when writes aren't happening at that exact moment and wouldn't
show up in tools like iotop. The audit subsystem seemed like the best
fit for this kind of thing, but I'm more than open to whatever works.
The audit system is the best fit. But I think Al is saying there are some
limitations. I know that Eric pushed some patches a while back that make a
stronger effort at collecting some of this information. What kernel are you
using?
Would you happen to have a pointer to those patches? I've been surfing
the archives and not gotten lucky yet with finding the applicable
patchset.
Post by Mark Moseley
Post by Mark Moseley
Yup, understood. I've been playing with a variety of boxes, but mostly
within the 3.0.x and 3.2.x series. I'll drop 3.5.6 on some of these
boxes and see if my issues are already fixed (and proceed directly to
foot-in-mouth chagrined stage -- usually takes slightly longer to get
to that stage).
Just gave 3.5.6 a shot and in these two particular cases, the result
is the same: chroot'd actions are logged in the audit entry relative
to the chroot, and the unlinkat/fchmodat/fchownat audit log entries only
have one item with the bare filename and no indication of directory.
renameat seems to be the toughest of all of them (where
unlinkat/fchmodat/fchownat give you a hint in another audit entry). This
is doing a renameat(), from /home/moseley/tmp/tmp/renameat/1/a1 to
/home/moseley/tmp/tmp/renameat/2/a2

type=SYSCALL msg=audit(1351557710.520:74211): arch=c000003e syscall=264 success=yes exit=0 a0=3 a1=40075c a2=4 a3=400759 items=4 ppid=22742 pid=15181 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts17 ses=1727 comm="rename" exe="/tmp/rename" key=(null)
type=CWD msg=audit(1351557710.520:74211): cwd="/tmp"
type=PATH msg=audit(1351557710.520:74211): item=0 name="/tmp" inode=2367550 dev=08:02 mode=040775 ouid=1000 ogid=1000 rdev=00:00
type=PATH msg=audit(1351557710.520:74211): item=1 name="/tmp" inode=2367551 dev=08:02 mode=040775 ouid=1000 ogid=1000 rdev=00:00
type=PATH msg=audit(1351557710.520:74211): item=2 name="a1" inode=2367552 dev=08:02 mode=0100664 ouid=1000 ogid=1000 rdev=00:00
type=PATH msg=audit(1351557710.520:74211): item=3 name="a2" inode=2367552 dev=08:02 mode=0100664 ouid=1000 ogid=1000 rdev=00:00

Anything else I could/should be trying? I'm more than willing to
experiment. I just always assume I'm missing some key flag or
something.

Here's the simple example code ... and, yes, I *do* know how to use
variables, just didn't bother here ;)

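#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

int main(void) {
    const char *dir1 = "/home/moseley/tmp/tmp/renameat/1";
    const char *dir2 = "/home/moseley/tmp/tmp/renameat/2";

    /* Open both directories so their descriptors can be passed to renameat(). */
    DIR *a = opendir(dir1);
    DIR *b = opendir(dir2);
    if (a == NULL || b == NULL) {
        perror("opendir");
        return 1;
    }

    int afd = dirfd(a);
    int bfd = dirfd(b);

    /* Rename 1/a1 to 2/a2; both names are relative to the directory fds. */
    if (renameat(afd, "a1", bfd, "a2") < 0) {
        perror("renameat");
        return 1;
    }

    closedir(a);
    closedir(b);
    return 0;
}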
