Discussion:
[RFC][PATCH] (#6 U1) the latest incarnation
(too old to reply)
Timothy R. Chavez
2005-03-23 20:22:46 UTC
Permalink
Hello,

This is the latest patch. Quite a bit has changed and a lot more of the complexity has been stripped away. I've added back the hooks for d_instantiate() and d_splice_alias() for good form and rethought the positioning of the d_move() hooks. Also, I've done quite a bit with the locking. Now, [admittedly] I'm a novice, so the reader-writer locking stradegy I've used is probably not the best for performance -- especially since I've hooked __d_lookup() and will hit a write_lock() when I enter audit_attach_watch() (formly called audit_watch()).


Perhaps the locking can eventually change to using RCU and what not, but I'm pressed for time and have limited knowledge.

To save space in the inode struct, I've kept the i_audit as a pointer, rather then statically allocating memory for it. Now, all inode's get i_audit data allocated to them upon creation, and freed upon deletion. Is it reasonable to keep it this way? I am using kmalloc for this allocation, so I do believe I'm still wasting space, but I suppose I can change this back to audit_data slab allocations -- I just didn't know of a good way of creating the cache outside of the audit_inode_alloc() function itself.

I'll get cracking on the "list all watches" feature.

This is a patch against 2.6.11 for immediate gratification (and testing). David, I'll try to get you a patch to the the bitkeeper tree asap.

-tim
Stephen Smalley
2005-03-24 15:26:17 UTC
Permalink
This is the latest patch. Quite a bit has changed and a lot more of
the complexity has been stripped away. I've added back the hooks for
d_instantiate() and d_splice_alias() for good form and rethought the
positioning of the d_move() hooks.
Rationale? At least for me, the development process for this patch is
too opaque. I see changes from version to version (particularly with
respect to hooks and hook placement) with no explanation of the
reasoning, which makes it hard to assess the real impact. I think we
need a clear stated justification as to what each hook is achieving for
your higher level goals. Note btw that it is looking like we actually
need another security hook in the fs code to deal with the file creation
issue, so d_instantiate/d_splice_alias might not be sufficient for you
either.

As I mentioned earlier, I think you need a very clearly stated
description of your high level goals and requirements to include in your
submission on linux-fsdevel and lkml, because they will need to
understand those goals in order to assess whether your implementation is
sound and to tell you the right way to implement that desired
functionality if they think your implementation isn't sound. You want
to be careful about not confusing design/implementation with
goals/requirements.
Also, I've done quite a bit with the locking. Now, [admittedly] I'm a
novice, so the reader-writer locking stradegy I've used is probably
not the best for performance -- especially since I've hooked
__d_lookup() and will hit a write_lock() when I enter
audit_attach_watch() (formly called audit_watch()).
Did you test with a SMP kernel? Locked up immediately for me on boot,
right after displaying the Mount-cache hash table entries stats
(mnt_init).
To save space in the inode struct, I've kept the i_audit as a pointer,
rather then statically allocating memory for it. Now, all inode's get
i_audit data allocated to them upon creation, and freed upon deletion.
Is it reasonable to keep it this way?
Right, this is consistent with the security field, and lets people who
don't care about audit at all completely avoid the overhead by disabling
the compile-time option.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Timothy R. Chavez
2005-03-24 16:03:28 UTC
Permalink
Post by Stephen Smalley
Rationale? At least for me, the development process for this patch is
too opaque. I see changes from version to version (particularly with
respect to hooks and hook placement) with no explanation of the
reasoning, which makes it hard to assess the real impact. I think we
need a clear stated justification as to what each hook is achieving for
your higher level goals. Note btw that it is looking like we actually
need another security hook in the fs code to deal with the file creation
issue, so d_instantiate/d_splice_alias might not be sufficient for you
either.
As I mentioned earlier, I think you need a very clearly stated
description of your high level goals and requirements to include in your
submission on linux-fsdevel and lkml, because they will need to
understand those goals in order to assess whether your implementation is
sound and to tell you the right way to implement that desired
functionality if they think your implementation isn't sound. You want
to be careful about not confusing design/implementation with
goals/requirements.
This was sobering and reasonable. Thank you. I will definately work on this.
Post by Stephen Smalley
Also, I've done quite a bit with the locking. Now, [admittedly] I'm a
novice, so the reader-writer locking stradegy I've used is probably
not the best for performance -- especially since I've hooked
__d_lookup() and will hit a write_lock() when I enter
audit_attach_watch() (formly called audit_watch()).
Did you test with a SMP kernel? Locked up immediately for me on boot,
right after displaying the Mount-cache hash table entries stats
(mnt_init).
No testing on SMP as of right now, I'll get right on this right now. I have a
couple of minor fix-ups too (changed the description in init/Kconfig and
repositioned the exec_permission_lite() hook because if
exec_permission_lite() returns -EAGAIN, we'll get a record there and in
permission())

<snip>

Thanks again.

-tim
Stephen Smalley
2005-03-24 16:19:16 UTC
Permalink
Post by Timothy R. Chavez
No testing on SMP as of right now, I'll get right on this right now.
In audit_attach_watch(), you call audit_wentry_fetch_lock() which takes
read lock of the parent while holding the write lock on the entry. What
if dentry->d_parent == dentry, i.e. root inode of a filesystem?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Timothy R. Chavez
2005-03-24 16:36:27 UTC
Permalink
Post by Stephen Smalley
Post by Timothy R. Chavez
No testing on SMP as of right now, I'll get right on this right now.
In audit_attach_watch(), you call audit_wentry_fetch_lock() which takes
read lock of the parent while holding the write lock on the entry. What
if dentry->d_parent == dentry, i.e. root inode of a filesystem?
Funny you mentioned this. Right after you told me about the deadlock you were
getting, I went and looked at this function, and noticed the same thing. I'm
now testing for this case.
--
-tim
Stephen Smalley
2005-03-24 16:42:27 UTC
Permalink
Post by Timothy R. Chavez
Post by Stephen Smalley
Post by Timothy R. Chavez
No testing on SMP as of right now, I'll get right on this right now.
In audit_attach_watch(), you call audit_wentry_fetch_lock() which takes
read lock of the parent while holding the write lock on the entry. What
if dentry->d_parent == dentry, i.e. root inode of a filesystem?
Funny you mentioned this. Right after you told me about the deadlock you were
getting, I went and looked at this function, and noticed the same thing. I'm
now testing for this case.
Why do you need to take the write lock on the entry prior to calling
audit_wentry_fetch_lock anyway (vs. after)?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-24 16:28:45 UTC
Permalink
Post by Timothy R. Chavez
This is the latest patch.
Other comments:

- As it stands with your patch, alloc_inode() will leak memory if
audit_inode_alloc() succeeds but security_inode_alloc() fails because
nothing frees the audit data on the error handling path for that case.
Also, I have to wonder whether it is truly necessary to make i_audit
conditionally compiled; note that i_security is always present and
initialized to NULL even if !CONFIG_SECURITY. If you can make it
unconditional, then you can also initialize it to NULL prior to the
audit_inode_alloc() call so that you have a known initial state, and
then just call audit_inode_free() on the error handling path (which will
either map to free'ing NULL, which is safe, or free'ing an actual audit
data, avoiding the leak). Or you could just make setting i_audit to
NULL the first step in audit_inode_alloc() prior to attempting the
allocation to ensure a known initial state.

- You've switched from storing the audit data in the current audit
context for later processing by audit_log_exit() to immediately
generation of partial audit data via audit_log* calls from
audit_notify_watch(). That goes beyond what I suggested; all I
suggested was setting the auditable flag in the current audit context
(while still saving the file information in the current audit context)
so that audit_syscall_exit will call audit_log_exit() and generate an
audit record upon syscall exit whenever an audit_notify_watch() is
triggered on an auditable inode during the syscall processing. Your new
approach is more like what SELinux presently does, but seems counter to
what others have been suggesting, i.e. batch everything up for
processing on syscall exit. Both approaches ensure that an audit record
is emitted whenever an auditable inode is encountered, but the present
approach yields two separate audit records (one immediate from your hook
and one upon syscall exit) vs. a single unified record. What do we
want? What do others think?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Timothy R. Chavez
2005-03-24 17:00:10 UTC
Permalink
Post by Stephen Smalley
Both approaches ensure that an audit record
is emitted whenever an auditable inode is encountered, but the present
approach yields two separate audit records (one immediate from your hook
and one upon syscall exit) vs. a single unified record. What do we
want? What do others think?
Hmmm... Here's what I get:

./auditctl -w /audit/foo -k fk_foo
cat /audit/foo

audit(1111683374.383:13808290): name="foo" filterkey=fk_foo perm=0 perm_mask=4
inode=962899 inode_uid=0 inode_gid=0 inode_dev=03:03 inode_rdev=00:00
audit(1111683374.383:13808290): syscall=5 exit=3 a0=bffff8a3 a1=8000 a2=0
a3=8000 items=1 pid=31676 loginuid=-1 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0
audit(1111683374.383:13808290): item=0 name="/audit/foo" inode=962899
dev=00:00

This seems to be a complete a record. I add an additional watch:

./auditctl -w /audit -k fk_audit
cat /audit/foo

audit(1111683471.201:13919013): name="audit" filterkey=fk_audit perm=0
perm_mask=1 inode=960993 inode_uid=0 inode_gid=0 inode_dev=03:03
inode_rdev=00:00
audit(1111683471.201:13919013): name="foo" filterkey=fk_foo perm=0 perm_mask=4
inode=962899 inode_uid=0 inode_gid=0 inode_dev=03:03 inode_rdev=00:00
audit(1111683471.201:13919013): syscall=5 exit=3 a0=bffff8a3 a1=8000 a2=0
a3=8000 items=1 pid=31692 loginuid=-1 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0
audit(1111683471.201:13919013): item=0 name="/audit/foo" inode=962899
dev=00:00
--
-tim
Stephen Smalley
2005-03-24 17:02:36 UTC
Permalink
Post by Timothy R. Chavez
./auditctl -w /audit/foo -k fk_foo
cat /audit/foo
audit(1111683374.383:13808290): name="foo" filterkey=fk_foo perm=0 perm_mask=4
inode=962899 inode_uid=0 inode_gid=0 inode_dev=03:03 inode_rdev=00:00
audit(1111683374.383:13808290): syscall=5 exit=3 a0=bffff8a3 a1=8000 a2=0
a3=8000 items=1 pid=31676 loginuid=-1 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0
audit(1111683374.383:13808290): item=0 name="/audit/foo" inode=962899
dev=00:00
It has all of the desired information, but is being emitted as multiple
audit records (which can be correlated using the timestamp/serial
number), and the first line is being emitted immediately by your hook
while the latter two are being emitted by the audit_log_exit upon
audit_syscall_exit. The question is whether we want your older approach
(save information in current audit context for later processing by
audit_log_exit), with just the modification of setting the auditable
flag to ensure generation, or the approach in your current patch, which
is more like what SELinux does presently.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-24 19:13:36 UTC
Permalink
Post by Timothy R. Chavez
./auditctl -w /audit/foo -k fk_foo
cat /audit/foo
audit(1111683374.383:13808290): name="foo" filterkey=fk_foo perm=0 perm_mask=4
inode=962899 inode_uid=0 inode_gid=0 inode_dev=03:03 inode_rdev=00:00
audit(1111683374.383:13808290): syscall=5 exit=3 a0=bffff8a3 a1=8000 a2=0
a3=8000 items=1 pid=31676 loginuid=-1 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0
audit(1111683374.383:13808290): item=0 name="/audit/foo" inode=962899
dev=00:00
Ok, going back to what you are trying to achieve in terms of high level
goals (e.g. maintain auditing on /etc/shadow across re-creation for each
transaction), I did the following:
auditctl -w /etc/shadow -k SHADOW -p wa

i.e. show me all attempts to write or append to /etc/shadow.

Then I ran 'passwd' as a normal user and changed my own password, thus
re-creating /etc/shadow with my new password. No audit messages were
generated.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-24 19:23:59 UTC
Permalink
Post by Stephen Smalley
Ok, going back to what you are trying to achieve in terms of high level
goals (e.g. maintain auditing on /etc/shadow across re-creation for each
auditctl -w /etc/shadow -k SHADOW -p wa
i.e. show me all attempts to write or append to /etc/shadow.
Then I ran 'passwd' as a normal user and changed my own password, thus
re-creating /etc/shadow with my new password. No audit messages were
generated.
Running strace on passwd, I see that the transaction consists of:
1) create /etc/nshadow
2) read old content from /etc/shadow
3) write new content to /etc/nshadow
4) rename /etc/nshadow to /etc/shadow

So I would have expected to see an audit upon the rename
from /etc/nshadow to /etc/shadow, no? I also tried adding a watch for
nshadow, e.g.
auditctl -w /etc/nshadow -k SHADOW -p w

Still no audit messages upon using passwd to change my password.

Now, if I change my watch to include read access as well, e.g.
auditctl -W /etc/shadow
auditctl -w /etc/shadow -k SHADOW -p rwa

Then I start to see some audit messages during passwd, but I shouldn't
have to request read access auditing in order to see modifications
(especially as that will generate a lot more data, e.g. upon every
authentication program's use of /etc/shadow).
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-24 19:32:25 UTC
Permalink
Post by Stephen Smalley
Then I start to see some audit messages during passwd, but I shouldn't
have to request read access auditing in order to see modifications
(especially as that will generate a lot more data, e.g. upon every
authentication program's use of /etc/shadow).
Ok, I see what is happening. You call audit_attach_watch() from d_move,
but you will never hit an audit_notify_watch(), hence no audit data upon
renames until a subsequent write to the existing file (which never
happens for /etc/shadow, as it is always re-created and renamed for each
transaction). So a natural question is what else should be calling
audit_notify_watch besides permission, exec_permission_lite, and
may_delete? d_move? may_create?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-24 19:41:41 UTC
Permalink
Post by Stephen Smalley
Ok, I see what is happening. You call audit_attach_watch() from d_move,
but you will never hit an audit_notify_watch(), hence no audit data upon
renames until a subsequent write to the existing file (which never
happens for /etc/shadow, as it is always re-created and renamed for each
transaction). So a natural question is what else should be calling
audit_notify_watch besides permission, exec_permission_lite, and
may_delete? d_move? may_create?
I suppose may_create() won't help you, as the child has a negative
dentry at that point so you have no inode. You will have an inode upon
the subsequent d_instantiate, but can't tell that you are dealing with a
"just created" inode versus an already existing one, so you won't know
that you need to notify of a create. So you are back to post-create
style hooks for calling audit_notify_watch for file creations, right?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Chris Wright
2005-03-24 19:52:17 UTC
Permalink
Post by Stephen Smalley
Post by Stephen Smalley
Ok, I see what is happening. You call audit_attach_watch() from d_move,
but you will never hit an audit_notify_watch(), hence no audit data upon
renames until a subsequent write to the existing file (which never
happens for /etc/shadow, as it is always re-created and renamed for each
transaction). So a natural question is what else should be calling
audit_notify_watch besides permission, exec_permission_lite, and
may_delete? d_move? may_create?
I suppose may_create() won't help you, as the child has a negative
dentry at that point so you have no inode. You will have an inode upon
the subsequent d_instantiate, but can't tell that you are dealing with a
"just created" inode versus an already existing one, so you won't know
that you need to notify of a create. So you are back to post-create
style hooks for calling audit_notify_watch for file creations, right?
What was the problem with those, just hook proliferation?

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
Stephen Smalley
2005-03-24 19:48:32 UTC
Permalink
Post by Chris Wright
Post by Stephen Smalley
I suppose may_create() won't help you, as the child has a negative
dentry at that point so you have no inode. You will have an inode upon
the subsequent d_instantiate, but can't tell that you are dealing with a
"just created" inode versus an already existing one, so you won't know
that you need to notify of a create. So you are back to post-create
style hooks for calling audit_notify_watch for file creations, right?
What was the problem with those, just hook proliferation?
I think that they are ok for notification (audit_notify_watch calls),
but they aren't safe for attaching watches (audit_attach_watch calls),
because the inode can be accessed via the dcache by another thread
before the post-create hooks run. Hence, he still wants to keep hooks
for attaching watches in places like d_instantiate, but the notification
hooks can occur later in the processing, like the dnotify hooks.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Timothy R. Chavez
2005-03-24 20:48:54 UTC
Permalink
Post by Stephen Smalley
Post by Stephen Smalley
Then I start to see some audit messages during passwd, but I shouldn't
have to request read access auditing in order to see modifications
(especially as that will generate a lot more data, e.g. upon every
authentication program's use of /etc/shadow).
Ok, I see what is happening. You call audit_attach_watch() from d_move,
but you will never hit an audit_notify_watch(), hence no audit data upon
renames until a subsequent write to the existing file (which never
happens for /etc/shadow, as it is always re-created and renamed for each
transaction). So a natural question is what else should be calling
audit_notify_watch besides permission, exec_permission_lite, and
may_delete? d_move? may_create?
Though the dnotify assessment happens in a later message, I think this is a
good approach. But, yes, back to expected messages missing, I was missing
records too for unlink() -- What is boiled down to was I was making the wrong
statement when filtering in audit_notify_watch():

static inline int may_delete(struct inode *dir,struct dentry *victim, int
isdir)
{
....
audit_notify_watch(victim->d_inode, 0);
....
}

void audit_notify_watch(struct inode *inode, int mask)
{
.....
// Doh! The mask==0 for may_delete's hook, thus we bottom out here
if (!mask || (wentry->w_watch->perms && !(wentry->w_watch->perms&mask)))
return;
.....
}

This statement really should be:

if (mask && (wentry->w_watch->perms && !(wentry->w_watch->perms&mask)))

We treat a mask==0 as, "no permissions filtering needed", thus we don't need
to check for it in the w_watch->perms. We also treat a w_watch->perms==0
(which is the default) similarly as "no permissions filtering required".

-tim
Stephen Smalley
2005-03-25 13:04:28 UTC
Permalink
Post by Timothy R. Chavez
static inline int may_delete(struct inode *dir,struct dentry *victim, int
isdir)
{
....
audit_notify_watch(victim->d_inode, 0);
....
}
This brings up another issue I wanted to discuss - whether the watch
mask should be more general than rwea, e.g. should it be possible to
specify that I want to see "rename", "link", "unlink", etc. with
specific mask values for each of these operations. At that point, you
would be passing your own set of mask values to audit_notify_watch (with
a translation table for the normal permission mask).

Alternatively, you could just view "rename", "link", and "unlink" as
another form of write, so you could pass MAY_WRITE here.

With regard to additional hook placement for audit_notify_watch, I think
you likely do want to mirror the security*_post* hooks for file creation
(create, mkdir, mknod, symlink), rename, and link with
audit_notify_watch calls to perform notifications of such events. Then
you keep audit_attach_watch calls in the dcache routines to manage the
i_audit fields and avoid races. However, I think you need to check
whether you truly need all of the current hook placements in the dcache
routines or whether some of them are duplicative on the same code path,
e.g. do you need both __d_lookup and d_instantiate/d_splice_alias
hooked?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-25 13:28:08 UTC
Permalink
Post by Stephen Smalley
With regard to additional hook placement for audit_notify_watch, I think
you likely do want to mirror the security*_post* hooks for file creation
(create, mkdir, mknod, symlink), rename, and link with
audit_notify_watch calls to perform notifications of such events. Then
you keep audit_attach_watch calls in the dcache routines to manage the
i_audit fields and avoid races. However, I think you need to check
whether you truly need all of the current hook placements in the dcache
routines or whether some of them are duplicative on the same code path,
e.g. do you need both __d_lookup and d_instantiate/d_splice_alias
hooked?
I don't see what the __d_lookup hook buys you, can you explain? The
d_instantiate and d_splice_alias hooks ensure that you attach watches to
inodes when they are looked up or created before they can be accessed by
another thread via the dcache (since you call the hook before releasing
the dcache lock). What do you need the __d_lookup hook for?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Timothy R. Chavez
2005-03-25 16:46:29 UTC
Permalink
Post by Stephen Smalley
Post by Stephen Smalley
With regard to additional hook placement for audit_notify_watch, I think
you likely do want to mirror the security*_post* hooks for file creation
(create, mkdir, mknod, symlink), rename, and link with
audit_notify_watch calls to perform notifications of such events. Then
you keep audit_attach_watch calls in the dcache routines to manage the
i_audit fields and avoid races. However, I think you need to check
whether you truly need all of the current hook placements in the dcache
routines or whether some of them are duplicative on the same code path,
e.g. do you need both __d_lookup and d_instantiate/d_splice_alias
hooked?
I don't see what the __d_lookup hook buys you, can you explain? The
d_instantiate and d_splice_alias hooks ensure that you attach watches to
inodes when they are looked up or created before they can be accessed by
another thread via the dcache (since you call the hook before releasing
the dcache lock). What do you need the __d_lookup hook for?
Hello,

I've kind of struggled with this one and am was a bit reluctant to add it.
Perhaps my logic is right, bu there's a better placement. The reason why the
hook was placed in __d_lookup() was to auto-update a hardlink with the
correct watch. The only way a hardlink will generate audit records is if
it's inode is being watched and the only way the inode can be watched is if
one of it's dentry's is at a watch point. So, take this scenario for example
-- this is how we should currently perform:

watch /tmp/foo
# We get a record for "foo"
cat /tmp/foo
ln /tmp/foo /tmp/bar
watch /tmp/bar
# We get a record for "foo"
cat /tmp/bar
rm /tmp/foo
# __d_lookup() does its magic and we get a record for "bar"
cat /tmp/bar

This might be a completely assanine case now....

If /tmp/bar has a "natural" watch (ie: there's a watch on /tmp for "bar"),
and /tmp/bar is a hardlink to /tmp/foo, then if /tmp/foo left (it was the
dentry for the inode that was fufilling the watch criteria), then the inode
would lose its "foo" watch and /tmp/bar would be watchless. But /tmp/bar is
being watched! So we use __d_lookup() to automatically detect whether or not
there is a "natural" watch in its current path that it can attach too. This
could get a little more interesting if there are N hardlinks to /tmp/foo
which are all being watched. Then, any of the /tmp/XN's could claim the
inode and attach their watch -- it depends on what /tmp/XN calls the
audiit_attach_watch() first.

Comments? Opinions?

-tim
Stephen Smalley
2005-03-25 17:25:54 UTC
Permalink
Post by Timothy R. Chavez
I've kind of struggled with this one and am was a bit reluctant to add it.
Perhaps my logic is right, bu there's a better placement. The reason why the
hook was placed in __d_lookup() was to auto-update a hardlink with the
correct watch. The only way a hardlink will generate audit records is if
it's inode is being watched and the only way the inode can be watched is if
one of it's dentry's is at a watch point. So, take this scenario for example
watch /tmp/foo
# We get a record for "foo"
cat /tmp/foo
So at this point, the inode has hit an audit_attach_watch() upon the
d_splice_alias() call during lookup and is marked as requiring audit,
right?
Post by Timothy R. Chavez
ln /tmp/foo /tmp/bar
Same inode, so it is still marked as requiring audit (but you need an
audit_notify_watch call at the tail of vfs_link to generate one for the
link creation).
Post by Timothy R. Chavez
watch /tmp/bar
Why? We aren't trying to defend against stupid or malicious admins
here.
Post by Timothy R. Chavez
# We get a record for "foo"
cat /tmp/bar
rm /tmp/foo
# __d_lookup() does its magic and we get a record for "bar"
cat /tmp/bar
Wait. We are still dealing with the same inode at this point. Why was
its i_audit field changed by the delete if there are other hard links
present? Don't we want to preserve auditing on the inode in such a
case, irrespective of whether /tmp/bar had a watch or not, just because
of the original watch on /tmp/foo?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-25 17:39:26 UTC
Permalink
Post by Stephen Smalley
Post by Timothy R. Chavez
# We get a record for "foo"
cat /tmp/bar
rm /tmp/foo
# __d_lookup() does its magic and we get a record for "bar"
cat /tmp/bar
Wait. We are still dealing with the same inode at this point. Why was
its i_audit field changed by the delete if there are other hard links
present? Don't we want to preserve auditing on the inode in such a
case, irrespective of whether /tmp/bar had a watch or not, just because
of the original watch on /tmp/foo?
Ok, I guess not, as the inode will eventually become "unwatched" anyway
if it is evicted and then re-looked up as /tmp/bar or if the system
reboots. The particular scenario seems a bit contrived, but the more
general case you mention later (file has multiple hard links, all with
watches defined a priori, the inode picks up the watch for whatever name
is used first to access it, and then that link is deleted) does seem
like a legitimate concern, as you don't want to lose auditing for
accessing the inode via the other links at that point.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Timothy R. Chavez
2005-03-25 21:02:14 UTC
Permalink
On Friday 25 March 2005 11:25 am, Stephen Smalley wrote:
<snip>
Post by Stephen Smalley
Wait. We are still dealing with the same inode at this point. Why was
its i_audit field changed by the delete if there are other hard links
present? Don't we want to preserve auditing on the inode in such a
case, irrespective of whether /tmp/bar had a watch or not, just because
of the original watch on /tmp/foo?
Just to be clear on this, I think the way I'm doing it now gives us the a
fairly clear statement on what constitutes "being watched". Iff a dentry
exists such that dentry.d_name->name eq watchlist_entry.name, will the
dentry->d_inode be able to hold a watch.

Because dentry->d_inode is sharable, this allows for implicit watches at other
locations in the filesystem. The watch on a hardlink is not explicit. The
only reason we're watching the hardlink, /tmp/bar, is because it
shares /tmp/foo's inode; /tmp/foo exists. Once /tmp/foo is gone, the watch
point is vacated, and we're no longer interested in receiving audit records.

If /tmp/foo comes back, /tmp/bar will not be watched... is this a shortcoming?
I don't look at it like this. I don't think we should make any assumptions
of the kind (it makes things simpler :-)). I think this scenario lumps into
the same category as mounting over /tmp with a new device and then
automatically watching /tmp/foo because we've specified this path.
The /tmp/foo on this new device might be completely different (ie: it was
targeted by the administrator for audit) and something we don't care about.
When the administrator watches, he/she does so explicitly on a given device,
in a given namespace.

Through all of this we can now say that a watch may only ever be attached to
one inode and that inode must have a dentry associated with it who's name
appears in its parent inode's watchlist.

I'm under the impression that this is acceptable for a CAPP evaluation.
Klaus??

-tim

Stephen Smalley
2005-03-25 18:16:16 UTC
Permalink
Post by Timothy R. Chavez
I've kind of struggled with this one and am was a bit reluctant to add it.
Perhaps my logic is right, bu there's a better placement. The reason why the
hook was placed in __d_lookup() was to auto-update a hardlink with the
correct watch. The only way a hardlink will generate audit records is if
it's inode is being watched and the only way the inode can be watched is if
one of it's dentry's is at a watch point. So, take this scenario for example
Are you also relying on the __d_lookup() hook to properly update/clear
i_audit->wentry fields for inodes already in the dcache for removed
watches (i.e. after an auditctl -W /tmp/foo, the subsequent
audit_attach_watch call by __d_lookup is what will reset the i_audit
field for /tmp/foo)?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Timothy R. Chavez
2005-03-25 18:54:16 UTC
Permalink
Post by Stephen Smalley
Post by Timothy R. Chavez
I've kind of struggled with this one and am was a bit reluctant to add
it. Perhaps my logic is right, bu there's a better placement. The reason
why the hook was placed in __d_lookup() was to auto-update a hardlink
with the correct watch. The only way a hardlink will generate audit
records is if it's inode is being watched and the only way the inode can
be watched is if one of it's dentry's is at a watch point. So, take this
Are you also relying on the __d_lookup() hook to properly update/clear
i_audit->wentry fields for inodes already in the dcache for removed
watches (i.e. after an auditctl -W /tmp/foo, the subsequent
audit_attach_watch call by __d_lookup is what will reset the i_audit
field for /tmp/foo)?
Yes, that is correct. So it is also used to clear out any references to an
unhashed wentry. So when we look for a wentry and we get NULL back, we first
put back our reference to the unhashed wentry (and provided all other
references are dropped, put our memory back into the cache) and
audit_wentry_get(NULL).

-tim
Timothy R. Chavez
2005-03-25 17:07:50 UTC
Permalink
Post by Stephen Smalley
Alternatively, you could just view "rename", "link", and "unlink" as
another form of write, so you could pass MAY_WRITE here.
I think we should keep it simple for the time being and go with this.
Post by Stephen Smalley
With regard to additional hook placement for audit_notify_watch, I think
you likely do want to mirror the security*_post* hooks for file creation
(create, mkdir, mknod, symlink), rename, and link with
audit_notify_watch calls to perform notifications of such events.
I'm not entirely sure we should hook mknod or symlink. We're not making any
claims about the watchability of a device, or symlink with this code. Do you
agree?

-tim
Stephen Smalley
2005-03-25 17:05:54 UTC
Permalink
Post by Timothy R. Chavez
I'm not entirely sure we should hook mknod or symlink. We're not making any
claims about the watchability of a device, or symlink with this code. Do you
agree?
We are only talking about post hooks to generate audit messages via
audit_notify_watch() if the inode has previously been marked by
audit_attach_watch(). Given your other hooks, it should already be
possible to audit reads and writes to device nodes (since a watch should
be possible to attach using your existing hooks in
d_instantiate/d_splice_alias and notifications should be generated using
your hook in permission), so why not allow auditing of creates as well?
Given that udev makes /dev dynamic, it seems like watches might be
important there as well, eh?
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Stephen Smalley
2005-03-25 17:15:41 UTC
Permalink
Post by Stephen Smalley
We are only talking about post hooks to generate audit messages via
audit_notify_watch() if the inode has previously been marked by
audit_attach_watch(). Given your other hooks, it should already be
possible to audit reads and writes to device nodes (since a watch should
be possible to attach using your existing hooks in
d_instantiate/d_splice_alias and notifications should be generated using
your hook in permission), so why not allow auditing of creates as well?
Given that udev makes /dev dynamic, it seems like watches might be
important there as well, eh?
As a trivial test of the ability to audit reads and writes to device
nodes already, I did:
auditctl -w /dev/null
and then did:
echo hello > /dev/null
As expected, this generated an audit record.

Hence, while it may be fine to omit symlinks, I see no reason to not
include an audit_notify_watch call at the end of vfs_mknod that allows
you to generate an audit record for device creations based on name, as
you can already attach watches to device nodes and generate audit for
opens on them.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
Chris Wright
2005-03-25 17:24:37 UTC
Permalink
Post by Stephen Smalley
Post by Timothy R. Chavez
I'm not entirely sure we should hook mknod or symlink. We're not making any
claims about the watchability of a device, or symlink with this code. Do you
agree?
We are only talking about post hooks to generate audit messages via
audit_notify_watch() if the inode has previously been marked by
audit_attach_watch(). Given your other hooks, it should already be
possible to audit reads and writes to device nodes (since a watch should
be possible to attach using your existing hooks in
d_instantiate/d_splice_alias and notifications should be generated using
your hook in permission), so why not allow auditing of creates as well?
Given that udev makes /dev dynamic, it seems like watches might be
important there as well, eh?
I agree, I see no reason to exclude these. They're just inodes in the
filesystem.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
Timothy R. Chavez
2005-03-24 20:55:56 UTC
Permalink
Post by Stephen Smalley
Post by Timothy R. Chavez
This is the latest patch.
- As it stands with your patch, alloc_inode() will leak memory if
audit_inode_alloc() succeeds but security_inode_alloc() fails because
nothing frees the audit data on the error handling path for that case.
Is this too bold of a statement:

fs/inode.c: inode_alloc

if (audit_inode_alloc(inode) && security_inode_alloc(inode)) {
....
}

I figure, that if either is unsuccessful, we're probably not in any shape to
audit anyway... and visa versa. Is this a reasonable assumption? Or is it
too assertive?

-tim
Timothy R. Chavez
2005-03-24 21:04:27 UTC
Permalink
Post by Timothy R. Chavez
Post by Stephen Smalley
Post by Timothy R. Chavez
This is the latest patch.
- As it stands with your patch, alloc_inode() will leak memory if
audit_inode_alloc() succeeds but security_inode_alloc() fails because
nothing frees the audit data on the error handling path for that case.
fs/inode.c: inode_alloc
if (audit_inode_alloc(inode) && security_inode_alloc(inode)) {
....
}
nevermind that isn't even the right statement. sorry

-tim
Timothy R. Chavez
2005-03-24 22:02:48 UTC
Permalink
Post by Stephen Smalley
Post by Timothy R. Chavez
This is the latest patch.
- As it stands with your patch, alloc_inode() will leak memory if
audit_inode_alloc() succeeds but security_inode_alloc() fails because
nothing frees the audit data on the error handling path for that case.
I took your advice on just initializing inode->i_audit to NULL like the
security field. So would this be reasonable Stephen?

if (audit_inode_free(inode) || security_inode_alloc(inode)) {
audit_inode_free(inode);
security_inode_free(inode);
if (inode->i_sb->s_op->destroy_inode)
inode->i_sb->s_op->destroy_inode(inode);
else
kmem_cache_free(inode_cachep, (inode));
return NULL;
}

This way, in either case where there could be leakage, we clean up the memory
before the inode goes bye-bye.

-tim
Timothy R. Chavez
2005-03-24 22:21:23 UTC
Permalink
Post by Timothy R. Chavez
Post by Stephen Smalley
Post by Timothy R. Chavez
This is the latest patch.
- As it stands with your patch, alloc_inode() will leak memory if
audit_inode_alloc() succeeds but security_inode_alloc() fails because
nothing frees the audit data on the error handling path for that case.
I took your advice on just initializing inode->i_audit to NULL like the
security field. So would this be reasonable Stephen?
if (audit_inode_free(inode) || security_inode_alloc(inode)) {
Errr audit_inode_alloc(inode)... man... I'm going to quit writing e-mail for
the remainder of the day... I think I've done enough damage to my reputation
for the time being:-)

-tim
Stephen Smalley
2005-03-25 13:10:57 UTC
Permalink
Post by Timothy R. Chavez
I took your advice on just initializing inode->i_audit to NULL like the
security field. So would this be reasonable Stephen?
if (audit_inode_free(inode) || security_inode_alloc(inode)) {
audit_inode_free(inode);
security_inode_free(inode);
if (inode->i_sb->s_op->destroy_inode)
inode->i_sb->s_op->destroy_inode(inode);
else
kmem_cache_free(inode_cachep, (inode));
return NULL;
}
This way, in either case where there could be leakage, we clean up the memory
before the inode goes bye-bye.
1,1 s/audit_inode_free/audit_inode_alloc/

You don't need the security_inode_free() call on the error handling
path. Consider the cases:
1) audit_inode_alloc() fails, so we don't call security_inode_alloc()
and both i_audit and i_security are NULL. The audit_inode_free() call
on the error path does no harm.
2) audit_inode_alloc() succeeds and security_inode_alloc() fails, so
i_audit is non-NULL and i_security is NULL. In this case, we need the
audit_inode_free() call on the error handling path.

In neither case do you need a security_inode_free() call on the error
handling path.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
David Woodhouse
2005-03-25 14:54:45 UTC
Permalink
Both approaches ensure that an audit record is emitted whenever an
auditable inode is encountered, but the present approach yields two
separate audit records (one immediate from your hook and one upon
syscall exit) vs. a single unified record. What do we want? What do
others think?
All things being equal, I think I'd rather see the information added to
the audit_context and then dumped with everything else on syscall exit.
When doing the IPC patch I deliberately made the 'aux' list generic
enough that it could be used for this kind of thing.

But are there reasons why it's hard to do that here? Do we need to
report information in contexts where we can't allocate memory (or at
least can't deal with failure to do so)?
--
dwmw2
Stephen Smalley
2005-03-25 14:54:25 UTC
Permalink
Post by David Woodhouse
All things being equal, I think I'd rather see the information added to
the audit_context and then dumped with everything else on syscall exit.
When doing the IPC patch I deliberately made the 'aux' list generic
enough that it could be used for this kind of thing.
But are there reasons why it's hard to do that here? Do we need to
report information in contexts where we can't allocate memory (or at
least can't deal with failure to do so)?
I don't think so; I think all callers of audit_notify_watch() can sleep
at the point of the call (unlike callers of audit_attach_watch, which
must not sleep, but that only attaches watches; it doesn't do any audit
generation). Now for SELinux avc_audit, that would be an issue, because
it cannot perform blocking allocation or otherwise deal with failures.
--
Stephen Smalley <***@tycho.nsa.gov>
National Security Agency
David Woodhouse
2005-03-25 15:11:55 UTC
Permalink
Post by Stephen Smalley
I don't think so; I think all callers of audit_notify_watch() can sleep
at the point of the call (unlike callers of audit_attach_watch, which
must not sleep, but that only attaches watches; it doesn't do any audit
generation). Now for SELinux avc_audit, that would be an issue, because
it cannot perform blocking allocation or otherwise deal with failures.
Then we should probably be using audit_context.aux for it and reporting
it on syscall exit.
--
dwmw2
Loading...