/var/log/messages: backlog limit exceeded

Discussion:

Denise Garrett

2005-06-15 19:10:21 UTC

Howdy,

I am currently working with the attached test, config3_test, that I have
pasted into a text file below. Config3 (assertions 4 and 5) fail on
multiple platforms that contain audit-0.9.4-1, although they will pass
with earlier audits. When it is ran the messages file in var/log/messages
is filled with the following repeating lines during the problem cases.

Jun 15 09:36:05 xracer1 auditd: The audit daemon is exiting.
Jun 15 09:36:05 xracer1 kernel: audit: audit_backlog=257 >
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 auditd: auditd startup failed
Jun 15 09:36:06 xracer1 kernel: audit: audit_lost=65593 audit_rate_limit=0
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 kernel: audit: backlog limit exceeded
Jun 15 09:36:06 xracer1 kernel: audit: audit_backlog=257 >
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 auditd: auditd startup succeeded
Jun 15 09:36:06 xracer1 kernel: audit: audit_lost=65594 audit_rate_limit=0
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 kernel: audit: backlog limit exceeded
Jun 15 09:36:06 xracer1 kernel: audit: audit_backlog=257 >
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 kernel: audit: audit_lost=65595 audit_rate_limit=0
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 kernel: audit: backlog limit exceeded
Jun 15 09:36:06 xracer1 kernel: audit: audit_backlog=257 >
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 kernel: audit: audit_lost=65596 audit_rate_limit=0
audit_backlog_limit=256
Jun 15 09:36:06 xracer1 kernel: audit: backlog limit exceeded

The problem persists even with changing the backlog limit. Here are the
results in /var/log/messages with different limits.

backlog limit < 8000: Jun 15 00:38:43 bracer3 kernel: audit: backlog
limit exceeded
Jun 15 00:38:43 bracer3 auditd[6013]: Audit daemon rotating log file
s
Jun 15 00:38:43 bracer3 kernel: audit: audit_backlog=8001 >
audit_backlog_limit=8000
backlog limit > 9000
Jun 15 00:48:29 bracer3 auditd: auditd shutdown failed
Jun 15 00:48:29 bracer3 auditd: Value -1 should only be numbers - line
10
Jun 15 00:48:29 bracer3 auditd: The audit daemon is exiting.
Jun 15 00:48:29 bracer3 auditd: auditd startup failed
Jun 15 00:48:36 bracer3 auditd: auditd startup succeeded
Jun 15 00:48:36 bracer3 auditd[6832]: Init complete, audit pid set to:
6832
Jun 15 00:48:37 bracer3 auditd[6832]: Audit daemon rotating log files
Jun 15 00:49:08 bracer3 last message repeated 109 times
Jun 15 00:49:17 bracer3 last message repeated 33 times
Jun 15 00:49:20 bracer3 auditd[6832]: The audit daemon is exiting.
Jun 15 00:49:21 bracer3 auditd: auditd shutdown succeeded
Jun 15 00:49:21 bracer3 kernel: audit: *NO* daemon at audit_pid=6832
Jun 15 00:49:21 bracer3 kernel: audit(1118814561.489:5030167):
auid=500 removed an audit rule
Jun 15 00:49:21 bracer3 kernel:
Jun 15 00:49:21 bracer3 kernel: audit(1118814561.693:5030173):
auid=500 removed an audit rule
Jun 15 00:49:21 bracer3 kernel:
Jun 15 00:49:21 bracer3 kernel: audit(1118814561.897:5030179):
auid=500 removed an audit rule
Jun 15 00:49:21 bracer3 kernel:
Jun 15 00:49:22 bracer3 kernel: audit(1118814562.101:5030185):
auid=500 removed an audit rule
Jun 15 00:49:22 bracer3 kernel:
Jun 15 00:49:22 bracer3 kernel: audit(1118814562.305:5030191):
auid=500 removed an audit rule
Jun 15 00:49:22 bracer3 kernel:
Jun 15 00:49:22 bracer3 kernel: audit(1118814562.509:5030197):
auid=500 removed an audit rule
Jun 15 00:49:22 bracer3 kernel:

When the commands are done manually for only assertion 4 it passes. This
is because assertion 3 causes the load that sends the messages to
/var/log/messages. Here is the loop and ruleset used by assertion 3.
for (lc1 = 0; lc1 < 2000; lc1++) {
syscall(__NR_mkdir,dirname,mode);
syscall(__NR_chmod,dirname,mode);
syscall(__NR_rmdir,dirname);
}

/* Create rules using auditctl. */
system("auditctl -a entry,always -S mkdir");
system("auditctl -a entry,always -S chmod");
system("auditctl -a entry,always -S rmdir");
system("auditctl -a exit,always -S mkdir");
system("auditctl -a exit,always -S chmod");
system("auditctl -a exit,always -S rmdir");

The line assertion 4 creates and searches for in /var/log/messages is
there but followed by many rows of the backlog limit messages pushing it
to the top of the file making it difficult to find.

Denise Garrett
***@us.ibm.com

David Woodhouse

2005-06-20 09:50:51 UTC

Permalink

Content-Type: text/html; charset="US-ASCII"
 Howdy, I am currently working with the attached test,
config3_test, that I have pasted into a text file below. Config3
(assertions 4 and 5) fail on multiple platforms that contain
audit-0.9.4-1, although they will pass with earlier audits. When it is
ran the messages file in var/log/messages is filled with the following
repeating lines during the problem cases.

I think the answer to this problem is to make the auditable processes
_wait_ until they can get a slot in the backlog, whenever that's
possible. Obviously, some callers of audit_log_start() cannot sleep, but
most of the high-bandwidth callers can...

Assuming this approach is satisfactory we may want to look for more
callers which can be converted to audit_log_start_wait(), and perhaps
give a few more slots to the atomic callers, so there's always space for
them.

--- linux-2.6.9/kernel/auditsc.c.orig 2005-06-20 00:21:52.000000000 +0100
+++ linux-2.6.9/kernel/auditsc.c 2005-06-20 01:14:05.000000000 +0100
@@ -728,7 +734,7 @@ static void audit_log_exit(struct audit_
struct audit_watch_info *winfo;
struct hlist_node *pos;

- ab = audit_log_start(context, AUDIT_SYSCALL);
+ ab = audit_log_start_wait(context, AUDIT_SYSCALL);
if (!ab)
return; /* audit_panic has been called */
audit_log_format(ab, "arch=%x syscall=%d",
@@ -759,7 +765,7 @@ static void audit_log_exit(struct audit_
audit_log_end(ab);
for (aux = context->aux; aux; aux = aux->next) {

- ab = audit_log_start(context, aux->type);
+ ab = audit_log_start_wait(context, aux->type);
if (!ab)
continue; /* audit_panic has been called */

@@ -801,7 +807,7 @@ static void audit_log_exit(struct audit_
MAJOR(axi->dev), MINOR(axi->dev),
MAJOR(axi->rdev), MINOR(axi->rdev));
hlist_for_each_entry(winfo, pos, &axi->watches, node) {
- sub_ab = audit_log_start(context, AUDIT_FS_WATCH);
+ sub_ab = audit_log_start_wait(context, AUDIT_FS_WATCH);
if (!sub_ab)
return; /* audit_panic has been called */
audit_log_format(sub_ab, "watch_inode=%lu", axi->ino);
@@ -819,14 +825,14 @@ static void audit_log_exit(struct audit_
}

if (context->pwd && context->pwdmnt) {
- ab = audit_log_start(context, AUDIT_CWD);
+ ab = audit_log_start_wait(context, AUDIT_CWD);
if (ab) {
audit_log_d_path(ab, "cwd=", context->pwd, context->pwdmnt);
audit_log_end(ab);
}
}
for (i = 0; i < context->name_count; i++) {
- ab = audit_log_start(context, AUDIT_PATH);
+ ab = audit_log_start_wait(context, AUDIT_PATH);
if (!ab)
continue; /* audit_panic has been called */

--- linux-2.6.9/kernel/audit.c.orig 2005-06-20 00:21:52.000000000 +0100
+++ linux-2.6.9/kernel/audit.c 2005-06-20 01:14:41.000000000 +0100
@@ -106,6 +106,7 @@ static LIST_HEAD(audit_freelist);
static struct sk_buff_head audit_skb_queue;
static struct task_struct *kauditd_task;
static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
+static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);

/* The netlink socket is only to be read by 1 CPU, which lets us assume
* that list additions and deletions, and watch insertions never happen
@@ -280,6 +281,7 @@ int kauditd_thread(void *dummy)

while (1) {
skb = skb_dequeue(&audit_skb_queue);
+ wake_up(&audit_backlog_wait);
if (skb) {
if (audit_pid) {
int err = netlink_unicast(audit_sock, skb, audit_pid, 0);
@@ -694,7 +696,8 @@ static inline void audit_get_stamp(struc
* syscall, then the syscall is marked as auditable and an audit record
* will be written at syscall exit. If there is no associated task, tsk
* should be NULL. */
-struct audit_buffer *audit_log_start(struct audit_context *ctx, int type)
+
+struct audit_buffer *__audit_log_start(struct audit_context *ctx, int type, int wait)
{
struct audit_buffer *ab = NULL;
struct timespec t;
@@ -703,8 +706,24 @@ struct audit_buffer *audit_log_start(str
if (!audit_initialized)
return NULL;

- if (audit_backlog_limit
+ while (audit_backlog_limit
&& skb_queue_len(&audit_skb_queue) > audit_backlog_limit) {
+ if (wait) {
+ int ret = 1;
+ /* Wait for auditd to drain the queue a little */
+ DECLARE_WAITQUEUE(wait, current);
+ set_current_state(TASK_INTERRUPTIBLE);
+ add_wait_queue(&audit_backlog_wait, &wait);
+
+ if (audit_backlog_limit &&
+ skb_queue_len(&audit_skb_queue) > audit_backlog_limit)
+ ret = schedule_timeout(HZ * 60);
+
+ __set_current_state(TASK_RUNNING);
+ remove_wait_queue(&audit_backlog_wait, &wait);
+ if (ret)
+ continue;
+ }
if (audit_rate_check())
printk(KERN_WARNING
"audit: audit_backlog=%d > "
@@ -715,7 +734,7 @@ struct audit_buffer *audit_log_start(str
return NULL;
}

- ab = audit_buffer_alloc(ctx, GFP_ATOMIC, type);
+ ab = audit_buffer_alloc(ctx, wait?GFP_KERNEL:GFP_ATOMIC, type);
if (!ab) {
audit_log_lost("out of memory in audit_log_start");
return NULL;
@@ -728,6 +747,16 @@ struct audit_buffer *audit_log_start(str
return ab;
}

+struct audit_buffer *audit_log_start(struct audit_context *ctx, int type)
+{
+ return __audit_log_start(ctx, type, 0);
+}
+
+struct audit_buffer *audit_log_start_wait(struct audit_context *ctx, int type)
+{
+ return __audit_log_start(ctx, type, 1);
+}
+
/**
* audit_expand - expand skb in the audit buffer
* @ab: audit_buffer
--- linux-2.6.9/include/linux/audit.h.orig 2005-06-20 01:15:24.000000000 +0100
+++ linux-2.6.9/include/linux/audit.h 2005-06-20 01:15:45.000000000 +0100
@@ -337,6 +337,7 @@ extern void audit_log(struct audit_
__attribute__((format(printf,3,4)));

extern struct audit_buffer *audit_log_start(struct audit_context *ctx,int type);
+extern struct audit_buffer *audit_log_start_wait(struct audit_context *ctx,int type);
extern void audit_log_format(struct audit_buffer *ab,
const char *fmt, ...)
__attribute__((format(printf,2,3)));

--
dwmw2

David Woodhouse

2005-06-20 15:27:07 UTC

Permalink

Post by David Woodhouse
Assuming this approach is satisfactory we may want to look for more
callers which can be converted to audit_log_start_wait(), and perhaps
give a few more slots to the atomic callers, so there's always space
for them.

Something like this...

--- kernel-2.6.9/linux-2.6.9/kernel/audit.c 2005-06-20 16:12:48.000000000 +0100
+++ kernel-2.6.9-running/linux-2.6.9/kernel/audit.c 2005-06-20 16:18:37.000000000 +0100
@@ -449,7 +449,7 @@ static int audit_receive_msg(struct sk_b
return -ESRCH;

if (audit_enabled && audit_filter_user(tsk, msg_type)) {
- ab = audit_log_start(NULL, msg_type);
+ ab = audit_log_start_wait(NULL, msg_type);
if (ab) {
audit_log_format(ab,
"user pid=%d uid=%u auid=%u msg='%.1024s'",
@@ -707,7 +707,7 @@ struct audit_buffer *__audit_log_start(s
return NULL;

while (audit_backlog_limit
- && skb_queue_len(&audit_skb_queue) > audit_backlog_limit) {
+ && skb_queue_len(&audit_skb_queue) > audit_backlog_limit + wait?0:5) {
if (wait) {
int ret = 1;
/* Wait for auditd to drain the queue a little */

--
dwmw2

David Woodhouse

2005-06-20 18:41:43 UTC

Permalink

+ && skb_queue_len(&audit_skb_queue) > audit_backlog_limit + wait?0:5) {

Every $%"@! time...

- && skb_queue_len(&audit_skb_queue) > audit_backlog_limit + wait?0:5) {
+ && skb_queue_len(&audit_skb_queue) > audit_backlog_limit + (wait?0:5)) {

--
dwmw2