Discussion:
[PATCH 00/12] RFC: steps to make audit pid namespace-safe
(too old to reply)
Richard Guy Briggs
2013-08-20 21:31:52 UTC
Permalink
This patchset is a revival of some of Eric Biederman's work to make audit
pid-namespace-safe.

In a couple of places, audit was printing PIDs in the task's pid namespace
rather than relative to the audit daemon's pid namespace, which currently is
init_pid_ns.

It also allows processes to log audit user messages in their own pid
namespaces, which was not previously permitted. Please see:
https://bugzilla.redhat.com/show_bug.cgi?id=947530
https://bugs.launchpad.net/ubuntu/+source/vsftpd/+bug/1160372
https://bugzilla.novell.com/show_bug.cgi?id=786024

Part of the cleanup here involves deprecating task->pid and task->tgid, which
are error-prone duplicates of the task->pids structure

The next step which I hope to add to this patchset will be to purge task->pid
and task->tgid from the rest of the kernel if possible. Once that is done,
task_pid_nr_init_ns() and task_tgid_nr_init_ns() that were introduced in patch
05/12 and used in patches 06/12 and 08/12 could be replaced with task_pid_nr()
and task_tgid_nr(). Eric B. did take a stab at that, but checking all the
subtleties will be non-trivial.

Does anyone have any opinions or better yet hard data on cache line misses
between pid_nr(struct pid*) and pid_nr_ns(struct pid*, &init_pid_ns)? I'd
like to see pid_nr() use pid_nr_ns(struct pid*, &init_pid_ns), or
pid_nr_init_ns() eliminated in favour of the original pid_nr(). pid_nr()
currently accesses the first level of the pid structure without having to
dereference the level number. If there is an actual speed difference, it could
be worth keeping, otherwise, I'd prefer to simplify that code.

Eric also had a patch to add a printk option to format a struct pid pointer
which was PID namespace-aware. I don't see the point, but I'll let him explain
it.

Discuss.

Eric W. Biederman (5):
audit: Kill the unused struct audit_aux_data_capset
audit: Simplify and correct audit_log_capset

Richard Guy Briggs (7):
audit: fix netlink portid naming and types
pid: get ppid pid_t of task in init_pid_ns safely
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t of task in init_pid_ns correctly
audit: store audit_pid as a struct pid pointer
audit: anchor all pid references in the initial pid namespace
pid: modify task_pid_nr to work without task->pid.
pid: modify task_tgid_nr to work without task->tgid.
pid: rewrite task helper functions avoiding task->pid and task->tgid
pid: mark struct task const in helper functions

drivers/tty/tty_audit.c | 3 +-
include/linux/audit.h | 8 ++--
include/linux/pid.h | 6 +++
include/linux/sched.h | 81 ++++++++++++++++++++++++----------
kernel/audit.c | 76 +++++++++++++++++++------------
kernel/audit.h | 12 +++---
kernel/auditfilter.c | 35 +++++++++++----
kernel/auditsc.c | 36 ++++++---------
kernel/capability.c | 2 +-
kernel/pid.c | 4 +-
security/apparmor/audit.c | 7 +--
security/integrity/integrity_audit.c | 2 +-
security/lsm_audit.c | 11 +++--
security/tomoyo/audit.c | 2 +-
14 files changed, 177 insertions(+), 108 deletions(-)
Richard Guy Briggs
2013-08-20 21:31:53 UTC
Permalink
From: Eric W. Biederman <***@xmission.com>

Signed-off-by: "Eric W. Biederman" <***@xmission.com>
(cherry picked from commit 6904431d6b41190e42d6b94430b67cb4e7e6a4b7)
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/auditsc.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 9845cb3..8c644c5 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -121,12 +121,6 @@ struct audit_aux_data_bprm_fcaps {
struct audit_cap_data new_pcap;
};

-struct audit_aux_data_capset {
- struct audit_aux_data d;
- pid_t pid;
- struct audit_cap_data cap;
-};
-
struct audit_tree_refs {
struct audit_tree_refs *next;
struct audit_chunk *c[31];
--
1.7.1
Richard Guy Briggs
2013-08-20 21:31:54 UTC
Permalink
Normally, netlink ports use the PID of the userspace process as the port ID.
If the PID is already in use by a port, the kernel will allocate another port
ID to avoid conflict. Re-name all references to netlink ports from pid to
portid to reflect this reality and avoid confusion with actual PIDs. Ports
use the __u32 type, so re-type all portids accordingly.

(This patch is very similar to ebiederman's 5deadd69)

Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/audit.h | 2 +-
kernel/audit.c | 32 ++++++++++++++++----------------
kernel/audit.h | 8 ++++----
kernel/auditfilter.c | 18 ++++++++++--------
4 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 729a4d1..a3af0fa 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -462,7 +462,7 @@ extern int audit_update_lsm_rules(void);
/* Private API (for audit.c only) */
extern int audit_filter_user(int type);
extern int audit_filter_type(int type);
-extern int audit_receive_filter(int type, int pid, int seq,
+extern int audit_receive_filter(int type, __u32 portid, int seq,
void *data, size_t datasz);
extern int audit_enabled;
#else /* CONFIG_AUDIT */
diff --git a/kernel/audit.c b/kernel/audit.c
index 91e53d0..2476334 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -94,7 +94,7 @@ static int audit_failure = AUDIT_FAIL_PRINTK;
* the portid to use to send netlink messages to that process.
*/
int audit_pid;
-static int audit_nlk_portid;
+static __u32 audit_nlk_portid;

/* If audit_rate_limit is non-zero, limit the rate of sending audit records
* to that number per second. This prevents DoS attacks, but results in
@@ -165,15 +165,15 @@ struct audit_buffer {
};

struct audit_reply {
- int pid;
+ __u32 portid;
struct sk_buff *skb;
};

-static void audit_set_pid(struct audit_buffer *ab, pid_t pid)
+static void audit_set_portid(struct audit_buffer *ab, __u32 portid)
{
if (ab) {
struct nlmsghdr *nlh = nlmsg_hdr(ab->skb);
- nlh->nlmsg_pid = pid;
+ nlh->nlmsg_pid = portid;
}
}

@@ -472,7 +472,7 @@ static int kauditd_thread(void *dummy)
int audit_send_list(void *_dest)
{
struct audit_netlink_list *dest = _dest;
- int pid = dest->pid;
+ __u32 portid = dest->portid;
struct sk_buff *skb;

/* wait for parent to finish and send an ACK */
@@ -480,14 +480,14 @@ int audit_send_list(void *_dest)
mutex_unlock(&audit_cmd_mutex);

while ((skb = __skb_dequeue(&dest->q)) != NULL)
- netlink_unicast(audit_sock, skb, pid, 0);
+ netlink_unicast(audit_sock, skb, portid, 0);

kfree(dest);

return 0;
}

-struct sk_buff *audit_make_reply(int pid, int seq, int type, int done,
+struct sk_buff *audit_make_reply(__u32 portid, int seq, int type, int done,
int multi, const void *payload, int size)
{
struct sk_buff *skb;
@@ -500,7 +500,7 @@ struct sk_buff *audit_make_reply(int pid, int seq, int type, int done,
if (!skb)
return NULL;

- nlh = nlmsg_put(skb, pid, seq, t, size, flags);
+ nlh = nlmsg_put(skb, portid, seq, t, size, flags);
if (!nlh)
goto out_kfree_skb;
data = nlmsg_data(nlh);
@@ -521,13 +521,13 @@ static int audit_send_reply_thread(void *arg)

/* Ignore failure. It'll only happen if the sender goes away,
because our timeout is set to infinite. */
- netlink_unicast(audit_sock, reply->skb, reply->pid, 0);
+ netlink_unicast(audit_sock, reply->skb, reply->portid, 0);
kfree(reply);
return 0;
}
/**
* audit_send_reply - send an audit reply message via netlink
- * @pid: process id to send reply to
+ * @portid: netlink port to which to send reply
* @seq: sequence number
* @type: audit message type
* @done: done (last) flag
@@ -535,11 +535,11 @@ static int audit_send_reply_thread(void *arg)
* @payload: payload data
* @size: payload size
*
- * Allocates an skb, builds the netlink message, and sends it to the pid.
+ * Allocates an skb, builds the netlink message, and sends it to the port id.
* No failure notifications.
*/
-static void audit_send_reply(int pid, int seq, int type, int done, int multi,
- const void *payload, int size)
+static void audit_send_reply(__u32 portid, int seq, int type, int done,
+ int multi, const void *payload, int size)
{
struct sk_buff *skb;
struct task_struct *tsk;
@@ -549,11 +549,11 @@ static void audit_send_reply(int pid, int seq, int type, int done, int multi,
if (!reply)
return;

- skb = audit_make_reply(pid, seq, type, done, multi, payload, size);
+ skb = audit_make_reply(portid, seq, type, done, multi, payload, size);
if (!skb)
goto out;

- reply->pid = pid;
+ reply->portid = portid;
reply->skb = skb;

tsk = kthread_run(audit_send_reply_thread, reply, "audit_send_reply");
@@ -727,7 +727,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
size--;
audit_log_n_untrustedstring(ab, data, size);
}
- audit_set_pid(ab, NETLINK_CB(skb).portid);
+ audit_set_portid(ab, NETLINK_CB(skb).portid);
audit_log_end(ab);
}
break;
diff --git a/kernel/audit.h b/kernel/audit.h
index 123c9b7..36edcf5 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -237,13 +237,13 @@ extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right);
extern int audit_gid_comparator(kgid_t left, u32 op, kgid_t right);
extern int parent_len(const char *path);
extern int audit_compare_dname_path(const char *dname, const char *path, int plen);
-extern struct sk_buff * audit_make_reply(int pid, int seq, int type,
- int done, int multi,
- const void *payload, int size);
+extern struct sk_buff *audit_make_reply(__u32 portid, int seq, int type,
+ int done, int multi,
+ const void *payload, int size);
extern void audit_panic(const char *message);

struct audit_netlink_list {
- int pid;
+ __u32 portid;
struct sk_buff_head q;
};

diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index f7aee8b..381d3de 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -971,7 +971,7 @@ out:
}

/* List rules using struct audit_rule_data. */
-static void audit_list_rules(int pid, int seq, struct sk_buff_head *q)
+static void audit_list_rules(__u32 portid, int seq, struct sk_buff_head *q)
{
struct sk_buff *skb;
struct audit_krule *r;
@@ -986,14 +986,15 @@ static void audit_list_rules(int pid, int seq, struct sk_buff_head *q)
data = audit_krule_to_data(r);
if (unlikely(!data))
break;
- skb = audit_make_reply(pid, seq, AUDIT_LIST_RULES, 0, 1,
- data, sizeof(*data) + data->buflen);
+ skb = audit_make_reply(portid, seq, AUDIT_LIST_RULES,
+ 0, 1, data,
+ sizeof(*data) + data->buflen);
if (skb)
skb_queue_tail(q, skb);
kfree(data);
}
}
- skb = audit_make_reply(pid, seq, AUDIT_LIST_RULES, 1, 1, NULL, 0);
+ skb = audit_make_reply(portid, seq, AUDIT_LIST_RULES, 1, 1, NULL, 0);
if (skb)
skb_queue_tail(q, skb);
}
@@ -1023,12 +1024,13 @@ static void audit_log_rule_change(char *action, struct audit_krule *rule, int re
/**
* audit_receive_filter - apply all rules to the specified message type
* @type: audit message type
- * @pid: target pid for netlink audit messages
+ * @portid: target port id for netlink audit messages
* @seq: netlink audit message sequence (serial) number
* @data: payload data
* @datasz: size of payload data
*/
-int audit_receive_filter(int type, int pid, int seq, void *data, size_t datasz)
+int audit_receive_filter(int type, __u32 portid, int seq, void *data,
+ size_t datasz)
{
struct task_struct *tsk;
struct audit_netlink_list *dest;
@@ -1046,11 +1048,11 @@ int audit_receive_filter(int type, int pid, int seq, void *data, size_t datasz)
dest = kmalloc(sizeof(struct audit_netlink_list), GFP_KERNEL);
if (!dest)
return -ENOMEM;
- dest->pid = pid;
+ dest->portid = portid;
skb_queue_head_init(&dest->q);

mutex_lock(&audit_filter_mutex);
- audit_list_rules(pid, seq, &dest->q);
+ audit_list_rules(portid, seq, &dest->q);
mutex_unlock(&audit_filter_mutex);

tsk = kthread_run(audit_send_list, dest, "audit_send_list");
--
1.7.1
Richard Guy Briggs
2013-08-20 21:31:55 UTC
Permalink
Added the functions
task_ppid()
task_ppid_nr_ns()
task_ppid_nr_init_ns()
to safely abstract the lookup of the PPID (real_parent's tgid) of a process,
including rcu locking, in any required pid namespace. This provides an
alternative to sys_getppid(), which is relative to the child process' pid
namespace.

(informed by ebiederman's 6c621b7e)
Cc: ***@vger.kernel.org
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d722490..ec04090 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1448,6 +1448,11 @@ static inline struct pid *task_session(struct task_struct *task)
return task->group_leader->pids[PIDTYPE_SID].pid;
}

+static inline struct pid *task_ppid(struct task_struct *task)
+{
+ return task_tgid(rcu_dereference(current->real_parent));
+}
+
struct pid_namespace;

/*
@@ -1496,6 +1501,24 @@ static inline pid_t task_tgid_vnr(struct task_struct *tsk)
}


+static inline pid_t task_ppid_nr_ns(struct task_struct *tsk,
+ struct pid_namespace *ns)
+{
+ pid_t pid;
+
+ rcu_read_lock();
+ pid = pid_nr_ns(task_ppid(current), ns);
+ rcu_read_unlock();
+
+ return pid;
+}
+
+static inline pid_t task_ppid_nr_init_ns(struct task_struct *tsk)
+{
+ return task_ppid_nr_ns(tsk, &init_pid_ns);
+}
+
+
static inline pid_t task_pgrp_nr_ns(struct task_struct *tsk,
struct pid_namespace *ns)
{
--
1.7.1
Oleg Nesterov
2013-08-27 17:21:55 UTC
Permalink
Post by Richard Guy Briggs
Added the functions
task_ppid()
task_ppid_nr_ns()
task_ppid_nr_init_ns()
to safely abstract the lookup of the PPID
but it is not safe.
Post by Richard Guy Briggs
+static inline struct pid *task_ppid(struct task_struct *task)
+{
+ return task_tgid(rcu_dereference(current->real_parent));
^^^^^^^
task?
Post by Richard Guy Briggs
+static inline pid_t task_ppid_nr_ns(struct task_struct *tsk,
+ struct pid_namespace *ns)
+{
+ pid_t pid;
+
+ rcu_read_lock();
+ pid = pid_nr_ns(task_ppid(current), ns);
^^^^^^^
again.
Post by Richard Guy Briggs
+ rcu_read_unlock();
And why this is safe?

rcu_read_lock() can't help if tsk was already dead _before_ it takes
the rcu lock. ->real_parent can point the already freed/reused/unmapped
memory.

This is safe if, for example, the caller alredy holds rcu_read_lock()
and tsk was found by find_task_by*(), or tsk is current.

Richard, just in case... I am going to vacation, I will be completely
offline till Sep 10.

Oleg.
Richard Guy Briggs
2013-08-30 19:56:46 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
Added the functions
task_ppid()
task_ppid_nr_ns()
task_ppid_nr_init_ns()
to safely abstract the lookup of the PPID
but it is not safe.
Post by Richard Guy Briggs
+static inline struct pid *task_ppid(struct task_struct *task)
+{
+ return task_tgid(rcu_dereference(current->real_parent));
^^^^^^^
task?
Yup, thanks for those two catches.
Post by Oleg Nesterov
Post by Richard Guy Briggs
+ rcu_read_unlock();
And why this is safe?
rcu_read_lock() can't help if tsk was already dead _before_ it takes
the rcu lock. ->real_parent can point the already freed/reused/unmapped
memory.
Does it not bump a refcount if it is holding a pointer to it? So the
parent task might be dead, but it won't cause a pointer dereference
issue.
Post by Oleg Nesterov
This is safe if, for example, the caller alredy holds rcu_read_lock()
and tsk was found by find_task_by*(), or tsk is current.
Fair enough, I'll have a more careful look at this. Thanks.

Most of the instances are current, but the one called from apparmour is
stored. I've just learned that this is bad and someone else just chimed
in that they have a patch to remove it...

So what is a reliable way of keeping a reference to a task? I had
assumed that the best way was to keep a pointer to its task_struct,
making sure its refcount had been bumped by something like
get_task_struct(). Another way would be to do the same with its struct
pid. The third that I'm trying to avoid is using its init_pid_namespace
pid_t since it could refer to a completely different task if the pid_t
rolls over.
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545
John Johansen
2013-08-30 20:37:09 UTC
Permalink
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
Added the functions
task_ppid()
task_ppid_nr_ns()
task_ppid_nr_init_ns()
to safely abstract the lookup of the PPID
but it is not safe.
Post by Richard Guy Briggs
+static inline struct pid *task_ppid(struct task_struct *task)
+{
+ return task_tgid(rcu_dereference(current->real_parent));
^^^^^^^
task?
Yup, thanks for those two catches.
Post by Oleg Nesterov
Post by Richard Guy Briggs
+ rcu_read_unlock();
And why this is safe?
rcu_read_lock() can't help if tsk was already dead _before_ it takes
the rcu lock. ->real_parent can point the already freed/reused/unmapped
memory.
Does it not bump a refcount if it is holding a pointer to it? So the
parent task might be dead, but it won't cause a pointer dereference
issue.
Post by Oleg Nesterov
This is safe if, for example, the caller alredy holds rcu_read_lock()
and tsk was found by find_task_by*(), or tsk is current.
Fair enough, I'll have a more careful look at this. Thanks.
Most of the instances are current, but the one called from apparmour is
stored. I've just learned that this is bad and someone else just chimed
in that they have a patch to remove it...
the apparmor case isn't actually stored long term. The stored task will be
a parameter that was passed into an lsm hook and the buffer that it is
stored in dies before the hook is done. Its temporarily stored in the
struct so that it can be passed into the lsm_audit fn, and printed into an
allocated audit buffer. The text version in the audit buffer is what will
exist beyond the hook.

There are three patches, I'll reply them below once I have finished rebasing
them to apply to the current tree instead of my dev tree.
John Johansen
2013-08-30 22:41:03 UTC
Permalink
Mediation is based off of the cred but auditing includes the current
task which may not be related to the actual request.

Signed-off-by: John Johansen <***@canonical.com>
---
security/apparmor/capability.c | 15 +++++----------
security/apparmor/domain.c | 2 +-
security/apparmor/include/capability.h | 5 ++---
security/apparmor/include/ipc.h | 4 ++--
security/apparmor/ipc.c | 9 ++++-----
security/apparmor/lsm.c | 2 +-
6 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
index 887a5e9..98a73eb 100644
--- a/security/apparmor/capability.c
+++ b/security/apparmor/capability.c
@@ -48,8 +48,7 @@ static void audit_cb(struct audit_buffer *ab, void *va)

/**
* audit_caps - audit a capability
- * @profile: profile confining task (NOT NULL)
- * @task: task capability test was performed against (NOT NULL)
+ * @profile: profile being tested for confinement (NOT NULL)
* @cap: capability tested
* @error: error code returned by test
*
@@ -58,8 +57,7 @@ static void audit_cb(struct audit_buffer *ab, void *va)
*
* Returns: 0 or sa->error on success, error code on failure
*/
-static int audit_caps(struct aa_profile *profile, struct task_struct *task,
- int cap, int error)
+static int audit_caps(struct aa_profile *profile, int cap, int error)
{
struct audit_cache *ent;
int type = AUDIT_APPARMOR_AUTO;
@@ -68,7 +66,6 @@ static int audit_caps(struct aa_profile *profile, struct task_struct *task,
sa.type = LSM_AUDIT_DATA_CAP;
sa.aad = &aad;
sa.u.cap = cap;
- sa.aad->tsk = task;
sa.aad->op = OP_CAPABLE;
sa.aad->error = error;

@@ -119,8 +116,7 @@ static int profile_capable(struct aa_profile *profile, int cap)

/**
* aa_capable - test permission to use capability
- * @task: task doing capability test against (NOT NULL)
- * @profile: profile confining @task (NOT NULL)
+ * @profile: profile being tested against (NOT NULL)
* @cap: capability to be tested
* @audit: whether an audit record should be generated
*
@@ -128,8 +124,7 @@ static int profile_capable(struct aa_profile *profile, int cap)
*
* Returns: 0 on success, or else an error code.
*/
-int aa_capable(struct task_struct *task, struct aa_profile *profile, int cap,
- int audit)
+int aa_capable(struct aa_profile *profile, int cap, int audit)
{
int error = profile_capable(profile, cap);

@@ -139,5 +134,5 @@ int aa_capable(struct task_struct *task, struct aa_profile *profile, int cap,
return error;
}

- return audit_caps(profile, task, cap, error);
+ return audit_caps(profile, cap, error);
}
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 01b7bd6..f037c57 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -75,7 +75,7 @@ static int may_change_ptraced_domain(struct task_struct *task,
if (!tracer || unconfined(tracerp))
goto out;

- error = aa_may_ptrace(tracer, tracerp, to_profile, PTRACE_MODE_ATTACH);
+ error = aa_may_ptrace(tracerp, to_profile, PTRACE_MODE_ATTACH);

out:
rcu_read_unlock();
diff --git a/security/apparmor/include/capability.h b/security/apparmor/include/capability.h
index c24d295..e4fea19 100644
--- a/security/apparmor/include/capability.h
+++ b/security/apparmor/include/capability.h
@@ -4,7 +4,7 @@
* This file contains AppArmor capability mediation definitions.
*
* Copyright (C) 1998-2008 Novell/SUSE
- * Copyright 2009-2010 Canonical Ltd.
+ * Copyright 2009-2013 Canonical Ltd.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
@@ -34,8 +34,7 @@ struct aa_caps {
kernel_cap_t extended;
};

-int aa_capable(struct task_struct *task, struct aa_profile *profile, int cap,
- int audit);
+int aa_capable(struct aa_profile *profile, int cap, int audit);

static inline void aa_free_cap_rules(struct aa_caps *caps)
{
diff --git a/security/apparmor/include/ipc.h b/security/apparmor/include/ipc.h
index aeda0fb..288ca76 100644
--- a/security/apparmor/include/ipc.h
+++ b/security/apparmor/include/ipc.h
@@ -19,8 +19,8 @@

struct aa_profile;

-int aa_may_ptrace(struct task_struct *tracer_task, struct aa_profile *tracer,
- struct aa_profile *tracee, unsigned int mode);
+int aa_may_ptrace(struct aa_profile *tracer, struct aa_profile *tracee,
+ unsigned int mode);

int aa_ptrace(struct task_struct *tracer, struct task_struct *tracee,
unsigned int mode);
diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
index c51d226..777ac1c 100644
--- a/security/apparmor/ipc.c
+++ b/security/apparmor/ipc.c
@@ -54,15 +54,14 @@ static int aa_audit_ptrace(struct aa_profile *profile,

/**
* aa_may_ptrace - test if tracer task can trace the tracee
- * @tracer_task: task who will do the tracing (NOT NULL)
* @tracer: profile of the task doing the tracing (NOT NULL)
* @tracee: task to be traced
* @mode: whether PTRACE_MODE_READ || PTRACE_MODE_ATTACH
*
* Returns: %0 else error code if permission denied or error
*/
-int aa_may_ptrace(struct task_struct *tracer_task, struct aa_profile *tracer,
- struct aa_profile *tracee, unsigned int mode)
+int aa_may_ptrace(struct aa_profile *tracer, struct aa_profile *tracee,
+ unsigned int mode)
{
/* TODO: currently only based on capability, not extended ptrace
* rules,
@@ -72,7 +71,7 @@ int aa_may_ptrace(struct task_struct *tracer_task, struct aa_profile *tracer,
if (unconfined(tracer) || tracer == tracee)
return 0;
/* log this capability request */
- return aa_capable(tracer_task, tracer, CAP_SYS_PTRACE, 1);
+ return aa_capable(tracer, CAP_SYS_PTRACE, 1);
}

/**
@@ -101,7 +100,7 @@ int aa_ptrace(struct task_struct *tracer, struct task_struct *tracee,
if (!unconfined(tracer_p)) {
struct aa_profile *tracee_p = aa_get_task_profile(tracee);

- error = aa_may_ptrace(tracer, tracer_p, tracee_p, mode);
+ error = aa_may_ptrace(tracer_p, tracee_p, mode);
error = aa_audit_ptrace(tracer_p, tracee_p, error);

aa_put_profile(tracee_p);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 2e2a0dd..69c54c8 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -145,7 +145,7 @@ static int apparmor_capable(const struct cred *cred, struct user_namespace *ns,
if (!error) {
profile = aa_cred_profile(cred);
if (!unconfined(profile))
- error = aa_capable(current, profile, cap, audit);
+ error = aa_capable(profile, cap, audit);
}
return error;
}
--
1.8.3.2
John Johansen
2013-08-30 22:42:26 UTC
Permalink
Now that aa_capabile no longer sets the task field it can be removed
and the lsm_audit version of the field can be used.

Signed-off-by: John Johansen <***@canonical.com>
---
security/apparmor/audit.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index 031d2d9..e32c448 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -111,7 +111,7 @@ static const char *const aa_audit_type[] = {
static void audit_pre(struct audit_buffer *ab, void *ca)
{
struct common_audit_data *sa = ca;
- struct task_struct *tsk = sa->aad->tsk ? sa->aad->tsk : current;
+ struct task_struct *tsk = sa->u.tsk ? sa->u.tsk : current;

if (aa_g_audit_header) {
audit_log_format(ab, "apparmor=");
@@ -149,12 +149,6 @@ static void audit_pre(struct audit_buffer *ab, void *ca)
audit_log_format(ab, " name=");
audit_log_untrustedstring(ab, sa->aad->name);
}
-
- if (sa->aad->tsk) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
- audit_log_untrustedstring(ab, tsk->comm);
- }
-
}

/**
@@ -212,7 +206,7 @@ int aa_audit(int type, struct aa_profile *profile, gfp_t gfp,

if (sa->aad->type == AUDIT_APPARMOR_KILL)
(void)send_sig_info(SIGKILL, NULL,
- sa->aad->tsk ? sa->aad->tsk : current);
+ sa->u.tsk ? sa->u.tsk : current);

if (sa->aad->type == AUDIT_APPARMOR_ALLOWED)
return complain_error(sa->aad->error);
--
1.8.3.2
John Johansen
2013-08-30 22:43:41 UTC
Permalink
The reporting of the parent task info is a vestage from old versions of
apparmor. The need for this information was removed by unique null-
profiles before apparmor was upstreamed so remove this info from logging.

Signed-off-by: John Johansen <***@canonical.com>
---
security/apparmor/audit.c | 6 ------
security/apparmor/include/audit.h | 1 -
2 files changed, 7 deletions(-)

diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index e32c448..89c7865 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -111,7 +111,6 @@ static const char *const aa_audit_type[] = {
static void audit_pre(struct audit_buffer *ab, void *ca)
{
struct common_audit_data *sa = ca;
- struct task_struct *tsk = sa->u.tsk ? sa->u.tsk : current;

if (aa_g_audit_header) {
audit_log_format(ab, "apparmor=");
@@ -132,11 +131,6 @@ static void audit_pre(struct audit_buffer *ab, void *ca)

if (sa->aad->profile) {
struct aa_profile *profile = sa->aad->profile;
- pid_t pid;
- rcu_read_lock();
- pid = rcu_dereference(tsk->real_parent)->pid;
- rcu_read_unlock();
- audit_log_format(ab, " parent=%d", pid);
if (profile->ns != root_ns) {
audit_log_format(ab, " namespace=");
audit_log_untrustedstring(ab, profile->ns->base.hname);
diff --git a/security/apparmor/include/audit.h b/security/apparmor/include/audit.h
index 69d8cae..cf65443 100644
--- a/security/apparmor/include/audit.h
+++ b/security/apparmor/include/audit.h
@@ -110,7 +110,6 @@ struct apparmor_audit_data {
void *profile;
const char *name;
const char *info;
- struct task_struct *tsk;
union {
void *target;
struct {
--
1.8.3.2
Richard Guy Briggs
2013-09-03 18:31:59 UTC
Permalink
Post by John Johansen
Post by Richard Guy Briggs
Most of the instances are current, but the one called from apparmour is
stored. I've just learned that this is bad and someone else just chimed
in that they have a patch to remove it...
the apparmor case isn't actually stored long term. The stored task will be
a parameter that was passed into an lsm hook and the buffer that it is
stored in dies before the hook is done. Its temporarily stored in the
struct so that it can be passed into the lsm_audit fn, and printed into an
allocated audit buffer. The text version in the audit buffer is what will
exist beyond the hook.
There are three patches, I'll reply them below once I have finished rebasing
them to apply to the current tree instead of my dev tree.
John, thanks for this context and fix. That helps simplify things.

- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545
Richard Guy Briggs
2013-12-11 14:47:58 UTC
Permalink
Post by Richard Guy Briggs
Post by John Johansen
Post by Richard Guy Briggs
Most of the instances are current, but the one called from apparmour is
stored. I've just learned that this is bad and someone else just chimed
in that they have a patch to remove it...
the apparmor case isn't actually stored long term. The stored task will be
a parameter that was passed into an lsm hook and the buffer that it is
stored in dies before the hook is done. Its temporarily stored in the
struct so that it can be passed into the lsm_audit fn, and printed into an
allocated audit buffer. The text version in the audit buffer is what will
exist beyond the hook.
There are three patches, I'll reply them below once I have finished rebasing
them to apply to the current tree instead of my dev tree.
John, thanks for this context and fix. That helps simplify things.
John, What's the status of this set of 3 patches? I don't see them
upstream.
Post by Richard Guy Briggs
- RGB
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2013-12-11 17:19:12 UTC
Permalink
Post by Richard Guy Briggs
Post by Richard Guy Briggs
Post by John Johansen
Post by Richard Guy Briggs
Most of the instances are current, but the one called from apparmour is
stored. I've just learned that this is bad and someone else just chimed
in that they have a patch to remove it...
the apparmor case isn't actually stored long term. The stored task will be
a parameter that was passed into an lsm hook and the buffer that it is
stored in dies before the hook is done. Its temporarily stored in the
struct so that it can be passed into the lsm_audit fn, and printed into an
allocated audit buffer. The text version in the audit buffer is what will
exist beyond the hook.
There are three patches, I'll reply them below once I have finished rebasing
them to apply to the current tree instead of my dev tree.
John, thanks for this context and fix. That helps simplify things.
John, What's the status of this set of 3 patches? I don't see them
upstream.
they are part of the security tree merge in 3.13
51775fe apparmor: remove the "task" arg from may_change_ptraced_domain()
4a7fc30 apparmor: remove parent task info from audit logging
61e3fb8 apparmor: remove tsk field from the apparmor_audit_struct
dd0c6e8 apparmor: fix capability to not use the current task, during reporting
Ok, cool, so they will be upstream by the time I'll need them. Thanks!

- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
John Johansen
2013-12-11 16:44:26 UTC
Permalink
Post by Richard Guy Briggs
Post by Richard Guy Briggs
Post by John Johansen
Post by Richard Guy Briggs
Most of the instances are current, but the one called from apparmour is
stored. I've just learned that this is bad and someone else just chimed
in that they have a patch to remove it...
the apparmor case isn't actually stored long term. The stored task will be
a parameter that was passed into an lsm hook and the buffer that it is
stored in dies before the hook is done. Its temporarily stored in the
struct so that it can be passed into the lsm_audit fn, and printed into an
allocated audit buffer. The text version in the audit buffer is what will
exist beyond the hook.
There are three patches, I'll reply them below once I have finished rebasing
them to apply to the current tree instead of my dev tree.
John, thanks for this context and fix. That helps simplify things.
John, What's the status of this set of 3 patches? I don't see them
upstream.
they are part of the security tree merge in 3.13

51775fe apparmor: remove the "task" arg from may_change_ptraced_domain()
4a7fc30 apparmor: remove parent task info from audit logging
61e3fb8 apparmor: remove tsk field from the apparmor_audit_struct
dd0c6e8 apparmor: fix capability to not use the current task, during reporting
Richard Guy Briggs
2013-08-20 21:31:56 UTC
Permalink
sys_getppid() returns the parent pid of the current process in its own pid
namespace. Since audit filters are based in the init pid namespace, a process
could avoid a filter or trigger an unintended one by being in an alternate pid
namespace or log meaningless information.

Switch to task_ppid_nr_init_ns() for PPIDs to anchor all audit filters in the
init_pid_ns.

(informed by ebiederman's 6c621b7e)
Cc: ***@vger.kernel.org
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 4 ++--
kernel/auditsc.c | 2 +-
security/apparmor/audit.c | 5 +----
3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 2476334..75b0989 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1588,10 +1588,10 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
spin_unlock_irq(&tsk->sighand->siglock);

audit_log_format(ab,
- " ppid=%ld pid=%d auid=%u uid=%u gid=%u"
+ " ppid=%d pid=%d auid=%u uid=%u gid=%u"
" euid=%u suid=%u fsuid=%u"
" egid=%u sgid=%u fsgid=%u ses=%u tty=%s",
- sys_getppid(),
+ task_ppid_nr_init_ns(tsk),
tsk->pid,
from_kuid(&init_user_ns, audit_get_loginuid(tsk)),
from_kuid(&init_user_ns, cred->uid),
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8c644c5..74a0748 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -466,7 +466,7 @@ static int audit_filter_rules(struct task_struct *tsk,
case AUDIT_PPID:
if (ctx) {
if (!ctx->ppid)
- ctx->ppid = sys_getppid();
+ ctx->ppid = task_ppid_nr_init_ns(current);
result = audit_comparator(ctx->ppid, f->op, f->val);
}
break;
diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index 031d2d9..497b306 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -132,10 +132,7 @@ static void audit_pre(struct audit_buffer *ab, void *ca)

if (sa->aad->profile) {
struct aa_profile *profile = sa->aad->profile;
- pid_t pid;
- rcu_read_lock();
- pid = rcu_dereference(tsk->real_parent)->pid;
- rcu_read_unlock();
+ pid_t pid = task_ppid_nr_init_ns(tsk);
audit_log_format(ab, " parent=%d", pid);
if (profile->ns != root_ns) {
audit_log_format(ab, " namespace=");
--
1.7.1
Richard Guy Briggs
2013-08-20 21:31:57 UTC
Permalink
Added the functions
task_pid_nr_init_ns()
task_tgid_nr_init_ns()
to avoid the use of the error-prone duplicity of task->pid and task->tgid,
avoid changing the existing usage of task_pid_nr() and task_tgid_nr() while it
gets converted away from task->pid and task->tgid, and provide a clear
abstraction of the frequently used init_pid_ns in task_pid_nr_ns() and
task_tgid_nr_ns().
Also added pid_nr_init_ns() to explicitly use init_pid_ns.

(informed by ebiederman's 3a2e8c59)
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/pid.h | 6 ++++++
include/linux/sched.h | 10 ++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 23705a5..051e134 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -172,6 +172,12 @@ static inline pid_t pid_nr(struct pid *pid)
pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns);
pid_t pid_vnr(struct pid *pid);

+static inline pid_t pid_nr_init_ns(struct pid *pid)
+{
+ return pid_nr_ns(pid, &init_pid_ns);
+}
+
+
#define do_each_pid_task(pid, type, task) \
do { \
if ((pid) != NULL) \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ec04090..9651d95 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1482,6 +1482,11 @@ static inline pid_t task_pid_nr_ns(struct task_struct *tsk,
return __task_pid_nr_ns(tsk, PIDTYPE_PID, ns);
}

+static inline pid_t task_pid_nr_init_ns(struct task_struct *tsk)
+{
+ return task_pid_nr_ns(tsk, &init_pid_ns);
+}
+
static inline pid_t task_pid_vnr(struct task_struct *tsk)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PID, NULL);
@@ -1495,6 +1500,11 @@ static inline pid_t task_tgid_nr(struct task_struct *tsk)

pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);

+static inline pid_t task_tgid_nr_init_ns(struct task_struct *tsk)
+{
+ return task_tgid_nr_ns(tsk, &init_pid_ns);
+}
+
static inline pid_t task_tgid_vnr(struct task_struct *tsk)
{
return pid_vnr(task_tgid(tsk));
--
1.7.1
Richard Guy Briggs
2013-08-20 21:31:58 UTC
Permalink
From: Eric W. Biederman <***@xmission.com>

- Always report the current process as capset now always only works on
the current process. This prevents reporting 0 or a random pid in
a random pid namespace.

- Don't bother to pass the pid as is available.

Signed-off-by: "Eric W. Biederman" <***@xmission.com>
(cherry picked from commit bcc85f0af31af123e32858069eb2ad8f39f90e67)
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/audit.h | 6 +++---
kernel/auditsc.c | 6 ++----
kernel/capability.c | 2 +-
3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index a3af0fa..49223a6 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -218,7 +218,7 @@ extern void __audit_mq_getsetattr(mqd_t mqdes, struct mq_attr *mqstat);
extern int __audit_log_bprm_fcaps(struct linux_binprm *bprm,
const struct cred *new,
const struct cred *old);
-extern void __audit_log_capset(pid_t pid, const struct cred *new, const struct cred *old);
+extern void __audit_log_capset(const struct cred *new, const struct cred *old);
extern void __audit_mmap_fd(int fd, int flags);

static inline void audit_ipc_obj(struct kern_ipc_perm *ipcp)
@@ -284,11 +284,11 @@ static inline int audit_log_bprm_fcaps(struct linux_binprm *bprm,
return 0;
}

-static inline void audit_log_capset(pid_t pid, const struct cred *new,
+static inline void audit_log_capset(const struct cred *new,
const struct cred *old)
{
if (unlikely(!audit_dummy_context()))
- __audit_log_capset(pid, new, old);
+ __audit_log_capset(new, old);
}

static inline void audit_mmap_fd(int fd, int flags)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 74a0748..60a966d 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2316,18 +2316,16 @@ int __audit_log_bprm_fcaps(struct linux_binprm *bprm,

/**
* __audit_log_capset - store information about the arguments to the capset syscall
- * @pid: target pid of the capset call
* @new: the new credentials
* @old: the old (current) credentials
*
* Record the aguments userspace sent to sys_capset for later printing by the
* audit system if applicable
*/
-void __audit_log_capset(pid_t pid,
- const struct cred *new, const struct cred *old)
+void __audit_log_capset(const struct cred *new, const struct cred *old)
{
struct audit_context *context = current->audit_context;
- context->capset.pid = pid;
+ context->capset.pid = task_pid_nr_init_ns(current);
context->capset.cap.effective = new->cap_effective;
context->capset.cap.inheritable = new->cap_effective;
context->capset.cap.permitted = new->cap_permitted;
diff --git a/kernel/capability.c b/kernel/capability.c
index f6c2ce5..7f60311 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -277,7 +277,7 @@ SYSCALL_DEFINE2(capset, cap_user_header_t, header, const cap_user_data_t, data)
if (ret < 0)
goto error;

- audit_log_capset(pid, new, current_cred());
+ audit_log_capset(new, current_cred());

return commit_creds(new);
--
1.7.1
Richard Guy Briggs
2013-08-20 21:31:59 UTC
Permalink
- PID will be reported in the relevant querying PID namespace.

- Refuse to change the current audit_pid if the new value does not
point to an existing process.

- Refuse to set the current audit_pid if the new value is not its own PID
(unless it is being unset).

- Convert audit_pid into the initial pid namespace for reports

(informed by ebiederman's 5bf431da)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 25 +++++++++++++++++++------
kernel/audit.h | 4 ++--
kernel/auditsc.c | 6 +++---
3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 75b0989..fa1d5f5 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -93,7 +93,7 @@ static int audit_failure = AUDIT_FAIL_PRINTK;
* contains the pid of the auditd process and audit_nlk_portid contains
* the portid to use to send netlink messages to that process.
*/
-int audit_pid;
+struct pid *audit_pid;
static __u32 audit_nlk_portid;

/* If audit_rate_limit is non-zero, limit the rate of sending audit records
@@ -388,9 +388,11 @@ static void kauditd_send_skb(struct sk_buff *skb)
err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
- printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", audit_pid);
+ pr_err("audit: *NO* daemon at audit_pid=%d\n"
+ , pid_nr_init_ns(audit_pid));
audit_log_lost("auditd disappeared\n");
- audit_pid = 0;
+ put_pid(audit_pid);
+ audit_pid = NULL;
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
} else
@@ -661,7 +663,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
case AUDIT_GET:
status_set.enabled = audit_enabled;
status_set.failure = audit_failure;
- status_set.pid = audit_pid;
+ status_set.pid = pid_vnr(audit_pid);
status_set.rate_limit = audit_rate_limit;
status_set.backlog_limit = audit_backlog_limit;
status_set.lost = atomic_read(&audit_lost);
@@ -684,10 +686,21 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return err;
}
if (status_get->mask & AUDIT_STATUS_PID) {
- int new_pid = status_get->pid;
+ struct pid *new_pid = find_get_pid(status_get->pid);
+ if (status_get->pid && !new_pid)
+ return -ESRCH;
+
+ /* check that non-zero pid it is trying to set is its
+ * own*/
+ if (status_get->pid &&
+ (status_get->pid != task_pid_vnr(current)))
+ return -EPERM;

if (audit_enabled != AUDIT_OFF)
- audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
+ audit_log_config_change("audit_pid",
+ pid_nr_init_ns(new_pid),
+ pid_nr_init_ns(audit_pid),
+ 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
}
diff --git a/kernel/audit.h b/kernel/audit.h
index 36edcf5..3932a3b 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -218,7 +218,7 @@ extern void audit_log_name(struct audit_context *context,
struct audit_names *n, struct path *path,
int record_num, int *call_panic);

-extern int audit_pid;
+extern struct pid *audit_pid;

#define AUDIT_INODE_BUCKETS 32
extern struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
@@ -310,7 +310,7 @@ extern u32 audit_sig_sid;
extern int __audit_signal_info(int sig, struct task_struct *t);
static inline int audit_signal_info(int sig, struct task_struct *t)
{
- if (unlikely((audit_pid && t->tgid == audit_pid) ||
+ if (unlikely((audit_pid && task_tgid(t) == audit_pid) ||
(audit_signals && !audit_dummy_context())))
return __audit_signal_info(sig, t);
return 0;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 60a966d..2dcf67d 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -739,7 +739,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
struct audit_entry *e;
enum audit_state state;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (audit_pid && task_tgid(tsk) == audit_pid)
return AUDIT_DISABLED;

rcu_read_lock();
@@ -800,7 +800,7 @@ void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx)
{
struct audit_names *n;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (audit_pid && task_tgid(tsk) == audit_pid)
return;

rcu_read_lock();
@@ -2220,7 +2220,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
struct audit_context *ctx = tsk->audit_context;
kuid_t uid = current_uid(), t_uid = task_uid(t);

- if (audit_pid && t->tgid == audit_pid) {
+ if (audit_pid && task_tgid(t) == audit_pid) {
if (sig == SIGTERM || sig == SIGHUP || sig == SIGUSR1 || sig == SIGUSR2) {
audit_sig_pid = tsk->pid;
if (uid_valid(tsk->loginuid))
--
1.7.1
Richard Guy Briggs
2013-08-20 21:32:00 UTC
Permalink
Store and log all PIDs with reference to the initial PID namespace and
deprecate the use of the error-prone duplicity of task->pid and task->tgid.

Still only permit the audit logging daemon and control to operate from the
initial PID namespace, but allow processes to log from another PID namespace.

Cc: "Eric W. Biederman" <***@xmission.com>
(informed by ebiederman's c776b5d2)

Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
drivers/tty/tty_audit.c | 3 ++-
kernel/audit.c | 15 ++++++++++-----
kernel/auditfilter.c | 17 ++++++++++++++++-
kernel/auditsc.c | 16 +++++++++-------
security/apparmor/audit.c | 2 +-
security/integrity/integrity_audit.c | 2 +-
security/lsm_audit.c | 11 +++++++----
security/tomoyo/audit.c | 2 +-
8 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
index a4fdce7..a0ae01f 100644
--- a/drivers/tty/tty_audit.c
+++ b/drivers/tty/tty_audit.c
@@ -65,6 +65,7 @@ static void tty_audit_log(const char *description, int major, int minor,
{
struct audit_buffer *ab;
struct task_struct *tsk = current;
+ pid_t pid = task_pid_nr_init_ns(tsk);
uid_t uid = from_kuid(&init_user_ns, task_uid(tsk));
uid_t loginuid = from_kuid(&init_user_ns, audit_get_loginuid(tsk));
u32 sessionid = audit_get_sessionid(tsk);
@@ -74,7 +75,7 @@ static void tty_audit_log(const char *description, int major, int minor,
char name[sizeof(tsk->comm)];

audit_log_format(ab, "%s pid=%u uid=%u auid=%u ses=%u major=%d"
- " minor=%d comm=", description, tsk->pid, uid,
+ " minor=%d comm=", description, pid, uid,
loginuid, sessionid, major, minor);
get_task_comm(name, tsk);
audit_log_untrustedstring(ab, name);
diff --git a/kernel/audit.c b/kernel/audit.c
index fa1d5f5..8e4958b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -574,9 +574,8 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
{
int err = 0;

- /* Only support the initial namespaces for now. */
- if ((current_user_ns() != &init_user_ns) ||
- (task_active_pid_ns(current) != &init_pid_ns))
+ /* Only support initial user namespace for now. */
+ if ((current_user_ns() != &init_user_ns))
return -EPERM;

switch (msg_type) {
@@ -594,6 +593,11 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
+ /* Only support auditd and auditctl in initial pid namespace
+ * for now. */
+ if ((task_active_pid_ns(current) != &init_pid_ns))
+ return -EPERM;
+
if (!capable(CAP_AUDIT_CONTROL))
err = -EPERM;
break;
@@ -614,6 +618,7 @@ static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
{
int rc = 0;
uid_t uid = from_kuid(&init_user_ns, current_uid());
+ pid_t pid = task_tgid_nr_init_ns(current);

if (!audit_enabled) {
*ab = NULL;
@@ -623,7 +628,7 @@ static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
*ab = audit_log_start(NULL, GFP_KERNEL, msg_type);
if (unlikely(!*ab))
return rc;
- audit_log_format(*ab, "pid=%d uid=%u", task_tgid_vnr(current), uid);
+ audit_log_format(*ab, "pid=%d uid=%u", pid, uid);
audit_log_session_info(*ab);
audit_log_task_context(*ab);

@@ -1605,7 +1610,7 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
" euid=%u suid=%u fsuid=%u"
" egid=%u sgid=%u fsgid=%u ses=%u tty=%s",
task_ppid_nr_init_ns(tsk),
- tsk->pid,
+ task_pid_nr_init_ns(tsk),
from_kuid(&init_user_ns, audit_get_loginuid(tsk)),
from_kuid(&init_user_ns, cred->uid),
from_kgid(&init_user_ns, cred->gid),
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 381d3de..2a60455 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -428,6 +428,19 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
f->val = 0;
}

+ if ((f->type == AUDIT_PID) || (f->type == AUDIT_PPID)) {
+ struct pid *pid;
+ rcu_read_lock();
+ pid = find_vpid(f->val);
+ if (!pid) {
+ rcu_read_unlock();
+ err = -ESRCH;
+ goto exit_free;
+ }
+ f->val = pid_nr_init_ns(pid);
+ rcu_read_unlock();
+ }
+
err = audit_field_valid(entry, f);
if (err)
goto exit_free;
@@ -1223,12 +1236,14 @@ static int audit_filter_user_rules(struct audit_krule *rule, int type,

for (i = 0; i < rule->field_count; i++) {
struct audit_field *f = &rule->fields[i];
+ pid_t pid;
int result = 0;
u32 sid;

switch (f->type) {
case AUDIT_PID:
- result = audit_comparator(task_pid_vnr(current), f->op, f->val);
+ pid = task_pid_nr_init_ns(current);
+ result = audit_comparator(pid, f->op, f->val);
break;
case AUDIT_UID:
result = audit_uid_comparator(current_uid(), f->op, f->uid);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 2dcf67d..5cde81c 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -458,10 +458,12 @@ static int audit_filter_rules(struct task_struct *tsk,
struct audit_field *f = &rule->fields[i];
struct audit_names *n;
int result = 0;
+ pid_t pid;

switch (f->type) {
case AUDIT_PID:
- result = audit_comparator(tsk->pid, f->op, f->val);
+ pid = task_pid_nr_init_ns(tsk);
+ result = audit_comparator(pid, f->op, f->val);
break;
case AUDIT_PPID:
if (ctx) {
@@ -1989,7 +1991,7 @@ int audit_set_loginuid(kuid_t loginuid)
audit_log_format(ab, "login pid=%d uid=%u "
"old auid=%u new auid=%u"
" old ses=%u new ses=%u",
- task->pid,
+ task_pid_nr_init_ns(task),
from_kuid(&init_user_ns, task_uid(task)),
from_kuid(&init_user_ns, task->loginuid),
from_kuid(&init_user_ns, loginuid),
@@ -2197,7 +2199,7 @@ void __audit_ptrace(struct task_struct *t)
{
struct audit_context *context = current->audit_context;

- context->target_pid = t->pid;
+ context->target_pid = task_pid_nr_init_ns(t);
context->target_auid = audit_get_loginuid(t);
context->target_uid = task_uid(t);
context->target_sessionid = audit_get_sessionid(t);
@@ -2222,7 +2224,7 @@ int __audit_signal_info(int sig, struct task_struct *t)

if (audit_pid && task_tgid(t) == audit_pid) {
if (sig == SIGTERM || sig == SIGHUP || sig == SIGUSR1 || sig == SIGUSR2) {
- audit_sig_pid = tsk->pid;
+ audit_sig_pid = task_pid_nr_init_ns(tsk);
if (uid_valid(tsk->loginuid))
audit_sig_uid = tsk->loginuid;
else
@@ -2236,7 +2238,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
/* optimize the common case by putting first signal recipient directly
* in audit_context */
if (!ctx->target_pid) {
- ctx->target_pid = t->tgid;
+ ctx->target_pid = task_tgid_nr_init_ns(t);
ctx->target_auid = audit_get_loginuid(t);
ctx->target_uid = t_uid;
ctx->target_sessionid = audit_get_sessionid(t);
@@ -2257,7 +2259,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
}
BUG_ON(axp->pid_count >= AUDIT_AUX_PIDS);

- axp->target_pid[axp->pid_count] = t->tgid;
+ axp->target_pid[axp->pid_count] = task_tgid_nr_init_ns(t);
axp->target_auid[axp->pid_count] = audit_get_loginuid(t);
axp->target_uid[axp->pid_count] = t_uid;
axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
@@ -2356,7 +2358,7 @@ static void audit_log_task(struct audit_buffer *ab)
from_kgid(&init_user_ns, gid),
sessionid);
audit_log_task_context(ab);
- audit_log_format(ab, " pid=%d comm=", current->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr_init_ns(current));
audit_log_untrustedstring(ab, current->comm);
}

diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index 497b306..afe71f5 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -148,7 +148,7 @@ static void audit_pre(struct audit_buffer *ab, void *ca)
}

if (sa->aad->tsk) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr_init_ns(tsk));
audit_log_untrustedstring(ab, tsk->comm);
}

diff --git a/security/integrity/integrity_audit.c b/security/integrity/integrity_audit.c
index d7efb30..1e7291a 100644
--- a/security/integrity/integrity_audit.c
+++ b/security/integrity/integrity_audit.c
@@ -39,7 +39,7 @@ void integrity_audit_msg(int audit_msgno, struct inode *inode,

ab = audit_log_start(current->audit_context, GFP_KERNEL, audit_msgno);
audit_log_format(ab, "pid=%d uid=%u auid=%u ses=%u",
- current->pid,
+ task_pid_nr_init_ns(current),
from_kuid(&init_user_ns, current_cred()->uid),
from_kuid(&init_user_ns, audit_get_loginuid(current)),
audit_get_sessionid(current));
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 8d8d97d..4f43488 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -220,7 +220,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
*/
BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2);

- audit_log_format(ab, " pid=%d comm=", tsk->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr_init_ns(tsk));
audit_log_untrustedstring(ab, tsk->comm);

switch (a->type) {
@@ -278,9 +278,12 @@ static void dump_common_audit_data(struct audit_buffer *ab,
}
case LSM_AUDIT_DATA_TASK:
tsk = a->u.tsk;
- if (tsk && tsk->pid) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
- audit_log_untrustedstring(ab, tsk->comm);
+ if (tsk) {
+ pid_t pid = task_pid_nr_init_ns(tsk);
+ if (pid) {
+ audit_log_format(ab, " pid=%d comm=", pid);
+ audit_log_untrustedstring(ab, tsk->comm);
+ }
}
break;
case LSM_AUDIT_DATA_NET:
diff --git a/security/tomoyo/audit.c b/security/tomoyo/audit.c
index c1b0037..6657607 100644
--- a/security/tomoyo/audit.c
+++ b/security/tomoyo/audit.c
@@ -147,7 +147,7 @@ static inline const char *tomoyo_filetype(const umode_t mode)
static char *tomoyo_print_header(struct tomoyo_request_info *r)
{
struct tomoyo_time stamp;
- const pid_t gpid = task_pid_nr(current);
+ const pid_t gpid = task_pid_nr_init_ns(current);
struct tomoyo_obj_info *obj = r->obj;
static const int tomoyo_buffer_len = 4096;
char *buffer = kmalloc(tomoyo_buffer_len, GFP_NOFS);
--
1.7.1
Richard Guy Briggs
2013-08-20 21:32:01 UTC
Permalink
task->pid is an error prone construct and results in duplicate maintenance.
Start it's demise by modifying task_pid_nr to not use it.

(informed by ebiederman's 3a2e8c59)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9651d95..fb09b93 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1473,7 +1473,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,

static inline pid_t task_pid_nr(struct task_struct *tsk)
{
- return tsk->pid;
+ return pid_nr(task_pid(tsk));
}

static inline pid_t task_pid_nr_ns(struct task_struct *tsk,
--
1.7.1
Richard Guy Briggs
2013-12-16 21:03:38 UTC
Permalink
task->pid is only ever assigned once (well ok, twice). For system health and
secure logging confidence, make it const to make it much more intentional when
it is being changed.
---

Peter, as you had suggested, does this approach work for you in terms of making
task_struct::pid a lot more difficult to accidentally change to try to preserve
its integrity?

Is the use of memcpy() significantly different from *p = *q ?


arch/x86/kernel/process.c | 2 +-
fs/exec.c | 4 +++-
include/linux/sched.h | 2 +-
kernel/fork.c | 8 ++++++--
4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c83516b..4170026 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -66,7 +66,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
int ret;

- *dst = *src;
+ memcpy(dst, src, sizeof(struct task_struct));
if (fpu_allocated(&src->thread.fpu)) {
memset(&dst->thread.fpu, 0, sizeof(dst->thread.fpu));
ret = fpu_alloc(&dst->thread.fpu);
diff --git a/fs/exec.c b/fs/exec.c
index 47d7edb..c8d0159 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -908,6 +908,8 @@ static int de_thread(struct task_struct *tsk)
*/
if (!thread_group_leader(tsk)) {
struct task_struct *leader = tsk->group_leader;
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_t *pid_p = (pid_t*)&(tsk->pid);

sig->notify_count = -1; /* for exit_notify() */
for (;;) {
@@ -950,7 +952,7 @@ static int de_thread(struct task_struct *tsk)
* Note: The old leader also uses this pid until release_task
* is called. Odd but simple and correct.
*/
- tsk->pid = leader->pid;
+ *pid_p = leader->pid;
change_pid(tsk, PIDTYPE_PID, task_pid(leader));
transfer_pid(leader, tsk, PIDTYPE_PGID);
transfer_pid(leader, tsk, PIDTYPE_SID);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d9ada71..45069c0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1112,7 +1112,7 @@ struct task_struct {
unsigned sched_reset_on_fork:1;
unsigned sched_contributes_to_load:1;

- pid_t pid;
+ const pid_t pid;
pid_t tgid;

#ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/kernel/fork.c b/kernel/fork.c
index 086fe73..ec0683d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -286,7 +286,7 @@ void __init fork_init(unsigned long mempages)
int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
struct task_struct *src)
{
- *dst = *src;
+ memcpy(dst, src, sizeof(struct task_struct));
return 0;
}

@@ -1137,6 +1137,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
{
int retval;
struct task_struct *p;
+ pid_t *pid_p;

if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
return ERR_PTR(-EINVAL);
@@ -1392,7 +1393,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
clear_all_latency_tracing(p);

/* ok, now we should be set up.. */
- p->pid = pid_nr(pid);
+
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_p = (pid_t*)&(p->pid);
+ *pid_p = pid_nr(pid);
if (clone_flags & CLONE_THREAD) {
p->exit_signal = -1;
p->group_leader = current->group_leader;
--
1.7.1
Peter Zijlstra
2013-12-17 09:58:01 UTC
Permalink
Post by Richard Guy Briggs
task->pid is only ever assigned once (well ok, twice). For system health and
secure logging confidence, make it const to make it much more intentional when
it is being changed.
---
Peter, as you had suggested, does this approach work for you in terms of making
task_struct::pid a lot more difficult to accidentally change to try to preserve
its integrity?
Yeah, looks good to me.
Post by Richard Guy Briggs
Is the use of memcpy() significantly different from *p = *q ?
You'd have to look at the asm, but I suspect gcc knows to do full struct
copies using mempcy.
Richard Guy Briggs
2013-12-20 04:48:26 UTC
Permalink
Post by Peter Zijlstra
Post by Richard Guy Briggs
task->pid is only ever assigned once (well ok, twice). For system health and
secure logging confidence, make it const to make it much more intentional when
it is being changed.
---
Peter, as you had suggested, does this approach work for you in terms of making
task_struct::pid a lot more difficult to accidentally change to try to preserve
its integrity?
Yeah, looks good to me.
Ok, who would carry this patch? You? AKPM? Me?

Any opinions about Oleg's macro idea?

- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Peter Zijlstra
2013-12-17 09:59:10 UTC
Permalink
On Mon, Dec 16, 2013 at 04:03:38PM -0500, Richard Guy Briggs wrote:

Cc: linux-***@redhat.com, linux-***@vger.kernel.org

Could you not cross-post to a moderated list please? I keep getting
endless bounces.
Oleg Nesterov
2013-12-17 15:36:11 UTC
Permalink
Post by Richard Guy Briggs
task->pid is only ever assigned once (well ok, twice). For system health and
secure logging confidence, make it const to make it much more intentional when
it is being changed.
Hmm. I am a bit suprized you decided to constify task->pid, but OK.

And we can do the same with task->signal, this can actually help to generate
a better code, probably.
Post by Richard Guy Briggs
if (!thread_group_leader(tsk)) {
struct task_struct *leader = tsk->group_leader;
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_t *pid_p = (pid_t*)&(tsk->pid);
sig->notify_count = -1; /* for exit_notify() */
for (;;) {
@@ -950,7 +952,7 @@ static int de_thread(struct task_struct *tsk)
* Note: The old leader also uses this pid until release_task
* is called. Odd but simple and correct.
*/
- tsk->pid = leader->pid;
+ *pid_p = leader->pid;
Well, imho this (and de_thread) looks a bit ugly. Perhaps we should add
something like

#define ASSIGN_CONST(l, r) (*(typeof(r) *)&(l) = (r))

into compiler.h ?

Oleg.
Oleg Nesterov
2013-12-17 15:40:04 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
task->pid is only ever assigned once (well ok, twice). For system health and
secure logging confidence, make it const to make it much more intentional when
it is being changed.
Hmm. I am a bit suprized you decided to constify task->pid, but OK.
And we can do the same with task->signal, this can actually help to generate
a better code, probably.
Post by Richard Guy Briggs
if (!thread_group_leader(tsk)) {
struct task_struct *leader = tsk->group_leader;
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_t *pid_p = (pid_t*)&(tsk->pid);
sig->notify_count = -1; /* for exit_notify() */
for (;;) {
@@ -950,7 +952,7 @@ static int de_thread(struct task_struct *tsk)
* Note: The old leader also uses this pid until release_task
* is called. Odd but simple and correct.
*/
- tsk->pid = leader->pid;
+ *pid_p = leader->pid;
Well, imho this (and de_thread) looks a bit ugly. Perhaps we should add
something like
#define ASSIGN_CONST(l, r) (*(typeof(r) *)&(l) = (r))
into compiler.h ?
Or even

#define ASSIGN_CONST(l, r) \
({ BUILD_BUG_ON(sizeof(l) != sizeof(r)); *(typeof(r) *)&(l) = (r); })

perhaps it will have more users.

Oleg.
Oleg Nesterov
2013-12-20 19:01:57 UTC
Permalink
Richard, Peter,

sorry I deleted your emails by accident, so I am replying to my email.

Sure, ASSIGN_CONST() looks "dangerous". Still to me it is safer than
"(pid_t*)&(tsk->pid)" done by hand. And yes, it is visible to grep.

But the main point, it is much more readable. Just look at the change
below,
Post by Richard Guy Briggs
if (!thread_group_leader(tsk)) {
struct task_struct *leader = tsk->group_leader;
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_t *pid_p = (pid_t*)&(tsk->pid);
sig->notify_count = -1; /* for exit_notify() */
for (;;) {
@@ -950,7 +952,7 @@ static int de_thread(struct task_struct *tsk)
* Note: The old leader also uses this pid until release_task
* is called. Odd but simple and correct.
*/
- tsk->pid = leader->pid;
+ *pid_p = leader->pid;
Isn't it ugly to add the temporary? And this temporary is the pointer.
ASSIGN_CONST(task->pid, leader->pid) is self-documenting and clear.

The only problem is that

#define ASSIGN_CONST(l, r) (*(typeof(r) *)&(l) = (r))

obviously can't work in this case ;) We need something more clever.

Oleg.
Richard Guy Briggs
2013-12-20 20:19:31 UTC
Permalink
Post by Oleg Nesterov
Richard, Peter,
Oleg,
Post by Oleg Nesterov
sorry I deleted your emails by accident, so I am replying to my email.
That's ok, you could send yourself a new copy from the "forward" link
here:
https://lkml.org/lkml/2013/12/20/277
Post by Oleg Nesterov
Sure, ASSIGN_CONST() looks "dangerous". Still to me it is safer than
"(pid_t*)&(tsk->pid)" done by hand. And yes, it is visible to grep.
But the main point, it is much more readable. Just look at the change
below,
I completely agree it is more readable and safer, if it works.
Post by Oleg Nesterov
Post by Richard Guy Briggs
if (!thread_group_leader(tsk)) {
struct task_struct *leader = tsk->group_leader;
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_t *pid_p = (pid_t*)&(tsk->pid);
sig->notify_count = -1; /* for exit_notify() */
for (;;) {
@@ -950,7 +952,7 @@ static int de_thread(struct task_struct *tsk)
* Note: The old leader also uses this pid until release_task
* is called. Odd but simple and correct.
*/
- tsk->pid = leader->pid;
+ *pid_p = leader->pid;
Isn't it ugly to add the temporary? And this temporary is the pointer.
ASSIGN_CONST(task->pid, leader->pid) is self-documenting and clear.
The only problem is that
#define ASSIGN_CONST(l, r) (*(typeof(r) *)&(l) = (r))
obviously can't work in this case ;) We need something more clever.
Go for it. I'll just stick with what I found works. I gave it a shot
to do it in one step and failed. I moved on.
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Peter Zijlstra
2013-12-20 21:33:48 UTC
Permalink
Post by Oleg Nesterov
The only problem is that
#define ASSIGN_CONST(l, r) (*(typeof(r) *)&(l) = (r))
obviously can't work in this case ;) We need something more clever.
Hmm indeed, C++ has both the const_cast<>() thingy and the template
system is powerful enough to actually implement const_cast<>() inside
the language.

But I cannot find anything useful for C. Your attempt to use the rvalue
type to hopefully obtain a const-less lvalue type is clever, but does
indeed fail where the rvalue is const too.
Oleg Nesterov
2013-12-22 16:03:21 UTC
Permalink
Post by Peter Zijlstra
Post by Oleg Nesterov
The only problem is that
#define ASSIGN_CONST(l, r) (*(typeof(r) *)&(l) = (r))
obviously can't work in this case ;) We need something more clever.
Hmm indeed, C++ has both the const_cast<>() thingy and the template
system is powerful enough to actually implement const_cast<>() inside
the language.
But I cannot find anything useful for C. Your attempt to use the rvalue
type to hopefully obtain a const-less lvalue type is clever, but does
indeed fail where the rvalue is const too.
Yes.

We can probably do

#define ASSIGN_CONST(l, r) ({ \
typeof (l) const r__ = (r); \
memcpy((void *)&(l), &(r__), sizeof(l)); \
(l); \
})

gcc can actually avoid a temporary if you do

ASSIGN_CONST(tsk->pid, leader->pid);

but this doesn't look very nice and assumes __builtin_memcpy().


So, how about

#define CONST_CAST(type, lval) \
(*({ \
(void)((type *)0 == &(lval)); \
(type *)&(lval); \
})) \

?

de_thread/copy_process can do

CONST_CAST(pid_t, tsk->pid) = leader->pid;

and if "type" is wrong we have a warning.

Oleg.
Richard Guy Briggs
2014-01-23 19:24:03 UTC
Permalink
Post by Oleg Nesterov
Post by Peter Zijlstra
Post by Oleg Nesterov
The only problem is that
#define ASSIGN_CONST(l, r) (*(typeof(r) *)&(l) = (r))
obviously can't work in this case ;) We need something more clever.
Hmm indeed, C++ has both the const_cast<>() thingy and the template
system is powerful enough to actually implement const_cast<>() inside
the language.
So, how about
#define CONST_CAST(type, lval) \
(*({ \
(void)((type *)0 == &(lval)); \
(type *)&(lval); \
})) \
de_thread/copy_process can do
CONST_CAST(pid_t, tsk->pid) = leader->pid;
and if "type" is wrong we have a warning.
Cool. This works very nicely. Thanks for the ideas Oleg!
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2014-01-23 19:32:33 UTC
Permalink
These are a number of patches inspired by ebiederman's container work that were
included by me 2013-08-20 as the patchset:
RFC: steps to make audit pid namespace-safe

They have been seperated out for the pid maintainer since there are no direct
dependencies from the audit pid namespace patchset with the exception of:
pid: get pid_t ppid of task in init_pid_ns

Andrew, are you willing to adopt these?

In particular, there is discussion around read-only task_struct::pid here:
https://lkml.org/lkml/2013/12/16/552

Richard Guy Briggs (7):
pid: change task_struct::pid to read-only
compiler: CONST_CAST makes writing const vars easier and obvious
pid: use the CONST_CAST macro instead to write to const
task_struct::pid
pid: modify task_tgid_nr to work without task->tgid.
pid: rewrite task helper function is_global_init() avoiding task->pid
pid: mark struct task const in helper functions
pid: get pid_t ppid of task in init_pid_ns

arch/x86/kernel/process.c | 2 +-
fs/exec.c | 2 +-
include/linux/compiler.h | 8 ++++++
include/linux/sched.h | 60 +++++++++++++++++++++++++++++---------------
kernel/fork.c | 5 ++-
kernel/pid.c | 4 +-
6 files changed, 54 insertions(+), 27 deletions(-)
Peter Zijlstra
2014-01-23 21:25:35 UTC
Permalink
Post by Richard Guy Briggs
These are a number of patches inspired by ebiederman's container work that were
RFC: steps to make audit pid namespace-safe
They have been seperated out for the pid maintainer since there are no direct
pid: get pid_t ppid of task in init_pid_ns
Andrew, are you willing to adopt these?
https://lkml.org/lkml/2013/12/16/552
I would have ordered them slightly different, but:

Acked-by: Peter Zijlstra <***@infradead.org>
Richard Guy Briggs
2014-01-24 06:14:47 UTC
Permalink
Post by Richard Guy Briggs
These are a number of patches inspired by ebiederman's container work that were
RFC: steps to make audit pid namespace-safe
They have been seperated out for the pid maintainer since there are no direct
pid: get pid_t ppid of task in init_pid_ns
Andrew, are you willing to adopt these?
https://lkml.org/lkml/2013/12/16/552
Can you briefly explain how and why so I can understand for next time?
I originally had the ppid patch first...
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Peter Zijlstra
2014-01-24 08:52:48 UTC
Permalink
Post by Richard Guy Briggs
Post by Richard Guy Briggs
These are a number of patches inspired by ebiederman's container work that were
RFC: steps to make audit pid namespace-safe
They have been seperated out for the pid maintainer since there are no direct
pid: get pid_t ppid of task in init_pid_ns
Andrew, are you willing to adopt these?
https://lkml.org/lkml/2013/12/16/552
Can you briefly explain how and why so I can understand for next time?
I originally had the ppid patch first...
Ah, I would have introduced CONST_CAST() earlier, then used it in the
make pid const and then avoided the conversion patch.
Richard Guy Briggs
2014-01-24 14:31:45 UTC
Permalink
Post by Peter Zijlstra
Post by Richard Guy Briggs
Post by Richard Guy Briggs
These are a number of patches inspired by ebiederman's container work that were
RFC: steps to make audit pid namespace-safe
They have been seperated out for the pid maintainer since there are no direct
pid: get pid_t ppid of task in init_pid_ns
Andrew, are you willing to adopt these?
https://lkml.org/lkml/2013/12/16/552
Can you briefly explain how and why so I can understand for next time?
I originally had the ppid patch first...
Ah, I would have introduced CONST_CAST() earlier, then used it in the
make pid const and then avoided the conversion patch.
Ah, fair enough. It does serve to show how much ugliness can be avoided
though...

- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2014-02-19 16:18:58 UTC
Permalink
Andrew,

Are you willing to shepherd this patchset?
Post by Richard Guy Briggs
These are a number of patches inspired by ebiederman's container work that were
RFC: steps to make audit pid namespace-safe
They have been seperated out for the pid maintainer since there are no direct
pid: get pid_t ppid of task in init_pid_ns
Andrew, are you willing to adopt these?
https://lkml.org/lkml/2013/12/16/552
pid: change task_struct::pid to read-only
compiler: CONST_CAST makes writing const vars easier and obvious
pid: use the CONST_CAST macro instead to write to const
task_struct::pid
pid: modify task_tgid_nr to work without task->tgid.
pid: rewrite task helper function is_global_init() avoiding task->pid
pid: mark struct task const in helper functions
pid: get pid_t ppid of task in init_pid_ns
arch/x86/kernel/process.c | 2 +-
fs/exec.c | 2 +-
include/linux/compiler.h | 8 ++++++
include/linux/sched.h | 60 +++++++++++++++++++++++++++++---------------
kernel/fork.c | 5 ++-
kernel/pid.c | 4 +-
6 files changed, 54 insertions(+), 27 deletions(-)
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Oleg Nesterov
2014-02-19 17:47:45 UTC
Permalink
Richard,

I am sorry for delay, I'll try to review this series tomorrow.

But at first glance, can't you send 2/7 first and join 1/7 and 3/7?

And since you change is_global_init() perhaps you can also fix it?
It actually needs tgid.
Post by Richard Guy Briggs
Andrew,
Are you willing to shepherd this patchset?
Post by Richard Guy Briggs
These are a number of patches inspired by ebiederman's container work that were
RFC: steps to make audit pid namespace-safe
They have been seperated out for the pid maintainer since there are no direct
pid: get pid_t ppid of task in init_pid_ns
Andrew, are you willing to adopt these?
https://lkml.org/lkml/2013/12/16/552
pid: change task_struct::pid to read-only
compiler: CONST_CAST makes writing const vars easier and obvious
pid: use the CONST_CAST macro instead to write to const
task_struct::pid
pid: modify task_tgid_nr to work without task->tgid.
pid: rewrite task helper function is_global_init() avoiding task->pid
pid: mark struct task const in helper functions
pid: get pid_t ppid of task in init_pid_ns
arch/x86/kernel/process.c | 2 +-
fs/exec.c | 2 +-
include/linux/compiler.h | 8 ++++++
include/linux/sched.h | 60 +++++++++++++++++++++++++++++---------------
kernel/fork.c | 5 ++-
kernel/pid.c | 4 +-
6 files changed, 54 insertions(+), 27 deletions(-)
- RGB
--
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2014-02-19 18:15:01 UTC
Permalink
Post by Oleg Nesterov
Richard,
Hi Oleg,
Post by Oleg Nesterov
I am sorry for delay, I'll try to review this series tomorrow.
But at first glance, can't you send 2/7 first and join 1/7 and 3/7?
Yes, Peter made the same observation. I thought it was more useful to
have them seperated out, but I'll join them.
Post by Oleg Nesterov
And since you change is_global_init() perhaps you can also fix it?
It actually needs tgid.
Sure. Can you explain why? We only want init killing off its own
threads?
Post by Oleg Nesterov
Post by Richard Guy Briggs
Andrew,
Are you willing to shepherd this patchset?
Post by Richard Guy Briggs
These are a number of patches inspired by ebiederman's container work that were
RFC: steps to make audit pid namespace-safe
They have been seperated out for the pid maintainer since there are no direct
pid: get pid_t ppid of task in init_pid_ns
Andrew, are you willing to adopt these?
https://lkml.org/lkml/2013/12/16/552
pid: change task_struct::pid to read-only
compiler: CONST_CAST makes writing const vars easier and obvious
pid: use the CONST_CAST macro instead to write to const
task_struct::pid
pid: modify task_tgid_nr to work without task->tgid.
pid: rewrite task helper function is_global_init() avoiding task->pid
pid: mark struct task const in helper functions
pid: get pid_t ppid of task in init_pid_ns
arch/x86/kernel/process.c | 2 +-
fs/exec.c | 2 +-
include/linux/compiler.h | 8 ++++++
include/linux/sched.h | 60 +++++++++++++++++++++++++++++---------------
kernel/fork.c | 5 ++-
kernel/pid.c | 4 +-
6 files changed, 54 insertions(+), 27 deletions(-)
- RGB
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Oleg Nesterov
2014-02-20 19:08:16 UTC
Permalink
Post by Richard Guy Briggs
Post by Oleg Nesterov
But at first glance, can't you send 2/7 first and join 1/7 and 3/7?
Yes, Peter made the same observation. I thought it was more useful to
have them seperated out, but I'll join them.
Yes, thanks, I think it doesn't make sense to uglify the code in 1/7
and then fix it in 2/7.
Post by Richard Guy Briggs
Post by Oleg Nesterov
And since you change is_global_init() perhaps you can also fix it?
It actually needs tgid.
Sure. Can you explain why? We only want init killing off its own
threads?
Please see my reply to 5/7.

Oleg.
Richard Guy Briggs
2014-01-23 19:32:34 UTC
Permalink
task->pid is only ever assigned once (well ok, twice). For system health and
secure logging confidence, make it const to make it much more intentional when
it is being changed.

Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
arch/x86/kernel/process.c | 2 +-
fs/exec.c | 4 +++-
include/linux/sched.h | 2 +-
kernel/fork.c | 8 ++++++--
4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95..bab0730 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -66,7 +66,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
int ret;

- *dst = *src;
+ memcpy(dst, src, sizeof(struct task_struct));
if (fpu_allocated(&src->thread.fpu)) {
memset(&dst->thread.fpu, 0, sizeof(dst->thread.fpu));
ret = fpu_alloc(&dst->thread.fpu);
diff --git a/fs/exec.c b/fs/exec.c
index 7ea097f..a6b585e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -906,6 +906,8 @@ static int de_thread(struct task_struct *tsk)
*/
if (!thread_group_leader(tsk)) {
struct task_struct *leader = tsk->group_leader;
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_t *pid_p = (pid_t *)&(tsk->pid);

sig->notify_count = -1; /* for exit_notify() */
for (;;) {
@@ -948,7 +950,7 @@ static int de_thread(struct task_struct *tsk)
* Note: The old leader also uses this pid until release_task
* is called. Odd but simple and correct.
*/
- tsk->pid = leader->pid;
+ *pid_p = leader->pid;
change_pid(tsk, PIDTYPE_PID, task_pid(leader));
transfer_pid(leader, tsk, PIDTYPE_PGID);
transfer_pid(leader, tsk, PIDTYPE_SID);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 53f97eb..195f6bd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1128,7 +1128,7 @@ struct task_struct {
unsigned sched_reset_on_fork:1;
unsigned sched_contributes_to_load:1;

- pid_t pid;
+ const pid_t pid;
pid_t tgid;

#ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/kernel/fork.c b/kernel/fork.c
index dfa736c..296e564 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -286,7 +286,7 @@ void __init fork_init(unsigned long mempages)
int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
struct task_struct *src)
{
- *dst = *src;
+ memcpy(dst, src, sizeof(struct task_struct));
return 0;
}

@@ -1135,6 +1135,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
{
int retval;
struct task_struct *p;
+ pid_t *pid_p;

if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
return ERR_PTR(-EINVAL);
@@ -1389,7 +1390,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
clear_all_latency_tracing(p);

/* ok, now we should be set up.. */
- p->pid = pid_nr(pid);
+
+ /* tast_struct::pid is const pid_t, hence the ugly cast */
+ pid_p = (pid_t *)&(p->pid);
+ *pid_p = pid_nr(pid);
if (clone_flags & CLONE_THREAD) {
p->exit_signal = -1;
p->group_leader = current->group_leader;
--
1.7.1
Richard Guy Briggs
2014-01-23 19:32:35 UTC
Permalink
There are vars that would benefit from making it more difficult or obvious to
overwrite by setting them to const. This macro makes it easier to do so
avoiding typos while making it easier to find such instances.

Oleg Nesterov deserves full credit for this patch.

Cc: Oleg Nesterov <***@redhat.com>
Cc: Peter Zijlstra <***@infradead.org>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/compiler.h | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..eaaf273 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -357,4 +357,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
#else
# define __kprobes
#endif
+
+/* Make it easier to find and get right making a const writable */
+#define CONST_CAST(type, lval) \
+ (*({ \
+ (void)((type *)0 == &(lval)); \
+ (type *)&(lval); \
+ }))
+
#endif /* __LINUX_COMPILER_H */
--
1.7.1
Richard Guy Briggs
2014-01-23 19:32:36 UTC
Permalink
The CONST_CAST macro is a cleaner and more reliable way to write to the const
task_struct::pid.

Cc: Oleg Nesterov <***@redhat.com>
Cc: Peter Zijlstra <***@infradead.org>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
fs/exec.c | 4 +---
kernel/fork.c | 5 +----
2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index a6b585e..1d2369d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -906,8 +906,6 @@ static int de_thread(struct task_struct *tsk)
*/
if (!thread_group_leader(tsk)) {
struct task_struct *leader = tsk->group_leader;
- /* tast_struct::pid is const pid_t, hence the ugly cast */
- pid_t *pid_p = (pid_t*)&(tsk->pid);

sig->notify_count = -1; /* for exit_notify() */
for (;;) {
@@ -950,7 +948,7 @@ static int de_thread(struct task_struct *tsk)
* Note: The old leader also uses this pid until release_task
* is called. Odd but simple and correct.
*/
- *pid_p = leader->pid;
+ CONST_CAST(pid_t, tsk->pid) = leader->pid;
change_pid(tsk, PIDTYPE_PID, task_pid(leader));
transfer_pid(leader, tsk, PIDTYPE_PGID);
transfer_pid(leader, tsk, PIDTYPE_SID);
diff --git a/kernel/fork.c b/kernel/fork.c
index 296e564..207c543 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1135,7 +1135,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
{
int retval;
struct task_struct *p;
- pid_t *pid_p;

if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
return ERR_PTR(-EINVAL);
@@ -1391,9 +1390,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,

/* ok, now we should be set up.. */

- /* tast_struct::pid is const pid_t, hence the ugly cast */
- pid_p = (pid_t*)&(p->pid);
- *pid_p = pid_nr(pid);
+ CONST_CAST(pid_t, p->pid) = pid_nr(pid);
if (clone_flags & CLONE_THREAD) {
p->exit_signal = -1;
p->group_leader = current->group_leader;
--
1.7.1
Richard Guy Briggs
2014-01-23 19:32:37 UTC
Permalink
task->tgid is an error prone construct and results in duplicate maintenance.
Start it's demise by modifying task_tgid_nr to not use it.

Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 195f6bd..b4e658c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1550,7 +1550,7 @@ static inline pid_t task_pid_vnr(struct task_struct *tsk)

static inline pid_t task_tgid_nr(struct task_struct *tsk)
{
- return tsk->tgid;
+ return pid_nr(task_tgid(tsk));
}

pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);
--
1.7.1
Oleg Nesterov
2014-02-20 18:35:18 UTC
Permalink
Post by Richard Guy Briggs
task->tgid is an error prone construct and results in duplicate maintenance.
Start it's demise by modifying task_tgid_nr to not use it.
Well, I disagree.

Yes I agree that ->tgid should probably die. But this change itself doesn't
help, it only makes task_tgid_nr() slower. We need to convert other users
first, then consider this change along with ->tgid removal.
Post by Richard Guy Briggs
static inline pid_t task_tgid_nr(struct task_struct *tsk)
{
- return tsk->tgid;
+ return pid_nr(task_tgid(tsk));
}
And what protect task_tgid? This is racy.

The race is very unlikely, pid_nr() will likely hit pid == NULL if tsk
exits. But still it can use the freed/unmapped/reused memory.

And even if we add rcu_read_lock() the patch will add the semantics change,
task_tgid_nr() can return 0 if tsk has already exited. At least this should
be documented, but you also need to audit the users.

Oleg.
Richard Guy Briggs
2014-01-23 19:32:38 UTC
Permalink
Use the access function task_pid_nr() to get pid.

(informed by ebiederman's ea5a4d01)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b4e658c..f60de19 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1615,7 +1615,7 @@ static inline int pid_alive(struct task_struct *p)
*/
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}

extern struct pid *cad_pid;
--
1.7.1
Oleg Nesterov
2014-02-20 18:39:58 UTC
Permalink
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
ACK, but we need to fix it.

With or without this change is_global_init() actually means
is-the-main-thread-of-global-init.

This is not what the callers actually want. Just look at
oom_unkillable_task(). Suppose that p is the sub-thread of
/sbin/init, if we kill this thread we kill the whole process.

The same of other callers. is_global_init() should return T
if this task is the part of init's thread group.

Oleg.
Richard Guy Briggs
2014-02-21 16:10:58 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
ACK, but we need to fix it.
With or without this change is_global_init() actually means
is-the-main-thread-of-global-init.
This is not what the callers actually want. Just look at
oom_unkillable_task(). Suppose that p is the sub-thread of
/sbin/init, if we kill this thread we kill the whole process.
The same of other callers. is_global_init() should return T
if this task is the part of init's thread group.
Good. Updated in my tree.
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2014-01-23 19:32:39 UTC
Permalink
It doesn't make any sense to recallers to pass in a non-const struct
task so update the function signatures to only require a const struct
task.

(informed by ebiederman's c76b2526)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 36 ++++++++++++++++++------------------
kernel/pid.c | 4 ++--
2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f60de19..2016d92 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1488,12 +1488,12 @@ static inline void task_numa_free(struct task_struct *p)
}
#endif

-static inline struct pid *task_pid(struct task_struct *task)
+static inline struct pid *task_pid(const struct task_struct *task)
{
return task->pids[PIDTYPE_PID].pid;
}

-static inline struct pid *task_tgid(struct task_struct *task)
+static inline struct pid *task_tgid(const struct task_struct *task)
{
return task->group_leader->pids[PIDTYPE_PID].pid;
}
@@ -1503,12 +1503,12 @@ static inline struct pid *task_tgid(struct task_struct *task)
* the result of task_pgrp/task_session even if task == current,
* we can race with another thread doing sys_setsid/sys_setpgid.
*/
-static inline struct pid *task_pgrp(struct task_struct *task)
+static inline struct pid *task_pgrp(const struct task_struct *task)
{
return task->group_leader->pids[PIDTYPE_PGID].pid;
}

-static inline struct pid *task_session(struct task_struct *task)
+static inline struct pid *task_session(const struct task_struct *task)
{
return task->group_leader->pids[PIDTYPE_SID].pid;
}
@@ -1528,64 +1528,64 @@ struct pid_namespace;
*
* see also pid_nr() etc in include/linux/pid.h
*/
-pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
+pid_t __task_pid_nr_ns(const struct task_struct *task, enum pid_type type,
struct pid_namespace *ns);

-static inline pid_t task_pid_nr(struct task_struct *tsk)
+static inline pid_t task_pid_nr(const struct task_struct *tsk)
{
return tsk->pid;
}

-static inline pid_t task_pid_nr_ns(struct task_struct *tsk,
+static inline pid_t task_pid_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PID, ns);
}

-static inline pid_t task_pid_vnr(struct task_struct *tsk)
+static inline pid_t task_pid_vnr(const struct task_struct *tsk)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PID, NULL);
}


-static inline pid_t task_tgid_nr(struct task_struct *tsk)
+static inline pid_t task_tgid_nr(const const struct task_struct *tsk)
{
return pid_nr(task_tgid(tsk));
}

-pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);
+pid_t task_tgid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns);

-static inline pid_t task_tgid_vnr(struct task_struct *tsk)
+static inline pid_t task_tgid_vnr(const struct task_struct *tsk)
{
return pid_vnr(task_tgid(tsk));
}


-static inline pid_t task_pgrp_nr_ns(struct task_struct *tsk,
+static inline pid_t task_pgrp_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PGID, ns);
}

-static inline pid_t task_pgrp_vnr(struct task_struct *tsk)
+static inline pid_t task_pgrp_vnr(const struct task_struct *tsk)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PGID, NULL);
}


-static inline pid_t task_session_nr_ns(struct task_struct *tsk,
+static inline pid_t task_session_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
return __task_pid_nr_ns(tsk, PIDTYPE_SID, ns);
}

-static inline pid_t task_session_vnr(struct task_struct *tsk)
+static inline pid_t task_session_vnr(const struct task_struct *tsk)
{
return __task_pid_nr_ns(tsk, PIDTYPE_SID, NULL);
}

/* obsolete, do not use */
-static inline pid_t task_pgrp_nr(struct task_struct *tsk)
+static inline pid_t task_pgrp_nr(const struct task_struct *tsk)
{
return task_pgrp_nr_ns(tsk, &init_pid_ns);
}
@@ -1600,7 +1600,7 @@ static inline pid_t task_pgrp_nr(struct task_struct *tsk)
*
* Return: 1 if the process is alive. 0 otherwise.
*/
-static inline int pid_alive(struct task_struct *p)
+static inline int pid_alive(const struct task_struct *p)
{
return p->pids[PIDTYPE_PID].pid != NULL;
}
@@ -1613,7 +1613,7 @@ static inline int pid_alive(struct task_struct *p)
*
* Return: 1 if the task structure is init. 0 otherwise.
*/
-static inline int is_global_init(struct task_struct *tsk)
+static inline int is_global_init(const struct task_struct *tsk)
{
return task_pid_nr(tsk) == 1;
}
diff --git a/kernel/pid.c b/kernel/pid.c
index 9b9a266..468b35c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -512,7 +512,7 @@ pid_t pid_vnr(struct pid *pid)
}
EXPORT_SYMBOL_GPL(pid_vnr);

-pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
+pid_t __task_pid_nr_ns(const struct task_struct *task, enum pid_type type,
struct pid_namespace *ns)
{
pid_t nr = 0;
@@ -531,7 +531,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
}
EXPORT_SYMBOL(__task_pid_nr_ns);

-pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
+pid_t task_tgid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns)
{
return pid_nr_ns(task_tgid(tsk), ns);
}
--
1.7.1
Richard Guy Briggs
2014-01-23 19:32:40 UTC
Permalink
Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup
of the PPID (real_parent's pid_t) of a process, including rcu locking, in the
arbitrary and init_pid_ns.
This provides an alternative to sys_getppid(), which is relative to the child
process' pid namespace.

(informed by ebiederman's 6c621b7e)
Cc: ***@vger.kernel.org
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2016d92..cba2486 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1561,6 +1561,24 @@ static inline pid_t task_tgid_vnr(const struct task_struct *tsk)
}


+static int pid_alive(const struct task_struct *p);
+static inline pid_t task_ppid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns)
+{
+ pid_t pid = 0;
+
+ rcu_read_lock();
+ if (pid_alive(tsk))
+ pid = task_tgid_nr_ns(rcu_dereference(tsk->real_parent), ns);
+ rcu_read_unlock();
+
+ return pid;
+}
+
+static inline pid_t task_ppid_nr(const struct task_struct *tsk)
+{
+ return task_ppid_nr_ns(tsk, &init_pid_ns);
+}
+
static inline pid_t task_pgrp_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
--
1.7.1
Oleg Nesterov
2014-02-20 19:01:22 UTC
Permalink
Post by Richard Guy Briggs
Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup
of the PPID (real_parent's pid_t) of a process, including rcu locking, in the
arbitrary and init_pid_ns.
This provides an alternative to sys_getppid(), which is relative to the child
process' pid namespace.
I agree, this makes sense.


imho it would be better to send this patch along with sys_getppid()
conversions, but I won't argue.
Post by Richard Guy Briggs
+static int pid_alive(const struct task_struct *p);
+static inline pid_t task_ppid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns)
+{
+ pid_t pid = 0;
+
+ rcu_read_lock();
+ if (pid_alive(tsk))
+ pid = task_tgid_nr_ns(rcu_dereference(tsk->real_parent), ns);
+ rcu_read_unlock();
+
+ return pid;
+}
Perhaps it should be named task_ptgid_nr_ns() or even parent_tgid_nr_ns().
Since it returns tgid, not pid (== tid).

But this is cosmetic, I won't insist.

Oleg.
Richard Guy Briggs
2014-02-21 18:10:55 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup
of the PPID (real_parent's pid_t) of a process, including rcu locking, in the
arbitrary and init_pid_ns.
This provides an alternative to sys_getppid(), which is relative to the child
process' pid namespace.
I agree, this makes sense.
imho it would be better to send this patch along with sys_getppid()
conversions, but I won't argue.
I don't think sys_getppid() should be changed since it is a syscall
whose userspace user would assume the working namespace. Many of the
kernel internal uses of it are likely wrong though, so fixing that
should probably fall under another patch. I fixed audit in another
patch.
Post by Oleg Nesterov
Post by Richard Guy Briggs
+static int pid_alive(const struct task_struct *p);
+static inline pid_t task_ppid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns)
+{
+ pid_t pid = 0;
+
+ rcu_read_lock();
+ if (pid_alive(tsk))
+ pid = task_tgid_nr_ns(rcu_dereference(tsk->real_parent), ns);
+ rcu_read_unlock();
+
+ return pid;
+}
Perhaps it should be named task_ptgid_nr_ns() or even parent_tgid_nr_ns().
Since it returns tgid, not pid (== tid).
I like task_ptgid_nr_ns() (or maybe even task_parent_tgid_nr_ns() but
that gets a bit long.
Post by Oleg Nesterov
But this is cosmetic, I won't insist.
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2013-08-20 21:32:02 UTC
Permalink
task->tgid is an error prone construct and results in duplicate maintenance.
Start it's demise by modifying task_tgid_nr to not use it.

Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index fb09b93..8e69807 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1495,7 +1495,7 @@ static inline pid_t task_pid_vnr(struct task_struct *tsk)

static inline pid_t task_tgid_nr(struct task_struct *tsk)
{
- return tsk->tgid;
+ return pid_nr(task_tgid(tsk));
}

pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);
--
1.7.1
Richard Guy Briggs
2013-08-20 21:32:03 UTC
Permalink
This stops these four task helper functions from using the deprecated and
error-prone task->pid and task->tgid.

(informed by ebiederman's ea5a4d01)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8e69807..46e739d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1579,7 +1579,7 @@ static inline int pid_alive(struct task_struct *p)
*/
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}

extern struct pid *cad_pid;
@@ -1930,7 +1930,7 @@ extern struct task_struct *idle_task(int cpu);
*/
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
extern struct task_struct *curr_task(int cpu);
extern void set_curr_task(int cpu, struct task_struct *p);
@@ -2203,13 +2203,13 @@ static inline bool thread_group_leader(struct task_struct *p)
*/
static inline int has_group_leader_pid(struct task_struct *p)
{
- return p->pid == p->tgid;
+ return task_pid(p) == task_tgid(p);
}

static inline
int same_thread_group(struct task_struct *p1, struct task_struct *p2)
{
- return p1->tgid == p2->tgid;
+ return task_tgid(p1) == task_tgid(p2);
}

static inline struct task_struct *next_thread(const struct task_struct *p)
--
1.7.1
Oleg Nesterov
2013-08-22 19:08:48 UTC
Permalink
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
Probably it would be better to simply kill it. Almost every usage is
wrong.
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
hmm. there should be a simpler check for this...
Post by Richard Guy Briggs
static inline int has_group_leader_pid(struct task_struct *p)
{
- return p->pid == p->tgid;
+ return task_pid(p) == task_tgid(p);
}
static inline
int same_thread_group(struct task_struct *p1, struct task_struct *p2)
{
- return p1->tgid == p2->tgid;
+ return task_tgid(p1) == task_tgid(p2);
This is suboptinal. See the attached
include-linux-schedh-dont-use-task-pid-tgid-in-same_thread_group-has_group_leader_pid.patch
from -mm below.

Oleg.


--- a/include/linux/sched.h~include-linux-schedh-dont-use-task-pid-tgid-in-same_thread_group-has_group_leader_pid
+++ a/include/linux/sched.h
@@ -2179,15 +2179,15 @@ static inline bool thread_group_leader(s
* all we care about is that we have a task with the appropriate
* pid, we don't actually care if we have the right task.
*/
-static inline int has_group_leader_pid(struct task_struct *p)
+static inline bool has_group_leader_pid(struct task_struct *p)
{
- return p->pid == p->tgid;
+ return task_pid(p) == p->signal->leader_pid;
}

static inline
-int same_thread_group(struct task_struct *p1, struct task_struct *p2)
+bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
{
- return p1->tgid == p2->tgid;
+ return p1->signal == p2->signal;
}

static inline struct task_struct *next_thread(const struct task_struct *p)
Richard Guy Briggs
2013-08-26 22:07:11 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
Probably it would be better to simply kill it. Almost every usage is
wrong.
Can you be more clear? I don't follow. It should instead return a
boolean. Usage of is_global_init() or task_pid_nr()?

If is_global_init(), is that because they could be unaware of pid
namespaces?

If task_pid_nr(), is that for the same reason?
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
hmm. there should be a simpler check for this...
Other than the original, this one is pretty simple. What did you have
in mind?
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int has_group_leader_pid(struct task_struct *p)
{
- return p->pid == p->tgid;
+ return task_pid(p) == task_tgid(p);
}
static inline
int same_thread_group(struct task_struct *p1, struct task_struct *p2)
{
- return p1->tgid == p2->tgid;
+ return task_tgid(p1) == task_tgid(p2);
This is suboptinal. See the attached
include-linux-schedh-dont-use-task-pid-tgid-in-same_thread_group-has_group_leader_pid.patch
from -mm below.
I'm fine with that if it is deemed better. The point was to remove the
dependence on task_struct::tgid.
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545
Oleg Nesterov
2013-08-27 16:15:36 UTC
Permalink
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
Probably it would be better to simply kill it. Almost every usage is
wrong.
Can you be more clear? I don't follow. It should instead return a
boolean. Usage of is_global_init() or task_pid_nr()?
Just look at the callers. For example, how is_global_init() can save
/sbin/init from oom-killer if it is multithreaded ?
Post by Richard Guy Briggs
If is_global_init(), is that because they could be unaware of pid
namespaces?
Because I think nobody actually needs is_a_group_leader_of_global_init(),
and this is what this helper actually is.
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
hmm. there should be a simpler check for this...
Other than the original, this one is pretty simple.
I meant that the original check is cheaper,
Post by Richard Guy Briggs
What did you have
in mind?
Well. I agree that it would be nice to avoid the dependence on task->pid
if possible. And perhaps even kill it eventually. But I am not sure how
much we should try.

If it was the last user of ->pid, then I would agree with this change.
Although we can make it cheaper, say, we can change idle_init() to
nullify tasks->next and use ->next == NULL.

But we have a lot more ->pid users, perhaps we should change them first.

And more importantly, let me repeat. I do not think that this change
should be mixed with other changes in this series.
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int has_group_leader_pid(struct task_struct *p)
{
- return p->pid == p->tgid;
+ return task_pid(p) == task_tgid(p);
}
static inline
int same_thread_group(struct task_struct *p1, struct task_struct *p2)
{
- return p1->tgid == p2->tgid;
+ return task_tgid(p1) == task_tgid(p2);
This is suboptinal. See the attached
include-linux-schedh-dont-use-task-pid-tgid-in-same_thread_group-has_group_leader_pid.patch
from -mm below.
I'm fine with that if it is deemed better. The point was to remove the
dependence on task_struct::tgid.
But I agree! My only point was, this conflicts with the patch we already
have and that patch is more optimal. p1->leader == p2->leader is cheaper.

Oleg.
Richard Guy Briggs
2013-12-16 17:35:47 UTC
Permalink
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
Probably it would be better to simply kill it. Almost every usage is
wrong.
Can you be more clear? I don't follow. It should instead return a
boolean. Usage of is_global_init() or task_pid_nr()?
If is_global_init(), is that because they could be unaware of pid
namespaces?
If task_pid_nr(), is that for the same reason?
Oleg, I still don't understand your comment above. Kill what,
"is_global_init()"? If so, how is almost every usage of it wrong?

There are a number of functions that call is_global_init(). Might any
of them be called from inside the namespace context of a container and
hence should return true?
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
hmm. there should be a simpler check for this...
Other than the original, this one is pretty simple. What did you have
in mind?
I vaguely remember a clarification to this, but don't remember and can't
find it. What sort of simplification did you have in mind? I'd like to
go at least to:
task_pid_nr(p) == 0
Post by Richard Guy Briggs
Post by Oleg Nesterov
Oleg.
- RGB
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2013-12-16 22:20:51 UTC
Permalink
Hi Richard,
Sorry, I already forgot the context, not sure I understand your email
correctly.
Post by Richard Guy Briggs
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
Probably it would be better to simply kill it. Almost every usage is
wrong.
Can you be more clear? I don't follow. It should instead return a
boolean. Usage of is_global_init() or task_pid_nr()?
If is_global_init(), is that because they could be unaware of pid
namespaces?
If task_pid_nr(), is that for the same reason?
Oleg, I still don't understand your comment above. Kill what,
"is_global_init()"? If so, how is almost every usage of it wrong?
Because is_global_init() is only true for the main thread of /sbin/init.
Just look at oom_unkillable_task(). It tries to not kill init. But, say,
select_bad_process() can happily find a sub-thread of is_global_init()
and still kill it.
Ah! So it should be task_tgid_nr(tsk) == 1.
Post by Richard Guy Briggs
There are a number of functions that call is_global_init(). Might any
of them be called from inside the namespace context of a container and
hence should return true?
Not sure I understand, but certainly some callers should check ->child_reaper
or SIGNAL_UNKILLABLE instead. Say, unhandled_signal().
So in some situations it should allow it to kill init of a container and
in others, refuse it.
Post by Richard Guy Briggs
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
hmm. there should be a simpler check for this...
Other than the original, this one is pretty simple. What did you have
in mind?
I vaguely remember a clarification to this, but don't remember and can't
find it. What sort of simplification did you have in mind?
I do not remember ;) Most probably, I meant "it would be nice to find a
simpler check".
I'll stick with task_pid_nr(p) == 0.
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Peter Zijlstra
2013-12-17 09:34:42 UTC
Permalink
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
I'll stick with task_pid_nr(p) == 0.
We're going to probably switch to:

return p->flags & PF_IDLE;

Soon, because people are playing silly tricks and want normal threads
to temporarily appear to be the idle thread (idle time injection).
Peter Zijlstra
2013-12-17 09:48:40 UTC
Permalink
Post by Peter Zijlstra
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
I'll stick with task_pid_nr(p) == 0.
return p->flags & PF_IDLE;
Soon, because people are playing silly tricks and want normal threads
to temporarily appear to be the idle thread (idle time injection).
See http://marc.info/?l=linux-kernel&m=138548236506953&w=2 for more
context.
Richard Guy Briggs
2013-12-20 04:54:13 UTC
Permalink
Post by Peter Zijlstra
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
I'll stick with task_pid_nr(p) == 0.
return p->flags & PF_IDLE;
Soon, because people are playing silly tricks and want normal threads
to temporarily appear to be the idle thread (idle time injection).
Ok, this addresses my concerns. I'll stay out of the way.

- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Oleg Nesterov
2013-12-16 21:05:58 UTC
Permalink
Hi Richard,

Sorry, I already forgot the context, not sure I understand your email
correctly.
Post by Richard Guy Briggs
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
Probably it would be better to simply kill it. Almost every usage is
wrong.
Can you be more clear? I don't follow. It should instead return a
boolean. Usage of is_global_init() or task_pid_nr()?
If is_global_init(), is that because they could be unaware of pid
namespaces?
If task_pid_nr(), is that for the same reason?
Oleg, I still don't understand your comment above. Kill what,
"is_global_init()"? If so, how is almost every usage of it wrong?
Because is_global_init() is only true for the main thread of /sbin/init.

Just look at oom_unkillable_task(). It tries to not kill init. But, say,
select_bad_process() can happily find a sub-thread of is_global_init()
and still kill it.
Post by Richard Guy Briggs
There are a number of functions that call is_global_init(). Might any
of them be called from inside the namespace context of a container and
hence should return true?
Not sure I understand, but certainly some callers should check ->child_reaper
or SIGNAL_UNKILLABLE instead. Say, unhandled_signal().
Post by Richard Guy Briggs
Post by Richard Guy Briggs
Post by Oleg Nesterov
Post by Richard Guy Briggs
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
hmm. there should be a simpler check for this...
Other than the original, this one is pretty simple. What did you have
in mind?
I vaguely remember a clarification to this, but don't remember and can't
find it. What sort of simplification did you have in mind?
I do not remember ;) Most probably, I meant "it would be nice to find a
simpler check".

Oleg.
Peter Zijlstra
2013-08-22 20:05:55 UTC
Permalink
Post by Richard Guy Briggs
This stops these four task helper functions from using the deprecated and
error-prone task->pid and task->tgid.
(informed by ebiederman's ea5a4d01)
---
include/linux/sched.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8e69807..46e739d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1579,7 +1579,7 @@ static inline int pid_alive(struct task_struct *p)
*/
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
extern struct pid *cad_pid;
@@ -1930,7 +1930,7 @@ extern struct task_struct *idle_task(int cpu);
*/
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
extern struct task_struct *curr_task(int cpu);
extern void set_curr_task(int cpu, struct task_struct *p);
Why would you ever want to do this? It just makes these tests more
expensive for no gain what so ff'ing ever.
Richard Guy Briggs
2013-08-22 21:43:47 UTC
Permalink
Post by Peter Zijlstra
Post by Richard Guy Briggs
This stops these four task helper functions from using the deprecated and
error-prone task->pid and task->tgid.
(informed by ebiederman's ea5a4d01)
---
include/linux/sched.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8e69807..46e739d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1579,7 +1579,7 @@ static inline int pid_alive(struct task_struct *p)
*/
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
extern struct pid *cad_pid;
@@ -1930,7 +1930,7 @@ extern struct task_struct *idle_task(int cpu);
*/
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
extern struct task_struct *curr_task(int cpu);
extern void set_curr_task(int cpu, struct task_struct *p);
Why would you ever want to do this? It just makes these tests more
expensive for no gain what so ff'ing ever.
Backups are generally considered a good idea, but in this case, I'd
quote:
"A man with one watch knows what time it is. A man with two is
never certain."

Reminds me of the twist of a phrase frequently seen in the US gov:
"Government Duplicity, Do Not Propagate" ;-)


Can you suggest a safe way to live with this duplicity?


- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545
Peter Zijlstra
2013-08-23 06:36:21 UTC
Permalink
Post by Richard Guy Briggs
Post by Peter Zijlstra
Post by Richard Guy Briggs
This stops these four task helper functions from using the deprecated and
error-prone task->pid and task->tgid.
(informed by ebiederman's ea5a4d01)
---
include/linux/sched.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8e69807..46e739d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1579,7 +1579,7 @@ static inline int pid_alive(struct task_struct *p)
*/
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
extern struct pid *cad_pid;
@@ -1930,7 +1930,7 @@ extern struct task_struct *idle_task(int cpu);
*/
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
extern struct task_struct *curr_task(int cpu);
extern void set_curr_task(int cpu, struct task_struct *p);
Why would you ever want to do this? It just makes these tests more
expensive for no gain what so ff'ing ever.
Backups are generally considered a good idea, but in this case, I'd
"A man with one watch knows what time it is. A man with two is
never certain."
Except that's not the case, with namespaces there's a clear hierarchy
and the task_struct::pid is the one true value aka. root namespace.

Furthermore idle threads really are special and it doesn't make sense to
address them in any but the root namespace, doubly so because only
kernel space does this.

As for the init thread, that function is called is_global_init() for
crying out loud, what numb nut doesn't get that that's supposed to be
using the root namespace?

Seriously, you namespace guys should stop messing things up and
confusing yourselves and others.
Richard Guy Briggs
2013-08-27 02:37:22 UTC
Permalink
Post by Peter Zijlstra
Post by Richard Guy Briggs
Post by Peter Zijlstra
Post by Richard Guy Briggs
This stops these four task helper functions from using the deprecated and
error-prone task->pid and task->tgid.
(informed by ebiederman's ea5a4d01)
---
include/linux/sched.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8e69807..46e739d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1579,7 +1579,7 @@ static inline int pid_alive(struct task_struct *p)
*/
static inline int is_global_init(struct task_struct *tsk)
{
- return tsk->pid == 1;
+ return task_pid_nr(tsk) == 1;
}
extern struct pid *cad_pid;
@@ -1930,7 +1930,7 @@ extern struct task_struct *idle_task(int cpu);
*/
static inline bool is_idle_task(const struct task_struct *p)
{
- return p->pid == 0;
+ return task_pid(p) == &init_struct_pid;
}
extern struct task_struct *curr_task(int cpu);
extern void set_curr_task(int cpu, struct task_struct *p);
Why would you ever want to do this? It just makes these tests more
expensive for no gain what so ff'ing ever.
Backups are generally considered a good idea, but in this case, I'd
"A man with one watch knows what time it is. A man with two is
never certain."
Except that's not the case, with namespaces there's a clear hierarchy
and the task_struct::pid is the one true value aka. root namespace.
Peter, I agonized over the access efficiency of dropping this one or the
duplicate in task_struct::pids and this one was far easier to drop in
terms of somehow always forcing
task->pids[PIDTYPE_PID].pid->numbers[0].nr to point to task->pid.

It should be possible to audit the kernel to make certain task->pid is
only ever written at the time of task creation and copied to its own
task->pids[PIDTYPE_PID].pid->numbers[0].nr at the time of task creation
so that the two values are consistent. Continuously auditing the kernel
so this is the case would be a bit more of a challenge.

Would it be reasonable to suggest task_struct::pid only be accessed by
the existing inlined task_pid_nr() converted to const?

The goal is to gain assurance that any PIDs referred to in audit logs
are indisputable.

Would you be alright with doing away with task_struct::tgid?
Post by Peter Zijlstra
Furthermore idle threads really are special and it doesn't make sense to
address them in any but the root namespace, doubly so because only
kernel space does this.
If that is the case, and the above holds true, then we don't need the
second hunk, I agree.
Post by Peter Zijlstra
As for the init thread, that function is called is_global_init() for
crying out loud, what numb nut doesn't get that that's supposed to be
using the root namespace?
A process in another pid namespace? If that's never going to happen,
then drop it.
Post by Peter Zijlstra
Seriously, you namespace guys should stop messing things up and
confusing yourselves and others.
"you namespace guys"? I'm not a namespace guy. I'm a rusty kernel
network security guy taking on audit, trying to prepare it for moving
into a more and more namespace-enabled place of
containerization/light-virtualization.

We aren't ready for it yet, but there is demand to run additional auditd
daemons in other pid namespaces and some of this work is trying to move
in that direction.

This patchset certainly wasn't intended to be ready for adoption just
yet. It was this sort of discussion I was hoping to have.


- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545
Peter Zijlstra
2013-08-27 12:11:33 UTC
Permalink
Post by Richard Guy Briggs
Post by Peter Zijlstra
Except that's not the case, with namespaces there's a clear hierarchy
and the task_struct::pid is the one true value aka. root namespace.
Peter, I agonized over the access efficiency of dropping this one or the
duplicate in task_struct::pids and this one was far easier to drop in
terms of somehow always forcing
task->pids[PIDTYPE_PID].pid->numbers[0].nr to point to task->pid.
You mean there's more than 1 site that sets task_struct::pid? I thought
we only assign that thing once in fork.c someplace.
Post by Richard Guy Briggs
It should be possible to audit the kernel to make certain task->pid is
only ever written at the time of task creation and copied to its own
task->pids[PIDTYPE_PID].pid->numbers[0].nr at the time of task creation
so that the two values are consistent. Continuously auditing the kernel
so this is the case would be a bit more of a challenge.
I know there's people running scripts over git commits to catch abuse,
if this is scriptable that might be doable.
Post by Richard Guy Briggs
Would it be reasonable to suggest task_struct::pid only be accessed by
the existing inlined task_pid_nr() converted to const?
Sure that works for me, alternatively what's wrong with making
task_struct::pid itself a const pid_t ? Then assignment will need an
ugly cast to even work.
Post by Richard Guy Briggs
The goal is to gain assurance that any PIDs referred to in audit logs
are indisputable.
Would you be alright with doing away with task_struct::tgid?
So I don't particularly use that one much -- if at all. So no I don't
mind it too much.
Post by Richard Guy Briggs
Post by Peter Zijlstra
Furthermore idle threads really are special and it doesn't make sense to
address them in any but the root namespace, doubly so because only
kernel space does this.
If that is the case, and the above holds true, then we don't need the
second hunk, I agree.
It should be the case -- not entirely sure it is the case, but in any
case pid-0 should be 'special' by all accounts.
Post by Richard Guy Briggs
Post by Peter Zijlstra
As for the init thread, that function is called is_global_init() for
crying out loud, what numb nut doesn't get that that's supposed to be
using the root namespace?
A process in another pid namespace? If that's never going to happen,
then drop it.
That'd be a bug I suppose, you want the 'global' init when using that
function. I don't use this function, never have. So I really don't know
_that_ much about it. It just seems to me that the name really implies
it wants the root init process and not any other.
Post by Richard Guy Briggs
Post by Peter Zijlstra
Seriously, you namespace guys should stop messing things up and
confusing yourselves and others.
"you namespace guys"? I'm not a namespace guy. I'm a rusty kernel
network security guy taking on audit, trying to prepare it for moving
into a more and more namespace-enabled place of
containerization/light-virtualization.
Well, you let yourself in with 'those' people ;-)
Post by Richard Guy Briggs
We aren't ready for it yet, but there is demand to run additional auditd
daemons in other pid namespaces and some of this work is trying to move
in that direction.
So afaict that's 'simply' writing the 'right' pid to your logger, right?
Your additional concern that the pid space isn't corrupted sounds a bit
superfluous to me, we don't typically muck about with pids, stuff would
horribly break if we did that.

There's a few special cases, like the idle threads having pid-0 and
'simple' people like myself prefer to use task_struct::pid for debugging
when we run our simple kernels without all this namespace stuff enabled.

The entire task->pids[PIDTYPE_PID].pid->numbers[0].nr thing just seems
increddibly unwieldy and double dereferences, even if the lines are
'hot' are worse than single derefs.
Eric W. Biederman
2013-08-27 21:35:11 UTC
Permalink
Post by Peter Zijlstra
Post by Richard Guy Briggs
Post by Peter Zijlstra
Except that's not the case, with namespaces there's a clear hierarchy
and the task_struct::pid is the one true value aka. root namespace.
Peter, I agonized over the access efficiency of dropping this one or the
duplicate in task_struct::pids and this one was far easier to drop in
terms of somehow always forcing
task->pids[PIDTYPE_PID].pid->numbers[0].nr to point to task->pid.
You mean there's more than 1 site that sets task_struct::pid? I thought
we only assign that thing once in fork.c someplace.
No there is not and that is not a concern.

Now I had thought given how the perf subsystem was constructed that only
the god like root was even allowed to use the code. But it turns out
there is sysctl_perf_event_paranoid that can bet twiddled that in some
circumstance that unprivileged users are allowed to use perf. Which
ultimately means perf will return the wrong data.

Meaning that perf is broken by design and perf has no excuse to be using
task->pid. Similarly every other place in the kernel that has made the
same mistake. I mention perf explicitly for two reasons. perf gets the
namespace handling horribly wrong, perf is the only place in the kernel
where we are accessing pids frequently enough for an extra cache line
miss to be a concern.

When really pids in the kernel what we care about is not some stupid
number but the stuct pid which points to that tasks, process groups, and
sessions. You know the object that a pid is the name for.

So yes as a long term direction task->pid and task->tgid need to be
killed because we keep getting subsystems like perf that return the
wrong data to userspace, or perform the wrong checks, because the
current structure makes it seem like it is ok to do the wrong thing.

Now that should not be Richard's fight. Hopefully he can focus on
fixing audit.
Post by Peter Zijlstra
There's a few special cases, like the idle threads having pid-0 and
'simple' people like myself prefer to use task_struct::pid for debugging
when we run our simple kernels without all this namespace stuff enabled.
Which is why a special printf format is likely a good idea because it
means you can easily print pids without needing to call ungainly helper
functions.

Of course you can't run kernels without this ``namespace'' stuff
enabled. The best you can do is run kernels without the ability to
create new instances of the namespaces.
Post by Peter Zijlstra
The entire task->pids[PIDTYPE_PID].pid->numbers[0].nr thing just seems
increddibly unwieldy and double dereferences, even if the lines are
'hot' are worse than single derefs.
But it is so much better than having to look up task->pid in a hash
table to get anything done, which is the previous state of affairs.

When the pid namespace support was merged except for a few overlooked
corner cases everything was converted except a bunch of printk
statements. Now I look in the kernel and we have had subsystems added
that totally get the namespace handling wrong because it is easy and
apparently socially acceptable to mess up other peoples hard work.

Apparently even Linus yelling at people a few years back wasn't enough
for people to wake up and be responsible developers and use proper
abstractions. So the only valid long term direction I can see is to
remove all of the abstractions that make getting pid handling wrong,
and to fix all of the code that gets it wrong today. So that there are
no more bad examples in the kernel.

This isn't Richard's fight, and this isn't what needs to happen with
audit. Audit just needs to be fixed so that so that it reports pid
numbers the audit daemon can make sense of, and to do that the audit
should use helper functions that are pid namespace aware and make it
clear that the proper pid namespace is being used.

In the long term ->pid and ->tgid must die, and take all of this wrong
think with it.

Eric
Peter Zijlstra
2013-08-28 08:16:31 UTC
Permalink
Post by Eric W. Biederman
Post by Peter Zijlstra
Post by Richard Guy Briggs
Post by Peter Zijlstra
Except that's not the case, with namespaces there's a clear hierarchy
and the task_struct::pid is the one true value aka. root namespace.
Peter, I agonized over the access efficiency of dropping this one or the
duplicate in task_struct::pids and this one was far easier to drop in
terms of somehow always forcing
task->pids[PIDTYPE_PID].pid->numbers[0].nr to point to task->pid.
You mean there's more than 1 site that sets task_struct::pid? I thought
we only assign that thing once in fork.c someplace.
No there is not and that is not a concern.
Now I had thought given how the perf subsystem was constructed that only
the god like root was even allowed to use the code. But it turns out
there is sysctl_perf_event_paranoid that can bet twiddled that in some
circumstance that unprivileged users are allowed to use perf.
Even without poking at that, a user is always allowed to use perf on his
own tasks.
Post by Eric W. Biederman
Which
ultimately means perf will return the wrong data.
How so, perf uses pid-namespaces, have a look at
kernel/events/core.c:perf_event_[pt]id(). We store the namespace of the
task creating the event (which is also -- hopefully -- the consumer of
the data it generates). If you create an event and then switch
namespaces you've bigger issues I suppose.
Post by Eric W. Biederman
Meaning that perf is broken by design and perf has no excuse to be using
task->pid.
It doesn't.
Post by Eric W. Biederman
Similarly every other place in the kernel that has made the
same mistake. I mention perf explicitly for two reasons. perf gets the
namespace handling horribly wrong,
Do tell.
Post by Eric W. Biederman
perf is the only place in the kernel
where we are accessing pids frequently enough for an extra cache line
miss to be a concern.
When really pids in the kernel what we care about is not some stupid
number but the stuct pid which points to that tasks, process groups, and
sessions. You know the object that a pid is the name for.
So yes as a long term direction task->pid and task->tgid need to be
killed because we keep getting subsystems like perf that return the
wrong data to userspace, or perform the wrong checks, because the
current structure makes it seem like it is ok to do the wrong thing.
I think you should have a look at code before you start raving.. makes
you looks silly.
Oleg Nesterov
2013-08-23 19:28:07 UTC
Permalink
Post by Richard Guy Briggs
Post by Peter Zijlstra
Why would you ever want to do this? It just makes these tests more
expensive for no gain what so ff'ing ever.
Backups are generally considered a good idea, but in this case, I'd
And perhaps you are right. At least we can probably kill task->tgid.
And I agree, it would be nice to kill task->pid.

But: I also agree with Peter, we should try to not make the current
code more expensive.

Anyway. Imho, you should not mix the different things in one series.
If you want to fix audit, do not add the patches like 10/12 or 11/12
into this series.

Or 3/12. OK, I agree sys_getppid() in audit_log_task_info() looks
strange at least. Just fix it using the helpers we already have and
add the new helpers later. Or send the patch(es) which adds the new
helpers first.

Or task_pid_nr_init_ns()... For what? We already have task_pid_nr().
Use the helper we already have, or introduce the new one first and
change the current users of task_pid_nr().

In short. Fortunately you do not need to convince me, I do not
maintain audit or namespaces ;) But imho this series looks a bit
confusing.

Oleg.
Richard Guy Briggs
2013-08-27 03:04:48 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
Post by Peter Zijlstra
Why would you ever want to do this? It just makes these tests more
expensive for no gain what so ff'ing ever.
Backups are generally considered a good idea, but in this case, I'd
And perhaps you are right. At least we can probably kill task->tgid.
And I agree, it would be nice to kill task->pid.
But: I also agree with Peter, we should try to not make the current
code more expensive.
I don't disagree. I was given some hope by Eric Biederman that it might
not cause cache-line misses...
Post by Oleg Nesterov
Anyway. Imho, you should not mix the different things in one series.
If you want to fix audit, do not add the patches like 10/12 or 11/12
into this series.
They were included to gain reassurance that PIDs reported in audit logs
were accurate. If those assurances can be made, then I can limit my
work to audit itself.
Post by Oleg Nesterov
Or 3/12.
That is a cleanup to make clear what parts are actually pid-related and
what isn't.
Post by Oleg Nesterov
OK, I agree sys_getppid() in audit_log_task_info() looks
strange at least. Just fix it using the helpers we already have and
add the new helpers later. Or send the patch(es) which adds the new
helpers first.
Patch 04/12 is that helper. It is used in only two places in audit (and
once in apparmor), so I could have duplicated the code, but since it
needs rcu locking, an inline funciton in the audit namespace seemed
somewhat pointless when it could just as easily be shared with other
subsystems.
Post by Oleg Nesterov
Or task_pid_nr_init_ns()... For what? We already have task_pid_nr().
Use the helper we already have, or introduce the new one first and
change the current users of task_pid_nr().
If task_struct::pid is definitely not going away, then that whole part
is moot and we'll just use task_pid_nr() as is.
Post by Oleg Nesterov
In short. Fortunately you do not need to convince me, I do not
maintain audit or namespaces ;) But imho this series looks a bit
confusing.
It is incomplete. The last step(s) were intended to purge
task_struct::pid (or abstract it using task_pid_nr()), which haven't
been submitted yet. The whole point of the patchset was to give
confidence in the PIDs reported in any audit logs.
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545
Oleg Nesterov
2013-08-27 17:11:34 UTC
Permalink
Post by Richard Guy Briggs
Post by Oleg Nesterov
Or 3/12.
That is a cleanup to make clear what parts are actually pid-related and
what isn't.
You know, I decided to send another email about this patch. This cleanup
doesn't look even correct.
Post by Richard Guy Briggs
Post by Oleg Nesterov
OK, I agree sys_getppid() in audit_log_task_info() looks
strange at least. Just fix it using the helpers we already have and
add the new helpers later. Or send the patch(es) which adds the new
helpers first.
Patch 04/12 is that helper. It is used in only two places in audit
I see what 3/4 do. But I am not sure we need this. At least in this
series.

OK, why do we need 3 new helper? audit needs only one,
task_ppid_nr_init_ns(). And who else needs this task_ppid* stuff?
Post by Richard Guy Briggs
once in apparmor),
OK, apparmor. So perhaps a single new helper in sched.c makes sense,
I dunno. But see above.
Post by Richard Guy Briggs
Post by Oleg Nesterov
Or task_pid_nr_init_ns()... For what? We already have task_pid_nr().
Use the helper we already have, or introduce the new one first and
change the current users of task_pid_nr().
If task_struct::pid is definitely not going away, then that whole part
is moot and we'll just use task_pid_nr() as is.
Can't understand. We already have task_tgid_nr(). This helper can be
changed to avoid ->tgig. Why task_ppid_nr_init_ns() can't use the
helper we already have?

But let me repeat. I am not maintainer and I do not really care.
You should convince Eric, I am not going to argue.


Btw. audit looks unmaintained... if you are going to take care of
this code, perhaps you can look at

http://marc.info/?l=linux-kernel&m=137589907108485
http://marc.info/?l=linux-kernel&m=137590271809664

?

Oleg.
Richard Guy Briggs
2013-08-30 19:06:46 UTC
Permalink
Post by Oleg Nesterov
Btw. audit looks unmaintained... if you are going to take care of
this code, perhaps you can look at
http://marc.info/?l=linux-kernel&m=137589907108485
http://marc.info/?l=linux-kernel&m=137590271809664
(I don't want to lose these refs... First I've seen these.)

Why do you say this? Could you elaborate? Due to the state of the code
itself, or the lack of response from audit folks? (Like most, I'm not
subscribed to that firehose, so I don't have archives that show
addressees.) Most of the kernel audit folks are on
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545
Steve Grubb
2013-08-30 19:54:46 UTC
Permalink
Post by Oleg Nesterov
Btw. audit looks unmaintained... if you are going to take care of
this code, perhaps you can look at
http://marc.info/?l=linux-kernel&m=137589907108485
http://marc.info/?l=linux-kernel&m=137590271809664
You don't want to clear the TIF audit flag when context == NULL. What that will
do is make a bunch of inauditable processes. There are times when audit is
disabled and then re-enabled later. If the flag gets cleared, then a task's
syscall will never enter the auditing framework from kernel/entry_64.S.

That flag is 0 when auditing has never ever been enabled. If auditing is
enabled, it should always be a 1 unless the task filter has determined that
this process should not be audited ever. In practice, this is almost never
used. But ensuring the TIF_SYSCALL_AUDIT set to 1 on all processes is why we
have the boot argument. Not setting audit=1 on the boot arguments means that
any process running before the audit daemon enables auditing can never ever be
audited because the only place its set is when processes are cloned.

Hope this clears up the use. NAK to the patch, it'll break auditing.

-Steve
Oleg Nesterov
2013-09-08 15:54:35 UTC
Permalink
Sorry for delay, vacation.

First of all, I do not pretend I understand this code. This was mostly
the question, and in fact I mostly asked about audit_bprm() in 0/1.

However,
Post by Steve Grubb
Post by Oleg Nesterov
Btw. audit looks unmaintained... if you are going to take care of
this code, perhaps you can look at
http://marc.info/?l=linux-kernel&m=137589907108485
http://marc.info/?l=linux-kernel&m=137590271809664
You don't want to clear the TIF audit flag when context == NULL. What that will
do is make a bunch of inauditable processes. There are times when audit is
disabled and then re-enabled later. If the flag gets cleared, then a task's
syscall will never enter the auditing framework from kernel/entry_64.S.
That flag is 0 when auditing has never ever been enabled. If auditing is
enabled, it should always be a 1 unless the task filter has determined that
this process should not be audited ever. In practice, this is almost never
used. But ensuring the TIF_SYSCALL_AUDIT set to 1 on all processes is why we
have the boot argument. Not setting audit=1 on the boot arguments means that
any process running before the audit daemon enables auditing can never ever be
audited because the only place its set is when processes are cloned.
Then why audit_alloc() doesn't set TIF_SYSCALL_AUDIT unconditionally?

And I do not understand "when context == NULL" above. Say, audit_syscall_entry()
does nothing if !audit_context, and nobody except copy_process() does
audit_alloc(). So why do we need to trigger the audit's paths if it is NULL?
Post by Steve Grubb
Hope this clears up the use. NAK to the patch, it'll break auditing.
Not really, but thanks for your reply anyway.

Oleg.
Oleg Nesterov
2013-09-10 17:20:33 UTC
Permalink
Post by Oleg Nesterov
First of all, I do not pretend I understand this code. This was mostly
the question, and in fact I mostly asked about audit_bprm() in 0/1.
However,
Post by Steve Grubb
Post by Oleg Nesterov
Btw. audit looks unmaintained... if you are going to take care of
this code, perhaps you can look at
http://marc.info/?l=linux-kernel&m=137589907108485
http://marc.info/?l=linux-kernel&m=137590271809664
You don't want to clear the TIF audit flag when context == NULL. What that will
do is make a bunch of inauditable processes. There are times when audit is
disabled and then re-enabled later. If the flag gets cleared, then a task's
syscall will never enter the auditing framework from kernel/entry_64.S.
That flag is 0 when auditing has never ever been enabled. If auditing is
enabled, it should always be a 1 unless the task filter has determined that
this process should not be audited ever. In practice, this is almost never
used. But ensuring the TIF_SYSCALL_AUDIT set to 1 on all processes is why we
have the boot argument. Not setting audit=1 on the boot arguments means that
any process running before the audit daemon enables auditing can never ever be
audited because the only place its set is when processes are cloned.
Then why audit_alloc() doesn't set TIF_SYSCALL_AUDIT unconditionally?
And I do not understand "when context == NULL" above. Say, audit_syscall_entry()
does nothing if !audit_context, and nobody except copy_process() does
audit_alloc(). So why do we need to trigger the audit's paths if it is NULL?
Post by Steve Grubb
Hope this clears up the use. NAK to the patch, it'll break auditing.
Not really, but thanks for your reply anyway.
So, Steve, do you still think that patch was wrong? Attached below
just in case.

Oleg.

[PATCH 1/1] audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_context

If audit_filter_task() nacks the new thread it makes sense
to clear TIF_SYSCALL_AUDIT which can be copied from parent
by dup_task_struct().

A wrong TIF_SYSCALL_AUDIT is not really bad, but it triggers
the "slow" audit paths in entry.S.

Signed-off-by: Oleg Nesterov <***@redhat.com>
---
kernel/auditsc.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 9845cb3..95293ab 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -943,8 +943,10 @@ int audit_alloc(struct task_struct *tsk)
return 0; /* Return if not auditing. */

state = audit_filter_task(tsk, &key);
- if (state == AUDIT_DISABLED)
+ if (state == AUDIT_DISABLED) {
+ clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
+ }

if (!(context = audit_alloc_context(state))) {
kfree(key);
--
1.5.5.1
Steve Grubb
2013-09-13 18:42:56 UTC
Permalink
Post by Oleg Nesterov
Post by Oleg Nesterov
First of all, I do not pretend I understand this code. This was mostly
the question, and in fact I mostly asked about audit_bprm() in 0/1.
However,
Post by Steve Grubb
Post by Oleg Nesterov
Btw. audit looks unmaintained... if you are going to take care of
this code, perhaps you can look at
http://marc.info/?l=linux-kernel&m=137589907108485
http://marc.info/?l=linux-kernel&m=137590271809664
You don't want to clear the TIF audit flag when context == NULL. What
that will do is make a bunch of inauditable processes. There are times
when audit is disabled and then re-enabled later. If the flag gets
cleared, then a task's syscall will never enter the auditing framework
from kernel/entry_64.S.
That flag is 0 when auditing has never ever been enabled. If auditing is
enabled, it should always be a 1 unless the task filter has determined that
this process should not be audited ever. In practice, this is almost never
used. But ensuring the TIF_SYSCALL_AUDIT set to 1 on all processes is
why we have the boot argument. Not setting audit=1 on the boot
arguments means that any process running before the audit daemon
enables auditing can never ever be audited because the only place its
set is when processes are cloned.>
Then why audit_alloc() doesn't set TIF_SYSCALL_AUDIT unconditionally?
And I do not understand "when context == NULL" above. Say,
audit_syscall_entry() does nothing if !audit_context, and nobody except
copy_process() does audit_alloc(). So why do we need to trigger the
audit's paths if it is NULL?>
Post by Steve Grubb
Hope this clears up the use. NAK to the patch, it'll break auditing.
Not really, but thanks for your reply anyway.
So, Steve, do you still think that patch was wrong? Attached below
just in case.
I think this looks OK. If the task filter NACK's auditing the process, then
clearing the flag is probably correct. I have design notes from back around the
2.6.7 kernel saying this was the intention.

ACK.

-Steve
Post by Oleg Nesterov
[PATCH 1/1] audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_context
If audit_filter_task() nacks the new thread it makes sense
to clear TIF_SYSCALL_AUDIT which can be copied from parent
by dup_task_struct().
A wrong TIF_SYSCALL_AUDIT is not really bad, but it triggers
the "slow" audit paths in entry.S.
---
kernel/auditsc.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 9845cb3..95293ab 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -943,8 +943,10 @@ int audit_alloc(struct task_struct *tsk)
return 0; /* Return if not auditing. */
state = audit_filter_task(tsk, &key);
- if (state == AUDIT_DISABLED)
+ if (state == AUDIT_DISABLED) {
+ clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
+ }
if (!(context = audit_alloc_context(state))) {
kfree(key);
Oleg Nesterov
2013-09-14 18:10:01 UTC
Permalink
Post by Steve Grubb
Post by Oleg Nesterov
So, Steve, do you still think that patch was wrong? Attached below
just in case.
I think this looks OK. If the task filter NACK's auditing the process, then
clearing the flag is probably correct. I have design notes from back around the
2.6.7 kernel saying this was the intention.
Then I do not really understand your previous email... Nevermind ;)
Post by Steve Grubb
ACK.
Thanks. I'll resend this patch with your ack applied.

Oleg.
Steve Grubb
2013-09-13 18:28:17 UTC
Permalink
Post by Oleg Nesterov
Sorry for delay, vacation.
First of all, I do not pretend I understand this code. This was mostly
the question, and in fact I mostly asked about audit_bprm() in 0/1.
However,
Post by Steve Grubb
Post by Oleg Nesterov
Btw. audit looks unmaintained... if you are going to take care of
this code, perhaps you can look at
http://marc.info/?l=linux-kernel&m=137589907108485
http://marc.info/?l=linux-kernel&m=137590271809664
You don't want to clear the TIF audit flag when context == NULL. What that
will do is make a bunch of inauditable processes. There are times when
audit is disabled and then re-enabled later. If the flag gets cleared,
then a task's syscall will never enter the auditing framework from
kernel/entry_64.S.
That flag is 0 when auditing has never ever been enabled. If auditing is
enabled, it should always be a 1 unless the task filter has determined that
this process should not be audited ever. In practice, this is almost never
used. But ensuring the TIF_SYSCALL_AUDIT set to 1 on all processes is why
we have the boot argument. Not setting audit=1 on the boot arguments
means that any process running before the audit daemon enables auditing
can never ever be audited because the only place its set is when
processes are cloned.
Then why audit_alloc() doesn't set TIF_SYSCALL_AUDIT unconditionally?
The code I'm looking at does right at the end of the function.
Post by Oleg Nesterov
And I do not understand "when context == NULL" above. Say,
audit_syscall_entry() does nothing if !audit_context, and nobody except
copy_process() does audit_alloc(). So why do we need to trigger the audit's
paths if it is NULL?
Because if you enter the audit framework, that means auditing has been turned
on at some point in the past, and could be turned back on at some point in the
future. If auditing has never been enabled, then you never enter the audit
framework at all. If the context is NULL, then audit is not currently enabled.

-Steve
Oleg Nesterov
2013-09-14 18:08:52 UTC
Permalink
Post by Steve Grubb
Post by Oleg Nesterov
Then why audit_alloc() doesn't set TIF_SYSCALL_AUDIT unconditionally?
The code I'm looking at does right at the end of the function.
The code I'm looking at does right at the end too ;) but it also
returns at the start if audit_filter_task() returns AUDIT_DISABLED.
Post by Steve Grubb
Post by Oleg Nesterov
And I do not understand "when context == NULL" above. Say,
audit_syscall_entry() does nothing if !audit_context, and nobody except
copy_process() does audit_alloc(). So why do we need to trigger the audit's
paths if it is NULL?
Because if you enter the audit framework,
framework? TIF_SYSCALL_AUDIT has only meaning in entry.S, we need it
to ensure that the audited task can't miss audit_syscall_*().
Post by Steve Grubb
that means auditing has been turned
on at some point in the past, and could be turned back on at some point in the
future.
And this will change nothing, afaics (wrt TIF_SYSCALL_AUDIT).

Oleg.
Richard Guy Briggs
2013-08-20 21:32:04 UTC
Permalink
It doesn't make any sense to recallers to pass in a non-const struct
task so update the function signatures to only require a const struct
task.

(informed by ebiederman's c76b2526)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 44 ++++++++++++++++++++++----------------------
kernel/pid.c | 4 ++--
2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 46e739d..a585b51 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1423,12 +1423,12 @@ static inline void set_numabalancing_state(bool enabled)
}
#endif

-static inline struct pid *task_pid(struct task_struct *task)
+static inline struct pid *task_pid(const struct task_struct *task)
{
return task->pids[PIDTYPE_PID].pid;
}

-static inline struct pid *task_tgid(struct task_struct *task)
+static inline struct pid *task_tgid(const struct task_struct *task)
{
return task->group_leader->pids[PIDTYPE_PID].pid;
}
@@ -1438,12 +1438,12 @@ static inline struct pid *task_tgid(struct task_struct *task)
* the result of task_pgrp/task_session even if task == current,
* we can race with another thread doing sys_setsid/sys_setpgid.
*/
-static inline struct pid *task_pgrp(struct task_struct *task)
+static inline struct pid *task_pgrp(const struct task_struct *task)
{
return task->group_leader->pids[PIDTYPE_PGID].pid;
}

-static inline struct pid *task_session(struct task_struct *task)
+static inline struct pid *task_session(const struct task_struct *task)
{
return task->group_leader->pids[PIDTYPE_SID].pid;
}
@@ -1468,50 +1468,50 @@ struct pid_namespace;
*
* see also pid_nr() etc in include/linux/pid.h
*/
-pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
+pid_t __task_pid_nr_ns(const struct task_struct *task, enum pid_type type,
struct pid_namespace *ns);

-static inline pid_t task_pid_nr(struct task_struct *tsk)
+static inline pid_t task_pid_nr(const struct task_struct *tsk)
{
return pid_nr(task_pid(tsk));
}

-static inline pid_t task_pid_nr_ns(struct task_struct *tsk,
+static inline pid_t task_pid_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PID, ns);
}

-static inline pid_t task_pid_nr_init_ns(struct task_struct *tsk)
+static inline pid_t task_pid_nr_init_ns(const struct task_struct *tsk)
{
return task_pid_nr_ns(tsk, &init_pid_ns);
}

-static inline pid_t task_pid_vnr(struct task_struct *tsk)
+static inline pid_t task_pid_vnr(const struct task_struct *tsk)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PID, NULL);
}


-static inline pid_t task_tgid_nr(struct task_struct *tsk)
+static inline pid_t task_tgid_nr(const struct task_struct *tsk)
{
return pid_nr(task_tgid(tsk));
}

-pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);
+pid_t task_tgid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns);

-static inline pid_t task_tgid_nr_init_ns(struct task_struct *tsk)
+static inline pid_t task_tgid_nr_init_ns(const struct task_struct *tsk)
{
return task_tgid_nr_ns(tsk, &init_pid_ns);
}

-static inline pid_t task_tgid_vnr(struct task_struct *tsk)
+static inline pid_t task_tgid_vnr(const struct task_struct *tsk)
{
return pid_vnr(task_tgid(tsk));
}


-static inline pid_t task_ppid_nr_ns(struct task_struct *tsk,
+static inline pid_t task_ppid_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
pid_t pid;
@@ -1523,37 +1523,37 @@ static inline pid_t task_ppid_nr_ns(struct task_struct *tsk,
return pid;
}

-static inline pid_t task_ppid_nr_init_ns(struct task_struct *tsk)
+static inline pid_t task_ppid_nr_init_ns(const struct task_struct *tsk)
{
return task_ppid_nr_ns(tsk, &init_pid_ns);
}


-static inline pid_t task_pgrp_nr_ns(struct task_struct *tsk,
+static inline pid_t task_pgrp_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PGID, ns);
}

-static inline pid_t task_pgrp_vnr(struct task_struct *tsk)
+static inline pid_t task_pgrp_vnr(const struct task_struct *tsk)
{
return __task_pid_nr_ns(tsk, PIDTYPE_PGID, NULL);
}


-static inline pid_t task_session_nr_ns(struct task_struct *tsk,
+static inline pid_t task_session_nr_ns(const struct task_struct *tsk,
struct pid_namespace *ns)
{
return __task_pid_nr_ns(tsk, PIDTYPE_SID, ns);
}

-static inline pid_t task_session_vnr(struct task_struct *tsk)
+static inline pid_t task_session_vnr(const struct task_struct *tsk)
{
return __task_pid_nr_ns(tsk, PIDTYPE_SID, NULL);
}

/* obsolete, do not use */
-static inline pid_t task_pgrp_nr(struct task_struct *tsk)
+static inline pid_t task_pgrp_nr(const struct task_struct *tsk)
{
return task_pgrp_nr_ns(tsk, &init_pid_ns);
}
@@ -1566,7 +1566,7 @@ static inline pid_t task_pgrp_nr(struct task_struct *tsk)
* If pid_alive fails, then pointers within the task structure
* can be stale and must not be dereferenced.
*/
-static inline int pid_alive(struct task_struct *p)
+static inline int pid_alive(const struct task_struct *p)
{
return p->pids[PIDTYPE_PID].pid != NULL;
}
@@ -1577,7 +1577,7 @@ static inline int pid_alive(struct task_struct *p)
*
* Check if a task structure is the first user space task the kernel created.
*/
-static inline int is_global_init(struct task_struct *tsk)
+static inline int is_global_init(const struct task_struct *tsk)
{
return task_pid_nr(tsk) == 1;
}
diff --git a/kernel/pid.c b/kernel/pid.c
index 66505c1..17755ae 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -506,7 +506,7 @@ pid_t pid_vnr(struct pid *pid)
}
EXPORT_SYMBOL_GPL(pid_vnr);

-pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
+pid_t __task_pid_nr_ns(const struct task_struct *task, enum pid_type type,
struct pid_namespace *ns)
{
pid_t nr = 0;
@@ -525,7 +525,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
}
EXPORT_SYMBOL(__task_pid_nr_ns);

-pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
+pid_t task_tgid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns)
{
return pid_nr_ns(task_tgid(tsk), ns);
}
--
1.7.1
Richard Guy Briggs
2013-12-23 22:27:34 UTC
Permalink
This patchset is a revival of some of Eric Biederman's work to make audit
pid-namespace-safe.

In a couple of places, audit was printing PIDs in the task's pid namespace
rather than relative to the audit daemon's pid namespace, which currently is
init_pid_ns.

It also allows processes to log audit user messages in their own pid
namespaces, which was not previously permitted. Please see:
https://bugzilla.redhat.com/show_bug.cgi?id=947530
https://bugs.launchpad.net/ubuntu/+source/vsftpd/+bug/1160372
https://bugzilla.novell.com/show_bug.cgi?id=786024

Part of the cleanup here involves deprecating task->pid and task->tgid, which
should be accessed using their respective helper functions.

See: https://lkml.org/lkml/2013/8/20/638

Richard Guy Briggs (5):
pid: get pid_t ppid of task in init_pid_ns
audit: convert PPIDs to the inital PID namespace.
audit: store audit_pid as a struct pid pointer
audit: anchor all pid references in the initial pid namespace
audit: allow user processes to log from another PID namespace

drivers/tty/tty_audit.c | 3 +-
include/linux/sched.h | 24 +++++++++++++++
kernel/audit.c | 54 ++++++++++++++++++++++++----------
kernel/audit.h | 4 +-
kernel/auditfilter.c | 17 ++++++++++-
kernel/auditsc.c | 24 ++++++++-------
security/apparmor/audit.c | 2 +-
security/integrity/integrity_audit.c | 2 +-
security/lsm_audit.c | 11 ++++--
9 files changed, 104 insertions(+), 37 deletions(-)
Richard Guy Briggs
2014-02-19 20:57:38 UTC
Permalink
This patchset is a revival of some of Eric Biederman's work to make audit
pid-namespace-safe.

In a couple of places, audit was printing PIDs in the task's pid namespace
rather than relative to the audit daemon's pid namespace, which currently is
init_pid_ns.

It also allows processes to log audit user messages in their own pid
namespaces, which was not previously permitted. Please see:
https://bugzilla.redhat.com/show_bug.cgi?id=947530
https://bugs.launchpad.net/ubuntu/+source/vsftpd/+bug/1160372
https://bugzilla.novell.com/show_bug.cgi?id=786024

Part of the cleanup here involves deprecating task->pid and task->tgid, which
should be accessed using their respective helper functions.

See: https://lkml.org/lkml/2013/8/20/638


Richard Guy Briggs (5):
pid: get pid_t ppid of task in init_pid_ns
audit: convert PPIDs to the inital PID namespace.
audit: store audit_pid as a struct pid pointer
audit: anchor all pid references in the initial pid namespace
audit: allow user processes to log from another PID namespace

drivers/tty/tty_audit.c | 3 +-
include/linux/sched.h | 18 ++++++++++
kernel/audit.c | 59 ++++++++++++++++++++++++----------
kernel/audit.h | 4 +-
kernel/auditfilter.c | 17 +++++++++-
kernel/auditsc.c | 24 +++++++------
security/integrity/integrity_audit.c | 2 +-
security/lsm_audit.c | 11 ++++--
8 files changed, 101 insertions(+), 37 deletions(-)
Richard Guy Briggs
2014-02-19 20:57:39 UTC
Permalink
Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup
of the PPID (real_parent's pid_t) of a process, including rcu locking, in the
arbitrary and init_pid_ns.
This provides an alternative to sys_getppid(), which is relative to the child
process' pid namespace.

(informed by ebiederman's 6c621b7e)
Cc: ***@vger.kernel.org
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 53f97eb..685326f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1561,6 +1561,24 @@ static inline pid_t task_tgid_vnr(struct task_struct *tsk)
}


+static int pid_alive(const struct task_struct *p);
+static inline pid_t task_ppid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns)
+{
+ pid_t pid = 0;
+
+ rcu_read_lock();
+ if (pid_alive(tsk))
+ pid = task_tgid_nr_ns(rcu_dereference(tsk->real_parent), ns);
+ rcu_read_unlock();
+
+ return pid;
+}
+
+static inline pid_t task_ppid_nr(const struct task_struct *tsk)
+{
+ return task_ppid_nr_ns(tsk, &init_pid_ns);
+}
+
static inline pid_t task_pgrp_nr_ns(struct task_struct *tsk,
struct pid_namespace *ns)
{
--
1.7.1
Richard Guy Briggs
2014-02-19 20:57:40 UTC
Permalink
sys_getppid() returns the parent pid of the current process in its own pid
namespace. Since audit filters are based in the init pid namespace, a process
could avoid a filter or trigger an unintended one by being in an alternate pid
namespace or log meaningless information.

Switch to task_ppid_nr() for PPIDs to anchor all audit filters in the
init_pid_ns.

(informed by ebiederman's 6c621b7e)
Cc: ***@vger.kernel.org
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 4 ++--
kernel/auditsc.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 34c5a23..f5ea718 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1818,10 +1818,10 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
spin_unlock_irq(&tsk->sighand->siglock);

audit_log_format(ab,
- " ppid=%ld pid=%d auid=%u uid=%u gid=%u"
+ " ppid=%d pid=%d auid=%u uid=%u gid=%u"
" euid=%u suid=%u fsuid=%u"
" egid=%u sgid=%u fsgid=%u tty=%s ses=%u",
- sys_getppid(),
+ task_ppid_nr(tsk),
tsk->pid,
from_kuid(&init_user_ns, audit_get_loginuid(tsk)),
from_kuid(&init_user_ns, cred->uid),
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 0c3bf79..b909715 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -461,7 +461,7 @@ static int audit_filter_rules(struct task_struct *tsk,
case AUDIT_PPID:
if (ctx) {
if (!ctx->ppid)
- ctx->ppid = sys_getppid();
+ ctx->ppid = task_ppid_nr(tsk);
result = audit_comparator(ctx->ppid, f->op, f->val);
}
break;
--
1.7.1
Richard Guy Briggs
2014-02-19 20:57:41 UTC
Permalink
- PID will be reported in the relevant querying PID namespace.

- Refuse to change the current audit_pid if the new value does not
point to an existing process.

- Refuse to set the current audit_pid if the new value is not its own PID
(unless it is being unset).

- Convert audit_pid into the initial pid namespace for reports

(informed by ebiederman's 5bf431da)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 40 ++++++++++++++++++++++++++++++----------
kernel/audit.h | 4 ++--
kernel/auditsc.c | 6 +++---
3 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index f5ea718..015003c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -95,7 +95,7 @@ static u32 audit_failure = AUDIT_FAIL_PRINTK;
* contains the pid of the auditd process and audit_nlk_portid contains
* the portid to use to send netlink messages to that process.
*/
-int audit_pid;
+struct pid *audit_pid;
static __u32 audit_nlk_portid;

/* If audit_rate_limit is non-zero, limit the rate of sending audit records
@@ -411,9 +411,12 @@ static void kauditd_send_skb(struct sk_buff *skb)
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
if (audit_pid) {
- pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
+ struct pid *temp_pid = audit_pid;
+ pr_err("*NO* daemon at audit_pid=%d\n", pid_nr(audit_pid));
audit_log_lost("auditd disappeared\n");
- audit_pid = 0;
+ audit_pid = NULL;
+ smp_mb();
+ put_pid(temp_pid);
audit_sock = NULL;
}
/* we might get lucky and get this in the next auditd */
@@ -787,7 +790,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
memset(&s, 0, sizeof(s));
s.enabled = audit_enabled;
s.failure = audit_failure;
- s.pid = audit_pid;
+ s.pid = pid_vnr(audit_pid);
s.rate_limit = audit_rate_limit;
s.backlog_limit = audit_backlog_limit;
s.lost = atomic_read(&audit_lost);
@@ -814,12 +817,29 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return err;
}
if (s.mask & AUDIT_STATUS_PID) {
- int new_pid = s.pid;
-
- if ((!new_pid) && (task_tgid_vnr(current) != audit_pid))
- return -EACCES;
+ struct pid *new_pid = NULL;
+
+ if (s.pid) {
+ rcu_read_lock();
+ new_pid = get_pid(task_tgid(find_task_by_vpid(s.pid)));
+ rcu_read_unlock();
+ if (!new_pid)
+ return -ESRCH;
+ /* check that non-zero pid it is trying to set
+ * is its own*/
+ if (new_pid != task_tgid(current)) {
+ put_pid(new_pid);
+ return -EPERM;
+ }
+ } else {
+ /* check that it isn't orphaning another process */
+ if (task_pid_vnr(current) != pid_vnr(audit_pid))
+ return -EACCES;
+ }
if (audit_enabled != AUDIT_OFF)
- audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
+ audit_log_config_change("audit_pid", pid_nr(new_pid),
+ pid_nr(audit_pid), 1);
+ put_pid(audit_pid);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
audit_sock = skb->sk;
@@ -1316,7 +1336,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
return NULL;

if (gfp_mask & __GFP_WAIT) {
- if (audit_pid && audit_pid == current->pid)
+ if (audit_pid == task_tgid(current))
gfp_mask &= ~__GFP_WAIT;
else
reserve = 0;
diff --git a/kernel/audit.h b/kernel/audit.h
index 57cc64d..e0924bc 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -221,7 +221,7 @@ extern void audit_log_name(struct audit_context *context,
struct audit_names *n, struct path *path,
int record_num, int *call_panic);

-extern int audit_pid;
+extern struct pid *audit_pid;

#define AUDIT_INODE_BUCKETS 32
extern struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
@@ -318,7 +318,7 @@ extern u32 audit_sig_sid;
extern int __audit_signal_info(int sig, struct task_struct *t);
static inline int audit_signal_info(int sig, struct task_struct *t)
{
- if (unlikely((audit_pid && t->tgid == audit_pid) ||
+ if (unlikely((audit_pid && task_tgid(t) == audit_pid) ||
(audit_signals && !audit_dummy_context())))
return __audit_signal_info(sig, t);
return 0;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index b909715..a747afb 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -734,7 +734,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
struct audit_entry *e;
enum audit_state state;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (audit_pid && task_tgid(tsk) == audit_pid)
return AUDIT_DISABLED;

rcu_read_lock();
@@ -795,7 +795,7 @@ void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx)
{
struct audit_names *n;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (audit_pid && task_tgid(tsk) == audit_pid)
return;

rcu_read_lock();
@@ -2231,7 +2231,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
struct audit_context *ctx = tsk->audit_context;
kuid_t uid = current_uid(), t_uid = task_uid(t);

- if (audit_pid && t->tgid == audit_pid) {
+ if (audit_pid && task_tgid(t) == audit_pid) {
if (sig == SIGTERM || sig == SIGHUP || sig == SIGUSR1 || sig == SIGUSR2) {
audit_sig_pid = tsk->pid;
if (uid_valid(tsk->loginuid))
--
1.7.1
Richard Guy Briggs
2014-02-19 20:57:42 UTC
Permalink
Store and log all PIDs with reference to the initial PID namespace and
use the access functions task_pid_nr() and task_tgid_nr() for task->pid
and task->tgid.

Cc: "Eric W. Biederman" <***@xmission.com>
(informed by ebiederman's c776b5d2)
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
drivers/tty/tty_audit.c | 3 ++-
kernel/audit.c | 5 +++--
kernel/auditfilter.c | 17 ++++++++++++++++-
kernel/auditsc.c | 16 +++++++++-------
security/integrity/integrity_audit.c | 2 +-
security/lsm_audit.c | 11 +++++++----
6 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
index b0e5401..90ca082 100644
--- a/drivers/tty/tty_audit.c
+++ b/drivers/tty/tty_audit.c
@@ -65,6 +65,7 @@ static void tty_audit_log(const char *description, int major, int minor,
{
struct audit_buffer *ab;
struct task_struct *tsk = current;
+ pid_t pid = task_pid_nr(tsk);
uid_t uid = from_kuid(&init_user_ns, task_uid(tsk));
uid_t loginuid = from_kuid(&init_user_ns, audit_get_loginuid(tsk));
unsigned int sessionid = audit_get_sessionid(tsk);
@@ -74,7 +75,7 @@ static void tty_audit_log(const char *description, int major, int minor,
char name[sizeof(tsk->comm)];

audit_log_format(ab, "%s pid=%u uid=%u auid=%u ses=%u major=%d"
- " minor=%d comm=", description, tsk->pid, uid,
+ " minor=%d comm=", description, pid, uid,
loginuid, sessionid, major, minor);
get_task_comm(name, tsk);
audit_log_untrustedstring(ab, name);
diff --git a/kernel/audit.c b/kernel/audit.c
index 015003c..6d33246 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -648,6 +648,7 @@ static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
{
int rc = 0;
uid_t uid = from_kuid(&init_user_ns, current_uid());
+ pid_t pid = task_tgid_nr(current);

if (!audit_enabled && msg_type != AUDIT_USER_AVC) {
*ab = NULL;
@@ -657,7 +658,7 @@ static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
*ab = audit_log_start(NULL, GFP_KERNEL, msg_type);
if (unlikely(!*ab))
return rc;
- audit_log_format(*ab, "pid=%d uid=%u", task_tgid_vnr(current), uid);
+ audit_log_format(*ab, "pid=%d uid=%u", pid, uid);
audit_log_session_info(*ab);
audit_log_task_context(*ab);

@@ -1842,7 +1843,7 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
" euid=%u suid=%u fsuid=%u"
" egid=%u sgid=%u fsgid=%u tty=%s ses=%u",
task_ppid_nr(tsk),
- tsk->pid,
+ task_pid_nr(tsk),
from_kuid(&init_user_ns, audit_get_loginuid(tsk)),
from_kuid(&init_user_ns, cred->uid),
from_kgid(&init_user_ns, cred->gid),
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 3152d1a..2f2f69e 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -431,6 +431,19 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
f->val = 0;
}

+ if ((f->type == AUDIT_PID) || (f->type == AUDIT_PPID)) {
+ struct pid *pid;
+ rcu_read_lock();
+ pid = find_vpid(f->val);
+ if (!pid) {
+ rcu_read_unlock();
+ err = -ESRCH;
+ goto exit_free;
+ }
+ f->val = pid_nr(pid);
+ rcu_read_unlock();
+ }
+
err = audit_field_valid(entry, f);
if (err)
goto exit_free;
@@ -1238,12 +1251,14 @@ static int audit_filter_user_rules(struct audit_krule *rule, int type,

for (i = 0; i < rule->field_count; i++) {
struct audit_field *f = &rule->fields[i];
+ pid_t pid;
int result = 0;
u32 sid;

switch (f->type) {
case AUDIT_PID:
- result = audit_comparator(task_pid_vnr(current), f->op, f->val);
+ pid = task_pid_nr(current);
+ result = audit_comparator(pid, f->op, f->val);
break;
case AUDIT_UID:
result = audit_uid_comparator(current_uid(), f->op, f->uid);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index a747afb..f52379b 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -453,10 +453,12 @@ static int audit_filter_rules(struct task_struct *tsk,
struct audit_field *f = &rule->fields[i];
struct audit_names *n;
int result = 0;
+ pid_t pid;

switch (f->type) {
case AUDIT_PID:
- result = audit_comparator(tsk->pid, f->op, f->val);
+ pid = task_pid_nr(tsk);
+ result = audit_comparator(pid, f->op, f->val);
break;
case AUDIT_PPID:
if (ctx) {
@@ -1984,7 +1986,7 @@ static void audit_log_set_loginuid(kuid_t koldloginuid, kuid_t kloginuid,
audit_log_format(ab, "pid=%d uid=%u"
" old-auid=%u new-auid=%u old-ses=%u new-ses=%u"
" res=%d",
- current->pid, uid,
+ task_pid_nr(current), uid,
oldloginuid, loginuid, oldsessionid, sessionid,
!rc);
audit_log_end(ab);
@@ -2208,7 +2210,7 @@ void __audit_ptrace(struct task_struct *t)
{
struct audit_context *context = current->audit_context;

- context->target_pid = t->pid;
+ context->target_pid = task_pid_nr(t);
context->target_auid = audit_get_loginuid(t);
context->target_uid = task_uid(t);
context->target_sessionid = audit_get_sessionid(t);
@@ -2233,7 +2235,7 @@ int __audit_signal_info(int sig, struct task_struct *t)

if (audit_pid && task_tgid(t) == audit_pid) {
if (sig == SIGTERM || sig == SIGHUP || sig == SIGUSR1 || sig == SIGUSR2) {
- audit_sig_pid = tsk->pid;
+ audit_sig_pid = task_pid_nr(tsk);
if (uid_valid(tsk->loginuid))
audit_sig_uid = tsk->loginuid;
else
@@ -2247,7 +2249,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
/* optimize the common case by putting first signal recipient directly
* in audit_context */
if (!ctx->target_pid) {
- ctx->target_pid = t->tgid;
+ ctx->target_pid = task_tgid_nr(t);
ctx->target_auid = audit_get_loginuid(t);
ctx->target_uid = t_uid;
ctx->target_sessionid = audit_get_sessionid(t);
@@ -2268,7 +2270,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
}
BUG_ON(axp->pid_count >= AUDIT_AUX_PIDS);

- axp->target_pid[axp->pid_count] = t->tgid;
+ axp->target_pid[axp->pid_count] = task_tgid_nr(t);
axp->target_auid[axp->pid_count] = audit_get_loginuid(t);
axp->target_uid[axp->pid_count] = t_uid;
axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
@@ -2368,7 +2370,7 @@ static void audit_log_task(struct audit_buffer *ab)
from_kgid(&init_user_ns, gid),
sessionid);
audit_log_task_context(ab);
- audit_log_format(ab, " pid=%d comm=", current->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr(current));
audit_log_untrustedstring(ab, current->comm);
if (mm) {
down_read(&mm->mmap_sem);
diff --git a/security/integrity/integrity_audit.c b/security/integrity/integrity_audit.c
index d7efb30..85253b5 100644
--- a/security/integrity/integrity_audit.c
+++ b/security/integrity/integrity_audit.c
@@ -39,7 +39,7 @@ void integrity_audit_msg(int audit_msgno, struct inode *inode,

ab = audit_log_start(current->audit_context, GFP_KERNEL, audit_msgno);
audit_log_format(ab, "pid=%d uid=%u auid=%u ses=%u",
- current->pid,
+ task_pid_nr(current),
from_kuid(&init_user_ns, current_cred()->uid),
from_kuid(&init_user_ns, audit_get_loginuid(current)),
audit_get_sessionid(current));
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 9a62045..69fdf3b 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -220,7 +220,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
*/
BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2);

- audit_log_format(ab, " pid=%d comm=", tsk->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr(tsk));
audit_log_untrustedstring(ab, tsk->comm);

switch (a->type) {
@@ -278,9 +278,12 @@ static void dump_common_audit_data(struct audit_buffer *ab,
}
case LSM_AUDIT_DATA_TASK:
tsk = a->u.tsk;
- if (tsk && tsk->pid) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
- audit_log_untrustedstring(ab, tsk->comm);
+ if (tsk) {
+ pid_t pid = task_pid_nr(tsk);
+ if (pid) {
+ audit_log_format(ab, " pid=%d comm=", pid);
+ audit_log_untrustedstring(ab, tsk->comm);
+ }
}
break;
case LSM_AUDIT_DATA_NET:
--
1.7.1
Richard Guy Briggs
2014-02-19 20:57:43 UTC
Permalink
Still only permit the audit logging daemon and control to operate from the
initial PID namespace, but allow processes to log from another PID namespace.

Cc: "Eric W. Biederman" <***@xmission.com>
(informed by ebiederman's c776b5d2)

Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 6d33246..4cd4011 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -606,9 +606,8 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
{
int err = 0;

- /* Only support the initial namespaces for now. */
- if ((current_user_ns() != &init_user_ns) ||
- (task_active_pid_ns(current) != &init_pid_ns))
+ /* Only support initial user namespace for now. */
+ if ((current_user_ns() != &init_user_ns))
return -EPERM;

switch (msg_type) {
@@ -628,6 +627,11 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
+ /* Only support auditd and auditctl in initial pid namespace
+ * for now. */
+ if ((task_active_pid_ns(current) != &init_pid_ns))
+ return -EPERM;
+
if (!capable(CAP_AUDIT_CONTROL))
err = -EPERM;
break;
--
1.7.1
Richard Guy Briggs
2013-12-23 22:27:35 UTC
Permalink
Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup
of the PPID (real_parent's pid_t) of a process, including rcu locking, in the
arbitrary and init_pid_ns.
This provides an alternative to sys_getppid(), which is relative to the child
process' pid namespace.

(informed by ebiederman's 6c621b7e)
Cc: ***@vger.kernel.org
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
include/linux/sched.h | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e27baee..7bf5ab2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1497,6 +1497,30 @@ static inline pid_t task_tgid_vnr(struct task_struct *tsk)
}


+static inline pid_t task_ppid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
+{
+ pid_t pid;
+
+ rcu_read_lock();
+ pid = pid_alive(tsk) ?
+ task_pid_nr_ns(rcu_dereference(tsk->real_parent), ns) : 0;
+ rcu_read_unlock();
+
+ return pid;
+}
+
+static inline pid_t task_ppid_nr(struct task_struct *tsk)
+{
+ pid_t pid;
+
+ rcu_read_lock();
+ pid = pid_alive(tsk) ?
+ task_pid_nr(rcu_dereference(tsk->real_parent)) : 0;
+ rcu_read_unlock();
+
+ return pid;
+}
+
static inline pid_t task_pgrp_nr_ns(struct task_struct *tsk,
struct pid_namespace *ns)
{
--
1.7.1
Oleg Nesterov
2013-12-30 17:04:31 UTC
Permalink
Post by Richard Guy Briggs
+static inline pid_t task_ppid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
+{
+ pid_t pid;
+
+ rcu_read_lock();
+ pid = pid_alive(tsk) ?
+ task_pid_nr_ns(rcu_dereference(tsk->real_parent), ns) : 0;
+ rcu_read_unlock();
+
+ return pid;
+}
I do not really mind, but perhaps

pid_t pid = 0;

rcu_read_lock();
if (pid_alive(task))
pid = task_pid_nr_ns(rcu_dereference(tsk->real_parent);
rcu_read_unlock();

return pid;

looks a bit cleaner.
Post by Richard Guy Briggs
+static inline pid_t task_ppid_nr(struct task_struct *tsk)
+{
+ pid_t pid;
+
+ rcu_read_lock();
+ pid = pid_alive(tsk) ?
+ task_pid_nr(rcu_dereference(tsk->real_parent)) : 0;
+ rcu_read_unlock();
+
+ return pid;
+}
It could simply do

return task_ppid_nr_ns(tsk, init_pid_ns);

but again, I won't argue.

Oleg.
Richard Guy Briggs
2013-12-23 22:27:36 UTC
Permalink
sys_getppid() returns the parent pid of the current process in its own pid
namespace. Since audit filters are based in the init pid namespace, a process
could avoid a filter or trigger an unintended one by being in an alternate pid
namespace or log meaningless information.

Switch to task_ppid_nr() for PPIDs to anchor all audit filters in the
init_pid_ns.

(informed by ebiederman's 6c621b7e)
Cc: ***@vger.kernel.org
Cc: Eric W. Biederman <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 4 ++--
kernel/auditsc.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 1b13b82..900f5d6 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1839,10 +1839,10 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
spin_unlock_irq(&tsk->sighand->siglock);

audit_log_format(ab,
- " ppid=%ld pid=%d auid=%u uid=%u gid=%u"
+ " ppid=%d pid=%d auid=%u uid=%u gid=%u"
" euid=%u suid=%u fsuid=%u"
" egid=%u sgid=%u fsgid=%u tty=%s ses=%u",
- sys_getppid(),
+ task_ppid_nr(tsk),
tsk->pid,
from_kuid(&init_user_ns, audit_get_loginuid(tsk)),
from_kuid(&init_user_ns, cred->uid),
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 10176cd..d396c8b 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -459,7 +459,7 @@ static int audit_filter_rules(struct task_struct *tsk,
case AUDIT_PPID:
if (ctx) {
if (!ctx->ppid)
- ctx->ppid = sys_getppid();
+ ctx->ppid = task_ppid_nr(tsk);
result = audit_comparator(ctx->ppid, f->op, f->val);
}
break;
--
1.7.1
Oleg Nesterov
2013-12-30 17:07:38 UTC
Permalink
Post by Richard Guy Briggs
@@ -1839,10 +1839,10 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
spin_unlock_irq(&tsk->sighand->siglock);
audit_log_format(ab,
- " ppid=%ld pid=%d auid=%u uid=%u gid=%u"
+ " ppid=%d pid=%d auid=%u uid=%u gid=%u"
" euid=%u suid=%u fsuid=%u"
" egid=%u sgid=%u fsgid=%u tty=%s ses=%u",
- sys_getppid(),
+ task_ppid_nr(tsk),
Hmm. But sys_getppid() returns tgid, not pid.

This probably means that 1/5 should use task_tgid_nr_*() ?

Note that ->real_parent is not necessarily the group leader.
Post by Richard Guy Briggs
@@ -459,7 +459,7 @@ static int audit_filter_rules(struct task_struct *tsk,
if (ctx) {
if (!ctx->ppid)
- ctx->ppid = sys_getppid();
+ ctx->ppid = task_ppid_nr(tsk);
The same.

Oleg.
Richard Guy Briggs
2013-12-23 22:27:37 UTC
Permalink
- PID will be reported in the relevant querying PID namespace.

- Refuse to change the current audit_pid if the new value does not
point to an existing process.

- Refuse to set the current audit_pid if the new value is not its own PID
(unless it is being unset).

- Convert audit_pid into the initial pid namespace for reports

(informed by ebiederman's 5bf431da)
Cc: "Eric W. Biederman" <***@xmission.com>
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 35 ++++++++++++++++++++++++++---------
kernel/audit.h | 4 ++--
kernel/auditsc.c | 6 +++---
3 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 900f5d6..48312bf 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -93,7 +93,7 @@ static int audit_failure = AUDIT_FAIL_PRINTK;
* contains the pid of the auditd process and audit_nlk_portid contains
* the portid to use to send netlink messages to that process.
*/
-int audit_pid;
+struct pid *audit_pid;
static __u32 audit_nlk_portid;

/* If audit_rate_limit is non-zero, limit the rate of sending audit records
@@ -412,9 +412,11 @@ static void kauditd_send_skb(struct sk_buff *skb)
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
if (audit_pid) {
pr_err("audit: *NO* daemon at audit_pid=%d\n",
- audit_pid);
+ pid_nr(audit_pid));
audit_log_lost("auditd disappeared\n");
- audit_pid = 0;
+ put_pid(audit_pid);
+ audit_pid = NULL;
+
audit_sock = NULL;
}
/* we might get lucky and get this in the next auditd */
@@ -787,7 +789,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
memset(&s, 0, sizeof(s));
s.enabled = audit_enabled;
s.failure = audit_failure;
- s.pid = audit_pid;
+ s.pid = pid_vnr(audit_pid);
s.rate_limit = audit_rate_limit;
s.backlog_limit = audit_backlog_limit;
s.lost = atomic_read(&audit_lost);
@@ -814,12 +816,26 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return err;
}
if (s.mask & AUDIT_STATUS_PID) {
- int new_pid = s.pid;
-
- if ((!new_pid) && (task_tgid_vnr(current) != audit_pid))
+ struct pid *new_pid = find_get_pid(s.pid);
+ if (s.pid && !new_pid)
+ return -ESRCH;
+
+ /* check that non-zero pid it is trying to set is its
+ * own*/
+ if (s.pid && (s.pid != task_pid_vnr(current)))
+ return -EPERM;
+
+ /* check that it isn't orphaning another process */
+ if ((!new_pid) &&
+ (task_tgid_vnr(current) != pid_vnr(audit_pid)))
return -EACCES;
+
if (audit_enabled != AUDIT_OFF)
- audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
+ audit_log_config_change("audit_pid",
+ pid_nr(new_pid),
+ pid_nr(audit_pid),
+ 1);
+
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
audit_sock = NETLINK_CB(skb).sk;
@@ -1331,7 +1347,8 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
return NULL;

if (gfp_mask & __GFP_WAIT) {
- if (audit_pid && audit_pid == current->pid)
+ if (pid_nr(audit_pid) == task_pid_nr(current))
+ //if (pid_task(audit_pid, PIDTYPE_PID) == current)
gfp_mask &= ~__GFP_WAIT;
else
reserve = 0;
diff --git a/kernel/audit.h b/kernel/audit.h
index 0719b45..e6d48f4 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -221,7 +221,7 @@ extern void audit_log_name(struct audit_context *context,
struct audit_names *n, struct path *path,
int record_num, int *call_panic);

-extern int audit_pid;
+extern struct pid *audit_pid;

#define AUDIT_INODE_BUCKETS 32
extern struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
@@ -318,7 +318,7 @@ extern u32 audit_sig_sid;
extern int __audit_signal_info(int sig, struct task_struct *t);
static inline int audit_signal_info(int sig, struct task_struct *t)
{
- if (unlikely((audit_pid && t->tgid == audit_pid) ||
+ if (unlikely((audit_pid && task_tgid(t) == audit_pid) ||
(audit_signals && !audit_dummy_context())))
return __audit_signal_info(sig, t);
return 0;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index d396c8b..db3cb4f 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -732,7 +732,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
struct audit_entry *e;
enum audit_state state;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (audit_pid && task_tgid(tsk) == audit_pid)
return AUDIT_DISABLED;

rcu_read_lock();
@@ -793,7 +793,7 @@ void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx)
{
struct audit_names *n;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (audit_pid && task_tgid(tsk) == audit_pid)
return;

rcu_read_lock();
@@ -2231,7 +2231,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
struct audit_context *ctx = tsk->audit_context;
kuid_t uid = current_uid(), t_uid = task_uid(t);

- if (audit_pid && t->tgid == audit_pid) {
+ if (audit_pid && task_tgid(t) == audit_pid) {
if (sig == SIGTERM || sig == SIGHUP || sig == SIGUSR1 || sig == SIGUSR2) {
audit_sig_pid = tsk->pid;
if (uid_valid(tsk->loginuid))
--
1.7.1
Oleg Nesterov
2013-12-30 17:51:03 UTC
Permalink
Post by Richard Guy Briggs
- PID will be reported in the relevant querying PID namespace.
- Refuse to change the current audit_pid if the new value does not
point to an existing process.
- Refuse to set the current audit_pid if the new value is not its own PID
(unless it is being unset).
I started to read the changelog only after I failed to understand the
patch. And it looks confusing too.

"Refuse to set the current audit_pid if the new value is not its own PID",
OK, but if the new value is the caller's pid then it should obviously
point to the existing process, current?
Post by Richard Guy Briggs
- Convert audit_pid into the initial pid namespace for reports
I can't apply this patch (and the previous one) due to multiple rejects,
I guess it depends on other changes?

In any case, this patch looks wrong.
Post by Richard Guy Briggs
@@ -412,9 +412,11 @@ static void kauditd_send_skb(struct sk_buff *skb)
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
if (audit_pid) {
pr_err("audit: *NO* daemon at audit_pid=%d\n",
- audit_pid);
+ pid_nr(audit_pid));
audit_log_lost("auditd disappeared\n");
- audit_pid = 0;
+ put_pid(audit_pid);
But this can actually free this pid. Why it is safe to use it elsewhere?
Post by Richard Guy Briggs
+ audit_pid = NULL;
This also means that every "if (audit_pid)" is racy.
Post by Richard Guy Briggs
@@ -814,12 +816,26 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return err;
}
if (s.mask & AUDIT_STATUS_PID) {
- int new_pid = s.pid;
-
- if ((!new_pid) && (task_tgid_vnr(current) != audit_pid))
+ struct pid *new_pid = find_get_pid(s.pid);
+ if (s.pid && !new_pid)
+ return -ESRCH;
can't understand... find_get_pid(s.pid) is pointless if s.pid == 0 ?

struct pid *new_pid = NULL;

if (new_pid) {
new_pid = find_get_pid(s.pid);
if (!new_pid)
return -ESRCH;
}

looks more understandable.
Post by Richard Guy Briggs
+ /* check that non-zero pid it is trying to set is its
+ * own*/
+ if (s.pid && (s.pid != task_pid_vnr(current)))
again, this looks a bit confusing and suboptimal. Imho it would be better
to add

if (new_pid != task_tgid(current))

into the "if (s.pid)" above. Hmm, actually task_pid_vnr() above looks
simply wrong, we need tgid or I am totally confused.

OTOH, if we require s.pid == task_pid_vnr(current), then why do we need
find_pid() at all?
Post by Richard Guy Briggs
+ return -EPERM;
this lacks put_pid(new_pid).
Post by Richard Guy Briggs
+ /* check that it isn't orphaning another process */
+ if ((!new_pid) &&
+ (task_tgid_vnr(current) != pid_vnr(audit_pid)))
return -EACCES;
and this can go into the "else" branch.

And I can't understand the "it isn't orphaning another process" logic...

OK, what if s.pid == 0 and audit_pid == NULL, should we fail in this case?

And I do not see how this matches "Refuse to set the current audit_pid
if the new value is not its own PID" from the changelog.
Post by Richard Guy Briggs
+
audit_pid = new_pid;
Another leak, it seems.
Post by Richard Guy Briggs
@@ -1331,7 +1347,8 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
return NULL;
if (gfp_mask & __GFP_WAIT) {
- if (audit_pid && audit_pid == current->pid)
+ if (pid_nr(audit_pid) == task_pid_nr(current))
So is this audit_pid tid or tgid? Its usage looks totally confusing.

Anyway,

if (audit_pid == task_pid(current))

(or probably task_tgid) looks much better.
Post by Richard Guy Briggs
+ //if (pid_task(audit_pid, PIDTYPE_PID) == current)
in this case you need to update Documentation/CodingStyle ;)

Oleg.
Richard Guy Briggs
2014-01-21 23:37:42 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
- PID will be reported in the relevant querying PID namespace.
- Refuse to change the current audit_pid if the new value does not
point to an existing process.
- Refuse to set the current audit_pid if the new value is not its own PID
(unless it is being unset).
I started to read the changelog only after I failed to understand the
patch. And it looks confusing too.
"Refuse to set the current audit_pid if the new value is not its own PID",
OK, but if the new value is the caller's pid then it should obviously
point to the existing process, current?
Yes, but it may be lying, which helps nobody. The value should either
be its own pid, or zero.
Post by Oleg Nesterov
Post by Richard Guy Briggs
- Convert audit_pid into the initial pid namespace for reports
I can't apply this patch (and the previous one) due to multiple rejects,
I guess it depends on other changes?
It depends on other patches in my for-next tree, but not necessarily
functionally.
Post by Oleg Nesterov
In any case, this patch looks wrong.
Post by Richard Guy Briggs
@@ -412,9 +412,11 @@ static void kauditd_send_skb(struct sk_buff *skb)
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
if (audit_pid) {
pr_err("audit: *NO* daemon at audit_pid=%d\n",
- audit_pid);
+ pid_nr(audit_pid));
audit_log_lost("auditd disappeared\n");
- audit_pid = 0;
+ put_pid(audit_pid);
But this can actually free this pid. Why it is safe to use it elsewhere?
Post by Richard Guy Briggs
+ audit_pid = NULL;
This also means that every "if (audit_pid)" is racy.
Ok, so I really need something like:

if (audit_pid) {
struct pid *temp_pid = audit_pid;
pr_err("*NO* daemon at audit_pid=%d\n", pid_nr(audit_pid));
audit_log_lost("auditd disappeared\n");
audit_pid = NULL;
smp_mb();
put_pid(temp_pid);
audit_sock = NULL;
}

Do I actually need the memory barrier there? Is that the right one to
use?
Post by Oleg Nesterov
Post by Richard Guy Briggs
@@ -814,12 +816,26 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return err;
}
if (s.mask & AUDIT_STATUS_PID) {
- int new_pid = s.pid;
-
- if ((!new_pid) && (task_tgid_vnr(current) != audit_pid))
+ struct pid *new_pid = find_get_pid(s.pid);
+ if (s.pid && !new_pid)
+ return -ESRCH;
can't understand... find_get_pid(s.pid) is pointless if s.pid == 0 ?
"can't understand"? I hope you mean "don't understand". ;-)
Post by Oleg Nesterov
struct pid *new_pid = NULL;
if (new_pid) {
I think you mean "if (s.pid) {".
Post by Oleg Nesterov
new_pid = find_get_pid(s.pid);
if (!new_pid)
return -ESRCH;
}
looks more understandable.
Agreed.
Post by Oleg Nesterov
Post by Richard Guy Briggs
+ /* check that non-zero pid it is trying to set is its
+ * own*/
+ if (s.pid && (s.pid != task_pid_vnr(current)))
again, this looks a bit confusing and suboptimal. Imho it would be better
to add
if (new_pid != task_tgid(current))
It is not so obvious to me, because this code had been dealing with
pid_t rather than struct pid*, but I agree it may be more optimal.
Post by Oleg Nesterov
into the "if (s.pid)" above. Hmm, actually task_pid_vnr() above looks
simply wrong, we need tgid or I am totally confused.
The use of task_tgid_vnr() in isolation in commit 34eab0a7 was an error,
I believe, and should have been task_pid_vnr() unless new_pid was then
converted to the thread group leader before being assigned to audit_pid.
I believe auditd only ever registers from the thread group leader, but I
will have to check that assumption with Steve and Eric. If not, I may
have to change it to reference the tgid, making this code a bit more
complex since it would need to convert from pid_t to struct pid*, then
find the tgid.

I'd need to replace:
new_pid = find_get_pid(s.pid);

with:
rcu_read_lock();
new_pid = get_pid(task_tgid(find_task_by_vpid(s.pid)))
rcu_read_unlock();

or it might be nicer to define:
struct pid *audit_find_get_tgid(pid_t vnr) {
rcu_read_lock();
new_pid = get_pid(task_tgid(find_task_by_vpid(vnr)));
rcu_read_unlock();
}

or better yet in kernel/pid.c:
struct pid *find_get_tgid(pid_t vnr) {
rcu_read_lock();
new_pid = get_pid(task_tgid(find_task_by_vpid(vnr)));
rcu_read_unlock();
}

Do you have another reason to believe we need task_tgid_vnr()?

Part of the answer is in 582edda5, but if my assumption above is
correct, we don't need it here.
Post by Oleg Nesterov
OTOH, if we require s.pid == task_pid_vnr(current), then why do we need
find_pid() at all?
Because userspace may be trying to set it to zero on shutdown, but I
suppose we could special-case zero.
Post by Oleg Nesterov
Post by Richard Guy Briggs
+ return -EPERM;
this lacks put_pid(new_pid).
Thanks.
Post by Oleg Nesterov
Post by Richard Guy Briggs
+ /* check that it isn't orphaning another process */
+ if ((!new_pid) &&
+ (task_tgid_vnr(current) != pid_vnr(audit_pid)))
return -EACCES;
and this can go into the "else" branch.
Yup.
Post by Oleg Nesterov
And I can't understand the "it isn't orphaning another process" logic...
There was a potential race condition that if a second auditd started up
while there was still one running, when the first eventually shut down,
it would try to set the audit_pid to zero to indicate it was going away
and orphan the newer auditd from kaudit.

If we check that the pid of the auditd trying to set it to zero was the
same as the pid recorded in kaudit, we can safely assume it is the
newest and safely set audit_pid to zero.

A newer auditd can orphan a previous auditd, but this isn't really
avoidable since it may be awol and need replacing anyways.
Post by Oleg Nesterov
OK, what if s.pid == 0 and audit_pid == NULL, should we fail in this case?
It doesn't really matter since auditd isn't likely to try this
combinaiton and it won't matter if it fails.
Post by Oleg Nesterov
And I do not see how this matches "Refuse to set the current audit_pid
if the new value is not its own PID" from the changelog.
Post by Richard Guy Briggs
+
audit_pid = new_pid;
Another leak, it seems.
Oops, thanks.
Post by Oleg Nesterov
Post by Richard Guy Briggs
@@ -1331,7 +1347,8 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
return NULL;
if (gfp_mask & __GFP_WAIT) {
- if (audit_pid && audit_pid == current->pid)
+ if (pid_nr(audit_pid) == task_pid_nr(current))
So is this audit_pid tid or tgid? Its usage looks totally confusing.
pid, as noted above, but could be changed to tgid.
Post by Oleg Nesterov
Anyway,
if (audit_pid == task_pid(current))
(or probably task_tgid) looks much better.
Fair enough, compare struct pid* rather than the original pid_t.
Post by Oleg Nesterov
Post by Richard Guy Briggs
+ //if (pid_task(audit_pid, PIDTYPE_PID) == current)
in this case you need to update Documentation/CodingStyle ;)
Heh... trust the SCM... and that was comparing task_structs rather
than pid_t or struct pid*. Still getting comfortable with struct pid*.
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2013-12-23 22:27:38 UTC
Permalink
Store and log all PIDs with reference to the initial PID namespace and
use the access functions task_pid_nr() and task_tgid_nr() for task->pid
and task->tgid rather than access them directly.

Cc: "Eric W. Biederman" <***@xmission.com>
(informed by ebiederman's c776b5d2)
Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
drivers/tty/tty_audit.c | 3 ++-
kernel/audit.c | 5 +++--
kernel/auditfilter.c | 17 ++++++++++++++++-
kernel/auditsc.c | 16 +++++++++-------
security/apparmor/audit.c | 2 +-
security/integrity/integrity_audit.c | 2 +-
security/lsm_audit.c | 11 +++++++----
7 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
index b0e5401..90ca082 100644
--- a/drivers/tty/tty_audit.c
+++ b/drivers/tty/tty_audit.c
@@ -65,6 +65,7 @@ static void tty_audit_log(const char *description, int major, int minor,
{
struct audit_buffer *ab;
struct task_struct *tsk = current;
+ pid_t pid = task_pid_nr(tsk);
uid_t uid = from_kuid(&init_user_ns, task_uid(tsk));
uid_t loginuid = from_kuid(&init_user_ns, audit_get_loginuid(tsk));
unsigned int sessionid = audit_get_sessionid(tsk);
@@ -74,7 +75,7 @@ static void tty_audit_log(const char *description, int major, int minor,
char name[sizeof(tsk->comm)];

audit_log_format(ab, "%s pid=%u uid=%u auid=%u ses=%u major=%d"
- " minor=%d comm=", description, tsk->pid, uid,
+ " minor=%d comm=", description, pid, uid,
loginuid, sessionid, major, minor);
get_task_comm(name, tsk);
audit_log_untrustedstring(ab, name);
diff --git a/kernel/audit.c b/kernel/audit.c
index 48312bf..e39606b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -648,6 +648,7 @@ static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
{
int rc = 0;
uid_t uid = from_kuid(&init_user_ns, current_uid());
+ pid_t pid = task_tgid_nr(current);

if (!audit_enabled && msg_type != AUDIT_USER_AVC) {
*ab = NULL;
@@ -657,7 +658,7 @@ static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
*ab = audit_log_start(NULL, GFP_KERNEL, msg_type);
if (unlikely(!*ab))
return rc;
- audit_log_format(*ab, "pid=%d uid=%u", task_tgid_vnr(current), uid);
+ audit_log_format(*ab, "pid=%d uid=%u", pid, uid);
audit_log_session_info(*ab);
audit_log_task_context(*ab);

@@ -1860,7 +1861,7 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
" euid=%u suid=%u fsuid=%u"
" egid=%u sgid=%u fsgid=%u tty=%s ses=%u",
task_ppid_nr(tsk),
- tsk->pid,
+ task_pid_nr(tsk),
from_kuid(&init_user_ns, audit_get_loginuid(tsk)),
from_kuid(&init_user_ns, cred->uid),
from_kgid(&init_user_ns, cred->gid),
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 14a78cc..3e64fd8 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -429,6 +429,19 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
f->val = 0;
}

+ if ((f->type == AUDIT_PID) || (f->type == AUDIT_PPID)) {
+ struct pid *pid;
+ rcu_read_lock();
+ pid = find_vpid(f->val);
+ if (!pid) {
+ rcu_read_unlock();
+ err = -ESRCH;
+ goto exit_free;
+ }
+ f->val = pid_nr(pid);
+ rcu_read_unlock();
+ }
+
err = audit_field_valid(entry, f);
if (err)
goto exit_free;
@@ -1236,12 +1249,14 @@ static int audit_filter_user_rules(struct audit_krule *rule, int type,

for (i = 0; i < rule->field_count; i++) {
struct audit_field *f = &rule->fields[i];
+ pid_t pid;
int result = 0;
u32 sid;

switch (f->type) {
case AUDIT_PID:
- result = audit_comparator(task_pid_vnr(current), f->op, f->val);
+ pid = task_pid_nr(current);
+ result = audit_comparator(pid, f->op, f->val);
break;
case AUDIT_UID:
result = audit_uid_comparator(current_uid(), f->op, f->uid);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index db3cb4f..ac852a9 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -451,10 +451,12 @@ static int audit_filter_rules(struct task_struct *tsk,
struct audit_field *f = &rule->fields[i];
struct audit_names *n;
int result = 0;
+ pid_t pid;

switch (f->type) {
case AUDIT_PID:
- result = audit_comparator(tsk->pid, f->op, f->val);
+ pid = task_pid_nr(tsk);
+ result = audit_comparator(pid, f->op, f->val);
break;
case AUDIT_PPID:
if (ctx) {
@@ -1984,7 +1986,7 @@ static void audit_log_set_loginuid(kuid_t koldloginuid, kuid_t kloginuid,
audit_log_format(ab, "pid=%d uid=%u"
" old-auid=%u new-auid=%u old-ses=%u new-ses=%u"
" res=%d",
- current->pid, uid,
+ task_pid_nr(current), uid,
oldloginuid, loginuid, oldsessionid, sessionid,
!rc);
audit_log_end(ab);
@@ -2208,7 +2210,7 @@ void __audit_ptrace(struct task_struct *t)
{
struct audit_context *context = current->audit_context;

- context->target_pid = t->pid;
+ context->target_pid = task_pid_nr(t);
context->target_auid = audit_get_loginuid(t);
context->target_uid = task_uid(t);
context->target_sessionid = audit_get_sessionid(t);
@@ -2233,7 +2235,7 @@ int __audit_signal_info(int sig, struct task_struct *t)

if (audit_pid && task_tgid(t) == audit_pid) {
if (sig == SIGTERM || sig == SIGHUP || sig == SIGUSR1 || sig == SIGUSR2) {
- audit_sig_pid = tsk->pid;
+ audit_sig_pid = task_pid_nr(tsk);
if (uid_valid(tsk->loginuid))
audit_sig_uid = tsk->loginuid;
else
@@ -2247,7 +2249,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
/* optimize the common case by putting first signal recipient directly
* in audit_context */
if (!ctx->target_pid) {
- ctx->target_pid = t->tgid;
+ ctx->target_pid = task_tgid_nr(t);
ctx->target_auid = audit_get_loginuid(t);
ctx->target_uid = t_uid;
ctx->target_sessionid = audit_get_sessionid(t);
@@ -2268,7 +2270,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
}
BUG_ON(axp->pid_count >= AUDIT_AUX_PIDS);

- axp->target_pid[axp->pid_count] = t->tgid;
+ axp->target_pid[axp->pid_count] = task_tgid_nr(t);
axp->target_auid[axp->pid_count] = audit_get_loginuid(t);
axp->target_uid[axp->pid_count] = t_uid;
axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
@@ -2368,7 +2370,7 @@ static void audit_log_task(struct audit_buffer *ab)
from_kgid(&init_user_ns, gid),
sessionid);
audit_log_task_context(ab);
- audit_log_format(ab, " pid=%d comm=", current->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr(current));
audit_log_untrustedstring(ab, current->comm);
if (mm) {
down_read(&mm->mmap_sem);
diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index 031d2d9..008f4d9 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -151,7 +151,7 @@ static void audit_pre(struct audit_buffer *ab, void *ca)
}

if (sa->aad->tsk) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr(tsk));
audit_log_untrustedstring(ab, tsk->comm);
}

diff --git a/security/integrity/integrity_audit.c b/security/integrity/integrity_audit.c
index d7efb30..85253b5 100644
--- a/security/integrity/integrity_audit.c
+++ b/security/integrity/integrity_audit.c
@@ -39,7 +39,7 @@ void integrity_audit_msg(int audit_msgno, struct inode *inode,

ab = audit_log_start(current->audit_context, GFP_KERNEL, audit_msgno);
audit_log_format(ab, "pid=%d uid=%u auid=%u ses=%u",
- current->pid,
+ task_pid_nr(current),
from_kuid(&init_user_ns, current_cred()->uid),
from_kuid(&init_user_ns, audit_get_loginuid(current)),
audit_get_sessionid(current));
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index b0f249d..e6a688c 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -220,7 +220,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
*/
BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2);

- audit_log_format(ab, " pid=%d comm=", tsk->pid);
+ audit_log_format(ab, " pid=%d comm=", task_pid_nr(tsk));
audit_log_untrustedstring(ab, tsk->comm);

switch (a->type) {
@@ -278,9 +278,12 @@ static void dump_common_audit_data(struct audit_buffer *ab,
}
case LSM_AUDIT_DATA_TASK:
tsk = a->u.tsk;
- if (tsk && tsk->pid) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
- audit_log_untrustedstring(ab, tsk->comm);
+ if (tsk) {
+ pid_t pid = task_pid_nr(tsk);
+ if (pid) {
+ audit_log_format(ab, " pid=%d comm=", pid);
+ audit_log_untrustedstring(ab, tsk->comm);
+ }
}
break;
case LSM_AUDIT_DATA_NET:
--
1.7.1
Oleg Nesterov
2013-12-30 18:06:00 UTC
Permalink
Post by Richard Guy Briggs
Store and log all PIDs with reference to the initial PID namespace and
use the access functions task_pid_nr() and task_tgid_nr() for task->pid
and task->tgid rather than access them directly.
At first glance this patch looks like a good cleanup, but...
Post by Richard Guy Briggs
@@ -429,6 +429,19 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
f->val = 0;
}
+ if ((f->type == AUDIT_PID) || (f->type == AUDIT_PPID)) {
+ struct pid *pid;
+ rcu_read_lock();
+ pid = find_vpid(f->val);
+ if (!pid) {
+ rcu_read_unlock();
+ err = -ESRCH;
+ goto exit_free;
+ }
+ f->val = pid_nr(pid);
+ rcu_read_unlock();
+ }
I do not really understand this change, but this doesn't matter, I do
not understand audit.

However, I think this deserves a separate patch with the changelog.
Post by Richard Guy Briggs
@@ -278,9 +278,12 @@ static void dump_common_audit_data(struct audit_buffer *ab,
}
tsk = a->u.tsk;
- if (tsk && tsk->pid) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
- audit_log_untrustedstring(ab, tsk->comm);
+ if (tsk) {
+ pid_t pid = task_pid_nr(tsk);
+ if (pid) {
+ audit_log_format(ab, " pid=%d comm=", pid);
+ audit_log_untrustedstring(ab, tsk->comm);
Just curious, is it really possible that a->u.tsk is an idle thread?

Oleg.
Richard Guy Briggs
2014-02-19 20:28:41 UTC
Permalink
Post by Oleg Nesterov
Post by Richard Guy Briggs
Store and log all PIDs with reference to the initial PID namespace and
use the access functions task_pid_nr() and task_tgid_nr() for task->pid
and task->tgid rather than access them directly.
At first glance this patch looks like a good cleanup, but...
Post by Richard Guy Briggs
@@ -429,6 +429,19 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
f->val = 0;
}
+ if ((f->type == AUDIT_PID) || (f->type == AUDIT_PPID)) {
+ struct pid *pid;
+ rcu_read_lock();
+ pid = find_vpid(f->val);
+ if (!pid) {
+ rcu_read_unlock();
+ err = -ESRCH;
+ goto exit_free;
+ }
+ f->val = pid_nr(pid);
+ rcu_read_unlock();
+ }
I do not really understand this change, but this doesn't matter, I do
not understand audit.
Is this pid_t handed down from userspace a valid pid_t in its
namespace? If not, generate an error. If it is, store its global
pid_t for future comparisons. It might be better to store a struct
task_struct * or a struct pid *, but then the comparison functions would
have to change too, along with the rule reporting functions, which still
report pid_t to userspace.
Post by Oleg Nesterov
However, I think this deserves a separate patch with the changelog.
All *that* would need a seperate patch...
Post by Oleg Nesterov
Post by Richard Guy Briggs
@@ -278,9 +278,12 @@ static void dump_common_audit_data(struct audit_buffer *ab,
}
tsk = a->u.tsk;
- if (tsk && tsk->pid) {
- audit_log_format(ab, " pid=%d comm=", tsk->pid);
- audit_log_untrustedstring(ab, tsk->comm);
+ if (tsk) {
+ pid_t pid = task_pid_nr(tsk);
+ if (pid) {
+ audit_log_format(ab, " pid=%d comm=", pid);
+ audit_log_untrustedstring(ab, tsk->comm);
Just curious, is it really possible that a->u.tsk is an idle thread?
No. It is possible a->u.tsk isn't filled in.
Post by Oleg Nesterov
Oleg.
- RGB

--
Richard Guy Briggs <***@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
Richard Guy Briggs
2013-12-23 22:27:39 UTC
Permalink
Still only permit the audit logging daemon and control to operate from the
initial PID namespace, but allow processes to log from another PID namespace.

Cc: "Eric W. Biederman" <***@xmission.com>
(informed by ebiederman's c776b5d2)

Signed-off-by: Richard Guy Briggs <***@redhat.com>
---
kernel/audit.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index e39606b..83f8068 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -606,9 +606,8 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
{
int err = 0;

- /* Only support the initial namespaces for now. */
- if ((current_user_ns() != &init_user_ns) ||
- (task_active_pid_ns(current) != &init_pid_ns))
+ /* Only support initial user namespace for now. */
+ if ((current_user_ns() != &init_user_ns))
return -EPERM;

switch (msg_type) {
@@ -628,6 +627,11 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
+ /* Only support auditd and auditctl in initial pid namespace
+ * for now. */
+ if ((task_active_pid_ns(current) != &init_pid_ns))
+ return -EPERM;
+
if (!capable(CAP_AUDIT_CONTROL))
err = -EPERM;
break;
--
1.7.1
Continue reading on narkive:
Loading...