Discussion:
Systemd Journald and audit logging causing journal issues
(too old to reply)
Brad Zynda
2017-10-02 17:30:19 UTC
Permalink
Raw Message
Hello Everyone,

I am sending along an issue brought to the systemd-journald dev list
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
The question is: why does auditd even log to the journal?
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
journald detected that it got flooded with too many messages in too
short a time from auditd. if this happens then something is almost
certainly off with auditd, as auditd is not supposed to flood journald
with messages, after all it maintains its own auditing log database.
Please ping the auditd folks for help
Lennart
_______________________________________________
systemd-devel mailing list
systemd-***@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel





Hey Everyone,

Not sure if this is a bug so:

systemctl status -l systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
Active: active (running) since Tue 2017-09-26 20:01:16 UTC; 5 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 565 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─565 /usr/lib/systemd/systemd-journald

Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
Sep 28 13:51:03 server systemd-journal[565]: Suppressed 98979 messages
from /system.slice/auditd.service
Sep 28 13:52:03 server systemd-journal[565]: Suppressed 109433 messages
from /system.slice/auditd.service
Sep 28 13:53:03 server systemd-journal[565]: Suppressed 99788 messages
from /system.slice/auditd.service
Sep 28 13:54:03 server systemd-journal[565]: Suppressed 111605 messages
from /system.slice/auditd.service
Sep 28 13:55:03 server systemd-journal[565]: Suppressed 111591 messages
from /system.slice/auditd.service
Sep 28 13:56:03 server systemd-journal[565]: Suppressed 107947 messages
from /system.slice/auditd.service
Sep 28 13:57:51 server systemd-journal[565]: Suppressed 32760 messages
from /system.slice/auditd.service
Sep 28 17:21:40 server systemd-journal[565]: Suppressed 210 messages
from /system.slice/auditd.service
Oct 01 02:16:01 server systemd-journal[565]: Suppressed 1333 messages
from /system.slice/auditd.service



journalctl --verify
PASS: /run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/system.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-000000000097f6c7-0005596b745b4d1c.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-000000000096a587-00055966f35ae59a.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000009554f1-000559629c4cdb7e.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000940591-0005595e1811a2d1.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-000000000092b500-00055959f2de5ede.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000916479-0005595573137b74.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000901337-00055950d80cc3d8.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008ec2fb-0005594cad14b07a.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008d7373-0005594838683e58.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008c238e-00055943fe2072e3.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008ad1d9-0005593ff64a4f69.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000897f32-0005593e18c5758b.journal


journalctl --disk-usage
Archived and active journals take up 1.1G on disk.


Initially we saw:
16733 PATH
5070 SYSCALL
5024 CWD
3765 AVC
323 CRYPTO_KEY_USER
223 USER_START
222 USER_ACCT
222 CRED_ACQ
220 LOGIN
220 CRED_REFR
218 USER_END
218 CRED_DISP
46 USER_LOGIN
12 EXECVE
4 USER_AUTH
2 CRYPTO_SESSION
1 USER_ROLE_CHANGE
1 USER_CMD
1 SERVICE_STOP
1 SERVICE_START
1 BPRM_FCAPS

so we blocked type PATH in audit.rules

But we are still seeing 100K of dropped/suppressed messages.

Note: systemloglevel = INFO

Centos 7 1708 3.10.0-693.2.2.el7.x86_64

systemd.x86_64 219-42.el7_4.1


Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?

Error we are seeing from services that have silently failed, in this
case glassfish..

systemctl status -l glassfish
● glassfish.service - SYSV: GlassFish start and stop daemon
Loaded: loaded (/etc/rc.d/init.d/glassfish; bad; vendor preset: disabled)
Active: active (exited) since Tue 2017-09-26 20:01:36 UTC; 5 days ago
Docs: man:systemd-sysv-generator(8)
Process: 1328 ExecStart=/etc/rc.d/init.d/glassfish start (code=exited,
status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.

Eventually glassfish will fail but it wont kill the service so we never
get an nms service down trap from the OID.

Please let me know if further info is needed or if certain limits need
to be adjusted.

Thanks,
Brad Zynda
Steve Grubb
2017-10-17 15:25:53 UTC
Permalink
Raw Message
Hello,

I apologize for the late reply...just found the message.
Post by Brad Zynda
I am sending along an issue brought to the systemd-journald dev list
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
The question is: why does auditd even log to the journal?
It doesn't. I have had many arguments with the systemd people about polluting
syslog with audit events. If we wanted audit events there, we would have wrote
them there. The journal is listening on a multicast audit socket that was
created just for them and using a posix capability that was created just for
them. And journal also turns on auditing even if you didn't want it. In short,
they have, with intention, created your problem.
Post by Brad Zynda
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
journald detected that it got flooded with too many messages in too
short a time from auditd. if this happens then something is almost
certainly off with auditd, as auditd is not supposed to flood journald
with messages, after all it maintains its own auditing log database.
No...that's the way it works. If you want the audit stream, you have to be
able to handle it. My suggestion is that we have a separation of duties.
Auditd has audit events, journal has syslog. Besides, mixing audit and syslog
data means the security officer and system admin roles have been combined. I
think there is an audit.socket unit file that can be masked.
Post by Brad Zynda
Please ping the auditd folks for help
They created the problem of audit events in syslog. That said, its been my
experience that whenever you get lots of events, there may be something wrong
with your rules.

The normal technique to figure out what wrong is to run aureport --summary
--key during the time range of the flood to see what rule is triggering. Then
we can look at that rule to see if there's something wrong with it.

More below...
Post by Brad Zynda
Hey Everyone,
systemctl status -l systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
Active: active (running) since Tue 2017-09-26 20:01:16 UTC; 5 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 565 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─565 /usr/lib/systemd/systemd-journald
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
Sep 28 13:51:03 server systemd-journal[565]: Suppressed 98979 messages
from /system.slice/auditd.service
Sep 28 13:52:03 server systemd-journal[565]: Suppressed 109433 messages
from /system.slice/auditd.service
Sep 28 13:53:03 server systemd-journal[565]: Suppressed 99788 messages
from /system.slice/auditd.service
Sep 28 13:54:03 server systemd-journal[565]: Suppressed 111605 messages
from /system.slice/auditd.service
Sep 28 13:55:03 server systemd-journal[565]: Suppressed 111591 messages
from /system.slice/auditd.service
Sep 28 13:56:03 server systemd-journal[565]: Suppressed 107947 messages
from /system.slice/auditd.service
Sep 28 13:57:51 server systemd-journal[565]: Suppressed 32760 messages
from /system.slice/auditd.service
Sep 28 17:21:40 server systemd-journal[565]: Suppressed 210 messages
from /system.slice/auditd.service
Oct 01 02:16:01 server systemd-journal[565]: Suppressed 1333 messages
from /system.slice/auditd.service
journalctl --verify
PASS: /run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/system.journal
d8203c5e96a46-0000000000897f32-0005593e18c5758b.journal
journalctl --disk-usage
Archived and active journals take up 1.1G on disk.
16733 PATH
5070 SYSCALL
5024 CWD
3765 AVC
323 CRYPTO_KEY_USER
223 USER_START
222 USER_ACCT
222 CRED_ACQ
220 LOGIN
220 CRED_REFR
218 USER_END
218 CRED_DISP
46 USER_LOGIN
12 EXECVE
4 USER_AUTH
2 CRYPTO_SESSION
1 USER_ROLE_CHANGE
1 USER_CMD
1 SERVICE_STOP
1 SERVICE_START
1 BPRM_FCAPS
so we blocked type PATH in audit.rules
This is not the right thing to do. If a security officer asks what is being
accessed, you got rid of the information. The right thing is to figure out
which rule is being hit and see if something is wrong with it. For example, I
have seen people do this:

-a always,exit -S open,openat -F exit=-EPERM

The problem is that they did not restrict the rule an architecture and they
were getting lots of events for the wrong syscall. I've also seen people add
-F success 0 to an open syscall. This also results in a large number of
events.

So, I'd recommend making sure all rules have keys added and the running the
key summary report to see what rule needs inspection.

If you find the rule that's causing the problem and you want an opinion, send
it to the mail list.

-Steve
Post by Brad Zynda
But we are still seeing 100K of dropped/suppressed messages.
Note: systemloglevel = INFO
Centos 7 1708 3.10.0-693.2.2.el7.x86_64
systemd.x86_64 219-42.el7_4.1
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
Error we are seeing from services that have silently failed, in this
case glassfish..
systemctl status -l glassfish
● glassfish.service - SYSV: GlassFish start and stop daemon
Loaded: loaded (/etc/rc.d/init.d/glassfish; bad; vendor preset: disabled)
man:systemd-sysv-generator(8)
Process: 1328 ExecStart=/etc/rc.d/init.d/glassfish start (code=exited,
status=0/SUCCESS)
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.
Eventually glassfish will fail but it wont kill the service so we never
get an nms service down trap from the OID.
Please let me know if further info is needed or if certain limits need
to be adjusted.
Thanks,
Brad Zynda
--
Linux-audit mailing list
https://www.redhat.com/mailman/listinfo/linux-audit
Brad Zynda
2017-10-17 15:40:12 UTC
Permalink
Raw Message
Hey Steve,

No problem you guys are busy with updates..

So I kind of stepped into a known issue with a current disagreement
between the 2 maintainers? what can be done to resolve this going
forward as it is killing services in production environments?

I agree with the need not to remove auditing as this is a slippery slope
and should not occur but the decision was based on little documentation
in regards to the problem and loss of service, I will look at further
checking in hopes to find the specific rule. Though I think the latter
to fix the issue is the appropriate avenue.

The rules have been put in place across many organizations that check
with tools like CIS-CAT and OSCAP, so a lot of rules and a point of
possible single failure.

In regards to the audit.socket what is the expected outcome of masking
this service?

Thanks,
Brad
Post by Steve Grubb
Hello,
I apologize for the late reply...just found the message.
Post by Brad Zynda
I am sending along an issue brought to the systemd-journald dev list
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
The question is: why does auditd even log to the journal?
It doesn't. I have had many arguments with the systemd people about polluting
them there. The journal is listening on a multicast audit socket that was
created just for them and using a posix capability that was created just for
them. And journal also turns on auditing even if you didn't want it. In short,
they have, with intention, created your problem.
Post by Brad Zynda
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
journald detected that it got flooded with too many messages in too
short a time from auditd. if this happens then something is almost
certainly off with auditd, as auditd is not supposed to flood journald
with messages, after all it maintains its own auditing log database.
No...that's the way it works. If you want the audit stream, you have to be
able to handle it. My suggestion is that we have a separation of duties.
Auditd has audit events, journal has syslog. Besides, mixing audit and syslog
data means the security officer and system admin roles have been combined. I
think there is an audit.socket unit file that can be masked.
Post by Brad Zynda
Please ping the auditd folks for help
They created the problem of audit events in syslog. That said, its been my
experience that whenever you get lots of events, there may be something wrong
with your rules.
The normal technique to figure out what wrong is to run aureport --summary
--key during the time range of the flood to see what rule is triggering. Then
we can look at that rule to see if there's something wrong with it.
More below...
Post by Brad Zynda
Hey Everyone,
systemctl status -l systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
Active: active (running) since Tue 2017-09-26 20:01:16 UTC; 5 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 565 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─565 /usr/lib/systemd/systemd-journald
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
Sep 28 13:51:03 server systemd-journal[565]: Suppressed 98979 messages
from /system.slice/auditd.service
Sep 28 13:52:03 server systemd-journal[565]: Suppressed 109433 messages
from /system.slice/auditd.service
Sep 28 13:53:03 server systemd-journal[565]: Suppressed 99788 messages
from /system.slice/auditd.service
Sep 28 13:54:03 server systemd-journal[565]: Suppressed 111605 messages
from /system.slice/auditd.service
Sep 28 13:55:03 server systemd-journal[565]: Suppressed 111591 messages
from /system.slice/auditd.service
Sep 28 13:56:03 server systemd-journal[565]: Suppressed 107947 messages
from /system.slice/auditd.service
Sep 28 13:57:51 server systemd-journal[565]: Suppressed 32760 messages
from /system.slice/auditd.service
Sep 28 17:21:40 server systemd-journal[565]: Suppressed 210 messages
from /system.slice/auditd.service
Oct 01 02:16:01 server systemd-journal[565]: Suppressed 1333 messages
from /system.slice/auditd.service
journalctl --verify
PASS: /run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/system.journal
d8203c5e96a46-0000000000897f32-0005593e18c5758b.journal
journalctl --disk-usage
Archived and active journals take up 1.1G on disk.
16733 PATH
5070 SYSCALL
5024 CWD
3765 AVC
323 CRYPTO_KEY_USER
223 USER_START
222 USER_ACCT
222 CRED_ACQ
220 LOGIN
220 CRED_REFR
218 USER_END
218 CRED_DISP
46 USER_LOGIN
12 EXECVE
4 USER_AUTH
2 CRYPTO_SESSION
1 USER_ROLE_CHANGE
1 USER_CMD
1 SERVICE_STOP
1 SERVICE_START
1 BPRM_FCAPS
so we blocked type PATH in audit.rules
This is not the right thing to do. If a security officer asks what is being
accessed, you got rid of the information. The right thing is to figure out
which rule is being hit and see if something is wrong with it. For example, I
-a always,exit -S open,openat -F exit=-EPERM
The problem is that they did not restrict the rule an architecture and they
were getting lots of events for the wrong syscall. I've also seen people add
-F success 0 to an open syscall. This also results in a large number of
events.
So, I'd recommend making sure all rules have keys added and the running the
key summary report to see what rule needs inspection.
If you find the rule that's causing the problem and you want an opinion, send
it to the mail list.
-Steve
Post by Brad Zynda
But we are still seeing 100K of dropped/suppressed messages.
Note: systemloglevel = INFO
Centos 7 1708 3.10.0-693.2.2.el7.x86_64
systemd.x86_64 219-42.el7_4.1
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
Error we are seeing from services that have silently failed, in this
case glassfish..
systemctl status -l glassfish
● glassfish.service - SYSV: GlassFish start and stop daemon
Loaded: loaded (/etc/rc.d/init.d/glassfish; bad; vendor preset: disabled)
man:systemd-sysv-generator(8)
Process: 1328 ExecStart=/etc/rc.d/init.d/glassfish start (code=exited,
status=0/SUCCESS)
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.
Eventually glassfish will fail but it wont kill the service so we never
get an nms service down trap from the OID.
Please let me know if further info is needed or if certain limits need
to be adjusted.
Thanks,
Brad Zynda
--
Linux-audit mailing list
https://www.redhat.com/mailman/listinfo/linux-audit
Steve Grubb
2017-10-17 16:25:55 UTC
Permalink
Raw Message
Post by Brad Zynda
Hey Steve,
No problem you guys are busy with updates..
So I kind of stepped into a known issue with a current disagreement
between the 2 maintainers?
Its not a disagreement. Its systemd wants to do everything. Its a crond/
xinetd/syslogd/auditd/core dump collector/udev/login service/fstab/fs
automounter/container manager/file system monitor/resource manager/daemon
watchdog and oh by the way, it does init.
Post by Brad Zynda
what can be done to resolve this going
forward as it is killing services in production environments?
End users have to take the situation into their own hands. There are
configuration knobs for a reason. More info here:

https://bugzilla.redhat.com/show_bug.cgi?id=1227379
Post by Brad Zynda
I agree with the need not to remove auditing as this is a slippery slope
and should not occur but the decision was based on little documentation
in regards to the problem and loss of service, I will look at further
checking in hopes to find the specific rule. Though I think the latter
to fix the issue is the appropriate avenue.
Figuring out which rule is triggering is the best solution. It may turn out
you just have a busy system. But most of the time its a bad rule.
Post by Brad Zynda
The rules have been put in place across many organizations that check
with tools like CIS-CAT and OSCAP, so a lot of rules and a point of
possible single failure.
They make mistakes, too.
Post by Brad Zynda
In regards to the audit.socket what is the expected outcome of masking
this service?
The expected outcome is that journald stops getting audit records. It doesn't
solve the problem of why you are getting so many events. Fixing the rule does
that.

-Steve
Post by Brad Zynda
Post by Steve Grubb
Hello,
I apologize for the late reply...just found the message.
Post by Brad Zynda
I am sending along an issue brought to the systemd-journald dev list
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
The question is: why does auditd even log to the journal?
It doesn't. I have had many arguments with the systemd people about
polluting syslog with audit events. If we wanted audit events there, we
would have wrote them there. The journal is listening on a multicast
audit socket that was created just for them and using a posix capability
that was created just for them. And journal also turns on auditing even
if you didn't want it. In short, they have, with intention, created your
problem.
Post by Brad Zynda
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
journald detected that it got flooded with too many messages in too
short a time from auditd. if this happens then something is almost
certainly off with auditd, as auditd is not supposed to flood journald
with messages, after all it maintains its own auditing log database.
No...that's the way it works. If you want the audit stream, you have to be
able to handle it. My suggestion is that we have a separation of duties.
Auditd has audit events, journal has syslog. Besides, mixing audit and
syslog data means the security officer and system admin roles have been
combined. I think there is an audit.socket unit file that can be masked.
Post by Brad Zynda
Please ping the auditd folks for help
They created the problem of audit events in syslog. That said, its been my
experience that whenever you get lots of events, there may be something
wrong with your rules.
The normal technique to figure out what wrong is to run aureport --summary
--key during the time range of the flood to see what rule is triggering.
Then we can look at that rule to see if there's something wrong with it.
More below...
Post by Brad Zynda
Hey Everyone,
systemctl status -l systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
Active: active (running) since Tue 2017-09-26 20:01:16 UTC; 5 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 565 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─565 /usr/lib/systemd/systemd-journald
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
Sep 28 13:51:03 server systemd-journal[565]: Suppressed 98979 messages
from /system.slice/auditd.service
Sep 28 13:52:03 server systemd-journal[565]: Suppressed 109433 messages
from /system.slice/auditd.service
Sep 28 13:53:03 server systemd-journal[565]: Suppressed 99788 messages
from /system.slice/auditd.service
Sep 28 13:54:03 server systemd-journal[565]: Suppressed 111605 messages
from /system.slice/auditd.service
Sep 28 13:55:03 server systemd-journal[565]: Suppressed 111591 messages
from /system.slice/auditd.service
Sep 28 13:56:03 server systemd-journal[565]: Suppressed 107947 messages
from /system.slice/auditd.service
Sep 28 13:57:51 server systemd-journal[565]: Suppressed 32760 messages
from /system.slice/auditd.service
Sep 28 17:21:40 server systemd-journal[565]: Suppressed 210 messages
from /system.slice/auditd.service
Oct 01 02:16:01 server systemd-journal[565]: Suppressed 1333 messages
from /system.slice/auditd.service
journalctl --verify
PASS: /run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/system.journal
b95 d8203c5e96a46-0000000000897f32-0005593e18c5758b.journal
journalctl --disk-usage
Archived and active journals take up 1.1G on disk.
16733 PATH
5070 SYSCALL
5024 CWD
3765 AVC
323 CRYPTO_KEY_USER
223 USER_START
222 USER_ACCT
222 CRED_ACQ
220 LOGIN
220 CRED_REFR
218 USER_END
218 CRED_DISP
46 USER_LOGIN
12 EXECVE
4 USER_AUTH
2 CRYPTO_SESSION
1 USER_ROLE_CHANGE
1 USER_CMD
1 SERVICE_STOP
1 SERVICE_START
1 BPRM_FCAPS
so we blocked type PATH in audit.rules
This is not the right thing to do. If a security officer asks what is being
accessed, you got rid of the information. The right thing is to figure out
which rule is being hit and see if something is wrong with it. For
-a always,exit -S open,openat -F exit=-EPERM
The problem is that they did not restrict the rule an architecture and they
were getting lots of events for the wrong syscall. I've also seen people
add -F success 0 to an open syscall. This also results in a large number
of events.
So, I'd recommend making sure all rules have keys added and the running the
key summary report to see what rule needs inspection.
If you find the rule that's causing the problem and you want an opinion,
send it to the mail list.
-Steve
Post by Brad Zynda
But we are still seeing 100K of dropped/suppressed messages.
Note: systemloglevel = INFO
Centos 7 1708 3.10.0-693.2.2.el7.x86_64
systemd.x86_64 219-42.el7_4.1
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
Error we are seeing from services that have silently failed, in this
case glassfish..
systemctl status -l glassfish
● glassfish.service - SYSV: GlassFish start and stop daemon
Loaded: loaded (/etc/rc.d/init.d/glassfish; bad; vendor preset: disabled)
man:systemd-sysv-generator(8)
Process: 1328 ExecStart=/etc/rc.d/init.d/glassfish start (code=exited,
status=0/SUCCESS)
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.
Eventually glassfish will fail but it wont kill the service so we never
get an nms service down trap from the OID.
Please let me know if further info is needed or if certain limits need
to be adjusted.
Thanks,
Brad Zynda
--
Linux-audit mailing list
https://www.redhat.com/mailman/listinfo/linux-audit
Brad Zynda
2017-10-17 17:13:36 UTC
Permalink
Raw Message
Hi Steve,

Thanks for pointing me in the right direction and including the 2 year
old ticket to reference ;)

I will see about getting the audit.socket masked if it is allowed under
FIPS/NIST.

Thanks again,
Brad
Post by Steve Grubb
Post by Brad Zynda
Hey Steve,
No problem you guys are busy with updates..
So I kind of stepped into a known issue with a current disagreement
between the 2 maintainers?
Its not a disagreement. Its systemd wants to do everything. Its a crond/
xinetd/syslogd/auditd/core dump collector/udev/login service/fstab/fs
automounter/container manager/file system monitor/resource manager/daemon
watchdog and oh by the way, it does init.
Post by Brad Zynda
what can be done to resolve this going
forward as it is killing services in production environments?
End users have to take the situation into their own hands. There are
https://bugzilla.redhat.com/show_bug.cgi?id=1227379
Post by Brad Zynda
I agree with the need not to remove auditing as this is a slippery slope
and should not occur but the decision was based on little documentation
in regards to the problem and loss of service, I will look at further
checking in hopes to find the specific rule. Though I think the latter
to fix the issue is the appropriate avenue.
Figuring out which rule is triggering is the best solution. It may turn out
you just have a busy system. But most of the time its a bad rule.
Post by Brad Zynda
The rules have been put in place across many organizations that check
with tools like CIS-CAT and OSCAP, so a lot of rules and a point of
possible single failure.
They make mistakes, too.
Post by Brad Zynda
In regards to the audit.socket what is the expected outcome of masking
this service?
The expected outcome is that journald stops getting audit records. It doesn't
solve the problem of why you are getting so many events. Fixing the rule does
that.
-Steve
Post by Brad Zynda
Post by Steve Grubb
Hello,
I apologize for the late reply...just found the message.
Post by Brad Zynda
I am sending along an issue brought to the systemd-journald dev list
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
The question is: why does auditd even log to the journal?
It doesn't. I have had many arguments with the systemd people about
polluting syslog with audit events. If we wanted audit events there, we
would have wrote them there. The journal is listening on a multicast
audit socket that was created just for them and using a posix capability
that was created just for them. And journal also turns on auditing even
if you didn't want it. In short, they have, with intention, created your
problem.
Post by Brad Zynda
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
journald detected that it got flooded with too many messages in too
short a time from auditd. if this happens then something is almost
certainly off with auditd, as auditd is not supposed to flood journald
with messages, after all it maintains its own auditing log database.
No...that's the way it works. If you want the audit stream, you have to be
able to handle it. My suggestion is that we have a separation of duties.
Auditd has audit events, journal has syslog. Besides, mixing audit and
syslog data means the security officer and system admin roles have been
combined. I think there is an audit.socket unit file that can be masked.
Post by Brad Zynda
Please ping the auditd folks for help
They created the problem of audit events in syslog. That said, its been my
experience that whenever you get lots of events, there may be something
wrong with your rules.
The normal technique to figure out what wrong is to run aureport --summary
--key during the time range of the flood to see what rule is triggering.
Then we can look at that rule to see if there's something wrong with it.
More below...
Post by Brad Zynda
Hey Everyone,
systemctl status -l systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
Active: active (running) since Tue 2017-09-26 20:01:16 UTC; 5 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 565 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─565 /usr/lib/systemd/systemd-journald
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
Sep 28 13:51:03 server systemd-journal[565]: Suppressed 98979 messages
from /system.slice/auditd.service
Sep 28 13:52:03 server systemd-journal[565]: Suppressed 109433 messages
from /system.slice/auditd.service
Sep 28 13:53:03 server systemd-journal[565]: Suppressed 99788 messages
from /system.slice/auditd.service
Sep 28 13:54:03 server systemd-journal[565]: Suppressed 111605 messages
from /system.slice/auditd.service
Sep 28 13:55:03 server systemd-journal[565]: Suppressed 111591 messages
from /system.slice/auditd.service
Sep 28 13:56:03 server systemd-journal[565]: Suppressed 107947 messages
from /system.slice/auditd.service
Sep 28 13:57:51 server systemd-journal[565]: Suppressed 32760 messages
from /system.slice/auditd.service
Sep 28 17:21:40 server systemd-journal[565]: Suppressed 210 messages
from /system.slice/auditd.service
Oct 01 02:16:01 server systemd-journal[565]: Suppressed 1333 messages
from /system.slice/auditd.service
journalctl --verify
PASS: /run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/system.journal
b95 d8203c5e96a46-0000000000897f32-0005593e18c5758b.journal
journalctl --disk-usage
Archived and active journals take up 1.1G on disk.
16733 PATH
5070 SYSCALL
5024 CWD
3765 AVC
323 CRYPTO_KEY_USER
223 USER_START
222 USER_ACCT
222 CRED_ACQ
220 LOGIN
220 CRED_REFR
218 USER_END
218 CRED_DISP
46 USER_LOGIN
12 EXECVE
4 USER_AUTH
2 CRYPTO_SESSION
1 USER_ROLE_CHANGE
1 USER_CMD
1 SERVICE_STOP
1 SERVICE_START
1 BPRM_FCAPS
so we blocked type PATH in audit.rules
This is not the right thing to do. If a security officer asks what is being
accessed, you got rid of the information. The right thing is to figure out
which rule is being hit and see if something is wrong with it. For
-a always,exit -S open,openat -F exit=-EPERM
The problem is that they did not restrict the rule an architecture and they
were getting lots of events for the wrong syscall. I've also seen people
add -F success 0 to an open syscall. This also results in a large number
of events.
So, I'd recommend making sure all rules have keys added and the running the
key summary report to see what rule needs inspection.
If you find the rule that's causing the problem and you want an opinion,
send it to the mail list.
-Steve
Post by Brad Zynda
But we are still seeing 100K of dropped/suppressed messages.
Note: systemloglevel = INFO
Centos 7 1708 3.10.0-693.2.2.el7.x86_64
systemd.x86_64 219-42.el7_4.1
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
Error we are seeing from services that have silently failed, in this
case glassfish..
systemctl status -l glassfish
● glassfish.service - SYSV: GlassFish start and stop daemon
Loaded: loaded (/etc/rc.d/init.d/glassfish; bad; vendor preset: disabled)
man:systemd-sysv-generator(8)
Process: 1328 ExecStart=/etc/rc.d/init.d/glassfish start (code=exited,
status=0/SUCCESS)
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.
Eventually glassfish will fail but it wont kill the service so we never
get an nms service down trap from the OID.
Please let me know if further info is needed or if certain limits need
to be adjusted.
Thanks,
Brad Zynda
--
Linux-audit mailing list
https://www.redhat.com/mailman/listinfo/linux-audit
Brad Zynda
2017-10-18 15:14:31 UTC
Permalink
Raw Message
Hey Steve,

Here is an output from the server with PATH audit type re-allowed
(everything back to normal):

Key Summary Report
===========================
total key
===========================
6019 perm_mod
3878 delete
964 access
96 privileged
57 time-change
51 session
41 modules
20 logins
6 system-locale
5 identity
2 mounts
2 scope
2 actions
1 MAC-policy

Now this is probably not a peak result but I have come across 2 questions..

1. I wanted to cron this and email results but get, verified path sbin:

Key Summary Report
===========================
total key
===========================
<no events of interest were found>


2. If it ends up being perm_mod as the high count what is the next step
to identify the rule in question?

Thanks,
Brad
Post by Brad Zynda
Hi Steve,
Thanks for pointing me in the right direction and including the 2 year
old ticket to reference ;)
I will see about getting the audit.socket masked if it is allowed under
FIPS/NIST.
Thanks again,
Brad
Post by Steve Grubb
Post by Brad Zynda
Hey Steve,
No problem you guys are busy with updates..
So I kind of stepped into a known issue with a current disagreement
between the 2 maintainers?
Its not a disagreement. Its systemd wants to do everything. Its a crond/
xinetd/syslogd/auditd/core dump collector/udev/login service/fstab/fs
automounter/container manager/file system monitor/resource manager/daemon
watchdog and oh by the way, it does init.
Post by Brad Zynda
what can be done to resolve this going
forward as it is killing services in production environments?
End users have to take the situation into their own hands. There are
https://bugzilla.redhat.com/show_bug.cgi?id=1227379
Post by Brad Zynda
I agree with the need not to remove auditing as this is a slippery slope
and should not occur but the decision was based on little documentation
in regards to the problem and loss of service, I will look at further
checking in hopes to find the specific rule. Though I think the latter
to fix the issue is the appropriate avenue.
Figuring out which rule is triggering is the best solution. It may turn out
you just have a busy system. But most of the time its a bad rule.
Post by Brad Zynda
The rules have been put in place across many organizations that check
with tools like CIS-CAT and OSCAP, so a lot of rules and a point of
possible single failure.
They make mistakes, too.
Post by Brad Zynda
In regards to the audit.socket what is the expected outcome of masking
this service?
The expected outcome is that journald stops getting audit records. It doesn't
solve the problem of why you are getting so many events. Fixing the rule does
that.
-Steve
Post by Brad Zynda
Post by Steve Grubb
Hello,
I apologize for the late reply...just found the message.
Post by Brad Zynda
I am sending along an issue brought to the systemd-journald dev list
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
The question is: why does auditd even log to the journal?
It doesn't. I have had many arguments with the systemd people about
polluting syslog with audit events. If we wanted audit events there, we
would have wrote them there. The journal is listening on a multicast
audit socket that was created just for them and using a posix capability
that was created just for them. And journal also turns on auditing even
if you didn't want it. In short, they have, with intention, created your
problem.
Post by Brad Zynda
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
journald detected that it got flooded with too many messages in too
short a time from auditd. if this happens then something is almost
certainly off with auditd, as auditd is not supposed to flood journald
with messages, after all it maintains its own auditing log database.
No...that's the way it works. If you want the audit stream, you have to be
able to handle it. My suggestion is that we have a separation of duties.
Auditd has audit events, journal has syslog. Besides, mixing audit and
syslog data means the security officer and system admin roles have been
combined. I think there is an audit.socket unit file that can be masked.
Post by Brad Zynda
Please ping the auditd folks for help
They created the problem of audit events in syslog. That said, its been my
experience that whenever you get lots of events, there may be something
wrong with your rules.
The normal technique to figure out what wrong is to run aureport --summary
--key during the time range of the flood to see what rule is triggering.
Then we can look at that rule to see if there's something wrong with it.
More below...
Post by Brad Zynda
Hey Everyone,
systemctl status -l systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
Active: active (running) since Tue 2017-09-26 20:01:16 UTC; 5 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 565 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─565 /usr/lib/systemd/systemd-journald
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
Sep 28 13:51:03 server systemd-journal[565]: Suppressed 98979 messages
from /system.slice/auditd.service
Sep 28 13:52:03 server systemd-journal[565]: Suppressed 109433 messages
from /system.slice/auditd.service
Sep 28 13:53:03 server systemd-journal[565]: Suppressed 99788 messages
from /system.slice/auditd.service
Sep 28 13:54:03 server systemd-journal[565]: Suppressed 111605 messages
from /system.slice/auditd.service
Sep 28 13:55:03 server systemd-journal[565]: Suppressed 111591 messages
from /system.slice/auditd.service
Sep 28 13:56:03 server systemd-journal[565]: Suppressed 107947 messages
from /system.slice/auditd.service
Sep 28 13:57:51 server systemd-journal[565]: Suppressed 32760 messages
from /system.slice/auditd.service
Sep 28 17:21:40 server systemd-journal[565]: Suppressed 210 messages
from /system.slice/auditd.service
Oct 01 02:16:01 server systemd-journal[565]: Suppressed 1333 messages
from /system.slice/auditd.service
journalctl --verify
PASS: /run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/system.journal
b95 d8203c5e96a46-0000000000897f32-0005593e18c5758b.journal
journalctl --disk-usage
Archived and active journals take up 1.1G on disk.
16733 PATH
5070 SYSCALL
5024 CWD
3765 AVC
323 CRYPTO_KEY_USER
223 USER_START
222 USER_ACCT
222 CRED_ACQ
220 LOGIN
220 CRED_REFR
218 USER_END
218 CRED_DISP
46 USER_LOGIN
12 EXECVE
4 USER_AUTH
2 CRYPTO_SESSION
1 USER_ROLE_CHANGE
1 USER_CMD
1 SERVICE_STOP
1 SERVICE_START
1 BPRM_FCAPS
so we blocked type PATH in audit.rules
This is not the right thing to do. If a security officer asks what is being
accessed, you got rid of the information. The right thing is to figure out
which rule is being hit and see if something is wrong with it. For
-a always,exit -S open,openat -F exit=-EPERM
The problem is that they did not restrict the rule an architecture and they
were getting lots of events for the wrong syscall. I've also seen people
add -F success 0 to an open syscall. This also results in a large number
of events.
So, I'd recommend making sure all rules have keys added and the running the
key summary report to see what rule needs inspection.
If you find the rule that's causing the problem and you want an opinion,
send it to the mail list.
-Steve
Post by Brad Zynda
But we are still seeing 100K of dropped/suppressed messages.
Note: systemloglevel = INFO
Centos 7 1708 3.10.0-693.2.2.el7.x86_64
systemd.x86_64 219-42.el7_4.1
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
Error we are seeing from services that have silently failed, in this
case glassfish..
systemctl status -l glassfish
● glassfish.service - SYSV: GlassFish start and stop daemon
Loaded: loaded (/etc/rc.d/init.d/glassfish; bad; vendor preset: disabled)
man:systemd-sysv-generator(8)
Process: 1328 ExecStart=/etc/rc.d/init.d/glassfish start (code=exited,
status=0/SUCCESS)
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.
Eventually glassfish will fail but it wont kill the service so we never
get an nms service down trap from the OID.
Please let me know if further info is needed or if certain limits need
to be adjusted.
Thanks,
Brad Zynda
--
Linux-audit mailing list
https://www.redhat.com/mailman/listinfo/linux-audit
Steve Grubb
2017-10-18 15:40:21 UTC
Permalink
Raw Message
Post by Brad Zynda
Here is an output from the server with PATH audit type re-allowed
Key Summary Report
===========================
total key
===========================
6019 perm_mod
3878 delete
964 access
96 privileged
57 time-change
51 session
41 modules
20 logins
6 system-locale
5 identity
2 mounts
2 scope
2 actions
1 MAC-policy
Now this is probably not a peak result but I have come across 2 questions..
Key Summary Report
===========================
total key
===========================
<no events of interest were found>
This is a well known problem. From aureport man page:

--input-logs
Use the log file location from auditd.conf as input for analy‐
sis. This is needed if you are using aureport from a cron job.

ausearch/report can be piped to by stdin. This takes priority over the logs.
Cron uses pipes for all 3 descriptors. Therefore you have to tell them to
ignore what they are seeing and just use the logs.
Post by Brad Zynda
2. If it ends up being perm_mod as the high count what is the next step
to identify the rule in question?
grep perm_mod /etc/audit/audit.rules

delete also looks excessive.

-Steve
Brad Zynda
2017-10-18 16:13:13 UTC
Permalink
Raw Message
Post by Steve Grubb
Post by Brad Zynda
Here is an output from the server with PATH audit type re-allowed
Key Summary Report
===========================
total key
===========================
6019 perm_mod
3878 delete
964 access
96 privileged
57 time-change
51 session
41 modules
20 logins
6 system-locale
5 identity
2 mounts
2 scope
2 actions
1 MAC-policy
Now this is probably not a peak result but I have come across 2 questions..
Key Summary Report
===========================
total key
===========================
<no events of interest were found>
--input-logs
Use the log file location from auditd.conf as input for analy‐
sis. This is needed if you are using aureport from a cron job.
ausearch/report can be piped to by stdin. This takes priority over the logs.
Cron uses pipes for all 3 descriptors. Therefore you have to tell them to
ignore what they are seeing and just use the logs.
Post by Brad Zynda
2. If it ends up being perm_mod as the high count what is the next step
to identify the rule in question?
grep perm_mod /etc/audit/audit.rules
delete also looks excessive.
-Steve
Yep input-logs fit the bill.

So now you have to comment out a rule at a time and watch for
usage/count to fall?

Also if I wanted to grep and compare those numbers and alert with an
email what would be the magic number to threshold with in a gt statement
(500, 1000, etc.)?

Thanks,
Brad
Steve Grubb
2017-10-18 16:26:45 UTC
Permalink
Raw Message
Post by Brad Zynda
So now you have to comment out a rule at a time and watch for
usage/count to fall?
Well, I am certain that commenting out that rule will drop the count. But the
question more is why is that rule being triggered. One thing you could do is
post the rule to the mail list if you think it might be formed wrong. But you
might also want to see whay its being triggered by doing something like

ausearch --start today -k perm_mod --raw | aureport --summary --file -i

ausearch --start today -k perm_mod --raw | aureport --summary -x -i

ausearch --start today -k perm_mod --raw | aureport --summary --syscall -i
Post by Brad Zynda
Also if I wanted to grep and compare those numbers and alert with an
email what would be the magic number to threshold with in a gt statement
(500, 1000, etc.)?
That's a bit harder. You'd want a sliding window of time. Assuming your cron
job runs once an hour and a US locale, you'd do something like this:

aureport --start `date -d '1 hour ago' "+%m/%d/%Y %H:%M:%S"` --key --summary
--input-logs

I don't know what the best threshold would be because its workload dependent.
If you wanted to see things visualized, I'd try playing with the data in R.

http://security-plus-data-science.blogspot.com/2017/03/bar-charts.html
http://security-plus-data-science.blogspot.com/2017/03/heatmaps.html

That assumes you have a recent audit package (2.7 or higher) and RStudio.

-Steve
Brad Zynda
2017-10-18 16:32:15 UTC
Permalink
Raw Message
Post by Steve Grubb
Post by Brad Zynda
So now you have to comment out a rule at a time and watch for
usage/count to fall?
Well, I am certain that commenting out that rule will drop the count. But the
question more is why is that rule being triggered. One thing you could do is
post the rule to the mail list if you think it might be formed wrong. But you
might also want to see whay its being triggered by doing something like
ausearch --start today -k perm_mod --raw | aureport --summary --file -i
ausearch --start today -k perm_mod --raw | aureport --summary -x -i
ausearch --start today -k perm_mod --raw | aureport --summary --syscall -i
Post by Brad Zynda
Also if I wanted to grep and compare those numbers and alert with an
email what would be the magic number to threshold with in a gt statement
(500, 1000, etc.)?
That's a bit harder. You'd want a sliding window of time. Assuming your cron
aureport --start `date -d '1 hour ago' "+%m/%d/%Y %H:%M:%S"` --key --summary
--input-logs
I don't know what the best threshold would be because its workload dependent.
If you wanted to see things visualized, I'd try playing with the data in R.
http://security-plus-data-science.blogspot.com/2017/03/bar-charts.html
http://security-plus-data-science.blogspot.com/2017/03/heatmaps.html
That assumes you have a recent audit package (2.7 or higher) and RStudio.
-Steve
Here are the rules:

grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod

grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules

Thanks,
Brad
Steve Grubb
2017-10-18 23:27:53 UTC
Permalink
Raw Message
Post by Steve Grubb
Post by Steve Grubb
Post by Brad Zynda
So now you have to comment out a rule at a time and watch for
usage/count to fall?
Well, I am certain that commenting out that rule will drop the count. But
the question more is why is that rule being triggered. One thing you
could do is post the rule to the mail list if you think it might be
formed wrong. But you might also want to see whay its being triggered by
doing something like
ausearch --start today -k perm_mod --raw | aureport --summary --file -i
ausearch --start today -k perm_mod --raw | aureport --summary -x -i
ausearch --start today -k perm_mod --raw | aureport --summary --syscall -i
Post by Brad Zynda
Also if I wanted to grep and compare those numbers and alert with an
email what would be the magic number to threshold with in a gt statement
(500, 1000, etc.)?
That's a bit harder. You'd want a sliding window of time. Assuming your
aureport --start `date -d '1 hour ago' "+%m/%d/%Y %H:%M:%S"` --key
--summary --input-logs
I don't know what the best threshold would be because its workload
dependent. If you wanted to see things visualized, I'd try playing with
the data in R.
http://security-plus-data-science.blogspot.com/2017/03/bar-charts.html
http://security-plus-data-science.blogspot.com/2017/03/heatmaps.html
That assumes you have a recent audit package (2.7 or higher) and RStudio.
-Steve
grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
These rules are well formed. Meaning no obvious holes that would cause
unintended events. The other ausearch/aureport commands I gave you should show
you what is causing the events and to which files. This information may be
used to create some kind of "never" rule that limits what gets audited. If you
do create some exclusion rule, it needs to be above the perm_mod rules because
audit is a "first match wins" system.

-Steve

-Steve
Brad Zynda
2017-10-19 17:08:22 UTC
Permalink
Raw Message
Post by Steve Grubb
Post by Steve Grubb
Post by Steve Grubb
Post by Brad Zynda
So now you have to comment out a rule at a time and watch for
usage/count to fall?
Well, I am certain that commenting out that rule will drop the count. But
the question more is why is that rule being triggered. One thing you
could do is post the rule to the mail list if you think it might be
formed wrong. But you might also want to see whay its being triggered by
doing something like
ausearch --start today -k perm_mod --raw | aureport --summary --file -i
ausearch --start today -k perm_mod --raw | aureport --summary -x -i
ausearch --start today -k perm_mod --raw | aureport --summary --syscall -i
Post by Brad Zynda
Also if I wanted to grep and compare those numbers and alert with an
email what would be the magic number to threshold with in a gt statement
(500, 1000, etc.)?
That's a bit harder. You'd want a sliding window of time. Assuming your
aureport --start `date -d '1 hour ago' "+%m/%d/%Y %H:%M:%S"` --key
--summary --input-logs
I don't know what the best threshold would be because its workload
dependent. If you wanted to see things visualized, I'd try playing with
the data in R.
http://security-plus-data-science.blogspot.com/2017/03/bar-charts.html
http://security-plus-data-science.blogspot.com/2017/03/heatmaps.html
That assumes you have a recent audit package (2.7 or higher) and RStudio.
-Steve
grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
These rules are well formed. Meaning no obvious holes that would cause
unintended events. The other ausearch/aureport commands I gave you should show
you what is causing the events and to which files. This information may be
used to create some kind of "never" rule that limits what gets audited. If you
do create some exclusion rule, it needs to be above the perm_mod rules because
audit is a "first match wins" system.
-Steve
-Steve
Here is a peak report (at least in the last 24 hours) didnt include the
file summaries as that would make this email HUGE:

Key Summary Report
===========================
total key
===========================
8170 perm_mod
5390 delete
1073 access
56 time-change
26 session
12 privileged
7 logins


Syscall Summary Report
==========================
total syscall
==========================
4250 fchmodat
1613 chmod
831 fchmod
521 fchown
97 chown
12 setxattr


Syscall Summary Report
==========================
total syscall
==========================
2887 unlink
2189 rename
186 unlinkat


so from the list the top 2 in each perm_mod and delete from the above
list seem to be the culprits..

Thanks,
Brad
Steve Grubb
2017-10-19 20:13:22 UTC
Permalink
Raw Message
Post by Brad Zynda
Post by Steve Grubb
Post by Steve Grubb
grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
These rules are well formed. Meaning no obvious holes that would cause
unintended events. The other ausearch/aureport commands I gave you should
show you what is causing the events and to which files. This information
may be used to create some kind of "never" rule that limits what gets
audited. If you do create some exclusion rule, it needs to be above the
perm_mod rules because audit is a "first match wins" system.
-Steve
-Steve
Here is a peak report (at least in the last 24 hours) didnt include the
Well, the idea is to look for something that's getting hit a lot. What it
sounds like is that things are getting deleted and modified quite a bit all
over the place. Does the executable report show a pattern such as one
application doing a lot? For example, with the rule you have, doing a yum
update will delete a whole lot of stuff and modify a whole lot of stuff.

Its possible to put exceptions in the rules so that one program does not flood
the logs. But looking at the quantities below, the audit system can easily
handle that.

Its also possible to exclude directories from auditing if the pattern is that
you have a daemon receiving and modifying files and then deleting them. You
should look at the executables & files and see if you can do with auditing
what they are doing because its not interesting.

If this is causing you problems on the journald side where its causing other
tasks to fail. Then I'd break the link between auditd and journald. The amount
of event you show is highish but well within range of what auditd can do. Just
make sure flush is set to incremental_async and freq is 100 or 200. You should
be OK.

-Steve
Post by Brad Zynda
Key Summary Report
===========================
total key
===========================
8170 perm_mod
5390 delete
1073 access
56 time-change
26 session
12 privileged
7 logins
Syscall Summary Report
==========================
total syscall
==========================
4250 fchmodat
1613 chmod
831 fchmod
521 fchown
97 chown
12 setxattr
Syscall Summary Report
==========================
total syscall
==========================
2887 unlink
2189 rename
186 unlinkat
so from the list the top 2 in each perm_mod and delete from the above
list seem to be the culprits..
Brad Zynda
2017-12-01 13:17:58 UTC
Permalink
Raw Message
Hey Steve,

Just wanted to follow up on this and say we are still seeing services
across the board that have:

Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable

basically created a script to check all unit file services/targets and
grep status -l for Journal, it is effecting a large range of
service.target's, service.service's and service.socket's

If we restart the service or reboot we no longer see those messages
about journal and everything appears to run as expected.

Thanks,
Brad
Post by Steve Grubb
Post by Brad Zynda
Post by Steve Grubb
Post by Steve Grubb
grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
These rules are well formed. Meaning no obvious holes that would cause
unintended events. The other ausearch/aureport commands I gave you should
show you what is causing the events and to which files. This information
may be used to create some kind of "never" rule that limits what gets
audited. If you do create some exclusion rule, it needs to be above the
perm_mod rules because audit is a "first match wins" system.
-Steve
-Steve
Here is a peak report (at least in the last 24 hours) didnt include the
Well, the idea is to look for something that's getting hit a lot. What it
sounds like is that things are getting deleted and modified quite a bit all
over the place. Does the executable report show a pattern such as one
application doing a lot? For example, with the rule you have, doing a yum
update will delete a whole lot of stuff and modify a whole lot of stuff.
Its possible to put exceptions in the rules so that one program does not flood
the logs. But looking at the quantities below, the audit system can easily
handle that.
Its also possible to exclude directories from auditing if the pattern is that
you have a daemon receiving and modifying files and then deleting them. You
should look at the executables & files and see if you can do with auditing
what they are doing because its not interesting.
If this is causing you problems on the journald side where its causing other
tasks to fail. Then I'd break the link between auditd and journald. The amount
of event you show is highish but well within range of what auditd can do. Just
make sure flush is set to incremental_async and freq is 100 or 200. You should
be OK.
-Steve
Post by Brad Zynda
Key Summary Report
===========================
total key
===========================
8170 perm_mod
5390 delete
1073 access
56 time-change
26 session
12 privileged
7 logins
Syscall Summary Report
==========================
total syscall
==========================
4250 fchmodat
1613 chmod
831 fchmod
521 fchown
97 chown
12 setxattr
Syscall Summary Report
==========================
total syscall
==========================
2887 unlink
2189 rename
186 unlinkat
so from the list the top 2 in each perm_mod and delete from the above
list seem to be the culprits..
Steve Grubb
2017-12-01 13:54:28 UTC
Permalink
Raw Message
Post by Brad Zynda
Hey Steve,
Just wanted to follow up on this and say we are still seeing services
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable
basically created a script to check all unit file services/targets and
grep status -l for Journal, it is effecting a large range of
service.target's, service.service's and service.socket's
If we restart the service or reboot we no longer see those messages
about journal and everything appears to run as expected.
I have never looked at journald code and have no idea how it works or why it
cares about audit events. My advice last email was to break the link if its
causing problems.

-Steve
Post by Brad Zynda
Post by Steve Grubb
Post by Brad Zynda
Post by Steve Grubb
Post by Steve Grubb
grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
These rules are well formed. Meaning no obvious holes that would cause
unintended events. The other ausearch/aureport commands I gave you should
show you what is causing the events and to which files. This information
may be used to create some kind of "never" rule that limits what gets
audited. If you do create some exclusion rule, it needs to be above the
perm_mod rules because audit is a "first match wins" system.
-Steve
-Steve
Here is a peak report (at least in the last 24 hours) didnt include the
Well, the idea is to look for something that's getting hit a lot. What it
sounds like is that things are getting deleted and modified quite a bit all
over the place. Does the executable report show a pattern such as one
application doing a lot? For example, with the rule you have, doing a yum
update will delete a whole lot of stuff and modify a whole lot of stuff.
Its possible to put exceptions in the rules so that one program does not
flood the logs. But looking at the quantities below, the audit system can
easily handle that.
Its also possible to exclude directories from auditing if the pattern is
that you have a daemon receiving and modifying files and then deleting
them. You should look at the executables & files and see if you can do
with auditing what they are doing because its not interesting.
If this is causing you problems on the journald side where its causing
other tasks to fail. Then I'd break the link between auditd and journald.
The amount of event you show is highish but well within range of what
auditd can do. Just make sure flush is set to incremental_async and freq
is 100 or 200. You should be OK.
-Steve
Post by Brad Zynda
Key Summary Report
===========================
total key
===========================
8170 perm_mod
5390 delete
1073 access
56 time-change
26 session
12 privileged
7 logins
Syscall Summary Report
==========================
total syscall
==========================
4250 fchmodat
1613 chmod
831 fchmod
521 fchown
97 chown
12 setxattr
Syscall Summary Report
==========================
total syscall
==========================
2887 unlink
2189 rename
186 unlinkat
so from the list the top 2 in each perm_mod and delete from the above
list seem to be the culprits..
Loading...