Discussion:
[PATCH v4 00/14] Add kdbus implementation
Greg Kroah-Hartman
2015-03-09 13:09:43 UTC
kdbus is a kernel-level IPC implementation that aims to resemble the
protocol layer of the existing userspace D-Bus daemon while enabling
some features that couldn't be implemented before in userspace.

The documentation in the first patch in this series explains the
protocol and the API details.

This is v4 of the kdbus series for inclusion into the mainline kernel.
Changes since v3 are:

* Drop KDBUS_FLAG_KERNEL and the 'kernel_flags' member from all
struct kdbus_cmd_*, and introduce a new KDBUS_FLAGS_NEGOTIATE
instead. Requested by Michael Kerrisk.

* Transform kdbus.txt into DocBook man-pages for better readability,
and extend the documentation significantly. Requested by Michael
Kerrisk and Christoph Hellwig.

* Add a walk-through example for using the low-level ioctl API from
userspace.

* Consolidate some 'struct kdbus_cmd_*' types to make the API
interface easier to grasp.

* Drop 'struct kdbus_item_list'. The information stored in this
struct was redundant as all ioctls report the returned size
in the command struct already.

* KDBUS_CMD_NAME_ACQUIRE now returns the KDBUS_NAME_IN_QUEUE flag
in cmd->return_flags rather than modifying cmd->flags.

* Get rid of the need for a 2nd pool slice at install time. This
reduces pool fragmentation, message memory footprint and complexity.

* Separate flags from attach_flags in struct kdbus_cmd_info.

* Fix handling of messages with file descriptors with regard to
monitor connections that don't accept file descriptors.

* Revisited and reimplemented the quota logic. 50% is now always
kept reserved for the connection to receive notifications etc.,
and the rest is accounted per remote peer to avoid denial of
service attacks.

* Make use of new functions introduced with 4.0-rc1
(vfs_iter_write(), {kstrdup,kfree}_const())

* Some internal restructuring and cleanups.


Reasons why this should be done in the kernel, instead of userspace as
it is currently done today include the following:

* Performance: Fewer process context switches, fewer copies, fewer
syscalls, larger memory chunks via memfd. This is really important
for a whole class of userspace programs that are ported from other
operating systems that are run on tiny ARM systems that rely on
hundreds of thousands of messages passed at boot time, and at
"critical" times in their user interaction loops. DBus is not used
for performance sensitive applications because DBus is slow.
We want to make it fast so we can finally use it for low-latency,
high-throughput applications. A simple DBus method-call+reply takes
200us on an up-to-date test machine, with kdbus it takes 8us (with
UDS about 2us). If the packet size is increased from 8k to 128k,
kdbus even beats UDS due to single-copy transfers.

* Security: The peers which communicate do not have to trust each
other, as the only trustworthy component in the game is the kernel
which adds metadata and ensures that all data passed as payload is
either copied or sealed, so that the receiver can parse the data
without having to protect against changing memory while parsing
buffers. Also, all the data transfer is controlled by the kernel,
so that LSMs can track and control what is going on, without
involving userspace. Because of the LSM issue, security people are
much happier with this model than the current scheme of having to
hook into dbus to mediate things.

* More types of metadata can be attached to messages than in userspace

* Semantics for apps with heavy data payloads (media apps, for
instance) with optional priority message dequeuing, and global
message ordering. Some "crazy" people are playing with using kdbus
for audio data in the system. I'm not saying that this is the best
model for this, but until now, there wasn't any other way to do this
without having to create custom "buses", one for each application
library.

* Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions. For example, with kdbus, there is a
way a client can disconnect from a bus, but do so only if no further
messages are present in its queue, which is crucial for implementing
race-free "exit-on-idle" services

* Eavesdropping on the kernel level, so privileged users can hook into
the message stream without hacking support for that into their
userspace processes

* A number of smaller benefits: for example kdbus learned a way to peek
full messages without dequeuing them, which is really useful for
logging metadata when handling bus-activation requests.

* dbus-daemon is not available during early-boot or shutdown.
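
To illustrate the "copied or sealed" point from the Security item above,
here is a minimal userspace sketch of how a sender could prepare a sealed
memfd, so that a receiver can parse it without worrying about the memory
changing underneath it. It is not taken from the kdbus patches. It only
uses memfd_create(2) and the F_ADD_SEALS/F_GET_SEALS fcntl commands; the
helper names make_sealed_payload() and payload_is_sealed() are made up
for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* All seals a receiver would want to see before trusting the contents. */
#define DEMO_SEALS (F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL)

/*
 * Hypothetical helper: create a memfd holding `len` bytes of `data` and
 * seal it so that neither its size nor its contents can change afterwards.
 * Returns the file descriptor on success, -1 on error.
 */
static int make_sealed_payload(const void *data, size_t len)
{
	int fd;

	assert(data != NULL);

	fd = memfd_create("payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	if (write(fd, data, len) != (ssize_t)len)
		goto err;

	/* After this call, write() and ftruncate() on the fd are refused. */
	if (fcntl(fd, F_ADD_SEALS, DEMO_SEALS) < 0)
		goto err;

	return fd;
err:
	close(fd);
	return -1;
}

/* Hypothetical helper: returns 1 if all seals in DEMO_SEALS are set. */
static int payload_is_sealed(int fd)
{
	int seals = fcntl(fd, F_GET_SEALS);

	return seals >= 0 && (seals & DEMO_SEALS) == DEMO_SEALS;
}
```

A receiver that checks payload_is_sealed() before mapping the fd knows
that later write() or ftruncate() calls by the sender will fail, which is
the property the paragraph above relies on.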

DBus marshaling is the de-facto standard in all major(!) Linux desktop
systems. It is well established and accepted by many DEs. It also
solves many other problems, including: policy, authentication /
authorization, well-known name registry, efficient broadcasts /
multicasts, peer discovery, bus discovery, metadata transmission, and
more.

It is a shame that we cannot use this well-established protocol for
low-latency applications. We, effectively, have to duplicate all this
code on custom UDS and other transports just because DBus is too slow.
kdbus tries to unify those efforts, so that we don't need multiple
policy implementations, name registries and peer discovery mechanisms.
Furthermore, kdbus implements comprehensive, yet optional, metadata
transmission that allows peers to be identified and authenticated in a
race-free manner (which is *not* possible with UDS).
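
As a rough sketch of what "optional" metadata means in practice: the uapi
header in this series defines one bit per metadata type in
enum kdbus_attach_flags, and a matching item type starting at
_KDBUS_ITEM_ATTACH_BASE (0x1000). The code below copies a few of those
values under DEMO_ names. That a piece of metadata is delivered only if
the sender opted in to send it and the receiver asked for it is an
assumption made for this example; the patches themselves are the
authority on the exact rules.

```c
#include <assert.h>
#include <stdint.h>

/* A few values copied from enum kdbus_attach_flags in the uapi header. */
#define DEMO_ATTACH_TIMESTAMP	(1ULL << 0)
#define DEMO_ATTACH_CREDS	(1ULL << 1)
#define DEMO_ATTACH_PIDS	(1ULL << 2)
#define DEMO_ATTACH_CMDLINE	(1ULL << 8)
#define DEMO_ATTACH_ALL		((1ULL << 14) - 1)

/* Value of _KDBUS_ITEM_ATTACH_BASE in the uapi header. */
#define DEMO_ITEM_ATTACH_BASE	0x1000ULL

/*
 * Assumed rule: metadata is attached only if the sender allows it and
 * the receiver requested it. Unknown bits are dropped.
 */
static uint64_t demo_effective_attach(uint64_t send_mask, uint64_t recv_mask)
{
	return send_mask & recv_mask & DEMO_ATTACH_ALL;
}

/*
 * Map a single attach flag to its item type. Bit n corresponds to item
 * type DEMO_ITEM_ATTACH_BASE + n. Returns 0 unless exactly one known
 * bit is set.
 */
static uint64_t demo_attach_flag_to_item(uint64_t flag)
{
	uint64_t bit = 0;

	if (flag == 0 || (flag & (flag - 1)) || !(flag & DEMO_ATTACH_ALL))
		return 0;

	while (!(flag & 1)) {
		flag >>= 1;
		bit++;
	}
	return DEMO_ITEM_ATTACH_BASE + bit;
}
```

Keeping the bit index and the item type in lockstep is what the comment
"keep these item types in sync with KDBUS_ATTACH_* flags" in the header
refers to.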

Also, kdbus provides a single transport bus with sequential message
numbering. If you use multiple channels, you cannot give any ordering
guarantees across peers (for instance, regarding parallel name-registry
changes).

Of course, some of the bits above could be implemented in userspace
alone, for example with more sophisticated memory management APIs, but
this is usually done by losing out on the other details. For example,
for many of the memory management APIs, it's hard to not require the
communicating peers to fully trust each other. And we _really_ don't
want peers to have to trust each other.

Another benefit of having this in the kernel, rather than as a userspace
daemon, is that you can now easily use the bus from the initrd, or up to
the very end when the system shuts down. On current userspace D-Bus,
this is not really possible, as this requires passing the bus instance
around between initrd and the "real" system. Such a transition of all
fds also requires keeping full state of what has already been read from
the connection fds. kdbus makes this much simpler, as we can change the
ownership of the bus, just by passing one fd over from one part to the
other.

Given the theoretical advantages above, here are some real-world
examples:

* The Tizen developers have been complaining about the high latency
of DBus for polkit'ish policy queries. That's why their
authentication framework uses custom UDS sockets (called 'Cynara').
If a UI-interaction needs multiple authentication-queries, you don't
want it to take multiple milliseconds, given that you usually want
to render the result in the same frame.

* PulseAudio doesn't use DBus for data transmission. They had to
implement their own marshaling code, transport layer and so on, just
because DBus1-latency is horrible. With kdbus, we can basically drop
this code-duplication and unify the IPC layer. Same is true for
Wayland, btw.

* By moving broadcast-transmission into the kernel, we can use the
time-slices of the sender to perform heavy operations. This is also
true for policy decisions, etc. With a userspace daemon, we cannot
perform operations in a time-slice of the caller. This makes DoS
attacks much harder.

* With priority-inheritance, we can do synchronous calls into trusted
peers and let them optionally use our time-slice to perform the
action. This allows syscall-like/binder-like method-calls into other
processes. Without priority-inheritance, this is not possible in a
secure manner (see 'priority-inheritance').

* Logging-daemons often want to attach metadata to log-messages so
debugging/filtering gets easier. If short-lived programs send
log-messages, the destination peer might not be able to read such
metadata from /proc, as the process might no longer be available at
that time. Same is true for policy-decisions like polkit does. You
cannot send off method-calls and exit. You have to wait for a reply,
even though you might not even care for it. If you don't wait, the
other side might not be able to verify your identity and as such
reject the request.

* Even though the dbus traffic on idle-systems might be low, this
doesn't mean it's not significant at boot-times or under high-load.
If you run a dbus-monitor of your choice, you will see there is a
significant number of messages exchanged during VT-switches, startup,
shutdown, suspend, wakeup, hotplugging and similar situations where
lots of control-messages are exchanged. We don't want to spend
hundreds of ms just to transmit those messages.


These patches can also be found in a git tree, the kdbus branch of
char-misc.git at:
https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/


Daniel Mack (14):
kdbus: add documentation
kdbus: add uapi header file
kdbus: add driver skeleton, ioctl entry points and utility functions
kdbus: add connection pool implementation
kdbus: add connection, queue handling and message validation code
kdbus: add node and filesystem implementation
kdbus: add code to gather metadata
kdbus: add code for notifications and matches
kdbus: add code for buses, domains and endpoints
kdbus: add name registry implementation
kdbus: add policy database implementation
kdbus: add Makefile, Kconfig and MAINTAINERS entry
kdbus: add walk-through user space example
kdbus: add selftests

Documentation/Makefile | 2 +-
Documentation/ioctl/ioctl-number.txt | 1 +
Documentation/kdbus/Makefile | 30 +
Documentation/kdbus/kdbus.bus.xml | 360 ++++
Documentation/kdbus/kdbus.connection.xml | 1252 ++++++++++++
Documentation/kdbus/kdbus.endpoint.xml | 436 ++++
Documentation/kdbus/kdbus.fs.xml | 124 ++
Documentation/kdbus/kdbus.item.xml | 840 ++++++++
Documentation/kdbus/kdbus.match.xml | 553 +++++
Documentation/kdbus/kdbus.message.xml | 1277 ++++++++++++
Documentation/kdbus/kdbus.name.xml | 711 +++++++
Documentation/kdbus/kdbus.policy.xml | 406 ++++
Documentation/kdbus/kdbus.pool.xml | 320 +++
Documentation/kdbus/kdbus.xml | 1012 ++++++++++
Documentation/kdbus/stylesheet.xsl | 16 +
MAINTAINERS | 13 +
Makefile | 1 +
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/kdbus.h | 979 +++++++++
include/uapi/linux/magic.h | 2 +
init/Kconfig | 12 +
ipc/Makefile | 2 +-
ipc/kdbus/Makefile | 22 +
ipc/kdbus/bus.c | 560 ++++++
ipc/kdbus/bus.h | 101 +
ipc/kdbus/connection.c | 2215 +++++++++++++++++++++
ipc/kdbus/connection.h | 257 +++
ipc/kdbus/domain.c | 296 +++
ipc/kdbus/domain.h | 77 +
ipc/kdbus/endpoint.c | 275 +++
ipc/kdbus/endpoint.h | 67 +
ipc/kdbus/fs.c | 510 +++++
ipc/kdbus/fs.h | 28 +
ipc/kdbus/handle.c | 617 ++++++
ipc/kdbus/handle.h | 85 +
ipc/kdbus/item.c | 339 ++++
ipc/kdbus/item.h | 64 +
ipc/kdbus/limits.h | 64 +
ipc/kdbus/main.c | 125 ++
ipc/kdbus/match.c | 559 ++++++
ipc/kdbus/match.h | 35 +
ipc/kdbus/message.c | 616 ++++++
ipc/kdbus/message.h | 133 ++
ipc/kdbus/metadata.c | 1164 +++++++++++
ipc/kdbus/metadata.h | 57 +
ipc/kdbus/names.c | 772 +++++++
ipc/kdbus/names.h | 74 +
ipc/kdbus/node.c | 910 +++++++++
ipc/kdbus/node.h | 84 +
ipc/kdbus/notify.c | 248 +++
ipc/kdbus/notify.h | 30 +
ipc/kdbus/policy.c | 489 +++++
ipc/kdbus/policy.h | 51 +
ipc/kdbus/pool.c | 728 +++++++
ipc/kdbus/pool.h | 46 +
ipc/kdbus/queue.c | 678 +++++++
ipc/kdbus/queue.h | 92 +
ipc/kdbus/reply.c | 259 +++
ipc/kdbus/reply.h | 68 +
ipc/kdbus/util.c | 201 ++
ipc/kdbus/util.h | 74 +
samples/Makefile | 3 +-
samples/kdbus/.gitignore | 1 +
samples/kdbus/Makefile | 10 +
samples/kdbus/kdbus-api.h | 114 ++
samples/kdbus/kdbus-workers.c | 1327 ++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kdbus/.gitignore | 3 +
tools/testing/selftests/kdbus/Makefile | 46 +
tools/testing/selftests/kdbus/kdbus-enum.c | 94 +
tools/testing/selftests/kdbus/kdbus-enum.h | 14 +
tools/testing/selftests/kdbus/kdbus-test.c | 923 +++++++++
tools/testing/selftests/kdbus/kdbus-test.h | 85 +
tools/testing/selftests/kdbus/kdbus-util.c | 1615 +++++++++++++++
tools/testing/selftests/kdbus/kdbus-util.h | 222 +++
tools/testing/selftests/kdbus/test-activator.c | 318 +++
tools/testing/selftests/kdbus/test-attach-flags.c | 750 +++++++
tools/testing/selftests/kdbus/test-benchmark.c | 451 +++++
tools/testing/selftests/kdbus/test-bus.c | 175 ++
tools/testing/selftests/kdbus/test-chat.c | 122 ++
tools/testing/selftests/kdbus/test-connection.c | 616 ++++++
tools/testing/selftests/kdbus/test-daemon.c | 65 +
tools/testing/selftests/kdbus/test-endpoint.c | 341 ++++
tools/testing/selftests/kdbus/test-fd.c | 789 ++++++++
tools/testing/selftests/kdbus/test-free.c | 64 +
tools/testing/selftests/kdbus/test-match.c | 441 ++++
tools/testing/selftests/kdbus/test-message.c | 731 +++++++
tools/testing/selftests/kdbus/test-metadata-ns.c | 506 +++++
tools/testing/selftests/kdbus/test-monitor.c | 176 ++
tools/testing/selftests/kdbus/test-names.c | 194 ++
tools/testing/selftests/kdbus/test-policy-ns.c | 632 ++++++
tools/testing/selftests/kdbus/test-policy-priv.c | 1269 ++++++++++++
tools/testing/selftests/kdbus/test-policy.c | 80 +
tools/testing/selftests/kdbus/test-sync.c | 369 ++++
tools/testing/selftests/kdbus/test-timeout.c | 99 +
95 files changed, 34063 insertions(+), 3 deletions(-)
create mode 100644 Documentation/kdbus/Makefile
create mode 100644 Documentation/kdbus/kdbus.bus.xml
create mode 100644 Documentation/kdbus/kdbus.connection.xml
create mode 100644 Documentation/kdbus/kdbus.endpoint.xml
create mode 100644 Documentation/kdbus/kdbus.fs.xml
create mode 100644 Documentation/kdbus/kdbus.item.xml
create mode 100644 Documentation/kdbus/kdbus.match.xml
create mode 100644 Documentation/kdbus/kdbus.message.xml
create mode 100644 Documentation/kdbus/kdbus.name.xml
create mode 100644 Documentation/kdbus/kdbus.policy.xml
create mode 100644 Documentation/kdbus/kdbus.pool.xml
create mode 100644 Documentation/kdbus/kdbus.xml
create mode 100644 Documentation/kdbus/stylesheet.xsl
create mode 100644 include/uapi/linux/kdbus.h
create mode 100644 ipc/kdbus/Makefile
create mode 100644 ipc/kdbus/bus.c
create mode 100644 ipc/kdbus/bus.h
create mode 100644 ipc/kdbus/connection.c
create mode 100644 ipc/kdbus/connection.h
create mode 100644 ipc/kdbus/domain.c
create mode 100644 ipc/kdbus/domain.h
create mode 100644 ipc/kdbus/endpoint.c
create mode 100644 ipc/kdbus/endpoint.h
create mode 100644 ipc/kdbus/fs.c
create mode 100644 ipc/kdbus/fs.h
create mode 100644 ipc/kdbus/handle.c
create mode 100644 ipc/kdbus/handle.h
create mode 100644 ipc/kdbus/item.c
create mode 100644 ipc/kdbus/item.h
create mode 100644 ipc/kdbus/limits.h
create mode 100644 ipc/kdbus/main.c
create mode 100644 ipc/kdbus/match.c
create mode 100644 ipc/kdbus/match.h
create mode 100644 ipc/kdbus/message.c
create mode 100644 ipc/kdbus/message.h
create mode 100644 ipc/kdbus/metadata.c
create mode 100644 ipc/kdbus/metadata.h
create mode 100644 ipc/kdbus/names.c
create mode 100644 ipc/kdbus/names.h
create mode 100644 ipc/kdbus/node.c
create mode 100644 ipc/kdbus/node.h
create mode 100644 ipc/kdbus/notify.c
create mode 100644 ipc/kdbus/notify.h
create mode 100644 ipc/kdbus/policy.c
create mode 100644 ipc/kdbus/policy.h
create mode 100644 ipc/kdbus/pool.c
create mode 100644 ipc/kdbus/pool.h
create mode 100644 ipc/kdbus/queue.c
create mode 100644 ipc/kdbus/queue.h
create mode 100644 ipc/kdbus/reply.c
create mode 100644 ipc/kdbus/reply.h
create mode 100644 ipc/kdbus/util.c
create mode 100644 ipc/kdbus/util.h
create mode 100644 samples/kdbus/.gitignore
create mode 100644 samples/kdbus/Makefile
create mode 100644 samples/kdbus/kdbus-api.h
create mode 100644 samples/kdbus/kdbus-workers.c
create mode 100644 tools/testing/selftests/kdbus/.gitignore
create mode 100644 tools/testing/selftests/kdbus/Makefile
create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
create mode 100644 tools/testing/selftests/kdbus/test-activator.c
create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
create mode 100644 tools/testing/selftests/kdbus/test-bus.c
create mode 100644 tools/testing/selftests/kdbus/test-chat.c
create mode 100644 tools/testing/selftests/kdbus/test-connection.c
create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
create mode 100644 tools/testing/selftests/kdbus/test-fd.c
create mode 100644 tools/testing/selftests/kdbus/test-free.c
create mode 100644 tools/testing/selftests/kdbus/test-match.c
create mode 100644 tools/testing/selftests/kdbus/test-message.c
create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
create mode 100644 tools/testing/selftests/kdbus/test-names.c
create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
create mode 100644 tools/testing/selftests/kdbus/test-policy.c
create mode 100644 tools/testing/selftests/kdbus/test-sync.c
create mode 100644 tools/testing/selftests/kdbus/test-timeout.c


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Greg Kroah-Hartman
2015-03-09 13:09:58 UTC
From: Daniel Mack <***@zonque.org>

This patch adds the header file which describes the low-level
transport protocol used by various ioctls. The header file is located
in include/uapi/linux/ as it is shared between kernel and userspace,
and it only contains data structure definitions, enums and defines
for constants.
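
To give an idea of the kind of data structure the header describes: most
variable-length data is carried as a chain of records, each starting with
a 64-bit size and a 64-bit type (struct kdbus_item in the header below).
The following userspace sketch builds and walks such a chain. It uses a
simplified stand-in for the real struct, and it assumes that each record
is padded to the next 8-byte boundary, which is consistent with the
__aligned__(8) attribute but is not spelled out in this part of the
header. All demo_ and DEMO_ names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to the next multiple of 8 (assumed item alignment). */
#define DEMO_ALIGN8(v)	(((uint64_t)(v) + 7) & ~(uint64_t)7)

/* Size of the fixed item header: a 64-bit size plus a 64-bit type. */
#define DEMO_ITEM_HDR	(2 * sizeof(uint64_t))

/*
 * Append one item at offset `off` of `buf`. The size field covers
 * header plus payload, not the trailing padding. Returns the offset
 * of the next item, or 0 if the item does not fit.
 */
static size_t demo_item_append(uint8_t *buf, size_t cap, size_t off,
			       uint64_t type, const void *payload, size_t len)
{
	uint64_t hdr[2] = { DEMO_ITEM_HDR + len, type };
	size_t padded = DEMO_ALIGN8(hdr[0]);

	assert(buf != NULL && payload != NULL);

	if (off > cap || padded > cap - off)
		return 0;

	memset(buf + off, 0, padded);		/* zero the padding too */
	memcpy(buf + off, hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), payload, len);
	return off + padded;
}

/*
 * Walk `total` bytes of items and count those of the given type.
 * Returns -1 if a header is truncated or a size field is out of bounds.
 */
static long demo_count_items(const uint8_t *buf, size_t total, uint64_t type)
{
	size_t off = 0;
	long n = 0;

	while (off < total) {
		uint64_t hdr[2];	/* size, type */

		if (total - off < sizeof(hdr))
			return -1;
		memcpy(hdr, buf + off, sizeof(hdr));

		if (hdr[0] < sizeof(hdr) || hdr[0] > total - off)
			return -1;
		if (hdr[1] == type)
			n++;

		off += DEMO_ALIGN8(hdr[0]);
	}
	return n;
}
```

The walker validates every size field before using it, since a reader of
such a chain should not assume that the writer produced well-formed data.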

The low-level kernel API of kdbus is exposed through ioctls, employed
on nodes exposed by kdbusfs. We've chosen an ioctl-based implementation
over syscalls for various reasons:

* The ioctls kdbus offers are completely specific to nodes exposed by
kdbusfs and can not be applied to any other file descriptor in a
system.

* The file descriptors derived from opening nodes in kdbusfs can only be
used for poll(), close() and the ioctls described in kdbus.h.

* Not all systems will make use of kdbus eventually, and we want to
make as many parts of the kernel as possible optional at build time.

* We want to build the kdbus code as a module, which is impossible to
do when implemented with syscalls.

* The ioctl dispatching logic does not show up in our performance
graphs; its overhead is negligible.

* For development, being able to build, load and unload a separate
module with a versioned name suffix is essential.
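
For readers less familiar with ioctls, the sketch below shows how an
ioctl request number encodes a driver-specific magic byte, a command
number and the size of the argument struct, using the standard macros
from <linux/ioctl.h>. KDBUS_IOCTL_MAGIC is the value from the header in
this patch; struct demo_cmd, DEMO_CMD_PING and the command number 0x42
are hypothetical and do not correspond to any real kdbus command.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>

#define KDBUS_IOCTL_MAGIC	0x95	/* from the header in this patch */

/* Hypothetical argument struct, for illustration only. */
struct demo_cmd {
	uint64_t size;
	uint64_t flags;
	uint64_t return_flags;
};

/* Hypothetical command: read/write argument, command number 0x42. */
#define DEMO_CMD_PING	_IOWR(KDBUS_IOCTL_MAGIC, 0x42, struct demo_cmd)

/* Returns 1 if the request number carries the kdbus magic byte. */
static int demo_is_kdbus_ioctl(unsigned int cmd)
{
	return _IOC_TYPE(cmd) == KDBUS_IOCTL_MAGIC;
}

/* Extract the command number encoded in the request. */
static unsigned int demo_ioctl_nr(unsigned int cmd)
{
	return _IOC_NR(cmd);
}

/* Extract the argument size encoded in the request. */
static unsigned int demo_ioctl_arg_size(unsigned int cmd)
{
	return _IOC_SIZE(cmd);
}
```

Because the magic byte and the argument size are part of the request
number, a request meant for another driver, or one built against a
different struct layout, does not match any command the driver knows.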

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/kdbus.h | 979 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 980 insertions(+)
create mode 100644 include/uapi/linux/kdbus.h

diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 68ceb97c458c..ddc413e1959f 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -214,6 +214,7 @@ header-y += ixjuser.h
header-y += jffs2.h
header-y += joystick.h
header-y += kcmp.h
+header-y += kdbus.h
header-y += kdev_t.h
header-y += kd.h
header-y += kernelcapi.h
diff --git a/include/uapi/linux/kdbus.h b/include/uapi/linux/kdbus.h
new file mode 100644
index 000000000000..fc1d77dd7c93
--- /dev/null
+++ b/include/uapi/linux/kdbus.h
@@ -0,0 +1,979 @@
+/*
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef _KDBUS_UAPI_H_
+#define _KDBUS_UAPI_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#define KDBUS_IOCTL_MAGIC 0x95
+#define KDBUS_SRC_ID_KERNEL (0)
+#define KDBUS_DST_ID_NAME (0)
+#define KDBUS_MATCH_ID_ANY (~0ULL)
+#define KDBUS_DST_ID_BROADCAST (~0ULL)
+#define KDBUS_FLAG_NEGOTIATE (1ULL << 63)
+
+/**
+ * struct kdbus_notify_id_change - name registry change message
+ * @id: New or former owner of the name
+ * @flags: flags field from KDBUS_HELLO_*
+ *
+ * Sent from kernel to userspace when the owner or activator of
+ * a well-known name changes.
+ *
+ * Attached to:
+ * KDBUS_ITEM_ID_ADD
+ * KDBUS_ITEM_ID_REMOVE
+ */
+struct kdbus_notify_id_change {
+ __u64 id;
+ __u64 flags;
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_notify_name_change - name registry change message
+ * @old_id: ID and flags of former owner of a name
+ * @new_id: ID and flags of new owner of a name
+ * @name: Well-known name
+ *
+ * Sent from kernel to userspace when the owner or activator of
+ * a well-known name changes.
+ *
+ * Attached to:
+ * KDBUS_ITEM_NAME_ADD
+ * KDBUS_ITEM_NAME_REMOVE
+ * KDBUS_ITEM_NAME_CHANGE
+ */
+struct kdbus_notify_name_change {
+ struct kdbus_notify_id_change old_id;
+ struct kdbus_notify_id_change new_id;
+ char name[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_creds - process credentials
+ * @uid: User ID
+ * @euid: Effective UID
+ * @suid: Saved UID
+ * @fsuid: Filesystem UID
+ * @gid: Group ID
+ * @egid: Effective GID
+ * @sgid: Saved GID
+ * @fsgid: Filesystem GID
+ *
+ * Attached to:
+ * KDBUS_ITEM_CREDS
+ */
+struct kdbus_creds {
+ __u64 uid;
+ __u64 euid;
+ __u64 suid;
+ __u64 fsuid;
+ __u64 gid;
+ __u64 egid;
+ __u64 sgid;
+ __u64 fsgid;
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_pids - process identifiers
+ * @pid: Process ID
+ * @tid: Thread ID
+ * @ppid: Parent process ID
+ *
+ * The PID and TID of a process.
+ *
+ * Attached to:
+ * KDBUS_ITEM_PIDS
+ */
+struct kdbus_pids {
+ __u64 pid;
+ __u64 tid;
+ __u64 ppid;
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_caps - process capabilities
+ * @last_cap: Highest currently known capability bit
+ * @caps: Variable number of 32-bit capabilities flags
+ *
+ * Contains a variable number of 32-bit capabilities flags.
+ *
+ * Attached to:
+ * KDBUS_ITEM_CAPS
+ */
+struct kdbus_caps {
+ __u32 last_cap;
+ __u32 caps[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_audit - audit information
+ * @sessionid: The audit session ID
+ * @loginuid: The audit login uid
+ *
+ * Attached to:
+ * KDBUS_ITEM_AUDIT
+ */
+struct kdbus_audit {
+ __u32 sessionid;
+ __u32 loginuid;
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_timestamp
+ * @seqnum: Global per-domain message sequence number
+ * @monotonic_ns: Monotonic timestamp, in nanoseconds
+ * @realtime_ns: Realtime timestamp, in nanoseconds
+ *
+ * Attached to:
+ * KDBUS_ITEM_TIMESTAMP
+ */
+struct kdbus_timestamp {
+ __u64 seqnum;
+ __u64 monotonic_ns;
+ __u64 realtime_ns;
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_vec - I/O vector for kdbus payload items
+ * @size: The size of the vector
+ * @address: Memory address of data buffer
+ * @offset: Offset in the in-message payload memory,
+ * relative to the message head
+ *
+ * Attached to:
+ * KDBUS_ITEM_PAYLOAD_VEC, KDBUS_ITEM_PAYLOAD_OFF
+ */
+struct kdbus_vec {
+ __u64 size;
+ union {
+ __u64 address;
+ __u64 offset;
+ };
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_bloom_parameter - bus-wide bloom parameters
+ * @size: Size of the bit field in bytes (m / 8)
+ * @n_hash: Number of hash functions used (k)
+ */
+struct kdbus_bloom_parameter {
+ __u64 size;
+ __u64 n_hash;
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_bloom_filter - bloom filter containing n elements
+ * @generation: Generation of the element set in the filter
+ * @data: Bit field, multiple of 8 bytes
+ */
+struct kdbus_bloom_filter {
+ __u64 generation;
+ __u64 data[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_memfd - a kdbus memfd
+ * @start: The offset into the memfd where the segment starts
+ * @size: The size of the memfd segment
+ * @fd: The file descriptor number
+ * @__pad: Padding to ensure proper alignment and size
+ *
+ * Attached to:
+ * KDBUS_ITEM_PAYLOAD_MEMFD
+ */
+struct kdbus_memfd {
+ __u64 start;
+ __u64 size;
+ int fd;
+ __u32 __pad;
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_name - a registered well-known name with its flags
+ * @flags: Flags from KDBUS_NAME_*
+ * @name: Well-known name
+ *
+ * Attached to:
+ * KDBUS_ITEM_OWNED_NAME
+ */
+struct kdbus_name {
+ __u64 flags;
+ char name[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_policy_access_type - permissions of a policy record
+ * @_KDBUS_POLICY_ACCESS_NULL: Uninitialized/invalid
+ * @KDBUS_POLICY_ACCESS_USER: Grant access to a uid
+ * @KDBUS_POLICY_ACCESS_GROUP: Grant access to gid
+ * @KDBUS_POLICY_ACCESS_WORLD: World-accessible
+ */
+enum kdbus_policy_access_type {
+ _KDBUS_POLICY_ACCESS_NULL,
+ KDBUS_POLICY_ACCESS_USER,
+ KDBUS_POLICY_ACCESS_GROUP,
+ KDBUS_POLICY_ACCESS_WORLD,
+};
+
+/**
+ * enum kdbus_policy_type - mode flags
+ * @KDBUS_POLICY_OWN: Allow to own a well-known name
+ * Implies KDBUS_POLICY_TALK and KDBUS_POLICY_SEE
+ * @KDBUS_POLICY_TALK: Allow communication to a well-known name
+ * Implies KDBUS_POLICY_SEE
+ * @KDBUS_POLICY_SEE: Allow to see a well-known name
+ */
+enum kdbus_policy_type {
+ KDBUS_POLICY_SEE = 0,
+ KDBUS_POLICY_TALK,
+ KDBUS_POLICY_OWN,
+};
+
+/**
+ * struct kdbus_policy_access - policy access item
+ * @type: One of KDBUS_POLICY_ACCESS_* types
+ * @access: Access to grant
+ * @id: For KDBUS_POLICY_ACCESS_USER, the uid
+ * For KDBUS_POLICY_ACCESS_GROUP, the gid
+ */
+struct kdbus_policy_access {
+ __u64 type; /* USER, GROUP, WORLD */
+ __u64 access; /* OWN, TALK, SEE */
+ __u64 id; /* uid, gid, 0 */
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_attach_flags - flags for metadata attachments
+ * @KDBUS_ATTACH_TIMESTAMP: Timestamp
+ * @KDBUS_ATTACH_CREDS: Credentials
+ * @KDBUS_ATTACH_PIDS: PIDs
+ * @KDBUS_ATTACH_AUXGROUPS: Auxiliary groups
+ * @KDBUS_ATTACH_NAMES: Well-known names
+ * @KDBUS_ATTACH_TID_COMM: The "comm" process identifier of the TID
+ * @KDBUS_ATTACH_PID_COMM: The "comm" process identifier of the PID
+ * @KDBUS_ATTACH_EXE: The path of the executable
+ * @KDBUS_ATTACH_CMDLINE: The process command line
+ * @KDBUS_ATTACH_CGROUP: The cgroup membership
+ * @KDBUS_ATTACH_CAPS: The process capabilities
+ * @KDBUS_ATTACH_SECLABEL: The security label
+ * @KDBUS_ATTACH_AUDIT: The audit IDs
+ * @KDBUS_ATTACH_CONN_DESCRIPTION: The human-readable connection name
+ * @_KDBUS_ATTACH_ALL: All of the above
+ * @_KDBUS_ATTACH_ANY: Wildcard match to enable any kind of
+ * metadata.
+ */
+enum kdbus_attach_flags {
+ KDBUS_ATTACH_TIMESTAMP = 1ULL << 0,
+ KDBUS_ATTACH_CREDS = 1ULL << 1,
+ KDBUS_ATTACH_PIDS = 1ULL << 2,
+ KDBUS_ATTACH_AUXGROUPS = 1ULL << 3,
+ KDBUS_ATTACH_NAMES = 1ULL << 4,
+ KDBUS_ATTACH_TID_COMM = 1ULL << 5,
+ KDBUS_ATTACH_PID_COMM = 1ULL << 6,
+ KDBUS_ATTACH_EXE = 1ULL << 7,
+ KDBUS_ATTACH_CMDLINE = 1ULL << 8,
+ KDBUS_ATTACH_CGROUP = 1ULL << 9,
+ KDBUS_ATTACH_CAPS = 1ULL << 10,
+ KDBUS_ATTACH_SECLABEL = 1ULL << 11,
+ KDBUS_ATTACH_AUDIT = 1ULL << 12,
+ KDBUS_ATTACH_CONN_DESCRIPTION = 1ULL << 13,
+ _KDBUS_ATTACH_ALL = (1ULL << 14) - 1,
+ _KDBUS_ATTACH_ANY = ~0ULL
+};
+
+/**
+ * enum kdbus_item_type - item types to chain data in a list
+ * @_KDBUS_ITEM_NULL: Uninitialized/invalid
+ * @_KDBUS_ITEM_USER_BASE: Start of user items
+ * @KDBUS_ITEM_NEGOTIATE: Negotiate supported items
+ * @KDBUS_ITEM_PAYLOAD_VEC: Vector to data
+ * @KDBUS_ITEM_PAYLOAD_OFF: Data at returned offset to message head
+ * @KDBUS_ITEM_PAYLOAD_MEMFD: Data as sealed memfd
+ * @KDBUS_ITEM_FDS: Attached file descriptors
+ * @KDBUS_ITEM_CANCEL_FD: FD used to cancel a synchronous
+ * operation by writing to it from
+ * userspace
+ * @KDBUS_ITEM_BLOOM_PARAMETER: Bus-wide bloom parameters, used with
+ * KDBUS_CMD_BUS_MAKE, carries a
+ * struct kdbus_bloom_parameter
+ * @KDBUS_ITEM_BLOOM_FILTER: Bloom filter carried with a message,
+ * used to match against a bloom mask of a
+ * connection, carries a struct
+ * kdbus_bloom_filter
+ * @KDBUS_ITEM_BLOOM_MASK: Bloom mask used to match against a
+ * message's bloom filter
+ * @KDBUS_ITEM_DST_NAME: Destination's well-known name
+ * @KDBUS_ITEM_MAKE_NAME: Name of domain, bus, endpoint
+ * @KDBUS_ITEM_ATTACH_FLAGS_SEND: Attach-flags, used for updating which
+ * metadata a connection opts in to send
+ * @KDBUS_ITEM_ATTACH_FLAGS_RECV: Attach-flags, used for updating which
+ * metadata a connection requests to
+ * receive for each received message
+ * @KDBUS_ITEM_ID: Connection ID
+ * @KDBUS_ITEM_NAME: Well-known name with flags
+ * @_KDBUS_ITEM_ATTACH_BASE: Start of metadata attach items
+ * @KDBUS_ITEM_TIMESTAMP: Timestamp
+ * @KDBUS_ITEM_CREDS: Process credentials
+ * @KDBUS_ITEM_PIDS: Process identifiers
+ * @KDBUS_ITEM_AUXGROUPS: Auxiliary process groups
+ * @KDBUS_ITEM_OWNED_NAME: A name owned by the associated
+ * connection
+ * @KDBUS_ITEM_TID_COMM: Thread ID "comm" identifier
+ * (Don't trust this, see below.)
+ * @KDBUS_ITEM_PID_COMM: Process ID "comm" identifier
+ * (Don't trust this, see below.)
+ * @KDBUS_ITEM_EXE: The path of the executable
+ * (Don't trust this, see below.)
+ * @KDBUS_ITEM_CMDLINE: The process command line
+ * (Don't trust this, see below.)
+ * @KDBUS_ITEM_CGROUP: The cgroup membership
+ * @KDBUS_ITEM_CAPS: The process capabilities
+ * @KDBUS_ITEM_SECLABEL: The security label
+ * @KDBUS_ITEM_AUDIT: The audit IDs
+ * @KDBUS_ITEM_CONN_DESCRIPTION: The connection's human-readable name
+ * (debugging)
+ * @_KDBUS_ITEM_POLICY_BASE: Start of policy items
+ * @KDBUS_ITEM_POLICY_ACCESS: Policy access block
+ * @_KDBUS_ITEM_KERNEL_BASE: Start of kernel-generated message items
+ * @KDBUS_ITEM_NAME_ADD: Notification in kdbus_notify_name_change
+ * @KDBUS_ITEM_NAME_REMOVE: Notification in kdbus_notify_name_change
+ * @KDBUS_ITEM_NAME_CHANGE: Notification in kdbus_notify_name_change
+ * @KDBUS_ITEM_ID_ADD: Notification in kdbus_notify_id_change
+ * @KDBUS_ITEM_ID_REMOVE: Notification in kdbus_notify_id_change
+ * @KDBUS_ITEM_REPLY_TIMEOUT: Timeout has been reached
+ * @KDBUS_ITEM_REPLY_DEAD: Destination died
+ *
+ * N.B: The process and thread COMM fields, as well as the CMDLINE and
+ * EXE fields may be altered by unprivileged processes and should
+ * hence *not* be used for security decisions. Peers should make use of
+ * these items only for informational purposes, such as generating log
+ * records.
+ */
+enum kdbus_item_type {
+ _KDBUS_ITEM_NULL,
+ _KDBUS_ITEM_USER_BASE,
+ KDBUS_ITEM_NEGOTIATE = _KDBUS_ITEM_USER_BASE,
+ KDBUS_ITEM_PAYLOAD_VEC,
+ KDBUS_ITEM_PAYLOAD_OFF,
+ KDBUS_ITEM_PAYLOAD_MEMFD,
+ KDBUS_ITEM_FDS,
+ KDBUS_ITEM_CANCEL_FD,
+ KDBUS_ITEM_BLOOM_PARAMETER,
+ KDBUS_ITEM_BLOOM_FILTER,
+ KDBUS_ITEM_BLOOM_MASK,
+ KDBUS_ITEM_DST_NAME,
+ KDBUS_ITEM_MAKE_NAME,
+ KDBUS_ITEM_ATTACH_FLAGS_SEND,
+ KDBUS_ITEM_ATTACH_FLAGS_RECV,
+ KDBUS_ITEM_ID,
+ KDBUS_ITEM_NAME,
+
+ /* keep these item types in sync with KDBUS_ATTACH_* flags */
+ _KDBUS_ITEM_ATTACH_BASE = 0x1000,
+ KDBUS_ITEM_TIMESTAMP = _KDBUS_ITEM_ATTACH_BASE,
+ KDBUS_ITEM_CREDS,
+ KDBUS_ITEM_PIDS,
+ KDBUS_ITEM_AUXGROUPS,
+ KDBUS_ITEM_OWNED_NAME,
+ KDBUS_ITEM_TID_COMM,
+ KDBUS_ITEM_PID_COMM,
+ KDBUS_ITEM_EXE,
+ KDBUS_ITEM_CMDLINE,
+ KDBUS_ITEM_CGROUP,
+ KDBUS_ITEM_CAPS,
+ KDBUS_ITEM_SECLABEL,
+ KDBUS_ITEM_AUDIT,
+ KDBUS_ITEM_CONN_DESCRIPTION,
+
+ _KDBUS_ITEM_POLICY_BASE = 0x2000,
+ KDBUS_ITEM_POLICY_ACCESS = _KDBUS_ITEM_POLICY_BASE,
+
+ _KDBUS_ITEM_KERNEL_BASE = 0x8000,
+ KDBUS_ITEM_NAME_ADD = _KDBUS_ITEM_KERNEL_BASE,
+ KDBUS_ITEM_NAME_REMOVE,
+ KDBUS_ITEM_NAME_CHANGE,
+ KDBUS_ITEM_ID_ADD,
+ KDBUS_ITEM_ID_REMOVE,
+ KDBUS_ITEM_REPLY_TIMEOUT,
+ KDBUS_ITEM_REPLY_DEAD,
+};
+
+/**
+ * struct kdbus_item - chain of data blocks
+ * @size: Overall data record size
+ * @type: Kdbus_item type of data
+ * @data: Generic bytes
+ * @data32: Generic 32 bit array
+ * @data64: Generic 64 bit array
+ * @str: Generic string
+ * @id: Connection ID
+ * @vec: KDBUS_ITEM_PAYLOAD_VEC
+ * @creds: KDBUS_ITEM_CREDS
+ * @audit: KDBUS_ITEM_AUDIT
+ * @timestamp: KDBUS_ITEM_TIMESTAMP
+ * @name: KDBUS_ITEM_NAME
+ * @bloom_parameter: KDBUS_ITEM_BLOOM_PARAMETER
+ * @bloom_filter: KDBUS_ITEM_BLOOM_FILTER
+ * @memfd: KDBUS_ITEM_PAYLOAD_MEMFD
+ * @name_change: KDBUS_ITEM_NAME_ADD
+ * KDBUS_ITEM_NAME_REMOVE
+ * KDBUS_ITEM_NAME_CHANGE
+ * @id_change: KDBUS_ITEM_ID_ADD
+ * KDBUS_ITEM_ID_REMOVE
+ * @policy_access: KDBUS_ITEM_POLICY_ACCESS
+ */
+struct kdbus_item {
+ __u64 size;
+ __u64 type;
+ union {
+ __u8 data[0];
+ __u32 data32[0];
+ __u64 data64[0];
+ char str[0];
+
+ __u64 id;
+ struct kdbus_vec vec;
+ struct kdbus_creds creds;
+ struct kdbus_pids pids;
+ struct kdbus_audit audit;
+ struct kdbus_caps caps;
+ struct kdbus_timestamp timestamp;
+ struct kdbus_name name;
+ struct kdbus_bloom_parameter bloom_parameter;
+ struct kdbus_bloom_filter bloom_filter;
+ struct kdbus_memfd memfd;
+ int fds[0];
+ struct kdbus_notify_name_change name_change;
+ struct kdbus_notify_id_change id_change;
+ struct kdbus_policy_access policy_access;
+ };
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_msg_flags - type of message
+ * @KDBUS_MSG_EXPECT_REPLY: Expect a reply message, used for
+ * method calls. The userspace-supplied
+ * cookie identifies the message and the
+ * respective reply carries the cookie
+ * in cookie_reply
+ * @KDBUS_MSG_NO_AUTO_START: Do not start a service if the addressed
+ * name is not currently active. This flag is
+ * not looked at by the kernel but only
+ * serves as hint for userspace implementations.
+ * @KDBUS_MSG_SIGNAL: Treat this message as signal
+ */
+enum kdbus_msg_flags {
+ KDBUS_MSG_EXPECT_REPLY = 1ULL << 0,
+ KDBUS_MSG_NO_AUTO_START = 1ULL << 1,
+ KDBUS_MSG_SIGNAL = 1ULL << 2,
+};
+
+/**
+ * enum kdbus_payload_type - type of payload carried by message
+ * @KDBUS_PAYLOAD_KERNEL: Kernel-generated simple message
+ * @KDBUS_PAYLOAD_DBUS: D-Bus marshalling "DBusDBus"
+ *
+ * Any payload-type is accepted. Common types will get added here once
+ * established.
+ */
+enum kdbus_payload_type {
+ KDBUS_PAYLOAD_KERNEL,
+ KDBUS_PAYLOAD_DBUS = 0x4442757344427573ULL,
+};
+
+/**
+ * struct kdbus_msg - the representation of a kdbus message
+ * @size: Total size of the message
+ * @flags: Message flags (KDBUS_MSG_*), userspace → kernel
+ * @priority: Message queue priority value
+ * @dst_id: 64-bit ID of the destination connection
+ * @src_id: 64-bit ID of the source connection
+ * @payload_type: Payload type (KDBUS_PAYLOAD_*)
+ * @cookie: Userspace-supplied cookie, for the connection
+ * to identify its messages
+ * @timeout_ns: The time to wait for a message reply from the peer.
+ * If there is no reply, and the send command is
+ * executed asynchronously, a kernel-generated message
+ * with an attached KDBUS_ITEM_REPLY_TIMEOUT item
+ * is sent to @src_id. For synchronously executed send
+ * command, the value denotes the maximum time the call
+ * blocks to wait for a reply. The timeout is expected in
+ * nanoseconds and as absolute CLOCK_MONOTONIC value.
+ * @cookie_reply: A reply to the requesting message with the same
+ * cookie. The requesting connection can match its
+ * request and the reply with this value
+ * @items: A list of kdbus_items containing the message payload
+ */
+struct kdbus_msg {
+ __u64 size;
+ __u64 flags;
+ __s64 priority;
+ __u64 dst_id;
+ __u64 src_id;
+ __u64 payload_type;
+ __u64 cookie;
+ union {
+ __u64 timeout_ns;
+ __u64 cookie_reply;
+ };
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_msg_info - returned message container
+ * @offset: Offset of kdbus_msg slice in pool
+ * @msg_size: Copy of the kdbus_msg.size field
+ * @return_flags: Command return flags, kernel → userspace
+ */
+struct kdbus_msg_info {
+ __u64 offset;
+ __u64 msg_size;
+ __u64 return_flags;
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_send_flags - flags for sending messages
+ * @KDBUS_SEND_SYNC_REPLY: Wait for destination connection to
+ * reply to this message. The
+ * KDBUS_CMD_SEND ioctl() will block
+ * until the reply is received, and
+ * reply.offset in struct kdbus_cmd_send
+ * will yield the offset in the sender's
+ * pool where the reply can be found.
+ * This flag is only valid if
+ * @KDBUS_MSG_EXPECT_REPLY is set as well.
+ */
+enum kdbus_send_flags {
+ KDBUS_SEND_SYNC_REPLY = 1ULL << 0,
+};
+
+/**
+ * struct kdbus_cmd_send - send message
+ * @size: Overall size of this structure
+ * @flags: Flags to change send behavior (KDBUS_SEND_*)
+ * @return_flags: Command return flags, kernel → userspace
+ * @msg_address: Storage address of the kdbus_msg to send
+ * @reply: Storage for message reply if KDBUS_SEND_SYNC_REPLY
+ * was given
+ * @items: Additional items for this command
+ */
+struct kdbus_cmd_send {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ __u64 msg_address;
+ struct kdbus_msg_info reply;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_recv_flags - flags for de-queuing messages
+ * @KDBUS_RECV_PEEK: Return the next queued message without
+ * actually de-queuing it, and without installing
+ * any file descriptors or other resources. It is
+ * usually used to determine the activating
+ * connection of a bus name.
+ * @KDBUS_RECV_DROP: Drop and free the next queued message and all
+ * its resources without actually receiving it.
+ * @KDBUS_RECV_USE_PRIORITY: Only de-queue messages with the specified or
+ * higher priority (lowest values); if not set,
+ * the priority value is ignored.
+ */
+enum kdbus_recv_flags {
+ KDBUS_RECV_PEEK = 1ULL << 0,
+ KDBUS_RECV_DROP = 1ULL << 1,
+ KDBUS_RECV_USE_PRIORITY = 1ULL << 2,
+};
+
+/**
+ * enum kdbus_recv_return_flags - return flags for message receive commands
+ * @KDBUS_RECV_RETURN_INCOMPLETE_FDS: One or more file descriptors could not
+ * be installed. These descriptors in
+ * KDBUS_ITEM_FDS will carry the value -1.
+ * @KDBUS_RECV_RETURN_DROPPED_MSGS: There have been dropped messages since
+ * the last time a message was received.
+ * The 'dropped_msgs' counter contains the
+ * number of messages dropped due to pool
+ * overflows or other missed broadcasts.
+ */
+enum kdbus_recv_return_flags {
+ KDBUS_RECV_RETURN_INCOMPLETE_FDS = 1ULL << 0,
+ KDBUS_RECV_RETURN_DROPPED_MSGS = 1ULL << 1,
+};
+
+/**
+ * struct kdbus_cmd_recv - struct to de-queue a buffered message
+ * @size: Overall size of this object
+ * @flags: KDBUS_RECV_* flags, userspace → kernel
+ * @return_flags: Command return flags, kernel → userspace
+ * @priority: Minimum priority of the messages to de-queue. Lowest
+ * values have the highest priority.
+ * @dropped_msgs: In case there were any dropped messages since the last
+ * time a message was received, this will be set to the
+ * number of lost messages and
+ * KDBUS_RECV_RETURN_DROPPED_MSGS will be set in
+ * 'return_flags'. This can only happen if the ioctl
+ * returns 0 or EAGAIN.
+ * @msg: Return storage for received message.
+ * @items: Additional items for this command.
+ *
+ * This struct is used with the KDBUS_CMD_RECV ioctl.
+ */
+struct kdbus_cmd_recv {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ __s64 priority;
+ __u64 dropped_msgs;
+ struct kdbus_msg_info msg;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_cmd_free - struct to free a slice of memory in the pool
+ * @size: Overall size of this structure
+ * @flags: Flags for the free command, userspace → kernel
+ * @return_flags: Command return flags, kernel → userspace
+ * @offset: The offset of the memory slice, as returned by other
+ * ioctls
+ * @items: Additional items to modify the behavior
+ *
+ * This struct is used with the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_cmd_free {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ __u64 offset;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_hello_flags - flags for struct kdbus_cmd_hello
+ * @KDBUS_HELLO_ACCEPT_FD: The connection allows the reception of
+ * any passed file descriptors
+ * @KDBUS_HELLO_ACTIVATOR: Special-purpose connection which registers
+ * a well-known name for a process to be started
+ * when traffic arrives
+ * @KDBUS_HELLO_POLICY_HOLDER: Special-purpose connection which registers
+ * policy entries for a name. The provided name
+ * is not activated and not registered with the
+ * name database, it only allows unprivileged
+ * connections to acquire a name, talk or discover
+ * a service
+ * @KDBUS_HELLO_MONITOR: Special-purpose connection to monitor
+ * bus traffic
+ */
+enum kdbus_hello_flags {
+ KDBUS_HELLO_ACCEPT_FD = 1ULL << 0,
+ KDBUS_HELLO_ACTIVATOR = 1ULL << 1,
+ KDBUS_HELLO_POLICY_HOLDER = 1ULL << 2,
+ KDBUS_HELLO_MONITOR = 1ULL << 3,
+};
+
+/**
+ * struct kdbus_cmd_hello - struct to say hello to kdbus
+ * @size: The total size of the structure
+ * @flags: Connection flags (KDBUS_HELLO_*), userspace → kernel
+ * @return_flags: Command return flags, kernel → userspace
+ * @attach_flags_send: Mask of metadata to attach to each message sent
+ * off by this connection (KDBUS_ATTACH_*)
+ * @attach_flags_recv: Mask of metadata to attach to each message received
+ * by the new connection (KDBUS_ATTACH_*)
+ * @bus_flags: The flags field copied verbatim from the original
+ * KDBUS_CMD_BUS_MAKE ioctl. It's intended to be useful
+ * to do negotiation of features of the payload that is
+ * transferred (kernel → userspace)
+ * @id: The ID of this connection (kernel → userspace)
+ * @pool_size: Size of the connection's buffer where the received
+ * messages are placed
+ * @offset: Pool offset where items are returned to report
+ * additional information about the bus and the newly
+ * created connection.
+ * @items_size: Size of buffer returned in the pool slice at @offset.
+ * @id128: Unique 128-bit ID of the bus (kernel → userspace)
+ * @items: A list of items
+ *
+ * This struct is used with the KDBUS_CMD_HELLO ioctl.
+ */
+struct kdbus_cmd_hello {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ __u64 attach_flags_send;
+ __u64 attach_flags_recv;
+ __u64 bus_flags;
+ __u64 id;
+ __u64 pool_size;
+ __u64 offset;
+ __u64 items_size;
+ __u8 id128[16];
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_info - connection information
+ * @size: total size of the struct
+ * @id: 64bit object ID
+ * @flags: object creation flags
+ * @items: list of items
+ *
+ * Note that the user is responsible for freeing the allocated memory with
+ * the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_info {
+ __u64 size;
+ __u64 id;
+ __u64 flags;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_list_flags - what to include into the returned list
+ * @KDBUS_LIST_UNIQUE: active connections
+ * @KDBUS_LIST_ACTIVATORS: activator connections
+ * @KDBUS_LIST_NAMES: known well-known names
+ * @KDBUS_LIST_QUEUED: queued-up names
+ */
+enum kdbus_list_flags {
+ KDBUS_LIST_UNIQUE = 1ULL << 0,
+ KDBUS_LIST_NAMES = 1ULL << 1,
+ KDBUS_LIST_ACTIVATORS = 1ULL << 2,
+ KDBUS_LIST_QUEUED = 1ULL << 3,
+};
+
+/**
+ * struct kdbus_cmd_list - list connections
+ * @size: overall size of this object
+ * @flags: flags for the query (KDBUS_LIST_*), userspace → kernel
+ * @return_flags: command return flags, kernel → userspace
+ * @offset: Offset in the caller's pool buffer where an array of
+ * kdbus_info objects is stored.
+ * The user must use KDBUS_CMD_FREE to free the
+ * allocated memory.
+ * @list_size: size of returned list in bytes
+ * @items: Items for the command. Reserved for future use.
+ *
+ * This structure is used with the KDBUS_CMD_LIST ioctl.
+ */
+struct kdbus_cmd_list {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ __u64 offset;
+ __u64 list_size;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * struct kdbus_cmd_info - struct used for KDBUS_CMD_CONN_INFO ioctl
+ * @size: The total size of the struct
+ * @flags: Flags for this ioctl, userspace → kernel
+ * @return_flags: Command return flags, kernel → userspace
+ * @id: The 64-bit ID of the connection. If set to zero, passing
+ * @name is required. kdbus will look up the name to
+ * determine the ID in this case.
+ * @attach_flags: Set of attach flags to specify the set of information
+ * to receive, userspace → kernel
+ * @offset: Returned offset in the caller's pool buffer where the
+ * kdbus_info struct result is stored. The user must
+ * use KDBUS_CMD_FREE to free the allocated memory.
+ * @info_size: Output buffer to report size of data at @offset.
+ * @items: The optional item list, containing the
+ * well-known name to look up as a KDBUS_ITEM_NAME.
+ * Only needed in case @id is zero.
+ *
+ * On success, the KDBUS_CMD_CONN_INFO ioctl will return 0 and @offset will
+ * tell the user the offset in the connection pool buffer at which to find the
+ * result in a struct kdbus_info.
+ */
+struct kdbus_cmd_info {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ __u64 id;
+ __u64 attach_flags;
+ __u64 offset;
+ __u64 info_size;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_cmd_match_flags - flags to control the KDBUS_CMD_MATCH_ADD ioctl
+ * @KDBUS_MATCH_REPLACE: If entries with the supplied cookie already
+ * exist, remove them before installing the new
+ * matches.
+ */
+enum kdbus_cmd_match_flags {
+ KDBUS_MATCH_REPLACE = 1ULL << 0,
+};
+
+/**
+ * struct kdbus_cmd_match - struct to add or remove matches
+ * @size: The total size of the struct
+ * @flags: Flags for match command (KDBUS_MATCH_*),
+ * userspace → kernel
+ * @return_flags: Command return flags, kernel → userspace
+ * @cookie: Userspace supplied cookie. When removing, the cookie
+ * identifies the match to remove
+ * @items: A list of items for additional information
+ *
+ * This structure is used with the KDBUS_CMD_MATCH_ADD and
+ * KDBUS_CMD_MATCH_REMOVE ioctl.
+ */
+struct kdbus_cmd_match {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ __u64 cookie;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * enum kdbus_make_flags - Flags for KDBUS_CMD_{BUS,ENDPOINT}_MAKE
+ * @KDBUS_MAKE_ACCESS_GROUP: Make the bus or endpoint node group-accessible
+ * @KDBUS_MAKE_ACCESS_WORLD: Make the bus or endpoint node world-accessible
+ */
+enum kdbus_make_flags {
+ KDBUS_MAKE_ACCESS_GROUP = 1ULL << 0,
+ KDBUS_MAKE_ACCESS_WORLD = 1ULL << 1,
+};
+
+/**
+ * enum kdbus_name_flags - flags for KDBUS_CMD_NAME_ACQUIRE
+ * @KDBUS_NAME_REPLACE_EXISTING: Try to replace name of other connections
+ * @KDBUS_NAME_ALLOW_REPLACEMENT: Allow the replacement of the name
+ * @KDBUS_NAME_QUEUE: Name should be queued if busy
+ * @KDBUS_NAME_IN_QUEUE: Name is queued
+ * @KDBUS_NAME_ACTIVATOR: Name is owned by an activator connection
+ */
+enum kdbus_name_flags {
+ KDBUS_NAME_REPLACE_EXISTING = 1ULL << 0,
+ KDBUS_NAME_ALLOW_REPLACEMENT = 1ULL << 1,
+ KDBUS_NAME_QUEUE = 1ULL << 2,
+ KDBUS_NAME_IN_QUEUE = 1ULL << 3,
+ KDBUS_NAME_ACTIVATOR = 1ULL << 4,
+};
+
+/**
+ * struct kdbus_cmd - generic ioctl payload
+ * @size: Overall size of this structure
+ * @flags: Flags for this ioctl, userspace → kernel
+ * @return_flags: Ioctl return flags, kernel → userspace
+ * @items: Additional items to modify the behavior
+ *
+ * This is a generic ioctl payload object. It's used by all ioctls that only
+ * take flags and items as input.
+ */
+struct kdbus_cmd {
+ __u64 size;
+ __u64 flags;
+ __u64 return_flags;
+ struct kdbus_item items[0];
+} __attribute__((__aligned__(8)));
+
+/**
+ * Ioctl API
+ *
+ * KDBUS_CMD_BUS_MAKE: After opening the "control" node, this command
+ * creates a new bus with the specified
+ * name. The bus is immediately shut down and
+ * cleaned up when the opened file descriptor is
+ * closed.
+ *
+ * KDBUS_CMD_ENDPOINT_MAKE: Creates a new named special endpoint to talk to
+ * the bus. Such endpoints usually carry a more
+ * restrictive policy and grant restricted access
+ * to specific applications.
+ * KDBUS_CMD_ENDPOINT_UPDATE: Update the properties of a custom endpoint. Used
+ * to update the policy.
+ *
+ * KDBUS_CMD_HELLO: By opening the bus node, a connection is
+ * created. After a HELLO the opened connection
+ * becomes an active peer on the bus.
+ * KDBUS_CMD_UPDATE: Update the properties of a connection. Used to
+ * update the metadata subscription mask and
+ * policy.
+ * KDBUS_CMD_BYEBYE: Disconnect a connection. If there are no
+ * messages queued up in the connection's pool,
+ * the call succeeds, and the handle is rendered
+ * unusable. Otherwise, -EBUSY is returned without
+ * any further side-effects.
+ * KDBUS_CMD_FREE: Release the allocated memory in the receiver's
+ * pool.
+ * KDBUS_CMD_CONN_INFO: Retrieve credentials and properties of the
+ * initial creator of the connection. The data was
+ * stored at registration time and does not
+ * necessarily represent the connected process or
+ * the actual state of the process.
+ * KDBUS_CMD_BUS_CREATOR_INFO: Retrieve information of the creator of the bus
+ * a connection is attached to.
+ *
+ * KDBUS_CMD_SEND: Send a message and pass data from userspace to
+ * the kernel.
+ * KDBUS_CMD_RECV: Receive a message from the kernel which is
+ * placed in the receiver's pool.
+ *
+ * KDBUS_CMD_NAME_ACQUIRE: Request a well-known bus name to associate with
+ * the connection. Well-known names are used to
+ * address a peer on the bus.
+ * KDBUS_CMD_NAME_RELEASE: Release a well-known name the connection
+ * currently owns.
+ * KDBUS_CMD_LIST: Retrieve the list of all currently registered
+ * well-known and unique names.
+ *
+ * KDBUS_CMD_MATCH_ADD: Install a match that selects which broadcast
+ * messages are delivered to the connection.
+ * KDBUS_CMD_MATCH_REMOVE: Remove a current match for broadcast messages.
+ */
+enum kdbus_ioctl_type {
+ /* bus owner (00-0f) */
+ KDBUS_CMD_BUS_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x00,
+ struct kdbus_cmd),
+
+ /* endpoint owner (10-1f) */
+ KDBUS_CMD_ENDPOINT_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x10,
+ struct kdbus_cmd),
+ KDBUS_CMD_ENDPOINT_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x11,
+ struct kdbus_cmd),
+
+ /* connection owner (80-ff) */
+ KDBUS_CMD_HELLO = _IOWR(KDBUS_IOCTL_MAGIC, 0x80,
+ struct kdbus_cmd_hello),
+ KDBUS_CMD_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x81,
+ struct kdbus_cmd),
+ KDBUS_CMD_BYEBYE = _IOW(KDBUS_IOCTL_MAGIC, 0x82,
+ struct kdbus_cmd),
+ KDBUS_CMD_FREE = _IOW(KDBUS_IOCTL_MAGIC, 0x83,
+ struct kdbus_cmd_free),
+ KDBUS_CMD_CONN_INFO = _IOR(KDBUS_IOCTL_MAGIC, 0x84,
+ struct kdbus_cmd_info),
+ KDBUS_CMD_BUS_CREATOR_INFO = _IOR(KDBUS_IOCTL_MAGIC, 0x85,
+ struct kdbus_cmd_info),
+ KDBUS_CMD_LIST = _IOR(KDBUS_IOCTL_MAGIC, 0x86,
+ struct kdbus_cmd_list),
+
+ KDBUS_CMD_SEND = _IOW(KDBUS_IOCTL_MAGIC, 0x90,
+ struct kdbus_cmd_send),
+ KDBUS_CMD_RECV = _IOR(KDBUS_IOCTL_MAGIC, 0x91,
+ struct kdbus_cmd_recv),
+
+ KDBUS_CMD_NAME_ACQUIRE = _IOW(KDBUS_IOCTL_MAGIC, 0xa0,
+ struct kdbus_cmd),
+ KDBUS_CMD_NAME_RELEASE = _IOW(KDBUS_IOCTL_MAGIC, 0xa1,
+ struct kdbus_cmd),
+
+ KDBUS_CMD_MATCH_ADD = _IOW(KDBUS_IOCTL_MAGIC, 0xb0,
+ struct kdbus_cmd_match),
+ KDBUS_CMD_MATCH_REMOVE = _IOW(KDBUS_IOCTL_MAGIC, 0xb1,
+ struct kdbus_cmd_match),
+};
+
+#endif /* _KDBUS_UAPI_H_ */
--
2.3.1
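
As a reading aid for the header above, here is a minimal userspace sketch of
the fixed part of struct kdbus_msg and of the KDBUS_PAYLOAD_DBUS constant. It
is not part of the patch: demo_msg, DEMO_PAYLOAD_DBUS and demo_payload_tag()
are hypothetical names, and the struct is a hand-written mirror using
<stdint.h> types instead of the kernel's __u64/__s64. It only illustrates
that timeout_ns and cookie_reply share the same 8 bytes, and that the payload
type value spells "DBusDBus" when read from the most significant byte down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the fixed header of struct kdbus_msg. */
struct demo_msg {
	uint64_t size;
	uint64_t flags;
	int64_t priority;
	uint64_t dst_id;
	uint64_t src_id;
	uint64_t payload_type;
	uint64_t cookie;
	union {
		uint64_t timeout_ns;   /* time to wait for a reply */
		uint64_t cookie_reply; /* cookie of the request being answered */
	};
};

/* Eight 64-bit fields; the two union members overlap. */
_Static_assert(sizeof(struct demo_msg) == 64, "unexpected header size");

/* Same value as KDBUS_PAYLOAD_DBUS in the header above. */
#define DEMO_PAYLOAD_DBUS 0x4442757344427573ULL

/* Spell out a payload-type value, most significant byte first. */
static void demo_payload_tag(uint64_t payload_type, char out[9])
{
	int i;

	assert(out != NULL);
	for (i = 0; i < 8; i++)
		out[i] = (char)((payload_type >> (56 - 8 * i)) & 0xff);
	out[8] = '\0';
}
```

Because the two union members overlap, a message can carry either a reply
timeout or a reply cookie, but not both.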

Greg Kroah-Hartman
2015-03-09 13:10:14 UTC
From: Daniel Mack <***@zonque.org>

Add the basic driver structure.

handle.c is the main ioctl command dispatcher that calls into other parts
of the driver.
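
To illustrate the kind of checking the dispatcher does before a command is
handed on, here is a minimal userspace sketch loosely modeled on
kdbus_args_verify() in handle.c. It is a simplification under stated
assumptions, not the driver code: struct demo_arg and demo_args_verify() are
hypothetical names, and the input is a plain array of item type values
rather than a chain of struct kdbus_item records.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one item type a command accepts. */
struct demo_arg {
	uint64_t type;   /* accepted item type */
	bool mandatory;  /* must appear at least once */
	bool multiple;   /* may appear more than once */
	bool seen;       /* set while verifying */
};

/*
 * Check a list of item types against the table of accepted arguments.
 * Returns 0 on success and -EINVAL for an unknown type, a repeated type
 * that is not marked 'multiple', or a missing mandatory type.
 */
static int demo_args_verify(struct demo_arg *argv, size_t argc,
			    const uint64_t *types, size_t n_types)
{
	size_t i, j;

	assert(argv != NULL || argc == 0);

	for (j = 0; j < argc; j++)
		argv[j].seen = false;

	for (i = 0; i < n_types; i++) {
		for (j = 0; j < argc; j++)
			if (argv[j].type == types[i])
				break;
		if (j >= argc)
			return -EINVAL; /* type not accepted by this command */

		if (argv[j].seen && !argv[j].multiple)
			return -EINVAL; /* duplicate of a single-use item */

		argv[j].seen = true;
	}

	for (j = 0; j < argc; j++)
		if (argv[j].mandatory && !argv[j].seen)
			return -EINVAL; /* required item missing */

	return 0;
}
```

Keeping the accepted types in a per-command table lets one verification
routine serve every ioctl, which matches how the handlers in this patch
describe their arguments.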

main.c contains the code that creates the initial domain at startup, and
util.c has utility functions such as item iterators that are shared with
other files.
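
The item iterators mentioned above walk a chain of variable-size records. A
minimal userspace sketch of such a walk follows. It assumes that each record
starts with 64-bit size and type fields, as struct kdbus_item does, and that
consecutive records are padded to 8-byte boundaries; the padding rule is an
assumption made for this example. All demo_* names are hypothetical and are
not the helpers from util.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of an item record: header, then payload. */
struct demo_item {
	uint64_t size;  /* header plus payload, excluding trailing padding */
	uint64_t type;
	uint8_t data[]; /* payload bytes */
};

#define DEMO_ALIGN8(v)     (((v) + 7ULL) & ~7ULL)
#define DEMO_ITEM_HDR_SIZE offsetof(struct demo_item, data)

/* Write one record at offset 'off'; returns the offset of the next record. */
static size_t demo_item_put(uint8_t *buf, size_t off, uint64_t type,
			    const void *payload, size_t len)
{
	struct demo_item *item = (struct demo_item *)(buf + off);

	assert(off % 8 == 0);
	item->size = DEMO_ITEM_HDR_SIZE + len;
	item->type = type;
	memcpy(item->data, payload, len);
	return off + DEMO_ALIGN8(item->size);
}

/* Count the records in buf[0..total); returns -1 if the chain is malformed. */
static int demo_items_count(const uint8_t *buf, size_t total)
{
	size_t off = 0;
	int n = 0;

	while (off < total) {
		const struct demo_item *item =
			(const struct demo_item *)(buf + off);

		/* The header must fit, and the record must fit in what is left. */
		if (total - off < DEMO_ITEM_HDR_SIZE ||
		    item->size < DEMO_ITEM_HDR_SIZE ||
		    item->size > total - off)
			return -1;

		off += DEMO_ALIGN8(item->size);
		n++;
	}

	/* A well-formed chain ends exactly at the end of the buffer. */
	return off == total ? n : -1;
}
```

The buffer passed in should be 8-byte aligned, for example a uint64_t array
cast to uint8_t *. Each size is checked before the cursor advances, so a
truncated or corrupted chain is rejected without reading past the buffer.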

limits.h describes limits on things like maximum data structure sizes,
number of messages per user and suchlike. Some of the numbers currently
picked are rough ideas of what might be sufficient and are probably
rather conservative.

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
Documentation/ioctl/ioctl-number.txt | 1 +
ipc/kdbus/handle.c | 617 +++++++++++++++++++++++++++++++++++
ipc/kdbus/handle.h | 85 +++++
ipc/kdbus/limits.h | 64 ++++
ipc/kdbus/main.c | 125 +++++++
ipc/kdbus/util.c | 201 ++++++++++++
ipc/kdbus/util.h | 74 +++++
7 files changed, 1167 insertions(+)
create mode 100644 ipc/kdbus/handle.c
create mode 100644 ipc/kdbus/handle.h
create mode 100644 ipc/kdbus/limits.h
create mode 100644 ipc/kdbus/main.c
create mode 100644 ipc/kdbus/util.c
create mode 100644 ipc/kdbus/util.h

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 8136e1fd30fd..54e091ebb862 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -292,6 +292,7 @@ Code Seq#(hex) Include File Comments
0x92 00-0F drivers/usb/mon/mon_bin.c
0x93 60-7F linux/auto_fs.h
0x94 all fs/btrfs/ioctl.h
+0x95 all uapi/linux/kdbus.h kdbus IPC driver
0x97 00-7F fs/ceph/ioctl.h Ceph file system
0x99 00-0F 537-Addinboard driver
<mailto:***@buks.ipn.de>
diff --git a/ipc/kdbus/handle.c b/ipc/kdbus/handle.c
new file mode 100644
index 000000000000..f72dbe513b4a
--- /dev/null
+++ b/ipc/kdbus/handle.c
@@ -0,0 +1,617 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/kdev_t.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "fs.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "names.h"
+#include "domain.h"
+#include "policy.h"
+
+static int kdbus_args_verify(struct kdbus_args *args)
+{
+ struct kdbus_item *item;
+ size_t i;
+ int ret;
+
+ KDBUS_ITEMS_FOREACH(item, args->items, args->items_size) {
+ struct kdbus_arg *arg = NULL;
+
+ if (!KDBUS_ITEM_VALID(item, args->items, args->items_size))
+ return -EINVAL;
+
+ for (i = 0; i < args->argc; ++i)
+ if (args->argv[i].type == item->type)
+ break;
+ if (i >= args->argc)
+ return -EINVAL;
+
+ arg = &args->argv[i];
+
+ ret = kdbus_item_validate(item);
+ if (ret < 0)
+ return ret;
+
+ if (arg->item && !arg->multiple)
+ return -EINVAL;
+
+ arg->item = item;
+ }
+
+ if (!KDBUS_ITEMS_END(item, args->items, args->items_size))
+ return -EINVAL;
+
+ for (i = 0; i < args->argc; ++i)
+ if (args->argv[i].mandatory && !args->argv[i].item)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int kdbus_args_negotiate(struct kdbus_args *args)
+{
+ struct kdbus_item __user *user;
+ struct kdbus_item *negotiation;
+ size_t i, j, num;
+
+ /*
+ * If KDBUS_FLAG_NEGOTIATE is set, we overwrite the flags field with
+ * the set of supported flags. Furthermore, if a KDBUS_ITEM_NEGOTIATE
+ * item is passed, we iterate its payload (array of u64, each set to an
+ * item type) and clear all unsupported item-types to 0.
+ * The caller might do this recursively, if other flags or objects are
+ * embedded in the payload itself.
+ */
+
+ if (args->cmd->flags & KDBUS_FLAG_NEGOTIATE) {
+ if (put_user(args->allowed_flags & ~KDBUS_FLAG_NEGOTIATE,
+ &args->user->flags))
+ return -EFAULT;
+ }
+
+ if (args->argc < 1 || args->argv[0].type != KDBUS_ITEM_NEGOTIATE ||
+ !args->argv[0].item)
+ return 0;
+
+ negotiation = args->argv[0].item;
+ user = (struct kdbus_item __user *)
+ ((u8 __user *)args->user +
+ ((u8 *)negotiation - (u8 *)args->cmd));
+ num = KDBUS_ITEM_PAYLOAD_SIZE(negotiation) / sizeof(u64);
+
+ for (i = 0; i < num; ++i) {
+ for (j = 0; j < args->argc; ++j)
+ if (negotiation->data64[i] == args->argv[j].type)
+ break;
+
+ if (j < args->argc)
+ continue;
+
+ /* this item is not supported, clear it out */
+ negotiation->data64[i] = 0;
+ if (put_user(negotiation->data64[i], &user->data64[i]))
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+/**
+ * __kdbus_args_parse() - parse payload of kdbus command
+ * @args: object to parse data into
+ * @argp: user-space location of command payload to parse
+ * @type_size: overall size of command payload to parse
+ * @items_offset: offset of items array in command payload
+ * @out: output variable to store pointer to copied payload
+ *
+ * This parses the ioctl payload at user-space location @argp into @args. @args
+ * must be pre-initialized by the caller to reflect the supported flags and
+ * items of this command. This parser will then copy the command payload into
+ * kernel-space, verify correctness and consistency and cache pointers to parsed
+ * items and other data in @args.
+ *
+ * If this function succeeded, you must call kdbus_args_clear() to release
+ * allocated resources before destroying @args.
+ *
+ * Return: On failure a negative error code is returned. Otherwise, 1 is
+ * returned if negotiation was requested, 0 if not.
+ */
+int __kdbus_args_parse(struct kdbus_args *args, void __user *argp,
+ size_t type_size, size_t items_offset, void **out)
+{
+ int ret;
+
+ args->cmd = kdbus_memdup_user(argp, type_size, KDBUS_CMD_MAX_SIZE);
+ if (IS_ERR(args->cmd))
+ return PTR_ERR(args->cmd);
+
+ args->cmd->return_flags = 0;
+ args->user = argp;
+ args->items = (void *)((u8 *)args->cmd + items_offset);
+ args->items_size = args->cmd->size - items_offset;
+
+ if (args->cmd->flags & ~args->allowed_flags) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ ret = kdbus_args_verify(args);
+ if (ret < 0)
+ goto error;
+
+ ret = kdbus_args_negotiate(args);
+ if (ret < 0)
+ goto error;
+
+ *out = args->cmd;
+ return !!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE);
+
+error:
+ return kdbus_args_clear(args, ret);
+}
+
+/**
+ * kdbus_args_clear() - release allocated command resources
+ * @args: object to release resources of
+ * @ret: return value of this command
+ *
+ * This frees all allocated resources on @args and copies the command result
+ * flags into user-space. @ret is usually returned unchanged by this function,
+ * so it can be used in the final 'return' statement of the command handler.
+ *
+ * Return: -EFAULT if return values cannot be copied into user-space, otherwise
+ * @ret is returned unchanged.
+ */
+int kdbus_args_clear(struct kdbus_args *args, int ret)
+{
+ if (!args)
+ return ret;
+
+ if (!IS_ERR_OR_NULL(args->cmd)) {
+ if (put_user(args->cmd->return_flags,
+ &args->user->return_flags))
+ ret = -EFAULT;
+ kfree(args->cmd);
+ args->cmd = NULL;
+ }
+
+ return ret;
+}
+
+/**
+ * enum kdbus_handle_type - type a handle can be of
+ * @KDBUS_HANDLE_NONE: no type set, yet
+ * @KDBUS_HANDLE_BUS_OWNER: bus owner
+ * @KDBUS_HANDLE_EP_OWNER: endpoint owner
+ * @KDBUS_HANDLE_CONNECTED: endpoint connection after HELLO
+ */
+enum kdbus_handle_type {
+ KDBUS_HANDLE_NONE,
+ KDBUS_HANDLE_BUS_OWNER,
+ KDBUS_HANDLE_EP_OWNER,
+ KDBUS_HANDLE_CONNECTED,
+};
+
+/**
+ * struct kdbus_handle - handle to the kdbus system
+ * @rwlock: handle lock
+ * @type: type of this handle (KDBUS_HANDLE_*)
+ * @bus_owner: bus this handle owns
+ * @ep_owner: endpoint this handle owns
+ * @conn: connection this handle owns
+ * @privileged: Flag to mark a handle as privileged
+ */
+struct kdbus_handle {
+ struct rw_semaphore rwlock;
+
+ enum kdbus_handle_type type;
+ union {
+ struct kdbus_bus *bus_owner;
+ struct kdbus_ep *ep_owner;
+ struct kdbus_conn *conn;
+ };
+
+ bool privileged:1;
+};
+
+static int kdbus_handle_open(struct inode *inode, struct file *file)
+{
+ struct kdbus_handle *handle;
+ struct kdbus_node *node;
+ int ret;
+
+ node = kdbus_node_from_inode(inode);
+ if (!kdbus_node_acquire(node))
+ return -ESHUTDOWN;
+
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ init_rwsem(&handle->rwlock);
+ handle->type = KDBUS_HANDLE_NONE;
+
+ if (node->type == KDBUS_NODE_ENDPOINT) {
+ struct kdbus_ep *ep = kdbus_ep_from_node(node);
+ struct kdbus_bus *bus = ep->bus;
+
+ /*
+ * A connection is privileged if it is opened on an endpoint
+ * without custom policy and either:
+ * * the user has CAP_IPC_OWNER in the domain user namespace
+ * or
+ * * the caller's euid matches the uid of the bus creator
+ */
+ if (!ep->user &&
+ (ns_capable(bus->domain->user_namespace, CAP_IPC_OWNER) ||
+ uid_eq(file->f_cred->euid, bus->node.uid)))
+ handle->privileged = true;
+ }
+
+ file->private_data = handle;
+ ret = 0;
+
+exit:
+ kdbus_node_release(node);
+ return ret;
+}
+
+static int kdbus_handle_release(struct inode *inode, struct file *file)
+{
+ struct kdbus_handle *handle = file->private_data;
+
+ switch (handle->type) {
+ case KDBUS_HANDLE_BUS_OWNER:
+ if (handle->bus_owner) {
+ kdbus_node_deactivate(&handle->bus_owner->node);
+ kdbus_bus_unref(handle->bus_owner);
+ }
+ break;
+ case KDBUS_HANDLE_EP_OWNER:
+ if (handle->ep_owner) {
+ kdbus_node_deactivate(&handle->ep_owner->node);
+ kdbus_ep_unref(handle->ep_owner);
+ }
+ break;
+ case KDBUS_HANDLE_CONNECTED:
+ kdbus_conn_disconnect(handle->conn, false);
+ kdbus_conn_unref(handle->conn);
+ break;
+ case KDBUS_HANDLE_NONE:
+ /* nothing to clean up */
+ break;
+ }
+
+ kfree(handle);
+
+ return 0;
+}
+
+static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
+ void __user *argp)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_node *node = file_inode(file)->i_private;
+ struct kdbus_domain *domain;
+ int ret = 0;
+
+ if (!kdbus_node_acquire(node))
+ return -ESHUTDOWN;
+
+ /*
+ * The parent of control-nodes is always a domain, make sure to pin it
+ * so the parent is actually valid.
+ */
+ domain = kdbus_domain_from_node(node->parent);
+ if (!kdbus_node_acquire(&domain->node)) {
+ kdbus_node_release(node);
+ return -ESHUTDOWN;
+ }
+
+ switch (cmd) {
+ case KDBUS_CMD_BUS_MAKE: {
+ struct kdbus_bus *bus;
+
+ bus = kdbus_cmd_bus_make(domain, argp);
+ if (IS_ERR_OR_NULL(bus)) {
+ ret = PTR_ERR_OR_ZERO(bus);
+ break;
+ }
+
+ handle->type = KDBUS_HANDLE_BUS_OWNER;
+ handle->bus_owner = bus;
+ break;
+ }
+
+ default:
+ ret = -EBADFD;
+ break;
+ }
+
+ kdbus_node_release(&domain->node);
+ kdbus_node_release(node);
+ return ret;
+}
+
+static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_node *node = file_inode(file)->i_private;
+ struct kdbus_ep *ep, *file_ep = kdbus_ep_from_node(node);
+ struct kdbus_conn *conn;
+ int ret = 0;
+
+ if (!kdbus_node_acquire(node))
+ return -ESHUTDOWN;
+
+ switch (cmd) {
+ case KDBUS_CMD_ENDPOINT_MAKE:
+ /* creating custom endpoints is a privileged operation */
+ if (!handle->privileged) {
+ ret = -EPERM;
+ break;
+ }
+
+ ep = kdbus_cmd_ep_make(file_ep->bus, buf);
+ if (IS_ERR_OR_NULL(ep)) {
+ ret = PTR_ERR_OR_ZERO(ep);
+ break;
+ }
+
+ handle->type = KDBUS_HANDLE_EP_OWNER;
+ handle->ep_owner = ep;
+ break;
+
+ case KDBUS_CMD_HELLO:
+ conn = kdbus_cmd_hello(file_ep, handle->privileged, buf);
+ if (IS_ERR_OR_NULL(conn)) {
+ ret = PTR_ERR_OR_ZERO(conn);
+ break;
+ }
+
+ handle->type = KDBUS_HANDLE_CONNECTED;
+ handle->conn = conn;
+ break;
+
+ default:
+ ret = -EBADFD;
+ break;
+ }
+
+ kdbus_node_release(node);
+ return ret;
+}
+
+static long kdbus_handle_ioctl_ep_owner(struct file *file, unsigned int command,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_ep *ep = handle->ep_owner;
+ int ret;
+
+ if (!kdbus_node_acquire(&ep->node))
+ return -ESHUTDOWN;
+
+ switch (command) {
+ case KDBUS_CMD_ENDPOINT_UPDATE:
+ ret = kdbus_cmd_ep_update(ep, buf);
+ break;
+ default:
+ ret = -EBADFD;
+ break;
+ }
+
+ kdbus_node_release(&ep->node);
+ return ret;
+}
+
+static long kdbus_handle_ioctl_connected(struct file *file,
+ unsigned int command, void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_conn *conn = handle->conn;
+ struct kdbus_conn *release_conn = NULL;
+ int ret;
+
+ release_conn = conn;
+ ret = kdbus_conn_acquire(release_conn);
+ if (ret < 0)
+ return ret;
+
+ switch (command) {
+ case KDBUS_CMD_BYEBYE:
+ /*
+ * BYEBYE is special; we must not acquire a connection when
+ * calling into kdbus_conn_disconnect() or we will deadlock,
+ * because kdbus_conn_disconnect() will wait for all acquired
+ * references to be dropped.
+ */
+ kdbus_conn_release(release_conn);
+ release_conn = NULL;
+ ret = kdbus_cmd_byebye_unlocked(conn, buf);
+ break;
+ case KDBUS_CMD_NAME_ACQUIRE:
+ ret = kdbus_cmd_name_acquire(conn, buf);
+ break;
+ case KDBUS_CMD_NAME_RELEASE:
+ ret = kdbus_cmd_name_release(conn, buf);
+ break;
+ case KDBUS_CMD_LIST:
+ ret = kdbus_cmd_list(conn, buf);
+ break;
+ case KDBUS_CMD_CONN_INFO:
+ ret = kdbus_cmd_conn_info(conn, buf);
+ break;
+ case KDBUS_CMD_BUS_CREATOR_INFO:
+ ret = kdbus_cmd_bus_creator_info(conn, buf);
+ break;
+ case KDBUS_CMD_UPDATE:
+ ret = kdbus_cmd_update(conn, buf);
+ break;
+ case KDBUS_CMD_MATCH_ADD:
+ ret = kdbus_cmd_match_add(conn, buf);
+ break;
+ case KDBUS_CMD_MATCH_REMOVE:
+ ret = kdbus_cmd_match_remove(conn, buf);
+ break;
+ case KDBUS_CMD_SEND:
+ ret = kdbus_cmd_send(conn, file, buf);
+ break;
+ case KDBUS_CMD_RECV:
+ ret = kdbus_cmd_recv(conn, buf);
+ break;
+ case KDBUS_CMD_FREE:
+ ret = kdbus_cmd_free(conn, buf);
+ break;
+ default:
+ ret = -EBADFD;
+ break;
+ }
+
+ kdbus_conn_release(release_conn);
+ return ret;
+}
+
+static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_node *node = kdbus_node_from_inode(file_inode(file));
+ void __user *argp = (void __user *)arg;
+ long ret = -EBADFD;
+
+ switch (cmd) {
+ case KDBUS_CMD_BUS_MAKE:
+ case KDBUS_CMD_ENDPOINT_MAKE:
+ case KDBUS_CMD_HELLO:
+ /* bail out early if already typed */
+ if (handle->type != KDBUS_HANDLE_NONE)
+ break;
+
+ down_write(&handle->rwlock);
+ if (handle->type == KDBUS_HANDLE_NONE) {
+ if (node->type == KDBUS_NODE_CONTROL)
+ ret = kdbus_handle_ioctl_control(file, cmd,
+ argp);
+ else if (node->type == KDBUS_NODE_ENDPOINT)
+ ret = kdbus_handle_ioctl_ep(file, cmd, argp);
+ }
+ up_write(&handle->rwlock);
+ break;
+
+ case KDBUS_CMD_ENDPOINT_UPDATE:
+ case KDBUS_CMD_BYEBYE:
+ case KDBUS_CMD_NAME_ACQUIRE:
+ case KDBUS_CMD_NAME_RELEASE:
+ case KDBUS_CMD_LIST:
+ case KDBUS_CMD_CONN_INFO:
+ case KDBUS_CMD_BUS_CREATOR_INFO:
+ case KDBUS_CMD_UPDATE:
+ case KDBUS_CMD_MATCH_ADD:
+ case KDBUS_CMD_MATCH_REMOVE:
+ case KDBUS_CMD_SEND:
+ case KDBUS_CMD_RECV:
+ case KDBUS_CMD_FREE:
+ down_read(&handle->rwlock);
+ if (handle->type == KDBUS_HANDLE_EP_OWNER)
+ ret = kdbus_handle_ioctl_ep_owner(file, cmd, argp);
+ else if (handle->type == KDBUS_HANDLE_CONNECTED)
+ ret = kdbus_handle_ioctl_connected(file, cmd, argp);
+ up_read(&handle->rwlock);
+ break;
+ default:
+ ret = -ENOTTY;
+ break;
+ }
+
+ return ret < 0 ? ret : 0;
+}
+
+static unsigned int kdbus_handle_poll(struct file *file,
+ struct poll_table_struct *wait)
+{
+ struct kdbus_handle *handle = file->private_data;
+ unsigned int mask = POLLOUT | POLLWRNORM;
+ int ret;
+
+ /* Only a connected endpoint can read/write data */
+ down_read(&handle->rwlock);
+ if (handle->type != KDBUS_HANDLE_CONNECTED) {
+ up_read(&handle->rwlock);
+ return POLLERR | POLLHUP;
+ }
+ up_read(&handle->rwlock);
+
+ ret = kdbus_conn_acquire(handle->conn);
+ if (ret < 0)
+ return POLLERR | POLLHUP;
+
+ poll_wait(file, &handle->conn->wait, wait);
+
+ if (!list_empty(&handle->conn->queue.msg_list) ||
+ atomic_read(&handle->conn->lost_count) > 0)
+ mask |= POLLIN | POLLRDNORM;
+
+ kdbus_conn_release(handle->conn);
+
+ return mask;
+}
+
+static int kdbus_handle_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct kdbus_handle *handle = file->private_data;
+ int ret = -EBADFD;
+
+ if (down_read_trylock(&handle->rwlock)) {
+ if (handle->type == KDBUS_HANDLE_CONNECTED)
+ ret = kdbus_pool_mmap(handle->conn->pool, vma);
+ up_read(&handle->rwlock);
+ }
+ return ret;
+}
+
+const struct file_operations kdbus_handle_ops = {
+ .owner = THIS_MODULE,
+ .open = kdbus_handle_open,
+ .release = kdbus_handle_release,
+ .poll = kdbus_handle_poll,
+ .llseek = noop_llseek,
+ .unlocked_ioctl = kdbus_handle_ioctl,
+ .mmap = kdbus_handle_mmap,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = kdbus_handle_ioctl,
+#endif
+};
diff --git a/ipc/kdbus/handle.h b/ipc/kdbus/handle.h
new file mode 100644
index 000000000000..93a372d554a2
--- /dev/null
+++ b/ipc/kdbus/handle.h
@@ -0,0 +1,85 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_HANDLE_H
+#define __KDBUS_HANDLE_H
+
+#include <linux/fs.h>
+#include <uapi/linux/kdbus.h>
+
+extern const struct file_operations kdbus_handle_ops;
+
+/**
+ * struct kdbus_arg - information and state of a single ioctl command item
+ * @type: item type
+ * @item: set by the parser to the first found item of this type
+ * @multiple: whether multiple items of this type are allowed
+ * @mandatory: whether at least one item of this type is required
+ *
+ * This structure describes a single item in an ioctl command payload. The
+ * caller has to pre-fill the type and flags; the parser then uses this
+ * information to verify the ioctl payload. @item is set by the parser to point
+ * to the first occurrence of the item.
+ */
+struct kdbus_arg {
+ u64 type;
+ struct kdbus_item *item;
+ bool multiple : 1;
+ bool mandatory : 1;
+};
+
+/**
+ * struct kdbus_args - information and state of ioctl command parser
+ * @allowed_flags: set of flags this command supports
+ * @argc: number of items in @argv
+ * @argv: array of items this command supports
+ * @user: set by parser to user-space location of current command
+ * @cmd: set by parser to kernel copy of command payload
+ * @items: points to item array in @cmd
+ * @items_size: size of @items in bytes
+ *
+ * This structure is used to parse ioctl command payloads on each invocation.
+ * The ioctl handler has to pre-fill the flags and allowed items before passing
+ * the object to kdbus_args_parse(). The parser will copy the command payload
+ * into kernel-space and verify the correctness of the data.
+ */
+struct kdbus_args {
+ u64 allowed_flags;
+ size_t argc;
+ struct kdbus_arg *argv;
+
+ struct kdbus_cmd __user *user;
+ struct kdbus_cmd *cmd;
+
+ struct kdbus_item *items;
+ size_t items_size;
+};
+
+int __kdbus_args_parse(struct kdbus_args *args, void __user *argp,
+ size_t type_size, size_t items_offset, void **out);
+int kdbus_args_clear(struct kdbus_args *args, int ret);
+
+#define kdbus_args_parse(_args, _argp, _v) \
+ ({ \
+ BUILD_BUG_ON(offsetof(typeof(**(_v)), size) != \
+ offsetof(struct kdbus_cmd, size)); \
+ BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) != \
+ offsetof(struct kdbus_cmd, flags)); \
+ BUILD_BUG_ON(offsetof(typeof(**(_v)), return_flags) != \
+ offsetof(struct kdbus_cmd, return_flags)); \
+ __kdbus_args_parse((_args), (_argp), sizeof(**(_v)), \
+ offsetof(typeof(**(_v)), items), \
+ (void **)(_v)); \
+ })
+
+#endif
diff --git a/ipc/kdbus/limits.h b/ipc/kdbus/limits.h
new file mode 100644
index 000000000000..6450f58cffcf
--- /dev/null
+++ b/ipc/kdbus/limits.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DEFAULTS_H
+#define __KDBUS_DEFAULTS_H
+
+#include <linux/kernel.h>
+
+/* maximum size of message header and items */
+#define KDBUS_MSG_MAX_SIZE SZ_8K
+
+/* maximum number of message items */
+#define KDBUS_MSG_MAX_ITEMS 128
+
+/* maximum number of memfd items per message */
+#define KDBUS_MSG_MAX_MEMFD_ITEMS 16
+
+/* max size of ioctl command data */
+#define KDBUS_CMD_MAX_SIZE SZ_32K
+
+/* maximum number of inflight fds in a target queue per user */
+#define KDBUS_CONN_MAX_FDS_PER_USER 16
+
+/* maximum message payload size */
+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE SZ_2M
+
+/* maximum size of bloom bit field in bytes */
+#define KDBUS_BUS_BLOOM_MAX_SIZE SZ_4K
+
+/* maximum length of well-known bus name */
+#define KDBUS_NAME_MAX_LEN 255
+
+/* maximum length of bus, domain, ep name */
+#define KDBUS_SYSNAME_MAX_LEN 63
+
+/* maximum number of matches per connection */
+#define KDBUS_MATCH_MAX 256
+
+/* maximum number of queued messages from the same individual user */
+#define KDBUS_CONN_MAX_MSGS 256
+
+/* maximum number of well-known names per connection */
+#define KDBUS_CONN_MAX_NAMES 256
+
+/* maximum number of queued requests waiting for a reply */
+#define KDBUS_CONN_MAX_REQUESTS_PENDING 128
+
+/* maximum number of connections per user in one domain */
+#define KDBUS_USER_MAX_CONN 1024
+
+/* maximum number of buses per user in one domain */
+#define KDBUS_USER_MAX_BUSES 16
+
+#endif
diff --git a/ipc/kdbus/main.c b/ipc/kdbus/main.c
new file mode 100644
index 000000000000..785f529d98b7
--- /dev/null
+++ b/ipc/kdbus/main.c
@@ -0,0 +1,125 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+
+#include "util.h"
+#include "fs.h"
+#include "handle.h"
+#include "metadata.h"
+#include "node.h"
+
+/*
+ * This is a simplified outline of the internal kdbus object relations, for
+ * those interested in the inner life of the driver implementation.
+ *
+ * From a mount point's (domain's) perspective:
+ *
+ * struct kdbus_domain
+ * |» struct kdbus_user *user (many, owned)
+ * '» struct kdbus_node node (embedded)
+ * |» struct kdbus_node children (many, referenced)
+ * |» struct kdbus_node *parent (pinned)
+ * '» struct kdbus_bus (many, pinned)
+ * |» struct kdbus_node node (embedded)
+ * '» struct kdbus_ep (many, pinned)
+ * |» struct kdbus_node node (embedded)
+ * |» struct kdbus_bus *bus (pinned)
+ * |» struct kdbus_conn conn_list (many, pinned)
+ * | |» struct kdbus_ep *ep (pinned)
+ * | |» struct kdbus_name_entry *activator_of (owned)
+ * | |» struct kdbus_match_db *match_db (owned)
+ * | |» struct kdbus_meta *meta (owned)
+ * | |» struct kdbus_match_db *match_db (owned)
+ * | | '» struct kdbus_match_entry (many, owned)
+ * | |
+ * | |» struct kdbus_pool *pool (owned)
+ * | | '» struct kdbus_pool_slice *slices (many, owned)
+ * | | '» struct kdbus_pool *pool (pinned)
+ * | |
+ * | |» struct kdbus_user *user (pinned)
+ * | `» struct kdbus_queue_entry entries (many, embedded)
+ * | |» struct kdbus_pool_slice *slice (pinned)
+ * | |» struct kdbus_conn_reply *reply (owned)
+ * | '» struct kdbus_user *user (pinned)
+ * |
+ * '» struct kdbus_user *user (pinned)
+ * '» struct kdbus_policy_db policy_db (embedded)
+ * |» struct kdbus_policy_db_entry (many, owned)
+ * | |» struct kdbus_conn (pinned)
+ * | '» struct kdbus_ep (pinned)
+ * |
+ * '» struct kdbus_policy_db_cache_entry (many, owned)
+ * '» struct kdbus_conn (pinned)
+ *
+ * For the life-time of a file descriptor derived from calling open() on a file
+ * inside the mount point:
+ *
+ * struct kdbus_handle
+ * |» struct kdbus_meta *meta (owned)
+ * |» struct kdbus_ep *ep (pinned)
+ * |» struct kdbus_conn *conn (owned)
+ * '» struct kdbus_ep *ep (owned)
+ */
+
+/* kdbus mount-point /sys/fs/kdbus */
+static struct kobject *kdbus_dir;
+
+/* global module option to apply a mask to exported metadata */
+unsigned long long kdbus_meta_attach_mask = KDBUS_ATTACH_TIMESTAMP |
+ KDBUS_ATTACH_CREDS |
+ KDBUS_ATTACH_PIDS |
+ KDBUS_ATTACH_AUXGROUPS |
+ KDBUS_ATTACH_NAMES |
+ KDBUS_ATTACH_SECLABEL |
+ KDBUS_ATTACH_CONN_DESCRIPTION;
+MODULE_PARM_DESC(attach_flags_mask, "Attach-flags mask for exported metadata");
+module_param_named(attach_flags_mask, kdbus_meta_attach_mask, ullong, 0644);
+
+static int __init kdbus_init(void)
+{
+ int ret;
+
+ kdbus_dir = kobject_create_and_add(KBUILD_MODNAME, fs_kobj);
+ if (!kdbus_dir)
+ return -ENOMEM;
+
+ ret = kdbus_fs_init();
+ if (ret < 0) {
+ pr_err("cannot register filesystem: %d\n", ret);
+ goto exit_dir;
+ }
+
+ pr_info("initialized\n");
+ return 0;
+
+exit_dir:
+ kobject_put(kdbus_dir);
+ return ret;
+}
+
+static void __exit kdbus_exit(void)
+{
+ kdbus_fs_exit();
+ kobject_put(kdbus_dir);
+}
+
+module_init(kdbus_init);
+module_exit(kdbus_exit);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
+MODULE_ALIAS_FS(KBUILD_MODNAME "fs");
diff --git a/ipc/kdbus/util.c b/ipc/kdbus/util.c
new file mode 100644
index 000000000000..eaa806a27997
--- /dev/null
+++ b/ipc/kdbus/util.c
@@ -0,0 +1,201 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/capability.h>
+#include <linux/cred.h>
+#include <linux/ctype.h>
+#include <linux/err.h>
+#include <linux/file.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+#include <linux/user_namespace.h>
+
+#include "limits.h"
+#include "util.h"
+
+/**
+ * kdbus_copy_from_user() - copy aligned data from user-space
+ * @dest: target buffer in kernel memory
+ * @user_ptr: user-provided source buffer
+ * @size: memory size to copy from user
+ *
+ * This copies @size bytes from @user_ptr into the kernel, just like
+ * copy_from_user() does. But we enforce an 8-byte alignment and reject any
+ * unaligned user-space pointers.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size)
+{
+ if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
+ return -EFAULT;
+
+ if (copy_from_user(dest, user_ptr, size))
+ return -EFAULT;
+
+ return 0;
+}
+
+/**
+ * kdbus_memdup_user() - copy dynamically sized object from user-space
+ * @user_ptr: user-provided source buffer
+ * @sz_min: minimum object size
+ * @sz_max: maximum object size
+ *
+ * This copies a dynamically sized object from user-space into kernel-space. We
+ * require the object to have a 64-bit size field at offset 0. We read it out
+ * first, allocate a suitably sized buffer and then copy all data.
+ *
+ * The @sz_min and @sz_max parameters define possible min and max object sizes
+ * so user-space cannot trigger unbounded kernel-space allocations.
+ *
+ * The same alignment-restrictions as described in kdbus_copy_from_user() apply.
+ *
+ * Return: pointer to dynamically allocated copy, or ERR_PTR() on failure.
+ */
+void *kdbus_memdup_user(void __user *user_ptr, size_t sz_min, size_t sz_max)
+{
+ void *ptr;
+ u64 size;
+ int ret;
+
+ ret = kdbus_copy_from_user(&size, user_ptr, sizeof(size));
+ if (ret < 0)
+ return ERR_PTR(ret);
+
+ if (size < sz_min)
+ return ERR_PTR(-EINVAL);
+
+ if (size > sz_max)
+ return ERR_PTR(-EMSGSIZE);
+
+ ptr = memdup_user(user_ptr, size);
+ if (IS_ERR(ptr))
+ return ptr;
+
+ if (*(u64 *)ptr != size) {
+ kfree(ptr);
+ return ERR_PTR(-EINVAL);
+ }
+
+ return ptr;
+}
+
+/**
+ * kdbus_verify_uid_prefix() - verify UID prefix of a user-supplied name
+ * @name: user-supplied name to verify
+ * @user_ns: user-namespace to act in
+ * @kuid: Kernel internal uid of user
+ *
+ * This verifies that the user-supplied name @name is prefixed with the UID of
+ * the user. This is the default name-spacing policy we enforce on user-supplied
+ * names for public kdbus entities like buses and endpoints.
+ *
+ * The user must supply names prefixed with "<UID>-", where the UID is
+ * interpreted in the user-namespace of the domain. If the user fails to supply
+ * such a prefixed name, we reject it.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
+ kuid_t kuid)
+{
+ uid_t uid;
+ char prefix[16];
+
+ /*
+ * The kuid must have a mapping into the userns of the domain;
+ * otherwise, do not allow creation of buses or endpoints.
+ */
+ uid = from_kuid(user_ns, kuid);
+ if (uid == (uid_t) -1)
+ return -EINVAL;
+
+ snprintf(prefix, sizeof(prefix), "%u-", uid);
+ if (strncmp(name, prefix, strlen(prefix)) != 0)
+ return -EINVAL;
+
+ return 0;
+}
+
+/**
+ * kdbus_sanitize_attach_flags() - Sanitize attach flags from user-space
+ * @flags: Attach flags provided by userspace
+ * @attach_flags: A pointer where to store the valid attach flags
+ *
+ * Convert attach-flags provided by user-space into a valid mask. If the mask
+ * is invalid, an error is returned. The sanitized attach flags are stored in
+ * the output parameter.
+ *
+ * Return: 0 on success, negative error on failure.
+ */
+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags)
+{
+ /* 'any' degrades to 'all' for compatibility */
+ if (flags == _KDBUS_ATTACH_ANY)
+ flags = _KDBUS_ATTACH_ALL;
+
+ /* reject unknown attach flags */
+ if (flags & ~_KDBUS_ATTACH_ALL)
+ return -EINVAL;
+
+ *attach_flags = flags;
+ return 0;
+}
+
+/**
+ * kdbus_kvec_set() - helper utility to assemble kvec arrays
+ * @kvec: kvec entry to use
+ * @src: Source address to set in @kvec
+ * @len: Number of bytes in @src
+ * @total_len: Pointer to total length variable
+ *
+ * Set @src and @len in @kvec, and increase @total_len by @len.
+ */
+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len)
+{
+ kvec->iov_base = src;
+ kvec->iov_len = len;
+ *total_len += len;
+}
+
+static const char * const zeros = "\0\0\0\0\0\0\0";
+
+/**
+ * kdbus_kvec_pad() - conditionally write a padding kvec
+ * @kvec: kvec entry to use
+ * @len: Total length used for kvec array
+ *
+ * Check if the current total byte length of the array in @len is aligned to
+ * 8 bytes. If it isn't, fill @kvec with padding information and increase @len
+ * by the number of bytes stored in @kvec.
+ *
+ * Return: the number of added padding bytes.
+ */
+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len)
+{
+ size_t pad = KDBUS_ALIGN8(*len) - *len;
+
+ if (!pad)
+ return 0;
+
+ kvec->iov_base = (void *)zeros;
+ kvec->iov_len = pad;
+
+ *len += pad;
+
+ return pad;
+}
diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
new file mode 100644
index 000000000000..9caadb337912
--- /dev/null
+++ b/ipc/kdbus/util.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_UTIL_H
+#define __KDBUS_UTIL_H
+
+#include <linux/dcache.h>
+#include <linux/ioctl.h>
+
+#include "kdbus.h"
+
+/* all exported addresses are 64 bit */
+#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
+
+/* all exported sizes are 64 bit and data aligned to 64 bit */
+#define KDBUS_ALIGN8(s) ALIGN((s), 8)
+#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
+
+/**
+ * kdbus_member_set_user - write a structure member to user memory
+ * @_s: Variable to copy from
+ * @_b: Buffer to write to
+ * @_t: Structure type
+ * @_m: Member name in the passed structure
+ *
+ * Return: the result of copy_to_user()
+ */
+#define kdbus_member_set_user(_s, _b, _t, _m) \
+({ \
+ u64 __user *_sz = \
+ (void __user *)((u8 __user *)(_b) + offsetof(_t, _m)); \
+ copy_to_user(_sz, _s, sizeof(((_t *)0)->_m)); \
+})
+
+/**
+ * kdbus_strhash - calculate a hash
+ * @str: String
+ *
+ * Return: hash value
+ */
+static inline unsigned int kdbus_strhash(const char *str)
+{
+ unsigned long hash = init_name_hash();
+
+ while (*str)
+ hash = partial_name_hash(*str++, hash);
+
+ return end_name_hash(hash);
+}
+
+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
+ kuid_t kuid);
+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags);
+
+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size);
+void *kdbus_memdup_user(void __user *user_ptr, size_t sz_min, size_t sz_max);
+
+struct kvec;
+
+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len);
+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len);
+
+#endif
--
2.3.1

Greg Kroah-Hartman
2015-03-09 13:10:25 UTC
From: Daniel Mack <***@zonque.org>

A pool for data received from the kernel is installed for every
connection of the bus, and it is used to copy data from the kernel to
userspace clients, for messages and other information.

It is accessed when one of the following ioctls is issued:

* KDBUS_CMD_RECV, to receive a message
* KDBUS_CMD_LIST, to dump the name registry
* KDBUS_CMD_CONN_INFO, to retrieve information on a connection

The offsets returned by any of these ioctls are relative to the start
of the pool. Internally, the pool is organized in slices that are
dynamically allocated on demand. The overall size of the pool is chosen
by the connection when it connects to the bus with KDBUS_CMD_HELLO.
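
The slice bookkeeping described above can be illustrated with a small
userspace model. This is only a sketch, not the kdbus implementation:
it keeps slices in a fixed array sorted by offset instead of the list
and rb-trees used by pool.c, and all toy_* names and TOY_MAX_SLICES are
made up for illustration. It rounds request sizes up to 8 bytes, picks
the smallest free slice that fits, splits off the remainder, and merges
neighbouring free slices when a slice is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; a toy model, not the code from pool.c. */
#define TOY_ALIGN8(n)	(((n) + 7u) & ~(size_t)7u)
#define TOY_MAX_SLICES	16

/* One contiguous region of the pool; the array stays sorted by offset. */
struct toy_slice {
	size_t off;
	size_t size;
	bool free;
};

struct toy_pool {
	struct toy_slice s[TOY_MAX_SLICES];
	size_t n;
};

/* Start with a single free slice spanning the whole pool. */
static void toy_pool_init(struct toy_pool *p, size_t size)
{
	p->s[0] = (struct toy_slice){ .off = 0, .size = size, .free = true };
	p->n = 1;
}

/* Best-fit allocation; returns the slice offset, or -1 if nothing fits. */
static long toy_pool_alloc(struct toy_pool *p, size_t size)
{
	size_t want = TOY_ALIGN8(size), best = p->n, i;

	if (!size)
		return -1;

	/* pick the smallest free slice that is large enough */
	for (i = 0; i < p->n; i++)
		if (p->s[i].free && p->s[i].size >= want &&
		    (best == p->n || p->s[i].size < p->s[best].size))
			best = i;
	if (best == p->n)
		return -1;

	/* no exact match: split the remainder off into its own free slice */
	if (p->s[best].size > want) {
		if (p->n == TOY_MAX_SLICES)
			return -1;	/* toy limit: slice array is full */
		for (i = p->n; i > best + 1; i--)
			p->s[i] = p->s[i - 1];
		p->s[best + 1] = (struct toy_slice){
			.off = p->s[best].off + want,
			.size = p->s[best].size - want,
			.free = true,
		};
		p->s[best].size = want;
		p->n++;
	}

	p->s[best].free = false;
	return (long)p->s[best].off;
}

/* Fold slice i + 1 into slice i and close the gap in the array. */
static void toy_merge_next(struct toy_pool *p, size_t i)
{
	size_t j;

	assert(i + 1 < p->n);
	p->s[i].size += p->s[i + 1].size;
	for (j = i + 1; j + 1 < p->n; j++)
		p->s[j] = p->s[j + 1];
	p->n--;
}

/* Free the busy slice at @off and merge it with free neighbours. */
static int toy_pool_free(struct toy_pool *p, size_t off)
{
	size_t i;

	for (i = 0; i < p->n; i++)
		if (!p->s[i].free && p->s[i].off == off)
			break;
	if (i == p->n)
		return -1;

	p->s[i].free = true;
	if (i + 1 < p->n && p->s[i + 1].free)
		toy_merge_next(p, i);
	if (i > 0 && p->s[i - 1].free)
		toy_merge_next(p, i - 1);
	return 0;
}
```

Starting from a 64-byte pool, toy_pool_alloc(&p, 10) would occupy 16
bytes at offset 0, and a following toy_pool_alloc(&p, 8) would return
offset 16. Freeing both slices merges everything back into one free
slice of 64 bytes.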

To make a slice available for reuse, the receiver has to call
KDBUS_CMD_FREE on the offset of that slice.

To access the memory, the caller is expected to mmap() the pool into
its address space.
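
Once the pool is mapped, a receiver has to turn a returned offset into
a pointer inside that mapping. The helper below is a hypothetical
userspace sketch of that step; pool_at() is not part of the kdbus API,
and the ioctl calls that produce the offset and later free the slice
are deliberately left out. It assumes that offsets are 8-byte aligned,
which follows from pool.c rounding every slice size up to 8 bytes, and
it rejects ranges that would fall outside the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the kdbus API: translate a pool
 * offset into a pointer inside a mapping of the pool. @base and
 * @pool_size describe the mapping, @off and @len the requested range.
 * Returns NULL if the range is misaligned or out of bounds.
 */
static const void *pool_at(const void *base, size_t pool_size,
			   uint64_t off, uint64_t len)
{
	assert(base);

	/* assumption: slice offsets are always multiples of 8 */
	if (off % 8)
		return NULL;

	/* reject ranges that start or end outside the mapping */
	if (off > pool_size || len > pool_size - off)
		return NULL;

	return (const uint8_t *)base + off;
}
```

A caller would pass the address and length used for mmap() as @base and
@pool_size, read the data through the returned pointer, and only then
hand the offset back with KDBUS_CMD_FREE.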

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
ipc/kdbus/pool.c | 728 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
ipc/kdbus/pool.h | 46 ++++
2 files changed, 774 insertions(+)
create mode 100644 ipc/kdbus/pool.c
create mode 100644 ipc/kdbus/pool.h

diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
new file mode 100644
index 000000000000..139bb77056b3
--- /dev/null
+++ b/ipc/kdbus/pool.c
@@ -0,0 +1,728 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/aio.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "pool.h"
+#include "util.h"
+
+/**
+ * struct kdbus_pool - the receiver's buffer
+ * @f: The backing shmem file
+ * @size: The size of the file
+ * @accounted_size: Currently accounted memory in bytes
+ * @lock: Pool data lock
+ * @slices: All slices sorted by address
+ * @slices_busy: Tree of allocated slices
+ * @slices_free: Tree of free slices
+ *
+ * The receiver's buffer, managed as a pool of allocated and free
+ * slices containing the queued messages.
+ *
+ * Messages sent with KDBUS_CMD_SEND are copied directly by the
+ * sending process into the receiver's pool.
+ *
+ * Messages received with KDBUS_CMD_RECV just return the offset
+ * to the data placed in the pool.
+ *
+ * The internally allocated memory needs to be returned by the receiver
+ * with KDBUS_CMD_FREE.
+ */
+struct kdbus_pool {
+ struct file *f;
+ size_t size;
+ size_t accounted_size;
+ struct mutex lock;
+
+ struct list_head slices;
+ struct rb_root slices_busy;
+ struct rb_root slices_free;
+};
+
+/**
+ * struct kdbus_pool_slice - allocated element in kdbus_pool
+ * @pool: Pool this slice belongs to
+ * @off: Offset of slice in the shmem file
+ * @size: Size of slice
+ * @entry: Entry in "all slices" list
+ * @rb_node: Entry in free or busy tree
+ * @free: Unused slice
+ * @accounted: Accounted as queue slice
+ * @ref_kernel: Kernel holds a reference
+ * @ref_user: Userspace holds a reference
+ *
+ * The pool has one or more slices, always spanning the entire size of the
+ * pool.
+ *
+ * Every slice is an element in a list sorted by the buffer address, to
+ * provide access to the next neighbor slice.
+ *
+ * Every slice is a member of either the busy or the free tree. The free
+ * tree is organized by slice size, the busy tree is organized by buffer
+ * offset.
+ */
+struct kdbus_pool_slice {
+ struct kdbus_pool *pool;
+ size_t off;
+ size_t size;
+
+ struct list_head entry;
+ struct rb_node rb_node;
+
+ bool free:1;
+ bool accounted:1;
+ bool ref_kernel:1;
+ bool ref_user:1;
+};
+
+static struct kdbus_pool_slice *kdbus_pool_slice_new(struct kdbus_pool *pool,
+ size_t off, size_t size)
+{
+ struct kdbus_pool_slice *slice;
+
+ slice = kzalloc(sizeof(*slice), GFP_KERNEL);
+ if (!slice)
+ return NULL;
+
+ slice->pool = pool;
+ slice->off = off;
+ slice->size = size;
+ slice->free = true;
+ return slice;
+}
+
+/* insert a slice into the free tree */
+static void kdbus_pool_add_free_slice(struct kdbus_pool *pool,
+ struct kdbus_pool_slice *slice)
+{
+ struct rb_node **n;
+ struct rb_node *pn = NULL;
+
+ n = &pool->slices_free.rb_node;
+ while (*n) {
+ struct kdbus_pool_slice *pslice;
+
+ pn = *n;
+ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
+ if (slice->size < pslice->size)
+ n = &pn->rb_left;
+ else
+ n = &pn->rb_right;
+ }
+
+ rb_link_node(&slice->rb_node, pn, n);
+ rb_insert_color(&slice->rb_node, &pool->slices_free);
+}
+
+/* insert a slice into the busy tree */
+static void kdbus_pool_add_busy_slice(struct kdbus_pool *pool,
+ struct kdbus_pool_slice *slice)
+{
+ struct rb_node **n;
+ struct rb_node *pn = NULL;
+
+ n = &pool->slices_busy.rb_node;
+ while (*n) {
+ struct kdbus_pool_slice *pslice;
+
+ pn = *n;
+ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
+ if (slice->off < pslice->off)
+ n = &pn->rb_left;
+ else if (slice->off > pslice->off)
+ n = &pn->rb_right;
+ else
+ BUG();
+ }
+
+ rb_link_node(&slice->rb_node, pn, n);
+ rb_insert_color(&slice->rb_node, &pool->slices_busy);
+}
+
+static struct kdbus_pool_slice *kdbus_pool_find_slice(struct kdbus_pool *pool,
+ size_t off)
+{
+ struct rb_node *n;
+
+ n = pool->slices_busy.rb_node;
+ while (n) {
+ struct kdbus_pool_slice *s;
+
+ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
+ if (off < s->off)
+ n = n->rb_left;
+ else if (off > s->off)
+ n = n->rb_right;
+ else
+ return s;
+ }
+
+ return NULL;
+}
+
+/**
+ * kdbus_pool_slice_alloc() - allocate memory from a pool
+ * @pool: The receiver's pool
+ * @size: The number of bytes to allocate
+ * @accounted: Whether this slice should be accounted for
+ *
+ * The returned slice must be passed to kdbus_pool_slice_release() to
+ * free the allocated memory. The requested @size is rounded up to a
+ * multiple of 8 bytes. If @accounted is true, the slice size is added
+ * to the pool's accounted size.
+ *
+ * Return: the allocated slice on success, ERR_PTR on failure.
+ */
+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
+ size_t size, bool accounted)
+{
+ size_t slice_size = KDBUS_ALIGN8(size);
+ struct rb_node *n, *found = NULL;
+ struct kdbus_pool_slice *s;
+ int ret = 0;
+
+ if (WARN_ON(!size))
+ return ERR_PTR(-EINVAL);
+
+ /* search for a free slice with the closest matching size */
+ mutex_lock(&pool->lock);
+ n = pool->slices_free.rb_node;
+ while (n) {
+ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
+ if (slice_size < s->size) {
+ found = n;
+ n = n->rb_left;
+ } else if (slice_size > s->size) {
+ n = n->rb_right;
+ } else {
+ found = n;
+ break;
+ }
+ }
+
+ /* no slice with the minimum size found in the pool */
+ if (!found) {
+ ret = -EXFULL;
+ goto exit_unlock;
+ }
+
+ /* no exact match, use the closest one */
+ if (!n) {
+ struct kdbus_pool_slice *s_new;
+
+ s = rb_entry(found, struct kdbus_pool_slice, rb_node);
+
+ /* split-off the remainder of the size to its own slice */
+ s_new = kdbus_pool_slice_new(pool, s->off + slice_size,
+ s->size - slice_size);
+ if (!s_new) {
+ ret = -ENOMEM;
+ goto exit_unlock;
+ }
+
+ list_add(&s_new->entry, &s->entry);
+ kdbus_pool_add_free_slice(pool, s_new);
+
+ /* adjust our size now that we split-off another slice */
+ s->size = slice_size;
+ }
+
+ /* move slice from free to the busy tree */
+ rb_erase(found, &pool->slices_free);
+ kdbus_pool_add_busy_slice(pool, s);
+
+ WARN_ON(s->ref_kernel || s->ref_user);
+
+ s->ref_kernel = true;
+ s->free = false;
+ s->accounted = accounted;
+ if (accounted)
+ pool->accounted_size += s->size;
+ mutex_unlock(&pool->lock);
+
+ return s;
+
+exit_unlock:
+ mutex_unlock(&pool->lock);
+ return ERR_PTR(ret);
+}
+
+static void __kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
+{
+ struct kdbus_pool *pool = slice->pool;
+
+ /* don't free the slice if either side still holds a reference */
+ if (slice->ref_kernel || slice->ref_user)
+ return;
+
+ if (WARN_ON(slice->free))
+ return;
+
+ rb_erase(&slice->rb_node, &pool->slices_busy);
+
+ /* merge with the next free slice */
+ if (!list_is_last(&slice->entry, &pool->slices)) {
+ struct kdbus_pool_slice *s;
+
+ s = list_entry(slice->entry.next,
+ struct kdbus_pool_slice, entry);
+ if (s->free) {
+ rb_erase(&s->rb_node, &pool->slices_free);
+ list_del(&s->entry);
+ slice->size += s->size;
+ kfree(s);
+ }
+ }
+
+ /* merge with previous free slice */
+ if (pool->slices.next != &slice->entry) {
+ struct kdbus_pool_slice *s;
+
+ s = list_entry(slice->entry.prev,
+ struct kdbus_pool_slice, entry);
+ if (s->free) {
+ rb_erase(&s->rb_node, &pool->slices_free);
+ list_del(&slice->entry);
+ s->size += slice->size;
+ kfree(slice);
+ slice = s;
+ }
+ }
+
+ slice->free = true;
+ kdbus_pool_add_free_slice(pool, slice);
+}
+
+/**
+ * kdbus_pool_slice_release() - drop kernel-reference on allocated slice
+ * @slice: Slice allocated from the pool
+ *
+ * This releases the kernel-reference on the given slice. If the
+ * kernel-reference and the user-reference on a slice are dropped, the slice is
+ * returned to the pool.
+ *
+ * So far, we do not implement full ref-counting on slices. The kernel and
+ * user-space can each hold exactly one reference to a slice. Once both have
+ * been dropped, the slice is released.
+ */
+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
+{
+ struct kdbus_pool *pool;
+
+ if (!slice)
+ return;
+
+ /* @slice may be freed, so keep local ptr to @pool */
+ pool = slice->pool;
+
+ mutex_lock(&pool->lock);
+ /* kernel must own a ref to @slice to drop it */
+ WARN_ON(!slice->ref_kernel);
+ slice->ref_kernel = false;
+ /* no longer kernel-owned, de-account slice */
+ if (slice->accounted && !WARN_ON(pool->accounted_size < slice->size))
+ pool->accounted_size -= slice->size;
+ __kdbus_pool_slice_release(slice);
+ mutex_unlock(&pool->lock);
+}
+
+/**
+ * kdbus_pool_release_offset() - release a public offset
+ * @pool: pool to operate on
+ * @off: offset to release
+ *
+ * This should be called whenever user-space frees a slice given to them. It
+ * verifies the slice is available and public, and then drops it. It ensures
+ * correct locking and barriers against queues.
+ *
+ * Return: 0 on success, -ENXIO if the offset is invalid or not public.
+ */
+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off)
+{
+ struct kdbus_pool_slice *slice;
+ int ret = 0;
+
+ /* 'pool->size' is used as dummy offset for empty slices */
+ if (off == pool->size)
+ return 0;
+
+ mutex_lock(&pool->lock);
+ slice = kdbus_pool_find_slice(pool, off);
+ if (slice && slice->ref_user) {
+ slice->ref_user = false;
+ __kdbus_pool_slice_release(slice);
+ } else {
+ ret = -ENXIO;
+ }
+ mutex_unlock(&pool->lock);
+
+ return ret;
+}
+
+/**
+ * kdbus_pool_publish_empty() - publish empty slice to user-space
+ * @pool: pool to operate on
+ * @off: output storage for offset, or NULL
+ * @size: output storage for size, or NULL
+ *
+ * This is the same as kdbus_pool_slice_publish(), but uses a dummy slice with
+ * size 0. The returned offset points to the end of the pool and is never
+ * returned on real slices.
+ */
+void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size)
+{
+ if (off)
+ *off = pool->size;
+ if (size)
+ *size = 0;
+}
+
+/**
+ * kdbus_pool_slice_publish() - publish slice to user-space
+ * @slice: The slice
+ * @out_offset: Output storage for offset, or NULL
+ * @out_size: Output storage for size, or NULL
+ *
+ * This prepares a slice to be published to user-space.
+ *
+ * This call combines the following operations:
+ * * the memory region is flushed so the user's memory view is consistent
+ * * the slice is marked as referenced by user-space, so user-space has to
+ * call KDBUS_CMD_FREE to release it
+ * * the offset and size of the slice are written to the given output
+ * arguments, if non-NULL
+ */
+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
+ u64 *out_offset, u64 *out_size)
+{
+ mutex_lock(&slice->pool->lock);
+ /* kernel must own a ref to @slice to gain a user-space ref */
+ WARN_ON(!slice->ref_kernel);
+ slice->ref_user = true;
+ mutex_unlock(&slice->pool->lock);
+
+ if (out_offset)
+ *out_offset = slice->off;
+ if (out_size)
+ *out_size = slice->size;
+}
+
+/**
+ * kdbus_pool_slice_offset() - Get a slice's offset inside the pool
+ * @slice: Slice to return the offset of
+ *
+ * Return: The internal offset of @slice inside the pool.
+ */
+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice)
+{
+ return slice->off;
+}
+
+/**
+ * kdbus_pool_slice_size() - get size of a pool slice
+ * @slice: slice to query
+ *
+ * Return: size of the given slice
+ */
+size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice)
+{
+ return slice->size;
+}
+
+/**
+ * kdbus_pool_new() - create a new pool
+ * @name: Name of the (deleted) file which shows up in
+ * /proc, used for debugging
+ * @size: Maximum size of the pool
+ *
+ * Return: a new kdbus_pool on success, ERR_PTR on failure.
+ */
+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size)
+{
+ struct kdbus_pool_slice *s;
+ struct kdbus_pool *p;
+ struct file *f;
+ char *n = NULL;
+ int ret;
+
+ p = kzalloc(sizeof(*p), GFP_KERNEL);
+ if (!p)
+ return ERR_PTR(-ENOMEM);
+
+ if (name) {
+ n = kasprintf(GFP_KERNEL, KBUILD_MODNAME "-conn:%s", name);
+ if (!n) {
+ ret = -ENOMEM;
+ goto exit_free;
+ }
+ }
+
+ f = shmem_file_setup(n ?: KBUILD_MODNAME "-conn", size, 0);
+ kfree(n);
+
+ if (IS_ERR(f)) {
+ ret = PTR_ERR(f);
+ goto exit_free;
+ }
+
+ ret = get_write_access(file_inode(f));
+ if (ret < 0)
+ goto exit_put_shmem;
+
+ /* allocate first slice spanning the entire pool */
+ s = kdbus_pool_slice_new(p, 0, size);
+ if (!s) {
+ ret = -ENOMEM;
+ goto exit_put_write;
+ }
+
+ p->f = f;
+ p->size = size;
+ p->slices_free = RB_ROOT;
+ p->slices_busy = RB_ROOT;
+ mutex_init(&p->lock);
+
+ INIT_LIST_HEAD(&p->slices);
+ list_add(&s->entry, &p->slices);
+
+ kdbus_pool_add_free_slice(p, s);
+ return p;
+
+exit_put_write:
+ put_write_access(file_inode(f));
+exit_put_shmem:
+ fput(f);
+exit_free:
+ kfree(p);
+ return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_pool_free() - destroy pool
+ * @pool: The receiver's pool
+ */
+void kdbus_pool_free(struct kdbus_pool *pool)
+{
+ struct kdbus_pool_slice *s, *tmp;
+
+ if (!pool)
+ return;
+
+ list_for_each_entry_safe(s, tmp, &pool->slices, entry) {
+ list_del(&s->entry);
+ kfree(s);
+ }
+
+ put_write_access(file_inode(pool->f));
+ fput(pool->f);
+ kfree(pool);
+}
+
+/**
+ * kdbus_pool_accounted() - retrieve accounting information
+ * @pool: pool to query
+ * @size: output for overall pool size
+ * @acc: output for currently accounted size
+ *
+ * This returns accounting information of the pool. Note that the data might
+ * change after the function returns, as the pool lock is dropped. You need to
+ * protect the data via other means, if you need reliable accounting.
+ */
+void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc)
+{
+ mutex_lock(&pool->lock);
+ if (size)
+ *size = pool->size;
+ if (acc)
+ *acc = pool->accounted_size;
+ mutex_unlock(&pool->lock);
+}
+
+/**
+ * kdbus_pool_slice_copy_iovec() - copy user memory to a slice
+ * @slice: The slice to write to
+ * @off: Offset in the slice to write to
+ * @iov: iovec array, pointing to data to copy
+ * @iov_len: Number of elements in @iov
+ * @total_len: Total number of bytes described in members of @iov
+ *
+ * User memory referenced by @iov will be copied into @slice at offset @off.
+ *
+ * Return: the number of bytes copied, negative errno on failure.
+ */
+ssize_t
+kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice, loff_t off,
+ struct iovec *iov, size_t iov_len, size_t total_len)
+{
+ struct iov_iter iter;
+ ssize_t len;
+
+ if (WARN_ON(off + total_len > slice->size))
+ return -EFAULT;
+
+ off += slice->off;
+ iov_iter_init(&iter, WRITE, iov, iov_len, total_len);
+ len = vfs_iter_write(slice->pool->f, &iter, &off);
+
+ return (len >= 0 && len != total_len) ? -EFAULT : len;
+}
+
+/**
+ * kdbus_pool_slice_copy_kvec() - copy kernel memory to a slice
+ * @slice: The slice to write to
+ * @off: Offset in the slice to write to
+ * @kvec: kvec array, pointing to data to copy
+ * @kvec_len: Number of elements in @kvec
+ * @total_len: Total number of bytes described in members of @kvec
+ *
+ * Kernel memory referenced by @kvec will be copied into @slice at offset @off.
+ *
+ * Return: the number of bytes copied, negative errno on failure.
+ */
+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
+ loff_t off, struct kvec *kvec,
+ size_t kvec_len, size_t total_len)
+{
+ struct iov_iter iter;
+ mm_segment_t old_fs;
+ ssize_t len;
+
+ if (WARN_ON(off + total_len > slice->size))
+ return -EFAULT;
+
+ off += slice->off;
+ iov_iter_kvec(&iter, WRITE | ITER_KVEC, kvec, kvec_len, total_len);
+
+ old_fs = get_fs();
+ set_fs(get_ds());
+ len = vfs_iter_write(slice->pool->f, &iter, &off);
+ set_fs(old_fs);
+
+ return (len >= 0 && len != total_len) ? -EFAULT : len;
+}
+
+/**
+ * kdbus_pool_slice_copy() - copy data from one slice into another
+ * @slice_dst: destination slice
+ * @slice_src: source slice
+ *
+ * Return: 0 on success, negative error number on failure.
+ */
+int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
+ const struct kdbus_pool_slice *slice_src)
+{
+ struct file *f_src = slice_src->pool->f;
+ struct file *f_dst = slice_dst->pool->f;
+ struct inode *i_dst = file_inode(f_dst);
+ struct address_space *mapping_dst = f_dst->f_mapping;
+ const struct address_space_operations *aops = mapping_dst->a_ops;
+ unsigned long len = slice_src->size;
+ loff_t off_src = slice_src->off;
+ loff_t off_dst = slice_dst->off;
+ mm_segment_t old_fs;
+ int ret = 0;
+
+ if (WARN_ON(slice_src->size != slice_dst->size) ||
+ WARN_ON(slice_src->free || slice_dst->free))
+ return -EINVAL;
+
+ mutex_lock(&i_dst->i_mutex);
+ old_fs = get_fs();
+ set_fs(get_ds());
+ while (len > 0) {
+ unsigned long page_off;
+ unsigned long copy_len;
+ char __user *kaddr;
+ struct page *page;
+ ssize_t n_read;
+ void *fsdata;
+ long status;
+
+ page_off = off_dst & (PAGE_CACHE_SIZE - 1);
+ copy_len = min_t(unsigned long,
+ PAGE_CACHE_SIZE - page_off, len);
+
+ status = aops->write_begin(f_dst, mapping_dst, off_dst,
+ copy_len, 0, &page, &fsdata);
+ if (unlikely(status < 0)) {
+ ret = status;
+ break;
+ }
+
+ kaddr = (char __force __user *)kmap(page) + page_off;
+ n_read = f_src->f_op->read(f_src, kaddr, copy_len, &off_src);
+ kunmap(page);
+ mark_page_accessed(page);
+ flush_dcache_page(page);
+
+ if (unlikely(n_read != copy_len)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ status = aops->write_end(f_dst, mapping_dst, off_dst,
+ copy_len, copy_len, page, fsdata);
+ if (unlikely(status != copy_len)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ off_dst += copy_len;
+ len -= copy_len;
+ }
+ set_fs(old_fs);
+ mutex_unlock(&i_dst->i_mutex);
+
+ return ret;
+}
+
+/**
+ * kdbus_pool_mmap() - map the pool into the process
+ * @pool: The receiver's pool
+ * @vma: passed by mmap() syscall
+ *
+ * Return: the result of the mmap() call, negative errno on failure.
+ */
+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma)
+{
+ /* deny write access to the pool */
+ if (vma->vm_flags & VM_WRITE)
+ return -EPERM;
+ vma->vm_flags &= ~VM_MAYWRITE;
+
+ /* do not allow mapping more than the size of the file */
+ if ((vma->vm_end - vma->vm_start) > pool->size)
+ return -EFAULT;
+
+ /* replace the connection file with our shmem file */
+ if (vma->vm_file)
+ fput(vma->vm_file);
+ vma->vm_file = get_file(pool->f);
+
+ return pool->f->f_op->mmap(pool->f, vma);
+}
diff --git a/ipc/kdbus/pool.h b/ipc/kdbus/pool.h
new file mode 100644
index 000000000000..a9038213aa4d
--- /dev/null
+++ b/ipc/kdbus/pool.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_POOL_H
+#define __KDBUS_POOL_H
+
+#include <linux/uio.h>
+
+struct kdbus_pool;
+struct kdbus_pool_slice;
+
+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size);
+void kdbus_pool_free(struct kdbus_pool *pool);
+void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc);
+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma);
+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off);
+void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size);
+
+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
+ size_t size, bool accounted);
+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice);
+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
+ u64 *out_offset, u64 *out_size);
+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice);
+size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice);
+int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
+ const struct kdbus_pool_slice *slice_src);
+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
+ loff_t off, struct kvec *kvec,
+ size_t kvec_count, size_t total_len);
+ssize_t kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice,
+ loff_t off, struct iovec *iov,
+ size_t iov_count, size_t total_len);
+
+#endif
--
2.3.1
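
For readers skimming pool.c, the allocation and release logic above can be
modelled in plain user-space C. This is only a sketch under assumptions: the
sketch_* names and the fixed-size array are hypothetical, a linear scan stands
in for the size-ordered rb-tree of free slices, and locking, accounting and
the kernel/user reference flags are left out. It keeps the three ideas from
the patch: sizes are rounded up to 8 bytes, the smallest free slice that fits
is chosen and split, and a released slice is merged with free neighbours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ALIGN8(x) (((x) + 7) & ~(size_t)7)
#define SKETCH_MAX_SLICES 16

/* Hypothetical user-space model of a pool slice; not the kernel struct. */
struct sketch_slice {
	size_t off;
	size_t size;
	bool free;
};

/* Slices are kept in offset order, like the pool->slices list. */
struct sketch_pool {
	struct sketch_slice s[SKETCH_MAX_SLICES];
	size_t n;
};

static void sketch_pool_init(struct sketch_pool *p, size_t size)
{
	/* one free slice spanning the entire pool */
	p->n = 1;
	p->s[0] = (struct sketch_slice){ .off = 0, .size = size, .free = true };
}

/* Returns the slice index, or -1 if no free slice is large enough. */
static int sketch_alloc(struct sketch_pool *p, size_t size)
{
	size_t want = SKETCH_ALIGN8(size);
	int best = -1;
	size_t i;

	if (size == 0)
		return -1;

	/* best fit: smallest free slice that still holds 'want' bytes */
	for (i = 0; i < p->n; i++) {
		if (!p->s[i].free || p->s[i].size < want)
			continue;
		if (best < 0 || p->s[i].size < p->s[best].size)
			best = (int)i;
	}
	if (best < 0)
		return -1;

	/* no exact match: split off the remainder into its own free slice */
	if (p->s[best].size > want) {
		if (p->n == SKETCH_MAX_SLICES)
			return -1;
		for (i = p->n; i > (size_t)best + 1; i--)
			p->s[i] = p->s[i - 1];
		p->s[best + 1] = (struct sketch_slice){
			.off = p->s[best].off + want,
			.size = p->s[best].size - want,
			.free = true,
		};
		p->s[best].size = want;
		p->n++;
	}

	p->s[best].free = false;
	return best;
}

/* Drop entry 'idx' from the array, keeping offset order. */
static void sketch_remove(struct sketch_pool *p, size_t idx)
{
	for (; idx + 1 < p->n; idx++)
		p->s[idx] = p->s[idx + 1];
	p->n--;
}

/* Free slice 'idx', merging it with free neighbours on both sides. */
static void sketch_release(struct sketch_pool *p, size_t idx)
{
	assert(idx < p->n && !p->s[idx].free);
	p->s[idx].free = true;

	/* merge with the next free slice */
	if (idx + 1 < p->n && p->s[idx + 1].free) {
		p->s[idx].size += p->s[idx + 1].size;
		sketch_remove(p, idx + 1);
	}
	/* merge with the previous free slice */
	if (idx > 0 && p->s[idx - 1].free) {
		p->s[idx - 1].size += p->s[idx].size;
		sketch_remove(p, idx);
	}
}
```

With a 64-byte pool, a 10-byte request is rounded up to 16 and leaves a
48-byte free remainder. Releasing every slice again collapses the pool back
into a single free slice, which is the property the merge steps are there to
preserve.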

Greg Kroah-Hartman
2015-03-09 13:11:00 UTC
From: Daniel Mack <***@zonque.org>

kdbusfs is a filesystem that will expose a fresh kdbus domain context
each time it is mounted. Per mount point, there will be a 'control'
node, which can be used to create buses. fs.c contains the
implementation of that pseudo-fs. Exported inodes of 'file' type have
their i_fop set to either kdbus_handle_control_ops or
kdbus_handle_ep_ops, depending on their type. The actual dispatching
of file operations is done from handle.c.

node.c is an implementation of a kdbus object that has an id and
children, organized in an R/B tree. The tree is used by the filesystem
code for lookup and iterator functions, and to deactivate children
once the parent is deactivated. Every inode exported by kdbusfs is
backed by a kdbus_node, hence it is embedded in struct kdbus_ep,
struct kdbus_bus and struct kdbus_domain.
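
To illustrate how a hash-ordered set of children can back a seekable
directory iterator, here is a small user-space sketch. It is not the node.c
implementation: the sketch_* names are hypothetical, a sorted array replaces
the R/B tree, and it assumes that "closest" means the first child whose hash
is greater than or equal to the requested position. It models only the seek
path and ignores references and activation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child node: only a hash and a name. */
struct sketch_child {
	unsigned int hash;
	const char *name;
};

/*
 * Return the index of the first child whose hash is >= pos, or n if there
 * is none. The array must be sorted by ascending hash.
 */
static size_t sketch_find_closest(const struct sketch_child *c, size_t n,
				  unsigned int pos)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (c[mid].hash < pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Emit up to 'max' names starting at *pos and store the resume position in
 * *pos. Returns the number of names written to 'out'. Assumes all hashes
 * are smaller than UINT_MAX so that hash + 1 does not wrap.
 */
static size_t sketch_iterate(const struct sketch_child *c, size_t n,
			     unsigned int *pos, const char **out, size_t max)
{
	size_t i = sketch_find_closest(c, n, *pos);
	size_t emitted = 0;

	while (i < n && emitted < max) {
		out[emitted++] = c[i].name;
		*pos = c[i].hash + 1;	/* resume after this child */
		i++;
	}
	return emitted;
}
```

Note that resuming purely by position skips a child that shares its hash
with the last emitted one. That is the same class of conflict the iterator
comment in fs.c mentions for llseek on non-unique hash values.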

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
include/uapi/linux/magic.h | 2 +
ipc/kdbus/fs.c | 510 +++++++++++++++++++++++++
ipc/kdbus/fs.h | 28 ++
ipc/kdbus/node.c | 910 +++++++++++++++++++++++++++++++++++++++++++++
ipc/kdbus/node.h | 84 +++++
5 files changed, 1534 insertions(+)
create mode 100644 ipc/kdbus/fs.c
create mode 100644 ipc/kdbus/fs.h
create mode 100644 ipc/kdbus/node.c
create mode 100644 ipc/kdbus/node.h

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 7d664ea85ebd..1cf05c066158 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -74,4 +74,6 @@
#define BTRFS_TEST_MAGIC 0x73727279
#define NSFS_MAGIC 0x6e736673

+#define KDBUS_SUPER_MAGIC 0x44427573
+
#endif /* __LINUX_MAGIC_H__ */
diff --git a/ipc/kdbus/fs.c b/ipc/kdbus/fs.c
new file mode 100644
index 000000000000..d01f33baaa0d
--- /dev/null
+++ b/ipc/kdbus/fs.c
@@ -0,0 +1,510 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/init.h>
+#include <linux/ipc_namespace.h>
+#include <linux/magic.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/mutex.h>
+#include <linux/namei.h>
+#include <linux/pagemap.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "fs.h"
+#include "handle.h"
+#include "node.h"
+
+#define kdbus_node_from_dentry(_dentry) \
+ ((struct kdbus_node *)(_dentry)->d_fsdata)
+
+static struct inode *fs_inode_get(struct super_block *sb,
+ struct kdbus_node *node);
+
+/*
+ * Directory Management
+ */
+
+static inline unsigned char kdbus_dt_type(struct kdbus_node *node)
+{
+ switch (node->type) {
+ case KDBUS_NODE_DOMAIN:
+ case KDBUS_NODE_BUS:
+ return DT_DIR;
+ case KDBUS_NODE_CONTROL:
+ case KDBUS_NODE_ENDPOINT:
+ return DT_REG;
+ }
+
+ return DT_UNKNOWN;
+}
+
+static int fs_dir_fop_iterate(struct file *file, struct dir_context *ctx)
+{
+ struct dentry *dentry = file->f_path.dentry;
+ struct kdbus_node *parent = kdbus_node_from_dentry(dentry);
+ struct kdbus_node *old, *next = file->private_data;
+
+ /*
+ * kdbusfs directory iterator (modelled after sysfs/kernfs)
+ * When iterating kdbusfs directories, we iterate all children of the
+ * parent kdbus_node object. We use ctx->pos to store the hash of the
+ * child and file->private_data to store a reference to the next node
+ * object. If ctx->pos is not modified via llseek while you iterate a
+ * directory, then we use the file->private_data node pointer to
+ * directly access the next node in the tree.
+ * However, if you directly seek on the directory, we have to find the
+ * closest node to that position and cannot use our node pointer. This
+ * means iterating the rb-tree to find the closest match and start over
+ * from there.
+ * Note that hash values are not necessarily unique. Therefore, llseek
+ * is not guaranteed to seek to the same node that you got when you
+ * retrieved the position. Seeking to 0, 1, 2 and >=INT_MAX is safe,
+ * though. We could use the inode-number as position, but this would
+ * require another rb-tree for fast access. Kernfs and others already
+ * ignore those conflicts, so we should be fine, too.
+ */
+
+ if (!dir_emit_dots(file, ctx))
+ return 0;
+
+ /* acquire @next; if deactivated, or seek detected, find next node */
+ old = next;
+ if (next && ctx->pos == next->hash) {
+ if (kdbus_node_acquire(next))
+ kdbus_node_ref(next);
+ else
+ next = kdbus_node_next_child(parent, next);
+ } else {
+ next = kdbus_node_find_closest(parent, ctx->pos);
+ }
+ kdbus_node_unref(old);
+
+ while (next) {
+ /* emit @next */
+ file->private_data = next;
+ ctx->pos = next->hash;
+
+ kdbus_node_release(next);
+
+ if (!dir_emit(ctx, next->name, strlen(next->name), next->id,
+ kdbus_dt_type(next)))
+ return 0;
+
+ /* find next node after @next */
+ old = next;
+ next = kdbus_node_next_child(parent, next);
+ kdbus_node_unref(old);
+ }
+
+ file->private_data = NULL;
+ ctx->pos = INT_MAX;
+
+ return 0;
+}
+
+static loff_t fs_dir_fop_llseek(struct file *file, loff_t offset, int whence)
+{
+ struct inode *inode = file_inode(file);
+ loff_t ret;
+
+ /* protect f_pos against fop_iterate */
+ mutex_lock(&inode->i_mutex);
+ ret = generic_file_llseek(file, offset, whence);
+ mutex_unlock(&inode->i_mutex);
+
+ return ret;
+}
+
+static int fs_dir_fop_release(struct inode *inode, struct file *file)
+{
+ kdbus_node_unref(file->private_data);
+ return 0;
+}
+
+static const struct file_operations fs_dir_fops = {
+ .read = generic_read_dir,
+ .iterate = fs_dir_fop_iterate,
+ .llseek = fs_dir_fop_llseek,
+ .release = fs_dir_fop_release,
+};
+
+static struct dentry *fs_dir_iop_lookup(struct inode *dir,
+ struct dentry *dentry,
+ unsigned int flags)
+{
+ struct dentry *dnew = NULL;
+ struct kdbus_node *parent;
+ struct kdbus_node *node;
+ struct inode *inode;
+
+ parent = kdbus_node_from_dentry(dentry->d_parent);
+ if (!kdbus_node_acquire(parent))
+ return NULL;
+
+ /* returns reference to _acquired_ child node */
+ node = kdbus_node_find_child(parent, dentry->d_name.name);
+ if (node) {
+ dentry->d_fsdata = node;
+ inode = fs_inode_get(dir->i_sb, node);
+ if (IS_ERR(inode))
+ dnew = ERR_CAST(inode);
+ else
+ dnew = d_splice_alias(inode, dentry);
+
+ kdbus_node_release(node);
+ }
+
+ kdbus_node_release(parent);
+ return dnew;
+}
+
+static const struct inode_operations fs_dir_iops = {
+ .permission = generic_permission,
+ .lookup = fs_dir_iop_lookup,
+};
+
+/*
+ * Inode Management
+ */
+
+static const struct inode_operations fs_inode_iops = {
+ .permission = generic_permission,
+};
+
+static struct inode *fs_inode_get(struct super_block *sb,
+ struct kdbus_node *node)
+{
+ struct inode *inode;
+
+ inode = iget_locked(sb, node->id);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+ if (!(inode->i_state & I_NEW))
+ return inode;
+
+ inode->i_private = kdbus_node_ref(node);
+ inode->i_mapping->a_ops = &empty_aops;
+ inode->i_mode = node->mode & S_IALLUGO;
+ inode->i_atime = inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+ inode->i_uid = node->uid;
+ inode->i_gid = node->gid;
+
+ switch (node->type) {
+ case KDBUS_NODE_DOMAIN:
+ case KDBUS_NODE_BUS:
+ inode->i_mode |= S_IFDIR;
+ inode->i_op = &fs_dir_iops;
+ inode->i_fop = &fs_dir_fops;
+ set_nlink(inode, 2);
+ break;
+ case KDBUS_NODE_CONTROL:
+ case KDBUS_NODE_ENDPOINT:
+ inode->i_mode |= S_IFREG;
+ inode->i_op = &fs_inode_iops;
+ inode->i_fop = &kdbus_handle_ops;
+ break;
+ }
+
+ unlock_new_inode(inode);
+
+ return inode;
+}
+
+/*
+ * Superblock Management
+ */
+
+static int fs_super_dop_revalidate(struct dentry *dentry, unsigned int flags)
+{
+ struct kdbus_node *node;
+
+ /* Force lookup on negatives */
+ if (!dentry->d_inode)
+ return 0;
+
+ node = kdbus_node_from_dentry(dentry);
+
+ /* see whether the node has been removed */
+ if (!kdbus_node_is_active(node))
+ return 0;
+
+ return 1;
+}
+
+static void fs_super_dop_release(struct dentry *dentry)
+{
+ kdbus_node_unref(dentry->d_fsdata);
+}
+
+static const struct dentry_operations fs_super_dops = {
+ .d_revalidate = fs_super_dop_revalidate,
+ .d_release = fs_super_dop_release,
+};
+
+static void fs_super_sop_evict_inode(struct inode *inode)
+{
+ struct kdbus_node *node = kdbus_node_from_inode(inode);
+
+ truncate_inode_pages_final(&inode->i_data);
+ clear_inode(inode);
+ kdbus_node_unref(node);
+}
+
+static const struct super_operations fs_super_sops = {
+ .statfs = simple_statfs,
+ .drop_inode = generic_delete_inode,
+ .evict_inode = fs_super_sop_evict_inode,
+};
+
+static int fs_super_fill(struct super_block *sb)
+{
+ struct kdbus_domain *domain = sb->s_fs_info;
+ struct inode *inode;
+ int ret;
+
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = KDBUS_SUPER_MAGIC;
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_op = &fs_super_sops;
+ sb->s_time_gran = 1;
+
+ inode = fs_inode_get(sb, &domain->node);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
+
+ sb->s_root = d_make_root(inode);
+ if (!sb->s_root) {
+ /* d_make_root iput()s the inode on failure */
+ return -ENOMEM;
+ }
+
+ /* sb holds domain reference */
+ sb->s_root->d_fsdata = &domain->node;
+ sb->s_d_op = &fs_super_dops;
+
+ /* sb holds root reference */
+ domain->dentry = sb->s_root;
+
+ if (!kdbus_node_activate(&domain->node))
+ return -ESHUTDOWN;
+
+ ret = kdbus_domain_populate(domain, KDBUS_MAKE_ACCESS_WORLD);
+ if (ret < 0)
+ return ret;
+
+ sb->s_flags |= MS_ACTIVE;
+ return 0;
+}
+
+static void fs_super_kill(struct super_block *sb)
+{
+ struct kdbus_domain *domain = sb->s_fs_info;
+
+ if (domain) {
+ kdbus_node_deactivate(&domain->node);
+ domain->dentry = NULL;
+ }
+
+ kill_anon_super(sb);
+
+ if (domain)
+ kdbus_domain_unref(domain);
+}
+
+static int fs_super_set(struct super_block *sb, void *data)
+{
+ int ret;
+
+ ret = set_anon_super(sb, data);
+ if (!ret)
+ sb->s_fs_info = data;
+
+ return ret;
+}
+
+static struct dentry *fs_super_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name,
+ void *data)
+{
+ struct kdbus_domain *domain;
+ struct super_block *sb;
+ int ret;
+
+ domain = kdbus_domain_new(KDBUS_MAKE_ACCESS_WORLD);
+ if (IS_ERR(domain))
+ return ERR_CAST(domain);
+
+ sb = sget(fs_type, NULL, fs_super_set, flags, domain);
+ if (IS_ERR(sb)) {
+ kdbus_node_deactivate(&domain->node);
+ kdbus_domain_unref(domain);
+ return ERR_CAST(sb);
+ }
+
+ WARN_ON(sb->s_fs_info != domain);
+ WARN_ON(sb->s_root);
+
+ ret = fs_super_fill(sb);
+ if (ret < 0) {
+ /* calls into ->kill_sb() when done */
+ deactivate_locked_super(sb);
+ return ERR_PTR(ret);
+ }
+
+ return dget(sb->s_root);
+}
+
+static struct file_system_type fs_type = {
+ .name = KBUILD_MODNAME "fs",
+ .owner = THIS_MODULE,
+ .mount = fs_super_mount,
+ .kill_sb = fs_super_kill,
+ .fs_flags = FS_USERNS_MOUNT,
+};
+
+/**
+ * kdbus_fs_init() - register kdbus filesystem
+ *
+ * This registers a filesystem with the VFS layer. The filesystem is called
+ * `KBUILD_MODNAME "fs"', which usually resolves to `kdbusfs'. The naming
+ * scheme allows setting KBUILD_MODNAME to "kdbus2" to get an independent
+ * filesystem for developers.
+ *
+ * Each mount of the kdbusfs filesystem has a kdbus_domain attached.
+ * Operations on this mount will only affect the attached domain. On each mount
+ * a new domain is automatically created and used for this mount exclusively.
+ * If you want to share a domain across multiple mounts, you need to bind-mount
+ * it.
+ *
+ * Mounts of kdbusfs (with a different domain each) are unrelated to each other
+ * and will never have any effect on any domain but their own.
+ *
+ * Return: 0 on success, negative error otherwise.
+ */
+int kdbus_fs_init(void)
+{
+ return register_filesystem(&fs_type);
+}
+
+/**
+ * kdbus_fs_exit() - unregister kdbus filesystem
+ *
+ * This does the reverse to kdbus_fs_init(). It unregisters the kdbusfs
+ * filesystem from VFS and cleans up any allocated resources.
+ */
+void kdbus_fs_exit(void)
+{
+ unregister_filesystem(&fs_type);
+}
+
+/* acquire domain of @node, making sure all ancestors are active */
+static struct kdbus_domain *fs_acquire_domain(struct kdbus_node *node)
+{
+ struct kdbus_domain *domain;
+ struct kdbus_node *iter;
+
+ /* caller must guarantee that @node is linked */
+ for (iter = node; iter->parent; iter = iter->parent)
+ if (!kdbus_node_is_active(iter->parent))
+ return NULL;
+
+ /* root nodes are always domains */
+ if (WARN_ON(iter->type != KDBUS_NODE_DOMAIN))
+ return NULL;
+
+ domain = kdbus_domain_from_node(iter);
+ if (!kdbus_node_acquire(&domain->node))
+ return NULL;
+
+ return domain;
+}
+
+/**
+ * kdbus_fs_flush() - flush dcache entries of a node
+ * @node: Node to flush entries of
+ *
+ * This flushes all VFS filesystem cache entries for a node and all its
+ * children. This should be called whenever a node is destroyed during
+ * runtime. It will flush the cache entries so the linked objects can be
+ * deallocated.
+ *
+ * This is a no-op if you call it on active nodes (they really should stay in
+ * cache) or on nodes with deactivated parents (flushing the parent is enough).
+ * Furthermore, there is no need to call it on nodes whose lifetime is bound to
+ * their parents'. In those cases, the parent-flush will always also flush the
+ * children.
+ */
+void kdbus_fs_flush(struct kdbus_node *node)
+{
+ struct dentry *dentry, *parent_dentry = NULL;
+ struct kdbus_domain *domain;
+ struct qstr name;
+
+ /* active nodes should remain in cache */
+ if (!kdbus_node_is_deactivated(node))
+ return;
+
+ /* nodes that were never linked were never instantiated */
+ if (!node->parent)
+ return;
+
+ /* acquire domain and verify all ancestors are active */
+ domain = fs_acquire_domain(node);
+ if (!domain)
+ return;
+
+ switch (node->type) {
+ case KDBUS_NODE_ENDPOINT:
+ if (WARN_ON(!node->parent || !node->parent->name))
+ goto exit;
+
+ name.name = node->parent->name;
+ name.len = strlen(node->parent->name);
+ parent_dentry = d_hash_and_lookup(domain->dentry, &name);
+ if (IS_ERR_OR_NULL(parent_dentry))
+ goto exit;
+
+ /* fallthrough */
+ case KDBUS_NODE_BUS:
+ if (WARN_ON(!node->name))
+ goto exit;
+
+ name.name = node->name;
+ name.len = strlen(node->name);
+ dentry = d_hash_and_lookup(parent_dentry ? : domain->dentry,
+ &name);
+ if (!IS_ERR_OR_NULL(dentry)) {
+ d_invalidate(dentry);
+ dput(dentry);
+ }
+
+ dput(parent_dentry);
+ break;
+
+ default:
+ /* all other types are bound to their parent lifetime */
+ break;
+ }
+
+exit:
+ kdbus_node_release(&domain->node);
+}
diff --git a/ipc/kdbus/fs.h b/ipc/kdbus/fs.h
new file mode 100644
index 000000000000..62f7d6abf11e
--- /dev/null
+++ b/ipc/kdbus/fs.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUSFS_H
+#define __KDBUSFS_H
+
+#include <linux/kernel.h>
+
+struct kdbus_node;
+
+int kdbus_fs_init(void);
+void kdbus_fs_exit(void);
+void kdbus_fs_flush(struct kdbus_node *node);
+
+#define kdbus_node_from_inode(_inode) \
+ ((struct kdbus_node *)(_inode)->i_private)
+
+#endif
diff --git a/ipc/kdbus/node.c b/ipc/kdbus/node.c
new file mode 100644
index 000000000000..520df00e676a
--- /dev/null
+++ b/ipc/kdbus/node.c
@@ -0,0 +1,910 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/atomic.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/kdev_t.h>
+#include <linux/rbtree.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "fs.h"
+#include "handle.h"
+#include "node.h"
+#include "util.h"
+
+/**
+ * DOC: kdbus nodes
+ *
+ * Nodes unify lifetime management across exposed kdbus objects and provide a
+ * hierarchy. Each kdbus object that might be exposed to user-space has a
+ * kdbus_node object embedded and is linked into the hierarchy. Each node can
+ * have any number (0-n) of child nodes linked. Each child retains a reference
+ * to its parent node. For root-nodes, the parent is NULL.
+ *
+ * Each node object goes through a bunch of states during its lifetime:
+ * * NEW
+ * * LINKED (can be skipped by NEW->FREED transition)
+ * * ACTIVE (can be skipped by LINKED->INACTIVE transition)
+ * * INACTIVE
+ * * DRAINED
+ * * FREED
+ *
+ * Each node is allocated by the caller and initialized via kdbus_node_init().
+ * This never fails and sets the object into state NEW. From now on, ref-counts
+ * on the node manage its lifetime. During init, the ref-count is set to 1. Once
+ * it drops to 0, the node goes to state FREED and the node->free_cb() callback
+ * is called to deallocate any memory.
+ *
+ * After initializing a node, you usually link it into the hierarchy. You need
+ * to provide a parent node and a name. The node will be linked as child to the
+ * parent and a globally unique ID is assigned to the child. The name of the
+ * child must be unique for all children of this parent. Otherwise, linking the
+ * child will fail with -EEXIST.
+ * Note that the child is not marked active, yet. Admittedly, it prevents any
+ * other node from being linked with the same name (thus, it reserves that
+ * name), but any child-lookup (via name or unique ID) will never return this
+ * child unless it has been marked active.
+ *
+ * Once successfully linked, you can use kdbus_node_activate() to activate a
+ * child. This will mark the child active. This state can be skipped by directly
+ * deactivating the child via kdbus_node_deactivate() (see below).
+ * By activating a child, you enable any lookups on this child to succeed from
+ * now on. Furthermore, any code that got its hands on a reference to the node,
+ * can from now on "acquire" the node.
+ *
+ * Active References (or: 'acquiring' and 'releasing' a node)
+ * In addition to normal object references, nodes support something we call
+ * "active references". An active reference can be acquired via
+ * kdbus_node_acquire() and released via kdbus_node_release(). A caller
+ * _must_ own a normal object reference whenever calling those functions.
+ * Unlike object references, acquiring an active reference can fail (by
+ * returning 'false' from kdbus_node_acquire()). An active reference can
+ * only be acquired if the node is marked active. If it is not marked
+ * active, yet, or if it was already deactivated, no more active references
+ * can be acquired, ever!
+ * Active references are used to track tasks working on a node. Whenever a
+ * task enters kernel-space to perform an action on a node, it acquires an
+ * active reference, performs the action and releases the reference again.
+ * While holding an active reference, the node is guaranteed to stay active.
+ * If the node is deactivated in parallel, the node is marked as
+ * deactivated, then we wait for all active references to be dropped, before
+ * we finally proceed with any cleanups. That is, if you hold an active
+ * reference to a node, any resources that are bound to the "active" state
+ * are guaranteed to stay accessible until you release your reference.
+ *
+ * Active-references are very similar to rw-locks, where acquiring a node is
+ * equal to try-read-lock and releasing to read-unlock. Deactivating a node
+ * means write-lock and never releasing it again.
+ * Unlike rw-locks, the 'active reference' concept is more versatile and
+ * avoids unusual rw-lock usage (never releasing a write-lock..).
+ *
+ * It is safe to acquire multiple active-references recursively. But you
+ * need to check the return value of kdbus_node_acquire() on _each_ call. It
+ * may stop granting references at _any_ time.
+ *
+ * You're free to perform any operations you want while holding an active
+ * reference, except sleeping for an indefinite period. Sleeping for a fixed
+ * amount of time is fine, but you usually should not wait on wait-queues
+ * without a timeout.
+ * For example, if you wait for I/O to happen, you should gather all data
+ * and schedule the I/O operation, then release your active reference and
+ * wait for it to complete. Then try to acquire a new reference. If it
+ * fails, perform any cleanup (the node is now dead). Otherwise, you can
+ * finish your operation.
+ *
+ * All nodes can be deactivated via kdbus_node_deactivate() at any time. You can
+ * call this multiple times, even in parallel or on nodes that were never
+ * linked, and it will just work. The only restriction is, you must not hold an
+ * active reference when calling kdbus_node_deactivate().
+ * By deactivating a node, it is immediately marked inactive. Then, we wait for
+ * all active references to be released (called 'draining' the node). This
+ * shouldn't take very long as we don't perform long-lasting operations while
+ * holding an active reference. Note that once the node is marked inactive, no
+ * new active references can be acquired.
+ * Once all active references are dropped, the node is considered 'drained'. Now
+ * kdbus_node_deactivate() is called on each child of the node before we
+ * continue deactivating our node. That is, once all children are entirely
+ * deactivated, we call ->release_cb() of our node. ->release_cb() can release
+ * any resources on that node which are bound to the "active" state of a node.
+ * When done, we unlink the node from its parent rb-tree, mark it as
+ * 'released' and return.
+ * If kdbus_node_deactivate() is called multiple times (even in parallel), all
+ * but one caller will just wait until the node is fully deactivated. That is,
+ * one random caller of kdbus_node_deactivate() is selected to call
+ * ->release_cb() and cleanup the node. Only once all this is done, all other
+ * callers will return from kdbus_node_deactivate(). That is, it doesn't matter
+ * whether you're the selected caller or not, it will only return after
+ * everything is fully done.
+ *
+ * When a node is activated, we acquire a normal object reference to the node.
+ * This reference is dropped after deactivation is fully done (and only iff the
+ * node really was activated). This allows callers to link+activate a child node
+ * and then drop all refs. The node will be deactivated together with the
+ * parent, and then be freed when this reference is dropped.
+ *
+ * Currently, nodes provide a bunch of resources that external code can use
+ * directly. This includes:
+ *
+ * * node->waitq: Each node has its own wait-queue that is used to manage
+ * the 'active' state. When a node is deactivated, we wait on
+ * this queue until all active refs are dropped. Analogously,
+ * when you release an active reference on a deactivated
+ * node, and the active ref-count drops to 0, we wake up a
+ * single thread on this queue. Furthermore, once the
+ * ->release_cb() callback finished, we wake up all waiters.
+ * The node-owner is free to re-use this wait-queue for other
+ * purposes. As node-management uses this queue only during
+ * deactivation, it is usually totally fine to re-use the
+ * queue for other, preferably low-overhead, use-cases.
+ *
+ * * node->type: This field defines the type of the owner of this node. It
+ * must be set during node initialization and must remain
+ * constant. The node management never looks at this value,
+ * but external users might use it to gain access to the owner
+ * object of a node.
+ * It is totally up to the owner of the node to define what
+ * their type means. Usually it means you can access the
+ * parent structure via container_of(), as long as you hold an
+ * active reference to the node.
+ *
+ * * node->free_cb: callback after all references are dropped
+ * node->release_cb: callback during node deactivation
+ * These fields must be set by the node owner during
+ * node initialization. They must remain constant. If
+ * NULL, they're skipped.
+ *
+ * * node->mode: filesystem access modes
+ * node->uid: filesystem owner uid
+ * node->gid: filesystem owner gid
+ * These fields must be set by the node owner during node
+ * initialization. They must remain constant and may be
+ * accessed by other callers to properly initialize
+ * filesystem nodes.
+ *
+ * * node->id: This is an unsigned 32bit integer allocated by an IDR. It is
+ * always kept as small as possible during allocation and is
+ * globally unique across all nodes allocated by this module. 0
+ * is reserved as "not assigned" and is the default.
+ * The ID is assigned during kdbus_node_link() and is kept until
+ * the object is freed. Thus, the ID surpasses the active
+ * lifetime of a node. As long as you hold an object reference
+ * to a node (and the node was linked once), the ID is valid and
+ * unique.
+ *
+ * * node->name: name of this node
+ * node->hash: 31bit hash-value of @name (range [2..INT_MAX-1])
+ * These values follow the same lifetime rules as node->id.
+ * They're initialized when the node is linked and then remain
+ * constant until the last object reference is dropped.
+ * Unlike the id, the name is only unique across all siblings
+ * and only until the node is deactivated. Currently, the name
+ * is even unique if linked but not activated, yet. This might
+ * change in the future, though. Code should not rely on this.
+ *
+ * * node->lock: lock to protect node->children, node->rb, node->parent
+ * * node->parent: Reference to parent node. This is set during LINK time
+ * and is dropped during destruction. You must not access
+ * it unless you hold an active reference to the node or if
+ * you know the node is dead.
+ * * node->children: rb-tree of all linked children of this node. You must
+ * not access this directly, but use one of the iterator
+ * or lookup helpers.
+ */
+
+/*
+ * Bias values track states of "active references". They're all negative. If a
+ * node is active, its active-ref-counter is >=0 and tracks all active
+ * references. Once a node is deactivated, we add NODE_BIAS. This means, the
+ * counter is now negative but still counts the active references. Once it drops
+ * to exactly NODE_BIAS, we know all active references were dropped. Exactly one
+ * thread will change it to NODE_RELEASE now, perform cleanup and then put it
+ * into NODE_DRAINED. Once drained, all other threads that tried deactivating
+ * the node will now be woken up (thus, they wait until the node is fully done).
+ * The initial state during node-setup is NODE_NEW. If a node is directly
+ * deactivated without having ever been active, it is put into
+ * NODE_RELEASE_DIRECT instead of NODE_BIAS. This tracks this one-bit state
+ * across node-deactivation. The task putting it into NODE_RELEASE now knows
+ * whether the node was active before or not.
+ *
+ * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN
+ * to avoid overflows if multiplied by -1.
+ */
+#define KDBUS_NODE_BIAS (INT_MIN + 5)
+#define KDBUS_NODE_RELEASE_DIRECT (KDBUS_NODE_BIAS - 1)
+#define KDBUS_NODE_RELEASE (KDBUS_NODE_BIAS - 2)
+#define KDBUS_NODE_DRAINED (KDBUS_NODE_BIAS - 3)
+#define KDBUS_NODE_NEW (KDBUS_NODE_BIAS - 4)
+
+/* global unique ID mapping for kdbus nodes */
+static DEFINE_IDR(kdbus_node_idr);
+static DECLARE_RWSEM(kdbus_node_idr_lock);
+
+/**
+ * kdbus_node_name_hash() - hash a name
+ * @name: The string to hash
+ *
+ * This computes the hash of @name. It is guaranteed to be in the range
+ * [2..INT_MAX-1]. The values 0, 1 and INT_MAX are unused as they are reserved
+ * for the filesystem code.
+ *
+ * Return: hash value of the passed string
+ */
+static unsigned int kdbus_node_name_hash(const char *name)
+{
+ unsigned int hash;
+
+ /* reserve hash numbers 0, 1 and >=INT_MAX for magic directories */
+ hash = kdbus_strhash(name) & INT_MAX;
+ if (hash < 2)
+ hash += 2;
+ if (hash >= INT_MAX)
+ hash = INT_MAX - 1;
+
+ return hash;
+}
+
+/**
+ * kdbus_node_name_compare() - compare a name with a node's name
+ * @hash: hash of the string to compare the node with
+ * @name: name to compare the node with
+ * @node: node to compare the name with
+ *
+ * Return: 0 if @name and @hash exactly match the information in @node, or
+ * an integer less than or greater than zero if @name is found, respectively,
+ * to be less than or be greater than the string stored in @node.
+ */
+static int kdbus_node_name_compare(unsigned int hash, const char *name,
+ const struct kdbus_node *node)
+{
+ if (hash != node->hash)
+ return hash - node->hash;
+
+ return strcmp(name, node->name);
+}
+
+/**
+ * kdbus_node_init() - initialize a kdbus_node
+ * @node: Pointer to the node to initialize
+ * @type: The type the node will have (KDBUS_NODE_*)
+ *
+ * The caller is responsible for allocating @node and initializing it to zero.
+ * Once this call returns, you must use the kdbus_node_ref() and
+ * kdbus_node_unref() functions to manage this node.
+ */
+void kdbus_node_init(struct kdbus_node *node, unsigned int type)
+{
+ atomic_set(&node->refcnt, 1);
+ mutex_init(&node->lock);
+ node->id = 0;
+ node->type = type;
+ RB_CLEAR_NODE(&node->rb);
+ node->children = RB_ROOT;
+ init_waitqueue_head(&node->waitq);
+ atomic_set(&node->active, KDBUS_NODE_NEW);
+}
+
+/**
+ * kdbus_node_link() - link a node into the nodes system
+ * @node: Pointer to the node to link
+ * @parent: Pointer to a parent node, may be %NULL
+ * @name: The name of the node (or NULL if root node)
+ *
+ * This links a node into the hierarchy. This must not be called multiple times.
+ * If @parent is NULL, the node becomes a new root node.
+ *
+ * This call will fail if @name is not unique across all its siblings or if no
+ * ID could be allocated. You must not activate a node if linking failed! It is
+ * safe to deactivate it, though.
+ *
+ * Once you linked a node, you must call kdbus_node_deactivate() before you drop
+ * the last reference (even if you never activate the node).
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
+ const char *name)
+{
+ int ret;
+
+ if (WARN_ON(node->type != KDBUS_NODE_DOMAIN && !parent))
+ return -EINVAL;
+
+ if (WARN_ON(parent && !name))
+ return -EINVAL;
+
+ if (name) {
+ node->name = kstrdup(name, GFP_KERNEL);
+ if (!node->name)
+ return -ENOMEM;
+
+ node->hash = kdbus_node_name_hash(name);
+ }
+
+ down_write(&kdbus_node_idr_lock);
+ ret = idr_alloc(&kdbus_node_idr, node, 1, 0, GFP_KERNEL);
+ if (ret >= 0)
+ node->id = ret;
+ up_write(&kdbus_node_idr_lock);
+
+ if (ret < 0)
+ return ret;
+
+ ret = 0;
+
+ if (parent) {
+ struct rb_node **n, *prev;
+
+ if (!kdbus_node_acquire(parent))
+ return -ESHUTDOWN;
+
+ mutex_lock(&parent->lock);
+
+ n = &parent->children.rb_node;
+ prev = NULL;
+
+ while (*n) {
+ struct kdbus_node *pos;
+ int result;
+
+ pos = kdbus_node_from_rb(*n);
+ prev = *n;
+ result = kdbus_node_name_compare(node->hash,
+ node->name,
+ pos);
+ if (result == 0) {
+ ret = -EEXIST;
+ goto exit_unlock;
+ }
+
+ if (result < 0)
+ n = &pos->rb.rb_left;
+ else
+ n = &pos->rb.rb_right;
+ }
+
+ /* add new node and rebalance the tree */
+ rb_link_node(&node->rb, prev, n);
+ rb_insert_color(&node->rb, &parent->children);
+ node->parent = kdbus_node_ref(parent);
+
+exit_unlock:
+ mutex_unlock(&parent->lock);
+ kdbus_node_release(parent);
+ }
+
+ return ret;
+}
+
+/**
+ * kdbus_node_ref() - Acquire object reference
+ * @node: node to acquire reference to (or NULL)
+ *
+ * This acquires a new reference to @node. You must already own a reference when
+ * calling this!
+ * If @node is NULL, this is a no-op.
+ *
+ * Return: @node is returned
+ */
+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node)
+{
+ if (node)
+ atomic_inc(&node->refcnt);
+ return node;
+}
+
+/**
+ * kdbus_node_unref() - Drop object reference
+ * @node: node to drop reference to (or NULL)
+ *
+ * This drops an object reference to @node. You must not access the node if you
+ * no longer own a reference.
+ * If the ref-count drops to 0, the object will be destroyed (->free_cb will be
+ * called).
+ *
+ * If you linked or activated the node, you must deactivate the node before you
+ * drop your last reference! If you didn't link or activate the node, you can
+ * drop any reference you want.
+ *
+ * Note that this calls into ->free_cb() and thus _might_ sleep. The ->free_cb()
+ * callbacks must not acquire any outer locks, though. So you can safely drop
+ * references while holding locks.
+ *
+ * If @node is NULL, this is a no-op.
+ *
+ * Return: This always returns NULL
+ */
+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node)
+{
+ if (node && atomic_dec_and_test(&node->refcnt)) {
+ struct kdbus_node safe = *node;
+
+ WARN_ON(atomic_read(&node->active) != KDBUS_NODE_DRAINED);
+ WARN_ON(!RB_EMPTY_NODE(&node->rb));
+
+ if (node->free_cb)
+ node->free_cb(node);
+
+ down_write(&kdbus_node_idr_lock);
+ if (safe.id > 0)
+ idr_remove(&kdbus_node_idr, safe.id);
+ /* drop caches after last node to not leak memory on unload */
+ if (idr_is_empty(&kdbus_node_idr)) {
+ idr_destroy(&kdbus_node_idr);
+ idr_init(&kdbus_node_idr);
+ }
+ up_write(&kdbus_node_idr_lock);
+
+ kfree(safe.name);
+
+ /*
+ * kdbusfs relies on the parent to be available even after the
+ * node was deactivated and unlinked. Therefore, we pin it
+ * until a node is destroyed.
+ */
+ kdbus_node_unref(safe.parent);
+ }
+
+ return NULL;
+}
+
+/**
+ * kdbus_node_is_active() - test whether a node is active
+ * @node: node to test
+ *
+ * This checks whether @node is active. That means, @node was linked and
+ * activated by the node owner and hasn't been deactivated, yet. If, and only
+ * if, a node is active, kdbus_node_acquire() will be able to acquire active
+ * references.
+ *
+ * Note that this function does not give any lifetime guarantees. After this
+ * call returns, the node might be deactivated immediately. Normally, what you
+ * want is to acquire a real active reference via kdbus_node_acquire().
+ *
+ * Return: true if @node is active, false otherwise
+ */
+bool kdbus_node_is_active(struct kdbus_node *node)
+{
+ return atomic_read(&node->active) >= 0;
+}
+
+/**
+ * kdbus_node_is_deactivated() - test whether a node was already deactivated
+ * @node: node to test
+ *
+ * This checks whether kdbus_node_deactivate() was called on @node. Note that
+ * this might be true even if you never deactivated the node directly, but only
+ * one of its ancestors.
+ *
+ * Note that even if this returns 'false', the node might get deactivated
+ * immediately after the call returns.
+ *
+ * Return: true if @node was already deactivated, false if not
+ */
+bool kdbus_node_is_deactivated(struct kdbus_node *node)
+{
+ int v;
+
+ v = atomic_read(&node->active);
+ return v != KDBUS_NODE_NEW && v < 0;
+}
+
+/**
+ * kdbus_node_activate() - activate a node
+ * @node: node to activate
+ *
+ * This marks @node as active if, and only if, the node wasn't activated nor
+ * deactivated, yet, and the parent is still active. Any but the first call to
+ * kdbus_node_activate() is a no-op.
+ * If you called kdbus_node_deactivate() before, then even the first call to
+ * kdbus_node_activate() will be a no-op.
+ *
+ * This call doesn't give any lifetime guarantees. The node might get
+ * deactivated immediately after this call returns. Or the parent might already
+ * be deactivated, which will make this call a no-op.
+ *
+ * If this call successfully activated a node, it will take an object reference
+ * to it. This reference is dropped after the node is deactivated. Therefore,
+ * the object owner can safely drop their reference to @node iff they know that
+ * its parent node will get deactivated at some point. Once the parent node is
+ * deactivated, it will deactivate all its children and thus drop this reference
+ * again.
+ *
+ * Return: True if this call successfully activated the node, otherwise false.
+ * Note that this might return false, even if the node is still active
+ * (eg., if you called this a second time).
+ */
+bool kdbus_node_activate(struct kdbus_node *node)
+{
+ bool res = false;
+
+ mutex_lock(&node->lock);
+ if (atomic_read(&node->active) == KDBUS_NODE_NEW) {
+ atomic_sub(KDBUS_NODE_NEW, &node->active);
+ /* activated nodes have ref +1 */
+ kdbus_node_ref(node);
+ res = true;
+ }
+ mutex_unlock(&node->lock);
+
+ return res;
+}
+
+/**
+ * kdbus_node_deactivate() - deactivate a node
+ * @node: The node to deactivate.
+ *
+ * This function recursively deactivates this node and all its children. It
+ * returns only once all children and the node itself were recursively disabled
+ * (even if you call this function multiple times in parallel).
+ *
+ * It is safe to call this function on _any_ node that was initialized _any_
+ * number of times.
+ *
+ * This call may sleep, as it waits for all active references to be dropped.
+ */
+void kdbus_node_deactivate(struct kdbus_node *node)
+{
+ struct kdbus_node *pos, *child;
+ struct rb_node *rb;
+ int v_pre, v_post;
+
+ pos = node;
+
+ /*
+ * To avoid recursion, we perform back-tracking while deactivating
+ * nodes. For each node we enter, we first mark the active-counter as
+ * deactivated by adding BIAS. If the node has children, we set the first
+ * child as current position and start over. If the node has no
+ * children, we drain the node by waiting for all active refs to be
+ * dropped and then releasing the node.
+ *
+ * After the node is released, we set its parent as current position
+ * and start over. If the current position was the initial node, we're
+ * done.
+ *
+ * Note that this function can be called in parallel by multiple
+ * callers. We make sure that each node is only released once, and any
+ * racing caller will wait until the other thread fully released that
+ * node.
+ */
+
+ for (;;) {
+ /*
+ * Add BIAS to node->active to mark it as inactive. If it was
+ * never active before, immediately mark it as RELEASE_DIRECT
+ * so we remember this state.
+ * We cannot remember v_pre as we might iterate into the
+ * children, overwriting v_pre, before we can release our node.
+ */
+ mutex_lock(&pos->lock);
+ v_pre = atomic_read(&pos->active);
+ if (v_pre >= 0)
+ atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
+ else if (v_pre == KDBUS_NODE_NEW)
+ atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
+ mutex_unlock(&pos->lock);
+
+ /* wait until all active references were dropped */
+ wait_event(pos->waitq,
+ atomic_read(&pos->active) <= KDBUS_NODE_BIAS);
+
+ mutex_lock(&pos->lock);
+ /* recurse into first child if any */
+ rb = rb_first(&pos->children);
+ if (rb) {
+ child = kdbus_node_ref(kdbus_node_from_rb(rb));
+ mutex_unlock(&pos->lock);
+ pos = child;
+ continue;
+ }
+
+ /* mark object as RELEASE */
+ v_post = atomic_read(&pos->active);
+ if (v_post == KDBUS_NODE_BIAS ||
+ v_post == KDBUS_NODE_RELEASE_DIRECT)
+ atomic_set(&pos->active, KDBUS_NODE_RELEASE);
+ mutex_unlock(&pos->lock);
+
+ /*
+ * If this is the thread that marked the object as RELEASE, we
+ * perform the actual release. Otherwise, we wait until the
+ * release is done and the node is marked as DRAINED.
+ */
+ if (v_post == KDBUS_NODE_BIAS ||
+ v_post == KDBUS_NODE_RELEASE_DIRECT) {
+ if (pos->release_cb)
+ pos->release_cb(pos, v_post == KDBUS_NODE_BIAS);
+
+ if (pos->parent) {
+ mutex_lock(&pos->parent->lock);
+ if (!RB_EMPTY_NODE(&pos->rb)) {
+ rb_erase(&pos->rb,
+ &pos->parent->children);
+ RB_CLEAR_NODE(&pos->rb);
+ }
+ mutex_unlock(&pos->parent->lock);
+ }
+
+ /* mark as DRAINED */
+ atomic_set(&pos->active, KDBUS_NODE_DRAINED);
+ wake_up_all(&pos->waitq);
+
+ /* drop VFS cache */
+ kdbus_fs_flush(pos);
+
+ /*
+ * If the node was activated and someone added BIAS to it
+ * to deactivate it, we, and only we, are
+ * responsible for releasing the extra ref-count that was
+ * taken once in kdbus_node_activate().
+ * If the node was never activated, no-one ever
+ * added BIAS, but instead skipped that state and
+ * immediately went to NODE_RELEASE_DIRECT. In that case
+ * we must not drop the reference.
+ */
+ if (v_post == KDBUS_NODE_BIAS)
+ kdbus_node_unref(pos);
+ } else {
+ /* wait until object is DRAINED */
+ wait_event(pos->waitq,
+ atomic_read(&pos->active) == KDBUS_NODE_DRAINED);
+ }
+
+ /*
+ * We're done with the current node. Continue on its parent
+ * again, which will try deactivating its next child, or itself
+ * if no child is left.
+ * If we've reached our initial node again, we are done and
+ * can safely return.
+ */
+ if (pos == node)
+ break;
+
+ child = pos;
+ pos = pos->parent;
+ kdbus_node_unref(child);
+ }
+}
+
+/**
+ * kdbus_node_acquire() - Acquire an active ref on a node
+ * @node: The node
+ *
+ * This acquires an active-reference to @node. This will only succeed if the
+ * node is active. You must release this active reference via
+ * kdbus_node_release() again.
+ *
+ * See the introduction to "active references" for more details.
+ *
+ * Return: %true if @node was non-NULL and active
+ */
+bool kdbus_node_acquire(struct kdbus_node *node)
+{
+ return node && atomic_inc_unless_negative(&node->active);
+}
+
+/**
+ * kdbus_node_release() - Release an active ref on a node
+ * @node: The node
+ *
+ * This releases an active reference that was previously acquired via
+ * kdbus_node_acquire(). See kdbus_node_acquire() for details.
+ */
+void kdbus_node_release(struct kdbus_node *node)
+{
+ if (node && atomic_dec_return(&node->active) == KDBUS_NODE_BIAS)
+ wake_up(&node->waitq);
+}
+
+/**
+ * kdbus_node_find_child() - Find child by name
+ * @node: parent node to search through
+ * @name: name of child node
+ *
+ * This searches through all children of @node for a child-node with name @name.
+ * If not found, or if the child is deactivated, NULL is returned. Otherwise,
+ * the child is acquired and a new reference is returned.
+ *
+ * If you're done with the child, you need to release it and drop your
+ * reference.
+ *
+ * This function does not acquire the parent node. However, if the parent was
+ * already deactivated, then kdbus_node_deactivate() will, at some point, also
+ * deactivate the child. Therefore, we can rely on the explicit ordering during
+ * deactivation.
+ *
+ * Return: Reference to acquired child node, or NULL if not found / not active.
+ */
+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
+ const char *name)
+{
+ struct kdbus_node *child;
+ struct rb_node *rb;
+ unsigned int hash;
+ int ret;
+
+ hash = kdbus_node_name_hash(name);
+
+ mutex_lock(&node->lock);
+ rb = node->children.rb_node;
+ while (rb) {
+ child = kdbus_node_from_rb(rb);
+ ret = kdbus_node_name_compare(hash, name, child);
+ if (ret < 0)
+ rb = rb->rb_left;
+ else if (ret > 0)
+ rb = rb->rb_right;
+ else
+ break;
+ }
+ if (rb && kdbus_node_acquire(child))
+ kdbus_node_ref(child);
+ else
+ child = NULL;
+ mutex_unlock(&node->lock);
+
+ return child;
+}
+
+static struct kdbus_node *node_find_closest_unlocked(struct kdbus_node *node,
+ unsigned int hash,
+ const char *name)
+{
+ struct kdbus_node *n, *pos = NULL;
+ struct rb_node *rb;
+ int res;
+
+ /*
+ * Find the closest child with ``node->hash >= hash'', or, if @name is
+ * valid, ``node->name >= name'' (where '>=' is the lex. order).
+ */
+
+ rb = node->children.rb_node;
+ while (rb) {
+ n = kdbus_node_from_rb(rb);
+
+ if (name)
+ res = kdbus_node_name_compare(hash, name, n);
+ else
+ res = hash - n->hash;
+
+ if (res <= 0) {
+ rb = rb->rb_left;
+ pos = n;
+ } else { /* ``hash > n->hash'', ``name > n->name'' */
+ rb = rb->rb_right;
+ }
+ }
+
+ return pos;
+}
+
+/**
+ * kdbus_node_find_closest() - Find closest child-match
+ * @node: parent node to search through
+ * @hash: hash value to find closest match for
+ *
+ * Find the closest child of @node with a hash greater than or equal to @hash.
+ * The closest match is the left-most child of @node with this property. Which
+ * means, it is the first child with that hash returned by
+ * kdbus_node_next_child(), if you'd iterate the whole parent node.
+ *
+ * Return: Reference to acquired child, or NULL if none found.
+ */
+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
+ unsigned int hash)
+{
+ struct kdbus_node *child;
+ struct rb_node *rb;
+
+ mutex_lock(&node->lock);
+
+ child = node_find_closest_unlocked(node, hash, NULL);
+ while (child && !kdbus_node_acquire(child)) {
+ rb = rb_next(&child->rb);
+ if (rb)
+ child = kdbus_node_from_rb(rb);
+ else
+ child = NULL;
+ }
+ kdbus_node_ref(child);
+
+ mutex_unlock(&node->lock);
+
+ return child;
+}
+
+/**
+ * kdbus_node_next_child() - Acquire next child
+ * @node: parent node
+ * @prev: previous child-node position or NULL
+ *
+ * This function returns a reference to the next active child of @node, after
+ * the passed position @prev. If @prev is NULL, a reference to the first active
+ * child is returned. If no more active children are found, NULL is returned.
+ *
+ * This function acquires the next child it returns. If you're done with the
+ * returned pointer, you need to release _and_ unref it.
+ *
+ * The passed in pointer @prev is not modified by this function, and it does
+ * *not* have to be active. If @prev was acquired via different means, or if it
+ * was unlinked from its parent before you pass it in, then this iterator will
+ * still return the next active child (it will have to search through the
+ * rb-tree based on the node-name, though).
+ * However, @prev must not be linked to a different parent than @node!
+ *
+ * Return: Reference to next acquired child, or NULL if at the end.
+ */
+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
+ struct kdbus_node *prev)
+{
+ struct kdbus_node *pos = NULL;
+ struct rb_node *rb;
+
+ mutex_lock(&node->lock);
+
+ if (!prev) {
+ /*
+ * New iteration; find first node in rb-tree and try to acquire
+ * it. If we got it, directly return it as first element.
+ * Otherwise, the loop below will find the next active node.
+ */
+ rb = rb_first(&node->children);
+ if (!rb)
+ goto exit;
+ pos = kdbus_node_from_rb(rb);
+ if (kdbus_node_acquire(pos))
+ goto exit;
+ } else if (RB_EMPTY_NODE(&prev->rb)) {
+ /*
+ * The current iterator is no longer linked to the rb-tree. Use
+ * its hash value and name to find the next _higher_ node and
+ * acquire it. If we got it, return it as next element.
+ * Otherwise, the loop below will find the next active node.
+ */
+ pos = node_find_closest_unlocked(node, prev->hash, prev->name);
+ if (!pos)
+ goto exit;
+ if (kdbus_node_acquire(pos))
+ goto exit;
+ } else {
+ /*
+ * The current iterator is still linked to the parent. Set it
+ * as current position and use the loop below to find the next
+ * active element.
+ */
+ pos = prev;
+ }
+
+ /* @pos was already returned or is inactive; find next active node */
+ do {
+ rb = rb_next(&pos->rb);
+ if (rb)
+ pos = kdbus_node_from_rb(rb);
+ else
+ pos = NULL;
+ } while (pos && !kdbus_node_acquire(pos));
+
+exit:
+ /* @pos is NULL or acquired. Take ref if non-NULL and return it */
+ kdbus_node_ref(pos);
+ mutex_unlock(&node->lock);
+ return pos;
+}
diff --git a/ipc/kdbus/node.h b/ipc/kdbus/node.h
new file mode 100644
index 000000000000..be125ce4fd58
--- /dev/null
+++ b/ipc/kdbus/node.h
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NODE_H
+#define __KDBUS_NODE_H
+
+#include <linux/atomic.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/wait.h>
+
+struct kdbus_node;
+
+enum kdbus_node_type {
+ KDBUS_NODE_DOMAIN,
+ KDBUS_NODE_CONTROL,
+ KDBUS_NODE_BUS,
+ KDBUS_NODE_ENDPOINT,
+};
+
+typedef void (*kdbus_node_free_t) (struct kdbus_node *node);
+typedef void (*kdbus_node_release_t) (struct kdbus_node *node, bool was_active);
+
+struct kdbus_node {
+ atomic_t refcnt;
+ atomic_t active;
+ wait_queue_head_t waitq;
+
+ /* static members */
+ unsigned int type;
+ kdbus_node_free_t free_cb;
+ kdbus_node_release_t release_cb;
+ umode_t mode;
+ kuid_t uid;
+ kgid_t gid;
+
+ /* valid once linked */
+ char *name;
+ unsigned int hash;
+ unsigned int id;
+ struct kdbus_node *parent; /* may be NULL */
+
+ /* valid iff active */
+ struct mutex lock;
+ struct rb_node rb;
+ struct rb_root children;
+};
+
+#define kdbus_node_from_rb(_node) rb_entry((_node), struct kdbus_node, rb)
+
+void kdbus_node_init(struct kdbus_node *node, unsigned int type);
+
+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
+ const char *name);
+
+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node);
+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node);
+
+bool kdbus_node_is_active(struct kdbus_node *node);
+bool kdbus_node_is_deactivated(struct kdbus_node *node);
+bool kdbus_node_activate(struct kdbus_node *node);
+void kdbus_node_deactivate(struct kdbus_node *node);
+
+bool kdbus_node_acquire(struct kdbus_node *node);
+void kdbus_node_release(struct kdbus_node *node);
+
+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
+ const char *name);
+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
+ unsigned int hash);
+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
+ struct kdbus_node *prev);
+
+#endif
--
2.3.1

Greg Kroah-Hartman
2015-03-09 13:11:08 UTC
From: Daniel Mack <***@zonque.org>

This patch adds code for matches and notifications.

Notifications are broadcast messages generated by the kernel, which
notify subscribers when connections are created or destroyed, when
well-known names have been claimed, released or changed ownership,
or when reply messages have timed out.
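
As a rough illustration of how a subscriber's rule could be compared against
such a name-change notification, here is a minimal userspace sketch. The
struct layouts, the enum, and the MATCH_ID_ANY value are assumptions made for
this example only; the real definitions live in the kdbus headers. A wildcard
ID or a missing name in the rule accepts any value in the notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wildcard; stands in for KDBUS_MATCH_ID_ANY. */
#define MATCH_ID_ANY UINT64_MAX

/* Hypothetical notification types for this sketch. */
enum notify_type { NAME_ADD, NAME_REMOVE, NAME_CHANGE };

struct notification {
	enum notify_type type;
	uint64_t old_id;	/* previous owner of the name */
	uint64_t new_id;	/* new owner of the name */
	const char *name;	/* well-known name concerned */
};

struct name_rule {
	enum notify_type type;
	uint64_t old_id;	/* MATCH_ID_ANY accepts every old owner */
	uint64_t new_id;	/* MATCH_ID_ANY accepts every new owner */
	const char *name;	/* NULL accepts every name */
};

/* Return true if the notification satisfies every field of the rule. */
static bool name_rule_matches(const struct name_rule *r,
			      const struct notification *n)
{
	if (r->type != n->type)
		return false;
	if (r->old_id != MATCH_ID_ANY && r->old_id != n->old_id)
		return false;
	if (r->new_id != MATCH_ID_ANY && r->new_id != n->new_id)
		return false;
	if (r->name && n->name && strcmp(r->name, n->name) != 0)
		return false;
	return true;
}
```

A rule that sets both IDs to the wildcard and leaves the name unset would
therefore match every notification of its type.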

Matches are used to tell the kernel driver which broadcast messages
a connection is interested in. Matches can either be specific to one
of the kernel-generated notification types, or carry a bloom filter
mask to match against a message from userspace. The latter is a way
to pre-filter messages from other connections in order to mitigate
unnecessary wakeups.
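
The bloom pre-filter can be sketched in plain C as follows. This is a
simplified userspace illustration, not the kernel code: the struct names and
the n_words parameter are assumptions for the example. The sender's filter
describes the message's properties, the receiver's mask describes the
properties it subscribed to, and a message passes only if every bit of the
mask is also set in the filter. As with any bloom filter, a pass may be a
false positive, while a rejection is definite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct bloom_filter {
	uint64_t generation;	/* generation the sender used */
	const uint64_t *data;	/* n_words 64-bit words */
};

struct bloom_mask {
	uint64_t generations;	/* number of masks stored back to back, >= 1 */
	const uint64_t *data;	/* generations * n_words words */
};

/*
 * Return true if every bit set in the selected mask is also set in the
 * filter. The caller must ensure mask->generations is at least 1.
 */
static bool bloom_match(const struct bloom_filter *filter,
			const struct bloom_mask *mask, size_t n_words)
{
	uint64_t gen = filter->generation;
	const uint64_t *m;
	size_t i;

	/* Pick the mask generation closest to the filter's generation. */
	if (gen > mask->generations - 1)
		gen = mask->generations - 1;
	m = mask->data + gen * n_words;

	for (i = 0; i < n_words; i++)
		if ((filter->data[i] & m[i]) != m[i])
			return false;

	return true;
}
```

Storing several generations of the mask lets a receiver stay compatible with
senders that compute their filter with an older or newer set of hashed
properties.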

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
ipc/kdbus/match.c | 559 +++++++++++++++++++++++++++++++++++++++++++++++++++++
ipc/kdbus/match.h | 35 ++++
ipc/kdbus/notify.c | 248 ++++++++++++++++++++++++
ipc/kdbus/notify.h | 30 +++
4 files changed, 872 insertions(+)
create mode 100644 ipc/kdbus/match.c
create mode 100644 ipc/kdbus/match.h
create mode 100644 ipc/kdbus/notify.c
create mode 100644 ipc/kdbus/notify.h

diff --git a/ipc/kdbus/match.c b/ipc/kdbus/match.c
new file mode 100644
index 000000000000..30cec1ca819f
--- /dev/null
+++ b/ipc/kdbus/match.c
@@ -0,0 +1,559 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "names.h"
+
+/**
+ * struct kdbus_match_db - message filters
+ * @entries_list: List of matches
+ * @mdb_rwlock: Match data lock
+ * @entries_count: Number of entries in database
+ */
+struct kdbus_match_db {
+ struct list_head entries_list;
+ struct rw_semaphore mdb_rwlock;
+ unsigned int entries_count;
+};
+
+/**
+ * struct kdbus_match_entry - a match database entry
+ * @cookie: User-supplied cookie to lookup the entry
+ * @list_entry: The list entry element for the db list
+ * @rules_list: The list head for tracking rules of this entry
+ */
+struct kdbus_match_entry {
+ u64 cookie;
+ struct list_head list_entry;
+ struct list_head rules_list;
+};
+
+/**
+ * struct kdbus_bloom_mask - mask to match against filter
+ * @generations: Number of generations carried
+ * @data: Array of bloom bit fields
+ */
+struct kdbus_bloom_mask {
+ u64 generations;
+ u64 *data;
+};
+
+/**
+ * struct kdbus_match_rule - a rule appended to a match entry
+ * @type: An item type to match against
+ * @bloom_mask: Bloom mask to match a message's filter against, used
+ * with KDBUS_ITEM_BLOOM_MASK
+ * @name: Name to match against, used with KDBUS_ITEM_NAME,
+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}
+ * @old_id: ID to match against, used with
+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
+ * KDBUS_ITEM_ID_REMOVE
+ * @new_id: ID to match against, used with
+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
+ * KDBUS_ITEM_ID_ADD
+ * @src_id: ID to match against, used with KDBUS_ITEM_ID
+ * @rules_entry: Entry in the entry's rules list
+ */
+struct kdbus_match_rule {
+ u64 type;
+ union {
+ struct kdbus_bloom_mask bloom_mask;
+ struct {
+ char *name;
+ u64 old_id;
+ u64 new_id;
+ };
+ u64 src_id;
+ };
+ struct list_head rules_entry;
+};
+
+static void kdbus_match_rule_free(struct kdbus_match_rule *rule)
+{
+ if (!rule)
+ return;
+
+ switch (rule->type) {
+ case KDBUS_ITEM_BLOOM_MASK:
+ kfree(rule->bloom_mask.data);
+ break;
+
+ case KDBUS_ITEM_NAME:
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_REMOVE:
+ case KDBUS_ITEM_NAME_CHANGE:
+ kfree(rule->name);
+ break;
+
+ case KDBUS_ITEM_ID:
+ case KDBUS_ITEM_ID_ADD:
+ case KDBUS_ITEM_ID_REMOVE:
+ break;
+
+ default:
+ BUG();
+ }
+
+ list_del(&rule->rules_entry);
+ kfree(rule);
+}
+
+static void kdbus_match_entry_free(struct kdbus_match_entry *entry)
+{
+ struct kdbus_match_rule *r, *tmp;
+
+ if (!entry)
+ return;
+
+ list_for_each_entry_safe(r, tmp, &entry->rules_list, rules_entry)
+ kdbus_match_rule_free(r);
+
+ list_del(&entry->list_entry);
+ kfree(entry);
+}
+
+/**
+ * kdbus_match_db_free() - free match db resources
+ * @mdb: The match database
+ */
+void kdbus_match_db_free(struct kdbus_match_db *mdb)
+{
+ struct kdbus_match_entry *entry, *tmp;
+
+ if (!mdb)
+ return;
+
+ list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
+ kdbus_match_entry_free(entry);
+
+ kfree(mdb);
+}
+
+/**
+ * kdbus_match_db_new() - create a new match database
+ *
+ * Return: a new kdbus_match_db on success, ERR_PTR on failure.
+ */
+struct kdbus_match_db *kdbus_match_db_new(void)
+{
+ struct kdbus_match_db *d;
+
+ d = kzalloc(sizeof(*d), GFP_KERNEL);
+ if (!d)
+ return ERR_PTR(-ENOMEM);
+
+ init_rwsem(&d->mdb_rwlock);
+ INIT_LIST_HEAD(&d->entries_list);
+
+ return d;
+}
+
+static bool kdbus_match_bloom(const struct kdbus_bloom_filter *filter,
+ const struct kdbus_bloom_mask *mask,
+ const struct kdbus_conn *conn)
+{
+ size_t n = conn->ep->bus->bloom.size / sizeof(u64);
+ const u64 *m;
+ size_t i;
+
+ /*
+ * The message's filter carries a generation identifier, the
+ * match's mask possibly carries an array of multiple generations
+ * of the mask. Select the mask with the closest match of the
+ * filter's generation.
+ */
+ m = mask->data + (min(filter->generation, mask->generations - 1) * n);
+
+ /*
+ * The message's filter contains the message's properties,
+ * the match's mask contains the properties to look for in the
+ * message. Check the mask bit field against the filter bit field,
+ * if the message possibly carries the properties the connection
+ * has subscribed to.
+ */
+ for (i = 0; i < n; i++)
+ if ((filter->data[i] & m[i]) != m[i])
+ return false;
+
+ return true;
+}
+
+static bool kdbus_match_rules(const struct kdbus_match_entry *entry,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_match_rule *r;
+
+ if (conn_src)
+ lockdep_assert_held(&conn_src->ep->bus->name_registry->rwlock);
+
+ /*
+ * Walk all the rules and bail out immediately
+ * if any of them is unsatisfied.
+ */
+
+ list_for_each_entry(r, &entry->rules_list, rules_entry) {
+ if (conn_src) {
+ /* messages from userspace */
+
+ switch (r->type) {
+ case KDBUS_ITEM_BLOOM_MASK:
+ if (!kdbus_match_bloom(kmsg->bloom_filter,
+ &r->bloom_mask,
+ conn_src))
+ return false;
+ break;
+
+ case KDBUS_ITEM_ID:
+ if (r->src_id != conn_src->id &&
+ r->src_id != KDBUS_MATCH_ID_ANY)
+ return false;
+
+ break;
+
+ case KDBUS_ITEM_NAME:
+ if (!kdbus_conn_has_name(conn_src, r->name))
+ return false;
+
+ break;
+
+ default:
+ return false;
+ }
+ } else {
+ /* kernel notifications */
+
+ if (kmsg->notify_type != r->type)
+ return false;
+
+ switch (r->type) {
+ case KDBUS_ITEM_ID_ADD:
+ if (r->new_id != KDBUS_MATCH_ID_ANY &&
+ r->new_id != kmsg->notify_new_id)
+ return false;
+
+ break;
+
+ case KDBUS_ITEM_ID_REMOVE:
+ if (r->old_id != KDBUS_MATCH_ID_ANY &&
+ r->old_id != kmsg->notify_old_id)
+ return false;
+
+ break;
+
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_CHANGE:
+ case KDBUS_ITEM_NAME_REMOVE:
+ if ((r->old_id != KDBUS_MATCH_ID_ANY &&
+ r->old_id != kmsg->notify_old_id) ||
+ (r->new_id != KDBUS_MATCH_ID_ANY &&
+ r->new_id != kmsg->notify_new_id) ||
+ (r->name && kmsg->notify_name &&
+ strcmp(r->name, kmsg->notify_name) != 0))
+ return false;
+
+ break;
+
+ default:
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
+/**
+ * kdbus_match_db_match_kmsg() - match a kmsg object against the database entries
+ * @mdb: The match database
+ * @conn_src: The connection object originating the message
+ * @kmsg: The kmsg to perform the match on
+ *
+ * This function will walk through all the database entries previously added
+ * with kdbus_cmd_match_add(). As soon as any of them has an all-satisfied rule
+ * set, this function will return true.
+ *
+ * The caller must hold the registry lock of conn_src->ep->bus, in case conn_src
+ * is non-NULL.
+ *
+ * Return: true if there was a matching database entry, false otherwise.
+ */
+bool kdbus_match_db_match_kmsg(struct kdbus_match_db *mdb,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_match_entry *entry;
+ bool matched = false;
+
+ down_read(&mdb->mdb_rwlock);
+ list_for_each_entry(entry, &mdb->entries_list, list_entry) {
+ matched = kdbus_match_rules(entry, conn_src, kmsg);
+ if (matched)
+ break;
+ }
+ up_read(&mdb->mdb_rwlock);
+
+ return matched;
+}
+
+static int kdbus_match_db_remove_unlocked(struct kdbus_match_db *mdb,
+ u64 cookie)
+{
+ struct kdbus_match_entry *entry, *tmp;
+ bool found = false;
+
+ list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
+ if (entry->cookie == cookie) {
+ kdbus_match_entry_free(entry);
+ --mdb->entries_count;
+ found = true;
+ }
+
+ return found ? 0 : -EBADSLT;
+}
+
+/**
+ * kdbus_cmd_match_add() - handle KDBUS_CMD_MATCH_ADD
+ * @conn: connection to operate on
+ * @argp: command payload
+ *
+ * One call to this function (or one ioctl(KDBUS_CMD_MATCH_ADD), respectively)
+ * adds one new database entry with n rules attached to it. Each rule is
+ * described by a kdbus_item, and an entry is considered matching if all
+ * its rules are satisfied.
+ *
+ * The items attached to a kdbus_cmd_match struct have the following mapping:
+ *
+ * KDBUS_ITEM_BLOOM_MASK: A bloom mask
+ * KDBUS_ITEM_NAME: A connection's source name
+ * KDBUS_ITEM_ID: A connection ID
+ * KDBUS_ITEM_NAME_ADD:
+ * KDBUS_ITEM_NAME_REMOVE:
+ * KDBUS_ITEM_NAME_CHANGE: Well-known name changes, carry
+ * kdbus_notify_name_change
+ * KDBUS_ITEM_ID_ADD:
+ * KDBUS_ITEM_ID_REMOVE: Connection ID changes, carry
+ * kdbus_notify_id_change
+ *
+ * For kdbus_notify_{id,name}_change structs, only the ID and name fields
+ * are looked at when adding an entry. The flags are unused.
+ *
+ * Also note that KDBUS_ITEM_BLOOM_MASK, KDBUS_ITEM_NAME and KDBUS_ITEM_ID
+ * are used to match messages from userspace, while the others apply to
+ * kernel-generated notifications.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp)
+{
+ struct kdbus_match_db *mdb = conn->match_db;
+ struct kdbus_match_entry *entry = NULL;
+ struct kdbus_cmd_match *cmd;
+ struct kdbus_item *item;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ { .type = KDBUS_ITEM_BLOOM_MASK, .multiple = true },
+ { .type = KDBUS_ITEM_NAME, .multiple = true },
+ { .type = KDBUS_ITEM_ID, .multiple = true },
+ { .type = KDBUS_ITEM_NAME_ADD, .multiple = true },
+ { .type = KDBUS_ITEM_NAME_REMOVE, .multiple = true },
+ { .type = KDBUS_ITEM_NAME_CHANGE, .multiple = true },
+ { .type = KDBUS_ITEM_ID_ADD, .multiple = true },
+ { .type = KDBUS_ITEM_ID_REMOVE, .multiple = true },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
+ KDBUS_MATCH_REPLACE,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ if (!kdbus_conn_is_ordinary(conn))
+ return -EOPNOTSUPP;
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret != 0)
+ return ret;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ entry->cookie = cmd->cookie;
+ INIT_LIST_HEAD(&entry->list_entry);
+ INIT_LIST_HEAD(&entry->rules_list);
+
+ KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+ struct kdbus_match_rule *rule;
+ size_t size = item->size - offsetof(struct kdbus_item, data);
+
+ rule = kzalloc(sizeof(*rule), GFP_KERNEL);
+ if (!rule) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ rule->type = item->type;
+ INIT_LIST_HEAD(&rule->rules_entry);
+
+ switch (item->type) {
+ case KDBUS_ITEM_BLOOM_MASK: {
+ u64 bsize = conn->ep->bus->bloom.size;
+ u64 generations;
+ u64 remainder;
+
+ generations = div64_u64_rem(size, bsize, &remainder);
+ if (size < bsize || remainder > 0) {
+ ret = -EDOM;
+ break;
+ }
+
+ rule->bloom_mask.data = kmemdup(item->data,
+ size, GFP_KERNEL);
+ if (!rule->bloom_mask.data) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ rule->bloom_mask.generations = generations;
+ break;
+ }
+
+ case KDBUS_ITEM_NAME:
+ if (!kdbus_name_is_valid(item->str, false)) {
+ ret = -EINVAL;
+ break;
+ }
+
+ rule->name = kstrdup(item->str, GFP_KERNEL);
+ if (!rule->name)
+ ret = -ENOMEM;
+
+ break;
+
+ case KDBUS_ITEM_ID:
+ rule->src_id = item->id;
+ break;
+
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_REMOVE:
+ case KDBUS_ITEM_NAME_CHANGE:
+ rule->old_id = item->name_change.old_id.id;
+ rule->new_id = item->name_change.new_id.id;
+
+ if (size > sizeof(struct kdbus_notify_name_change)) {
+ rule->name = kstrdup(item->name_change.name,
+ GFP_KERNEL);
+ if (!rule->name)
+ ret = -ENOMEM;
+ }
+
+ break;
+
+ case KDBUS_ITEM_ID_ADD:
+ case KDBUS_ITEM_ID_REMOVE:
+ if (item->type == KDBUS_ITEM_ID_ADD)
+ rule->new_id = item->id_change.id;
+ else
+ rule->old_id = item->id_change.id;
+
+ break;
+ }
+
+ if (ret < 0) {
+ kdbus_match_rule_free(rule);
+ goto exit;
+ }
+
+ list_add_tail(&rule->rules_entry, &entry->rules_list);
+ }
+
+ down_write(&mdb->mdb_rwlock);
+
+ /* Remove any entry that has the same cookie as the current one. */
+ if (cmd->flags & KDBUS_MATCH_REPLACE)
+ kdbus_match_db_remove_unlocked(mdb, entry->cookie);
+
+ /*
+ * If the above removal caught any entry, there will be room for the
+ * new one.
+ */
+ if (++mdb->entries_count > KDBUS_MATCH_MAX) {
+ --mdb->entries_count;
+ ret = -EMFILE;
+ } else {
+ list_add_tail(&entry->list_entry, &mdb->entries_list);
+ entry = NULL;
+ }
+
+ up_write(&mdb->mdb_rwlock);
+
+exit:
+ kdbus_match_entry_free(entry);
+ return kdbus_args_clear(&args, ret);
+}
+
+/**
+ * kdbus_cmd_match_remove() - handle KDBUS_CMD_MATCH_REMOVE
+ * @conn: connection to operate on
+ * @argp: command payload
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp)
+{
+ struct kdbus_cmd_match *cmd;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ if (!kdbus_conn_is_ordinary(conn))
+ return -EOPNOTSUPP;
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret != 0)
+ return ret;
+
+ down_write(&conn->match_db->mdb_rwlock);
+ ret = kdbus_match_db_remove_unlocked(conn->match_db, cmd->cookie);
+ up_write(&conn->match_db->mdb_rwlock);
+
+ return kdbus_args_clear(&args, ret);
+}
diff --git a/ipc/kdbus/match.h b/ipc/kdbus/match.h
new file mode 100644
index 000000000000..ea4292938deb
--- /dev/null
+++ b/ipc/kdbus/match.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_MATCH_H
+#define __KDBUS_MATCH_H
+
+struct kdbus_conn;
+struct kdbus_kmsg;
+struct kdbus_match_db;
+
+struct kdbus_match_db *kdbus_match_db_new(void);
+void kdbus_match_db_free(struct kdbus_match_db *db);
+int kdbus_match_db_add(struct kdbus_conn *conn,
+ struct kdbus_cmd_match *cmd);
+int kdbus_match_db_remove(struct kdbus_conn *conn,
+ struct kdbus_cmd_match *cmd);
+bool kdbus_match_db_match_kmsg(struct kdbus_match_db *db,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg);
+
+int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp);
+int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp);
+
+#endif
diff --git a/ipc/kdbus/notify.c b/ipc/kdbus/notify.c
new file mode 100644
index 000000000000..e4a454222f09
--- /dev/null
+++ b/ipc/kdbus/notify.c
@@ -0,0 +1,248 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "item.h"
+#include "message.h"
+#include "notify.h"
+
+static inline void kdbus_notify_add_tail(struct kdbus_kmsg *kmsg,
+ struct kdbus_bus *bus)
+{
+ spin_lock(&bus->notify_lock);
+ list_add_tail(&kmsg->notify_entry, &bus->notify_list);
+ spin_unlock(&bus->notify_lock);
+}
+
+static int kdbus_notify_reply(struct kdbus_bus *bus, u64 id,
+ u64 cookie, u64 msg_type)
+{
+ struct kdbus_kmsg *kmsg = NULL;
+
+ WARN_ON(id == 0);
+
+ kmsg = kdbus_kmsg_new(bus, 0);
+ if (IS_ERR(kmsg))
+ return PTR_ERR(kmsg);
+
+ /*
+ * a kernel-generated notification can only contain one
+ * struct kdbus_item, so make a shortcut here for
+ * faster lookup in the match db.
+ */
+ kmsg->notify_type = msg_type;
+ kmsg->msg.flags = KDBUS_MSG_SIGNAL;
+ kmsg->msg.dst_id = id;
+ kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+ kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+ kmsg->msg.cookie_reply = cookie;
+ kmsg->msg.items[0].type = msg_type;
+
+ kdbus_notify_add_tail(kmsg, bus);
+
+ return 0;
+}
+
+/**
+ * kdbus_notify_reply_timeout() - queue a timeout reply
+ * @bus: Bus which queues the messages
+ * @id: The destination's connection ID
+ * @cookie: The cookie to set in the reply.
+ *
+ * Queues a message that has a KDBUS_ITEM_REPLY_TIMEOUT item attached.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie)
+{
+ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_TIMEOUT);
+}
+
+/**
+ * kdbus_notify_reply_dead() - queue a 'dead' reply
+ * @bus: Bus which queues the messages
+ * @id: The destination's connection ID
+ * @cookie: The cookie to set in the reply.
+ *
+ * Queues a message that has a KDBUS_ITEM_REPLY_DEAD item attached.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie)
+{
+ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_DEAD);
+}
+
+/**
+ * kdbus_notify_name_change() - queue a notification about a name owner change
+ * @bus: Bus which queues the messages
+ * @type: The type of the notification; KDBUS_ITEM_NAME_ADD,
+ * KDBUS_ITEM_NAME_CHANGE or KDBUS_ITEM_NAME_REMOVE
+ * @old_id: The id of the connection that used to own the name
+ * @new_id: The id of the new owner connection
+ * @old_flags: The flags to pass in the KDBUS_ITEM flags field for
+ * the old owner
+ * @new_flags: The flags to pass in the KDBUS_ITEM flags field for
+ * the new owner
+ * @name: The name that was removed or assigned to a new owner
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
+ u64 old_id, u64 new_id,
+ u64 old_flags, u64 new_flags,
+ const char *name)
+{
+ struct kdbus_kmsg *kmsg = NULL;
+ size_t name_len, extra_size;
+
+ name_len = strlen(name) + 1;
+ extra_size = sizeof(struct kdbus_notify_name_change) + name_len;
+ kmsg = kdbus_kmsg_new(bus, extra_size);
+ if (IS_ERR(kmsg))
+ return PTR_ERR(kmsg);
+
+ kmsg->msg.flags = KDBUS_MSG_SIGNAL;
+ kmsg->msg.dst_id = KDBUS_DST_ID_BROADCAST;
+ kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+ kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+ kmsg->notify_type = type;
+ kmsg->notify_old_id = old_id;
+ kmsg->notify_new_id = new_id;
+ kmsg->msg.items[0].type = type;
+ kmsg->msg.items[0].name_change.old_id.id = old_id;
+ kmsg->msg.items[0].name_change.old_id.flags = old_flags;
+ kmsg->msg.items[0].name_change.new_id.id = new_id;
+ kmsg->msg.items[0].name_change.new_id.flags = new_flags;
+ memcpy(kmsg->msg.items[0].name_change.name, name, name_len);
+ kmsg->notify_name = kmsg->msg.items[0].name_change.name;
+
+ kdbus_notify_add_tail(kmsg, bus);
+
+ return 0;
+}
+
+/**
+ * kdbus_notify_id_change() - queue a notification about a unique ID change
+ * @bus: Bus which queues the messages
+ * @type: The type of the notification; KDBUS_ITEM_ID_ADD or
+ * KDBUS_ITEM_ID_REMOVE
+ * @id: The id of the connection that was added or removed
+ * @flags: The flags to pass in the KDBUS_ITEM flags field
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags)
+{
+ struct kdbus_kmsg *kmsg = NULL;
+
+ kmsg = kdbus_kmsg_new(bus, sizeof(struct kdbus_notify_id_change));
+ if (IS_ERR(kmsg))
+ return PTR_ERR(kmsg);
+
+ kmsg->msg.flags = KDBUS_MSG_SIGNAL;
+ kmsg->msg.dst_id = KDBUS_DST_ID_BROADCAST;
+ kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+ kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+ kmsg->notify_type = type;
+
+ switch (type) {
+ case KDBUS_ITEM_ID_ADD:
+ kmsg->notify_new_id = id;
+ break;
+
+ case KDBUS_ITEM_ID_REMOVE:
+ kmsg->notify_old_id = id;
+ break;
+
+ default:
+ BUG();
+ }
+
+ kmsg->msg.items[0].type = type;
+ kmsg->msg.items[0].id_change.id = id;
+ kmsg->msg.items[0].id_change.flags = flags;
+
+ kdbus_notify_add_tail(kmsg, bus);
+
+ return 0;
+}
+
+/**
+ * kdbus_notify_flush() - send a list of collected messages
+ * @bus: Bus which queues the messages
+ *
+ * The list is empty after sending the messages.
+ */
+void kdbus_notify_flush(struct kdbus_bus *bus)
+{
+ LIST_HEAD(notify_list);
+ struct kdbus_kmsg *kmsg, *tmp;
+
+ mutex_lock(&bus->notify_flush_lock);
+ down_read(&bus->name_registry->rwlock);
+
+ spin_lock(&bus->notify_lock);
+ list_splice_init(&bus->notify_list, &notify_list);
+ spin_unlock(&bus->notify_lock);
+
+ list_for_each_entry_safe(kmsg, tmp, &notify_list, notify_entry) {
+ kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, NULL,
+ KDBUS_ATTACH_TIMESTAMP);
+
+ if (kmsg->msg.dst_id != KDBUS_DST_ID_BROADCAST) {
+ struct kdbus_conn *conn;
+
+ conn = kdbus_bus_find_conn_by_id(bus, kmsg->msg.dst_id);
+ if (conn) {
+ kdbus_bus_eavesdrop(bus, NULL, kmsg);
+ kdbus_conn_entry_insert(NULL, conn, kmsg, NULL);
+ kdbus_conn_unref(conn);
+ }
+ } else {
+ kdbus_bus_broadcast(bus, NULL, kmsg);
+ }
+
+ list_del(&kmsg->notify_entry);
+ kdbus_kmsg_free(kmsg);
+ }
+
+ up_read(&bus->name_registry->rwlock);
+ mutex_unlock(&bus->notify_flush_lock);
+}
+
+/**
+ * kdbus_notify_free() - free a list of collected messages
+ * @bus: Bus which queues the messages
+ */
+void kdbus_notify_free(struct kdbus_bus *bus)
+{
+ struct kdbus_kmsg *kmsg, *tmp;
+
+ list_for_each_entry_safe(kmsg, tmp, &bus->notify_list, notify_entry) {
+ list_del(&kmsg->notify_entry);
+ kdbus_kmsg_free(kmsg);
+ }
+}
diff --git a/ipc/kdbus/notify.h b/ipc/kdbus/notify.h
new file mode 100644
index 000000000000..03df464cb735
--- /dev/null
+++ b/ipc/kdbus/notify.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NOTIFY_H
+#define __KDBUS_NOTIFY_H
+
+struct kdbus_bus;
+
+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags);
+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie);
+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie);
+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
+ u64 old_id, u64 new_id,
+ u64 old_flags, u64 new_flags,
+ const char *name);
+void kdbus_notify_flush(struct kdbus_bus *bus);
+void kdbus_notify_free(struct kdbus_bus *bus);
+
+#endif
--
2.3.1

Greg Kroah-Hartman
2015-03-09 13:11:16 UTC
From: Daniel Mack <***@zonque.org>

Add the logic to handle the following entities:

Domain:
A domain is an unnamed object containing a number of buses. A
domain is automatically created when an instance of kdbusfs
is mounted, and destroyed when it is unmounted.
Every domain offers its own 'control' device node to create
buses. Domains are isolated from each other.

Bus:
A bus is a named object inside a domain. Clients exchange messages
over a bus. Multiple buses themselves have no connection to each
other; messages can only be exchanged on the same bus. The default
entry point to a bus, to which clients connect, is
the "bus" device node /sys/fs/kdbus/<bus name>/bus. Common operating
system setups create one "system bus" per system, and one "user
bus" for every logged-in user. Applications or services may create
their own private named buses.

Endpoint:
An endpoint provides the device node to talk to a bus. Opening an
endpoint creates a new connection to the bus to which the endpoint
belongs. Every bus has a default endpoint called "bus". A bus can
optionally offer additional endpoints with custom names to provide
restricted access to the same bus. Custom endpoints carry
additional policy which can be used to give sandboxed processes
only locked-down, limited, filtered access to the same bus.

See kdbus(7), kdbus.bus(7), kdbus.endpoint(7) and kdbus.fs(7)
for more details.
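
To make the domain/bus/endpoint hierarchy concrete, the following userspace
sketch builds the path of an endpoint node from a bus name and an endpoint
name, using the /sys/fs/kdbus/<bus name>/<endpoint> layout described above.
The helper name endpoint_path() and the example bus name are hypothetical and
only serve this illustration; a different mount point would need a different
root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mount point as named in the description above. */
#define KDBUSFS_ROOT "/sys/fs/kdbus"

/*
 * Build "<root>/<bus>/<endpoint>" into buf.
 * Returns 0 on success, -1 if the result does not fit into len bytes.
 */
static int endpoint_path(char *buf, size_t len,
			 const char *bus, const char *endpoint)
{
	int n = snprintf(buf, len, "%s/%s/%s", KDBUSFS_ROOT, bus, endpoint);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A client would pass the default endpoint name "bus" to reach the regular
entry point, or a custom endpoint name to reach a restricted view of the same
bus.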

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
ipc/kdbus/bus.c | 560 +++++++++++++++++++++++++++++++++++++++++++++++++++
ipc/kdbus/bus.h | 101 ++++++++++
ipc/kdbus/domain.c | 296 +++++++++++++++++++++++++++
ipc/kdbus/domain.h | 77 +++++++
ipc/kdbus/endpoint.c | 275 +++++++++++++++++++++++++
ipc/kdbus/endpoint.h | 67 ++++++
6 files changed, 1376 insertions(+)
create mode 100644 ipc/kdbus/bus.c
create mode 100644 ipc/kdbus/bus.h
create mode 100644 ipc/kdbus/domain.c
create mode 100644 ipc/kdbus/domain.h
create mode 100644 ipc/kdbus/endpoint.c
create mode 100644 ipc/kdbus/endpoint.h

diff --git a/ipc/kdbus/bus.c b/ipc/kdbus/bus.c
new file mode 100644
index 000000000000..9d0679eb59f6
--- /dev/null
+++ b/ipc/kdbus/bus.c
@@ -0,0 +1,560 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "notify.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+#include "policy.h"
+#include "util.h"
+
+static void kdbus_bus_free(struct kdbus_node *node)
+{
+ struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
+
+ WARN_ON(!list_empty(&bus->monitors_list));
+ WARN_ON(!hash_empty(bus->conn_hash));
+
+ kdbus_notify_free(bus);
+
+ kdbus_user_unref(bus->creator);
+ kdbus_name_registry_free(bus->name_registry);
+ kdbus_domain_unref(bus->domain);
+ kdbus_policy_db_clear(&bus->policy_db);
+ kdbus_meta_proc_unref(bus->creator_meta);
+ kfree(bus);
+}
+
+static void kdbus_bus_release(struct kdbus_node *node, bool was_active)
+{
+ struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
+
+ if (was_active)
+ atomic_dec(&bus->creator->buses);
+}
+
+static struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
+ const char *name,
+ struct kdbus_bloom_parameter *bloom,
+ const u64 *pattach_owner,
+ const u64 *pattach_recv,
+ u64 flags, kuid_t uid, kgid_t gid)
+{
+ struct kdbus_bus *b;
+ u64 attach_owner;
+ u64 attach_recv;
+ int ret;
+
+ if (bloom->size < 8 || bloom->size > KDBUS_BUS_BLOOM_MAX_SIZE ||
+ !KDBUS_IS_ALIGNED8(bloom->size) || bloom->n_hash < 1)
+ return ERR_PTR(-EINVAL);
+
+ ret = kdbus_sanitize_attach_flags(pattach_recv ? *pattach_recv : 0,
+ &attach_recv);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
+ ret = kdbus_sanitize_attach_flags(pattach_owner ? *pattach_owner : 0,
+ &attach_owner);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
+ ret = kdbus_verify_uid_prefix(name, domain->user_namespace, uid);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
+ b = kzalloc(sizeof(*b), GFP_KERNEL);
+ if (!b)
+ return ERR_PTR(-ENOMEM);
+
+ kdbus_node_init(&b->node, KDBUS_NODE_BUS);
+
+ b->node.free_cb = kdbus_bus_free;
+ b->node.release_cb = kdbus_bus_release;
+ b->node.uid = uid;
+ b->node.gid = gid;
+ b->node.mode = S_IRUSR | S_IXUSR;
+
+ if (flags & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+ b->node.mode |= S_IRGRP | S_IXGRP;
+ if (flags & KDBUS_MAKE_ACCESS_WORLD)
+ b->node.mode |= S_IROTH | S_IXOTH;
+
+ b->id = atomic64_inc_return(&domain->last_id);
+ b->bus_flags = flags;
+ b->attach_flags_req = attach_recv;
+ b->attach_flags_owner = attach_owner;
+ generate_random_uuid(b->id128);
+ b->bloom = *bloom;
+ b->domain = kdbus_domain_ref(domain);
+
+ kdbus_policy_db_init(&b->policy_db);
+
+ init_rwsem(&b->conn_rwlock);
+ hash_init(b->conn_hash);
+ INIT_LIST_HEAD(&b->monitors_list);
+
+ INIT_LIST_HEAD(&b->notify_list);
+ spin_lock_init(&b->notify_lock);
+ mutex_init(&b->notify_flush_lock);
+
+ ret = kdbus_node_link(&b->node, &domain->node, name);
+ if (ret < 0)
+ goto exit_unref;
+
+ /* cache the metadata/credentials of the creator */
+ b->creator_meta = kdbus_meta_proc_new();
+ if (IS_ERR(b->creator_meta)) {
+ ret = PTR_ERR(b->creator_meta);
+ b->creator_meta = NULL;
+ goto exit_unref;
+ }
+
+ ret = kdbus_meta_proc_collect(b->creator_meta,
+ KDBUS_ATTACH_CREDS |
+ KDBUS_ATTACH_PIDS |
+ KDBUS_ATTACH_AUXGROUPS |
+ KDBUS_ATTACH_TID_COMM |
+ KDBUS_ATTACH_PID_COMM |
+ KDBUS_ATTACH_EXE |
+ KDBUS_ATTACH_CMDLINE |
+ KDBUS_ATTACH_CGROUP |
+ KDBUS_ATTACH_CAPS |
+ KDBUS_ATTACH_SECLABEL |
+ KDBUS_ATTACH_AUDIT);
+ if (ret < 0)
+ goto exit_unref;
+
+ b->name_registry = kdbus_name_registry_new();
+ if (IS_ERR(b->name_registry)) {
+ ret = PTR_ERR(b->name_registry);
+ b->name_registry = NULL;
+ goto exit_unref;
+ }
+
+ /*
+ * Bus-limits of the creator are accounted on its real UID, just like
+ * all other per-user limits.
+ */
+ b->creator = kdbus_user_lookup(domain, current_uid());
+ if (IS_ERR(b->creator)) {
+ ret = PTR_ERR(b->creator);
+ b->creator = NULL;
+ goto exit_unref;
+ }
+
+ return b;
+
+exit_unref:
+ kdbus_node_deactivate(&b->node);
+ kdbus_node_unref(&b->node);
+ return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
+ * @bus: The bus to reference
+ *
+ * Every user of a bus, except for its creator, must add a reference to the
+ * kdbus_bus using this function.
+ *
+ * Return: the bus itself
+ */
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
+{
+ if (bus)
+ kdbus_node_ref(&bus->node);
+ return bus;
+}
+
+/**
+ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
+ * @bus: The bus to unref
+ *
+ * Release a reference. If the reference count drops to 0, the bus will be
+ * freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
+{
+ if (bus)
+ kdbus_node_unref(&bus->node);
+ return NULL;
+}
+
+/**
+ * kdbus_bus_find_conn_by_id() - find a connection with a given id
+ * @bus: The bus to look for the connection
+ * @id: The 64-bit connection id
+ *
+ * Looks up a connection with a given id. The returned connection
+ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
+ * the connection can't be found.
+ */
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
+{
+ struct kdbus_conn *conn, *found = NULL;
+
+ down_read(&bus->conn_rwlock);
+ hash_for_each_possible(bus->conn_hash, conn, hentry, id)
+ if (conn->id == id) {
+ found = kdbus_conn_ref(conn);
+ break;
+ }
+ up_read(&bus->conn_rwlock);
+
+ return found;
+}
+
+/**
+ * kdbus_bus_broadcast() - send a message to all subscribed connections
+ * @bus: The bus the connections are connected to
+ * @conn_src: The source connection, may be %NULL for kernel notifications
+ * @kmsg: The message to send.
+ *
+ * Send @kmsg to all connections that are currently active on the bus.
+ * Connections must still have matches installed in order to let the message
+ * pass.
+ *
+ * The caller must hold the name-registry lock of @bus.
+ */
+void kdbus_bus_broadcast(struct kdbus_bus *bus,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_conn *conn_dst;
+ unsigned int i;
+ int ret;
+
+ lockdep_assert_held(&bus->name_registry->rwlock);
+
+ /*
+ * Make sure broadcasts are queued on monitors before we send them out to
+ * anyone else. Otherwise, connections might react to broadcasts before
+ * the monitor gets the broadcast queued. In the worst case, the
+ * monitor sees a reaction to the broadcast before the broadcast itself.
+ * We don't give ordering guarantees across connections (and monitors
+ * can re-construct order via sequence numbers), but we should at least
+ * try to avoid re-ordering for monitors.
+ */
+ kdbus_bus_eavesdrop(bus, conn_src, kmsg);
+
+ down_read(&bus->conn_rwlock);
+ hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
+ if (conn_dst->id == kmsg->msg.src_id)
+ continue;
+ if (!kdbus_conn_is_ordinary(conn_dst))
+ continue;
+
+ /*
+ * Check if there is a match for the kmsg object in
+ * the destination connection match db
+ */
+ if (!kdbus_match_db_match_kmsg(conn_dst->match_db, conn_src,
+ kmsg))
+ continue;
+
+ if (conn_src) {
+ u64 attach_flags;
+
+ /*
+ * Anyone can send broadcasts, as they have no
+ * destination. But a receiver needs TALK access to
+ * the sender in order to receive broadcasts.
+ */
+ if (!kdbus_conn_policy_talk(conn_dst, NULL, conn_src))
+ continue;
+
+ attach_flags = kdbus_meta_calc_attach_flags(conn_src,
+ conn_dst);
+
+ /*
+ * Keep sending messages even if we cannot acquire the
+ * requested metadata. It's up to the receiver to drop
+ * messages that lack expected metadata.
+ */
+ if (!conn_src->faked_meta)
+ kdbus_meta_proc_collect(kmsg->proc_meta,
+ attach_flags);
+ kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+ attach_flags);
+ } else {
+ /*
+ * Check if there is a policy db that prevents the
+ * destination connection from receiving this kernel
+ * notification
+ */
+ if (!kdbus_conn_policy_see_notification(conn_dst, NULL,
+ kmsg))
+ continue;
+ }
+
+ ret = kdbus_conn_entry_insert(conn_src, conn_dst, kmsg, NULL);
+ if (ret < 0)
+ kdbus_conn_lost_message(conn_dst);
+ }
+ up_read(&bus->conn_rwlock);
+}
+
+/**
+ * kdbus_bus_eavesdrop() - send a message to all subscribed monitors
+ * @bus: The bus the monitors are connected to
+ * @conn_src: The source connection, may be %NULL for kernel notifications
+ * @kmsg: The message to send.
+ *
+ * Send @kmsg to all monitors that are currently active on the bus. Monitors
+ * must still have matches installed in order to let the message pass.
+ *
+ * The caller must hold the name-registry lock of @bus.
+ */
+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_conn *conn_dst;
+ int ret;
+
+ /*
+ * Monitor connections get all messages; ignore possible errors
+ * when sending messages to monitor connections.
+ */
+
+ lockdep_assert_held(&bus->name_registry->rwlock);
+
+ down_read(&bus->conn_rwlock);
+ list_for_each_entry(conn_dst, &bus->monitors_list, monitor_entry) {
+ /*
+ * Collect metadata requested by the destination connection.
+ * Ignore errors, as receivers need to check metadata
+ * availability, anyway. So it's still better to send messages
+ * that lack data than to skip them entirely.
+ */
+ if (conn_src) {
+ u64 attach_flags;
+
+ attach_flags = kdbus_meta_calc_attach_flags(conn_src,
+ conn_dst);
+ if (!conn_src->faked_meta)
+ kdbus_meta_proc_collect(kmsg->proc_meta,
+ attach_flags);
+ kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+ attach_flags);
+ }
+
+ ret = kdbus_conn_entry_insert(conn_src, conn_dst, kmsg, NULL);
+ if (ret < 0)
+ kdbus_conn_lost_message(conn_dst);
+ }
+ up_read(&bus->conn_rwlock);
+}
+
+/**
+ * kdbus_cmd_bus_make() - handle KDBUS_CMD_BUS_MAKE
+ * @domain: domain to operate on
+ * @argp: command payload
+ *
+ * Return: Newly created bus on success, ERR_PTR on failure.
+ */
+struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
+ void __user *argp)
+{
+ struct kdbus_bus *bus = NULL;
+ struct kdbus_cmd *cmd;
+ struct kdbus_ep *ep = NULL;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ { .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
+ { .type = KDBUS_ITEM_BLOOM_PARAMETER, .mandatory = true },
+ { .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
+ { .type = KDBUS_ITEM_ATTACH_FLAGS_RECV },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
+ KDBUS_MAKE_ACCESS_GROUP |
+ KDBUS_MAKE_ACCESS_WORLD,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ if (ret > 0)
+ return NULL;
+
+ bus = kdbus_bus_new(domain,
+ argv[1].item->str, &argv[2].item->bloom_parameter,
+ argv[3].item ? argv[3].item->data64 : NULL,
+ argv[4].item ? argv[4].item->data64 : NULL,
+ cmd->flags, current_euid(), current_egid());
+ if (IS_ERR(bus)) {
+ ret = PTR_ERR(bus);
+ bus = NULL;
+ goto exit;
+ }
+
+ if (atomic_inc_return(&bus->creator->buses) > KDBUS_USER_MAX_BUSES) {
+ atomic_dec(&bus->creator->buses);
+ ret = -EMFILE;
+ goto exit;
+ }
+
+ if (!kdbus_node_activate(&bus->node)) {
+ atomic_dec(&bus->creator->buses);
+ ret = -ESHUTDOWN;
+ goto exit;
+ }
+
+ ep = kdbus_ep_new(bus, "bus", cmd->flags, bus->node.uid, bus->node.gid,
+ false);
+ if (IS_ERR(ep)) {
+ ret = PTR_ERR(ep);
+ ep = NULL;
+ goto exit;
+ }
+
+ if (!kdbus_node_activate(&ep->node)) {
+ ret = -ESHUTDOWN;
+ goto exit;
+ }
+
+ /*
+ * Drop our own reference, effectively causing the endpoint to be
+ * deactivated and released when the parent bus is.
+ */
+ ep = kdbus_ep_unref(ep);
+
+exit:
+ ret = kdbus_args_clear(&args, ret);
+ if (ret < 0) {
+ if (ep) {
+ kdbus_node_deactivate(&ep->node);
+ kdbus_ep_unref(ep);
+ }
+ if (bus) {
+ kdbus_node_deactivate(&bus->node);
+ kdbus_bus_unref(bus);
+ }
+ return ERR_PTR(ret);
+ }
+ return bus;
+}
+
+/**
+ * kdbus_cmd_bus_creator_info() - handle KDBUS_CMD_BUS_CREATOR_INFO
+ * @conn: connection to operate on
+ * @argp: command payload
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp)
+{
+ struct kdbus_cmd_info *cmd;
+ struct kdbus_bus *bus = conn->ep->bus;
+ struct kdbus_pool_slice *slice = NULL;
+ struct kdbus_item_header item_hdr;
+ struct kdbus_info info = {};
+ size_t meta_size, name_len;
+ struct kvec kvec[5];
+ u64 hdr_size = 0;
+ u64 attach_flags;
+ size_t cnt = 0;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret != 0)
+ return ret;
+
+ ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
+ if (ret < 0)
+ goto exit;
+
+ attach_flags &= bus->attach_flags_owner;
+
+ ret = kdbus_meta_export_prepare(bus->creator_meta, NULL,
+ &attach_flags, &meta_size);
+ if (ret < 0)
+ goto exit;
+
+ name_len = strlen(bus->node.name) + 1;
+ info.id = bus->id;
+ info.flags = bus->bus_flags;
+ item_hdr.type = KDBUS_ITEM_MAKE_NAME;
+ item_hdr.size = KDBUS_ITEM_HEADER_SIZE + name_len;
+
+ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &hdr_size);
+ kdbus_kvec_set(&kvec[cnt++], &item_hdr, sizeof(item_hdr), &hdr_size);
+ kdbus_kvec_set(&kvec[cnt++], bus->node.name, name_len, &hdr_size);
+ cnt += !!kdbus_kvec_pad(&kvec[cnt], &hdr_size);
+
+ slice = kdbus_pool_slice_alloc(conn->pool, hdr_size + meta_size, false);
+ if (IS_ERR(slice)) {
+ ret = PTR_ERR(slice);
+ slice = NULL;
+ goto exit;
+ }
+
+ ret = kdbus_meta_export(bus->creator_meta, NULL, attach_flags,
+ slice, hdr_size, &meta_size);
+ if (ret < 0)
+ goto exit;
+
+ info.size = hdr_size + meta_size;
+
+ ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, hdr_size);
+ if (ret < 0)
+ goto exit;
+
+ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
+
+ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
+ kdbus_member_set_user(&cmd->info_size, argp,
+ typeof(*cmd), info_size))
+ ret = -EFAULT;
+
+exit:
+ kdbus_pool_slice_release(slice);
+
+ return kdbus_args_clear(&args, ret);
+}
diff --git a/ipc/kdbus/bus.h b/ipc/kdbus/bus.h
new file mode 100644
index 000000000000..5bea5ef768f1
--- /dev/null
+++ b/ipc/kdbus/bus.h
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_BUS_H
+#define __KDBUS_BUS_H
+
+#include <linux/hashtable.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/spinlock.h>
+#include <uapi/linux/kdbus.h>
+
+#include "metadata.h"
+#include "names.h"
+#include "node.h"
+#include "policy.h"
+
+struct kdbus_conn;
+struct kdbus_domain;
+struct kdbus_kmsg;
+struct kdbus_user;
+
+/**
+ * struct kdbus_bus - bus in a domain
+ * @node: kdbus_node
+ * @id: ID of this bus in the domain
+ * @bus_flags: Simple pass-through flags from userspace to userspace
+ * @attach_flags_req: KDBUS_ATTACH_* flags required by connecting peers
+ * @attach_flags_owner: KDBUS_ATTACH_* flags of bus creator that other
+ * connections can see or query
+ * @id128: Unique random 128 bit ID of this bus
+ * @bloom: Bloom parameters
+ * @domain: Domain of this bus
+ * @creator: Creator of the bus
+ * @creator_meta: Meta information about the bus creator
+ * @policy_db: Policy database for this bus
+ * @name_registry: Name registry of this bus
+ * @conn_rwlock: Read/Write lock for all lists of child connections
+ * @conn_hash: Map of connection IDs
+ * @monitors_list: Connections that monitor this bus
+ * @notify_list: List of pending kernel-generated messages
+ * @notify_lock: Notification list lock
+ * @notify_flush_lock: Notification flushing lock
+ */
+struct kdbus_bus {
+ struct kdbus_node node;
+
+ /* static */
+ u64 id;
+ u64 bus_flags;
+ u64 attach_flags_req;
+ u64 attach_flags_owner;
+ u8 id128[16];
+ struct kdbus_bloom_parameter bloom;
+ struct kdbus_domain *domain;
+ struct kdbus_user *creator;
+ struct kdbus_meta_proc *creator_meta;
+
+ /* protected by own locks */
+ struct kdbus_policy_db policy_db;
+ struct kdbus_name_registry *name_registry;
+
+ /* protected by conn_rwlock */
+ struct rw_semaphore conn_rwlock;
+ DECLARE_HASHTABLE(conn_hash, 8);
+ struct list_head monitors_list;
+
+ /* protected by notify_lock */
+ struct list_head notify_list;
+ spinlock_t notify_lock;
+ struct mutex notify_flush_lock;
+};
+
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
+
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
+void kdbus_bus_broadcast(struct kdbus_bus *bus,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg);
+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg);
+
+struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
+ void __user *argp);
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp);
+
+#endif
diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
new file mode 100644
index 000000000000..ac9f760c150d
--- /dev/null
+++ b/ipc/kdbus/domain.c
@@ -0,0 +1,296 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "handle.h"
+#include "item.h"
+#include "limits.h"
+#include "util.h"
+
+static void kdbus_domain_control_free(struct kdbus_node *node)
+{
+ kfree(node);
+}
+
+static struct kdbus_node *kdbus_domain_control_new(struct kdbus_domain *domain,
+ unsigned int access)
+{
+ struct kdbus_node *node;
+ int ret;
+
+ node = kzalloc(sizeof(*node), GFP_KERNEL);
+ if (!node)
+ return ERR_PTR(-ENOMEM);
+
+ kdbus_node_init(node, KDBUS_NODE_CONTROL);
+
+ node->free_cb = kdbus_domain_control_free;
+ node->mode = domain->node.mode;
+ node->mode = S_IRUSR | S_IWUSR;
+ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+ node->mode |= S_IRGRP | S_IWGRP;
+ if (access & KDBUS_MAKE_ACCESS_WORLD)
+ node->mode |= S_IROTH | S_IWOTH;
+
+ ret = kdbus_node_link(node, &domain->node, "control");
+ if (ret < 0)
+ goto exit_free;
+
+ return node;
+
+exit_free:
+ kdbus_node_deactivate(node);
+ kdbus_node_unref(node);
+ return ERR_PTR(ret);
+}
+
+static void kdbus_domain_free(struct kdbus_node *node)
+{
+ struct kdbus_domain *domain =
+ container_of(node, struct kdbus_domain, node);
+
+ put_user_ns(domain->user_namespace);
+ ida_destroy(&domain->user_ida);
+ idr_destroy(&domain->user_idr);
+ kfree(domain);
+}
+
+/**
+ * kdbus_domain_new() - create a new domain
+ * @access: The access mode for this node (KDBUS_MAKE_ACCESS_*)
+ *
+ * Return: a new kdbus_domain on success, ERR_PTR on failure
+ */
+struct kdbus_domain *kdbus_domain_new(unsigned int access)
+{
+ struct kdbus_domain *d;
+ int ret;
+
+ d = kzalloc(sizeof(*d), GFP_KERNEL);
+ if (!d)
+ return ERR_PTR(-ENOMEM);
+
+ kdbus_node_init(&d->node, KDBUS_NODE_DOMAIN);
+
+ d->node.free_cb = kdbus_domain_free;
+ d->node.mode = S_IRUSR | S_IXUSR;
+ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+ d->node.mode |= S_IRGRP | S_IXGRP;
+ if (access & KDBUS_MAKE_ACCESS_WORLD)
+ d->node.mode |= S_IROTH | S_IXOTH;
+
+ mutex_init(&d->lock);
+ idr_init(&d->user_idr);
+ ida_init(&d->user_ida);
+
+ /* Pin user namespace so we can guarantee domain-unique bus names. */
+ d->user_namespace = get_user_ns(current_user_ns());
+
+ ret = kdbus_node_link(&d->node, NULL, NULL);
+ if (ret < 0)
+ goto exit_unref;
+
+ return d;
+
+exit_unref:
+ kdbus_node_deactivate(&d->node);
+ kdbus_node_unref(&d->node);
+ return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_domain_ref() - take a domain reference
+ * @domain: Domain
+ *
+ * Return: the domain itself
+ */
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
+{
+ if (domain)
+ kdbus_node_ref(&domain->node);
+ return domain;
+}
+
+/**
+ * kdbus_domain_unref() - drop a domain reference
+ * @domain: Domain
+ *
+ * When the last reference is dropped, the domain internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
+{
+ if (domain)
+ kdbus_node_unref(&domain->node);
+ return NULL;
+}
+
+/**
+ * kdbus_domain_populate() - populate static domain nodes
+ * @domain: domain to populate
+ * @access: KDBUS_MAKE_ACCESS_* access restrictions for new nodes
+ *
+ * Allocate and activate static sub-nodes of the given domain. This will fail if
+ * you call it on a non-active node or if the domain was already populated.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access)
+{
+ struct kdbus_node *control;
+
+ /*
+ * Create a control-node for this domain. We drop our own reference
+ * immediately, effectively causing the node to be deactivated and
+ * released when the parent domain is.
+ */
+ control = kdbus_domain_control_new(domain, access);
+ if (IS_ERR(control))
+ return PTR_ERR(control);
+
+ kdbus_node_activate(control);
+ kdbus_node_unref(control);
+ return 0;
+}
+
+/**
+ * kdbus_user_lookup() - lookup a kdbus_user object
+ * @domain: domain of the user
+ * @uid: uid of the user; INVALID_UID for an anon user
+ *
+ * Look up the kdbus user accounting object for @uid in @domain. If INVALID_UID
+ * is passed, a new anonymous user is created which is private to the caller.
+ *
+ * Return: the user object on success, ERR_PTR on failure.
+ */
+struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid)
+{
+ struct kdbus_user *u = NULL, *old = NULL;
+ int ret;
+
+ mutex_lock(&domain->lock);
+
+ if (uid_valid(uid)) {
+ old = idr_find(&domain->user_idr, __kuid_val(uid));
+ /*
+ * If the object is about to be destroyed, ignore it and
+ * replace the slot in the IDR later on.
+ */
+ if (old && kref_get_unless_zero(&old->kref)) {
+ mutex_unlock(&domain->lock);
+ return old;
+ }
+ }
+
+ u = kzalloc(sizeof(*u), GFP_KERNEL);
+ if (!u) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ kref_init(&u->kref);
+ u->domain = kdbus_domain_ref(domain);
+ u->uid = uid;
+ atomic_set(&u->buses, 0);
+ atomic_set(&u->connections, 0);
+
+ if (uid_valid(uid)) {
+ if (old) {
+ idr_replace(&domain->user_idr, u, __kuid_val(uid));
+ old->uid = INVALID_UID; /* mark old as removed */
+ } else {
+ ret = idr_alloc(&domain->user_idr, u, __kuid_val(uid),
+ __kuid_val(uid) + 1, GFP_KERNEL);
+ if (ret < 0)
+ goto exit;
+ }
+ }
+
+ /*
+ * Allocate the smallest possible index for this user; used
+ * in arrays for accounting user quota in receiver queues.
+ */
+ ret = ida_simple_get(&domain->user_ida, 1, 0, GFP_KERNEL);
+ if (ret < 0)
+ goto exit;
+
+ u->id = ret;
+ mutex_unlock(&domain->lock);
+ return u;
+
+exit:
+ if (u) {
+ if (uid_valid(u->uid))
+ idr_remove(&domain->user_idr, __kuid_val(u->uid));
+ kdbus_domain_unref(u->domain);
+ kfree(u);
+ }
+ mutex_unlock(&domain->lock);
+ return ERR_PTR(ret);
+}
+
+static void __kdbus_user_free(struct kref *kref)
+{
+ struct kdbus_user *user = container_of(kref, struct kdbus_user, kref);
+
+ WARN_ON(atomic_read(&user->buses) > 0);
+ WARN_ON(atomic_read(&user->connections) > 0);
+
+ mutex_lock(&user->domain->lock);
+ ida_simple_remove(&user->domain->user_ida, user->id);
+ if (uid_valid(user->uid))
+ idr_remove(&user->domain->user_idr, __kuid_val(user->uid));
+ mutex_unlock(&user->domain->lock);
+
+ kdbus_domain_unref(user->domain);
+ kfree(user);
+}
+
+/**
+ * kdbus_user_ref() - take a user reference
+ * @u: User
+ *
+ * Return: @u is returned
+ */
+struct kdbus_user *kdbus_user_ref(struct kdbus_user *u)
+{
+ if (u)
+ kref_get(&u->kref);
+ return u;
+}
+
+/**
+ * kdbus_user_unref() - drop a user reference
+ * @u: User
+ *
+ * Return: NULL
+ */
+struct kdbus_user *kdbus_user_unref(struct kdbus_user *u)
+{
+ if (u)
+ kref_put(&u->kref, __kdbus_user_free);
+ return NULL;
+}
diff --git a/ipc/kdbus/domain.h b/ipc/kdbus/domain.h
new file mode 100644
index 000000000000..447a2bd4d972
--- /dev/null
+++ b/ipc/kdbus/domain.h
@@ -0,0 +1,77 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DOMAIN_H
+#define __KDBUS_DOMAIN_H
+
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/user_namespace.h>
+
+#include "node.h"
+
+/**
+ * struct kdbus_domain - domain for buses
+ * @node: Underlying API node
+ * @lock: Domain data lock
+ * @last_id: Last used object id
+ * @user_idr: Set of all users indexed by UID
+ * @user_ida: Set of all users to compute small indices
+ * @user_namespace: User namespace, pinned at creation time
+ * @dentry: Root dentry of VFS mount (don't use outside of kdbusfs)
+ */
+struct kdbus_domain {
+ struct kdbus_node node;
+ struct mutex lock;
+ atomic64_t last_id;
+ struct idr user_idr;
+ struct ida user_ida;
+ struct user_namespace *user_namespace;
+ struct dentry *dentry;
+};
+
+/**
+ * struct kdbus_user - resource accounting for users
+ * @kref: Reference counter
+ * @domain: Domain of the user
+ * @id: Index of this user
+ * @uid: UID of the user
+ * @buses: Number of buses the user has created
+ * @connections: Number of connections the user has created
+ */
+struct kdbus_user {
+ struct kref kref;
+ struct kdbus_domain *domain;
+ unsigned int id;
+ kuid_t uid;
+ atomic_t buses;
+ atomic_t connections;
+};
+
+#define kdbus_domain_from_node(_node) \
+ container_of((_node), struct kdbus_domain, node)
+
+struct kdbus_domain *kdbus_domain_new(unsigned int access);
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
+int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access);
+
+#define KDBUS_USER_KERNEL_ID 0 /* ID 0 is reserved for kernel accounting */
+
+struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid);
+struct kdbus_user *kdbus_user_ref(struct kdbus_user *u);
+struct kdbus_user *kdbus_user_unref(struct kdbus_user *u);
+
+#endif
diff --git a/ipc/kdbus/endpoint.c b/ipc/kdbus/endpoint.c
new file mode 100644
index 000000000000..174d274b113e
--- /dev/null
+++ b/ipc/kdbus/endpoint.c
@@ -0,0 +1,275 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "message.h"
+#include "policy.h"
+
+static void kdbus_ep_free(struct kdbus_node *node)
+{
+ struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
+
+ WARN_ON(!list_empty(&ep->conn_list));
+
+ kdbus_policy_db_clear(&ep->policy_db);
+ kdbus_bus_unref(ep->bus);
+ kdbus_user_unref(ep->user);
+ kfree(ep);
+}
+
+static void kdbus_ep_release(struct kdbus_node *node, bool was_active)
+{
+ struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
+
+ /* disconnect all connections to this endpoint */
+ for (;;) {
+ struct kdbus_conn *conn;
+
+ mutex_lock(&ep->lock);
+ conn = list_first_entry_or_null(&ep->conn_list,
+ struct kdbus_conn,
+ ep_entry);
+ if (!conn) {
+ mutex_unlock(&ep->lock);
+ break;
+ }
+
+ /* take reference, release lock, disconnect without lock */
+ kdbus_conn_ref(conn);
+ mutex_unlock(&ep->lock);
+
+ kdbus_conn_disconnect(conn, false);
+ kdbus_conn_unref(conn);
+ }
+}
+
+/**
+ * kdbus_ep_new() - create a new endpoint
+ * @bus: The bus this endpoint will be created for
+ * @name: The name of the endpoint
+ * @access: The access flags for this node (KDBUS_MAKE_ACCESS_*)
+ * @uid: The uid of the node
+ * @gid: The gid of the node
+ * @is_custom: Whether this is a custom endpoint
+ *
+ * This function will create a new endpoint with the given
+ * name and properties for a given bus.
+ *
+ * Return: a new kdbus_ep on success, ERR_PTR on failure.
+ */
+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+ unsigned int access, kuid_t uid, kgid_t gid,
+ bool is_custom)
+{
+ struct kdbus_ep *e;
+ int ret;
+
+ /*
+ * Validate only custom endpoint names; default endpoints
+ * named "bus" are created together with the bus itself.
+ */
+ if (is_custom) {
+ ret = kdbus_verify_uid_prefix(name, bus->domain->user_namespace,
+ uid);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ }
+
+ e = kzalloc(sizeof(*e), GFP_KERNEL);
+ if (!e)
+ return ERR_PTR(-ENOMEM);
+
+ kdbus_node_init(&e->node, KDBUS_NODE_ENDPOINT);
+
+ e->node.free_cb = kdbus_ep_free;
+ e->node.release_cb = kdbus_ep_release;
+ e->node.uid = uid;
+ e->node.gid = gid;
+ e->node.mode = S_IRUSR | S_IWUSR;
+ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+ e->node.mode |= S_IRGRP | S_IWGRP;
+ if (access & KDBUS_MAKE_ACCESS_WORLD)
+ e->node.mode |= S_IROTH | S_IWOTH;
+
+ mutex_init(&e->lock);
+ INIT_LIST_HEAD(&e->conn_list);
+ kdbus_policy_db_init(&e->policy_db);
+ e->bus = kdbus_bus_ref(bus);
+
+ ret = kdbus_node_link(&e->node, &bus->node, name);
+ if (ret < 0)
+ goto exit_unref;
+
+ /*
+ * Transactions on custom endpoints are never accounted on the global
+ * user limits. Instead, for each custom endpoint, we create a custom,
+ * unique user, which all transactions are accounted on. Regardless of
+ * the user using that endpoint, it is always accounted on the same
+ * user-object. This budget is not shared with ordinary users on
+ * non-custom endpoints.
+ */
+ if (is_custom) {
+ e->user = kdbus_user_lookup(bus->domain, INVALID_UID);
+ if (IS_ERR(e->user)) {
+ ret = PTR_ERR(e->user);
+ e->user = NULL;
+ goto exit_unref;
+ }
+ }
+
+ return e;
+
+exit_unref:
+ kdbus_node_deactivate(&e->node);
+ kdbus_node_unref(&e->node);
+ return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_ep_ref() - increase the reference counter of a kdbus_ep
+ * @ep: The endpoint to reference
+ *
+ * Every user of an endpoint, except for its creator, must add a reference to
+ * the kdbus_ep instance using this function.
+ *
+ * Return: the ep itself
+ */
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
+{
+ if (ep)
+ kdbus_node_ref(&ep->node);
+ return ep;
+}
+
+/**
+ * kdbus_ep_unref() - decrease the reference counter of a kdbus_ep
+ * @ep: The ep to unref
+ *
+ * Release a reference. If the reference count drops to 0, the ep will be
+ * freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
+{
+ if (ep)
+ kdbus_node_unref(&ep->node);
+ return NULL;
+}
+
+/**
+ * kdbus_cmd_ep_make() - handle KDBUS_CMD_ENDPOINT_MAKE
+ * @bus: bus to operate on
+ * @argp: command payload
+ *
+ * Return: Newly created endpoint on success, ERR_PTR on failure.
+ */
+struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp)
+{
+ const char *item_make_name;
+ struct kdbus_ep *ep = NULL;
+ struct kdbus_cmd *cmd;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ { .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
+ KDBUS_MAKE_ACCESS_GROUP |
+ KDBUS_MAKE_ACCESS_WORLD,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ if (ret > 0)
+ return NULL;
+
+ item_make_name = argv[1].item->str;
+
+ ep = kdbus_ep_new(bus, item_make_name, cmd->flags,
+ current_euid(), current_egid(), true);
+ if (IS_ERR(ep)) {
+ ret = PTR_ERR(ep);
+ ep = NULL;
+ goto exit;
+ }
+
+ if (!kdbus_node_activate(&ep->node)) {
+ ret = -ESHUTDOWN;
+ goto exit;
+ }
+
+exit:
+ ret = kdbus_args_clear(&args, ret);
+ if (ret < 0) {
+ if (ep) {
+ kdbus_node_deactivate(&ep->node);
+ kdbus_ep_unref(ep);
+ }
+ return ERR_PTR(ret);
+ }
+ return ep;
+}
+
+/**
+ * kdbus_cmd_ep_update() - handle KDBUS_CMD_ENDPOINT_UPDATE
+ * @ep: endpoint to operate on
+ * @argp: command payload
+ *
+ * Return: >= 0 on success, negative error code on failure.
+ */
+int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp)
+{
+ struct kdbus_cmd *cmd;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ { .type = KDBUS_ITEM_NAME, .multiple = true },
+ { .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret != 0)
+ return ret;
+
+ ret = kdbus_policy_set(&ep->policy_db, args.items, args.items_size,
+ 0, true, ep);
+ return kdbus_args_clear(&args, ret);
+}
diff --git a/ipc/kdbus/endpoint.h b/ipc/kdbus/endpoint.h
new file mode 100644
index 000000000000..d31954bfba2c
--- /dev/null
+++ b/ipc/kdbus/endpoint.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_ENDPOINT_H
+#define __KDBUS_ENDPOINT_H
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/uidgid.h>
+#include "node.h"
+#include "policy.h"
+
+struct kdbus_bus;
+struct kdbus_user;
+
+/**
+ * struct kdbus_ep - endpoint to access a bus
+ * @node: The kdbus node
+ * @lock: Endpoint data lock
+ * @bus: Bus behind this endpoint
+ * @user: Custom endpoints account against an anonymous user
+ * @policy_db: Uploaded policy
+ * @conn_list: Connections of this endpoint
+ *
+ * An endpoint offers access to a bus; the default endpoint node name is "bus".
+ * Additional custom endpoints to the same bus can be created and they can
+ * carry their own policies/filters.
+ */
+struct kdbus_ep {
+ struct kdbus_node node;
+ struct mutex lock;
+
+ /* static */
+ struct kdbus_bus *bus;
+ struct kdbus_user *user;
+
+ /* protected by own locks */
+ struct kdbus_policy_db policy_db;
+
+ /* protected by ep->lock */
+ struct list_head conn_list;
+};
+
+#define kdbus_ep_from_node(_node) \
+ container_of((_node), struct kdbus_ep, node)
+
+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+ unsigned int access, kuid_t uid, kgid_t gid,
+ bool policy);
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
+
+struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp);
+int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp);
+
+#endif
--
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Greg Kroah-Hartman
2015-03-09 13:11:46 UTC
From: Daniel Mack <***@zonque.org>

This patch hooks up the build system to actually compile the files
added by previous patches. It also adds an entry to MAINTAINERS to
direct people to Greg KH, David Herrmann, Djalal Harouni and me for
questions and patches.

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
MAINTAINERS | 13 +++++++++++++
init/Kconfig | 12 ++++++++++++
ipc/Makefile | 2 +-
ipc/kdbus/Makefile | 22 ++++++++++++++++++++++
4 files changed, 48 insertions(+), 1 deletion(-)
create mode 100644 ipc/kdbus/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 6239a305dff0..e924246fb545 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5503,6 +5503,19 @@ S: Maintained
F: Documentation/kbuild/kconfig-language.txt
F: scripts/kconfig/

+KDBUS
+M: Greg Kroah-Hartman <***@linuxfoundation.org>
+M: Daniel Mack <***@zonque.org>
+M: David Herrmann <***@googlemail.com>
+M: Djalal Harouni <***@opendz.org>
+L: linux-***@vger.kernel.org
+S: Maintained
+F: ipc/kdbus/*
+F: samples/kdbus/*
+F: Documentation/kdbus/*
+F: include/uapi/linux/kdbus.h
+F: tools/testing/selftests/kdbus/
+
KDUMP
M: Vivek Goyal <***@redhat.com>
M: Haren Myneni <***@us.ibm.com>
diff --git a/init/Kconfig b/init/Kconfig
index f5dbc6d4261b..a7b462e7d647 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -261,6 +261,18 @@ config POSIX_MQUEUE_SYSCTL
depends on SYSCTL
default y

+config KDBUS
+ tristate "kdbus interprocess communication"
+ depends on TMPFS
+ help
+ D-Bus is a system for low-latency, low-overhead, easy to use
+ interprocess communication (IPC).
+
+ See Documentation/kdbus.txt
+
+ To compile this driver as a module, choose M here: the
+ module will be called kdbus.
+
config CROSS_MEMORY_ATTACH
bool "Enable process_vm_readv/writev syscalls"
depends on MMU
diff --git a/ipc/Makefile b/ipc/Makefile
index 86c7300ecdf5..68ec4167d11b 100644
--- a/ipc/Makefile
+++ b/ipc/Makefile
@@ -9,4 +9,4 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
obj-$(CONFIG_IPC_NS) += namespace.o
obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
-
+obj-$(CONFIG_KDBUS) += kdbus/
diff --git a/ipc/kdbus/Makefile b/ipc/kdbus/Makefile
new file mode 100644
index 000000000000..7ee9271e1449
--- /dev/null
+++ b/ipc/kdbus/Makefile
@@ -0,0 +1,22 @@
+kdbus-y := \
+ bus.o \
+ connection.o \
+ endpoint.o \
+ fs.o \
+ handle.o \
+ item.o \
+ main.o \
+ match.o \
+ message.o \
+ metadata.o \
+ names.o \
+ node.o \
+ notify.o \
+ domain.o \
+ policy.o \
+ pool.o \
+ reply.o \
+ queue.o \
+ util.o
+
+obj-$(CONFIG_KDBUS) += kdbus.o
--
2.3.1

Jiri Slaby
2015-03-24 15:15:52 UTC
Post by Greg Kroah-Hartman
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -261,6 +261,18 @@ config POSIX_MQUEUE_SYSCTL
depends on SYSCTL
default y
+config KDBUS
+ tristate "kdbus interprocess communication"
+ depends on TMPFS
+ help
+ D-Bus is a system for low-latency, low-overhead, easy to use
+ interprocess communication (IPC).
+
+ See Documentation/kdbus.txt
This one is missing from the series.
Post by Greg Kroah-Hartman
+ To compile this driver as a module, choose M here: the
+ module will be called kdbus.
It would be nice to also know who actually needs this... The old good
'if you have an ordinary machine, select m here'.

thanks,
--
js
suse labs
Daniel Mack
2015-03-24 18:52:14 UTC
Drop a left-over from the times when documentation lived in a
simple text file, which is no longer the case. Mention the
auto-generated man-pages and HTML files instead.

Reported-by: Jiri Slaby <***@suse.cz>
Signed-off-by: Daniel Mack <***@zonque.org>
---

Thanks for reporting this, Jiri!


init/Kconfig | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index a7b462e..6bda631 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -268,10 +268,11 @@ config KDBUS
D-Bus is a system for low-latency, low-overhead, easy to use
interprocess communication (IPC).

- See Documentation/kdbus.txt
+ See the man-pages and HTML files in Documentation/kdbus/
+ that are generated by 'make mandocs' and 'make htmldocs'.

- To compile this driver as a module, choose M here: the
- module will be called kdbus.
+ If you have an ordinary machine, select M here. The module
+ will be called kdbus.

config CROSS_MEMORY_ATTACH
bool "Enable process_vm_readv/writev syscalls"
--
2.3.3

David Herrmann
2015-03-25 01:05:44 UTC
Hi
Post by Daniel Mack
Drop a left-over from the times when documentation lived in a
simple text file, which is no longer the case. Mention the
auto-generated man-pages and HTML files instead.
---
Thanks for reporting this, Jiri!
Reviewed-by: David Herrmann <***@gmail.com>

Thanks
David
Post by Daniel Mack
init/Kconfig | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index a7b462e..6bda631 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -268,10 +268,11 @@ config KDBUS
D-Bus is a system for low-latency, low-overhead, easy to use
interprocess communication (IPC).
- See Documentation/kdbus.txt
+ See the man-pages and HTML files in Documentation/kdbus/
+ that are generated by 'make mandocs' and 'make htmldocs'.
- To compile this driver as a module, choose M here: the
- module will be called kdbus.
+ If you have an ordinary machine, select M here. The module
+ will be called kdbus.
config CROSS_MEMORY_ATTACH
bool "Enable process_vm_readv/writev syscalls"
--
2.3.3
Greg KH
2015-03-25 09:51:32 UTC
Post by Daniel Mack
Drop a left-over from the times when documentation lived in a
simple text file, which is no longer the case. Mention the
auto-generated man-pages and HTML files instead.
---
Thanks for reporting this, Jiri!
Applied, thanks.

greg k-h
Greg Kroah-Hartman
2015-03-09 13:12:04 UTC
From: Daniel Mack <***@zonque.org>

Provide a walk-through example that explains how to use the low-level
ioctl API that kdbus offers. This example is meant to be useful for
developers who want to gain an in-depth understanding of how the kdbus
API works by reading a well-documented real-world example.

This program computes prime-numbers based on the sieve of Eratosthenes.
The master sets up a shared memory region and spawns workers which clear
out the non-primes. The master reacts to keyboard input and to
client-requests to control what each worker does. Note that this is in
no way meant as an efficient way to compute primes. It should only serve
as an example of how a master/worker concept can be implemented with
kdbus used for control messages.
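
As a rough illustration of the sieve step described above, and not the
code from kdbus-workers.c, the bitmap handling could look like the
sketch below. The names sieve_new(), sieve_clear_range() and
sieve_is_prime() are made up for this example, and a plain heap buffer
stands in for the shared memory region:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: bitmap with one bit per number in [0, max). */
static uint8_t *sieve_new(size_t max)
{
	size_t bytes = (max + 7) / 8;
	uint8_t *area = malloc(bytes);

	if (area)
		memset(area, 0xff, bytes); /* every number starts as a candidate */
	return area;
}

/* Hypothetical helper: clear all proper multiples of [from, from + count). */
static void sieve_clear_range(uint8_t *area, size_t max,
			      size_t from, size_t count)
{
	size_t n, m;

	assert(area);
	for (n = from; n < from + count && n < max; n++) {
		if (n < 2)
			continue; /* 0 and 1 have no useful multiples */
		for (m = 2 * n; m < max; m += n)
			area[m / 8] &= (uint8_t)~(1U << (m % 8));
	}
}

/* A number is prime if it is >= 2 and its bit survived all ranges. */
static bool sieve_is_prime(const uint8_t *area, size_t max, size_t n)
{
	return n >= 2 && n < max && (area[n / 8] & (1U << (n % 8))) != 0;
}
```

Once every range up to the maximum has been passed to
sieve_clear_range(), the bits still set from 2 upwards mark the primes.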

The main process is called the 'master'. It creates a new, private bus
which will be used between the master and its workers to communicate.
The master then spawns a fixed number of workers. Whenever a worker dies
(detected via SIGCHLD), the master spawns a new worker. When done, the
master waits for all workers to exit, prints a status report and exits
itself.

The master process does *not* keep track of its workers. Instead, this
example implements a PULL model. That is, the master acquires a
well-known name on the bus which each worker uses to request tasks from
the master. If there are no more tasks, the master will return an empty
task-list, which causes a worker to exit immediately.
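
The master's side of this PULL model can be sketched as below, assuming
a task is simply a (start, count) pair and that count == 0 plays the
role of the empty task-list. struct task, struct dispenser and
dispenser_next() are hypothetical names for this illustration only; the
kdbus message transport is left out entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task description: count == 0 means "no more work". */
struct task {
	size_t start;
	size_t count;
};

/* Hypothetical master state: numbers in [0, done) are handed out. */
struct dispenser {
	size_t max;
	size_t done;
	size_t step;
};

/* Answer one worker request; no per-worker state is kept. */
static struct task dispenser_next(struct dispenser *d)
{
	struct task t = { .start = d->done, .count = 0 };

	assert(d->step > 0);
	if (d->done < d->max) {
		t.count = d->max - d->done;
		if (t.count > d->step)
			t.count = d->step;
		d->done += t.count;
	}
	return t;
}
```

Because each request is answered purely from the shared progress
counter, a replacement worker can start pulling tasks right away
without any hand-over from the one that died.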

As tasks can be computationally expensive, we support cancellation.
Whenever the master process is interrupted, it will drop its well-known
name on the bus. This causes kdbus to broadcast a name-change
notification. The workers check for broadcast messages regularly and
will exit if they receive one.
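
The "check regularly" part of the cancellation scheme could look
roughly like the following sketch. It only assumes that a pending
broadcast makes the connection file descriptor readable; in the test
harness any readable fd (for example a pipe) can stand in for the kdbus
connection. cancel_pending() and work_until_cancelled() are
hypothetical names, not functions from the sample:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: true if 'fd' has data queued, without blocking. */
static bool cancel_pending(int fd)
{
	struct pollfd p = { .fd = fd, .events = POLLIN };
	int r;

	do {
		r = poll(&p, 1, 0); /* timeout 0: only peek */
	} while (r < 0 && errno == EINTR);

	return r > 0 && (p.revents & POLLIN);
}

/*
 * Hypothetical work loop: process 'total' units in chunks of 'chunk' and
 * look for a cancellation message between chunks. Returns the number of
 * units completed.
 */
static size_t work_until_cancelled(int fd, size_t total, size_t chunk)
{
	size_t done = 0;

	assert(chunk > 0);
	while (done < total && !cancel_pending(fd)) {
		size_t n = total - done < chunk ? total - done : chunk;

		/* ... real computation on [done, done + n) goes here ... */
		done += n;
	}
	return done;
}
```

The chunk size is the trade-off here: smaller chunks react to a
cancellation sooner, larger chunks spend less time in poll().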

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
samples/Makefile | 3 +-
samples/kdbus/.gitignore | 1 +
samples/kdbus/Makefile | 10 +
samples/kdbus/kdbus-api.h | 114 ++++
samples/kdbus/kdbus-workers.c | 1326 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 1453 insertions(+), 1 deletion(-)
create mode 100644 samples/kdbus/.gitignore
create mode 100644 samples/kdbus/Makefile
create mode 100644 samples/kdbus/kdbus-api.h
create mode 100644 samples/kdbus/kdbus-workers.c

diff --git a/samples/Makefile b/samples/Makefile
index f00257bcc5a7..f0ad51e5b342 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -1,4 +1,5 @@
# Makefile for Linux samples code

obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \
- hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
+ hw_breakpoint/ kfifo/ kdb/ kdbus/ hidraw/ rpmsg/ \
+ seccomp/
diff --git a/samples/kdbus/.gitignore b/samples/kdbus/.gitignore
new file mode 100644
index 000000000000..ee07d9857086
--- /dev/null
+++ b/samples/kdbus/.gitignore
@@ -0,0 +1 @@
+kdbus-workers
diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
new file mode 100644
index 000000000000..d009025369f4
--- /dev/null
+++ b/samples/kdbus/Makefile
@@ -0,0 +1,10 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-y += kdbus-workers
+
+always := $(hostprogs-y)
+
+HOSTCFLAGS_kdbus-workers.o += \
+ -I$(objtree)/usr/include/ \
+ -I$(objtree)/include/uapi/
diff --git a/samples/kdbus/kdbus-api.h b/samples/kdbus/kdbus-api.h
new file mode 100644
index 000000000000..5ed5907c5cb4
--- /dev/null
+++ b/samples/kdbus/kdbus-api.h
@@ -0,0 +1,114 @@
+#ifndef KDBUS_API_H
+#define KDBUS_API_H
+
+#include <sys/ioctl.h>
+#include <linux/kdbus.h>
+
+#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
+#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
+#define KDBUS_ITEM_NEXT(item) \
+ (typeof(item))(((uint8_t *)item) + KDBUS_ALIGN8((item)->size))
+#define KDBUS_FOREACH(iter, first, _size) \
+ for (iter = (first); \
+ ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) && \
+ ((uint8_t *)(iter) >= (uint8_t *)(first)); \
+ iter = (void*)(((uint8_t *)iter) + KDBUS_ALIGN8((iter)->size)))
+
+static inline int kdbus_cmd_bus_make(int control_fd, struct kdbus_cmd *cmd)
+{
+ int ret = ioctl(control_fd, KDBUS_CMD_BUS_MAKE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_endpoint_make(int bus_fd, struct kdbus_cmd *cmd)
+{
+ int ret = ioctl(bus_fd, KDBUS_CMD_ENDPOINT_MAKE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_endpoint_update(int ep_fd, struct kdbus_cmd *cmd)
+{
+ int ret = ioctl(ep_fd, KDBUS_CMD_ENDPOINT_UPDATE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_hello(int bus_fd, struct kdbus_cmd_hello *cmd)
+{
+ int ret = ioctl(bus_fd, KDBUS_CMD_HELLO, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_update(int fd, struct kdbus_cmd *cmd)
+{
+ int ret = ioctl(fd, KDBUS_CMD_UPDATE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_byebye(int conn_fd, struct kdbus_cmd *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_BYEBYE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_free(int conn_fd, struct kdbus_cmd_free *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_FREE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_conn_info(int conn_fd, struct kdbus_cmd_info *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_CONN_INFO, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_bus_creator_info(int conn_fd, struct kdbus_cmd_info *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_BUS_CREATOR_INFO, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_list(int fd, struct kdbus_cmd_list *cmd)
+{
+ int ret = ioctl(fd, KDBUS_CMD_LIST, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_send(int conn_fd, struct kdbus_cmd_send *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_SEND, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_recv(int conn_fd, struct kdbus_cmd_recv *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_RECV, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_name_acquire(int conn_fd, struct kdbus_cmd *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_NAME_ACQUIRE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_name_release(int conn_fd, struct kdbus_cmd *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_NAME_RELEASE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_match_add(int conn_fd, struct kdbus_cmd_match *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_ADD, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+static inline int kdbus_cmd_match_remove(int conn_fd, struct kdbus_cmd_match *cmd)
+{
+ int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_REMOVE, cmd);
+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
+}
+
+#endif /* KDBUS_API_H */
diff --git a/samples/kdbus/kdbus-workers.c b/samples/kdbus/kdbus-workers.c
new file mode 100644
index 000000000000..d1d8f7a7697b
--- /dev/null
+++ b/samples/kdbus/kdbus-workers.c
@@ -0,0 +1,1326 @@
+/*
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+/*
+ * Example: Workers
+ * This program computes prime-numbers based on the sieve of Eratosthenes. The
+ * master sets up a shared memory region and spawns workers which clear out the
+ * non-primes. The master reacts to keyboard input and to client-requests to
+ * control what each worker does. Note that this is in no way meant as an
+ * efficient way to compute primes. It should only serve as an example of how
+ * a master/worker concept can be implemented with kdbus for control messages.
+ *
+ * The main process is called the 'master'. It creates a new, private bus which
+ * will be used between the master and its workers to communicate. The master
+ * then spawns a fixed number of workers. Whenever a worker dies (detected via
+ * SIGCHLD), the master spawns a new worker. When done, the master waits for all
+ * workers to exit, prints a status report and exits itself.
+ *
+ * The master process does *not* keep track of its workers. Instead, this
+ * example implements a PULL model. That is, the master acquires a well-known
+ * name on the bus which each worker uses to request tasks from the master. If
+ * there are no more tasks, the master will return an empty task-list, which
+ * causes a worker to exit immediately.
+ *
+ * As tasks can be computationally expensive, we support cancellation. Whenever
+ * the master process is interrupted, it will drop its well-known name on the
+ * bus. This causes kdbus to broadcast a name-change notification. The workers
+ * check for broadcast messages regularly and will exit if they receive one.
+ *
+ * This example consists of 4 objects:
+ * * master: The master object contains the context of the master process. This
+ * process manages the prime-context, spawns workers and assigns
+ * prime-ranges to each worker to compute.
+ * The master does not do any prime-computations itself.
+ * * child: The child object contains the context of a worker. It inherits the
+ * prime context from its parent (the master) and then creates a new
+ * bus context to request prime-ranges to compute.
+ * * prime: The "prime" object is used to abstract how we compute primes. When
+ * allocated, it prepares a memory region to hold 1 bit for each
+ * natural number up to a fixed maximum ('MAX_PRIMES').
+ * The memory region is backed by a memfd which we share between
+ * processes. Each worker gets assigned a range of natural
+ * numbers whose multiples it clears from the memory region. The
+ * master process is responsible for distributing all natural numbers
+ * up to the fixed maximum to its workers.
+ * * bus: The bus object is an abstraction of the kdbus API. It is pretty
+ * straightforward and only manages the connection-fd plus the
+ * memory-mapped pool in a single object.
+ *
+ * This example is in reversed order, which should make it easier to read
+ * top-down, but requires some forward-declarations. Just ignore those.
+ */
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/memfd.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/poll.h>
+#include <sys/signalfd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <unistd.h>
+#include "kdbus-api.h"
+
+/* FORWARD DECLARATIONS */
+
+#define POOL_SIZE (16 * 1024 * 1024)
+#define MAX_PRIMES (2UL << 24)
+#define WORKER_COUNT (16)
+#define PRIME_STEPS (65536 * 4)
+
+static const char *arg_busname = "example-workers";
+static const char *arg_modname = "kdbus";
+static const char *arg_master = "org.freedesktop.master";
+
+static int err_assert(int r_errno, const char *msg, const char *func, int line,
+ const char *file)
+{
+ r_errno = (r_errno != 0) ? -abs(r_errno) : -EFAULT;
+ if (r_errno < 0) {
+ errno = -r_errno;
+ fprintf(stderr, "ERR: %s: %m (%s:%d in %s)\n",
+ msg, func, line, file);
+ }
+ return r_errno;
+}
+
+#define err_r(_r, _msg) err_assert((_r), (_msg), __func__, __LINE__, __FILE__)
+#define err(_msg) err_r(errno, (_msg))
+
+struct prime;
+struct bus;
+struct master;
+struct child;
+
+struct prime {
+ int fd;
+ uint8_t *area;
+ size_t max;
+ size_t done;
+ size_t status;
+};
+
+static int prime_new(struct prime **out);
+static void prime_free(struct prime *p);
+static bool prime_done(struct prime *p);
+static void prime_consume(struct prime *p, size_t amount);
+static int prime_run(struct prime *p, struct bus *cancel, size_t number);
+static void prime_print(struct prime *p);
+
+struct bus {
+ int fd;
+ uint8_t *pool;
+};
+
+static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
+ uint64_t recv_flags);
+static void bus_close_connection(struct bus *b);
+static void bus_poool_free_slice(struct bus *b, uint64_t offset);
+static int bus_acquire_name(struct bus *b, const char *name);
+static int bus_install_name_loss_match(struct bus *b, const char *name);
+static int bus_poll(struct bus *b);
+static int bus_make(uid_t uid, const char *name);
+
+struct master {
+ size_t n_workers;
+ size_t max_workers;
+
+ int signal_fd;
+ int control_fd;
+
+ struct prime *prime;
+ struct bus *bus;
+};
+
+static int master_new(struct master **out);
+static void master_free(struct master *m);
+static int master_run(struct master *m);
+static int master_poll(struct master *m);
+static int master_handle_stdin(struct master *m);
+static int master_handle_signal(struct master *m);
+static int master_handle_bus(struct master *m);
+static int master_reply(struct master *m, const struct kdbus_msg *msg);
+static int master_waitpid(struct master *m);
+static int master_spawn(struct master *m);
+
+struct child {
+ struct bus *bus;
+ struct prime *prime;
+};
+
+static int child_new(struct child **out, struct prime *p);
+static void child_free(struct child *c);
+static int child_run(struct child *c);
+
+/* END OF FORWARD DECLARATIONS */
+
+/*
+ * This is the main entrypoint of this example. It is pretty straightforward. We
+ * create a master object, run the computation, print a status report and then
+ * exit. Nothing particularly interesting here, so let's look into the master
+ * object...
+ */
+int main(int argc, char **argv)
+{
+ struct master *m = NULL;
+ int r;
+
+ r = master_new(&m);
+ if (r < 0)
+ goto out;
+
+ r = master_run(m);
+ if (r < 0)
+ goto out;
+
+ if (0)
+ prime_print(m->prime);
+
+out:
+ master_free(m);
+ if (r < 0 && r != -EINTR)
+ fprintf(stderr, "failed\n");
+ else
+ fprintf(stderr, "done\n");
+ return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
+/*
+ * ...this will allocate a new master context. It keeps track of the current
+ * number of children/workers that are running, manages a signalfd to track
+ * SIGCHLD, and creates a private kdbus bus. Afterwards, it opens its connection
+ * to the bus and acquires a well-known name (arg_master).
+ */
+static int master_new(struct master **out)
+{
+ struct master *m;
+ sigset_t smask;
+ int r;
+
+ m = calloc(1, sizeof(*m));
+ if (!m)
+ return err("cannot allocate master");
+
+ m->max_workers = WORKER_COUNT;
+ m->signal_fd = -1;
+ m->control_fd = -1;
+
+ /* Block SIGINT and SIGCHLD signals */
+ sigemptyset(&smask);
+ sigaddset(&smask, SIGINT);
+ sigaddset(&smask, SIGCHLD);
+ sigprocmask(SIG_BLOCK, &smask, NULL);
+
+ m->signal_fd = signalfd(-1, &smask, SFD_CLOEXEC);
+ if (m->signal_fd < 0) {
+ r = err("cannot create signalfd");
+ goto error;
+ }
+
+ r = prime_new(&m->prime);
+ if (r < 0)
+ goto error;
+
+ m->control_fd = bus_make(getuid(), arg_busname);
+ if (m->control_fd < 0) {
+ r = m->control_fd;
+ goto error;
+ }
+
+ /*
+ * Open a bus connection for the master, and require each received
+ * message to have a metadata item of type KDBUS_ITEM_PIDS attached.
+ * The current UID is needed to compute the name of the bus node to
+ * connect to.
+ */
+ r = bus_open_connection(&m->bus, getuid(),
+ arg_busname, KDBUS_ATTACH_PIDS);
+ if (r < 0)
+ goto error;
+
+ /*
+ * Acquire a well-known name on the bus, so children can address
+ * messages to the master using KDBUS_DST_ID_NAME as destination-ID
+ * of messages.
+ */
+ r = bus_acquire_name(m->bus, arg_master);
+ if (r < 0)
+ goto error;
+
+ *out = m;
+ return 0;
+
+error:
+ master_free(m);
+ return r;
+}
+
+/* pretty straightforward destructor of a master object */
+static void master_free(struct master *m)
+{
+ if (!m)
+ return;
+
+ bus_close_connection(m->bus);
+ if (m->control_fd >= 0)
+ close(m->control_fd);
+ prime_free(m->prime);
+ if (m->signal_fd >= 0)
+ close(m->signal_fd);
+ free(m);
+}
+
+static int master_run(struct master *m)
+{
+ int res, r = 0;
+
+ while (!prime_done(m->prime)) {
+ while (m->n_workers < m->max_workers) {
+ r = master_spawn(m);
+ if (r < 0)
+ break;
+ }
+
+ r = master_poll(m);
+ if (r < 0)
+ break;
+ }
+
+ if (r < 0) {
+ bus_close_connection(m->bus);
+ m->bus = NULL;
+ }
+
+ while (m->n_workers > 0) {
+ res = master_poll(m);
+ if (res < 0) {
+ if (m->bus) {
+ bus_close_connection(m->bus);
+ m->bus = NULL;
+ }
+ r = res;
+ }
+ }
+
+ return r == -EINTR ? 0 : r;
+}
+
+static int master_poll(struct master *m)
+{
+ struct pollfd fds[3] = {};
+ int r = 0, n = 0;
+
+ /*
+ * Add stdin, the signalfd and the connection owner file descriptor to
+ * the pollfd table, and handle incoming traffic on the latter in
+ * master_handle_bus().
+ */
+ fds[n].fd = STDIN_FILENO;
+ fds[n++].events = POLLIN;
+ fds[n].fd = m->signal_fd;
+ fds[n++].events = POLLIN;
+ if (m->bus) {
+ fds[n].fd = m->bus->fd;
+ fds[n++].events = POLLIN;
+ }
+
+ r = poll(fds, n, -1);
+ if (r < 0)
+ return err("poll() failed");
+
+ if (fds[0].revents & POLLIN)
+ r = master_handle_stdin(m);
+ else if (fds[0].revents)
+ r = err("ERR/HUP on stdin");
+ if (r < 0)
+ return r;
+
+ if (fds[1].revents & POLLIN)
+ r = master_handle_signal(m);
+ else if (fds[1].revents)
+ r = err("ERR/HUP on signalfd");
+ if (r < 0)
+ return r;
+
+ if (fds[2].revents & POLLIN)
+ r = master_handle_bus(m);
+ else if (fds[2].revents)
+ r = err("ERR/HUP on bus");
+
+ return r;
+}
+
+static int master_handle_stdin(struct master *m)
+{
+ char buf[128];
+ ssize_t l;
+ int r = 0;
+
+ l = read(STDIN_FILENO, buf, sizeof(buf));
+ if (l < 0)
+ return err("cannot read stdin");
+ if (l == 0)
+ return err_r(-EINVAL, "EOF on stdin");
+
+ while (l-- > 0) {
+ switch (buf[l]) {
+ case 'q':
+ /* quit */
+ r = -EINTR;
+ break;
+ case '\n':
+ case ' ':
+ /* ignore */
+ break;
+ default:
+ if (isgraph((unsigned char)buf[l]))
+ fprintf(stderr, "invalid input '%c'\n", buf[l]);
+ else
+ fprintf(stderr, "invalid input 0x%x\n", (unsigned char)buf[l]);
+ break;
+ }
+ }
+
+ return r;
+}
+
+static int master_handle_signal(struct master *m)
+{
+ struct signalfd_siginfo val;
+ ssize_t l;
+
+ l = read(m->signal_fd, &val, sizeof(val));
+ if (l < 0)
+ return err("cannot read signalfd");
+ if (l != sizeof(val))
+ return err_r(-EINVAL, "invalid data from signalfd");
+
+ switch (val.ssi_signo) {
+ case SIGCHLD:
+ return master_waitpid(m);
+ case SIGINT:
+ return err_r(-EINTR, "interrupted");
+ default:
+ return err_r(-EINVAL, "caught invalid signal");
+ }
+}
+
+static int master_handle_bus(struct master *m)
+{
+ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
+ const struct kdbus_msg *msg = NULL;
+ const struct kdbus_item *item;
+ const struct kdbus_vec *vec = NULL;
+ int r = 0;
+
+ /*
+ * To receive a message, the KDBUS_CMD_RECV ioctl is used.
+ * It takes an argument of type 'struct kdbus_cmd_recv', which
+ * will contain information on the received message when the call
+ * returns. See kdbus.message(7).
+ */
+ r = kdbus_cmd_recv(m->bus->fd, &recv);
+ /*
+ * EAGAIN is returned when there is no message waiting on this
+ * connection. This is not an error - simply bail out.
+ */
+ if (r == -EAGAIN)
+ return 0;
+ if (r < 0)
+ return err_r(r, "cannot receive message");
+
+ /*
+ * Messages received by a connection are stored inside the connection's
+ * pool, at an offset that has been returned in the 'recv' command
+ * struct above. The value describes the relative offset from the
+ * start address of the pool. A message is described with
+ * 'struct kdbus_msg'. See kdbus.message(7).
+ */
+ msg = (void *)(m->bus->pool + recv.msg.offset);
+
+ /*
+ * A message describes its actual payload in an array of items.
+ * KDBUS_FOREACH() is a simple iterator that walks such an array.
+ * struct kdbus_msg has a field to denote its total size, which is
+ * needed to determine the number of items in the array.
+ */
+ KDBUS_FOREACH(item, msg->items,
+ msg->size - offsetof(struct kdbus_msg, items)) {
+ /*
+ * An item of type PAYLOAD_OFF describes in-line memory
+ * stored in the pool at a described offset. That offset is
+ * relative to the start address of the message header.
+ * This example program only expects one single item of that
+ * type, remembers the struct kdbus_vec member of the item
+ * when it sees it, and bails out if there is more than one
+ * of them.
+ */
+ if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
+ if (vec) {
+ r = err_r(-EEXIST,
+ "message with multiple vecs");
+ break;
+ }
+ vec = &item->vec;
+ if (vec->size != 1) {
+ r = err_r(-EINVAL, "invalid message size");
+ break;
+ }
+
+ /*
+ * MEMFDs are transported as items of type PAYLOAD_MEMFD.
+ * If such an item is attached, a new file descriptor was
+ * installed into the task when KDBUS_CMD_RECV was called, and
+ * its number is stored in item->memfd.fd.
+ * Implementers *must* handle this item type and close the
+ * file descriptor when no longer needed in order to prevent
+ * file descriptor exhaustion. This example program just bails
+ * out with an error in this case, as memfds are not expected
+ * in this context.
+ */
+ } else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
+ r = err_r(-EINVAL, "message with memfd");
+ break;
+ }
+ }
+ if (r < 0)
+ goto exit;
+ if (!vec) {
+ r = err_r(-EINVAL, "empty message");
+ goto exit;
+ }
+
+ switch (*((const uint8_t *)msg + vec->offset)) {
+ case 'r': {
+ r = master_reply(m, msg);
+ break;
+ }
+ default:
+ r = err_r(-EINVAL, "invalid message type");
+ break;
+ }
+
+exit:
+ /*
+ * We are done with the memory slice that was given to us through
+ * recv.msg.offset. Tell the kernel it can use it for other content
+ * in the future. See kdbus.pool(7).
+ */
+ bus_poool_free_slice(m->bus, recv.msg.offset);
+ return r;
+}
+
+static int master_reply(struct master *m, const struct kdbus_msg *msg)
+{
+ struct kdbus_cmd_send cmd;
+ struct kdbus_item *item;
+ struct kdbus_msg *reply;
+ size_t size, status, p[2];
+ int r;
+
+ /*
+ * This function sends a message over kdbus. To do this, it uses the
+ * KDBUS_CMD_SEND ioctl, which takes a command struct argument of type
+ * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
+ * message to send. See kdbus.message(7).
+ */
+ p[0] = m->prime->done;
+ p[1] = prime_done(m->prime) ? 0 : PRIME_STEPS;
+
+ size = sizeof(*reply);
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+
+ /* Prepare the message to send */
+ reply = alloca(size);
+ memset(reply, 0, size);
+ reply->size = size;
+
+ /* Each message has a cookie that can be used to send replies */
+ reply->cookie = 1;
+
+ /* The payload_type is arbitrary, but it must be non-zero */
+ reply->payload_type = 0xdeadbeef;
+
+ /*
+ * We are sending a reply. Let the kernel know the cookie of the
+ * message we are replying to.
+ */
+ reply->cookie_reply = msg->cookie;
+
+ /*
+ * Messages can either be directed to a well-known name (stored as
+ * string) or to a unique name (stored as number). This example does
+ * the latter. If the message would be directed to a well-known name
+ * instead, the message's dst_id field would be set to
+ * KDBUS_DST_ID_NAME, and the name would be attached in an item of type
+ * KDBUS_ITEM_DST_NAME. See below for an example, and also refer to
+ * kdbus.message(7).
+ */
+ reply->dst_id = msg->src_id;
+
+ /* Our message has exactly one item to store its payload */
+ item = reply->items;
+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+ item->vec.address = (uintptr_t)p;
+ item->vec.size = sizeof(p);
+
+ /*
+ * Now prepare the command struct, and reference the message we want
+ * to send.
+ */
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.size = sizeof(cmd);
+ cmd.msg_address = (uintptr_t)reply;
+
+ /*
+ * Finally, employ the command on the connection owner
+ * file descriptor.
+ */
+ r = kdbus_cmd_send(m->bus->fd, &cmd);
+ if (r < 0)
+ return err_r(r, "cannot send reply");
+
+ if (p[1]) {
+ prime_consume(m->prime, p[1]);
+ status = m->prime->done * 10000 / m->prime->max;
+ if (status != m->prime->status) {
+ m->prime->status = status;
+ fprintf(stderr, "status: %7.3lf%%\n",
+ (double)status / 100);
+ }
+ }
+
+ return 0;
+}
+
+static int master_waitpid(struct master *m)
+{
+ pid_t pid;
+ int r = 0;
+
+ while ((pid = waitpid(-1, &r, WNOHANG)) > 0) {
+ if (m->n_workers > 0)
+ --m->n_workers;
+ if (!WIFEXITED(r))
+ r = err_r(-EINVAL, "child died unexpectedly");
+ else if (WEXITSTATUS(r) != 0)
+ r = err_r(-WEXITSTATUS(r), "child failed");
+ }
+
+ return r;
+}
+
+static int master_spawn(struct master *m)
+{
+ struct child *c = NULL;
+ struct prime *p = NULL;
+ pid_t pid;
+ int r;
+
+ /* Spawn off one child and call child_run() inside it */
+
+ pid = fork();
+ if (pid < 0)
+ return err("cannot fork");
+ if (pid > 0) {
+ /* parent */
+ ++m->n_workers;
+ return 0;
+ }
+
+ /* child */
+
+ p = m->prime;
+ m->prime = NULL;
+ master_free(m);
+
+ r = child_new(&c, p);
+ if (r < 0)
+ goto exit;
+
+ r = child_run(c);
+
+exit:
+ child_free(c);
+ exit(abs(r));
+}
+
+static int child_new(struct child **out, struct prime *p)
+{
+ struct child *c;
+ int r;
+
+ c = calloc(1, sizeof(*c));
+ if (!c)
+ return err("cannot allocate child");
+
+ c->prime = p;
+
+ /*
+ * Open a connection to the bus and require each received message to
+ * carry a list of the well-known names the sending connection currently
+ * owns. The current UID is needed in order to determine the name of the
+ * bus node to connect to.
+ */
+ r = bus_open_connection(&c->bus, getuid(),
+ arg_busname, KDBUS_ATTACH_NAMES);
+ if (r < 0)
+ goto error;
+
+ /*
+ * Install a kdbus match so the child's connection gets notified when
+ * the master loses its well-known name.
+ */
+ r = bus_install_name_loss_match(c->bus, arg_master);
+ if (r < 0)
+ goto error;
+
+ *out = c;
+ return 0;
+
+error:
+ child_free(c);
+ return r;
+}
+
+static void child_free(struct child *c)
+{
+ if (!c)
+ return;
+
+ bus_close_connection(c->bus);
+ prime_free(c->prime);
+ free(c);
+}
+
+static int child_run(struct child *c)
+{
+ struct kdbus_cmd_send cmd;
+ struct kdbus_item *item;
+ struct kdbus_vec *vec = NULL;
+ struct kdbus_msg *msg;
+ struct timespec spec;
+ size_t n, steps, size;
+ int r = 0;
+
+ /*
+ * Let's send a message to the master and ask for work. To do this,
+ * we use the KDBUS_CMD_SEND ioctl, which takes an argument of type
+ * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
+ * message to send. See kdbus.message(7).
+ */
+ size = sizeof(*msg);
+ size += KDBUS_ITEM_SIZE(strlen(arg_master) + 1);
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+
+ msg = alloca(size);
+ memset(msg, 0, size);
+ msg->size = size;
+
+ /*
+ * Tell the kernel that we expect a reply to this message. This means
+ * that
+ *
+ * a) The remote peer will gain temporary permission to talk to us
+ * even if it would not be allowed to normally.
+ *
+ * b) A timeout value is required.
+ *
+ * For asynchronous send commands, if no reply is received, we will
+ * get a kernel notification with an item of type
+ * KDBUS_ITEM_REPLY_TIMEOUT attached.
+ *
+ * For synchronous send commands (which this example does), the
+ * ioctl will block until a reply is received or the timeout is
+ * exceeded.
+ */
+ msg->flags = KDBUS_MSG_EXPECT_REPLY;
+
+ /* Set our cookie. Replies must use this cookie to send their reply. */
+ msg->cookie = 1;
+
+ /* The payload_type is arbitrary, but it must be non-zero */
+ msg->payload_type = 0xdeadbeef;
+
+ /*
+ * We are sending our message to the current owner of a well-known
+ * name. This makes an item of type KDBUS_ITEM_DST_NAME mandatory.
+ */
+ msg->dst_id = KDBUS_DST_ID_NAME;
+
+ /*
+ * Set the reply timeout to 5 seconds. Timeouts are always set in
+ * absolute timestamps, based on CLOCK_MONOTONIC. See kdbus.message(7).
+ */
+ clock_gettime(CLOCK_MONOTONIC_COARSE, &spec);
+ msg->timeout_ns += (5 + spec.tv_sec) * 1000ULL * 1000ULL * 1000ULL;
+ msg->timeout_ns += spec.tv_nsec;
+
+ /*
+ * Fill the appended items. First, set the well-known name of the
+ * destination we want to talk to.
+ */
+ item = msg->items;
+ item->type = KDBUS_ITEM_DST_NAME;
+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(arg_master) + 1;
+ strcpy(item->str, arg_master);
+
+ /*
+ * The 2nd item contains a vector to memory we want to send. It
+ * can be content of any type. In our case, we're sending a one-byte
+ * string only. The memory referenced by this item will be copied into
+ * the pool of the receiver connection, and does not need to be
+ * valid after the command is employed.
+ */
+ item = KDBUS_ITEM_NEXT(item);
+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+ item->vec.address = (uintptr_t)"r";
+ item->vec.size = 1;
+
+ /* Set up the command struct and reference the message we prepared */
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.size = sizeof(cmd);
+ cmd.msg_address = (uintptr_t)msg;
+
+ /*
+ * The send command knows a mode in which it will block until a
+ * reply to a message is received. This example uses that mode.
+ * The pool offset to the received reply will be stored in the command
+ * struct after the send command returned. See below.
+ */
+ cmd.flags = KDBUS_SEND_SYNC_REPLY;
+
+ /*
+ * Finally, employ the command on the connection owner
+ * file descriptor.
+ */
+ r = kdbus_cmd_send(c->bus->fd, &cmd);
+ if (r == -ESRCH || r == -EPIPE || r == -ECONNRESET)
+ return 0;
+ if (r < 0)
+ return err_r(r, "cannot send request to master");
+
+ /*
+ * The command was sent with the KDBUS_SEND_SYNC_REPLY flag set,
+ * and returned successfully, which means that cmd.reply.offset now
+ * points to a message inside our connection's pool where the reply
+ * is found. This is equivalent to receiving the reply with
+ * KDBUS_CMD_RECV, but it doesn't require waiting for the reply with
+ * poll() and also saves the ioctl to receive the message.
+ */
+ msg = (void *)(c->bus->pool + cmd.reply.offset);
+
+ /*
+ * A message describes its actual payload in an array of items.
+ * KDBUS_FOREACH() is a simple iterator that walks such an array.
+ * struct kdbus_msg has a field to denote its total size, which is
+ * needed to determine the number of items in the array.
+ */
+ KDBUS_FOREACH(item, msg->items,
+ msg->size - offsetof(struct kdbus_msg, items)) {
+ /*
+ * An item of type PAYLOAD_OFF describes in-line memory
+ * stored in the pool at a described offset. That offset is
+ * relative to the start address of the message header.
+ * This example program only expects one single item of that
+ * type, remembers the struct kdbus_vec member of the item
+ * when it sees it, and bails out if there is more than one
+ * of them.
+ */
+ if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
+ if (vec) {
+ r = err_r(-EEXIST,
+ "message with multiple vecs");
+ break;
+ }
+ vec = &item->vec;
+ if (vec->size != 2 * sizeof(size_t)) {
+ r = err_r(-EINVAL, "invalid message size");
+ break;
+ }
+ /*
+ * MEMFDs are transported as items of type PAYLOAD_MEMFD.
+ * If such an item is attached, a new file descriptor was
+ * installed into the task when KDBUS_CMD_RECV was called, and
+ * its number is stored in item->memfd.fd.
+ * Implementers *must* handle this item type and close the
+ * file descriptor when no longer needed in order to prevent
+ * file descriptor exhaustion. This example program just bails
+ * out with an error in this case, as memfds are not expected
+ * in this context.
+ */
+ } else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
+ r = err_r(-EINVAL, "message with memfd");
+ break;
+ }
+ }
+ if (r < 0)
+ goto exit;
+ if (!vec) {
+ r = err_r(-EINVAL, "empty message");
+ goto exit;
+ }
+
+ n = ((size_t *)((const uint8_t *)msg + vec->offset))[0];
+ steps = ((size_t *)((const uint8_t *)msg + vec->offset))[1];
+
+ while (steps-- > 0) {
+ ++n;
+ r = prime_run(c->prime, c->bus, n);
+ if (r < 0)
+ break;
+ r = bus_poll(c->bus);
+ if (r != 0) {
+ r = r < 0 ? r : -EINTR;
+ break;
+ }
+ }
+
+exit:
+ /*
+ * We are done with the memory slice that was given to us through
+ * cmd.reply.offset. Tell the kernel it can use it for other content
+ * in the future. See kdbus.pool(7).
+ */
+ bus_poool_free_slice(c->bus, cmd.reply.offset);
+ return r;
+}
+
+/*
+ * Prime Computation
+ *
+ */
+
+static int prime_new(struct prime **out)
+{
+ struct prime *p;
+ int r;
+
+ p = calloc(1, sizeof(*p));
+ if (!p)
+ return err("cannot allocate prime memory");
+
+ p->fd = -1;
+ p->area = MAP_FAILED;
+ p->max = MAX_PRIMES;
+
+ /*
+ * Prepare and map a memfd to store the bit-fields for the number
+ * ranges we want to perform the prime detection on.
+ */
+ p->fd = syscall(__NR_memfd_create, "prime-area", MFD_CLOEXEC);
+ if (p->fd < 0) {
+ r = err("cannot create memfd");
+ goto error;
+ }
+
+ r = ftruncate(p->fd, p->max / 8 + 1);
+ if (r < 0) {
+ r = err("cannot ftruncate area");
+ goto error;
+ }
+
+ p->area = mmap(NULL, p->max / 8 + 1, PROT_READ | PROT_WRITE,
+ MAP_SHARED, p->fd, 0);
+ if (p->area == MAP_FAILED) {
+ r = err("cannot mmap memfd");
+ goto error;
+ }
+
+ *out = p;
+ return 0;
+
+error:
+ prime_free(p);
+ return r;
+}
+
+static void prime_free(struct prime *p)
+{
+ if (!p)
+ return;
+
+ if (p->area != MAP_FAILED)
+ munmap(p->area, p->max / 8 + 1);
+ if (p->fd >= 0)
+ close(p->fd);
+ free(p);
+}
+
+static bool prime_done(struct prime *p)
+{
+ return p->done >= p->max;
+}
+
+static void prime_consume(struct prime *p, size_t amount)
+{
+ p->done += amount;
+}
+
+static int prime_run(struct prime *p, struct bus *cancel, size_t number)
+{
+ size_t i, n = 0;
+ int r;
+
+ if (number < 2 || number > 65535)
+ return 0;
+
+ for (i = number * number;
+ i < p->max && i > number;
+ i += number) {
+ p->area[i / 8] |= 1 << (i % 8);
+
+ if (!(++n % (1 << 20))) {
+ r = bus_poll(cancel);
+ if (r != 0)
+ return r < 0 ? r : -EINTR;
+ }
+ }
+
+ return 0;
+}
+
+static void prime_print(struct prime *p)
+{
+ size_t i, l = 0;
+
+ fprintf(stderr, "PRIMES:");
+ for (i = 0; i < p->max; ++i) {
+ if (!(p->area[i / 8] & (1 << (i % 8))))
+ fprintf(stderr, "%c%7zu", !(l++ % 16) ? '\n' : ' ', i);
+ }
+ fprintf(stderr, "\nEND\n");
+}
+
+static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
+ uint64_t recv_flags)
+{
+ struct kdbus_cmd_hello hello;
+ char path[128];
+ struct bus *b;
+ int r;
+
+ /*
+ * The 'bus' object is our representation of a kdbus connection which
+ * stores two details: the connection owner file descriptor, and the
+ * mmap()ed memory of its associated pool. See kdbus.connection(7) and
+ * kdbus.pool(7).
+ */
+ b = calloc(1, sizeof(*b));
+ if (!b)
+ return err("cannot allocate bus memory");
+
+ b->fd = -1;
+ b->pool = MAP_FAILED;
+
+ /* Compute the name of the bus node to connect to. */
+ snprintf(path, sizeof(path), "/sys/fs/%s/%lu-%s/bus",
+ arg_modname, (unsigned long)uid, name);
+ b->fd = open(path, O_RDWR | O_CLOEXEC);
+ if (b->fd < 0) {
+ r = err("cannot open bus");
+ goto error;
+ }
+
+ /*
+ * To make a connection to the bus, the KDBUS_CMD_HELLO ioctl is used.
+ * It takes an argument of type 'struct kdbus_cmd_hello'.
+ */
+ memset(&hello, 0, sizeof(hello));
+ hello.size = sizeof(hello);
+
+ /*
+ * Specify a mask of metadata attach flags, describing metadata items
+ * that this new connection allows to be sent.
+ */
+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+
+ /*
+ * Specify a mask of metadata attach flags, describing metadata items
+ * that this new connection wants to receive along with each message.
+ */
+ hello.attach_flags_recv = recv_flags;
+
+ /*
+ * A connection may choose the size of its pool, but the number has to
+ * comply with two rules: a) it must be greater than 0, and b) it must
+ * be a multiple of PAGE_SIZE. See kdbus.pool(7).
+ */
+ hello.pool_size = POOL_SIZE;
+
+ /*
+ * Now employ the command on the file descriptor opened above.
+ * This command will turn the file descriptor into a connection-owner
+ * file descriptor that controls the life-time of the connection; once
+ * it's closed, the connection is shut down.
+ */
+ r = kdbus_cmd_hello(b->fd, &hello);
+ if (r < 0) {
+ err_r(r, "HELLO failed");
+ goto error;
+ }
+
+ bus_poool_free_slice(b, hello.offset);
+
+ /*
+ * Map the pool of the connection. Its size has been set in the
+ * command struct above. See kdbus.pool(7).
+ */
+ b->pool = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, b->fd, 0);
+ if (b->pool == MAP_FAILED) {
+ r = err("cannot mmap pool");
+ goto error;
+ }
+
+ *out = b;
+ return 0;
+
+error:
+ bus_close_connection(b);
+ return r;
+}
+
+static void bus_close_connection(struct bus *b)
+{
+ if (!b)
+ return;
+
+ /*
+ * A bus connection is closed by simply calling close() on the
+ * connection owner file descriptor. The unique name and all owned
+ * well-known names of the connection will disappear.
+ * See kdbus.connection(7).
+ */
+ if (b->pool != MAP_FAILED)
+ munmap(b->pool, POOL_SIZE);
+ if (b->fd >= 0)
+ close(b->fd);
+ free(b);
+}
+
+static void bus_poool_free_slice(struct bus *b, uint64_t offset)
+{
+ struct kdbus_cmd_free cmd = {
+ .size = sizeof(cmd),
+ .offset = offset,
+ };
+ int r;
+
+ /*
+ * Once we're done with a piece of pool memory that was returned
+ * by a command, we have to call the KDBUS_CMD_FREE ioctl on it so it
+ * can be reused. The command takes an argument of type
+ * 'struct kdbus_cmd_free', in which the pool offset of the slice to
+ * free is stored. The ioctl is employed on the connection owner
+ * file descriptor. See kdbus.pool(7).
+ */
+ r = kdbus_cmd_free(b->fd, &cmd);
+ if (r < 0)
+ err_r(r, "cannot free pool slice");
+}
+
+static int bus_acquire_name(struct bus *b, const char *name)
+{
+ struct kdbus_item *item;
+ struct kdbus_cmd *cmd;
+ size_t size;
+ int r;
+
+ /*
+ * This function acquires a well-known name on the bus through the
+ * KDBUS_CMD_NAME_ACQUIRE ioctl. This ioctl takes an argument of type
+ * 'struct kdbus_cmd', which is assembled below. See kdbus.name(7).
+ */
+ size = sizeof(*cmd);
+ size += KDBUS_ITEM_SIZE(strlen(name) + 1);
+
+ cmd = alloca(size);
+ memset(cmd, 0, size);
+ cmd->size = size;
+
+ /*
+ * The command requires an item of type KDBUS_ITEM_NAME, and its
+ * content must be a valid bus name.
+ */
+ item = cmd->items;
+ item->type = KDBUS_ITEM_NAME;
+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
+ strcpy(item->str, name);
+
+ /*
+ * Employ the command on the connection owner file descriptor.
+ */
+ r = kdbus_cmd_name_acquire(b->fd, cmd);
+ if (r < 0)
+ return err_r(r, "cannot acquire name");
+
+ return 0;
+}
+
+static int bus_install_name_loss_match(struct bus *b, const char *name)
+{
+ struct kdbus_cmd_match *match;
+ struct kdbus_item *item;
+ size_t size;
+ int r;
+
+ /*
+ * In order to install a match for signal messages, we have to
+ * assemble a 'struct kdbus_cmd_match' and use it along with the
+ * KDBUS_CMD_MATCH_ADD ioctl. See kdbus.match(7).
+ */
+ size = sizeof(*match);
+ size += KDBUS_ITEM_SIZE(sizeof(item->name_change) + strlen(name) + 1);
+
+ match = alloca(size);
+ memset(match, 0, size);
+ match->size = size;
+
+ /*
+ * A match is comprised of many 'rules', each of which describes a
+ * mandatory detail of the message. All rules of a match must be
+ * satisfied in order to make a message pass.
+ */
+ item = match->items;
+
+ /*
+ * In this case, we're interested in notifications that inform us
+ * about a well-known name being removed from the bus.
+ */
+ item->type = KDBUS_ITEM_NAME_REMOVE;
+ item->size = KDBUS_ITEM_HEADER_SIZE +
+ sizeof(item->name_change) + strlen(name) + 1;
+
+ /*
+ * We could limit the match further and require a specific unique-ID
+ * to be the new or the old owner of the name. In this case, however,
+ * we don't, and allow 'any' id.
+ */
+ item->name_change.old_id.id = KDBUS_MATCH_ID_ANY;
+ item->name_change.new_id.id = KDBUS_MATCH_ID_ANY;
+
+ /* Copy in the well-known name we're interested in */
+ strcpy(item->name_change.name, name);
+
+ /*
+ * Add the match through the KDBUS_CMD_MATCH_ADD ioctl, employed on
+ * the connection owner fd.
+ */
+ r = kdbus_cmd_match_add(b->fd, match);
+ if (r < 0)
+ return err_r(r, "cannot add match");
+
+ return 0;
+}
+
+static int bus_poll(struct bus *b)
+{
+ struct pollfd fds[1] = {};
+ int r;
+
+ /*
+ * A connection endpoint supports poll() and will wake-up the
+ * task with POLLIN set once a message has arrived.
+ */
+ fds[0].fd = b->fd;
+ fds[0].events = POLLIN;
+ r = poll(fds, sizeof(fds) / sizeof(*fds), 0);
+ if (r < 0)
+ return err("cannot poll bus");
+ return !!(fds[0].revents & POLLIN);
+}
+
+static int bus_make(uid_t uid, const char *name)
+{
+ struct kdbus_item *item;
+ struct kdbus_cmd *make;
+ char path[128], busname[128];
+ size_t size;
+ int r, fd;
+
+ /*
+ * Compute the full path to the 'control' node. 'arg_modname' may be
+ * set to a different value than 'kdbus' for development purposes.
+ * The 'control' node is the primary entry point to kdbus that must be
+ * used in order to create a bus. See kdbus(7) and kdbus.bus(7).
+ */
+ snprintf(path, sizeof(path), "/sys/fs/%s/control", arg_modname);
+
+ /*
+ * Compute the bus name. A valid bus name must always be prefixed with
+ * the EUID of the currently running process in order to avoid name
+ * conflicts. See kdbus.bus(7).
+ */
+ snprintf(busname, sizeof(busname), "%lu-%s", (unsigned long)uid, name);
+
+ fd = open(path, O_RDWR | O_CLOEXEC);
+ if (fd < 0)
+ return err("cannot open control file");
+
+ /*
+ * The KDBUS_CMD_BUS_MAKE ioctl takes an argument of type
+ * 'struct kdbus_cmd', and expects at least two items attached to
+ * it: one to describe the bloom parameters to be propagated to
+ * connections of the bus, and the name of the bus that was computed
+ * above. Assemble this struct now, and fill it with values.
+ */
+ size = sizeof(*make);
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_parameter));
+ size += KDBUS_ITEM_SIZE(strlen(busname) + 1);
+
+ make = alloca(size);
+ memset(make, 0, size);
+ make->size = size;
+
+ /*
+ * Each item has a 'type' and 'size' field, and must be stored at an
+ * 8-byte aligned address. The KDBUS_ITEM_NEXT macro is used to advance
+ * the pointer. See kdbus.item(7) for more details.
+ */
+ item = make->items;
+ item->type = KDBUS_ITEM_BLOOM_PARAMETER;
+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(item->bloom_parameter);
+ item->bloom_parameter.size = 8;
+ item->bloom_parameter.n_hash = 1;
+
+ /* The name of the new bus is stored in the next item. */
+ item = KDBUS_ITEM_NEXT(item);
+ item->type = KDBUS_ITEM_MAKE_NAME;
+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(busname) + 1;
+ strcpy(item->str, busname);
+
+ /*
+ * Now create the bus via the KDBUS_CMD_BUS_MAKE ioctl and return the
+ * fd that was used back to the caller of this function. This fd is now
+ * called a 'bus owner file descriptor', and it controls the life-time
+ * of the newly created bus; once the file descriptor is closed, the
+ * bus goes away, and all connections are shut down. See kdbus.bus(7).
+ */
+ r = kdbus_cmd_bus_make(fd, make);
+ if (r < 0) {
+ err_r(r, "cannot make bus");
+ close(fd);
+ return r;
+ }
+
+ return fd;
+}
--
2.3.1
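
The comments in the sample above describe items as records with a 'size' and 'type' field, stored at 8-byte aligned addresses and walked with KDBUS_ITEM_NEXT or KDBUS_FOREACH. That layout can be sketched in isolation. This is only an illustration under assumptions: struct demo_item and the DEMO_* macros are hypothetical stand-ins invented here, not the definitions from linux/kdbus.h, and the exact layout is inferred from the comments in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an item: a fixed header followed by payload. */
struct demo_item {
	uint64_t size;	/* header plus payload, excluding trailing padding */
	uint64_t type;
	char data[];	/* payload bytes */
};

#define DEMO_ALIGN8(n)		(((n) + 7) & ~(uint64_t)7)
#define DEMO_HEADER_SIZE	offsetof(struct demo_item, data)
/* Space one item occupies in the array, including alignment padding. */
#define DEMO_ITEM_SIZE(len)	DEMO_ALIGN8(DEMO_HEADER_SIZE + (len))
#define DEMO_ITEM_NEXT(it) \
	((struct demo_item *)((uint8_t *)(it) + DEMO_ALIGN8((it)->size)))

/* Fill one item at 'it' and return the position of the following item. */
static struct demo_item *demo_append(struct demo_item *it, uint64_t type,
				     const void *payload, size_t len)
{
	assert(((uintptr_t)it & 7) == 0);	/* items must be 8-byte aligned */
	it->type = type;
	it->size = DEMO_HEADER_SIZE + len;
	memcpy(it->data, payload, len);
	return DEMO_ITEM_NEXT(it);
}

/* Count the items found in 'total' bytes starting at 'first'. */
static size_t demo_count(const void *first, size_t total)
{
	const uint8_t *base = first;
	size_t off = 0, n = 0;

	while (off + DEMO_HEADER_SIZE <= total) {
		const struct demo_item *it =
			(const struct demo_item *)(base + off);

		/* Stop on a corrupt size instead of walking out of bounds. */
		if (it->size < DEMO_HEADER_SIZE || it->size > total - off)
			break;
		++n;
		off += DEMO_ALIGN8(it->size);
	}
	return n;
}
```

A caller would reserve DEMO_ITEM_SIZE(len) bytes per item, as the sample does with KDBUS_ITEM_SIZE() before calling alloca(), while the 'size' field of each item stores only the unpadded length.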

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Sasha Levin
2015-03-12 14:53:29 UTC
Permalink
Post by Greg Kroah-Hartman
diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
new file mode 100644
index 000000000000..d009025369f4
--- /dev/null
+++ b/samples/kdbus/Makefile
@@ -0,0 +1,10 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-y += kdbus-workers
+
+always := $(hostprogs-y)
+
+HOSTCFLAGS_kdbus-workers.o += \
+ -I$(objtree)/usr/include/ \
+ -I$(objtree)/include/uapi/
-lrt

For older glibcs, otherwise clock_gettime() isn't found on linking.


Thanks,
Sasha
David Herrmann
2015-03-12 16:28:12 UTC
Permalink
On older systems -lrt is needed for clock_gettime(). Add it to
HOSTLOADLIBES of kdbus-workers so it builds fine on those systems.

Reported-by: Sasha Levin <***@oracle.com>
Signed-off-by: David Herrmann <***@gmail.com>
---
Hi

This should fix the build-issues of the kdbus-examples on older systems where
clock_gettime() is not available in -lc, but requires -lrt.
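
For context, the sample uses clock_gettime() to build an absolute reply deadline in nanoseconds. A minimal sketch of that computation follows; the helper names deadline_ns() and monotonic_deadline_ns() are made up for illustration and do not appear in the sample. On glibc versions before 2.17 a program like this has to be linked with -lrt, which is what the patch below arranges for kdbus-workers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert a timestamp plus a relative timeout into absolute nanoseconds. */
static uint64_t deadline_ns(const struct timespec *now, uint64_t timeout_s)
{
	assert(now->tv_nsec >= 0 && now->tv_nsec < 1000000000L);
	return ((uint64_t)now->tv_sec + timeout_s) * NSEC_PER_SEC +
	       (uint64_t)now->tv_nsec;
}

/*
 * Absolute CLOCK_MONOTONIC deadline 'timeout_s' seconds from now.
 * Returns 0 if the clock cannot be read.
 */
static uint64_t monotonic_deadline_ns(uint64_t timeout_s)
{
	struct timespec now;

	if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
		return 0;
	return deadline_ns(&now, timeout_s);
}
```

Keeping the arithmetic in a separate function from the clock read makes the conversion easy to check with a fixed timestamp.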

This is based on gregkh/char-misc/kdbus, and available on:
http://cgit.freedesktop.org/~dvdhrm/linux/log/?h=kdbus

Thanks
David

samples/kdbus/Makefile | 1 +
1 file changed, 1 insertion(+)

diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
index d009025..eee9b9a 100644
--- a/samples/kdbus/Makefile
+++ b/samples/kdbus/Makefile
@@ -8,3 +8,4 @@ always := $(hostprogs-y)
HOSTCFLAGS_kdbus-workers.o += \
-I$(objtree)/usr/include/ \
-I$(objtree)/include/uapi/
+HOSTLOADLIBES_kdbus-workers := -lrt
--
2.3.2

Greg Kroah-Hartman
2015-03-12 21:40:51 UTC
Permalink
Post by David Herrmann
On older systems -lrt is needed for clock_gettime(). Add it to
HOSTLOADLIBES of kdbus-workers so it builds fine on those systems.
---
Hi
This should fix the build-issues of the kdbus-examples on older systems where
clock_gettime() is not available in -lc, but requires -lrt.
http://cgit.freedesktop.org/~dvdhrm/linux/log/?h=kdbus
Now applied, thanks.

greg k-h
David Herrmann
2015-03-12 16:34:49 UTC
Permalink
Hi
Post by Sasha Levin
Post by Greg Kroah-Hartman
diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
new file mode 100644
index 000000000000..d009025369f4
--- /dev/null
+++ b/samples/kdbus/Makefile
@@ -0,0 +1,10 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-y += kdbus-workers
+
+always := $(hostprogs-y)
+
+HOSTCFLAGS_kdbus-workers.o += \
+ -I$(objtree)/usr/include/ \
+ -I$(objtree)/include/uapi/
-lrt
For older glibcs, otherwise clock_gettime() isn't found on linking.
Right, thanks! Fixed in "[PATCH] samples/kdbus: add -lrt".

Thanks
David
Jiri Slaby
2015-03-24 16:46:11 UTC
Permalink
Post by Greg Kroah-Hartman
--- /dev/null
+++ b/samples/kdbus/Makefile
@@ -0,0 +1,10 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-y += kdbus-workers
+
+always := $(hostprogs-y)
Errr, no. Not only it causes build failures (even with KDBUS=n), it
definitely should not be built for everyone.

And why is it a host prog? It's a sample prog for the kernel I am
building, i.e. for the destination arch, like all the other samples.

thanks,
--
js
suse labs
David Herrmann
2015-03-24 17:16:21 UTC
Permalink
Hi
Post by Jiri Slaby
Post by Greg Kroah-Hartman
--- /dev/null
+++ b/samples/kdbus/Makefile
@@ -0,0 +1,10 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-y += kdbus-workers
+
+always := $(hostprogs-y)
Errr, no. Not only it causes build failures (even with KDBUS=n), it
definitely should not be built for everyone.
It's only built if CONFIG_SAMPLES is set, right?

What build-failures does it cause? linux/kdbus.h is not optional based
on CONFIG_KDBUS, so the samples should build just fine. Can you tell
me what kind of errors you get? The kbuild-robots didn't report
anything so far.
Post by Jiri Slaby
And why is it a host prog? It's a sample prog for the kernel I am
building, i.e. for the destination arch, like all the other samples.
It's modeled after the other user-space examples in ./samples/, which
all use hostprogs (see samples/{bpf,hidraw,seccomp,uhid}/Makefile). I
have no idea how to build programs that run on the target
architecture. Documentation/kbuild/makefiles.txt doesn't list it,
which is, I guess, the reason why everyone used hostprogs so far. And
given that autotools calls the target architecture "--host", I
actually thought this is what hostprogs does.. apparently that's not
the case, sorry.

Thanks
David
Jiri Slaby
2015-03-24 17:37:40 UTC
Permalink
Ccing kbuild fellows (see the very bottom).
Post by David Herrmann
Hi
Post by Jiri Slaby
Post by Greg Kroah-Hartman
--- /dev/null
+++ b/samples/kdbus/Makefile
@@ -0,0 +1,10 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-y += kdbus-workers
+
+always := $(hostprogs-y)
Errr, no. Not only it causes build failures (even with KDBUS=n), it
definitely should not be built for everyone.
It's only built if CONFIG_SAMPLES is set, right?
Yes, but I build other samples selected with separate CONFIG_ options
like CONFIG_SAMPLE_LIVEPATCH. So this guy should have its own CONFIG_
option too, because I (and likely others) don't want to build it. All of
the samples should be protected at least by their respective kernel
CONFIG_ option IMO.
Post by David Herrmann
What build-failures does it cause? linux/kdbus.h is not optional based
on CONFIG_KDBUS, so the samples should build just fine. Can you tell
me what kind of errors you get? The kbuild-robots didn't report
anything so far.
The output is this:
In file included from samples/kdbus/kdbus-workers.c:79:0:
/home/latest/linux/samples/kdbus/kdbus-api.h:5:25: fatal error:
linux/kdbus.h: No such file or directory
#include <linux/kdbus.h>
^
compilation terminated.

I now know that I have to install_headers *if* I want to build the
sample. But I don't want to build it in the first place. (So the config
option above should be all we need.)
Post by David Herrmann
Post by Jiri Slaby
And why is it a host prog? It's a sample prog for the kernel I am
building, i.e. for the destination arch, like all the other samples.
It's modeled after the other user-space examples in ./samples/, which
all use hostprogs (see samples/{bpf,hidraw,seccomp,uhid}/Makefile). I
have no idea how to build programs that run on the target
architecture. Documentation/kbuild/makefiles.txt doesn't list it,
which is, I guess, the reason why everyone used hostprogs so far. And
given that autotools calls the target architecture "--host", I
actually thought this is what hostprogs does.. apparently that's not
the case, sorry.
Oh, it's cut&paste, I see. This does not look correct though. The hack
inclusive. Host progs are intended to be run on the host where the
kernel is built. During the compilation or such (like x/menuconfig).
Quite misleading naming if you are used to the autotools one.

thanks,
--
js
suse labs
Michal Marek
2015-03-24 18:23:24 UTC
Permalink
Post by Jiri Slaby
Post by David Herrmann
Post by Jiri Slaby
Errr, no. Not only it causes build failures (even with KDBUS=n), it
definitely should not be built for everyone.
It's only built if CONFIG_SAMPLES is set, right?
Yes, but I build other samples selected with separate CONFIG_ options
like CONFIG_SAMPLE_LIVEPATCH. So this guy should have its own CONFIG_
option too, because I (and likely others) don't want to build it. All of
Definitely.
Post by Jiri Slaby
Post by David Herrmann
It's modeled after the other user-space examples in ./samples/, which
all use hostprogs (see samples/{bpf,hidraw,seccomp,uhid}/Makefile). I
have no idea how to build programs that run on the target
architecture. Documentation/kbuild/makefiles.txt doesn't list it,
which is, I guess, the reason why everyone used hostprogs so far. And
given that autotools calls the target architecture "--host", I
actually thought this is what hostprogs does.. apparently that's not
the case, sorry.
Oh, it's cut&paste, I see. This does not look correct though. The hack
inclusive. Host progs are intended to be run on the host where the
kernel is built. During the compilation or such (like x/menuconfig).
Quite misleading naming if you are used to the autotools one.
When cross compiling, we are a bit between a rock and a hard place with
the sample userspace programs:
- The target toolchain might not have libc support
- The host toolchain might be lacking recent kernel headers (therefore
the need to do make headers_install)
- It's not clear whether the samples are meant to be run on the build
host or target.

There has been some work by Sam Ravnborg to introduce uapiprogs-y for
sample userspace programs, but for now, please use hostprogs-y,
-Iusr/include and make each sample opt-in.

Michal

Daniel Mack
2015-03-24 18:51:58 UTC
Post by Michal Marek
Post by Jiri Slaby
Oh, it's cut&paste, I see. This does not look correct though. The hack
inclusive. Host progs are intended to be run on the host where the
kernel is built. During the compilation or such (like x/menuconfig).
Quite misleading naming if you are used to the autotools one.
when cross compiling, we are a bit between a rock and a hard place with
- The target toolchain might not have libc support
- The host toolchain might be lacking recent kernel headers (therefore
the need to do make headers_install)
- It's not clear whether the samples are meant to be run on the build
host or target.
Exactly. I just checked in a cross-compiled source tree, and none of the
compiled standalone executables from samples/ or Documentation/ are
actually built for the target platform. Only the samples which come as
kernel modules are.
Post by Michal Marek
There has been some work by Sam Ravnborg to introduce uapiprogs-y for
sample userspace programs, but for now, please use hostprogs-y,
-Iusr/include and make each sample opt-in.
Alright then. Does the attached patch fix your problem, Jiri?


Thanks,
Daniel
Daniel Mack
2015-03-31 13:11:53 UTC
Give the kdbus sample its own config switch and only build it if it's
explicitly switched on.

Signed-off-by: Daniel Mack <***@zonque.org>
Reviewed-by: David Herrmann <***@gmail.com>
Reported-by: Jiri Slaby <***@suse.cz>
---
samples/Kconfig | 7 +++++++
samples/kdbus/Makefile | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/samples/Kconfig b/samples/Kconfig
index 224ebb4..a4c6b2f 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -55,6 +55,13 @@ config SAMPLE_KDB
Build an example of how to dynamically add the hello
command to the kdb shell.

+config SAMPLE_KDBUS
+ bool "Build kdbus API example"
+ depends on KDBUS
+ help
+ Build an example of how the kdbus API can be used from
+ userspace.
+
config SAMPLE_RPMSG_CLIENT
tristate "Build rpmsg client sample -- loadable modules only"
depends on RPMSG && m
diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
index d009025..9e40c68 100644
--- a/samples/kdbus/Makefile
+++ b/samples/kdbus/Makefile
@@ -1,7 +1,7 @@
# kbuild trick to avoid linker error. Can be omitted if a module is built.
obj- := dummy.o

-hostprogs-y += kdbus-workers
+hostprogs-$(CONFIG_SAMPLE_KDBUS) += kdbus-workers

always := $(hostprogs-y)
--
2.3.4

Greg KH
2015-04-01 12:47:32 UTC
Post by Daniel Mack
Give the kdbus sample its own config switch and only build it if it's
explicitly switched on.
---
samples/Kconfig | 7 +++++++
samples/kdbus/Makefile | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
Now applied, thanks.

greg k-h
Andrew Morton
2015-04-01 21:41:29 UTC
Post by Greg Kroah-Hartman
Post by Daniel Mack
Give the kdbus sample its own config switch and only build it if it's
explicitly switched on.
---
samples/Kconfig | 7 +++++++
samples/kdbus/Makefile | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
Now applied, thanks.
Is this going to fix i386 allmodconfig, currently unhappy on my Fedora
Core 6 (lol) machine?

samples/kdbus/kdbus-workers.c:73:26: error: sys/signalfd.h: No such file or directory
samples/kdbus/kdbus-workers.c: In function 'master_new':
samples/kdbus/kdbus-workers.c:231: warning: implicit declaration of function 'signalfd'
samples/kdbus/kdbus-workers.c:231: error: 'SFD_CLOEXEC' undeclared (first use in this function)
samples/kdbus/kdbus-workers.c:231: error: (Each undeclared identifier is reported only once
samples/kdbus/kdbus-workers.c:231: error: for each function it appears in.)
samples/kdbus/kdbus-workers.c: In function 'master_handle_signal':
samples/kdbus/kdbus-workers.c:406: error: storage size of 'val' isn't known
samples/kdbus/kdbus-workers.c:406: warning: unused variable 'val'
samples/kdbus/kdbus-workers.c: In function 'child_run':
samples/kdbus/kdbus-workers.c:773: error: 'CLOCK_MONOTONIC_COARSE' undeclared (first use in this function)
samples/kdbus/kdbus-workers.c: In function 'bus_open_connection':
samples/kdbus/kdbus-workers.c:1038: error: 'O_CLOEXEC' undeclared (first use in this function)
samples/kdbus/kdbus-workers.c: In function 'bus_make':
samples/kdbus/kdbus-workers.c:1275: error: 'O_CLOEXEC' undeclared (first use in this function)
Daniel Mack
2015-04-03 11:02:52 UTC
Hi Andrew,
Post by Andrew Morton
Post by Greg Kroah-Hartman
Post by Daniel Mack
Give the kdbus sample its own config switch and only build it if it's
explicitly switched on.
---
samples/Kconfig | 7 +++++++
samples/kdbus/Makefile | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
Now applied, thanks.
Is this going to fix i386 allmodconfig, currently unhappy on my Fedora
Core 6 (lol) machine?
As allmodconfig will select CONFIG_SAMPLE_KDBUS just as it selected
CONFIG_SAMPLES, that won't help, no.
Post by Andrew Morton
samples/kdbus/kdbus-workers.c:73:26: error: sys/signalfd.h: No such file or directory
samples/kdbus/kdbus-workers.c:231: warning: implicit declaration of function 'signalfd'
samples/kdbus/kdbus-workers.c:231: error: 'SFD_CLOEXEC' undeclared (first use in this function)
samples/kdbus/kdbus-workers.c:231: error: (Each undeclared identifier is reported only once
samples/kdbus/kdbus-workers.c:231: error: for each function it appears in.)
samples/kdbus/kdbus-workers.c:406: error: storage size of 'val' isn't known
samples/kdbus/kdbus-workers.c:406: warning: unused variable 'val'
samples/kdbus/kdbus-workers.c:773: error: 'CLOCK_MONOTONIC_COARSE' undeclared (first use in this function)
samples/kdbus/kdbus-workers.c:1038: error: 'O_CLOEXEC' undeclared (first use in this function)
samples/kdbus/kdbus-workers.c:1275: error: 'O_CLOEXEC' undeclared (first use in this function)
Hmm, so your libc headers lack support for signalfds, which were
introduced in kernel v2.6.22 (2007), one year after the release of your
distribution.

We can't probe for the existence of files in the local toolchain before
compilation with kbuild, so we have to stub out the code for glibc
versions that are known to lack features. This isn't particularly nice,
but it should work.
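
For illustration only, here is a minimal sketch of what such a
compile-time stub could look like. It is not the attached patch. The
names SAMPLE_HAVE_SIGNALFD, sample_have_signalfd() and
sample_signal_fd_open() are made up for this example, and the glibc 2.9
cut-off is an assumption that may need adjusting.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Assumption: glibc >= 2.9 ships <sys/signalfd.h> with SFD_CLOEXEC.
 * Anything older, or any libc that does not define __GLIBC_PREREQ,
 * takes the stub path below.
 */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 9)
#  define SAMPLE_HAVE_SIGNALFD 1
# endif
#endif

#ifdef SAMPLE_HAVE_SIGNALFD
#include <sys/signalfd.h>
#endif

/* Report at run time which path was compiled in. */
int sample_have_signalfd(void)
{
#ifdef SAMPLE_HAVE_SIGNALFD
	return 1;
#else
	return 0;
#endif
}

/*
 * Open a signalfd for SIGTERM, or fail with ENOSYS when the feature
 * was stubbed out at compile time.
 */
int sample_signal_fd_open(void)
{
#ifdef SAMPLE_HAVE_SIGNALFD
	sigset_t mask;
	int r;

	r = sigemptyset(&mask);
	assert(r == 0);
	r = sigaddset(&mask, SIGTERM);
	assert(r == 0);
	(void)r;

	return signalfd(-1, &mask, SFD_CLOEXEC);
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would check for a negative return and, on ENOSYS, print a
short note that the sample needs a newer libc instead of failing the
build.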

Does the attached patch work for you?



Thanks,
Daniel
Greg Kroah-Hartman
2015-03-09 13:12:36 UTC
From: Daniel Mack <***@zonque.org>

This patch adds the name registry implementation.

Each bus instantiates a name registry to resolve well-known names
into unique connection IDs for message delivery. The registry will
be queried when a message is sent with kdbus_msg.dst_id set to
KDBUS_DST_ID_NAME, or when a registry dump is requested.

It's important to have this registry implemented in the kernel to
implement lookups and take-overs in a race-free way.
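
To make the take-over rules a bit more concrete, the following
user-space sketch mirrors the order of checks that kdbus_name_acquire()
in this patch performs for an ordinary (non-activator) connection. It is
a simplified model, not the kernel code: the sk_* names are
hypothetical, the flag values are assumed for illustration, and locking,
notifications and the activator-as-requester branches are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names follow the patch; the bit values are assumptions. */
#define SK_NAME_REPLACE_EXISTING  (UINT64_C(1) << 0)
#define SK_NAME_ALLOW_REPLACEMENT (UINT64_C(1) << 1)
#define SK_NAME_QUEUE             (UINT64_C(1) << 2)
#define SK_NAME_ACTIVATOR         (UINT64_C(1) << 4)

enum sk_outcome {
	SK_CLAIMED_NEW,         /* name was free, requester owns it now */
	SK_ALREADY_OWNER,       /* the patch returns -EALREADY here */
	SK_TOOK_FROM_ACTIVATOR, /* name was only held by an activator */
	SK_REPLACED_OWNER,      /* previous owner allowed replacement */
	SK_QUEUED,              /* requester waits in the name's queue */
	SK_BUSY,                /* the patch returns -EEXIST here */
};

struct sk_entry {
	int owner;      /* id of the current owner, never 0 */
	uint64_t flags; /* flags the current owner acquired with */
};

/* Decide what happens when 'requester' asks for a name with 'flags'. */
enum sk_outcome sk_acquire_decide(const struct sk_entry *e, int requester,
				  uint64_t flags)
{
	assert(requester != 0); /* 0 means "no connection" in this sketch */

	if (!e)
		return SK_CLAIMED_NEW;
	if (e->owner == requester)
		return SK_ALREADY_OWNER;
	if (e->flags & SK_NAME_ACTIVATOR)
		return SK_TOOK_FROM_ACTIVATOR;
	if ((flags & SK_NAME_REPLACE_EXISTING) &&
	    (e->flags & SK_NAME_ALLOW_REPLACEMENT))
		return SK_REPLACED_OWNER;
	if (flags & SK_NAME_QUEUE)
		return SK_QUEUED;
	return SK_BUSY;
}
```

Because the whole decision runs under the registry's write lock in the
kernel, two connections racing for the same name always see a
consistent owner, which is hard to guarantee with a userspace daemon
that can be pre-empted between lookup and update.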

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
ipc/kdbus/names.c | 772 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
ipc/kdbus/names.h | 74 ++++++
2 files changed, 846 insertions(+)
create mode 100644 ipc/kdbus/names.c
create mode 100644 ipc/kdbus/names.h

diff --git a/ipc/kdbus/names.c b/ipc/kdbus/names.c
new file mode 100644
index 000000000000..657008e1bb37
--- /dev/null
+++ b/ipc/kdbus/names.c
@@ -0,0 +1,772 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/ctype.h>
+#include <linux/fs.h>
+#include <linux/hash.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "names.h"
+#include "notify.h"
+#include "policy.h"
+
+struct kdbus_name_pending {
+ u64 flags;
+ struct kdbus_conn *conn;
+ struct kdbus_name_entry *name;
+ struct list_head conn_entry;
+ struct list_head name_entry;
+};
+
+static int kdbus_name_pending_new(struct kdbus_name_entry *e,
+ struct kdbus_conn *conn, u64 flags)
+{
+ struct kdbus_name_pending *p;
+
+ kdbus_conn_assert_active(conn);
+
+ p = kmalloc(sizeof(*p), GFP_KERNEL);
+ if (!p)
+ return -ENOMEM;
+
+ p->flags = flags;
+ p->conn = conn;
+ p->name = e;
+ list_add_tail(&p->conn_entry, &conn->names_queue_list);
+ list_add_tail(&p->name_entry, &e->queue);
+
+ return 0;
+}
+
+static void kdbus_name_pending_free(struct kdbus_name_pending *p)
+{
+ if (!p)
+ return;
+
+ list_del(&p->name_entry);
+ list_del(&p->conn_entry);
+ kfree(p);
+}
+
+static struct kdbus_name_entry *
+kdbus_name_entry_new(struct kdbus_name_registry *r, u32 hash, const char *name)
+{
+ struct kdbus_name_entry *e;
+ size_t namelen;
+
+ namelen = strlen(name);
+
+ e = kmalloc(sizeof(*e) + namelen + 1, GFP_KERNEL);
+ if (!e)
+ return ERR_PTR(-ENOMEM);
+
+ e->name_id = ++r->name_seq_last;
+ e->flags = 0;
+ e->conn = NULL;
+ e->activator = NULL;
+ INIT_LIST_HEAD(&e->queue);
+ INIT_LIST_HEAD(&e->conn_entry);
+ hash_add(r->entries_hash, &e->hentry, hash);
+ memcpy(e->name, name, namelen + 1);
+
+ return e;
+}
+
+static void kdbus_name_entry_free(struct kdbus_name_entry *e)
+{
+ if (!e)
+ return;
+
+ WARN_ON(!list_empty(&e->conn_entry));
+ WARN_ON(!list_empty(&e->queue));
+ WARN_ON(e->activator);
+ WARN_ON(e->conn);
+
+ hash_del(&e->hentry);
+ kfree(e);
+}
+
+static void kdbus_name_entry_set_owner(struct kdbus_name_entry *e,
+ struct kdbus_conn *conn, u64 flags)
+{
+ WARN_ON(e->conn);
+
+ e->conn = kdbus_conn_ref(conn);
+ e->flags = flags;
+ atomic_inc(&conn->name_count);
+ list_add_tail(&e->conn_entry, &e->conn->names_list);
+}
+
+static void kdbus_name_entry_remove_owner(struct kdbus_name_entry *e)
+{
+ WARN_ON(!e->conn);
+
+ list_del_init(&e->conn_entry);
+ atomic_dec(&e->conn->name_count);
+ e->flags = 0;
+ e->conn = kdbus_conn_unref(e->conn);
+}
+
+static void kdbus_name_entry_replace_owner(struct kdbus_name_entry *e,
+ struct kdbus_conn *conn, u64 flags)
+{
+ if (WARN_ON(!e->conn) || WARN_ON(conn == e->conn))
+ return;
+
+ kdbus_notify_name_change(conn->ep->bus, KDBUS_ITEM_NAME_CHANGE,
+ e->conn->id, conn->id,
+ e->flags, flags, e->name);
+ kdbus_name_entry_remove_owner(e);
+ kdbus_name_entry_set_owner(e, conn, flags);
+}
+
+/**
+ * kdbus_name_is_valid() - check if a name is valid
+ * @p: The name to check
+ * @allow_wildcard: Whether or not to allow a wildcard name
+ *
+ * A name is valid if all of the following criteria are met:
+ *
+ * - The name has two or more elements separated by a period ('.') character.
+ * - All elements must contain at least one character.
+ * - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_-"
+ * and must not begin with a digit.
+ * - The name must not exceed KDBUS_NAME_MAX_LEN.
+ * - If @allow_wildcard is true, the name may end on '.*'
+ */
+bool kdbus_name_is_valid(const char *p, bool allow_wildcard)
+{
+ bool dot, found_dot = false;
+ const char *q;
+
+ for (dot = true, q = p; *q; q++) {
+ if (*q == '.') {
+ if (dot)
+ return false;
+
+ found_dot = true;
+ dot = true;
+ } else {
+ bool good;
+
+ good = isalpha(*q) || (!dot && isdigit(*q)) ||
+ *q == '_' || *q == '-' ||
+ (allow_wildcard && dot &&
+ *q == '*' && *(q + 1) == '\0');
+
+ if (!good)
+ return false;
+
+ dot = false;
+ }
+ }
+
+ if (q - p > KDBUS_NAME_MAX_LEN)
+ return false;
+
+ if (dot)
+ return false;
+
+ if (!found_dot)
+ return false;
+
+ return true;
+}
+
+/**
+ * kdbus_name_registry_new() - create a new name registry
+ *
+ * Return: a new kdbus_name_registry on success, ERR_PTR on failure.
+ */
+struct kdbus_name_registry *kdbus_name_registry_new(void)
+{
+ struct kdbus_name_registry *r;
+
+ r = kmalloc(sizeof(*r), GFP_KERNEL);
+ if (!r)
+ return ERR_PTR(-ENOMEM);
+
+ hash_init(r->entries_hash);
+ init_rwsem(&r->rwlock);
+ r->name_seq_last = 0;
+
+ return r;
+}
+
+/**
+ * kdbus_name_registry_free() - free a name registry
+ * @reg: The name registry, may be %NULL
+ *
+ * Cleanup the name registry's internal structures.
+ */
+void kdbus_name_registry_free(struct kdbus_name_registry *reg)
+{
+ if (!reg)
+ return;
+
+ WARN_ON(!hash_empty(reg->entries_hash));
+ kfree(reg);
+}
+
+static struct kdbus_name_entry *
+kdbus_name_find(struct kdbus_name_registry *reg, u32 hash, const char *name)
+{
+ struct kdbus_name_entry *e;
+
+ lockdep_assert_held(&reg->rwlock);
+
+ hash_for_each_possible(reg->entries_hash, e, hentry, hash)
+ if (strcmp(e->name, name) == 0)
+ return e;
+
+ return NULL;
+}
+
+/**
+ * kdbus_name_lookup_unlocked() - lookup name in registry
+ * @reg: name registry
+ * @name: name to lookup
+ *
+ * This looks up @name in the given name-registry and returns the
+ * kdbus_name_entry object. The caller must hold the registry-lock and must not
+ * access the returned object after releasing the lock.
+ *
+ * Return: Pointer to name-entry, or NULL if not found.
+ */
+struct kdbus_name_entry *
+kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name)
+{
+ return kdbus_name_find(reg, kdbus_strhash(name), name);
+}
+
+/**
+ * kdbus_name_acquire() - acquire a name
+ * @reg: The name registry
+ * @conn: The connection to pin this entry to
+ * @name: The name to acquire
+ * @flags: Acquisition flags (KDBUS_NAME_*)
+ * @return_flags: Pointer to return flags for the acquired name
+ * (KDBUS_NAME_*), may be %NULL
+ *
+ * Callers must ensure that @conn is either a privileged bus user or has
+ * sufficient privileges in the policy-db to own the well-known name @name.
+ *
+ * Return: 0 on success, negative error number on failure.
+ */
+int kdbus_name_acquire(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn, const char *name,
+ u64 flags, u64 *return_flags)
+{
+ struct kdbus_name_entry *e;
+ u64 rflags = 0;
+ int ret = 0;
+ u32 hash;
+
+ kdbus_conn_assert_active(conn);
+
+ down_write(&reg->rwlock);
+
+ if (!kdbus_conn_policy_own_name(conn, current_cred(), name)) {
+ ret = -EPERM;
+ goto exit_unlock;
+ }
+
+ hash = kdbus_strhash(name);
+ e = kdbus_name_find(reg, hash, name);
+ if (!e) {
+ /* claim new name */
+
+ if (conn->activator_of) {
+ ret = -EINVAL;
+ goto exit_unlock;
+ }
+
+ e = kdbus_name_entry_new(reg, hash, name);
+ if (IS_ERR(e)) {
+ ret = PTR_ERR(e);
+ goto exit_unlock;
+ }
+
+ if (kdbus_conn_is_activator(conn)) {
+ e->activator = kdbus_conn_ref(conn);
+ conn->activator_of = e;
+ }
+
+ kdbus_name_entry_set_owner(e, conn, flags);
+ kdbus_notify_name_change(e->conn->ep->bus, KDBUS_ITEM_NAME_ADD,
+ 0, e->conn->id, 0, e->flags, e->name);
+ } else if (e->conn == conn || e == conn->activator_of) {
+ /* connection already owns that name */
+ ret = -EALREADY;
+ } else if (kdbus_conn_is_activator(conn)) {
+ /* activator claims existing name */
+
+ if (conn->activator_of) {
+ ret = -EINVAL; /* multiple names not allowed */
+ } else if (e->activator) {
+ ret = -EEXIST; /* only one activator per name */
+ } else {
+ e->activator = kdbus_conn_ref(conn);
+ conn->activator_of = e;
+ }
+ } else if (e->flags & KDBUS_NAME_ACTIVATOR) {
+ /* claim name of an activator */
+
+ kdbus_conn_move_messages(conn, e->activator, 0);
+ kdbus_name_entry_replace_owner(e, conn, flags);
+ } else if ((flags & KDBUS_NAME_REPLACE_EXISTING) &&
+ (e->flags & KDBUS_NAME_ALLOW_REPLACEMENT)) {
+ /* claim name of a previous owner */
+
+ if (e->flags & KDBUS_NAME_QUEUE) {
+ /* move owner back to queue if they asked for it */
+ ret = kdbus_name_pending_new(e, e->conn, e->flags);
+ if (ret < 0)
+ goto exit_unlock;
+ }
+
+ kdbus_name_entry_replace_owner(e, conn, flags);
+ } else if (flags & KDBUS_NAME_QUEUE) {
+ /* add to waiting-queue of the name */
+
+ ret = kdbus_name_pending_new(e, conn, flags);
+ if (ret >= 0)
+ /* tell the caller that we queued it */
+ rflags |= KDBUS_NAME_IN_QUEUE;
+ } else {
+ /* the name is busy, return a failure */
+ ret = -EEXIST;
+ }
+
+ if (ret == 0 && return_flags)
+ *return_flags = rflags;
+
+exit_unlock:
+ up_write(&reg->rwlock);
+ kdbus_notify_flush(conn->ep->bus);
+ return ret;
+}
+
+static void kdbus_name_release_unlocked(struct kdbus_name_registry *reg,
+ struct kdbus_name_entry *e)
+{
+ struct kdbus_name_pending *p;
+
+ lockdep_assert_held(&reg->rwlock);
+
+ p = list_first_entry_or_null(&e->queue, struct kdbus_name_pending,
+ name_entry);
+
+ if (p) {
+ /* give it to first active waiter in the queue */
+ kdbus_name_entry_replace_owner(e, p->conn, p->flags);
+ kdbus_name_pending_free(p);
+ } else if (e->activator && e->activator != e->conn) {
+ /* hand it back to an active activator connection */
+ kdbus_conn_move_messages(e->activator, e->conn, e->name_id);
+ kdbus_name_entry_replace_owner(e, e->activator,
+ KDBUS_NAME_ACTIVATOR);
+ } else {
+ /* release the name */
+ kdbus_notify_name_change(e->conn->ep->bus,
+ KDBUS_ITEM_NAME_REMOVE,
+ e->conn->id, 0, e->flags, 0, e->name);
+ kdbus_name_entry_remove_owner(e);
+ kdbus_name_entry_free(e);
+ }
+}
+
+static int kdbus_name_release(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ const char *name)
+{
+ struct kdbus_name_pending *p;
+ struct kdbus_name_entry *e;
+ int ret = 0;
+
+ down_write(&reg->rwlock);
+ e = kdbus_name_find(reg, kdbus_strhash(name), name);
+ if (!e) {
+ ret = -ESRCH;
+ } else if (e->conn == conn) {
+ kdbus_name_release_unlocked(reg, e);
+ } else {
+ ret = -EADDRINUSE;
+ list_for_each_entry(p, &e->queue, name_entry) {
+ if (p->conn == conn) {
+ kdbus_name_pending_free(p);
+ ret = 0;
+ break;
+ }
+ }
+ }
+ up_write(&reg->rwlock);
+
+ kdbus_notify_flush(conn->ep->bus);
+ return ret;
+}
+
+/**
+ * kdbus_name_release_all() - remove all name entries of a given connection
+ * @reg: name registry
+ * @conn: connection
+ */
+void kdbus_name_release_all(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn)
+{
+ struct kdbus_name_pending *p;
+ struct kdbus_conn *activator = NULL;
+ struct kdbus_name_entry *e;
+
+ down_write(&reg->rwlock);
+
+ if (kdbus_conn_is_activator(conn)) {
+ activator = conn->activator_of->activator;
+ conn->activator_of->activator = NULL;
+ }
+
+ while ((p = list_first_entry_or_null(&conn->names_queue_list,
+ struct kdbus_name_pending,
+ conn_entry)))
+ kdbus_name_pending_free(p);
+ while ((e = list_first_entry_or_null(&conn->names_list,
+ struct kdbus_name_entry,
+ conn_entry)))
+ kdbus_name_release_unlocked(reg, e);
+
+ up_write(&reg->rwlock);
+
+ kdbus_conn_unref(activator);
+ kdbus_notify_flush(conn->ep->bus);
+}
+
+/**
+ * kdbus_cmd_name_acquire() - handle KDBUS_CMD_NAME_ACQUIRE
+ * @conn: connection to operate on
+ * @argp: command payload
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp)
+{
+ const char *item_name;
+ struct kdbus_cmd *cmd;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ { .type = KDBUS_ITEM_NAME, .mandatory = true },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
+ KDBUS_NAME_REPLACE_EXISTING |
+ KDBUS_NAME_ALLOW_REPLACEMENT |
+ KDBUS_NAME_QUEUE,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ if (!kdbus_conn_is_ordinary(conn))
+ return -EOPNOTSUPP;
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret != 0)
+ return ret;
+
+ item_name = argv[1].item->str;
+ if (!kdbus_name_is_valid(item_name, false)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ /*
+ * Do atomic_inc_return here to reserve our slot, then decrement
+ * it before returning.
+ */
+ if (atomic_inc_return(&conn->name_count) > KDBUS_CONN_MAX_NAMES) {
+ ret = -E2BIG;
+ goto exit_dec;
+ }
+
+ ret = kdbus_name_acquire(conn->ep->bus->name_registry, conn, item_name,
+ cmd->flags, &cmd->return_flags);
+ if (ret < 0)
+ goto exit_dec;
+
+exit_dec:
+ atomic_dec(&conn->name_count);
+exit:
+ return kdbus_args_clear(&args, ret);
+}
+
+/**
+ * kdbus_cmd_name_release() - handle KDBUS_CMD_NAME_RELEASE
+ * @conn: connection to operate on
+ * @argp: command payload
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp)
+{
+ struct kdbus_cmd *cmd;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ { .type = KDBUS_ITEM_NAME, .mandatory = true },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ if (!kdbus_conn_is_ordinary(conn))
+ return -EOPNOTSUPP;
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret != 0)
+ return ret;
+
+ ret = kdbus_name_release(conn->ep->bus->name_registry, conn,
+ argv[1].item->str);
+ return kdbus_args_clear(&args, ret);
+}
+
+static int kdbus_list_write(struct kdbus_conn *conn,
+ struct kdbus_conn *c,
+ struct kdbus_pool_slice *slice,
+ size_t *pos,
+ struct kdbus_name_entry *e,
+ bool write)
+{
+ struct kvec kvec[4];
+ size_t cnt = 0;
+ int ret;
+
+ /* info header */
+ struct kdbus_info info = {
+ .size = 0,
+ .id = c->id,
+ .flags = c->flags,
+ };
+
+ /* fake the header of a kdbus_name item */
+ struct {
+ u64 size;
+ u64 type;
+ u64 flags;
+ } h = {};
+
+ if (e && !kdbus_conn_policy_see_name_unlocked(conn, current_cred(),
+ e->name))
+ return 0;
+
+ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
+
+ /* append name */
+ if (e) {
+ size_t slen = strlen(e->name) + 1;
+
+ h.size = offsetof(struct kdbus_item, name.name) + slen;
+ h.type = KDBUS_ITEM_OWNED_NAME;
+ h.flags = e->flags;
+
+ kdbus_kvec_set(&kvec[cnt++], &h, sizeof(h), &info.size);
+ kdbus_kvec_set(&kvec[cnt++], e->name, slen, &info.size);
+ cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
+ }
+
+ if (write) {
+ ret = kdbus_pool_slice_copy_kvec(slice, *pos, kvec,
+ cnt, info.size);
+ if (ret < 0)
+ return ret;
+ }
+
+ *pos += info.size;
+ return 0;
+}
+
+static int kdbus_list_all(struct kdbus_conn *conn, u64 flags,
+ struct kdbus_pool_slice *slice,
+ size_t *pos, bool write)
+{
+ struct kdbus_conn *c;
+ size_t p = *pos;
+ int ret, i;
+
+ hash_for_each(conn->ep->bus->conn_hash, i, c, hentry) {
+ bool added = false;
+
+ /* skip monitors */
+ if (kdbus_conn_is_monitor(c))
+ continue;
+
+ /* skip activators */
+ if (!(flags & KDBUS_LIST_ACTIVATORS) &&
+ kdbus_conn_is_activator(c))
+ continue;
+
+ /* all names the connection owns */
+ if (flags & (KDBUS_LIST_NAMES | KDBUS_LIST_ACTIVATORS)) {
+ struct kdbus_name_entry *e;
+
+ list_for_each_entry(e, &c->names_list, conn_entry) {
+ struct kdbus_conn *a = e->activator;
+
+ if ((flags & KDBUS_LIST_ACTIVATORS) &&
+ a && a != c) {
+ ret = kdbus_list_write(conn, a, slice,
+ &p, e, write);
+ if (ret < 0) {
+ mutex_unlock(&c->lock);
+ return ret;
+ }
+
+ added = true;
+ }
+
+ if (flags & KDBUS_LIST_NAMES ||
+ kdbus_conn_is_activator(c)) {
+ ret = kdbus_list_write(conn, c, slice,
+ &p, e, write);
+ if (ret < 0) {
+ mutex_unlock(&c->lock);
+ return ret;
+ }
+
+ added = true;
+ }
+ }
+ }
+
+ /* queue of names the connection is currently waiting for */
+ if (flags & KDBUS_LIST_QUEUED) {
+ struct kdbus_name_pending *q;
+
+ list_for_each_entry(q, &c->names_queue_list,
+ conn_entry) {
+ ret = kdbus_list_write(conn, c, slice, &p,
+ q->name, write);
+ if (ret < 0) {
+ mutex_unlock(&c->lock);
+ return ret;
+ }
+
+ added = true;
+ }
+ }
+
+ /* nothing added so far, just add the unique ID */
+ if (!added && flags & KDBUS_LIST_UNIQUE) {
+ ret = kdbus_list_write(conn, c, slice, &p, NULL, write);
+ if (ret < 0)
+ return ret;
+ }
+ }
+
+ *pos = p;
+ return 0;
+}
+
+/**
+ * kdbus_cmd_list() - handle KDBUS_CMD_LIST
+ * @conn: connection to operate on
+ * @argp: command payload
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp)
+{
+ struct kdbus_name_registry *reg = conn->ep->bus->name_registry;
+ struct kdbus_pool_slice *slice = NULL;
+ struct kdbus_cmd_list *cmd;
+ size_t pos, size;
+ int ret;
+
+ struct kdbus_arg argv[] = {
+ { .type = KDBUS_ITEM_NEGOTIATE },
+ };
+ struct kdbus_args args = {
+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
+ KDBUS_LIST_UNIQUE |
+ KDBUS_LIST_NAMES |
+ KDBUS_LIST_ACTIVATORS |
+ KDBUS_LIST_QUEUED,
+ .argv = argv,
+ .argc = ARRAY_SIZE(argv),
+ };
+
+ ret = kdbus_args_parse(&args, argp, &cmd);
+ if (ret != 0)
+ return ret;
+
+ /* lock order: domain -> bus -> ep -> names -> conn */
+ down_read(&reg->rwlock);
+ down_read(&conn->ep->bus->conn_rwlock);
+ down_read(&conn->ep->policy_db.entries_rwlock);
+
+ /* size of records */
+ size = 0;
+ ret = kdbus_list_all(conn, cmd->flags, NULL, &size, false);
+ if (ret < 0)
+ goto exit_unlock;
+
+ if (size == 0) {
+ kdbus_pool_publish_empty(conn->pool, &cmd->offset,
+ &cmd->list_size);
+ } else {
+ slice = kdbus_pool_slice_alloc(conn->pool, size, false);
+ if (IS_ERR(slice)) {
+ ret = PTR_ERR(slice);
+ slice = NULL;
+ goto exit_unlock;
+ }
+
+ /* copy the records */
+ pos = 0;
+ ret = kdbus_list_all(conn, cmd->flags, slice, &pos, true);
+ if (ret < 0)
+ goto exit_unlock;
+
+ WARN_ON(pos != size);
+ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->list_size);
+ }
+
+ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
+ kdbus_member_set_user(&cmd->list_size, argp,
+ typeof(*cmd), list_size))
+ ret = -EFAULT;
+
+exit_unlock:
+ up_read(&conn->ep->policy_db.entries_rwlock);
+ up_read(&conn->ep->bus->conn_rwlock);
+ up_read(&reg->rwlock);
+ kdbus_pool_slice_release(slice);
+ return kdbus_args_clear(&args, ret);
+}
diff --git a/ipc/kdbus/names.h b/ipc/kdbus/names.h
new file mode 100644
index 000000000000..3dd2589293e0
--- /dev/null
+++ b/ipc/kdbus/names.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NAMES_H
+#define __KDBUS_NAMES_H
+
+#include <linux/hashtable.h>
+#include <linux/rwsem.h>
+
+/**
+ * struct kdbus_name_registry - names registered for a bus
+ * @entries_hash: Map of entries
+ * @rwlock: Registry data lock
+ * @name_seq_last: Last used sequence number to assign to a name entry
+ */
+struct kdbus_name_registry {
+ DECLARE_HASHTABLE(entries_hash, 8);
+ struct rw_semaphore rwlock;
+ u64 name_seq_last;
+};
+
+/**
+ * struct kdbus_name_entry - well-known name entry
+ * @name_id: Sequence number of name entry to be able to uniquely
+ * identify a name over its registration lifetime
+ * @flags: KDBUS_NAME_* flags
+ * @conn: Connection owning the name
+ * @activator: Connection of the activator queuing incoming messages
+ * @queue: List of queued connections
+ * @conn_entry: Entry in connection
+ * @hentry: Entry in registry map
+ * @name: The well-known name
+ */
+struct kdbus_name_entry {
+ u64 name_id;
+ u64 flags;
+ struct kdbus_conn *conn;
+ struct kdbus_conn *activator;
+ struct list_head queue;
+ struct list_head conn_entry;
+ struct hlist_node hentry;
+ char name[];
+};
+
+bool kdbus_name_is_valid(const char *p, bool allow_wildcard);
+
+struct kdbus_name_registry *kdbus_name_registry_new(void);
+void kdbus_name_registry_free(struct kdbus_name_registry *reg);
+
+struct kdbus_name_entry *
+kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name);
+
+int kdbus_name_acquire(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn, const char *name,
+ u64 flags, u64 *return_flags);
+void kdbus_name_release_all(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn);
+
+int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp);
+int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp);
+int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp);
+
+#endif
--
2.3.1
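
For readers who want to try the naming rules documented for
kdbus_name_is_valid() outside the kernel, here is a user-space
re-statement of the same checks. Treat it as a sketch: the function name
sketch_name_is_valid() and the SKETCH_NAME_MAX_LEN value are assumptions
made for this example (the value of KDBUS_NAME_MAX_LEN is not part of
this mail), and it relies on the default "C" locale for the ctype
helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit; stands in for KDBUS_NAME_MAX_LEN. */
#define SKETCH_NAME_MAX_LEN 255

/* Return true if 'p' is a valid well-known name under the rules above. */
bool sketch_name_is_valid(const char *p, bool allow_wildcard)
{
	bool at_element_start = true; /* true right after a '.' or at start */
	bool found_dot = false;
	const char *q;

	assert(p != NULL);

	for (q = p; *q; q++) {
		unsigned char c = (unsigned char)*q;

		if (c == '.') {
			/* empty element, e.g. "a..b" or ".a" */
			if (at_element_start)
				return false;
			found_dot = true;
			at_element_start = true;
		} else {
			bool good;

			/* digits are not allowed as first character */
			good = isalpha(c) ||
			       (!at_element_start && isdigit(c)) ||
			       c == '_' || c == '-' ||
			       (allow_wildcard && at_element_start &&
				c == '*' && q[1] == '\0');
			if (!good)
				return false;
			at_element_start = false;
		}
	}

	if ((size_t)(q - p) > SKETCH_NAME_MAX_LEN)
		return false;

	/* must not end on '.', and needs at least two elements */
	return !at_element_start && found_dot;
}
```

With this, "org.freedesktop.DBus" is accepted, "org" and "org..foo" are
rejected, and "org.foo.*" is accepted only when allow_wildcard is true.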

Greg Kroah-Hartman
2015-03-09 13:15:40 UTC
From: Daniel Mack <***@zonque.org>

This patch adds the policy database implementation.

A policy database restricts the possibilities of connections to own,
see and talk to well-known names. It can be associated with a bus
(through a policy holder connection) or a custom endpoint.

By default, buses have an empty policy database that is augmented on
demand when a policy holder connection is instantiated.

Policies are set through KDBUS_CMD_HELLO (when creating a policy
holder connection), KDBUS_CMD_CONN_UPDATE (when updating a policy
holder connection), KDBUS_CMD_EP_MAKE (creating a custom endpoint)
or KDBUS_CMD_EP_UPDATE (updating a custom endpoint). In all cases,
the name and policy access information is stored in items of type
KDBUS_ITEM_NAME and KDBUS_ITEM_POLICY_ACCESS.

See kdbus.policy(7) for more details.
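
As a rough illustration of how a list of access items could be
evaluated, the sketch below checks whether a caller with a given uid and
gid is granted a requested access level. This is not the code from
policy.c: the sk_* names are hypothetical, plain unsigned ints stand in
for kuid_t/kgid_t, and the ordering SEE < TALK < OWN (a higher level
implying the lower ones) is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Who an access item applies to (USER, GROUP, WORLD as in the patch). */
enum sk_access_type { SK_ACCESS_USER, SK_ACCESS_GROUP, SK_ACCESS_WORLD };

/* Assumed ordering: OWN implies TALK, TALK implies SEE. */
enum sk_access { SK_SEE = 1, SK_TALK = 2, SK_OWN = 3 };

struct sk_access_item {
	enum sk_access_type type;
	enum sk_access access;
	unsigned int id; /* uid for USER, gid for GROUP, unused for WORLD */
};

/* Return true if any item grants at least 'wanted' to uid/gid. */
bool sk_policy_allows(const struct sk_access_item *items, size_t n,
		      unsigned int uid, unsigned int gid,
		      enum sk_access wanted)
{
	size_t i;

	assert(items != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct sk_access_item *a = &items[i];
		bool applies;

		applies = a->type == SK_ACCESS_WORLD ||
			  (a->type == SK_ACCESS_USER && a->id == uid) ||
			  (a->type == SK_ACCESS_GROUP && a->id == gid);

		if (applies && a->access >= wanted)
			return true;
	}

	return false;
}
```

An empty list grants nothing in this model, so a name without any
matching access item is neither ownable, reachable nor visible to the
caller.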

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
ipc/kdbus/policy.c | 489 +++++++++++++++++++++++++++++++++++++++++++++++++++++
ipc/kdbus/policy.h | 51 ++++++
2 files changed, 540 insertions(+)
create mode 100644 ipc/kdbus/policy.c
create mode 100644 ipc/kdbus/policy.h

diff --git a/ipc/kdbus/policy.c b/ipc/kdbus/policy.c
new file mode 100644
index 000000000000..dd7fffaafa84
--- /dev/null
+++ b/ipc/kdbus/policy.c
@@ -0,0 +1,489 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "item.h"
+#include "names.h"
+#include "policy.h"
+
+#define KDBUS_POLICY_HASH_SIZE 64
+
+/**
+ * struct kdbus_policy_db_entry_access - a database entry access item
+ * @type: One of KDBUS_POLICY_ACCESS_* types
+ * @access: Access to grant. One of KDBUS_POLICY_*
+ * @uid: For KDBUS_POLICY_ACCESS_USER, the global uid
+ * @gid: For KDBUS_POLICY_ACCESS_GROUP, the global gid
+ * @list: List entry item for the entry's list
+ *
+ * This is the internal version of struct kdbus_policy_db_access.
+ */
+struct kdbus_policy_db_entry_access {
+ u8 type; /* USER, GROUP, WORLD */
+ u8 access; /* OWN, TALK, SEE */
+ union {
+ kuid_t uid; /* global uid */
+ kgid_t gid; /* global gid */
+ };
+ struct list_head list;
+};
+
+/**
+ * struct kdbus_policy_db_entry - a policy database entry
+ * @name: The name to match the policy entry against
+ * @hentry: The hash entry for the database's entries_hash
+ * @access_list: List head for keeping track of the entry's
+ * access items.
+ * @owner: The owner of this entry. Can be a kdbus_conn or
+ * a kdbus_ep object.
+ * @wildcard: The name is a wildcard, such as one ending in '.*'
+ */
+struct kdbus_policy_db_entry {
+ char *name;
+ struct hlist_node hentry;
+ struct list_head access_list;
+ const void *owner;
+ bool wildcard:1;
+};
+
+static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
+{
+ struct kdbus_policy_db_entry_access *a, *tmp;
+
+ list_for_each_entry_safe(a, tmp, &e->access_list, list) {
+ list_del(&a->list);
+ kfree(a);
+ }
+
+ kfree(e->name);
+ kfree(e);
+}
+
+static unsigned int kdbus_strnhash(const char *str, size_t len)
+{
+ unsigned long hash = init_name_hash();
+
+ while (len--)
+ hash = partial_name_hash(*str++, hash);
+
+ return end_name_hash(hash);
+}
+
+static const struct kdbus_policy_db_entry *
+kdbus_policy_lookup(struct kdbus_policy_db *db, const char *name, u32 hash)
+{
+ struct kdbus_policy_db_entry *e;
+ const char *dot;
+ size_t len;
+
+ /* find exact match */
+ hash_for_each_possible(db->entries_hash, e, hentry, hash)
+ if (strcmp(e->name, name) == 0 && !e->wildcard)
+ return e;
+
+ /* find wildcard match */
+
+ dot = strrchr(name, '.');
+ if (!dot)
+ return NULL;
+
+ len = dot - name;
+ hash = kdbus_strnhash(name, len);
+
+ hash_for_each_possible(db->entries_hash, e, hentry, hash)
+ if (e->wildcard && !strncmp(e->name, name, len) &&
+ !e->name[len])
+ return e;
+
+ return NULL;
+}
+
+/**
+ * kdbus_policy_db_clear() - release all memory from a policy db
+ * @db: The policy database
+ */
+void kdbus_policy_db_clear(struct kdbus_policy_db *db)
+{
+ struct kdbus_policy_db_entry *e;
+ struct hlist_node *tmp;
+ unsigned int i;
+
+ /* purge entries */
+ down_write(&db->entries_rwlock);
+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
+ hash_del(&e->hentry);
+ kdbus_policy_entry_free(e);
+ }
+ up_write(&db->entries_rwlock);
+}
+
+/**
+ * kdbus_policy_db_init() - initialize a new policy database
+ * @db: The location of the database
+ *
+ * This initializes a new policy-db. The underlying memory must have been
+ * cleared to zero by the caller.
+ */
+void kdbus_policy_db_init(struct kdbus_policy_db *db)
+{
+ hash_init(db->entries_hash);
+ init_rwsem(&db->entries_rwlock);
+}
+
+/**
+ * kdbus_policy_query_unlocked() - Query the policy database
+ * @db: Policy database
+ * @cred: Credentials to test against
+ * @name: Name to query
+ * @hash: Hash value of @name
+ *
+ * Same as kdbus_policy_query() but requires the caller to lock the policy
+ * database against concurrent writes.
+ *
+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
+ */
+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
+ const struct cred *cred, const char *name,
+ unsigned int hash)
+{
+ struct kdbus_policy_db_entry_access *a;
+ const struct kdbus_policy_db_entry *e;
+ int i, highest = -EPERM;
+
+ e = kdbus_policy_lookup(db, name, hash);
+ if (!e)
+ return -EPERM;
+
+ list_for_each_entry(a, &e->access_list, list) {
+ if ((int)a->access <= highest)
+ continue;
+
+ switch (a->type) {
+ case KDBUS_POLICY_ACCESS_USER:
+ if (uid_eq(cred->euid, a->uid))
+ highest = a->access;
+ break;
+ case KDBUS_POLICY_ACCESS_GROUP:
+ if (gid_eq(cred->egid, a->gid)) {
+ highest = a->access;
+ break;
+ }
+
+ for (i = 0; i < cred->group_info->ngroups; i++) {
+ kgid_t gid = GROUP_AT(cred->group_info, i);
+
+ if (gid_eq(gid, a->gid)) {
+ highest = a->access;
+ break;
+ }
+ }
+
+ break;
+ case KDBUS_POLICY_ACCESS_WORLD:
+ highest = a->access;
+ break;
+ }
+
+ /* OWN is the highest possible policy */
+ if (highest >= KDBUS_POLICY_OWN)
+ break;
+ }
+
+ return highest;
+}
+
+/**
+ * kdbus_policy_query() - Query the policy database
+ * @db: Policy database
+ * @cred: Credentials to test against
+ * @name: Name to query
+ * @hash: Hash value of @name
+ *
+ * Query the policy database @db for the access rights of @cred to the name
+ * @name. The access rights of @cred are returned, or -EPERM if no access is
+ * granted.
+ *
+ * This call effectively searches for the highest access-right granted to
+ * @cred. The caller should really cache those as policy lookups are rather
+ * expensive.
+ *
+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
+ */
+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
+ const char *name, unsigned int hash)
+{
+ int ret;
+
+ down_read(&db->entries_rwlock);
+ ret = kdbus_policy_query_unlocked(db, cred, name, hash);
+ up_read(&db->entries_rwlock);
+
+ return ret;
+}
+
+static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+ const void *owner)
+{
+ struct kdbus_policy_db_entry *e;
+ struct hlist_node *tmp;
+ int i;
+
+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+ if (e->owner == owner) {
+ hash_del(&e->hentry);
+ kdbus_policy_entry_free(e);
+ }
+}
+
+/**
+ * kdbus_policy_remove_owner() - remove all entries related to a connection
+ * @db: The policy database
+ * @owner: The connection whose items to remove
+ */
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+ const void *owner)
+{
+ down_write(&db->entries_rwlock);
+ __kdbus_policy_remove_owner(db, owner);
+ up_write(&db->entries_rwlock);
+}
+
+/*
+ * Convert user provided policy access to internal kdbus policy
+ * access
+ */
+static struct kdbus_policy_db_entry_access *
+kdbus_policy_make_access(const struct kdbus_policy_access *uaccess)
+{
+ int ret;
+ struct kdbus_policy_db_entry_access *a;
+
+ a = kzalloc(sizeof(*a), GFP_KERNEL);
+ if (!a)
+ return ERR_PTR(-ENOMEM);
+
+ ret = -EINVAL;
+ switch (uaccess->access) {
+ case KDBUS_POLICY_SEE:
+ case KDBUS_POLICY_TALK:
+ case KDBUS_POLICY_OWN:
+ a->access = uaccess->access;
+ break;
+ default:
+ goto err;
+ }
+
+ switch (uaccess->type) {
+ case KDBUS_POLICY_ACCESS_USER:
+ a->uid = make_kuid(current_user_ns(), uaccess->id);
+ if (!uid_valid(a->uid))
+ goto err;
+
+ break;
+ case KDBUS_POLICY_ACCESS_GROUP:
+ a->gid = make_kgid(current_user_ns(), uaccess->id);
+ if (!gid_valid(a->gid))
+ goto err;
+
+ break;
+ case KDBUS_POLICY_ACCESS_WORLD:
+ break;
+ default:
+ goto err;
+ }
+
+ a->type = uaccess->type;
+
+ return a;
+
+err:
+ kfree(a);
+ return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_policy_set() - set a connection's policy rules
+ * @db: The policy database
+ * @items: A list of kdbus_item elements that contain both
+ * names and access rules to set.
+ * @items_size: The total size of the items.
+ * @max_policies: The maximum number of policy entries to allow.
+ * Pass 0 for no limit.
+ * @allow_wildcards: Whether wildcard entries (such as those
+ * ending in '.*') should be allowed.
+ * @owner: The owner of the new policy items.
+ *
+ * This function sets a new set of policies for a given owner. The names and
+ * access rules are gathered by walking the list of items passed in as
+ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
+ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
+ * pattern than denoted in @max_policies, -E2BIG is returned.
+ *
+ * In order to allow atomic replacement of rules, the function first removes
+ * all entries that have been created for the given owner previously.
+ *
+ * Callers to this function must make sure that the owner is a custom
+ * endpoint, or if the endpoint is a default endpoint, then it must be
+ * either a policy holder or an activator.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_policy_set(struct kdbus_policy_db *db,
+ const struct kdbus_item *items,
+ size_t items_size,
+ size_t max_policies,
+ bool allow_wildcards,
+ const void *owner)
+{
+ struct kdbus_policy_db_entry_access *a;
+ struct kdbus_policy_db_entry *e, *p;
+ const struct kdbus_item *item;
+ struct hlist_node *tmp;
+ HLIST_HEAD(entries);
+ HLIST_HEAD(restore);
+ size_t count = 0;
+ int i, ret = 0;
+ u32 hash;
+
+ /* Walk the list of items and look for new policies */
+ e = NULL;
+ KDBUS_ITEMS_FOREACH(item, items, items_size) {
+ switch (item->type) {
+ case KDBUS_ITEM_NAME: {
+ size_t len;
+
+ if (max_policies && ++count > max_policies) {
+ ret = -E2BIG;
+ goto exit;
+ }
+
+ if (!kdbus_name_is_valid(item->str, true)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ e = kzalloc(sizeof(*e), GFP_KERNEL);
+ if (!e) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ INIT_LIST_HEAD(&e->access_list);
+ e->owner = owner;
+ hlist_add_head(&e->hentry, &entries);
+
+ e->name = kstrdup(item->str, GFP_KERNEL);
+ if (!e->name) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ /*
+ * If a supplied name ends with '.*', cut off that
+ * part, only store anything before it, and mark the
+ * entry as wildcard.
+ */
+ len = strlen(e->name);
+ if (len > 2 &&
+ e->name[len - 2] == '.' &&
+ e->name[len - 1] == '*') {
+ if (!allow_wildcards) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ e->name[len - 2] = '\0';
+ e->wildcard = true;
+ }
+
+ break;
+ }
+
+ case KDBUS_ITEM_POLICY_ACCESS:
+ if (!e) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ a = kdbus_policy_make_access(&item->policy_access);
+ if (IS_ERR(a)) {
+ ret = PTR_ERR(a);
+ goto exit;
+ }
+
+ list_add_tail(&a->list, &e->access_list);
+ break;
+ }
+ }
+
+ down_write(&db->entries_rwlock);
+
+ /* remember previous entries to restore in case of failure */
+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+ if (e->owner == owner) {
+ hash_del(&e->hentry);
+ hlist_add_head(&e->hentry, &restore);
+ }
+
+ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+ /* prevent duplicates */
+ hash = kdbus_strhash(e->name);
+ hash_for_each_possible(db->entries_hash, p, hentry, hash)
+ if (strcmp(e->name, p->name) == 0 &&
+ e->wildcard == p->wildcard) {
+ ret = -EEXIST;
+ goto restore;
+ }
+
+ hlist_del(&e->hentry);
+ hash_add(db->entries_hash, &e->hentry, hash);
+ }
+
+restore:
+ /* if we failed, flush all entries we added so far */
+ if (ret < 0)
+ __kdbus_policy_remove_owner(db, owner);
+
+ /* if we failed, restore entries, otherwise release them */
+ hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
+ hlist_del(&e->hentry);
+ if (ret < 0) {
+ hash = kdbus_strhash(e->name);
+ hash_add(db->entries_hash, &e->hentry, hash);
+ } else {
+ kdbus_policy_entry_free(e);
+ }
+ }
+
+ up_write(&db->entries_rwlock);
+
+exit:
+ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+ hlist_del(&e->hentry);
+ kdbus_policy_entry_free(e);
+ }
+
+ return ret;
+}
diff --git a/ipc/kdbus/policy.h b/ipc/kdbus/policy.h
new file mode 100644
index 000000000000..15dd7bc12068
--- /dev/null
+++ b/ipc/kdbus/policy.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_POLICY_H
+#define __KDBUS_POLICY_H
+
+#include <linux/hashtable.h>
+#include <linux/rwsem.h>
+
+struct kdbus_conn;
+struct kdbus_item;
+
+/**
+ * struct kdbus_policy_db - policy database
+ * @entries_hash: Hashtable of entries
+ * @entries_rwlock: Read/write semaphore protecting the database's entries
+ */
+struct kdbus_policy_db {
+ DECLARE_HASHTABLE(entries_hash, 6);
+ struct rw_semaphore entries_rwlock;
+};
+
+void kdbus_policy_db_init(struct kdbus_policy_db *db);
+void kdbus_policy_db_clear(struct kdbus_policy_db *db);
+
+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
+ const struct cred *cred, const char *name,
+ unsigned int hash);
+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
+ const char *name, unsigned int hash);
+
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+ const void *owner);
+int kdbus_policy_set(struct kdbus_policy_db *db,
+ const struct kdbus_item *items,
+ size_t items_size,
+ size_t max_policies,
+ bool allow_wildcards,
+ const void *owner);
+
+#endif
--
2.3.1

Greg Kroah-Hartman
2015-03-09 13:17:03 UTC
From: Daniel Mack <***@zonque.org>

A connection chooses which metadata it wants to have attached to each
message it receives with kdbus_cmd_hello.attach_flags. The metadata
will be attached as items to the messages. All metadata refers to
information about the sending task at sending time, unless otherwise
stated. Also, the metadata is copied, not referenced, so even if the
sending task doesn't exist anymore at the time the message is received,
the information is still preserved.

In traditional D-Bus, userspace tasks like polkit or journald make a
live lookup in procfs and sysfs to gain information about a sending
task. This is racy, of course, as in a connection-less system like
D-Bus, the originating peer can go away immediately after sending the
message. As we're moving D-Bus primitives into the kernel, we have to
provide the same semantics here, and inform the receiving peer about the
live credentials of the sending peer.

Metadata is collected at the following times.

* When a bus is created (KDBUS_CMD_MAKE), information about the
calling task is collected. This data is returned by the kernel
via the KDBUS_CMD_BUS_CREATOR_INFO call.

* When a connection is created (KDBUS_CMD_HELLO), information about
the calling task is collected. Alternatively, a privileged
connection may provide 'faked' information about credentials,
PIDs and security labels which will be stored instead. This data
is returned by the kernel as information on a connection
(KDBUS_CMD_CONN_INFO). Only metadata that a connection allowed to
be sent (by setting its bit in attach_flags_send) will be exported
in this way.

* When a message is sent (KDBUS_CMD_SEND), information about the
sending task and the sending connection is collected. This
metadata will be attached to the message when it arrives in the
receiver's pool. If the connection sending the message installed
faked credentials (see kdbus.connection(7)), the message will not
be augmented by any information about the currently sending task.

Which metadata items are actually delivered depends on the following
sets and masks:

(a) the system-wide kmod creds mask
(module parameter 'attach_flags_mask')

(b) the per-connection send creds mask, set by the connecting client

(c) the per-connection receive creds mask, set by the connecting client

(d) the per-bus minimal creds mask, set by the bus creator

(e) the per-bus owner creds mask, set by the bus creator

(f) the mask specified when querying creds of a bus peer

(g) the mask specified when querying creds of a bus owner

With the following rules:

[1] The creds attached to messages are determined as a & b & c.

[2] When connecting to a bus (KDBUS_CMD_HELLO), if (~b & d) != 0,
the call will fail with -1, and errno is set to ECONNREFUSED.

[3] When querying creds of a bus peer, the creds returned
are a & b & f.

[4] When querying creds of a bus owner, the creds returned
are a & e & g.

See kdbus.metadata(7) and kdbus.item(7) for more details on which
metadata can currently be attached to messages.
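
The four rules above can be written down as plain bitmask operations.
The sketch below is only an illustration: the ex_* helpers and the
EX_ATTACH_* bit values are made up for the example and are not the
KDBUS_ATTACH_* constants from the UAPI header. The parameter names
follow the letters (a) to (g) used in the list of masks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attach bits, for illustration only. */
#define EX_ATTACH_CREDS	(UINT64_C(1) << 0)
#define EX_ATTACH_PIDS	(UINT64_C(1) << 1)
#define EX_ATTACH_EXE	(UINT64_C(1) << 2)

/* Rule [1]: creds attached to a message. */
static uint64_t ex_msg_creds(uint64_t a, uint64_t b, uint64_t c)
{
	return a & b & c;
}

/*
 * Rule [2]: HELLO is refused if the bus requires a bit (d) that the
 * connection does not allow to be sent (b). The parentheses matter in
 * C, because '!=' binds tighter than '&'.
 */
static bool ex_hello_refused(uint64_t b, uint64_t d)
{
	return (~b & d) != 0;
}

/* Rule [3]: creds returned when querying a bus peer. */
static uint64_t ex_peer_creds(uint64_t a, uint64_t b, uint64_t f)
{
	return a & b & f;
}

/* Rule [4]: creds returned when querying a bus owner. */
static uint64_t ex_owner_creds(uint64_t a, uint64_t e, uint64_t g)
{
	return a & e & g;
}
```

For instance, with a system-wide mask of CREDS|PIDS|EXE, a send mask of
CREDS|PIDS and a receive mask of PIDS|EXE, only PIDS would be attached
to a message under rule [1].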

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: David Herrmann <***@gmail.com>
Signed-off-by: Djalal Harouni <***@opendz.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
ipc/kdbus/metadata.c | 1164 ++++++++++++++++++++++++++++++++++++++++++++++++++
ipc/kdbus/metadata.h | 57 +++
2 files changed, 1221 insertions(+)
create mode 100644 ipc/kdbus/metadata.c
create mode 100644 ipc/kdbus/metadata.h

diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
new file mode 100644
index 000000000000..06e0a54a276a
--- /dev/null
+++ b/ipc/kdbus/metadata.c
@@ -0,0 +1,1164 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni <***@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/capability.h>
+#include <linux/cgroup.h>
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/fs_struct.h>
+#include <linux/init.h>
+#include <linux/kref.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/security.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uidgid.h>
+#include <linux/uio.h>
+#include <linux/user_namespace.h>
+#include <linux/version.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "item.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+
+/**
+ * struct kdbus_meta_proc - Process metadata
+ * @kref: Reference counting
+ * @lock: Object lock
+ * @collected: Bitmask of collected items
+ * @valid: Bitmask of collected and valid items
+ * @uid: UID of process
+ * @euid: EUID of process
+ * @suid: SUID of process
+ * @fsuid: FSUID of process
+ * @gid: GID of process
+ * @egid: EGID of process
+ * @sgid: SGID of process
+ * @fsgid: FSGID of process
+ * @pid: PID of process
+ * @tgid: TGID of process
+ * @ppid: PPID of process
+ * @auxgrps: Auxiliary groups
+ * @n_auxgrps: Number of items in @auxgrps
+ * @tid_comm: TID comm line
+ * @pid_comm: PID comm line
+ * @exe_path: Executable path
+ * @root_path: Root-FS path
+ * @cmdline: Command-line
+ * @cgroup: Full cgroup path
+ * @caps: Capabilities
+ * @caps_namespace: User-namespace of @caps
+ * @seclabel: Seclabel
+ * @audit_loginuid: Audit login-UID
+ * @audit_sessionid: Audit session-ID
+ */
+struct kdbus_meta_proc {
+ struct kref kref;
+ struct mutex lock;
+ u64 collected;
+ u64 valid;
+
+ /* KDBUS_ITEM_CREDS */
+ kuid_t uid, euid, suid, fsuid;
+ kgid_t gid, egid, sgid, fsgid;
+
+ /* KDBUS_ITEM_PIDS */
+ struct pid *pid;
+ struct pid *tgid;
+ struct pid *ppid;
+
+ /* KDBUS_ITEM_AUXGROUPS */
+ kgid_t *auxgrps;
+ size_t n_auxgrps;
+
+ /* KDBUS_ITEM_TID_COMM */
+ char tid_comm[TASK_COMM_LEN];
+ /* KDBUS_ITEM_PID_COMM */
+ char pid_comm[TASK_COMM_LEN];
+
+ /* KDBUS_ITEM_EXE */
+ struct path exe_path;
+ struct path root_path;
+
+ /* KDBUS_ITEM_CMDLINE */
+ char *cmdline;
+
+ /* KDBUS_ITEM_CGROUP */
+ char *cgroup;
+
+ /* KDBUS_ITEM_CAPS */
+ struct caps {
+ /* binary compatible to kdbus_caps */
+ u32 last_cap;
+ struct {
+ u32 caps[_KERNEL_CAPABILITY_U32S];
+ } set[4];
+ } caps;
+ struct user_namespace *caps_namespace;
+
+ /* KDBUS_ITEM_SECLABEL */
+ char *seclabel;
+
+ /* KDBUS_ITEM_AUDIT */
+ kuid_t audit_loginuid;
+ unsigned int audit_sessionid;
+};
+
+/**
+ * struct kdbus_meta_conn - Connection metadata
+ * @kref: Reference counting
+ * @lock: Object lock
+ * @collected: Bitmask of collected items
+ * @valid: Bitmask of collected and valid items
+ * @ts: Timestamp values
+ * @owned_names_items: Serialized items for owned names
+ * @owned_names_size: Size of @owned_names_items
+ * @conn_description: Connection description
+ */
+struct kdbus_meta_conn {
+ struct kref kref;
+ struct mutex lock;
+ u64 collected;
+ u64 valid;
+
+ /* KDBUS_ITEM_TIMESTAMP */
+ struct kdbus_timestamp ts;
+
+ /* KDBUS_ITEM_OWNED_NAME */
+ struct kdbus_item *owned_names_items;
+ size_t owned_names_size;
+
+ /* KDBUS_ITEM_CONN_DESCRIPTION */
+ char *conn_description;
+};
+
+/**
+ * kdbus_meta_proc_new() - Create process metadata object
+ *
+ * Return: Pointer to new object on success, ERR_PTR on failure.
+ */
+struct kdbus_meta_proc *kdbus_meta_proc_new(void)
+{
+ struct kdbus_meta_proc *mp;
+
+ mp = kzalloc(sizeof(*mp), GFP_KERNEL);
+ if (!mp)
+ return ERR_PTR(-ENOMEM);
+
+ kref_init(&mp->kref);
+ mutex_init(&mp->lock);
+
+ return mp;
+}
+
+static void kdbus_meta_proc_free(struct kref *kref)
+{
+ struct kdbus_meta_proc *mp = container_of(kref, struct kdbus_meta_proc,
+ kref);
+
+ path_put(&mp->exe_path);
+ path_put(&mp->root_path);
+ put_user_ns(mp->caps_namespace);
+ put_pid(mp->ppid);
+ put_pid(mp->tgid);
+ put_pid(mp->pid);
+
+ kfree(mp->seclabel);
+ kfree(mp->auxgrps);
+ kfree(mp->cmdline);
+ kfree(mp->cgroup);
+ kfree(mp);
+}
+
+/**
+ * kdbus_meta_proc_ref() - Gain reference
+ * @mp: Process metadata object
+ *
+ * Return: @mp is returned
+ */
+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp)
+{
+ if (mp)
+ kref_get(&mp->kref);
+ return mp;
+}
+
+/**
+ * kdbus_meta_proc_unref() - Drop reference
+ * @mp: Process metadata object
+ *
+ * Return: NULL
+ */
+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp)
+{
+ if (mp)
+ kref_put(&mp->kref, kdbus_meta_proc_free);
+ return NULL;
+}
+
+static void kdbus_meta_proc_collect_creds(struct kdbus_meta_proc *mp)
+{
+ mp->uid = current_uid();
+ mp->euid = current_euid();
+ mp->suid = current_suid();
+ mp->fsuid = current_fsuid();
+
+ mp->gid = current_gid();
+ mp->egid = current_egid();
+ mp->sgid = current_sgid();
+ mp->fsgid = current_fsgid();
+
+ mp->valid |= KDBUS_ATTACH_CREDS;
+}
+
+static void kdbus_meta_proc_collect_pids(struct kdbus_meta_proc *mp)
+{
+ struct task_struct *parent;
+
+ mp->pid = get_pid(task_pid(current));
+ mp->tgid = get_pid(task_tgid(current));
+
+ rcu_read_lock();
+ parent = rcu_dereference(current->real_parent);
+ mp->ppid = get_pid(task_tgid(parent));
+ rcu_read_unlock();
+
+ mp->valid |= KDBUS_ATTACH_PIDS;
+}
+
+static int kdbus_meta_proc_collect_auxgroups(struct kdbus_meta_proc *mp)
+{
+ struct group_info *info;
+ size_t i;
+
+ info = get_current_groups();
+
+ if (info->ngroups > 0) {
+ mp->auxgrps = kmalloc_array(info->ngroups, sizeof(kgid_t),
+ GFP_KERNEL);
+ if (!mp->auxgrps) {
+ put_group_info(info);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < info->ngroups; i++)
+ mp->auxgrps[i] = GROUP_AT(info, i);
+ }
+
+ mp->n_auxgrps = info->ngroups;
+ put_group_info(info);
+ mp->valid |= KDBUS_ATTACH_AUXGROUPS;
+
+ return 0;
+}
+
+static void kdbus_meta_proc_collect_tid_comm(struct kdbus_meta_proc *mp)
+{
+ get_task_comm(mp->tid_comm, current);
+ mp->valid |= KDBUS_ATTACH_TID_COMM;
+}
+
+static void kdbus_meta_proc_collect_pid_comm(struct kdbus_meta_proc *mp)
+{
+ get_task_comm(mp->pid_comm, current->group_leader);
+ mp->valid |= KDBUS_ATTACH_PID_COMM;
+}
+
+static void kdbus_meta_proc_collect_exe(struct kdbus_meta_proc *mp)
+{
+ struct mm_struct *mm;
+
+ mm = get_task_mm(current);
+ if (!mm)
+ return;
+
+ down_read(&mm->mmap_sem);
+ if (mm->exe_file) {
+ mp->exe_path = mm->exe_file->f_path;
+ path_get(&mp->exe_path);
+ get_fs_root(current->fs, &mp->root_path);
+ mp->valid |= KDBUS_ATTACH_EXE;
+ }
+ up_read(&mm->mmap_sem);
+
+ mmput(mm);
+}
+
+static int kdbus_meta_proc_collect_cmdline(struct kdbus_meta_proc *mp)
+{
+ struct mm_struct *mm;
+ char *cmdline;
+
+ mm = get_task_mm(current);
+ if (!mm)
+ return 0;
+
+ if (!mm->arg_end) {
+ mmput(mm);
+ return 0;
+ }
+
+ cmdline = strndup_user((const char __user *)mm->arg_start,
+ mm->arg_end - mm->arg_start);
+ mmput(mm);
+
+ if (IS_ERR(cmdline))
+ return PTR_ERR(cmdline);
+
+ mp->cmdline = cmdline;
+ mp->valid |= KDBUS_ATTACH_CMDLINE;
+
+ return 0;
+}
+
+static int kdbus_meta_proc_collect_cgroup(struct kdbus_meta_proc *mp)
+{
+#ifdef CONFIG_CGROUPS
+ void *page;
+ char *s;
+
+ page = (void *)__get_free_page(GFP_TEMPORARY);
+ if (!page)
+ return -ENOMEM;
+
+ s = task_cgroup_path(current, page, PAGE_SIZE);
+ if (s) {
+ mp->cgroup = kstrdup(s, GFP_KERNEL);
+ if (!mp->cgroup) {
+ free_page((unsigned long)page);
+ return -ENOMEM;
+ }
+ }
+
+ free_page((unsigned long)page);
+ mp->valid |= KDBUS_ATTACH_CGROUP;
+#endif
+
+ return 0;
+}
+
+static void kdbus_meta_proc_collect_caps(struct kdbus_meta_proc *mp)
+{
+ const struct cred *c = current_cred();
+ int i;
+
+ /* ABI: "last_cap" equals /proc/sys/kernel/cap_last_cap */
+ mp->caps.last_cap = CAP_LAST_CAP;
+ mp->caps_namespace = get_user_ns(current_user_ns());
+
+ CAP_FOR_EACH_U32(i) {
+ mp->caps.set[0].caps[i] = c->cap_inheritable.cap[i];
+ mp->caps.set[1].caps[i] = c->cap_permitted.cap[i];
+ mp->caps.set[2].caps[i] = c->cap_effective.cap[i];
+ mp->caps.set[3].caps[i] = c->cap_bset.cap[i];
+ }
+
+ /* clear unused bits */
+ for (i = 0; i < 4; i++)
+ mp->caps.set[i].caps[CAP_TO_INDEX(CAP_LAST_CAP)] &=
+ CAP_LAST_U32_VALID_MASK;
+
+ mp->valid |= KDBUS_ATTACH_CAPS;
+}
+
+static int kdbus_meta_proc_collect_seclabel(struct kdbus_meta_proc *mp)
+{
+#ifdef CONFIG_SECURITY
+ char *ctx = NULL;
+ u32 sid, len;
+ int ret;
+
+ security_task_getsecid(current, &sid);
+ ret = security_secid_to_secctx(sid, &ctx, &len);
+ if (ret < 0) {
+ /*
+ * EOPNOTSUPP means no security module is active,
+ * let's skip adding the seclabel then. This effectively
+ * drops the SECLABEL item.
+ */
+ return (ret == -EOPNOTSUPP) ? 0 : ret;
+ }
+
+ mp->seclabel = kstrdup(ctx, GFP_KERNEL);
+ security_release_secctx(ctx, len);
+ if (!mp->seclabel)
+ return -ENOMEM;
+
+ mp->valid |= KDBUS_ATTACH_SECLABEL;
+#endif
+
+ return 0;
+}
+
+static void kdbus_meta_proc_collect_audit(struct kdbus_meta_proc *mp)
+{
+#ifdef CONFIG_AUDITSYSCALL
+ mp->audit_loginuid = audit_get_loginuid(current);
+ mp->audit_sessionid = audit_get_sessionid(current);
+ mp->valid |= KDBUS_ATTACH_AUDIT;
+#endif
+}
+
+/**
+ * kdbus_meta_proc_collect() - Collect process metadata
+ * @mp: Process metadata object
+ * @what: Attach flags to collect
+ *
+ * This collects process metadata from current and saves it in @mp.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what)
+{
+ int ret;
+
+ if (!mp || !(what & (KDBUS_ATTACH_CREDS |
+ KDBUS_ATTACH_PIDS |
+ KDBUS_ATTACH_AUXGROUPS |
+ KDBUS_ATTACH_TID_COMM |
+ KDBUS_ATTACH_PID_COMM |
+ KDBUS_ATTACH_EXE |
+ KDBUS_ATTACH_CMDLINE |
+ KDBUS_ATTACH_CGROUP |
+ KDBUS_ATTACH_CAPS |
+ KDBUS_ATTACH_SECLABEL |
+ KDBUS_ATTACH_AUDIT)))
+ return 0;
+
+ mutex_lock(&mp->lock);
+
+ if ((what & KDBUS_ATTACH_CREDS) &&
+ !(mp->collected & KDBUS_ATTACH_CREDS)) {
+ kdbus_meta_proc_collect_creds(mp);
+ mp->collected |= KDBUS_ATTACH_CREDS;
+ }
+
+ if ((what & KDBUS_ATTACH_PIDS) &&
+ !(mp->collected & KDBUS_ATTACH_PIDS)) {
+ kdbus_meta_proc_collect_pids(mp);
+ mp->collected |= KDBUS_ATTACH_PIDS;
+ }
+
+ if ((what & KDBUS_ATTACH_AUXGROUPS) &&
+ !(mp->collected & KDBUS_ATTACH_AUXGROUPS)) {
+ ret = kdbus_meta_proc_collect_auxgroups(mp);
+ if (ret < 0)
+ goto exit_unlock;
+ mp->collected |= KDBUS_ATTACH_AUXGROUPS;
+ }
+
+ if ((what & KDBUS_ATTACH_TID_COMM) &&
+ !(mp->collected & KDBUS_ATTACH_TID_COMM)) {
+ kdbus_meta_proc_collect_tid_comm(mp);
+ mp->collected |= KDBUS_ATTACH_TID_COMM;
+ }
+
+ if ((what & KDBUS_ATTACH_PID_COMM) &&
+ !(mp->collected & KDBUS_ATTACH_PID_COMM)) {
+ kdbus_meta_proc_collect_pid_comm(mp);
+ mp->collected |= KDBUS_ATTACH_PID_COMM;
+ }
+
+ if ((what & KDBUS_ATTACH_EXE) &&
+ !(mp->collected & KDBUS_ATTACH_EXE)) {
+ kdbus_meta_proc_collect_exe(mp);
+ mp->collected |= KDBUS_ATTACH_EXE;
+ }
+
+ if ((what & KDBUS_ATTACH_CMDLINE) &&
+ !(mp->collected & KDBUS_ATTACH_CMDLINE)) {
+ ret = kdbus_meta_proc_collect_cmdline(mp);
+ if (ret < 0)
+ goto exit_unlock;
+ mp->collected |= KDBUS_ATTACH_CMDLINE;
+ }
+
+ if ((what & KDBUS_ATTACH_CGROUP) &&
+ !(mp->collected & KDBUS_ATTACH_CGROUP)) {
+ ret = kdbus_meta_proc_collect_cgroup(mp);
+ if (ret < 0)
+ goto exit_unlock;
+ mp->collected |= KDBUS_ATTACH_CGROUP;
+ }
+
+ if ((what & KDBUS_ATTACH_CAPS) &&
+ !(mp->collected & KDBUS_ATTACH_CAPS)) {
+ kdbus_meta_proc_collect_caps(mp);
+ mp->collected |= KDBUS_ATTACH_CAPS;
+ }
+
+ if ((what & KDBUS_ATTACH_SECLABEL) &&
+ !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
+ ret = kdbus_meta_proc_collect_seclabel(mp);
+ if (ret < 0)
+ goto exit_unlock;
+ mp->collected |= KDBUS_ATTACH_SECLABEL;
+ }
+
+ if ((what & KDBUS_ATTACH_AUDIT) &&
+ !(mp->collected & KDBUS_ATTACH_AUDIT)) {
+ kdbus_meta_proc_collect_audit(mp);
+ mp->collected |= KDBUS_ATTACH_AUDIT;
+ }
+
+ ret = 0;
+
+exit_unlock:
+ mutex_unlock(&mp->lock);
+ return ret;
+}
+
+/**
+ * kdbus_meta_proc_fake() - Fill process metadata from faked credentials
+ * @mp: Metadata
+ * @creds: Creds to set, may be %NULL
+ * @pids: PIDs to set, may be %NULL
+ * @seclabel: Seclabel to set, may be %NULL
+ *
+ * This function takes information stored in @creds, @pids and @seclabel and
+ * resolves them to kernel-representations, if possible. A call to this function
+ * is considered an alternative to calling kdbus_meta_proc_collect(), which
+ * derives the same information from the 'current' task.
+ *
+ * This call uses the current task's namespaces to resolve the given
+ * information.
+ *
+ * Return: 0 on success, negative error number otherwise.
+ */
+int kdbus_meta_proc_fake(struct kdbus_meta_proc *mp,
+ const struct kdbus_creds *creds,
+ const struct kdbus_pids *pids,
+ const char *seclabel)
+{
+ int ret;
+
+ if (!mp)
+ return 0;
+
+ mutex_lock(&mp->lock);
+
+ if (creds && !(mp->collected & KDBUS_ATTACH_CREDS)) {
+ struct user_namespace *ns = current_user_ns();
+
+ mp->uid = make_kuid(ns, creds->uid);
+ mp->euid = make_kuid(ns, creds->euid);
+ mp->suid = make_kuid(ns, creds->suid);
+ mp->fsuid = make_kuid(ns, creds->fsuid);
+
+ mp->gid = make_kgid(ns, creds->gid);
+ mp->egid = make_kgid(ns, creds->egid);
+ mp->sgid = make_kgid(ns, creds->sgid);
+ mp->fsgid = make_kgid(ns, creds->fsgid);
+
+ if ((creds->uid != (uid_t)-1 && !uid_valid(mp->uid)) ||
+ (creds->euid != (uid_t)-1 && !uid_valid(mp->euid)) ||
+ (creds->suid != (uid_t)-1 && !uid_valid(mp->suid)) ||
+ (creds->fsuid != (uid_t)-1 && !uid_valid(mp->fsuid)) ||
+ (creds->gid != (gid_t)-1 && !gid_valid(mp->gid)) ||
+ (creds->egid != (gid_t)-1 && !gid_valid(mp->egid)) ||
+ (creds->sgid != (gid_t)-1 && !gid_valid(mp->sgid)) ||
+ (creds->fsgid != (gid_t)-1 && !gid_valid(mp->fsgid))) {
+ ret = -EINVAL;
+ goto exit_unlock;
+ }
+
+ mp->valid |= KDBUS_ATTACH_CREDS;
+ mp->collected |= KDBUS_ATTACH_CREDS;
+ }
+
+ if (pids && !(mp->collected & KDBUS_ATTACH_PIDS)) {
+ mp->pid = get_pid(find_vpid(pids->tid));
+ mp->tgid = get_pid(find_vpid(pids->pid));
+ mp->ppid = get_pid(find_vpid(pids->ppid));
+
+ if ((pids->tid != 0 && !mp->pid) ||
+ (pids->pid != 0 && !mp->tgid) ||
+ (pids->ppid != 0 && !mp->ppid)) {
+ put_pid(mp->pid);
+ put_pid(mp->tgid);
+ put_pid(mp->ppid);
+ mp->pid = NULL;
+ mp->tgid = NULL;
+ mp->ppid = NULL;
+ ret = -EINVAL;
+ goto exit_unlock;
+ }
+
+ mp->valid |= KDBUS_ATTACH_PIDS;
+ mp->collected |= KDBUS_ATTACH_PIDS;
+ }
+
+ if (seclabel && !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
+ mp->seclabel = kstrdup(seclabel, GFP_KERNEL);
+ if (!mp->seclabel) {
+ ret = -ENOMEM;
+ goto exit_unlock;
+ }
+
+ mp->valid |= KDBUS_ATTACH_SECLABEL;
+ mp->collected |= KDBUS_ATTACH_SECLABEL;
+ }
+
+ ret = 0;
+
+exit_unlock:
+ mutex_unlock(&mp->lock);
+ return ret;
+}
+
+/**
+ * kdbus_meta_conn_new() - Create connection metadata object
+ *
+ * Return: Pointer to new object on success, ERR_PTR on failure.
+ */
+struct kdbus_meta_conn *kdbus_meta_conn_new(void)
+{
+ struct kdbus_meta_conn *mc;
+
+ mc = kzalloc(sizeof(*mc), GFP_KERNEL);
+ if (!mc)
+ return ERR_PTR(-ENOMEM);
+
+ kref_init(&mc->kref);
+ mutex_init(&mc->lock);
+
+ return mc;
+}
+
+static void kdbus_meta_conn_free(struct kref *kref)
+{
+ struct kdbus_meta_conn *mc =
+ container_of(kref, struct kdbus_meta_conn, kref);
+
+ kfree(mc->conn_description);
+ kfree(mc->owned_names_items);
+ kfree(mc);
+}
+
+/**
+ * kdbus_meta_conn_ref() - Gain reference
+ * @mc: Connection metadata object
+ */
+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc)
+{
+ if (mc)
+ kref_get(&mc->kref);
+ return mc;
+}
+
+/**
+ * kdbus_meta_conn_unref() - Drop reference
+ * @mc: Connection metadata object
+ */
+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc)
+{
+ if (mc)
+ kref_put(&mc->kref, kdbus_meta_conn_free);
+ return NULL;
+}
+
+static void kdbus_meta_conn_collect_timestamp(struct kdbus_meta_conn *mc,
+ struct kdbus_kmsg *kmsg)
+{
+ struct timespec ts;
+
+ ktime_get_ts(&ts);
+ mc->ts.monotonic_ns = timespec_to_ns(&ts);
+
+ ktime_get_real_ts(&ts);
+ mc->ts.realtime_ns = timespec_to_ns(&ts);
+
+ if (kmsg)
+ mc->ts.seqnum = kmsg->seq;
+
+ mc->valid |= KDBUS_ATTACH_TIMESTAMP;
+}
+
+static int kdbus_meta_conn_collect_names(struct kdbus_meta_conn *mc,
+ struct kdbus_conn *conn)
+{
+ const struct kdbus_name_entry *e;
+ struct kdbus_item *item;
+ size_t slen, size;
+
+ lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
+
+ size = 0;
+ list_for_each_entry(e, &conn->names_list, conn_entry)
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_name) +
+ strlen(e->name) + 1);
+
+ if (!size)
+ return 0;
+
+ item = kmalloc(size, GFP_KERNEL);
+ if (!item)
+ return -ENOMEM;
+
+ mc->owned_names_items = item;
+ mc->owned_names_size = size;
+
+ list_for_each_entry(e, &conn->names_list, conn_entry) {
+ slen = strlen(e->name) + 1;
+ kdbus_item_set(item, KDBUS_ITEM_OWNED_NAME, NULL,
+ sizeof(struct kdbus_name) + slen);
+ item->name.flags = e->flags;
+ memcpy(item->name.name, e->name, slen);
+ item = KDBUS_ITEM_NEXT(item);
+ }
+
+ /* sanity check: the buffer should be completely written now */
+ WARN_ON((u8 *)item != (u8 *)mc->owned_names_items + size);
+
+ mc->valid |= KDBUS_ATTACH_NAMES;
+ return 0;
+}
+
+static int kdbus_meta_conn_collect_description(struct kdbus_meta_conn *mc,
+ struct kdbus_conn *conn)
+{
+ if (!conn->description)
+ return 0;
+
+ mc->conn_description = kstrdup(conn->description, GFP_KERNEL);
+ if (!mc->conn_description)
+ return -ENOMEM;
+
+ mc->valid |= KDBUS_ATTACH_CONN_DESCRIPTION;
+ return 0;
+}
+
+/**
+ * kdbus_meta_conn_collect() - Collect connection metadata
+ * @mc: Connection metadata object
+ * @kmsg: Kmsg to collect data from
+ * @conn: Connection to collect data from
+ * @what: Attach flags to collect
+ *
+ * This collects connection metadata from @kmsg and @conn and saves it in @mc.
+ *
+ * If KDBUS_ATTACH_NAMES is set in @what and @conn is non-NULL, the caller must
+ * hold the name-registry read-lock of conn->ep->bus->name_registry.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
+ struct kdbus_kmsg *kmsg,
+ struct kdbus_conn *conn,
+ u64 what)
+{
+ int ret;
+
+ if (!mc || !(what & (KDBUS_ATTACH_TIMESTAMP |
+ KDBUS_ATTACH_NAMES |
+ KDBUS_ATTACH_CONN_DESCRIPTION)))
+ return 0;
+
+ mutex_lock(&mc->lock);
+
+ if (kmsg && (what & KDBUS_ATTACH_TIMESTAMP) &&
+ !(mc->collected & KDBUS_ATTACH_TIMESTAMP)) {
+ kdbus_meta_conn_collect_timestamp(mc, kmsg);
+ mc->collected |= KDBUS_ATTACH_TIMESTAMP;
+ }
+
+ if (conn && (what & KDBUS_ATTACH_NAMES) &&
+ !(mc->collected & KDBUS_ATTACH_NAMES)) {
+ ret = kdbus_meta_conn_collect_names(mc, conn);
+ if (ret < 0)
+ goto exit_unlock;
+ mc->collected |= KDBUS_ATTACH_NAMES;
+ }
+
+ if (conn && (what & KDBUS_ATTACH_CONN_DESCRIPTION) &&
+ !(mc->collected & KDBUS_ATTACH_CONN_DESCRIPTION)) {
+ ret = kdbus_meta_conn_collect_description(mc, conn);
+ if (ret < 0)
+ goto exit_unlock;
+ mc->collected |= KDBUS_ATTACH_CONN_DESCRIPTION;
+ }
+
+ ret = 0;
+
+exit_unlock:
+ mutex_unlock(&mc->lock);
+ return ret;
+}
+
+/**
+ * kdbus_meta_export_prepare() - Prepare metadata for export
+ * @mp: Process metadata, or NULL
+ * @mc: Connection metadata, or NULL
+ * @mask: Pointer to mask of KDBUS_ATTACH_* flags to export
+ * @sz: Pointer to return the size needed by the metadata
+ *
+ * Does a conservative calculation of how much space metadata information
+ * will take up during export. It is 'conservative' because for string
+ * translations in namespaces, it will use the kernel namespaces, which is
+ * the longest possible version.
+ *
+ * The actual size consumed by kdbus_meta_export() may hence vary from the
+ * one reported here, but it is guaranteed never to be greater.
+ *
+ * Return: 0 on success, negative error number otherwise.
+ */
+int kdbus_meta_export_prepare(struct kdbus_meta_proc *mp,
+ struct kdbus_meta_conn *mc,
+ u64 *mask, size_t *sz)
+{
+ char *exe_pathname = NULL;
+ void *exe_page = NULL;
+ size_t size = 0;
+ u64 valid = 0;
+ int ret = 0;
+
+ if (mp) {
+ mutex_lock(&mp->lock);
+ valid |= mp->valid;
+ mutex_unlock(&mp->lock);
+ }
+
+ if (mc) {
+ mutex_lock(&mc->lock);
+ valid |= mc->valid;
+ mutex_unlock(&mc->lock);
+ }
+
+ *mask &= valid;
+ *mask &= kdbus_meta_attach_mask;
+
+ if (!*mask)
+ goto exit;
+
+ /* process metadata */
+
+ if (mp && (*mask & KDBUS_ATTACH_CREDS))
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
+
+ if (mp && (*mask & KDBUS_ATTACH_PIDS))
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
+
+ if (mp && (*mask & KDBUS_ATTACH_AUXGROUPS))
+ size += KDBUS_ITEM_SIZE(mp->n_auxgrps * sizeof(u64));
+
+ if (mp && (*mask & KDBUS_ATTACH_TID_COMM))
+ size += KDBUS_ITEM_SIZE(strlen(mp->tid_comm) + 1);
+
+ if (mp && (*mask & KDBUS_ATTACH_PID_COMM))
+ size += KDBUS_ITEM_SIZE(strlen(mp->pid_comm) + 1);
+
+ if (mp && (*mask & KDBUS_ATTACH_EXE)) {
+ exe_page = (void *)__get_free_page(GFP_TEMPORARY);
+ if (!exe_page) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ exe_pathname = d_path(&mp->exe_path, exe_page, PAGE_SIZE);
+ if (IS_ERR(exe_pathname)) {
+ ret = PTR_ERR(exe_pathname);
+ /* don't leak the scratch page on the error path */
+ free_page((unsigned long)exe_page);
+ goto exit;
+ }
+
+ size += KDBUS_ITEM_SIZE(strlen(exe_pathname) + 1);
+ free_page((unsigned long)exe_page);
+ }
+
+ if (mp && (*mask & KDBUS_ATTACH_CMDLINE))
+ size += KDBUS_ITEM_SIZE(strlen(mp->cmdline) + 1);
+
+ if (mp && (*mask & KDBUS_ATTACH_CGROUP))
+ size += KDBUS_ITEM_SIZE(strlen(mp->cgroup) + 1);
+
+ if (mp && (*mask & KDBUS_ATTACH_CAPS))
+ size += KDBUS_ITEM_SIZE(sizeof(mp->caps));
+
+ if (mp && (*mask & KDBUS_ATTACH_SECLABEL))
+ size += KDBUS_ITEM_SIZE(strlen(mp->seclabel) + 1);
+
+ if (mp && (*mask & KDBUS_ATTACH_AUDIT))
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_audit));
+
+ /* connection metadata */
+
+ if (mc && (*mask & KDBUS_ATTACH_NAMES))
+ size += mc->owned_names_size;
+
+ if (mc && (*mask & KDBUS_ATTACH_CONN_DESCRIPTION))
+ size += KDBUS_ITEM_SIZE(strlen(mc->conn_description) + 1);
+
+ if (mc && (*mask & KDBUS_ATTACH_TIMESTAMP))
+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_timestamp));
+
+exit:
+ *sz = size;
+
+ return ret;
+}
+
+static int kdbus_meta_push_kvec(struct kvec *kvec,
+ struct kdbus_item_header *hdr,
+ u64 type, void *payload,
+ size_t payload_size, u64 *size)
+{
+ hdr->type = type;
+ hdr->size = KDBUS_ITEM_HEADER_SIZE + payload_size;
+ kdbus_kvec_set(kvec++, hdr, sizeof(*hdr), size);
+ kdbus_kvec_set(kvec++, payload, payload_size, size);
+ return 2 + !!kdbus_kvec_pad(kvec++, size);
+}
+
+/* This is equivalent to from_kuid_munged(), but maps INVALID_UID to itself */
+static uid_t kdbus_from_kuid_keep(kuid_t uid)
+{
+ return uid_valid(uid) ?
+ from_kuid_munged(current_user_ns(), uid) : ((uid_t)-1);
+}
+
+/* This is equivalent to from_kgid_munged(), but maps INVALID_GID to itself */
+static gid_t kdbus_from_kgid_keep(kgid_t gid)
+{
+ return gid_valid(gid) ?
+ from_kgid_munged(current_user_ns(), gid) : ((gid_t)-1);
+}
+
+/**
+ * kdbus_meta_export() - export information from metadata into a slice
+ * @mp: Process metadata, or NULL
+ * @mc: Connection metadata, or NULL
+ * @mask: Mask of KDBUS_ATTACH_* flags to export
+ * @slice: The slice to export to
+ * @offset: The offset inside @slice to write to
+ * @real_size: The real size the metadata consumed
+ *
+ * This function exports information from metadata into @slice at offset
+ * @offset inside that slice. Only information that is requested in @mask
+ * and that has been collected before is exported.
+ *
+ * In order to make sure not to write out of bounds, @mask must be the same
+ * value that was previously returned from kdbus_meta_export_prepare(). The
+ * function will, however, not necessarily write as many bytes as returned by
+ * kdbus_meta_export_prepare(); depending on the namespaces in question, it
+ * might use up less than that.
+ *
+ * All information will be translated using the current namespaces.
+ *
+ * Return: 0 on success, negative error number otherwise.
+ */
+int kdbus_meta_export(struct kdbus_meta_proc *mp,
+ struct kdbus_meta_conn *mc,
+ u64 mask,
+ struct kdbus_pool_slice *slice,
+ off_t offset,
+ size_t *real_size)
+{
+ struct user_namespace *user_ns = current_user_ns();
+ struct kdbus_item_header item_hdr[13], *hdr;
+ char *exe_pathname = NULL;
+ struct kdbus_creds creds;
+ struct kdbus_pids pids;
+ void *exe_page = NULL;
+ struct kvec kvec[40];
+ u64 *auxgrps = NULL;
+ size_t cnt = 0;
+ u64 size = 0;
+ int ret = 0;
+
+ hdr = &item_hdr[0];
+
+ /*
+ * TODO: We currently have no sane way of translating a set of caps
+ * between different user namespaces. Until that changes, we have
+ * to drop such items.
+ */
+ if (mp && mp->caps_namespace != user_ns)
+ mask &= ~KDBUS_ATTACH_CAPS;
+
+ if (mask == 0) {
+ *real_size = 0;
+ return 0;
+ }
+
+ /* process metadata */
+
+ if (mp && (mask & KDBUS_ATTACH_CREDS)) {
+ creds.uid = kdbus_from_kuid_keep(mp->uid);
+ creds.euid = kdbus_from_kuid_keep(mp->euid);
+ creds.suid = kdbus_from_kuid_keep(mp->suid);
+ creds.fsuid = kdbus_from_kuid_keep(mp->fsuid);
+ creds.gid = kdbus_from_kgid_keep(mp->gid);
+ creds.egid = kdbus_from_kgid_keep(mp->egid);
+ creds.sgid = kdbus_from_kgid_keep(mp->sgid);
+ creds.fsgid = kdbus_from_kgid_keep(mp->fsgid);
+
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++, KDBUS_ITEM_CREDS,
+ &creds, sizeof(creds), &size);
+ }
+
+ if (mp && (mask & KDBUS_ATTACH_PIDS)) {
+ pids.pid = pid_vnr(mp->tgid);
+ pids.tid = pid_vnr(mp->pid);
+ pids.ppid = pid_vnr(mp->ppid);
+
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++, KDBUS_ITEM_PIDS,
+ &pids, sizeof(pids), &size);
+ }
+
+ if (mp && (mask & KDBUS_ATTACH_AUXGROUPS)) {
+ size_t payload_size = mp->n_auxgrps * sizeof(u64);
+ int i;
+
+ auxgrps = kmalloc(payload_size, GFP_KERNEL);
+ if (!auxgrps) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ for (i = 0; i < mp->n_auxgrps; i++)
+ auxgrps[i] = from_kgid_munged(user_ns, mp->auxgrps[i]);
+
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_AUXGROUPS,
+ auxgrps, payload_size, &size);
+ }
+
+ if (mp && (mask & KDBUS_ATTACH_TID_COMM))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_TID_COMM, mp->tid_comm,
+ strlen(mp->tid_comm) + 1, &size);
+
+ if (mp && (mask & KDBUS_ATTACH_PID_COMM))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_PID_COMM, mp->pid_comm,
+ strlen(mp->pid_comm) + 1, &size);
+
+ if (mp && (mask & KDBUS_ATTACH_EXE)) {
+ struct path p;
+
+ /*
+ * TODO: We need access to __d_path() so we can write the path
+ * relative to conn->root_path. Once upstream, we need
+ * EXPORT_SYMBOL(__d_path) or an equivalent of d_path() that
+ * takes the root path directly. Until then, we drop this item
+ * if the root-paths differ.
+ */
+
+ get_fs_root(current->fs, &p);
+ if (path_equal(&p, &mp->root_path)) {
+ exe_page = (void *)__get_free_page(GFP_TEMPORARY);
+ if (!exe_page) {
+ path_put(&p);
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ exe_pathname = d_path(&mp->exe_path, exe_page,
+ PAGE_SIZE);
+ if (IS_ERR(exe_pathname)) {
+ path_put(&p);
+ ret = PTR_ERR(exe_pathname);
+ goto exit;
+ }
+
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_EXE,
+ exe_pathname,
+ strlen(exe_pathname) + 1,
+ &size);
+ }
+ path_put(&p);
+ }
+
+ if (mp && (mask & KDBUS_ATTACH_CMDLINE))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_CMDLINE, mp->cmdline,
+ strlen(mp->cmdline) + 1, &size);
+
+ if (mp && (mask & KDBUS_ATTACH_CGROUP))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_CGROUP, mp->cgroup,
+ strlen(mp->cgroup) + 1, &size);
+
+ if (mp && (mask & KDBUS_ATTACH_CAPS))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_CAPS, &mp->caps,
+ sizeof(mp->caps), &size);
+
+ if (mp && (mask & KDBUS_ATTACH_SECLABEL))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_SECLABEL, mp->seclabel,
+ strlen(mp->seclabel) + 1, &size);
+
+ if (mp && (mask & KDBUS_ATTACH_AUDIT)) {
+ struct kdbus_audit a = {
+ .loginuid = from_kuid(user_ns, mp->audit_loginuid),
+ .sessionid = mp->audit_sessionid,
+ };
+
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++, KDBUS_ITEM_AUDIT,
+ &a, sizeof(a), &size);
+ }
+
+ /* connection metadata */
+
+ if (mc && (mask & KDBUS_ATTACH_NAMES))
+ kdbus_kvec_set(&kvec[cnt++], mc->owned_names_items,
+ mc->owned_names_size, &size);
+
+ if (mc && (mask & KDBUS_ATTACH_CONN_DESCRIPTION))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_CONN_DESCRIPTION,
+ mc->conn_description,
+ strlen(mc->conn_description) + 1,
+ &size);
+
+ if (mc && (mask & KDBUS_ATTACH_TIMESTAMP))
+ cnt += kdbus_meta_push_kvec(kvec + cnt, hdr++,
+ KDBUS_ITEM_TIMESTAMP, &mc->ts,
+ sizeof(mc->ts), &size);
+
+ ret = kdbus_pool_slice_copy_kvec(slice, offset, kvec, cnt, size);
+ *real_size = size;
+
+exit:
+ kfree(auxgrps);
+
+ if (exe_page)
+ free_page((unsigned long)exe_page);
+
+ return ret;
+}
+
+/**
+ * kdbus_meta_calc_attach_flags() - calculate attach flags for a sender
+ * and a receiver
+ * @sender: Sending connection
+ * @receiver: Receiving connection
+ *
+ * Return: the attach flags both the sender and the receiver have opted-in
+ * for.
+ */
+u64 kdbus_meta_calc_attach_flags(const struct kdbus_conn *sender,
+ const struct kdbus_conn *receiver)
+{
+ return atomic64_read(&sender->attach_flags_send) &
+ atomic64_read(&receiver->attach_flags_recv);
+}
diff --git a/ipc/kdbus/metadata.h b/ipc/kdbus/metadata.h
new file mode 100644
index 000000000000..42c942b34d2c
--- /dev/null
+++ b/ipc/kdbus/metadata.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2013-2015 Kay Sievers
+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2015 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2015 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2015 Linux Foundation
+ * Copyright (C) 2014-2015 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_METADATA_H
+#define __KDBUS_METADATA_H
+
+#include <linux/kernel.h>
+
+struct kdbus_conn;
+struct kdbus_kmsg;
+struct kdbus_pool_slice;
+
+struct kdbus_meta_proc;
+struct kdbus_meta_conn;
+
+extern unsigned long long kdbus_meta_attach_mask;
+
+struct kdbus_meta_proc *kdbus_meta_proc_new(void);
+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp);
+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp);
+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what);
+int kdbus_meta_proc_fake(struct kdbus_meta_proc *mp,
+ const struct kdbus_creds *creds,
+ const struct kdbus_pids *pids,
+ const char *seclabel);
+
+struct kdbus_meta_conn *kdbus_meta_conn_new(void);
+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc);
+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc);
+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
+ struct kdbus_kmsg *kmsg,
+ struct kdbus_conn *conn,
+ u64 what);
+
+int kdbus_meta_export_prepare(struct kdbus_meta_proc *mp,
+ struct kdbus_meta_conn *mc,
+ u64 *mask, size_t *sz);
+int kdbus_meta_export(struct kdbus_meta_proc *mp,
+ struct kdbus_meta_conn *mc,
+ u64 mask,
+ struct kdbus_pool_slice *slice,
+ off_t offset, size_t *real_size);
+u64 kdbus_meta_calc_attach_flags(const struct kdbus_conn *sender,
+ const struct kdbus_conn *receiver);
+
+#endif
--
2.3.1
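
A note on the size accounting in kdbus_meta_export_prepare() above: each
item is costed with KDBUS_ITEM_SIZE(), while kdbus_meta_push_kvec() stores
the unpadded size in the item header and emits the padding as a separate
kvec. The sketch below illustrates that arithmetic in plain C. It assumes
a 16-byte item header (two 64-bit fields, size and type) and 8-byte
alignment; those constants and all ex_* names are illustrative assumptions,
not copied from the kdbus headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed header layout: u64 size followed by u64 type. */
#define EX_ITEM_HEADER_SIZE ((size_t)16)

/* Round up to the next multiple of 8. */
static size_t ex_align8(size_t v)
{
	return (v + 7) & ~(size_t)7;
}

/* Space one item occupies in the output, including trailing padding. */
static size_t ex_item_size(size_t payload_size)
{
	return ex_align8(EX_ITEM_HEADER_SIZE + payload_size);
}

/* Upper bound for a list of string items; each payload includes its NUL. */
static size_t ex_estimate_strings(const char *const *strs, size_t n)
{
	size_t i, total = 0;

	assert(strs != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += ex_item_size(strlen(strs[i]) + 1);
	return total;
}
```

Under these assumptions a 2-byte payload occupies 24 bytes: 16 for the
header, 2 for the data and 6 of padding.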

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Nicolas Iooss
2015-03-15 05:17:18 UTC
s/receveiver/receiver/

Signed-off-by: Nicolas Iooss <***@m4x.org>
---
samples/kdbus/kdbus-workers.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/kdbus/kdbus-workers.c b/samples/kdbus/kdbus-workers.c
index d1d8f7a7697b..d331e0186899 100644
--- a/samples/kdbus/kdbus-workers.c
+++ b/samples/kdbus/kdbus-workers.c
@@ -787,8 +787,8 @@ static int child_run(struct child *c)
* The 2nd item contains a vector to memory we want to send. It
* can be content of any type. In our case, we're sending a one-byte
* string only. The memory referenced by this item will be copied into
- * the pool of the receveiver connection, and does not need to be
- * valid after the command is employed.
+ * the pool of the receiver connection, and does not need to be valid
+ * after the command is employed.
*/
item = KDBUS_ITEM_NEXT(item);
item->type = KDBUS_ITEM_PAYLOAD_VEC;
--
2.3.2

Greg KH
2015-03-15 09:32:26 UTC
Post by Nicolas Iooss
s/receveiver/receiver/
---
samples/kdbus/kdbus-workers.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Now applied, thanks.

greg k-h
Andy Lutomirski
2015-03-17 19:25:20 UTC
On Mon, Mar 9, 2015 at 6:09 AM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
[...]

What changed from last time?

The discussion from last time about performance seems to have stalled.
kdbus is supposed to be fast -- that seems to be a large part of the
point. Can anyone comment on how fast it actually is? I'm curious
about:

- The time it takes to do the ioctl to send a message

- The time it takes to receive a message (poll + whatever ioctls)

- The time it takes to transfer a memfd (I don't care about how long
it takes to create or map a memfd -- that's exactly the same between
kdbus and any other memfds user, I imagine)

- The time it takes to connect

I'm also interested in whether the current design is actually amenable
to good performance. I'm told that it is, but ISTM there's a lot of
heavyweight stuff going on with each send operation that will be hard
to remove.

--Andy
David Herrmann
2015-03-18 13:54:22 UTC
Hi

On Tue, Mar 17, 2015 at 8:24 PM, Andy Lutomirski <***@amacapital.net> wrote:
[...]
Post by Andy Lutomirski
Can anyone comment on how fast it actually is. I'm curious
- The time it takes to do the ioctl to send a message
- The time it takes to receive a message (poll + whatever ioctls)

I'm not sure I can gather useful absolute data here. This highly
depends on how you call it, how often you call it, what payloads you
pass, what machine you're on.. you know all that.

So here's a flamegraph for you (which I use for comparisons to UDS):
http://people.freedesktop.org/~dvdhrm/kdbus_8kb.svg

This program sends unicast messages on kdbus and UDS, exactly the same
number of times with the same 8kb payload. No parsing, no marshaling
is done, just plain message passing. The interesting spikes are
sys_read(), sys_write() and the 3 kdbus sys_ioctl(). Everything else
should be ignored.
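
For readers who want to reproduce the UDS half of such a comparison, here
is a minimal sketch. It is not the benchmark program used for the
flamegraph; ex_bench_uds() and ex_now_ns() are hypothetical names, and the
loop writes and reads within a single process, so it measures syscall and
copy cost only, not scheduling between two peers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in nanoseconds. */
static int64_t ex_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Send and receive `count` datagrams of `len` bytes over a UNIX socketpair.
 * Returns the average nanoseconds per write()+read() pair, or -1 on error.
 */
static int64_t ex_bench_uds(size_t len, unsigned int count)
{
	int64_t start, avg = -1;
	unsigned int i;
	int sv[2];
	char *buf;

	assert(len > 0 && count > 0);

	buf = malloc(len);
	if (!buf)
		return -1;
	memset(buf, 'x', len);

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
		free(buf);
		return -1;
	}

	start = ex_now_ns();
	for (i = 0; i < count; i++) {
		if (write(sv[0], buf, len) != (ssize_t)len ||
		    read(sv[1], buf, len) != (ssize_t)len)
			goto out;
	}
	avg = (ex_now_ns() - start) / count;

out:
	close(sv[0]);
	close(sv[1]);
	free(buf);
	return avg;
}
```

Calling ex_bench_uds(8192, n) and ex_bench_uds(64, n) gives the two
payload sizes discussed in this thread; absolute numbers will differ per
machine.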

sys_read() and sys_ioctl(KDBUS_CMD_RECV) are about the same. But note
that we don't copy any payload in RECV, so it scales O(1) compared to
message-size.

sys_write() is about 3x faster than sys_ioctl(KDBUS_CMD_SEND).

I see lots of room for improvement in both RECV and SEND. Caching the
namespaces on a connection would get rid of
kdbus_queue_entry_install() in RECV, thus speeding it up by ~30%. In
SEND, we could merge the kvec and iovec copying, to avoid calling
shmem_begin_write() twice. We should also stop allocating management
structures that are not used (like for metadata, if no metadata is
transmitted). We should use stack-space for small ioctl objects,
instead of memdup_user(). And so on.. Oh, and locking can be reduced.
We haven't even looked at rcu, yet (though that's mostly interesting
for policy and broadcasts, not unicasts).
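
One of the ideas above, using stack space for small ioctl objects instead
of memdup_user(), is a small-buffer optimization. A userspace analogue, as
a sketch only: ex_dup_small() and ex_dup_free() are hypothetical helpers,
and malloc()/memcpy() stand in for the kernel's allocation and
copy-from-user steps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a private copy of `src`. Small objects land in the caller's
 * stack buffer; only large ones pay for a heap allocation.
 */
static void *ex_dup_small(const void *src, size_t len,
			  void *stack_buf, size_t stack_len)
{
	void *dst;

	assert(src != NULL && stack_buf != NULL);

	dst = len <= stack_len ? stack_buf : malloc(len);
	if (dst)
		memcpy(dst, src, len);
	return dst;
}

/* Release a copy made by ex_dup_small(); a no-op for the stack case. */
static void ex_dup_free(void *p, const void *stack_buf)
{
	if (p != stack_buf)
		free(p);
}
```

The caller keeps the stack buffer alive for as long as the copy is in use
and always pairs ex_dup_small() with ex_dup_free().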

Post by Andy Lutomirski
- The time it takes to transfer a memfd (I don't care about how long
it takes to create or map a memfd -- that's exactly the same between
kdbus and any other memfds user, I imagine)

The time to transmit a memfd is the same as to transmit a 64-byte
payload. Ok, you also get to install the fd into the fd-table, but
that's true regardless of the transport.

Here's a graph for 64byte transfers (otherwise, same as above):
http://people.freedesktop.org/~dvdhrm/kdbus_64b.svg

Post by Andy Lutomirski
- The time it takes to connect

No idea, never measured it. Why is it of interest?

Post by Andy Lutomirski
I'm also interested in whether the current design is actually amenable
to good performance. I'm told that it is, but ISTM there's a lot of
heavyweight stuff going on with each send operation that will be hard
to remove.

I disagree. What heavyweight stuff is going on?

Thanks
David
Andy Lutomirski
2015-03-18 18:24:47 UTC
Post by David Herrmann
Hi
[...]
Post by Andy Lutomirski
Can anyone comment on how fast it actually is. I'm curious
- The time it takes to do the ioctl to send a message
- The time it takes to receive a message (poll + whatever ioctls)
I'm not sure I can gather useful absolute data here. This highly
depends on how you call it, how often you call it, what payloads you
pass, what machine you're on.. you know all that.
http://people.freedesktop.org/~dvdhrm/kdbus_8kb.svg
This program sends unicast messages on kdbus and UDS, exactly the same
number of times with the same 8kb payload. No parsing, no marshaling
is done, just plain message passing. The interesting spikes are
sys_read(), sys_write() and the 3 kdbus sys_ioctl(). Everything else
should be ignored.
sys_read() and sys_ioctl(KDBUS_CMD_RECV) are about the same. But note
that we don't copy any payload in RECV, so it scales O(1) compared to
message-size.
sys_write() is about 3x faster than sys_ioctl(KDBUS_CMD_SEND).

Is that factor of 3 for 8 kb payloads? If so, I expect it's a factor
of much worse than 3 for small payloads.

Post by David Herrmann
I see lots of room for improvement in both RECV and SEND. Caching the
namespaces on a connection, would get rid of
kdbus_queue_entry_install() in RECV, thus speeding it up by ~30%. In
SEND, we could merge the kvec and iovec copying, to avoid calling
shmem_begin_write() twice. We should also stop allocating management
structures that are not used (like for metadata, if no metadata is
transmitted). We should use stack-space for small ioctl objects,
instead of memdup_user(). And so on.. Oh, and locking can be reduced.
We haven't even looked at rcu, yet (though that's mostly interesting
for policy and broadcasts, not unicasts).
Post by Andy Lutomirski
- The time it takes to transfer a memfd (I don't care about how long
it takes to create or map a memfd -- that's exactly the same between
kdbus and any other memfds user, I imagine)
The time to transmit a memfd is the same as to transmit a 64-byte
payload. Ok, you also get to install the fd into the fd-table, but
that's true regardless of the transport.
http://people.freedesktop.org/~dvdhrm/kdbus_64b.svg
Post by Andy Lutomirski
- The time it takes to connect
No idea, never measured it. Why is it of interest?

Gah, sorry, bad terminology. I mean the time it takes to send a
message to a receiver that you haven't sent to before.

(The kdbus terminology is weird. You don't send to "endpoints", you
don't "connect" to other participants, and it's not even clear to me
what a participant in the bus is called.)

Post by David Herrmann
Post by Andy Lutomirski
I'm also interested in whether the current design is actually amenable
to good performance. I'm told that it is, but ISTM there's a lot of
heavyweight stuff going on with each send operation that will be hard
to remove.
I disagree. What heavyweight stuff is going on?

At least metadata generation, metadata free, and policy db checks seem
expensive. It could be worth running a bunch of copies of your
benchmark on different cores and seeing how it scales.

--Andy
David Herrmann
2015-03-19 11:27:05 UTC
Hi
[...]
Post by Andy Lutomirski
Post by David Herrmann
This program sends unicast messages on kdbus and UDS, exactly the same
number of times with the same 8kb payload. No parsing, no marshaling
is done, just plain message passing. The interesting spikes are
sys_read(), sys_write() and the 3 kdbus sys_ioctl(). Everything else
should be ignored.
sys_read() and sys_ioctl(KDBUS_CMD_RECV) are about the same. But note
that we don't copy any payload in RECV, so it scales O(1) compared to
message-size.
sys_write() is about 3x faster than sys_ioctl(KDBUS_CMD_SEND).
Is that factor of 3 for 8 kb payloads? If so, I expect it's a factor
of much worse than 3 for small payloads.

Yes, factor of 3x for 8kb payloads. ~3.8x for 64byte payloads (see the
second flamegraph I linked, which contains data for 64byte payloads
(which is basically nothing)).

Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
- The time it takes to connect
No idea, never measured it. Why is it of interest?
Gah, sorry, bad terminology. I mean the time it takes to send a
message to a receiver that you haven't sent to before.

Cold message transactions are horribly slow for both kdbus and UDS,
and the performance varies heavily (factor of 10x). I haven't figured
out whether it's icache/dcache misses, cold branch predictor, process
mem faults, scheduler, whatever..

What I can say, is the kdbus paths are more expensive, in both LOC and
execution time. We might be able to improve the cold-transaction
performance with _unlikely_() annotations, shortcuts, etc. But I want
much more benchmark data before I try to outsmart the compiler. It
works reasonably well right now.

Note that from an algorithmic view, there's no difference between the
first transaction and a following transaction. All relevant accesses
are O(1).

(Actually looking at the numbers again, worst-case vs. average-case in
message transaction is exactly 10x for both, UDS and kdbus. Skipping
the first couple, I get <2x. std-dev is roughly 2%)
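
The comparison of worst case against average, with and without the first
few cold samples, can be computed as sketched below. ex_latency_stats()
and struct ex_stats are hypothetical names; the function returns the
variance rather than the standard deviation so that the block needs no
math library (take the square root to get the std-dev).

```c
#include <assert.h>
#include <stddef.h>

struct ex_stats {
	double mean;		/* average latency */
	double worst;		/* largest sample */
	double variance;	/* population variance of the samples */
};

/* Summarize samples ns[skip..n-1], ignoring `skip` warm-up samples. */
static struct ex_stats ex_latency_stats(const double *ns, size_t n,
					size_t skip)
{
	struct ex_stats s = { 0.0, 0.0, 0.0 };
	double sum = 0.0, sq = 0.0;
	size_t i, cnt;

	assert(ns != NULL && skip < n);
	cnt = n - skip;

	for (i = skip; i < n; i++) {
		sum += ns[i];
		if (ns[i] > s.worst)
			s.worst = ns[i];
	}
	s.mean = sum / (double)cnt;

	for (i = skip; i < n; i++)
		sq += (ns[i] - s.mean) * (ns[i] - s.mean);
	s.variance = sq / (double)cnt;

	return s;
}
```

With one cold sample of 1000 ns followed by four of 100 ns, the worst case
is about 3.6x the mean; skipping the first sample brings the ratio to 1x.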

Post by Andy Lutomirski
(The kdbus terminology is weird. You don't send to "endpoints", you
don't "connect" to other participants, and it's not even clear to me
what a participant in the bus is called.)

A participant is called a "connection" or "peer" (I prefer 'peer'). It
"connects" to a bus via an endpoint of the bus. Endpoints are
file-system entries and can be shared, and usually are.

Unlike binder, kdbus does not know peer-to-peer links. That is, there
is never (not even a temporary) link between only two peers. Messages
are always sent to the bus, and the bus makes sure only the addressed
recipients will get the message.

Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
I'm also interested in whether the current design is actually amenable
to good performance. I'm told that it is, but ISTM there's a lot of
heavyweight stuff going on with each send operation that will be hard
to remove.
I disagree. What heavyweight stuff is going on?
At least metadata generation, metadata free, and policy db checks seem
expensive. It could be worth running a bunch of copies of your
benchmark on different cores and seeing how it scales.

Metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel. Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Only if both peers set the metadata mask, it
will be transmitted.
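
The opt-in rule is a bitwise intersection of the two masks, which is what
kdbus_meta_calc_attach_flags() in the patch computes. A minimal userspace
sketch follows; the EX_ATTACH_* bit values and ex_calc_attach_flags() are
made up for illustration and are not the real KDBUS_ATTACH_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; not the real KDBUS_ATTACH_* values. */
#define EX_ATTACH_TIMESTAMP	(UINT64_C(1) << 0)
#define EX_ATTACH_CREDS		(UINT64_C(1) << 1)
#define EX_ATTACH_PIDS		(UINT64_C(1) << 2)

/* Metadata is attached only for bits that both sides opted in to. */
static uint64_t ex_calc_attach_flags(uint64_t send_mask, uint64_t recv_mask)
{
	return send_mask & recv_mask;
}
```

If the sender offers CREDS and PIDS and the receiver asks for PIDS and
TIMESTAMP, only PIDS is attached; an empty mask on either side means no
metadata is attached.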

The policy-db does indeed look like a bottleneck. Until v2 we used to
have a policy-cache, which I ripped out as it didn't meet our
expectations. There are plans to rewrite it, but it's low-priority.
Thing is, policy-setup is bus-privileged. As long as it's done in a
sane manner (keeping the entries per name minimal), the hash-table
based name-lookup gives suitable performance. Only if the number of
entries per name rises, it gets problematic due to O(n)
list-traversal. But even that could be optimized without a policy
cache, by merging matching entries (see kdbus_policy_db_entry_access).
Furthermore, the policy-db is skipped for privileged peers or if both,
sender and recipient, trust each other (eg., have the same
endpoint+uid). Thus, if you have a trusted transaction, the policy-db
is skipped, anyway.
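
The lookup cost described here can be pictured with the sketch below: a
hash table keyed by name, a linear list of rules per name, and a fast path
that bypasses the table. This is not the kdbus policy-db code. All ex_*
names are hypothetical, each rule is reduced to a single allowed uid, and
"trusted" is simplified to "same uid" (the real check also compares the
endpoint).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define EX_BUCKETS 64

struct ex_access {		/* one "this uid may talk" rule */
	uid_t uid;
	struct ex_access *next;
};

struct ex_entry {		/* all rules for one name */
	const char *name;
	struct ex_access *rules;
	struct ex_entry *next;	/* hash-bucket chain */
};

struct ex_policy_db {
	struct ex_entry *buckets[EX_BUCKETS];
};

static unsigned int ex_hash(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % EX_BUCKETS;
}

static void ex_policy_add(struct ex_policy_db *db, struct ex_entry *e)
{
	unsigned int b = ex_hash(e->name);

	e->next = db->buckets[b];
	db->buckets[b] = e;
}

static bool ex_may_talk(const struct ex_policy_db *db, bool privileged,
			uid_t sender, uid_t owner, const char *name)
{
	const struct ex_entry *e;
	const struct ex_access *a;

	assert(db != NULL && name != NULL);

	/* Fast path: privileged or mutually trusted peers skip the db. */
	if (privileged || sender == owner)
		return true;

	/* Expected O(1): find the entry for this name. */
	for (e = db->buckets[ex_hash(name)]; e; e = e->next)
		if (strcmp(e->name, name) == 0)
			break;
	if (!e)
		return false;

	/* O(n) in the number of rules stored for this name. */
	for (a = e->rules; a; a = a->next)
		if (a->uid == sender)
			return true;
	return false;
}
```

Keeping the per-name rule list short, or merging equivalent rules, bounds
the second loop, which is the part that grows with policy size.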

One real bottleneck I see is the name-registry rwlock.
Acquiring/releasing names is still a heavy operation, as it blocks the
whole bus due to acquiring the write-lock. Again, I have plans to
optimize it (srcu would be an idea, syncing on name-acquire/release),
but it's an implementation detail.

Thanks
David
Andy Lutomirski
2015-03-19 15:49:12 UTC
Post by David Herrmann
Hi
[...]
Post by Andy Lutomirski
Post by David Herrmann
This program sends unicast messages on kdbus and UDS, exactly the same
number of times with the same 8kb payload. No parsing, no marshaling
is done, just plain message passing. The interesting spikes are
sys_read(), sys_write() and the 3 kdbus sys_ioctl(). Everything else
should be ignored.
sys_read() and sys_ioctl(KDBUS_CMD_RECV) are about the same. But note
that we don't copy any payload in RECV, so it scales O(1) compared to
message-size.
sys_write() is about 3x faster than sys_ioctl(KDBUS_CMD_SEND).
Is that factor of 3 for 8 kb payloads? If so, I expect it's a factor
of much worse than 3 for small payloads.
Yes, factor of 3x for 8kb payloads. ~3.8x for 64byte payloads (see the
second flamegraph I linked, which contains data for 64byte payloads
(which is basically nothing)).
I find this surprising. Are both of them so slow that copying 8kb is
negligible? That's rather sad.
Post by David Herrmann
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
- The time it takes to connect
No idea, never measured it. Why is it of interest?
Gah, sorry, bad terminology. I mean the time it takes to send a
message to a receiver that you haven't sent to before.
Cold message transactions are horribly slow for both, kdbus and UDS,
and the performance varies heavily (factor of 10x). I haven't figured
out whether it's icache/dcache misses, cold branch predictor, process
mem faults, scheduler, whatever..
What I can say, is the kdbus paths are more expensive, in both LOC and
execution time. We might be able to improve the cold-transaction
performance with _unlikely_() annotations, shortcuts, etc. But I want
much more benchmark data before I try to outsmart the compiler. It
works reasonably well right now.
Note that from an algorithmic view, there's no difference between the
first transaction and a following transaction. All relevant accesses
are O(1).
(Actually looking at the numbers again, worst-case vs. average-case in
message transaction is exactly 10x for both, UDS and kdbus. Skipping
the first couple, I get <2x. std-dev is roughly 2%)
Post by Andy Lutomirski
(The kdbus terminology is weird. You don't send to "endpoints", you
don't "connect" to other participants, and it's not even clear to me
what a participant in the bus is called.)
A participant is called a "connection" or "peer" (I prefer 'peer'). It
"connects" to a bus via an endpoint of the bus. Endpoints are
file-system entries and can be shared, and usually are.
Unlike binder, kdbus does not know peer-to-peer links. That is, there
is never (not even a temporary) link between only two peers. Messages
are always sent to the bus, and the bus makes sure only the addressed
recipients will get the message.
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
I'm also interested in whether the current design is actually amenable
to good performance. I'm told that it is, but ISTM there's a lot of
heavyweight stuff going on with each send operation that will be hard
to remove.
I disagree. What heavyweight stuff is going on?
At least metadata generation, metadata free, and policy db checks seem
expensive. It could be worth running a bunch of copies of your
benchmark on different cores and seeing how it scales.
metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel.
Sure it does if it writes to shared cachelines. Given that you're
incrementing refcounts, I'm reasonably sure that you're touching lots
of shared cachelines.
Post by David Herrmann
Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Only if both peers set the metadata mask, it
will be transmitted.
But you're comparing to the wrong thing, IMO. Of course it's much
faster than /proc hackery, but it's probably much slower to do the
metadata operation once per message than to do it when you connect to
the endpoint. (Gah! It's a "bus" that could easily have tons of
users but a single "endpoint". I'm still not used to it.)
Post by David Herrmann
The policy-db does indeed look like a bottleneck. Until v2 we used to
have a policy-cache, which I ripped out as it didn't meet our
expectations. There are plans to rewrite it, but it's low-priority.
Thing is, policy-setup is bus-privileged. As long as it's done in a
sane manner (keeping the entries per name minimal), the hash-table
based name-lookup gives suitable performance. Only if the number of
entries per name rises, it gets problematic due to O(n)
list-traversal. But even that could be optimized without a policy
cache, by merging matching entries (see kdbus_policy_db_entry_access).
Furthermore, the policy-db is skipped for privileged peers or if both,
sender and recipient, trust each other (eg., have the same
endpoint+uid). Thus, if you have a trusted transaction, the policy-db
is skipped, anyway.
Yeah, that's reasonable. I don't see any obvious way around that.
(The policy semantics are still insane wrt connections with multiple
names, though, but that should have nothing to do with performance.
Insanity for historical reasons is still insanity.)

--Andy
David Herrmann
2015-03-23 15:29:05 UTC
Hi
Post by Andy Lutomirski
Post by David Herrmann
metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel.
Sure it does if it writes to shared cachelines. Given that you're
incrementing refcounts, I'm reasonably sure that you're touching lots
of shared cachelines.
Ok, sure, but it's still mostly local to the sending task. We take
locks and ref-counts on the task-struct and mm, which is for most
parts local to the CPU the task runs on. But this is inherent to
accessing this kind of data, which is the fundamental difference in
our views here, as seen below..
Post by Andy Lutomirski
Post by David Herrmann
Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Only if both peers set the metadata mask, it
will be transmitted.
But you're comparing to the wrong thing, IMO. Of course it's much
faster than /proc hackery, but it's probably much slower to do the
metadata operation once per message than to do it when you connect to
the endpoint. (Gah! It's a "bus" that could easily have tons of
users but a single "endpoint". I'm still not used to it.)
Yes, of course your assumption is right if you compare against
per-connection caches, instead of per-message metadata. But we do
support _both_ use-cases, so we don't impose any policy.
We still believe "live"-metadata is a crucial feature of kdbus,
despite the known performance penalties.

Thanks
David
Andy Lutomirski
2015-03-23 23:24:33 UTC
Post by David Herrmann
Hi
Post by Andy Lutomirski
Post by David Herrmann
metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel.
Sure it does if it writes to shared cachelines. Given that you're
incrementing refcounts, I'm reasonably sure that you're touching lots
of shared cachelines.
Ok, sure, but it's still mostly local to the sending task. We take
locks and ref-counts on the task-struct and mm, which is for most
parts local to the CPU the task runs on. But this is inherent to
accessing this kind of data, which is the fundamental difference in
our views here, as seen below..
You're also refcounting the struct cred, and there's no good reason
for that to be local. (It might be a bit more local than intended
because of the absurd things that the key subsystem does to struct
cred, but IMO users should turn that off or the kernel should fix it.)

Even more globally, I think you're touching init_user_ns's refcount in
most scenarios. That's about as global as it gets.

(Also, is there an easy benchmark to see how much time it takes to
send and receive metadata? I tried to get the kdbus test to do this,
and I failed. I probably did it wrong.)
Post by David Herrmann
Post by Andy Lutomirski
Post by David Herrmann
Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Only if both peers set the metadata mask, it
will be transmitted.
But you're comparing to the wrong thing, IMO. Of course it's much
faster than /proc hackery, but it's probably much slower to do the
metadata operation once per message than to do it when you connect to
the endpoint. (Gah! It's a "bus" that could easily have tons of
users but a single "endpoint". I'm still not used to it.)
Yes, of course your assumption is right if you compare against
per-connection caches, instead of per-message metadata. But we do
support _both_ use-cases, so we don't impose any policy.
We still believe "live"-metadata is a crucial feature of kdbus,
despite the known performance penalties.
And you still have not described a single use case for which it's
better than per-connection metadata.

I'm against adding a feature to the kernel (per-message metadata) if
the primary reason it's being added is to support what appears to be a
misfeature in *new* userspace that we have no obligation whatsoever to
be ABI-compatible with. This is especially true if that feature is
slower than the alternatives. This is even more true if this feature
is *inconsistent* with legacy userspace (i.e. userspace dbus).

I could be wrong about the lack of use cases. If so, please enlighten me.

--Andy
Eric W. Biederman
2015-03-24 00:24:31 UTC
Post by Andy Lutomirski
Post by David Herrmann
Hi
Post by Andy Lutomirski
But you're comparing to the wrong thing, IMO. Of course it's much
faster than /proc hackery, but it's probably much slower to do the
metadata operation once per message than to do it when you connect to
the endpoint. (Gah! It's a "bus" that could easily have tons of
users but a single "endpoint". I'm still not used to it.)
Yes, of course your assumption is right if you compare against
per-connection caches, instead of per-message metadata. But we do
support _both_ use-cases, so we don't impose any policy.
We still believe "live"-metadata is a crucial feature of kdbus,
despite the known performance penalties.
And you still have not described a single use case for which it's
better than per-connection metadata.
I'm against adding a feature to the kernel (per-message metadata) if
the primary reason it's being added is to support what appears to be a
misfeature in *new* userspace that we have no obligation whatsoever to
be ABI-compatible with. This is especially true if that feature is
slower than the alternatives. This is even more true if this feature
is *inconsistent* with legacy userspace (i.e. userspace dbus).
I could be wrong about the lack of use cases. If so, please enlighten me.
Please. I asked this same question on the first revision of this code
to go up for review and I was told that there are no existing users of
dbus that care.

Right now this looks like a case of the deplorable habit of getting
review comments, and then resubmitting a patch with trivial changes and
ignoring the substantial review comments. Again and again and again
until the reviewers run out of energy to object.

That seems like a very poor way to add a new ABI to the kernel that
we will have to support for the next 20 years, because huge swaths of
userspace are going to be using it.

Not to mention that per-message metadata is known to be not only a
performance problem but also something that tends to turn into a
security problem. It is the kind of information that is easy to mess up
and hard to support long term.

So I agree with Andy that we really need something better than "it would
be nice to have".

Eric
David Herrmann
2015-03-25 17:29:29 UTC
Hi
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
Post by David Herrmann
metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel.
Sure it does if it writes to shared cachelines. Given that you're
incrementing refcounts, I'm reasonably sure that you're touching lots
of shared cachelines.
Ok, sure, but it's still mostly local to the sending task. We take
locks and ref-counts on the task-struct and mm, which is for most
parts local to the CPU the task runs on. But this is inherent to
accessing this kind of data, which is the fundamental difference in
our views here, as seen below..
You're also refcounting the struct cred
No?

We do ref-count the group-info, but that is actually redundant as we
just copy the IDs. We should drop this, since group-info of 'current'
can be accessed right away. I noted it down.
Post by Andy Lutomirski
and there's no good reason
for that to be local. (It might be a bit more local than intended
because of the absurd things that the key subsystem does to struct
cred, but IMO users should turn that off or the kernel should fix it.)
Even more globally, I think you're touching init_user_ns's refcount in
most scenarios. That's about as global as it gets.
get_user_ns() in metadata.c is a workaround (as the comment there
explains). With better export-helpers for caps, we can simply drop it.
It's conditional on KDBUS_ATTACH_CAPS, anyway.
Post by Andy Lutomirski
(Also, is there an easy benchmark to see how much time it takes to
send and receive metadata? I tried to get the kdbus test to do this,
and I failed. I probably did it wrong.)
patch for out-of-tree kdbus:
https://gist.github.com/dvdhrm/3ac4339bf94fadc13b98

Update it to pass _KDBUS_ATTACH_ALL for both arguments of
kdbus_conn_update_attach_flags().
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
Post by David Herrmann
Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Only if both peers set the metadata mask, it
will be transmitted.
But you're comparing to the wrong thing, IMO. Of course it's much
faster than /proc hackery, but it's probably much slower to do the
metadata operation once per message than to do it when you connect to
the endpoint. (Gah! It's a "bus" that could easily have tons of
users but a single "endpoint". I'm still not used to it.)
Yes, of course your assumption is right if you compare against
per-connection caches, instead of per-message metadata. But we do
support _both_ use-cases, so we don't impose any policy.
We still believe "live"-metadata is a crucial feature of kdbus,
despite the known performance penalties.
[...]
Post by Andy Lutomirski
This is even more true if this feature
is *inconsistent* with legacy userspace (i.e. userspace dbus).
Live metadata is already supported on UDS via SCM_CREDENTIALS, we just
extend it to other metadata items. It's not a new invention by us.
Debian code-search on SO_PASSCRED and SCM_CREDENTIALS gives lots of
results.

Netlink, as a major example of an existing bus API, already uses
SCM_CREDENTIALS as primary way to transmit metadata.
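
For readers unfamiliar with the UDS mechanism referred to here, the
following is a minimal sketch of receiving SCM_CREDENTIALS on a Unix
datagram socket once the receiver has enabled SO_PASSCRED. The helper
names recv_with_creds() and demo_passcred() are invented for this
example; the socket calls themselves are the standard Linux ones.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                  /* needed for struct ucred */
#endif
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Receive one message plus the sender credentials the kernel attached
 * to it. Returns 0 on success, -1 if receiving failed or no
 * SCM_CREDENTIALS control message was present.
 */
static int recv_with_creds(int fd, char *buf, size_t len, struct ucred *out)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;        /* forces correct alignment */
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *c;

    if (recvmsg(fd, &msg, 0) < 0)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET &&
            c->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}

/* Send one datagram to ourselves and report the credentials received. */
static int demo_passcred(struct ucred *out)
{
    int sv[2], one = 1, ret = -1;
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* The receiver opts in; the kernel then stamps each message. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &one, sizeof(one)) == 0 &&
        send(sv[0], "hello", 5, 0) == 5 &&
        recv_with_creds(sv[1], buf, sizeof(buf), out) == 0)
        ret = 0;

    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

The credentials are captured per message at send time, which is the
"live" behaviour the paragraph above says kdbus extends to further
metadata items.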
Post by Andy Lutomirski
I could be wrong about the lack of use cases. If so, please enlighten me.
We have several dbus APIs that allow clients to register as a special
handler/controller/etc. (eg., see systemd-logind TakeControl()). The
API provider checks the privileges of a client on registration and
then just tracks the client ID. This way, the client can be privileged
when asking for special access, then drop privileges and still use the
interface. You cannot re-connect in between, as the API provider
tracks your bus ID. Without message-metadata, all your (other) calls
on this bus would always be treated as privileged. We *really* want to
avoid this.
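
One way to read this argument, sketched from the service side: the
special role is bound to the connection ID at registration, while every
other privileged request is judged on the credentials attached to that
particular message. The structures and functions below are hypothetical
and are not taken from systemd-logind or kdbus.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of what a service sees with each message. */
struct msg_creds {
    uint64_t conn_id;                /* stable for the connection's lifetime */
    unsigned int uid;                /* sender uid at the time of this message */
};

struct service {
    bool has_controller;
    uint64_t controller_id;
};

/* Registration: the caller must be privileged at this moment. */
static bool take_control(struct service *s, const struct msg_creds *m)
{
    if (m->uid != 0)
        return false;
    s->has_controller = true;
    s->controller_id = m->conn_id;
    return true;
}

/* Controller-only calls are authorised by the tracked connection ID. */
static bool controller_call_allowed(const struct service *s,
                                    const struct msg_creds *m)
{
    return s->has_controller && s->controller_id == m->conn_id;
}

/* Any other privileged call is judged on this message's own creds. */
static bool privileged_call_allowed(const struct msg_creds *m)
{
    return m->uid == 0;
}
```

A client that registers as uid 0 and then drops to an unprivileged uid
keeps its controller role on the same connection, but its later calls no
longer pass privileged_call_allowed(). With credentials captured once at
connect time, that second check would keep seeing uid 0.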

Another example is logging, where we want exact data at the time a
message is logged. Otherwise, the data is useless. With
message-metadata, you can figure out the exact situation a process was
in when a specific message was logged. Furthermore, it is impossible
to read such data from /proc, as the process might already be dead.
Which is a _real_ problem right now!
Similarly, system monitoring wants message-metadata for the same
reasons. And it needs to be reliable, you don't want malicious
sandboxes to mess with your logs.

kdbus is a _bus_, not a p2p channel. Thus, a peer may talk to multiple
destinations, and it may want to look different to each of them. DBus
method-calls allow 'syscall'-ish behavior when calling into other
processes. We *want* to be able to drop privileges after doing process
setup. We want further bus-calls to no longer be treated privileged.

Furthermore, DBus was designed to allow peers to track other peers
(which is why it always had the NameOwnerChanged signal). This is an
essential feature, that simplifies access-management considerably, as
you can cache it together with the unique name of a peer. We only open
a single connection to a bus. glib, libdbus, efl, ell, qt, sd-bus, and
others use cached bus-connections that are shared by all code of a
single thread. Hence, the bus connection is kinda part of the process
itself, like stdin/stdout. Without message-metadata, it is impossible
to ever drop privileges on a bus, without losing all state.

Thanks
David
Andy Lutomirski
2015-03-25 18:12:37 UTC
Post by David Herrmann
Hi
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
Post by David Herrmann
metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel.
Sure it does if it writes to shared cachelines. Given that you're
incrementing refcounts, I'm reasonably sure that you're touching lots
of shared cachelines.
Ok, sure, but it's still mostly local to the sending task. We take
locks and ref-counts on the task-struct and mm, which is for most
parts local to the CPU the task runs on. But this is inherent to
accessing this kind of data, which is the fundamental difference in
our views here, as seen below..
You're also refcounting the struct cred
No?
We do ref-count the group-info, but that is actually redundant as we
just copy the IDs. We should drop this, since group-info of 'current'
can be accessed right away. I noted it down.
OK
Post by David Herrmann
Post by Andy Lutomirski
and there's no good reason
for that to be local. (It might be a bit more local than intended
because of the absurd things that the key subsystem does to struct
cred, but IMO users should turn that off or the kernel should fix it.)
Even more globally, I think you're touching init_user_ns's refcount in
most scenarios. That's about as global as it gets.
get_user_ns() in metadata.c is a workaround (as the comment there
explains). With better export-helpers for caps, we can simply drop it.
It's conditional on KDBUS_ATTACH_CAPS, anyway.
Fair enough.
Post by David Herrmann
Post by Andy Lutomirski
(Also, is there an easy benchmark to see how much time it takes to
send and receive metadata? I tried to get the kdbus test to do this,
and I failed. I probably did it wrong.)
https://gist.github.com/dvdhrm/3ac4339bf94fadc13b98
Update it to pass _KDBUS_ATTACH_ALL for both arguments of
kdbus_conn_update_attach_flags().
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
Post by David Herrmann
Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Only if both peers set the metadata mask, it
will be transmitted.
But you're comparing to the wrong thing, IMO. Of course it's much
faster than /proc hackery, but it's probably much slower to do the
metadata operation once per message than to do it when you connect to
the endpoint. (Gah! It's a "bus" that could easily have tons of
users but a single "endpoint". I'm still not used to it.)
Yes, of course your assumption is right if you compare against
per-connection caches, instead of per-message metadata. But we do
support _both_ use-cases, so we don't impose any policy.
We still believe "live"-metadata is a crucial feature of kdbus,
despite the known performance penalties.
[...]
Post by Andy Lutomirski
This is even more true if this feature
is *inconsistent* with legacy userspace (i.e. userspace dbus).
Live metadata is already supported on UDS via SCM_CREDENTIALS, we just
extend it to other metadata items. It's not a new invention by us.
Debian code-search on SO_PASSCRED and SCM_CREDENTIALS gives lots of
results.
Netlink, as a major example of an existing bus API, already uses
SCM_CREDENTIALS as primary way to transmit metadata.
Post by Andy Lutomirski
I could be wrong about the lack of use cases. If so, please enlighten me.
We have several dbus APIs that allow clients to register as a special
handler/controller/etc. (eg., see systemd-logind TakeControl()). The
API provider checks the privileges of a client on registration and
then just tracks the client ID. This way, the client can be privileged
when asking for special access, then drop privileges and still use the
interface. You cannot re-connect in between, as the API provider
tracks your bus ID. Without message-metadata, all your (other) calls
on this bus would always be treated as privileged. We *really* want to
avoid this.
Connect twice?

You *already* have to reconnect or connect twice because you have
per-connection metadata. That's part of my problem with this scheme
-- you support *both styles*, which seems like it'll give you most of
the downsides of both without the upsides.
Post by David Herrmann
Another example is logging, where we want exact data at the time a
message is logged. Otherwise, the data is useless.
Why?

No, really, why is exact data at the time of logging so important? It
sounds nice, but I really don't see it.
Post by David Herrmann
With
message-metadata, you can figure out the exact situation a process was
in when a specific message was logged. Furthermore, it is impossible
to read such data from /proc, as the process might already be dead.
Which is a _real_ problem right now!
Similarly, system monitoring wants message-metadata for the same
reasons. And it needs to be reliable, you don't want malicious
sandboxes to mess with your logs.
Huh? A "malicious sandbox" can always impersonate itself, whether by
connecting and handing off a connection or simply by relaying messages.
Post by David Herrmann
kdbus is a _bus_, not a p2p channel. Thus, a peer may talk to multiple
destinations, and it may want to look different to each of them. DBus
method-calls allow 'syscall'-ish behavior when calling into other
processes. We *want* to be able to drop privileges after doing process
setup. We want further bus-calls to no longer be treated privileged.
You could have an IOCTL that re-captures your connection metadata.
Post by David Herrmann
Furthermore, DBus was designed to allow peers to track other peers
(which is why it always had the NameOwnerChanged signal). This is an
essential feature, that simplifies access-management considerably, as
you can cache it together with the unique name of a peer. We only open
a single connection to a bus. glib, libdbus, efl, ell, qt, sd-bus, and
others use cached bus-connections that are shared by all code of a
single thread. Hence, the bus connection is kinda part of the process
itself, like stdin/stdout. Without message-metadata, it is impossible
to ever drop privileges on a bus, without losing all state.
See above about an IOCTL that re-captures your connection metadata.

Again, you seem to be arguing that per-connection metadata is bad, but
you still have an implementation of per-connection metadata, so you
still have all these problems.

I'm actually okay with per-message metadata in principle, but I'd like
to see evidence (with numbers, please) that a send+recv of per-message
metadata is *not* significantly slower than a recv of already-captured
per-connection metadata. If this is in fact the case, then maybe you
should trash per-connection metadata instead and the legacy
compatibility code can figure out a way to deal with it. IMO that
would be a pretty nice outcome, since you would never have to worry
whether your connection to the bus is inadvertently privileged.

(Also, FWIW, it seems like what you really want is a capability model,
in which you grab a handle to some service and that handle captures
all your privileges wrt that service. Per-message metadata is even
farther from this than per-connection-to-the-bus metadata, but neither
one is particularly close.)
Post by David Herrmann
Thanks
David
--
Andy Lutomirski
AMA Capital Management, LLC
David Herrmann
2015-03-30 16:56:54 UTC
Hi
[...]
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
I could be wrong about the lack of use cases. If so, please enlighten me.
We have several dbus APIs that allow clients to register as a special
handler/controller/etc. (eg., see systemd-logind TakeControl()). The
API provider checks the privileges of a client on registration and
then just tracks the client ID. This way, the client can be privileged
when asking for special access, then drop privileges and still use the
interface. You cannot re-connect in between, as the API provider
tracks your bus ID. Without message-metadata, all your (other) calls
on this bus would always be treated as privileged. We *really* want to
avoid this.
Connect twice?
The remote peer might cache your connection ID to track you. You have
to use the same connection to talk to that peer, in this given
scenario. That's how D-Bus1 is used today, and we have to follow the
semantics.
Post by Andy Lutomirski
You *already* have to reconnect or connect twice because you have
per-connection metadata. That's part of my problem with this scheme
-- you support *both styles*, which seems like it'll give you most of
the downsides of both without the upsides.
Not necessarily. Connection metadata describes the state at the time
you connected to the bus. If someone asks for this information, they
will get exactly that. In this model, you cannot drop privileges, if
you need to be privileged during setup.
If someone asks for per-message metadata, they had better not ask
for per-connection creds. It's not the information they're looking
for, so it will not match the data at the time the message was
sent.

There is no immediate need to make both match. For security decisions,
we mandate per-message creds. Per-connection creds are for
backwards-compatibility to dbus1 and for passive introspection of bus
connections.
Post by Andy Lutomirski
Post by David Herrmann
Another example is logging, where we want exact data at the time a
message is logged. Otherwise, the data is useless.
Why?
No, really, why is exact data at the time of logging so important? It
sounds nice, but I really don't see it.
Example: If you don't have message-metadata, you don't know the thread
which sent a log-message. In a multi-threaded application, that's
incredibly useful information.

After all, logging is all about correct data. Logging creds that were
not effective at the time the message was sent is futile.
Post by Andy Lutomirski
Post by David Herrmann
kdbus is a _bus_, not a p2p channel. Thus, a peer may talk to multiple
destinations, and it may want to look different to each of them. DBus
method-calls allow 'syscall'-ish behavior when calling into other
processes. We *want* to be able to drop privileges after doing process
setup. We want further bus-calls to no longer be treated privileged.
You could have an IOCTL that re-captures your connection metadata.
I don't see how this makes the model easier, or more predictable. On
the contrary, per-connection metadata is no longer compatible to UDS /
dbus1, nor is a per-message metadata concept reliable.
Post by Andy Lutomirski
Post by David Herrmann
Furthermore, DBus was designed to allow peers to track other peers
(which is why it always had the NameOwnerChanged signal). This is an
essential feature, that simplifies access-management considerably, as
you can cache it together with the unique name of a peer. We only open
a single connection to a bus. glib, libdbus, efl, ell, qt, sd-bus, and
others use cached bus-connections that are shared by all code of a
single thread. Hence, the bus connection is kinda part of the process
itself, like stdin/stdout. Without message-metadata, it is impossible
to ever drop privileges on a bus, without losing all state.
See above about an IOCTL that re-captures your connection metadata.
Again, you seem to be arguing that per-connection metadata is bad, but
you still have an implementation of per-connection metadata, so you
still have all these problems.
I don't see why we get the problems of per-connection metadata. Just
because you _can_ use it, doesn't mean you should use it for all
imaginable use-cases. The same goes for reading information from
/proc. There are valid use-cases to do so, but also a lot of cases
where it will not provide the information you want.
Post by Andy Lutomirski
I'm actually okay with per-message metadata in principle, but I'd like
to see evidence (with numbers, please) that a send+recv of per-message
metadata is *not* significantly slower than a recv of already-captured
per-connection metadata. If this is in fact the case, then maybe you
should trash per-connection metadata instead and the legacy
compatibility code can figure out a way to deal with it. IMO that
would be a pretty nice outcome, since you would never have to worry
whether your connection to the bus is inadvertently privileged.
Per-message metadata makes SEND about 25% slower, if you transmit the
full set of all possible information. Just 3% if you only use
PIDs+UIDs. The expensive metadata is cgroup-path and exe-path.
If a service needs that information, however, and if that information
is not guaranteed to be up-to-date, the service _will_ go and look it
up in /proc or somewhere else, which is certainly a whole lot more
expensive than the code in kdbus.

In general, there seem to be a number of misconceptions in this thread
about what kdbus is supposed to be. We're not inventing something new
here with a clean slate, but we're moving parts of an existing
implementation that has tons of users into the kernel, in order to fix
issues that cannot be fixed otherwise in userspace (most notably, the
race gaps that exist when retrieving per-message metadata). Therefore,
we have to keep existing semantics stable, otherwise the exercise is
somewhat pointless.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Andy Lutomirski
2015-03-31 13:59:21 UTC
Permalink
Post by David Herrmann
Hi
[...]
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
I could be wrong about the lack of use cases. If so, please enlighten me.
We have several dbus APIs that allow clients to register as a special
handler/controller/etc. (eg., see systemd-logind TakeControl()). The
API provider checks the privileges of a client on registration and
then just tracks the client ID. This way, the client can be privileged
when asking for special access, then drop privileges and still use the
interface. You cannot re-connect in between, as the API provider
tracks your bus ID. Without message-metadata, all your (other) calls
on this bus would always be treated as privileged. We *really* want to
avoid this.
Connect twice?
The remote peer might cache your connection ID to track you. You have
to use the same connection to talk to that peer, in this given
scenario. That's how D-Bus1 is used today, and we have to follow the
semantics.
That's yet another reason that you really ought to disconnect and
reconnect after a privilege change -- the remote peer might remember
you.

The "might" will be a big problem. Users of kdbus can't rely on any
particular concept of privilege because you have too many of them.
Post by David Herrmann
Post by Andy Lutomirski
You *already* have to reconnect or connect twice because you have
per-connection metadata. That's part of my problem with this scheme
-- you support *both styles*, which seems like it'll give you most of
the downsides of both without the upsides.
Not necessarily. Connection metadata describes the state at the time
you connected to the bus. If someone asks for this information, they
will get exactly that. In this model, you cannot drop privileges if
you need to be privileged during setup.
If someone wants per-message metadata, they had better not ask
for per-connection creds. It's not the information they're looking
for, so it will not match the data captured at the time the message
was sent.
There is no immediate need to make both match. For security decisions,
we mandate per-message creds. Per-connection creds are for
backwards-compatibility to dbus1 and for passive introspection of bus
connections.
Backwards compatibility doesn't magically exempt security
considerations. If new code is insecure when talking to a legacy
service, it's still insecure.

[...]
Post by David Herrmann
Post by Andy Lutomirski
Again, you seem to be arguing that per-connection metadata is bad, but
you still have an implementation of per-connection metadata, so you
still have all these problems.
I don't see why we get the problems of per-connection metadata. Just
because you _can_ use it, doesn't mean you should use it for all
imaginable use-cases. The same goes for reading information from
/proc. There are valid use-cases to do so, but also a lot of cases
where it will not provide the information you want.
Then you'll need to document really carefully which metadata is used
for which service. This actually seems impossible to do, since some
services will exist in legacy and kdbus forms.
Post by David Herrmann
Post by Andy Lutomirski
I'm actually okay with per-message metadata in principle, but I'd like
to see evidence (with numbers, please) that a send+recv of per-message
metadata is *not* significantly slower than a recv of already-captured
per-connection metadata. If this is in fact the case, then maybe you
should trash per-connection metadata instead and the legacy
compatibility code can figure out a way to deal with it. IMO that
would be a pretty nice outcome, since you would never have to worry
whether your connection to the bus is inadvertently privileged.
Per-message metadata makes SEND about 25% slower, if you transmit the
full set of all possible information. Just 3% if you only use
PIDs+UIDs. The expensive metadata is cgroup-path and exe-path.
If a service needs that information, however, and if that information
is not guaranteed to be up-to-date, the service _will_ go and look it
up in /proc or somewhere else, which is certainly a whole lot more
expensive than the code in kdbus.
Can you give actual numbers, in ns or cycles, of how much overhead
metadata adds?
Post by David Herrmann
In general, there seem to be a number of misconceptions in this thread
about what kdbus is supposed to be. We're not inventing something new
here with a clean slate, but we're moving parts of an existing
implementation that has tons of users into the kernel, in order to fix
issues that cannot be fixed otherwise in userspace (most notably, the
race gaps that exist when retrieving per-message metadata). Therefore,
we have to keep existing semantics stable, otherwise the exercise is
somewhat pointless.
IOW you're taking something that you dislike aspects of and shoving
most of it in the kernel. That guarantees us an API in the kernel
that even the creators don't really like. This is, IMO, very
unfortunate.

Have you considered porting the kdbus per-message metadata mechanism to UDS?

--Andy
Tom Gundersen
2015-03-31 15:11:29 UTC
Permalink
Post by Andy Lutomirski
Post by David Herrmann
Hi
[...]
Post by Andy Lutomirski
Post by David Herrmann
Post by Andy Lutomirski
I could be wrong about the lack of use cases. If so, please enlighten me.
We have several dbus APIs that allow clients to register as a special
handler/controller/etc. (eg., see systemd-logind TakeControl()). The
API provider checks the privileges of a client on registration and
then just tracks the client ID. This way, the client can be privileged
when asking for special access, then drop privileges and still use the
interface. You cannot re-connect in between, as the API provider
tracks your bus ID. Without message-metadata, all your (other) calls
on this bus would always be treated as privileged. We *really* want to
avoid this.
Connect twice?
The remote peer might cache your connection ID to track you. You have
to use the same connection to talk to that peer, in this given
scenario. That's how D-Bus1 is used today, and we have to follow the
semantics.
That's yet another reason that you really ought to disconnect and
reconnect after a privilege change -- the remote peer might remember
you.
The "might" will be a big problem. Users of kdbus can't rely on any
particular concept of privilege because you have too many of them.
Post by David Herrmann
Post by Andy Lutomirski
You *already* have to reconnect or connect twice because you have
per-connection metadata. That's part of my problem with this scheme
-- you support *both styles*, which seems like it'll give you most of
the downsides of both without the upsides.
Not necessarily. Connection metadata describes the state at the time
you connected to the bus. If someone asks for this information, they
will get exactly that. In this model, you cannot drop privileges if
you need to be privileged during setup.
If someone wants per-message metadata, they had better not ask
for per-connection creds. It's not the information they're looking
for, so it will not match the data captured at the time the message
was sent.
There is no immediate need to make both match. For security decisions,
we mandate per-message creds. Per-connection creds are for
backwards-compatibility to dbus1 and for passive introspection of bus
connections.
Backwards compatibility doesn't magically exempt security
considerations.
No one is arguing that.
Post by Andy Lutomirski
If new code is insecure when talking to a legacy
service, it's still insecure.
[...]
Post by David Herrmann
Post by Andy Lutomirski
Again, you seem to be arguing that per-connection metadata is bad, but
you still have an implementation of per-connection metadata, so you
still have all these problems.
I don't see why we get the problems of per-connection metadata. Just
because you _can_ use it, doesn't mean you should use it for all
imaginable use-cases. The same goes for reading information from
/proc. There are valid use-cases to do so, but also a lot of cases
where it will not provide the information you want.
Then you'll need to document really carefully which metadata is used
for which service. This actually seems impossible to do, since some
services will exist in legacy and kdbus forms.
Post by David Herrmann
Post by Andy Lutomirski
I'm actually okay with per-message metadata in principle, but I'd like
to see evidence (with numbers, please) that a send+recv of per-message
metadata is *not* significantly slower than a recv of already-captured
per-connection metadata. If this is in fact the case, then maybe you
should trash per-connection metadata instead and the legacy
compatibility code can figure out a way to deal with it. IMO that
would be a pretty nice outcome, since you would never have to worry
whether your connection to the bus is inadvertently privileged.
Per-message metadata makes SEND about 25% slower, if you transmit the
full set of all possible information. Just 3% if you only use
PIDs+UIDs. The expensive metadata is cgroup-path and exe-path.
If a service needs that information, however, and if that information
is not guaranteed to be up-to-date, the service _will_ go and look it
up in /proc or somewhere else, which is certainly a whole lot more
expensive than the code in kdbus.
Can you give actual numbers, in ns or cycles, of how much overhead
metadata adds?
Post by David Herrmann
In general, there seem to be a number of misconceptions in this thread
about what kdbus is supposed to be. We're not inventing something new
here with a clean slate, but we're moving parts of an existing
implementation that has tons of users into the kernel, in order to fix
issues that cannot be fixed otherwise in userspace (most notably, the
race gaps that exist when retrieving per-message metadata). Therefore,
we have to keep existing semantics stable, otherwise the exercise is
somewhat pointless.
IOW you're taking something that you dislike aspects of and shoving
most of it in the kernel. That guarantees us an API in the kernel
that even the creators don't really like. This is, IMO, very
unfortunate.
This is a misrepresentation of what David wrote. We do want this API
regardless of dbus1 compatibility, but compatibility is by itself a
sufficient motivation. A further motivation is reliable introspection,
since this meta-data allows listing current peers on the bus and
showing their identities. That's hugely useful to make the bus
transparent to admins.
Post by Andy Lutomirski
Have you considered porting the kdbus per-message metadata mechanism to UDS?
As outlined before, enhancing UDS by porting the metadata mechanism
from kdbus would not be sufficient to solve all the problems we need
to solve, so it is not something we are currently working on.

Cheers,

Tom
Andy Lutomirski
2015-03-31 18:29:34 UTC
Permalink
Post by Tom Gundersen
Post by Andy Lutomirski
IOW you're taking something that you dislike aspects of and shoving
most of it in the kernel. That guarantees us an API in the kernel
that even the creators don't really like. This is, IMO, very
unfortunate.
This is a misrepresentation of what David wrote. We do want this API
regardless of dbus1 compatibility, but compatibility is by itself a
sufficient motivation. A further motivation is reliable introspection,
since this meta-data allows listing current peers on the bus and
showing their identities. That's hugely useful to make the bus
transparent to admins.
I've heard the following use cases for per-connection metadata:

- Authenticating to dbus1 services.

- Identifying connected users for admin diagnostics.

I've heard the following use cases for per-message metadata:

- Logging.

- Authenticating to kdbus services that want this style of authentication.

The only reasonable conclusion I've been able to draw is that the dbus
community intends to use *both* per-connection and per-message
metadata for authentication. This means that, as a general rule,
dropping privileges while you have an open kdbus connection has poorly
defined effects.

It's particularly alarming that, when I express a concern about
logging, the kdbus authors cite authentication as an alternate
justification, and, when I cite a concern about authentication, the
kdbus authors cite logging as an alternative justification.

This is simply not okay for a modern interface, and in my opinion the
kernel should not carry code to support new APIs with weakly defined
security semantics. It's important that one be able to tell what the
security implications of one's code are without cross-referencing the
implementation of the servers you're interacting with.

To top that off, the kdbus policy mechanism has truly bizarre effects
with respect to services that have unique ids and well-known names.
That, too, is apparently for compatibility.

This all feels to me like a total of about four people are going to
understand the tangle that is kdbus security, and that's bad. I think
that the kernel should insist that new security-critical mechanisms
make sense and be hard to misuse. The current design of kdbus, in
contrast, feels like it will be very hard to use correctly.

--Andy
David Herrmann
2015-04-03 11:51:41 UTC
Permalink
Hi
Post by Andy Lutomirski
Post by Tom Gundersen
Post by Andy Lutomirski
IOW you're taking something that you dislike aspects of and shoving
most of it in the kernel. That guarantees us an API in the kernel
that even the creators don't really like. This is, IMO, very
unfortunate.
This is a misrepresentation of what David wrote. We do want this API
regardless of dbus1 compatibility, but compatibility is by itself a
sufficient motivation. A further motivation is reliable introspection,
since this meta-data allows listing current peers on the bus and
showing their identities. That's hugely useful to make the bus
transparent to admins.
- Authenticating to dbus1 services.
Not necessarily authentication, but we need to support the legacy API,
for whatever reason it was used by old applications. But..
Post by Andy Lutomirski
- Identifying connected users for admin diagnostics.
- Logging.
- Authenticating to kdbus services that want this style of authentication.
..please note that authentication on DBus has always been done with
per-message metadata (see polkit history). However, this had to be
reverted some years ago as it is racy (it used /proc for that, which
can be exploited by exec'ing setuid binaries). However, the
per-message metadata authentication worked very well for _years_
(minus the race..), so this is already a well-established scheme. With
kdbus we can finally implement this in a race-free manner.
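
To make the race concrete, here is a minimal sketch of the /proc-based
lookup pattern mentioned above. It is not polkit's actual code: the helper
name proc_lookup_uid() is hypothetical, and it only reads the real UID from
the "Uid:" line of /proc/<pid>/status.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up the real UID of `pid` by parsing
 * /proc/<pid>/status. Returns 0 and stores the UID in *uid on success,
 * -1 on failure.
 *
 * This is the racy pattern: by the time the file is read, `pid` may have
 * exec'ed a setuid binary (or exited and had its PID reused), so the
 * answer describes the process as it is now, not as it was when the
 * message was sent.
 */
static int proc_lookup_uid(pid_t pid, uid_t *uid)
{
    char path[64], line[256];
    FILE *f;
    int ret = -1;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        unsigned int real;

        /* Line format: "Uid:\t<real>\t<effective>\t<saved>\t<fs>" */
        if (sscanf(line, "Uid: %u", &real) == 1) {
            *uid = (uid_t)real;
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

A receiver that calls proc_lookup_uid(sender_pid, &uid) after the message
has arrived is checking the sender's current state, and the window between
send and lookup is what the setuid-exec trick exploits.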

[...]
Post by Andy Lutomirski
This is simply not okay for a modern interface, and in my opinion the
kernel should not carry code to support new APIs with weakly defined
security semantics. It's important that one be able to tell what the
security implications of one's code are without cross-referencing the
implementation of the servers you're interacting with.
Again, I disagree. Our concepts are established and used on UDS and
DBus for decades.

Yes, we provide two ways to retrieve metadata, but the kernel offers
several more paths to gather that information. Just because those APIs
are available does not mean they should be used for authentication. We
mandate per-message metadata. If applications use per-connection
metadata, /proc, netlink, or random data, they're doing it wrong.

Furthermore, dbus provides pretty complete and straightforward
libraries which hide that from you. If you use glib, qt or sd-bus, you
don't even need to deal with all that.
Post by Andy Lutomirski
To top that off, the kdbus policy mechanism has truly bizarre effects
with respect to services that have unique ids and well-known names.
That, too, is apparently for compatibility.
This all feels to me like a total of about four people are going to
understand the tangle that is kdbus security, and that's bad. I think
that the kernel should insist that new security-critical mechanisms
make sense and be hard to misuse. The current design of kdbus, in
contrast, feels like it will be very hard to use correctly.
Native kdbus clients are authenticated with their credentials at time
of method call. Legacy clients will always have their credentials at
time of connect in effect. No fallbacks, no choices. It's a simple
question whether it's a legacy client or not.
Sounds simple to me.
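
As a rough sketch of the rule just stated (the types and the function
auth_creds() are hypothetical and for illustration only, they are not the
kdbus UAPI), a service would pick the credentials to authenticate against
like this:

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, illustration-only types; not the kdbus UAPI. */
struct creds {
    uid_t uid;
    pid_t pid;
};

struct peer {
    bool legacy;              /* true for a dbus1 (legacy) client */
    struct creds at_connect;  /* captured once, at connect time */
};

struct message {
    struct creds at_send;     /* captured per message, at send time */
};

/*
 * Select the credentials used for an authorization decision:
 * legacy peers are judged by their connect-time credentials,
 * native peers by the credentials attached to the message itself.
 */
static const struct creds *auth_creds(const struct peer *p,
                                      const struct message *m)
{
    return p->legacy ? &p->at_connect : &m->at_send;
}
```

Under this rule, a peer that connected as uid 0 and later dropped to an
unprivileged uid is still treated as uid 0 if it is a legacy client, and as
the unprivileged uid if it is a native one.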

Thanks
David
Eric W. Biederman
2015-04-05 12:09:36 UTC
Permalink
Post by Tom Gundersen
Hi
Post by Andy Lutomirski
On Tue, Mar 31, 2015 at 3:58 PM, Andy Lutomirski
Post by Andy Lutomirski
IOW you're taking something that you dislike aspects of and shoving
most of it in the kernel. That guarantees us an API in the kernel
that even the creators don't really like. This is, IMO, very
unfortunate.
This is a misrepresentation of what David wrote. We do want this API
regardless of dbus1 compatibility, but compatibility is by itself a
sufficient motivation. A further motivation is reliable introspection,
since this meta-data allows listing current peers on the bus and
showing their identities. That's hugely useful to make the bus
transparent to admins.
- Authenticating to dbus1 services.
Not necessarily authentication, but we need to support the legacy API,
for whatever reason it was used by old applications. But..
Post by Andy Lutomirski
- Identifying connected users for admin diagnostics.
- Logging.
- Authenticating to kdbus services that want this style of
authentication.
..please note that authentication on DBus has always been done with
per-message metadata (see polkit history). However, this had to be
reverted some years ago as it is racy (it used /proc for that, which
can be exploited by exec'ing setuid binaries). However, the
per-message metadata authentication worked very well for _years_
(minus the race..), so this is already a well-established scheme. With
kdbus we can finally implement this in a race-free manner.
[...]
Post by Andy Lutomirski
This is simply not okay for a modern interface, and in my opinion the
kernel should not carry code to support new APIs with weakly defined
security semantics. It's important that one be able to tell what the
security implications of one's code is without cross-referencing with
the implementation of the servers you're interacting with.
Again, I disagree. Our concepts are established and used on UDS and
DBus for decades.
Yes, we provide two ways to retrieve metadata, but the kernel offers
several more paths to gather that information. Just because those APIs
are available does not mean they should be used for authentication. We
mandate per-message metadata. If applications use per-connection
metadata, /proc, netlink, or random data, they're doing it wrong.
Furthermore, dbus provides pretty complete and straightforward
libraries which hide that from you. If you use glib, qt or sd-bus, you
don't even need to deal with all that.
Post by Andy Lutomirski
To top that off, the kdbus policy mechanism has truly bizarre effects
with respect to services that have unique ids and well-known names.
That, too, is apparently for compatibility.
This all feels to me like a total of about four people are going to
understand the tangle that is kdbus security, and that's bad. I think
that the kernel should insist that new security-critical mechanisms
make sense and be hard to misuse. The current design of kdbus, in
contrast, feels like it will be very hard to use correctly.
Native kdbus clients are authenticated with their credentials at time
of method call. Legacy clients will always have their credentials at
time of connect in effect. No fallbacks, no choices. It's a simple
question whether it's a legacy client or not.
Sounds simple to me.
So I just took a slightly deeper look and the user namespace bits are wrong. Both in implementation
and in design.

Passing "capabilities" to user space for reasons of authentication is wrong and a maintenance nightmare. Further the capabilities maintainer Serge Hallyn has not been copied.

There are several other pieces of information in your meta data like cmdline that I have similar concerns about, but that I am less familiar with and have looked at less.

Which leads me to conclude that in its current form kdbus is inappropriate for inclusion in the kernel.

The code is dangerously and inappropriately wrong and comes with a huge maintenance obligation to people outside of kdbus.

Nacked-by: ***@xmission.com

The only way I can see this code being responsibly merged is for all of the metadata to be thrown out, the basics merged, and then the metadata added back in one small piece at a time with copious review and explanation.

If you can not throw out the meta data the kdbus code is too broken in concept to warrant serious consideration.

Eric


Greg Kroah-Hartman
2015-04-05 13:46:51 UTC
Permalink
Post by Eric W. Biederman
Post by Tom Gundersen
Hi
Post by Andy Lutomirski
On Tue, Mar 31, 2015 at 3:58 PM, Andy Lutomirski
Post by Andy Lutomirski
IOW you're taking something that you dislike aspects of and shoving
most of it in the kernel. That guarantees us an API in the kernel
that even the creators don't really like. This is, IMO, very
unfortunate.
This is a misrepresentation of what David wrote. We do want this API
regardless of dbus1 compatibility, but compatibility is by itself a
sufficient motivation. A further motivation is reliable introspection,
since this meta-data allows listing current peers on the bus and
showing their identities. That's hugely useful to make the bus
transparent to admins.
- Authenticating to dbus1 services.
Not necessarily authentication, but we need to support the legacy API,
for whatever reason it was used by old applications. But..
Post by Andy Lutomirski
- Identifying connected users for admin diagnostics.
- Logging.
- Authenticating to kdbus services that want this style of
authentication.
..please note that authentication on DBus has always been done with
per-message metadata (see polkit history). However, this had to be
reverted some years ago as it is racy (it used /proc for that, which
can be exploited by exec'ing setuid binaries). However, the
per-message metadata authentication worked very well for _years_
(minus the race..), so this is already a well-established scheme. With
kdbus we can finally implement this in a race-free manner.
[...]
Post by Andy Lutomirski
This is simply not okay for a modern interface, and in my opinion the
kernel should not carry code to support new APIs with weakly defined
security semantics. It's important that one be able to tell what the
security implications of one's code is without cross-referencing with
the implementation of the servers you're interacting with.
Again, I disagree. Our concepts are established and used on UDS and
DBus for decades.
Yes, we provide two ways to retrieve metadata, but the kernel offers
several more paths to gather that information. Just because those APIs
are available does not mean they should be used for authentication. We
mandate per-message metadata. If applications use per-connection
metadata, /proc, netlink, or random data, they're doing it wrong.
Furthermore, dbus provides pretty complete and straightforward
libraries which hide that from you. If you use glib, qt or sd-bus, you
don't even need to deal with all that.
Post by Andy Lutomirski
To top that off, the kdbus policy mechanism has truly bizarre effects
with respect to services that have unique ids and well-known names.
That, too, is apparently for compatibility.
This all feels to me like a total of about four people are going to
understand the tangle that is kdbus security, and that's bad. I think
that the kernel should insist that new security-critical mechanisms
make sense and be hard to misuse. The current design of kdbus, in
contrast, feels like it will be very hard to use correctly.
Native kdbus clients are authenticated with their credentials at time
of method call. Legacy clients will always have their credentials at
time of connect in effect. No fallbacks, no choices. It's a simple
question whether it's a legacy client or not.
Sounds simple to me.
So I just took a slightly deeper look and the user namespace bits are wrong. Both in implementation
and in design.
Passing "capabilities" to user space for reasons of authentication is wrong and a maintenance nightmare. Further the capabilities maintainer Serge Hallyn has not been copied.
I don't understand, what is passed that is not already exported today
through /proc/? kdbus gathers this information in a race-free
way, unlike having to dig this out of proc and hope that nothing has
changed underneath you.
Post by Eric W. Biederman
There are several other pieces of information in your meta data like cmdline that I have similar concerns about, but that I am less familiar with and have looked at less.
Again, cmdline is also exported today, why is passing that somehow not
acceptable?
Post by Eric W. Biederman
Which leads me to conclude that in its current form kdbus is inappropriate for inclusion in the kernel.
Ah, so we should also remove those fields from /proc/ today as well, and
just break all of userspace that relies on it today? Again, kdbus is
just doing the same thing that userspace is doing today, but in a
race-free manner.
Post by Eric W. Biederman
The code is dangerously and inappropriately wrong and comes with a huge maintenance obligation to people outside of kdbus.
How so? Please explain.

Oh, and please wrap your email properly, reading it this way is a
horrible experience, you know better than that...

greg k-h
Andy Lutomirski
2015-04-08 22:39:25 UTC
Permalink
[Trying again due to HTML mail goof. Trimming and responding better, too.]
Post by David Herrmann
Hi
Post by Andy Lutomirski
Post by Tom Gundersen
Post by Andy Lutomirski
IOW you're taking something that you dislike aspects of and shoving
most of it in the kernel. That guarantees us an API in the kernel
that even the creators don't really like. This is, IMO, very
unfortunate.
This is a misrepresentation of what David wrote. We do want this API
regardless of dbus1 compatibility, but compatibility is by itself a
sufficient motivation. A further motivation is reliable introspection,
since this meta-data allows listing current peers on the bus and
showing their identities. That's hugely useful to make the bus
transparent to admins.
- Authenticating to dbus1 services.
Not necessarily authentication, but we need to support the legacy API,
for whatever reason it was used by old applications. But..
Post by Andy Lutomirski
- Identifying connected users for admin diagnostics.
- Logging.
- Authenticating to kdbus services that want this style of authentication.
..please note that authentication on DBus has always been done with
per-message metadata (see polkit history). However, this had to be
reverted some years ago as it is racy (it used /proc for that, which
can be exploited by exec'ing setuid binaries). However, the
per-message metadata authentication worked very well for _years_
(minus the race..), so this is already a well-established scheme. With
kdbus we can finally implement this in a race-free manner.
[...]
Post by Andy Lutomirski
This is simply not okay for a modern interface, and in my opinion the
kernel should not carry code to support new APIs with weakly defined
security semantics. It's important that one be able to tell what the
security implications of one's code are without cross-referencing the
implementation of the servers you're interacting with.
Again, I disagree. Our concepts are established and used on UDS and
DBus for decades.
SO_PASSCRED does not justify anything in my book. It was a mistake
and it remains a minor disaster. Please don't use it as justification
for something being a good idea.

Similarly, the fact that the *receiver* chooses which of SO_PASSCRED
and SO_PEERCRED to use is awful.
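
For readers who have not used either option, the following is a minimal
sketch of the two UDS mechanisms being contrasted, using plain AF_UNIX
sockets rather than kdbus. The helper names (peer_creds, enable_passcred,
recv_msg_creds) are made up for this example; the socket options and
SCM_CREDENTIALS are the standard Linux ones.

```c
#define _GNU_SOURCE            /* for struct ucred and SCM_CREDENTIALS */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* SO_PEERCRED: credentials captured once, when the connection was made. */
static int peer_creds(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len);
}

/* SO_PASSCRED: the *receiver* opts in to per-message credentials. */
static int enable_passcred(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));
}

/* Receive one byte plus the SCM_CREDENTIALS attached to that message. */
static int recv_msg_creds(int fd, struct ucred *out)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;   /* forces correct alignment of buf */
    } ctl;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctl.buf,
        .msg_controllen = sizeof(ctl.buf),
    };
    struct cmsghdr *c;

    if (recvmsg(fd, &msg, 0) != 1)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET &&
            c->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

The sender's code is identical in both cases (a plain write() suffices).
Which set of credentials ends up being consulted is decided entirely on the
receiving side, by calling either peer_creds() or enable_passcred() plus
recv_msg_creds().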
Post by David Herrmann
Yes, we provide two ways to retrieve metadata, but the kernel offers
several more paths to gather that information. Just because those APIs
are available does not mean they should be used for authentication. We
mandate per-message metadata. If applications use per-connection
metadata, /proc, netlink, or random data, they're doing it wrong.
ISTM kdbus is trying to add three more interfaces, at least two of
which are also doing it wrong. (Which two is debatable.)
Post by David Herrmann
Furthermore, dbus provides pretty complete and straightforward
libraries which hide that from you. If you use glib, qt or sd-bus, you
don't even need to deal with all that.
Libraries can't hide the issue of whether:

init_my_favorite_library();
connect_to_thingy();
setresuid(nobody);

is secure. It is either secure, insecure, or ambiguous.
Unfortunately, kdbus as currently proposed is aiming for ambiguous.
Post by David Herrmann
Post by Andy Lutomirski
To top that off, the kdbus policy mechanism has truly bizarre effects
with respect to services that have unique ids and well-known names.
That, too, is apparently for compatibility.
This all feels to me like a total of about four people are going to
understand the tangle that is kdbus security, and that's bad. I think
that the kernel should insist that new security-critical mechanisms
make sense and be hard to misuse. The current design of kdbus, in
contrast, feels like it will be very hard to use correctly.
Native kdbus clients are authenticated with their credentials at time
of method call. Legacy clients will always have their credentials at
time of connect in effect. No fallbacks, no choices. It's a simple
question whether it's a legacy client or not.
Sounds simple to me.
I had the distinct impression that the kdbus-client-to-dbus1-server
proxy used kdbus clients' connection metadata. I could be wrong here.

--Andy