c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CDDL HEADER START
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The contents of this file are subject to the terms of the
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Common Development and Distribution License (the "License").
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * You may not use this file except in compliance with the License.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * See the License for the specific language governing permissions
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * and limitations under the License.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * When distributing Covered Code, include this CDDL HEADER in each
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If applicable, add the following below this CDDL HEADER, with the
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * fields enclosed by brackets "[]" replaced with your own identifying
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * information: Portions Copyright [yyyy] [name of copyright owner]
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CDDL HEADER END
d3d50737e566cade9a08d73d2af95105ac7cd960Rafael Vanoni * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Use is subject to license terms.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CPU Caps implementation
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * =======================
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * A CPU cap can be set on any project or any zone. Zone CPU cap limits the CPU
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * usage for all projects running inside the zone. If the zone CPU cap is set
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * below the project CPU cap, the latter will have no effect.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * When CPU usage of projects and/or zones reaches specified caps, threads in
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * them do not get scheduled and instead are placed on wait queues associated
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * with a cap. Such threads will start running again only when CPU usage drops
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * below the cap level. Each zone and each project has its own wait queue.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * When a CPU cap is set, the kernel continuously keeps track of CPU time used by
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * capped zones and/or projects over a short time interval and calculates their
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * current CPU usage as a percentage. When the accumulated usage reaches the CPU
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cap, LWPs running in the user-land (when they are not holding any critical
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * kernel locks) are placed on special wait queues until their project's or
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * zone's CPU usage drops below the cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The system maintains a list of all capped projects and all capped zones. On
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * every clock tick every active thread belonging to a capped project adds its
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CPU usage to its project. Usage from all projects belonging to a capped zone
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * is aggregated to get the zone usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * When the current CPU usage is above the cap, a project or zone is considered
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * over-capped. Every user thread caught running in an over-capped project or
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * zone is marked by setting the TS_PROJWAITQ flag in the thread's t_schedflag
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * field and is requested to surrender its CPU. This causes the scheduling-class
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * specific CL_PREEMPT() callback to be invoked. The callback function places
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * threads marked as TS_PROJWAITQ on a wait queue and calls swtch().
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Threads are only placed on wait queues after trapping from user-land
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * (they could be holding some user locks, but no kernel locks) and while
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * returning from the trap back to the user-land when no kernel locks are held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Putting threads on wait queues in random places while running in the
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * kernel might lead to all kinds of locking problems.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Accounting
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * ==========
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Accounting of CPU usage is based on per-thread micro-state accounting data.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * On every clock tick clock() adds new on-CPU time for every thread found on
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CPU. Scheduling classes also add new on-CPU time for any thread leaving CPU.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * New time means the time since the thread was last accounted for. On-CPU times greater
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * than 1 tick are truncated to 1 tick.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Project CPU usage is aggregated from all threads within the project.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Zone CPU usage is the sum of usages for all projects within the zone. Zone
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CPU usage is calculated on every clock tick by walking the list of projects and
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * adding their usage together.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CPU usage is decayed by the caps_update() routine, which is called once on
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * every clock tick. It walks lists of project caps and decays their usages by
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * one per cent. If CPU usage drops below cap levels, threads on the wait queue
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * are made runnable again, one thread per clock tick.
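The one-percent decay described above can be sketched in user-space C. This is a minimal illustration, not the kernel code: the definitions given here for CAP_DECAY_FACTOR and ROUND_SCALE are assumptions chosen to match the comments ("decay one per cent of value per tick", "round it to the closest integer value"), and decay_usage() and ticks_until_below() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: dividing by 100 yields a one percent decay per tick. */
#define	CAP_DECAY_FACTOR	100
/* Assumed definition: divide x by y, rounding to the nearest integer. */
#define	ROUND_SCALE(x, y)	(((x) + (y) / 2) / (y))

/* Hypothetical helper: usage value after one clock tick of decay. */
static int64_t
decay_usage(int64_t usage)
{
	assert(usage >= 0);
	return (usage - ROUND_SCALE(usage, CAP_DECAY_FACTOR));
}

/*
 * Hypothetical helper: number of ticks an idle project needs before its
 * usage drops below cap_value, i.e. before waiters may run again.
 */
static int
ticks_until_below(int64_t usage, int64_t cap_value)
{
	int ticks = 0;

	while (usage >= cap_value) {
		int64_t next = decay_usage(usage);

		if (next == usage)
			break;	/* the rounded decay reached zero */
		usage = next;
		ticks++;
	}
	return (ticks);
}
```

Because the decay is rounded, very small usage values (below half of the decay factor in this sketch) stop decaying, which is why the loop guards against a zero-sized step.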
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Interfaces
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * ==========
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The CPU Caps facility provides the following interfaces to the rest of the system:
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cpucaps_project_add(kproject_t *)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Notifies the framework of a new project. It should be put on the
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * capped_projects list if its zone has a cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cpucaps_project_remove(kproject_t *)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Remove the association between the specified project and its cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Called right before the project is destroyed.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cpucaps_project_set(kproject_t *, rctl_qty_t)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Set project cap of the specified project to the specified value. Setting the
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * value to NOCAP is equivalent to removing the cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cpucaps_zone_set(zone_t *, rctl_qty_t)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Set zone cap of the specified zone to the specified value. Setting the value
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * to NOCAP is equivalent to removing the cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cpucaps_zone_remove(zone_t *)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Remove the association between the zone and its cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cpucaps_charge(kthread_id_t, caps_sc_t *, cpucaps_charge_t)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Charges the specified thread's project the amount of on-CPU time that it used.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If the third argument is CPUCAPS_CHARGE_ONLY, returns False.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Otherwise returns True if the thread should be penalized because its
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * project or zone is exceeding its cap. Also sets TS_PROJWAITQ or TS_ZONEWAITQ
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * bits in t_schedflag in this case.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CPUCAPS_ENFORCE(kthread_id_t *)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Enforces CPU caps for a specified thread. Places LWPs running in LWP_USER
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * state on project or zone wait queues, as requested by TS_PROJWAITQ or
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * TS_ZONEWAITQ bits in t_schedflag. Returns True if the thread was placed on a
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * wait queue or False otherwise.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cpucaps_sc_init(caps_sc_t *)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Initializes the scheduling-class specific CPU Caps data for a thread.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * All the individual caps structures and their lists are protected by a global
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * caps_lock mutex. The lock is grabbed either by clock() or by events modifying
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * caps, so it is usually uncontended. We avoid all blocking memory allocations
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * while holding caps_lock to prevent clock() from blocking.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Thread state is protected by the thread lock. It protects the association
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * between a thread and its project and, as a consequence, its zone. The
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * association cannot break while the thread lock is held, so the project or zone
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cap is not going to disappear while the thread lock is held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The cap usage field is protected by the high-PIL spin lock cap_usagelock. It is
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * grabbed by scheduling classes already holding the thread lock at high PIL and by
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * the clock thread performing usage decay. We should do as little work as possible
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * while holding the lock since it may be very hot. All threads in the project
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * contend for the same cache line doing cap usage updates.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * caps_lock protects list of capped projects and zones, changes in the cap
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * state and changes of the global cpucaps_enabled flag.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Changing zone caps also sets cpucaps_busy to avoid races when a zone cap is
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * modified in parallel. This could be a per-zone cap flag, but we don't keep any
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cap state for now.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolbstatic list_t capped_zones; /* - list of zones with caps */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolbstatic list_t capped_projects; /* - list of projects with caps */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolbboolean_t cpucaps_enabled; /* - are there any caps defined? */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The accounting is based on the number of nanoseconds threads spend running
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * during a tick which is kept in the cap_tick_cost variable.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * How much of the usage value is decayed every clock tick
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Decay one per cent of value per tick
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Scale the value and round it to the closest integer value
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolbstatic void caps_update();
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CAP kstats.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Initialize CPU caps infrastructure.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * - Initialize lists of capped zones and capped projects
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * - Set cpucaps_clock_callout to NULL
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Initialize global variables
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Initialize scheduling-class specific CPU Caps data.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Allocate and initialize cpucap structure
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb cpucap_t *cap = kmem_zalloc(sizeof (cpucap_t), KM_SLEEP);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Free cpucap structure
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * This cap should not be active
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Activate cap - insert into active list and unblock its
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * wait queue. Should be called with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The cap_value field is set to the value supplied.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Cap can not be already enabled
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Deactivate cap
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * - Block its wait queue. This prevents any new threads from being
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * enqueued there and moves all enqueued threads to the run queue.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * - Remove cap from list l.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * - Disable CPU caps globally if there are no capped projects or zones
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Should be called with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Cap should be currently active
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if (list_is_empty(&capped_projects) && list_is_empty(&capped_zones)) {
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Enable cap for a project kpj
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * It is safe to enable already enabled project cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Should be called with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Create cap kstats
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if ((cap->cap_kstat = rctl_kstat_create_project(kpj, "cpucaps",
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Disable project cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * It is safe to disable already disabled project cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Should be called with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Enable cap for a zone
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * It is safe to enable already enabled zone cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Should be called with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Create cap kstats
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if ((cap->cap_kstat = rctl_kstat_create_zone(zone, "cpucaps",
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Disable zone cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * It is safe to disable already disabled zone cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Should be called with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Apply specified callback to all caps contained in the list `l'.
d3d50737e566cade9a08d73d2af95105ac7cd960Rafael Vanonicap_walk(list_t *l, void (*cb)(cpucap_t *, int64_t))
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb for (cap = list_head(l); cap != NULL; cap = list_next(l, cap)) {
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If cap limit is not reached, make one thread from wait queue runnable.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The waitq_isempty check is performed without the waitq lock. If a new thread
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * is placed on the waitq right after the check, it will be picked up during the
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * next invocation of cap_poke_waitq().
d3d50737e566cade9a08d73d2af95105ac7cd960Rafael Vanoni/* ARGSUSED */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The callback function called for every cap on capped_projects list.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Decay cap usage by CAP_DECAY_FACTOR
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Add this cap project usage to its zone usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Kick off a thread from the cap waitq if cap is not reached.
d3d50737e566cade9a08d73d2af95105ac7cd960Rafael Vanonicap_project_usage_walker(cpucap_t *cap, int64_t gen)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Set or clear the CAP_REACHED flag based on the current usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Only projects having their own caps are ever marked as CAP_REACHED.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Add project's CPU usage to our zone's CPU usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If we haven't reset this zone's usage during this clock tick
d3d50737e566cade9a08d73d2af95105ac7cd960Rafael Vanoni * yet, then do it now. The cap_gen field is used to check
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * whether this is the first zone's project we see during this
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * tick or a subsequent one.
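The generation check described above can be illustrated with a small sketch. It assumes a simplified cap_t in which a project cap points directly at its zone cap; the field layout and the helper name add_project_to_zone() are assumptions for illustration, and the overflow check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a cap; only the fields this sketch needs. */
typedef struct cap {
	int64_t		cap_usage;	/* accumulated usage */
	int64_t		cap_gen;	/* tick generation of the last reset */
	struct cap	*cap_zone;	/* owning zone cap, or NULL */
} cap_t;

/*
 * Add a project's usage to its zone. The zone total is reset the first
 * time one of its projects is seen during the tick identified by gen.
 */
static void
add_project_to_zone(cap_t *proj, int64_t gen)
{
	cap_t *zone;

	assert(proj != NULL);
	zone = proj->cap_zone;
	if (zone == NULL)
		return;
	if (zone->cap_gen != gen) {
		/* First project of this zone during this tick. */
		zone->cap_usage = 0;
		zone->cap_gen = gen;
	}
	zone->cap_usage += proj->cap_usage;
}
```

With this scheme a single walk over the project list both resets and rebuilds every zone total, so no separate pass over the zone list is needed to zero the counters.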
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Check for overflows */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Decay project usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb cap->cap_usage -= ROUND_SCALE(cap_usage, CAP_DECAY_FACTOR);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * On every clock tick walk the list of project caps and update the CPU usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Also walk the list of zone caps checking whether any threads should
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * transition from wait queue to run queue.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * This function gets called by the clock thread directly when there are any
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * defined caps. The only lock that it grabs is caps_lock. Nothing else grabs
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * caps_lock for long periods of time, so there should be almost no contention
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The function is called for each project in a zone when the zone cap is
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * modified. It enables project caps if zone cap is enabled and disables if the
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * zone cap is disabled and project doesn't have its own cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * For each project that does not have a cpucap structure allocated, it allocates
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * a new structure and assigns it to kpj->kpj_cpucap. The allocation is performed
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * without holding caps_lock to avoid using KM_SLEEP allocation with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolbcap_project_zone_modify_walker(kproject_t *kpj, void *arg)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * This is the first time any cap was established for this
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * project. Allocate a new cpucap structure for it.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Double-check that kpj_cpucap is still NULL - now with caps_lock held
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * and assign the newly allocated cpucap structure to it.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Remove all projects in this zone without caps
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * from the capped_projects list.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Add the project to capped_projects list.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Set zone cap to cap_val
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If cap_val is equal to NOCAP, disable zone cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If this is the first time a cap is set on a zone, allocate cpucap structure
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * without holding caps_lock to avoid KM_SLEEP allocation with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Nothing to do if trying to disable a cap on a zone when caps are off
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * or a zone which does not have a cap yet.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if ((CPUCAPS_OFF() || !ZONE_IS_CAPPED(zone)) && (cap_val == NOCAP))
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Double-check whether zone->zone_cpucap is NULL, now with caps_lock
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * held. If it is still NULL, assign a newly allocated cpucap to it.
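The allocate-outside-the-lock, double-check-inside pattern described above can be sketched in user space. This is an analogy under stated assumptions, not the kernel code: a pthread mutex stands in for caps_lock, calloc() stands in for kmem_zalloc(..., KM_SLEEP) (and, unlike KM_SLEEP, can fail), the structures are reduced to one field each, and zone_cap_ensure() is a hypothetical helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Reduced stand-ins for the kernel structures. */
typedef struct cpucap {
	long	cap_value;
} cpucap_t;

typedef struct zone {
	cpucap_t	*zone_cpucap;
} zone_t;

/* User-space stand-in for the global caps_lock. */
static pthread_mutex_t caps_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper. Returns 1 if this call installed a new cap
 * structure, 0 if one was already present, -1 if allocation failed.
 */
static int
zone_cap_ensure(zone_t *zone)
{
	cpucap_t *cap = NULL;
	int installed = 0;

	assert(zone != NULL);

	/* Unlocked peek: allocate only if it looks necessary. */
	if (zone->zone_cpucap == NULL) {
		cap = calloc(1, sizeof (cpucap_t));
		if (cap == NULL)
			return (-1);
	}

	pthread_mutex_lock(&caps_lock);
	/* Double-check with the lock held before publishing. */
	if (cap != NULL && zone->zone_cpucap == NULL) {
		zone->zone_cpucap = cap;
		cap = NULL;
		installed = 1;
	}
	pthread_mutex_unlock(&caps_lock);

	/* If another caller won the race, drop the spare allocation. */
	free(cap);
	return (installed);
}
```

The cost of this pattern is an occasional wasted allocation when two callers race; the benefit is that the lock holder never sleeps in the allocator.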
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Nothing to do if the value is staying the same */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Clear cap statistics since the cap value itself changes.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Remove cap for the zone
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Disable caps for all projects belonging to this zone
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * unless they have their own cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Set a cap on a zone which previously was not capped.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Enable cap for all projects belonging to this zone.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * No state transitions, just change the value
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The project is going away so disable its cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The zone is going away, so disable its cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * New project was created. It should be put on the capped_projects list if
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * its zone has a cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * This project was never capped before, so allocate its cap structure.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Double-check with caps_lock held
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Set project cap to cap_val
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If cap_val is equal to NOCAP, disable project cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * If this is the first time a cap is set on a project, allocate cpucap
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * structure without holding caps_lock to avoid KM_SLEEP allocation with
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolbcpucaps_project_set(kproject_t *kpj, rctl_qty_t cap_val)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Nothing to do if trying to disable project cap and caps are not
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * enabled or if trying to disable cap on a project that does not have
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cap enabled.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if ((cap_val == NOCAP) && (CPUCAPS_OFF() || !PROJECT_IS_CAPPED(kpj)))
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * This project was never capped before, so allocate its cap
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * structure.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Double-check with caps_lock held.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Get the actual pointer to the project cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Nothing to do if the value is not changing
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Clear cap statistics since the cap value itself changes.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Enable this cap if it is not already enabled.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * User requested to drop a cap on the project. If it is part of
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * capped zone, keep the cap and set the value to MAX_USAGE,
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * otherwise disable the cap.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Get cap usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (cap != NULL ? (rctl_qty_t)(cap->cap_usage / cap_tick_cost) : 0);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Get current project usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Get current zone usage.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Charge the project of thread t for the time thread t spent on CPU since its
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * usage was previously adjusted.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Record the current on-CPU time in the csc structure.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Do not adjust for more than one tick worth of time.
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * It is possible that the project cap is being disabled while this routine is
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * executed. This should not cause any issues since the association between the
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * thread and its project is protected by thread lock.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Get on-CPU time since birth of a thread */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Time spent on CPU since last checked */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Save the accumulated on-CPU time */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Charge at most one tick worth of on-CPU time */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Add usage_delta to the project usage value. */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Check for overflows */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * cap_maxusage is only kept for observability. Move it outside
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * the lock to reduce the time spent while holding the lock.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Charge thread's project and return True if project or zone should be
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * penalized because its project or zone is exceeding its cap. Also sets
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * TS_PROJWAITQ or TS_ZONEWAITQ in this case.
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * It is possible that the project cap is being disabled while this routine is
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * executed. This should not cause any issues since the association between the
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * thread and its project is protected by thread lock. It will still set
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * TS_PROJWAITQ/TS_ZONEWAITQ in this case but cpucaps_enforce will not place
4b175f6f2f7e98f7eb66bb44971069520bf5a52aakolb * anything on the blocked wait queue.
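The decision described above (return True and flag the thread when its project or zone is over the cap) can be sketched like this. Everything here is a simplified assumption: the flag bit values are invented, CPUCAPS_CHARGE_ENFORCE is an assumed name for the enforcing mode, checking the project before the zone is an assumed ordering, and should_penalize() is a hypothetical helper that omits the actual charging step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones come from the thread header. */
#define	TS_PROJWAITQ	0x1u
#define	TS_ZONEWAITQ	0x2u

typedef enum {
	CPUCAPS_CHARGE_ENFORCE,	/* assumed name for the enforcing mode */
	CPUCAPS_CHARGE_ONLY
} cpucaps_charge_t;

/* Reduced stand-in for a cap: current usage and limit. */
typedef struct cap {
	int64_t	cap_usage;
	int64_t	cap_value;
} cap_t;

/*
 * Return 1 and set the matching wait-queue flag if the project cap or,
 * failing that, the zone cap has been reached. Return 0 otherwise, and
 * always return 0 when the caller asked for charging only.
 */
static int
should_penalize(unsigned int *schedflag, const cap_t *proj,
    const cap_t *zone, cpucaps_charge_t charge_type)
{
	assert(schedflag != NULL && proj != NULL);

	if (charge_type == CPUCAPS_CHARGE_ONLY)
		return (0);
	if (proj->cap_usage >= proj->cap_value) {
		*schedflag |= TS_PROJWAITQ;
		return (1);
	}
	if (zone != NULL && zone->cap_usage >= zone->cap_value) {
		*schedflag |= TS_ZONEWAITQ;
		return (1);
	}
	return (0);
}
```

Setting only one of the two flags per call keeps the later enforcement step simple: the thread is queued on exactly one wait queue.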
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolbcpucaps_charge(kthread_id_t t, caps_sc_t *csc, cpucaps_charge_t charge_type)
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb /* Nothing to do for projects that are not capped. */
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The caller only requested to charge the project usage, no enforcement
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if (project_cap->cap_usage >= project_cap->cap_value) {
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Enforce CPU caps. If the thread was preempted in user-land, we know that it does
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * not hold any kernel locks, so enqueue it on the waitq, if needed.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * CPU Caps are only enforced for user threads.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Threads flagged with TS_PROJWAITQ are placed on their project wait queues and
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * threads marked with TS_ZONEWAITQ are placed on their zone wait queue.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * It is possible that by the time we enter cpucaps_enforce() the cap is already
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * disabled. In this case waitq_enqueue() fails and doesn't enqueue anything. We
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * still clear TS_PROJWAITQ/TS_ZONEWAITQ flags in this case since they no longer
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if (waitq_enqueue(&(ttoproj(t)->kpj_cpucap->cap_waitq),
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb if (waitq_enqueue(&(ttozone(t)->zone_cpucap->cap_waitq),
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * The thread is not enqueued on the wait queue.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb * Convert internal cap statistics into values exported by cap kstat.
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb capsp->cap_nwait.value.ui64 = cap->cap_waitq.wq_count;
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb capsp->cap_below.value.ui64 = ROUND_SCALE(cap->cap_below, tick_sec);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb capsp->cap_above.value.ui64 = ROUND_SCALE(cap->cap_above, tick_sec);
c97ad5cdc75eb73e3cc38542ca3ba783574b0a7aakolb return (0);