da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * CDDL HEADER START
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * The contents of this file are subject to the terms of the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Common Development and Distribution License (the "License").
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * You may not use this file except in compliance with the License.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * See the License for the specific language governing permissions
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * and limitations under the License.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * When distributing Covered Code, include this CDDL HEADER in each
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * If applicable, add the following below this CDDL HEADER, with the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * fields enclosed by brackets "[]" replaced with your own identifying
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * information: Portions Copyright [yyyy] [name of copyright owner]
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * CDDL HEADER END
0dc2366f7b9f9f36e10909b1e95edbf2a261c2acVenugopal Iyer * Copyright 2010 Sun Microsystems, Inc. All rights reserved.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Use is subject to license terms.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * General Soft rings - Simulating Rx rings in S/W.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Soft ring is a data abstraction containing a queue and a worker
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * thread and represents a hardware Rx ring in software. Each soft
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * ring set can have a collection of soft rings for separating
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * L3/L4 specific traffic (IPv4 from IPv6 or TCP from UDP) or for
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * allowing a higher degree of parallelism by sending traffic to
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * one of the soft rings for an SRS (using a hash on src IP or port).
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Each soft ring worker thread can be bound to a different CPU
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * allowing the processing for each soft ring to happen in parallel
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * and independently of the others.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Protocol soft rings:
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Each SRS has at a minimum 3 soft rings, one each for IPv4 TCP,
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * IPv4 UDP and rest (OTH - for IPv6 and everything else). The
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * SRS does dynamic polling and enforces link level bandwidth but
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * it does so for all traffic (IPv4 and IPv6 and all protocols) on
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * that link. However, each protocol layer wants a different
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * behaviour. For instance IPv4 TCP has per CPU squeues which
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * enforce their own polling and flow control so IPv4 TCP traffic
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * needs to go to a separate soft ring which can be polled by the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * TCP squeue. It also allows TCP squeue to push back flow control
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * all the way to NIC hardware (if it puts its corresponding soft
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * ring in the poll mode and soft ring queue builds up, the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * shared srs_poll_pkt_cnt goes up and SRS automatically stops
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * more packets from entering the system).
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Similarly, UDP benefits from a DLS bypass and packet chaining
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * so sending it to a separate soft ring is desired. All the rest of
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * the traffic (including IPv6) is sent to the OTH soft ring. The IPv6
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * traffic currently goes through the OTH soft ring and via DLS because
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * it needs more processing to be done. Irrespective of the sap
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * (IPv4 or IPv6) or the transport, the dynamic polling, B/W enforcement,
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * cpu assignment, fanout, etc. apply to all traffic since they
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * are implemented by the SRS, which is agnostic to sap or transport.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Fanout soft rings:
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * On a multithreaded system, we can assign more CPUs and multithread
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * the stack by creating a soft ring per CPU and spreading traffic
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * based on a hash computed on src IP etc. Since we still need to
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * keep the protocol separation, we create a set of 3 soft rings per
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * CPU (specified by cpu list or degree of fanout).
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * NOTE: See the block level comment on top of mac_sched.c
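The protocol separation and fanout described above can be sketched as a small user-space C fragment. This is only an illustration under stated assumptions: `struct pkt_info`, `sr_classify`, `sr_hash`, and `sr_select` are hypothetical names that do not exist in the MAC layer, the hash is an arbitrary mixer rather than the one the kernel uses, and the flat `[fanout slot][protocol]` layout is chosen here purely for clarity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol classes mirroring the TCP / UDP / OTH split. */
enum sr_proto { SR_TCP = 0, SR_UDP = 1, SR_OTH = 2, SR_NPROTO = 3 };

/* Hypothetical, simplified packet descriptor (not the kernel's mblk_t). */
struct pkt_info {
	int		is_ipv4;	/* nonzero for IPv4 */
	uint8_t		ip_proto;	/* 6 = TCP, 17 = UDP */
	uint32_t	src_ip;
	uint16_t	src_port;
};

/* IPv4 TCP and IPv4 UDP get their own class; everything else is OTH. */
static enum sr_proto
sr_classify(const struct pkt_info *p)
{
	if (p->is_ipv4 && p->ip_proto == 6)
		return (SR_TCP);
	if (p->is_ipv4 && p->ip_proto == 17)
		return (SR_UDP);
	return (SR_OTH);
}

/* Illustrative integer mixer; an assumption, not the MAC layer's hash. */
static uint32_t
sr_hash(const struct pkt_info *p)
{
	uint32_t h = p->src_ip ^ ((uint32_t)p->src_port << 16) ^ p->src_port;

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return (h);
}

/*
 * Pick an index into a flat array of soft rings laid out as
 * [fanout slot][protocol], i.e. 3 soft rings per fanout slot.
 */
static size_t
sr_select(const struct pkt_info *p, size_t fanout)
{
	size_t slot = (fanout > 1) ? sr_hash(p) % fanout : 0;

	return (slot * SR_NPROTO + (size_t)sr_classify(p));
}
```

With a fanout of 1 the index is simply the protocol class (0, 1, or 2); with a larger fanout, packets from the same source always land in the same slot, so per-flow ordering is kept while different flows spread across CPUs.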
da14cebe459d3275048785f25bd869cb09b5307fEric Chengstatic void mac_rx_soft_ring_drain(mac_soft_ring_t *);
da14cebe459d3275048785f25bd869cb09b5307fEric Chengstatic void mac_soft_ring_fire(void *);
da14cebe459d3275048785f25bd869cb09b5307fEric Chengstatic void mac_soft_ring_worker(mac_soft_ring_t *);
da14cebe459d3275048785f25bd869cb09b5307fEric Chengstatic void mac_tx_soft_ring_drain(mac_soft_ring_t *);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng#define ADD_SOFTRING_TO_SET(mac_srs, softring) { \
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng /* ADD to the list */ \
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_srs->srs_soft_ring_tail->s_ring_next = softring; \
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_worker_wakeup
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Wake up the soft ring worker thread to process the queue as long
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * as no one else is processing it and upper layer (client) is still
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * ready to receive packets.
da14cebe459d3275048785f25bd869cb09b5307fEric Chengmac_soft_ring_worker_wakeup(mac_soft_ring_t *ringp)
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng /* Schedule the worker thread. */
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_create
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Create a soft ring, do the necessary setup and bind the worker
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * thread to the assigned CPU.
0dc2366f7b9f9f36e10909b1e95edbf2a261c2acVenugopal Iyermac_soft_ring_create(int id, clock_t wait, uint16_t type,
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng pri_t pri, mac_client_impl_t *mcip, mac_soft_ring_set_t *mac_srs,
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng processorid_t cpuid, mac_direct_rx_t rx_func, void *x_arg1,
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ringp = kmem_cache_alloc(mac_soft_ring_cache, KM_SLEEP);
08ac1c49adeb1b81324fa4e70a922581ad7ec309Nicolas Droux "mac_tcp_soft_ring_%d_%p", id, (void *)mac_srs);
08ac1c49adeb1b81324fa4e70a922581ad7ec309Nicolas Droux "mac_udp_soft_ring_%d_%p", id, (void *)mac_srs);
08ac1c49adeb1b81324fa4e70a922581ad7ec309Nicolas Droux "mac_oth_soft_ring_%d_%p", id, (void *)mac_srs);
0dc2366f7b9f9f36e10909b1e95edbf2a261c2acVenugopal Iyer "mac_tx_soft_ring_%d_%p", id, (void *)mac_srs);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng (void) strncpy(ringp->s_ring_name, name, S_RING_NAMELEN + 1);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mutex_init(&ringp->s_ring_lock, NULL, MUTEX_DEFAULT, NULL);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ringp->s_ring_notify_cb_info.mcbi_lockp = &ringp->s_ring_lock;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Protect against access from DR callbacks (mac_walk_srs_bind/unbind)
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * which can't grab the mac perimeter
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * set the bind CPU to -1 to indicate
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * no thread affinity set
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ringp->s_ring_cpuid = ringp->s_ring_cpuid_save = -1;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_soft_ring_worker, ringp, 0, &p0, TS_RUN, pri);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ringp->s_ring_drain_func = mac_tx_soft_ring_drain;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ringp->s_ring_tx_max_q_cnt = mac_tx_soft_ring_max_q_cnt;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng (mac_tx_soft_ring_hiwat > mac_tx_soft_ring_max_q_cnt) ?
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_tx_soft_ring_max_q_cnt : mac_tx_soft_ring_hiwat;
0dc2366f7b9f9f36e10909b1e95edbf2a261c2acVenugopal Iyer tx->st_soft_rings[((mac_ring_t *)x_arg2)->mr_index] =
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ringp->s_ring_drain_func = mac_rx_soft_ring_drain;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_free
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Free the soft ring once we are done with it.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng (S_RING_CONDEMNED | S_RING_CONDEMNED_DONE | S_RING_PROC)) ==
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_pkt_drop(NULL, NULL, softring->s_ring_first, B_FALSE);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_callback_free(softring->s_ring_notify_cb_list);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_bind
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Bind a soft ring worker thread to supplied CPU.
da14cebe459d3275048785f25bd869cb09b5307fEric Chengmac_soft_ring_bind(mac_soft_ring_t *ringp, processorid_t cpuid)
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng DTRACE_PROBE2(mac__soft__ring__cpu__bound, mac_soft_ring_t *,
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng thread_affinity_set(ringp->s_ring_worker, cpuid);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_unbind
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Unbind a soft ring worker thread.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * PRIVATE FUNCTIONS
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_rx_soft_ring_drain
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Called when the worker thread model (ST_RING_WORKER_ONLY) of processing
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * incoming packets is used. s_ring_first contains the queued packets.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * s_ring_rx_func contains the upper level (client) routine where the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * packets are destined and s_ring_rx_arg1/s_ring_rx_arg2 are the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * cookies meant for the client.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng/* ARGSUSED */
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_soft_ring_set_t *mac_srs = ringp->s_ring_set;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * If we have a soft ring set which is doing
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * bandwidth control, we need to decrement its
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * srs_size so it can have an accurate idea of
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * what is the real data queued between SRS and
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * its soft rings. We decrement the size for a
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * packet only when it gets processed by both
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * SRS and the soft ring.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_worker
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * The soft ring worker routine to process any queued packets. In
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * the normal case, the worker thread is bound to a CPU. If the soft
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * ring is dealing with TCP packets, then the worker thread will
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * be bound to the same CPU as the TCP squeue.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng CALLB_CPR_INIT(&cprinfo, lock, callb_generic_cpr, "mac_soft_ring");
efe28d82661ce6701204798fb838fd29c6348931Rajagopal Kunhappan (ringp->s_ring_state & (S_RING_BLOCK|S_RING_BLANK))) &&
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Either we have work to do, or we have been asked to
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * shut down temporarily or permanently
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng cv_wait(&ringp->s_ring_async, &ringp->s_ring_lock);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ASSERT(!(ringp->s_ring_state & S_RING_CONDEMNED));
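The wait-then-drain pattern of the worker routine above can be approximated in user space with POSIX threads. This is a sketch under assumptions, not the kernel implementation: `struct soft_ring` and the `soft_ring_*` functions below are hypothetical stand-ins, `pthread_cond_wait` plays the role of `cv_wait`, and the `blocked` and `condemned` booleans only loosely model the S_RING_BLOCK and S_RING_CONDEMNED state bits.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a soft ring. */
struct soft_ring {
	pthread_mutex_t	lock;
	pthread_cond_t	async;		/* signalled on new work or shutdown */
	int		queued;		/* packets waiting to be processed */
	int		processed;	/* packets drained by the worker */
	bool		blocked;	/* client is not ready to receive */
	bool		condemned;	/* permanent shutdown requested */
};

static void
soft_ring_init(struct soft_ring *r)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->async, NULL);
	r->queued = 0;
	r->processed = 0;
	r->blocked = false;
	r->condemned = false;
}

/* Worker: sleep until there is deliverable work or a shutdown request. */
static void *
soft_ring_worker(void *arg)
{
	struct soft_ring *r = arg;

	pthread_mutex_lock(&r->lock);
	for (;;) {
		while ((r->queued == 0 || r->blocked) && !r->condemned)
			pthread_cond_wait(&r->async, &r->lock);
		/* Drain whatever is deliverable before honouring shutdown. */
		if (r->queued > 0 && !r->blocked) {
			r->processed += r->queued;
			r->queued = 0;
		}
		if (r->condemned)
			break;
	}
	pthread_mutex_unlock(&r->lock);
	return (NULL);
}

/* Producer side: queue packets and wake the worker. */
static void
soft_ring_enqueue(struct soft_ring *r, int npkts)
{
	pthread_mutex_lock(&r->lock);
	r->queued += npkts;
	pthread_cond_signal(&r->async);
	pthread_mutex_unlock(&r->lock);
}

/* Ask the worker to exit once it has drained the queue. */
static void
soft_ring_condemn(struct soft_ring *r)
{
	pthread_mutex_lock(&r->lock);
	r->condemned = true;
	pthread_cond_signal(&r->async);
	pthread_mutex_unlock(&r->lock);
}
```

The predicate is rechecked in a loop around the wait, so a spurious wakeup or a signal that arrives while the client is still blocked simply puts the worker back to sleep.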
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_intr_enable and mac_soft_ring_intr_disable
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * These functions are called to toggle the sending of packets to the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * client. They are called by the client. The client gets the names
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * of these routines and the corresponding cookie (pointing to the soft ring)
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * during capability negotiation at setup time.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Enabling allows the processing thread to send packets to the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * client while disabling does the opposite.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Stop worker thread from sending packets above.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Squeue will poll soft ring when it needs packets.
efe28d82661ce6701204798fb838fd29c6348931Rajagopal Kunhappan if (!(ringp->s_ring_state & S_RING_PROC)) {
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_poll
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * This routine is called by the client to poll for packets from
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * the soft ring. The function name and cookie corresponding to
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * the soft ring are exchanged during capability negotiation during
da14cebe459d3275048785f25bd869cb09b5307fEric Chengmac_soft_ring_poll(mac_soft_ring_t *ringp, int bytes_to_pickup)
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_soft_ring_set_t *mac_srs = ringp->s_ring_set;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Update the shared count and size counters so
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * that SRS has an accurate idea of queued packets.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_dls_bypass
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Enable direct client (IP) callback function from the softrings.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Callers need to make sure they don't need any DLS layer processing
da14cebe459d3275048785f25bd869cb09b5307fEric Chengmac_soft_ring_dls_bypass(void *arg, mac_direct_rx_t rx_func, void *rx_arg1)
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_soft_ring_signal
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Typically used to set the soft ring state to QUIESCE, CONDEMNED, or
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * In the Rx side, the quiescing is done bottom up. After the Rx upcalls
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * from the driver are done, then the Rx SRS is quiesced and only then can
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * we signal the soft rings. Thus this function can't be called arbitrarily
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * without satisfying the prerequisites. On the Tx side, the threads from
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * top need to be quiesced, then the Tx SRS and only then can we signal the
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * Tx soft rings.
da14cebe459d3275048785f25bd869cb09b5307fEric Chengmac_soft_ring_signal(mac_soft_ring_t *softring, uint_t sr_flag)
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * mac_tx_soft_ring_drain
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * The transmit side drain routine in case the soft ring was being
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * used to transmit packets.
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng mac_soft_ring_set_t *mac_srs = ringp->s_ring_set;
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng /* Device out of tx desc, set block */
0dc2366f7b9f9f36e10909b1e95edbf2a261c2acVenugopal Iyer ringp->s_ring_size += (saved_size - stats.mts_obytes);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng if (ringp->s_ring_count == 0 && ringp->s_ring_state &
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng (S_RING_TX_HIWAT | S_RING_WAKEUP_CLIENT | S_RING_ENQUEUED)) {
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng ~(S_RING_TX_HIWAT | S_RING_WAKEUP_CLIENT | S_RING_ENQUEUED);
0dc2366f7b9f9f36e10909b1e95edbf2a261c2acVenugopal Iyer mac_tx_invoke_callbacks(mcip, (mac_tx_cookie_t)ringp);
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * If the client is not the primary MAC client, then we
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * need to send the notification to the client's upper
da14cebe459d3275048785f25bd869cb09b5307fEric Cheng * MAC, i.e. mci_upper_mip.