From 4e240f2304d9d18dadd47a8c578dd46847f7a595 Mon Sep 17 00:00:00 2001 From: zareck <121526696+cassiozareck@users.noreply.github.com> Date: Fri, 8 Sep 2023 02:22:43 -0300 Subject: [PATCH] Expanding documentation in queue.go (#26889) A set of terminology, along with a broader description, can help more people engage with the Gitea queue system, providing insights and ensuring its correct use. --------- Co-authored-by: wxiaoguang --- modules/queue/queue.go | 61 +++++++++++++++++++++++++++++++++--------- 1 file changed, 48 insertions(+), 13 deletions(-) diff --git a/modules/queue/queue.go b/modules/queue/queue.go index 0ab8dd4ae4..1ad7320e31 100644 --- a/modules/queue/queue.go +++ b/modules/queue/queue.go @@ -1,27 +1,62 @@ // Copyright 2023 The Gitea Authors. All rights reserved. // SPDX-License-Identifier: MIT -// Package queue implements a specialized queue system for Gitea. +// Package queue implements a specialized concurrent queue system for Gitea. // -// There are two major kinds of concepts: +// Terminology: // -// * The "base queue": channel, level, redis: -// - They have the same abstraction, the same interface, and they are tested by the same testing code. -// - The dummy(immediate) queue is special, it's not a real queue, it's only used as a no-op queue or a testing queue. +// 1. Item: +// - An item can be a simple value, such as an integer, or a more complex structure that has multiple fields. +// Usually a item serves as a task or a message. Sets of items will be sent to a queue handler to be processed. +// - It's represented as a JSON-marshaled binary slice in the queue // -// * The WorkerPoolQueue: it uses the "base queue" to provide "worker pool" function. -// - It calls the "handler" to process the data in the base queue. -// - Its "Push" function doesn't block forever, -// it will return an error if the queue is full after the timeout. +// 2. Batch: +// - A collection of items that are grouped together for processing. Each worker receives a batch of items. +// +// 3. Worker: +// - Individual unit of execution designed to process items from the queue. It's a goroutine that calls the Handler. +// - Workers will get new items through a channel (WorkerPoolQueue is responsible for the distribution). +// - Workers operate in parallel. The default value of max workers is determined by the setting system. +// +// 4. Handler (represented by HandlerFuncT type): +// - It's the function responsible for processing items. Each active worker will call it. +// - If an item or some items are not psuccessfully rocessed, the handler could return them as "unhandled items". +// In such scenarios, the queue system ensures these unhandled items are returned to the base queue after a brief delay. +// This mechanism is particularly beneficial in cases where the processing entity (like a document indexer) is +// temporarily unavailable. It ensures that no item is skipped or lost due to transient failures in the processing +// mechanism. +// +// 5. Base queue: +// - Represents the underlying storage mechanism for the queue. There are several implementations: +// - Channel: Uses Go's native channel constructs to manage the queue, suitable for in-memory queuing. +// - LevelDB: Especially useful in persistent queues for single instances. +// - Redis: Suitable for clusters, where we may have multiple nodes. +// - Dummy: This is special, it's not a real queue, it's a immediate no-op queue, which is useful for tests. +// - They all have the same abstraction, the same interface, and they are tested by the same testing code. +// +// 6. WorkerPoolQueue: +// - It's responsible to glue all together, using the "base queue" to provide "worker pool" functionality. It creates +// new workers if needed and can flush the queue, running all the items synchronously till it finishes. +// - Its "Push" function doesn't block forever, it will return an error if the queue is full after the timeout. +// +// 7. Manager: +// - The purpose of it is to serve as a centralized manager for multiple WorkerPoolQueue instances. Whenever we want +// to create a new queue, flush, or get a specific queue, we could use it. // // A queue can be "simple" or "unique". A unique queue will try to avoid duplicate items. // Unique queue's "Has" function can be used to check whether an item is already in the queue, -// although it's not 100% reliable due to there is no proper transaction support. +// although it's not 100% reliable due to the lack of proper transaction support. // Simple queue's "Has" function always returns "has=false". // -// The HandlerFuncT function is called by the WorkerPoolQueue to process the data in the base queue. -// If the handler returns "unhandled" items, they will be re-queued to the base queue after a slight delay, -// in case the item processor (eg: document indexer) is not available. +// A WorkerPoolQueue is a generic struct; this means it will work with any type but just for that type. +// If you want another kind of items to run, you would have to call the manager to create a new WorkerPoolQueue for you +// with a different handler that works with this new type of item. As an example of this: +// +// func Init() error { +// itemQueue = queue.CreateSimpleQueue(graceful.GetManager().ShutdownContext(), "queue-name", handler) +// ... +// } +// func handler(items ...*mypkg.QueueItem) []*mypkg.QueueItem { ... } package queue import "code.gitea.io/gitea/modules/util"