@ChristopherHX ChristopherHX commented Nov 14, 2025

  • Reduces DB writes for the runner's online and idle timestamps
    • Previously, an inactive runner updated the DB on every request
    • Now half of the online timeout (e.g. 30s+) must elapse before the online time is updated again
  • Reduces FetchTask memory usage by no longer loading the individual workflow files of all jobs
    • Only requests runsOn and id for all possible job candidates
  • Creates an almost empty task entry when acquiring a job, to avoid additional DB pressure when many runners pick in parallel
    • Updates the task with all fields once the runner's transaction owns the job
  • Uses paging
    • Adds jitter delays during fetching once a second job page has to be examined
      • This spreads out runners that may be calling FetchTask at nearly the same time
    • Ignores jobs that were already picked while our transaction was running, instead of reporting that no job was found
    • Workflows with many jobs and runners should get their pending work to runners without extra delays
    • Does not load all waiting jobs of the runner scope into a slice of unbounded size
  • Improves on my initial draft by using fewer resources, without requiring the runner to retry or letting many transactions fail

Fixes #33492


  • Evaluation test pending
  • Further work would store lastUpdated of the last checked job inside ActionRunner

Feedback is welcome; this PR may also need to be linked to another existing issue.

@ChristopherHX ChristopherHX added type/enhancement An improvement of existing functionality topic/gitea-actions related to the actions of Gitea backport/v1.25 labels Nov 14, 2025
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Nov 14, 2025
@github-actions github-actions bot added modifies/api This PR adds API routes or modifies them modifies/go Pull requests that update Go code labels Nov 14, 2025
@ChristopherHX ChristopherHX marked this pull request as draft November 14, 2025 16:47

workflowJob, err := job.ParseJob()
if err != nil {
	return nil, false, fmt.Errorf("load job %d: %w", job.ID, err)

This ends in a fetch loop. Should such a job be marked as failed instead?

ChristopherHX commented Nov 14, 2025

My conclusion: with this change, this test amounts to a denial of service against SQLite. Not a single job reports anything.

// Tests if many runners can pick up jobs in parallel without conflicts
func TestActionsFetchTaskLoad(t *testing.T) {
	onGiteaRun(t, func(t *testing.T, u *url.URL) {
		user2 := unittest.AssertExistsAndLoadBean(t, &user_model.User{ID: 2})
		user2Session := loginUser(t, user2.Name)
		user2Token := getTokenForLoggedInUser(t, user2Session, auth_model.AccessTokenScopeWriteRepository, auth_model.AccessTokenScopeWriteUser)

		apiBaseRepo := createActionsTestRepo(t, user2Token, "actions-gitea-context", false)
		baseRepo := unittest.AssertExistsAndLoadBean(t, &repo_model.Repository{ID: apiBaseRepo.ID})

		// init the workflow
		wfTreePath := ".gitea/workflows/pull.yml"
		wfFileContent := `name: Pull Request
on: push
jobs:
  wf1-job:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        load: [1,2,3,4,5,6,7,8,9,10]
        load2: [1,2,3,4,5,6,7,8,9,10]
        load3: [1,2,3,4,5]
    
    steps:
      - run: echo 'test the pull'
`
		opts := getWorkflowCreateFileOptions(user2, baseRepo.DefaultBranch, "create "+wfTreePath, wfFileContent)
		createWorkflowFile(t, user2Token, baseRepo.OwnerName, baseRepo.Name, wfTreePath, opts)

		// register one mock runner that all workers share to fetch and execute tasks
		multiRunner := newMockRunner()
		multiRunner.registerAsRepoRunner(t, baseRepo.OwnerName, baseRepo.Name, "100x parallel fetch", []string{"ubuntu-latest"}, false)

		var wgroup sync.WaitGroup
		for i := 0; i < 25*5; i++ {
			wgroup.Add(1)
			go t.Run(fmt.Sprintf("worker-%d", i), func(t *testing.T) {
				defer wgroup.Done()

				tasksToRun := []*runnerv1.Task{multiRunner.fetchTask(t), multiRunner.fetchTask(t), multiRunner.fetchTask(t), multiRunner.fetchTask(t)}

				for _, task := range tasksToRun {
					multiRunner.execTask(t, task, &mockTaskOutcome{
						result: runnerv1.Result_RESULT_SUCCESS,
					})
				}
			})
		}
		wgroup.Wait()
	})
}

EDIT 16 Nov 2025

Updating the SQLite database took 30 minutes for 25 parallel runners that do not upload logs. This test may need to be run against other databases, which might cope better. It also needs to be run against the default branch.

MySQL and MariaDB also support locking the rows of pending jobs, which might help optimize this. Otherwise, Gitea Actions should apply coordinated rate limiting between Gitea and the runner to maintain responsiveness.

2025/11/16 00:49:30 HTTPRequest [I] router: completed POST /api/actions/runner.v1.RunnerService/UpdateTask for 127.0.0.1:60150, 200 OK in 29778.7ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)

This test took more than 10 minutes on an M4 Pro MacBook:
// Tests if many runners can pick up jobs in parallel without conflicts
func TestActionsFetchTaskLoad(t *testing.T) {
	onGiteaRun(t, func(t *testing.T, u *url.URL) {
		user2 := unittest.AssertExistsAndLoadBean(t, &user_model.User{ID: 2})
		user2Session := loginUser(t, user2.Name)
		user2Token := getTokenForLoggedInUser(t, user2Session, auth_model.AccessTokenScopeWriteRepository, auth_model.AccessTokenScopeWriteUser)

		apiBaseRepo := createActionsTestRepo(t, user2Token, "actions-gitea-context", false)
		baseRepo := unittest.AssertExistsAndLoadBean(t, &repo_model.Repository{ID: apiBaseRepo.ID})

		// init the workflow
		wfTreePath := ".gitea/workflows/pull.yml"
		wfFileContent := `name: Pull Request
on: push
jobs:
  wf1-job:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        load: [1,2,3,4,5,6,7,8,9,10]
        load2: [1,2,3,4,5,6,7,8,9,10]
        load3: [1,2,3,4,5]
    
    steps:
      - run: echo 'test the pull'
`
		opts := getWorkflowCreateFileOptions(user2, baseRepo.DefaultBranch, "create "+wfTreePath, wfFileContent)
		createWorkflowFile(t, user2Token, baseRepo.OwnerName, baseRepo.Name, wfTreePath, opts)

		// register one mock runner that all workers share to fetch and execute tasks
		multiRunner := newMockRunner()
		multiRunner.registerAsRepoRunner(t, baseRepo.OwnerName, baseRepo.Name, "100x parallel fetch", []string{"ubuntu-latest"}, false)

		numJobs := 500
		numParallelJobs := 25

		for j := 0; j < numJobs/numParallelJobs; j++ {
			var wgroup sync.WaitGroup
			for i := 0; i < numParallelJobs; i++ {
				wgroup.Add(1)
				go t.Run(fmt.Sprintf("worker-%d", i), func(t *testing.T) {
					defer wgroup.Done()
					task := multiRunner.fetchTask(t)
					multiRunner.execTask(t, task, &mockTaskOutcome{
						result: runnerv1.Result_RESULT_SUCCESS,
					})
				})
			}
			wgroup.Wait()
		}
	})
}
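
For reference, the job counts implied by the two test versions work out as below. This is only a worked arithmetic check on the numbers in the tests; `matrixSize` is a hypothetical helper.

```go
package main

import "fmt"

// matrixSize multiplies the lengths of the matrix dimensions,
// which gives the number of jobs the matrix expands to.
func matrixSize(dims ...int) int {
	n := 1
	for _, d := range dims {
		n *= d
	}
	return n
}

func main() {
	jobs := matrixSize(10, 10, 5)  // load x load2 x load3
	firstFetches := (25 * 5) * 4   // first test: 125 workers, 4 fetches each
	rounds := 500 / 25             // second test: numJobs / numParallelJobs
	secondFetches := rounds * 25   // second test: one fetch per worker per round
	fmt.Println(jobs, firstFetches, rounds, secondFetches) // prints "500 500 20 500"
}
```

Both versions therefore drain the same 500 matrix jobs. The first does so with 125 concurrent workers, the second with 20 sequential rounds of 25 workers.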

@lunny lunny requested review from Zettat123 and wolfogre November 15, 2025 20:20

Development

Successfully merging this pull request may close these issues.

Gitea Actions FetchTask not reliable assigning queued jobs to idle runners as long no new jobs are queued
