Commit Graph

4297 Commits

Author SHA1 Message Date
Pierre Bourdon
685857df2e web: replace 'errormsg' with 'errormsg IS NULL' in most cases
This is implement in an extremely hacky way due to poor DBIx feature
support. Ideally, what we'd need is a way to tell DBIx to ignore the
errormsg column unless explicitly requested, and to automatically add a
computed 'errormsg IS NULL' column in others. Since it does not support
that, this commit instead hacks some support via method overrides while
taking care to not break anything obvious.
2025-02-12 10:35:17 +01:00
ajs124
f2b6e9d8ab lazy-load evaluation errors
Closes #1362
2025-02-12 10:35:17 +01:00
Maximilian Bosch
b04847335f Only show stepname if it doesn't equal the name of the drv
When building e.g. nixpkgs, the "Running builds" view will mostly look
like this

    hello.x86_64-linux (Build of hello-X.Y)
    exa.x86_64-linux (Build of exa-X.Y)
    ...

This doesn't provide any useful information. Showing the step name only
makes sense if it's not a child of the job's derivation. With this
patch, that information will only be shown if the drv name (i.e. w/o
`/nix/store/` prefix, .drv ext & hash) is not equal to the drv name of
the job itself (build.nixname).
2025-02-12 10:35:17 +01:00
Maximilian Bosch
7e61000172 Running builds view: show build step names
When using Hydra to build machine configurations, you'll often see
"nixosConfigurations.foo" five times, i.e. for each build step being
run. This isn't very helpful I think because in such a case, a single
build step can also be compiling the Linux kernel.

This change also fetches the `drvpath` and `type` from the `buildsteps`
relation. We're already joining it, so this doesn't make much difference
(confirmed via query logging that this doesn't cause extra SQL queries).

Unfortunately build steps don't have a human readable name, so I'm
deriving it from the drvpath by stripping away the hash (assuming that
it'll never contain a `-` and that `/nix/store/` is used as prefix). I
decided against using the Nix bindings for that to avoid too much
overhead due to store operations for each build step.
2025-02-12 10:35:17 +01:00
Maximilian Bosch
6d04e824d5 Make "timed out" and "log limit exceeded" builds aborted
In 73694087a0 I gave builds that failed
because of a timeout or exceeded log limit a stop sign and I stand by
that reasoning: with that it's possible to distinguish between actual
build failures and rather transient things such as timeouts.

Back then I considered it a feature that these are shown in a different
tab, but I don't think that's a good idea anymore. When using a jobset to
e.g. track the regressions from a mass rebuild (like a compiler or gcc
update), "Newly failed builds" should exclusively display regressions (and
flaky builds of course, not much I can do about that).

Also, when a bunch of builds fail in such a jobset because of e.g. a
broken connection to a builder that results in a timeout, I want to be
able to restart them all w/o rebuilding actual regressions.

To make it clear that we not only have "Aborted" builds in the tab, I
renamed the label to "Aborted / Timed out".
2025-02-12 10:35:16 +01:00
Pierre Bourdon
36e25d8fd2 queue-runner: release machine reservation while copying outputs
This allows for better builder usage when the queue runner is busy. To
avoid running into uncontrollable imbalances between builder/queue
runner, we only release the machine reservation after the local
throttler has found a slot to start copying the outputs for that build.
2025-02-12 10:35:16 +01:00
Pierre Bourdon
d4e273f7b1 queue-runner: switch to pseudorandom ordering of builds processing
We don't rely on sequential / monotonic build IDs processing anymore, so
randomizing actually has the advantage of mixing builds for different
systems together, to avoid only one chunk of builds for a single system
getting processed while builders for other systems are starved.
2025-02-12 10:35:16 +01:00
Pierre Bourdon
23366ec10d queue runner: introduce some parallelism for remote paths lookup
Each output for a given step being ingested is looked up in parallel,
which should basically multiply the speed of builds ingestion by the
average number of outputs per derivation.
2025-02-12 10:35:16 +01:00
Pierre Bourdon
98759e4ff9 queue-runner: reduce the time between queue monitor restarts
This will induce more DB queries (though these are fairly cheap), but at
the benefit of processing bumps within 1m instead of within 10m.
2025-02-12 10:35:16 +01:00
Pierre Bourdon
d850c99883 queue-runner: remove id > X from new builds query
Running the query with/without it shows that it makes no difference to
postgres, since there's an index on finished=0 already. This allows a
few simplifications, but also paves the way towards running multiple
parallel monitor threads in the future.
2025-02-12 10:35:16 +01:00
Pierre Bourdon
1b76eec4e8 queue-runner: add prom metrics to allow detecting internal bottlenecks
By looking at the ratio of running vs. waiting for the dispatcher and
the queue monitor, we should get better visibility into what hydra is
currently bottlenecked on.

There are other side effects we can try to measure to get to the same
result, but having a simple way doesn't cost us much.
2025-02-12 10:35:15 +01:00
Pierre Bourdon
3bb1a61a7d web: include current step status on /machines 2025-02-12 10:35:15 +01:00
Pierre Bourdon
3b9045f60d queue-runner: limit parallelism of CPU intensive operations
My current theory is that running more parallel xz than available CPU
cores is reducing our overall throughput by requiring more scheduling
overhead and more cache thrashing.
2025-02-12 10:35:15 +01:00
Jörg Thalheim
c6f98202cd Merge pull request #1438 from NixOS/log-malformed-json
Log malformed JSON received from `nix-eval-jobs`
2025-02-12 12:58:18 +07:00
John Ericson
1dbc7f5845 Log malformed JSON received from nix-eval-jobs 2025-02-11 22:34:49 -05:00
John Ericson
c52845f560 Merge pull request #1421 from NixOS/nix-eval-jobs
Use `nix-eval-jobs` and delete `hydra-eval-jobs`
2025-02-07 19:41:38 -05:00
John Ericson
85383b9522 Render the nix-eval-jobs version too 2025-02-07 16:55:28 -05:00
Pierre Bourdon
2f92846e5a hydra-eval-jobs: remove, replaced by nix-eval-jobs
(cherry picked from commit ed7c58708cd3affd62a598a22a500ed2adf318bf)
2025-02-07 16:55:28 -05:00
Pierre Bourdon
d84ff32ce6 hydra-eval-jobset: Use nix-eval-jobs instead of hydra-eval-jobs
incrementally ingest eval results

nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:

1. Use less RAM by avoiding buffering a whole eval's worth of metadata
   into a Perl string and an array of JSON objects.
2. Make evals latency a bit lower by allowing the queue runner to start
   ingesting builds faster.

Also use the newly-restored constituents support in `nix-eval-jobs`

Note, we pass --workers and --max-memory-size to n-e-j

Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).

`nix-eval-jobs` should check `hydraJobs` and then `checks` with flakes

(cherry picked from commit 6d4ccff43c41adaf6e4b2b9bced7243bc2f6e97b)
(cherry picked from commit b0e9b4b2f99f9d8f5c4e780e89f955c394b5ced4)
(cherry picked from commit cdfc5c81e8037d3e4818a3e459d0804b2c157ea9)
(cherry picked from commit 4b107e6ff36bd89958fba36e0fe0340903e7cd13)

Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>
2025-02-07 16:55:28 -05:00
Pierre Bourdon
0c9726af59 flake: add nix-eval-jobs as input
(cherry picked from commit 684cc50d86608cccf7500ce00af89ea34c488473)
2025-02-07 16:55:28 -05:00
John Ericson
5100b85537 Merge pull request #1436 from obsidiansystems/test-aliased-constituents
Improve tests around constituents
2025-02-07 16:45:17 -05:00
John Ericson
141b5fd0b5 Improve tests around constituents
- Test how shorter names are preferred when multiple jobs resolve to the
  same derivation.

- Test the exact aggregate map we get, by looking in the DB.
2025-02-07 16:39:13 -05:00
John Ericson
8d78648e65 Merge pull request #1435 from obsidiansystems/flake-tests
Test using Hydra with flakes
2025-02-07 11:21:02 -05:00
John Ericson
8a8ac14877 Test using Hydra with flakes
It seemed there was no self-contained end-to-end test actually doing
this?!

Among other things, this will help ensure that the switch-over to
`nix-eval-jobs` is correct.
2025-02-06 21:30:49 -05:00
John Ericson
250668a19f Merge pull request #1426 from NixOS/queue-runner-wants-after
Make hydra-queue-runner want network-online.target
2024-12-05 19:28:15 -05:00
Martin Weinelt
efadb6a26c Make hydra-queue-runner want network-online.target
Just ordering yourself after network-online.target will not guarantee
that it will be loaded. You'll have to either want or require it. Hence
the following trace on recent nixpkgs versions:

evaluation warning: hydra-queue-runner.service is ordered after 'network-online.target' but doesn't depend on it
2024-12-03 01:44:55 +01:00
Janne Heß
3b16941b14 Merge pull request #1424 from AsterisMono/fix-darwin-tmp-path
reproduce.tt: Use realpath for tmpDir to fix macOS compatibility
2024-11-26 08:53:13 +01:00
Aaron Honeycutt
9de9cb0ad8 Update README (#1271)
* Update version in example

* Update docs to fix invalid indentifier when using 'hello'

* fix build issue for hello example

---------

Co-authored-by: Aaron Honeycutt <aaronhoneycutt@proton.me>
2024-11-26 08:52:24 +01:00
John Ericson
e75a4cbda8 Merge pull request #1422 from NixOS/meson
autotools -> meson
2024-11-25 10:13:36 -05:00
Chatnoir Miki
6456c1d7d6 reproduce.tt: Use realpath for tmpDir to fix macOS compatibility 2024-11-25 11:41:47 +08:00
Pierre Bourdon
182a48c9fb autotools -> meson
Original commit message:

> There are some known regressions regarding local testing setups - since
> everything was kinda half written with the expectation that build dir =
> source dir (which should not be true anymore). But everything builds and
> the test suite runs fine, after several hours spent debugging random
> crashes in libpqxx with MALLOC_PERTURB_...

I have not experienced regressions with local testing.

(cherry picked from commit 4b886d9c45cd2d7fe9b0a8dbc05c7318d46f615d)
2024-11-24 15:58:26 -05:00
John Ericson
f974891c76 Merge pull request #1420 from NixOS/nix-2.23
`sshPublicHostKey` fix for `master`
2024-10-24 17:03:20 +02:00
John Ericson
8515cb183e Merge branch 'nix-2.22' into nix-2.23 2024-10-21 11:23:41 -04:00
John Ericson
60dd7ec187 Merge branch 'nix-2.21' into nix-2.22 2024-10-21 11:23:30 -04:00
John Ericson
53b04ddf74 Merge branch 'nix-2.20' into nix-2.21 2024-10-21 11:23:20 -04:00
Martin Weinelt
4e2c06ec2c queue-runner: don't decode base64 hostkey in hydra
Nix expects a base64 encoded hostkey in SSHMaster, so make sure we don't
decode this prematurely in hydra.

Reported-By: Puck Meerburg <puck@puck.moe>
2024-10-21 11:22:44 -04:00
Jörg Thalheim
d3966d3e4c Merge pull request #1419 from NixOS/refactor-flake
make nixos module hydra from this repository by default
2024-10-20 15:06:48 +02:00
Jörg Thalheim
f442d74f6e remove unused nix dev flake inputs 2024-10-19 16:51:21 +00:00
Jörg Thalheim
a9a5b14331 make nixos module hydra from this repository by default
When people reach out to the git repository they probably want to use
hydra from the same source.
This also removes the need for an overlay with simpler and more
performant direct use of the nixpkgs passed in. Before it was
re-importing nixpkgs.

test
2024-10-19 16:42:38 +00:00
Jörg Thalheim
e6b9f0dec7 Merge pull request #1416 from Mindavi/bugfix/git-init-suppress-hint
nix-prefetch-git: set branch name to suppress hint from git
2024-10-19 18:20:08 +02:00
Jörg Thalheim
72899596df Merge pull request #1417 from Mindavi/bugfix/s3backup
S3Backup: fix compilation issue for undef MACHINE_LOCAL_STORE var
2024-10-19 18:19:39 +02:00
Jörg Thalheim
bdeec354c3 Merge pull request #1418 from NixOS/module-package
Make the in-tree package the default package
2024-10-19 18:09:02 +02:00
Martin Weinelt
1222ba03a6 Make the in-tree package the default package
There is an overlay for the `hydra` name, but `hydra_unstable` was used, which can refer to the nixpkgs package and lead to and outdated hydra version and requires configuring the correct package attribute downstream.
2024-10-19 17:30:59 +02:00
Rick van Schijndel
8a54924d2a nix-prefetch-git: set branch name to suppress hint from git
In my system logs I see this every time a new eval starts:

```
hydra-evaluator[PID]: hint: Using 'master' as the name for the initial branch. This default branch name
hydra-evaluator[PID]: hint: is subject to change. To configure the initial branch name to use in all
hydra-evaluator[PID]: hint: of your new repositories, which will suppress this warning, call:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint:         git config --global init.defaultBranch <name>
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hydra-evaluator[PID]: hint: 'development'. The just-created branch can be renamed via this command:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint:         git branch -m <name>
```

This ensures this hint is not logged anymore and unclutters the syslog.
I presume it does not really matter what name is chosen for the branch.
2024-10-17 22:25:13 +02:00
Rick van Schijndel
2a7b070da0 S3Backup: fix compilation issue where MACHINE_LOCAL_STORE variable is undefined
See https://github.com/NixOS/hydra/pull/1414#issuecomment-2412350929

The variable is defined in src/lib/Hydra/Helper/Nix.pm

Error message without this patch:

```
hydra-evaluator[PID]: Couldn't require Hydra::Plugin::S3Backup : Global symbol "$MACHINE_LOCAL_STORE" requires explicit package name (did you forget to declare "my $MACHINE_LOCAL_STORE"?) at /nix/store/xxx-hydra-0-unstable-2024-09-24/libexec/hydra/lib/Hydra/Plugin/S3Backup.pm line 95.
hydra-evaluator[PID]: Compilation failed in require at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Runtime.pm line 314.
hydra-evaluator[PID]:  at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Pluggable.pm line 32.
```
2024-10-17 22:18:58 +02:00
John Ericson
c69e30122b Merge pull request #1411 from NixOS/nix-2.24-upgrade-wip
Nix 2.24 upgrade wip
2024-10-08 01:07:18 -04:00
John Ericson
750275d6e8 Avoid trailing slash that broke lookup 2024-10-07 11:43:58 -04:00
John Ericson
ceb8b48cce Fix type error with NAR accesssor 2024-09-24 12:14:23 -04:00
John Ericson
95003f2eb5 Merge pull request #1415 from NixOS/nix-2.23
Update to Nix 2.23
2024-09-24 12:00:47 -04:00
John Ericson
012cbd43f5 Add missing include 2024-09-24 11:51:17 -04:00