In https://github.com/NixOS/hydra/pull/1549 diffs were
offloaded to github for performance reasons.
While some GitHub endpoints accept a `.git` suffix on the
repository name, the comparison endpoint does not seem
to.
Specifically, on the main nixos org hydra this isn't working:
Example job: https://hydra.nixos.org/build/320178054
Generates a comparison link like so:
078d69f039...1cd347bf33
This just strips away the suffix and seems to work fine in local
testing.
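A minimal sketch of the suffix stripping, written in Python for
illustration only (the actual change is in Hydra's Perl code). The
function name and the `NixOS/nixpkgs.git` repository are placeholders;
the URL layout is GitHub's usual `compare/BASE...HEAD` form.

```python
def compare_url(owner: str, repo: str, base: str, head: str) -> str:
    """Build a GitHub compare URL, dropping a trailing '.git' from the repo name."""
    suffix = ".git"
    # Only strip an exact trailing match; "my.git.repo" is left alone.
    if repo.endswith(suffix):
        repo = repo[: -len(suffix)]
    return f"https://github.com/{owner}/{repo}/compare/{base}...{head}"


# Placeholder owner/repo; the revisions are the ones quoted above.
print(compare_url("NixOS", "nixpkgs.git", "078d69f039", "1cd347bf33"))
```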
In nixpkgs this started to fail the hydra tests.
It's not completely clear why because it seems the perlcritic
rule has existed for quite some time.
Anyway, this should solve the issues.
GitHub's reference list API does not actually restrict the specified type, so don't artificially restrict it.
The API does not actually make a distinction between the "type" and "prefix" at all, but this is maintained for backwards compatibility. The two are simply concatenated.
- Add proper waitpid() for child process cleanup
- Simplify file existence check loop with early exit
- Rename variables for clarity ($uri -> $request_uri, remove unused $i)
I did not notice in #1508 that the hydra evaluator now crashed. Because the hydra config is shared between all components, all of them need to be able to read the secret.
The list of jobsets is very long on hydra.nixos.org and the compare-to
dropdown listing spans multiple full pages in the busy projects.
If we ignore disabled jobsets, this interface becomes more usable
again.
- Add HMAC-SHA256 signature verification for webhooks
- Support multiple secrets for rotation
- Add security logging for authentication events
- Maintain backward compatibility (auth optional during migration)
- Add comprehensive test coverage
Without authentication, anyone could trigger job evaluations by sending
POST requests to webhook endpoints. This could lead to resource exhaustion
through repeated requests or manipulation of build scheduling. While not
a data breach risk, it allows unauthorized control over CI/CD operations.
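A rough sketch of HMAC-SHA256 verification with several accepted
secrets, in Python for illustration (Hydra's implementation is Perl and
may differ). It assumes the sender puts `sha256=<hex digest>` in a
signature header, as GitHub does with `X-Hub-Signature-256`; the
function name is hypothetical.

```python
import hashlib
import hmac
from typing import Sequence


def signature_is_valid(body: bytes, header: str, secrets: Sequence[str]) -> bool:
    """Return True if `header` matches the HMAC-SHA256 of `body` under any secret."""
    prefix = "sha256="
    if not header.startswith(prefix):
        return False
    received = header[len(prefix):].encode()
    for secret in secrets:
        expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
        # compare_digest avoids leaking the mismatch position through timing.
        if hmac.compare_digest(expected.encode(), received):
            return True
    return False
```

Accepting a list of secrets is what makes rotation possible: the old
and the new secret are both valid until every sender has switched.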
- Replace deprecated exec_params/exec_params0 calls with exec()
- Wrap all parameterized queries with pqxx::params{}
- Add .no_rows()/.one_row() to exec calls that don't return results
This prevents a forever-hanging build (don't know why) when < or > are
in the path of hydra-build-products. This is not to prevent any XSS (see
next commits), just to prevent the DOS (if you can even call it that).
- Remove bottom margin
- Properly format memory in human format
- Calculate free memory
- Format the load with 2 digits after comma
- Lpad pressure percentages
- Use a macro to render pressure
- Score -> Scheduling Score
- More spacing in the load
- Add IRQ pressure
This is guarded behind a setting and will overwrite everything that was
learned from the machines file. Also drops `sshKeys` since that wasn't
used anyway.
As far as I understand we include nettools for its hostname executable
used by the Sys-Hostname-Long perl package. But if we just need that then
the hostname-debian package provides a simpler and better maintained
version.
- hydra does not remove the base URI from the request before processing
it, so this must be done in the reverse proxy. In nginx this is done
by giving proxy_pass a URI rather than a protocol/host/port; see:
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass
- proxy_redirect is not correct/required: hydra uses proxy headers to
correctly form redirects in most cases, and where it doesn't it
produces local redirects which aren't matched by this directive anyway
error: access to absolute path '/nix/store/sai35xfsrba2a2vasmzxakmn54wdfa13-sourcepackaging' is forbidden in pure evaluation mode (use '--impure' to override)
Instead of just going for "whatever is the oldest build we know of",
use the following first:
- Is the step more constrained? If so, schedule it first to avoid
filling up "more desirable" build slots with less constrained builds.
- Does the step have more dependents? If so, schedule it first to try
and maximize open parallelism and breadth of scheduling options.
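The ordering above can be sketched as a sort key. This is Python for
illustration, not the queue runner's C++ code. The `Step` fields are
hypothetical, and "more constrained" is modelled here as "fewer
machines can run it", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    eligible_machines: int  # assumption: fewer eligible machines = more constrained
    dependents: int         # number of steps waiting on this one
    build_id: int           # lower = older build


def schedule_order(steps):
    """Most constrained first, then most dependents, then oldest build."""
    return sorted(
        steps,
        key=lambda s: (s.eligible_machines, -s.dependents, s.build_id),
    )
```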
(cherry picked from commit b8d03adaf4)
New jobs have their "new" status take precedence over them being
"failed" or "queued", which means actions that can act on "failed" or
"queued" jobs weren't shown to the user when they could only act on
"new" jobs.
(cherry picked from commit 9a4a5dd624)
Quickfix for something that annoyed me once too often.
Specifically, I'm talking about `/eval/1#tabs-errors`.
To not fetch long errors on each request, this is only done on-demand.
I.e., when the tab is opened, an iframe is requested with the errors.
This iframe uses a template for both the jobset view and the jobset-eval
view. It is differentiated by checking if `jobset` or `eval` is defined.
However, the jobset-eval view also has a `jobset` variable in its stash
which means that in both cases the `if` path was used. Since
`jobset.fetcherrormsg` isn't defined in the eval case though, you always
got an empty error.
The band-aid fix is relatively simple: swap if and else: the `eval`
variable is not defined in the stash of the jobset view, so now this is
a useful condition to decide which view we're in.
(cherry picked from commit 70c3d75f73)
This is implemented in an extremely hacky way due to poor DBIx feature
support. Ideally, what we'd need is a way to tell DBIx to ignore the
errormsg column unless explicitly requested, and to automatically add a
computed 'errormsg IS NULL' column in others. Since it does not support
that, this commit instead hacks some support via method overrides while
taking care to not break anything obvious.
This allows for better builder usage when the queue runner is busy. To
avoid running into uncontrollable imbalances between builder/queue
runner, we only release the machine reservation after the local
throttler has found a slot to start copying the outputs for that build.
As opposed to asserting uniqueness to understand resource utilization,
we just switch to using `std::unique_ptr`.
We don't rely on sequential / monotonic build IDs processing anymore, so
randomizing actually has the advantage of mixing builds for different
systems together, to avoid only one chunk of builds for a single system
getting processed while builders for other systems are starved.
Each output for a given step being ingested is looked up in parallel,
which should basically multiply the speed of builds ingestion by the
average number of outputs per derivation.
Running the query with/without it shows that it makes no difference to
postgres, since there's an index on finished=0 already. This allows a
few simplifications, but also paves the way towards running multiple
parallel monitor threads in the future.
By looking at the ratio of running vs. waiting for the dispatcher and
the queue monitor, we should get better visibility into what hydra is
currently bottlenecked on.
There are other side effects we can try to measure to get to the same
result, but having a simple way doesn't cost us much.
My current theory is that running more parallel xz than available CPU
cores is reducing our overall throughput by requiring more scheduling
overhead and more cache thrashing.
The third argument to `open()` in `-|` mode is passed to a shell if it's
a string. In my case the store URI contains
`?secret-key=${signingKey.directory}/secret&compression=zstd`
For the `nix store cat` case this means that
* until `&` the process will be started in the background. This fails
immediately because no path to cat is specified.
* `compression=zstd` is a variable assignment
* the `$path` argument to `store cat` is attempted to be executed as
another command
Passing just the list solves the problem.
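The same distinction exists in other languages. A small Python analogue
(not the Perl fix itself): passing the arguments as a list bypasses the
shell, so `?` and `&` reach the child unchanged. The store URI and path
are made up, and a tiny Python child that prints its arguments stands
in for `nix store cat`.

```python
import subprocess
import sys

# Hypothetical store URI, modelled on the one described above.
STORE_URI = "s3://example-cache?secret-key=/run/keys/secret&compression=zstd"

# Child program that only prints the arguments it received.
PRINT_ARGS = "import sys; print(sys.argv[1:])"


def run_as_list(uri: str, path: str) -> str:
    """Pass arguments as a list: no shell is involved, so '&' stays literal."""
    result = subprocess.run(
        [sys.executable, "-c", PRINT_ARGS, "--store", uri, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```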
(cherry picked from commit 3ee51dbe589458cc54ff753317bbc6db530bddc0)
When an artifact is requested from hydra the output is first copied
from the nix store into memory and then sent as a response, delaying
the download and taking up significant amounts of memory.
As reported in https://github.com/NixOS/hydra/issues/1357
Instead of calling a command and blocking while reading in the entire
output, this adds read_into_socket(). The function takes a
command, starts a subprocess with that command, and returns a file
descriptor attached to stdout.
This file descriptor is then used by the response builder of Catalyst to stream
the output directly.
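The idea of streaming instead of buffering can be sketched in Python
(illustration only; the real helper is Perl and hands a file descriptor
to Catalyst rather than yielding chunks). The function name and chunk
size are assumptions.

```python
import subprocess
import sys
from typing import Iterator, Sequence


def stream_command(argv: Sequence[str], chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the command's stdout chunk by chunk instead of reading it all at once."""
    with subprocess.Popen(list(argv), stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # empty read means the child closed stdout
                break
            yield chunk
```

Memory use is bounded by `chunk_size` rather than by the size of the
artifact, and the first bytes can be sent before the command finishes.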
(cherry picked from commit 459aa0a5983a0bd546399c08231468d6e9282f54)
When building e.g. nixpkgs, the "Running builds" view will mostly look
like this
hello.x86_64-linux (Build of hello-X.Y)
exa.x86_64-linux (Build of exa-X.Y)
...
This doesn't provide any useful information. Showing the step name only
makes sense if it's not a child of the job's derivation. With this
patch, that information will only be shown if the drv name (i.e. w/o
`/nix/store/` prefix, .drv ext & hash) is not equal to the drv name of
the job itself (build.nixname).
When using Hydra to build machine configurations, you'll often see
"nixosConfigurations.foo" five times, i.e. for each build step being
run. This isn't very helpful I think because in such a case, a single
build step can also be compiling the Linux kernel.
This change also fetches the `drvpath` and `type` from the `buildsteps`
relation. We're already joining it, so this doesn't make much difference
(confirmed via query logging that this doesn't cause extra SQL queries).
Unfortunately build steps don't have a human readable name, so I'm
deriving it from the drvpath by stripping away the hash (assuming that
it'll never contain a `-` and that `/nix/store/` is used as prefix). I
decided against using the Nix bindings for that to avoid too much
overhead due to store operations for each build step.
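A sketch of that name derivation in Python (the real code is Perl; the
function names are hypothetical). It relies on the same assumptions as
stated above: the `/nix/store/` prefix and a hash without `-`.

```python
STORE_PREFIX = "/nix/store/"


def drv_name(drv_path: str) -> str:
    """Turn '/nix/store/<hash>-<name>.drv' into '<name>'."""
    base = drv_path
    if base.startswith(STORE_PREFIX):
        base = base[len(STORE_PREFIX):]
    # The hash is assumed to contain no '-', so the first '-' ends it.
    _hash, _sep, name = base.partition("-")
    if name.endswith(".drv"):
        name = name[: -len(".drv")]
    return name


def show_step_name(drv_path: str, build_nixname: str) -> bool:
    """Show the step name only when it differs from the job's own drv name."""
    return drv_name(drv_path) != build_nixname
```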
In 73694087a0 I gave builds that failed
because of a timeout or exceeded log limit a stop sign and I stand by
that reasoning: with that it's possible to distinguish between actual
build failures and rather transient things such as timeouts.
Back then I considered it a feature that these are shown in a different
tab, but I don't think that's a good idea anymore. When using a jobset to
e.g. track the regressions from a mass rebuild (like a compiler or gcc
update), "Newly failed builds" should exclusively display regressions (and
flaky builds of course, not much I can do about that).
Also, when a bunch of builds fail in such a jobset because of e.g. a
broken connection to a builder that results in a timeout, I want to be
able to restart them all w/o rebuilding actual regressions.
To make it clear that we not only have "Aborted" builds in the tab, I
renamed the label to "Aborted / Timed out".
Depends on https://github.com/nix-community/nix-eval-jobs/pull/349 & #1421.
Almost equivalent to #1425, but with a small change: when having e.g. an
aggregate job with a glob that matches nothing, the jobset evaluation is
failed now. This was the intended behavior before (hydra-eval-jobset
fails hard if an aggregate is broken), the code-path was never reached
however since the aggregate was never marked as broken in this case
before.
There were some hangs caused by this. Need to fix them, ideally
reproducing the issue in a test, before trying this again.
This reverts commit 4a4a0f901c.
My main motivation here is to get metrics with brackets to work in order
to support "pytest" test names:
- test_foo.py::test_bar[1]
- test_foo.py::test_bar[2]
I couldn't find an "HTML escape"-style function that would generate
valid html `id` attribute names from random strings, so I went with a
hash digest instead.
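Roughly, in Python (illustration only; which digest and prefix Hydra
actually uses is not stated here, so SHA-256 and `metric-` are
assumptions):

```python
import hashlib


def metric_element_id(metric_name: str) -> str:
    """Derive an id that is safe in HTML from an arbitrary metric name."""
    digest = hashlib.sha256(metric_name.encode("utf-8")).hexdigest()
    # A fixed letter prefix keeps the id from starting with a digit.
    return "metric-" + digest
```

Brackets, colons and dots in the name never reach the attribute, and
distinct names such as `test_bar[1]` and `test_bar[2]` get distinct ids.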
Needed one more thing before trying out using `LegacySSHStore` directly.
Flake lock file updates:
• Updated input 'nix':
'github:NixOS/nix/674a87462cb93f605d4fbeef607d3453e7e5a7d8?narHash=sha256-TBoHqnIdVWhsBcL05vO2B1hSl9m//5Mz2NU%2BPMk3h3Y%3D' (2025-02-16)
→ 'github:NixOS/nix/e310c19a1aeb1ce1ed4d41d5ab2d02db596e0918?narHash=sha256-q/RgA4bB7zWai4oPySq9mch7qH14IEeom2P64SXdqHs%3D' (2025-02-18)
This avoids some duplicated code, leveraging the same `StoreReference`
type that also undergirds the machine file dedup we just did prior.
By using `LegacySSHStoreConfig`, we're also taking a baby step towards
using the store interface rather than messing around with the protocol
internals.
incrementally ingest eval results
nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:
1. Use less RAM by avoiding buffering a whole eval's worth of metadata
into a Perl string and an array of JSON objects.
2. Make eval latency a bit lower by allowing the queue runner to start
ingesting builds faster.
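The difference between buffering and streaming can be sketched in
Python (illustration only; the ingester itself is Perl). This assumes
the evaluator emits one JSON object per line; the `attr` field in the
demo is a placeholder.

```python
import io
import json
from typing import IO, Iterator


def iter_jobs(stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed job per non-empty line, without buffering the whole stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Demo with an in-memory stream standing in for the evaluator's stdout.
demo = io.StringIO('{"attr": "hello"}\n{"attr": "exa"}\n')
for job in iter_jobs(demo):
    print(job["attr"])
```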
Also use the newly-restored constituents support in `nix-eval-jobs`
Note, we pass --workers and --max-memory-size to n-e-j
Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).
`nix-eval-jobs` should check `hydraJobs` and then `checks` with flakes
(cherry picked from commit 6d4ccff43c41adaf6e4b2b9bced7243bc2f6e97b)
(cherry picked from commit b0e9b4b2f99f9d8f5c4e780e89f955c394b5ced4)
(cherry picked from commit cdfc5c81e8037d3e4818a3e459d0804b2c157ea9)
(cherry picked from commit 4b107e6ff36bd89958fba36e0fe0340903e7cd13)
Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>
It seemed there was no self-contained end-to-end test actually doing
this?!
Among other things, this will help ensure that the switch-over to
`nix-eval-jobs` is correct.
Just ordering yourself after network-online.target will not guarantee
that it will be loaded. You'll have to either want or require it. Hence
the following trace on recent nixpkgs versions:
evaluation warning: hydra-queue-runner.service is ordered after 'network-online.target' but doesn't depend on it
* Update version in example
* Update docs to fix invalid identifier when using 'hello'
* fix build issue for hello example
---------
Co-authored-by: Aaron Honeycutt <aaronhoneycutt@proton.me>
Original commit message:
> There are some known regressions regarding local testing setups - since
> everything was kinda half written with the expectation that build dir =
> source dir (which should not be true anymore). But everything builds and
> the test suite runs fine, after several hours spent debugging random
> crashes in libpqxx with MALLOC_PERTURB_...
I have not experienced regressions with local testing.
(cherry picked from commit 4b886d9c45cd2d7fe9b0a8dbc05c7318d46f615d)
When people reach out to the git repository they probably want to use
hydra from the same source.
This also removes the need for an overlay with simpler and more
performant direct use of the nixpkgs passed in. Before it was
re-importing nixpkgs.
test
There is an overlay for the `hydra` name, but `hydra_unstable` was used, which can refer to the nixpkgs package, lead to an outdated hydra version, and require configuring the correct package attribute downstream.
In my system logs I see this every time a new eval starts:
```
hydra-evaluator[PID]: hint: Using 'master' as the name for the initial branch. This default branch name
hydra-evaluator[PID]: hint: is subject to change. To configure the initial branch name to use in all
hydra-evaluator[PID]: hint: of your new repositories, which will suppress this warning, call:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: git config --global init.defaultBranch <name>
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hydra-evaluator[PID]: hint: 'development'. The just-created branch can be renamed via this command:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: git branch -m <name>
```
This ensures this hint is not logged anymore and unclutters the syslog.
I presume it does not really matter what name is chosen for the branch.
See https://github.com/NixOS/hydra/pull/1414#issuecomment-2412350929
The variable is defined in src/lib/Hydra/Helper/Nix.pm
Error message without this patch:
```
hydra-evaluator[PID]: Couldn't require Hydra::Plugin::S3Backup : Global symbol "$MACHINE_LOCAL_STORE" requires explicit package name (did you forget to declare "my $MACHINE_LOCAL_STORE"?) at /nix/store/xxx-hydra-0-unstable-2024-09-24/libexec/hydra/lib/Hydra/Plugin/S3Backup.pm line 95.
hydra-evaluator[PID]: Compilation failed in require at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Runtime.pm line 314.
hydra-evaluator[PID]: at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Pluggable.pm line 32.
```
We should look into how to resolve this, but I tried some things and nothing really worked.
Let's skip it for now until someone comes along to improve it.
Only log issues/failures when something's actually up.
It has irked me for a long time that so much output came
out of running the tests, this seems to silence it.
It does hide some warnings, but I think it makes the output
so much more readable that it's worth the tradeoff.
Helps for highly parallel running of jobs, sometimes they'd not give output for a while.
Setting this timeout higher appears to help.
Not completely sure if this is the right place to do it, but it works fine for me.
We've seen many failures on ofborg; a lot of them ultimately appear to come down to
a timeout being hit, resulting in something like this:
Failure executing slapadd -F /<path>/slap.d -b dc=example -l /<path>/load.ldif.
Hopefully this resolves it for most cases.
I've done some endurance testing and this helps a lot.
Some other commands also regularly time out under high load:
- hydra-init
- hydra-create-user
- nix-store --delete
This should address most issues with tests randomly failing.
Used the following script for endurance testing:
```
import os
import subprocess
run_counter = 0
fail_counter = 0
while True:
    try:
        run_counter += 1
        print(f"Starting run {run_counter}")
        env = os.environ
        env["YATH_JOB_COUNT"] = "20"
        result = subprocess.run(["perl", "t/test.pl"], env=env)
        if (result.returncode != 0):
            fail_counter += 1
        print(f"Finish run {run_counter}, total fail count: {fail_counter}")
    except KeyboardInterrupt:
        print(f"Finished {run_counter} runs with {fail_counter} fails")
        break
```
In case someone else wants to do it on their system :).
Note that YATH_JOB_COUNT may need to be changed loosely based on your
cores.
I only have 4 cores (8 threads), so for others higher numbers might
yield better results in hashing out unstable tests.
With https://github.com/NixOS/nix/pull/9839, the `storeUri` field is
much better structured, so we can use it while still opening the SSH
connection ourselves.
Nixpkgs only contains a `hydra_unstable`, not `hydra`, package, so
adjust the default accordingly, and then override it to our package in
the separate module which does that.
This fixes:
> Caught exception in Hydra::Controller::Root->realisations "Undefined subroutine &Hydra::Controller::Root::queryRawRealisation called at /nix/store/v842xb35ph8ka1yi1xanjhk4xh1pn5nm-hydra-2024-04-22/libexec/hydra/lib/Hydra/Controller/Root.pm line 371."
Re-creating `nix-next` after using it in #1375.
Flake lock file updates:
• Updated input 'nix':
'github:NixOS/nix/60824fa97c588a0faf68ea61260a47e388b0a4e5' (2024-04-11)
→ 'github:NixOS/nix/aa438b8fbaebbbdb922655127053c4e8ea3e55bb' (2024-04-12)
When content addressed derivations are built on the hydra server,
one may run into an issue where some builds suddenly don't load anymore.
This seems to be caused by outPaths that are NULL (which is
allowed for ca-derivations). Filter them out to prevent querying the
database for them, which is not supported by the database abstraction
layer that's currently in use.
On my instance this appears to resolve the issue.
I feel like I might be doing this at the wrong abstraction layer, but on
the other hand -- it seems to resolve it and it also doesn't really look
like it will hurt anything.
The test added in a previous commit uncovers this issue, and this commit
resolves it. So I'm happy with this patch for now.
The issue I was seeing on my server:
hydra-server[2549]: [error] Couldn't render template "undef error - DBIx::Class::SQLMaker::ClassicExtensions::puke(): Fatal: NULL-within-IN not implemented: The upcoming SQL::Abstract::Classic 2.0 will emit the logically correct SQL instead of raising this exception. at /nix/store/<hash>-hydra-unstable-2024-03-08_nix_2_20/libexec/hydra/lib/Hydra/Helper/Nix.pm line 190
See also short discussion here: https://github.com/NixOS/nixpkgs/pull/297392#issuecomment-2035366263
Closes #1336
When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this
main thread: Lost connection to the database server.
queue monitor: Lost connection to the database server.
and no more builds being processed.
`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.
If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.
Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.
Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections) and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, the
caller waits until one slot is free). The issue in the code here is, however,
that whenever an error is encountered, the slot is released, but the
same broken connection will be reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now being
done when `pqxx::broken_connection` was caught.
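A toy model of that behavior in Python (illustration only, not Nix's
`Pool` and not thread-safe). All names are hypothetical; where the real
pool blocks when it is full, this sketch raises instead.

```python
class Pool:
    """Tiny fixed-capacity pool that can drop resources marked as bad."""

    def __init__(self, factory, capacity):
        self._factory = factory
        self._capacity = capacity
        self._idle = []
        self._in_use = 0

    def acquire(self):
        if self._idle:
            resource = self._idle.pop()
        elif self._in_use < self._capacity:
            resource = self._factory()
        else:
            raise RuntimeError("pool exhausted (a real pool would wait here)")
        self._in_use += 1
        return Handle(self, resource)

    def _release(self, resource, bad):
        self._in_use -= 1
        if not bad:
            self._idle.append(resource)  # healthy: keep it for reuse
        # bad: forget it, so the next acquire() creates a fresh one


class Handle:
    def __init__(self, pool, resource):
        self._pool = pool
        self.resource = resource
        self._bad = False

    def mark_bad(self):
        self._bad = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._pool._release(self.resource, self._bad)
        return False  # never swallow the exception
```

Without `mark_bad()`, releasing a handle after an error puts the broken
resource straight back into the idle list, which is the bug described
above.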
This was the source of a flaky test because sometimes hydra-notify was
quick enough to send out `buildStarted` and sometimes it apparently
wasn't which was quickly spottable with `nix build --rebuild`.
Removing that status update doesn't make a difference functionally:
gitea doesn't differentiate between "queued" and "running", so we send
the same status ("pending") out on both events. This way we even save one
avoidable request.
It's a pet peeve of mine when logging into my personal Hydra that I
always have to press the button rather than hitting Return after entering
my password.
The reason is that the form doesn't have a "submit" button; so far
it only listened to the "click" event. A submit button handles that, and you
can hit Return as an alternative.
Implements support for Nix's new Perl bindings[1]. The current state
basically does `openStore()`, but always uses `auto` and doesn't support
stores at other URIs.
Even though the stores are cached inside the Perl implementation, I
decided to instantiate those once in the Nix helper module. That way
store openings aren't cluttered across the entire codebase. Also, there
are two stores used later on - MACHINE_LOCAL_STORE for `auto`,
BINARY_CACHE_STORE for the one from `store_uri` in `hydra.conf` - and
using consistent names should make the intent clearer then.
This doesn't contain any behavioral changes, i.e. the build product
availability issue from #1352 isn't fixed. This patch only contains the
migration to the new API.
[1] https://github.com/NixOS/nix/pull/9863
This is an integration test that confirms that jobset definitions from
git repositories are correctly built and status updates pushed to the
gitea instance. The following things needed to be fixed:
* We're still on 23.05 where gitea is marked as insecure. Not going to
update nixpkgs right now, but going for the quick fix.
* Since gitea 1.19 tokens have scopes that describe what's possible.
Not specifying the scope in the DB appears to imply that no
permissions are granted.
* Apparently we have three status updates now (for three status hooks,
queued/started/finished). No idea why that was broken before, but the
behavior still looks correct.
Re-creating `nix-next` after using it in #1354.
Flake lock file updates:
• Updated input 'nix':
'github:NixOS/nix/8df68a213fc52a57b02a57005b0e06cc8de40ce3' (2024-01-25)
→ 'github:NixOS/nix/75ebb90a70f6320c1c7a1fca87a0a8adb0716143' (2024-01-30)
In 1bd195a513 strictDeps was set for the
Hydra package. As a result, `checkInputs` aren't available anymore in
the local dev-shell which is the sole purpose of foreman, to start
services and a database for development.
In 5db374cb50 the `bootstrap` script was
removed, however it's still referenced in the contribution guidelines.
Change that to `autoreconfPhase` as intended by the commit.
This version has a worse UI, but also changes the schema less: One
non-null constraint is removed, but no new columns are added.
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: regnat <rg@regnat.ovh>
We have to oddly make a `StoreConfig` subclass to get it, but
https://github.com/NixOS/nix/pull/9848 will fix that.
The purpose of this is to ensure that, absent an explicit config,
`localhost` includes `ca-derivations` and `recursive-nix` if those
experimental features are enabled.
Very much the complement of #1342, the previous PR.
A slight dedup, and also ensures that floating CA derivations require a
`ca-derivations` experimental feature. This fixes the scheduling issue
that @SuperSandro2000 found.
This is *just* using the fields from that type, and only where the types
coincide. (There are two fields with different types, `speedFactor` most
interestingly.) No code is reused, so we can be sure that no behavior is
changed.
Once the types are reconciled on the Nix side, then we can start
carefully actually reusing code.
Progress on #1164
- Use the type itself
This lays the foundation for being able to dedup the protocol code.
- Use `BasicConnection::handshake`, replacing ours.
- Use `BasicConnection::queryValidPaths`
- Use `BasicConnection::putBuildDerivationRequest`
Instead of doing this partial operation a number of times, assert (with
a comment), get a reference to the thing inside, and use that just once.
(This refactor was done twice, "just once" for each time.)
Both sides need to agree on a version (with `std::min`) for anything to
work. Somehow... we've never done this.
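In sketch form (Python for illustration; the real code is C++ and uses
`std::min`). The `(major << 8) | minor` encoding is an assumption about
how the versions are packed, and both function names are hypothetical.

```python
def encode_version(major: int, minor: int) -> int:
    """Pack a protocol version into one integer (assumed encoding)."""
    return (major << 8) | minor


def negotiate(ours: int, theirs: int) -> int:
    """Both sides speak the lower of the two advertised versions."""
    return min(ours, theirs)
```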
With this commit, the next commit succeeds. Without this commit, the
next commit fails. This is because the next commit exposes serializers
which do different things for proto version 2.7, and we're currently
requesting 2.6.
Opened https://github.com/NixOS/nix/issues/9584 to track this issue
* Let tests themselves intentionally leak temp dir
By default Yath will clean up temporary files, so the result is the
same. But `--keep-dirs` can be passed to `yath test` telling Yath to
*not* clean them up instead. This is very useful for debugging.
* Update t/lib/HydraTestContext.pm
Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com>
It has a performance cost, and as the comment says we should be doing
the better solution. We want to land this preparatory change on prod
while the rest is still on staging, so we should just skip it for now.
Skipping it will not affect regular fixed-output and input-addressed
derivations, which are the only ones prod would deal with upon getting
this code.
The main CA derivations support branch will revert this commit so it
still works.
The point of this branch is to always track Nix master, so we are
proactively ready to upgrade to the next Nix release when it is ready.
Flake lock file updates:
• Updated input 'nix':
'github:NixOS/nix/50f8f1c8bc019a4c0fd098b9ac674b94cfc6af0d' (2023-11-27)
→ 'github:NixOS/nix/c3827ff6348a4d5199eaddf8dbc2ca2e2ef46ec5' (2023-12-07)
• Added input 'nix/libgit2':
'github:libgit2/libgit2/45fd9ed7ae1a9b74b957ef4f337bc3c8b3df01b5' (2023-10-18)
For the record, here is the Nix 2.19 version:
https://github.com/NixOS/nix/blob/2.19-maintenance/src/libstore/serve-protocol.cc,
which is what we would initially use.
It is a more complete version of what Hydra has today except for one
thing: it always unconditionally sets the start/stop times.
I think that is correct as the other end seems to unconditionally
measure them, but just to be extra careful, I reproduced the old
behavior of falling back on Hydra's own measurements if `startTime` is
0.
The only difference is that the fallback `stopTime` is now measured from
after the entire `BuildResult` is transferred over the wire, but I think
that should be negligible if it is measurable at all. (And remember,
this is a fallback case I already suspect is dead code.)
An empty string is a sneaky way to avoid hard failures --- things that
expect strings still get strings, but it does conversely open the door
up to soft failures (spooky-action-at-a-distance ones because the string
did not have the expected invariants).
"Fail fast" with null will ultimately make the system more robust, but
force us to fix more things up front, and I don't want to change this
without also fixing those things up front, especially as this commit is
for now just part of the preparatory PR for which this is dead code.
Brought up by @thufschmitt in
https://github.com/NixOS/hydra/pull/1316#discussion_r1415111329 . This
makes this closer to what was originally there --- which just dispatched
off the experimental feature rather than the presence/absence of the
output, too.
This is just C++ changes without any Perl / Frontend / SQL Schema
changes.
The idea is that it should be possible to redeploy Hydra with these
changes with (a) no schema migration and also (b) no regressions. We
should be able to much more safely deploy these to a staging server and
then production `hydra.nixos.org`.
Extracted from #875
Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-Authored-By: Alexander Sosedkin <monk@unboiled.info>
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: Charlotte 🦝 Delenk <lotte@chir.rs>
Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>
- `nativeBuildInputs` vs `buildInputs`
- narrow down `with`s for clarity
- use `autoreconfHook` not `bootstrap` script
These sorts of changes have also been done in the Nix repo.
Since the default lengths in Crypt::Passphrase::Argon2 changed from 16
to 32 in 0.009, some tests that expected the passphrase to be
unchanged started failing.
The previous implementation was O(N²lg(N)) due to sorting the full
runnables priority list once per runnable being scheduled. While not
confirmed, this is suspected to cause performance issues and
bottlenecking with the queue runner when the runnable list gets large
enough.
This commit changes the dispatcher to instead only sort runnables per
priority once per dispatch cycle. This has the drawback of being less
reactive to runnable priority changes: the previous code would react
immediately, while this might end up using "old" priorities until the
next dispatch cycle. However, dispatch cycles are not supposed to take
very long (seconds, not minutes/hours), so this is not expected to have
much or any practical impact.
Ideally runnables would be maintained in a sorted data structure instead
of the current approach of copying + sorting in the scheduler. This
would however be a much more invasive change to implement, and might
have to wait until we can confirm where the queue runner bottlenecks
actually lie.
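The two shapes can be contrasted in a small Python sketch (illustration
only; the dispatcher is C++ and its data structures differ). Runnables
are modelled as hypothetical `(priority, name)` tuples with static
priorities.

```python
def dispatch_resorting(runnables, slots):
    """Old shape: re-sort the remaining runnables before every pick."""
    remaining = list(runnables)
    picked = []
    while remaining and len(picked) < slots:
        remaining.sort(key=lambda r: r[0], reverse=True)  # O(N lg N) per pick
        picked.append(remaining.pop(0))
    return picked


def dispatch_sort_once(runnables, slots):
    """New shape: sort once per dispatch cycle, then take from the front."""
    ordered = sorted(runnables, key=lambda r: r[0], reverse=True)  # O(N lg N) total
    return ordered[:slots]
```

As long as priorities do not change within a cycle, both return the
same picks; the second just avoids repeating the sort for each runnable.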
This prevents eval errors when a machine is just started and the network isn't yet online.
I'm running hydra on a laptop and the network takes a bit of time to come online (WLAN),
so it's nice if the evaluator starts only when the network actually goes online.
Otherwise an error like this can happen on the first eval(s):
```
error fetching latest change from git repo at `https://github.com/nixos/nixpkgs.git':
fatal: unable to access 'https://github.com/nixos/nixpkgs.git/': Could not resolve host: github.com
```
To correctly render HTML reports we make sure to return the following MIME
types instead of "text/plain"
- *.css: "text/css"
- *.js: "application/javascript"
Fixes: #1267
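The mapping is small enough to sketch. The following Python snippet only illustrates the lookup and is not the Perl code in the controller; `content_type_for` is a hypothetical helper name.

```python
import os

# Extension -> MIME type, as listed above. Everything else keeps the
# previous default of "text/plain".
MIME_TYPES = {
    ".css": "text/css",
    ".js": "application/javascript",
}


def content_type_for(path, default="text/plain"):
    """Return the MIME type to serve for a report file."""
    extension = os.path.splitext(path)[1].lower()
    return MIME_TYPES.get(extension, default)


print(content_type_for("report/style.css"))
```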
Nowadays `Builds` doesn't reference `Project` directly anymore. This
means that simply resolving both `jobset` and `project` with a single
JOIN from `Builds` doesn't work anymore. Instead we need to resolve the
relation to `jobset` first and then the relation to `project`.
For similar fixes see e.g. c7c4759600.
There's currently no automatic recovery for disconnected databases in
the evaluator. This means if the database is ever temporarily
unavailable, hydra-evaluator will sit and spin with no work
accomplished.
If this condition is caught, the daemon will exit and systemd will be
responsible for resuming the service.
We were using protocol version 6 but requesting version 4. The only
reason that this worked was because of a broken version check in
'nix-store --serve'. That was fixed in
c2d7456926,
which had the side-effect of breaking hydra-queue-runner.
https://en.wikipedia.org/wiki/HipChat says:
> Following this, HipChat and Stride customers were migrated to the
> Slack group collaboration platform in a transition that was completed by
> February 2019.
nix.trustedUsers is deprecated as of 22.05. Since the nix.extraOptions config was doing something similar, I moved that to the new nix.settings as well.
NOTE: I'm well-aware that we have to be careful with this to avoid new
regressions on hydra.nixos.org, so this should only be merged after
extensive testing from more people.
Motivation: I updated Nix in my deployment to 2.9.1 and decided to also
update Hydra in one go (and compile it against the newer Nix). Given
that this also updates the C++ code in `hydra-{queue-runner,eval-jobs}`
this patch might become useful in the future though.
The newest version of git refuses to work on repositories not owned by
the current user. This leads to issues with the /api/scmdiff endpoint:
May 27 11:16:05 myhydra hydra-server[923698]: fatal: unsafe repository ('/var/lib/hydra/scm/git/57ea036ec7ecd85c8dd085e02ecc6f12dd5c079a6203d16aea49f586cadfb2be' is owned by someone else)
May 27 11:16:05 myhydra hydra-server[923698]: To add an exception for this directory, call:
May 27 11:16:05 myhydra hydra-server[923698]: git config --global --add safe.directory /var/lib/hydra/scm/git/57ea036ec7ecd85c8dd085e02ecc6f12dd5c079a6203d16aea49f586cadfb2be
May 27 11:16:05 myhydra hydra-server[923701]: warning: Not a git repository. Use --no-index to compare two paths outside a working tree
May 27 11:16:05 myhydra hydra-server[923701]: usage: git diff --no-index [<options>] <path> <path>
I used the same solution that was used in NixOS/nix#6440.
Fixes #1214
I started to wonder quite recently why Hydra doesn't send email
notifications anymore to me. I saw the following issue in the log of
`hydra-notify.service`:
May 22 11:57:29 hydra 9bik0bxyxbrklhx6lqwifd6af8kj84va-hydra-notify[1887289]: fatal: unsafe repository ('/var/lib/hydra/scm/git/3e70c16c266ef70dc4198705a688acccf71e932878f178277c9ac47d133cc663' is owned by someone else)
May 22 11:57:29 hydra 9bik0bxyxbrklhx6lqwifd6af8kj84va-hydra-notify[1887289]: To add an exception for this directory, call:
May 22 11:57:29 hydra 9bik0bxyxbrklhx6lqwifd6af8kj84va-hydra-notify[1887289]: git config --global --add safe.directory /var/lib/hydra/scm/git/3e70c16c266ef70dc4198705a688acccf71e932878f178277c9ac47d133cc663
May 22 11:57:29 hydra 9bik0bxyxbrklhx6lqwifd6af8kj84va-hydra-notify[1886654]: error running build_finished hooks: command `git log --pretty=format:%H%x09%an%x09%ae%x09%at b0c30a7557685d25a8ab3f34fdb775e66db0bc4c..eaf28389fcebc2beca13a802f79b2cca6e9ca309 --git-dir=.git' failed with e>
This is also a problem because of Git's fix for CVE-2022-24765[1], so I
applied the same fix as for Nix[2], by using `--git-dir` which skips the
code-path for the ownership-check[3].
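As a rough sketch of what such an invocation looks like, the Python snippet below only builds the argument list. `git_log_command` and its parameters are hypothetical names for this example. `--git-dir` is passed as a global option before the subcommand, as documented in `git(1)`.

```python
import os


def git_log_command(checkout_dir, old_rev, new_rev):
    """Build a `git log` argv that names the git directory explicitly."""
    git_dir = os.path.join(checkout_dir, ".git")
    return [
        "git",
        "--git-dir=" + git_dir,  # explicit location, no repository discovery
        "log",
        "--pretty=format:%H%x09%an%x09%ae%x09%at",
        f"{old_rev}..{new_rev}",
    ]


argv = git_log_command("/var/lib/hydra/scm/git/example", "b0c30a7", "eaf2838")
print(argv)
```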
[1] https://lore.kernel.org/git/xmqqv8veb5i6.fsf@gitster.g/
[2] https://github.com/NixOS/nix/pull/6440
[3] To quote `git(1)`:
> Specifying the location of the ".git" directory using this option
> (or GIT_DIR environment variable) turns off the repository
> discovery that tries to find a directory with ".git" subdirectory
On hydra.nixos.org the queue runner had child processes that were
stuck handling an exception:
Thread 1 (Thread 0x7f501f7fe640 (LWP 1413473) "bld~v54h5zkhmb3"):
#0 futex_wait (private=0, expected=2, futex_word=0x7f50c27969b0 <_rtld_local+2480>) at ../sysdeps/nptl/futex-internal.h:146
#1 __lll_lock_wait (futex=0x7f50c27969b0 <_rtld_local+2480>, private=0) at lowlevellock.c:52
#2 0x00007f50c21eaee4 in __GI___pthread_mutex_lock (mutex=0x7f50c27969b0 <_rtld_local+2480>) at ../nptl/pthread_mutex_lock.c:115
#3 0x00007f50c1854bef in __GI___dl_iterate_phdr (callback=0x7f50c190c020 <_Unwind_IteratePhdrCallback>, data=0x7f501f7fb040) at dl-iteratephdr.c:40
#4 0x00007f50c190d2d1 in _Unwind_Find_FDE () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#5 0x00007f50c19099b3 in uw_frame_state_for () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#6 0x00007f50c190ab90 in uw_init_context_1 () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#7 0x00007f50c190b08e in _Unwind_RaiseException () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#8 0x00007f50c1b02ab7 in __cxa_throw () from /nix/store/dd8swlwhpdhn6bv219562vyxhi8278hs-gcc-10.3.0-lib/lib/libstdc++.so.6
#9 0x00007f50c1d01abe in nix::parseURL (url="root@cb893012.packethost.net") at src/libutil/url.cc:53
#10 0x0000000000484f55 in extraStoreArgs (machine="root@cb893012.packethost.net") at build-remote.cc:35
#11 operator() (__closure=0x7f4fe9fe0420) at build-remote.cc:79
...
Maybe the fork happened while another thread was holding some global
stack unwinding lock
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744). Anyway, since
the hanging child inherits all file descriptors to SSH clients,
shutting down remote builds (via 'child.to = -1' in
State::buildRemote()) doesn't work and 'child.pid.wait()' hangs
forever.
So let's not do any significant work between fork and exec.
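The pattern looks roughly like this. It is a Python sketch of the principle only (the queue runner is C++, and the Python interpreter itself is not as strict about what runs in the child); `spawn` is a made-up helper.

```python
import os
import sys


def spawn(argv):
    """Fork and exec argv, doing all preparation before the fork."""
    # Anything that may allocate, take locks or raise (parsing, building
    # argument lists, ...) happens here, in the parent.
    program = argv[0]
    args = list(argv)

    pid = os.fork()
    if pid == 0:
        # Child: go straight to exec. If exec fails, leave immediately
        # without running any cleanup or unwinding code.
        try:
            os.execv(program, args)
        finally:
            os._exit(127)

    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


exit_code = spawn([sys.executable, "-c", "pass"])
print(exit_code)
```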
Re-executing this search_related on every access turned out to
create very problematic performance. If a jobset had a lot of
error output stored in the jobset, and there were many hundreds
or thousands of active jobs, this could easily cause >1Gbps of
network traffic.
Otherwise, when the port is randomly chosen (e.g. by specifying no port,
or a port of 0), it will just show that the port is 0 and not the port
that is actually serving the metrics.
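For reference, this is how the real port can be recovered after binding to port 0. It is a generic socket sketch in Python, not the queue runner's metrics code; `bind_metrics_socket` is a hypothetical name.

```python
import socket


def bind_metrics_socket(host="127.0.0.1", port=0):
    """Bind a listening socket and return it with the port actually in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    # With port 0 the kernel picks a free port; ask the socket which one.
    actual_port = sock.getsockname()[1]
    return sock, actual_port


sock, actual_port = bind_metrics_socket()
print(f"serving metrics on port {actual_port}")
sock.close()
```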
[vin@scadrial:~/workspace/vcs/hydra]$ foreman -h
Warning: the running version of Bundler (2.1.4) is older than the version that created the lockfile (2.2.20). We suggest you to upgrade to the version that created the lockfile by running `gem install bundler:2.2.20`.
Traceback (most recent call last):
2: from /nix/store/ycshcdssxcj9sjf6yzb1ydw4fcglf66y-foreman-0.87.2/bin/foreman:20:in `<main>'
1: from /nix/store/ggqacj06n6qfm1iww0bih9ph0j89wcna-bundler-2.1.4/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/rubygems_integration.rb:413:in `block in replace_bin_path'
/nix/store/ggqacj06n6qfm1iww0bih9ph0j89wcna-bundler-2.1.4/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/rubygems_integration.rb:374:in `block in replace_bin_path': can't find executable foreman for gem foreman. foreman is not currently included in the bundle, perhaps you meant to add it to your Gemfile? (Gem::Exception)
This is syntactically lighter weight, and demonstrates there are no weird
dynamic lifetimes involved, just regular passing reference to callee
which it only borrows for the duration of the call.
- Add localStore into the stash because it's used in templates
- Hide the Channels button for non-local stores because the link 404s
anyway
- Fix a style issue when having popovers in dark mode
Periodically, I have seen tests fail because of out of order queue runner behavior:
checking the queue for builds > 0...
loading build 1 (tests:basic:empty_dir)
aborting unsupported build step '...-empty-dir.drv' (type 'x86_64-linux')
marking build 1 as failed
adding new machine ‘localhost’
This patch should prevent the dispatcher from running before any machines are
made available.
This in-progress feature will run a dynamically generated set of
buildFinished hooks, which must be nested under the `runCommandHook.*`
attribute set. This implementation is not very good, with some to-dos:
1. Only run if the build succeeded
2. Verify the output is named $out and that it is an executable file
(or a symlink to a file)
3. Require the jobset itself have a flag enabling the feature, since
this feature can be a bit dangerous if various people of different
trust levels can create the jobs.
This shouldn't be possible normally, but it is possible to:
$db->resultset('RunCommandLogs')->new({ uuid => "../etc/passwd" });
if you have access to the `$db`.
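One way to guard against this is to parse the value as a UUID before it ever becomes part of a path. The Python sketch below shows the idea under that assumption; it is not the check Hydra implements, and `log_path` and its `log_dir` argument are hypothetical.

```python
import os
import uuid


def log_path(log_dir, candidate):
    """Return the log file path for candidate, rejecting non-UUID input."""
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        raise ValueError(f"not a valid uuid: {candidate!r}")
    # Use the canonical string form rather than the raw input.
    return os.path.join(log_dir, str(parsed))


print(log_path("/tmp/logs", "12345678-1234-5678-1234-567812345678"))
```

A value such as `"../etc/passwd"` raises `ValueError` instead of producing a path outside `log_dir`.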
Also split it out to a new div -- there are now 3 lines per
RunCommandLog -- the first saying when it started, the second saying how
long it ran for (or has been running), and the third with the buttons
for the pretty, raw, and tail versions of the log.
This also adds the `runcommandlog` object to the stash so that we can
access its uuid as well as command run in order to display more useful
and specific information on the webpage.
Using a sha1 of the command combined with the build ID is not a
particularly good or unique identifier:
* A build could fail, be restarted, and then succeed -- assuming no
configuration changes, the sha1 hash of the command as well as the build
ID will be the same. This would lead to an overwritten log file.
* Allowing user input to influence filenames is not the best of ideas.
Since breaking the filename construction out to a helper function,
Hydra::Model::DB is no longer used. Importing Hydra::Helper::Nix,
however, has the potential to break tests, so just use the functions we
need without importing the entire module.
run3 just seems to do better handling for what we want to do, and
requires less deep-reaching changes to this plugin to get it to play
nice, as IPC::Run::run would.
This uses the somewhat restrictive umask of 0027 so that people outside
the user or group cannot read the files. This also helps to inhibit
TOCTOU where someone else has a handle to our file before we chmod it
and after we close it.
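The effect of the umask can be checked with a small Python sketch (the plugin itself is Perl; `write_private_log` is a made-up name). A file created with the usual 0666 request ends up as 0640, so it is never world-readable, not even briefly.

```python
import os
import stat
import tempfile


def write_private_log(path, text):
    """Create path under umask 0027 so 'other' never gets access."""
    old_umask = os.umask(0o027)
    try:
        with open(path, "w") as handle:
            handle.write(text)
    finally:
        # Restore the previous umask for the rest of the process.
        os.umask(old_umask)


with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "runcommand.log")
    write_private_log(log_file, "hello\n")
    mode = stat.S_IMODE(os.stat(log_file).st_mode)
    print(oct(mode))
```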
In a Hydra instance I saw:
possibly transient failure building ‘/nix/store/X.drv’ on ‘localhost’:
dependency '/nix/store/Y' of '/nix/store/Y.drv' does not exist,
and substitution is disabled
This is confusing because the Hydra in question does have substitution enabled.
This instance uses:
keep-outputs = true
keep-derivations = true
and an S3 binary cache which is not configured as a substituter in the nix.conf.
It appears this instance encountered a situation where store path Y was built
and present in the binary cache, and Y.drv was GC rooted on the instance,
however Y was not on the host.
When Hydra would try to build this path locally, it would look in the binary
cache to see if it was cached:
(nix)
    bool valid = isValidPathUncached(storePath);

    if (diskCache && !valid)
        // FIXME: handle valid = true case.
        diskCache->upsertNarInfo(getUri(), hashPart, 0);

    return valid;
Since it was cached, the store path was considered Valid.
The queue monitor would then not put this input in for substitution, because
the path is valid:
(hydra)
    if (!destStore->isValidPath(*i.second.path(*localStore, step->drv->name, i.first))) {
        valid = false;
        missing.insert_or_assign(i.first, i.second);
    }
Hydra appears to correctly handle the case of missing paths that need
to be substituted from the binary cache already, but since most
Hydra instances use `keep-outputs` *and* all paths in the binary cache
originate from that machine, it is not common for a path to be cached
and not GC rooted locally.
I'll run Hydra with this patch for a while and see if we run into the
problem again.
A big thanks to John Ericson who helped debug this particular issue.
I'm not sure this is a good implementation as-is. It does work,
but the password gets echoed to the screen. I tried to use IO::Prompt
but IO::Prompt really seems to want to read the password from ARGV.
Deleting jobsets first would fail because buildmetrics has an FK
to the jobset. However, the jobset / project relationship is not
marked as CASCADE.
Deleting all the builds automatically cascades to delete
buildmetrics, so deleting the relevant builds first, then deleting
the jobset solves it.
I tried to write the test in such a way to assert the content matched
what we expected, but since the ordering of them is not known, it
is quite tricky to write.
When having a builder like this in `/etc/nix/machines`
ssh://mfbuild?remote-store=/home/bosch/store
Hydra cannot build there since it tries to pass the entire value to
`ssh(1)` which doesn't work. Also, an alternate store-location is e.g.
used if the user isn't a trusted user on the remote system and thus
cannot use `/nix/store`.
If such a URI is given, Hydra will now add a `--store /home/bosch/store`
to the `ssh`-command to select the appropriate location remotely.
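The splitting step can be sketched as follows. This is an illustration in Python, not the C++ code in the queue runner; `split_machine_uri` and `extra_store_args` are hypothetical names.

```python
from urllib.parse import parse_qs, urlsplit


def split_machine_uri(uri):
    """Split a machine URI into the ssh target and the optional remote store."""
    parts = urlsplit(uri)
    params = parse_qs(parts.query)
    remote_store = params.get("remote-store", [None])[0]
    return parts.netloc, remote_store


def extra_store_args(remote_store):
    """Arguments to append to the remote command, if a store was given."""
    return ["--store", remote_store] if remote_store else []


target, store = split_machine_uri("ssh://mfbuild?remote-store=/home/bosch/store")
print(target, extra_store_args(store))
```

Only the host part goes to `ssh(1)`; the query parameter becomes a separate `--store` argument.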
Using it causes database information to get fixated early, before tests can set a
new database. We only used it in one case, and that is an absolute reference anyway. The
tests for channel generation are passing, and that uses
`requireLocalStore`, so this should be fine.
I'm honestly too lazy to create two commits for fixing these one-line
issues so here's one.
The first hunk fixes the name of the projectName input. This is relevant
now because it gets logged and the log message looks stupid when there
is an input without a name.
The second hunk fixes a warning when using declarative non-flake
jobsets. The implementation may look weird but it's just the same as the
logical implication operator of nix.
This is necessary because jobset and project names are not allowed to
begin with a digit, and yet the generated jobset and project names would
do just that.
Not the most elegant solution, but it works.
The indentation in the hydra.conf makes it possible to include multi-line
strings without it being likely that the contents of the tracker
are mis-parsed or interrupt the config parser.
It isn't impossible / foolproof probably, but it shouldn't be likely.
Currently we only track how long individual plugins take.
With #1083 we stop executing a lot of plugins, but we
don't have a way to measure its practical impact on the
execution time of handling events.
Debug enables info and debug log levels and provides quite a lot of useful dev-time information.
Restart automatically restarts the app when the .pm files change.
This might, hopefully, I don't know, possibly force the
database to live a little while longer and *reduce* but not
eliminate errors around stopping the database before we lose all
our DB::PG handles to it.
This happens with flake jobsets for obvious reasons (namely, that nixexprinput
and nixexprpath may be undefined for a flake jobset).
12:38:59 hydra-evaluator.1 | Use of uninitialized value $args[0] in join or string at /home/vin/workspace/vcs/hydra/src/script/hydra-eval-jobset line 648.
12:38:59 hydra-evaluator.1 | Use of uninitialized value $args[1] in join or string at /home/vin/workspace/vcs/hydra/src/script/hydra-eval-jobset line 648.
11:38:20 hydra-server.1 | DEPRECATION WARNING: The Regex dispatch type is deprecated.
11:38:20 hydra-server.1 | It is recommended that you convert Regex and LocalRegex
11:38:20 hydra-server.1 | methods to Chained methods. at /nix/store/aa6gw57fnahd4824pbhmvcs0jlypmynq-hydra-perl-deps/lib/perl5/site_perl/5.32.1/Catalyst/DispatchType/Regex.pm line 210.
12:34:12 hydra-server.1 | Use of uninitialized value $s in substitution (s///) at /home/vin/workspace/vcs/hydra/src/script/../lib/Hydra/Helper/CatalystUtils.pm line 283, <$fh> line 1.
11:28:20 hydra-server.1 | [warn] Unicode::Encoding plugin is auto-applied, please remove this from your appclass and make sure to define "encoding" config
12:10:15 hydra-notify.1 | %channels_to_events{...} in scalar context better written as $channels_to_events{...} at /home/vin/workspace/vcs/hydra/src/lib/Hydra/Event.pm line 20.
At the moment, aggregate jobs can easily break and cause the entire
evaluation to fail, which is not ideal. For Nixpkgs, we do have some
important aggregate jobs (like `tested`), but for debugging and building
purposes it's still useful to get a partial result even if the channel
won't actually advance.
This commit changes the behaviour of hydra-eval-jobs such that it
aggregates any errors found during the construction of an aggregate, and
will instead annotate the job with the evaluation failure such that it
shows up in a "cleaner" way.
There are really two types of failure that we care about: one is where
the attribute just ends up missing altogether in the final output, and
the other is where the attribute is in the output but fails to evaluate.
Both are handled here.
Note that this does mean that the same error message may be output
multiple times, but this aids debuggability because it'll be much
clearer what's blocking the job from being created.
At the moment, the jobset object is unlikely to actually retrieve the
evaluation error output, because it isn't refreshed after
hydra-eval-jobsets is run.
Explicitly calling DBIx::Class::Row->discard_changes causes any updated
data to be refreshed, at the cost of losing any not-yet committed
changes to the row.
Fix parsing breakage from #1003: assigning the lines to $lines broke chomp and the filters.
This test validates the parsing works as expected, and also fixes
a minor bug where '-' in features isn't pruned, like in the C++
repo.
This is causing CI to fail after #1026 merged. #1026 had a green
bill of health, but #1003 increased perlcritic to level 4. #1003
was not part of #1026 so it was not checked at perlcritic level 4.
Fixes
mv: cannot move './static/bootstrap-4.3.1-dist' to './static/bootstrap/bootstrap-4.3.1-dist': Directory not empty
when 'make' is called more than once.
In NixOS, the user generation script was changed to set the permissions `0700`
to a home-directory that's specified in the `users.users`-submodule with
`createHome` being set to `true`[1].
However, the home-directory of `hydra` is also the base directory of other services using
other users (e.g. `hydra-queue-runner`). With permissions being `0700`, processes with
such a user cannot traverse into `/var/lib/hydra` and thus not into subdirectories.
I guess that this issue was kind of hidden because `hydra-init.service` ensures
proper permissions[2]. However, if `hydra-init.service` is not restarted on a
system-activation, the permissions of `/var/lib/hydra` will be set back to `0700`
by the activation script that runs on each activation.
This has led to errors like this in `hydra-queue-runner` on my Hydra:
```
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/7h/dssz03gazrkqzfmlr5cprd0dvkg4db-squashfs.img.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/b9/350vd8jpv1f86i312c9pkdcd2z56aw-squashfs.img.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/kz/vlq4v9a1rylcp4fsqqav3lcjgskky4-squashfs.img.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/xd/hkjnbbr9jp7364pkn8zpk9v8xapj2c-nix-2.4pre20210917_37cc50f.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: error (ignored): error: cannot unlink '/var/lib/hydra/build-logs/zn/9df7225fl8p7iavqqfvlyay4rf0msw-nix-2.4pre20210917_37cc50f.drv': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: possibly transient failure building ‘/nix/store/7hdssz03gazrkqzfmlr5cprd0dvkg4db-squashfs.img.drv’ on ‘roflmayr’: error: creating directory '/var/lib/hydra/build-logs': Permission denied
Sep 20 09:11:30 hydra hydra-queue-runner[306]: will retry ‘/nix/store/7hdssz03gazrkqzfmlr5cprd0dvkg4db-squashfs.img.drv’ after 543s
```
Because of that, I decided to remove the `createHome = true;` setting and instead used
`systemd-tmpfiles`[3] which can not only ensure that certain directories
exist, but also proper permissions.
With this change, we can also get rid of the manual setup in
`hydra-init.service` since `systemd-tmpfiles` will be executed by
`switch-to-configuration` before *any* systemd service gets started. On
startup, `systemd-tmpfiles-setup.service` is invoked within
`sysinit.target` being reached, so when `hydra-init.service` gets called
in `multi-user.target`, the structure already exists.
[1] fa0d499dbf
[2] 3cec908738/hydra-module.nix (L260-L262)
[3] https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html
When I take a look at *all* failing builds (by clicking on `[...] more
jobs omitted`) and I try to compare the failures to another jobset, I'd
like to still view *all* failing builds in the compare-view.
This wasn't the case before since the `full=`-param was ignored by the
compare-buttons.
Yet again, manual testing is proving to be insufficient. I'm pretty
sure I wrote this code but lost it in a rebase, or perhaps the switch
to result classes.
At any rate, this implements the actual "fetch a retry row and run it"
for the hydra-notify daemon.
Tested by hand.
Get the number of seconds before the next retriable task is ready.
This number is specifically intended to be used as a timeout, where
`undef` means never time out.
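The contract can be sketched like this in Python, with `None` standing in for Perl's `undef`. `seconds_until_next_retry` and the list of retry timestamps are assumptions for the example, not the actual result class method.

```python
import time


def seconds_until_next_retry(retry_times, now=None):
    """Seconds until the earliest retry is due, or None if nothing is queued."""
    if not retry_times:
        # Nothing to retry: the caller should wait without a timeout.
        return None
    now = time.time() if now is None else now
    # A task that is already overdue yields 0, never a negative timeout.
    return max(0, min(retry_times) - now)


print(seconds_until_next_retry([110, 130], now=100))
```

The return value can be handed directly to anything that accepts "number or none" as a timeout.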
Passwords were replaced with salted sha1 instead of sha256, because I
don't want to have to figure out how to make slapd load this module.
We could also just do {CLEARTEXT} for the purpose of this test.
The test seems to have been broken for a while[1]. The cause for this is that
in gitea 1.14 the `create-user` command got renamed to `user create`.
[1] https://hydra.nixos.org/build/151092299
This gives us a place to put helper functions that act on entire
tables, not just individual records.
This should be a backwards compatible change, except in places we're
manually using result class names.
Declarative jobsets were sort of tucked in to the event handler
itself. It turned out that it could have been implemented as a
plugin without much trouble.
<version> is a standard header with C++20, which could cause issues if a library checks that it exists and then includes it.
Because we have the root of this repo in the include path, such a library would see that <version> exists (with, e.g., __has_include) and then try to include it as a header.
But because it's just a file that says 0.1, this would fail.
This happens with libpqxx 7.
Without this commit, two jobsets using the same repository as input,
but different `deepClone` options, end up incorrectly sharing the same
"checkout" for a given (`uri`, `branch`, `revision`) tuple. The
presence or absence of `.git` is determined by the jobset execution
order.
This patch adds the missing `isDeepClone` boolean to the cache key.
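A toy version of the cache shows why the extra field matters. This is a Python sketch with made-up values and a dictionary instead of the `CachedGitInputs` table; `cache_key` is a hypothetical helper.

```python
def cache_key(uri, branch, revision, is_deep_clone):
    """Key for a cached checkout; the deepClone flag is part of the identity."""
    return (uri, branch, revision, is_deep_clone)


cache = {}
uri, branch, revision = "https://example.org/repo.git", "main", "abc123"

# Two jobsets, same input, different deepClone setting: two separate entries.
cache[cache_key(uri, branch, revision, False)] = "/store/shallow-checkout"
cache[cache_key(uri, branch, revision, True)] = "/store/deep-checkout"

print(len(cache))
```

Without the boolean in the key, the second assignment would overwrite the first and whichever jobset ran last would decide what both see.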
The database upgrade script empties the `CachedGitInputs` table, as we
don't know if existing checkouts are deep clones. Unfortunately, this
generally forces rebuilds even for correct `deepClone` checkouts, as
the binary contents of `.git` are not deterministic.
Fixes #510
Nixpkgs on unstable has removed `stdenv.lib`, as they've been warning for a while now. This removes the extra references to it in the flake.nix.
I'm not entirely sure if this is right, but I figured it was trivial enough to give a quick try using the GH Editor while I was waiting for a job to finish.
I broke this when I added `me.` in f1e75c8bff
I added me. to disambiguate `id`, but:
* eval.id works on the per-build page
* me.id works on the other pages
* Just id works everywhere if I drop:
, prefetch => { evaluationerror => [ ] },
but this causes a query per row to collect the evaluationerror
records later, which becomes significantly slow on non-trivial
datasets.
Using evals->current_source_alias will use the correct alias
whether it is me or eval or something else.
Exposes metrics:
* http_request_duration_seconds_bucket
* http_request_size_bytes_bucket
* http_response_size_bytes_bucket
* http_requests_total
with labels of action and controller to help identify popular
endpoints and their performance characteristics.
If the project isn't declarative, who cares about it in the response? After
setting the `declfile` to an empty string, everything related to declarative-
ness is wiped out, anyways.
It appears the Jobs table was removed in
8adb433e3b, but the Jobsets schema was never
updated to reflect this. This relationship was added in
efa1f1d4fb, roughly 3 months prior.
Previously, one would see a message similar to the following logged when
deleting a jobset:
17:38:23 hydra-server.1 | DBIx::Class::Relationship::CascadeActions::delete(): Skipping cascade delete on relationship 'jobs' - related resultsource 'Hydra::Schema::Jobs' is not registered with this schema at /home/vin/workspace/vcs/hydra/src/script/../lib/Hydra/Controller/Jobset.pm line 106
Something in the upgrade of Bootstrap and JQuery broke lazy tab loading.
I don't understand what is providing the tab behavior, how it should
work, or what the correct fix is.
I can tell you that this patch fixes the issue: when loading a tab
with a URL fragment deep-linking to a lazily loaded tab... it now
loads.
Close #959
This appears to have been broken in ac3e8a4a59,
which removed the `jobsetevals` column from the Projects schema, but didn't
update the Controller accordingly.
Fixes the test added in the previous commit.
To further align with the API, we return custom JSON in order to display a
`visible` field rather than `hidden` -- a `PUT` request expects `visible`, while
a `GET` request returns `hidden`.
This also allows us to rename the `jobsetinputs` field to `inputs` for the same
reason: `PUT` expects `inputs`, while `GET` returns `jobsetinputs`.
`PUT /jobsets/{project-id}/{jobset-id}` expects a JSON object `inputs` which
maps a name to a name, a type, a value, and a boolean that enables emailing
responsible parties. However, `GET /jobsets/{project-id}/{jobset-id}` responds
with an object that doesn't contain a value, but does contain a jobsetinputalts
(which is old and should be unused).
This commit aligns the two by removing the old and unused `jobsetinputalts` from
the response and replaces it with `value`.
1. Configure the in-memory Hydra instance with a null path input cache
time to avoid caching slowing the test down.
2. Use the Catalyst::Test helpers so we talk to the application and skip
needing to actually run a webserver.
3. Change path references to use a tempdir, since this is running while
other tests are also running.
4. Change the login flow to save a cookie and pass it manually. A bit
weird, but it avoids a dependency on heavier browser-mimicking
libraries.
Set `dest_store` in the test hydra config, so that the testsuite ensures
that the distinction between the local store and the destination store
is properly taken into account.
Fix #938
* made all columns available via the API (except for forceeval)
* renamed flakeref to flake to unify the API with the database schema
* renamed inputs to jobsetinputs to unify the API with the database schema
The addition of AuthenSASL seems to be necessary to properly
authenticate against an SMTP server. Without this I got errors
such as
error with Hydra::Plugin::EmailNotification=HASH(0x6ad0128)->buildFinished: SMTP auth requires MIME::Base64 and Authen::SASL
The checkbox is only enabled if `email_notification = 1` is set in
`hydra.conf`. However, when creating jobset (in contrast to the edit
form), the checkbox is always disabled because the `emailNotification`
parameter in Catalyst's stash was missing.
Passwords that are sha1 will be transparently upgraded to argon2,
and future comparisons will use Argon2
Co-authored-by: Graham Christensen <graham@grahamc.com>
The default password comparison logic does not use
constant time validation. Switching to constant time
offers a meager improvement by removing a timing
oracle.
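For comparison, this is what a constant-time check looks like in Python's standard library. It is only an illustration of the concept, not the Perl code used by Hydra; `hashes_match` is a made-up name.

```python
import hmac


def hashes_match(stored_hash, candidate_hash):
    """Compare two hash strings without leaking where they first differ."""
    # A plain == may return as soon as the first differing character is
    # found, so the response time hints at how much of the value matched.
    return hmac.compare_digest(stored_hash.encode(), candidate_hash.encode())


print(hashes_match("5baa61e4c9b9", "5baa61e4c9b9"))
```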
A preparatory step in moving to Argon2id password storage, since we'll need this change anyway
for validating existing passwords afterwards.
Co-authored-by: Graham Christensen <graham@grahamc.com>
Some time in the last decade the plugin switched to preferring
a flatter namespace for realm config.
Co-authored-by: Graham Christensen <graham@grahamc.com>
In Nix the protocol was slightly altered[1] to also contain more
information about realisations. This however wasn't read from the pipe
that was used to read from the store.
After the `cmdBuildDerivation` command which caused this issue, Hydra
will issue a `cmdQueryPathInfos` that tries to read from the remote
store as well. However, there's still leftover data to read from the
previous command and thus Nix fails to properly allocate the expected
string.
[1] See rev a2b69660a9b326b95d48bd222993c5225bbd5b5f
Fixes #898
Otherwise the logs are spammed with database not existing errors:
15:46:07 postgres.1 | 2021-04-05 15:46:07.631 UTC [30742] FATAL: database grahamc does not exist
15:46:08 postgres.1 | 2021-04-05 15:46:08.641 UTC [30759] FATAL: database grahamc does not exist
15:46:09 postgres.1 | 2021-04-05 15:46:09.650 UTC [30765] FATAL: database grahamc does not exist
Co-authored-by: Graham Christensen <graham@grahamc.com>
... but just fixing up merge conflicts from the introduction of flakes
and the removal of the Jobs table.
This is a breaking change. Previously, packages named `packageset.foo`
would be exposed in the fake derivation channel as `packageset-foo`.
Presumably this was done to avoid needing to track attribute sets, and
to avoid the complexity. I think this now correctly handles the
complexity and properly mirrors the input expressions layout.
Previously, the build ID would never flow through channels which
exited.
This patch tracks the buildOne state as part of State and exits, which avoids
waiting forever for new work.
The code around buildOnly is a bit rough, making this a bit weird to
implement but since it is only used for testing the value of improving
it on its own is a bit questionable.
A reproduce script includes a logline that may resemble:
> using these flags: --arg nixpkgs { outPath = /tmp/build-137689173/nixpkgs/source; rev = "fdc872fa200a32456f12cc849d33b1fdbd6a933c"; shortRev = "fdc872f"; revCount = 273100; } -I nixpkgs=/tmp/build-137689173/nixpkgs/source --arg officialRelease false --option extra-binary-caches https://hydra.nixos.org/ --option system x86_64-linux /tmp/build-137689173/nixpkgs/source/pkgs/top-level/release.nix -A
These are passed along to nix-build and that's fine and dandy, but you can't just copy-paste this as is, as the `{}` introduces a syntax error and the value accompanying `-A` is `''`.
A very naive approach is to just `printf "%q"` the individual args, which makes them safe to copy-paste. Unfortunately, this looks awful due to the liberal usage of slashes:
```
$ printf "%q" '{ outPath = /tmp/build-137689173/nixpkgs/source; rev = "fdc872fa200a32456f12cc849d33b1fdbd6a933c"; shortRev = "fdc872f"; revCount = 273100; }'
\{\ outPath\ =\ /tmp/build-137689173/nixpkgs/source\;\ rev\ =\ \"fdc872fa200a32456f12cc849d33b1fdbd6a933c\"\;\ shortRev\ =\ \"fdc872f\"\;\ revCount\ =\ 273100\;\ \}
```
Alternatively, if we just use `set -x` before we execute nix-build, we'll get the whole invocation in a friendly, copy-pastable format that nicely displays `{}`-enclosed content and preserves the empty arg following `-A`:
```
running nix-build...
using this invocation:
+ nix-build --arg nixpkgs '{ outPath = /tmp/build-138165173/nixpkgs/source; rev = "e0e4484f2c028d2269f5ebad0660a51bbe46caa4"; shortRev = "e0e4484"; revCount = 274008; }' -I nixpkgs=/tmp/build-138165173/nixpkgs/source --arg officialRelease false --option extra-binary-caches https://hydra.nixos.org/ --option system x86_64-linux /tmp/build-138165173/nixpkgs/source/pkgs/top-level/release.nix -A ''
```
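The same quoting concern can be shown outside the shell. The sketch below uses Python's `shlex` module rather than `set -x`, so it is an analogy and not what the reproduce script does; the argument vector is a shortened, hypothetical stand-in for the real one.

```python
import shlex

# Hypothetical, shortened argument vector; the path and rev are placeholders.
args = [
    "nix-build",
    "--arg", "nixpkgs", '{ outPath = /tmp/build/nixpkgs/source; rev = "abc"; }',
    "--arg", "officialRelease", "false",
    "-A", "",
]

# shlex.join quotes each argument for a POSIX shell, including the empty one,
# so the result can be pasted back into a terminal unchanged.
rendered = shlex.join(args)
print(rendered)
```

Splitting the rendered string with `shlex.split` gives back the original list, which is the property a copy-pastable invocation needs.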
By moving the tests subdirectory to t, we gain the ability to run `yath
test` with no arguments from inside `nix develop` in the root of the
repo.
(`nix develop` is necessary in order to set the proper env vars for
`yath` to find our test libraries.)
This makes the test faster (by removing it and replacing it with a
`TestScmInput` module that exports the `testScmInput` subroutine). Now,
all the input tests can be run in parallel.
Some of the `tests/jobs/*-update.sh` scripts were "broken" (e.g. tests
failed for various reasons on my machine), so I fixed those up as well.
Co-authored-by: gustavderdrache <gustavderdrache@gmail.com>
This will make it easier to track specifically where queries are being
made from (assuming a `log_line_prefix` that includes `%a` in the
postgres configuration).
projects.xml and declarative-projects.xml were merged with xmllint, and
then I ran that to convert files
for i in *.xml; do pandoc -s -f docbook -t markdown $i -o ${i/xml/md}; done
The queue runner used to special-case `localhost` as a remote builder:
Rather than using the normal remote-build (using the
`cmdBuildDerivation` command), it was using the (generally less
efficient, except when running against localhost) `cmdBuildPaths`
command because the latter didn't require a privileged Nix user (which made
testing easier, in particular by allowing Hydra to run in a container).
However:
1. This means that the build loop can follow two distinct code paths depending
on the setup, the irony being that the most commonly used one in production
(the “non-localhost” case) isn't the one used in the testsuite (because all
the tests run against a local store);
2. It turns out that the “localhost” version is buggy in relatively obvious
ways − in particular a failure in a fixed-output derivation or a hash
mismatch isn't reported properly;
3. If the “run in a container” use-case is indeed that important, it can be
(partially) restored using a chroot store (which wouldn't behave exactly
the same way of course, but would be more than good-enough for testing)
The current check happening in jobsets is incorrect.
The intended constraint is as follows:
- If type is 0 (legacy), then the flake field should be null, and
both nixExprInput and nixExprPath should be non-null
- If type is 1 (flake), then the flake field should be non-null, and
both nixExprInput and nixExprPath should be null
The current version will not catch (i.e. it will accept) situations
where you have, for instance:
type = 1, nixExprPath null, nixExprInput non-null, flake non-null
This commit fixes that.
I split that into two constraints, to make it more readable and
easier to extend if a new type appears in the future.
The complete check could instead be:
( type = 0
AND nixExprInput IS NOT NULL AND nixExprPath IS NOT NULL AND flake IS NULL )
OR ( type = 1
AND nixExprInput IS NULL AND nixExprPath IS NULL AND flake IS NOT NULL )
(but an "OR" cannot be split, hence the other formulation)
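The gap in the old check can be reproduced with a small model. This is a sketch only: `old_checks` assumes the replaced constraints are the pair quoted elsewhere in this log (`(type = 0) = (...)` and `(type = 1) = (flake is not null)`), and `legacy_check` / `flake_check` are one possible way to split the intended rule in two, not necessarily the exact SQL of this commit. `None` stands in for SQL NULL.

```python
def old_checks(type_, nix_expr_input, nix_expr_path, flake):
    """Model of the previous pair of check constraints (assumed)."""
    c1 = (type_ == 0) == (nix_expr_input is not None and nix_expr_path is not None)
    c2 = (type_ == 1) == (flake is not None)
    return c1 and c2


def legacy_check(type_, nix_expr_input, nix_expr_path, flake):
    """type = 0 implies input and path are set and flake is unset."""
    if type_ != 0:
        return True
    return nix_expr_input is not None and nix_expr_path is not None and flake is None


def flake_check(type_, nix_expr_input, nix_expr_path, flake):
    """type = 1 implies input and path are unset and flake is set."""
    if type_ != 1:
        return True
    return nix_expr_input is None and nix_expr_path is None and flake is not None


def new_checks(*row):
    # Two independent constraints, both of which must hold.
    return legacy_check(*row) and flake_check(*row)


# The problematic row from the text: type = 1, input set, path unset, flake set.
bad_row = (1, "src", None, "github:example/repo")
print(old_checks(*bad_row), new_checks(*bad_row))
```

Note that, unlike the single "OR" formulation, two implication-style checks say nothing about a future `type = 2`, which is what makes them easy to extend.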
DBIx likes to eagerly select all columns, with no real way to tell it
otherwise. Therefore, this splits this one large column into its own
table.
I'd also like to make "jobsets" use this table too, but that is on hold
to stop the bleeding caused by the extreme amount of traffic this is
causing.
The database has these constraints:
check ((type = 0) = (nixExprInput is not null and nixExprPath is not null)),
check ((type = 1) = (flake is not null)),
which prevented switching to flakes in a declarative jobspec, since the
nixexpr{path,input} fields were not nulled in such an update
Co-Authored-By: Graham Christensen <graham@grahamc.com>
This search query is pretty heavy. Defaulting to 500 has caused
Hydra's web UI to appear to be down. Since 500 can take it down, users
probably shouldn't be allowed to ask for that many.
Duplicating this data on every record of the builds table cost
approximately 4G of duplication.
Note that the database migration included took about 4h45m on an
untuned server which uses very slow rotational disks in a RAID5 setup,
with not a lot of RAM. I imagine in production it might take an hour
or two, but not 4. If this should become a chunked migration, I can do
that.
Note: Because of the question about chunked migrations, I have NOT
YET tested this migration thoroughly enough for merge.
Looking at AWS' Performance Insights for a Hydra instance, I found
the hydra-queue-runner's query:
select id, buildStatus, releaseName, closureSize, size
from Builds b
join BuildOutputs o on b.id = o.build
where
finished = ?
and (buildStatus = ? or buildStatus = ?)
and path = $1
was the slowest query by at least 10x. Running an explain on this
showed why:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
from Builds b join BuildOutputs o on b.id = o.build where
finished = 1 and (buildStatus = 0 or buildStatus = 6) and
path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
------------------------------------------------------------------------
Gather (cost=1000.43..33718.98 rows=2 width=56)
Workers Planned: 2
-> Nested Loop (cost=0.43..32718.78 rows=1 width=56)
-> Parallel Seq Scan on buildoutputs o (cost=0.00..32710.32
rows=1
width=4)
Filter: (path = '/nix/store/s93kh...snip...'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b
(cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(8 rows)
A parallel sequential scan is definitely better than a sequential scan, but the
cost ranging from 0 to 32710 is not great. Looking at the table, I saw the `path`
column is completely unindexed:
hydra=> \d buildoutputs
Table "public.buildoutputs"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
build | integer | | not null |
name | text | | not null |
path | text | | not null |
Indexes:
"buildoutputs_pkey" PRIMARY KEY, btree (build, name)
Foreign-key constraints:
"buildoutputs_build_fkey" FOREIGN KEY (build) REFERENCES builds(id)
ON DELETE CASCADE
Since we always do exact matches on the path and don't care about ordering,
and since the path column is very high cardinality a `hash` index is a
good candidate. Note that I did test a btree index and it performed
similarly well, but slightly worse.
After creating the index (this took about 10 seconds) on a test database:
create index IndexBuildOutputsPath on BuildOutputs using hash(path);
We get a *significantly* reduced cost:
hydra=> explain select id, buildStatus, releaseName, closureSize, size
hydra-> from Builds b join BuildOutputs o on b.id = o.build where
hydra-> finished = 1 and (buildStatus = 0 or buildStatus = 6) and
hydra-> path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1';
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..41.41 rows=2 width=56)
-> Index Scan using buildoutputs_path_hash on buildoutputs o (cost=0.00..16.05 rows=3 width=4)
Index Cond: (path = '/nix/store/s93khs2dncf2cy273mbyr4fb4ns3db20-MIDIVisualizer-5.1'::text)
-> Index Scan using indexbuildsonjobsetidfinishedid on builds b (cost=0.43..8.45 rows=1 width=56)
Index Cond: ((id = o.build) AND (finished = 1))
Filter: ((buildstatus = 0) OR (buildstatus = 6))
(6 rows)
For direct comparison, the overall query plan was changed:
From: Gather (cost=1000.43..33718.98 rows=2 width=56)
To: Nested Loop (cost=0.43..41.41 rows=2 width=56)
and the query plan for buildoutputs changed from a maximum cost of
32,710 down to 16.
In practical terms, the query's planning and execution time was reduced:
Before (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.898 | 0.416 | 0.383
Execution | 138.644 | 172.331 | 375.585
After (ms) | Try 1 | Try 2 | Try 3
------------+---------+---------+--------
Planning | 0.298 | 0.290 | 0.296
Execution | 219.625 | 0.035 | 0.034
Requires the following configuration options
enable_github_login = 1
github_client_id
github_client_secret
Or github_client_secret_file which points to a file with the secret
Fixes this error:
ERROR: failed to process declarative jobset test:inputs,
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
execute failed: ERROR: null value in column "emailoverride" violates
not-null constraint
This would start happening if the network connection between the Hydra
server and the remote build server breaks after successfully importing
at least one output of a derivation, but before having finished
importing all outputs.
Fixes #816.
These make the hydra-queue-runner logs very noisy even when not using the GitlabStatus plugin.
Also, they shouldn't be necessary except when developing the plugin itself and should have been removed before release.
It might happen that a job from the aggregate returned an error!
This is what the vague "[json.exception.type_error.302] type must be string, but is null"
was all about in this instance; there was no `drvPath` to stringify!
So we now actively watch for errors and copy them to the aggregate job.
The vague "[json.exception.type_error.302] type must be string, but is null"
is **absolutely** unhelpful in the way Hydra currently handles it on
evaluation.
This is handling *unexpected* errors only; the following commit will
handle the specific instance of the previously mentioned error.
Also add nix to passthru. This makes it easier to override nix in 'nix
develop', e.g.
$ nix develop \
--redirect .#hydraJobs.build.x86_64-linux.nix ~/Dev/nix/outputs/out \
--redirect .#hydraJobs.build.x86_64-linux.nix.dev ~/Dev/nix/outputs/dev
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.
This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.
[1] https://github.com/NixOS/nix/pull/3883
With the current implementation, if ANY hash was found inside the decl
spec, the spec would be treated as static. This is problematic since
`inputs` is a hash and hence any configuration would be handled as a
static one.
This fixes the code to match the documentation and only switch to static
processing when ALL values are hashes.
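The change from "any" to "all" can be sketched like this. The function names and the example specs are hypothetical, and Python dicts stand in for Perl hashes.

```python
def is_static_old(spec):
    """Previous behaviour: a single hash value made the whole spec static."""
    return any(isinstance(value, dict) for value in spec.values())


def is_static_new(spec):
    """Fixed behaviour: static only when every value is a hash."""
    return all(isinstance(value, dict) for value in spec.values())


# A regular declarative spec: `inputs` is a hash, the other values are not.
dynamic_spec = {"enabled": 1, "nixexprpath": "jobsets.nix", "inputs": {"src": {}}}
# A static spec: every top-level value describes one jobset.
static_spec = {"master": {"enabled": 1}, "staging": {"enabled": 0}}

print(is_static_old(dynamic_spec), is_static_new(dynamic_spec))
print(is_static_old(static_spec), is_static_new(static_spec))
```

One caveat of this sketch: `all()` over an empty spec is true, so an empty spec would count as static here.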
In this newly added test an OpenLDAP server provides one user
(called `user`) and the test attempts to log in as that user.
Logging in with any other password must fail.
Nixpkgs doesn't currently provide these required packages. In order to
use this feature without waiting for a newer release of NixOS/Nixpkgs
these have been packaged inline.
* Fix issue #614: restart queue/evaluator on sufficient disk space available.
* Only try to stop the service if it is currently running.
* Use named variables and added restarting message.
The previous version hard-coded the cache check frequency to 30
seconds. This meant that the path was checked very frequently (max of
30 seconds and the evaluation period of the job), which could be
problematic for URL PathInput specifications, and especially ones that
are automatically updated frequently without *each* update necessarily
being interesting (an example: the haskell hackage index file.)
As of https://github.com/NixOS/hydra/pull/737 (removal of sqlite
dependency), the only supported database is Postgresql.
This change removes all references to hydra-postgresql.sql file. This
file is generated using a cpp on hydra.sql, but doesn't differ from
hydra.sql at all.
PathInput plugin keeps a cache of path evaluations. This cache is simple, and
path is not checked more than once every N seconds, where N=30. The caching is
there to avoid expensive calls to `nix-store --add`.
This change makes the validity period configurable. The main use case is
`api-test.pl` which was implemented wrong for a while, as the invocation of
`hydra-eval-jobset` would return the previous evaluation, claiming there are no
changes. The test has been fixed to check better for a new evaluation.
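A cache with a configurable validity period could look roughly like the sketch below. It is not the plugin's code: `PathCache`, its in-memory dict and the injectable `clock` are assumptions made so the behaviour can be shown and tested in isolation.

```python
import time


class PathCache:
    """Remember the result for a path for `validity_seconds` seconds."""

    def __init__(self, validity_seconds=30, clock=time.monotonic):
        self._validity = validity_seconds
        self._clock = clock
        self._entries = {}  # path -> (time stored, cached value)

    def get(self, path, compute):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._validity:
            # Still fresh: skip the expensive call (e.g. `nix-store --add`).
            return entry[1]
        value = compute(path)
        self._entries[path] = (now, value)
        return value
```

A test that needs a fresh evaluation every time can pass `validity_seconds=0`, while a frequently updated URL input can use a much longer period.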
`build_finished` Postgres event will never be fired for the dependent builds.
For example, on our Hydra, the following query always returns increasing
numbers, even though all notifications have been delivered:
```
hydra=> select count(1) from builds where notificationpendingsince is not null;
count
-------
4583
(1 row)
```
Thus, we have to iterate over all dependent builds and mark their
`notificationpendingsince` as `null`, otherwise they will pile up until
the next restart of hydra-notify, when they will get delivered.
When deploying a Hydra instance other than hydra.nixos.org, one may encounter a
problem: building any job that uses IFD fails with:
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/1jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize '/nix/store/av3jr8ix4qcadq2wm3y3hplvxwzlhl4y-source.drv' during evaluation but 'allow-import-from-derivation' is false"
May 22 19:41:07 hydra hydra-evaluator[6960]: error: "attempted to realize
'/nix/store/2jm02mfiv58rpy8zrx95cpqxzsp64ssh-source.drv' during evaluation but
'allow-import-from-derivation' is false"
The recent change enforced passing `--no-allow-import-from-derivation`
to `hydra-eval-job` unconditionally. This change makes it configurable and
defaults to **NOT PASSING IT** -- most of the deployments allow IFDs.
The configuration option is called `allow_import_from_derivation` and
defaults to `true`. It is interpreted as a boolean, and the only value treated
as true is `true`.
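The interpretation of the option can be sketched as below. The helper names are hypothetical and a plain dict stands in for the parsed hydra.conf; only the option name, its default and the `--no-allow-import-from-derivation` flag come from the text.

```python
def allow_ifd(config):
    """Only the literal string "true" enables IFD; the default is "true"."""
    return config.get("allow_import_from_derivation", "true") == "true"


def extra_eval_flags(config):
    """Flags to add to the evaluator invocation for this setting."""
    return [] if allow_ifd(config) else ["--no-allow-import-from-derivation"]


print(extra_eval_flags({}))
print(extra_eval_flags({"allow_import_from_derivation": "false"}))
```

With this reading, values such as `1` or `yes` disable IFD, because they are not the string `true`.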
Taken from `Perl::Critic`:
A common idiom in perl for dealing with possible errors is to use `eval`
followed by a check of `$@`/`$EVAL_ERROR`:
eval {
...
};
if ($EVAL_ERROR) {
...
}
There's a problem with this: the value of `$EVAL_ERROR` (`$@`) can change
between the end of the `eval` and the `if` statement. The issue is object
destructors:
package Foo;
...
sub DESTROY {
...
eval { ... };
...
}
package main;
eval {
my $foo = Foo->new();
...
};
if ($EVAL_ERROR) {
...
}
Assuming there are no other references to `$foo` created, when the
`eval` block in `main` is exited, `Foo::DESTROY()` will be invoked,
regardless of whether the `eval` finished normally or not. If the `eval`
in `main` fails, but the `eval` in `Foo::DESTROY()` succeeds, then
`$EVAL_ERROR` will be empty by the time that the `if` is executed.
Additional issues arise if you depend upon the exact contents of
`$EVAL_ERROR` and both `eval`s fail, because the messages from both will
be concatenated.
Even if there isn't an `eval` directly in the `DESTROY()` method code,
it may invoke code that does use `eval` or otherwise affects
`$EVAL_ERROR`.
The solution is to ensure that, upon normal exit, an `eval` returns a
true value and to test that value:
# Constructors are no problem.
my $object = eval { Class->new() };
# To cover the possibility that an operation may correctly return a
# false value, end the block with "1":
if ( eval { something(); 1 } ) {
...
}
eval {
...
1;
}
or do {
# Error handling here
};
Unfortunately, you can't use the `defined` function to test the result;
`eval` returns an empty string on failure.
Various modules have been written to take some of the pain out of
properly localizing and checking `$@`/`$EVAL_ERROR`. For example:
use Try::Tiny;
try {
...
} catch {
# Error handling here;
# The exception is in $_/$ARG, not $@/$EVAL_ERROR.
}; # Note semicolon.
"But we don't use DESTROY() anywhere in our code!" you say. That may be
the case, but do any of the third-party modules you use have them? What
about any you may use in the future or updated versions of the ones you
already use?
I came across https://github.com/NixOS/hydra/issues/751 and realized
that hydra-notify is responsible for creating the additional jobsets in
a declarative file. My declarative testing works in dev now.
The original code would return the standard "Please come back later" page when there
are only fetch errors on a newly set up declarative project. The problem is that
there are two types of errors: standard errors and fetch errors. Each is
accompanied by a corresponding field for the time of occurrence. Standard errors use
'errortime', while fetch errors have 'lastchecktime' set to the time of the
error. Unfortunately, the jobset.tt file was only using 'errortime' for displaying
the time. This would result in the following errors in logs:
Couldn't render template "date error - bad time/date string: expects 'h:m:s d:m:y' got: ''
This change includes using 'lastchecktime' when rendering the error times.
The current implementation passes all values to the `create_or_update` method. The
missing values end up as `undef` (or `NULL`) when assigned to `%update`.
Thus, for columns that are NOT NULL, a missing value (for example, when flakes
are not used) results in a horrible:
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed:
ERROR: null value in column "type" violates not-null constraint
DETAIL: Failing row contains (.jobsets, 118, hydra, hydra jobsets, src, hydra/jobsets.nix, null,
null, null, 1589536378, 1, 0, 0, , 3, 30, 100, null, null, 1589536379, null, null). [for Statement
"UPDATE jobsets SET checkinterval = ?, description = ?, enableemail = ?, nixexprinput = ?,
nixexprpath = ?, type = ? WHERE ( ( name = ? AND project = ? ) )" with ParamValues: 1='30',
2='hydra jobsets', 3='0', 4='src', 5='hydra/jobsets.nix', 6=undef, 7='.jobsets', 8='hydra'] at
/nix/store/lsf81ip9ybxihk5praf2n0nh14a6i9j0-hydra-0.1.19700101.DIRTY/libexec/hydra/lib/Hydra/Helper/AddBuilds.pm line 50
This change just omits adding such values to `%update`, which results in
PostgreSQL assigning the default values.
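The idea of leaving unset values out of the update can be sketched as follows. This is a Python illustration with a hypothetical `build_update` helper; `None` plays the role of Perl's `undef`, and the column list is only an example.

```python
def build_update(spec, columns):
    """Collect only the columns that the spec actually provides."""
    update = {}
    for column in columns:
        value = spec.get(column)
        if value is not None:
            # Unset values are skipped, so the database default applies.
            update[column] = value
    return update


columns = ["description", "nixexprinput", "nixexprpath", "type", "flake"]
spec = {"description": "hydra jobsets", "nixexprinput": "src",
        "nixexprpath": "hydra/jobsets.nix"}
print(build_update(spec, columns))
```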
This adds a `devShell` which unlike `runHydra` doesn't start hydra
automatically and doesn't receive hydra as build input. It is better
suited for interactive development cycles:
```
$ nix-shell -A devShell
$ ./bootstrap
$ configurePhase
$ make
$ # hack hack hack
$ foreman start
# test test test
<C-c>
$ # hack hack hack
```
runHydra automatically starts hydra and postgres:
```
$ nix-shell -A runHydra
```
The shell receives hydra from the working copy as buildInput.
Running hydra, queue-runner, evaluator and postgres is managed
by foreman (https://github.com/ddollar/foreman) and configured
in `Procfile`.
The previous code converted option values to ints when the value
contained a digit somewhere. This is too eager since it also converts
strings like `release-0.2` to an int which should not happen.
We now only convert to int when the value is an integer.
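The stricter rule can be sketched like this. It is a Python illustration, not the Perl from this change; in particular, treating only unsigned decimal digits as an integer is an assumption of the sketch.

```python
import re


def looks_numeric_old(value):
    """Previous rule: any digit anywhere triggered the conversion."""
    return re.search(r"[0-9]", value) is not None


def convert(value):
    """New rule: convert only when the whole value is an integer."""
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    return value


print(looks_numeric_old("release-0.2"), repr(convert("release-0.2")))
print(looks_numeric_old("42"), repr(convert("42")))
```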
This plugin is a counterpart to GithubPulls plugin. Instead of fetching pull
requests, it will fetch all references (branches and tags) that start with a
particular prefix.
The plugin is a copy of GithubPulls plugin with appropriate changes to call the
right API and parse the config matching the need.
To quote the function's comment:
Awful hack to handle timeouts in SQLite: just retry the transaction.
DBD::SQLite *has* a 30 second retry window, but apparently it
doesn't work.
Since SQLite is now dropped entirely, this wrapper can be removed
completely.
SQLite hasn't been properly supported by Hydra for a few years now[1], but
Hydra still depends on it. Apart from a slightly bigger closure, this can
cause confusion for users, since Hydra picks up SQLite rather than
PostgreSQL by default if HYDRA_DBI isn't configured properly[2].
[1] 78974abb69
[2] https://logs.nix.samueldr.com/nixos-dev/2020-04-10#3297342;
In Nixpkgs 20.03, Mercurial fails if PYTHONPATH is set:
$ hg
Traceback (most recent call last):
File "/nix/store/q7s856v6nw4dffdrm9k3w38qs35i8kr3-mercurial-5.2.2/bin/..hg-wrapped-wrapped", line 37, in <module>
dispatch.run()
File "/nix/store/bffdy7q3wi3qinflnvbdkigqj39zzynd-python3-3.7.6/lib/python3.7/importlib/util.py", line 245, in __getattribute__
self.__spec__.loader.exec_module(self)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/nix/store/q7s856v6nw4dffdrm9k3w38qs35i8kr3-mercurial-5.2.2/lib/python3.7/site-packages/mercurial/dispatch.py", line 10, in <module>
import difflib
File "/nix/store/bffdy7q3wi3qinflnvbdkigqj39zzynd-python3-3.7.6/lib/python3.7/difflib.py", line 1084, in <module>
import re
File "/nix/store/bffdy7q3wi3qinflnvbdkigqj39zzynd-python3-3.7.6/lib/python3.7/re.py", line 143, in <module>
class RegexFlag(enum.IntFlag):
AttributeError: module 'enum' has no attribute 'IntFlag'
(cherry picked from commit 4009d4295e)
If we don't see a machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
(cherry picked from commit f5cdbfe21d)
In Nixpkgs 20.03, Mercurial fails if PYTHONPATH is set:
$ hg
Traceback (most recent call last):
File "/nix/store/q7s856v6nw4dffdrm9k3w38qs35i8kr3-mercurial-5.2.2/bin/..hg-wrapped-wrapped", line 37, in <module>
dispatch.run()
File "/nix/store/bffdy7q3wi3qinflnvbdkigqj39zzynd-python3-3.7.6/lib/python3.7/importlib/util.py", line 245, in __getattribute__
self.__spec__.loader.exec_module(self)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/nix/store/q7s856v6nw4dffdrm9k3w38qs35i8kr3-mercurial-5.2.2/lib/python3.7/site-packages/mercurial/dispatch.py", line 10, in <module>
import difflib
File "/nix/store/bffdy7q3wi3qinflnvbdkigqj39zzynd-python3-3.7.6/lib/python3.7/difflib.py", line 1084, in <module>
import re
File "/nix/store/bffdy7q3wi3qinflnvbdkigqj39zzynd-python3-3.7.6/lib/python3.7/re.py", line 143, in <module>
class RegexFlag(enum.IntFlag):
AttributeError: module 'enum' has no attribute 'IntFlag'
Apparently, buildEnv in 20.03 no longer respects
propagated-build-inputs.
Note that the use of a library function (closePropagation) seems
fundamentally wrong to me - propagated-build-inputs should be used at
runtime, not at evaluation time.
If we don't see a machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
When I browse failed builds in a jobset-eval on Hydra, I regularly
confuse actual build-failures with temporary issues like timeouts (that
probably disappear at the next eval).
To prevent this kind of issue, I figured that using the stopsign-svg for
builds with timeouts or exceeded log-limits is a reasonable choice for
the following reasons:
* A user can now distinguish between actual build-errors (like
compilation-failures or oversized outputs) and (usually) temporary issues
(like a bloated log or a timeout).
* The stopsign is also used for aborted jobs that are shown in a
different tab and can't be confused with timeouts for that reason.
Declarative jobsets were broken by the Nix update, causing
nix cat-file to break silently.
This commit restores declarative jobsets, based on top of a commit
making it easier to see what broke.
Until now, jobsets which are automatically evaluated have been evaluated
regularly, on a schedule. This schedule means a new evaluation is
created every checkInterval seconds (assuming something changed).
This model works well for architectures where our build farm can
easily keep up with demand.
This commit adds a new type of evaluation, called ONE_AT_A_TIME, which
only schedules a new evaluation if the previous evaluation of the
jobset has no unfinished builds.
This model of evaluation lets us have 'low-tier' architectures.
For example, we could now have a jobset for ARMv7l builds, where
the buildfarm only has a single, underpowered ARMv7l builder.
Configuring that jobset as ONE_AT_A_TIME will create an evaluation
and then won't schedule another evaluation until every job of
the existing evaluation is complete.
This way, the cache will have a complete collection of pre-built
software for some commits, but the underpowered architecture will
never become backlogged in ancient revisions.
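The scheduling decision described above can be sketched as follows. The names `EvalStyle` and `should_evaluate` are hypothetical, and the sketch assumes that the check interval still applies to ONE_AT_A_TIME jobsets.

```python
from enum import Enum


class EvalStyle(Enum):
    SCHEDULE = 1
    ONE_AT_A_TIME = 2


def should_evaluate(style, seconds_since_last_check, check_interval, unfinished_builds):
    """Decide whether a jobset is due for a new evaluation."""
    if seconds_since_last_check < check_interval:
        return False
    if style is EvalStyle.ONE_AT_A_TIME:
        # Wait until every build of the previous evaluation has finished.
        return unfinished_builds == 0
    return True


print(should_evaluate(EvalStyle.SCHEDULE, 600, 300, 12))
print(should_evaluate(EvalStyle.ONE_AT_A_TIME, 600, 300, 12))
print(should_evaluate(EvalStyle.ONE_AT_A_TIME, 600, 300, 0))
```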
A postgresql column which is non-null and unique is treated with
the same optimisations as a primary key, so we have no need to
try and recreate the `id` as the primary key.
No read paths are impacted by this change, and the database will
automatically create an ID for each insert. Thus, no code needs to
change.
hydra.nixos.org is already running this rev, and it should be safe to
apply to everyone else. If we make changes to this migration, we'll
need to write another migration anyway.
Lowercasing is due to postgresql not having case-sensitive table names.
It always technically worked before, but those table names never
existed literally.
The switch to generating from postgresql is to handle an upcoming
addition of an auto-incrementing ID to the Jobset table. Sqlite doesn't
seem to be able to handle the table having an auto incrementing ID
field which isn't the primary key, but we can't change the primary
key trivially.
Since hydra doesn't support sqlite and hasn't for many years anyway,
it is easier to just generate from pgsql directly.
Building on macOS with the latest nixpkgs master and NixOS/nixpkgs#77147
fails. It seems some `std::experimental` features (`optional`, for instance)
are no longer available under `experimental`, but are in `std`. Also `toJSON` is
missing for `atomic< unsigned long long >`.
In a NixOS container, cmdBuildDerivation doesn't work because we're
not privileged. But we also don't need it because the store already
has the derivation.
Also, don't copy from/to the store since this gives errors about
missing signatures.
It contains Hydra, PostgreSQL and a frontend proxy. So you can get a
running Hydra instance by doing
$ nixos-container create hydra --flake hydra
$ nixos-container start hydra
The web interface is available on port 80.
This removes a super annoying set of messages that pollute the logs:
Aug 30 09:00:30 xxx.compute.internal hydra-server[957]: Trouble trying to detect your terminal size, looking at $ENV{COLUMNS}
Aug 30 09:00:30 xxx.compute.internal hydra-server[957]: Term::Size::Any is not installed, can't autodetect terminal column width
This attribute indicates whether an error occurred: when an
error occurs, errormsg is not an empty string. Note that we cannot use the
errormsg attribute itself because it can be arbitrarily long and is excluded
from the jobset API response.
This adds the following (pre-existing) attributes to the jobset response:
- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
(cherry picked from commit 438ddf5289)
Plugins are now disabled at startup time unless there is some relevant
configuration in hydra.conf. This avoids hydra-notify having to do a
lot of redundant work (a lot of plugins did a lot of database queries
*before* deciding they were disabled).
Note: BitBucketStatus users will need to add 'enable_bitbucket_status
= 1' to hydra.conf.
* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can
be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.
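A listener could decode the three payload formats roughly as below. This is a sketch with a hypothetical `parse_eval_notification` helper; it keeps all fields as strings, since the text does not state their types.

```python
def parse_eval_notification(channel, payload):
    """Split a notification payload according to its channel's format."""
    if channel == "eval_started":
        tmp_id, project, jobset = payload.split("\t")
        return {"tmp_id": tmp_id, "project": project, "jobset": jobset}
    if channel == "eval_failed":
        return {"tmp_id": payload}
    if channel == "eval_added":
        tmp_id, eval_id = payload.split(":")
        return {"tmp_id": tmp_id, "eval_id": eval_id}
    raise ValueError(f"unknown channel: {channel}")


print(parse_eval_notification("eval_started", "7\tnixpkgs\ttrunk"))
print(parse_eval_notification("eval_added", "7:1234"))
```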
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
This adds a `InfluxDBNotification` plugin which is configured as:
```
<influxdb>
url = http://127.0.0.1:8086
db = hydra
</influxdb>
```
which will write a notification for every finished job to the
configured database in InfluxDB looking like:
```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```
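For readers unfamiliar with the line format shown above (measurement and tags, then fields, then a timestamp), here is a minimal sketch that assembles such a line. It is not the plugin's code: `format_line` is a hypothetical helper, it sorts keys for a stable result, and it does not escape special characters in tag keys or values.

```python
def format_field(value):
    """Render one field value: booleans, integers (suffix i) or quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_line(measurement, tags, fields, timestamp):
    """Assemble 'measurement,tags fields timestamp'."""
    tag_part = ",".join(f"{key}={tags[key]}" for key in sorted(tags))
    field_part = ",".join(f"{key}={format_field(fields[key])}" for key in sorted(fields))
    return f"{measurement},{tag_part} {field_part} {timestamp}"


line = format_line(
    "hydra_build_status",
    {"cached": "false", "job": "job"},
    {"build_id": "1", "build_status": 0},
    1564156212,
)
print(line)
```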
Creating the `pg_trgm` extension requires superuser privileges. So,
this patch creates the extension in the Hydra NixOS module when
a local database is used.
If it is not possible to create this extension (for instance, on a remote
database with nosuperuser), the creation of the `pg_trgm` index is
skipped (this index speeds up queries on builds.drvpath) and warnings
are emitted:
initialising the Hydra database schema...
WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)
This keeps migrations smooth: the migration process doesn't
require a manual step (but this manual step is recommended on big
remote databases).
The search query uses the LIKE operator which requires a sequential
scan (it can't use the already existing B-tree index). This new
index (trigram) avoids a sequential scan of the builds table when the
LIKE operator is used.
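Why a trigram index helps a `LIKE '%...%'` search can be sketched with a toy in-memory index. This is a strong simplification of what `pg_trgm` does (no padding, no case folding); `build_index` and `like_search` are hypothetical names and the store paths are made up.

```python
def trigrams(text):
    """All three-character substrings of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(rows):
    """Map each trigram to the ids of the rows that contain it."""
    index = {}
    for row_id, value in rows.items():
        for trigram in trigrams(value):
            index.setdefault(trigram, set()).add(row_id)
    return index


def like_search(index, rows, needle):
    """Emulate `value LIKE '%needle%'` using the index, then recheck."""
    wanted = trigrams(needle)
    if not wanted:
        # Needle shorter than three characters: fall back to a full scan.
        candidates = set(rows)
    else:
        candidates = set.intersection(*(index.get(t, set()) for t in wanted))
    # Candidates can be false positives, so each one is rechecked.
    return sorted(row_id for row_id in candidates if needle in rows[row_id])


rows = {1: "/nix/store/abc123-hello.drv", 2: "/nix/store/xyz789-world.drv"}
index = build_index(rows)
print(like_search(index, rows, "c123-h"))
```

The final recheck in the sketch corresponds to the "Recheck Cond" step that appears in a bitmap heap scan plan.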
Here is the `explain analyze` output for a query on the builds table with this index:
explain analyze select * from builds where drvpath like '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on builds (cost=128.00..132.01 rows=1 width=492) (actual time=0.070..0.077 rows=1 loops=1)
Recheck Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
-> Bitmap Index Scan on indextrgmbuildsondrvpath (cost=0.00..128.00 rows=1 width=0) (actual time=0.047..0.047 rows=3 loops=1)
Index Cond: (drvpath ~~ '%k3r71gz0gv16ld8rhcp2bb8gb5w1xc4b%'::text)
Total runtime: 0.206 ms
(5 rows)
Currently, a full store path has to be provided to search in
builds. This patch makes it possible to search for builds by an output path
or derivation hash.
Use case: we are building Docker images with Hydra. The tag of the
Docker image is the hash of the image output path. This patch would
allow us to find the build job from the tag of a running
container image.
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.
No more need for a reproduction script! It just says something like
If you have Nix installed, you can reproduce this build on your own
machine by running the following command:
# nix build github:edolstra/dwarffs/09c823e977946668b63ad6c88ed358b48220f124:hydraJobs.build.x86_64-linux
This plugin expects as inputs to a jobset the following:
- gitlab_status_repo => Name of the repository input to which
status updates should be POSTed, e.g. if the jobset has a git input
"nixexprs": "https://gitlab.example.com/project/nixexprs", then
"gitlab_status_repo" would be "nixexprs".
- gitlab_project_id => ID of the project in Gitlab, i.e. in the above
case the ID in Gitlab of "nixexprs"
Without this patch, running the following on macOS:
nix-build release.nix -A build.x86_64-linux
results in the following error during the configuration phase (note that Nix
should be configured with a x86_64-linux build machine):
building '/nix/store/jb6ca1gmplyb69ayd43z7fb0y9npxd53-hydra-0.1.2581.8b5948f4cf12424c04df67a6eb136c9846fb2cfd.drv' on 'ssh://my-linux-build-machine'...
...
checking whether /nix/store/s6bhdppx66bkgf741vk4d29hgsj1h1zp-hydra-perl-deps/bin/nix-store is recent enough... ./configure: line 16254: /nix/store/s6bhdppx66bkgf741vk4d29hgsj1h1zp-hydra-perl-deps/bin/nix-store: cannot execute binary file: Exec format error
no
configure: error: `/nix/store/s6bhdppx66bkgf741vk4d29hgsj1h1zp-hydra-perl-deps/bin/nix-store' doesn't support `--timeout'; please use a newer version.
build time elapsed: 0m1.624s 0m1.774s 0m9.366s 0m6.110s
builder for '/nix/store/jb6ca1gmplyb69ayd43z7fb0y9npxd53-hydra-0.1.2581.8b5948f4cf12424c04df67a6eb136c9846fb2cfd.drv' failed with exit code 1
The problem is that the `nix` dependency of hydra is selected from a nixpkgs
set configured with a default `system` parameter,
i.e. `builtins.currentSystem`. This means that the hydra derivation which is
built for and on Linux depends on the nix derivation built for Darwin.
The fix is to select nix from the nixpkgs set configured with a system specified
by the user.
Prior, tests would all fail to build, causing, roughly, the following
error (roughly, because I added some debug log messages :)):
ok 68 - Evaluating jobs/build-products.nix should result in 2 builds
Queue runner stderr: using 4185024512 bytes for the NAR buffer
locking path '/build/source/tests/data/queue-runner/lock'
lock acquired on '/build/source/tests/data/queue-runner/lock.lock'
warning: unknown setting 'max-connection-age'
warning: unknown setting 'max-connections'
dispatcher woken up
dispatcher woken up
dispatcher sleeping for 7674380800s
adding new machine ‘localhost’
dispatcher woken up
checking the queue for builds > 0...
dispatcher sleeping for 7674380800s
sending notification about build 1
loading build 18 (tests:build-products:simple)
considering derivation ‘/build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv’
sending notification about build 2
creating build step ‘/build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv’
added build 18 (top-level step /build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv, 1 new steps)
got 1 new runnable steps from 1 new builds
step ‘/build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv’ is now runnable
dispatcher woken up
dispatcher sleeping for 7674380800s
performing step ‘/build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv’ 1 times on ‘localhost’ (needed by build 18
and 0 others)
sending closure of ‘/build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv’ to ‘localhost’
building ‘/build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv’ on ‘localhost’
killing process 10462
marking build 18 as failed
finishing build step ‘/build/source/tests/nix/store/24h0i450d4k00a4jhhk6r7qpqdvzskw6-build-product-simple.drv’
ok 69 - Build 'simple' from jobs/build-products.nix should exit with code 0
ok 70 - newbuild->finished was '1' instead of 1
not ok 71 - newbuild->buildstatus was '1' instead of 0
not ok 72 - Build 'simple' from jobs/build-products.nix should have buildstatus 0
Can't call method "name" on an undefined value at ./evaluation-tests.pl line 173.
FAIL: evaluation-tests.pl
The hydra-queue-runner opens a connection to the builder. If the
builder is 'localhost' it starts `nix-store`, otherwise it starts
'ssh'.
Currently, if the hydra-queue-runner cannot start `nix-store` (not in
the PATH for instance), the error message is:
cannot connect to ‘localhost’: error: cannot start ssh: No such file
or directory
This is not useful since ssh is not actually started :/
With this patch the error message is now:
cannot connect to ‘localhost’: error: cannot start nix-store: No such file
or directory
Some time ago the data structure for maintainer descriptions in
`nixpkgs` changed from a simple attr set with maintainer emails as
values to an attribute set where the maintainer's nick is associated with
an attribute set with email, GitHub handle and full name.
Hydra can either parse a Nix list or fetch `shortName` from the
associated attribute set (which is used for `meta.licenses` as each
value in it contains a `shortName`). This behavior needs to be
replicated for maintainers to retrieve the emails for `hydra-notify`.
This change is backwards-compatible since `queryMetaStrings` is still
able to understand lists, so old versions of `nixpkgs` or packages using
the old maintainer data structure remain usable.
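The lookup described above can be modelled roughly as follows. This is a Python sketch of the idea, not the actual `queryMetaStrings` code: meta values are represented as plain Python lists, strings and dicts, and the function name and signature are illustrative assumptions.

```python
def query_meta_strings(values, sub_attribute):
    """Collect strings from a meta list whose items are either plain
    strings (old style) or attribute sets carrying `sub_attribute`."""
    result = []
    for item in values:
        if isinstance(item, str):
            # Old data structure: the list element is the string itself.
            result.append(item)
        elif isinstance(item, dict) and sub_attribute in item:
            # New data structure: fetch the named field from the attribute set.
            result.append(item[sub_attribute])
    return result
```

With this shape, `meta.licenses` would be queried with `"shortName"` and `meta.maintainers` with `"email"`, while plain lists of strings keep working unchanged.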
This is because setting only the initial heap size to more than
the default value (or the configured value) will cause all initial evals
until maxHeapSize expands to the given value to abort.
The 1.1 multiplier comes from the configured defaults on NixOS' hydra,
and from the previous multiplier used before
7876cf677c.
This is caused by the autoconf check for `nix-store` which is equivalent
to running this:
```
$ nix-store --timeout 123 -q
```
This would open the pager on a 2.1.1 version of nix.
```
$ nix-store --version
nix-store (Nix) 2.1.1
```
Setting `PAGER` to `cat` ensures the pager doesn't block the
configurePhase.
```
$ PAGER=cat nix-store --timeout 123 -q
```
In order to access protected or private repositories. Using the target
repository URL along with the merge-request ref instead of the source
repository url and branch is necessary to avoid running into issues if
the source repository is not actually accessible to the user Hydra is
authenticating as.
Thanks Alexei Robyn for this patch.
The PathInput input for local paths was previously enhanced to allow
URLs for which it would use a nix-prefetch-url operation. This change
updates the prompt for the declarative input type to indicate this
capability.
When I press "n builds omitted" I get back to the first tab of a jobset.
This is extremely counter-intuitive, instead this notice should link to
the currently opened tab.
The job has been failing since https://hydra.nixos.org/eval/1461286
with the following error:
hydra-eval-jobs.cc:278:17: error: 'evalSettings' was not declared in this scope
evalSettings.restrictEval = true;
^~~~~~~~~~~~
This is likely due to a typo in 0882519 where that line and the
corresponding comment were moved, and `settings` was changed in that
one place to `evalSettings`.
I reproduced the error by running `nix-build release.nix -A
build.x86_64-linux` on my machine, and this small change fixes it.
You can now set 'evaluator_max_heap_size' to make hydra-eval-jobs
restart itself if the Boehm heap exceeds the specified size.
For example, with 'evaluator_max_heap_size = 256000000',
$ hydra-eval-jobs '<nixpkgs/pkgs/top-level/release.nix>' -I nixpkgs=channel:nixos-17.09
has a max RSS of .56 GiB rather than 4.7 GiB.
Unfortunately it doesn't help much for the NixOS jobsets because of
the "tested" job which requires a huge amount of memory all by itself.
This cannot be done in the hydra-evaluator systemd unit, since then
every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will
also allocate the specified heap size, probably leading to OOM.
This is a good way to make Hydra hang. (E.g. we had a deletion of
nixos:gcc-7 running for > 12 hours and blocking UPDATE statements from
hydra-queue-runner.) Generally it's better to just disable/hide an old
jobset anyway.
Frequently users want Hydra access just to restart jobs. However,
prior to this commit the only way to grant that access was by giving
them full Admin access which isn't necessarily what we want to do.
By having a restart-jobs role, we can grant this privilege to users
who are known to the community and want to help, but aren't long-time
members.
I haven't tested this commit, but it looks good to me...
When using the "build" or "sysbuild" jobset input types in conjunction
with a binary cache store, the evaluator needs to be able to fetch
store paths from the binary cache. Typical usage:
store_uri = s3://nix-test-cache?secret-key=...
eval_substituter = s3://nix-test-cache
Also, the public key of the binary cache must be added to
binary-cache-public-keys in nix.conf, otherwise the local nix-daemon
won't allow the store paths to be copied over.
Also, remove support in hydra-eval-jobs for multiple jobset input
alternatives. The web interface hasn't supported this in a long
time. Thus we can use the regular "--arg" handler.
This makes downloading/viewing build results work with binary cache
stores. For good performance, this should be used in conjunction with
ca580bec35,
i.e. you should set store_uri to something like
s3://my-cache?local-nar-cache=/tmp/nar-cache
to cache NARs between requests.
When creating a Hydra user with the `hydra-create-user` command, you can now
provide a SHA1 password hash with the `--password-hash` flag. This is useful for
the upcoming work on Fully Declarative Hydra, since the end user should not have
to specify plaintext passwords in their `configuration.nix` file.
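A hash for the `--password-hash` flag could be produced along these lines. This is a sketch under the assumption that the expected value is the hex digest of the unsalted SHA1 of the password; check the `hydra-create-user` source before relying on it.

```python
import hashlib


def sha1_password_hash(password):
    """Return the hex-encoded SHA1 digest of a password.

    Assumption: the expected format is an unsalted, hex-encoded SHA1 digest.
    """
    return hashlib.sha1(password.encode("utf-8")).hexdigest()
```

The digest, rather than the plaintext, is then what ends up in the configuration.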
Thus, we no longer hold the send lock while substituting missing paths
on the build machine. This is a good thing in particular for macOS
builders which have a tendency to hang forever in curl downloads.
Previously, when hydra-queue-runner was restarted, any pending "build
finished" notifications were lost. Now hydra-queue-runner marks
finished but unnotified builds in the database and uses that to run
pending notifications at startup.
The queue runner can now run up to ‘max-concurrent-notifications’ in
parallel (default is 2). This is useful when some hydra-notify
invocations can take a long time to complete (e.g. because they need
to compress a giant build log) and we don't want this to block all
other notifications.
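The effect of a concurrency limit like this can be sketched with a bounded worker pool. This is a Python illustration only; the queue runner itself is not written this way, and `run_notifications` and `notify` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Mirrors the default of 2 mentioned above.
MAX_CONCURRENT_NOTIFICATIONS = 2


def run_notifications(build_ids, notify, max_workers=MAX_CONCURRENT_NOTIFICATIONS):
    """Call notify(build_id) for every build, at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # A slow notification occupies one worker; the others keep draining.
        return list(pool.map(notify, build_ids))
```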
As @dtzWill discovered, with the concurrent hydra-evaluator, there can
be multiple active transactions adding builds to the database. As a
result, builds can become visible in a non-monotonically increasing
order, breaking the queue monitor's assumption that build IDs only go
up.
The fix is to have hydra-eval-jobset provide the lowest build ID it
just added in the builds_added notification, and have the queue
monitor check from there.
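A small model of the fix, assuming the monitor keeps a "last seen build ID" watermark: the notification carries the lowest newly added ID, and the monitor rewinds its watermark so that late-committing builds with lower IDs are not skipped. The class and method names are hypothetical.

```python
class QueueMonitor:
    """Toy model of a monitor that scans builds above a watermark."""

    def __init__(self):
        self.last_build_id = 0
        self.seen = set()

    def on_builds_added(self, lowest_new_id):
        # Rewind so that the next poll starts at the lowest ID just added.
        self.last_build_id = min(self.last_build_id, lowest_new_id - 1)

    def poll(self, visible_build_ids):
        # Pick up every visible build above the watermark that is still new.
        new_ids = sorted(
            i for i in visible_build_ids
            if i > self.last_build_id and i not in self.seen
        )
        self.seen.update(new_ids)
        if visible_build_ids:
            self.last_build_id = max(self.last_build_id, max(visible_build_ids))
        return new_ids
```

Without the rewind in `on_builds_added`, a build with ID 10 that becomes visible after build 11 has been processed would never be picked up.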
Fixes #496.
This plugin will post to the build status system in BitBucket. In order
to use it you need to add to ExtraConfig
<bitbucket>
username = bitbucket_username
password = bitbucket_password
</bitbucket>
You can use an application password https://blog.bitbucket.org/2016/06/06/app-passwords-bitbucket-cloud/
This can take an excessive amount of time. For example, on
hydra.nixos.org, a call to hydra-notify takes 0.7s even if there are
no plugins. So for an eval with ~45K new builds, the calls to
hydra-notify add up to about 9 hours.
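The estimate follows directly from the two numbers above; a quick check (the function name is just for illustration):

```python
def total_notify_hours(builds, seconds_per_call):
    """Total wall-clock time of sequential hydra-notify calls, in hours."""
    return builds * seconds_per_call / 3600


# 45,000 builds at 0.7 s each is 31,500 s.
print(f"{total_notify_hours(45_000, 0.7):.2f} hours")  # prints "8.75 hours"
```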
The proper fix would be to pass a list of build IDs, or an eval ID.
This can be used with declarative projects to build PRs.
The github_authorization section should contain verbatim Authorization header contents keyed by repo owner for private repos
1. From the hydra configuration file.
The configuration is loaded from the "git-input" block.
Currently only the "timeout" variable is looked up in the file.
<git-input>
# general timeout
timeout = 400
<input-name>
# specific timeout for a particular input name
timeout = 400
</input-name>
# use quotes when the input name has spaces
<"foo with spaces">
# specific timeout for a particular input name
timeout = 400
</"foo with spaces">
</git-input>
2. As an argument in the input value after the repo url and branch (and after the deepClone if it is defined)
"timeout=<value>"
The preference on which value is used:
1. input value
2. Block with the name of the input in the <git-input> block
3. "timeout" inside the <git-input> block
4. Default value of 600 seconds. (original hard-coded value)
The code is generalized so that more values can be configured, which might be
too much for a single value on a single plugin.
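The precedence order above can be sketched as a single lookup function. This is a Python illustration, not the plugin's Perl code; the `<git-input>` block is modelled as a plain dict, and `resolve_timeout` is a hypothetical name.

```python
DEFAULT_TIMEOUT = 600  # original hard-coded value, in seconds


def resolve_timeout(input_name, input_value_timeout, git_input_config):
    """Pick the timeout for a git input following the stated precedence."""
    # 1. "timeout=<value>" given in the input value itself.
    if input_value_timeout is not None:
        return input_value_timeout
    # 2. Block named after the input inside <git-input>.
    per_input = git_input_config.get(input_name)
    if isinstance(per_input, dict) and "timeout" in per_input:
        return per_input["timeout"]
    # 3. General "timeout" inside <git-input>.
    if "timeout" in git_input_config:
        return git_input_config["timeout"]
    # 4. Fall back to the default.
    return DEFAULT_TIMEOUT
```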
This reverts commit 949e5865c6. This
makes release.nix harder to read/maintain IMHO. There already is a Nix
expression for Hydra in Nixpkgs that can be used for this purpose.
Documents all the endpoints that can be used to retrieve data from
the API without authenticating.
Authentication and manipulating data with the API is not documented.
Adding a 96-core aarch64 build machine to the build farm caused the
potential number of database connections to increase a lot, so we
started hitting the Postgres connection limit.
* The "Jobset" page now shows when evaluations are in progress (rather
than just pending).
* Restored the ability to do a single evaluation from the command line
by doing "hydra-evaluator <project> <jobset>".
* Fix some consistency issues between jobset status in PostgreSQL and
in hydra-evaluator. In particular, "lastCheckedTime" was never
updated internally.
Setting
xxx-jobset-repeats = patchelf:master:2
will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
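The relationship between the configured value and the number of times a step runs can be made explicit with a small sketch. The parsing helper is hypothetical and assumes the `<project>:<jobset>:<repeats>` format shown in the example above.

```python
def parse_jobset_repeats(value):
    """Split '<project>:<jobset>:<repeats>' into its three parts."""
    project, jobset, repeats = value.split(":")
    return project, jobset, int(repeats)


def total_rounds(repeats):
    """A step is built once, plus the configured number of repeats."""
    return repeats + 1
```

For `patchelf:master:2` this gives 2 additional rounds, i.e. 3 in total.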
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.
Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).
We now take into account the memory necessary for compressing the NAR
being exported to the binary cache, plus xz compression overhead.
Also, we now release the memory tokens for the NAR accessor *after*
releasing the NAR accessor. Previously the memory for the NAR accessor
might still be in use while another thread does an allocation, causing
the maximum to be exceeded temporarily.
Also, use notify_all instead of notify_one to wake up memory token
waiters. This is not very nice, but not every waiter is requesting the
same number of tokens, so some might be able to proceed.
If a step is cancelled just as its builder step is starting,
doBuildStep() will return sRetry. This causes builder() to make the
step runnable again, since the queue monitor may have added new builds
referencing it. The idea is that if the latter condition is not true,
the step's reference count will drop to zero and it will be
deleted. However, if the dispatcher thread sees and locks the step
before the reference count can drop to zero in the builder thread, the
dispatcher thread will start a new builder thread for the step. Thus
the step can be kept alive for an indefinite amount of time.
The fix is for State::builder() to use a weak pointer to the step, to
ensure that the step's reference count can drop to zero *before* it's
added to the runnable queue.
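The weak-pointer idea can be illustrated in Python with `weakref`. This is only an analogy to the C++ fix, and it relies on CPython's reference counting to free the object as soon as the last strong reference goes away. `Step` and `requeue_if_still_referenced` are hypothetical names.

```python
import weakref


class Step:
    def __init__(self, drv_path):
        self.drv_path = drv_path


def requeue_if_still_referenced(step_ref, runnable):
    """Make the step runnable again only if something else still holds it."""
    step = step_ref()  # None once every strong reference has been dropped
    if step is None:
        return False
    runnable.append(step)
    return True
```

The builder keeps only `step_ref` while deciding whether to retry, so a step that no build references any more disappears instead of being re-added to the runnable queue.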
This was a bad idea because pthread_cancel() is unsalvageably broken
in C++. Destructors are not allowed to throw exceptions (especially in
C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind
exception inside any destructor that invokes a cancellation
point. (This exception can be caught but *must* be rethrown.) So let's
just kill the builder process instead.
It was hitting
assert(reservation.unique());
Since we do want the machine reservation to be released before calling
wakeDispatcher(), let's use a different object for keeping track of
active steps.
We now kill active build steps when there are no more referring
builds. This is useful e.g. for preventing cancelled multi-hour TPC-H
benchmark runs from hogging build machines.
If two active steps of the same build failed, then the first would be
marked as "failed", but the second would end up as "orphaned", causing
it to be marked as "aborted" later on. Now it's correctly marked as
"failed".
Without this I got the following error in my journal:
Oct 25 22:42:29 mymachine hydra-evaluator[4085]: starting evaluation of jobset ‘myproject:.jobsets’
Oct 25 22:42:29 mymachine hydra-evaluator[4085]: timeout: failed to run command ‘hydra-eval-jobset’: No such file or directory
Oct 25 22:42:29 mymachine hydra-evaluator[4085]: evaluation of jobset ‘myproject:.jobsets’ finished with status 32512
Without this, if (failed or aborted) derivations have been
garbage-collected, there is no way to restart them, which is very
annoying. Now we set a forceEval flag in the jobset to cause it to be
re-evaluated even if none of the inputs have changed.
‘basicDrv.inputSrcs’ also contains the outputs of inputDrvs. These
don't necessarily exist in the local store, so copying them may cause
an exception. We should only copy the real inputSrcs.
Some Hydra API requests were vulnerable to XSRF attacks, e.g. you
could have a form on another website using http://hydra/logout as the
form action. So we now require POST requests to come from the same
origin.
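A same-origin check for POST requests might look roughly like the sketch below. Which headers Hydra actually inspects is not stated here, so the use of `Origin` with a fallback to `Referer` is an assumption, as are the function names.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Reduce a URL to its (scheme, host, port) origin triple."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def allow_request(method, headers, server_url):
    """Reject POST requests that do not come from the server's own origin."""
    if method != "POST":
        return True
    # Assumption: headers is a dict keyed by the canonical header names.
    source = headers.get("Origin") or headers.get("Referer")
    if not source:
        return False
    return origin_of(source) == origin_of(server_url)
```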
Reported by Hans-Christian Esperer.
This rewrites the top-level loop of hydra-evaluator in C++. The Perl
stuff is moved into hydra-eval-jobset. (Rewriting the entire evaluator
would be nice but is a bit too much work.) The new version has some
advantages:
* It can run multiple jobset evaluations in parallel.
* It uses PostgreSQL notifications so it doesn't have to poll the
database. So if a jobset is triggered via the web interface or from
a GitHub / Bitbucket webhook, evaluation of the jobset will start
almost instantaneously (assuming the evaluator is not at its
concurrency limit).
* It imposes a timeout on evaluations. So if e.g. hydra-eval-jobset
hangs connecting to a Mercurial server, it will eventually be
killed.
This prevents the server from gradually filling up due to store paths
fetched by hydra-server that then get turned into a GC root by
hydra-update-gc-roots.
Dashboards can now be marked as publicly visible in the user
preferences. The dashboard URL has changed from /user/<name>/dashboard
to /dashboard/<name> because /user/<name> requires being logged in as
<name> or as an admin.
This allows fully declarative project specifications. This is best
illustrated by example:
* I create a new project, setting the declarative spec file to
"spec.json" and the declarative input to a git repo pointing
at git://github.com/shlevy/declarative-hydra-example.git
* hydra creates a special ".jobsets" jobset alongside the project
* Just before evaluating the ".jobsets" jobset, hydra fetches
declarative-hydra-example.git, reads spec.json as a jobset spec,
and updates the jobset's configuration accordingly:
{
"enabled": 1,
"hidden": false,
"description": "Jobsets",
"nixexprinput": "src",
"nixexprpath": "default.nix",
"checkinterval": 300,
"schedulingshares": 100,
"enableemail": false,
"emailoverride": "",
"keepnr": 3,
"inputs": {
"src": { "type": "git", "value": "git://github.com/shlevy/declarative-hydra-example.git", "emailresponsible": false },
"nixpkgs": { "type": "git", "value": "git://github.com/NixOS/nixpkgs.git release-16.03", "emailresponsible": false }
}
}
* When the "jobsets" job of the ".jobsets" jobset completes, hydra
reads its output as a JSON representation of a dictionary of
jobset specs and creates a jobset named "master" configured
accordingly (In this example, this is the same configuration as
.jobsets itself, except using release.nix instead of default.nix):
{
"enabled": 1,
"hidden": false,
"description": "js",
"nixexprinput": "src",
"nixexprpath": "release.nix",
"checkinterval": 300,
"schedulingshares": 100,
"enableemail": false,
"emailoverride": "",
"keepnr": 3,
"inputs": {
"src": { "type": "git", "value": "git://github.com/shlevy/declarative-hydra-example.git", "emailresponsible": false },
"nixpkgs": { "type": "git", "value": "git://github.com/NixOS/nixpkgs.git release-16.03", "emailresponsible": false }
}
}
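For illustration, the output of the "jobsets" job in this example could be generated by deriving the "master" spec from the ".jobsets" spec. The sketch below is Python rather than Nix, and `derive_jobset` is a hypothetical helper; only the field names and values are taken from the example above.

```python
import json

JOBSETS_SPEC = {
    "enabled": 1,
    "hidden": False,
    "description": "Jobsets",
    "nixexprinput": "src",
    "nixexprpath": "default.nix",
    "checkinterval": 300,
    "schedulingshares": 100,
    "enableemail": False,
    "emailoverride": "",
    "keepnr": 3,
    "inputs": {
        "src": {"type": "git", "value": "git://github.com/shlevy/declarative-hydra-example.git", "emailresponsible": False},
        "nixpkgs": {"type": "git", "value": "git://github.com/NixOS/nixpkgs.git release-16.03", "emailresponsible": False},
    },
}


def derive_jobset(base_spec, **overrides):
    """Copy a jobset spec and replace selected top-level fields."""
    spec = dict(base_spec)
    spec.update(overrides)
    return spec


# The job's output is a dictionary mapping jobset names to jobset specs.
jobsets = {"master": derive_jobset(JOBSETS_SPEC, description="js", nixexprpath="release.nix")}
print(json.dumps(jobsets, indent=2))
```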
makeBinPath takes care to use the correct output.
nix-repl> lib.makeSearchPath "bin" [pkgs.nix]
"/nix/store/zpp83pr21ihxwsr15l6mkzwkr49zj71d-nix-1.11.2-dev/bin"
nix-repl> lib.makeBinPath [pkgs.nix]
"/nix/store/9n8c3g541qn43yjjs94f1a0m69wp8scg-nix-1.11.2/bin"
Currently, the hydra.nixos.org queue contains 1000s of Darwin builds
that all depend on a stdenv-darwin that previously failed. However,
before, first createStep() would construct a dependency graph for each
build, then getQueuedBuilds() would discover that one of the steps had
failed previously and discard all those steps. Since the graph
construction involves a lot of uncached calls to isValidPath(), this
took several seconds per build.
Now createStep() detects the previous failure right away and bails
out.
These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
Previously, if the queue monitor thread encountered a build that Hydra
had already built, it downloaded the output paths from the binary
cache, just to determine the build products and metrics. This is very
inefficient. In particular, when doing something like merging
nixpkgs:staging into nixpkgs:master, the queue monitor thread will be
locked up for a long time fetching files from S3, causing the build
farm to be mostly idle.
Of course this is entirely unnecessary, since the build
products/metrics are already in the Hydra database. So now we just
look up a previous build with the same output path, and copy the
products/metrics.
Multiple <githubstatus> sections are possible:
* jobs: regexp for jobs to match
* inputs: the input which corresponds to the github repo/rev whose
status we want to report. Can be repeated
* authorization: Verbatim contents of the Authorization header. See
https://developer.github.com/v3/#authentication.
Otherwise, the browser may mix up HTML and JSON responses if it has
requested both. For example, hitting the back button to return to a
job metric page will show a JSON response, because that was the last
thing the browser fetched for that URL.
This requires Catalyst::Action::Rest >= 1.20.
The previous query
select count(*) from builds b left join buildsteps s on s.build = b.id where busy = 1 and finished = 0
is suddenly taking several minutes. Probably PostgreSQL decided to use
a suboptimal query plan.
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.
max-output-size = 1000000000
The default is 2 GiB.
Also refactored the build error / status handling a bit.
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.
The fix is to use a TokenServer to prevent the builder threads from
storing more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.
The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
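The token mechanism can be sketched as a counting resource guarded by a condition variable. This is a Python model, not the C++ TokenServer; letting a single oversized request through when nothing else is in use is an assumption made here to avoid blocking forever.

```python
import threading


class TokenServer:
    """Limit the total number of tokens (e.g. bytes of NARs) held at once."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.in_use = 0
        self._cond = threading.Condition()

    def acquire(self, tokens):
        with self._cond:
            # Block while granting the request would exceed the limit.
            # Assumption: a lone request is always allowed to proceed.
            while self.in_use > 0 and self.in_use + tokens > self.max_tokens:
                self._cond.wait()
            self.in_use += tokens

    def release(self, tokens):
        with self._cond:
            self.in_use -= tokens
            # Waiters ask for different amounts, so wake all of them.
            self._cond.notify_all()
```

A builder thread would call `acquire(nar_size)` before buffering a NAR and `release(nar_size)` once the NAR has been uploaded and freed.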
The old page didn't scale very well if you have 150K builds in the
queue, in fact it tended to make browsers hang. The new one just
shows, for each jobset, the number of queued builds. The actual builds
can be seen by going to the corresponding jobset page and looking at
the evals.
Same problem as d744362e4a.
at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/predefined_ops.h:166
__last@entry=..., __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:1827
__comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:4717
Respects <slack> blocks in the hydra config, with attributes:
* jobs: a regexp matching the job name (in the format project:jobset:job)
* url: The URL to a slack incoming webhook
* force: If true, always send messages. Otherwise, only when the build status changes
Multiple <slack> blocks are allowed
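How a `<slack>` block decides whether to send a message can be sketched as follows. Blocks are modelled as dicts, `should_notify` is a hypothetical name, and matching the regexp against the whole job name is an assumption.

```python
import re


def should_notify(block, job_name, status_changed):
    """Decide whether a <slack> block applies to a finished build."""
    # job_name has the form project:jobset:job.
    if not re.fullmatch(block["jobs"], job_name):
        return False
    # With force set, always send; otherwise only on a status change.
    return bool(block.get("force")) or status_changed


def matching_blocks(blocks, job_name, status_changed):
    """Multiple blocks are allowed; each one is checked independently."""
    return [b for b in blocks if should_notify(b, job_name, status_changed)]
```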
To use the local Nix store (default):
store_mode = direct
To use a local binary cache:
store_mode = local-binary-cache
binary_cache_dir = /var/lib/hydra/binary-cache
To use an S3 bucket:
store_mode = s3-binary-cache
binary_cache_s3_bucket = my-nix-bucket
Also, respect binary_cache_{secret,public}_key_file for signing the
binary cache.
The queue runner no longer uses this field, and it doesn't provide
very interesting historical data (mostly SSH failures), but it takes
up a lot of space. Also, it contained some bad UTF-8 which was
preventing an upgrade to Postgres 9.5, so a good occasion to get rid
of it.
The required configuration in hydra.conf:
enable_google_login = 1
google_client_id = 238429sdjkds....apps.googleusercontent.com
and optionally persona_allowed_domains to restrict to one or more
domains.
This is necessary given the current size of the Nixpkgs/NixOS
jobsets. Once we have a Nix store + Postgres on SSD, we can reduce
this again.
Should really make this configurable...
The uid split a while back caused the web interface to create GC roots
in /nix/var/nix/gcroots/per-user/hydra-www, where they wouldn't be
purged by hydra-update-gc-roots. Thus restarted builds would
accumulate forever. The fix is to keep the roots in a shared directory
with gid=hydra.
Regression introduced by 1fdc258de0.
The commit introduced a channel/custom PathPart which uses the new
custom channel expressions, but I forgot to remove CaptureArgs, so the
URL really is channel/latest/ignored-value.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Reported-by: Peter Simons <simons@cryp.to>
This removes the "busy", "locker" and "logfile" columns, which are no
longer used by the queue runner. The "Running builds" page now only
shows builds that have an active build step.
Previously, priority bumps could take a long time to get noticed if
getQueuedBuilds() was busy processing zillions of queue
additions. (This was made worse by the reintroduction of substitute
checking.)
We have this set in upgrade-42.sql, so it's better to stay consistent
with the basic SQL file to avoid problems with new Hydra installations.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Reported-by: Eelco Dolstra <eelco.dolstra@logicblox.com>
There is still a tiny window between the calls to nix-prefetch-* and
addTempRoot. This could be eliminated by adding a "-o" option to
nix-prefetch-*, or by not using those scripts at all (and use
addToStore directly).
This allows Hydra to use binaries from available binary caches. It
makes the queue monitor thread quite a bit slower, so if you don't
want to use binary caches, it's better to add "--option
build-use-substitutes false" to the hydra-queue-runner invocation.
Fixed #243.
The last paragraph talks about installing packages from the "following"
jobs, but it only applies to generic channels, so let's only display it
there.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
So this is the final part which is needed in order to be able to deliver
custom channels, everything else is now just polishing.
We do this by simply redirecting to the build product download URL and
we use binary_cache_url the same way as in NixChannel.pm.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We should now get an overview and help text on how to add a particular
channel and also a bit of information about the builds that are required
for a channel to get upgraded.
Right now we only select the latest successful build in the latest
successful evaluation, so if someone wants to have more information about
which channel has failed, (s)he still has to look at the "Channels" tab
of the jobset.
We can make this more fancy at some later point if this is really
needed, because right now we're only interested in the latest build,
because it's the only thing necessary to deliver the channel contents.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
It's actually lower-case _despite_ the spelling in the SQL file(s),
because the schema auto-generator from DBIx::Class doesn't take it into
account because it's working on SQLite and the latter seems to ignore
case.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We want to have contents and details of channel expressions as well and
we already have that in product.type == file, so why not reuse the same
for the channel expression?
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now have a searchBuildsAndEvalsForJobset, which creates such a
mapping for us, so we don't need to duplicate code in jobs_tab and
channels_tab.
Also, we're going to use this for the overview of a particular channel
as well, so it makes sense to put it in CatalystUtils instead of
directly in Jobset.pm.
Instead of eval->jobs, it's now eval->builds, because it's really an
aggregate over the builds schema, rather than the job schema.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We only allow channel/latest anyway, so it really doesn't make sense to
explicitly specify this in the PathPart. We can provide another dispatcher
once we have more than just "latest", which greatly simplifies the
dispatch tree.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now have a column for that, so no need for counting rows which was a
bit inefficient anyway, because we only would have needed the first row
in the result.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Now that we have our dedicated "Channels" tab, there is no need anymore
to show redundant information.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now no longer need that additional join of the build outputs and can
solely use the isChannel column of the Builds table to determine whether
it's a channel build.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This is to properly separate channels from regular jobs and also make
sure that we can always iterate on them, no matter whether the build has
failed. The reason why we were not able to do this until now was because
we were iterating on the build products, and whenever some constituent
of a channel job has failed, we didn't get a build output.
So whenever there is a meta.isHydraChannel, we can now properly
distinguish it from the other jobs.
I still don't have any clue, why "make -C src/sql update-dbix" without
*any* modifications tries to create additional schema definitions. But
I've checked the md5sums of the existing schema definitions and they
don't seem to match, so it seems that they already have been tampered
with.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Now we can provide different channel expressions for one particular
channel build. Not sure yet how this would be useful, but I found it
more appropriate to use a type instead of a subtype of "file".
This should get us consistent with the previous commit.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This is to get a bit more consistency among channel builds but doesn't
do a radical change on the display. Ideally we may want to have a
channel overview with all the constituents and a small help showing how
the user can add the channel.
Unfortunately, this also introduces an inconsistency: We previously used
the *subtype* "channel", but now we're expecting "channel" as the type
of the product, so we need to change this for the channels overview as
well.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
It's very similar to "jobs" and the code is pretty much the same, except
that we don't do filtering on it. At least it doesn't waste space for a
filter option when there are usually WAY less channel jobs than ordinary
jobs.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Currently I'm using a (not very well) downscaled version of the NixOS
logo, so we want to replace it by a proper image ASAP.
Other than that, the idea is to have something like this in
hydra-build-products:
file channel $out/channel.tar.bz2
Right now of course, it's only displayed at the corresponding builds, so
we might want to have aggregates on all channels for a project, jobset
or maybe even single jobs?
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
They will show up in machineTypes as (e.g.) x86_64-linux:local instead
of x86_64-linux. This is to prevent the Hydra provisioner from
creating machines for steps that are supposed to be executed locally.
It's easier for the Hydra provisioner to put host public keys in the
machines file than to separately manage the known_hosts file
(especially when the provisioner runs on a different machine).
This is necessary because the required system type can become
available later (e.g. by being provisioned by the
auto-scaler). However, in the future, we may want to fail steps if
they have been unsupported for more than a certain amount of time.
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
If there is no input named 'inputs', hydra-eval-jobs now passes in a set
of lists, where each attribute corresponds to an input defined in the
jobset specification and each list element is a different input alt, as
an argument named 'inputs'.
Among other things, this allows for generic hydra expressions to be
shared amongst projects with similar structures but different sets of
specific inputs.
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.
For example, a coverage build can emit the coverage percentage as
follows:
echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics
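The line written by the echo command above can be read back with a few lines of code. This is only a sketch in Python (not Hydra's own code): it assumes each line has the shape "name value [unit]" separated by whitespace, as the example suggests, and parse_metric_line is a hypothetical helper name.

```python
def parse_metric_line(line):
    """Split one hydra-metrics line into (name, value, unit)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"malformed metric line: {line!r}")
    name = fields[0]
    value = float(fields[1])
    # Assumption: the unit is optional.
    unit = fields[2] if len(fields) > 2 else None
    return name, value, unit


metric = parse_metric_line("lineCoverage 87.5 %")
```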
Graphs of all metrics for a job can be seen at
http://.../job/<project>/<jobset>/<job>#tabs-charts
Specific metrics are also visible at
http://.../job/<project>/<jobset>/<job>/metric/<metric>
The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
If Hydra isn't hosted on https://example.com/ but something like
https://example.com/hydra/, the URL for /api/scmdiff would have ended up
on /api/scmdiff rather than /hydra/api/scmdiff.
This is because we didn't use the URI resolver from the controller,
hence we're using it now to build up the whole URL including the query
string.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Without an index on (machine, stoptime desc), this requires a
sequential scan. And adding a whole index for this seems
overkill. (Possibly the queue runner could maintain this info more
efficiently.)
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).
A more refined scheme would be to have per machine, per path locks.
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandatory system
features field to include "local". For example:
localhost x86_64-linux,i686-linux - 10 100 - local
says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
Otherwise we never recover from reset daemon connections, e.g.
hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
...
The error is now handled in queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
This is currently done by a separate program that periodically
calls "hydra-queue-runner --status". Eventually, I'll do this
in the queue runner directly.
Fixes #220.
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
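The limit can be pictured as a counting semaphore with four slots. The following Python sketch is illustrative only (the queue runner is not written in Python); copy_closure, the sleep standing in for I/O, and the stats bookkeeping are all hypothetical.

```python
import threading
import time

MAX_PARALLEL_COPIES = 4
copy_slots = threading.Semaphore(MAX_PARALLEL_COPIES)

stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def copy_closure(path):
    # Block until one of the four copy slots is free.
    with copy_slots:
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # stand-in for the actual disk/network I/O
        with stats_lock:
            stats["active"] -= 1


threads = [threading.Thread(target=copy_closure, args=(f"path-{i}",))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads are started, stats["peak"] never exceeds MAX_PARALLEL_COPIES.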
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator is no longer a proper ordering, in which case
std::sort() can segfault. So we now make a copy of currentJobs before
sorting.
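A rough Python sketch of the snapshot step. Python's sort does not crash on an inconsistent ordering, so this only illustrates reading each load once before sorting; the Machine class and sort_by_load are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    current_jobs: int  # may be modified concurrently by other threads


def sort_by_load(machines):
    # Read each load exactly once, so the sort keys cannot change mid-sort.
    snapshot = [(m.current_jobs, m) for m in machines]
    snapshot.sort(key=lambda pair: pair[0])
    return [m for _, m in snapshot]


machines = [Machine("a", 3), Machine("b", 0), Machine("c", 1)]
ordered = [m.name for m in sort_by_load(machines)]
```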
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as we ensure the DB update is idempotent (as ensured by doing
"update Builds set finished = 1 ... where finished = 0").
If a build A depends on a derivation that is the top-level derivation
of some build B, then we should process B before A (meaning we
shouldn't make the derivation runnable before B has been
added). Otherwise, the derivation will be "accounted" to A rather than
B (so the build step will show up in the wrong build).
Aborted builds are now put back on the runnable queue and retried
after a certain time interval (currently 60 seconds for the first
retry, then tripled on each subsequent retry).
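The retry schedule described above works out as follows. This is a Python sketch of the arithmetic only; retry_delay and its parameter names are hypothetical, and any cap on the number of retries is not modelled.

```python
def retry_delay(attempt, base=60, factor=3):
    """Seconds to wait before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return base * factor ** (attempt - 1)


delays = [retry_delay(n) for n in range(1, 5)]
```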
Hydra-queue-runner now no longer polls the queue periodically, but
instead sleeps until it receives a notification from PostgreSQL about
a change to the queue (build added, build cancelled or build
restarted).
Also, for the "build added" case, we now only check for builds with an
ID greater than the previous greatest ID. This is much more efficient
if the queue is large.
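The "greater than the previous greatest ID" check could look roughly like this. The sketch uses Python with an in-memory SQLite table as a stand-in for the real PostgreSQL Builds table; the reduced schema and fetch_new_builds are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table Builds (id integer primary key, finished integer not null)")
db.executemany("insert into Builds (id, finished) values (?, 0)", [(1,), (2,), (3,)])


def fetch_new_builds(conn, after_id):
    """Return (new build ids, new high-water mark)."""
    rows = conn.execute(
        "select id from Builds where finished = 0 and id > ? order by id",
        (after_id,),
    ).fetchall()
    ids = [row[0] for row in rows]
    return ids, (ids[-1] if ids else after_id)


last_seen_id = 0
first_batch, last_seen_id = fetch_new_builds(db, last_seen_id)
db.execute("insert into Builds (id, finished) values (4, 0)")
second_batch, last_seen_id = fetch_new_builds(db, last_seen_id)
```

The second call only sees build 4, rather than rescanning the whole queue.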
It just makes things unnecessarily complicated. We can just exit
without cleaning anything up, since the only thing to do is unmark
builds and build steps as busy. But we can do that by having systemd
call "hydra-queue-runner --unlock" from ExecStopPost.
If multiple threads create a step for the same build, they could get
the same "max(stepnr)" and allocate conflicting new step numbers. So
lock the BuildSteps table while doing this. We could use a different
isolation level, but this is easier.
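The race and its fix can be mimicked in Python, with a mutex standing in for the table lock. This is an analogy, not the actual database code; build_steps and allocate_step are hypothetical.

```python
import threading

build_steps = {}               # build id -> list of allocated step numbers
steps_lock = threading.Lock()  # stands in for locking the BuildSteps table


def allocate_step(build_id):
    # Computing max(stepnr) + 1 and recording the result must be one
    # atomic unit, otherwise two threads can pick the same number.
    with steps_lock:
        steps = build_steps.setdefault(build_id, [])
        stepnr = max(steps, default=0) + 1
        steps.append(stepnr)
        return stepnr


threads = [threading.Thread(target=allocate_step, args=(42,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```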
This removes the need for Nix's build-remote.pl.
Build logs are now written to $HYDRA_DATA/build-logs because
hydra-queue-runner doesn't have write permission to /nix/var/log.
When visiting the tail-reload page, for a short amount of time the
"unscrolled" version is shown. To circumvent that, let's scroll down
immediately at the first possibility to fill the gap between the loading
of the document and the first AJAX request coming in.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
There are quite a lot of build outputs which have lines with a length
exceeding the width of the taillog <pre/> and thus visually produce more
lines than 50. This causes the tail "box" to change height frequently
and to get to the bottom you need to scroll down.
We now set a fixed line-height to 120% of the font size and cap the
maximum height based on that value (50 * 1.2 = 60). It's probably not
nice to override the line-height, but max-lines is currently only
available using browser-specific property names. But after all it's just
for the tail output, if people complain about the line-height, we can
still change it :-)
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We're just implicitly escaping the tail content by not using .load() but
explicitly setting the text content using .text(), so that escaping
isn't needed on our side.
This should get rid of a few formatting errors and possibly XSS if
someone manages to place JS code in the tail of a build and manages to
lure a user to that tail output.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Like eval IDs, build IDs don't convey useful information.
Also, make the job name link to the build rather than the job. When
people click on a build, they expect to go to the build page, not the
job page.
Scheduling is mostly based on jobset shares these days. So showing and
sorting by priority just wastes space and gives the incorrect
impression that Hydra executes builds in the order shown on the queue
page.
These give warnings in Perl >= 5.18:
given is experimental at /home/hydra/src/hydra/src/lib/Hydra/Helper/CatalystUtils.pm line 241.
when is experimental at /home/hydra/src/hydra/src/lib/Hydra/Helper/CatalystUtils.pm line 242.
...
There is no point in indexing rows with common column values like
"finished = 1", since those are the majority of the table. Only the
exceptions ("finished = 0") are interesting. Having smaller tables
should make updates/insertions faster.
This incorporates the following two commits from <nixpkgs>:
NixOS/nixpkgs@f83af95f8a
NixOS/nixpkgs@5e7a1cf955
Hydra was the original reason why I was fixing tempdir creation in the
first place. Seeing that Hydra ships its own versions of these scripts,
we need to patch them here as well.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
If hydra-eval-jobs creates a new root, and hydra-update-gc-roots runs
before hydra-evaluator has had a chance to add the corresponding build
to the database, then hydra-update-gc-roots will remove the root. If
subsequently the Nix garbage collector kicks in, it may remove the
build's .drv file before the build is performed. Since evaluation of
the Nixpkgs and NixOS jobsets nowadays takes a lot of time (e.g. an
hour), the probability of this happening is fairly high.
The quick fix is not to delete roots that are less than a day old. So
long as evaluation doesn't take longer than a day, this should be fine
;-)
Fixes #166.
This adds a Hydra plugin for users to submit their open source projects
to the Coverity Scan system for analysis.
First, add a <coverityscan> section to your Hydra config, including the
access token, project name, and email, and a regex specifying jobs to
upload:
<coverityscan>
project = testrix
jobs = foobar:.*:coverity.*
email = aseipp@pobox.com
token = ${builtins.readFile ./coverity-token}
</coverityscan>
This will upload the scan results for any job whose name matches
'coverity.*' in any jobset in the Hydra 'foobar' project, for the
Coverity Scan project named 'testrix'.
Note that one upload will occur per job matched by the regular
expression - so be careful with how many builds you upload.
The jobs which are matched by the jobs specification must have a file in
their output path of the form:
$out/tarballs/...-cov-int.(xz|lzma|zip|bz2|tgz)
The file must have the 'cov-int' directory produced by `cov-build` in
the root.
(You can also output something into
$out/nix-support/hydra-build-products for the Hydra UI.)
This file will be found in the store, and uploaded to the service
directly using your access credentials. Note the exact extension: don't
use .tar.xz, only use .xz specifically.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Fixes errors like:
Caught exception in engine "Wide character in syswrite at /nix/store/498lwsrn5kkdh1q8kn3vcpd3457w6m7a-hydra-perl-deps/lib/perl5/site_perl/5.16.3/Starman/Server.pm line 547."
Note that these errors didn't happen if the database encoding was set
to SQL_ASCII (which was the case for hydra.nixos.org, explaining why
it didn't get these errors). However, now the encoding must be
UTF8. To change it, do:
update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'hydra';
This gets rid of the warning:
DBIx::Class::Storage::DBI::select_single(): Query returned more than one row. SQL that returns multiple rows is DEPRECATED for ->find and ->single at /home/eelco/Dev/hydra/src/script/../lib/Hydra/Controller/Project.pm line 15
This makes it easy to set environment variables for the Hydra server
(for example, your configuration.nix can use readFile to read an API
token to upload build results somewhere).
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Oops, forgot to add this in f75509099a.
This is necessary because we actually want to run the preStart script as
root (because it chmod/chowns stuff and also needs to create the
database using PostgreSQL's superuser) and the actual creation of the
database as user hydra.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
The su binary is now in a separate output of the shadow package and
isn't included in the main output path anymore.
But instead of changing the call to use pkgs.su, we're now entirely
dropping the dependency because systemd is already able to execute
processes under a specific user by itself.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
In the dashboard and on the job page, indicate whether the job appears
in the latest jobset eval. That way, the user gets some indication if
a job has accidentally disappeared (e.g. due to an evaluation error).
Use the following in your hydra.conf to make your instance a
private Hydra instance (public is the default):
private 1
Currently, this will not allow you to use the API, channels
and the binary cache when running in private mode. We will add
solutions for these functionalities later.
This requires adding the following to hydra.conf:
binary_cache_key_name = <key-name>
binary_cache_private_key_file = <path-to-private-key>
e.g.
binary_cache_key_name = hydra.nixos.org-1
binary_cache_private_key_file = /home/hydra/cache-key.sec
All successful, non-garbage-collected builds in the evaluation are
passed in an attribute set. So if you declare a Hydra input named
‘foo’ of type ‘eval’, you get a set with members ‘foo.<jobname>’. For
instance, if you passed a Nixpkgs eval as an input named ‘nixpkgs’,
then you could get the Firefox build for x86_64-linux as
‘nixpkgs.firefox.x86_64-linux’.
Inputs of type ‘eval’ can be specified in three ways:
* As the number of the evaluation.
* As a jobset identifier (‘<project>:<jobset>’), which will yield the
latest finished evaluation of that jobset. Note that there is no
guarantee that any job in that evaluation has succeeded, so it might
not be very useful.
* As a job identifier (‘<project>:<jobset>:<job>’), which will yield
the latest finished evaluation of that jobset in which <job>
succeeded. In conjunction with aggregate jobs, this allows you to
make sure that the evaluation contains the desired builds.
If PostgreSQL is running on the same system, then the "hydra" user can
connect without a password (via Unix domain socket
authentication), so no need to set up a password. If PostgreSQL is on
another machine, then creating a user/database won't work anyway.
For users who only have the "create-projects" role, actually display the
item in the menu as the only option.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We really don't need to touch a file in the current working directory
to find files that are older than one month. Since findutils 4.3.3 there
is a -newerXY option which allows specifying timestamps directly (as
with `date --date`).
But even when using a reference file, it really causes confusion if
people look into /root and try to debug where that mysterious "r" file
is coming from.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This reverts commit 2d7e106d29.
Unfortunately some jobsets still depend on this behaviour. They could
probably do something like "assert system == input.system; ..." but
changing them all is undesirable.
Include information about who changed the build status in notification
emails, and enable optional per-input notification of said committers.
Conflicts due to two branches modifying the database schema.
Signed-off-by: Shea Levy <shea@shealevy.com>
Conflicts:
src/lib/Hydra/Schema/Jobsets.pm
src/sql/upgrade-23.sql
Currently the dashboard allows users to get a quick overview of the
status of jobs they're interested in, but more will be added,
e.g. viewing all your jobsets or all jobs of which you're a
maintainer.
There are jobsets that are evaluated only once, that is, after they've
been evaluated, they're disabled automatically. This is primarily
useful for doing releases: for instance, doing an evaluation with
"officialRelease" set to "true" should be done only once.
If there are builds in the queue that depend on another scheduled
build, then hydra-queue-runner will start the dependency first and
block the dependent builds. This is implemented in
findBuildDependencyInQueue. However, if there are tens of thousands
of such dependent builds, since each call to
findBuildDependencyInQueue may take a second or so, hydra-queue-runner
will spend hours just deciding which builds *not* to do. Thus very
little progress is made.
So now, when a build is started, we immediately check which builds are
"blocked" by it (i.e. depend on it), and remove such builds from
consideration.
We can just show the normal "edit jobset" page for the original jobset
and then do a PUT request to create a new jobset.
Also simplified updating the jobset inputs. We can just delete all of
them and recreate them from the user parameters. That's safe because
it's done in a transaction.
It's now a dropdown menu in the tabs thingy, which subsumes the
"Reproduce locally" button. This makes the actions in the menu a bit
more visible, IMHO.
This commit is provided by (zsh syntax):
sed -i 's|/static[^"]*|[% c.uri_for("&") %]|;s/\[% size %\]/${size}/' **/*.tt
And the reason for this change is to make it easier to change the base
path with headers like X-Request-Base to be served within a URI prefix,
especially when behind a reverse proxy.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Pick the jobset that has used the smallest fraction of its share,
rather than the jobset furthest below its share in absolute terms.
This gives jobsets with a small share a quicker start (but they
will also run out of their share quicker).
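The difference between the two selection rules contrasted in this message can be seen with made-up numbers. This Python sketch assumes a jobset's usage is measured as its fraction of all build time in the window; the function names and the figures are hypothetical, not taken from the queue runner.

```python
def entitled_and_actual(jobsets):
    """Return {name: (entitled fraction, actual fraction)}."""
    total_shares = sum(share for share, _ in jobsets.values())
    total_used = sum(used for _, used in jobsets.values()) or 1
    return {
        name: (share / total_shares, used / total_used)
        for name, (share, used) in jobsets.items()
    }


def pick_by_absolute_deficit(jobsets):
    # Rule 1: the jobset furthest below its share in absolute terms.
    fractions = entitled_and_actual(jobsets)
    return max(fractions, key=lambda n: fractions[n][0] - fractions[n][1])


def pick_by_fraction_used(jobsets):
    # Rule 2: the jobset that has used the smallest fraction of its share.
    fractions = entitled_and_actual(jobsets)
    return min(fractions, key=lambda n: fractions[n][1] / fractions[n][0])


# name -> (scheduling shares, build time used in the window); made-up numbers
jobsets = {
    "large": (800, 600),
    "small": (100, 50),
    "greedy": (100, 350),
}
```

With these numbers "large" is 20 percentage points below its entitlement and "small" only 5, so the absolute rule picks "large"; but "small" has used half of its share while "large" has used three quarters, so the fractional rule picks "small".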
Each jobset now has a "scheduling share" that determines how much of
the build farm's time it is entitled to. For instance, if a jobset
has 100 shares and the total number of shares of all jobsets is 1000,
it's entitled to 10% of the build farm's time. When there is a free
build slot for a given system type, the queue runner will select the
jobset that is furthest below its scheduling share over a certain time
window (currently, the last day). Within that jobset, it will pick
the build with the highest priority.
So meta.schedulingPriority now only determines the order of builds
within a jobset, not between jobsets. This makes it much easier to
prioritise one jobset over another (e.g. nixpkgs:trunk over
nixpkgs:stdenv).
In your hydra config, you can add an arbitrary number of <s3config>
sections, with the following options:
* name (required): Bucket name
* jobs (required): A regex to match job names (in project:jobset:job
format) that should be backed up to this bucket
* compression_type: bzip2 (default), xz, or none
* prefix: String to prepend to all hydra-created s3 keys (if this is
meant to represent a directory, you should include the trailing slash,
e.g. "cache/"). Default "".
After each build with an output (i.e. successful or failed-with-output
builds), the output path and its closure are uploaded to the bucket as
.nar files, with corresponding .narinfos to enable use as a binary
cache.
This plugin requires that s3 credentials be available. It uses
Net::Amazon::S3; as of this commit, the nixpkgs version of that module can
retrieve s3 credentials from the AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY environment variables, or from ec2 instance
metadata when using an IAM role.
This commit also adds a hydra-s3-backup-collect-garbage program, which
uses hydra's gc roots directory to determine which paths are live, and
then deletes all files except nix-cache-info and any .nar or .narinfo
files corresponding to live paths. hydra-s3-backup-collect-garbage
respects the prefix configuration option, so it won't delete anything
outside of the hierarchy you give it, and it has the same credential
requirements as the plugin. Probably a timer unit running the garbage
collection periodically should be added to hydra-module.nix.
Note that two of the added tests fail, due to a bug in the interaction
between Net::Amazon::S3 and fake-s3. Those behaviors work against real
s3 though, so I'm committing this even with the broken tests.
Signed-off-by: Shea Levy <shea@shealevy.com>
We now keep *all* unfinished evaluations of a jobset, in addition to
the <keepnr> most recent finished evaluations.
The main motivation is to ensure that mirror-{nixos,nixpkgs} work
properly: if building an evaluation takes too long, some of its builds
may already have been garbage-collected by the time the others finish.
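The retention rule can be sketched as follows. This is Python for illustration only; it assumes that a higher evaluation ID means a more recent evaluation, and evals_to_keep is a hypothetical helper.

```python
def evals_to_keep(evals, keepnr):
    """evals: list of (eval_id, finished) pairs; higher id = more recent."""
    unfinished = {eid for eid, finished in evals if not finished}
    finished = sorted((eid for eid, fin in evals if fin), reverse=True)
    # All unfinished evaluations, plus the keepnr most recent finished ones.
    return sorted(unfinished | set(finished[:keepnr]))


evals = [(1, True), (2, False), (3, True), (4, True), (5, False)]
kept = evals_to_keep(evals, keepnr=2)
```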
We had Postgres barfing with this error:
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: stack depth limit exceeded
because the ‘drvpath => [ @dependentDrvs ]’ in failDependents can
cause a query of unbounded size. (In this specific case there was a
failure of Bison, which has > 10000 dependent derivations.) So now we
just get all scheduled builds from the DB.
Due to the fixed-output derivation hashing scheme, there can be
multiple derivations of the same output path. But build logs are
indexed by derivation path. Thus, we may not be able to find the
log of a build or build step using its derivation. So as a fallback,
Hydra now looks for other derivations with the same output paths.
They're mostly redundant since there is a faster "jobs" tab on
the jobset pages now. The only thing the latter lacks is the
ability to see status change times, but those are quite expensive
to compute, and are visible on build pages if you really need them.
PostgreSQL and Perl have different sort orders, in particular when
comparing job names such as "aspell.x86_64-linux" and
"aspellDicts.cs.i686-freebsd". This confused the evaluation
comparison code, causing some jobs to appear as "removed".
So now we do all the sorting in Perl.
Fixes #105.
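The kind of disagreement described above can be reproduced with the two job names from the message. In this Python sketch, plain string comparison stands in for a byte-wise sort, and a key that ignores case and punctuation is a crude, assumed approximation of a locale-aware collation; it is not how either Perl or PostgreSQL is actually implemented.

```python
names = ["aspellDicts.cs.i686-freebsd", "aspell.x86_64-linux"]

# Code point order: "." (0x2E) sorts before "D" (0x44).
bytewise = sorted(names)


def collation_like_key(s):
    # Crude stand-in for a locale collation: ignore case and punctuation.
    return "".join(ch for ch in s.lower() if ch.isalnum())


collated = sorted(names, key=collation_like_key)
```

The two orderings put different names first, which is enough to confuse a merge-style comparison that walks two sorted lists in step.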
Aggregate constituents are derivations. However there can be multiple
builds in an evaluation that have the same derivation, i.e. they can
alias each other (e.g. "emacs", "emacs24" and "emacs24Packages.emacs"
in Nixpkgs). Previously we picked a build arbitrarily for the
AggregateConstituents table. Now we pick the one with the shortest
name (e.g. "emacs").
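Picking the shortest alias is a one-liner. In this Python sketch, pick_constituent is a hypothetical name, and breaking ties alphabetically is an assumption made only to keep the result deterministic.

```python
def pick_constituent(job_names):
    """Among builds sharing a derivation, pick the shortest job name."""
    return min(job_names, key=lambda name: (len(name), name))


aliases = ["emacs24Packages.emacs", "emacs24", "emacs"]
chosen = pick_constituent(aliases)
```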
On 32-bit, Linux 3.4, and if the memory size is bigger than a certain
value, starting the stage 2 init script fails with "Exec format error"
because the 9P filesystem is returning garbage. No such problem with
Linux 3.10.
http://hydra.nixos.org/build/5737226
We now keep all builds in the N most recent evaluations of a jobset,
rather than the N most recent builds of every job. Note that this
means that typically fewer builds will be kept (since jobs may be
unchanged across evaluations).
For presentation purposes, we need to know what builds are part of an
aggregate build. So at evaluation time, look at the "members"
attribute, find the corresponding builds in the eval, and create a
mapping in the AggregateMembers table.
It redirects to the latest successful build from a finished
evaluation. This is mostly useful for the Nixpkgs/NixOS mirroring
script, which needs the latest finished evaluation in which some
aggregate job (such as ‘tested’ in NixOS) succeeded.
The NrBuilds table tracks the value of ‘select count(*) from Builds
where finished = 0’, keeping it up to date via a trigger. This is
necessary to make the /all page fast, since otherwise it needs to do a
sequential scan on the Builds table.
Doing a chdir in the parent is evil. For instance, we had Hydra core
dumps ending up in the cloned directory. Therefore, the function
‘run’ allows doing a chdir in the child. The function ‘grab’ returns
the child's stdout and throws an exception if the child fails.
Some installations may want to use system-wide sendmail (i.e.
/run/setuid-wrappers/sendmail) and those that want ssmtp can add it to
hydra's path themselves.
Signed-off-by: Shea Levy <shea@shealevy.com>
This allows users to sign in to Hydra using Mozilla Persona accounts.
When a user first signs in, a row in the Users table for the given
Persona identity (an email address) is created automatically.
To do: figure out how to deal with legacy accounts.
The catalyst-action-rest branch from shlevy/hydra was an exploration of
using Catalyst::Action::REST to create a JSON API for hydra. This commit
merges in the best bits from that experiment, with the goal that further
API endpoints can be added incrementally.
In addition to migrating more endpoints, there is potential for
improvement in what's already been done:
* The web interface can be updated to use the same non-GET endpoints as
the JSON interface (using x-tunneled-method) instead of having a
separate endpoint
* The web rendering should use the $c->stash->{resource} data structure
where applicable rather than putting the same data in two places in
the stash
* Which columns to render for each endpoint is a completely debatable
question
* Hydra::Component::ToJSON should turn has_many relations that have
strings as their primary keys into objects instead of arrays
Fixes NixOS/hydra#98
Signed-off-by: Shea Levy <shea@shealevy.com>
HipChat notification messages now say which committers were
responsible, e.g.
Job patchelf:trunk:tarball: Failed, probably due to 2 commits by Eelco Dolstra
This plugin sends notification of build failure or success to a
HipChat room, if the status differs from the last build.
The plugin can be configured by adding one or more of these stanzas to
hydra.conf:
<hipchat>
jobs = (patchelf|nixops):.*:.*
room = 1234
token = 39ab2198fe...
</hipchat>
Here "jobs" is a regular expression against which the fully qualified
job name of the build is matched (so for instance
"nixops:master:tarball" will match the stanza above).
Restarted builds whose derivation has been garbage-collected in the
meantime caused hydra-queue-runner to get stuck in a loop saying:
Jun 14 11:54:25 lucifer hydra-queue-runner[31844]: system type `x86_64-darwin': 0 active, 2 allowed, started 2 builds
Jun 14 11:54:25 lucifer hydra-queue-runner[31844]: {UNKNOWN}: path `/nix/store/wcizsch2garjlvs4pswrar47i1hwjaia-inconsolata.drv' is not valid at
/nix/store/ypkdm4v13yrk941rvp8h0y425a5ww6nm-hydra-0.1pre1353-40debf1/bin/.hydra-queue-runner-wrapped line 51. at
/nix/store/kjpsc2zdaxnd44azxyw60f2px839m1cd-hydra-perl-deps/lib/perl5/site_perl/5.16.2/Catalyst/Model/DBIC/Schema.pm line 501
This happens if the previous iteration took more than 60 seconds.
Then the queue runner may think that builds failed to start properly
and unlock them, e.g.
build 5264936 pid 19248 died, unlocking
build 5264951 pid 19248 died, unlocking
build 5257073 pid 19248 died, unlocking
...
Because we don't start a build if a dependency is already building,
it's possible that some or all of the $extraAllowed highest-priority
builds in the queue are not eligible. E.g. with $extraAllowed = 32,
we might start only 3 builds even though there are thousands in the
queue. The fix is to try all queued builds until $extraAllowed have
been started.
Issue #99.
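The effect of the fix can be shown with a toy queue. This is a Python sketch, not the queue runner's Perl; the function names are hypothetical, and the blocked set is chosen so that the old behaviour starts exactly 3 builds, as in the example above.

```python
def start_first_n_only(queue, extra_allowed, is_eligible):
    # Old behaviour: only the first extra_allowed builds are considered.
    return [b for b in queue[:extra_allowed] if is_eligible(b)]


def start_until_quota(queue, extra_allowed, is_eligible):
    # New behaviour: walk the whole queue until the quota is filled.
    started = []
    for build in queue:
        if len(started) >= extra_allowed:
            break
        if is_eligible(build):
            started.append(build)
    return started


queue = list(range(1, 1001))   # build ids, highest priority first
blocked = set(range(1, 30))    # builds with a dependency already building


def eligible(build):
    return build not in blocked


old = start_first_n_only(queue, 32, eligible)
new = start_until_quota(queue, 32, eligible)
```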
Config::Any uses Module::Pluggable to search for plugins, so it needs
the patched Module::Pluggable in Nixpkgs (rather than the one in Perl
itself) to properly find plugins in symlink trees created by buildEnv.
For some reason, hg clone from a local (path-based) repo will fail if
the parent directory of the destination directory doesn't exist (though
it succeeds when cloning from an http repo).
Signed-off-by: Shea Levy <shea@shealevy.com>
Previously, for scheduled builds, "timestamp" contained the time the
build was added to the queue, while for finished builds, it was the
time the build finished. Now it's always the former.
The revision counting changes depending on which revision is cloned
initially, so clone the default branch first and then checkout the
required revision to match hydra's revCount.
Signed-off-by: Shea Levy <shea@shealevy.com>
See e.g. http://hydra.nixos.org/build/4915744.
P.S. existing active build steps of finished builds can be marked as
aborted by running:
update buildsteps set busy = 0, status = 4
where (build, stepnr) in
(select s.build, s.stepnr from buildsteps s join builds b on s.build = b.id where b.finished = 1 and s.busy = 1);
This is mostly so we don't have to pass around common parameters like
"db" and "config", and we don't have to check for the existence of
methods.
A plugin now looks like this:
    package Hydra::Plugin::TwitterNotification;
    use parent 'Hydra::Plugin';
    sub buildFinished {
        my ($self, $build, $dependents) = @_;
        print STDERR "tweeting about build ", $build->id, "\n";
        # Send tweet...
        # Hydra database is $self->{db}.
    }
You can now add plugins to Hydra by writing a module called
Hydra::Plugin::<whatever> and putting it in Perl's search path. The
only plugin operation currently supported is buildFinished, called
when hydra-build has finished doing a build.
For instance, a Twitter notification plugin would look like this:
    package Hydra::Plugin::TwitterNotification;
    sub buildFinished {
        my ($self, $db, $config, $build, $dependents) = @_;
        print STDERR "tweeting about build ", $build->id, "\n";
        # send tweet...
    }
    1;
Previously this function didn't actually have a lot of effect. If a
build A had a dependency B, Hydra would start B first. But on the
next scan through the queue, it would start A anyway, because of the
"busy => 0" restriction.
Now the queue runner won't start a build if a dependency is already
running. (This is not necessarily optimal, since the build may have
other dependencies that don't correspond to a build in the queue but
could run. One day we'll start all Hydra builds in parallel...)
Also, for performance, use computeFSClosure instead of "nix-store
-qR". And don't bother with topological sorting because it didn't
have an effect anyway since the database returns dependencies in
arbitrary order.
This allows checking a jobset (say) at most once a day. It's also
possible to disable polling by setting the interval to 0. This is
useful for jobsets that use push notification or are manually
evaluated.
This caused exceptions like:
Caught exception in Hydra::Controller::Build->view_build "writing to file: Broken pipe at /nix/store/ihdb3widsq1dk7sbl5vqjxfcxb5ypad4-hydra-0.1pre1297-8158093/libexec/hydra/lib/Hydra/Controller/Build.pm line 59."
because the connection to the Nix daemon would be terminated due to a
protocol violation (calling queryPathInfo with an empty string).
Returning only the first 20 results can cause NixOS/Nixpkgs channel
generation to fail, if the first 20 view results correspond to
evaluations that haven't finished yet. Then URLs like
/view/nixos/tested/latest-finished will return 500 rather than the
latest finished view.
Avoid the frequently printed
hydra-queue-runner[10293]: system type `x86_64-linux': 2 active, 2 allowed, starting 0 builds
message. That information is only interesting when some builds are
actually started.
You can now do:
bash <(curl http://hydra-server/build/1238757/reproduce)
to download and execute a script that reproduces a Hydra build
locally. This script fetches all inputs (e.g. Git repositories) and
then invokes nix-build.
The downloaded sources are stored in /tmp/build-<buildid> and reused
between invocations of the script.
Any additional command line options are passed to nix-build. So
bash <(curl http://hydra-server/build/1238757/reproduce) --run-env
will drop you in a shell where you can interactively hack on the
build, e.g.
$ source $stdenv/setup
$ set +e
$ unpackPhase
$ cd $sourceRoot
$ configurePhase
$ emacs foo.c &
$ make
and so on.
Build product paths cannot reference locations outside of the Nix
store. We previously disallowed paths from being symlinks, but this
didn't take into account that parent path elements can be symlinks as
well. So a build product /nix/store/bla.../foo/passwd, with
/nix/store/bla.../foo being a symlink to /etc, would still work.
So now we check all paths encountered during path resolution.
Symlinks are allowed again so long as they point to the Nix store.
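
The check described above can be sketched as follows. This is only an illustration in Python, not Hydra's actual Perl implementation; the function name `is_within_store` and the default store directory are assumptions made for the example. It resolves every symlink along the path and then verifies that the result still lies inside the store.

```python
import os

def is_within_store(path, store_dir="/nix/store"):
    """Return True if `path`, after resolving all symlinks in every
    path component, still points inside `store_dir` (hypothetical helper)."""
    resolved = os.path.realpath(path)
    store = os.path.realpath(store_dir)
    # commonpath() returns the longest shared ancestor of both paths;
    # it equals the store directory only if `resolved` is inside it.
    return os.path.commonpath([resolved, store]) == store
```

With this approach a product path such as /nix/store/bla.../foo/passwd is rejected when foo is a symlink to /etc, while a symlink pointing at another store location is accepted.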
This is a followup to commit
10882a1ffd ("Add multiple output
support").
* src/script/hydra-eval-guile-jobs.in (job-evaluations->sxml): Return
several `output' tags in the body, and remove the `outPath' attribute
of `job'.
Remove some unnecessary configurability, remove all hydra.nixos.org
assumptions, remove some policy (e.g. nix.gc.automatic) that is
orthogonal to hydra.
Signed-off-by: Shea Levy <shea@shealevy.com>
Chaining paths only works properly when PathPart is used. Before this
fix, the affected URIs bypassed the top-level 'admin' sub.
Signed-off-by: Shea Levy <shea@shealevy.com>
This reverts commit 71d020735b.
Unfortunately there are still some cases where we need to set Hydra's
concurrency separately. (Ideally, Hydra would start *all* queued
builds in parallel and let Nix take care of everything...)
So now "?compare=<jobset name>" is no longer a hidden feature!
P.S. Encountered this wonderful TemplateToolkit brainfuck again: if
you want to get the number of rows in (say) project.jobsets, you can't
say "project.jobsets.size". That will *usually* give the right
result, except that if there is only one row in project.jobsets, it
will evaluate to 3. Instead you have to use "project.jobsets_rs.count".
Note that on machines that support multiple system types, EACH system type gets the full number of build slots, which is almost certainly not what we want.
You can use the URL
http://<hydra-server>/api/push-github
as GitHub's WebHook URL. Hydra will automatically trigger an
evaluation of all affected jobsets.
External machines can now notify Hydra that it should check a
repository by sending a GET or POST request to /api/push, providing a
list of jobsets to be checked and/or a list of repository URLs. In
the latter case, all jobsets that have any of the specified
repositories as an input will be checked.
For instance, you can configure GitHub or BitBucket to send a request
to the URL
http://hydra.example.org/api/push?repos=git://github.com/NixOS/nixpkgs.git
to trigger evaluation of all jobsets that have
git://github.com/NixOS/nixpkgs.git as an input, or to the URL
http://hydra.example.org/api/push?jobsets=patchelf:trunk,nixpkgs:trunk
to trigger evaluation of just the specified jobsets.
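
As a rough illustration, the two URLs above could be assembled like this. The helper `push_url` and its parameters are hypothetical and exist only for this sketch; the only things taken from the text are the /api/push path and the `repos` and `jobsets` query parameters.

```python
from urllib.parse import urlencode

def push_url(base_url, jobsets=None, repo=None):
    """Build an /api/push URL (hypothetical helper).

    jobsets: list of (project, jobset) pairs, joined as project:jobset
    repo:    a single repository URL
    """
    query = {}
    if jobsets:
        query["jobsets"] = ",".join(f"{p}:{j}" for p, j in jobsets)
    if repo:
        query["repos"] = repo
    # Keep ':', ',' and '/' readable, as in the examples above.
    return f"{base_url}/api/push?" + urlencode(query, safe=":,/")
```

For example, `push_url("http://hydra.example.org", jobsets=[("patchelf", "trunk"), ("nixpkgs", "trunk")])` yields the second URL shown above.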
Otherwise you can do
ln -s /etc/passwd $out/foo
echo "file misc $out/foo" >> $out/nix-support/hydra-build-products
and get Hydra to serve its /etc/passwd file.
Set a click handler on the table instead of on every row. This should
be faster on large tables. Also, it's easier to use: you just set the
clickable-rows class on the table, and the row-link class on the <a>
element that contains the "main" link of the row.
You can now just click on the evaluation link on the first tab to see
all builds in the same jobset. This also makes rendering build pages
quite a bit faster for jobsets like Nixpkgs.
It's pointless to store these, since Nix knows where the logs are.
Also handle (in fact require) Nix's new log storage scheme. Also some
cleanups in the build page.
The function getDerivation() can return false if its argument is a
derivation. This happens if evaluating the name or system attribute
triggers an assertion. In that case, we shouldn't recurse into the
attributes of the derivation.
If a build has ‘preferLocalBuilds = true’ (or we're not using remote
building), and the build has a non-permanent failure, then the build
status should be "Aborted" rather than "Failed". This is denoted by
an exit status of 100 from nix-store.
The check to see whether a build had been scheduled in a previous
evaluation took about 200 ms for the nixpkgs:trunk jobset. Given
that it has more than 15000 builds, this added up to a lot. Now
it takes 0.2 ms per build.
The action .../jobset/<project>/<jobset>/latest-eval redirects to the
latest evaluation of the jobset that has no unfinished builds. Thus,
for instance,
http://hydra.nixos.org/jobset/nixpkgs/trunk/latest-eval/channel
is the channel containing the latest consistent set of Nixpkgs builds.
The URI parameter "compare=..." can denote either an arbitrary
evaluation ID, or the name of a jobset in the same project. In the
latter case, the comparison is made against the latest completed
evaluation of the specified jobset.
This happened in a pathological case in Nixpkgs: the "grub" job is
evaluated for i686-linux and x86_64-linux, but in the latter case it
returns the same derivation as in the former case. So only one build
should be added.
This gets rid of the openHydraDB function and ensures that we
open the database in a consistent way.
Also drop the PostgreSQL sequence hacks. They don't seem to be
necessary anymore.
When checking whether the jobset is unchanged, we need to compare with
the previous JobsetEval regardless of whether it had new builds.
Otherwise we'll keep adding new JobsetEval rows.
Because of the way DBIx::Class does prepared statements, even
innocuous queries such as
$c->model('DB::Builds')->search({finished => 0})
can be extremely slow. This is because DBIx::Class prepares a
PostgreSQL statement
select ... from Builds where finished = ?
and since Builds is very large and there is a large fraction of rows
with "finished = 1", the PostgreSQL query planner decides to implement
this query with a sequential scan of the Builds table (despite the
existence of an index on "finished"), which is extremely slow. It
would be nice if we could tell DBIx::Class that constants should be
part of the prepared statement, i.e.
select ... from Builds where finished = 0
but AFAIK we can't.
In particular the /pkg action is now O(lg n) instead of O(n) in the
number of packages in the channel, and listing the channel contents
no longer requires calling isValidPath() on all packages.
Derivations (and thus build time dependencies) are no longer included
in the channel, because they're not GC roots. Thus they could
disappear unexpectedly.
This isn't perfect because it doesn't handle the case where a
previous build hasn't finished yet. But at least it won't send mail
for old builds that fail while a newer build has already succeeded.
* Don't use isCurrent anymore; instead look up builds in the previous
jobset evaluation. (The isCurrent field is still maintained because
it's still used in some other places.)
* To determine whether to perform an evaluation, compare the hash of
the current inputs with the inputs of the previous jobset
evaluation, rather than checking if there was ever an evaluation
with those inputs. This way, if the inputs of an evaluation change
back to a previous state, we get a new jobset evaluation in the
database (and thus the latest jobset evaluation correctly represents
the latest state of the jobset).
* Improve performance by removing some unnecessary operations and
adding an index.
Since it read the actual roots after determining the set of desired
roots, there was a possibility that it would delete roots added by
hydra-evaluator or hydra-build while hydra-update-gc-roots was
running. This could cause a derivation to be garbage-collected before
the build was performed, for instance. Now the actual roots are read
first, so any root added after that time is not deleted.
The hydra-update-gc-roots script is taking around 95 minutes on our
Hydra instance (though a lot of that is I/O wait). This patch
significantly reduces the number of database queries. In particular,
the N most recent successful builds for each job in a jobset are now
determined in a single query. Also, it removes the calls to
readlink().
Prepared statements are sometimes much slower than unprepared
statements, because the planner doesn't have access to the query
parameters. This is the case for the active build steps query (in
/status), where a prepared statement is three orders of magnitude
slower. So disable the use of prepared statements in this case.
(Since the query parameters are constant here, it would be nicer if we
could tell DBIx::Class to prepare a statement with those parameters
fixed. But I don't know an easy way to do so.)
Hydra is a [Continuous Integration](https://en.wikipedia.org/wiki/Continuous_integration) service for [Nix](https://nixos.org/nix) based projects.
## Installation And Setup
**Note**: The instructions provided below are intended to enable new users to get a simple, local installation up and running. They are by no means sufficient for running a production server, let alone a public instance.
### Enabling The Service
Running Hydra is currently only supported on NixOS. The [hydra module](https://github.com/NixOS/nixpkgs/blob/release-20.03/nixos/modules/services/continuous-integration/hydra/default.nix) allows for an easy setup. The following configuration can be used for a simple setup that performs all builds on _localhost_ (Please refer to the [Options page](https://nixos.org/nixos/options.html#services.hydra) for all available options):
```nix
{
  services.hydra = {
    enable = true;
    hydraURL = "http://localhost:3000";
    notificationSender = "hydra@localhost";
    buildMachinesFiles = [];
    useSubstitutes = true;
  };
}
```
### Creating An Admin User
Once the Hydra service has been configured as above and activated, you should already be able to access the web interface at the specified URL. However, some actions require an admin user, which has to be created first:
Afterwards you should be able to log in by clicking on "_Sign In_" on the top right of the web interface using the credentials specified by `hydra-create-user`. Once you are logged in you can click "_Admin -> Create Project_" to configure your first project.
### Creating A Simple Project And Jobset
In order to evaluate and build anything you need to create _projects_ that contain _jobsets_. Hydra supports imperative and declarative projects and many different configurations. The steps below will guide you through creating a minimal imperative project configuration.
#### Creating A Project
Log in as administrator, click "_Admin_" and select "_Create project_". Fill the form as follows:
- **Identifier**: `hello-project`
- **Display name**: `hello`
- **Description**: `hello project`
Click "_Create project_".
#### Creating A Jobset
After creating a project you are forwarded to the project page. Click "_Actions_" and choose "_Create jobset_". Change **Type** to Legacy for the example below. Fill the form with the following values:
- **Identifier**: `hello-project`
- **Nix expression**: `examples/hello.nix` in `hydra`
- **Check interval**: 60
- **Scheduling shares**: 1
We have to add two inputs for this jobset. One for _nixpkgs_ and one for _hydra_ (which we are referencing in the Nix expression above):
Make sure **State** at the top of the page is set to "_Enabled_" and click on "_Create jobset_". This concludes the creation of a jobset that evaluates [./examples/hello.nix](./examples/hello.nix) once a minute. Clicking "_Evaluations_" should list the first evaluation of the newly created jobset after a brief delay.
## Building And Developing
### Building Hydra
You can build Hydra via `nix-build` using the provided [default.nix](./default.nix):
```
$ nix build
```
### Development Environment
You can use the provided shell.nix to get a working development environment:
The development environment can also automatically be established using [nix-direnv](https://github.com/nix-community/nix-direnv).
### Executing Hydra During Development
When working on new features or bug fixes you need to be able to run Hydra from your working copy. This
can be done using [foreman](https://github.com/ddollar/foreman):
```
$ nix develop
$ # hack hack
$ ninja -C build
$ foreman start
```
Have a look at the [Procfile](./Procfile) if you want to see how the processes are being started. In order to avoid
conflicts with services that might be running on your host, Hydra and PostgreSQL are started on custom ports:
- hydra-server: 63333 with the username "alice" and the password "foobar"
- postgresql: 64444, can be connected to using `psql -p 64444 -h localhost hydra`
Note that this is only ever meant as an ad-hoc way of executing Hydra during development. Please make use of the
NixOS module for actually running Hydra in production.
### Checking your patches
After making your changes, verify the test suite passes and perlcritic is still happy.
Start by following the steps in [Development Environment](#development-environment).
Then, you can run the tests and the perlcritic linter together with:
```console
$ nix develop
$ ninja -C build test
```
You can run a single test with:
```
$ nix develop
$ cd build
$ meson test --test-args=../t/Hydra/Event.t testsuite
```
And you can run just perlcritic with:
```
$ nix develop
$ cd build
$ meson test perlcritic
```
### JSON API
You can also interface with Hydra through a JSON API. The API is defined in [hydra-api.yaml](./hydra-api.yaml) and you can test and explore it via the [swagger editor](https://editor.swagger.io/?url=https://raw.githubusercontent.com/NixOS/hydra/master/hydra-api.yaml).
alter table builds add column longDescription text;
alter table builds add column license text;
alter table projects add column homepage text;
alter table builds add column homepage text;
alter table BuildProducts add column defaultPath text;
alter table BuildResultInfo add column failedDepBuild integer;
alter table BuildResultInfo add column failedDepStepNr integer;
alter table ReleaseSetJobs add column jobset text not null default "trunk";
=== (DB dump/load needed after Sqlite upgrade) ===
insert into jobs(project, jobset, name, active) select distinct project, jobset, job, 0 from builds b where not exists (select 1 from jobs where project = b.project and jobset = b.jobset and name = b.job);
create index IndexBuildInputsByBuild on BuildInputs(build);
create index IndexBuildInputsByDependency on BuildInputs(dependency);
create index IndexBuildsByTimestamp on Builds(timestamp);
alter table jobs add column disabled integer not null default 0;
alter table builds add column maintainers text;
# Add the isCurrent column to Builds and use the obsolete Jobs.active to fill it in.
alter table builds add column isCurrent integer default 0;
update builds set isCurrent = 1 where id in (select max(id) from builds natural join (select distinct b.project, b.jobset, b.job, b.system from builds b join (select project, jobset, name from jobs where active = 1) j on b.project = j.project and b.jobset = j.jobset and b.job = j.name) b2 group by project, jobset, job, system);
alter table Jobsets add column enabled integer not null default 1;
# Releases -> Views.
alter table ReleaseSets rename to Views;
alter table ReleaseSetJobs rename to ViewJobs;
alter table ViewJobs rename column release_ to view_;
alter table ViewJobs drop column mayFail;
alter table ViewJobs add column autorelease integer not null default 0;
alter table Builds add column nixExprInput text;
alter table Builds add column nixExprPath text;
# Adding JobsetEvals.
drop table JobsetInputHashes;
(add JobsetEvals, JobsetEvalMembers)
* Job selection:
php-sat:build [system = "i686-linux"]
--if system i686-linux --arg build {...}
* Restarting a bunch of failed builds:
* Restart all aborted builds in a given evaluation (e.g. 820909):
$ sqlite3 hydra.sqlite "select x.id from builds x join buildresultinfo r on r.id = x.id where project = 'nixpkgs' and jobset = 'stdenv' and exists (select 1 from buildinputs where build = x.id and revision = 14806) and finished = 1 and buildstatus = 3" > ids
> update builds set finished = 0 where id in (select id from builds where finished = 1 and buildstatus = 3 and exists (select 1 from jobsetevalmembers where eval = 820909 and build = id));
$ for i in $(cat ids); do echo $i; sqlite3 hydra.sqlite "begin transaction; insert into buildschedulinginfo (id, priority, busy, locker) values($i, 100, 0, ''); delete from buildresultinfo where id = $i; update builds set finished = 0 where id = $i; commit transaction;"; done
Or with Postgres:
* Restart all builds in a given evaluation that had a build step time out:
(restarting all aborted builds with ID > 42000)
$ psql -h buildfarm.st.ewi.tudelft.nl -U hydra hydra -t -c 'select x.id from builds x join buildresultinfo r on r.id = x.id where finished = 1 and buildstatus = 3 and x.id > 42000' > ids
> update builds set finished = 0 where id in (select id from builds where finished = 1 and buildstatus != 0 and exists (select 1 from jobsetevalmembers where eval = 926992 and build = id) and exists (select 1 from buildsteps where build = id and status = 7));
$ for i in $(cat ids); do echo $i; PGPASSWORD=... psql -h buildfarm.st.ewi.tudelft.nl -U hydra hydra -t -c "begin transaction; insert into buildschedulinginfo (id, priority, busy, locker) values($i, 100, 0, ''); delete from buildresultinfo where id = $i; update builds set finished = 0 where id = $i; commit transaction;"; done
* select * from (select project, jobset, job, system, max(timestamp) timestamp from builds where finished = 1 group by project, jobset, job, system) x join builds y on x.timestamp = y.timestamp and x.project = y.project and x.jobset = y.jobset and x.job = y.job and x.system = y.system;
* Delete all scheduled builds that are not already building:
delete from builds where finished = 0 and not exists (select 1 from buildschedulinginfo s where s.id = builds.id and busy = 1);
delete from builds where finished = 0 and not exists (select 1 from buildschedulinginfo s where s.id = builds.id and busy != 0);
from (select project, jobset, job, system, max(id) as id from Builds where finished = 1 group by project, jobset, job, system) as a_
natural join Builds x
where x.project = c.project and x.jobset = c.jobset and x.job = c.job and x.system = c.system
and x.id > c.id and r.buildstatus != r2.buildstatus);
* Using PostgreSQL:
* Using PostgreSQL (version 9.2 or newer is required):
$ HYDRA_DBI="dbi:Pg:dbname=hydra;" hydra-server
succeed in the nixpkgs:trunk jobset:
select job, system from builds b natural join buildresultinfo where project = 'nixpkgs' and jobset = 'stdenv' and iscurrent = 1 and finished = 1 and buildstatus != 0 and exists (select 1 from builds natural join buildresultinfo where project = 'nixpkgs' and jobset = 'trunk' and job = b.job and system = b.system and iscurrent = 1 and finished = 1 and buildstatus = 0) order by job, system;
* Get all Nixpkgs jobs that have never built successfully:
select project, jobset, job from builds b1
where project = 'nixpkgs' and jobset = 'trunk' and iscurrent = 1
group by project, jobset, job
having not exists
(select 1 from builds b2 where b1.project = b2.project and b1.jobset = b2.jobset and b1.job = b2.job and finished = 1 and buildstatus = 0)
Hydra stores the following job attributes in its database:
* `nixName` - the Derivation's `name` attribute
* `system` - the Derivation's `system` attribute
* `drvPath` - the Derivation's path in the Nix store
* `outputs` - A JSON dictionary of output names and their store path.
### Meta fields
* `description` - `meta.description`, a string
* `license` - a comma separated list of license names from `meta.license`, expected to be a list of attribute sets with an attribute named `shortName`, e.g. `[ { shortName = "licensename"; } ]`.
* `homepage` - `meta.homepage`, a string
* `maintainers` - a comma separated list of maintainer email addresses from `meta.maintainers`, expected to be a list of attribute sets with an attribute named `email`, e.g. `[ { email = "alice@example.com"; } ]`.
* `schedulingPriority` - `meta.schedulingPriority`, an integer. Default: 100. Slightly prioritizes this job over other jobs within this jobset.
* `timeout` - `meta.timeout`, an integer. Default: 36000. Number of seconds this job must complete within.
* `maxSilent` - `meta.maxSilent`, an integer. Default: 7200. Number of seconds of no output on stderr / stdout before considering the job failed.
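
To make the mapping concrete, here is a small Python sketch of how a `meta` attribute set could be flattened into the stored fields. It is not Hydra's evaluator code: the function name `flatten_meta` is hypothetical, the input is assumed to be the `meta` set already converted to a Python dictionary, and the exact separator (`", "`) is an assumption. The defaults are the ones listed above.

```python
def flatten_meta(meta):
    """Flatten a `meta` dictionary into the fields listed above
    (illustrative only; separator and function name are assumptions)."""
    return {
        "description": meta.get("description", ""),
        "homepage": meta.get("homepage", ""),
        # license and maintainers are lists of attribute sets
        "license": ", ".join(l["shortName"] for l in meta.get("license", [])),
        "maintainers": ", ".join(m["email"] for m in meta.get("maintainers", [])),
        # documented defaults
        "schedulingPriority": meta.get("schedulingPriority", 100),
        "timeout": meta.get("timeout", 36000),
        "maxSilent": meta.get("maxSilent", 7200),
    }
```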
Hydra uses a notification-based subsystem to implement some features and support plugin development. Notifications are sent to `hydra-notify`, which is responsible for dispatching each notification to each plugin.
Notifications are passed from `hydra-queue-runner` to `hydra-notify` through Postgres's `NOTIFY` and `LISTEN` feature.
## Notification Types
Note that the notification format is subject to change and should not be considered an API. Integrate with `hydra-notify` instead of listening directly.
### `cached_build_finished`
* **Payload:** Exactly two values, tab separated: The ID of the evaluation which contains the finished build, followed by the ID of the finished build.
* **When:** Issued directly after an evaluation completes, when that evaluation includes this finished build.
* **Delivery Semantics:** At most once per evaluation.
### `cached_build_queued`
* **Payload:** Exactly two values, tab separated: The ID of the evaluation which contains the queued build, followed by the ID of the queued build.
* **When:** Issued directly after an evaluation completes, when that evaluation includes this queued build.
* **Delivery Semantics:** At most once per evaluation.
### `build_queued`
* **Payload:** Exactly one value, the ID of the build.
* **When:** Issued after the transaction inserting the build into the database is committed. One notification is sent per new build.
* **Delivery Semantics:** Ephemeral. `hydra-notify` must be running to react to this event. No record of this event is stored.
### `build_started`
* **Payload:** Exactly one value, the ID of the build.
* **When:** Issued directly before building happens, and only if the derivation's outputs cannot be substituted.
* **Delivery Semantics:** Ephemeral. `hydra-notify` must be running to react to this event. No record of this event is stored.
### `step_finished`
* **Payload:** Three values, tab separated: The ID of the build which the step is part of, the step number, and the path on disk to the log file.
* **When:** Issued directly after a step completes, regardless of success. Is not issued if the step's derivation's outputs can be substituted.
* **Delivery Semantics:** Ephemeral. `hydra-notify` must be running to react to this event. No record of this event is stored.
### `build_finished`
* **Payload:** At least one value, tab separated: The ID of the build which finished, followed by IDs of all of the builds which also depended upon this build.
* **When:** Issued directly after a build completes, regardless of success and substitutability.
* **Delivery Semantics:** At least once.
`hydra-notify` will call `buildFinished` for each plugin in two ways:
* The `builds` table's `notificationspendingsince` column stores when the build finished. On startup, `hydra-notify` will query all builds with a non-null `notificationspendingsince` value and treat each row as a received `build_finished` event.
* Additionally, `hydra-notify` subscribes to `build_finished` events and processes them in real time.
After processing, the row's `notificationspendingsince` column is set to null.
It is possible for subsequent deliveries of the same `build_finished` data to imply different outcomes, for example if the build fails, is restarted, and then succeeds. In this scenario the `build_finished` events will be delivered at least twice, once for the failure and then once for the success.
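
The tab-separated payloads described so far are simple to decode. The following Python sketch shows one way to do it for `build_finished` and `step_finished`; the function names are hypothetical, and since the notification format is not a stable API this is only meant to illustrate the layout.

```python
def parse_build_finished(payload):
    """Split a build_finished payload into (build_id, dependent_build_ids)."""
    ids = [int(field) for field in payload.split("\t")]
    return ids[0], ids[1:]

def parse_step_finished(payload):
    """Split a step_finished payload into (build_id, step_nr, log_path)."""
    build_id, step_nr, log_path = payload.split("\t", 2)
    return int(build_id), int(step_nr), log_path
```

Because `build_finished` is delivered at least once, a consumer built on such a parser should tolerate seeing the same build ID more than once.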
### `eval_started`
* **Payload:** Exactly two values, tab separated: an opaque trace ID representing this evaluation, and the ID of the jobset.
* **When:** At the beginning of the evaluation phase for the jobset, before any work is done.
* **Delivery Semantics:** Ephemeral. `hydra-notify` must be running to react to this event. No record of this event is stored.
### `eval_added`
* **Payload:** Exactly three values, tab separated: an opaque trace ID representing this evaluation, the ID of the jobset, and the ID of the JobsetEval record.
* **When:** After the evaluator fetches inputs and completes the evaluation successfully.
* **Delivery Semantics:** Ephemeral. `hydra-notify` must be running to react to this event. No record of this event is stored.
### `eval_cached`
* **Payload:** Exactly three values: an opaque trace ID representing this evaluation, the ID of the jobset, and the ID of the previous identical evaluation.
* **When:** After the evaluator fetches inputs, if none of the inputs changed.
* **Delivery Semantics:** Ephemeral. `hydra-notify` must be running to react to this event. No record of this event is stored.
### `eval_failed`
* **Payload:** Exactly two values: an opaque trace ID representing this evaluation, and the ID of the jobset.
* **When:** After fetching any input fails, or any other evaluation error occurs.
* **Delivery Semantics:** Ephemeral. `hydra-notify` must be running to react to this event. No record of this event is stored.
## Development Notes
### Re-sending a notification
Notifications can be experimentally re-sent on the command line with `psql`, with `NOTIFY $notificationname, '$payload'`.
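
As a convenience, the statement can be generated rather than typed by hand. The sketch below is an assumption-laden helper (the name `notify_statement` is not part of Hydra): it joins the payload fields with tabs, doubles single quotes for the SQL string literal, and returns text that can be pasted into `psql`.

```python
def notify_statement(channel, fields):
    """Build a PostgreSQL NOTIFY statement with a tab-separated payload
    (hypothetical helper for experimenting with re-sent notifications)."""
    if not channel.isidentifier():
        raise ValueError(f"unexpected channel name: {channel!r}")
    payload = "\t".join(str(field) for field in fields)
    escaped = payload.replace("'", "''")  # escape quotes inside the literal
    return f"NOTIFY {channel}, '{escaped}'"
```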
Hydra supports executing a program after certain builds finish.
This behavior is disabled by default.
Hydra executes these commands under the `hydra-notify` service.
### Static Commands
Configure specific commands to execute after the specified matching job finishes.
#### Configuration
- `runcommand.[].job`
  A matcher for jobs to match in the format `project:jobset:job`. Defaults to `*:*:*`.
  **Note:** This matcher format is not a regular expression.
  The `*` is a wildcard for that entire section; partial matches are not supported.
- `runcommand.[].command`
  Command to run. Can use the `$HYDRA_JSON` environment variable to access information about the build.
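
The matching rule can be illustrated with a few lines of Python. This is a sketch of the semantics described above, not the plugin's actual Perl code; the function name `job_matches` is an assumption.

```python
def job_matches(matcher, project, jobset, job):
    """Return True if `matcher` (project:jobset:job) selects the given job.

    Each section is either a literal name or a bare '*'; a '*' inside a
    longer string is NOT treated as a wildcard.
    """
    sections = matcher.split(":")
    if len(sections) != 3:
        raise ValueError(f"expected project:jobset:job, got {matcher!r}")
    actual = (project, jobset, job)
    return all(s == "*" or s == a for s, a in zip(sections, actual))
```

For instance, `myProject:*:*` selects every job in `myProject`, whereas `my*:*:*` selects nothing unless a project is literally named `my*`.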
#### Example
```xml
<runcommand>
job = myProject:*:*
command = cat $HYDRA_JSON > /tmp/hydra-output
</runcommand>
```
### Dynamic Commands
Hydra can optionally run RunCommand hooks defined dynamically by the jobset. In
order to enable dynamic commands, you must enable this feature in your
`hydra.conf`, *as well as* in the parent project and jobset configuration.
#### Behavior
Hydra will execute any program defined under the `runCommandHook` attribute set. These jobs must have a single output named `out`, and that output must be an executable file located directly at `$out`.
#### Security Properties
Safely deploying dynamic commands requires careful design of your Hydra jobs. Allowing arbitrary users to define attributes in your top level attribute set will allow that user to execute code on your Hydra.
If a jobset has dynamic commands enabled, you must ensure only trusted users can define top level attributes.
#### Configuration
- `dynamicruncommand.enable`
  Set to 1 to enable dynamic RunCommand program execution.
#### Example
In your Hydra configuration, specify:
```xml
<dynamicruncommand>
enable = 1
</dynamicruncommand>
```
Then create a job named `runCommandHook.example` in your jobset:
```
{ pkgs, ... }: {
runCommandHook = {
recurseForDerivations = true;
example = pkgs.writeScript "run-me" ''
#!${pkgs.runtimeShell}
${pkgs.jq}/bin/jq . "$HYDRA_JSON"
'';
};
}
```
After the `runCommandHook.example` build finishes, that script will execute.
Further information about these environment variables can be found at the
[MetaCPAN documentation of `Email::Sender::Manual::QuickStart`](https://metacpan.org/pod/Email::Sender::Manual::QuickStart#specifying-transport-in-the-environment).
It's recommended to not put this in `services.hydra-dev.extraEnv` as this would
leak the secrets into the Nix store. Instead, it should be written into an
Hydra can notify Git servers (such as [GitLab](https://gitlab.com/), [GitHub](https://github.com)
or [Gitea](https://gitea.io/en-us/)) about the result of a build from a Git checkout.
This section describes how it can be implemented for `gitea`, but the approach for `gitlab` is
analogous:
* [Obtain an API token for your user](https://docs.gitea.io/en-us/api-usage/#authentication)
* Add it to a file which only users in the hydra group can read (see [including files](configuration.md#including-files) for more information):
```
<gitea_authorization>
your_username=your_token
</gitea_authorization>
```
* Include the file in your `hydra.conf` like this:
``` nix
{
services.hydra-dev.extraConfig = ''
Include /path/to/secret/file
'';
}
```
* For a jobset with a `Git`-input which points to a `gitea`-instance, add the following
This guide helps Hydra administrators migrate from unauthenticated webhooks to authenticated webhooks to secure their Hydra instances against unauthorized job evaluations.
## Why Migrate?
Currently, Hydra's webhook endpoints (`/api/push-github` and `/api/push-gitea`) accept any POST request without authentication. This vulnerability allows:
- Anyone to trigger expensive job evaluations
- Potential denial of service through repeated requests
- Manipulation of build timing and scheduling
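
For readers unfamiliar with webhook signing, the following Python sketch shows the general idea behind HMAC-SHA256 verification with several accepted secrets (useful during rotation). It is not Hydra's implementation: the function name is hypothetical, and the `sha256=<hex digest>` header format is the one GitHub uses for `X-Hub-Signature-256`; other platforms may format the signature differently.

```python
import hashlib
import hmac

def signature_is_valid(body, header, secrets):
    """Check a GitHub-style 'sha256=<hex>' signature of the raw request
    body against each configured secret (hypothetical helper)."""
    prefix = "sha256="
    if not header.startswith(prefix):
        return False
    received = header[len(prefix):].encode()
    for secret in secrets:
        expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode()
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected, received):
            return True
    return False
```

Accepting a list of secrets lets an old and a new secret be valid at the same time while the sending side is being reconfigured.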
## Step-by-Step Migration for NixOS
### 1. Create Webhook Configuration
Create a webhook secrets configuration file with the generated secrets:
```bash
# Create the secrets configuration file with inline secret generation
constexpr std::string_view common = "select max(s.build) from BuildSteps s join BuildStepOutputs o on s.build = o.build where startTime != 0 and stopTime != 0 and status = 1";
$c->log->warn("Webhook authentication failed for $platform: Unable to validate signature from IP " . $c->request->address . " because no secrets are configured");
, where => \[ 'me.flake like ? or exists (select 1 from JobsetInputAlts where project = me.project and jobset = me.name and value like ?)', [ 'flake', "%github%$owner/$repo%" ], [ 'value', "%github.com%$owner/$repo%" ] ]
, where => \[ 'me.flake like ? or exists (select 1 from JobsetInputAlts where project = me.project and jobset = me.name and value like ?)', [ 'flake', "%$url%" ], [ 'value', "%$url%" ] ]
, '+select' => ["(select bs.stoptime from buildsteps as bs where bs.machine = (me.username || '\@' || me.hostname) and not bs.stoptime is null order by bs.stoptime desc limit 1)"]
error($c, "Invalid or empty username.") if $username eq "";
error($c, "Max concurrent builds should be an integer > 0.") if $maxconcurrent eq "" || $maxconcurrent !~ m/[0-9]+/;
error($c, "Speed factor should be an integer > 0.") if $speedfactor eq "" || $speedfactor !~ m/[0-9]+/;
error($c, "Invalid or empty SSH key.") if $ssh_key eq "";
$machine->update(
    { username => $username
    , maxconcurrent => $maxconcurrent
    , speedfactor => $speedfactor
    , ssh_key => $ssh_key
    , options => $options
my $builds = $c->model('DB::Builds')->search_rs(
    { id => { -in => \"select id from Builds where id in ((select id from Builds where finished = 0) except (select build from JobsetEvalMembers where eval in (select max(id) from JobsetEvals where hasNewBuilds = 1 group by jobset_id)))" }
push(@select, "(select buildstatus from Builds b where b.id = (select max(id) from Builds t where t.project = me.project and t.jobset = me.jobset and t.job = me.job and t.system = '$system' and t.iscurrent = 1 ))");
push(@as, $system);
push(@select, "(select b.id from Builds b where b.id = (select max(id) from Builds t where t.project = me.project and t.jobset = me.jobset and t.job = me.job and t.system = '$system' and t.iscurrent = 1 ))");
"select build, stepnr, s.system as system, s.drvpath as drvpath, machine, s.starttime as starttime, jobsets.project as project, jobsets.name as jobset, job, s.busy as busy ".
"from BuildSteps s ".
"join Builds b on s.build = b.id ".
"join Jobsets jobsets on jobsets.id = b.jobset_id ".
or notFound($c, "The RunCommand log is not available.");
my $logFile = constructRunCommandLogPath($runlog);
if (-f $logFile) {
    serveLogFile($c, $logFile, $tail);
    return;
} else {
    notFound($c, "The RunCommand log is not available.");
}
}
1;