Upload folder using huggingface_hub
- LICENSE +262 -0
- policies/POLICIES.md +163 -0
- policies/api_capability_boundaries.md +47 -0
- policies/average_vs_total.md +20 -0
- policies/error_tool_warnings.md +15 -0
- policies/missing_req_id_vs_unsupported.md +28 -0
- policies/multi_api_reasoning.md +34 -0
- policies/policies.json +203 -0
LICENSE
ADDED
|
@@ -0,0 +1,262 @@
| 1 |
+
Apache License
|
| 2 |
+
Version 2.0, January 2004
|
| 3 |
+
http://www.apache.org/licenses/
|
| 4 |
+
|
| 5 |
+
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
| 6 |
+
|
| 7 |
+
1. Definitions.
|
| 8 |
+
|
| 9 |
+
"License" shall mean the terms and conditions for use, reproduction,
|
| 10 |
+
and distribution as defined by Sections 1 through 9 of this document.
|
| 11 |
+
|
| 12 |
+
"Licensor" shall mean the copyright owner or entity granting the License.
|
| 13 |
+
|
| 14 |
+
"Legal Entity" shall mean the union of the acting entity and all
|
| 15 |
+
other entities that control, are controlled by, or are under common
|
| 16 |
+
control with that entity. For the purposes of control, an entity
|
| 17 |
+
is "controlled by" another entity if it has the power, directly or
|
| 18 |
+
indirectly, to cause the direction or management of such entity,
|
| 19 |
+
whether by contract or otherwise, or ownership of fifty percent (50%)
|
| 20 |
+
or more of the outstanding shares, or beneficial ownership of such entity.
|
| 21 |
+
|
| 22 |
+
"You" (or "Your") shall mean an individual or Legal Entity
|
| 23 |
+
exercising permissions granted by this License.
|
| 24 |
+
|
| 25 |
+
"Source" shall mean the preferred form for making modifications,
|
| 26 |
+
including but not limited to software source code, documentation
|
| 27 |
+
source, and configuration files.
|
| 28 |
+
|
| 29 |
+
"Object" shall mean any form resulting from mechanical
|
| 30 |
+
transformation or translation of a Source form, including but
|
| 31 |
+
not limited to compiled object code, generated documentation,
|
| 32 |
+
and conversions to other media types.
|
| 33 |
+
|
| 34 |
+
"Work" shall mean the work of authorship, whether in Source or
|
| 35 |
+
Object form, made available under the License, as indicated by a
|
| 36 |
+
copyright notice that is included in or attached to the work
|
| 37 |
+
(which shall not include combinations of Works unless specifically
|
| 38 |
+
authorized by the License).
|
| 39 |
+
|
| 40 |
+
"Derivative Works" shall mean any work, whether in Source or Object
|
| 41 |
+
form, that is based upon (or derived from) the Work and for which the
|
| 42 |
+
editorial revisions, annotations, elaborations, or other modifications
|
| 43 |
+
represent, as a whole, an original work of authorship. For the purposes
|
| 44 |
+
of the License, a Derivative Work shall include any work that is based
|
| 45 |
+
upon a copy of the existing Work.
|
| 46 |
+
|
| 47 |
+
"Contribution" shall mean any work of authorship, including
|
| 48 |
+
the original version of the Work and any modifications or additions
|
| 49 |
+
to that Work or Derivative Works thereof, that is intentionally
|
| 50 |
+
submitted to Licensor for inclusion in the Work by the copyright owner
|
| 51 |
+
or by an individual or Legal Entity authorized to submit on behalf of
|
| 52 |
+
the copyright owner. For the purposes of this definition, "submitted"
|
| 53 |
+
means any form of electronic, verbal, or written communication sent
|
| 54 |
+
to the Licensor or its representatives, including but not limited to
|
| 55 |
+
communication on electronic mailing lists, source code control
|
| 56 |
+
systems, and issue tracking systems that are managed by, or on behalf
|
| 57 |
+
of, the Licensor for the purpose of discussing and improving the Work,
|
| 58 |
+
but excluding communication that is conspicuously marked or otherwise
|
| 59 |
+
designated in writing by the copyright owner as "Not a Contribution."
|
| 60 |
+
|
| 61 |
+
"Contributor" shall mean Licensor and any individual or Legal Entity
|
| 62 |
+
on behalf of whom a Contribution has been received by Licensor and
|
| 63 |
+
subsequently incorporated within the Work.
|
| 64 |
+
|
| 65 |
+
2. Grant of Copyright License. Subject to the terms and conditions of
|
| 66 |
+
this License, each Contributor hereby grants to You a perpetual,
|
| 67 |
+
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
| 68 |
+
copyright license to use, reproduce, modify, create derivative works,
|
| 69 |
+
distribute, sublicense, and/or sell copies of the Work, and to
|
| 70 |
+
permit persons to whom the Work is furnished to do so, subject to
|
| 71 |
+
the following conditions:
|
| 72 |
+
|
| 73 |
+
The above copyright notice and this permission notice shall be
|
| 74 |
+
included in all copies or substantial portions of the Work.
|
| 75 |
+
|
| 76 |
+
3. Grant of Patent License. Subject to the terms and conditions of
|
| 77 |
+
this License, each Contributor hereby grants to You a perpetual,
|
| 78 |
+
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
| 79 |
+
(except as stated in this section) patent license to make, have made,
|
| 80 |
+
use, offer to sell, sell, import, and otherwise transfer the Work,
|
| 81 |
+
where such license applies only to those patent claims licensable
|
| 82 |
+
by such Contributor that are necessarily infringed by their
|
| 83 |
+
Contribution(s) alone or by combination of their Contribution(s)
|
| 84 |
+
with the Work to which such Contribution(s) was submitted. If You
|
| 85 |
+
institute patent litigation against any entity (including a
|
| 86 |
+
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
| 87 |
+
or a Contribution incorporated within the Work constitutes direct
|
| 88 |
+
or contributory patent infringement, then any patent licenses
|
| 89 |
+
granted to You under this License for that Work shall terminate
|
| 90 |
+
as of the date such litigation is filed.
|
| 91 |
+
|
| 92 |
+
4. Redistribution. You may reproduce and distribute copies of the
|
| 93 |
+
Work or Derivative Works thereof in any medium, with or without
|
| 94 |
+
modifications, and in Source or Object form, provided that You
|
| 95 |
+
meet the following conditions:
|
| 96 |
+
|
| 97 |
+
(a) You must give any other recipients of the Work or
|
| 98 |
+
Derivative Works a copy of this License; and
|
| 99 |
+
|
| 100 |
+
(b) You must cause any modified files to carry prominent notices
|
| 101 |
+
stating that You changed the files; and
|
| 102 |
+
|
| 103 |
+
(c) You must retain, in the Source form of any Derivative Works
|
| 104 |
+
that You distribute, all copyright, patent, trademark, and
|
| 105 |
+
attribution notices from the Source form of the Work,
|
| 106 |
+
excluding those notices that do not pertain to any part of
|
| 107 |
+
the Derivative Works; and
|
| 108 |
+
|
| 109 |
+
(d) If the Work includes a "NOTICE" text file as part of its
|
| 110 |
+
distribution, then any Derivative Works that You distribute must
|
| 111 |
+
include a readable copy of the attribution notices contained
|
| 112 |
+
within such NOTICE file, excluding those notices that do not
|
| 113 |
+
pertain to any part of the Derivative Works, in at least one
|
| 114 |
+
of the following places: within a NOTICE text file distributed
|
| 115 |
+
as part of the Derivative Works; within the Source form or
|
| 116 |
+
documentation, if provided along with the Derivative Works; or,
|
| 117 |
+
within a display generated by the Derivative Works, if and
|
| 118 |
+
wherever such third-party notices normally appear. The contents
|
| 119 |
+
of the NOTICE file are for informational purposes only and
|
| 120 |
+
do not modify the License. You may add Your own attribution
|
| 121 |
+
notices within Derivative Works that You distribute, alongside
|
| 122 |
+
or as an addendum to the NOTICE text from the Work, provided
|
| 123 |
+
that such additional attribution notices cannot be construed
|
| 124 |
+
as modifying the License.
|
| 125 |
+
|
| 126 |
+
You may add Your own copyright notices to Your modifications and
|
| 127 |
+
may provide additional or different license terms and conditions
|
| 128 |
+
for use, reproduction, or distribution of Your modifications, or
|
| 129 |
+
for any such Derivative Works as a whole, provided Your use,
|
| 130 |
+
reproduction, and distribution of the Work otherwise complies with
|
| 131 |
+
the conditions stated in this License.
|
| 132 |
+
|
| 133 |
+
5. Submission of Contributions. Unless You explicitly state otherwise,
|
| 134 |
+
any Contribution intentionally submitted for inclusion in the Work
|
| 135 |
+
by You to the Licensor shall be under the terms and conditions of
|
| 136 |
+
this License, without any additional terms or conditions.
|
| 137 |
+
Notwithstanding the above, nothing herein shall supersede or modify
|
| 138 |
+
the terms of any separate license agreement you may have executed
|
| 139 |
+
with Licensor regarding such Contributions.
|
| 140 |
+
|
| 141 |
+
6. Trademarks. This License does not grant permission to use the trade
|
| 142 |
+
names, trademarks, service marks, or product names of the Licensor,
|
| 143 |
+
except as required for reasonable and customary use in describing the
|
| 144 |
+
origin of the Work and reproducing the content of the NOTICE file.
|
| 145 |
+
|
| 146 |
+
7. Disclaimer of Warranty. Unless required by applicable law or
|
| 147 |
+
agreed to in writing, Licensor provides the Work (and each
|
| 148 |
+
Contributor provides its Contributions) on an "AS IS" BASIS,
|
| 149 |
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
| 150 |
+
implied, including, without limitation, any warranties or conditions
|
| 151 |
+
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
| 152 |
+
PARTICULAR PURPOSE. You are solely responsible for determining the
|
| 153 |
+
appropriateness of using or redistributing the Work and assume any
|
| 154 |
+
risks associated with Your exercise of permissions under this License.
|
| 155 |
+
|
| 156 |
+
8. Limitation of Liability. In no event and under no legal theory,
|
| 157 |
+
whether in tort (including negligence), contract, or otherwise,
|
| 158 |
+
unless required by applicable law (such as deliberate and grossly
|
| 159 |
+
negligent acts) or agreed to in writing, shall any Contributor be
|
| 160 |
+
liable to You for damages, including any direct, indirect, special,
|
| 161 |
+
incidental, or consequential damages of any character arising as a
|
| 162 |
+
result of this License or out of the use or inability to use the
|
| 163 |
+
Work (including but not limited to damages for loss of goodwill,
|
| 164 |
+
work stoppage, computer failure or malfunction, or any and all
|
| 165 |
+
other commercial damages or losses), even if such Contributor
|
| 166 |
+
has been advised of the possibility of such damages.
|
| 167 |
+
|
| 168 |
+
9. Accepting Warranty or Support. You may choose to offer, and to
|
| 169 |
+
charge a fee for, warranty, support, indemnity or other liability
|
| 170 |
+
obligations and/or rights consistent with this License. However, in
|
| 171 |
+
accepting such obligations, You may act only on Your own behalf and on
|
| 172 |
+
Your sole responsibility, not on behalf of any other Contributor, and
|
| 173 |
+
only if You agree to indemnify, defend, and hold each Contributor
|
| 174 |
+
harmless for any liability incurred by, or claims asserted against,
|
| 175 |
+
such Contributor by reason of your accepting any such warranty or support.
|
| 176 |
+
|
| 177 |
+
END OF TERMS AND CONDITIONS
|
| 178 |
+
|
| 179 |
+
APPENDIX: How to apply the Apache License to your work.
|
| 180 |
+
|
| 181 |
+
To apply the Apache License to your work, attach the following
|
| 182 |
+
boilerplate notice, with the fields enclosed by brackets "[]"
|
| 183 |
+
replaced with your own identifying information. (Don't include
|
| 184 |
+
the brackets!) The text should be enclosed in the appropriate
|
| 185 |
+
comment syntax for the file format. We also recommend that a
|
| 186 |
+
file or class name and description of purpose be included on the
|
| 187 |
+
same page as the copyright notice for easier identification within
|
| 188 |
+
third-party archives.
|
| 189 |
+
|
| 190 |
+
Copyright 2025 CUGA
|
| 191 |
+
|
| 192 |
+
Licensed under the Apache License, Version 2.0 (the "License");
|
| 193 |
+
you may not use this file except in compliance with the License.
|
| 194 |
+
You may obtain a copy of the License at
|
| 195 |
+
|
| 196 |
+
http://www.apache.org/licenses/
|
| 197 |
+
|
| 198 |
+
Unless required by applicable law or agreed to in writing, software
|
| 199 |
+
distributed under the License is distributed on an "AS IS" BASIS,
|
| 200 |
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
| 201 |
+
See the License for the specific language governing permissions and
|
| 202 |
+
limitations under the License.
|
| 203 |
+
|
| 204 |
+
---
|
| 205 |
+
|
| 206 |
+
# NOTICE
|
| 207 |
+
|
| 208 |
+
This project includes code from multiple open source projects:
|
| 209 |
+
|
| 210 |
+
## BrowserGym
|
| 211 |
+
Copyright 2024 ServiceNow
|
| 212 |
+
Licensed under Apache License 2.0
|
| 213 |
+
Source: https://github.com/ServiceNow/BrowserGym
|
| 214 |
+
|
| 215 |
+
Portions of this project are derived from BrowserGym, including:
|
| 216 |
+
- cuga/backend/browser_env/browser/chat_async.py
|
| 217 |
+
- cuga/backend/browser_env/page_understanding/tranformer_utils/transform_utils.py
|
| 218 |
+
- cuga/backend/browser_env/page_understanding/tranformer_utils/dom_transform_utils.py
|
| 219 |
+
- cuga/backend/browser_env/browser/gym_env_async.py
|
| 220 |
+
- cuga/backend/browser_env/browser/gym_obs/obs.py
|
| 221 |
+
- cuga/backend/browser_env/browser/gym_obs/obs_async.py
|
| 222 |
+
- cuga/backend/browser_env/browser/gym_obs/extract_chrome_extension.py
|
| 223 |
+
- cuga/backend/browser_env/browser/env.py
|
| 224 |
+
- cuga/backend/browser_env/browser/extension_env_async.py
|
| 225 |
+
- cuga/backend/browser_env/browser/open_ended_async.py
|
| 226 |
+
- cuga/backend/browser_env/browser/gym_env.py
|
| 227 |
+
- cuga/backend/browser_env/browser/gym_obs/javascript/frame_unmark_elements.js
|
| 228 |
+
- cuga/backend/browser_env/browser/gym_obs/javascript/frame_mark_elements.js
|
| 229 |
+
- cuga/backend/browser_env/browser/utils_async.py
|
| 230 |
+
|
| 231 |
+
## browser-use
|
| 232 |
+
Copyright (c) 2024 Gregor Zunic
|
| 233 |
+
Licensed under MIT License
|
| 234 |
+
Source: https://github.com/browser-use/browser-use
|
| 235 |
+
|
| 236 |
+
Portions of this project are derived from browser-use, including:
|
| 237 |
+
- frontend-workspaces/extension/src/content/page_analysis/CachedXPathBuilder.ts
|
| 238 |
+
- frontend-workspaces/extension/src/content/page_analysis/constants.ts
|
| 239 |
+
- frontend-workspaces/extension/src/content/page_analysis/dom_tree_module.ts
|
| 240 |
+
- frontend-workspaces/extension/src/content/page_analysis/DomCache.ts
|
| 241 |
+
- frontend-workspaces/extension/src/content/page_analysis/DomTree.ts
|
| 242 |
+
- frontend-workspaces/extension/src/content/page_analysis/ElementHighlighter.ts
|
| 243 |
+
- frontend-workspaces/extension/src/content/page_analysis/NodeElementCollector.ts
|
| 244 |
+
- frontend-workspaces/extension/src/content/page_analysis/NodeHelper.ts
|
| 245 |
+
- frontend-workspaces/extension/src/content/page_analysis/PageHighlighter.ts
|
| 246 |
+
- frontend-workspaces/extension/src/content/page_analysis/types.d.ts
|
| 247 |
+
|
| 248 |
+
|
| 249 |
+
## LangChain
|
| 250 |
+
Copyright (c) 2025 LangChain
|
| 251 |
+
Licensed under MIT License
|
| 252 |
+
Source: https://github.com/langchain-ai/langgraph
|
| 253 |
+
|
| 254 |
+
|
| 255 |
+
Portions of this project derived from LangChain include:
|
| 256 |
+
- cuga/backend/cuga_graph/nodes/api/code_agent/code_act_agent.py
|
| 257 |
+
|
| 258 |
+
## Original Work
|
| 259 |
+
Copyright 2025 CUGA
|
| 260 |
+
Licensed under Apache License 2.0
|
| 261 |
+
|
| 262 |
+
All original code and modifications are licensed under Apache License 2.0.
|
policies/POLICIES.md
ADDED
|
@@ -0,0 +1,163 @@
| 1 |
+
# CUGA Policies for BPO Benchmark
|
| 2 |
+
|
| 3 |
+
This document tracks all policies and their measured impact on evaluation scores.
|
| 4 |
+
|
| 5 |
+
## Measured Results (5 runs each, clean Milvus DB)
|
| 6 |
+
|
| 7 |
+
### Per-run scores
|
| 8 |
+
|
| 9 |
+
```text
|
| 10 |
+
Config Run1 Run2 Run3 Run4 Run5 Mean pass@5 pass^5
|
| 11 |
+
──────────────────────────────────────────────────────────────────
|
| 12 |
+
No policies 12 13 14 13 12 12.8 15/26 10/26
|
| 13 |
+
5 policies 21 23 22 21 20 21.4 23/26 19/26
|
| 14 |
+
```
|
| 15 |
+
|
| 16 |
+
### Aggregate metrics
|
| 17 |
+
|
| 18 |
+
| Metric | No Policies | 5 Policies | Delta |
|
| 19 |
+
|---|---|---|---|
|
| 20 |
+
| Mean score | 12.8/26 (49.2%) | 21.4/26 (82.3%) | +33.1pp |
|
| 21 |
+
| pass@5 | 15/26 (57.7%) | 23/26 (88.5%) | +30.8pp |
|
| 22 |
+
| pass^5 | 10/26 (38.5%) | 19/26 (73.1%) | +34.6pp |
|
| 23 |
+
|
| 24 |
+
### Per-task breakdown
|
| 25 |
+
|
| 26 |
+
```text
|
| 27 |
+
Task NoPol @5 ^5 | Pol @5 ^5 Delta Category
|
| 28 |
+
──── ───── ── ── | ─── ── ── ───── ────────
|
| 29 |
+
1 5/5 Y Y | 4/5 Y - -1 regression (flaky)
|
| 30 |
+
2 5/5 Y Y | 5/5 Y Y = stable pass
|
| 31 |
+
3 1/5 Y - | 0/5 - - -1 regression
|
| 32 |
+
4 5/5 Y Y | 5/5 Y Y = stable pass
|
| 33 |
+
5 5/5 Y Y | 5/5 Y Y = stable pass
|
| 34 |
+
6 0/5 - - | 3/5 Y - +3 IMPROVED (flaky)
|
| 35 |
+
7 0/5 - - | 0/5 - - = stable fail
|
| 36 |
+
8 5/5 Y Y | 5/5 Y Y = stable pass
|
| 37 |
+
9 0/5 - - | 5/5 Y Y +5 IMPROVED (Policy #3)
|
| 38 |
+
10 5/5 Y Y | 5/5 Y Y = stable pass
|
| 39 |
+
11 4/5 Y - | 5/5 Y Y +1 IMPROVED (stabilized)
|
| 40 |
+
12 3/5 Y - | 5/5 Y Y +2 IMPROVED (Policy #2)
|
| 41 |
+
13 5/5 Y Y | 5/5 Y Y = stable pass
|
| 42 |
+
14 4/5 Y - | 5/5 Y Y +1 IMPROVED (Policy #2)
|
| 43 |
+
15 0/5 - - | 1/5 Y - +1 IMPROVED (flaky)
|
| 44 |
+
16 0/5 - - | 5/5 Y Y +5 IMPROVED (Policy #1)
|
| 45 |
+
17 5/5 Y Y | 5/5 Y Y = stable pass
|
| 46 |
+
18 5/5 Y Y | 5/5 Y Y = stable pass
|
| 47 |
+
19 0/5 - - | 5/5 Y Y +5 IMPROVED (Policy #1)
|
| 48 |
+
20 5/5 Y Y | 5/5 Y Y = stable pass
|
| 49 |
+
21 0/5 - - | 4/5 Y - +4 IMPROVED (Policy #1, flaky)
|
| 50 |
+
22 0/5 - - | 5/5 Y Y +5 IMPROVED (Policy #1)
|
| 51 |
+
23 0/5 - - | 5/5 Y Y +5 IMPROVED (Policy #1)
|
| 52 |
+
24 2/5 Y - | 5/5 Y Y +3 IMPROVED (Policy #1/#2)
|
| 53 |
+
25 0/5 - - | 0/5 - - = stable fail
|
| 54 |
+
26 0/5 - - | 5/5 Y Y +5 IMPROVED (Policy #4)
|
| 55 |
+
──── ───── ── ── | ─── ── ──
|
| 56 |
+
TOTALS 15 10 | 23 19
|
| 57 |
+
```
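The aggregate figures can be re-derived from the per-task pass counts. The sketch below is a minimal check, not part of the benchmark harness: it reads pass@5 as "passed in at least one of the 5 runs" and pass^5 as "passed in all 5 runs", which is how the Y marks in the table behave. The list names and the `summarize` helper are illustrative.

```python
# Per-task pass counts out of 5 runs, copied from the table above (tasks 1..26).
NO_POLICIES = [5, 5, 1, 5, 5, 0, 0, 5, 0, 5, 4, 3, 5,
               4, 0, 0, 5, 5, 0, 5, 0, 0, 0, 2, 0, 0]
FIVE_POLICIES = [4, 5, 0, 5, 5, 3, 0, 5, 5, 5, 5, 5, 5,
                 5, 1, 5, 5, 5, 5, 5, 4, 5, 5, 5, 0, 5]
RUNS = 5


def summarize(passes, runs=RUNS):
    """Return (mean tasks passed per run, pass@k count, pass^k count)."""
    mean_score = sum(passes) / runs              # total passes spread over the runs
    pass_at_k = sum(p >= 1 for p in passes)      # tasks passed in at least one run
    pass_hat_k = sum(p == runs for p in passes)  # tasks passed in every run
    return mean_score, pass_at_k, pass_hat_k


for name, passes in [("No policies", NO_POLICIES), ("5 policies", FIVE_POLICIES)]:
    mean_score, at_k, hat_k = summarize(passes)
    total = len(passes)
    print(f"{name}: mean={mean_score:.1f} pass@5={at_k}/{total} pass^5={hat_k}/{total}")
```

The results agree with the per-run and aggregate tables: 12.8, 15/26 and 10/26 without policies, and 21.4, 23/26 and 19/26 with all five.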
|
| 58 |
+
|
| 59 |
+
---
|
| 60 |
+
|
| 61 |
+
## Policy #1: API Capability Boundaries [IMPLEMENTED]
|
| 62 |
+
|
| 63 |
+
**Type:** Playbook
|
| 64 |
+
**File:** `api_capability_boundaries.md`
|
| 65 |
+
**Triggers:** Keywords + natural language (threshold 0.65, priority 90)
|
| 66 |
+
|
| 67 |
+
Teaches the agent to recognize when the available APIs cannot answer a question.
|
| 68 |
+
Lists what the APIs can and cannot do, and instructs the agent to decline
|
| 69 |
+
out-of-scope requests directly instead of asking for a requisition ID or calling
|
| 70 |
+
irrelevant tools.
|
| 71 |
+
|
| 72 |
+
**Failure pattern addressed:** The agent would ask for a requisition ID or call
|
| 73 |
+
random APIs for queries about job descriptions, time-to-fill, geography
|
| 74 |
+
filtering, SLA deadlines, funnel timing, and job-card details, none of which
|
| 75 |
+
are supported by any API.
|
| 76 |
+
|
| 77 |
+
**Tasks fixed:** 16, 19, 21, 22, 23 (all 0/5 → 4-5/5)
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
|
| 81 |
+
## Policy #2: Error-Prone Tool Warnings [IMPLEMENTED]
|
| 82 |
+
|
| 83 |
+
**Type:** Tool Guide
|
| 84 |
+
**File:** `error_tool_warnings.md`
|
| 85 |
+
**Target:** 19 error-prone tools (prepended to their descriptions)
|
| 86 |
+
|
| 87 |
+
Prepends a warning to the descriptions of the 19 known-unreliable tools (those
|
| 88 |
+
that return 503s, schema violations, or type mismatches). Steers the agent toward
|
| 89 |
+
the 13 reliable core tools and teaches it to recover gracefully when an error
|
| 90 |
+
tool is called.
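A minimal sketch of the prepending step, assuming tool descriptions are available as a plain name-to-description mapping. The document does not show how CUGA stores tool metadata, so `prepend_warning`, the sample descriptions, and the shortened warning text are all hypothetical; only the tool names come from this document.

```python
# Shortened stand-in for the text in error_tool_warnings.md.
WARNING = "WARNING: This tool is known to be unreliable."


def prepend_warning(tools, error_prone, warning=WARNING):
    """Return a copy of `tools` with the warning prepended for error-prone names."""
    return {
        name: f"{warning}\n\n{description}" if name in error_prone else description
        for name, description in tools.items()
    }


# Hypothetical descriptions for two tools named in this document.
tools = {
    "funnel_status": "Returns funnel status.",
    "candidate_source_sla_per_source": "Returns SLA percentage per sourcing channel.",
}
patched = prepend_warning(tools, error_prone={"funnel_status"})
print(patched["funnel_status"])
```

Returning a new mapping rather than editing in place keeps the original descriptions available for comparison runs without policies.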
|
| 91 |
+
|
| 92 |
+
**Failure pattern addressed:** The agent would call tools like `funnel_status`
|
| 93 |
+
(503 error), `model_registry` (wrong data), or `source_recommendation_summary`
|
| 94 |
+
(incomplete shortcut) instead of using the correct granular APIs.
|
| 95 |
+
|
| 96 |
+
**Tasks fixed:** 12 (3/5 → 5/5), 14 (4/5 → 5/5), 24 (2/5 → 5/5)
|
| 97 |
+
|
| 98 |
+
---
|
| 99 |
+
|
| 100 |
+
## Policy #3: Multi-API Reasoning [IMPLEMENTED]
|
| 101 |
+
|
| 102 |
+
**Type:** Playbook
|
| 103 |
+
**File:** `multi_api_reasoning.md`
|
| 104 |
+
**Triggers:** Keywords + natural language (threshold 0.65, priority 80)
|
| 105 |
+
|
| 106 |
+
Instructs the agent on when to call multiple specific APIs instead of relying on
|
| 107 |
+
a single summary endpoint. Provides a mapping from question type to the correct
|
| 108 |
+
specific tool, and clarifies the difference between "total requisitions used for
|
| 109 |
+
computation" (`definitions-and-methodology`) vs "similar requisitions analysed"
|
| 110 |
+
(`metadata-and-timeframe`).
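The requisition-count distinction can be written down as a small lookup. This is an illustrative sketch only: the dictionary keys are made-up labels for the two question types, and the values assume the full tool names used in the reliable-tools list (`candidate_source_definitions_and_methodology` and `candidate_source_metadata_and_timeframe`).

```python
# Hypothetical question-type labels mapped to the tool that holds the right count.
REQUISITION_COUNT_TOOL = {
    "total_requisitions_used_for_computation": "candidate_source_definitions_and_methodology",
    "similar_requisitions_analysed": "candidate_source_metadata_and_timeframe",
}


def tool_for(question_type):
    """Return the tool name for a requisition-count question type."""
    try:
        return REQUISITION_COUNT_TOOL[question_type]
    except KeyError:
        raise ValueError(f"no tool mapped for question type: {question_type!r}") from None


print(tool_for("total_requisitions_used_for_computation"))
```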
|
| 111 |
+
|
| 112 |
+
**Failure pattern addressed:** The agent would use the summary shortcut tool for
|
| 113 |
+
multi-metric questions, or confuse which API returns the requisition count.
|
| 114 |
+
|
| 115 |
+
**Tasks fixed:** 9 (0/5 → 5/5; now correctly returns 1047 from
|
| 116 |
+
`definitions-and-methodology` instead of 40 from `metadata-and-timeframe`)
|
| 117 |
+
|
| 118 |
+
---
|
| 119 |
+
|
| 120 |
+
## Policy #4: Average vs Total Calculations [IMPLEMENTED]
|
| 121 |
+
|
| 122 |
+
**Type:** Playbook
|
| 123 |
+
**File:** `average_vs_total.md`
|
| 124 |
+
**Triggers:** Keywords + natural language (threshold 0.65, priority 70)
|
| 125 |
+
|
| 126 |
+
Teaches the agent that when the user asks for "average" or "typical" values, it
|
| 127 |
+
must compute a per-requisition average by dividing the total by the count of
|
| 128 |
+
similar requisitions, rather than returning the raw total.
|
| 129 |
+
|
| 130 |
+
**Failure pattern addressed:** The agent would return the total candidate count
|
| 131 |
+
(2913) when asked "how many candidates do we usually get" instead of computing
|
| 132 |
+
the average (2913 / 40 = ~73).
|
| 133 |
+
|
| 134 |
+
**Tasks fixed:** 26 (0/5 → 5/5)
|
| 135 |
+
|
| 136 |
+
---
|
| 137 |
+
|
| 138 |
+
## Policy #5: Missing Requisition ID vs Unsupported Query [IMPLEMENTED]
|
| 139 |
+
|
| 140 |
+
**Type:** Playbook
|
| 141 |
+
**File:** `missing_req_id_vs_unsupported.md`
|
| 142 |
+
**Triggers:** Keywords + natural language (threshold 0.60, priority 85)
|
| 143 |
+
|
| 144 |
+
Helps the agent distinguish between "I need a requisition ID to answer this"
|
| 145 |
+
(answerable but missing context) vs "This can't be answered regardless of
|
| 146 |
+
requisition ID" (unsupported by any API). Reinforces Policy #1 for edge cases.
|
| 147 |
+
|
| 148 |
+
**Failure pattern addressed:** The agent would ask for a requisition ID even
|
| 149 |
+
when the question was about something no API supports.
|
| 150 |
+
|
| 151 |
+
**Tasks fixed:** Overlaps with Policy #1; provides reinforcement for edge cases.
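This document lists a threshold and a priority for each playbook but does not say how CUGA combines them. The sketch below shows one plausible reading, stated purely as an assumption: a playbook fires when its match score reaches its threshold, and playbooks that fire are applied in descending priority. The `Playbook` class, the `matching_playbooks` helper, and the example scores are hypothetical; the thresholds and priorities are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:
    name: str
    threshold: float  # minimum match score for the playbook to fire (assumed meaning)
    priority: int     # higher value is applied first (assumed meaning)


PLAYBOOKS = [
    Playbook("api_capability_boundaries", 0.65, 90),
    Playbook("multi_api_reasoning", 0.65, 80),
    Playbook("average_vs_total", 0.65, 70),
    Playbook("missing_req_id_vs_unsupported", 0.60, 85),
]


def matching_playbooks(scores, playbooks=PLAYBOOKS):
    """Return playbooks whose score meets their threshold, highest priority first."""
    fired = [p for p in playbooks if scores.get(p.name, 0.0) >= p.threshold]
    return sorted(fired, key=lambda p: p.priority, reverse=True)


# Made-up match scores for a single query.
scores = {
    "api_capability_boundaries": 0.70,
    "missing_req_id_vs_unsupported": 0.62,
    "average_vs_total": 0.50,
}
print([p.name for p in matching_playbooks(scores)])
```

Under this reading, the lower threshold on Policy #5 (0.60) lets it fire on weaker matches than Policy #1, while its priority of 85 places it just behind Policy #1 when both match.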
|
| 152 |
+
|
| 153 |
+
---
|
| 154 |
+
|
| 155 |
+
## Remaining Failing Tasks
|
| 156 |
+
|
| 157 |
+
| Task | Pass rate (no pol) | Pass rate (5 pol) | Issue | Notes |
|
| 158 |
+
|---|---|---|---|---|
|
| 159 |
+
| 1 | 5/5 | 4/5 | Flaky regression | May be LLM non-determinism |
|
| 160 |
+
| 3 | 1/5 | 0/5 | Agent provides incomplete source data | Regression; policies may over-constrain |
|
| 161 |
+
| 7 | 0/5 | 0/5 | LLM judge scores correct behavior as 0 | Judge issue, not agent issue |
|
| 162 |
+
| 15 | 0/5 | 1/5 | Calls too few APIs, misidentifies negative-SLA skills | Complex multi-part question |
|
| 163 |
+
| 25 | 0/5 | 0/5 | Calls error tool for invalid requisition ID instead of declining | Edge case not caught by Policy #1 triggers |
|
policies/api_capability_boundaries.md
ADDED
|
@@ -0,0 +1,47 @@
| 1 |
+
# API Capability Boundaries
|
| 2 |
+
|
| 3 |
+
Before answering any question, verify that the available APIs can actually provide the needed data.
|
| 4 |
+
If they cannot, tell the user directly; do NOT attempt to cobble together an answer from unrelated endpoints, and do NOT ask for a requisition ID when the query is fundamentally unsupported.
|
| 5 |
+
|
| 6 |
+
## What the APIs CAN do
|
| 7 |
+
|
| 8 |
+
The available tool suite covers two domains:
|
| 9 |
+
|
| 10 |
+
### Candidate Source Analytics
|
| 11 |
+
- SLA percentage per sourcing channel
|
| 12 |
+
- Total hires per sourcing channel
|
| 13 |
+
- Candidate volume and share per sourcing channel
|
| 14 |
+
- Funnel conversion rates (review %, interview %, offer acceptance %) per source
|
| 15 |
+
- Composite source recommendation summary
|
| 16 |
+
- Metadata: data timeframe, last update date, number of similar requisitions analysed
|
| 17 |
+
- Definitions and methodology: metric definitions, total requisition count used for computation, ML models involved
|
| 18 |
+
|
| 19 |
+
### Skills Analytics
|
| 20 |
+
- Skill-level statistical analysis (historical counts, SLA correlation)
|
| 21 |
+
- Skill impact on fill rate
|
| 22 |
+
- Skill impact on SLA (delta with/without the skill)
|
| 23 |
+
- Skill relevance justification
|
| 24 |
+
- Data sources and ML models used for recommendations
|
| 25 |
+
- Successful posting criteria and benchmarks
|
| 26 |
+
|
| 27 |
+
## What the APIs CANNOT do
|
| 28 |
+
|
| 29 |
+
The following capabilities are NOT available through any API. If the user asks for any of these, explain that the current API suite does not support it:
|
| 30 |
+
|
| 31 |
+
- **Job description text**: No API returns or accepts raw job description content. You cannot read, optimise, or rewrite a job description.
|
| 32 |
+
- **Time-to-fill metrics**: No API provides time-to-fill data, whether overall or broken down by source.
|
| 33 |
+
- **Geographic or channel filtering**: No API supports filtering by country, region, or posting channel (internal vs external).
|
| 34 |
+
- **Live requisition status or SLA countdowns**: The APIs provide historical/aggregate analytics, not real-time status tracking or deadline monitoring.
|
| 35 |
+
- **Stage-by-stage funnel timing**: No API returns average days spent in each pipeline stage or candidate counts per stage over time.
|
| 36 |
+
- **Full job-card details**: No API returns comprehensive requisition details like title, location, hiring-manager name, or contact information. The APIs focus on aggregate analytics, not individual job metadata.
|
| 37 |
+
- **Cross-requisition listing or search**: The APIs analyse one requisition at a time against historical data. They cannot list, search, or filter across all open requisitions.
|
| 38 |
+
|
| 39 |
+
## How to respond when a query is out of scope
|
| 40 |
+
|
| 41 |
+
When you determine that a question cannot be answered with the available APIs:
|
| 42 |
+
|
| 43 |
+
1. State clearly that the current APIs do not provide the requested data
|
| 44 |
+
2. Be specific about what is missing (e.g., "the APIs don't expose time-to-fill broken down by source")
|
| 45 |
+
3. Do NOT ask for a requisition ID, since providing one would not help
|
| 46 |
+
4. Do NOT call any API tools; the answer is that the data is unavailable
|
| 47 |
+
5. Do NOT fabricate or infer data that no API returned
|
policies/average_vs_total.md
ADDED
|
@@ -0,0 +1,20 @@
| 1 |
+
# Average vs Total Calculations
|
| 2 |
+
|
| 3 |
+
When the user asks for "average", "typical", "usually", or "per posting" values, you must compute an average; do not return a raw total.
|
| 4 |
+
|
| 5 |
+
## How to compute averages
|
| 6 |
+
|
| 7 |
+
1. Get the total metric value from the relevant API (e.g., total candidate volume from `candidate_source_candidate_volume_by_source`)
|
| 8 |
+
2. Get the number of similar requisitions from `candidate_source_metadata_and_timeframe`
|
| 9 |
+
3. Divide the total by the number of similar requisitions to get the per-requisition average
|
| 10 |
+
4. Report the average, not the total
|
| 11 |
+
|
| 12 |
+
## Example
|
| 13 |
+
|
| 14 |
+
If the user asks "How many candidates do we usually get for postings similar to X?":
|
| 15 |
+
- Total candidates across all sources = 2913
|
| 16 |
+
- Number of similar requisitions = 40
|
| 17 |
+
- Average = 2913 / 40 = ~73 candidates per posting
|
| 18 |
+
- Report: "On average, similar postings attract 73 candidates"
|
| 19 |
+
|
| 20 |
+
Do NOT report 2913 as the answer; that is the total, not the average.
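The arithmetic in the example can be sketched as follows. The function name is illustrative; in practice the two inputs would come from `candidate_source_candidate_volume_by_source` and `candidate_source_metadata_and_timeframe`, whose response shapes are not shown in this document, so plain numbers are used here.

```python
def per_requisition_average(total, similar_requisitions):
    """Divide a total metric by the number of similar requisitions."""
    if similar_requisitions <= 0:
        raise ValueError("similar_requisitions must be positive")
    return total / similar_requisitions


average = per_requisition_average(2913, 40)  # 72.825
print(f"On average, similar postings attract {round(average)} candidates")
```

The guard on the divisor is an addition for safety: a requisition count of zero would otherwise raise a less descriptive `ZeroDivisionError`.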
|
policies/error_tool_warnings.md
ADDED
|
@@ -0,0 +1,15 @@
| 1 |
+
# Error-Prone Tool Warning
|
| 2 |
+
|
| 3 |
+
WARNING: This tool is known to be unreliable. It may return HTTP errors (e.g. 503 Service Unavailable), schema violations, type mismatches, or unexpected data formats.
|
| 4 |
+
|
| 5 |
+
Before using this tool, check whether one of the 13 core reliable tools can answer the question instead:
|
| 6 |
+
|
| 7 |
+
**Reliable Candidate Source tools:** candidate_source_sla_per_source, candidate_source_total_hires_by_source, candidate_source_candidate_volume_by_source, candidate_source_funnel_conversion_by_source, candidate_source_metadata_and_timeframe, candidate_source_definitions_and_methodology, candidate_source_source_recommendation_summary
|
| 8 |
+
|
| 9 |
+
**Reliable Skills tools:** skills_skill_analysis, skills_skill_impact_fill_rate, skills_skill_impact_sla, skills_skill_relevance_justification, skills_successful_posting_criteria, skills_data_sources_used
|
| 10 |
+
|
| 11 |
+
If this tool returns an error or unexpected data:
|
| 12 |
+
- Do NOT report the raw error message to the user
|
| 13 |
+
- Do NOT retry the same tool
|
| 14 |
+
- Check if a reliable tool can provide the needed data
|
| 15 |
+
- If no reliable tool can help, tell the user the data is not available through the current APIs
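
One way to picture the rules above is a small wrapper around tool calls. This is a sketch under assumptions, not the agent's actual implementation: `call_tool` is a hypothetical callable that invokes a tool by name, `call_with_policy` and `UNAVAILABLE_MESSAGE` are made-up names, and only raised exceptions are treated as failures (validating unexpected data formats is left out).

```python
from typing import Any, Callable, Optional

# The 13 reliable tools named in this guide.
RELIABLE_TOOLS = {
    "candidate_source_sla_per_source",
    "candidate_source_total_hires_by_source",
    "candidate_source_candidate_volume_by_source",
    "candidate_source_funnel_conversion_by_source",
    "candidate_source_metadata_and_timeframe",
    "candidate_source_definitions_and_methodology",
    "candidate_source_source_recommendation_summary",
    "skills_skill_analysis",
    "skills_skill_impact_fill_rate",
    "skills_skill_impact_sla",
    "skills_skill_relevance_justification",
    "skills_successful_posting_criteria",
    "skills_data_sources_used",
}

UNAVAILABLE_MESSAGE = "That data is not available through the current APIs."


def call_with_policy(
    call_tool: Callable[[str], Any],
    tool_name: str,
    reliable_alternative: Optional[str] = None,
) -> Any:
    """Prefer a reliable tool; otherwise try the error-prone tool once."""
    if reliable_alternative in RELIABLE_TOOLS:
        return call_tool(reliable_alternative)
    try:
        return call_tool(tool_name)  # single attempt, no retry
    except Exception:
        # Do not surface the raw error message to the user.
        return UNAVAILABLE_MESSAGE
```

The caller decides whether a reliable alternative exists for the question; the wrapper only enforces the ordering and the no-retry rule.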
|
policies/missing_req_id_vs_unsupported.md
ADDED
|
@@ -0,0 +1,28 @@
|
| 1 |
+
# Missing Requisition ID vs Unsupported Query
|
| 2 |
+
|
| 3 |
+
When a question does not include a requisition ID, determine whether providing one would actually help before asking for it.
|
| 4 |
+
|
| 5 |
+
## Ask for a requisition ID when:
|
| 6 |
+
|
| 7 |
+
The question is about something the APIs support but needs a specific requisition to look up:
|
| 8 |
+
- SLA performance by source
|
| 9 |
+
- Candidate volume or hires by source
|
| 10 |
+
- Skill analysis or skill impact
|
| 11 |
+
- Funnel conversion rates
|
| 12 |
+
- Data sources or methodology used
|
| 13 |
+
- Metadata and timeframe
|
| 14 |
+
|
| 15 |
+
These all require a requisition ID to return useful results.
|
| 16 |
+
|
| 17 |
+
## Do NOT ask for a requisition ID when:
|
| 18 |
+
|
| 19 |
+
The question is about something no API supports regardless of requisition ID:
|
| 20 |
+
- Job description text (reading, optimizing, rewriting)
|
| 21 |
+
- Time-to-fill metrics (overall or by source)
|
| 22 |
+
- Geographic or location-based filtering
|
| 23 |
+
- Live requisition status or SLA deadline countdowns
|
| 24 |
+
- Stage-by-stage funnel timing (days in each stage)
|
| 25 |
+
- Full job-card details (title, location, hiring-manager info)
|
| 26 |
+
- Listing or searching across all open requisitions
|
| 27 |
+
|
| 28 |
+
For these, explain directly that the current APIs do not support the request. Asking for a requisition ID would be misleading because providing one would not help.
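
The decision described in this file can be sketched as a small function. The topic labels, the `Action` enum, and `next_action` are hypothetical names chosen for illustration; a real agent would classify the question from its text rather than receive a ready-made label.

```python
from enum import Enum


class Action(Enum):
    ASK_FOR_REQUISITION_ID = "ask_for_requisition_id"
    EXPLAIN_UNSUPPORTED = "explain_unsupported"
    CALL_APIS = "call_apis"


# Topics the APIs support once a requisition ID is known (labels are illustrative).
SUPPORTED_TOPICS = {
    "sla_by_source",
    "volume_or_hires_by_source",
    "skill_analysis_or_impact",
    "funnel_conversion",
    "data_sources_or_methodology",
    "metadata_and_timeframe",
}

# Topics no API supports, with or without a requisition ID.
UNSUPPORTED_TOPICS = {
    "job_description_text",
    "time_to_fill",
    "geographic_filtering",
    "live_status_or_sla_countdown",
    "stage_timing",
    "job_card_details",
    "cross_requisition_search",
}


def next_action(topic: str, has_requisition_id: bool) -> Action:
    """Decide whether to ask for an ID, call the APIs, or explain the limit."""
    if topic in UNSUPPORTED_TOPICS:
        # A requisition ID would not help, so never ask for one here.
        return Action.EXPLAIN_UNSUPPORTED
    if topic in SUPPORTED_TOPICS:
        return Action.CALL_APIS if has_requisition_id else Action.ASK_FOR_REQUISITION_ID
    raise ValueError(f"unclassified topic: {topic!r}")
```

Checking the unsupported set first means a missing ID never triggers a prompt for an out-of-scope question.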
|
policies/multi_api_reasoning.md
ADDED
|
@@ -0,0 +1,34 @@
|
| 1 |
+
# Multi-API Reasoning
|
| 2 |
+
|
| 3 |
+
When a question asks about multiple dimensions of performance, call the individual specific APIs rather than relying on a single summary endpoint. Summary tools give a quick overview but often lack the granular detail needed for a complete answer.
|
| 4 |
+
|
| 5 |
+
## Source Performance Questions
|
| 6 |
+
|
| 7 |
+
When comparing or recommending sources across multiple metrics (SLA, volume, conversion, hires), call the specific tools:
|
| 8 |
+
|
| 9 |
+
- **SLA performance** → `candidate_source_sla_per_source`
|
| 10 |
+
- **Candidate volume and share** → `candidate_source_candidate_volume_by_source`
|
| 11 |
+
- **Funnel conversion rates** (review %, interview %, offer acceptance %) → `candidate_source_funnel_conversion_by_source`
|
| 12 |
+
- **Total hires by source** → `candidate_source_total_hires_by_source`
|
| 13 |
+
|
| 14 |
+
Do NOT rely solely on `candidate_source_source_recommendation_summary` when the question asks for specific metrics like SLA percentages, offer acceptance rates, or conversion rates. The summary tool is useful for a quick recommendation but does not contain all granular metrics.
|
| 15 |
+
|
| 16 |
+
## Requisition Count and Sample Size Questions
|
| 17 |
+
|
| 18 |
+
There are two different "counts" in the system; do not confuse them:
|
| 19 |
+
|
| 20 |
+
- **"How many requisitions were used to compute these metrics"** or **"sample size"** → use `candidate_source_definitions_and_methodology`, which returns the total number of requisitions used for computation across the system
|
| 21 |
+
- **"How many similar requisitions were analysed"** → use `candidate_source_metadata_and_timeframe`, which returns the count of requisitions similar to the given one
|
| 22 |
+
|
| 23 |
+
These return different numbers. Read the question carefully to determine which one is being asked for.
|
| 24 |
+
|
| 25 |
+
## Skill Analysis Questions
|
| 26 |
+
|
| 27 |
+
When a question asks about skill impact across multiple dimensions (SLA, fill rate, relevance):
|
| 28 |
+
|
| 29 |
+
- **SLA impact** → `skills_skill_impact_sla`
|
| 30 |
+
- **Fill rate impact** → `skills_skill_impact_fill_rate`
|
| 31 |
+
- **Historical effectiveness and statistical analysis** → `skills_skill_analysis`
|
| 32 |
+
- **Relevance justification** → `skills_skill_relevance_justification`
|
| 33 |
+
|
| 34 |
+
If a skill is not found in the analysis results, say so explicitly rather than guessing or inferring a negative impact.
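
The metric-to-tool mappings in this file lend themselves to a lookup table. The sketch below is illustrative only: the short metric keys (`"sla"`, `"volume"`, and so on) and the helper `tools_for` are assumptions, while the tool names are the ones listed above.

```python
SOURCE_METRIC_TOOLS = {
    "sla": "candidate_source_sla_per_source",
    "volume": "candidate_source_candidate_volume_by_source",
    "conversion": "candidate_source_funnel_conversion_by_source",
    "hires": "candidate_source_total_hires_by_source",
}

SKILL_METRIC_TOOLS = {
    "sla": "skills_skill_impact_sla",
    "fill_rate": "skills_skill_impact_fill_rate",
    "analysis": "skills_skill_analysis",
    "relevance": "skills_skill_relevance_justification",
}


def tools_for(metrics: list[str], domain: str = "source") -> list[str]:
    """Return one specific tool per requested metric, in request order."""
    table = SOURCE_METRIC_TOOLS if domain == "source" else SKILL_METRIC_TOOLS
    unknown = [m for m in metrics if m not in table]
    if unknown:
        # Fail loudly instead of silently falling back to a summary tool.
        raise KeyError(f"no specific tool for metrics: {unknown}")
    return [table[m] for m in metrics]


# A question about SLA and conversion needs two separate calls.
print(tools_for(["sla", "conversion"]))
```

Keeping one entry per metric makes it harder to answer a multi-metric question from the summary endpoint alone.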
|
policies/policies.json
ADDED
|
@@ -0,0 +1,203 @@
|
| 1 |
+
[
|
| 2 |
+
{
|
| 3 |
+
"type": "playbook",
|
| 4 |
+
"id": "playbook_api_capability_boundaries",
|
| 5 |
+
"name": "API Capability Boundaries",
|
| 6 |
+
"description": "Playbook for api capability boundaries questions",
|
| 7 |
+
"triggers": [
|
| 8 |
+
{
|
| 9 |
+
"type": "keyword",
|
| 10 |
+
"value": [
|
| 11 |
+
"job description",
|
| 12 |
+
"optimize posting",
|
| 13 |
+
"optimise posting",
|
| 14 |
+
"rewrite job",
|
| 15 |
+
"time-to-fill",
|
| 16 |
+
"time to fill",
|
| 17 |
+
"SLA deadline",
|
| 18 |
+
"SLA countdown",
|
| 19 |
+
"within 30 days",
|
| 20 |
+
"France",
|
| 21 |
+
"geography",
|
| 22 |
+
"geographic",
|
| 23 |
+
"internal posting",
|
| 24 |
+
"funnel stage",
|
| 25 |
+
"days in stage",
|
| 26 |
+
"time-in-status",
|
| 27 |
+
"job-card",
|
| 28 |
+
"job card details",
|
| 29 |
+
"hiring manager",
|
| 30 |
+
"list all requisitions",
|
| 31 |
+
"search requisitions"
|
| 32 |
+
],
|
| 33 |
+
"target": "intent",
|
| 34 |
+
"case_sensitive": false,
|
| 35 |
+
"operator": "or"
|
| 36 |
+
},
|
| 37 |
+
{
|
| 38 |
+
"type": "natural_language",
|
| 39 |
+
"value": [
|
| 40 |
+
"user asks about data the APIs cannot provide",
|
| 41 |
+
"user wants to optimize or rewrite a job description",
|
| 42 |
+
"user asks about time-to-fill or days to fill a role",
|
| 43 |
+
"user asks about geographic or location filtering",
|
| 44 |
+
"user wants to list or search across all requisitions",
|
| 45 |
+
"user asks for funnel stage timing or candidate counts per stage",
|
| 46 |
+
"user asks for live requisition status or SLA deadlines",
|
| 47 |
+
"user asks for full job details like title location or manager"
|
| 48 |
+
],
|
| 49 |
+
"target": "intent",
|
| 50 |
+
"threshold": 0.65
|
| 51 |
+
}
|
| 52 |
+
],
|
| 53 |
+
"markdown_content": "# API Capability Boundaries\n\nBefore answering any question, verify that the available APIs can actually provide the needed data.\nIf they cannot, tell the user directly; do NOT attempt to cobble together an answer from unrelated endpoints, and do NOT ask for a requisition ID when the query is fundamentally unsupported.\n\n## What the APIs CAN do\n\nThe available tool suite covers two domains:\n\n### Candidate Source Analytics\n- SLA percentage per sourcing channel\n- Total hires per sourcing channel\n- Candidate volume and share per sourcing channel\n- Funnel conversion rates (review %, interview %, offer acceptance %) per source\n- Composite source recommendation summary\n- Metadata: data timeframe, last update date, number of similar requisitions analysed\n- Definitions and methodology: metric definitions, total requisition count used for computation, ML models involved\n\n### Skills Analytics\n- Skill-level statistical analysis (historical counts, SLA correlation)\n- Skill impact on fill rate\n- Skill impact on SLA (delta with/without the skill)\n- Skill relevance justification\n- Data sources and ML models used for recommendations\n- Successful posting criteria and benchmarks\n\n## What the APIs CANNOT do\n\nThe following capabilities are NOT available through any API. If the user asks for any of these, explain that the current API suite does not support it:\n\n- **Job description text**: No API returns or accepts raw job description content. You cannot read, optimise, or rewrite a job description.\n- **Time-to-fill metrics**: No API provides time-to-fill data, whether overall or broken down by source.\n- **Geographic or channel filtering**: No API supports filtering by country, region, or posting channel (internal vs external).\n- **Live requisition status or SLA countdowns**: The APIs provide historical/aggregate analytics, not real-time status tracking or deadline monitoring.\n- **Stage-by-stage funnel timing**: No API returns average days spent in each pipeline stage or candidate counts per stage over time.\n- **Full job-card details**: No API returns comprehensive requisition details like title, location, hiring-manager name, or contact information. The APIs focus on aggregate analytics, not individual job metadata.\n- **Cross-requisition listing or search**: The APIs analyse one requisition at a time against historical data. They cannot list, search, or filter across all open requisitions.\n\n## How to respond when a query is out of scope\n\nWhen you determine that a question cannot be answered with the available APIs:\n\n1. State clearly that the current APIs do not provide the requested data\n2. Be specific about what is missing (e.g., \"the APIs don't expose time-to-fill broken down by source\")\n3. Do NOT ask for a requisition ID; providing one would not help\n4. Do NOT call any API tools; the answer is that the data is unavailable\n5. Do NOT fabricate or infer data that no API returned\n",
|
| 54 |
+
"steps": [],
|
| 55 |
+
"priority": 90,
|
| 56 |
+
"enabled": true
|
| 57 |
+
},
|
| 58 |
+
{
|
| 59 |
+
"type": "playbook",
|
| 60 |
+
"id": "playbook_multi_api_reasoning",
|
| 61 |
+
"name": "Multi-API Reasoning",
|
| 62 |
+
"description": "Playbook for multi-api reasoning questions",
|
| 63 |
+
"triggers": [
|
| 64 |
+
{
|
| 65 |
+
"type": "keyword",
|
| 66 |
+
"value": [
|
| 67 |
+
"which sources",
|
| 68 |
+
"recommend sources",
|
| 69 |
+
"best sources",
|
| 70 |
+
"compare sources",
|
| 71 |
+
"prioritize",
|
| 72 |
+
"effectiveness",
|
| 73 |
+
"conversion rate",
|
| 74 |
+
"multiple metrics",
|
| 75 |
+
"how many requisitions",
|
| 76 |
+
"sample size",
|
| 77 |
+
"most candidates",
|
| 78 |
+
"how effective"
|
| 79 |
+
],
|
| 80 |
+
"target": "intent",
|
| 81 |
+
"case_sensitive": false,
|
| 82 |
+
"operator": "or"
|
| 83 |
+
},
|
| 84 |
+
{
|
| 85 |
+
"type": "natural_language",
|
| 86 |
+
"value": [
|
| 87 |
+
"user asks to compare or recommend sourcing channels",
|
| 88 |
+
"user asks about multiple performance metrics for sources",
|
| 89 |
+
"user asks how many requisitions were used to compute metrics",
|
| 90 |
+
"user asks which sources provided the most candidates and their conversion"
|
| 91 |
+
],
|
| 92 |
+
"target": "intent",
|
| 93 |
+
"threshold": 0.65
|
| 94 |
+
}
|
| 95 |
+
],
|
| 96 |
+
"markdown_content": "# Multi-API Reasoning\n\nWhen a question asks about multiple dimensions of performance, call the individual specific APIs rather than relying on a single summary endpoint. Summary tools give a quick overview but often lack the granular detail needed for a complete answer.\n\n## Source Performance Questions\n\nWhen comparing or recommending sources across multiple metrics (SLA, volume, conversion, hires), call the specific tools:\n\n- **SLA performance** → `candidate_source_sla_per_source`\n- **Candidate volume and share** → `candidate_source_candidate_volume_by_source`\n- **Funnel conversion rates** (review %, interview %, offer acceptance %) → `candidate_source_funnel_conversion_by_source`\n- **Total hires by source** → `candidate_source_total_hires_by_source`\n\nDo NOT rely solely on `candidate_source_source_recommendation_summary` when the question asks for specific metrics like SLA percentages, offer acceptance rates, or conversion rates. The summary tool is useful for a quick recommendation but does not contain all granular metrics.\n\n## Requisition Count and Sample Size Questions\n\nThere are two different \"counts\" in the system; do not confuse them:\n\n- **\"How many requisitions were used to compute these metrics\"** or **\"sample size\"** → use `candidate_source_definitions_and_methodology`, which returns the total number of requisitions used for computation across the system\n- **\"How many similar requisitions were analysed\"** → use `candidate_source_metadata_and_timeframe`, which returns the count of requisitions similar to the given one\n\nThese return different numbers. Read the question carefully to determine which one is being asked for.\n\n## Skill Analysis Questions\n\nWhen a question asks about skill impact across multiple dimensions (SLA, fill rate, relevance):\n\n- **SLA impact** → `skills_skill_impact_sla`\n- **Fill rate impact** → `skills_skill_impact_fill_rate`\n- **Historical effectiveness and statistical analysis** → `skills_skill_analysis`\n- **Relevance justification** → `skills_skill_relevance_justification`\n\nIf a skill is not found in the analysis results, say so explicitly rather than guessing or inferring a negative impact.\n",
|
| 97 |
+
"steps": [],
|
| 98 |
+
"priority": 80,
|
| 99 |
+
"enabled": true
|
| 100 |
+
},
|
| 101 |
+
{
|
| 102 |
+
"type": "playbook",
|
| 103 |
+
"id": "playbook_average_vs_total",
|
| 104 |
+
"name": "Average vs Total Calculations",
|
| 105 |
+
"description": "Playbook for average vs total calculations questions",
|
| 106 |
+
"triggers": [
|
| 107 |
+
{
|
| 108 |
+
"type": "keyword",
|
| 109 |
+
"value": [
|
| 110 |
+
"average",
|
| 111 |
+
"usually",
|
| 112 |
+
"on average",
|
| 113 |
+
"typically",
|
| 114 |
+
"per posting",
|
| 115 |
+
"per requisition",
|
| 116 |
+
"how many candidates"
|
| 117 |
+
],
|
| 118 |
+
"target": "intent",
|
| 119 |
+
"case_sensitive": false,
|
| 120 |
+
"operator": "or"
|
| 121 |
+
},
|
| 122 |
+
{
|
| 123 |
+
"type": "natural_language",
|
| 124 |
+
"value": [
|
| 125 |
+
"user asks for average or typical values per posting",
|
| 126 |
+
"user asks how many candidates a posting usually gets"
|
| 127 |
+
],
|
| 128 |
+
"target": "intent",
|
| 129 |
+
"threshold": 0.65
|
| 130 |
+
}
|
| 131 |
+
],
|
| 132 |
+
"markdown_content": "# Average vs Total Calculations\n\nWhen the user asks for \"average\", \"typical\", \"usually\", or \"per posting\" values, you must compute an average; do not return a raw total.\n\n## How to compute averages\n\n1. Get the total metric value from the relevant API (e.g., total candidate volume from `candidate_source_candidate_volume_by_source`)\n2. Get the number of similar requisitions from `candidate_source_metadata_and_timeframe`\n3. Divide the total by the number of similar requisitions to get the per-requisition average\n4. Report the average, not the total\n\n## Example\n\nIf the user asks \"How many candidates do we usually get for postings similar to X?\":\n- Total candidates across all sources = 2913\n- Number of similar requisitions = 40\n- Average = 2913 / 40 = ~73 candidates per posting\n- Report: \"On average, similar postings attract 73 candidates\"\n\nDo NOT report 2913 as the answer; that is the total, not the average.\n",
|
| 133 |
+
"steps": [],
|
| 134 |
+
"priority": 70,
|
| 135 |
+
"enabled": true
|
| 136 |
+
},
|
| 137 |
+
{
|
| 138 |
+
"type": "playbook",
|
| 139 |
+
"id": "playbook_missing_req_id_vs_unsupported",
|
| 140 |
+
"name": "Missing Requisition ID vs Unsupported Query",
|
| 141 |
+
"description": "Playbook for missing requisition id vs unsupported query questions",
|
| 142 |
+
"triggers": [
|
| 143 |
+
{
|
| 144 |
+
"type": "keyword",
|
| 145 |
+
"value": [
|
| 146 |
+
"this job",
|
| 147 |
+
"this role",
|
| 148 |
+
"the position",
|
| 149 |
+
"for postings",
|
| 150 |
+
"for roles"
|
| 151 |
+
],
|
| 152 |
+
"target": "intent",
|
| 153 |
+
"case_sensitive": false,
|
| 154 |
+
"operator": "or"
|
| 155 |
+
},
|
| 156 |
+
{
|
| 157 |
+
"type": "natural_language",
|
| 158 |
+
"value": [
|
| 159 |
+
"user asks a question without specifying a requisition ID",
|
| 160 |
+
"user refers to a job or role without giving a specific ID"
|
| 161 |
+
],
|
| 162 |
+
"target": "intent",
|
| 163 |
+
"threshold": 0.6
|
| 164 |
+
}
|
| 165 |
+
],
|
| 166 |
+
"markdown_content": "# Missing Requisition ID vs Unsupported Query\n\nWhen a question does not include a requisition ID, determine whether providing one would actually help before asking for it.\n\n## Ask for a requisition ID when:\n\nThe question is about something the APIs support but needs a specific requisition to look up:\n- SLA performance by source\n- Candidate volume or hires by source\n- Skill analysis or skill impact\n- Funnel conversion rates\n- Data sources or methodology used\n- Metadata and timeframe\n\nThese all require a requisition ID to return useful results.\n\n## Do NOT ask for a requisition ID when:\n\nThe question is about something no API supports regardless of requisition ID:\n- Job description text (reading, optimizing, rewriting)\n- Time-to-fill metrics (overall or by source)\n- Geographic or location-based filtering\n- Live requisition status or SLA deadline countdowns\n- Stage-by-stage funnel timing (days in each stage)\n- Full job-card details (title, location, hiring-manager info)\n- Listing or searching across all open requisitions\n\nFor these, explain directly that the current APIs do not support the request. Asking for a requisition ID would be misleading because providing one would not help.\n",
|
| 167 |
+
"steps": [],
|
| 168 |
+
"priority": 85,
|
| 169 |
+
"enabled": true
|
| 170 |
+
},
|
| 171 |
+
{
|
| 172 |
+
"type": "tool_guide",
|
| 173 |
+
"id": "tool_guide_error_tool_warnings",
|
| 174 |
+
"name": "Error-Prone Tool Warnings",
|
| 175 |
+
"description": "Tool guide for error-prone endpoints",
|
| 176 |
+
"triggers": [],
|
| 177 |
+
"target_tools": [
|
| 178 |
+
"skills_skill_summary",
|
| 179 |
+
"candidate_source_source_sla_score",
|
| 180 |
+
"candidate_source_inactive_sources",
|
| 181 |
+
"candidate_source_candidate_pipeline_status",
|
| 182 |
+
"candidate_source_source_sla_check",
|
| 183 |
+
"candidate_source_funnel_status",
|
| 184 |
+
"candidate_source_bulk_source_data",
|
| 185 |
+
"skills_model_registry",
|
| 186 |
+
"skills_skill_lookup",
|
| 187 |
+
"candidate_source_source_metrics_lite",
|
| 188 |
+
"candidate_source_volume_report",
|
| 189 |
+
"candidate_source_full_candidate_details",
|
| 190 |
+
"candidate_source_source_directory",
|
| 191 |
+
"skills_skill_deep_analysis",
|
| 192 |
+
"candidate_source_sla_extended",
|
| 193 |
+
"skills_analyze_skill_match",
|
| 194 |
+
"candidate_source_requisition_details",
|
| 195 |
+
"candidate_source_list_all_sources",
|
| 196 |
+
"candidate_source_batch_metrics"
|
| 197 |
+
],
|
| 198 |
+
"guide_content": "# Error-Prone Tool Warning\n\nWARNING: This tool is known to be unreliable. It may return HTTP errors (e.g. 503 Service Unavailable), schema violations, type mismatches, or unexpected data formats.\n\nBefore using this tool, check whether one of the 13 core reliable tools can answer the question instead:\n\n**Reliable Candidate Source tools:** candidate_source_sla_per_source, candidate_source_total_hires_by_source, candidate_source_candidate_volume_by_source, candidate_source_funnel_conversion_by_source, candidate_source_metadata_and_timeframe, candidate_source_definitions_and_methodology, candidate_source_source_recommendation_summary\n\n**Reliable Skills tools:** skills_skill_analysis, skills_skill_impact_fill_rate, skills_skill_impact_sla, skills_skill_relevance_justification, skills_successful_posting_criteria, skills_data_sources_used\n\nIf this tool returns an error or unexpected data:\n- Do NOT report the raw error message to the user\n- Do NOT retry the same tool\n- Check if a reliable tool can provide the needed data\n- If no reliable tool can help, tell the user the data is not available through the current APIs\n",
|
| 199 |
+
"prepend": true,
|
| 200 |
+
"priority": 50,
|
| 201 |
+
"enabled": true
|
| 202 |
+
}
|
| 203 |
+
]
|
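
To make the trigger fields in `policies.json` more concrete, here is a rough Python sketch of how the `keyword` triggers might be evaluated. The matching semantics are assumptions, not something the file states: keywords are treated as substrings of the user's intent, `"operator": "or"` means any keyword suffices, higher `priority` values are ranked first, and `natural_language` triggers (which would need an embedding or similar model) are ignored. The function names and the trimmed `POLICIES` sample are hypothetical.

```python
def keyword_trigger_matches(trigger: dict, intent: str) -> bool:
    """Assumed semantics: substring match, optionally case-insensitive."""
    values = trigger["value"]
    if not trigger.get("case_sensitive", False):
        intent = intent.lower()
        values = [v.lower() for v in values]
    hits = [v in intent for v in values]
    return any(hits) if trigger.get("operator", "or") == "or" else all(hits)


def matching_policies(policies: list[dict], intent: str) -> list[dict]:
    """Return enabled policies with a matching keyword trigger, highest priority first."""
    matched = [
        p for p in policies
        if p.get("enabled", False)
        and any(
            t["type"] == "keyword" and keyword_trigger_matches(t, intent)
            for t in p.get("triggers", [])
        )
    ]
    return sorted(matched, key=lambda p: p["priority"], reverse=True)


# Trimmed sample in the shape of policies.json (only two keywords each).
POLICIES = [
    {"id": "playbook_average_vs_total", "priority": 70, "enabled": True,
     "triggers": [{"type": "keyword", "value": ["average", "usually"],
                   "case_sensitive": False, "operator": "or"}]},
    {"id": "playbook_api_capability_boundaries", "priority": 90, "enabled": True,
     "triggers": [{"type": "keyword", "value": ["time-to-fill", "France"],
                   "case_sensitive": False, "operator": "or"}]},
]

ids = [p["id"] for p in matching_policies(POLICIES, "What is the Time-to-Fill on average?")]
print(ids)
```

Under these assumptions a question that mentions both time-to-fill and an average would surface the capability-boundaries playbook ahead of the averaging playbook.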