This article describes Validation plug-in v4. If you're using an earlier version, see the release notes for details of what changed.
The Validate URLs workflow action uses HTTP response status codes to check for broken web links. HTTP response status codes show whether a specific HTTP request has been successfully completed.
| HTTP response | Log status |
|---|---|
| Informational responses (100–199) | Info |
| Successful responses (200–299) | Info |
| Temporary redirects (302, 307) | Info |
| Permanent redirects (301, 308) | Warn |
| Restricted or conditionally unavailable (401, 403, 405, 406, 412, 415, 417, 429, 451) | Warn |
| Broken or gone (400, 404, 410) | Error |
| Server errors (500–599) | Error |
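
The mapping in the table above can be sketched as a small lookup function. This is only an illustration of the table, not the plug-in's own code: the function name `log_status` and the `"Unlisted"` fallback for codes the table does not mention (for example 303) are assumptions.

```python
def log_status(code: int) -> str:
    """Return the log status that the table above assigns to an HTTP status code."""
    temporary_redirects = {302, 307}
    permanent_redirects = {301, 308}
    restricted = {401, 403, 405, 406, 412, 415, 417, 429, 451}
    broken_or_gone = {400, 404, 410}

    if code in broken_or_gone or 500 <= code <= 599:
        return "Error"
    if code in permanent_redirects or code in restricted:
        return "Warn"
    if 100 <= code <= 299 or code in temporary_redirects:
        return "Info"
    # Assumption: the table does not say how other codes are logged.
    return "Unlisted"


for code in (200, 301, 307, 404, 429, 503):
    print(code, log_status(code))
```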
Because the Validate URLs workflow action validates web links asynchronously (some links validate more quickly than others), messages in the Typefi Server workflow log may not appear in the same order as the URLs in the input Content XML file.
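
To see why asynchronous checking changes the log order, consider the following minimal sketch. It does not make real HTTP requests: the `check` coroutine, the example URLs, and the delays are all hypothetical stand-ins that simulate links responding at different speeds.

```python
import asyncio


async def check(url: str, delay: float) -> str:
    # Stand-in for an HTTP request; the delay simulates the response time.
    await asyncio.sleep(delay)
    return url


async def completion_order(links):
    # Start every check at once, then collect results as each one finishes.
    tasks = [asyncio.create_task(check(url, delay)) for url, delay in links]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
    return finished


# Input order: a, b, c. The slowest link comes first.
links = [
    ("https://a.example", 0.15),
    ("https://b.example", 0.05),
    ("https://c.example", 0.10),
]
print(asyncio.run(completion_order(links)))
```

The results are collected in completion order (b, c, a), which is the effect you see in the workflow log.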
Configure the Validate URLs workflow action
| Field | Description |
|---|---|
| Input | Choose the Content XML (CXML) file to be checked for broken web links. |
| Output | Choose a CXML file to write validation results to. A comment is inserted at the location of each flagged link, showing HTTP status code and reason. For redirected links, the comment includes the destination URL. If no output file is specified, the action logs results only. Note: Rendering comments in a Word document requires the Microsoft Word plug-in v26 or later. |
| Convert plain-text URLs | Select this checkbox to detect plain-text URLs in the document body and convert them to live hyperlinks before validation runs. URLs starting with Converted links are kept in the output document regardless of validation results. Requires an Output CXML file. |
| Severity filter | Controls which results produce comments in the output CXML file. Choose Errors and warnings (default) to annotate both failed links and redirects, or Errors only to annotate failed links only. The action log always records all results regardless of this setting. Requires an Output CXML file. |
| Validation rules | Upload a plain-text file (.txt) that defines redirect policies and rate limits for specific domains. For more information, see Validation rules file format. |
| Fail on Error | Select Fail on Error to fail the job when a URL cannot be reached. Fail on Error is selected by default. |
Because HTTP requests can take from 30 to 120 seconds to time out, the Validate URLs workflow action may take a long time to finish when the document contains many links.
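
As a rough, back-of-envelope illustration of that timing, the sketch below estimates an upper bound for the case where every link in a batch times out. The batching model, the function name `worst_case_seconds`, and the figure of 40 unreachable links are assumptions made for the example. The action's actual scheduling is not described here.

```python
import math


def worst_case_seconds(unreachable_links: int, timeout_s: int, concurrent: int) -> int:
    """Upper bound if every link times out and links are checked `concurrent` at a time."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    batches = math.ceil(unreachable_links / concurrent)
    return batches * timeout_s


# Hypothetical document: 40 unreachable links, checked 2 at a time.
print(worst_case_seconds(40, 30, 2) / 60)   # minutes with a 30-second timeout
print(worst_case_seconds(40, 120, 2) / 60)  # minutes with a 120-second timeout
```

Under these assumptions the estimate ranges from 10 to 40 minutes, which is why a document with many dead links can noticeably slow a job.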
Validation rules file format
The validation rules file is a plain-text file that defines how the Validate URLs workflow action handles redirects and controls requests for specific domains.
Use the following sections to organise your rules:
| Section | Description |
|---|---|
| [ALLOW] | Redirects from matching domains are treated as successful responses, and do not generate warnings or comments. |
| [DENY] | Redirects from matching domains are treated as errors. |
| [RATE LIMIT] | Sets global rate limiting for all domains. |
| [RATE LIMIT: pattern] | Sets rate limiting for a specific domain, overriding the global setting. |
Lines outside any section header default to [ALLOW]. Lines starting with # are treated as comments and ignored.
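
The section rules above can be sketched as a small line-based parser. This is an illustration only, not the plug-in's parser: the function name `parse_rules`, the returned dictionary layout, and the `"(global)"` key are hypothetical, and the `Key=Value` form for rate limit options is taken from the example later in this article.

```python
import re

# Matches [ALLOW], [DENY], [RATE LIMIT] and [RATE LIMIT: pattern].
HEADER = re.compile(r"^\[(?:(ALLOW|DENY)|RATE LIMIT(?::\s*(.+?))?)\]$")
GLOBAL = "(global)"  # hypothetical key for the plain [RATE LIMIT] section


def parse_rules(text: str) -> dict:
    rules = {"allow": [], "deny": [], "rate_limit": {}}
    section, scope = "ALLOW", GLOBAL  # lines before any header default to [ALLOW]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        header = HEADER.match(line)
        if header:
            section = header.group(1) or "RATE LIMIT"
            scope = (header.group(2) or GLOBAL).strip()
            continue
        if section == "ALLOW":
            rules["allow"].append(line)
        elif section == "DENY":
            rules["deny"].append(line)
        else:
            key, _, value = line.partition("=")
            rules["rate_limit"].setdefault(scope, {})[key.strip()] = value.strip()
    return rules


sample = """\
*.typefi.com
# a comment
[DENY]
*.example.com
[RATE LIMIT: *.doi.org]
MaxConcurrent=1
"""
print(parse_rules(sample))
```

Note that `*.typefi.com` appears before any header, so the sketch files it under the allow list.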
Domain patterns
Use *.domain.com to match a domain and all its subdomains. For example, *.doi.org matches doi.org, dx.doi.org, and any other subdomain. Exact domain names (for example, example.com) match only that domain.
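
The matching behaviour described above can be expressed in a few lines. The function name `matches` is hypothetical, and the case-insensitive comparison is an assumption (host names are case-insensitive in DNS, but the plug-in's handling is not stated here).

```python
def matches(pattern: str, host: str) -> bool:
    """Return True if `host` is covered by a domain pattern from the rules file."""
    pattern = pattern.lower()
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        base = pattern[2:]
        # A wildcard covers the bare domain and any subdomain of it.
        return host == base or host.endswith("." + base)
    # An exact name covers only that domain.
    return host == pattern


print(matches("*.doi.org", "doi.org"))             # True
print(matches("*.doi.org", "dx.doi.org"))          # True
print(matches("example.com", "www.example.com"))   # False
```

Comparing against `"." + base` rather than `base` alone keeps an unrelated host such as `notdoi.org` from matching `*.doi.org`.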
Rate limit options
| Option | Default | Description |
|---|---|---|
| MaxConcurrent | 2 | Maximum number of concurrent requests to this domain. |
| RequestDelay | 0 | Minimum time in milliseconds between successive requests to this domain. |
| RateLimitSeverity | Warn | Severity when rate-limit threshold is exceeded. Accepted values: Warn, Error. |
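
To illustrate how MaxConcurrent and RequestDelay could interact, here is a minimal sketch of a per-domain limiter. It is not the plug-in's implementation: the class `DomainLimiter`, the `demo` coroutine, and the simulated requests are hypothetical, and the sketch assumes RequestDelay is measured between the start times of successive requests. RateLimitSeverity is not modelled.

```python
import asyncio
import time


class DomainLimiter:
    """Caps concurrent requests and spaces out request start times for one domain."""

    def __init__(self, max_concurrent: int = 2, request_delay_ms: int = 0):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._delay = request_delay_ms / 1000
        self._gate = asyncio.Lock()
        self._next_start = 0.0

    async def run(self, request):
        async with self._slots:          # at most max_concurrent requests in flight
            async with self._gate:       # one request at a time passes the delay gate
                wait = self._next_start - time.monotonic()
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = time.monotonic() + self._delay
            return await request()


async def demo():
    limiter = DomainLimiter(max_concurrent=2, request_delay_ms=50)
    starts, active, peak = [], 0, 0

    async def fake_request():
        # Simulated request that records when it started and how many were active.
        nonlocal active, peak
        starts.append(time.monotonic())
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.2)
        active -= 1

    await asyncio.gather(*(limiter.run(fake_request) for _ in range(4)))
    return starts, peak


starts, peak = asyncio.run(demo())
print("peak concurrency:", peak)
print("gaps (s):", [round(b - a, 2) for a, b in zip(starts, starts[1:])])
```

The semaphore enforces the concurrency cap, while the lock and timestamp keep successive starts at least the configured delay apart.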
Example
    # Allow redirects from trusted domains
    [ALLOW]
    *.doi.org
    *.typefi.com

    # Treat redirects from these domains as errors
    [DENY]
    *.example.com

    # Global rate limit
    [RATE LIMIT]
    MaxConcurrent=3
    RequestDelay=200

    # Per-domain rate limit override
    [RATE LIMIT: *.doi.org]
    MaxConcurrent=1
    RequestDelay=1000