Merge MOODLE_310_STABLE branch #204

Merged

merged 2 commits on Aug 12, 2025
12 changes: 12 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,12 @@
+# .github/workflows/ci.yml
+name: ci
+
+on: [push, pull_request]
+
+jobs:
+  ci:
+    uses: catalyst/catalyst-moodle-workflows/.github/workflows/ci.yml@main
+    secrets:
+      moodle_org_token: ${{ secrets.MOODLE_ORG_TOKEN }}
+    with:
+      disable_behat: true
114 changes: 0 additions & 114 deletions .travis.yml

This file was deleted.

21 changes: 12 additions & 9 deletions README.md
@@ -1,4 +1,6 @@
-[![Build Status](https://travis-ci.org/catalyst/moodle-tool_crawler.svg?branch=master)](https://travis-ci.org/catalyst/moodle-tool_crawler)
+[![ci](https://github.com/catalyst/moodle-tool_crawler/actions/workflows/ci.yml/badge.svg?branch=MOODLE_310_STABLE)](https://github.com/catalyst/moodle-tool_crawler/actions/workflows/ci.yml?branch=MOODLE_310_STABLE)

+# moodle-tool_crawler
+
* [What is this?](#what-is-this)
* [How does it work?](#how-does-it-work)
@@ -30,23 +32,24 @@ Since the plugin cronjob comes in from outside it needs to authenticate in Moodle

# Branches

-| Moodle verion | Branch |
-| ----------------- | ----------- |
-| Moodle 3.4 to 3.8 | master |
-| Totara 12+ | master |
+| Moodle version | Branch |
+| ----------------- | --------------------- |
+| Moodle 3.10+ | MOODLE_310_STABLE |
+| Moodle 3.4 to 3.9 | master |
+| Totara 12+ | master |

# Installation

The plugin has a dependency on the [moodle-auth_basic](https://moodle.org/plugins/auth_basic).
To install the dependency plugin as a git submodule:
```
-git submodule add https://github.com/catalyst/moodle-auth_basic auth/basic
+git submodule add git@github.com:catalyst/moodle-auth_basic.git auth/basic
```


Install plugin moodle-tool_crawler as a git submodule:
```
-git submodule add https://github.com/central-queensland-uni/moodle-tool_crawler.git admin/tool/crawler
+git submodule add git@github.com:catalyst/moodle-tool_crawler.git admin/tool/crawler
```
# Configuration

@@ -156,7 +159,7 @@ be able to see the line "You are logged in as ".
Once Basic HTTP auth works test running the robot task from the CLI:

```
-php admin/tool/task/cli/schedule_task.php --execute='\tool_crawler\task\crawl_task'
+php admin/cli/scheduled_task.php --execute='\tool_crawler\task\crawl_task'
Execute scheduled task: Parallel crawling task (tool_crawler\task\crawl_task)
... used 22 dbqueries
... used 0.039698123931885 seconds
@@ -168,7 +171,7 @@ will run in parallel, depending on the crawl_task setting.

You can manually run the adhoc tasks from the CLI with:
```
-php admin/tool/task/cli/adhoc_task.php --execute
+php admin/cli/adhoc_task.php --execute
Execute adhoc task: tool_crawler\task\adhoc_crawl_task
... used 5733 dbqueries
... used 58.239180088043 seconds
8 changes: 4 additions & 4 deletions classes/helper.php
@@ -252,18 +252,18 @@ public static function send_email($courseid) {
    /**
     * Count broken links
     *
-     * @param $courseid
+     * @param int $courseid
     * @throws \dml_exception
     */
-    public static function count_broken_links($courseid) {
+    public static function count_broken_links(int $courseid) {
        global $DB;
        $sql = "SELECT count(1) AS count
                  FROM {tool_crawler_url} b
             LEFT JOIN {tool_crawler_edge} l ON l.b = b.id
             LEFT JOIN {tool_crawler_url} a ON l.a = a.id
             LEFT JOIN {course} c ON c.id = a.courseid
-                WHERE b.httpcode != '200' AND c.id = $courseid";
-        return $DB->count_records_sql($sql);
+                WHERE b.httpcode != '200' AND c.id = :courseid";
+        return $DB->count_records_sql($sql, ['courseid' => $courseid]);
    }

}
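The substantive change in this hunk is swapping string interpolation for a bound parameter. A minimal sketch of the named-placeholder pattern in Moodle's DML layer, with a table and column chosen for illustration rather than taken from this plugin:

```php
// The :courseid token is bound by the database driver, so the value is never
// spliced into the SQL text, which is what made the old interpolated version
// injectable.
global $DB;

$sql = "SELECT COUNT(1)
          FROM {course}
         WHERE id = :courseid";

$count = $DB->count_records_sql($sql, ['courseid' => $courseid]);
```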
24 changes: 21 additions & 3 deletions classes/robot/crawler.php
@@ -529,6 +529,7 @@ public function process_queue($verbose = false) {
        // Iterate through the queue.
        $cronstart = time();
        $cronstop = $cronstart + $config->maxcrontime;
+        $hastime = true;

        // Get an instance of the currently configured lock_factory.
        $lockfactory = \core\lock\lock_config::get_lock_factory('tool_crawler_process_queue');
@@ -550,7 +551,16 @@
            }
        }
        // While we are not exceeding the maxcron time, and the queue is not empty.
-        while (time() < $cronstop) {
+        while ($hastime) {
+
+            if (\core\local\cli\shutdown::should_gracefully_exit() ||
+                    \core\task\manager::static_caches_cleared_since($cronstart)) {
+                if ($verbose) {
+                    echo "Shutting down crawler early\n";
+                }
+                return true;
+            }
+
            if (empty($nodes)) {
                // Grab a list of items from the front of the queue. We need the first 1000
                // in case other workers are already locked and processing items at the front of the queue.
@@ -625,6 +635,8 @@ public function process_queue($verbose = false) {
            } finally {
                $lock->release();
            }
+
+            $hastime = time() < $cronstop;
        }
        if ($courselock) {
            $courselock->release();
@@ -906,8 +918,9 @@ public function parse_html($node, $external, $verbose = false) {
            } while ($walk);

            $text = self::clean_html_node_content($e);
+            $text = trim($text);
            if ($verbose > 1) {
-                printf (" - Found link to: %-20s / %-50s => %-50s\n", $text, $e->href, $href);
+                printf (" - Found link to: %-30s -> %s\n", "'$text'", $href);
            }
            $this->link_from_node_to_url($node, $href, $text, $idattr);
        }
@@ -1134,7 +1147,12 @@ private static function determine_filesize($curlhandle, $method, $success, $body
    public function scrape($url) {

        global $CFG;
-        $cookiefilelocation = $CFG->dataroot . '/tool_crawler_cookies.txt';
+
+        static $cookiefilelocation = '';
+        if (!$cookiefilelocation) {
+            $cookiefilelocation = make_request_directory() . '/tool_crawler_cookies.txt';
+        }
+
        $config = self::get_config();

        $version = moodle_major_version();
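For context on the `scrape()` change: the cookie jar moves from a fixed file in `$CFG->dataroot`, shared by every crawler process, to a per-process file under `make_request_directory()`, which returns a unique writable directory that Moodle cleans up automatically. A sketch of how such a jar typically feeds into curl options; `$handle` is a hypothetical curl handle, not code from this diff:

```php
// One cookie file per PHP process: the static means every scrape in this
// process reuses the same jar, while parallel workers no longer share (and
// clobber) a single file in dataroot.
static $cookiefile = '';
if (!$cookiefile) {
    $cookiefile = make_request_directory() . '/cookies.txt';
}
curl_setopt($handle, CURLOPT_COOKIEJAR, $cookiefile);   // Where curl writes cookies.
curl_setopt($handle, CURLOPT_COOKIEFILE, $cookiefile);  // Where curl reads them back.
```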
29 changes: 16 additions & 13 deletions classes/table/course_links.php
@@ -30,19 +30,22 @@
use tool_crawler\helper;
use moodle_url;
use html_writer;
+use stdClass;

class course_links extends table_sql implements renderable {

    private $courseid;
+
+    private $page;
    /**
     * table constructor.
     *
-     * @param $uniqueid table unique id
+     * @param string $uniqueid table unique id
     * @param \moodle_url $url base url
     * @param int $courseid course id
     * @param int $page current page
     * @param int $perpage number of records per page
     * @throws \coding_exception
     */
    public function __construct($uniqueid, \moodle_url $url, $courseid, $page = 0, $perpage = 20) {
        parent::__construct($uniqueid);
@@ -165,30 +168,30 @@ public function query_db($pagesize, $useinitialsbar = true) {

    /**
     *
-     * @param $row
+     * @param stdClass $row
     * @return string
     */
-    protected function col_lastcrawledtime($row) {
+    protected function col_lastcrawledtime(stdClass $row) {
        return userdate($row->lastcrawled);
    }

    /**
     *
-     * @param $row
+     * @param stdClass $row
     * @return string
     * @throws \coding_exception
     */
-    protected function col_priority($row) {
+    protected function col_priority(stdClass $row) {
        return tool_crawler_priority_level($row->priority);
    }

    /**
     *
-     * @param $row
+     * @param stdClass $row
     * @return mixed
     * @throws \coding_exception
     */
-    protected function col_httpcode($row) {
+    protected function col_httpcode(stdClass $row) {
        $text = tool_crawler_http_code($row);
        if ($translation = \tool_crawler\helper::translate_httpcode($row->httpcode)) {
            $text .= "<br/>" . $translation;
@@ -198,11 +201,11 @@ protected function col_httpcode($row) {

    /**
     *
-     * @param $row
+     * @param stdClass $row
     * @return mixed
     * @throws \coding_exception
     */
-    protected function col_target($row) {
+    protected function col_target(stdClass $row) {
        $text = trim($row->text);
        if ($text == "") {
            $text = get_string('missing', 'tool_crawler');
@@ -216,11 +219,11 @@

    /**
     *
-     * @param $row
+     * @param stdClass $row
     * @return mixed
     * @throws \coding_exception
     */
-    protected function col_url($row) {
+    protected function col_url(stdClass $row) {
        return tool_crawler_link($row->url, $row->title, $row->redirect, false, $this->courseid);
    }

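The `col_*` signature changes above all follow the `table_sql` column-hook convention: for a column named `foo`, Moodle calls a `col_foo($row)` method, if one is defined, to produce the cell content. A minimal sketch of that convention; the class and column names here are hypothetical:

```php
// Illustrative subclass: 'lastcrawled' cells are formatted by the matching
// col_lastcrawled() hook, while any column without a col_* method falls back
// to printing the raw field from the row.
require_once($CFG->libdir . '/tablelib.php');

class example_links_table extends table_sql {
    protected function col_lastcrawled(stdClass $row): string {
        // Format the unix timestamp in the viewer's locale and timezone.
        return userdate($row->lastcrawled);
    }
}
```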
2 changes: 2 additions & 0 deletions cli/crawler.php
@@ -52,6 +52,8 @@
    die();
}

+\core\local\cli\shutdown::script_supports_graceful_exit();
+
tool_crawler_crawl($options['verbose']);
exit(0);
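Together with the `process_queue()` changes above, this wires up cooperative shutdown: the CLI script declares support once, and the long-running loop polls `should_gracefully_exit()` so a SIGINT/SIGTERM ends the run at a safe point rather than mid-crawl. A condensed sketch of the pattern, assuming the `\core\local\cli\shutdown` API used in this diff; the path and time budget are illustrative:

```php
<?php
define('CLI_SCRIPT', true);
require(__DIR__ . '/../../../../config.php'); // Illustrative path.

// Opt in once: from here on, SIGINT/SIGTERM set a flag instead of killing
// the process outright.
\core\local\cli\shutdown::script_supports_graceful_exit();

$stop = time() + 60; // Hypothetical time budget for this run.
while (time() < $stop) {
    // Bail out cleanly if a signal arrived; unfinished work stays queued.
    if (\core\local\cli\shutdown::should_gracefully_exit()) {
        break;
    }
    // ... process one unit of work ...
}
```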

1 change: 1 addition & 0 deletions db/uninstall.php
@@ -18,6 +18,7 @@
 * Link checker robot plugin uninstall script.
 *
 * @package    tool_crawler
+ * @copyright  2019 Nicolas Roeser
 * @license    http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
 */

5 changes: 5 additions & 0 deletions lang/en/tool_crawler.php
@@ -80,6 +80,11 @@
$string['crawlend'] = 'Crawl end';
$string['crawlstart'] = 'Crawl start';
$string['cronticks'] = 'Cron ticks';
+$string['debugging'] = 'Verbose debugging';
+$string['debugoff'] = 'Debugging off';
+$string['debugnormal'] = 'Normal debugging';
+$string['debugverbose'] = 'Verbose debugging';
+$string['debuggingdesc'] = 'This turns on debugging in the task output';
$string['disablebot'] = 'Disable the link crawler robot';
$string['disablebotdesc'] = 'Make the crawler do nothing when a scheduled task is executed. This effectively prevents crawling of links and running of bot cleanup functions. Intended to deactivate or temporarily pause the crawler without having to disable all its scheduled tasks.';
$string['duration'] = 'Duration';
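The five new strings imply a three-level debugging option. The admin setting itself is not part of this diff, so the following `settings.php` wiring is a hypothetical sketch; the setting name, default, and level values are assumptions:

```php
// Hypothetical admin setting using the new strings: a select with
// off / normal / verbose levels, stored as tool_crawler/debugging.
$settings->add(new admin_setting_configselect(
    'tool_crawler/debugging',
    get_string('debugging', 'tool_crawler'),
    get_string('debuggingdesc', 'tool_crawler'),
    0, // Default: debugging off (assumed).
    [
        0 => get_string('debugoff', 'tool_crawler'),
        1 => get_string('debugnormal', 'tool_crawler'),
        2 => get_string('debugverbose', 'tool_crawler'),
    ]
));
```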