Data analytics and Moodle, getting access to your data.

3 July 2023 by Catalyst

One of the many advantages of using an open source product like Moodle with a company like Catalyst, is we believe that your LMS data is yours. Over the years, we have had several clients come to us asking for copies of their LMS data in different formats.

It often comes as a surprise that we are able to facilitate this at all, and we suspect that some don’t ask exactly for what they need thinking its impossible.

We asked Alex Lawn, Catalyst IT Australia’s Director and Operations Manager, to talk us through the process.

“The way we expose our Internal data has been done in many different ways for our clients,” says Alex. “Let’s run through them, as well as the pros and cons of each.”

1. Access to nightly backups.

Every night, Catalyst takes a full Database dump using pg_dump or mydumper for mysql, we store these in an s3 bucket for long term archival. For some of our clients, we either setup replication or copy the backups to a bucket they control. From there, people can restore these into their own data lakes as required.

The pros of this is it’s inexpensive and easy to setup, given we already have these backups.

The only con here is that data can be up to 24 hours stale.

In some cases, we are able to convert the data format, or provide dumps of particular tables of interest in CSV format.

2. Sync backup to an external SFTP server.

For one of our clients, we take copies of nightly backups and push these through to a client managed SFTP server hosted on premise. For security, we do this via a VPN link. From here the client is able to ingest the data into their own Data lake nightly, and have additional peace of mind that a full backup is stored on premise to their own Data retention and storage standards.

The pros again is that we already have these backups. The setup of these data syncs will only take some overhead to script.

And the only con is again, that data can be up to 24 hours stale.

3. Realtime Read Replica

At Catalyst, we use AWS Aurora for our databases, this allows us to create a ‘realtime replica’ database and configure Moodle to never send traffic to that database. This allows our clients to make heavy use of the ‘read only’ database without impacting the LMS performance.

Normally, our Database servers are kept strictly internally only. However, depending on a client’s network setup, we can expose the dedicated read-replica in different ways. For example:

  • AWS VPC peering directly to your VPC where you might run tools like Power BI
  • AWS Transit gateway, connecting to your wider AWS network
  • Azure to AWS VPN, allowing connectivity from your private Azure networks to our reader database
  • Openvpn setting, allowing an arbitrary on-premise network to connect in a secure manor
  • Configure HA-proxy on a public IP, forwarding traffic to the reader endpoint, then accepting traffic on an SSL connection only from a customers static IP address

Using any one of these these techniques can enable access to our database which may only be 20 milliseconds behind the master database.

The pros here is that you get realtime data access and zero performance impact to your LMS.

The only con is that it’s a more complex setup, requiring a dedicated database which has a small monthly cost.

Our clients have used PowerBI and similar tools for things like:

  • Creating dashboards of student success / enrolment
  • Creating dashboards showing times assessments are due
  • Showing course completion dates
  • Identifying students who need additional assistance

Having the right data at your fingertips is an increasingly important part of constantly improving your operations and your future strategy for success. Please reach out to our team if the access to your LMS data is something you want to know more about.