Skip to content

Duplicate lod_links Entries Due to Missing Polymorphic Constraints #375

@MartinHinz

Description

@MartinHinz

Problem Description

The current implementation of the lod_links table enforces a unique constraint on the combination of source and external_id:

t.index ["source", "external_id"], name: "index_lod_links_on_source_and_external_id", unique: true

This constraint causes issues when multiple Site records with the same name attempt to create a lod_link pointing to the same Wikidata entity. Since the unique index does not account for the linkable_id and linkable_type columns in the polymorphic association, the following problems arise:

  1. Unique Constraint Violation:
  • When two or more Site records try to link to the same source and external_id, a PG::UniqueViolation error occurs:
    ActiveRecord::RecordNotUnique (PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_lod_links_on_source_and_external_id")
  1. Data Integrity Issues:
  • The unique constraint does not align with the polymorphic nature of lod_links, leading to limitations in associating multiple Site records with the same Wikidata entity.

Current Workaround

To address this issue temporarily in production, a monkey patch was applied. The patch prevents duplicate creation by using find_or_initialize_by and skipping record creation if the lod_link already exists. This is not a long-term solution and only mitigates the problem in runtime code.

Proposed Solution

A permanent fix should be implemented in the master branch to properly handle the polymorphic nature of lod_links:

  1. Update the Unique Index:
    Modify the unique index to include linkable_type and linkable_id, ensuring that source and external_id are unique within the scope of a specific linkable record. Migration example:
class UpdateIndexOnLodLinksToPolymorphic < ActiveRecord::Migration[7.0]
  def change
    remove_index :lod_links, name: "index_lod_links_on_source_and_external_id"

    add_index :lod_links, [:linkable_type, :linkable_id, :source, :external_id],
              unique: true, name: "index_lod_links_on_polymorphic_source_and_external_id"
  end
end

Adjust Model Validations:
Ensure that the model reflects this uniqueness at the application level:

validates :source, uniqueness: { scope: [:external_id, :linkable_type, :linkable_id] }

  1. Refactor Batch Processing Logic:
    Update the wikidata_match_candidates_batch method to handle these polymorphic constraints gracefully and avoid creating unnecessary duplicates:
def self.wikidata_match_candidates_batch(sites)
  ActiveRecord::Base.transaction do
    wikidata_results.each do |site_name, matches|
      sites_for_name = sites.select { |s| s.name == site_name }
      matches.each do |match|
        lod_link = LodLink.find_or_create_by!(
          source: "Wikidata",
          external_id: match.qid,
          linkable_type: "Site",
          linkable_id: site.id
        ) do |link|
          link.data = { label: match.label, description: match.description }
        end

        sites_for_name.each do |site|
          site.lod_links << lod_link unless site.lod_links.include?(lod_link)
        end
      end
    end
  end
end

Action Items

  1. Update the lod_links schema to include a polymorphic index.
  2. Refactor the wikidata_match_candidates_batch method to align with the updated schema.
  3. Remove the monkey-patch from production once the changes are deployed to master.

Priority

High: The current issue causes runtime errors in production and relies on a temporary workaround. Proper resolution is needed to ensure database integrity and prevent duplicate errors.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions