-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Problem Description
The current implementation of the lod_links table enforces a unique constraint on the combination of source and external_id:
t.index ["source", "external_id"], name: "index_lod_links_on_source_and_external_id", unique: true
This constraint causes issues when multiple Site records with the same name attempt to create a lod_link pointing to the same Wikidata entity. Since the unique index does not account for the linkable_id and linkable_type columns in the polymorphic association, the following problems arise:
- Unique Constraint Violation:
- When two or more Site records try to link to the same source and external_id, a PG::UniqueViolation error occurs:
ActiveRecord::RecordNotUnique (PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_lod_links_on_source_and_external_id")
- Data Integrity Issues:
- The unique constraint does not align with the polymorphic nature of lod_links, leading to limitations in associating multiple Site records with the same Wikidata entity.
Current Workaround
To address this issue temporarily in production, a monkey patch was applied. The patch prevents duplicate creation by using find_or_initialize_by and skipping record creation if the lod_link already exists. This is not a long-term solution and only mitigates the problem in runtime code.
Proposed Solution
A permanent fix should be implemented in the master branch to properly handle the polymorphic nature of lod_links:
- Update the Unique Index:
Modify the unique index to include linkable_type and linkable_id, ensuring that source and external_id are unique within the scope of a specific linkable record. Migration example:
class UpdateIndexOnLodLinksToPolymorphic < ActiveRecord::Migration[7.0]
def change
remove_index :lod_links, name: "index_lod_links_on_source_and_external_id"
add_index :lod_links, [:linkable_type, :linkable_id, :source, :external_id],
unique: true, name: "index_lod_links_on_polymorphic_source_and_external_id"
end
end
Adjust Model Validations:
Ensure that the model reflects this uniqueness at the application level:
validates :source, uniqueness: { scope: [:external_id, :linkable_type, :linkable_id] }
- Refactor Batch Processing Logic:
Update the wikidata_match_candidates_batch method to handle these polymorphic constraints gracefully and avoid creating unnecessary duplicates:
def self.wikidata_match_candidates_batch(sites)
ActiveRecord::Base.transaction do
wikidata_results.each do |site_name, matches|
sites_for_name = sites.select { |s| s.name == site_name }
matches.each do |match|
lod_link = LodLink.find_or_create_by!(
source: "Wikidata",
external_id: match.qid,
linkable_type: "Site",
linkable_id: site.id
) do |link|
link.data = { label: match.label, description: match.description }
end
sites_for_name.each do |site|
site.lod_links << lod_link unless site.lod_links.include?(lod_link)
end
end
end
end
end
Action Items
- Update the lod_links schema to include a polymorphic index.
- Refactor the wikidata_match_candidates_batch method to align with the updated schema.
- Remove the monkey-patch from production once the changes are deployed to master.
Priority
High: The current issue causes runtime errors in production and relies on a temporary workaround. Proper resolution is needed to ensure database integrity and prevent duplicate errors.