-
Notifications
You must be signed in to change notification settings - Fork 1.5k
hunspell: don't fail when .dic exists in the temp dir and .aff doesn't; add a way to avoid temp file creation completely (with Lucene) #11359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThis change refactors the Hunspell dictionary creation mechanism from being file-based to stream-based. It introduces a new Factory interface for creating HunspellDictionary instances from InputStreams, updates related initialization logic in HunspellRule, and removes temporary file extraction for resource-based dictionaries, streamlining dictionary and affix file handling. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (2)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java (2)
75-83
: Potential collision when using filename as “language” code
dictionary.getFileName().toString()
includes the.dic
suffix and is used as the prefix forcreateTempFile
.
If two paths share the same filename (e.g., both aredefault.dic
in different directories), their temp prefixes collide, producing confusing names likedefault.dic123456
. It also deviates from the semantic “language code” parameter expected by custom factories.Consider passing only the stem without extension:
- .create(dictionary.getFileName().toString(), dic, aff); + .create(FilenameUtils.removeExtension(dictionary.getFileName().toString()), dic, aff);This keeps the parameter meaningful and unique.
90-97
: Fail-fast message could be more preciseThe thrown
RuntimeException
reports “Could not find the dictionary … in the classpath” even when only the.aff
is missing.
Return separate messages (or include which stream isnull
) to aid debugging:if (dic == null) { throw new RuntimeException("Missing .dic for language \"" + language + "\" in classpath: " + dicPath); } if (aff == null) { throw new RuntimeException("Missing .aff for language \"" + language + "\" in classpath: " + affPath); }Minor but saves users time when diagnosing mis-packaged resources.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java
(3 hunks)languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/HunspellRule.java
(3 hunks)
private static Pattern computNonWordPattern(InputStream stream) { | ||
try (Scanner sc = new Scanner(stream)) { | ||
while (sc.hasNextLine()) { | ||
String line = sc.nextLine(); | ||
if (line.startsWith("WORDCHARS ")) { | ||
String wordCharsFromAff = line.substring("WORDCHARS ".length()); | ||
//System.out.println("#" + wordCharsFromAff+ "#"); | ||
return Pattern.compile("(?![" + wordCharsFromAff.replace("-", "\\-") + "])" + NON_ALPHABETIC); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Escape all regex-meta characters in WORDCHARS
Currently only the dash (-
) is escaped when building the character class.
Characters like ^
, \
, ]
, or [
inside WORDCHARS
will break the pattern and throw PatternSyntaxException
.
Consider escaping every char that has a special meaning in a character class:
private static String escapeForCharClass(String s) {
StringBuilder sb = new StringBuilder();
for (char c : s.toCharArray()) {
if ("\\^[]-".indexOf(c) >= 0) sb.append('\\');
sb.append(c);
}
return sb.toString();
}
...
return Pattern.compile("(?![" + escapeForCharClass(wordCharsFromAff) + "])" + NON_ALPHABETIC);
This makes the method robust for all dictionaries.
Path affPath = Paths.get(shortDicPath + ".aff"); | ||
hunspell = Hunspell.getDictionary(Paths.get(shortDicPath + ".dic"), affPath); | ||
} | ||
} | ||
if (affPath != null) { | ||
nonWordPattern = nonWordPatterns.computeIfAbsent(affPath, (path) -> { | ||
String wordChars = ""; | ||
if (path != null) { | ||
try (Scanner sc = new Scanner(path)) { | ||
while (sc.hasNextLine()) { | ||
String line = sc.nextLine(); | ||
if (line.startsWith("WORDCHARS ")) { | ||
String wordCharsFromAff = line.substring("WORDCHARS ".length()); | ||
//System.out.println("#" + wordCharsFromAff+ "#"); | ||
wordChars = "(?![" + wordCharsFromAff.replace("-", "\\-") + "])"; | ||
break; | ||
} | ||
} | ||
nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> { | ||
try (InputStream stream = Files.newInputStream(affPath)) { | ||
return computNonWordPattern(stream); | ||
} catch (IOException e) { | ||
logger.error("Could not read aff file: " + path, e); | ||
logger.error("Could not read aff file: " + affPath, e); | ||
} | ||
} | ||
return Pattern.compile(wordChars + NON_ALPHABETIC); | ||
}); | ||
return NON_ALPHABETIC_PATTERN; | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
aff
file may be absent on disk – avoid NoSuchFileException
Files.newInputStream(affPath)
will fail if the .aff
vanished.
Mirror the in-resource handling by checking Files.exists
first, falling back to the default pattern when the file is missing.
- Path affPath = Paths.get(shortDicPath + ".aff");
- hunspell = Hunspell.getDictionary(Paths.get(shortDicPath + ".dic"), affPath);
- nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> {
- try (InputStream stream = Files.newInputStream(affPath)) {
+ Path affPath = Paths.get(shortDicPath + ".aff");
+ hunspell = Hunspell.getDictionary(Paths.get(shortDicPath + ".dic"), affPath);
+ nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> {
+ if (!Files.exists(affPath)) {
+ logger.warn("Missing .aff file at {}, falling back to generic pattern", affPath);
+ return NON_ALPHABETIC_PATTERN;
+ }
+ try (InputStream stream = Files.newInputStream(affPath)) {
return computNonWordPattern(stream);
} catch (IOException e) {
Prevents unexpected start-up failures on corrupt installations.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Path affPath = Paths.get(shortDicPath + ".aff"); | |
hunspell = Hunspell.getDictionary(Paths.get(shortDicPath + ".dic"), affPath); | |
} | |
} | |
if (affPath != null) { | |
nonWordPattern = nonWordPatterns.computeIfAbsent(affPath, (path) -> { | |
String wordChars = ""; | |
if (path != null) { | |
try (Scanner sc = new Scanner(path)) { | |
while (sc.hasNextLine()) { | |
String line = sc.nextLine(); | |
if (line.startsWith("WORDCHARS ")) { | |
String wordCharsFromAff = line.substring("WORDCHARS ".length()); | |
//System.out.println("#" + wordCharsFromAff+ "#"); | |
wordChars = "(?![" + wordCharsFromAff.replace("-", "\\-") + "])"; | |
break; | |
} | |
} | |
nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> { | |
try (InputStream stream = Files.newInputStream(affPath)) { | |
return computNonWordPattern(stream); | |
} catch (IOException e) { | |
logger.error("Could not read aff file: " + path, e); | |
logger.error("Could not read aff file: " + affPath, e); | |
} | |
} | |
return Pattern.compile(wordChars + NON_ALPHABETIC); | |
}); | |
return NON_ALPHABETIC_PATTERN; | |
}); | |
} | |
Path affPath = Paths.get(shortDicPath + ".aff"); | |
hunspell = Hunspell.getDictionary(Paths.get(shortDicPath + ".dic"), affPath); | |
nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> { | |
if (!Files.exists(affPath)) { | |
logger.warn("Missing .aff file at {}, falling back to generic pattern", affPath); | |
return NON_ALPHABETIC_PATTERN; | |
} | |
try (InputStream stream = Files.newInputStream(affPath)) { | |
return computNonWordPattern(stream); | |
} catch (IOException e) { | |
logger.error("Could not read aff file: " + affPath, e); | |
} | |
return NON_ALPHABETIC_PATTERN; | |
}); | |
} |
if (JLanguageTool.getDataBroker().resourceExists(shortDicPath)) { | ||
String path = getDictionaryPath(langCountry, shortDicPath); | ||
if ("".equals(path)) { | ||
hunspell = null; | ||
} else { | ||
affPath = Paths.get(path + ".aff"); | ||
hunspell = Hunspell.getDictionary(Paths.get(path + ".dic"), affPath); | ||
addIgnoreWords(); | ||
} | ||
String aff = shortDicPath.substring(0, shortDicPath.length() - 3) + "aff"; | ||
hunspell = Hunspell.forDictionaryInResources(langCountry, shortDicPath, aff); | ||
addIgnoreWords(); | ||
nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> | ||
computNonWordPattern(JLanguageTool.getDataBroker().getFromResourceDirAsStream(aff))); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gracefully handle missing .aff
resource to avoid NPE
getFromResourceDirAsStream(aff)
may return null
when the .dic
exists but the .aff
doesn’t – exactly the scenario described in #11321.
Passing a null
stream into computNonWordPattern(..)
will throw a NullPointerException
, undoing the intent of this PR.
- nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ ->
- computNonWordPattern(JLanguageTool.getDataBroker().getFromResourceDirAsStream(aff)));
+ InputStream affStream = JLanguageTool.getDataBroker().getFromResourceDirAsStream(aff);
+ nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> {
+ if (affStream == null) { // fallback: no WORDCHARS in aff
+ logger.warn("Missing .aff for {}, using default non-alphabetic pattern", shortDicPath);
+ return NON_ALPHABETIC_PATTERN;
+ }
+ return computNonWordPattern(affStream);
+ });
This keeps the rule operational even when only the dictionary is present.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (JLanguageTool.getDataBroker().resourceExists(shortDicPath)) { | |
String path = getDictionaryPath(langCountry, shortDicPath); | |
if ("".equals(path)) { | |
hunspell = null; | |
} else { | |
affPath = Paths.get(path + ".aff"); | |
hunspell = Hunspell.getDictionary(Paths.get(path + ".dic"), affPath); | |
addIgnoreWords(); | |
} | |
String aff = shortDicPath.substring(0, shortDicPath.length() - 3) + "aff"; | |
hunspell = Hunspell.forDictionaryInResources(langCountry, shortDicPath, aff); | |
addIgnoreWords(); | |
nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> | |
computNonWordPattern(JLanguageTool.getDataBroker().getFromResourceDirAsStream(aff))); | |
if (JLanguageTool.getDataBroker().resourceExists(shortDicPath)) { | |
String aff = shortDicPath.substring(0, shortDicPath.length() - 3) + "aff"; | |
hunspell = Hunspell.forDictionaryInResources(langCountry, shortDicPath, aff); | |
addIgnoreWords(); | |
InputStream affStream = JLanguageTool.getDataBroker().getFromResourceDirAsStream(aff); | |
nonWordPattern = nonWordPatterns.computeIfAbsent(shortDicPath, __ -> { | |
if (affStream == null) { // fallback: no WORDCHARS in aff | |
logger.warn("Missing .aff for {}, using default non-alphabetic pattern", shortDicPath); | |
return NON_ALPHABETIC_PATTERN; | |
} | |
return computNonWordPattern(affStream); | |
}); | |
} |
private static Factory viaTempFiles(BiFunction<Path, Path, HunspellDictionary> factory) { | ||
return (language, dictionaryStream, affixStream) -> { | ||
Path dictionary = Files.createTempFile(language, ".dic"); | ||
Path affix = Files.createTempFile(language, ".aff"); | ||
Files.copy(dictionaryStream, dictionary, StandardCopyOption.REPLACE_EXISTING); | ||
Files.copy(affixStream, affix, StandardCopyOption.REPLACE_EXISTING); | ||
try { | ||
return factory.apply(dictionary, affix); | ||
} finally { | ||
Files.deleteIfExists(dictionary); | ||
Files.deleteIfExists(affix); | ||
} | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting temp files immediately can break on Windows / memory-mapped impls
viaTempFiles
deletes the newly-created .dic
/.aff
right after the dictionary is built:
Files.deleteIfExists(dictionary);
Files.deleteIfExists(affix);
On Windows this will throw AccessDeniedException
when the native Hunspell DLL still holds the files open.
For Lucene-based factories that memory-map the files, removing them invalidates the mmap’d buffer.
Safer options:
- Use
File.deleteOnExit()
and keep the files for the lifetime of the JVM. - Delegate deletion to
HunspellDictionary#close()
(delete when the dictionary is closed). - At minimum, swallow IOExceptions here and log a warning.
- } finally {
- Files.deleteIfExists(dictionary);
- Files.deleteIfExists(affix);
+ } finally {
+ dictionary.toFile().deleteOnExit();
+ affix.toFile().deleteOnExit();
}
Please reconsider; otherwise cross-platform stability may regress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not tested, yet, but is it safe to delete the files directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Let's first look at the DumontsHunspellDictionary implementation
rg -A 5 "class DumontsHunspellDictionary" --context=20
# Let's also check if there are any close methods and how they handle resources
rg -A 5 "void close" --type java
# And let's check the HunspellDictionary interface
rg -A 10 "interface HunspellDictionary" --type java
# Check for any memory-mapped usage
rg "map\(" --type java
# Any Lucene integrations that might be using memory-mapped files
rg "Lucene" --type java
# Check for any file locking or similar mechanisms
rg "FileLock|lock" --type java
Length of output: 65953
@SteVio89 No, it's not safe to delete the files immediately after creating the dictionary. There are two main issues:
-
Windows file locking: The native Hunspell library (used by DumontsHunspellDictionary) likely keeps the files open. Windows doesn't allow deleting files that are still open, which would cause
AccessDeniedException
. -
Memory-mapped files: The PR mentions Lucene integration - Lucene typically uses memory-mapped files. Deleting a file that's memory-mapped invalidates the mapping and can cause crashes or undefined behavior.
The reviewer's suggestion to use deleteOnExit()
is appropriate as it:
- Ensures files are cleaned up when the JVM exits
- Doesn't interfere with active file handles
- Works cross-platform
dictionary.toFile().deleteOnExit();
affix.toFile().deleteOnExit();
Alternatively, you could delegate deletion to the dictionary's close()
method since it knows when it's done with the files.
🧬 Code Graph Analysis Results
Relevant snippet from languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java
(lines 1-109)
package org.languagetool.rules.spelling.hunspell;
import org.languagetool.JLanguageTool;
import org.languagetool.broker.ResourceDataBroker;
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.BiFunction;
public final class Hunspell {
static class LanguageAndPath {
private final Path dictionary;
private final Path affix;
LanguageAndPath(Path dictionary, Path affix) {
this.dictionary = Objects.requireNonNull(dictionary);
this.affix = Objects.requireNonNull(affix);
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
LanguageAndPath that = (LanguageAndPath) o;
return Objects.equals(dictionary, that.dictionary) &&
Objects.equals(affix, that.affix);
}
@Override
public int hashCode() {
return Objects.hash(dictionary, affix);
}
}
private static final Map<LanguageAndPath, HunspellDictionary> map = new HashMap<>();
private static Factory hunspellDictionaryFactory = viaTempFiles(DumontsHunspellDictionary::new);
/**
* @deprecated Use {@link #setHunspellStreamFactory}
*/
@Deprecated
public static void setHunspellDictionaryFactory(BiFunction<Path, Path, HunspellDictionary> factory) {
hunspellDictionaryFactory = viaTempFiles(factory);
}
private static Factory viaTempFiles(BiFunction<Path, Path, HunspellDictionary> factory) {
return (language, dictionaryStream, affixStream) -> {
Path dictionary = Files.createTempFile(language, ".dic");
Path affix = Files.createTempFile(language, ".aff");
Files.copy(dictionaryStream, dictionary, StandardCopyOption.REPLACE_EXISTING);
Files.copy(affixStream, affix, StandardCopyOption.REPLACE_EXISTING);
try {
return factory.apply(dictionary, affix);
} finally {
Files.deleteIfExists(dictionary);
Files.deleteIfExists(affix);
}
};
}
/**
* Set a custom way to create Hunspell dictionaries,
* e.g., more efficient or portable than the default, possibly via Apache Lucene.
* The default one is to use a native wrapper over the real Hunspell binary,
* creating temporary files from the streams.
*/
public static void setHunspellStreamFactory(Factory factory) {
hunspellDictionaryFactory = factory;
}
public static synchronized HunspellDictionary getDictionary(Path dictionary, Path affix) {
LanguageAndPath key = new LanguageAndPath(dictionary, affix);
HunspellDictionary hunspell = map.get(key);
if (hunspell != null && !hunspell.isClosed()) {
return hunspell;
}
try(var dic = Files.newInputStream(dictionary); var aff = Files.newInputStream(affix)) {
HunspellDictionary newHunspell = hunspellDictionaryFactory
.create(dictionary.getFileName().toString(), dic, aff);
map.put(key, newHunspell);
return newHunspell;
} catch (IOException e) {
throw new RuntimeException(e);
}
}
public static HunspellDictionary forDictionaryInResources(String language, String resourcePath) {
return forDictionaryInResources(language, resourcePath + language + ".dic", resourcePath + language + ".aff");
}
public static HunspellDictionary forDictionaryInResources(String language, String dicPath, String affPath) {
ResourceDataBroker broker = JLanguageTool.getDataBroker();
try (var dic = broker.getFromResourceDirAsStream(dicPath); var aff = broker.getFromResourceDirAsStream(affPath)) {
if (dic == null || aff == null) {
throw new RuntimeException("Could not find the dictionary for language \"" + language + "\" in the classpath");
}
return hunspellDictionaryFactory.create(language, dic, aff);
} catch (IOException e) {
throw new RuntimeException("Could not create temporary dictionaries for language \"" + language + "\"", e);
}
}
public static HunspellDictionary forDictionaryInResources(String language) {
return forDictionaryInResources(language, "");
}
public interface Factory {
HunspellDictionary create(String languageCode, InputStream dictionary, InputStream affix) throws IOException;
}
}
Summary of relevant details from the above snippet:
- The class
Hunspell
manages Hunspell dictionaries keyed by a combination of dictionary and affix file paths. - It uses a
Factory
interface to createHunspellDictionary
instances from dictionary and affix input streams. - The default factory implementation (
viaTempFiles
) creates temporary files for the dictionary and affix streams, then deletes these temporary files immediately after creating the dictionary. - The temporary files are deleted using
Files.deleteIfExists()
in afinally
block, ensuring cleanup happens even if dictionary creation throws. - There is a static map caching created
HunspellDictionary
instances to avoid reloading. - The method
getDictionary(Path dictionary, Path affix)
opens input streams from the given paths and uses the factory to create or retrieve a cached dictionary. - The method
forDictionaryInResources
loads dictionary and affix files from resources and creates a dictionary using the factory. - The user comment "Not tested, yet, but is it safe to delete the files directly?" relates to the temporary file deletion in
viaTempFiles
method, which deletes the temporary files immediately after dictionary creation.
No other code snippets from the provided possibly_relevant_code_snippets
are directly relevant to the question about deleting files in Hunspell.java
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I've seen in the native code of Hunspell, the files are read and closed. The same happens in Lucene. We can also add a Javadoc requirement that the streams should be closed by the call return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Thanks @donnerpeter, will do some tests before merging again :) starting next week. |
String dictionaryPath; | ||
//in the webstart, java EE or OSGi bundle version, we need to copy the files outside the jar | ||
//to the local temporary directory | ||
if (StringUtils.equalsAny(dictURL.getProtocol(), "jar", "vfs", "bundle", "bundleresource")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this check in your change, too? When running LT on our Servers, we have all language-module jar extracted in our file system see this. This means we don't need the copy to /tmp
, before using the files.
On the contrary, it creates unnecessary files.
Sorry for not seeing this in the first review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'd check for the file://
protocol though, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java (1)
45-66
: File deletion strategy still presents cross-platform concernsThe current implementation in
viaTempFiles
still deletes temporary files immediately after dictionary creation withFiles.deleteIfExists()
. This approach can cause issues:
- On Windows, native libraries may keep files open, causing
AccessDeniedException
- With memory-mapped files (like those used by Lucene), deleting the underlying file invalidates the mapping
While the PR author noted that "the files are read and closed" in the native code, this assumption might not hold across all platforms and implementations.
#!/bin/bash # Check native Hunspell implementations that might be used rg -A 5 "class.*HunspellDictionary" --type java # Look for any file locking or file access patterns rg "map\(|FileChannel|MappedByteBuffer" --type javaConsider using
deleteOnExit()
instead for more robust cross-platform behavior:- Files.deleteIfExists(dictionary); - Files.deleteIfExists(affix); + dictionary.toFile().deleteOnExit(); + affix.toFile().deleteOnExit();
🧹 Nitpick comments (1)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java (1)
125-143
: Well-designed Factory interface with clear documentationThe new Factory interface provides a clean separation between file-based and stream-based dictionary creation with good documentation. The default implementation for
createFromLocalFiles
ensures implementers only need to focus on the stream-based method.One suggestion - consider adding a clarification in the Javadoc for
createFromStreams
that while implementers don't need to close the streams, they still need to fully consume them before returning./** * Create a Hunspell dictionary from the given streams. * All necessary information should be extracted from both streams by the time this method returns. - * Closing the streams is not necessary in the implementation of this method, as the caller will close them itself. + * Closing the streams is not necessary in the implementation of this method, as the caller will close them itself. + * However, the implementation must fully consume both streams before returning. */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java
(4 hunks)
🔇 Additional comments (3)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java (3)
98-119
: Good optimization for file-based URLsThe implementation now efficiently handles resource loading by checking if the resources are available as local files first, avoiding unnecessary stream operations when direct file access is possible.
84-91
: Improved resource management in getDictionary methodThe refactoring properly moves stream management responsibility to the Factory implementation, simplifying this method while maintaining proper resource handling.
37-43
: Good deprecation pattern with migration pathThe code properly marks the old method as deprecated and provides a clear migration path to the new API through the Javadoc link. The implementation maintains backward compatibility by adapting the old factory interface to the new one.
I did a test deployment. With these changes, the server restarts every 7–15 minutes (depends on requests). Restart mostly happens because the health check fails. Unfortunately, I could not find any helpful logs, just see that memory usage grows until RAM is full -> restart. |
@SteVio89 Can I do anything to diagnose this on my side? |
You could run the server |
I'd be grateful if you get that memory dump :) |
Unfortunately, I couldn't find the root cause. Another colleague will take another look next week. I keep you updated. |
Any updates? :) |
Hey @SteVio89! Do you have any updates on this? The issue you mentioned isn't reproducible locally on my machine. Sometimes there are memory spikes reaching up to 4Gb, but then everything goes back down to normal (~1Gb). So we'd really like to get the memory dump at least to take a look at the healthcheck problem. |
@SteVio89 any updates? :) |
Sorry, we're currently tied up in a project that's blocking all backend resources. We haven't been able to reinvestigate the issue on our server. If you absolutely want this PR merged, would it be possible to enable the changes (or the old behavior) via an ENV variable? This way, you're not directly dependent on us. |
…t; add a way to avoid temp file creation completely (with Lucene)
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java (1)
68-69
: Unsafe immediate deletion of temp files persistsThe immediate deletion of temporary files after dictionary creation remains problematic, as identified in previous reviews. This can cause
AccessDeniedException
on Windows when the native Hunspell DLL keeps files open, and invalidate memory-mapped buffers in Lucene-based implementations.As suggested in the previous review, use
deleteOnExit()
to defer deletion:- Files.deleteIfExists(dictionary); - Files.deleteIfExists(affix); + dictionary.toFile().deleteOnExit(); + affix.toFile().deleteOnExit();
🧹 Nitpick comments (1)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java (1)
134-137
: Potential Windows path handling issue with URL.getPath()Using
URL.getPath()
on Windows file URLs may return paths like "/C:/path/to/file" whichPath.of()
might not handle correctly due to the leading slash.Consider using
Paths.get(url.toURI())
for more robust cross-platform handling:- return hunspellDictionaryStreamFactory.createFromLocalFiles(language, Path.of(dicUrl.getPath()), Path.of(affUrl.getPath())); + return hunspellDictionaryStreamFactory.createFromLocalFiles(language, Paths.get(dicUrl.toURI()), Paths.get(affUrl.toURI()));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java
(4 hunks)languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/HunspellRule.java
(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/HunspellRule.java
🔇 Additional comments (6)
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java (6)
7-7
: LGTM!The URL import is necessary for the new resource URL handling functionality added later in the file.
34-35
: LGTM!The environment variable flag provides necessary backward compatibility and addresses the production issues mentioned in the PR discussion. This allows teams to revert to the old behavior if needed.
38-38
: LGTM!The new Factory-based field provides better abstraction than the previous BiFunction approach, supporting both file and stream-based dictionary creation.
40-51
: LGTM!The deprecation is properly documented and the conditional logic ensures backward compatibility. The method appropriately delegates to either the old or new approach based on the environment flag.
75-83
: LGTM!The new Factory setter provides a clean API for custom implementations. The Javadoc clearly explains the use cases and benefits over the default temporary file approach.
157-175
: LGTM!The Factory interface is well-designed with clear separation of concerns. The default implementation of
createFromLocalFiles
provides convenience while allowing optimized implementations. The Javadoc clearly specifies stream handling responsibilities.
|
||
if (FORCE_TEMP_FILES) { | ||
HunspellDictionary newHunspell = hunspellDictionaryFactory.apply(dictionary, affix); | ||
map.put(key, newHunspell); | ||
return newHunspell; | ||
} | ||
|
||
try { | ||
HunspellDictionary newHunspell = hunspellDictionaryStreamFactory | ||
.createFromLocalFiles(dictionary.getFileName().toString(), dictionary, affix); | ||
map.put(key, newHunspell); | ||
return newHunspell; | ||
} catch (IOException e) { | ||
throw new RuntimeException(e); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Inconsistent error handling between old and new approaches
The new Factory-based approach (lines 98-105) wraps IOException in RuntimeException, while the old BiFunction approach (lines 92-96) doesn't handle exceptions. This creates inconsistent behavior depending on the FORCE_TEMP_FILES flag.
Consider adding consistent exception handling:
if (FORCE_TEMP_FILES) {
- HunspellDictionary newHunspell = hunspellDictionaryFactory.apply(dictionary, affix);
- map.put(key, newHunspell);
- return newHunspell;
+ try {
+ HunspellDictionary newHunspell = hunspellDictionaryFactory.apply(dictionary, affix);
+ map.put(key, newHunspell);
+ return newHunspell;
+ } catch (Exception e) {
+ throw new RuntimeException(e);
+ }
}
🤖 Prompt for AI Agents
In
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java
around lines 91 to 106, the code inconsistently handles exceptions between the
FORCE_TEMP_FILES block and the try-catch block. To fix this, add appropriate
exception handling to the FORCE_TEMP_FILES block by wrapping any IOException or
relevant exceptions thrown by hunspellDictionaryFactory.apply in a try-catch,
and rethrow as RuntimeException to match the behavior of the other block,
ensuring consistent error handling regardless of the flag.
if (FORCE_TEMP_FILES) { | ||
try { | ||
ResourceDataBroker broker = JLanguageTool.getDataBroker(); | ||
InputStream dictionaryStream = broker.getAsStream(dicPath); | ||
InputStream affixStream = broker.getAsStream(affPath); | ||
if (dictionaryStream == null || affixStream == null) { | ||
throw new RuntimeException("Could not find dictionary for language \"" + language + "\" in classpath"); | ||
} | ||
Path dictionary = Files.createTempFile(language, ".dic"); | ||
Path affix = Files.createTempFile(language, ".aff"); | ||
Files.copy(dictionaryStream, dictionary, StandardCopyOption.REPLACE_EXISTING); | ||
Files.copy(affixStream, affix, StandardCopyOption.REPLACE_EXISTING); | ||
return hunspellDictionaryFactory.apply(dictionary, affix); | ||
} catch (IOException e) { | ||
throw new RuntimeException("Could not create temporary dictionaries for language \"" + language + "\"", e); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same temp file deletion issue in FORCE_TEMP_FILES path
The FORCE_TEMP_FILES code path has the same immediate temporary file deletion problem, but here the files aren't being deleted at all. The method creates temp files (lines 121-122) and copies streams to them (lines 123-124), but never cleans them up after calling hunspellDictionaryFactory.apply(dictionary, affix)
on line 125.
Add cleanup for the temporary files:
Files.copy(dictionaryStream, dictionary, StandardCopyOption.REPLACE_EXISTING);
Files.copy(affixStream, affix, StandardCopyOption.REPLACE_EXISTING);
- return hunspellDictionaryFactory.apply(dictionary, affix);
+ try {
+ return hunspellDictionaryFactory.apply(dictionary, affix);
+ } finally {
+ dictionary.toFile().deleteOnExit();
+ affix.toFile().deleteOnExit();
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (FORCE_TEMP_FILES) { | |
try { | |
ResourceDataBroker broker = JLanguageTool.getDataBroker(); | |
InputStream dictionaryStream = broker.getAsStream(dicPath); | |
InputStream affixStream = broker.getAsStream(affPath); | |
if (dictionaryStream == null || affixStream == null) { | |
throw new RuntimeException("Could not find dictionary for language \"" + language + "\" in classpath"); | |
} | |
Path dictionary = Files.createTempFile(language, ".dic"); | |
Path affix = Files.createTempFile(language, ".aff"); | |
Files.copy(dictionaryStream, dictionary, StandardCopyOption.REPLACE_EXISTING); | |
Files.copy(affixStream, affix, StandardCopyOption.REPLACE_EXISTING); | |
return hunspellDictionaryFactory.apply(dictionary, affix); | |
} catch (IOException e) { | |
throw new RuntimeException("Could not create temporary dictionaries for language \"" + language + "\"", e); | |
} | |
} | |
if (FORCE_TEMP_FILES) { | |
try { | |
ResourceDataBroker broker = JLanguageTool.getDataBroker(); | |
InputStream dictionaryStream = broker.getAsStream(dicPath); | |
InputStream affixStream = broker.getAsStream(affPath); | |
if (dictionaryStream == null || affixStream == null) { | |
throw new RuntimeException("Could not find dictionary for language \"" + language + "\" in classpath"); | |
} | |
Path dictionary = Files.createTempFile(language, ".dic"); | |
Path affix = Files.createTempFile(language, ".aff"); | |
Files.copy(dictionaryStream, dictionary, StandardCopyOption.REPLACE_EXISTING); | |
Files.copy(affixStream, affix, StandardCopyOption.REPLACE_EXISTING); | |
try { | |
return hunspellDictionaryFactory.apply(dictionary, affix); | |
} finally { | |
dictionary.toFile().deleteOnExit(); | |
affix.toFile().deleteOnExit(); | |
} | |
} catch (IOException e) { | |
throw new RuntimeException("Could not create temporary dictionaries for language \"" + language + "\"", e); | |
} | |
} |
🤖 Prompt for AI Agents
In
languagetool-core/src/main/java/org/languagetool/rules/spelling/hunspell/Hunspell.java
around lines 113 to 129, the temporary files created for dictionary and affix
are not deleted after use, causing potential resource leaks. Modify the code to
ensure that after calling hunspellDictionaryFactory.apply(dictionary, affix),
both temporary files are deleted in a finally block or using try-with-resources
to guarantee cleanup regardless of success or failure.
With a fix for #11321
Summary by CodeRabbit