Commit 6db6203

Code for CROHME 2019
0 parents  commit 6db6203

123 files changed, +15217 -0 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
/nbproject/
/target/

README.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# Offline handwritten mathematical expression recognition via stroke extraction

This repository provides a proof-of-concept stroke extractor that can extract strokes from clean
bitmap images. The stroke extractor can be used to recognize offline handwritten
mathematical expressions if an online recognizer is available. For example, when combined
with MyScript, the resulting offline recognition system was **ranked #3 in the offline
task in CROHME 2019.**

## Accuracy

Dataset|Correct|Up to 1 error|Up to 2 errors|Structurally correct
---|---|---|---|---
CROHME 2014|58.22%|71.60%|75.15%|77.38%
CROHME 2016|65.65%|77.68%|82.56%|85.00%
CROHME 2019|65.05%|||

Although good accuracy is achieved on the CROHME datasets, the program
may produce poor results on real-world images. For example, the procedure does not
work well on the following images:
- Images containing other objects. An image should contain exactly one formula and nothing else.
  Ordinary text and grid lines are not allowed.
- Images with low contrast. The strokes may not be properly distinguished from the background.
- Images with low resolution. The stroke extractor may not segment touching symbols correctly.
- Printed mathematical expressions. Serifs can distract the stroke extractor.
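
The low-contrast and low-resolution cases in the list above can be screened for before an image is handed to the stroke extractor. The following is a minimal sketch using only the JDK; the class `ImagePrecheck`, its methods, and the thresholds (`64` luminance levels, `32` pixels of height) are hypothetical illustrations and not part of this repository, and the right thresholds for real images would have to be found by experiment.

```java
import java.awt.image.BufferedImage;

/** Hypothetical pre-check helper; not part of this repository. */
final class ImagePrecheck {
    /** Difference between the brightest and darkest pixel luminance, in 0..255. */
    static int luminanceRange(BufferedImage image) {
        int min = 255;
        int max = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the Rec. 601 luma weights.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                min = Math.min(min, luma);
                max = Math.max(max, luma);
            }
        }
        return max - min;
    }

    /** True if the image is tall enough and has enough contrast (both thresholds are assumptions). */
    static boolean looksUsable(BufferedImage image, int minRange, int minHeight) {
        return image.getHeight() >= minHeight && luminanceRange(image) >= minRange;
    }

    public static void main(String[] args) {
        // Synthetic example: a white canvas with one dark horizontal stroke.
        BufferedImage image = new BufferedImage(200, 60, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                image.setRGB(x, y, y == 30 ? 0x000000 : 0xFFFFFF);
            }
        }
        System.out.println("luminance range: " + luminanceRange(image));
        System.out.println("looks usable: " + looksUsable(image, 64, 32));
    }
}
```

A min/max range is sensitive to single noisy pixels, so a percentile-based measure may be a better choice for scanned or photographed input.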

## Usage

To use the MyScript Cloud recognition engine, you need to [create an account](https://sso.myscript.com/register)
and create an application.

### Graphical interface

1. Run the JAR by double-clicking it or with a command like `java -jar mathocr-myscript.jar`
2. Choose `Image file` from the `Recognize` menu
3. Choose the image file
4. Click the `Recognize` button under the stroke preview

### API

First add the JAR file to the classpath. If you are using Maven, add the following
to your `pom.xml`:

```xml
<dependency>
    <groupId>com.github.chungkwong</groupId>
    <artifactId>mathocr-myscript</artifactId>
    <version>1.0</version>
</dependency>
```

Then you can recognize images of mathematical expressions using code like:

```java
// JDK classes used below; the library's own classes must be imported as well.
import java.io.File;
import javax.imageio.ImageIO;

// Credentials of the MyScript Cloud application created above
String applicationKey="your application key for MyScript";
String hmacKey="HMAC key of your MyScript account";
String grammarId="an uploaded grammar of your MyScript account";
int dpi=96;
MyscriptRecognizer myscriptRecognizer=new MyscriptRecognizer(applicationKey,hmacKey,grammarId,dpi);
Extractor extractor=new Extractor(myscriptRecognizer);

// ImageIO.read throws IOException and returns null if no reader can decode the file
File file=new File("Path to file to be recognized");
EncodedExpression expression=extractor.recognize(ImageIO.read(file));
String latexCode=expression.getCodes(new LatexFormat());
```
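
One detail of the snippet above is worth guarding against: `ImageIO.read(File)` returns `null`, rather than throwing, when no registered reader can decode the file. A small wrapper can turn that case into an explicit error before the image reaches the extractor. This is a sketch only; `SafeImageLoader` is a hypothetical name and not part of this repository, and how `Extractor.recognize` itself reacts to a `null` image is not documented here.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper; not part of this repository. */
final class SafeImageLoader {
    /** Reads an image and fails loudly if the format is not supported. */
    static BufferedImage load(File file) throws IOException {
        // ImageIO.read throws IOException on I/O errors and returns null
        // when no registered ImageReader claims the input.
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            throw new IOException("No registered ImageIO reader can decode " + file);
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SafeImageLoader <image file>");
            return;
        }
        BufferedImage image = load(new File(args[0]));
        System.out.println(image.getWidth() + "x" + image.getHeight());
    }
}
```

With such a helper, the recognition call would become `extractor.recognize(SafeImageLoader.load(file))`.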
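
# Offline handwritten mathematical expression recognition based on stroke extraction

This project provides a prototype program that recovers stroke information from clean images. Combined with an online handwritten mathematical expression recognizer,
it yields an offline mathematical expression recognition system. For example, when combined with MyScript, it **ranked 3rd in the offline task of CROHME 2019**.

## Accuracy

Dataset|Correct|At most one error|At most two errors|Structurally correct
---|---|---|---|---
CROHME 2014|58.22%|71.60%|75.15%|77.38%
CROHME 2016|65.65%|77.68%|82.56%|85.00%
CROHME 2019|65.05%|||

Although the program performs well on the CROHME datasets, its results on real-world images may still be less than ideal.
For example, it may give poor results on the following kinds of images:

- Images containing other objects. The image should contain exactly one formula and nothing else; ordinary text, grid lines and the like are not allowed.
- Low-contrast images. Strokes are then hard to distinguish from the background.
- Low-resolution images. Touching symbols are then hard to segment.
- Printed mathematical formulas. Serifs interfere with stroke extraction.

## Usage

To use MyScript Cloud as the online handwritten mathematical expression recognizer, [register an account](https://sso.myscript.com/register) and create an application.

### Graphical user interface

1. Run the JAR file by double-clicking it or with a command such as `java -jar mathocr-myscript.jar`
2. Choose `Image file` from the `Recognize` menu
3. Choose the image file
4. Click the `Recognize` button under the stroke preview (on first use you need to enter your MyScript Cloud application ID and key)

### API

First add the JAR file to the classpath. If you use Maven, adding the following dependency under `dependencies` in `pom.xml` is enough (other build systems are similar):

```xml
<dependency>
    <groupId>com.github.chungkwong</groupId>
    <artifactId>mathocr-myscript</artifactId>
    <version>1.0</version>
</dependency>
```

Then you can recognize offline handwritten mathematical expressions with code like the following:

```java
String applicationKey="your application key for MyScript";
String hmacKey="HMAC key of your MyScript account";
String grammarId="an uploaded grammar of your MyScript account";
int dpi=96;
MyscriptRecognizer myscriptRecognizer=new MyscriptRecognizer(applicationKey,hmacKey,grammarId,dpi);
Extractor extractor=new Extractor(myscriptRecognizer);

File file=new File("Path to file to be recognized");
EncodedExpression expression=extractor.recognize(ImageIO.read(file));
String latexCode=expression.getCodes(new LatexFormat());
```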

pom.xml

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.github.chungkwong</groupId>
    <artifactId>mathocr-myscript</artifactId>
    <version>1.0</version>
    <packaging>jar</packaging>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.test.skip>true</maven.test.skip>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.github.jai-imageio</groupId>
            <artifactId>jai-imageio-core</artifactId>
            <version>1.4.0</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.9.8</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.hamcrest</groupId>
            <artifactId>hamcrest-core</artifactId>
            <version>1.3</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <!-- Build an executable JAR -->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>cc.chungkwong.mathocr.ui.Main</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-source-plugin</artifactId>
                <executions>
                    <execution>
                        <id>attach-sources</id>
                        <goals>
                            <goal>jar</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-javadoc-plugin</artifactId>
                <executions>
                    <execution>
                        <id>attach-javadoc</id>
                        <goals>
                            <goal>jar</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-gpg-plugin</artifactId>
                <executions>
                    <execution>
                        <id>sign-artifacts</id>
                        <phase>verify</phase>
                        <goals>
                            <goal>sign</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
        <testResources>
            <testResource>
                <directory>${project.basedir}/src/test/resources</directory>
            </testResource>
        </testResources>
    </build>
    <name>MathOCR(MyScript based)</name>
    <description>Offline handwritten mathematical expression recognition via stroke extraction</description>
    <url>https://github.com/chungkwong/mathocr-myscript</url>
    <inceptionYear>2019</inceptionYear>
    <licenses>
        <license>
            <name>GNU Affero General Public License, Version 3</name>
            <url>https://www.gnu.org/licenses/agpl-3.0.txt</url>
            <distribution>repo</distribution>
        </license>
    </licenses>
    <developers>
        <developer>
            <id>chungkwong</id>
            <name>陈颂光</name>
            <email>chan@chungkwong.cc</email>
            <url>https://www.chungkwong.cc</url>
            <timezone>+8</timezone>
            <organization>Sun Yat-sen University</organization>
            <organizationUrl>http://www.sysu.edu.cn</organizationUrl>
        </developer>
    </developers>
    <scm>
        <url>https://github.com/chungkwong/mathocr-myscript</url>
        <tag>HEAD</tag>
        <connection>scm:git:https://github.com/chungkwong/mathocr-myscript.git</connection>
    </scm>
    <issueManagement>
        <system>GitHub</system>
        <url>https://github.com/chungkwong/mathocr-myscript/issues</url>
    </issueManagement>

    <distributionManagement>
        <snapshotRepository>
            <id>nexus-snapshots</id>
            <name>Nexus Snapshot Repo</name>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        </snapshotRepository>
        <repository>
            <id>nexus-releases</id>
            <name>Nexus Staging Repo</name>
            <url>https://oss.sonatype.org/service/local/staging/deploy/maven2</url>
        </repository>
    </distributionManagement>

</project>
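
The `maven-jar-plugin` configuration in this POM sets `mainClass`, which Maven writes as the `Main-Class` attribute of the JAR manifest; that attribute is what `java -jar` uses to find the entry point. A quick way to confirm that a built JAR carries the expected value is to read the manifest with `java.util.jar`. The class `MainClassCheck` below is a hypothetical sketch, not part of this repository, and the path of the built JAR depends on your build.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper; not part of this repository. */
final class MainClassCheck {
    /** Returns the Main-Class manifest attribute of a JAR, or null if it is absent. */
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            // getManifest returns null when the JAR has no manifest at all.
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MainClassCheck <path to jar>");
            return;
        }
        System.out.println("Main-Class: " + mainClassOf(new File(args[0])));
    }
}
```

For a JAR built from this POM, the reported value would be expected to match the configured `cc.chungkwong.mathocr.ui.Main`.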
