
Commit 13380b4

Update README

1 parent 9d2c7d0 commit 13380b4

File tree

1 file changed: +26 −18 lines


README.md

Lines changed: 26 additions & 18 deletions
@@ -12,9 +12,18 @@
 * PDF file merging,
 * PDF extraction of certain pages, etc.
 
-A set of tools for building small crawlers, including accessing links, getting elements, extracting files, etc.
-There are also small tools that have been implemented to obtain papers through scihub, as well as pdf to doc, text translation, proxy connection acquisition and proxy link acquisition through api,
-PDF file merging, PDF file intercepting certain pages, etc.
+A set of tools for building small crawlers, including:
+1. crawler utils:
+   1. accessing links
+   2. getting elements
+   3. extracting files, etc.
+2. other tools:
+   1. obtaining papers through scihub
+   2. pdf-to-doc conversion
+   3. text translation
+   4. proxy connection and proxy link acquisition through an api
+   5. PDF file merging
+   6. PDF extraction of certain pages, etc.
 
 # Installation and Usage
 ```commandline
@@ -68,8 +77,8 @@ from PaperCrawlerUtil.common_util import *
 from PaperCrawlerUtil.crawler_util import *
 from PaperCrawlerUtil.document_util import *
 basic_config(logs_style=LOG_STYLE_PRINT, require_proxy_pool=True, proxypool_storage="dict")
-```
-```python
+
+
 """
 When using dict, data can also be saved to disk and reloaded on the next start, just as with redis; it is saved to dict.db by default,
 and the path can be changed via dict_store_path, as follows:
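The dict-backed persistence described above (save to dict.db, reload on the next start) can be sketched with Python's stdlib `shelve` module. This is an illustrative assumption about the mechanism, not PaperCrawlerUtil's actual implementation:

```python
import os
import shelve
import tempfile

# Sketch of a dict that persists to disk and reloads on the next start,
# in the spirit of proxypool_storage="dict" with its dict.db file.
# shelve is an assumption here, not necessarily what the library uses.
path = os.path.join(tempfile.gettempdir(), "dict_demo")
with shelve.open(path) as store:
    store["proxy"] = "127.0.0.1:33210"   # write on the first run
with shelve.open(path) as store:         # a later run reloads the saved data
    print(store["proxy"])                # → 127.0.0.1:33210
```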
@@ -94,6 +103,8 @@ g.save_dict()
 basic_config(require_proxy_pool=True, need_tester_log=False,
              need_getter_log=False, need_storage_log=False)
 ```
+
+### Start the proxy pool on its own (a small extension of the proxypool project; it can be launched directly from code, which saves effort)
 ```python
 """
 The proxy pool can also be started on its own and used as part of another application, as follows:
@@ -107,19 +118,6 @@ basic_config(logs_style=LOG_STYLE_PRINT, require_proxy_pool=True, need_tester_lo
     api_port=5556, set_daemon=False)
 ```
 
-```python
-"""
-Update: add cookie-based access
-"""
-from PaperCrawlerUtil.common_util import *
-from PaperCrawlerUtil.crawler_util import *
-from PaperCrawlerUtil.document_util import *
-cookie = "axxxx=c9IxxxxxdK"
-html = random_proxy_header_access(
-    url="https://s.taobao.com/search?q=iphone5",
-    require_proxy=False, cookie=cookie)
-```
-
 ## Crawl CVPR papers
 ```python
 from PaperCrawlerUtil.common_util import *
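The cookie example removed in the hunk above boils down to attaching a `Cookie` header before a request is sent. A stdlib sketch of that idea follows; plain `urllib` stands in for `random_proxy_header_access`, which the README describes as also handling proxy and header rotation (the placeholder cookie value is taken from the diff):

```python
import urllib.request

# Attach a cookie the way the removed example passed cookie=... :
# the header is set on the request object before any network call.
cookie = "axxxx=c9IxxxxxdK"  # placeholder value from the README
req = urllib.request.Request(
    "https://s.taobao.com/search?q=iphone5",
    headers={"Cookie": cookie, "User-Agent": "Mozilla/5.0"},
)
print(req.get_header("Cookie"))  # the header travels with the request
```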
@@ -166,6 +164,9 @@ name = get_attribute_of_html(html,
     rule={"href": IN, "pdf": NOT_IN, "main": IN, "full": NOT_IN, "emnlp": IN,
           "align-middle": IN, "emnlp-main.": IN},
     attr_list=['strong'])
+"""
+Get the file names
+"""
 names = []
 for k in name:
     p = list(k)
@@ -318,6 +319,13 @@ log(get_split())
 log(t.web_translator(translate_method=google_trans_final, need_default_reporthook=True))
 log(get_split())
 log(t.google_translate_web())
+
+"""
+Chain translation example
+"""
+tt = Translators(proxy="127.0.0.1:33210")
+k = tt.chain_translate(content="Traffic flow forecasting or prediction plays an important role in the traffic control and management of a city. Existing works mostly train a model using the traffic flow data of a city and then test the trained model using the data of the same city. It may not be truly intelligent as there are many cities around us and there should be some shared knowledge among different cities. The data of a city and its knowledge can be used to help improve the traffic flow forecasting of other cities. To address this motivation, we study building a universal deep learning model for multi-city traffic flow forecasting. In this paper, we exploit spatial-temporal correlations among different cities with multi-task learning to approach the traffic flow forecasting tasks of multiple cities. As a result, we propose a Multi-city Traffic flow forecasting Network (MTN) via multi-task learning to extract the spatial dependency and temporal regularity among multiple cities later used to improve the performance of each individual city traffic flow forecasting collaboratively. In brief, the proposed model is a quartet of methods: (1) It integrates three temporal intervals and formulates a multi-interval component for each city to extract temporal features of each city; (2) A spatial-temporal attention layer with 3D Convolutional kernels is plugged into the neural networks to learn spatial-temporal relationship; (3) As traffic peak distributions of different cities are often similar, it proposes to use a peak zoom network to learn the peak effect of multiple cities and enhance the prediction performance on important time steps in different cities; (4) It uses a fusion layer to merge the outputs from distinct temporal intervals for the final forecasting results. Experimental results using real-world datasets from DIDI show the superior performance of the proposed model.")
+
 ```
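The `chain_translate` call added above suggests piping text through a sequence of translation steps. A minimal sketch of that composition idea, with toy string functions standing in for real translator calls (only the method's name comes from the diff; everything else here is an assumption for illustration):

```python
from functools import reduce

def chain_translate(content, steps):
    """Apply each step (a str -> str callable) in order, feeding each
    output into the next -- the composition behind chain translation."""
    return reduce(lambda text, step: step(text), steps, content)

# Toy steps standing in for real translator calls:
steps = [str.upper, lambda s: s + "!"]
print(chain_translate("traffic flow", steps))  # → TRAFFIC FLOW!
```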
 
 ## Progress bar
