Commit a4a8915

Merge pull request #22 from shaikhsajid1111/v3
Fix for outdated selector and more.
2 parents ef184ed + 08dd5f0 commit a4a8915

10 files changed, +534 -468 lines changed

README.md

Lines changed: 52 additions & 38 deletions
@@ -49,11 +49,12 @@ posts_count = 10
 browser = "firefox"
 proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
 timeout = 600 #600 seconds
-meta_ai = Facebook_scraper(page_name,posts_count,browser,proxy=proxy,timeout=timeout)
+headless = True
+meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
 
 ```
 
-<h3> Parameters for <code>Facebook_scraper(page_name,posts_count,browser,proxy,timeout) </code> class </h3>
+<h3> Parameters for <code>Facebook_scraper(page_name, posts_count, browser, proxy, timeout, headless) </code> class </h3>
 <table>
 <th>
 <tr>
@@ -68,10 +69,10 @@ meta_ai = Facebook_scraper(page_name,posts_count,browser,proxy=proxy,timeout=tim
 page_name
 </td>
 <td>
-string
+String
 </td>
 <td>
-name of the facebook page
+Name of the facebook page
 </td>
 </tr>
 
@@ -80,10 +81,10 @@ name of the facebook page
 posts_count
 </td>
 <td>
-integer
+Integer
 </td>
 <td>
-number of posts to scrap, if not passed default is 10
+Number of posts to scrap, if not passed default is 10
 </td>
 </tr>
 
@@ -92,10 +93,10 @@ number of posts to scrap, if not passed default is 10
 browser
 </td>
 <td>
-string
+String
 </td>
 <td>
-which browser to use, either chrome or firefox. if not passed,default is chrome
+Which browser to use, either chrome or firefox. if not passed,default is chrome
 </td>
 </tr>
 
@@ -104,24 +105,36 @@ which browser to use, either chrome or firefox. if not passed,default is chrome
 proxy(optional)
 </td>
 <td>
-string
+String
 </td>
 <td>
-optional argument, if user wants to set proxy, if proxy requires authentication then the format will be <code> user:password@IP:PORT </code>
+Optional argument, if user wants to set proxy, if proxy requires authentication then the format will be <code> user:password@IP:PORT </code>
 </td>
 </tr>
 <tr>
 <td>
 timeout
 </td>
 <td>
-integer
+Integer
 </td>
 <td>
 The maximum amount of time the bot should run for. If not passed, the default timeout is set to 10 minutes
 </code>
 </td>
 </tr>
+<tr>
+<td>
+headless
+</td>
+<td>
+Boolean
+</td>
+<td>
+Whether to run browser in headless mode?. Default is True
+</code>
+</td>
+</tr>
 
 </table>
 <br>
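
Note on the hunk above: the `proxy` parameter is documented as either `IP:PORT` or, with authentication, `user:password@IP:PORT`. A minimal sketch of assembling that string is shown here. `format_proxy` is a hypothetical helper and not part of the library, and it assumes the credentials need no extra escaping.

```python
from typing import Optional


def format_proxy(host: str, port: int,
                 user: Optional[str] = None,
                 password: Optional[str] = None) -> str:
    """Build a proxy string in one of the two formats the README documents.

    Hypothetical helper for illustration only.
    """
    endpoint = f"{host}:{port}"
    if user is None and password is None:
        # Unauthenticated proxy: IP:PORT
        return endpoint
    if user is None or password is None:
        raise ValueError("user and password must be given together")
    # Authenticated proxy: user:password@IP:PORT
    return f"{user}:{password}@{endpoint}"


print(format_proxy("127.0.0.1", 8080))
print(format_proxy("127.0.0.1", 8080, "alice", "s3cret"))
```

The resulting string would then be passed as `proxy=` next to `headless=True` in the `Facebook_scraper(...)` call shown in the README.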
@@ -212,7 +225,7 @@ Output Structure for JSON format:
 
 filename = "data_file" #file name without CSV extension,where data will be saved
 directory = "E:\data" #directory where CSV file will be saved
-meta_ai.scrap_to_csv(filename,directory)
+meta_ai.scrap_to_csv(filename, directory)
 
 ```
 
@@ -228,7 +241,7 @@ id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,con
 <hr>
 <br>
 
-<h3> Parameters for <code> scrap_to_csv(filename,directory) </code> method. </h3>
+<h3> Parameters for <code> scrap_to_csv(filename, directory) </code> method. </h3>
 
 <table>
 <th>
@@ -244,11 +257,11 @@ id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,con
 filename
 </td>
 <td>
-string
+String
 </td>
 
 <td>
-name of the CSV file where post's data will be saved
+Name of the CSV file where post's data will be saved
 </td>
 
 </tr>
@@ -258,11 +271,11 @@ name of the CSV file where post's data will be saved
 directory
 </td>
 <td>
-string
+String
 </td>
 
 <td>
-directory where CSV file have to be stored.
+Directory where CSV file have to be stored.
 </td>
 
 </tr>
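
Note on `scrap_to_csv(filename, directory)`: the README example writes `directory = "E:\data"` as a regular string literal. `\d` is not a recognized escape sequence, so Python keeps the backslash, but recent Python versions warn about it; a raw string avoids the problem. The sketch below shows one way the target path could be composed. It assumes, without confirmation from this diff, that the file ends up at `<directory>\<filename>.csv`, and `csv_target` is a hypothetical helper, not a library function.

```python
from pathlib import PureWindowsPath


def csv_target(filename: str, directory: str) -> PureWindowsPath:
    """Join a directory and an extension-less file name into a CSV path.

    Hypothetical helper; the library's real path handling may differ.
    """
    return PureWindowsPath(directory) / f"{filename}.csv"


# A raw string keeps the backslash literal without an escape warning.
target = csv_target("data_file", r"E:\data")
print(target)
```

`PureWindowsPath` is used so the example behaves the same on any operating system; it only manipulates the path and never touches the file system.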
@@ -305,7 +318,7 @@ Description
 id
 </td>
 <td>
-string
+String
 </td>
 <td>
 Post Identifier(integer casted inside string)
@@ -319,7 +332,7 @@ Post Identifier(integer casted inside string)
 name
 </td>
 <td>
-string
+String
 </td>
 <td>
 Name of the page
@@ -331,10 +344,10 @@ Name of the page
 shares
 </td>
 <td>
-integer
+Integer
 </td>
 <td>
-share count of post
+Share count of post
 </td>
 </tr>
 
@@ -343,10 +356,10 @@ share count of post
 reactions
 </td>
 <td>
-dictionary
+Dictionary
 </td>
 <td>
-dictionary containing reactions as keys and its count as value. Keys => <code> ["likes","loves","wow","cares","sad","angry","haha"] </code>
+Dictionary containing reactions as keys and its count as value. Keys => <code> ["likes","loves","wow","cares","sad","angry","haha"] </code>
 </td>
 </tr>
 
@@ -355,10 +368,10 @@ dictionary containing reactions as keys and its count as value. Keys => <code> [
 reaction_count
 </td>
 <td>
-integer
+Integer
 </td>
 <td>
-total reaction count of post
+Total reaction count of post
 </td>
 </tr>
 
@@ -368,10 +381,10 @@ total reaction count of post
 comments
 </td>
 <td>
-integer
+Integer
 </td>
 <td>
-comments count of post
+Comments count of post
 </td>
 </tr>
 
@@ -380,10 +393,10 @@ comments count of post
 content
 </td>
 <td>
-string
+String
 </td>
 <td>
-content of post as text
+Content of post as text
 </td>
 </tr>
 
@@ -392,7 +405,7 @@ content of post as text
 video
 </td>
 <td>
-string
+String
 </td>
 <td>
 URL of video present in that post
@@ -405,10 +418,10 @@ URL of video present in that post
 image
 </td>
 <td>
-list
+List
 </td>
 <td>
-python's list containing URLs of all images present in the post
+List containing URLs of all images present in the post
 </td>
 </tr>
 
@@ -417,10 +430,10 @@ python's list containing URLs of all images present in the post
 posted_on
 </td>
 <td>
-datetime
+Datetime
 </td>
 <td>
-time at which post was posted(in ISO 8601 format)
+Time at which post was posted(in ISO 8601 format)
 </td>
 </tr>
 
@@ -429,7 +442,7 @@ time at which post was posted(in ISO 8601 format)
 post_url
 </td>
 <td>
-string
+String
 </td>
 <td>
 URL for that post
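
Note on the output keys documented in the hunks above: a consumer of the scraped data might sum the `reactions` dictionary and parse `posted_on`. The sketch below illustrates this under stated assumptions: `summarize_post` is a hypothetical helper, the sample record is invented, and `posted_on` is assumed to arrive as an ISO 8601 string in a form that `datetime.fromisoformat` accepts (older Python versions accept only a subset of ISO 8601).

```python
from datetime import datetime

# Reaction keys as listed in the README table.
REACTION_KEYS = ["likes", "loves", "wow", "cares", "sad", "angry", "haha"]


def summarize_post(post: dict) -> dict:
    """Reduce one scraped post record to a few derived fields."""
    reactions = post["reactions"]
    # Missing reaction keys are treated as zero.
    total = sum(reactions.get(key, 0) for key in REACTION_KEYS)
    return {
        "id": post["id"],
        "total_reactions": total,
        "posted_on": datetime.fromisoformat(post["posted_on"]),
        "image_count": len(post["image"]),
    }


# Hypothetical record; every value is made up for illustration.
sample = {
    "id": "1234",
    "reactions": {"likes": 10, "loves": 2, "wow": 1, "cares": 0,
                  "sad": 0, "angry": 0, "haha": 3},
    "posted_on": "2021-05-01T12:30:00",
    "image": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
}

summary = summarize_post(sample)
print(summary["total_reactions"], summary["posted_on"].isoformat())
```

Comparing the computed total against the record's own reaction count field is a cheap consistency check on scraped data.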
@@ -449,9 +462,10 @@ URL for that post
 <h2> Tech </h2>
 <p>This project uses different libraries to work properly.</p>
 <ul>
-<li> <a href="https://www.selenium.dev/" target='_blank'>selenium</a>
-<li> <a href="https://pypi.org/project/webdriver-manager/" target='_blank'>webdriver manager</a>
-<li> <a href="https://pypi.org/project/python-dateutil/" target='_blank'>python dateutil</a>
+<li> <a href="https://www.selenium.dev/" target='_blank'>Selenium</a></li>
+<li> <a href="https://pypi.org/project/webdriver-manager/" target='_blank'>Webdriver Manager</a></li>
+<li> <a href="https://pypi.org/project/python-dateutil/" target='_blank'>Python Dateutil</a></li>
+<li> <a href="https://pypi.org/project/selenium-wire/" target='_blank'>Selenium-wire</a></li>
 </ul>
 <br>
 