Fetches the full text of the input URLs and persists it to a SQLite3 database file.
Fetching is resumable and comes with a progress bar.
pip install web2db
import web2db
web2db.dump('data.db', urls=[
    'https://www.google.com',
    'https://www.yahoo.com',
    'https://www.msn.com'
])
Query the DB file:
df = web2db.to_df('data.db')  # path of the DB file written by web2db.dump
print(df.shape)
print(df)
- Table: WebPages
  - url (text)
  - fulltext (text)
  - status_code (int)

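Because the output is an ordinary SQLite3 file, the table described above can also be queried without web2db. The following is a minimal sketch using only Python's standard `sqlite3` module. It assumes the table and column names listed above; the helper names `status_summary` and `failed_urls` are hypothetical and not part of web2db.

```python
import sqlite3


def status_summary(conn):
    """Return {status_code: number_of_pages} for the WebPages table."""
    rows = conn.execute(
        "SELECT status_code, COUNT(*) FROM WebPages GROUP BY status_code"
    ).fetchall()
    return dict(rows)


def failed_urls(conn):
    """Return the URLs whose stored status code is not 200, sorted by URL."""
    rows = conn.execute(
        "SELECT url FROM WebPages WHERE status_code != 200 ORDER BY url"
    ).fetchall()
    return [url for (url,) in rows]


if __name__ == "__main__":
    # 'data.db' is the file name used in the dump example; adjust as needed.
    conn = sqlite3.connect("data.db")
    try:
        print(status_summary(conn))
        print(failed_urls(conn))
    finally:
        conn.close()
```

Both helpers take an open connection rather than a path, so they work equally well against a file on disk or an in-memory database.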
- Resumable webpage fetching
- Saves to a local SQLite3 DB
- tqdm progress bar
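
One way resumable fetching can work is to skip every URL that already has a row in the database and to commit after each page, so an interrupted run loses at most the page in flight. The sketch below illustrates that idea only; it is not taken from web2db's source. The function names, the `fetch` callback, and the `PRIMARY KEY` on `url` are all assumptions made for this example.

```python
import sqlite3


def pending_urls(conn, urls):
    """Return the URLs from `urls` that have no row in WebPages yet, in order."""
    done = {row[0] for row in conn.execute("SELECT url FROM WebPages")}
    return [u for u in urls if u not in done]


def dump_resumable(conn, urls, fetch):
    """Fetch only the missing URLs and return how many were fetched.

    `fetch` is a caller-supplied function: url -> (text, status_code).
    """
    # Assumed schema: same columns as the README, plus a primary key on url.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS WebPages "
        "(url TEXT PRIMARY KEY, fulltext TEXT, status_code INT)"
    )
    todo = pending_urls(conn, urls)
    for url in todo:
        text, status = fetch(url)
        conn.execute(
            "INSERT OR REPLACE INTO WebPages VALUES (?, ?, ?)",
            (url, text, status),
        )
        conn.commit()  # commit per page so a rerun can pick up from here
    return len(todo)
```

Running `dump_resumable` a second time with the same list fetches nothing new, and adding URLs to the list fetches only the additions.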