Like them or loathe them, CSV files represent a quick and simple way of storing the results of all our scraping, and they allow us to analyze these results quickly and easily using existing software such as Microsoft Excel.
In order to append our results to a CSV file, we can use the csv module that comes bundled with Python.
We'll define an appendToCSV function, which will take an input and append this input as a line to our results.csv file, as follows:
import csv
...
def appendToCSV(result):
    print("Appending result to CSV File: {}".format(result))
    with open('results.csv', 'a') as csvfile:
        resultwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        resultwriter.writerow(result)
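As a quick sanity check, we can call appendToCSV directly with a sample result. The values below are purely illustrative, and we're assuming that each result is a list of fields:

appendToCSV(["http://example.com", "OK"])
# results.csv now ends with a line like:
# http://example.com OK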
This appendToCSV function will live within our main thread, and will be called as and when a result is returned from our executor object. Because it is only ever called from the main thread, we don't have to worry about race conditions or about placing locks around this shared resource.
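If we ever did need to append results from multiple worker threads, we would have to guard the file with a lock ourselves. The following is a minimal sketch of what that could look like, using the same results.csv file and writer settings; it isn't part of our crawler, just an illustration of the synchronization that keeping the writes in the main thread lets us avoid:

import csv
import threading

csv_lock = threading.Lock()

def threadsafeAppendToCSV(result):
    # Only one thread at a time may append to results.csv
    with csv_lock:
        with open('results.csv', 'a') as csvfile:
            resultwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
            resultwriter.writerow(result)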
Now that we've defined our appendToCSV function, we need to call it whenever we get a result. In order to do this, we'll have to update our main function and add the following code to it, where we submit the URLs in our queue to the executor:
for future in as_completed(futures):
    try:
        if future.result() is not None:
            appendToCSV(future.result())
    except Exception:
        print(future.exception())
This will leave our final main method looking something like this:
def main():
    url = input("Website > ")
    Crawler(url)
    linksToCrawl.put(url)
    while not linksToCrawl.empty():
        with ThreadPoolExecutor(max_workers=THREAD_COUNT) as executor:
            url = linksToCrawl.get()
            futures = []
            if url is not None:
                future = executor.submit(run, url)
                futures.append(future)
            # Append each completed result to our CSV file from the main thread
            for future in as_completed(futures):
                try:
                    if future.result() is not None:
                        appendToCSV(future.result())
                except Exception:
                    print(future.exception())
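Once the crawl finishes, all of our results live in results.csv, which can be opened directly in Excel. If you'd rather inspect the file programmatically, the following is a small sketch that reads it back using the same delimiter and quotechar we used when writing; the exact field layout of each row depends on what our run function returns:

import csv

with open('results.csv', 'r') as csvfile:
    resultreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    rows = list(resultreader)

print("Collected {} results".format(len(rows)))
for row in rows[:5]:
    print(row)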