# Introduction
This crawler extracts the important information and all links from a web page and appends the links to a queue.
Once it has finished gathering information from a page, it takes the first URL from the queue and repeats the process.
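
The loop described above can be illustrated with a minimal Python sketch. It is not the crawler's actual PHP code: the queue here is an in-memory `deque` rather than a MySQL table, the only "important information" collected is the page title, and the names `LinkParser`, `crawl`, `fetch` and `max_pages` are hypothetical. Skipping URLs that were already queued is also an assumption, since the README does not say how duplicates are handled.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the page title and the targets of all <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the current page.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=10):
    """Process pages in queue order; fetch(url) returns HTML text or None."""
    queue = deque([start_url])
    seen = {start_url}  # assumption: each URL is queued at most once
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()  # take the first URL of the queue
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched, move on
        parser = LinkParser(url)
        parser.feed(html)
        results[url] = parser.title.strip()
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)  # newly found links go to the end
    return results
```

Passing the fetch function in as a parameter keeps the sketch independent of any HTTP library; for a quick experiment, `fetch` can be the `get` method of a dictionary that maps URLs to HTML strings.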

# Using the crawler
1. Create a MySQL database: run `mysql -u username -p`, then execute `CREATE DATABASE database_name;`
2. Import the `database.sql` file into your database with `mysql -u username -p database_name < database.sql`
3. Edit `mysql_conf.inc` to match your database credentials
4. Run `cd crawler && php crawler.php http://dmoztools.net/` (or any other start URL)
5. For future runs, just execute `cd crawler && php crawler.php` without any arguments and the crawler will automatically
   start with the first URL in the queue
6. Finished!