Indexing and crawling
Ahmia is divided into three main parts. The code for each part is open source and available on GitHub. You can follow the respective links to access them.
- Index: The data that Ahmia has collected. We use Elasticsearch to maintain this data.
- Crawler: The part that crawls onion sites on the Tor network and feeds the results to the index. Scrapy is one of the crawling tools that we use.
- Site: The backbone of Ahmia. It includes the design of the website and makes the search engine work.
Ahmia collects a list of known .onion sites, gathers information about these sites, and saves it to its database. The collected data is filtered to remove child abuse content (see: Blacklist).
Once data is filtered, it is available for search at Ahmia.
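The collect, filter, search flow described above can be sketched roughly as follows. This is an illustration only: the record layout, the filter_blacklisted helper and the hostnames are hypothetical and are not taken from Ahmia's code, and the sketch assumes the blacklist is a plain set of onion hostnames.

```python
def filter_blacklisted(pages, blacklist):
    """Return only the pages whose onion host is not on the blacklist.

    Hypothetical helper: assumes each page is a dict with a "host" key
    and that the blacklist is an iterable of hostnames.
    """
    blocked = {host.lower() for host in blacklist}
    return [page for page in pages if page["host"].lower() not in blocked]


# Made-up crawl results, standing in for what a crawler would feed to the index.
crawled = [
    {"host": "exampleone.onion", "title": "Example one"},
    {"host": "exampletwo.onion", "title": "Example two"},
]

# Only pages that pass the filter would become searchable.
searchable = filter_blacklisted(crawled, blacklist={"exampletwo.onion"})
print([page["host"] for page in searchable])
```

Filtering before the data reaches the search index means that removed sites never appear in search results.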
We honour robots.txt files. This means that you can prevent our crawler from indexing your page by using robots.txt rules, so that your page is not shown in the search results. However, your site still appears in the list of all known hidden services that are online.
See robotstxt.org for details on how to create a robots.txt file.
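As a minimal sketch of how such rules are evaluated, the example below uses Python's standard urllib.robotparser module to check two URLs against a small robots.txt. The robots.txt content, the example.onion hostname, the "example-crawler" user agent and the is_allowed helper are assumptions made for illustration; this is not Ahmia's crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/,
# while the rest of the site remains crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked_url = "http://example.onion/private/page.html"
open_url = "http://example.onion/index.html"

print(is_allowed(ROBOTS_TXT, "example-crawler", blocked_url))
print(is_allowed(ROBOTS_TXT, "example-crawler", open_url))
```

To ask crawlers to skip an entire site rather than one directory, the rule would be "Disallow: /" instead of "Disallow: /private/".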
The correct onion address starts with msydq and ends with zerdg.onion.
Find the real address at ahmia.fi.