UCSB Science Line
Hi! How could I (or what would I need to) make a relatively simple search engine of my own?
Question Date: 2008-06-15
Answer 1:

Creating a basic search engine is a relatively simple task for a good computer programmer. However, it does require a good understanding of a programming language such as C or Perl. Basically, you have to write and execute a program that requests information from a server that hosts a web page. Once you have that information, your program has to store and process it so that when a user enters search terms, the relevant information from the server is presented.

The following is a more detailed description of the use of Perl, which is becoming quite popular for handling large amounts of text.

alistapart
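
To make the idea concrete, here is a minimal sketch in Python (rather than Perl) of what this answer describes: fetch a page from a server, store its text, and check whether a search term appears in it. The URL and the search term are only placeholders.

    # Minimal sketch: download one page and look for a search term in it.
    # The URL and search term below are placeholders, not part of the answer.
    import urllib.request

    def fetch_page(url):
        """Download a web page and return its text."""
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8", errors="ignore")

    def search_page(text, term):
        """Return True if the search term appears in the stored page text."""
        return term.lower() in text.lower()

    if __name__ == "__main__":
        page_text = fetch_page("https://example.com/")   # placeholder URL
        print(search_page(page_text, "domain"))          # example search term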

Answer 2:

There are two parts: crawling and searching. The crawl is where you make a list of where everything is on the Internet. The search is where you make use of your results to find a particular page.

This isn't a very efficient example, but it gives you an idea where to start. Write a "crawler" program that downloads a web page, remembers the address (URL) of the web page and all the words on it, and extracts all the links to other web pages. Create a separate file for each word and append the URL to that file. Then have the program pick one of the links which it hasn't seen before, download the new web page, and repeat. You can only do the search after the crawl has covered a lot of web pages. When you're ready to do the search, select the files corresponding to the words you're searching for, and find the URLs that are in all of those files.
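
Here is a rough Python sketch of that crawl-then-search idea. To keep it short it stores the word-to-URL index in an in-memory dictionary rather than one file per word, and the starting page and page limit are just placeholders.

    # Rough sketch of the crawl + search described above. Uses a dictionary
    # (word -> set of URLs) instead of one file per word to keep it short.
    import re
    import urllib.request
    from collections import defaultdict

    index = defaultdict(set)   # word -> set of URLs containing that word
    seen = set()               # URLs already crawled

    def crawl(start_url, max_pages=10):
        to_visit = [start_url]
        while to_visit and len(seen) < max_pages:
            url = to_visit.pop()
            if url in seen:
                continue
            seen.add(url)
            try:
                with urllib.request.urlopen(url) as response:
                    html = response.read().decode("utf-8", errors="ignore")
            except Exception:
                continue                      # skip pages that fail to load
            for word in re.findall(r"[a-z]+", html.lower()):
                index[word].add(url)          # remember which pages each word was on
            # extract absolute links and queue any we haven't seen yet
            to_visit.extend(re.findall(r'href="(https?://[^"]+)"', html))

    def search(*terms):
        """Return the URLs that contain every search term."""
        sets = [index[t.lower()] for t in terms]
        return set.intersection(*sets) if sets else set()

    crawl("https://example.com/")             # placeholder starting page
    print(search("example", "domain"))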

Also, there are standards on the Internet for how crawlers are supposed to behave. Some sites like eBay don't want you recording and "remembering" stale auctions. Most sites have a file called robots.txt that defines what pages the crawler is or is not supposed to remember. For example, http://santabarbara.craigslist.org/robots.txt says every crawler ("User-Agent: *") should ignore any URL which starts with "/forums" (http://santabarbara.craigslist.org/forums).
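
Python's standard library includes a robots.txt parser, so a polite crawler can check each URL before downloading it. A short sketch, using the craigslist URLs from the answer above (the actual results depend on whatever that robots.txt says today):

    # Check robots.txt before crawling, using Python's standard
    # urllib.robotparser module.
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://santabarbara.craigslist.org/robots.txt")
    rp.read()

    # "*" means "any crawler", matching the User-Agent: * rule in the file
    print(rp.can_fetch("*", "http://santabarbara.craigslist.org/forums"))  # False if /forums is still disallowed
    print(rp.can_fetch("*", "http://santabarbara.craigslist.org/"))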


Answer 3:

A search engine is just a computer program, so all you really need is a computer language and a compiler. However, I don't know what components go into a search engine or how it finds what it searches for, and my programming skills are those of a weak amateur at best, so that's about as much help as I can give, limited though it is. Sorry!


