Technologies du Web: The Internet

May 16, 2017 | Author: Reynard Emil Lloyd | Category: N/A

Fabian M. Suchanek

Course material adapted from Antoine Amarilli (http://a3nm.net) and Pierre Senellart (http://pierre.senellart.com)

Organisation
• Wednesday 2014-12-03, 09:00 - 13:10
• Thursday 2014-12-18, 14:00 - 16:50
• Monday 2015-01-05, 09:00 - 12:40
• Tuesday 2015-01-06, 09:00 - 12:40
• Wednesday 2015-01-07, 09:00 - 12:20
• Thursday 2015-01-08, 09:00 - 12:40
• Friday 2015-01-09, 09:00 - 11:00
• Friday 2015-01-09, 11:10 - 12:40: exam

http://suchanek.name/work/teaching/tdw2014/

Attention

Fabian Suchanek, Maître de Conférences (associate professor) at Télécom ParisTech, office C201-6

http://suchanek.name

The course
• The Internet
• HTML
• Server-side technologies
• CSS
• Web ergonomics (with Gilles Bailly)
• Search engines
• Semantic Web
• Information extraction

Overview
• Introduction
• Browsers
• IP Addresses
• Internet Protocol Suite
• HTTP
• Other Protocols

A brief history of the Web
• 1969: ARPANET (ancestor of the Internet)
• 1974: Transmission Control Protocol (TCP) (Vinton G. Cerf & Robert E. Kahn, Turing Award 2004)
• 1990: World Wide Web, HTTP, HTML (Tim Berners-Lee, Robert Cailliau)
• 1993: Mosaic (first successful graphical browser, ancestor of Netscape)
• 1994: Yahoo! (David Filo, Jerry Yang)
• 1994: Foundation of the W3C
• 1995: Amazon.com, eBay
• 1995: Internet Explorer
• 1995: AltaVista (Louis Monier, Michael Burrows)
• 1998: Google (Larry Page, Sergey Brin)
• 2001: Wikipedia (Jimmy Wales)
• 2004: Mozilla Firefox
• 2005: YouTube

The Internet, visualised

Statistics
• Nearly 150 million domains, 75% of them in .com (source)
• In 2002, >50% of the content is in English, 6% in French. (source)
• More than 2 billion Internet users. (source)
• Google knows more than a trillion (10^12) unique URLs. (source)

=> It is suspected that a large part of the Web cannot be indexed: the hidden Web.


Client-server architecture

important

• The client (browser: Internet Explorer, Firefox, Safari)
    • requests information from the server
    • displays pages for the user
• The server (Apache, Microsoft IIS)
    • continuously receives requests from clients
    • returns the corresponding documents

(Diagram: client and server)

Browser
A browser is a piece of software that retrieves and displays Web pages.
• The best known are: Internet Explorer, Chrome, Safari
• Depends on the underlying operating system
• Runs on a computer (or phone)
The browser takes the role of the client.

Historic Web browsers
Mosaic. First widely used graphical browser, 1993-1997.
• From 80% in 1994 to < 10% in 1996.

Netscape. Launched in 1994, based on Mosaic.
• Proprietary, free for non-commercial use.
• 80% usage share in 1996, [...]

• The name servers are like a global distributed database
• The internet service providers usually also maintain a copy
• If the host name is unknown, requests are forwarded to another name server
• The name servers have a fixed, known IP address
• The entire structure is called the Domain Name System (DNS)
• Registration of a hostname for an IP address is done by domain name registrars
• These have to be accredited by the Internet Corporation for Assigned Names and Numbers (ICANN)
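
The forwarding behaviour of name servers can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `NameServer` class, the host name `www.example.org` and the address `192.0.2.10` are made up for illustration, and real DNS uses a hierarchy of servers and a network protocol rather than in-memory dictionaries.

```python
class NameServer:
    """Toy name server: a table of host names plus an optional server to forward to."""

    def __init__(self, records, parent=None):
        self.records = dict(records)  # host name -> IP address
        self.parent = parent          # server that receives forwarded requests

    def resolve(self, host_name):
        # Answer directly if this server knows the host name.
        if host_name in self.records:
            return self.records[host_name]
        # Otherwise forward the request to another name server.
        if self.parent is not None:
            address = self.parent.resolve(host_name)
            self.records[host_name] = address  # keep a local copy
            return address
        raise LookupError(f"unknown host name: {host_name}")


# Hypothetical setup: an ISP server that forwards to an upstream server.
upstream = NameServer({"www.example.org": "192.0.2.10"})
isp = NameServer({}, parent=upstream)
address = isp.resolve("www.example.org")
```

After the first lookup, the ISP server holds its own copy of the record, which mirrors the bullet about providers maintaining a copy.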

The Internet Protocol Suite
(Diagram: client and server exchange data through abstract, virtual communication at the top, intermediate abstraction layers in between, and real, physical communication at the bottom; see Wikipedia.)

important

The Internet Protocol Suite (layers between client and server, from top to bottom)
• Application Layer. Protocols: HTTP, FTP, POP, SSH, ...
• Transport Layer. Protocols: TCP, UDP, ...
• Internet Layer. Protocols: IPv4, IPv6, ...
• Link Layer. Protocols: DSL, ISDN, Ethernet, ...

Protocol
A protocol is a standardized set of rules that describe how information is transmitted, for example in a network such as the Internet between a client and a server.

Example exchange between two parties: "Hi!" "Hi!" "How are you doing?" "Awesome!" "How are you doing?" "Awesome!" (content here)

important

Application Layer Protocols
(Application Layer protocols between client and server: HTTP, FTP, POP, SSH, ...)
• HTTP: HyperText Transfer Protocol, the most widely used protocol on the World Wide Web. Allows a client to say which Web page it wants, and a server to respond with that page.
• HTTPS: Like HTTP, but with encryption and authentication.
• FTP: For transmission of files. Sometimes used on the Web for the transmission of large files.
• SSH: Cryptographic protocol for secure data communication, remote command-line login, remote command execution, etc.


important

HTTP Protocol
HTTP is the application protocol at the basis of the World Wide Web. At the time of this course (2014), the most widely used version is HTTP/1.1.

Client request:
    GET /MarkUp/ HTTP/1.1
    Host: www.w3.org

Server response:
    HTTP/1.1 200 OK
    ...
    Content-Type: text/html; ...

• Two main HTTP methods: GET and POST (HEAD is also used in place of GET, to retrieve meta-information only).
• Additional headers, in the request and the response.
• Possible to send parameters in the request (key/value pairs).

HTTP, Client, GET
• Simplest type of request.
• Any parameters are sent at the end of the URL, after a “?”.
• Not applicable when there are too many parameters, or when their values are too long.
• Method used when a URL is directly accessed in a browser, when a link is followed, and for some forms.

Example:
URL: http://www.google.com/search?q=hello
Corresponding HTTP GET request:
    GET /search?q=hello HTTP/1.1
    Host: www.google.com

Try it out! wget -S -O
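
The mapping from a URL to its GET request can be sketched in a few lines of Python. This is a minimal illustration, not what a browser actually does: the helper name `get_request_for` is made up here, and real clients send many more headers.

```python
from urllib.parse import urlsplit, parse_qs


def get_request_for(url: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"          # an empty path means the root page
    if parts.query:
        target += "?" + parts.query     # parameters follow the "?"
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


url = "http://www.google.com/search?q=hello"
request = get_request_for(url)
params = parse_qs(urlsplit(url).query)  # key/value pairs carried by the URL
```

Here `request` starts with the same two lines as the slide example, and `params` maps `q` to the list `["hello"]`.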

HTTP, Client, POST
• Method mainly used for submitting (longer) forms.

Example:
    POST /php/test.php HTTP/1.1
    Host: www.w3.org
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 53

    type=search&title=The+Dictator&format=long&country=US
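
As a rough sketch of how such a request is assembled, the snippet below encodes the form fields and derives the Content-Length header from the encoded body. The function name `build_post` is hypothetical, and only the headers shown on the slide are produced.

```python
from urllib.parse import urlencode


def build_post(host: str, path: str, fields: dict) -> str:
    """Build the text of a minimal form-submission POST request."""
    body = urlencode(fields)  # name1=value1&name2=value2, spaces become "+"
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        # Content-Length is the size of the body in bytes.
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # An empty line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


post_request = build_post(
    "www.w3.org",
    "/php/test.php",
    {"type": "search", "title": "The Dictator", "format": "long", "country": "US"},
)
```

Computing the length from the body, rather than writing it by hand, avoids a mismatch between the header and the data actually sent.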

HTTP, Client, Identification
• Web clients can identify themselves in the request with a character string.
• Useful to serve different content to different browsers, or to detect robots,
• but any client can say it’s any other client!
• Historical confusion on naming: all common browsers identify themselves as Mozilla!

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.0.3) Gecko/2008092510 Ubuntu/8.04 (hardy) Firefox/3.0.3

HTTP, Client, Content negotiation
• A Web client can specify in the request to the Web server:
    • the content types it can process (text, images, multimedia content), with preference indicators
    • the languages preferred by the user
• The Web server can thus propose different file formats, in different languages.
• In practice, content negotiation on the language works and is used, but content negotiation on file types does not work, because of the bad default configuration of some browsers.

    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
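
The preference indicators are the `q` values in these headers; an entry without one counts as 1.0. A server could read them roughly as follows. This is a simplified sketch with a made-up function name, and it ignores the finer points of the header grammar.

```python
def parse_preferences(header_value: str) -> list[tuple[str, float]]:
    """Return (item, q) pairs from an Accept-style header value, best first."""
    prefs = []
    for item in header_value.split(","):
        name, _, params = item.strip().partition(";")
        q = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q" and value:
                q = float(value)
        prefs.append((name.strip(), q))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


languages = parse_preferences("fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3")
```

For the Accept-Language value above, French comes first with weight 1.0 and plain English last with weight 0.3.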

HTTP, Client, Conditional downloading
• A client can ask to download a page only if it has been modified since a given date.
• Often not applicable in practice, because servers rarely give a reliable last-modification date (difficult to obtain for dynamically generated content!).

Request header:
    If-Modified-Since: Wed, 15 Oct 2008 19:40:06 GMT

Server response:
    304 Not Modified
    Last-Modified: Wed, 15 Oct 2008 19:20:00 GMT
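
The server-side decision comes down to comparing two dates. The sketch below assumes both dates use the GMT format shown above; the function name `conditional_status` is illustrative only.

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since: str, last_modified: str) -> int:
    """Return 200 if the page changed after the client's date, otherwise 304."""
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    return 200 if modified > since else 304


status = conditional_status(
    "Wed, 15 Oct 2008 19:40:06 GMT",  # date sent by the client
    "Wed, 15 Oct 2008 19:20:00 GMT",  # last modification known to the server
)
```

With the dates from the slide, the page was last modified before the client's date, so the result is 304 and no body needs to be sent.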

HTTP, Client, Referer
• When a Web browser follows a link or submits a form, the client transmits the originating URL to the destination Web server.
• Even if it is not on the same server!

Example: Visitors of the dancing class Web page came from:

HTTP, Client, Range
• The client can ask for only a portion of the file.
• This is useful if the download is interrupted.
• This feature was misused in 2011 to bring down Apache servers.

Range: bytes=0-42
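
Byte positions in a Range header are inclusive, so `bytes=0-42` selects 43 bytes. A server could cut out the requested portion roughly like this; the sketch handles only a single `first-last` or `first-` range, and the function name is made up.

```python
def apply_range(header_value: str, content: bytes) -> bytes:
    """Return the part of content selected by a simple 'bytes=first-last' range."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are handled in this sketch")
    first, _, last = spec.partition("-")
    start = int(first)
    # A missing end position means "until the end of the file".
    end = int(last) if last else len(content) - 1
    return content[start:end + 1]  # +1 because the end position is inclusive


document = bytes(range(100))
portion = apply_range("bytes=0-42", document)
```

To resume an interrupted download, a client would ask for `bytes=N-`, where N is the number of bytes it already holds.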

HTTP, Client, Authentication

important

• HTTP allows for protecting access to a Web site by an identifier and a password.
• Attention: (most of the time) the password goes through the network unencrypted (for instance, merely encoded in Base64, a reversible encoding).
• HTTPS (variant of HTTP that includes encryption, cryptographic authentication, session tracking, etc.) can be used instead to transmit sensitive data.

    GET ... HTTP/1.1
    Authorization: Basic dG90bzp0aXRp
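
To see why Base64 offers no protection, the header value can simply be decoded. The helper below is a small sketch (its name is made up); it assumes the usual `identifier:password` layout of Basic authentication.

```python
import base64


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover (identifier, password) from the value of an Authorization header."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    # Base64 is an encoding, not encryption: anyone can reverse it.
    decoded = base64.b64decode(token).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


credentials = decode_basic_auth("Basic dG90bzp0aXRp")
```

Anyone who can observe the request on the network can do the same, which is why sensitive data should go through HTTPS.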

Parameter encoding
• By default, parameters are sent (with GET or POST) in the form name1=value1&name2=value2, and special characters (accented characters, spaces) are replaced by codes such as + and %20. This way of sending parameters is called application/x-www-form-urlencoded.
• For the POST method, another, heavier encoding can be used (several lines per parameter), similar to the way emails are built; it is mostly useful for sending large quantities of information. This encoding is named multipart/form-data.
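
Python's standard library implements this replacement of special characters. The short example below uses an arbitrary sample string and assumes UTF-8 as the character encoding, which is the library default.

```python
from urllib.parse import quote, quote_plus, unquote_plus

text = "café au lait"

encoded_form = quote_plus(text)  # form style: spaces become "+"
encoded_path = quote(text)       # path style: spaces become "%20"
decoded = unquote_plus(encoded_form)  # reverses the form-style encoding
```

The accented character is replaced by the percent-encoded bytes of its UTF-8 representation in both variants.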

HTTP, Server Response
A response consists of a status line (with the status code), a header, and a body (= content):
    HTTP/1.1 200 OK
    ...
    Content-Type: text/html; ...
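
The three parts of a response can be separated with simple string operations. This is a simplified sketch: the function name and the sample response are invented for illustration, and features such as repeated headers or chunked bodies are ignored.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status code, reason, headers and body."""
    # An empty line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)  # the status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
code, reason, headers, body = parse_response(sample)
```

Header names are lowercased in the result because HTTP treats them as case-insensitive.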

HTTP, Server, Status
• 1xx Information
• 2xx Success
    • 200: OK
• 3xx Redirection
    • 301: permanent
    • 302: temporary
• 4xx Client error
    • 400: syntax error
    • 401: authentication needed
    • 403: forbidden
    • 404: not found
• 5xx Server error
    • 500: internal server error
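
Since the first digit determines the class, a client can classify any status code with an integer division. The dictionary and function names below are illustrative; `http.HTTPStatus` is part of the Python standard library and supplies the official reason phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the class of a status code, based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


kind = status_class(404)
phrase = HTTPStatus(404).phrase
```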

HTTP, Server, Content type
• The browser behaves differently depending on the content type returned: display a Web page with the layout engine, display an image, load an external application, etc.
• MIME classification of content types (e.g., image/jpeg, text/plain, text/html, application/xhtml+xml, application/pdf, etc.).
• For an HTML page, or for text, the browser must also know which character set is used (this takes precedence over the information contained in the document itself).
• Also returned: the content length (can be used to display a progress bar).

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8
    Content-Length: 3046
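
The Content-Type header follows the same MIME conventions as email, so Python's `email` package can take it apart. This is one possible approach among several; the function name is made up for this sketch.

```python
from email.message import Message


def content_type_and_charset(header_value: str):
    """Return (MIME type, character set) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Both results are lowercased; the charset is None when absent.
    return msg.get_content_type(), msg.get_content_charset()


page = content_type_and_charset("text/html; charset=UTF-8")
image = content_type_and_charset("image/jpeg")
```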

HTTP, Server, Cookies

important

A cookie (in networking jargon) is information in the form of key/value pairs that a Web server asks a Web client to keep and retransmit with each HTTP request (for a given domain name).
• Can be used to keep information on a user as she is visiting a Web site, between visits, etc.: electronic cart, identifier, and so on.
• Practically speaking, most often only stores a session identifier, connected, on the server side, to all session information (logged in or not, user name, data).
• Simulates the notion of session, absent from HTTP itself.

Server response header:
    Set-Cookie: session-token=RJYBsG//azkfZrRazQ3SPQhlo1FpkQka2; path=/; domain=.amazon.de; expires=Fri Oct 17 09:35:04 2008 GMT
Subsequent client request header:
    Cookie: session-token=RJYBsG//azkfZrRazQ3SPQhlo1FpkQka2
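
The session mechanism can be imitated with a dictionary on the server side. Everything in this sketch is hypothetical (the `SESSIONS` store, the function names, the user name); it only illustrates that the cookie carries an identifier while the data stays on the server.

```python
import secrets

SESSIONS = {}  # server-side store: session token -> session information


def start_session(user_name: str) -> str:
    """Create a session and return the Set-Cookie header that announces it."""
    token = secrets.token_hex(16)  # random, hard-to-guess identifier
    SESSIONS[token] = {"user": user_name}
    return f"Set-Cookie: session-token={token}; path=/"


def session_from_request(cookie_header: str):
    """Look up the session referenced by a Cookie request header, if any."""
    _, _, pairs = cookie_header.partition(":")
    for pair in pairs.split(";"):
        key, _, value = pair.strip().partition("=")
        if key == "session-token":
            return SESSIONS.get(value)
    return None


set_cookie = start_session("alice")
token = set_cookie.split("=", 1)[1].split(";")[0]
session = session_from_request(f"Cookie: session-token={token}")
```

The client only ever sees the token; the user name and any other session data never leave the server.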

Proxies
A proxy is an intermediate server between client and server.
• Can be used on the client side or the server side.
• Applications:
    • Filter or censor the Web (employers, schools, totalitarian regimes, parental control, etc.)
    • Keep a log of activity
    • Keep a cache (shared across several users)
    • Anonymisation, to hide the true origin of the request
    • Other modifications: translation, etc.


TCP Protocol (Transport Layer)
The Transmission Control Protocol (TCP) provides reliable, ordered, error-checked delivery of a stream of bytes between client and server.
• Establish a connection between client and server
• Transmit a sequence of bytes, split into packets
• Make sure the packets arrive in the right order
• Get confirmation from the other side
• Retransmit lost packets
• Check for errors in the packets by a checksum
• Control for congestion
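
Three of these tasks (splitting into packets, restoring the order, checking for errors) can be imitated in a toy model. This is not how TCP is implemented: the packet layout and function names are invented, CRC-32 stands in for TCP's own 16-bit checksum, and connections, acknowledgements, retransmission and congestion control are left out.

```python
import random
import zlib


def to_packets(data: bytes, size: int = 4):
    """Split data into (sequence number, checksum, payload) packets."""
    packets = []
    for seq, start in enumerate(range(0, len(data), size)):
        payload = data[start:start + size]
        packets.append((seq, zlib.crc32(payload), payload))
    return packets


def reassemble(packets):
    """Reorder packets by sequence number and drop those failing the checksum."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    corrupted = [seq for seq, checksum, payload in ordered
                 if zlib.crc32(payload) != checksum]
    data = b"".join(payload for seq, checksum, payload in ordered
                    if seq not in corrupted)
    return data, corrupted


packets = to_packets(b"hello, world")
random.shuffle(packets)  # the network may deliver packets in any order
data, corrupted = reassemble(packets)
```

In a real connection, the sequence numbers reported as corrupted or missing would trigger a retransmission by the sender.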


Internet Protocol (IP) (Internet Layer; protocols: IPv4, IPv6, ...)
The Internet Protocol (IP) has the task of delivering packets from the source host to the destination host solely based on the IP addresses in the packet headers.

(Diagram: Client -> intermediate routers -> Server)

Summary: Internet Protocol Suite
Client and server communicate through:
• Application Layer: HTTP
• Transport Layer: TCP
• Internet Layer: IP
