✔ [Python] Printing HTML content

KillerPikachu · May 29, 2010

Hey, I'm trying to use read() to read the response to a GET request from a network object.
However, I only ever get the source up to the <meta> tag, not the complete source code.

So I'm wondering: is this down to read(), to the request I'm sending, or what am I doing wrong? :/

Roughly something like this:

Code:

import urllib.request

obj = urllib.request.urlopen('http://content.de/index.php')
html_content = obj.read()  # bytes in Python 3

CPoly · May 29, 2010

I have no idea about Python, but could it be that the whole thing is buffered? Then read() would only give you the first block. See what you get when you call read() again.

KillerPikachu · May 29, 2010

That occurred to me too, but it doesn't work. It just outputs the same thing again.

Edit: OK, the meta tag contains an http-equiv directive pointing to another PHP file. Can I work around that somehow?
Is it even taken into account?

Rubbe · June 8, 2010

Hi.

From the documentation for urllib.urlopen:

One caveat: the read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.

Just pass an explicit size to read(), roughly like this:

Python:

html = b''  # read() returns bytes under urllib.request (Python 3)
data = obj.read(200)
while data:  # an empty result means the stream is exhausted
    html += data
    data = obj.read(200)

And no, the <meta/> tags are not followed.

CU
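
Since urlopen does not act on a <meta http-equiv="refresh"> tag, one way to work around it is to pull the target out of the fetched page yourself and then request that URL separately. The following is only a rough sketch using the standard library: the names MetaRefreshFinder and find_meta_refresh are made up for illustration, and it assumes the tag's content attribute has the usual "seconds; url=target" form.

Python:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remembers the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get('http-equiv') or '').lower() != 'refresh':
            return
        # Assumed format of content: "0; url=some/page.php"
        content = attrs.get('content') or ''
        _, _, rest = content.partition(';')
        key, _, value = rest.partition('=')
        if key.strip().lower() == 'url':
            self.target = value.strip().strip('\'"')


def find_meta_refresh(html_text, base_url):
    """Return the absolute refresh target, or None if the page has none."""
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    if finder.target is None:
        return None
    # Resolve relative targets against the URL the page came from.
    return urljoin(base_url, finder.target)
```

To use it, decode the bytes from read() first (the encoding is a guess unless the server states one, e.g. html_content.decode('utf-8', errors='replace')), call find_meta_refresh(text, 'http://content.de/index.php'), and if the result is not None pass it to urllib.request.urlopen again.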
