Memory Leak in html2text 2014.4.5 #13
Comments
I think this may also be related to aaronsw/html2text#78. Thank you so much for reporting this in such detail, especially the heapy output. Also, is this happening on |
I'm afraid I haven't tested with Python 3 :-( Some extra information I forgot to note:
|
@mcepl since you've looked into aaronsw/html2text#78, any comments? |
OK we have this at the
Perhaps it would help to throw away html_to_text on each run of your loop and construct a new object for each HTML document processed? But that's probably horribly slow. I will take a look at whether we should throw away |
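To illustrate the "new object per document" idea, here is a minimal sketch. It does not use html2text itself: `StatefulConverter` is a hypothetical stand-in built on the standard library's `HTMLParser` (Python 3 import path) that, by assumption, keeps its output in an `outtext` list that is never cleared. The helper name `convert_with_fresh_instances` is also made up for this example.

```python
from html.parser import HTMLParser


class StatefulConverter(HTMLParser):
    """Hypothetical converter that keeps its output on the instance."""

    def __init__(self):
        super().__init__()
        self.outtext = []  # assumption: never cleared between documents

    def handle_data(self, data):
        # Collect every text node the parser reports.
        self.outtext.append(data)

    def handle(self, html):
        self.feed(html)
        self.close()
        return "".join(self.outtext)


def convert_with_fresh_instances(documents):
    # One converter per document, so no output carries over.
    return [StatefulConverter().handle(doc) for doc in documents]


docs = ["<p>first</p>", "<p>second</p>"]

shared = StatefulConverter()
reused = [shared.handle(doc) for doc in docs]
fresh = convert_with_fresh_instances(docs)

print(reused)  # ['first', 'firstsecond']
print(fresh)   # ['first', 'second']
```

The reused instance returns the first document's text again as part of the second result, while the fresh instances stay independent. Whether constructing a new object per document is too slow in practice would need to be measured.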
@mcepl @OOPMan
Probably |
I recently added html2text 2014.4.5 to my project and have been using it to convert HTML generated from Jinja2 templates into text. I attach the HTML and the text version of said HTML to emails constructed using the standard email.mime classes.
I added html2text amidst some other changes, so it took me a little time to trace the memory leak that started occurring back to html2text:
The above information was captured using the heapy component of Guppy-PE, following the steps detailed in http://python.dzone.com/articles/diagnosing-memory-leaks-python and http://www.smira.ru/wp-content/uploads/2011/08/heapy.html
As you can see, the contents of ['outtext'] are huge and, based on inspection of the data itself (see the last file referenced below), basically consist of the same text repeated over and over. This would seem to indicate some kind of looping error.
I'm not sure if it is relevant to this issue, but every now and then html2text fails after reaching line 360 of /usr/lib64/python2.7/HTMLParser.py:
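For anyone who wants to check for this kind of growth without heapy, the sketch below uses the standard library's `tracemalloc` module (Python 3.4+, so not the Python 2.7 setup described here). It is only an illustration: `RetainingParser` and `measure_growth` are hypothetical names, and the parser is a stand-in that retains text by assumption, not html2text's actual code.

```python
import tracemalloc
from html.parser import HTMLParser


class RetainingParser(HTMLParser):
    """Hypothetical parser that retains every text chunk it sees."""

    def __init__(self):
        super().__init__()
        self.outtext = []  # assumption: grows for the life of the object

    def handle_data(self, data):
        self.outtext.append(data)


def measure_growth(parser, html, runs):
    """Return the change in traced memory (bytes) after feeding html `runs` times."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(runs):
        parser.feed(html)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


html = "<p>" + "x" * 1000 + "</p>"
parser = RetainingParser()
growth = measure_growth(parser, html, 200)

print("retained chunks:", len(parser.outtext))
print("traced growth in bytes:", growth)
```

If the retained buffer is the cause, the traced growth should scale roughly with the number of documents fed through the same object.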
raise AssertionError("we should not get here!")
On a final note, I have replicated both of these issues using both Python 2.7.5 64-bit and PyPy 2.3.0 64-bit.
For your reference as to the context, please see the following pastebin links:
send_mail: http://pastebin.com/hHHh1fUN
email_tasks.py (used by send_mail): http://pastebin.com/Eqcsk23X
email_template: http://pastebin.com/XWS4VreU
base_email_template (used by email_template): http://pastebin.com/X7GfT1LJ
contents of ['outtext']: http://www.mediafire.com/view/6uoj861r59oxme9/problemcontents.txt
I have not done any investigation yet into the exact cause of this issue with html2text, although I hope to do so tomorrow.
For now, hopefully this information will prove useful in determining the source of the issue.