PRAW is a Snoo's best friend... maybe.
Streaming data from Reddit is surprisingly easy with PRAW.
Create a read-only Reddit instance
We need keys. Create a new app of type "script" at https://www.reddit.com/prefs/apps
import praw
client_id = 'XXXX'
client_secret = 'XXXX'
user_agent = 'Streaming tutorial thingy (by u/impshum)'
reddit = praw.Reddit(client_id=client_id, client_secret=client_secret, user_agent=user_agent)
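Hard-coding keys is fine for a quick test, but they leak easily once you share the script. Here is a minimal sketch that reads them from environment variables instead. The variable names (REDDIT_CLIENT_ID and friends) and both helper functions are made up for this example; PRAW does not require them.

```python
import os


def reddit_settings(env=os.environ):
    """Collect praw.Reddit keyword arguments from environment variables.

    REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET and REDDIT_USER_AGENT are
    hypothetical names chosen for this sketch.
    """
    required = ('REDDIT_CLIENT_ID', 'REDDIT_CLIENT_SECRET')
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {
        'client_id': env['REDDIT_CLIENT_ID'],
        'client_secret': env['REDDIT_CLIENT_SECRET'],
        'user_agent': env.get('REDDIT_USER_AGENT', 'Streaming tutorial thingy'),
    }


def make_reddit():
    """Build a read-only Reddit instance from the environment."""
    import praw  # imported here so reddit_settings() works without PRAW installed
    return praw.Reddit(**reddit_settings())
```

With the variables exported in your shell, reddit = make_reddit() would stand in for the hard-coded version above.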
Stream comments
for comment in reddit.subreddit('all').stream.comments():
    print(comment)
Explore
I'm sure we're all interested in what else is in there. We can use pprint to find out just that.
import pprint
for comment in reddit.subreddit('all').stream.comments():
    pprint.pprint(vars(comment))
    break
Which pretty-prints this lovely pile of data (a Python dict rather than real JSON), ready to pick apart and play with.
{'_fetched': True,
'_info_params': {},
'_mod': None,
'_reddit': <praw.reddit.Reddit object at 0x102c30898>,
'_replies': [],
'_submission': None,
'approved_at_utc': None,
'approved_by': None,
'archived': False,
'author': Redditor(name='impshum'),
'author_flair_background_color': '',
'author_flair_css_class': None,
'author_flair_richtext': [{'e': 'text', 't': '2 minutes as a carrot'}],
'author_flair_template_id': None,
'author_flair_text': '2 minutes as a carrot',
'author_flair_text_color': 'dark',
'author_flair_type': 'richtext',
'author_fullname': 't2_3lgbz',
'author_patreon_flair': False,
'banned_at_utc': None,
'banned_by': None,
'body': "Hey look it's a comment by me. How nice.",
'body_html': '<div class="md"><p>Hey look it&#39;s a comment by me. How '
             'nice.</p>\n'
             '</div>',
'can_gild': True,
'can_mod_post': False,
'collapsed': False,
'collapsed_reason': None,
'controversiality': 0,
'created': 1545381925.0,
'created_utc': 1545353125.0,
'distinguished': None,
'downs': 0,
'edited': False,
'gilded': 0,
'gildings': {'gid_1': 0, 'gid_2': 0, 'gid_3': 0},
'id': 'ec7vzie',
'is_submitter': True,
'likes': None,
'link_author': 'impshum',
'link_id': 't3_9vu3e3',
'link_permalink': 'https://www.reddit.com/r/recycledrobot/comments/9vu3e3/lorem_ipsum_is_simply_dummy_text_of_the_printing/',
'link_title': 'Lorem Ipsum is simply dummy text of the printing and '
"typesetting industry. Lorem Ipsum has been the industry's "
'standard dummy text ever since the 1500s',
'link_url': 'https://i.imgur.com/x4RUp9d.jpg',
'mod_note': None,
'mod_reason_by': None,
'mod_reason_title': None,
'mod_reports': [],
'name': 't1_ec7vzie',
'no_follow': True,
'num_comments': 1,
'num_reports': None,
'over_18': False,
'parent_id': 't3_9vu3e3',
'permalink': '/r/recycledrobot/comments/9vu3e3/lorem_ipsum_is_simply_dummy_text_of_the_printing/ec7vzie/',
'quarantine': False,
'removal_reason': None,
'report_reasons': None,
'saved': False,
'score': 1,
'score_hidden': False,
'send_replies': True,
'stickied': False,
'subreddit': Subreddit(display_name='recycledrobot'),
'subreddit_id': 't5_35zp4',
'subreddit_name_prefixed': 'r/recycledrobot',
'subreddit_type': 'public',
'ups': 1,
'user_reports': []}
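The keys starting with an underscore in that dump are PRAW's internal bookkeeping, not data from Reddit. A small sketch for hiding them while exploring follows. The helper name public_fields is made up here, and a SimpleNamespace stands in for a real comment so the example runs without a connection.

```python
from types import SimpleNamespace


def public_fields(obj):
    """Return the attributes of obj whose names do not start with '_'."""
    return {key: value for key, value in vars(obj).items()
            if not key.startswith('_')}


# Stand-in object with made-up values; with PRAW you would pass a
# comment taken from the stream instead.
fake_comment = SimpleNamespace(_fetched=True, _reddit=None,
                               id='ec7vzie', score=1)
print(sorted(public_fields(fake_comment)))  # ['id', 'score']
```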
So let's get some specific data from all of this. I mean there's tons of it to wade through.
for comment in reddit.subreddit('all').stream.comments():
    created = comment.created_utc
    body = comment.body
    print(f'{created} {body}')
Swap comment and comments with submission and submissions to get submissions if you want. Careful, as they have a slightly different format to the comments: a submission has a title and selftext rather than a body, for example. You can change 'all' to any subreddit you like, apart from private ones you have no access to (obviously).
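Since comments and submissions keep their text in different attributes, one way to handle both is a small helper that falls back from body to title and selftext. This is only a sketch: the name summarise and the stand-in objects are invented for the example.

```python
from types import SimpleNamespace


def summarise(item):
    """Return 'created_utc text' for a comment or a submission.

    Comments carry their text in `body`; submissions have `title` and
    `selftext` instead, so fall back to those when `body` is missing.
    """
    text = getattr(item, 'body', None)
    if text is None:
        title = getattr(item, 'title', '')
        selftext = getattr(item, 'selftext', '')
        text = f'{title} {selftext}'.strip()
    return f'{item.created_utc} {text}'


# Stand-ins with made-up values so the sketch runs offline.
fake_comment = SimpleNamespace(created_utc=1545353125.0, body='Hey look')
fake_submission = SimpleNamespace(created_utc=1545353125.0,
                                  title='Lorem Ipsum', selftext='')
print(summarise(fake_comment))
print(summarise(fake_submission))
```

In the real loop you would call print(summarise(submission)) for each item coming out of reddit.subreddit('all').stream.submissions().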
Sifting through the text
We can check if certain words appear in the body. This only prints comments which contain all of the words in the triggers list.
triggers = ['love', 'wooden', 'cats']

print('\nStarting the stream\n')
for comment in reddit.subreddit('all').stream.comments():
    created = comment.created_utc
    author = comment.author.name
    sub = comment.subreddit
    body = comment.body
    link = comment.permalink
    # Only print when every trigger word appears somewhere in the body
    if all(word in body for word in triggers):
        print(f'{created} {author} {sub} {body} {link}')
If we now comment somewhere exposed to r/all, the stream should pick it up.
Starting the stream
1545382804.0 impshum recycledrobot I love wooden cats /r/recycledrobot/comments/9qqxzc/sticky_what_the_stick/ec8nfpi/
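One thing to keep in mind: the in check above is a case-sensitive substring test, so "Love" is missed while "gloves" counts as a hit for 'love'. If that matters to you, here is a sketch of a stricter check using whole-word, case-insensitive matching. The function name contains_all is made up for this example.

```python
import re


def contains_all(body, triggers):
    """True if every trigger appears in body as a whole word, ignoring case."""
    return all(
        re.search(r'\b' + re.escape(word) + r'\b', body, re.IGNORECASE)
        for word in triggers
    )


triggers = ['love', 'wooden', 'cats']
print(contains_all('I LOVE wooden cats', triggers))      # True
print(contains_all('Wooden gloves for cats', triggers))  # False
```

Dropping it into the loop would mean replacing the all(...) condition with contains_all(body, triggers).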
Sending email notifications
With Yagmail we can send emails in seconds. Notice I've added if created > start: so it only picks up comments made after we start the program.
import praw
from time import time, strftime, gmtime
import yagmail

client_id = 'XXXX'
client_secret = 'XXXX'
user_agent = 'Streaming tutorial thingy (by u/impshum)'

gmailusername = 'username@gmail.com'
gmailpassword = 'XXXX'
send_to_email = 'XXXX'
email_subject = 'Reddit Alert'

reddit = praw.Reddit(client_id=client_id, client_secret=client_secret, user_agent=user_agent)
yag = yagmail.SMTP(gmailusername, gmailpassword)

triggers = ['love', 'wooden', 'cats']
start = time()

print('\nStarting the stream\n')
for comment in reddit.subreddit('all').stream.comments():
    created = comment.created_utc
    # Ignore anything posted before the script started
    if created > start:
        created = strftime("%d %b %Y %H:%M:%S", gmtime(created))
        author = comment.author.name
        sub = comment.subreddit
        body = comment.body
        link = comment.permalink
        if all(word in body for word in triggers):
            print(f'{created}: r/{sub} - {body}')
            msg = f'Author: u/{author}\nSub: r/{sub}\nDate: {created}\n\n{body}\n\nhttps://reddit.com{link}'
            yag.send(send_to_email, email_subject, msg)
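A possible rough edge in the script above: comment.author can be None (for example when the account has been deleted), in which case comment.author.name raises an AttributeError and stops the stream. Here is a sketch of building the message in a helper that tolerates that. The name build_message and the '[deleted]' placeholder are my own choices, and the stand-in comment uses made-up values.

```python
from time import gmtime, strftime
from types import SimpleNamespace


def build_message(comment):
    """Format the alert email body, tolerating a missing author."""
    author = comment.author.name if comment.author else '[deleted]'
    created = strftime('%d %b %Y %H:%M:%S', gmtime(comment.created_utc))
    return (f'Author: u/{author}\nSub: r/{comment.subreddit}\n'
            f'Date: {created}\n\n{comment.body}\n\n'
            f'https://reddit.com{comment.permalink}')


# Stand-in comment so the sketch runs without Reddit or Gmail.
fake_comment = SimpleNamespace(author=None, created_utc=0,
                               subreddit='recycledrobot',
                               body='I love wooden cats',
                               permalink='/r/recycledrobot/comments/abc/')
print(build_message(fake_comment))
```

Recent PRAW versions also accept stream.comments(skip_existing=True), which skips the backlog the stream would otherwise start with and could replace the created > start check.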
Thanks for reading. x