Streaming Reddit With Python



PRAW is a Snoo's best friend... maybes.

Streaming data from Reddit is surprisingly easy with PRAW.

Create a read-only Reddit instance

We need API keys. Create a new app of type "script" at https://www.reddit.com/prefs/apps, then copy its client ID and secret.

import praw

client_id = 'XXXX'
client_secret = 'XXXX'
user_agent = 'Streaming tutorial thingy (by u/impshum)'

reddit = praw.Reddit(client_id=client_id, client_secret=client_secret, user_agent=user_agent)
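Hard-coding keys is fine for a quick test, but it's easy to leak them by accident. Here's a minimal sketch of one alternative, assuming the keys live in environment variables. The variable names (REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT) and the helper load_reddit_credentials are made up for this example and are not something PRAW defines.

```python
import os


def load_reddit_credentials(env=None):
    """Collect praw.Reddit keyword arguments from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    env = os.environ if env is None else env
    names = {
        'client_id': 'REDDIT_CLIENT_ID',
        'client_secret': 'REDDIT_CLIENT_SECRET',
        'user_agent': 'REDDIT_USER_AGENT',
    }
    # Fail early with a readable message if anything is unset or empty
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f'Missing environment variables: {", ".join(missing)}')
    return {key: env[var] for key, var in names.items()}
```

With those variables set, something like reddit = praw.Reddit(**load_reddit_credentials()) should give the same read-only instance as above.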

Stream comments

for comment in reddit.subreddit('all').stream.comments():
    print(comment)

Explore

I'm sure we're all interested in what else is in there. We can use pprint to find out.

import pprint

for comment in reddit.subreddit('all').stream.comments():
    pprint.pprint(vars(comment))
    break

Which pprints this lovely pile of attributes (a Python dict rather than actual JSON), ready to pick apart and play with.

{'_fetched': True,
'_info_params': {},
'_mod': None,
'_reddit': <praw.reddit.Reddit object at 0x102c30898>,
'_replies': [],
'_submission': None,
'approved_at_utc': None,
'approved_by': None,
'archived': False,
'author': Redditor(name='impshum'),
'author_flair_background_color': '',
'author_flair_css_class': None,
'author_flair_richtext': [{'e': 'text', 't': '2 minutes as a carrot'}],
'author_flair_template_id': None,
'author_flair_text': '2 minutes as a carrot',
'author_flair_text_color': 'dark',
'author_flair_type': 'richtext',
'author_fullname': 't2_3lgbz',
'author_patreon_flair': False,
'banned_at_utc': None,
'banned_by': None,
'body': "Hey look it's a comment by me. How nice.",
'body_html': '<div class="md"><p>Hey look it&#39;s a comment by me. How '
            'nice.</p>\n'
            '</div>',
'can_gild': True,
'can_mod_post': False,
'collapsed': False,
'collapsed_reason': None,
'controversiality': 0,
'created': 1545381925.0,
'created_utc': 1545353125.0,
'distinguished': None,
'downs': 0,
'edited': False,
'gilded': 0,
'gildings': {'gid_1': 0, 'gid_2': 0, 'gid_3': 0},
'id': 'ec7vzie',
'is_submitter': True,
'likes': None,
'link_author': 'impshum',
'link_id': 't3_9vu3e3',
'link_permalink': 'https://www.reddit.com/r/recycledrobot/comments/9vu3e3/lorem_ipsum_is_simply_dummy_text_of_the_printing/',
'link_title': 'Lorem Ipsum is simply dummy text of the printing and '
             "typesetting industry. Lorem Ipsum has been the industry's "
             'standard dummy text ever since the 1500s',
'link_url': 'https://i.imgur.com/x4RUp9d.jpg',
'mod_note': None,
'mod_reason_by': None,
'mod_reason_title': None,
'mod_reports': [],
'name': 't1_ec7vzie',
'no_follow': True,
'num_comments': 1,
'num_reports': None,
'over_18': False,
'parent_id': 't3_9vu3e3',
'permalink': '/r/recycledrobot/comments/9vu3e3/lorem_ipsum_is_simply_dummy_text_of_the_printing/ec7vzie/',
'quarantine': False,
'removal_reason': None,
'report_reasons': None,
'saved': False,
'score': 1,
'score_hidden': False,
'send_replies': True,
'stickied': False,
'subreddit': Subreddit(display_name='recycledrobot'),
'subreddit_id': 't5_35zp4',
'subreddit_name_prefixed': 'r/recycledrobot',
'subreddit_type': 'public',
'ups': 1,
'user_reports': []}
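The dump above also includes PRAW's private bookkeeping (the keys starting with an underscore). Here's a small sketch that filters those out. The helper public_attributes and the FakeComment stand-in are invented for illustration so it runs without touching Reddit.

```python
def public_attributes(obj):
    """Return the instance attributes of obj whose names do not start with '_'."""
    return {name: value for name, value in vars(obj).items() if not name.startswith('_')}


class FakeComment:
    """Stand-in for a PRAW comment so the sketch runs without network access."""

    def __init__(self):
        self._fetched = True
        self.body = 'I love wooden cats'
        self.score = 1


print(sorted(public_attributes(FakeComment())))  # ['body', 'score']
```

With a real comment from the stream, pprint.pprint(public_attributes(comment)) should print the same dict as before minus the underscore keys.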

So let's get some specific data from all of this. I mean there's tons of it to wade through.

for comment in reddit.subreddit('all').stream.comments():
    created = comment.created_utc
    body = comment.body
    print(f'{created} {body}')

Swap comment and comments for submission and submissions to stream submissions instead. Careful, as they have a slightly different set of attributes to comments (a submission has a title and selftext rather than a body, for example). You can change 'all' to any subreddit you like, apart from private ones you have no access to (obviously).
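To make the swap concrete, here's a rough sketch of a submission version. The helper describe_submission is my own name, and the SimpleNamespace object is only a stand-in so the example runs offline. It assumes the submission has created_utc, subreddit and title attributes, which is what PRAW submissions normally carry.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def describe_submission(submission):
    """Format one line from a submission's created_utc, subreddit and title."""
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return f'{created:%Y-%m-%d %H:%M:%S} r/{submission.subreddit} {submission.title}'


# Stand-in object so the sketch runs offline; a real loop would look like
#   for submission in reddit.subreddit('all').stream.submissions():
#       print(describe_submission(submission))
fake = SimpleNamespace(created_utc=1545353125.0, subreddit='recycledrobot',
                       title='Lorem Ipsum is simply dummy text')
print(describe_submission(fake))
```

Formatting created_utc as a UTC datetime keeps the output readable without depending on the machine's local timezone.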

Sifting through the text

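We can check if certain words appear in the body. This only prints comments which contain all words in the triggers list. Note that it's a case-sensitive substring check, so 'Love' won't match 'love', and 'cats' will match inside 'scats'.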

triggers = ['love', 'wooden', 'cats']

print('\nStarting the stream\n')

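for comment in reddit.subreddit('all').stream.comments():
    created = comment.created_utc
    # author is None when the account has been deleted
    author = comment.author.name if comment.author else '[deleted]'
    sub = comment.subreddit
    body = comment.body
    link = comment.permalink

    # Only print comments whose body contains every trigger word
    if all(word in body for word in triggers):
        print(f'{created} {author} {sub} {body} {link}')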

If we comment somewhere exposed to r/all, the stream should pick it up.

Starting the stream

1545382804.0 impshum recycledrobot I love wooden cats /r/recycledrobot/comments/9qqxzc/sticky_what_the_stick/ec8nfpi/
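The check above is a plain substring test, so it cares about case and will happily match a trigger buried inside a longer word. If that's not what you want, here's one possible variation that compares whole words and ignores case. The helper name contains_all_triggers and the word-splitting regex are my own choices and not part of PRAW.

```python
import re


def contains_all_triggers(body, triggers):
    """Return True if every trigger appears in body as a whole word, ignoring case."""
    # Split the lower-cased body into words made of letters, digits and apostrophes
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    return all(trigger.lower() in words for trigger in triggers)


triggers = ['love', 'wooden', 'cats']
print(contains_all_triggers('I LOVE wooden cats!', triggers))  # True
print(contains_all_triggers('I love wooden scats', triggers))  # False
```

In the loop, the condition would then become if contains_all_triggers(body, triggers):.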

Sending email notifications

With yagmail we can send emails in seconds. Notice I've added if created > start: so it only picks up comments created after we start the program. Gmail will usually want an app password here rather than your normal account password.

import praw
from time import time, strftime, gmtime
import yagmail

client_id = 'XXXX'
client_secret = 'XXXX'
user_agent = 'Streaming tutorial thingy (by u/impshum)'

gmailusername = 'username@gmail.com'
gmailpassword = 'XXXX'
send_to_email = 'XXXX'
email_subject = 'Reddit Alert'

reddit = praw.Reddit(client_id=client_id, client_secret=client_secret, user_agent=user_agent)

yag = yagmail.SMTP(gmailusername, gmailpassword)

triggers = ['love', 'wooden', 'cats']

start = time()

print('\nStarting the stream\n')

for comment in reddit.subreddit('all').stream.comments():
    created = comment.created_utc

    # Ignore the backlog of comments made before the program started
    if created > start:
        created = strftime("%d %b %Y %H:%M:%S", gmtime(created))
        # author is None when the account has been deleted
        author = comment.author.name if comment.author else '[deleted]'
        sub = comment.subreddit
        body = comment.body
        link = comment.permalink

        # Only email comments whose body contains every trigger word
        if all(word in body for word in triggers):
            print(f'{created}: r/{sub} - {body}')
            msg = f'Author: u/{author}\nSub: r/{sub}\nDate: {created}\n\n{body}\n\nhttps://reddit.com{link}'
            yag.send(send_to_email, email_subject, msg)
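The created > start check works, but PRAW's stream generators also take a skip_existing argument that should do the same job for us. Here's a minimal sketch. The wrapper stream_new_comments is a name I've made up, and the MagicMock is only there so the example runs without real credentials.

```python
from unittest.mock import MagicMock


def stream_new_comments(reddit, subreddit_name='all'):
    """Yield only comments that arrive after the stream starts.

    skip_existing=True asks PRAW to discard the backlog it fetches on the
    first request instead of yielding it.
    """
    subreddit = reddit.subreddit(subreddit_name)
    yield from subreddit.stream.comments(skip_existing=True)


# Offline demonstration with a mock in place of a real praw.Reddit instance
fake_reddit = MagicMock()
fake_reddit.subreddit.return_value.stream.comments.return_value = iter(['c1', 'c2'])
print(list(stream_new_comments(fake_reddit)))  # ['c1', 'c2']
```

With the real reddit instance, for comment in stream_new_comments(reddit): would replace both the loop header and the timestamp comparison.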

Thanks for reading. x

Resources