Introduction
About two months ago, I created the Rare Books feed on BlueSky. Ever since, I’ve been meaning to release the specifics of the feed (that is, to publish my algorithm), but I didn’t have an outlet for side projects, curiosities, or longform updates like that. Hence this post, my first blog post. I’m about two decades late to the blogosphere, but that’s alright. I’ll try to make use of the format more often.
My motivation for the Rare Books feed was simple: I wanted a place where I could casually scroll through rare book and book collecting content. Why rare books? Well, I’m interested in them as a collector and bookseller. I’m always on the hunt for first editions or signed copies. I like to discuss book collecting as a pastime. I try to keep up with the trade by reviewing bookseller and auction catalogues and by reading news from major organizations like the Antiquarian Booksellers’ Association of America (ABAA) or the Independent Online Booksellers Association (IOBA). I’m also interested in how book collecting is portrayed in popular media. I like to hear from bibliophiles at every level of collecting, whether they’re amassing a vast library of notable high-spots or focusing on a particular niche or subject that may not be monetarily valuable but certainly holds intellectual or nostalgic value.
Social media and online forums provide some resources for finding related content. There are Facebook groups, subreddits, and so forth, but honestly, they were never very satisfying. So, I looked to BlueSky because it has a “custom feed” feature, which lets you apply your own content algorithms on the platform. If that seems strange, it’s because it is. I don’t know of any other social media platform that gives you so much control over the content you consume. I love BlueSky for that. I spend more time there than on any other platform these days. Yet I don’t compulsively check it like I used to with other social media. Why? I’m not sure I can articulate it in detail, but essentially, I think BlueSky is just conducive to a healthy relationship with social media. It’s also not owned by a weird billionaire… But that’s beside the point. Long story short: BlueSky is rad and you should make an account.
To create the Rare Books feed, I used SkyFeed, a third-party application for building custom feeds. I don’t want this post to become a tutorial on SkyFeed, so I won’t get into the specifics of how it works. It’s very user-friendly, though. You don’t need to know how to code to use SkyFeed. Some familiarity with regular expressions is helpful, but that’s the most technical aspect of the app. If you’re interested, you can learn more on BlueSky’s blog, which explains the tool and highlights its developer, Redsolver, who has done a great job maintaining it through various server expansions and influxes of users on BlueSky.
Breakdown of the Algorithm
Okay, so let’s get into the algorithm. It follows six straightforward steps. I’ll try to break them down as clearly and concisely as possible. By the end, you should have a better sense of why you see what you see when you scroll through the Rare Books feed.
Step 1: Inputting Content
I needed to input content to give my algorithm something to work with. On BlueSky, content includes posts, reposts, replies, and so on. In the SkyFeed app, you can select various sources for content: individual users’ posts, content on existing feeds, tagged posts, and more.
I made it so the Rare Books feed is compiled from the entire network on BlueSky over a five-day period. This means the feed reviews every post from the past five days in search of rare book content. I chose the entire network because I wanted to cast a wide net. If there are any conversations about rare books on BlueSky, I want to be able to find them. Using the whole network as an input gives me an opportunity to do so.
I chose a five-day window because I’ve found that it provides enough content to scroll for a long time, but not so much that loading the feed becomes painfully slow. I’ll address this challenge in more detail below, but suffice it to say that SkyFeed works best when you keep its computational workload as manageable as possible.
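SkyFeed handles this step through its interface, so no code is required. To make the five-day window concrete, though, here is a minimal Python sketch. The `Post` record and its fields are hypothetical stand-ins, not SkyFeed’s or BlueSky’s actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Post:
    # Hypothetical, simplified post record for illustration only.
    author: str
    text: str
    created_at: datetime

WINDOW = timedelta(days=5)  # the five-day lookback described above

def within_window(posts, now):
    """Keep only posts created within the five days before `now`."""
    cutoff = now - WINDOW
    return [p for p in posts if p.created_at >= cutoff]

now = datetime(2024, 10, 15, tzinfo=timezone.utc)
posts = [
    Post("a.example", "New arrivals in the shop", now - timedelta(days=1)),
    Post("b.example", "Last week's book fair recap", now - timedelta(days=6)),
]
recent = within_window(posts, now)
print([p.text for p in recent])  # ['New arrivals in the shop']
```

A wider window means more posts for every later step to process, which is the trade-off between volume and loading time described above.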
Step 2: Removing Replies
I removed all replies from the input. I didn’t make this decision lightly. After several weeks of assessing the quality of the feed, I noticed that a lot of its content consisted of replies that used book collecting jargon offhandedly rather than posts directly discussing rare books or book collecting. Of course, this omission sacrifices the replies that are in fact relevant, but I didn’t want whole conversations about other things appearing on the feed just because one person replied with something like, “I remember when he signed copies of his book at an event in town,” or “I bet you could learn more by visiting the special collections.” The feed wasn’t ruined by including replies, but after I omitted them, I found the content was, on the whole, more relevant.
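On the AT Protocol that BlueSky runs on, a reply is an ordinary post record whose optional `reply` field points to the post it answers. The sketch below mimics that with a hypothetical `reply_to` attribute and drops anything that has one. It illustrates the idea only; it is not how SkyFeed implements the option.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    text: str
    # Hypothetical stand-in for the record's reply reference: the parent
    # post's identifier for replies, None for top-level posts.
    reply_to: Optional[str] = None

def drop_replies(posts):
    """Keep only top-level posts; anything written as a reply is discarded."""
    return [p for p in posts if p.reply_to is None]

posts = [
    Post("Just catalogued a signed first edition."),
    Post("I bet you could learn more by visiting the special collections.",
         reply_to="parent-post-123"),  # placeholder identifier
]
kept = drop_replies(posts)
print([p.text for p in kept])  # ['Just catalogued a signed first edition.']
```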
Step 3: Regular Expressions
I created a list of regular expressions that capture words or phrases commonly used among booksellers and collectors. This step was the most technical part of the algorithm. It’s also been the most work, and in my opinion that work will continue for as long as the feed exists. Put simply, language changes, and social media platforms are dynamic spaces where the userbase evolves over time. There’s no accounting for the precise words people will use to discuss rare books in the future, which is why my list of regular expressions will need to be updated and maintained.
But to get things rolling, I considered which words or phrases most often appear in book collecting circles. To be honest, I haven’t been very scientific about my list. I’m just going off vibes and experience. I would say my familiarity with the jargon of the rare book world is satisfactory, though. I spend an inordinate amount of time discussing rare books, reading book collecting forums, reading catalogue descriptions, and writing the catalogues for Evening Land Books. I also wrote a free online glossary with contextualized definitions for over 300 book terms. That said, I’m sure there are more accurate ways of identifying the most quintessential rare book lingo; there’s a host of text-mining and LLM possibilities in that regard. I just haven’t bothered to try those things for this little side project (yet).
Instead, I took the words and phrases that I believe signal rare book content and added them to my algorithm. As I did this, I reviewed what sorts of content they were finding. Through dozens of iterations, I honed the current list (see Table 1). Over time, I’ll continue to add, replace, or remove words or phrases. This kind of maintenance is certainly laborious, but I think it’s worth it. It means I’ll continue to have control over the content I consume on BlueSky.
Table 1: Words and phrases that signal rare book content (alphabetical, reading down each column)
1st edition […] antiquarian | book seminar | huntington library
1st edition […] book | bookauction | ilab […] book
1st edition […] copy | bookhistory | incunab
1st edition […] hardcover | center for the […] book | inscribed 1st edition
1st edition […] novel | club edition […] book | inscribed first edition
1st edition […] rare | club edition […] rare | john carter brown library
abaa […] book | collectible […] book | manuscript […] volume
abebook | color engraving | manuscript library
antiquarian […] 1st edition | cover […] first edition | marble […] endpaper
antiquarian […] book | dust cover […] edition | marble […] volume
antiquarian […] bookseller | dust cover […] rare | modern 1st edition
antiquarian […] edition | dust jacket […] book | modern first edition
antiquarian […] first edition | dust jacket […] edition | morgan library
antiquarian […] rare | dust jacket […] rare | newberry library
aquatint | early edition […] rare | original […] dust jacket
biblio.com | early printing […] rare | photogravure
bibliographical society | engraving […] book | rare […] bookseller
book […] 1st edition | ephemera […] book | rare […] bookshop
book […] 1st edition | ephemera […] rare | rare […] bookstore
book […] antiquarian | extant cop | rare […] lithograph
book […] bibliographical | fine press | rare […] manuscript
book […] club edition | finebooksmagazine | rare […] woodcut
book […] early edition | first american edition | rare book
book […] early printing | first british edition | rarebook
book […] ephemera | first edition […] book | second printing […] rare
book […] first edition | first edition […] copy | signed […] 1st edition
book […] first edition | first edition […] hardback | signed […] first edition
book […] first printing | first edition […] hardcover | signed […] lithograph
book […] original binding | first edition […] novel | signed first edition
book […] woodcut | first edition […] rare | special collections
book auction | first english edition | true 1st edition
book catalogue | first printing […] book | true first edition
book collect […] rare | first printing […] rare | woodcut […] book
book collecting | folio society
book expert | grolier club
book histor | houghton library
With the list in place, my next two challenges were translating the words or phrases into efficient regular expressions and culling the list of regular expressions so it didn’t exhaust SkyFeed’s computational resources. Both challenges will continue as part of the maintenance of the feed. On the first challenge, I had numerous options for capturing words or phrases with regular expressions. Just so we’re clear, a regular expression is a string of ordinary characters and metacharacters that describes a pattern in text. For example, if you wanted to find any post that mentions “ephemera” and then “first edition”, you could use the regular expression “ephemera.*first edition”. The “.” matches any single character and the “*” means “zero or more of the preceding”, so “.*” essentially translates to “any number of characters in between”. In other words, “ephemera.*first edition” tells the algorithm to find any posts where 1) the word “ephemera” appears, 2) followed by any number of characters, and 3) followed by the phrase “first edition”. That’s just one example of a regular expression. If you’d like to learn more about them or practice using them, I’d try this website. It allows you to test regular expressions and see exactly what patterns they match.
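Here is that example pattern run through Python’s `re` module, assuming case-insensitive matching (an assumption about how the feed treats capitalization). It also shows two details worth knowing: order matters, and in Python, as in most regex engines by default, “.” does not match a line break, so both terms must sit on the same line unless a flag such as `re.DOTALL` is set. Whether SkyFeed behaves the same way on line breaks is not confirmed here, so it is worth testing.

```python
import re

# The example pattern from the text; IGNORECASE is an assumption about the feed.
pattern = re.compile(r"ephemera.*first edition", re.IGNORECASE)

posts = [
    "Ephemera tucked inside a first edition of Dracula",  # ephemera ... first edition
    "A first edition with some ephemera laid in",        # wrong order
    "Ephemera lot\nalso includes a first edition",       # terms on different lines
]
for text in posts:
    print(bool(pattern.search(text)), repr(text))
# Prints True, False, False: '.' stops at the newline in the third post.

# With re.DOTALL, '.' also matches newlines, so the third post matches too.
dotall = re.compile(r"ephemera.*first edition", re.IGNORECASE | re.DOTALL)
print(bool(dotall.search(posts[2])))  # True
```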
So far, I’ve come up with 31 regular expressions (see Table 2) that capture all the words and phrases in my list. Some are just keywords or phrases like “rare book”, “fine press”, or the stem “incunab” (which catches both “incunable” and “incunabula”). Others are far more complicated, combining several words or phrases and matching them in more than one order. These regular expressions have proven effective, but again, they require continual maintenance. Sometimes I’ll be scrolling the Rare Books feed and notice content that I don’t think really belongs. When that happens, I return to SkyFeed and tweak the regular expression that is letting the irrelevant content through.
Finally, these regular expressions don’t account for every word or phrase I’d like to capture. That’s because of the second challenge at this step: culling my list so it doesn’t exhaust SkyFeed’s computational resources. In other words, I can’t use too many regular expressions or the feed will take forever to load. I’ve found that SkyFeed seems to handle about 30 regular expressions, and even that is pushing it… And it’s a shame! I know there are a lot more words, phrases, and combinations I could use to identify rare book content, but it’s a balancing act between thoroughness and efficiency.
Table 2: Regular expressions for capturing rare book content
([\W]signed|inscribed) (1st|first) edition
([\W]signed|inscribed) (lithograph|ephemera)
([^a-z]book|hardback|rare).*(first edition|1st edition|early edition|antiquarian|first printing|second printing|early printing|\d\w\w printing|club edition|ephemera|book collect|dust jacket|woodcut|engraving|[^a-z]ilab|[^a-z]abaa[^a-z]|lithograph|original binding)
(antiquarian|first printing|second printing|early printing|\d\w\w printing|club edition|ephemera|book collect|dust jacket|woodcut|engraving|[^a-z]ilab|[^a-z]abaa[^a-z]|center for the|lithograph|original binding).*rare
(bibliographical|[\W]folio) society
(book|antiquarian|volume|text|novel|copy).*(first edition|1st edition)
(first edition|1st edition).*(novel|copy|volume|text|book|antiquarian)
(first edition|1st edition|early edition|antiquarian|first printing|second printing|early printing|\d\w\w printing|club edition|ephemera|book collect|dust jacket|woodcut|collect[\w]ble|engraving|[^a-z]ilab|[^a-z]abaa[^a-z]|center for the|lithograph|original binding).*([^a-z]book|hardback)
(grolier|caxton) club
(manuscript|newberry|john carter brown|morgan|houghton|huntington) library
(modern first|modern 1st)[\W]edition
(rare|[\W]abe)book
[^a-z]book (expert|auction|collecting|histor|seminar|catalog)
aquatint[^a-z]
biblio[.]com
bookauction
bookhistory
col(o|ou)r engraving
dust jacket.*(book|edition)
extant cop
fine press
finebooksmagazine
first (american|english|british) edition
incunab
manuscript.*volume
marble.*(endpaper|volume)
original dust (jacket|cover)
photogravure
rare book
special collections
true (1st|first) edition
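Many expressions in Table 2 guard a word with `[^a-z]` or `[\W]` so that it only matches when it is not preceded by a letter (or word character). The Python sketch below tests two of the table’s patterns, copied verbatim, against made-up posts, again assuming case-insensitive matching. The guard keeps “unsigned” and “notebook” out, but it also means a post that opens with “Book auction” would not match, because there is no character before “book” for `[^a-z]` to consume.

```python
import re

# Two patterns copied from Table 2; case-insensitive matching is assumed.
SIGNED = re.compile(r"([\W]signed|inscribed) (1st|first) edition", re.IGNORECASE)
BOOK_TERMS = re.compile(
    r"[^a-z]book (expert|auction|collecting|histor|seminar|catalog)", re.IGNORECASE
)

examples = [
    ("Just listed a signed first edition", SIGNED),     # space before 'signed'
    ("Sadly an unsigned first edition", SIGNED),        # 'n' before 'signed'
    ("My rare book collecting habit", BOOK_TERMS),      # space before 'book'
    ("Notebook collecting is my hobby", BOOK_TERMS),    # 'e' before 'book'
    ("Book auction results are in", BOOK_TERMS),        # nothing before 'Book'
]
for text, pattern in examples:
    print(bool(pattern.search(text)), text)
# Prints True, False, True, False, False
```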
Step 4: Inverted Regular Expressions
Next, I created a list of inverted regular expressions (see Table 3) that (it may surprise you) remove certain posts. This step is a bit controversial because it implies distinctions between closely associated areas of collecting. It also implies a definition of what constitutes a “rare book”. While I don’t usually subscribe to such hard rules or designations, I still don’t think I can part with my inverted regular expressions. They help ensure that the Rare Books feed captures what I want it to. Otherwise, the feed would be subsumed by other things.
Those “other things” mainly fall into five categories: 1) comic books, 2) vinyl records, 3) trading cards, 4) Dungeons & Dragons collectibles, and 5) independent authors promoting their books.
Here’s how I identified these categories: as I reviewed the content captured by my regular expressions in Table 2, I noticed a lot of overlap between the jargon of the rare book world and the jargon of these other communities. If it were just the occasional comic book post or indie author promoting their work, I wouldn’t have felt the need to remove them. But these subcommunities produce a lot of content on BlueSky, which is good; they just produced so much of it that it was drowning out everything else on my feed. So I had to remove content falling into these categories, and I did so with inverted regular expressions. That is, rather than including posts that match these expressions, the feed filters them out.
I suppose there’s one other category I filtered out, too: miscellaneous stuff. For example, the electronic music duo Autechre released an album called “Incunabula”. You’d be surprised how often it gets mentioned on BlueSky. There’s also an account that posts top trending words on BlueSky, and it consistently highlights words from Table 1, so a few times a week a “top trending” word cloud would show up on the Rare Books feed, which wasn’t relevant. This is why some seemingly random phrases appear in Table 3, such as “autechre” or “trending words”. They’re just anecdotal instances of overlapping diction that can’t be attributed to other bookish subcommunities or collecting interests. Yet they appeared on the Rare Books feed often enough for me to notice, so I filtered them out.
Table 3: Inverted regular expressions for filtering out irrelevant content
(roleplaying|role-playing|role playing)
[^a-z]I published the \w\w\w\w\w edition
[^a-z]I wrote the \w\w\w\w\w edition
autechre
collect[\w]ble.*card
comic book
comicbook
(pre-order|preorder)
sourcebook
strictly prohibited
trending words
vinyl
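Applying the inverted expressions amounts to a filter that discards any post matching at least one of them. Here is a minimal Python sketch using five patterns copied from Table 3; the sample posts are invented and case-insensitive matching is assumed.

```python
import re

# Five inverted patterns copied from Table 3; case-insensitive matching is assumed.
EXCLUDE = [re.compile(p, re.IGNORECASE) for p in (
    r"comic book",
    r"vinyl",
    r"autechre",
    r"(pre-order|preorder)",
    r"collect[\w]ble.*card",
)]

def drop_excluded(posts):
    """Remove every post that matches at least one inverted pattern."""
    return [text for text in posts if not any(p.search(text) for p in EXCLUDE)]

posts = [
    "Rare book fair this weekend, with a first edition Frankenstein on display",
    "Found a first edition of this comic book at the fair",
    "Rare book vibes while listening to Autechre's Incunabula",
    "Pre-order the first edition of my new novel!",
]
print(drop_excluded(posts))  # only the first post survives
```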
Step 5: Input List of Rare Book Posters
Alright, I guess I’ll pause here and note that the Rare Books feed is technically functional with steps 1 through 4. It sifts through everything on BlueSky over a five-day period, identifies content that uses the rare book jargon I’ve defined in my regular expressions, then filters out irrelevant content. For a while, I felt these steps were enough. They were providing a fair amount of content to keep me satisfied each day as I scrolled through the feed.
But then I began to notice some recurring accounts on the feed: individuals, booksellers, and institutions that regularly posted rare book content. Naturally, I began to follow them, and in turn, I noticed that some of their rare book posts were not appearing on the feed. This was because my list of regular expressions was limited for the sake of computational efficiency, so the feed was missing posts that were relevant to rare books and book collecting.
While it’s not a perfect solution, I decided to add another step to the algorithm: inputting all posts from accounts I have deemed Rare Book Posters. These are accounts I’ve vetted that almost always, if not always, post rare book and book collecting content. My list of Rare Book Posters can be viewed here. As I write this in October 2024, the list includes about twenty accounts. They are all booksellers, rare book institutions, or rare book librarians and archivists. Everything they post appears on the feed, whether it includes rare book language or not.
Of course, this doesn’t overcome the limitations of my regular expressions, but it does make the feed more robust without adding much computational stress to the system. As I continue to enjoy the feed, and as the BlueSky userbase continues to grow, I’ll vet more accounts and add them to the list. It’s just another way of capturing a larger slice of the relevant content.
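Conceptually, this step takes the union of two sets: posts caught by the regular expressions and everything posted by the vetted accounts. Below is a minimal Python sketch. The handles, post identifiers, and `Post` record are hypothetical placeholders, and collapsing duplicates by post identifier is an assumption about how the merge should behave.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Post:
    uri: str     # unique post identifier (placeholder values below)
    author: str  # poster's handle
    text: str

# Hypothetical handles standing in for the real Rare Book Posters list.
RARE_BOOK_POSTERS = {"examplebooks.bsky.social", "examplelibrary.bsky.social"}

def add_vetted_posts(matched, candidates):
    """Combine regex-matched posts with every post from a vetted account,
    keeping each post only once."""
    merged = {p.uri: p for p in matched}
    for p in candidates:
        if p.author in RARE_BOOK_POSTERS:
            merged.setdefault(p.uri, p)  # skip posts already matched by a regex
    return list(merged.values())

candidates = [
    Post("post-1", "examplebooks.bsky.social", "Open until six today"),
    Post("post-2", "someone.bsky.social", "A signed first edition!"),
    Post("post-3", "someone.bsky.social", "What's for lunch?"),
]
matched = [candidates[1]]  # pretend only post-2 matched a Table 2 expression
feed = add_vetted_posts(matched, candidates)
print(sorted(p.uri for p in feed))  # ['post-1', 'post-2']
```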
Step 6: Sort Output
Last but not least, I needed to sort the content the algorithm captured. In SkyFeed, you can sort feeds in several ways (by like count, reply count, repost count, randomly, and more). I chose to sort the Rare Books feed by Creation Date (in other words, newest posts first). I may change this option eventually, but I think it works best for feeds checked daily. With the newest posts on top, I never have to scroll past the stuff I’ve already seen.
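Sorting by creation date, newest first, is simply a descending sort on each post’s timestamp. A short Python illustration with invented posts:

```python
from datetime import datetime

# (text, creation time) pairs; the posts and timestamps are made up.
posts = [
    ("Signed first edition, just catalogued", datetime(2024, 10, 12, 9, 30)),
    ("Exhibition opening in special collections", datetime(2024, 10, 14, 16, 0)),
    ("New fine press arrivals", datetime(2024, 10, 13, 11, 15)),
]

# Newest first: sort on the timestamp in descending order.
feed = sorted(posts, key=lambda post: post[1], reverse=True)
for text, created in feed:
    print(created.isoformat(), text)
```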
Final Thoughts
That’s it! Six easy steps. Just to recap more concisely, the six steps to the algorithm are:
- Inputting everything on BlueSky over the last five days
- Removing replies
- Capturing relevant content with regular expressions
- Filtering out irrelevant content with regular expressions
- Inputting posts from especially bookish accounts
- Sorting the output
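Put together, the six steps form a simple pipeline. The Python sketch below strings them together on invented data. It is only an illustration under stated assumptions: the `Post` record, the handles, and the short pattern lists (small subsets of Tables 2 and 3) are placeholders; matching is assumed case-insensitive; and posts from vetted accounts are assumed to come from the same five-day window and to bypass the reply and pattern filters. SkyFeed itself is configured through its interface, not with code like this.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Post:
    uri: str
    author: str
    text: str
    created_at: datetime
    is_reply: bool = False

# Small illustrative subsets of Tables 2 and 3 (case-insensitive is an assumption).
INCLUDE = [re.compile(p, re.I) for p in (r"rare book", r"fine press", r"true (1st|first) edition")]
EXCLUDE = [re.compile(p, re.I) for p in (r"comic book", r"vinyl")]
RARE_BOOK_POSTERS = {"examplebooks.bsky.social"}  # hypothetical vetted handle

def build_feed(posts, now, window=timedelta(days=5)):
    # Step 1: everything from the last five days.
    recent = [p for p in posts if p.created_at >= now - window]
    # Step 2: remove replies.
    top_level = [p for p in recent if not p.is_reply]
    # Step 3: keep posts matching any include pattern.
    matched = [p for p in top_level if any(r.search(p.text) for r in INCLUDE)]
    # Step 4: drop posts matching any inverted pattern.
    kept = [p for p in matched if not any(r.search(p.text) for r in EXCLUDE)]
    # Step 5: add every recent post from vetted accounts, without duplicates.
    by_uri = {p.uri: p for p in kept}
    for p in recent:
        if p.author in RARE_BOOK_POSTERS:
            by_uri.setdefault(p.uri, p)
    # Step 6: newest posts first.
    return sorted(by_uri.values(), key=lambda p: p.created_at, reverse=True)

now = datetime(2024, 10, 15)
posts = [
    Post("p1", "a.example", "Rare book fair next week", now - timedelta(days=1)),
    Post("p2", "b.example", "Rare book prices vs. comic book prices", now - timedelta(days=2)),
    Post("p3", "c.example", "Fine press printing demo", now - timedelta(days=7)),
    Post("p4", "d.example", "Check the rare book room", now - timedelta(hours=3), is_reply=True),
    Post("p5", "examplebooks.bsky.social", "Open until six today", now - timedelta(hours=5)),
]
feed = build_feed(posts, now)
print([p.uri for p in feed])  # ['p5', 'p1']
```

Here p2 is removed by an inverted pattern, p3 falls outside the window, p4 is a reply, and p5 gets in only because its author is on the vetted list.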
This project was more involved than I had imagined when I first started, but I wouldn’t say it’s work. It’s just become part of my social media usage. Once I had established the basic steps of the algorithm, it wasn’t hard to tweak things to try to carve out better slices of rare book content on BlueSky. Overall, too, this project has been quite liberating. Back when I spent more time on Meta’s platforms, I often felt like I was being inundated with hot garbage. I’ve never felt that way with BlueSky, and I’ve especially never felt that way with the Rare Books feed because I get to choose the filters, parameters, inputs, and so forth. It’s a nice feeling (autonomy), one not typically felt on social media.
The custom feed feature is not perfect, though. The guardrails against overloading the app are difficult to work around. As it currently stands, the Rare Books feed takes about six seconds to load. I know that’s not great, but it’s the compromise I’m willing to make in order to use the number of regular expressions I think are necessary and to sift through the entire BlueSky network for as much relevant content as possible. I also believe the server challenges will be addressed in time. If more users start making custom feeds and BlueSky continues to grow, I bet there’ll be more investment in SkyFeed or other custom feed features.
Now let me conclude by directly addressing some potential audiences who may have bothered to read this far.
Firstly, to my fellow bibliophiles: please join BlueSky and post rare book content. Invite booksellers, rare book institutions, and collectors. If you primarily post rare book content, I’ll add you to my list of vetted Rare Book Posters. It would be great to see the rare book community grow on BlueSky.
Secondly, to my fellow academics: may I kindly suggest you toy around with custom feeds on BlueSky and consider assigning the activity to undergraduate students. I think our students would benefit tremendously from at least some exposure to custom feeds. I’ve observed that students desire more knowledge about social media and its impact(s) on their lives. By building custom feeds, they would be given the opportunity to learn more about content algorithms and some fundamental differences between platforms. They may also discover the value (as I have) of more autonomy online. I know I’ll be incorporating the exercise into classes I teach in the future. I’m confident my students will respond positively.