Content analysis is a term covering a range of research methods for studying documents (texts, audio, video, etc.) by assigning labels (or codes) to mark interesting pieces of content. Such analysis can be done entirely by hand (coding), or a computer can be trained to identify meaningful content based on a sample (a training set or training data), as in the fields of information retrieval (IR), text mining, stylometry, and opinion mining (also known as sentiment analysis or emotion AI).
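To make the idea of training on a sample concrete, here is a minimal sketch in Python of a naive Bayes classifier, one common technique among many and not necessarily the one a given tool uses. The four hand-coded sentences, the labels "positive" and "negative", and the function names `tokenize`, `train`, and `classify` are all invented for illustration; a real project would need a far larger training set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count words per label from hand-coded (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the most probable label, using add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior: how common this label is in the sample.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: documents coded by hand.
training_set = [
    ("I loved this wonderful book", "positive"),
    ("A great and moving story", "positive"),
    ("This was a dull, terrible read", "negative"),
    ("I hated the boring plot", "negative"),
]

word_counts, label_counts = train(training_set)
print(classify("a wonderful story", word_counts, label_counts))
print(classify("boring and terrible", word_counts, label_counts))
```

The hand-coded pairs play the role of the training set: the program only counts which words appear under which code, then assigns new text the code whose word counts fit it best.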
Content analysis can use quantitative and/or qualitative approaches. Different techniques allow different types of inferences or results to be derived from the content. The following pages provide an introduction to these two approaches with suggested library resources for further learning.
If you're interested in studying the content of printed, published literature found in the collections of major research libraries, the easiest path is to use HTRC Analytics. Alternatively, word frequencies for most of this content can be explored using the Google Ngram Viewer. A narrower set of data (scholarly journals and a few books, primarily in English) is available through JSTOR's Data for Research.
There are also various special-purpose text collections available, such as RadioTalk (transcriptions of talk radio in the US during a few months of 2018–2019).
If you would rather gather documents on your own and prepare them for analysis, start with the Gathering and Preparing Data page.