Content analysis is a term covering a range of research methods for studying documents (texts, audio, video, etc.) by assigning labels (or codes) to mark interesting pieces of content. Such analysis can be done entirely by hand (coding), or a computer can be trained to identify meaningful content based on a sample (a training set or training data), as in the fields of information retrieval (IR), text mining, stylometry, and opinion mining (also known as sentiment analysis or emotion AI).
Content analysis can use quantitative and/or qualitative approaches. Different techniques allow different types of inferences or results to be derived from the content. The following pages provide an introduction to these two approaches with suggested library resources for further learning.
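As a rough illustration of the quantitative side, the sketch below assigns codes to short texts using a keyword list and then counts how many documents received each code. The codebook, the sample documents, and the function names `code_document` and `tally_codes` are all hypothetical, invented for this example. Real projects usually rely on a carefully developed codebook or a trained model rather than simple keyword matching.

```python
import re
from collections import Counter

# Hypothetical codebook: each code (label) maps to keywords that signal it.
CODEBOOK = {
    "economy": {"jobs", "wages", "inflation"},
    "health": {"hospital", "vaccine", "doctors"},
}


def code_document(text, codebook=CODEBOOK):
    """Return the set of codes whose keywords appear in the text."""
    # Lowercase the text and split it into simple word tokens.
    words = set(re.findall(r"[a-z']+", text.lower()))
    # A code applies if the document shares at least one word with its keyword set.
    return {code for code, keywords in codebook.items() if words & keywords}


def tally_codes(documents, codebook=CODEBOOK):
    """Count how many documents were assigned each code."""
    counts = Counter()
    for doc in documents:
        counts.update(code_document(doc, codebook))
    return counts


# Made-up sample documents, for illustration only.
documents = [
    "Wages rose while inflation cooled.",
    "The hospital hired more doctors.",
    "New jobs opened at the vaccine plant.",
]

print(tally_codes(documents))
```

A qualitative approach would instead have a researcher read each document and decide which codes apply, often revising the codebook along the way.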
If you're interested in studying the content of printed, published literature found in the collections of major research libraries, the easiest path is to use HTRC Analytics. A narrower set of data (scholarly journals and a few books, primarily in English) is available through JSTOR's Data for Research and their newer platform, Constellate. For more on these and other sources, see the "Text Mining" box on the Gathering and Preparing Data page.
Otherwise, if you would like to gather documents on your own and prepare them for analysis, visit the Gathering and Preparing Data page to start.