API documentation

Topic Models (e.g. LDA) visualization using D3

Functions: General Use

prepare()
transform and prepare an LDA model’s data for visualization
prepared_data_to_html()
convert prepared data to an html string
show()
launch a web server to view the visualization
save_html()
save a visualization to a standalone html file
save_json()
save the visualization’s JSON data to a file

Functions: IPython Notebook

display()
display a figure in an IPython notebook
enable_notebook()
enable automatic D3 display of prepared model data in the IPython notebook.
disable_notebook()
disable automatic D3 display of prepared model data in the IPython notebook.
pyLDAvis.prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R=30, lambda_step=0.01, mds=<function js_PCoA>, n_jobs=-1, plot_opts={'xlab': 'PC1', 'ylab': 'PC2'}, sort_topics=True)

Transforms the topic model distributions and related corpus data into the data structures needed for the visualization.

Parameters:
topic_term_dists : array-like, shape (n_topics, n_terms)

Matrix of topic-term probabilities, where n_terms is len(vocab).

doc_topic_dists : array-like, shape (n_docs, n_topics)

Matrix of document-topic probabilities.

doc_lengths : array-like, shape n_docs

The length of each document, i.e. the number of words in each document. The order of the numbers should be consistent with the ordering of the docs in doc_topic_dists.

vocab : array-like, shape n_terms

List of all the words in the corpus used to train the model.

term_frequency : array-like, shape n_terms

The count of each particular term over the entire corpus. The ordering of these counts should correspond with vocab and topic_term_dists.

R : int

The number of terms to display in the barcharts of the visualization. Default is 30. Recommended to be roughly between 10 and 50.

lambda_step : float, between 0 and 1

Determines the interstep distance in the grid of lambda values over which to iterate when computing relevance. Default is 0.01. Recommended to be between 0.01 and 0.1.

mds : function or a string representation of function

A function that takes topic_term_dists as input and outputs an n_topics by 2 matrix of coordinates whose pairwise distances approximate the distances between topics. See js_PCoA() for details on the default function. A string may be passed instead: pcoa, mmds, or tsne (upper-case variants are also accepted). The latter two require the sklearn package.

n_jobs : int

The number of cores to be used to do the computations. The regular joblib conventions are followed so -1, which is the default, will use all cores.

plot_opts : dict, with keys ‘xlab’ and ‘ylab’

Dictionary of plotting options, right now only used for the axis labels.

sort_topics : bool

Sort topics by topic proportion (percentage of tokens covered). Default is True. Set to False to keep the original topic order.

Returns:
prepared_data : PreparedData

A named tuple containing all the data structures required to create the visualization. To be passed on to functions like display().
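
The shape requirements listed above can be checked before calling prepare(). The following is a minimal sketch with toy data, not part of pyLDAvis: the helper name check_prepare_inputs and the requirement that each row sums to 1 are assumptions made for illustration, based on the parameters being described as probability matrices.

```python
import math

# Toy inputs: 2 topics, 3 documents, 4 vocabulary terms.
vocab = ["apple", "banana", "goal", "match"]
topic_term_dists = [[0.45, 0.45, 0.05, 0.05],
                    [0.05, 0.05, 0.45, 0.45]]
doc_topic_dists = [[0.9, 0.1],
                   [0.2, 0.8],
                   [0.5, 0.5]]
doc_lengths = [10, 20, 30]
term_frequency = [15, 15, 15, 15]


def check_prepare_inputs(topic_term_dists, doc_topic_dists, doc_lengths,
                         vocab, term_frequency):
    """Hypothetical helper: verify the documented shapes agree with each other."""
    n_topics = len(topic_term_dists)
    n_docs = len(doc_topic_dists)
    n_terms = len(vocab)
    if any(len(row) != n_terms for row in topic_term_dists):
        raise ValueError("topic_term_dists must have shape (n_topics, n_terms)")
    if any(len(row) != n_topics for row in doc_topic_dists):
        raise ValueError("doc_topic_dists must have shape (n_docs, n_topics)")
    if len(doc_lengths) != n_docs:
        raise ValueError("doc_lengths must have one entry per document")
    if len(term_frequency) != n_terms:
        raise ValueError("term_frequency must have one entry per vocab term")
    # Assumption: each row is a probability distribution and sums to 1.
    for row in list(topic_term_dists) + list(doc_topic_dists):
        if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
            raise ValueError("each distribution row should sum to 1")
    return n_docs, n_topics, n_terms


shapes = check_prepare_inputs(topic_term_dists, doc_topic_dists,
                              doc_lengths, vocab, term_frequency)
```

With pyLDAvis installed, the same five objects would be passed to prepare() in the order given by its signature: pyLDAvis.prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency).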

See also

save_json()
save json representation of a figure to file
save_html()
save html representation of a figure to file
show()
launch a local server and show a figure in a browser
display()
embed figure within the IPython notebook
enable_notebook()
automatically embed visualizations in IPython notebook

Notes

This implements the method of Sievert, C. and Shirley, K. (2014): LDAvis: A Method for Visualizing and Interpreting Topics, ACL Workshop on Interactive Language Learning, Visualization, and Interfaces.

http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf
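
The relevance measure from the cited paper ranks a term w within a topic t as lambda * log p(w|t) + (1 - lambda) * log(p(w|t) / p(w)). The sketch below illustrates that formula and a grid of lambda values spaced by lambda_step. It is a pure-Python illustration under stated assumptions: the function names are hypothetical, and the way pyLDAvis itself builds the grid and computes the ranking may differ.

```python
import math


def lambda_grid(lambda_step=0.01):
    """Evenly spaced lambda values from 0 to 1 inclusive (illustrative)."""
    n_steps = round(1 / lambda_step)
    return [round(i * lambda_step, 10) for i in range(n_steps + 1)]


def relevance(p_w_given_t, p_w, lam):
    """Relevance of a term for a topic, following Sievert and Shirley (2014)."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)


def top_terms(vocab, topic_term, marginal, lam, R=30):
    """Return up to R terms of one topic, sorted by decreasing relevance."""
    scored = sorted(
        zip(vocab, topic_term, marginal),
        key=lambda row: relevance(row[1], row[2], lam),
        reverse=True,
    )
    return [term for term, _, _ in scored[:R]]


# Toy topic: p(w|t) for three terms, and their marginal probabilities p(w).
toy_vocab = ["the", "cat", "dog"]
toy_topic = [0.5, 0.4, 0.1]
toy_marginal = [0.6, 0.1, 0.3]

by_probability = top_terms(toy_vocab, toy_topic, toy_marginal, lam=1.0)
by_lift = top_terms(toy_vocab, toy_topic, toy_marginal, lam=0.0)
```

At lambda = 1 the ranking follows p(w|t) alone; at lambda = 0 it follows the lift p(w|t) / p(w), which favors terms that are rare in the corpus but common in the topic.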

pyLDAvis.js_PCoA(distributions)

Dimension reduction via Jensen-Shannon Divergence & Principal Coordinate Analysis (aka Classical Multidimensional Scaling)

Parameters:
distributions : array-like, shape (n_dists, k)

Matrix of probability distributions, one per row.

Returns:
pcoa : array, shape (n_dists, 2)
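
As an illustration of the first step named above, the following sketch computes pairwise Jensen-Shannon divergences between distributions in pure Python. It covers only the divergence matrix; the Principal Coordinate Analysis step that reduces this matrix to two dimensions is omitted. The function names are hypothetical and this is not the library's implementation.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero-probability terms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def pairwise_js(distributions):
    """Square matrix of divergences between every pair of rows."""
    return [[jensen_shannon(p, q) for q in distributions] for p in distributions]


toy_distributions = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
divergences = pairwise_js(toy_distributions)
```

The divergence is symmetric, zero between identical distributions, and bounded above by ln 2 when natural logarithms are used, which makes it convenient as an input to a multidimensional scaling method.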
class pyLDAvis.PreparedData
Attributes:
R

Alias for field number 3

lambda_step

Alias for field number 4

plot_opts

Alias for field number 5

token_table

Alias for field number 2

topic_coordinates

Alias for field number 0

topic_info

Alias for field number 1

topic_order

Alias for field number 6

Methods

count(value)
index(value, [start, [stop]]) Raises ValueError if the value is not present.
to_dict  
to_json  
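
The field numbers listed above follow from PreparedData being a named tuple, so each attribute is also reachable by position. The sketch below uses a stand-in named tuple with the same field order to show this. PreparedDataSketch and its placeholder values are hypothetical and only mirror the documented layout.

```python
from collections import namedtuple

# Stand-in with the documented field order (fields 0 through 6).
PreparedDataSketch = namedtuple(
    "PreparedDataSketch",
    ["topic_coordinates", "topic_info", "token_table",
     "R", "lambda_step", "plot_opts", "topic_order"],
)

sketch = PreparedDataSketch(
    topic_coordinates=None,  # placeholder; tabular data in a real object
    topic_info=None,         # placeholder
    token_table=None,        # placeholder
    R=30,
    lambda_step=0.01,
    plot_opts={"xlab": "PC1", "ylab": "PC2"},
    topic_order=[1, 2],
)

# Attribute access and positional access refer to the same value.
same_value = sketch.R == sketch[3]
```

The count() and index() methods listed above are the ordinary tuple methods inherited by every named tuple.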
pyLDAvis.prepared_data_to_html(data, d3_url=None, ldavis_url=None, ldavis_css_url=None, template_type='general', visid=None, use_http=False)

Output HTML with embedded visualization

Parameters:
data : PreparedData, created using prepare()

The data for the visualization.

d3_url : string (optional)

The URL of the d3 library. If not specified, a standard web path will be used.

ldavis_url : string (optional)

The URL of the LDAvis library. If not specified, a standard web path will be used.

template_type : string

string specifying the type of HTML template to use. Options are:

"simple"

suitable for a simple html page with one visualization. Will fail if require.js is available on the page.

"notebook"

assumes require.js and jquery are available.

"general"

more complicated, but works both in and out of the notebook, whether or not require.js and jquery are available

visid : string (optional)

The html/css id of the visualization div, which must not contain spaces. If not specified, a random id will be generated.

use_http : boolean (optional)

If true, use http:// instead of https:// for d3_url and ldavis_url.

Returns:
vis_html : string

the HTML visualization

See also

save_json()
save json representation of visualization to file
save_html()
save html representation of a visualization to file
show()
launch a local server and show a visualization in a browser
display()
embed visualization within the IPython notebook
enable_notebook()
automatically embed visualizations in IPython notebook
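
The constraints on template_type and visid described above can be checked before the options are passed on. This is a small sketch, assuming a hypothetical helper named html_options; pyLDAvis may perform its own validation, and this helper is not part of the library.

```python
VALID_TEMPLATE_TYPES = ("simple", "notebook", "general")


def html_options(template_type="general", visid=None, use_http=False):
    """Hypothetical helper: build keyword arguments for prepared_data_to_html()."""
    if template_type not in VALID_TEMPLATE_TYPES:
        raise ValueError("template_type must be one of %r" % (VALID_TEMPLATE_TYPES,))
    # The documentation states that the id must not contain spaces.
    if visid is not None and any(ch.isspace() for ch in visid):
        raise ValueError("visid must not contain spaces")
    options = {"template_type": template_type, "use_http": use_http}
    if visid is not None:
        options["visid"] = visid
    return options


options = html_options(template_type="simple", visid="ldavis-demo")
```

With pyLDAvis installed and a PreparedData object named prepared, the result would be used as pyLDAvis.prepared_data_to_html(prepared, **options).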
pyLDAvis.display(data, local=False, **kwargs)

Display visualization in IPython notebook via the HTML display hook

Parameters:
data : PreparedData, created using prepare()

The data for the visualization.

local : boolean (optional, default=False)

if True, then copy the d3 & LDAvis libraries to a location visible to the notebook server, and source them from there. See Notes below.

**kwargs :

additional keyword arguments are passed through to prepared_data_to_html().

Returns:
vis_d3 : IPython.display.HTML object

the IPython HTML rich display of the visualization.

See also

show()
launch a local server and show a visualization in a browser
enable_notebook()
automatically embed visualizations in IPython notebook

Notes

Known issues: using local=True may not work correctly in certain cases:

  • In IPython < 2.0, local=True may fail if the current working directory is changed within the notebook (e.g. with the %cd command).
  • In IPython 2.0+, local=True may fail if a url prefix is added (e.g. by setting NotebookApp.base_url).
pyLDAvis.show(data, ip='127.0.0.1', port=8888, n_retries=50, local=True, open_browser=True, http_server=None, **kwargs)

Starts a local webserver and opens the visualization in a browser.

Parameters:
data : PreparedData, created using prepare()

The data for the visualization.

ip : string, default = ‘127.0.0.1’

the ip address used for the local server

port : int, default = 8888

the port number to use for the local server. If already in use, a nearby open port will be found (see n_retries)

n_retries : int, default = 50

the maximum number of ports to try when locating an empty port.

local : bool, default = True

if True, use the local d3 & LDAvis javascript versions, within the js/ folder. If False, use the standard urls.

open_browser : bool (optional)

if True (default), then open a web browser to the given HTML

http_server : class (optional)

optionally specify an HTTPServer class to use for showing the visualization. The default is Python’s basic HTTPServer.

**kwargs :

additional keyword arguments are passed through to prepared_data_to_html()

See also

display()
embed visualization within the IPython notebook
enable_notebook()
automatically embed visualizations in IPython notebook
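
The interaction between port and n_retries described above can be pictured as a bounded search for a free port. The sketch below tries consecutive ports using the standard socket module. The sequential strategy and the name find_open_port are assumptions made for illustration; the documentation only says that a nearby open port is found, and the library's actual search order may differ.

```python
import socket


def find_open_port(ip="127.0.0.1", port=8888, n_retries=50):
    """Return the first bindable port in [port, port + n_retries)."""
    for candidate in range(port, port + n_retries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((ip, candidate))
            except OSError:
                continue  # port is in use or unavailable; try the next one
            return candidate
    raise RuntimeError("no open port found after %d attempts" % n_retries)
```

The socket is closed again before the port number is returned, so another process could still claim the port in the meantime; a real server would normally bind and keep the socket.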
pyLDAvis.save_html(data, fileobj, **kwargs)

Save an embedded visualization to file.

This produces a single HTML file with the visualization data embedded. Internet access is still required to load the D3 and LDAvis libraries.

Parameters:
data : PreparedData, created using prepare()

The data for the visualization.

fileobj : filename or file object

The filename or file-like object in which to write the HTML representation of the visualization.

**kwargs :

additional keyword arguments will be passed to prepared_data_to_html()

See also

save_json()
save json representation of a visualization to file
prepared_data_to_html()
output html representation of the visualization
fig_to_dict()
output dictionary representation of the visualization
pyLDAvis.save_json(data, fileobj)

Save the visualization’s data to a JSON file.

Parameters:
data : PreparedData, created using prepare()

The data for the visualization.

fileobj : filename or file object

The filename or file-like object in which to write the JSON representation of the visualization.

See also

save_html()
save html representation of a visualization to file
prepared_data_to_html()
output html representation of the visualization
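
Both save_html() and save_json() accept either a filename or a file object for fileobj. The sketch below shows one common way to support that convention in plain Python. The helper write_output is hypothetical and is not how pyLDAvis is necessarily implemented.

```python
import io
import os
import tempfile


def write_output(text, fileobj):
    """Write text to a path or to an already-open file-like object."""
    if isinstance(fileobj, (str, os.PathLike)):
        with open(fileobj, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        fileobj.write(text)  # caller keeps ownership of the open object


# Case 1: an in-memory file object.
buffer = io.StringIO()
write_output("<div>demo</div>", buffer)

# Case 2: a filename on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.html")
    write_output("<div>demo</div>", path)
    with open(path, encoding="utf-8") as handle:
        from_disk = handle.read()
```

With pyLDAvis installed and a PreparedData object named prepared, the corresponding calls would look like pyLDAvis.save_html(prepared, 'lda.html') and pyLDAvis.save_json(prepared, 'lda.json').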
pyLDAvis.enable_notebook(local=False, **kwargs)

Enable the automatic display of visualizations in the IPython Notebook.

Parameters:
local : boolean (optional, default=False)

if True, then copy the d3 & LDAvis libraries to a location visible to the notebook server, and source them from there. See Notes below.

**kwargs :

all keyword parameters are passed through to prepared_data_to_html()

See also

disable_notebook()
undo the action of enable_notebook
display()
embed visualization within the IPython notebook
show()
launch a local server and show a visualization in a browser

Notes

Known issues: using local=True may not work correctly in certain cases:

  • In IPython < 2.0, local=True may fail if the current working directory is changed within the notebook (e.g. with the %cd command).
  • In IPython 2.0+, local=True may fail if a url prefix is added (e.g. by setting NotebookApp.base_url).
pyLDAvis.disable_notebook()

Disable the automatic display of visualizations in the IPython Notebook.

See also

enable_notebook()
automatically embed visualizations in IPython notebook

pyLDAvis Utilities

Utility routines for the pyLDAvis package

class pyLDAvis.utils.NumPyEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)

Methods

encode(o) Return a JSON string representation of a Python data structure.
iterencode(o[, _one_shot]) Encode the given object and yield each string representation as available.
default  
default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
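
The same override pattern can serialize array-like objects. The sketch below is a standalone illustration using only the standard library: ToListEncoder is a hypothetical class, not the NumPyEncoder implementation, and it assumes that any object exposing a tolist() method (as NumPy arrays and array.array do) should be encoded as a list.

```python
import array
import json


class ToListEncoder(json.JSONEncoder):
    """Encode objects that expose tolist() as plain JSON lists."""

    def default(self, obj):
        if hasattr(obj, "tolist"):
            return obj.tolist()
        # Let the base class raise TypeError for anything else.
        return super().default(obj)


encoded = json.dumps({"weights": array.array("d", [0.25, 0.75])},
                     cls=ToListEncoder)
```

Objects without a tolist() method still fall through to the base class and raise TypeError, so unsupported types are not silently dropped.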
pyLDAvis.utils.get_id(obj, suffix='', prefix='el', warn_on_invalid=True)

Get a unique id for the object

pyLDAvis.utils.html_id_ok(objid, html5=False)

Check whether objid is valid as an HTML id attribute.

If html5 == True, then use the more liberal html5 rules.
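
The difference between the two rule sets can be sketched with regular expressions. The version below is an illustration based on the HTML specifications, not the library's code: HTML 4 requires an id to start with a letter and continue with letters, digits, hyphens, underscores, colons, or periods, while HTML5 only requires a non-empty value without whitespace. The name looks_like_html_id is hypothetical.

```python
import re

# HTML 4: a letter, then letters, digits, "-", "_", ":" or ".".
_HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def looks_like_html_id(objid, html5=False):
    """Return True if objid is acceptable under the chosen rule set."""
    if html5:
        # HTML5: at least one character and no whitespace anywhere.
        return bool(objid) and re.search(r"\s", objid) is None
    return _HTML4_ID.fullmatch(objid) is not None
```

For example, an id such as 123el fails the HTML 4 check because it starts with a digit, but passes the HTML5 check.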

pyLDAvis.utils.write_ipynb_local_js(location=None, d3_src=None, ldavis_src=None, ldavis_css=None)

Write the pyLDAvis and d3 javascript libraries to the given file location.

This utility is used by the IPython notebook tools to enable easy use of pyLDAvis with no web connection.

Parameters:
location : string (optional)

the directory in which the d3 and pyLDAvis javascript libraries will be written. If not specified, the IPython nbextensions directory will be used. If IPython doesn’t support nbextensions (< 2.0), the current working directory will be used.

d3_src : string (optional)

the source location of the d3 library. If not specified, the standard path in pyLDAvis.urls.D3_LOCAL will be used.

ldavis_src : string (optional)

the source location of the pyLDAvis library. If not specified, the standard path in pyLDAvis.urls.LDAVIS_LOCAL will be used.

Returns:
d3_url, ldavis_url : string

The URLs to be used for loading these js files.

LDAvis URLs

URLs and filepaths for the LDAvis javascript libraries