API documentation
Topic Models (e.g. LDA) visualization using D3
Functions: General Use
prepare()
- transform and prepare an LDA model’s data for visualization
prepared_data_to_html()
- convert prepared data to an html string
show()
- launch a web server to view the visualization
save_html()
- save a visualization to a standalone html file
save_json()
- save the visualization JSON data to a file
Functions: IPython Notebook
display()
- display a figure in an IPython notebook
enable_notebook()
- enable automatic D3 display of prepared model data in the IPython notebook.
disable_notebook()
- disable automatic D3 display of prepared model data in the IPython notebook.
-
pyLDAvis.prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R=30, lambda_step=0.01, mds=<function js_PCoA>, n_jobs=-1, plot_opts={'xlab': 'PC1', 'ylab': 'PC2'}, sort_topics=True)
Transforms the topic model distributions and related corpus data into the data structures needed for the visualization.
Parameters: topic_term_dists : array-like, shape (n_topics, n_terms)
Matrix of topic-term probabilities, where n_terms is len(vocab).
- doc_topic_dists : array-like, shape (n_docs, n_topics)
Matrix of document-topic probabilities.
- doc_lengths : array-like, shape n_docs
The length of each document, i.e. the number of words in each document. The order of the numbers should be consistent with the ordering of the docs in doc_topic_dists.
- vocab : array-like, shape n_terms
List of all the words in the corpus used to train the model.
- term_frequency : array-like, shape n_terms
The count of each particular term over the entire corpus. The ordering of these counts should correspond with vocab and topic_term_dists.
- R : int
The number of terms to display in the barcharts of the visualization. Default is 30. Recommended to be roughly between 10 and 50.
- lambda_step : float, between 0 and 1
Determines the interstep distance in the grid of lambda values over which to iterate when computing relevance. Default is 0.01. Recommended to be between 0.01 and 0.1.
- mds : function or a string representation of function
A function that takes topic_term_dists as an input and outputs an n_topics by 2 distance matrix. The output approximates the distance between topics. See
js_PCoA()
for details on the default function. A string may be passed instead of a function: pcoa, mmds or tsne (or their upper-case variants). The latter two are available only if the sklearn package is installed.
- n_jobs : int
The number of cores to be used to do the computations. The regular joblib conventions are followed so -1, which is the default, will use all cores.
- plot_opts : dict, with keys ‘xlab’ and ‘ylab’
Dictionary of plotting options, right now only used for the axis labels.
- sort_topics : bool
Whether to sort topics by topic proportion (percentage of tokens covered). Default is True. Set to False to keep the original topic order.
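The shape and ordering requirements above can be checked before calling prepare(). The following is a minimal sketch, not part of pyLDAvis: the toy corpus, the made-up distributions, and the helper names check_inputs and build_prepared_data are all hypothetical. It derives doc_lengths and term_frequency from a tokenized corpus so that their ordering matches vocab, then verifies that each row is a probability distribution.

```python
from collections import Counter

# Toy tokenized corpus (hypothetical data, for illustration only).
docs = [
    ["cat", "dog", "cat"],
    ["dog", "fish"],
    ["fish", "fish", "cat", "dog"],
]

# vocab fixes the term order; term_frequency must follow the same order.
vocab = sorted({word for doc in docs for word in doc})
doc_lengths = [len(doc) for doc in docs]
counts = Counter(word for doc in docs for word in doc)
term_frequency = [counts[word] for word in vocab]

# Made-up distributions for a 2-topic model; every row sums to 1.
topic_term_dists = [
    [0.6, 0.3, 0.1],   # shape (n_topics, n_terms)
    [0.1, 0.2, 0.7],
]
doc_topic_dists = [
    [0.9, 0.1],        # shape (n_docs, n_topics)
    [0.5, 0.5],
    [0.2, 0.8],
]


def check_inputs(topic_term_dists, doc_topic_dists, doc_lengths,
                 vocab, term_frequency, tol=1e-9):
    """Raise ValueError if shapes disagree or a row does not sum to 1."""
    n_topics = len(topic_term_dists)
    n_terms = len(vocab)
    if any(len(row) != n_terms for row in topic_term_dists):
        raise ValueError("topic_term_dists must have len(vocab) columns")
    if len(term_frequency) != n_terms:
        raise ValueError("term_frequency must have len(vocab) entries")
    if len(doc_topic_dists) != len(doc_lengths):
        raise ValueError("doc_topic_dists and doc_lengths disagree on n_docs")
    if any(len(row) != n_topics for row in doc_topic_dists):
        raise ValueError("doc_topic_dists must have n_topics columns")
    for row in list(topic_term_dists) + list(doc_topic_dists):
        if abs(sum(row) - 1.0) > tol:
            raise ValueError("each row must sum to 1")


check_inputs(topic_term_dists, doc_topic_dists, doc_lengths,
             vocab, term_frequency)


def build_prepared_data():
    """Pass the checked inputs to prepare(); needs pyLDAvis installed."""
    import pyLDAvis  # imported lazily so the checks above run without it
    return pyLDAvis.prepare(topic_term_dists, doc_topic_dists,
                            doc_lengths, vocab, term_frequency)
```

Calling build_prepared_data() would then return the PreparedData object described under Returns.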
Returns: prepared_data : PreparedData
A named tuple containing all the data structures required to create the visualization. To be passed on to functions like
display()
.
See also
save_json()
- save json representation of a figure to file
save_html()
- save html representation of a figure to file
show()
- launch a local server and show a figure in a browser
display()
- embed figure within the IPython notebook
enable_notebook()
- automatically embed visualizations in IPython notebook
Notes
This implements the method of Sievert, C. and Shirley, K. (2014): LDAvis: A Method for Visualizing and Interpreting Topics, ACL Workshop on Interactive Language Learning, Visualization, and Interfaces.
http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf
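The relevance measure defined in the paper cited above is relevance(w, t | lambda) = lambda * log p(w | t) + (1 - lambda) * log(p(w | t) / p(w)), and lambda_step sets the spacing of the lambda values at which it is evaluated. The sketch below is a plain-Python illustration of that formula, not the library's implementation. The function names are hypothetical, and including both endpoints 0 and 1 in the grid is an assumption.

```python
import math


def lambda_grid(lambda_step=0.01):
    """Evenly spaced lambda values from 0 to 1 (endpoints included)."""
    n_steps = round(1 / lambda_step)
    return [round(i * lambda_step, 10) for i in range(n_steps + 1)]


def relevance(p_w_given_t, p_w, lam):
    """Relevance of a term for a topic at weight lam (Sievert & Shirley)."""
    return (lam * math.log(p_w_given_t)
            + (1 - lam) * math.log(p_w_given_t / p_w))


def top_terms(topic_row, term_frequency, vocab, lam, R=30):
    """Return the R most relevant terms for one row of topic_term_dists."""
    total = sum(term_frequency)
    scored = [
        (relevance(p, freq / total, lam), word)
        for p, freq, word in zip(topic_row, term_frequency, vocab)
        if p > 0 and freq > 0  # log is undefined at zero
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:R]]
```

With lam=1 the ranking follows p(w | t) alone. With lam=0 it follows the lift p(w | t) / p(w), which favors terms that are rare in the corpus but concentrated in the topic.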
-
pyLDAvis.js_PCoA(distributions)
Dimension reduction via Jensen-Shannon Divergence & Principal Coordinate Analysis (aka Classical Multidimensional Scaling)
Parameters: distributions : array-like, shape (n_dists, k)
Matrix of probability distributions, one per row.
Returns: pcoa : array, shape (n_dists, 2)
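As an illustration of the first half of this procedure, the sketch below computes pairwise Jensen-Shannon divergences between rows in plain Python. It is not the library's code: the function names are hypothetical, the natural logarithm is an assumption, and the Principal Coordinate Analysis step (double-centering the distance matrix and taking its top two eigenvectors) is left out.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def pairwise_js(distributions):
    """Symmetric (n_dists, n_dists) matrix of Jensen-Shannon divergences."""
    n = len(distributions)
    return [
        [js_divergence(distributions[i], distributions[j]) for j in range(n)]
        for i in range(n)
    ]
```

The divergence is zero for identical rows and reaches log 2 for rows with disjoint support, so the resulting matrix is bounded and symmetric, which is what a scaling method such as PCoA needs as input.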
-
pyLDAvis.prepared_data_to_html(data, d3_url=None, ldavis_url=None, ldavis_css_url=None, template_type='general', visid=None, use_http=False)
Output HTML with embedded visualization
Parameters: data : PreparedData, created using
prepare()
The data for the visualization.
d3_url : string (optional)
The URL of the d3 library. If not specified, a standard web path will be used.
ldavis_url : string (optional)
The URL of the LDAvis library. If not specified, a standard web path will be used.
template_type : string
string specifying the type of HTML template to use. Options are:
"simple"
suitable for a simple html page with one visualization. Will fail if require.js is available on the page.
"notebook"
assumes require.js and jquery are available.
"general"
more complicated, but works both in and out of the notebook, whether or not require.js and jquery are available
visid : string (optional)
The html/css id of the visualization div, which must not contain spaces. If not specified, a random id will be generated.
use_http : boolean (optional)
Returns: vis_html : string
the HTML visualization
See also
save_json()
- save json representation of visualization to file
save_html()
- save html representation of a visualization to file
show()
- launch a local server and show a visualization in a browser
display()
- embed visualization within the IPython notebook
enable_notebook()
- automatically embed visualizations in IPython notebook
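The constraints on template_type and visid described above can be checked before rendering. This is a hedged sketch: check_html_options and render are hypothetical helper names, not pyLDAvis functions, and the checks only restate the documented rules (three template types, no spaces in visid).

```python
TEMPLATE_TYPES = ("simple", "notebook", "general")


def check_html_options(template_type="general", visid=None):
    """Validate options against the documented constraints."""
    if template_type not in TEMPLATE_TYPES:
        raise ValueError("template_type must be one of %r" % (TEMPLATE_TYPES,))
    if visid is not None and any(ch.isspace() for ch in visid):
        raise ValueError("visid must not contain spaces")
    return {"template_type": template_type, "visid": visid}


def render(prepared, **options):
    """Render PreparedData to an HTML string; needs pyLDAvis installed."""
    import pyLDAvis  # imported lazily so the checks work without it
    return pyLDAvis.prepared_data_to_html(
        prepared, **check_html_options(**options))
```

For example, render(prepared, template_type="simple", visid="ldavis_main") would return the HTML string for a page holding a single visualization.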
-
pyLDAvis.display(data, local=False, **kwargs)
Display visualization in IPython notebook via the HTML display hook
Parameters: data : PreparedData, created using
prepare()
The data for the visualization.
local : boolean (optional, default=False)
if True, then copy the d3 & LDAvis libraries to a location visible to the notebook server, and source them from there. See Notes below.
**kwargs :
additional keyword arguments are passed through to
prepared_data_to_html()
.
Returns: vis_d3 : IPython.display.HTML object
the IPython HTML rich display of the visualization.
See also
show()
- launch a local server and show a visualization in a browser
enable_notebook()
- automatically embed visualizations in IPython notebook
Notes
Known issues: using local=True may not work correctly in certain cases:
- In IPython < 2.0, local=True may fail if the current working directory is changed within the notebook (e.g. with the %cd command).
- In IPython 2.0+, local=True may fail if a url prefix is added (e.g. by setting NotebookApp.base_url).
-
pyLDAvis.show(data, ip='127.0.0.1', port=8888, n_retries=50, local=True, open_browser=True, http_server=None, **kwargs)
Starts a local webserver and opens the visualization in a browser.
Parameters: data : PreparedData, created using
prepare()
The data for the visualization.
ip : string, default = ‘127.0.0.1’
the ip address used for the local server
port : int, default = 8888
the port number to use for the local server. If already in use, a nearby open port will be found (see n_retries)
n_retries : int, default = 50
the maximum number of ports to try when locating an empty port.
local : bool, default = True
if True, use the local d3 & LDAvis javascript versions, within the js/ folder. If False, use the standard urls.
open_browser : bool (optional)
if True (default), then open a web browser to the given HTML
http_server : class (optional)
optionally specify an HTTPServer class to use for showing the visualization. The default is Python’s basic HTTPServer.
**kwargs :
additional keyword arguments are passed through to
prepared_data_to_html()
See also
display()
- embed visualization within the IPython notebook
enable_notebook()
- automatically embed visualizations in IPython notebook
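The interaction between port and n_retries can be pictured with a small standard-library sketch. This is not how pyLDAvis itself searches: find_open_port is a hypothetical name, and trying candidate ports in ascending order is an assumption made for simplicity.

```python
import socket


def find_open_port(ip="127.0.0.1", port=8888, n_retries=50):
    """Return the first port in [port, port + n_retries) that can be bound."""
    for candidate in range(port, port + n_retries):
        if candidate > 65535:  # past the valid TCP port range
            break
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((ip, candidate))
            except OSError:
                continue  # port is in use, try the next one
            return candidate
    raise OSError("no open port found in %d attempts" % n_retries)
```

If the requested port is free it is returned unchanged. Otherwise the search moves to nearby ports and gives up after n_retries candidates.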
-
pyLDAvis.save_html(data, fileobj, **kwargs)
Save an embedded visualization to file.
This will produce a self-contained HTML file. Internet access is still required for the D3 and LDAvis libraries.
Parameters: data : PreparedData, created using
prepare()
The data for the visualization.
fileobj : filename or file object
The filename or file-like object in which to write the HTML representation of the visualization.
**kwargs :
additional keyword arguments will be passed to
prepared_data_to_html()
See also
save_json()
- save json representation of a visualization to file
prepared_data_to_html()
- output html representation of the visualization
fig_to_dict()
- output dictionary representation of the visualization
-
pyLDAvis.save_json(data, fileobj)
Save the visualization’s data to a JSON file.
Parameters: data : PreparedData, created using
prepare()
The data for the visualization.
fileobj : filename or file object
The filename or file-like object in which to write the JSON representation of the visualization.
See also
save_html()
- save html representation of a visualization to file
prepared_data_to_html()
- output html representation of the visualization
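Both save_html() and save_json() accept either a filename or a file-like object for fileobj. The sketch below shows one common way to support both with the standard library. It illustrates the pattern only: write_text is a hypothetical helper, and the library's own handling may differ.

```python
import io
import os


def write_text(fileobj, text):
    """Write text to a path (str or os.PathLike) or to a file-like object."""
    if isinstance(fileobj, (str, os.PathLike)):
        # A path was given: open, write, and close the file ourselves.
        with open(fileobj, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        # Anything else is assumed to expose a write() method.
        fileobj.write(text)


# Usage with an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_text(buffer, "<html></html>")
```

Accepting a file-like object makes it possible to capture the output in memory, for instance to post-process the HTML or JSON before writing it anywhere.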
-
pyLDAvis.enable_notebook(local=False, **kwargs)
Enable the automatic display of visualizations in the IPython Notebook.
Parameters: local : boolean (optional, default=False)
if True, then copy the d3 & LDAvis libraries to a location visible to the notebook server, and source them from there. See Notes below.
**kwargs :
all keyword parameters are passed through to
prepared_data_to_html()
See also
disable_notebook()
- undo the action of enable_notebook
display()
- embed visualization within the IPython notebook
show()
- launch a local server and show a visualization in a browser
Notes
Known issues: using local=True may not work correctly in certain cases:
- In IPython < 2.0, local=True may fail if the current working directory is changed within the notebook (e.g. with the %cd command).
- In IPython 2.0+, local=True may fail if a url prefix is added (e.g. by setting NotebookApp.base_url).
-
pyLDAvis.disable_notebook()
Disable the automatic display of visualizations in the IPython Notebook.
See also
enable_notebook()
- automatically embed visualizations in IPython notebook
pyLDAvis Utilities
Utility routines for the pyLDAvis package
-
pyLDAvis.utils.get_id(obj, suffix='', prefix='el', warn_on_invalid=True)
Get a unique id for the object
-
pyLDAvis.utils.html_id_ok(objid, html5=False)
Check whether objid is valid as an HTML id attribute.
If html5 == True, then use the more liberal html5 rules.
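The two rule sets can be sketched as follows. This is an illustration based on the HTML 4 and HTML5 specifications rather than the library's own code, and is_valid_html_id is a hypothetical name. HTML 4 requires an id to begin with a letter, followed by letters, digits, hyphens, underscores, colons or periods. HTML5 only requires at least one character and no whitespace.

```python
import re

# HTML 4 rule: a letter, then letters, digits, "-", "_", ":" or ".".
_HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_valid_html_id(objid, html5=False):
    """Return True if objid satisfies the chosen HTML id rules."""
    if html5:
        # HTML5 rule: non-empty and free of whitespace.
        return len(objid) > 0 and re.search(r"\s", objid) is None
    return _HTML4_ID.fullmatch(objid) is not None
```

Under these rules an id such as "1abc" is rejected by the HTML 4 check but accepted by the HTML5 check, while "has space" fails both.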
-
pyLDAvis.utils.write_ipynb_local_js(location=None, d3_src=None, ldavis_src=None, ldavis_css=None)
Write the pyLDAvis and d3 javascript libraries to the given file location.
This utility is used by the IPython notebook tools to enable easy use of pyLDAvis with no web connection.
Parameters: location : string (optional)
the directory in which the d3 and pyLDAvis javascript libraries will be written. If not specified, the IPython nbextensions directory will be used. If IPython doesn’t support nbextensions (< 2.0), the current working directory will be used.
d3_src : string (optional)
the source location of the d3 library. If not specified, the standard path in pyLDAvis.urls.D3_LOCAL will be used.
ldavis_src : string (optional)
the source location of the pyLDAvis library. If not specified, the standard path in pyLDAvis.urls.LDAVIS_LOCAL will be used.
Returns: d3_url, ldavis_url : string
The URLs to be used for loading these js files.
LDAvis URLs
URLs and filepaths for the LDAvis javascript libraries