Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
model = FastText(
    vector_size=m.dim,
    window=m.ws,
    epochs=m.epoch,
    negative=m.neg,
    # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
    # or model=3 supervi
Labels:
bug - Issue described a bug
difficulty easy - Easy issue: required small fix
good first issue - Issue for new contributors (not required gensim understanding + very simple)
fasttext - Issues related to the FastText model
_catboost.pyx in _catboost._set_features_order_data_pd_data_frame()
_catboost.pyx in _catboost.get_cat_factor_bytes_representation()
CatBoostError: Invalid type for cat_feature[non-default value idx=1,feature_idx=336]=2.0 : cat_features must be integer or string, real number values and NaN values should be converted to string.
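The error message above asks for real-number and NaN values in categorical columns to be converted to strings before training. A minimal sketch of that conversion in plain Python follows; the helper name `to_categorical`, the `"NaN"` placeholder, and the choice to render `2.0` as `"2"` are assumptions made for illustration, not part of CatBoost.

```python
import math


def to_categorical(value):
    """Return a string form of one raw cell so it can live in a categorical column.

    Hypothetical helper: missing values become the placeholder "NaN", and
    integer-valued floats such as 2.0 become "2" (an assumed convention).
    """
    if value is None:
        return "NaN"
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        # Drop the trailing ".0" for whole numbers; keep other floats as-is.
        return str(int(value)) if value.is_integer() else str(value)
    return str(value)


# Example column mixing floats, NaN, strings, ints, and None.
column = [2.0, float("nan"), 3.5, "a", 7, None]
cleaned = [to_categorical(v) for v in column]
print(cleaned)
```

With a pandas DataFrame, the same helper could be applied column by column (for example through `Series.map`) to each column listed in `cat_features` before the data is passed to the model.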
The example notebooks should show the cell outputs. It's much more readable and convenient for users if they can see the output right away and don't need to run the notebooks themselves.
Unless I missed something, the documentation doesn't explain how to query document metadata (searching "site:montferret.dev metadata" through Google returned nothing, neither did grepping the source code).
As an example, I tried to query the og:url metadata.
I tried variations of //meta[property='og:url']::attr(content), with or without the leading //, and with or without the `attr(conte
Summary
mypy shows some issues in LightGBM's Python package.

    mypy \
        --exclude='python-package/compile/|python-package/build' \
        --ignore-missing-imports \
        python-package/

18 errors in 4 files