User:LI AR/Books/Cracking the DataScience Interview
From Wikipedia, the free encyclopedia

Cracking the DataScience Interview


Basic Stuff To Know

Generic pages
Glossaire_de_l'exploration_de_données (French Wikipedia: glossary of data mining)
Big_data
Tips / Known Limits of DS
Overfitting
Bias–variance_tradeoff / http://www.ritchieng.com/machinelearning-learning-curve/
Sampling_bias
Survivorship_bias
Selection_bias
Concept_drift
Correlation_does_not_imply_causation
Curse_of_dimensionality
Vanishing_gradient_problem
Machine Learning definition and types
Artificial_intelligence
List_of_machine_learning_concepts
Machine_learning
Data_mining
Knowledge_extraction
Knowledge_extraction#Knowledge_discovery
Pattern_recognition
Signal_processing
Supervised_learning
Semi-supervised_learning
Unsupervised_learning
Reinforcement_learning
Online_machine_learning
Incremental_learning
Q-learning
One-shot_learning / https://www.quora.com/What-is-zero-shot-learning
Feature_learning
Learning_to_rank
Similarity_learning
Biclustering
Natural_language_processing
Biomimetics
Collective_intelligence
Data_stream_mining
Sequential_pattern_mining
Clickstream
Semantics
Semantic_Web
Speech_recognition
Speech_synthesis
Collaborative_filtering
Competitions
Datasets
List_of_datasets_for_machine_learning_research
Software
Data Manipulation
Annotate examples: https://prodi.gy/
Data_pre-processing
Data_cleansing
Data_reduction
Data_wrangling
Data_scrubbing
Data_editing
Data_scraping
Data_curation
Data_fusion
Data_integration
Data_binning
Sanitization_(classified_information)
Extract,_transform,_load
Imputation_(statistics)
Interpolation
Outlier
Local_case-control_sampling#Imbalanced_datasets
Sampling_(statistics)
Sampling_(statistics)#Stratified_sampling
Stratified_sampling
Jackknife_resampling
Oversampling_and_undersampling_in_data_analysis
Oversampling_and_undersampling_in_data_analysis#SMOTE
AdaBoost
Unicode_equivalence#Normalization
URL_normalization
Text_segmentation
N-gram
Tokenization_(lexical_analysis)
Stemming
Word2vec https://www.tensorflow.org/tutorials/word2vec
 https://github.com/explosion/thinc
Spatial_data
Trend_surface_analysis
Variogram
Geary's_C
Moran's_I
Spatial_descriptive_statistics#Ripley.27s_K_and_L_functions
Dynamic_time_warping
Normalization_(image_processing)
Normalized_frequency_(unit)
Image_segmentation


Techniques for Feature/Attribute Selection/Dimensionality Reduction
High-dimensional_statistics
Dimensionality_reduction
Factor_analysis
Principal_component_analysis
Independent_component_analysis
Singular_value_decomposition
Multidimensional_scaling
T-distributed_stochastic_neighbor_embedding
Autoencoder
Deep_learning#Stacked_.28de-noising.29_auto-encoders
Elastic_map
Linear_discriminant_analysis
Compressed_sensing
Spatial_analysis
Spatial_analysis#Spatial_dependency_or_auto-correlation
Maths (Stats / Algebra)
Pseudo-random_number_sampling
Glossary_of_probability_and_statistics
Bijection,_injection_and_surjection
Mean
Harmonic_mean
Median
Mode_(statistics)
Range_(mathematics)
Quartile
Interquartile_range
Variance
Covariance
Standard_deviation
Collinearity#Usage_in_statistics_and_econometrics
ANOVA
ANCOVA
MANOVA
ANORVA
Moving_average
EWMA_chart
Exponential_smoothing
Autoregressive_model
Autoregressive–moving-average_model
Autoregressive_integrated_moving_average
Autocorrelation
Cross-correlation
Entropy_in_thermodynamics_and_information_theory
Moment_(mathematics)
Residual
Expected_value
Likelihood_function
Cumulative_distribution_function
Probability
Probability_mass_function
Probability_density_function
Prior_probability
Prior_knowledge_for_pattern_recognition
Permutation https://fr.wikipedia.org/wiki/Arrangement
Combination https://fr.wikipedia.org/wiki/Combinaison_(math%C3%A9matiques)
Dependent_and_independent_variables
Independence_(probability_theory)
Hoeffding's_inequality
Pareto_efficiency
Nash_equilibrium
Pareto_principle
Tensor
Tensor_product
Cross_product
Taxicab_geometry
Norm_(mathematics)#Euclidean_norm
Lp_space
Norm_(mathematics)
Determinant
Trace_(linear_algebra)
Eigenvalues_and_eigenvectors
Projection_(mathematics)
Curvature
Convolution
Hadamard_product_(matrices)
Kernel_(statistics)
Radial_basis_function
Logit
Latent_variable
Inference
Statistical_inference
Inductive_reasoning
Deduction_and_induction
Transduction_(machine_learning)
Stochastic
Stochastic_process
Probability_theory
Posterior_probability
Statistic
Statistics
Gaussian_noise
Bayesian_inference
Bayes_rule
Bayes'_theorem
Bayesian_network
Naive_Bayes_spam_filtering
Naive_Bayes_classifier
Belief_propagation#Approximate_algorithm_for_general_graphs
Loss_function
Regularization_(mathematics)
Normalization_(statistics)
Quantile_normalization
Nyström_method (+PCA)
Preference_(economics)
Delaunay_triangulation
Neighbourhood_(mathematics)
Mutation_(genetic_algorithm)
Crossover_(genetic_algorithm)
Selection_(genetic_algorithm)
Fitness_function
Utility#Utility_functions
Kernel_method
Kernel_(image_processing)
Rectifier_(neural_networks)
Backpropagation
Gradient
Gradient_descent
Stochastic_gradient_descent
Gradient_boosting
Softmax_function
Sigmoid_function
Hyperbolic_function#Tanh
Dropout_(neural_networks)
Hebbian_theory
Signal_processing
Low-pass_filter
High-pass_filter
Energy_(signal_processing)
Fast_Fourier_transform
Wavelet
Discrete_wavelet_transform
Coherence_(signal_processing)
Kalman_filter
Time_series
Decomposition_of_time_series
Seasonal_adjustment
Seasonality
Frequency_domain
Time_domain
Spectral_density
Game_theory
A*_search_algorithm
Minimax
Multi-armed_bandit
Zero-sum_game


Distances
Distance
Euclidean_distance [Minkowski distance with p = 2]
Edit_distance
Hamming_distance
Manhattan_distance [Minkowski distance with p = 1]
Levenshtein_distance
Needleman–Wunsch_algorithm
Minkowski_distance [generalization of order p; p → ∞ gives Chebyshev_distance]
Mahalanobis_distance
Canberra_distance
Distance_correlation
Angular_distance
String_metric
Jaro–Winkler_distance
Jaccard_index
Kendall_tau_distance
Chebyshev_distance
Tf–idf
Neural_coding
Hausdorff_distance [between clouds of points, a point and a cloud]
Distance#Distances_between_sets_and_between_a_point_and_a_set
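Several distances in this list are members of one family: the Minkowski distance of order p reduces to the Manhattan distance for p = 1 and the Euclidean distance for p = 2, and tends to the Chebyshev distance as p grows. The sketch below (plain Python, no libraries assumed) shows that family, plus the point-to-set distance and the Hausdorff distance between two finite point clouds built on top of it. Function names are illustrative, not taken from any particular library.

```python
import math
from typing import Sequence

Point = Sequence[float]


def minkowski(a: Point, b: Point, p: float = 2.0) -> float:
    """Minkowski distance of order p between two equal-length vectors.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance,
    and p = math.inf the Chebyshev distance (the limit as p grows).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p < 1:
        # For p < 1 the triangle inequality fails, so this is no longer a metric.
        raise ValueError("p must be >= 1")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def point_to_set(x: Point, cloud: Sequence[Point], p: float = 2.0) -> float:
    """Distance from a point to a finite set: the smallest point-to-point distance."""
    return min(minkowski(x, y, p) for y in cloud)


def hausdorff(A: Sequence[Point], B: Sequence[Point], p: float = 2.0) -> float:
    """Hausdorff distance between two finite point clouds.

    It is the larger of the two directed distances: how far the worst-placed
    point of A lies from B, and how far the worst-placed point of B lies from A.
    """
    d_ab = max(point_to_set(a, B, p) for a in A)
    d_ba = max(point_to_set(b, A, p) for b in B)
    return max(d_ab, d_ba)


if __name__ == "__main__":
    a, b = (0.0, 0.0), (3.0, 4.0)
    print(minkowski(a, b, 1))         # 7.0 (Manhattan)
    print(minkowski(a, b, 2))         # 5.0 (Euclidean)
    print(minkowski(a, b, math.inf))  # 4.0 (Chebyshev)
    print(hausdorff([(0, 0), (1, 0)], [(0, 0), (3, 0)]))  # 2.0
```

The Hausdorff value of 2.0 comes from the point (3, 0), whose nearest neighbour in the first cloud is 2 units away; a single outlier can dominate this distance, which is worth keeping in mind when comparing point clouds.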


Distributions
Discrete_uniform_distribution
Normal_distribution
Bernoulli_distribution
Binomial_distribution
Poisson_distribution
Chi-squared_distribution
Log-normal_distribution
Pareto_distribution
Gibbs_distribution
Weibull_distribution
Gamma_distribution
Beta_distribution
Hypergeometric_distribution
Dirac_delta_function
Evaluation
Performance_indicator
Mean_absolute_percentage_error
Mean_absolute_scaled_error
Symmetric_mean_absolute_percentage_error
Regression-kriging
Information_gain_ratio
Kullback–Leibler_divergence
Gini_coefficient
Pearson_correlation_coefficient
Entropy

http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/node15.html

Akaike_information_criterion https://twitter.com/DataSciFact/status/963129411250933760
Bayesian_information_criterion
Brier_score [mean squared error of predicted probabilities; its square root is an RMSE]
Structural_similarity
Type_I_and_type_II_errors
False_positive_rate
False_coverage_rate
False_discovery_rate
Confusion_matrix
Accuracy_and_precision
Precision_and_recall
F1_score
Sensitivity_and_specificity
Receiver_operating_characteristic
Receiver_operating_characteristic#Area_under_the_curve
Discounted_cumulative_gain
Cross-validation_(statistics)
Errors_and_residuals
Heteroscedasticity
Dunn_index
Rand_index
Jaccard_index
Silhouette_(clustering)
Item_response_theory
BLEU
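To make the link between the Brier score and RMSE concrete, here is a minimal sketch of the common binary form of the Brier score: the mean squared difference between the predicted probability of the positive class and the observed 0/1 outcome. Taking the square root of that value gives the RMSE of the probability forecasts, so the two are related but not equal. Brier's original multi-category definition sums over all classes and yields different values; this sketch assumes the binary form. The probabilities and labels in the example are hypothetical.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Binary Brier score: mean squared gap between the predicted probability
    of the positive class and the observed 0/1 outcome (lower is better)."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def rmse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Root mean squared error for real-valued predictions."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((y_hat - y) ** 2 for y_hat, y in zip(predictions, targets)) / len(targets)
    return math.sqrt(mse)


probs = [0.9, 0.2, 0.6]  # hypothetical predicted P(y = 1)
outcomes = [1, 0, 1]     # hypothetical observed labels
bs = brier_score(probs, outcomes)
print(round(bs, 4))                                        # 0.07
print(math.isclose(rmse(probs, outcomes), math.sqrt(bs)))  # True
```

Because the Brier score is a squared error on probabilities, it rewards calibration as well as ranking: two models with the same ROC AUC can have quite different Brier scores.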
Working with Text
Part_of_speech
Semantic_similarity
Tf–idf
Cosine_similarity
Okapi_BM25
Named-entity_recognition
Conditional_random_field
Latent_Dirichlet_allocation
Sentiment_analysis
Web_mining
Web_crawler
Text_mining
Document_classification
Automatic_summarization
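Tf–idf and cosine similarity, both listed above, are often used together to compare documents. The sketch below assumes one simple weighting among many: raw term counts for tf and an unsmoothed idf = log(N / df), where N is the number of documents and df the number of documents containing the term. Libraries commonly use smoothed or normalized variants, so exact weights will differ. The three toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Raw-count tf times idf = log(N / df); one sparse vector per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [s.split() for s in ["the cat sat", "the cat ran", "the dog ran"]]
vecs = tf_idf(docs)
print(vecs[0]["the"])  # 0.0: "the" occurs in every document
print(cosine_similarity(vecs[0], vecs[1]) > cosine_similarity(vecs[0], vecs[2]))  # True
```

A term that appears in every document gets an idf of zero and contributes nothing to the similarity, which is why tf–idf downweights stop words without needing an explicit stop-word list.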
Working with Images
Working with concepts (Ontologies)

  https://en.wikipedia.org/wiki/YAGO_%28database%29
  http://wiki.dbpedia.org/
  http://conceptnet.io/
  http://cogcomp.org/Data/QA/QC/definition.html

Visualization
Data_visualization
Exploratory_data_analysis
List_of_graphical_methods
Category:Statistical_charts_and_diagrams
Statistical_graphics
Visual_perception
Heat_map
Misleading_graph
Pareto_chart
(Statistical) tests
A/B_testing
Statistical_power
Statistical_hypothesis_testing
P-value
Student's_t-test
Chi-squared_test
Type_I_and_type_II_errors
Stationary_process
Structural_break
Chow_test
Kruskal–Wallis_one-way_analysis_of_variance
F-test
F-statistics
Pairwise_summation
CUSUM
Lyapunov_exponent
Kolmogorov_complexity
Machine Learning Techniques
Statistical_classification
One-class_classification
Binary_classification
Multiclass_classification
Multi-label_classification
Structured_prediction
Cluster_analysis
Elbow_method_(clustering)
Nearest_neighbor_search#Approximate_nearest_neighbor
Regression_analysis
Linear_regression
Logistic_regression
Ridge_regression
Kriging
Multivariate_adaptive_regression_splines
Association_rule_learning
Apriori_algorithm
Survival_analysis
Monte_Carlo_method
Monte_Carlo_algorithm
Multinomial_logistic_regression
Lasso_(statistics)
Expectation–maximization_algorithm
Markov_chain_Monte_Carlo
Hidden_Markov_Models
Viterbi_algorithm
Convolutional_code
Forward–backward_algorithm
Markov_random_field
Mean_field_theory
Mean_field_particle_methods
CART
Decision_tree_learning
Decision_tree
Pruning_(decision_trees)
ID3_algorithm
C4.5_algorithm
Random_forest
Support_vector_machine
Support_vector_machine#Support_vector_clustering_.28SVC.29
Support_vector_machine#Regression
Conditional_random_field
Latent_semantic_analysis
Genetic_algorithm
Evolutionary_algorithm
Evolutionary_computation
Voronoi_diagram
Local_outlier_factor
Ordered_weighted_averaging_aggregation_operator
Types_of_artificial_neural_networks
Comparison_of_deep_learning_software/Resources
Artificial_neural_network
Perceptron
Feedforward_neural_network
Multilayer_perceptron
Radial_basis_function_network
Long_short-term_memory
SNNS
Time_delay_neural_network
Recursive_neural_network
Recurrent_neural_network
Hopfield_network
Content-addressable_memory
Boltzmann_machine
Self-organizing_map
Learning_vector_quantization
Liquid_state_machine
Autoassociative_memory
Convolutional_neural_network
Autoencoder
Neuroevolution
Neuroevolution_of_augmenting_topologies
Deep_learning
Deep_learning#Deep_neural_network_architectures
Deep_belief_network
Generative_adversarial_networks
Neural_Turing_machine
Early_stopping
ADALINE
Memristor
Instantaneously_trained_neural_networks
Spiking_neural_network
Optical_character_recognition
Fuzzy_logic
Inference_engine
Type-2_fuzzy_sets_and_systems
T-norm_fuzzy_logics
Adaptive_neuro_fuzzy_inference_system
Fuzzy_control_system
Spatial_association


Ensemble Techniques
Ensemble_learning
Ensembles_of_classifiers
Ensemble_learning#Implementations_in_statistics_packages
Bootstrap_aggregating
Boosting_(machine_learning)
Gradient_boosting
Committee_machine
Applications
Bayesian_spam_filtering
Root_cause_analysis
Inpainting


Experimentation framework
Coding / Exposing API to the rest of the application
Microservices
BigData
Data_lake
Streaming_algorithm
Star_schema
OLAP_cube
Solid-state_drive
MongoDB
Apache_Hadoop https://hadoop.apache.org/
Apache_Flume http://flume.apache.org/
Apache_Hadoop#HDFS https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Apache_HBase http://hbase.apache.org/
Apache_Hive https://hive.apache.org/
Sqoop http://sqoop.apache.org/
Apache_Avro http://avro.apache.org/
Apache_Kafka https://kafka.apache.org/
Apache_Spark https://spark.apache.org/
Apache_Flink http://flink.apache.org/
Apache_ZooKeeper http://zookeeper.apache.org/
Apache_Cassandra https://cassandra.apache.org
Ambari http://ambari.apache.org/
Apache_Oozie http://oozie.apache.org/
Pig_(programming_tool) https://pig.apache.org/
Apache_Mahout http://mahout.apache.org/
Apache_SystemML http://systemml.apache.org/
Apache_Lucene
Elasticsearch https://www.elastic.co/
Kibana https://www.elastic.co/products/kibana
Small_data


Multi-Agent Systems
Agent-based_model
Multi-agent_system
Agent-oriented_software_engineering

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.7968&rep=rep1&type=pdf [Y. Demazeau: Vowels Methodology]

Ant_colony_optimization_algorithms


Quantum Machine Learning
Quantum_machine_learning
Quantum_tunnelling
Quantum_annealing
Adiabatic_quantum_computation


Resources
Books
  https://github.com/janishar/mit-deep-learning-book-pdf
  https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print10.pdf
  http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
  http://infolab.stanford.edu/~ullman/mmds/booka.pdf
  http://www.guidetodatamining.com/assets/guideChapters/Guide2DataMining.pdf
  https://github.com/ajaymache/machine-learning-yearning
News/Blogs/RSS
Podcasts
YT Channels
MOOCs
Jobs
Teaching

http://edison-project.eu/edison/edison-data-science-framework-edsf

Curated list of similar pages

  https://github.com/search?utf8=%E2%9C%93&q=curated+list+awesome+frameworks&type=
  https://github.com/josephmisiti/awesome-machine-learning
  https://github.com/onurakpolat/awesome-bigdata
  https://github.com/onurakpolat/awesome-analytics
  https://github.com/analyticalmonk/awesome-neuroscience
  https://github.com/igorbarinov/awesome-data-engineering
  https://github.com/quantmind/awesome-data-science-viz
  https://github.com/fasouto/awesome-dataviz
  https://github.com/qinwf/awesome-R
  https://github.com/datascience-python/awesome-datascience-python
  https://github.com/caesar0301/awesome-public-datasets


Retrieved from "https://en.wikipedia.org/w/index.php?title=User:LI_AR/Books/Cracking_the_DataScience_Interview&oldid=986052775"

Category: User namespace book pages
 



This page was last edited on 29 October 2020, at 14:34 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


